Quantcast

CharRange: some generated code looks bizarre...

classic Classic list List threaded Threaded
5 messages Options
fge
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

CharRange: some generated code looks bizarre...

fge
Hello,

So, there I am, I track down the first bug open on the new organization, and I have spotted some bizarre stuff...

What I have done is dump the whole byte code of the generated $$parboiled class, put it in the branch to debug, decompiled it with https://bitbucket.org/mstrobel/procyon, fixed the code (genericize, remove casts, fix some errors), and I can reproduce the bug in exactly the same way.

And I stumbled upon this:

    public Rule CharRange(final char c, final char c2) {
        HashMap<CachingGenerator.Arguments, Rule> cache$CharRange;
        if ((cache$CharRange = this.cache$CharRange) == null) {
            cache$CharRange = (this.cache$CharRange = new
                HashMap<CachingGenerator.Arguments, Rule>());
        }
        final CachingGenerator.Arguments cachingGenerator$Arguments;
        final Rule rule = cache$CharRange.get(cachingGenerator$Arguments = new CachingGenerator.Arguments(new Object[] { c, c2 }));
        if (rule != null) {
            return rule;
        }
        final ProxyMatcher proxyMatcher = new ProxyMatcher();
        this.cache$CharRange.put(cachingGenerator$Arguments, proxyMatcher);
        final Rule o2;
        final Rule o = o2 = ((c == c2) ? this.Ch(c) : new
            CharRangeMatcher(c, c2));
        proxyMatcher.arm((Matcher)o);
        this.cache$CharRange.put(cachingGenerator$Arguments, o);
        return o2;
    }

javap -c output follows, but in the first time I'd like to know: why is a ProxyMatcher created at all since it is not even put into the map? It is armed with "o", but "o" is put and not "proxyMatcher". Is this normal?

javap -c output:

public org.parboiled.Rule CharRange(char, char);
  Code:
   0:   aload_0
   1:   getfield        #192; //Field cache$CharRange:Ljava/util/HashMap;
   4:   dup
   5:   ifnonnull       21
   8:   pop
   9:   aload_0
   10:  new     #113; //class java/util/HashMap
   13:  dup_x1
   14:  dup
   15:  invokespecial   #117; //Method java/util/HashMap."<init>":()V
   18:  putfield        #192; //Field cache$CharRange:Ljava/util/HashMap;
   21:  new     #194; //class org/parboiled/transform/CachingGenerator$Arguments
   24:  dup
   25:  bipush  2
   27:  anewarray       #196; //class java/lang/Object
   30:  dup
   31:  bipush  0
   33:  iload_1
   34:  invokestatic    #183; //Method java/lang/Character.valueOf:(C)Ljava/lang/Character;
   37:  aastore
   38:  dup
   39:  bipush  1
   41:  iload_2
   42:  invokestatic    #183; //Method java/lang/Character.valueOf:(C)Ljava/lang/Character;
   45:  aastore
   46:  invokespecial   #199; //Method org/parboiled/transform/CachingGenerator$Arguments."<init>":([Ljava/lang/Object;)V
   49:  dup
   50:  astore_3
   51:  invokevirtual   #121; //Method java/util/HashMap.get:(Ljava/lang/Object;)Ljava/lang/Object;
   54:  checkcast       #123; //class org/parboiled/Rule
   57:  dup
   58:  ifnull  62
   61:  areturn
   62:  pop
   63:  new     #125; //class org/parboiled/matchers/ProxyMatcher
   66:  dup
   67:  invokespecial   #126; //Method org/parboiled/matchers/ProxyMatcher."<init>":()V
   70:  dup
   71:  aload_3
   72:  swap
   73:  aload_0
   74:  getfield        #192; //Field cache$CharRange:Ljava/util/HashMap;
   77:  dup_x2
   78:  pop
   79:  invokevirtual   #130; //Method java/util/HashMap.put:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
   82:  pop
   83:  iload_1
   84:  iload_2
   85:  if_icmpne       96
   88:  aload_0
   89:  iload_1
   90:  invokevirtual   #143; //Method org/parboiled/BaseParser.Ch:(C)Lorg/parboiled/Rule;
   93:  goto    105
   96:  new     #201; //class org/parboiled/matchers/CharRangeMatcher
   99:  dup
   100: iload_1
   101: iload_2
   102: invokespecial   #204; //Method org/parboiled/matchers/CharRangeMatcher."<init>":(CC)V
   105: dup_x1
   106: checkcast       #161; //class org/parboiled/matchers/Matcher
   109: invokevirtual   #165; //Method org/parboiled/matchers/ProxyMatcher.arm:(Lorg/parboiled/matchers/Matcher;)V
   112: dup
   113: aload_3
   114: swap
   115: aload_0
   116: getfield        #192; //Field cache$CharRange:Ljava/util/HashMap;
   119: dup_x2
   120: pop
   121: invokevirtual   #130; //Method java/util/HashMap.put:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
   124: pop
   125: areturn

fge
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: CharRange: some generated code looks bizarre...

fge
fge wrote
[...]
And I stumbled upon this:
In fact, CharRange is not the only one; Ch() also does that.

Does it mean parboiled relies on the JIT to eliminate these basically dead code paths?
fge
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: CharRange: some generated code looks bizarre...

fge
OK, my bad, I have misundertood its role.

In fact, it is pretty much a placeholder waiting for a _real_ rule to be generated, right?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: CharRange: some generated code looks bizarre...

mathias
Administrator
In reply to this post by fge
Francis,

the logic with caching and proxying is as follows:

Rule methods create the rule tree at the time the parser is constructed, so before parser runtime.
Now, if you have a cyclic rules like this one:

    Rule LotsOfAs() {
        return Sequence(IgnoreCase('a'), Optional(LotsOfAs()));
    }

your IDE might even warn you that this code will result in an infinite recursion.
Parboiled prevents that by wrapping your actual rule code with caching and proxying logic.

After rewriting this method will look like this (pseudo-code):

    Rule LotsOfAs() {
        if (<cache> != null) return <cache>;
        <cache> = new ProxyMatcher();
        val actualRule = Sequence(IgnoreCase('a'), Optional(LotsOfAs()));
        <proxyMatcher>.arm(actualRule);
        <cache> = new ProxyMatcher();
        return actualRule;
    }

The idea is that we put a proxyMatcher into the cache *before* executing the actual rule construction code because the latter might already try to reenter the same rule method.
If it does it will get back the proxyMatcher.
Therefore, after the actual rule construction logic has run, we arm the proxyMatcher, which closes any circular rule dependency chains (if they exist).
In the end, we replace the proxy matcher in the cache with the actual rule node because, one the actual rule node is available, we can skip the one level of indirection (the proxyMatcher) for all rules that call the rule method *after* it has run the first time.

In the case of your example the actual rule doesn’t call reenter, so creating the proxyMatcher is a wasted efforts. However, currently parboiled isn’t smart enough to detect these inefficiencies.
It would be possible to add another step to the transformation pipeline that analyses the bytecode-to-be-generated and remove superfluous proxyMatcher construction when the rule doesn’t actually reenter. However, since reentering might happen through (potentially long) chains of other rules this is not as easy as it might seem in the beginning.
A simpler approach would be to remove the proxyMatcher only if the rule doesn’t call *any other* rule (as in your example). This is easier to do.
However, the current behaviour appears to be correct, just not as efficient as possible during rule construction.

Cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

On 19 Apr 2014, at 12:38, fge [via parboiled users] <[hidden email]> wrote:

> Hello,
>
> So, there I am, I track down the first bug open on the new organization, and I have spotted some bizarre stuff...
>
> What I have done is dump the whole byte code of the generated $$parboiled class, put it in the branch to debug, decompiled it with https://bitbucket.org/mstrobel/procyon, fixed the code (genericize, remove casts, fix some errors), and I can reproduce the bug in exactly the same way.
>
> And I stumbled upon this:
>
>     public Rule CharRange(final char c, final char c2) {
>         HashMap<CachingGenerator.Arguments, Rule> cache$CharRange;
>         if ((cache$CharRange = this.cache$CharRange) == null) {
>             cache$CharRange = (this.cache$CharRange = new
>                 HashMap<CachingGenerator.Arguments, Rule>());
>         }
>         final CachingGenerator.Arguments cachingGenerator$Arguments;
>         final Rule rule = cache$CharRange.get(cachingGenerator$Arguments = new CachingGenerator.Arguments(new Object[] { c, c2 }));
>         if (rule != null) {
>             return rule;
>         }
>         final ProxyMatcher proxyMatcher = new ProxyMatcher();
>         this.cache$CharRange.put(cachingGenerator$Arguments, proxyMatcher);
>         final Rule o2;
>         final Rule o = o2 = ((c == c2) ? this.Ch(c) : new
>             CharRangeMatcher(c, c2));
>         proxyMatcher.arm((Matcher)o);
>         this.cache$CharRange.put(cachingGenerator$Arguments, o);
>         return o2;
>     }
>
>
> javap -c output follows, but in the first time I'd like to know: why is a ProxyMatcher created at all since it is not even put into the map? It is armed with "o", but "o" is put and not "proxyMatcher". Is this normal?
>
> javap -c output:
>
> public org.parboiled.Rule CharRange(char, char);
>   Code:
>    0:   aload_0
>    1:   getfield        #192; //Field cache$CharRange:Ljava/util/HashMap;
>    4:   dup
>    5:   ifnonnull       21
>    8:   pop
>    9:   aload_0
>    10:  new     #113; //class java/util/HashMap
>    13:  dup_x1
>    14:  dup
>    15:  invokespecial   #117; //Method java/util/HashMap."<init>":()V
>    18:  putfield        #192; //Field cache$CharRange:Ljava/util/HashMap;
>    21:  new     #194; //class org/parboiled/transform/CachingGenerator$Arguments
>    24:  dup
>    25:  bipush  2
>    27:  anewarray       #196; //class java/lang/Object
>    30:  dup
>    31:  bipush  0
>    33:  iload_1
>    34:  invokestatic    #183; //Method java/lang/Character.valueOf:(C)Ljava/lang/Character;
>    37:  aastore
>    38:  dup
>    39:  bipush  1
>    41:  iload_2
>    42:  invokestatic    #183; //Method java/lang/Character.valueOf:(C)Ljava/lang/Character;
>    45:  aastore
>    46:  invokespecial   #199; //Method org/parboiled/transform/CachingGenerator$Arguments."<init>":([Ljava/lang/Object;)V
>    49:  dup
>    50:  astore_3
>    51:  invokevirtual   #121; //Method java/util/HashMap.get:(Ljava/lang/Object;)Ljava/lang/Object;
>    54:  checkcast       #123; //class org/parboiled/Rule
>    57:  dup
>    58:  ifnull  62
>    61:  areturn
>    62:  pop
>    63:  new     #125; //class org/parboiled/matchers/ProxyMatcher
>    66:  dup
>    67:  invokespecial   #126; //Method org/parboiled/matchers/ProxyMatcher."<init>":()V
>    70:  dup
>    71:  aload_3
>    72:  swap
>    73:  aload_0
>    74:  getfield        #192; //Field cache$CharRange:Ljava/util/HashMap;
>    77:  dup_x2
>    78:  pop
>    79:  invokevirtual   #130; //Method java/util/HashMap.put:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
>    82:  pop
>    83:  iload_1
>    84:  iload_2
>    85:  if_icmpne       96
>    88:  aload_0
>    89:  iload_1
>    90:  invokevirtual   #143; //Method org/parboiled/BaseParser.Ch:(C)Lorg/parboiled/Rule;
>    93:  goto    105
>    96:  new     #201; //class org/parboiled/matchers/CharRangeMatcher
>    99:  dup
>    100: iload_1
>    101: iload_2
>    102: invokespecial   #204; //Method org/parboiled/matchers/CharRangeMatcher."<init>":(CC)V
>    105: dup_x1
>    106: checkcast       #161; //class org/parboiled/matchers/Matcher
>    109: invokevirtual   #165; //Method org/parboiled/matchers/ProxyMatcher.arm:(Lorg/parboiled/matchers/Matcher;)V
>    112: dup
>    113: aload_3
>    114: swap
>    115: aload_0
>    116: getfield        #192; //Field cache$CharRange:Ljava/util/HashMap;
>    119: dup_x2
>    120: pop
>    121: invokevirtual   #130; //Method java/util/HashMap.put:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
>    124: pop
>    125: areturn
>
>
>
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/CharRange-some-generated-code-looks-bizarre-tp4024296.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.
> NAML

fge
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: CharRange: some generated code looks bizarre...

fge
Hello again!
mathias wrote
Francis,

the logic with caching and proxying is as follows:

Rule methods create the rule tree at the time the parser is constructed, so before parser runtime.
Now, if you have a cyclic rules like this one:

    Rule LotsOfAs() {
        return Sequence(IgnoreCase('a'), Optional(LotsOfAs()));
    }

your IDE might even warn you that this code will result in an infinite recursion.
Parboiled prevents that by wrapping your actual rule code with caching and proxying logic.
Yes, that is what I have understood since then. Delving into the code little by little... I am far to understand it all though!

After rewriting this method will look like this (pseudo-code):

    Rule LotsOfAs() {
        if (<cache> != null) return <cache>;
        <cache> = new ProxyMatcher();
        val actualRule = Sequence(IgnoreCase('a'), Optional(LotsOfAs()));
        <proxyMatcher>.arm(actualRule);
        <cache> = new ProxyMatcher();
        return actualRule;
    }

The idea is that we put a proxyMatcher into the cache *before* executing the actual rule construction code because the latter might already try to reenter the same rule method.
If it does it will get back the proxyMatcher.
Therefore, after the actual rule construction logic has run, we arm the proxyMatcher, which closes any circular rule dependency chains (if they exist).
Hmmm, are you sure about the second "<cache> = new ProxyMatcher()"? Well, I'll look at the generated source again and see.

Also, the bug I fixed was about a proxy matcher actually arming with another proxy matcher, in fact...

In the end, we replace the proxy matcher in the cache with the actual rule node because, one the actual rule node is available, we can skip the one level of indirection (the proxyMatcher) for all rules that call the rule method *after* it has run the first time.

In the case of your example the actual rule doesn’t call reenter, so creating the proxyMatcher is a wasted efforts. However, currently parboiled isn’t smart enough to detect these inefficiencies.
It would be possible to add another step to the transformation pipeline that analyses the bytecode-to-be-generated and remove superfluous proxyMatcher construction when the rule doesn’t actually reenter. However, since reentering might happen through (potentially long) chains of other rules this is not as easy as it might seem in the beginning.
A simpler approach would be to remove the proxyMatcher only if the rule doesn’t call *any other* rule (as in your example). This is easier to do.
However, the current behaviour appears to be correct, just not as efficient as possible during rule construction.
It is on my todo list already ;) The "no other rule" case and the rest afterwards...

While we are at it I have a question: why are there strings of bytecode dumps in test? Is that to verify the generated bytecode?

If yes, it should be possible to verify that instead by spying on the RuleMethodVisitors, right?
Loading...