Strange performance issue

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Strange performance issue

choeger
Hi all,

as I told you in an earlier thread, I am using parboiled for my Modelica frontend.

https://mlcontrol.uebb.tu-berlin.de/redmine/projects/modim/repository/show?rev=scala_port

You should be able to download the source via git:

https://mlcontrol.uebb.tu-berlin.de/redmine/projects/modim/repository/show?rev=scala_port

I have a small testsuite for some particular AST elements parsing them directly with the appropriate rule. Lately I discovered scalatests ability to report the runtime of every single testcase. Doing so, I noticed that some parser tests needed more than 200x the time of some similar tests.

I cannot provide a smaller example, as I am clueless about where to start. But here is how to check this yourself. Simply clone the git repository, start a sbt shell and type:

test-only de.tuberlin.uebb.modelica.modim.tests.ParboiledParserSpec -- -oD

You will (hopefully) get 84 succeeded tests. Along the lines:

- Should parse large strings (32 milliseconds)

This is a lexer regression test:

 
    it("Should parse large strings") {
      val x = ("ABC" * (1000))
      ("\"" + x + "\"") parsed_with MoParser.exp should create (StringLit(x))
    }

It gives you some kind of reference value for the maximum parser performance.

No I also see:

[info] - Should parse break statements (256 milliseconds)
[info] - Should parse return statements (1 millisecond)

And this is a big WTF for me, since both test cases look similar:

    it("Should parse break statements") {
      "break;" parsed_with MoParser.statement should create (Break())
    }

    it("Should parse return statements") {
      "return;" parsed_with MoParser.statement should create (Return())
    }

They both use the same rule:

    def statement : Rule1[Statement] = rule {
      exp ~ optional(":= " ~ exp) ~ comment ~ "; " ~~> mkStmt _ |
      kw("for") ~ oneOrMore(index) ~ kw("loop") ~ zeroOrMore(statement) ~ kw("end") ~ kw("for") ~ comment ~ "; " ~~> ForStmt |
      kw("while") ~ exp ~ kw("loop") ~ oneOrMore(statement)  ~ kw("end") ~ kw("while") ~ comment ~ "; " ~~> WhileStmt |   
      kw("break") ~ comment ~ "; " ~~> Break |
      kw("return") ~ comment ~ "; " ~~> Return |
      kw("if") ~ exp ~ kw("then") ~ zeroOrMore(statement) ~ zeroOrMore(elseifstmt) ~ (kw("else") ~ zeroOrMore(statement) | push(List())) ~ kw("end") ~ kw("if") ~ comment ~ "; " ~~> IfStmt |
      kw("when") ~ exp ~ kw("then") ~ oneOrMore(statement) ~ zeroOrMore(elsewhenstmt) ~ kw("end") ~ kw("when") ~ comment ~ "; " ~~> WhenStmt
    }

Does anyone have a clue why there is such a big difference in between the two?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Strange performance issue

mathias
Administrator
From the looks of your `statement` rule I'd suspect the first branch to be the culprit:

    exp ~ optional(":= " ~ exp) ~ comment ~ "; " ~~> mkStmt _

Apparently the `exp` rule takes a long time to determine that a single "return;" doesn't match.

There are several way for you to verify this:

1. Add a negative syntactic predicate as such:

            !kw("return") ~ exp ~ optional(":= " ~ exp) ~ comment ~ "; " ~~> mkStmt _

    If this makes a difference you know where to dig deeper.

2. Use the [TracingParseRunner][1] to "watch" the parser as it digests the "return;". This should allow you to see exactly, what's going on.

HTH and cheers,
Mathias

  [1]: https://github.com/sirthias/parboiled/wiki/Grammar-and-Parser-Debugging

---
[hidden email]
http://www.parboiled.org

On 02.10.2012, at 10:42, choeger [via parboiled users] wrote:

> Hi all,
>
> as I told you in an earlier thread, I am using parboiled for my Modelica frontend.
>
> https://mlcontrol.uebb.tu-berlin.de/redmine/projects/modim/repository/show?rev=scala_port
>
> You should be able to download the source via git:
>
> https://mlcontrol.uebb.tu-berlin.de/redmine/projects/modim/repository/show?rev=scala_port
>
> I have a small testsuite for some particular AST elements parsing them directly with the appropriate rule. Lately I discovered scalatests ability to report the runtime of every single testcase. Doing so, I noticed that some parser tests needed more than 200x the time of some similar tests.
>
> I cannot provide a smaller example, as I am clueless about where to start. But here is how to check this yourself. Simply clone the git repository, start a sbt shell and type:
>
> test-only de.tuberlin.uebb.modelica.modim.tests.ParboiledParserSpec -- -oD
>
> You will (hopefully) get 84 succeeded tests. Along the lines:
>
> - Should parse large strings (32 milliseconds)
>
> This is a lexer regression test:
>
>  
>     it("Should parse large strings") {
>       val x = ("ABC" * (1000))
>       ("\"" + x + "\"") parsed_with MoParser.exp should create (StringLit(x))
>     }
>
>
> It gives you some kind of reference value for the maximum parser performance.
>
> No I also see:
>
> [info] - Should parse break statements (256 milliseconds)
> [info] - Should parse return statements (1 millisecond)
>
> And this is a big WTF for me, since both test cases look similar:
>
>     it("Should parse break statements") {
>       "break;" parsed_with MoParser.statement should create (Break())
>     }
>
>     it("Should parse return statements") {
>       "return;" parsed_with MoParser.statement should create (Return())
>     }
>
>
> They both use the same rule:
>
>     def statement : Rule1[Statement] = rule {
>       exp ~ optional(":= " ~ exp) ~ comment ~ "; " ~~> mkStmt _ |
>       kw("for") ~ oneOrMore(index) ~ kw("loop") ~ zeroOrMore(statement) ~ kw("end") ~ kw("for") ~ comment ~ "; " ~~> ForStmt |
>       kw("while") ~ exp ~ kw("loop") ~ oneOrMore(statement)  ~ kw("end") ~ kw("while") ~ comment ~ "; " ~~> WhileStmt |  
>       kw("break") ~ comment ~ "; " ~~> Break |
>       kw("return") ~ comment ~ "; " ~~> Return |
>       kw("if") ~ exp ~ kw("then") ~ zeroOrMore(statement) ~ zeroOrMore(elseifstmt) ~ (kw("else") ~ zeroOrMore(statement) | push(List())) ~ kw("end") ~ kw("if") ~ comment ~ "; " ~~> IfStmt |
>       kw("when") ~ exp ~ kw("then") ~ oneOrMore(statement) ~ zeroOrMore(elsewhenstmt) ~ kw("end") ~ kw("when") ~ comment ~ "; " ~~> WhenStmt
>     }
>
>
> Does anyone have a clue why there is such a big difference in between the two?
>
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/Strange-performance-issue-tp4024079.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.
> NAML

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Strange performance issue

choeger
I would assume the same, but "return" works just fine while "break" takes >200ms. They are both keywords (and further down the road there actually is such a !kw rule when it comes to ide parsing).

I just tested your idea to be sure and it didn't change anything. I just played around with my test suite and figured it is always the first application of the statement rule (regardless of the input). So it seems to be the rule creation that causes this overhead.

Just out of curiosity: Is it expected to take that long to create a rule?

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Strange performance issue

mathias
Administrator
Ahh... I see.
Yes, rule creation can take some time. Mostly because the JVM has to load a lot of classes the very first time you start the parser.
However, usually this is not a problem for an application, because you can easily "warm up" the parser right at application start, e.g. by parsing a small "warm up" input.
Once this is done all further parsing runs during the lifetime of the JVM will be fast.

Cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

On 02.10.2012, at 13:19, choeger [via parboiled users] wrote:

> I would assume the same, but "return" works just fine while "break" takes >200ms. They are both keywords (and further down the road there actually is such a !kw rule when it comes to ide parsing).
>
> I just tested your idea to be sure and it didn't change anything. I just played around with my test suite and figured it is always the first application of the statement rule (regardless of the input). So it seems to be the rule creation that causes this overhead.
>
> Just out of curiosity: Is it expected to take that long to create a rule?
>
>
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/Strange-performance-issue-tp4024079p4024081.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.
> NAML

Loading...