Quantcast

Clever way to get positions for AST

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Clever way to get positions for AST

choeger
Hi all,

first of all, there is now one more rather complex Parboiled Parser in
the World. It's a Modelica one:

https://mlcontrol.uebb.tu-berlin.de/redmine/projects/modim/repository/revisions/scala_port/entry/src/main/scala/MoParboiled.scala

This one does around 33k loc/s which is quite far from the Java example,
but it also creates an AST ;). Athough I find the runtime
satisfactionary, performance tips are always welcome.

Since I am now going to port my context checks to scala, I was wondering
if there was a smart way to handle input positions (for error reporting).

In my old Java implementation, every AST node had some position and
source fields. IMO this is a waste of space, clutters the data
structures and makes things like tree-rewriting more complex.

So is there a smarter way? IMO it is acceptable to have spend more time
in error reporting, since this is the "special case". I was thinking
about re-running the parser and somehow store positions with every AST
object in a map or something. Is it possible to instrument a Parboiled
Parser in that way (e.g. somehow overwrite the push method)?

best regards,

Christoph

--
Christoph Höger

Technische Universität Berlin
Fakultät IV - Elektrotechnik und Informatik
Übersetzerbau und Programmiersprachen

Sekr. TEL12-2, Ernst-Reuter-Platz 7, 10587 Berlin

Tel.: +49 (30) 314-24890
E-Mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clever way to get positions for AST

mathias
Administrator
Christoph,

> first of all, there is now one more rather complex Parboiled Parser in
> the World. It's a Modelica one:
>
> https://mlcontrol.uebb.tu-berlin.de/redmine/projects/modim/repository/revisions/scala_port/entry/src/main/scala/MoParboiled.scala

Very nice!!
Thanks for the heads up.

> This one does around 33k loc/s which is quite far from the Java example,
> but it also creates an AST ;). Athough I find the runtime
> satisfactionary, performance tips are always welcome.

Yes, AST creation surely does take time.
From a quick look at your parser I think it might benefit (very slightly) from using the `push` helper in all places very you currently do things like

    kw("class") ~> {_=>Class}

With `push` it would look like this:

   kw("class") ~ push(Class)

and you would remove an anonymous function class creation along with the extraction of the matched string (which you don't use).

> Since I am now going to port my context checks to scala, I was wondering
> if there was a smart way to handle input positions (for error reporting).
>
> In my old Java implementation, every AST node had some position and
> source fields. IMO this is a waste of space, clutters the data
> structures and makes things like tree-rewriting more complex.
>
> So is there a smarter way? IMO it is acceptable to have spend more time
> in error reporting, since this is the "special case". I was thinking
> about re-running the parser and somehow store positions with every AST
> object in a map or something. Is it possible to instrument a Parboiled
> Parser in that way (e.g. somehow overwrite the push method)?

Hmm... have you seen the `withContext` helpers, that allow you to get to parsing context from within an action function?
They might provide a way.

However, I would seriously weigh the benefits of not storing the input positions in all parsing runs against the significant complexity added by a two-run architecture for errors. I assume that you rely on parboileds error reporting for producing parse error messages. So, the error messages that you like to produce yourself are not syntax errors but rather some higher-level semantic errors that are only detected in later stages of your processing pipeline.
Having to restart the whole pipeline from scratch, this time with recording or input positions, seems like quite a heavy thing for such a small optimization.

The positioning information should add more than one or two Ints to your AST nodes and it really takes no time to pull it from the parsing context.

HTH and cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

On 14.09.2012, at 11:22, choeger [via parboiled users] wrote:

> Hi all,
>
> first of all, there is now one more rather complex Parboiled Parser in
> the World. It's a Modelica one:
>
> https://mlcontrol.uebb.tu-berlin.de/redmine/projects/modim/repository/revisions/scala_port/entry/src/main/scala/MoParboiled.scala
>
> This one does around 33k loc/s which is quite far from the Java example,
> but it also creates an AST ;). Athough I find the runtime
> satisfactionary, performance tips are always welcome.
>
> Since I am now going to port my context checks to scala, I was wondering
> if there was a smart way to handle input positions (for error reporting).
>
> In my old Java implementation, every AST node had some position and
> source fields. IMO this is a waste of space, clutters the data
> structures and makes things like tree-rewriting more complex.
>
> So is there a smarter way? IMO it is acceptable to have spend more time
> in error reporting, since this is the "special case". I was thinking
> about re-running the parser and somehow store positions with every AST
> object in a map or something. Is it possible to instrument a Parboiled
> Parser in that way (e.g. somehow overwrite the push method)?
>
> best regards,
>
> Christoph
>
> --
> Christoph Höger
>
> Technische Universität Berlin
> Fakultät IV - Elektrotechnik und Informatik
> Übersetzerbau und Programmiersprachen
>
> Sekr. TEL12-2, Ernst-Reuter-Platz 7, 10587 Berlin
>
> Tel.: +49 (30) 314-24890
> E-Mail: [hidden email]
>
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/Clever-way-to-get-positions-for-AST-tp4024061.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.
> NAML

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clever way to get positions for AST

choeger
The problem with the parser positions is that my AST is really quite abstract. The tree is not really in the concrete syntax tree shape anymore and during parsing I create a lot of partial nodes (basically collections) that I copy() around, maintaining the input positions on them would be cumbersome.

So for now I decided it might suffice to store the input positions only on some AST elements (basically all "create once and never touch again" elements, e.g. type-definitions, value definitions and expressions). That way, I should be able to provide meaningful error messages and avoid location-maintenance.

So I created a trait Locatable (very much like scala's Positional, just without the string contents) to hold the start and end index of a match.

So now I am searching a smart way to stuff that into the AST-nodes from the parser context. Ideally, I do not want to change the parser rules, but instead have some implicit magic apply the correct wrapper to everything that is a Locatable.

So how do I mimize the impact?

fge
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clever way to get positions for AST

fge
On Mon, Sep 17, 2012 at 6:49 PM, choeger [via parboiled users]
<[hidden email]> wrote:

> The problem with the parser positions is that my AST is really quite
> abstract. The tree is not really in the concrete syntax tree shape anymore
> and during parsing I create a lot of partial nodes (basically collections)
> that I copy() around, maintaining the input positions on them would be
> cumbersome.
>
> So for now I decided it might suffice to store the input positions only on
> some AST elements (basically all "create once and never touch again"
> elements, e.g. type-definitions, value definitions and expressions). That
> way, I should be able to provide meaningful error messages and avoid
> location-maintenance.
>
> So I created a trait Locatable (very much like scala's Positional, just
> without the string contents) to hold the start and end index of a match.
>
> So now I am searching a smart way to stuff that into the AST-nodes from the
> parser context. Ideally, I do not want to change the parser rules, but
> instead have some implicit magic apply the correct wrapper to everything
> that is a Locatable.
>
> So how do I mimize the impact?
>

Use a builder, and only build the final result when parsing is done?

--
Francis Galiegue, [hidden email]
JSON Schema: https://github.com/json-schema
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clever way to get positions for AST

choeger
Yes, a builder would certainly be an applicable pattern, but it would also be:

1. non-declarative
2. premature optimization (in fact, I assume the scala compiler is reusing immutable objects that are thrown away after creating a copy)

My current problem is more about an easy way to tag certain AST elements (by making them extend a certain trait) and use that tag implicitely in the parser to fill in the AST positions.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clever way to get positions for AST

mathias
Administrator
Christoph,

I like the idea of having parboiled recognize certain types signatures as receptacles for position information, which are then automatically filled.
However, this is not something that is currently implemented or supported.

The only way to get to positional data is through the Context, which your action functions can get to if you use the `withContext` wrappers.
If you create a custom helper function with a short name (say `#`) that decorates an action functions with `withContext` and the position extraction logic I think you should be able to keep the required changes to your rules to a minimum.

Cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

On 18.09.2012, at 09:57, choeger [via parboiled users] wrote:

> Yes, a builder would certainly be an applicable pattern, but it would also be:
>
> 1. non-declarative
> 2. premature optimization (in fact, I assume the scala compiler is reusing immutable objects that are thrown away after creating a copy)
>
> My current problem is more about an easy way to tag certain AST elements (by making them extend a certain trait) and use that tag implicitely in the parser to fill in the AST positions.
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/Clever-way-to-get-positions-for-AST-tp4024061p4024065.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.
> NAML

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clever way to get positions for AST

choeger
Just for the sake of google-completeness, in case someone else searches for a related problem. That's what I did now:

    def setLocation[A, T <: Locatable](f : (A) => T) : (A, Context[Any]) => T = {
      case ((a, ctxt)) => f(a).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
    }

    def setLocation[A,B, T <: Locatable](f : (A,B) => T) : (A, B, Context[Any]) => T = {
      case ((a,b, ctxt)) => f(a,b).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
    }

    def setLocation[A,B,C, T <: Locatable](f : (A,B,C) => T) : (A,B,C, Context[Any]) => T = {
      case ((a,b,c, ctxt)) => f(a,b,c).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
    }

    def setLocation[A,B,C,D, T <: Locatable](f : (A,B,C,D) => T) : (A,B,C,D, Context[Any]) => T = {
      case ((a,b,c,d, ctxt)) => f(a,b,c,d).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
    }

    def setLocation[A,B,C,D,E, T <: Locatable](f : (A,B,C,D,E) => T) : (A,B,C,D,E, Context[Any]) => T = {
      case ((a,b,c,d,e, ctxt)) => f(a,b,c,d,e).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
    }

    def pos[A, T <: Locatable](f : (A) => T) : (A) => T = {
      withContext(setLocation(f))
    }

    def pos[A,B, T <: Locatable](f : (A,B) => T) : (A,B) => T = {
      withContext(setLocation(f))
    }

    def pos[A,B,C, T <: Locatable](f : (A,B,C) => T) : (A,B,C) => T = {
      withContext(setLocation(f))
    }

    def pos[A,B,C,D, T <: Locatable](f : (A,B,C,D) => T) : (A,B,C,D) => T = {
      withContext(setLocation(f))
    }

    def pos[A,B,C,D,E, T <: Locatable](f : (A,B,C,D,E) => T) : (A,B,C,D,E) => T = {
      withContext(setLocation(f))
    }

    def setLocationDirect[T <: Locatable](t : T, ctxt : Context[Any]) : T = {
      t.atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
    }

    def pos[T <: Locatable](r : Rule1[T]) : Rule1[T] = rule {
      r ~~> withContext(setLocationDirect)
    }

Unfortunately there seems to be no way to avoid the tuple repitition.

Note that last two functions: They are necessary in case of fully known constructors (e.g. BoolLit(true)). In such cases parboiled will through an unsupported operation exception (probably because the withContextAction is being evaluated prior to actual parsing). There is probably some different way to avoid that evaluation (by inserting some empty abstraction or lazy construct), but it seems to work for now.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clever way to get positions for AST

mathias
Administrator
Christoph,

nice! Glad you found an acceptable solution.

> Unfortunately there seems to be no way to avoid the tuple repitition.

No, not with the current architecture.

My plans for parboiled include a Scala version that utilizes [shapeless] HLists to greatly reduce implementation overhead and increase flexibility.
Once this is available there is no need for duplicating logic across (function- and tuple-)arities anymore.
Let's hope I find the time to tackle this...

Cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

On 18.09.2012, at 11:52, choeger [via parboiled users] wrote:

> Just for the sake of google-completeness, in case someone else searches for a related problem. That's what I did now:
>
>     def setLocation[A, T <: Locatable](f : (A) => T) : (A, Context[Any]) => T = {
>       case ((a, ctxt)) => f(a).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
>     }
>
>     def setLocation[A,B, T <: Locatable](f : (A,B) => T) : (A, B, Context[Any]) => T = {
>       case ((a,b, ctxt)) => f(a,b).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
>     }
>
>     def setLocation[A,B,C, T <: Locatable](f : (A,B,C) => T) : (A,B,C, Context[Any]) => T = {
>       case ((a,b,c, ctxt)) => f(a,b,c).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
>     }
>
>     def setLocation[A,B,C,D, T <: Locatable](f : (A,B,C,D) => T) : (A,B,C,D, Context[Any]) => T = {
>       case ((a,b,c,d, ctxt)) => f(a,b,c,d).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
>     }
>
>     def setLocation[A,B,C,D,E, T <: Locatable](f : (A,B,C,D,E) => T) : (A,B,C,D,E, Context[Any]) => T = {
>       case ((a,b,c,d,e, ctxt)) => f(a,b,c,d,e).atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
>     }
>
>     def pos[A, T <: Locatable](f : (A) => T) : (A) => T = {
>       withContext(setLocation(f))
>     }
>
>     def pos[A,B, T <: Locatable](f : (A,B) => T) : (A,B) => T = {
>       withContext(setLocation(f))
>     }
>
>     def pos[A,B,C, T <: Locatable](f : (A,B,C) => T) : (A,B,C) => T = {
>       withContext(setLocation(f))
>     }
>
>     def pos[A,B,C,D, T <: Locatable](f : (A,B,C,D) => T) : (A,B,C,D) => T = {
>       withContext(setLocation(f))
>     }
>
>     def pos[A,B,C,D,E, T <: Locatable](f : (A,B,C,D,E) => T) : (A,B,C,D,E) => T = {
>       withContext(setLocation(f))
>     }
>
>     def setLocationDirect[T <: Locatable](t : T, ctxt : Context[Any]) : T = {
>       t.atLocation(LocRange(ctxt.getMatchStartIndex(), ctxt.getMatchEndIndex()))
>     }
>
>     def pos[T <: Locatable](r : Rule1[T]) : Rule1[T] = rule {
>       r ~~> withContext(setLocationDirect)
>     }
>
>
> Unfortunately there seems to be no way to avoid the tuple repitition.
>
> Note that last two functions: They are necessary in case of fully known constructors (e.g. BoolLit(true)). In such cases parboiled will through an unsupported operation exception (probably because the withContextAction is being evaluated prior to actual parsing). There is probably some different way to avoid that evaluation (by inserting some empty abstraction or lazy construct), but it seems to work for now.
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/Clever-way-to-get-positions-for-AST-tp4024061p4024067.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.
> NAML

Loading...