Parsing ahead

Eric Torreborre
Hi,

I'm confronted with the following situation which I'd like to know how to handle gracefully with Parboiled.

I want to be able to parse something like this:

 1. some text, 1905
 2. some other text, with a comma, 1905
 3. some very long text with 1980, a year, a comma, 1905

I want the resulting value objects to be:

 1. Text("some text") ~ Year(1905)
 2. Text("some other text, with a comma") ~ Year(1905)
 3. Text("some very long text with 1980, a year, a comma") ~ Year(1905)

So, in more general terms, I want to parse *everything* until I meet "," ~ nTimes(4, digit)

What would be the rules for that?

Thanks,

Eric.


Re: Parsing ahead

mathias
Administrator
Eric,

the solution is one of the two syntactic predicates available in PEGs.
These syntactic predicates are the elements that give PEGs their expressive edge over other grammar types (like CFGs):

        def TextCommaYear = rule { oneOrMore(!CommaYear ~ ANY) ~ CommaYear }
        def CommaYear = rule { "," ~ nTimes(4, Digit) }

Notice the "!" operator in front of the CommaYear rule. It doesn't consume any characters but only matches if its inner rule doesn't match.
Of course you can wrap the "!CommaYear ~ ANY" rule with zeroOrMore instead of oneOrMore if your use case requires it.
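
To make the behaviour of the predicate concrete, here is a minimal hand-rolled sketch in plain Scala. It is not parboiled's API: the type `P` and every helper (`str`, `any`, `digit`, `seq`, `nTimes`, `not`, `oneOrMore`, `parse`) are defined here purely for illustration, and `commaYear` is assumed to include the space that follows the comma in the sample input.

```scala
object PegSketch {
  // A parser takes the input and a start position and returns the end
  // position on success, or None on failure.
  type P = (String, Int) => Option[Int]

  def str(s: String): P = (in, i) => if (in.startsWith(s, i)) Some(i + s.length) else None
  val any: P = (in, i) => if (i < in.length) Some(i + 1) else None
  val digit: P = (in, i) => if (i < in.length && in(i).isDigit) Some(i + 1) else None

  // Sequence: run each parser where the previous one stopped.
  def seq(ps: P*): P = (in, i) => ps.foldLeft(Option(i))((pos, p) => pos.flatMap(p(in, _)))
  def nTimes(n: Int, p: P): P = seq(Seq.fill(n)(p): _*)

  // Not-predicate: succeeds WITHOUT consuming input iff p fails at this position.
  def not(p: P): P = (in, i) => if (p(in, i).isEmpty) Some(i) else None

  // Greedy repetition; requires at least one successful match of p.
  def oneOrMore(p: P): P = (in, i) => {
    @annotation.tailrec
    def loop(pos: Int): Int = p(in, pos) match {
      case Some(next) if next > pos => loop(next)
      case _ => pos
    }
    p(in, i).map(loop)
  }

  // Assumption: the separator is a comma followed by one space.
  val commaYear: P = seq(str(", "), nTimes(4, digit))
  val textPart: P = oneOrMore(seq(not(commaYear), any))

  case class Ref(text: String, year: Int)

  // Text, then comma and year, then end of input.
  def parse(in: String): Option[Ref] =
    for {
      textEnd <- textPart(in, 0)
      end <- commaYear(in, textEnd)
      if end == in.length
    } yield Ref(in.substring(0, textEnd), in.substring(textEnd + 2, end).toInt)
}
```

With these definitions, `PegSketch.parse("some other text, with a comma, 1905")` stops collecting text only at the comma that is followed by four digits, because `not(commaYear)` fails exactly there and nowhere earlier.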

HTH and cheers,
Mathias

---
[hidden email]
http://www.parboiled.org


Re: Parsing ahead

etorreborre
Fantastic, this works perfectly!

I tried to use ! before but I didn't do it right. I also tried 2 self-recursive rules but I had a hard time putting the actions in.

One more question: is there a shortcut for:

  rule1 ~> ((s: String) => s)

Thanks.

Re: Parsing ahead

mathias
Administrator
> One more question: is there a shortcut for:
>
>  rule1 ~> ((s: String) => s)

Yes, I always use

        rule1 ~> identity

which doesn't perfectly convey what's going on but is shorter and saves the extra .class creation.
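
As a small plain-Scala illustration of why the two forms are interchangeable: `identity` comes from `scala.Predef`, while `withMatched` below is a hypothetical stand-in for an operator that hands the matched text to a function. It is not parboiled's `~>`.

```scala
object IdentitySketch {
  // Hypothetical helper: passes the matched input text to the given function.
  def withMatched[R](matched: String)(f: String => R): R = f(matched)

  // Explicit lambda that returns its argument unchanged.
  val viaLambda: String = withMatched("1905")((s: String) => s)

  // Predef.identity does the same job without writing the lambda out.
  val viaIdentity: String = withMatched("1905")(identity)
}
```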

Cheers,
Mathias

---
[hidden email]
http://www.spray.cc


Re: Parsing ahead

etorreborre
Yes, it works indeed. I think I tried it before but the compiler was not happy.

One (last?) question. I'd be interested to know if you think that there is another way to do what I wanted to do with 2 recursive rules instead of a predicate. And if so, how do you place the actions?

E.

Re: Parsing ahead

mathias
Administrator
Can you post what you had?
I'm not sure what you are going after with two recursive rules.

There are really two ways to build a rule for the kind of situation you described:
1. Constructively: just define everything that is allowed to match. This can sometimes be too large and/or too complex.
2. Via exclusion: If option 1 is not possible, PEGs give you the option of defining what you _don't_ want to match and putting this into a predicate before matching ANY.

If the alternative you are looking for does not use predicates it must be some type of option 1...
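
The contrast between the two options can be sketched with regular expressions from the Scala standard library, used here only as an analogue of PEG rules. Both patterns are assumptions made for this illustration: the constructive one allows only letters and spaces in the text, and the exclusion one uses a negative lookahead in the role of the `!` predicate.

```scala
import scala.util.matching.Regex

object TwoWaysSketch {
  // Option 1 (constructive): spell out what the text may contain.
  // Assumption: comma-separated chunks made of letters and spaces only.
  private val Constructive: Regex = """([A-Za-z ]+(?:, [A-Za-z ]+)*), (\d{4})""".r

  // Option 2 (exclusion): any character, as long as a comma plus year does not start here.
  private val ByExclusion: Regex = """((?:(?!, \d{4}).)+), (\d{4})""".r

  // A Regex used as an extractor must match the whole input.
  private def run(pattern: Regex, in: String): Option[(String, Int)] = in match {
    case pattern(text, year) => Some((text, year.toInt))
    case _ => None
  }

  def constructive(in: String): Option[(String, Int)] = run(Constructive, in)
  def byExclusion(in: String): Option[(String, Int)] = run(ByExclusion, in)
}
```

The constructive pattern rejects "some very long text with 1980, a year, a comma, 1905" because digits were never listed as allowed text, so the definition would have to keep growing. The exclusion pattern accepts it without enumerating anything.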

Cheers,
Mathias


Re: Parsing ahead

etorreborre
I was trying to go for something like this:

  def commaSep = rule { "," ~ space }
  def ref   = rule { group(word ~ commaSep) ~ ref1 }
  def ref1 = rule { year | ref }

But then:

 1. adding actions and typing ref and ref1 was difficult: I assume that they should be of type Rule2[String, Year]

 2. placing the actions so that everything compiles was also hard

Re: Parsing ahead

etorreborre
Btw, my previous attempt at this problem using Scala Parser combinators was using option 1, by chaining alternatives so that I could have 5 repetitions of comma. This would match:

Freud, Lacan, Winnicott, Klein, Bion, 1905

but this would not:

Freud, Reich, Lacan, Winnicott, Klein, Bion, 1905

As I was revisiting my previous post, I wanted to find a more generic solution. Do you think that I could have used "not" and Scala's standard parsing library instead of ! in parboiled?

Re: Parsing ahead

mathias
Administrator
Eric,

> I was trying to go for something like that:
>
>   def commaSep = rule { "," ~ space }
>   def ref   = rule { group(word ~ commaSep) ~ ref1 }
>   def ref1 = rule { year | ref }

I see.
Yeah, that would work in an untyped setting.
But since parboiled encourages proper rule typing, you are somewhat at a loss here.
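
For illustration only, here is a hand-written recursive-descent version of those two rules in plain Scala, not parboiled. It suggests one place where the values could be assembled: both `ref` and `ref1` return the same pair of collected words and year. The names `Ref`, `word`, `year` and `parse`, and the choice that a word is any run of non-comma characters, are assumptions made for this sketch.

```scala
object RecursiveSketch {
  case class Ref(text: String, year: Int)

  // word: one or more non-comma characters; returns the word and the next position.
  private def word(in: String, i: Int): Option[(String, Int)] = {
    val comma = in.indexOf(',', i)
    val end = if (comma < 0) in.length else comma
    if (end > i) Some((in.substring(i, end), end)) else None
  }

  // commaSep: a comma followed by a single space.
  private def commaSep(in: String, i: Int): Option[Int] =
    if (in.startsWith(", ", i)) Some(i + 2) else None

  // year: exactly four digits that run to the end of the input.
  private def year(in: String, i: Int): Option[Int] = {
    val rest = in.substring(i)
    if (rest.length == 4 && rest.forall(_.isDigit)) Some(rest.toInt) else None
  }

  // ref = word ~ commaSep ~ ref1
  private def ref(in: String, i: Int): Option[(List[String], Int)] =
    for {
      (w, afterWord) <- word(in, i)
      afterSep <- commaSep(in, afterWord)
      (ws, y) <- ref1(in, afterSep)
    } yield (w :: ws, y)

  // ref1 = year | ref
  private def ref1(in: String, i: Int): Option[(List[String], Int)] =
    year(in, i).map(y => (List.empty[String], y)).orElse(ref(in, i))

  // The collected words are joined back together into the text part.
  def parse(in: String): Option[Ref] =
    ref(in, 0).map { case (ws, y) => Ref(ws.mkString(", "), y) }
}
```

Because the recursion carries the list of words along, the number of commas is not bounded, so `RecursiveSketch.parse("Freud, Reich, Lacan, Winnicott, Klein, Bion, 1905")` succeeds as well.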

> Do you think that I could have used "not" and Scala's standard parsing library instead of ! in parboiled?

To be honest, I don't really have a great deal of experience with the parser combinators as I have been a fan of PEGs long before I joined the Scala universe.
The parser combinators are similar to PEGs, in a way, but not quite. Unfortunately the theoretical differences are somewhat beyond me.
However, I'm sure there is a solution in the combinator world for this problem, as it appears very often in real-life scenarios.
It could well be that the "not" operator is an option, but unfortunately I can't give you an example...

However, if you would like some more assistance with regard to the third post of your mini-series I'd be more than happy to be of help!

Cheers,
Mathias


Re: Parsing ahead

Eric Torreborre
When I finish rewriting my grammar with Parboiled, I'll see if I can write a better one with the Scala library. And I'll post on the differences I saw. The most obvious one for now is the way actions are declared and values managed.

Thanks for your swift answers.

Eric.

Re: Parsing ahead

mathias
Administrator
Eric,

sounds very interesting.
I'm looking forward to your findings...

Maybe you'll arrive at some ideas on how to further improve the parboiled for Scala experience.

Cheers,
Mathias
