Is lazy matching possible with Parboiled?

John Berryman
I need to be able to parse the following line:

search_term.field.

Initially, this was easy to parse: I scanned the line, pushed the search_term onto the value stack when I reached the first period, and pushed the field onto the stack when I reached the next one. However, parsing got tricky when I discovered that the search_term can also include periods. For instance, I need to be able to search for "13/32.5/12.6" in a "classification" field. If I were parsing this with a regex, I would use a lazy matcher:

/^([\w/.]+?)\.(\w+)\.$/
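
For reference, here is a minimal sketch of how that regex splits the example line with Java's java.util.regex, assuming the whole line is a single clause. The names LazyRegexDemo and split are made up for illustration and are not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyRegexDemo {
    // The pattern from the post: a lazy search term, then ".field." at the end of the line.
    private static final Pattern CLAUSE = Pattern.compile("^([\\w/.]+?)\\.(\\w+)\\.$");

    /** Returns {searchTerm, field}, or null when the line does not match. */
    static String[] split(String line) {
        Matcher m = CLAUSE.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = split("13/32.5/12.6.classification.");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

The lazy group keeps growing only until the rest of the pattern (a period, a word, a period, end of line) can match, so the last word before the final period ends up as the field.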

Is there anything similar that I can use with parboiled? I've experimented with various arrangements of Test() and TestNot(), but haven't made any headway.

Thanks!
John

Re: Is lazy matching possible with Parboiled?

mathias
Administrator
John,

how about something along these lines:

Rule Clause() { return Sequence(SearchTerm(), Field()); }

Rule SearchTerm() { return Sequence(TermElem(), ZeroOrMore(Dot(), TermElem())); }

Rule TermElem() { return Sequence(TestNot(Field()), Word()); }

Rule Field() { return Sequence(Word(), Dot(), EOI); }

Rule Word() { return OneOrMore(NonDot()); }

Rule NonDot() { return Sequence(TestNot(Dot()), ANY); }

Rule Dot() { return Ch('.'); }

HTH and cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

Re: Is lazy matching possible with Parboiled?

John Berryman
Hey Mathias,

It's great to see how readily available you are in the community you've created. Other work is getting in the way right now, but I hope to get back to this in the next couple of days. I just wanted to let you know we're not ignoring you.

Best,
John

Re: Is lazy matching possible with Parboiled?

John Berryman
In reply to this post by mathias
That seemed like it should have worked, and it was frustratingly close, but it did not. I was concerned that I had hit one of the edge cases that are not LL(k)-parseable, but then I figured it out:

Rule Clause() { return Sequence(SearchTerm(), Field()); }
Rule SearchTerm() { return Sequence(Word(), ZeroOrMore(TestNot(Field()), Dot(), Word())); }
Rule Field() { return Sequence(Dot(), Word(), Dot(), EOI); }
Rule Word() { return OneOrMore(NonDot()); }
Rule NonDot() { return Sequence(TestNot(Dot()), ANY); }
Rule Dot() { return Ch('.'); }

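You can see that it is similar to your recommendation. The difference is that Field() now starts with the Dot() that separates it from the search term, and the TestNot(Field()) check runs before each Dot() inside SearchTerm().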
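
To show why the TestNot(Field()) lookahead gives the lazy behaviour, here is a rough hand-written equivalent of those rules in plain Java. It is only a sketch that mirrors the rule structure; it does not use parboiled, and the names ClauseMatcher and parse are made up for illustration.

```java
/** Hand-written stand-in for the rules above; not parboiled code. */
final class ClauseMatcher {
    private final String in;
    private int pos;

    private ClauseMatcher(String in) { this.in = in; }

    // Dot: a single '.'
    private boolean dot() {
        if (pos < in.length() && in.charAt(pos) == '.') { pos++; return true; }
        return false;
    }

    // Word: one or more characters that are not '.'
    private boolean word() {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) != '.') pos++;
        return pos > start;
    }

    // Field: Dot Word Dot EOI; restores the position on failure
    private boolean field() {
        int start = pos;
        if (dot() && word() && dot() && pos == in.length()) return true;
        pos = start;
        return false;
    }

    // TestNot(Field()): succeeds only if Field does not match here; never consumes input
    private boolean notField() {
        int start = pos;
        boolean matched = field();
        pos = start;
        return !matched;
    }

    // SearchTerm: Word, then zero or more (TestNot(Field) Dot Word)
    private boolean searchTerm() {
        if (!word()) return false;
        while (true) {
            int start = pos;
            if (notField() && dot() && word()) continue;
            pos = start; // undo a partial iteration and stop repeating
            return true;
        }
    }

    /** Returns {searchTerm, field}, or null if the input is not a clause. */
    static String[] parse(String input) {
        ClauseMatcher m = new ClauseMatcher(input);
        if (!m.searchTerm()) return null;
        int split = m.pos; // index of the '.' that starts the field
        if (!m.field()) return null;
        return new String[] {
            input.substring(0, split),
            input.substring(split + 1, input.length() - 1)
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("13/32.5/12.6.classification.");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

The repetition in searchTerm() stops as soon as the remaining input is exactly ".field.", which leaves the separating period for field() to consume.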

So it seems likely that the next generation of search at the United States Patent and Trademark Office will use Parboiled as its query parser. If you're interested in checking out our work with the Patent Office, I'd love to give you a tour. It's likely that we'll have a chance to use Parboiled again in the future and we'd love to have your help. If you're interested, please contact me: [hidden email].

Thanks!
-John

Re: Is lazy matching possible with Parboiled?

mathias
Administrator
John,

glad you were able to find a working solution!

> So it seems likely that the next generation of search at the United States Patent and Trademark Office will use Parboiled as its query parser.

Cool! It's great to see parboiled adding value (even though some might debate the overall value created by the USPTO ...).

> It's likely that we'll have a chance to use Parboiled again in the future and we'd love to have your help.

If support on this list doesn't suffice, you can also reach me at "mathias AT parboiled.org".

Cheers,
Mathias

