An ambitious project? Write a regex analyser using parboiled...

classic Classic list List threaded Threaded
5 messages Options
fge
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

An ambitious project? Write a regex analyser using parboiled...

fge

Hello,

Regexes are one subject that I master quite a lot, and I see the potential, using parboiled, to write a regex analyzer which would enable to see graphically the regex engine processing step by step.

For instance, using the simple regex "a" and the input "cat", I'd like to use parboiled to generate this trace:

#step 1
regex: |a
input: |cat
# No match, shift one character in the input
# step 2:
regex: |a
input: c|at
# step 3:
regex: a|
input: ca|t
# match

What would I need to do with parboiled to make it generate such a trace? Do you think it is doable?

The main advantage I see in using parboiled is that I can emulate _any_ regex engine, not only Java's...

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: An ambitious project? Write a regex analyser using parboiled...

mathias
Administrator
Francis,

what you are describing would require the implementation of a regex _engine_, not just a parser.
Parsing a regex source is not that hard, its writing the actual logic that does the regex matching that is hard.
Unfortunately parboiled will probably not be able to help you in this respect...

Cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

On 11.01.2012, at 16:37, fge [via parboiled users] wrote:

> Hello,
>
> Regexes are one subject that I master quite a lot, and I see the potential, using parboiled, to write a regex analyzer which would enable to see graphically the regex engine processing step by step.
>
> For instance, using the simple regex "a" and the input "cat", I'd like to use parboiled to generate this trace:
>
> #step 1
> regex: |a
> input: |cat
> # No match, shift one character in the input
> # step 2:
> regex: |a
> input: c|at
> # step 3:
> regex: a|
> input: ca|t
> # match
>
> What would I need to do with parboiled to make it generate such a trace? Do you think it is doable?
>
> The main advantage I see in using parboiled is that I can emulate _any_ regex engine, not only Java's...
>
>
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/An-ambitious-project-Write-a-regex-analyser-using-parboiled-tp3650927p3650927.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.
> NAML

fge
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: An ambitious project? Write a regex analyser using parboiled...

fge
On Tue, Jan 24, 2012 at 11:17, mathias [via parboiled users]
<[hidden email]> wrote:
> Francis,
>
> what you are describing would require the implementation of a regex
> _engine_, not just a parser.
> Parsing a regex source is not that hard, its writing the actual logic that
> does the regex matching that is hard.
> Unfortunately parboiled will probably not be able to help you in this
> respect...
>

Well, a regex such as "a+", for instance, can be translated to a
parboiled rule: OneOrMore('a'); This is what I have in mind. Even
lookaround operators can be translated (with Test() and TestNot()). In
fact, I see no regex construct which parboilec annot emulate.

Following that, it is a (no small) matter of implementing:

* "stepping into" the input;
* backtracking (when allowed -- possessive quantifiers and atomic
groups don't backtrack);
* shifting.

I have the intuition it can be done.

--
Francis Galiegue, [hidden email]
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: An ambitious project? Write a regex analyser using parboiled...

David Pennell
Another alternative is to use parboiled to parse a regex grammar, produce an AST and then translate the AST into a specific regex grammar/engine and let the regex engine do the heavy lifting.  I suspect that it will be difficult to achieve comparable performance to existing engines.  You could use whatever regex dialect you like (or even define your own).
fge
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: An ambitious project? Write a regex analyser using parboiled...

fge
On Tue, Jan 24, 2012 at 20:00, David Pennell [via parboiled users]
<[hidden email]> wrote:
> Another alternative is to use parboiled to parse a regex grammar, produce an
> AST and then translate the AST into a specific regex grammar/engine and let
> the regex engine do the heavy lifting.  I suspect that it will be difficult
> to achieve comparable performance to existing engines.  You could use
> whatever regex dialect you like (or even define your own).
>

My goal is not performance: it is to emulate the regex engine
operation so that the user can see step-by-step operation of his regex
(for instance, the performance difference between <[^>]+> and <.*?>,
or why <.*> swallows up to the last >)

--
Francis Galiegue, [hidden email]
"It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries" (Stéphane Faroult, in "The
Art of SQL", ISBN 0-596-00894-5)
Loading...