Javascript style regexp parsing


Adamansky Anton
Hello! I need to parse the following simple construct:

    /regexp here/  

(as in ECMAScript)

My first, intuitive approach:

Rule RELiteral() {
    return Sequence('/', ZeroOrMore(TestNot('/'), ANY), "/ ");
}

But it doesn't allow '/' inside the regexp, so I tried to support an escaped '/', as in this example: /a\/b.*/

Rule RELiteral() {
        return Sequence('/', ZeroOrMore(TestNot(NoneOf("\\"), '/'), ANY), "/");
}
But parsing fails, with the following trace and error:
Input/RELiteral/'/', matched, cursor at 1:2 after "/"
..(1)../RELiteral/ZeroOrMore/Sequence/TestNot/Sequence/![\EOI], matched, cursor at 1:3 after "/a"
..(5)../Sequence/'/', failed, cursor at 1:3 after "/a"
..(5)../Sequence, failed, cursor at 1:3 after "/a"
..(4)../TestNot, matched, cursor at 1:2 after "/"
..(3)../Sequence/ANY, matched, cursor at 1:3 after "/a"
..(3)../Sequence, matched, cursor at 1:3 after "/a"
..(2)../ZeroOrMore/Sequence/TestNot/Sequence/![\EOI], failed, cursor at 1:3 after "/a"
..(5)../Sequence, failed, cursor at 1:3 after "/a"
..(4)../TestNot, matched, cursor at 1:3 after "/a"
..(3)../Sequence/ANY, matched, cursor at 1:4 after "/a\"
..(3)../Sequence, matched, cursor at 1:4 after "/a\"
..(2)../ZeroOrMore/Sequence/TestNot/Sequence/![\EOI], matched, cursor at 1:5 after "/a\/"
..(5)../Sequence/'/', failed, cursor at 1:5 after "/a\/"
..(5)../Sequence, failed, cursor at 1:5 after "/a\/"
..(4)../TestNot, matched, cursor at 1:4 after "/a\"
..(3)../Sequence/ANY, matched, cursor at 1:5 after "/a\/"
..(3)../Sequence, matched, cursor at 1:5 after "/a\/"
..(2)../ZeroOrMore/Sequence/TestNot/Sequence/![\EOI], matched, cursor at 1:6 after "/a\/b"
..(5)../Sequence/'/', failed, cursor at 1:6 after "/a\/b"
..(5)../Sequence, failed, cursor at 1:6 after "/a\/b"
..(4)../TestNot, matched, cursor at 1:5 after "/a\/"
..(3)../Sequence/ANY, matched, cursor at 1:6 after "/a\/b"
..(3)../Sequence, matched, cursor at 1:6 after "/a\/b"
..(2)../ZeroOrMore/Sequence/TestNot/Sequence/![\EOI], matched, cursor at 1:7 after "/a\/b."
..(5)../Sequence/'/', failed, cursor at 1:7 after "/a\/b."
..(5)../Sequence, failed, cursor at 1:7 after "/a\/b."
..(4)../TestNot, matched, cursor at 1:6 after "/a\/b"
..(3)../Sequence/ANY, matched, cursor at 1:7 after "/a\/b."
..(3)../Sequence, matched, cursor at 1:7 after "/a\/b."
..(2)../ZeroOrMore/Sequence/TestNot/Sequence/![\EOI], matched, cursor at 1:8 after "/a\/b.*"
..(5)../Sequence/'/', matched, cursor at 1:9 after "/a\/b.*/"
..(5)../Sequence, matched, cursor at 1:9 after "/a\/b.*/"
..(4)../TestNot, failed, cursor at 1:9 after "/a\/b.*/"
..(3)../Sequence, failed, cursor at 1:7 after "/a\/b."
..(2)../ZeroOrMore, matched, cursor at 1:7 after "/a\/b."
..(1)../RELiteral/'/', failed, cursor at 1:7 after "/a\/b."
..(1)../RELiteral, failed, cursor at 1:7 after "/a\/b."
Input, failed, cursor at 1:1 after ""
org.parboiled.common.ConsoleSink@8ce0ea

Parse Errors:
Invalid input '*', expected '/' (line 1, pos 7):
/a\/b.*/
      ^

What am I missing?

Re: Javascript style regexp parsing

mathias
Administrator
Anton,

If you are trying to parse regular expressions, you should probably look for a suitable grammar and then implement it with parboiled,
e.g. this one: http://www.cs.sfu.ca/~cameron/Teaching/384/99-3/regexp-plg.html

When translating a grammar written in BNF, you need to pay attention to the differences between CFGs and PEGs (most importantly ordered choice), which may require you to adapt the grammar somewhat.
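
For the escaped-slash case in the question, the trace suggests why the lookahead version stops early: at 1:7 the TestNot sees '*' (a non-backslash character) followed by '/', so it rejects '*' and the loop ends one character before the closing delimiter. One possible way around this is to drop the lookahead and rely on ordered choice: try an escape pair first, otherwise accept any character other than '/' and '\'. The sketch below hand-rolls that logic in plain Java so that it runs without parboiled. The class name RegexLiteralSketch and the method matchLength are hypothetical, and the parboiled rule quoted in the comment is an untested assumption about how the same idea could be written. It also ignores ECMAScript details such as character classes and flags.

```java
// Hand-rolled sketch of the PEG logic. An equivalent parboiled rule (untested assumption) might be:
//   Sequence('/', ZeroOrMore(FirstOf(Sequence('\\', ANY), NoneOf("/\\"))), '/')
final class RegexLiteralSketch {

    /**
     * Returns the number of characters consumed by a regexp literal that
     * starts at index 0 of the input, or -1 if no literal matches there.
     */
    static int matchLength(String input) {
        int pos = 0;
        // Opening delimiter.
        if (pos >= input.length() || input.charAt(pos) != '/') {
            return -1;
        }
        pos++;
        // Body: ZeroOrMore(FirstOf(escape pair, any char except '/' and '\')).
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '\\' && pos + 1 < input.length()) {
                pos += 2;   // first alternative: backslash plus any character
            } else if (c != '/' && c != '\\') {
                pos++;      // second alternative: ordinary body character
            } else {
                break;      // neither alternative matches, so the loop ends
            }
        }
        // Closing delimiter.
        if (pos < input.length() && input.charAt(pos) == '/') {
            return pos + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(matchLength("/a\\/b.*/")); // the input from the question
        System.out.println(matchLength("/abc"));      // unterminated literal
    }
}
```

With this approach the input /a\/b.*/ is consumed completely, while an unterminated literal such as /abc is rejected.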

Cheers,
Mathias

---
http://www.parboiled.org

In reply to the message from Adamansky Anton of 07.03.2012, 05:22.