Quantcast

Is this a bug? (EOI related)

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Is this a bug? (EOI related)

sulfinu
Hallo, Mathias.

I have started to use Parboiled a few days ago (well, actually, I had the pleasure of discovering its uniqueness in the Java landscape some two weeks ago) and I was surprised to find out that Parboiled is NOT consuming the whole input before deciding it was matched.

This is, the input string is not entirely "explained" (broken recursively into pieces), but Parboiled considers it "explained" by the declared PEG. It doesn't seem right from where I stand, but even if you insist on it, this issue deserves a paragraph in the documentation.

Putting this aside, could you please take this class and run it?

public class EOIParser extends BaseParser
{
        Rule LINES()
        {
                return Sequence(
                                ZeroOrMore(SOME_LINE()),
                                EOI
                );
        }
       
        Rule SOME_LINE()
        {
                return Sequence(
                                ZeroOrMore(AnyOf("a ")),
                                FirstOf("\r\n", EOI)
                                );
        }
       
        public static void main(String[] args)
        {
                EOIParser parser = Parboiled.createParser(EOIParser.class);
                ParsingResult result = new TracingParseRunner(parser.LINES()).run("a a");

                if (! result.matched)
                        if (result.hasErrors())
                                throw new ParsingException(ErrorUtils.printParseErrors(result));
                        else throw new ParsingException();
        }
}

Infinite loop, with tracing messages that report more and more non-existent positions in the input string. I'm perfectly aware that EOI should be matched twice according to this grammar, but the behaviour looks like a bug to me.
What is happening? How is EOI actually treated? I first thought that it it is appended to the input string to serve as a sentinel.

Thanks a lot.
Auch, danke sehr schoen fuer Parboiled!
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is this a bug? (EOI related)

sulfinu
This post was updated on .
The funny thing is that Parboiled is guarded against infinite loops because if I rewrite the grammar in my first message by approximately replacing EOI with TestNot(ANY), I'm greeted with a GrammarException upon the second match of the SOME_LINE rule.

But this grammar:

        Rule LINES()
        {
                return Sequence(
                                ZeroOrMore(SOME_LINE()),
                                LAST_LINE()
                );
        }
       
        Rule LAST_LINE()
        {
                return Sequence(
                                ZeroOrMore(AnyOf("a ")),
                                TestNot(ANY)  // this can be replaced with EOI for the same behaviour
                                );
        }

        Rule SOME_LINE()
        {
                return Sequence(
                                ZeroOrMore(AnyOf("a ")),
                                "\r\n"
                                );
        }

works.

The bug I unintentionally demonstrated seems caused by the treatment of EOI: Parboiled behaves as if an INFINTE number of EOI characters get automatically appended to any input, which fools the infinte loop guard since an EOI appears to be consumed endlessly...

Mathias, if you read this, a mere "acknowlegde" is all I'm waiting.
Cheers!
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is this a bug? (EOI related)

mathias
Administrator
In reply to this post by sulfinu
> This is, the input string is not entirely "explained" (broken recursively into pieces), but Parboiled considers it "explained" by the declared PEG. It doesn't seem right from where I stand, but even if you insist on it, this issue deserves a paragraph in the documentation.

I'm not sure I fully get what you mean...
Are you referring to parboileds ability to only match on prefixes of input texts if the grammar does not explicitly contain an EOI match?
If so, then this is by design.

> ...
> Infinite loop, with tracing messages that report more and more non-existent positions in the input string. I'm perfectly aware that EOI should be matched twice according to this grammar, but the behaviour looks like a bug to me.
> What is happening? How is EOI actually treated? I first thought that it it is appended to the input string to serve as a sentinel.

The problem with your grammar is that it matches EOI in a loop (SOME_LINE matches single EOIs and is wrapped in a ZeroOrMore).
parboiled allows you to match EOI but will never read past it (since EOI is not a real character and is not actually present in the input). This mean that the character after EOI is another EOI (and so on).

You can easily fix your grammar by changing your SOME_LINE to:

Rule SOME_LINE() {
        return Sequence(
        ZeroOrMore(AnyOf("a ")),
                FirstOf("\r\n", Test(EOI))
        );
}

Cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

On 01.07.2011, at 19:25, sulfinu [via parboiled users] wrote:

> Hallo, Mathias.
>
> I have started to use Parboiled a few days ago (well, actually, I had the pleasure of discovering its uniqueness in the Java landscape some two weeks ago) and I was surprised to find out that Parboiled is NOT consuming the whole input before deciding it was matched.
>
> This is, the input string is not entirely "explained" (broken recursively into pieces), but Parboiled considers it "explained" by the declared PEG. It doesn't seem right from where I stand, but even if you insist on it, this issue deserves a paragraph in the documentation.
>
> Putting this aside, could you please take this class and run it?
>
> public class EOIParser extends BaseParser
> {
>         Rule LINES()
>         {
>                 return Sequence(
>                                 ZeroOrMore(SOME_LINE()),
>                                 EOI
>                 );
>         }
>        
>         Rule SOME_LINE()
>         {
>                 return Sequence(
>                                 ZeroOrMore(AnyOf("a ")),
>                                 FirstOf("\r\n", EOI)
>                                 );
>         }
>        
>         public static void main(String[] args)
>         {
>                 EOIParser parser = Parboiled.createParser(EOIParser.class);
>                 ParsingResult result = new TracingParseRunner(parser.LINES()).run("a a");
>
>                 if (! result.matched)
>                         if (result.hasErrors())
>                                 throw new ParsingException(ErrorUtils.printParseErrors(result));
>                         else throw new ParsingException();
>         }
> }
>
> Infinite loop, with tracing messages that report more and more non-existent positions in the input string. I'm perfectly aware that EOI should be matched twice according to this grammar, but the behaviour looks like a bug to me.
> What is happening? How is EOI actually treated? I first thought that it it is appended to the input string to serve as a sentinel.
>
> Thanks a lot.
> Auch, danke sehr schoen fuer Parboiled!
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/Is-this-a-bug-EOI-related-tp3130068p3130068.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is this a bug? (EOI related)

mathias
Administrator
In reply to this post by sulfinu
> The funy thing is that Parboiled is guarded against infinite loops because if I rewrite the grammar in my first message by approximately replacing EOI with TestNot(ANY), I'm greeted with a GrammarException upon the second match of the SOME_LINE rule.

parboiled performs _some_ checks against illegal grammars, like the one you were seeing (where the inner rules of a ZeroOrMore or OneOrMore can match empty).
Other things like left recursion, earlier alternatives of FirstOf rules being prefixes of later ones and so on are (not) yet checked.

> The bug I unintentionally demonstrated seems caused by the treatment of EOI: Parboiled behaves as if an INFINTE number of EOI characters get automatically appended to any input, which fools the infinte loop guard since an EOI appears to be consumed endlessly...

Yes, that is exactly the behavior.

Cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

On 04.07.2011, at 10:28, sulfinu [via parboiled users] wrote:

> The funy thing is that Parboiled is guarded against infinite loops because if I rewrite the grammar in my first message by approximately replacing EOI with TestNot(ANY), I'm greeted with a GrammarException upon the second match of the SOME_LINE rule.
>
> But this grammar:
>
>         Rule LINES()
>         {
>                 return Sequence(
>                                 ZeroOrMore(SOME_LINE()),
>                                 LAST_LINE()
>                 );
>         }
>        
>         Rule LAST_LINE()
>         {
>                 return Sequence(
>                                 ZeroOrMore(AnyOf("a ")),
>                                 TestNot(ANY)  // this can be replaced with EOI for the same behaviour
>                                 );
>         }
>
>         Rule SOME_LINE()
>         {
>                 return Sequence(
>                                 ZeroOrMore(AnyOf("a ")),
>                                 "\r\n"
>                                 );
>         }
>
> works.
>
> The bug I unintentionally demonstrated seems caused by the treatment of EOI: Parboiled behaves as if an INFINTE number of EOI characters get automatically appended to any input, which fools the infinte loop guard since an EOI appears to be consumed endlessly...
>
> Mathias, if you read this, a mere "acknowlegde" is all I'm waiting.
> Cheers!
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/Is-this-a-bug-EOI-related-tp3130068p3136897.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is this a bug? (EOI related)

sulfinu
In reply to this post by mathias
>Are you referring to parboileds ability to only match on prefixes of input texts if the grammar does >not explicitly contain an EOI match?
>If so, then this is by design.

Yes, this is the contradiction I was referring to: Parboiled states an input is matching but actually only a prefix of the input string does indeed satisfy the declared grammar. Surely, there is a workaround, either include in the parsing expressions "EOI" or "TestNot(ANY)", or verify that the matched string length equals the input length.
But my point is that you should state this design decision in the documentation (for newcomers).

>You can easily fix your grammar by changing your SOME_LINE to:
>
>Rule SOME_LINE() {
>        return Sequence(
>        ZeroOrMore(AnyOf("a ")),
>                FirstOf("\r\n", Test(EOI))
>        );
>}

No, this would merely transform the infinite loop in a GrammarException, as not consuming the EOI wil not by-pass anymore the infinte loop guard. I have even tested it (on "a a\r\n").

Thanks for the answers!
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Is this a bug? (EOI related)

mathias
Administrator
> Yes, this is the contradiction I was referring to: Parboiled states an input is matching but actuallly only a prefix of the input string does indeed satisfy the declared grammar. Surely, there is a workaround, either include in the parsing expressions "EOI" or "TestNot(ANY)", or verify that the matched string length equals the input length.
> But my point is that you should state this design decision in the documentation (for newcomers).

Yes, an explicit hint towards this behaviour is probably a good idea.

> No, this would merely transform the infinite loop in a GrammarException, as not consuming the EOI wil not by-pass anymore the infinte loop guard. I have even tested it (on "a a\r\n").

Yes, true.
The right fix would have to be this:

Rule SOME_LINE() {
        return Sequence(
                TestNot(EOI),
                ZeroOrMore(AnyOf("a ")),
                FirstOf("\r\n", EOI)
        );
}

Cheers,
Mathias

---
[hidden email]
http://www.parboiled.org

On 04.07.2011, at 12:14, sulfinu [via parboiled users] wrote:

> >Are you referring to parboileds ability to only match on prefixes of input texts if the grammar does >not explicitly contain an EOI match?
> >If so, then this is by design.
>
> Yes, this is the contradiction I was referring to: Parboiled states an input is matching but actuallly only a prefix of the input string does indeed satisfy the declared grammar. Surely, there is a workaround, either include in the parsing expressions "EOI" or "TestNot(ANY)", or verify that the matched string length equals the input length.
> But my point is that you should state this design decision in the documentation (for newcomers).
>
> >You can easily fix your grammar by changing your SOME_LINE to:
> >
> >Rule SOME_LINE() {
> >        return Sequence(
> >        ZeroOrMore(AnyOf("a ")),
> >                FirstOf("\r\n", Test(EOI))
> >        );
> >}
>
> No, this would merely transform the infinite loop in a GrammarException, as not consuming the EOI wil not by-pass anymore the infinte loop guard. I have even tested it (on "a a\r\n").
>
> Thanks for the answers!
>
> If you reply to this email, your message will be added to the discussion below:
> http://users.parboiled.org/Is-this-a-bug-EOI-related-tp3130068p3137118.html
> To start a new topic under parboiled users, email [hidden email]
> To unsubscribe from parboiled users, click here.

Loading...