ReportingParseRunner or RecoveringParseRunner for Key-value parsing


halloleo
Hi there

I want to build a parser for strings like:
[POSSIBLE-FREETEXT] KEY: VALUE-WITH-SPACES KEY: VALUE-WITH-SPACES ...
Colons appear exclusively to separate KEYs from values (i.e. the VALUE-WITH-SPACES parts). Values contain any character except colons. KEYs don't contain spaces or colons.

I used the ReportingParseRunner, but it matches as far as possible with the first VALUE-WITH-SPACES part and then gets stuck at the second colon (because it has already attributed the letters before the colon to the VALUE-WITH-SPACES rule, without backtracking and trying to match another KEY).

Then I tried the RecoveringParseRunner to utilise its backtracking, but it "swallows" all colons after the first one, because my VALUE-WITH-SPACES rule is defined as:
OneOrMore(NoneOf(":"))

So none of these approaches works right at the moment. What path is more sensible to pursue: Using the ReportingParseRunner or RecoveringParseRunner?

Many thanks,
Leo

mathias
Administrator
The different parse runners don't really change the way your grammar is applied to the input.
(With the exception of the RecoveringParseRunner which will continue parsing even after a parsing error.)

I suspect your grammar rules are faulty (i.e. don't properly encode what you are trying to do).

Can you show us your grammar?

Cheers,
Mathias

---
http://www.parboiled.org

> On 15.9.2016, at 06:50, halloleo [via parboiled users] wrote:
> [...]


halloleo
Thanks for your reply. I'm not too familiar with writing a formal grammar, but here is an example.

An input string could be:

"Some text at the beginning FirstName: John LastName: Smith Country: Flatland behind the Mountain"

This should yield a value stack I can process into:

* Freetext "Some text at the beginning"
* Key-value pair "FirstName": "John"
* Key-value pair "LastName": "Smith"
* Key-value pair "Country": "Flatland behind the Mountain"

And the rules I wrote to parse this:

public static final char FIELD_SEP = ':';

Rule Expression() {
	return Sequence(Optional(FreeText()), ZeroOrMore(FieldValuePair()), EOI);
}
	
Rule FieldValuePair() {
	return Sequence(FieldWithSep(), Value());
}

Rule FieldWithSep() {
	return Sequence(FieldGeneric(), FieldSep());
}

@SuppressSubnodes
Rule FieldGeneric() {
	return OneOrMore(FirstOf(CharRange('a', 'z'), CharRange('A', 'Z')));
}

Rule Value() {
	return OneOrMore(ValueChar());
}

@SuppressSubnodes
Rule FreeText() {
	return OneOrMore(FreeTextChar());
}

Rule ValueChar() {
	return NoneOf(String.valueOf(FIELD_SEP));
}

Rule FreeTextChar() {
	// Same as ValueChar!
	return NoneOf(String.valueOf(FIELD_SEP));
}

Rule FieldSep() {
	return Ch(FIELD_SEP);
}

Rule Space() {
	return Ch(' ');
}

The problem is that the first Value rule 'eats' into the second Field ("LastName") instead of leaving it to the next FieldValuePair rule.

mathias
Administrator
There is only one real "problem" in your language (i.e. the rules that describe your input) that is a bit tricky to solve (but solvable nevertheless):

While reading your Freetext prefix, the parser cannot classify the current word until it sees what follows it. If the word is followed by a colon, it is a KEY; otherwise it simply belongs to the Freetext.

A simple solution would be to augment the Freetext with a negative syntactic predicate:

    Rule FreeText() {
        return Sequence(TestNot(FieldGeneric()), OneOrMore(FreeTextChar()));
    }

This results in somewhat less efficient parsing of the Freetext() but is probably acceptable.
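
As an illustration of the predicate idea, here is a minimal sketch in plain Java that imitates the rules by hand instead of using parboiled. It assumes the predicate is checked before every free-text character and looks for a whole key plus colon (FieldWithSep) rather than FieldGeneric alone, because a predicate checked only once at the start would, as far as I can tell, reject any free text that begins with a letter. The class and method names (FreeTextSketch, matchFieldWithSep, matchFreeText) are made up for this example.

```java
// Plain-Java imitation of the PEG rules; it does not use parboiled itself.
final class FreeTextSketch {

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Imitates FieldWithSep: letters followed by ':'. Returns the index after the ':' or -1. */
    static int matchFieldWithSep(String in, int start) {
        int pos = start;
        while (pos < in.length() && isAsciiLetter(in.charAt(pos))) {
            pos++;
        }
        boolean sawKey = pos > start;
        boolean sawColon = pos < in.length() && in.charAt(pos) == ':';
        return (sawKey && sawColon) ? pos + 1 : -1;
    }

    /**
     * Imitates OneOrMore(Sequence(TestNot(FieldWithSep()), NoneOf(":"))).
     * Returns the end index of the free text, or -1 if no character matched.
     */
    static int matchFreeText(String in, int start) {
        int pos = start;
        while (pos < in.length()
                && matchFieldWithSep(in, pos) == -1   // negative lookahead, consumes nothing
                && in.charAt(pos) != ':') {
            pos++;
        }
        return pos > start ? pos : -1;
    }

    public static void main(String[] args) {
        String in = "Some text at the beginning FirstName: John";
        int end = matchFreeText(in, 0);
        System.out.println("[" + in.substring(0, end) + "]");
    }
}
```

For the sample input the free text ends right before FirstName. The space in front of the key is still part of the match, so the result would need trimming.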

> The problem is that the first Value rule 'eats' into the second Field ("Lastname"), instead of leaving it to the next FieldValuePair rule.

Yes. The reason is that your `ValueChar` rule also matches spaces. Is that really what you want?

Cheers,
Mathias

> On 15.9.2016, at 10:37, halloleo [via parboiled users] wrote:
> [...]


halloleo
I definitely want values to be able to contain spaces; only keys must not contain spaces.

I reckon I need to use the look-ahead approach via TestNot.

halloleo
Sorry, I don't get it! I have simplified my grammar by getting rid of the FreeText part, so now I have only a sequence of key-value pairs, with the key containing no spaces (but the value can!).

When I run the input

FirstName: John LastName: Smith Country: Flatland behind the Mountain

through the following rules:

public static final char FIELD_SEP = ':';

Rule Expression() {
    return Sequence(ZeroOrMore(FieldValuePair()), EOI);
}
    
Rule FieldValuePair() {
    return Sequence(FieldWithSep(), Value());
}

Rule FieldWithSep() {
    return Sequence(FieldGeneric(), FieldSep());
}

@SuppressSubnodes
Rule FieldGeneric() {
    // A generic field - might be better replaced with keyword list if known
    return OneOrMore(FirstOf(CharRange('a', 'z'), CharRange('A', 'Z')));
}

Rule Value() {
    return Sequence(OneOrMore(ValueChar()), Test(FieldWithSep()));
}

Rule ValueChar() {
    return NoneOf(String.valueOf(FIELD_SEP));
}

Rule FieldSep() {
    return Ch(FIELD_SEP);
}

I get the error:

Invalid input ':', expected ValueChar or FieldWithSep (line 1, pos 25):
FirstName: John LastName: Smith Country: Flatland behind the Mountain
                        ^

Why? In the Value rule I do look ahead with Test(FieldWithSep())...

mathias
Administrator
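When the parser starts matching `John...` via your current `Value` rule, it eats all characters up to the colon in col 25 and finishes the `OneOrMore(ValueChar())` rule. Then it tests whether the subsequent input matches the `FieldWithSep()` rule, which it doesn't, because the input at that point starts with the colon itself rather than with a key. The parser does not go back and hand `LastName` over to the next pair, so this failure causes the error you are getting.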
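
To make the failure concrete, the following plain-Java sketch imitates Sequence(OneOrMore(ValueChar()), Test(FieldWithSep())) by hand. It is not parboiled code, and the names (LookaheadAfterLoop, greedyEnd, matchValue) are hypothetical. The assumption is that the repetition is greedy and that the predicate is tried only once, at the position where the loop stopped.

```java
// Plain-Java imitation of the rule; no parboiled classes are involved.
final class LookaheadAfterLoop {

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Imitates FieldWithSep: letters followed by ':'. Returns the index after the ':' or -1. */
    static int matchFieldWithSep(String in, int start) {
        int pos = start;
        while (pos < in.length() && isAsciiLetter(in.charAt(pos))) {
            pos++;
        }
        boolean sawKey = pos > start;
        boolean sawColon = pos < in.length() && in.charAt(pos) == ':';
        return (sawKey && sawColon) ? pos + 1 : -1;
    }

    /** Imitates OneOrMore(NoneOf(":")): returns the index where the greedy loop stops. */
    static int greedyEnd(String in, int start) {
        int pos = start;
        while (pos < in.length() && in.charAt(pos) != ':') {
            pos++;
        }
        return pos;
    }

    /** Greedy loop first, then a single lookahead at the stop position. Returns end index or -1. */
    static int matchValue(String in, int start) {
        int end = greedyEnd(in, start);
        if (end == start) {
            return -1;                       // OneOrMore needs at least one character
        }
        // The lookahead runs only here; the loop never gives characters back.
        return matchFieldWithSep(in, end) != -1 ? end : -1;
    }

    public static void main(String[] args) {
        String in = "FirstName: John LastName: Smith";
        int start = in.indexOf(':') + 1;     // first character after "FirstName:"
        System.out.println("loop stops at column " + (greedyEnd(in, start) + 1));
        System.out.println("value match result: " + matchValue(in, start));
    }
}
```

The loop stops at the second colon (column 25, the position named in the error message), and the lookahead fails there because a colon cannot start a key.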

I think what you want is this:

   Rule Value() {
     return OneOrMore(ValueChar());
   }

   Rule ValueChar() {
     return Sequence(TestNot(FieldWithSep()), NoneOf(String.valueOf(FIELD_SEP)));
   }
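
For readers who want to see the effect without a parboiled setup, here is a hand-written imitation of these two rules together with FieldWithSep and the Expression loop. It is only a sketch in plain Java: the class KeyValueSketch and its methods are invented for the example, and trimming the values and collecting them in a LinkedHashMap are additions that are not part of the grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java imitation of the corrected rules; it does not use parboiled itself.
final class KeyValueSketch {

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Imitates FieldWithSep: letters followed by ':'. Returns the index after the ':' or -1. */
    static int matchFieldWithSep(String in, int start) {
        int pos = start;
        while (pos < in.length() && isAsciiLetter(in.charAt(pos))) {
            pos++;
        }
        boolean sawKey = pos > start;
        boolean sawColon = pos < in.length() && in.charAt(pos) == ':';
        return (sawKey && sawColon) ? pos + 1 : -1;
    }

    /** Imitates OneOrMore(Sequence(TestNot(FieldWithSep()), NoneOf(":"))). */
    static int matchValue(String in, int start) {
        int pos = start;
        while (pos < in.length()
                && matchFieldWithSep(in, pos) == -1   // stop as soon as a key begins here
                && in.charAt(pos) != ':') {
            pos++;
        }
        return pos > start ? pos : -1;
    }

    /** Imitates Sequence(ZeroOrMore(FieldValuePair()), EOI). */
    static Map<String, String> parse(String in) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            int afterSep = matchFieldWithSep(in, pos);
            if (afterSep == -1) {
                break;
            }
            int valueEnd = matchValue(in, afterSep);
            if (valueEnd == -1) {
                break;                                // pair failed, pos stays before the key
            }
            String key = in.substring(pos, afterSep - 1);
            String value = in.substring(afterSep, valueEnd).trim();  // trimming is an addition
            pairs.put(key, value);
            pos = valueEnd;
        }
        if (pos != in.length()) {                     // EOI check
            throw new IllegalArgumentException("Unexpected input at index " + pos);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "FirstName: John LastName: Smith Country: Flatland behind the Mountain"));
    }
}
```

Each value now ends at the first position where a key followed by a colon begins, so the last value can still contain several words.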

Cheers,
Mathias

> On 16.9.2016, at 09:00, halloleo [via parboiled users] wrote:
> [...]


halloleo
Thx Mathias. Of course: I have to look ahead for a key at every char I read into a value!

Can't test it out before Monday, but it makes sense.

halloleo
In reply to this post by mathias
It works! Thanks again.

My only concern is: Testing at every character for a whole string ending with a colon - isn't that very inefficient?

(Not a big concern in my case: My texts are never longer than 50-100 characters.)
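
A rough way to gauge the cost is to count how many characters the lookahead inspects. The sketch below does this in plain Java, assuming the predicate is re-evaluated from scratch at every position (no memoization); it is not a measurement of parboiled itself, and the class and method names are hypothetical. Under that assumption each lookahead reads at most to the end of the current word plus one character, so the total is bounded by the input length times (longest word + 1).

```java
// Counts lookahead work for a hand-written imitation of TestNot(FieldWithSep()).
final class LookaheadCost {

    private long inspected = 0;   // characters examined by the lookahead so far

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** True if letters followed by ':' start at the given index; counts every character it looks at. */
    private boolean keyStartsAt(String in, int start) {
        int pos = start;
        while (pos < in.length()) {
            inspected++;                              // one look at in.charAt(pos)
            if (!isAsciiLetter(in.charAt(pos))) {
                break;
            }
            pos++;
        }
        return pos > start && pos < in.length() && in.charAt(pos) == ':';
    }

    /** Scans one value with per-character lookahead and returns the number of inspected characters. */
    static long inspectedCharsForValue(String in) {
        LookaheadCost cost = new LookaheadCost();
        int pos = 0;
        while (pos < in.length() && !cost.keyStartsAt(in, pos) && in.charAt(pos) != ':') {
            pos++;
        }
        return cost.inspected;
    }

    public static void main(String[] args) {
        String value = "Flatland behind the Mountain";
        System.out.println(value.length() + " characters, "
                + inspectedCharsForValue(value) + " lookahead inspections");
    }
}
```

For texts of 50-100 characters with ordinary word lengths this stays at a few hundred character checks, so the extra work should hardly matter in practice.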