Ignore incomplete patterns ?

newbie
Hi,

So I want to use parboiled to find specific patterns in files that contain articles and the like.
Basically I want to find citations. Luckily those patterns are fairly easy to translate into rules, but I'm having trouble not only getting the patterns right but also making them work with all the other stuff in between, i.e. normal text.

So to my question:

How do I tell parboiled to ignore parts of a pattern if they appear "alone", i.e. without the other parts?
For example, a pattern (Rule a) might consist of three rules b, c and d. I want parboiled to match only a, and to ignore b, c and d if they appear on their own.
You see, those sub-patterns often appear in the text, but I'm only really interested when the entire pattern, i.e. Rule a, appears.

Does that make any sense ?

Basically I want parboiled to match my patterns no matter what context they appear in. That's why I've built a "main" rule that uses ANY at the first and last position of the Sequence: my pattern could be preceded or followed by pretty much anything, and only a complete pattern is supposed to be matched rather than ignored.
Re: Ignore incomplete patterns ?

fge
newbie wrote
Does that make any sense ?
Not really, in fact, sorry...

Can you give some sample inputs and the expected results?
Re: Ignore incomplete patterns ?

newbie
I'll try.

Sample text:

"This is just some lame text that i made up § 5 HUHU to give an example. It's quite fitting even if it doesn't make any sense since the actuall files also contain normal Art. 10 HAHA text. "

What I want to match is § 5 HUHU and Art. 10 HAHA. In the end I actually want to mark them in the text. It's XML, so I want to wrap them in an element.

Expected output:

"This is just some lame text that i made up <match>§ 5 HUHU</match> to give an example. It's quite fitting even if it doesn't make any sense since the actuall files also contain normal <match>Art. 10 HAHA</match> text. "

Here's the "main" rule:

Rule Citation() {
    return Sequence(
        Sequence(
            OneOrMore(
                ZeroOrMore(ANY),
                Citation1(),
                ZeroOrMore(ANY)),
            EOI),
        new Action() {
            public boolean run(Context context) {
                push("bla");
                return true;
            }
        });
}

Sorry for not knowing how to format code in this forum.

My problem is that ANY can also match all the characters that make up Citation1() and its subrules.
I'm still trying to get the matching to work, which is why I'm simply pushing "bla".
Re: Ignore incomplete patterns ?

fge
OK, please note that the following is written using grappa, but it works pretty much the same way in parboiled:

<pre>
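// A complete citation: "Art." or '§', blanks, digits, blanks, upper-case letters.
Rule citation()
{
    return sequence(
        firstOf("Art.", '§'),
        oneOrMore(wsp()),
        oneOrMore(digit()),
        oneOrMore(wsp()),
        oneOrMore(charRange('A', 'Z'))
    );
}

// A character that cannot start a citation: anything except 'A' or '§',
// or an 'A' that is not followed by "rt.".
Rule normalTextChar()
{
    return firstOf(
        noneOf("A§"),
        sequence('A', testNot("rt."))
    );
}

// Runs of normal text joined by citations; each citation is pushed on the stack.
Rule all()
{
    return join(oneOrMore(normalTextChar()))
       .using(sequence(citation(), push(match())))
       .min(1);
}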
</pre>
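
For comparison, here is a minimal sketch of the same citation shape done with plain java.util.regex instead of a PEG parser. This is a swapped-in technique, not parboiled or grappa. The class name CitationMarker and the method mark are made up for illustration, and the pattern assumes citations always look like the two in the sample ("Art." or "§", blanks, digits, blanks, an upper-case word). Complete citations get wrapped in a match element, incomplete ones are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of parboiled or grappa.
class CitationMarker {
    // Assumed citation shape, taken from the sample text:
    // "Art." or the section sign (\u00A7), blanks, digits, blanks, upper-case letters.
    private static final Pattern CITATION =
            Pattern.compile("(?:Art\\.|\u00A7)[ \\t]+[0-9]+[ \\t]+[A-Z]+");

    // Wraps every complete citation in <match>...</match>. Everything else,
    // including partial citations, is copied through unchanged.
    static String mark(String text) {
        Matcher m = CITATION.matcher(text);
        return m.replaceAll("<match>$0</match>");
    }

    public static void main(String[] args) {
        System.out.println(mark("made up \u00A7 5 HUHU, but Art. ten is left alone"));
    }
}
```

A regex is enough as long as the citation patterns stay this flat. Once they nest or need actions per sub-part, the parser rules above are the better fit.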
Re: Ignore incomplete patterns ?

newbie
Thank you. I've actually managed to get it working. After hours of frustration I got it working right after asking on a forum... typical :)

Here's the modified rule:

Rule Citation() {
    return Sequence(
        Sequence(
            ZeroOrMore(TestNot(Citation1()), ANY),
            OneOrMore(Citation1()),
            ZeroOrMore(TestNot(Citation1()), ANY)),
        new Action() {
            public boolean run(Context context) {
                push("bla");
                return true;
            }
        });
}
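
To spell out what ZeroOrMore(TestNot(Citation1()), ANY) does, here is a rough hand-written sketch of the same idea in plain Java, without parboiled. At each position it first tries to match a complete citation, and only if that fails does it consume a single character. The names CitationScanner, matchCitation and findCitations are made up for illustration, and the citation shape ("Art." or "§", blanks, digits, blanks, upper-case letters) is an assumption based on the sample text, not the real Citation1() rule. Unlike the rule above, it also accepts input with no citation at all and collects every citation it finds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for the parser rules, for illustration only.
class CitationScanner {
    // Returns the index just past the run of characters accepted by p.
    static int run(String s, int from, IntPredicate p) {
        int i = from;
        while (i < s.length() && p.test(s.charAt(i))) i++;
        return i;
    }

    // Returns the index just past a complete citation starting at pos, or -1.
    static int matchCitation(String s, int pos) {
        int i;
        if (s.startsWith("Art.", pos)) i = pos + 4;
        else if (s.startsWith("\u00A7", pos)) i = pos + 1;
        else return -1;
        IntPredicate blank = c -> c == ' ' || c == '\t';
        IntPredicate digit = c -> c >= '0' && c <= '9';
        IntPredicate upper = c -> c >= 'A' && c <= 'Z';
        IntPredicate[] parts = {blank, digit, blank, upper};
        for (IntPredicate part : parts) {
            int end = run(s, i, part);
            if (end == i) return -1; // a part is missing: incomplete citation
            i = end;
        }
        return i;
    }

    // Collects all complete citations, skipping everything else.
    static List<String> findCitations(String s) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int end = matchCitation(s, pos);
            if (end < 0) {
                pos++; // the TestNot(citation) case: consume one ANY character
            } else {
                found.add(s.substring(pos, end));
                pos = end;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findCitations("text \u00A7 5 HUHU more Art. 10 HAHA text"));
    }
}
```

The point is the order of the two steps: the lookahead runs before any character is consumed, so the "anything" loop can never swallow the start of a complete citation.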