Parsing a whitespace sensitive language like python


Parsing a whitespace sensitive language like python

Gavin
Hi,

A Stack Overflow answer asserts that PEGs cannot handle grammars for whitespace-sensitive languages like Python. Does this mean that parboiled will not be able to handle Python, Haskell, or even CoffeeScript grammars?

Thanks

Gavin

Re: Parsing a whitespace sensitive language like python

tsuckow
It takes some trickery but it can be done.
https://github.com/sirthias/parboiled/wiki/Indentation-Based-Grammars

Thomas Suckow




Re: Parsing a whitespace sensitive language like python

mathias
Administrator
Well, the answer is correct in the sense that PEGs _alone_ cannot express indentation-based languages.
However, as Thomas rightly pointed out, parboiled can handle these cases very well.

We (the devs behind parboiled) are using it ourselves to parse a rather complex business DSL that uses indentation-based scoping.
The key is to preprocess the input and turn changes in line indentation into special tokens (INDENT/DEDENT markers) that the grammar can then match.

This works very well in practice.
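
As a rough illustration of that preprocessing idea, here is a minimal Python sketch. It is not parboiled's actual implementation: the function name `mark_indentation` and the marker strings `<INDENT>` and `<DEDENT>` are made up for this example, and it assumes indentation consists of spaces only.

```python
def mark_indentation(source, indent="<INDENT>", dedent="<DEDENT>"):
    """Rewrite indentation changes as explicit marker strings.

    Hypothetical helper for illustration only; not part of parboiled.
    Assumes indentation is made of spaces (no tabs).
    """
    levels = [0]                      # stack of currently open indentation widths
    out = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:              # blank lines never change the level
            continue
        width = len(line) - len(stripped)
        prefix = ""
        if width > levels[-1]:        # deeper than before: open one block
            levels.append(width)
            prefix = indent
        else:
            while width < levels[-1]: # shallower: close blocks until we match
                levels.pop()
                prefix += dedent
            if width != levels[-1]:
                raise ValueError(f"inconsistent dedent: {line!r}")
        out.append(prefix + stripped)
    tail = dedent * (len(levels) - 1) # close blocks still open at end of input
    if tail:
        out.append(tail)
    return "\n".join(out)


if __name__ == "__main__":
    print(mark_indentation("a\n  b\n    c\n  d\ne\n"))
```

After this step the text no longer depends on leading whitespace, so an ordinary PEG rule can treat the two markers like opening and closing braces.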

Cheers,
Mathias

---
http://www.parboiled.org


Re: Parsing a whitespace sensitive language like python

Freewind
I read that article, and have a question:

How should multiline comments, heredocs, and other constructs that span several lines be handled, given that the indentation inside them should not be converted to INDENT/DEDENT tokens?

e.g.

    def test
        val doc = """
This is
    a
  multiline
         heredoc
"""
        /* and
this
 is
  a
   multiline
 comment
*/

Re: Parsing a whitespace sensitive language like python

mathias
Administrator
The IndentDedentInputBuffer allows you to specify a "lineCommentStart" as a constructor parameter.
If you use it, line comments are taken care of automatically.
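
To make the line-comment idea concrete, here is a small Python sketch of what such a setting could do during preprocessing. This is an assumption about the general approach, not parboiled code: the helper `significant_lines` and its `line_comment_start` parameter are hypothetical names chosen for this example.

```python
def significant_lines(source, line_comment_start="#"):
    """Yield (indent_width, text) for lines that should affect indentation.

    Hypothetical helper; it only mirrors the idea behind a
    "lineCommentStart" setting and is not taken from parboiled.
    Caveat: a comment marker inside a string literal is not recognised.
    """
    for line in source.splitlines():
        # Drop everything from the comment marker to the end of the line.
        code = line.split(line_comment_start, 1)[0].rstrip()
        if not code:                  # blank or comment-only line: ignore it
            continue
        text = code.lstrip(" ")
        yield len(code) - len(text), text


if __name__ == "__main__":
    sample = "a\n      # oddly indented note\n  b  # trailing\n\nc"
    for width, text in significant_lines(sample):
        print(width, text)
```

The point of the sketch is that a comment-only line, however it is indented, never reaches the code that decides whether a block was opened or closed.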

AFAIK, multiline/block comments and heredocs are usually not allowed in indentation-based languages, as they create a lot of problems (not only for the parser but also for the user).

If you are designing an indentation-based language, you are therefore probably better off not allowing them.
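
For a language that has to support such constructs anyway, one conceivable approach is for the preprocessor to track whether a line starts inside an open heredoc or block comment and to exempt those lines from indentation handling. The Python sketch below only illustrates that idea. Nothing in this thread says parboiled offers it, the name `classify_lines` and the delimiter pairs are assumptions, and escapes or delimiters inside line comments are deliberately ignored.

```python
def classify_lines(source, delimiters=(('"""', '"""'), ("/*", "*/"))):
    """Return (line, starts_inside) pairs.

    starts_inside is True when the line begins within an open multiline
    construct, so its leading whitespace should not count as indentation.
    Hypothetical helper for illustration; escapes are not handled.
    """
    closer = None                     # closing delimiter we are waiting for
    result = []
    for line in source.splitlines():
        starts_inside = closer is not None
        pos = 0
        while pos < len(line):
            if closer is None:
                # Look for the earliest opening delimiter on the rest of the line.
                hits = [(line.find(o, pos), o, c) for o, c in delimiters]
                hits = [h for h in hits if h[0] != -1]
                if not hits:
                    break
                start, opener, closer = min(hits)
                pos = start + len(opener)
            else:
                end = line.find(closer, pos)
                if end == -1:         # construct continues on the next line
                    break
                pos = end + len(closer)
                closer = None
        result.append((line, starts_inside))
    return result


if __name__ == "__main__":
    sample = '    def test\n        val doc = """\nThis is\n    a\n"""\n        /* and\nthis\n*/'
    for line, inside in classify_lines(sample):
        print("skip" if inside else "scan", repr(line))
```

Lines reported as starting inside a construct would be copied through unchanged, while all other lines go through the usual INDENT/DEDENT conversion.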

Cheers,
Mathias
