Writing a Lexer for generic programming languages

ftomassetti
Hi,
for a research project I would like to build a tokenizer/lexer that simply produces a list of tokens from a string containing the code.

I am thinking of using parboiled. I am trying to build a parser that basically accepts a sequence of tokens of any kind. I tried something like this:

        Rule ListOfTokens(){
            return Sequence(OneOrMore(Anything()), EOI);
        }

        Rule Anything(){
            return FirstOf(Spacing(),IntegerLiteral(),Keyword(),Identifier());
        }

Unfortunately I get this error when trying to parse a simple text (with just an identifier):

RES Unexpected end of input, expected LetterOrDigit or Spacing (line 1, pos 5)

Could someone help me make it work, or suggest a better way to do this?
Re: Writing a Lexer for generic programming languages

mathias
Administrator
The rules you show appear to be ok.
I guess the problem is in your `Identifier` rule.

Can you show a complete example with all relevant rules?

Cheers,
Mathias

---
http://www.parboiled.org

(In reply to ftomassetti's message of 26 Feb 2014, 14:06.)

Re: Writing a Lexer for generic programming languages

ftomassetti
You are absolutely right: there was a Spacing rule at the end of Identifier. After removing it, the parser now... parses :D
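
For readers who only need a flat token list, the same idea can be sketched without parboiled, using plain `java.util.regex`. This is an illustration of the principle behind the fix, not the code from this thread: each token pattern matches only its own characters, and whitespace is a separate token kind, so an identifier at the very end of the input still matches. The class name `SimpleLexer`, the `KIND:text` output format, and the keyword set are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone lexer; it does not use parboiled.
final class SimpleLexer {
    // Alternatives are tried left to right, similar in spirit to FirstOf(...).
    // No token pattern consumes trailing whitespace: spacing is its own token.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<SPACING>\\s+)"
        + "|(?<INTEGER>\\d+)"
        + "|(?<KEYWORD>(?:if|else|while|return)\\b)" // assumed keyword set
        + "|(?<IDENTIFIER>[A-Za-z_][A-Za-z0-9_]*)");

    private static final String[] KINDS =
        {"SPACING", "INTEGER", "KEYWORD", "IDENTIFIER"};

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Try to match exactly one token starting at the current position.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unexpected character at position " + pos);
            }
            // Record which alternative matched, together with its text.
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    tokens.add(kind + ":" + m.group(kind));
                    break;
                }
            }
            pos = m.end(); // every alternative matches at least one character
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("while x1 42"));
    }
}
```

Because the keyword alternative ends with `\b`, an input such as `whilex` is reported as a single identifier rather than a keyword followed by `x`. In a parboiled grammar the ordering of alternatives inside `FirstOf` plays the same role.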