Interpreting Escape characters in String Literals

chrisl
In the JsonParser example, there are rules for interpreting a string literal and creating a StringNode for the AST:

  def JsonString: Rule1[StringNode] = rule {
    "\"" ~ zeroOrMore(Character) ~> StringNode ~ "\" "
  }
  def Character: Rule0 = rule { EscapedChar | NormalChar }
  def EscapedChar: Rule0 = rule { "\\" ~ (anyOf("\"\\/bfnrt") | Unicode) }
  def NormalChar: Rule0 = rule { !anyOf("\"\\") ~ ANY }
  def Unicode: Rule0 = rule { "u" ~ HexDigit ~ HexDigit ~ HexDigit ~ HexDigit }
  def HexDigit: Rule0 = rule { "0" - "9" | "a" - "f" | "A" - "F" }

This correctly identifies and parses escape characters such as \n, \t, etc.

However, the StringNode in the AST contains the exact string of characters as present in the source. I think this is actually incorrect - the AST should contain the parsed representation of the string. This is why I'm using a parser :)

The question is: what's the best way to obtain the unescaped string?

One option would be to do the unescaping after parsing, i.e. "~> (s => StringNode(unescape(s)))". Yet this seems like a rather strange thing to do, given that it means parsing the same string again; I also don't know of an "unescape" function I could use (any suggestions?).
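
For reference, the post-parse option could be sketched roughly as follows. This is a minimal, library-independent sketch, not part of parboiled or the JsonParser example: the names `JsonUnescape` and `unescape` are hypothetical, and the code assumes its input is the raw text between the quotes, already validated by the grammar above.

```scala
// Hypothetical helper, not part of parboiled: decodes JSON escape sequences
// in a single left-to-right pass over the already-matched string contents.
object JsonUnescape {
  def unescape(s: String): String = {
    val sb = new java.lang.StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c != '\\') {
        // ordinary character: copy it through unchanged
        sb.append(c)
        i += 1
      } else {
        if (i + 1 >= s.length)
          throw new IllegalArgumentException("dangling backslash")
        s.charAt(i + 1) match {
          case 'u' =>
            // \uXXXX: assumes the grammar already checked for four hex digits
            if (i + 6 > s.length)
              throw new IllegalArgumentException("truncated unicode escape")
            sb.append(Integer.parseInt(s.substring(i + 2, i + 6), 16).toChar)
            i += 6
          case e =>
            val decoded: Char = e match {
              case 'b' => '\b'
              case 'f' => '\f'
              case 'n' => '\n'
              case 'r' => '\r'
              case 't' => '\t'
              case '"' | '\\' | '/' => e
              case other =>
                throw new IllegalArgumentException("invalid escape: \\" + other)
            }
            sb.append(decoded)
            i += 2
        }
      }
    }
    sb.toString
  }
}
```

With such a helper in scope, the action would become "~> (s => StringNode(JsonUnescape.unescape(s)))". The second pass is linear in the length of the literal, but it does repeat work the parser has already done.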

Thoughts?

Re: Interpreting Escape characters in String Literals

chrisl
No love? Has nobody done this?

Re: Interpreting Escape characters in String Literals

tsuckow
Not done it, but you could theoretically do something like (untested):

def JsonString: Rule1[StringNode] = rule {
    // concatenate the pushed pieces, then wrap the result in a StringNode
    "\"" ~ zeroOrMore(Character) ~~> (_.mkString) ~~> StringNode ~ "\" "
  }

// every alternative has to push exactly one String (the decoded character)
def EscapedChar = rule { "\\" ~ (EscapedCharB | EscapedCharF | EscapedCharN | EscapedCharR | EscapedCharT | Unicode) }
def EscapedCharN = rule { "n" ~ push("\n") }
def NormalChar = rule { !anyOf("\"\\") ~ ANY ~> (s => s) }
Thomas Suckow



Re: Interpreting Escape characters in String Literals

mathias
Administrator
In reply to this post by chrisl
Chris,

sorry, I somehow missed your first post.

The JsonParser example is indeed a bit simplified in that regard.
You might want to check out the actual parser we are using in spray-json:
https://github.com/spray/spray-json/blob/master/src/main/scala/spray/json/JsonParser.scala

In order to achieve better performance, it pushes the matched characters into a StringBuilder instance that was previously put on the value stack.

Cheers,
Mathias

---
[hidden email]
http://www.parboiled.org


Re: Interpreting Escape characters in String Literals

chrisl
Thanks Mathias, that's exactly what I was looking for.

Cheers,
Chris