Wednesday, February 11, 2009

Block Quotes & CDATA

A lot of what I'm adding to xmlsh lately is to support the xproc project.
Xproc poses some unique challenges, which is good for exposing weaknesses in xmlsh,
and weaknesses in my brain :)

One problem is quoting (see the "Quoting is Hard!" post).

In order to implement passing arguments to xproc steps I have to be very careful with quotes in the code generation, as the strings have to be passed unchanged through xmlsh on to the underlying commands or sub-commands, such as XPath expressions.

One problem with XML and XPath (and by inference xproc) is the frequent use of both types of quotes, single and double, interchangeably.  The Unix shells have their own interpretation of quoting, and of quoting quotes, which is largely incompatible.  Say I need to pass "foo" unchanged: then I need to quote it as  '"foo"'  .. but if I need to pass 'foo' unchanged, it needs to be quoted as  "'foo'".  And if I need to pass through '$foo' I have to use something more complex like  "'"'$foo'"'" ...  mortal brains were not intended for this !!!
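
To make those three cases concrete, here is a minimal sketch in plain POSIX sh (not xmlsh). The variable names dq, sq and mixed are made up for illustration; the quoting itself is exactly the three forms described above.

```shell
# Plain POSIX sh, not xmlsh. Variable names are hypothetical.
dq='"foo"'          # single quotes protect the inner double quotes
sq="'foo'"          # double quotes protect the inner single quotes
mixed="'"'$foo'"'"  # "'" + '$foo' + "'" : the single-quoted middle keeps $foo from expanding

# Print each value exactly as a downstream command would receive it.
printf '%s\n' "$dq" "$sq" "$mixed"
```

Each value comes out with its own quotes intact and $foo left unexpanded, but only because the outer quote type was picked by hand for each case.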

So I thought, "how does XML handle this?" The closest thing is CDATA sections ... which pretty well solve the problem ... but CDATA is one hell of a verbose syntax.  The above would be   <![CDATA['$foo']]>

Ugh ... is there something simpler ?  How about XML comments ?   <!--'$foo'-->
This is certainly prettier ... but it's semantically misleading.  XML comments are supposed to be comments, not quotes, so using XML comment syntax for quoting in xmlsh would confuse anyone reading it.

So the solution ? I made something up of course !!! I'm not fully committed to this syntax yet, but it passes some basic tests.  It needs to be a multi-string comment syntax, needs to be a string not common in either shell or XML expressions, and needs to be not "too ugly" and somewhat similar to other shell expressions.

I first tried <{ block quote }>, then discovered (duh !) that it conflicts with the port syntax  <{port}

I went with
   <{{block quote}}>

The '$foo' example is expressed as   <{{'$foo'}}>

This seems not too bad, and it wasn't hugely difficult to get into the grammar.