Wednesday, February 11, 2009

Block Quotes & CDATA

A lot of what I'm adding to xmlsh lately is to support the xproc project.
Xproc poses some unique challenges, which is good for exposing weaknesses in xmlsh,
and weaknesses in my brain :)

One problem is quoting (see the "Quoting is Hard!" post).

In order to implement passing arguments to xproc steps, I have to be very careful with quotes in the code generation, because the strings have to be passed unchanged through xmlsh on to the underlying commands or sub-commands, such as XPath expressions.

One problem with XML and XPath (and by inference xproc) is the frequent use of both types of quotes, single and double, interchangeably.  The unix shells have their own peculiar interpretation of quoting, and of quoting quotes, which is largely incompatible with that.  Say I need to pass "foo" unchanged: then I need to quote it as  '"foo"'  .. but if I need to pass 'foo' unchanged, it needs to get quoted as  "'foo'".  And if I need to pass through '$foo', I have to use something more complex like  "'"'$foo'"'" ...  mortal brains were not intended for this !!!
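
As a quick check of the three cases above, here is a minimal sketch in plain POSIX shell (not xmlsh). The variable names a, b and c are only for illustration.

```shell
# Each assignment uses the quoting from the paragraph above.
a='"foo"'          # single quotes protect the double quotes
b="'foo'"          # double quotes protect the single quotes
c="'"'$foo'"'"     # "'" then '$foo' then "'", glued into one word

printf '%s\n' "$a"   # prints "foo"
printf '%s\n' "$b"   # prints 'foo'
printf '%s\n' "$c"   # prints '$foo' (no variable expansion happens)
```

The third case only works because the shell concatenates adjacent quoted pieces into a single word, which is exactly the part that is hard to read.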

So I thought, "how does XML handle this?" The closest thing is CDATA sections ... which pretty well solve the problem ... but CDATA is one hell of a verbose syntax.  The above would be
<![CDATA['$foo']]>

Ugh ... is there something simpler ?  How about XML Comments ?   <!--'$foo'-->
This is certainly prettier ... but it's semantically misleading.  XML Comments are supposed to be comments, not quotes, so using XML Comment syntax as quotes in xmlsh would confuse anyone reading the script.

So the solution ? I made something up, of course !!! I'm not fully committed to this syntax yet, but it passes some basic tests.  It needs to be a multi-string comment syntax, needs to be a string that is not common in either shell or XML expressions, must not be "too ugly", and should also be somewhat similar to other shell expressions.

I first tried <{ block quote }>, then discovered (duh !) that it conflicts with the port syntax  <{port}

I went with
   <{{block quote}}>

the '$foo' example is expressed as
   <{{'$foo'}}>

This seems not too bad, and it wasn't hugely difficult to get into the grammar.




3 comments:

  1. Perl (hey I cited them before!) got around this by just increasing the number of different available quote operators.

    q('$foo')

    or, if your expression already has parentheses...

    q/'$foo' (it is neat!)/

    I find the process by which syntax is developed to be somewhat interesting. How many were born out of "it's not too ugly"?

    Where do you draw the line? Or rather, what is driving this particular feature? It seems like you're getting to the level of complexity where one might choose instead to fire up Visual Studio/IntelliJ/$IDE/vi and put together a small application to more clearly perform the task.

  2. The problem with most shell language quoting (Perl included) is that the quote operators were designed assuming a human is typing the expressions. To quote you:

    "if your expression already has parentheses..."

    This is fairly easy to know if you're a human typing the expression.

    In my case I'm writing an xproc to xmlsh *converter* which has to produce valid xmlsh scripts regardless of the input. As I write the code, I don't have the luxury of knowing what the value I need to quote is going to be.

    I tried this, and it does work, but it gives me the willies:

    "'" + string.replace("'", "'\"'\"'") + "'";

    ( replace each single quote with single-double-single-double-single )

    That's what you've got to do in shell !

    It reads very badly ... say "'foo'" becomes '"'"'"'foo'"'"'"'

    I'm personally much happier with
    <{{"'foo'"}}>

    Where do I draw the line ? When I'm tired of it ! :)

    In this case, it's largely driven by trying to write an implementation of xproc that converts to xmlsh. I find this exercise very useful. Even though I don't personally find xproc very useful (it's way too verbose for the simplest things), I find the process of attempting to implement it useful in that it pokes into areas of xmlsh which I had not thought of before. In theory, the feature set of xproc is a subset of that of xmlsh; however, actually proving that requires demonstrating that I can translate any xproc document into an xmlsh script, but not vice versa.

    As for "fire up Visual Studio" ... well, if you've ever tried to do real XML processing in VS or any other language, you will find that it's a lot harder than you think. I find the code bloat factor for the equivalent thing in, say, Java or C# versus xmlsh is about 100:1 or more (sometimes 1000:1).

    I'm trying to push back the 'brick wall' of xmlsh so that I don't hit the point where I need to use another language to do XML processing for "most common use cases".

    It's also an experiment. How complicated does xmlsh need to be to achieve this ?

  3. From an implementation point of view, syntax, particularly quoting, backslashing, etc., is complicated by the parser. I'm using "javacc" and some types of syntax are very difficult to do. For example, $(command) is very hard to parse at the tokenizer level, where it's needed, because I have to handle any arbitrary command, including nested commands that contain (), such as, say,
    echo $(echo $(echo \)))

    Since I don't have access to the parser at the tokenizer level, where this is needed, it has to be done with string manipulation. Simple commands make it through this, but some complex commands may come out with the quoting screwed up. A simple example: the new block quotes are not recognized properly there, so this fails:
    echo $( echo <{{foo)}}> )


    I'm working on the ksh <(command) syntax now.
    To implement that, I'm making it a full-blown part of the parser syntax instead of part of the tokenizer. If that works well, I may rework $() and $<() to be handled at the parser level instead of the tokenizer level.

    For details about what I'm talking about, see ShellParser.jj

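
For reference, the single-quote replacement described in comment 2 can be sketched in plain POSIX shell. This is only an illustration of the technique, not code from xmlsh or the converter: the function name shell_quote is hypothetical, and it relies on sed.

```shell
# shell_quote (hypothetical helper): wrap "$1" in single quotes,
# replacing each embedded ' with '"'"' (single-double-single-double-single).
# Limitation: command substitution strips trailing newlines from the value.
shell_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\"'\"'/g")"
}

# The example from comment 2: the 7-character string "'foo'".
quoted=$(shell_quote "\"'foo'\"")
printf '%s\n' "$quoted"        # prints '"'"'"'foo'"'"'"'

# Feeding the quoted form back to the shell recovers the original string.
eval "roundtrip=$quoted"
printf '%s\n' "$roundtrip"     # prints "'foo'"
```

The quoted form is safe for any input without inspecting it first, which is the property a code generator needs, but it is also a good demonstration of why a dedicated block quote reads better.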
