Friday, November 27, 2009

Looking for a "Map" or "tie" syntax

I'm trying really hard, but failing, to NOT add more syntax to xmlsh.
The problem I'm trying to solve is the clumsiness of accessing properties, or name/value pairs,
in particular when they are serialized to a kind of properties file.

With the advent of the xproperties command, you can now read a standard Java properties file (in either text or xml form) and assign it to a variable.

Suppose you have a simple properties file

a=b

This parses into XML (using the standard Java Properties API) as


<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd" []>
<properties version="1.0">
<entry key="a">b</entry>
</properties>
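
The mapping from the text form to the XML form is simple enough to sketch outside the shell. The Python below only illustrates the shape of the data; it is not how xproperties is implemented. The function name is made up, and the parser handles plain key=value lines only (none of the escapes, continuation lines, or ':' separators that the real Java Properties format allows).

```python
import xml.etree.ElementTree as ET


def properties_to_xml(text):
    """Convert simple key=value lines to the Java properties XML layout.

    Simplified on purpose: blank lines and lines starting with '#' or '!'
    are skipped, and everything after the first '=' is the value.
    """
    root = ET.Element("properties", {"version": "1.0"})
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        entry = ET.SubElement(root, "entry", {"key": key.strip()})
        entry.text = value.strip()
    return root


root = properties_to_xml("a=b\n")
# prints <properties version="1.0"><entry key="a">b</entry></properties>
print(ET.tostring(root, encoding="unicode"))
```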



Read into a variable as
props=$<(xproperties -in file.properties)

You can access the value of key "a" with the expression

<[ $props/properties/entry[@key="a"]/string() ]>

Am I the only one who finds this extremely cumbersome?

I would like to do something like one of these
$props.a
$props[a]
$props["a"]

or even
${props:"a"}


But any of these syntactic sugars requires the shell to "know" the structure of $props and do something magic with it. I really don't like the idea of extending the core syntax to handle a particular schema, even one as common as Java Properties. Suppose, for example, I wanted a Map instead so I could store non-string values ... Properties won't cut it.

MarkLogic has an XQuery function for that (map:map()) and various map:* functions to get at the values. I'd like something like this, but built into the shell and extensible to arbitrary XML schemas. What to do?

The latest thought I had was to borrow something from Perl (gasp): the tie function. "Bind" a variable to an expression so that you could define your own shortcuts with map- or array-like syntax.

Suppose for example I could do something like


xbind props '<[ //entry[@key=$key]/string() ]>'

then magically make
$props["a"] or maybe ${props:a} or ${props[a]} invoke the bound expression and produce "b"


You could then use this mechanism to create array or map-like notation out of ANY schema.
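
To get a feel for what such a binding would do, here is a minimal sketch in Python rather than xmlsh. Everything in it is hypothetical: the Bound class, the {key} placeholder, and the sample data are invented for illustration, and xbind itself is only a proposal. The point is just that one small mechanism turns a path expression into map-style indexing.

```python
import xml.etree.ElementTree as ET

PROPS_XML = """<properties version="1.0">
<entry key="a">b</entry>
<entry key="c">d</entry>
</properties>"""


class Bound:
    """Tie a document to a path template so it can be indexed like a map.

    The template is an ElementTree path with a {key} placeholder. This is
    a toy: keys containing a single quote would break the expression.
    """

    def __init__(self, document, template):
        self.document = document
        self.template = template

    def __getitem__(self, key):
        match = self.document.find(self.template.format(key=key))
        if match is None:
            raise KeyError(key)
        return match.text


# The binding plays the role of the proposed xbind.
props = Bound(ET.fromstring(PROPS_XML), ".//entry[@key='{key}']")
print(props["a"])  # prints b

# The same class works against an unrelated (made-up) schema.
people = Bound(
    ET.fromstring("<people><person id='p1'><name>Ann</name></person></people>"),
    ".//person[@id='{key}']/name",
)
print(people["p1"])  # prints Ann
```

Nothing in the class knows about Java Properties; the schema knowledge lives entirely in the bound expression, which is the attraction of the tie approach.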

Is this worth the effort? When does adding syntax actually start to detract from a language instead of adding to it?





Thursday, November 26, 2009

New release of xmlsh and marklogic

Yesterday I released new versions of xmlsh and the MarkLogic extension.
The most notable new feature in xmlsh is the availability of serialization options on almost all commands. For example, xcat, xquery, xslt, etc. can now change the output method for a single invocation without changing it globally:

xcat -output-method html
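
As an analogy only (this is Python's standard library, not xmlsh, and says nothing about how xmlsh implements the option), the effect of choosing the output method per call looks like this: the same document is serialized twice, and only the call that asks for HTML gets HTML-style empty tags.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<p>line one<br/>line two</p>")

# The output method is an argument of each call, not a global setting.
as_xml = ET.tostring(doc, encoding="unicode", method="xml")
as_html = ET.tostring(doc, encoding="unicode", method="html")

print(as_xml)   # <p>line one<br />line two</p>
print(as_html)  # <p>line one<br>line two</p>
```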


I have spent many laborious hours updating the xmlsh documentation wiki (www.xmlsh.org) to reflect these updates and to standardize on a markup for options. My previous attempts at formatting options were horrid; options are now in a tabular format.

Improvements to the MarkLogic extension are coming slowly; as I use MarkLogic more myself, I find useful new features that I want to turn into commands. Notable are

* option to ml:list to list only the contents of a particular directory
* new ml:listdir to list directories (invisible to ml:list)
* new ml:deldir to delete directories


Friday, November 20, 2009

xslt1.0 and servlets

Two major accomplishments this week.

I've had to start processing "SPL" files from the FDA. It turns out the XSLT the FDA publishes is (how to say this nicely) ... "not the highest quality".
It won't process at all with Saxon 9 due to errors. The same problems exist under Saxon 6, but there they are only warnings, so they were ignored. There were so many that I couldn't easily fix the XSL file, so I needed to process it with Saxon 6. I was able to implement an "xslt1" command which uses the saxon6 jar in the same JVM as Saxon 9! Quite a feat ... so now xmlsh has XSLT 1.0 and 2.0 commands built in.

In addition, I needed to run XSLT from MarkLogic. Since it's not natively supported, the suggested workaround is to HTTP POST the XML to a servlet. xmlsh to the rescue! I wrote a simple Tomcat servlet for xmlsh. There were some weird problems with it taking over stdin/stdout, so I had to improve on the assumption about taking over System.in and System.out in a container environment. I also had to buffer the input and output, or else strange things happened occasionally depending on the document size, but now it's working!
Expect to see an xmlsh servlet, either as a separate package or built into the core distro; I haven't decided yet.



Saturday, November 14, 2009

Embedded commands or roll your own?

Some commands benefit from being integrated into xmlsh, even if they are 'fairly easy' to do *with* xmlsh without being embedded. An example is Schematron. Schematron is easily implemented as a simple 4-line xmlsh script. But the fact is you have to figure that out. You need to download the Schematron XSL files and figure out how to call them, in what order, and how to pass the temporary files around. It took me an hour to figure it out. For that reason I included schematron as a single command in xmlsh; even though it's not 'magic', it makes life easier.

But what about something that's truly trivial to do without it being "embedded"? For example, a user asked for an "html to xml" command using something like tidy or tagsoup.
I just downloaded tagsoup.jar and discovered that it runs perfectly 'out of the box' with a single xmlsh command. Assuming you have the jar file downloaded, this command runs tagsoup

jcall -cp tagsoup-1.2.jar org.ccil.cowan.tagsoup.CommandLine file

You (or I) could wrap this into a 1-line script "tagsoup" that does this.
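
For comparison, the same kind of thin wrapper written outside xmlsh might look like the Python sketch below. It is an illustration under assumptions, not part of xmlsh: the function names are invented, java is assumed to be on the PATH, the jar is assumed to sit in the current directory, and TagSoup is assumed to write the converted document to standard output.

```python
import subprocess

TAGSOUP_JAR = "tagsoup-1.2.jar"  # assumed location: the current directory


def tagsoup_command(files, jar=TAGSOUP_JAR):
    """Build the java command line equivalent to the jcall invocation."""
    return ["java", "-cp", jar, "org.ccil.cowan.tagsoup.CommandLine", *files]


def tagsoup(files, jar=TAGSOUP_JAR):
    """Run TagSoup on the given files and return what it wrote to stdout."""
    result = subprocess.run(
        tagsoup_command(files, jar), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Either way, the wrapper is the easy part; the user still has to obtain the jar and tell the wrapper where it lives.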
Is it worth embedding this into xmlsh? The advantage is that you don't have to find this jar file, put it somewhere, and reference it. The disadvantage is that *I* have to include this jar file in the distribution just for 1 command which, however useful, is 'just another Java call'.

Does this warrant 'first class citizenship'? Where do I stop? I could pass the buck and make it an 'extension module', but in fact that's about as hard on the user as just getting this jar file.

Plus there's documentation. When I embed a command I need to document it. That means copying the docs from tagsoup (or referencing them).

Is this good or bad?

I want xmlsh to include all the necessary, and even useful, tools for common XML processing. On the other hand, I don't want it to be the entire universe of software in one bundle.

How to draw the line? Inquiring minds want to know!


xmlsh 0.1.0.6 released with xproperties

Thanks to a suggestion by an xmlsh user, I implemented xproperties today and released a new version of xmlsh with this new command and some bug fixes.

Also released an update to the MarkLogic extension with support for rename. Now that I'm experimenting more with MarkLogic, expect more improvements in this extension module.