Tuesday, January 20, 2009

Quoting is hard !

After all these years, I never really appreciated how hard quoting is. I don't mean the simple echo "foo bar" quoting, but all the nuances of backslash, single quote, variable expansion, wildcard expansion, etc. I'm getting close in xmlsh but not quite there. I think to go the last 10% I'm going to have to rewrite the entire word expansion module. A complexity is that backslash quotes have to be recognized, and sometimes stripped out, but their effect is long-lasting.

Consider this.

echo \*

Seemingly simple ... but when this is split up into the many types of expansion, wildcard expansion actually occurs last, AFTER backslashes are removed and variable substitution is done.
So this can work

a=*
echo $a

Variable substitution has to occur first ... which means backslash recognition has to occur before that ... so THIS can work

a=*
echo \$a
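The same ordering can be seen in an ordinary POSIX shell like bash (a sketch with made-up filenames, not xmlsh itself):

```shell
# Expansion ordering in bash: quote removal and variable substitution
# happen before pathname (wildcard) expansion.
cd "$(mktemp -d)"     # work in an empty scratch directory
touch a.txt b.txt

a='*'        # the variable holds a literal asterisk
echo $a      # unquoted: $a expands to *, THEN the glob matches -> a.txt b.txt
echo \$a     # backslash quotes the $, so no substitution at all -> $a
echo "$a"    # double quotes allow $ but suppress globbing       -> *
```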

It gets a lot harder than that. The difference between "foo\bar", "foo\\bar", and "foo\\\bar" is strange enough ... but add in '' and $ and then xml expressions and wildcards and it really gets strange.
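For reference, here is what bash does with those three strings inside double quotes, where backslash stays special only before $, `, ", \ and newline:

```shell
# Inside double quotes, backslash before an "ordinary" character is literal,
# but a doubled backslash collapses to one.
printf '%s\n' "foo\bar"     # \b is not special inside ""   -> foo\bar
printf '%s\n' "foo\\bar"    # \\ collapses to one backslash -> foo\bar
printf '%s\n' "foo\\\bar"   # \\ then a literal \b          -> foo\\bar
```

Note that the first two produce identical output, which is exactly the kind of surprise the word expansion module has to reproduce.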

I think I'm going to have to shift over to a "color" model ... that is, during the expansion, keep track of the quote "color" of each and every character individually. By color I mean \ vs " vs ' ... any given character could be colored by one or more of these attributes simultaneously, and it affects further processing ... even after the offending quote chars are removed.


Saturday, January 17, 2009

URI for CWD ?

A few weeks ago I added a feature whereby URIs can be used in place of files everywhere that filenames are used for input. This works for all internal and builtin commands, as well as in IO redirection.

For example this works because IO redirection is done within xmlsh

cat < http://test.xmlsh.org/data/books.xml

This works because xcat is an xmlsh command

xcat http://test.xmlsh.org/data/books.xml

But this does not work (because cat is not an xmlsh program)

cat http://test.xmlsh.org/data/books.xml

I added this both to support easy access to web data and to be able to track the base URI, mainly for xproc support. Base URI support is useful not only for xproc but also for expanded entities, such as in the following case

xcat <> and XML-oriented commands (xquery, xed, xslt) can work correctly with a default namespace. But what about a default base URI ?
So I could do something like


declare base-uri http://test.xmlsh.org/data
xcat books.xml


But now there is a conflict between the base-uri and the current directory.
How does the shell know to pull books.xml from the web and not from the filesystem ? Once you set a base URI you can't get at files anymore.
This got me thinking more ... what is the base-uri except the current directory ? What if they were the same ? If you could "cd" to a web address, for example

cd http://test.xmlsh.org/data
cat books.xml


ftp could work too
cd ftp://test.xmlsh.org/data


This would actually be pretty easy to implement. And maybe useful ?
But the side effects could be weird. Questions arise if I did this :

What would * expand to ? ( echo * )
What would I set the current directory to for external programs ?
How would xls work ? ( I experimented with ftp directories and they may be parsable,
but not most http directories.)


Friday, January 16, 2009

which pipeline is "this" shell ?

A long time ago ... in a dark college basement, I discovered that /bin/sh did a weird thing with pipelines. In the pipeline
a | b | c

it is the LAST segment ("c") which is run in the current shell, while "a" and "b" run in forked processes. I always found this somewhat strange until I realized this syntax actually works

echo foo | read a

Because "read a" is executed in "this" shell,

$ echo $a
foo

Try this in bash or other "modern" shells and it doesn't work.
I wonder why no one noticed ? I just tried in a modern Linux FC8 ksh and voila ! That wonderful legacy behaviour works !

[dave@home ~]$ ksh
$ echo foo | read a
$ echo $a
foo


But unfortunately xmlsh is different from bash OR ksh ... it runs all commands in a sub-thread/shell. Does anyone have any opinions on how useful or important this somewhat arcane behaviour is ? It definitely has some use cases, but I wonder why the bash authors didn't deem it important enough to preserve.
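As it happens, later versions of bash added an opt-in for the old ksh behaviour: with shopt -s lastpipe (bash 4.2+, with job control off, as in scripts), the last pipeline segment runs in the current shell:

```shell
#!/bin/bash
shopt -s lastpipe    # run the last segment of a pipeline in this shell (bash >= 4.2)
echo foo | read a    # "read a" now executes in the current shell, not a subshell
echo "$a"            # prints: foo
```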

In xmlsh it may be even more useful, consider this in a script

xslt ... | xquery ... | xread DOC

This can't be implemented easily otherwise ...
DOC=$<(xslt ... | xquery ... )

since the $<( ) syntax doesn't read from stdin, you'd have to do

xread DOC1
DOC=$( echo $DOC1 | xslt ... | xquery ... )
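For comparison, the slurp-then-reuse pattern in an ordinary POSIX shell looks like this (tr is just a hypothetical stand-in for the xslt | xquery stages):

```shell
# Read all of stdin into a variable, then feed a copy through a pipeline.
# "tr" stands in for the real transformation stages.
DOC=$(cat)                                        # slurp standard input
RESULT=$(printf '%s\n' "$DOC" | tr 'a-z' 'A-Z')   # transform a copy of it
echo "$RESULT"
```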

I guess this is back to the forking question ...

Tuesday, January 13, 2009

Forking Input

I'm about to embark on a significant feature enhancement/change to xmlsh to support xproc. Xproc requires that streams (pipes) be able to "fork". That is, the input to a step (command) may have to be copied and sent to multiple places, including expressions used for argument construction. This could be done by reading the input into an XML variable then passing that around to the various places that need it, but I'd like to keep the generated xmlsh script as "natural" as possible and wherever possible to preserve the ability to stream. In xproc, "natural" means pretty much everything has access to (potentially a copy of) the input stream. But in xmlsh, streams are read similar to the unix shells, where any reader of the input consumes it.

I'm debating whether I should add explicit syntax to cause a stream fork, or possibly fork implicitly. The reason that unix shells consume input is somewhat dependent on the unix OS pipe and file semantics. It's not entirely clear if preserving this notion is important in xmlsh.

For example, suppose I wanted to run both an xquery and an xpath on the standard input.

xquery '//foo'
xpath '//foo'

The first command (currently) actually consumes all the input and the second command fails. Is that best ? Maybe xproc has a good point. Suppose the above commands didn't consume the input. That way both the xquery and the xpath could read a 'copy' of the input stream and produce results. Would this be more useful ?

An alternative is to provide an explicit syntax to request forking. For example

| xquery '//foo'
xpath '//foo'

In this case I invent the "|" syntax with no leading command to mean "fork the input". That way it's explicit that xquery gets a copy of the input and xpath then consumes it.
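The closest analogue in today's Unix shells is tee, which duplicates a stream so that two consumers each see a copy (a sketch with a made-up file):

```shell
cd "$(mktemp -d)"          # scratch directory for the demo
printf '<books/>\n' |
  tee copy.xml |           # fork: save one copy of the stream ...
  grep -c books            # ... while this consumer reads the other -> 1
grep -c books copy.xml     # the second consumer reads the saved copy -> 1
```

The difference is that tee must spill the second copy to a file (or a named pipe); native stream forking in xmlsh could keep both copies in memory and streaming.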

A similar problem arises with command substitution, such as

echo $(xpath '//foo')

right now the xpath gets a null input stream, because otherwise it would consume all the input. But suppose that command substitution also got forked copies of the input ?
Something like this is actually required by xproc ( the with-option tags must be able to read from the standard input ). Again, I could implement this all by reading the entire input into a variable at the beginning and then echo'ing it all over the place, but it's a compelling idea to natively support stream forking. Not only would there be some convenience in script authoring, but some optimizations could be done which would be hard if the forking were explicit.

Command substitution file parameters.

I just finished implementing the file option to command substitution. This is the same syntax that bash/ksh use.

In 0.0.1.4 you can use the expression $(<file)

a=$(<file.txt)

then words are not expanded, so $a is a single string; but when used in an argument list such as

set $(<file.txt)

words ARE expanded. The behaviour can be forced to not expand by using quotes, such as
set "$(<file.txt)"
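A quick bash/ksh illustration of that splitting behaviour (the file contents are made up):

```shell
cd "$(mktemp -d)"
printf 'one two three\n' > file.txt

a=$(<file.txt)        # assignment context: no word splitting
echo "$a"             # one string: "one two three"

set -- $(<file.txt)   # unquoted in an argument list: split into words
echo $#               # 3
set -- "$(<file.txt)" # quoted: splitting suppressed again
echo $#               # 1
```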

In the next release will be the similar, but admittedly clumsy, syntax of $<(<file.xml).



Saturday, January 3, 2009

Welcome !

This blog was created to document, in a less formal way, the experience of creating xmlsh. My goal is that others may learn from my experiences, both successes and failures.

I started xmlsh over a year ago, in Dec 2007. At this point (Jan 2009) it is in an "Alpha 1" state. That is, it is functional and currently used in production, but still subject to core enhancements before I recommend it for general production environments.

Current work in progress is an experiment to create an "XProc" implementation which converts xproc pipelines to xmlsh scripts.