Thursday, November 18, 2010

xmlsh 1.1 released

I've finally gotten xmlsh back into a good, robust state after adding a bunch of new features.
Version 1.1 is here !
I have not yet finished documenting all the new features. I'll be working on that over the next few days, but in the meantime you can look in the test cases for examples.

Major new features include

* Scriptable Streaming XML with StAX functions (at the script level)

* Native functions

* Function call expression syntax, e.g. echo foo( bar )

* Native Java object creation, variables, and method calls

* Significant performance optimizations including reworking many commands from non-streamable to streamable

* JSON / XML conversions (preview feature, still in progress)

Tuesday, October 12, 2010

Function syntax for native java objects

I'm making some good progress on integrating java objects.
I now have function syntax working for object variables.

Here's an interesting example of mixing functions, jset, and the function syntax

# Define a function jnew which creates a String
$ jnew () {
local _x ;
jset -v _x -c java.lang.String "$@"
return $_x;
}

# Create a string object
$ a=jnew("hi there")

# Check its value
$ echo $a
hi there

# Check its type
$ xtype $a

# Call the length method
$ echo a.length()

# Call the concat method
$ echo a.concat(" some more")
hi there some more

Thursday, October 7, 2010

import of jar files / setting classpath

In combination with the support for native Java objects, the import command will now have a new option, "java". This appends to the classpath of the current shell and makes the jar file or directory available to all commands which use the classloader (such as xsql, jcall, jset, and modules).


import java myfile.jar
jset -c mypackage.MyClass -v var

Will look in myfile.jar (in addition to the global classpath) for mypackage.MyClass

Native Java support

A new project has led me to finally realize something I have wanted to do for some time: direct support for native java objects as expression values.

Currently there is the "jcall" command which supports calling the main method of a java program in the same JVM. This is useful to avoid a new java process overhead, but is not that useful to get at native java objects and methods which are not already exposed to the language.

There are also extension modules which allow you to write custom java code and expose it as commands. However, the types of values going into and out of the modules are still limited to the native types of xmlsh (XDM values).

There is use for being able to pass in native java Objects to some commands. For example the xsql command currently requires connecting to the database every call, but if you could create, persist, and pass in a Connection object it could be reused for multiple calls.

Now you can. Not yet released, but checked into the source repository is support for allowing any java Object to be passed as an expression or stored in a variable.

But how do you *get* these java objects created and assigned ?
One way is to pass them in through the calling application, setting them as an XValue to the shell positional parameters or environment variables.

Another way is a new command "jset" which lets you create java objects, set them to variables and then call methods on the objects and assign those to other variables.


jset -v str -c java.lang.String "Hi There"

will create a java String object and assign it to $str

You can do the same for Date, for example

jset -v date -c java.util.Date
echo $date

You can also call methods (static or instance).

jset -v len -o $str -m length

sets $len to the result of String.length()

I'm still working on a better syntax, possibly in combination with the "tie" command so you can tie method invocation to variable expansion.

I'd like to see something like ${str:length} call the length() method of the object in $str
but I need to flesh out issues like passing in arguments.

Be ready for a new release soon with this support ! Suggestions as always are welcome.

Friday, June 25, 2010

Functions are coming

I've implemented an experimental function call expression syntax. This is coming in Version 1.0.7 (ETA by July).


For some documentation on how this will work.

Considering also adding dynamic type coercion so that variables interact with xquery expressions in a more obvious way.


echo <[ $a + 1 ]>

Fails because $a is an xs:string type.
You need to do

echo <[ xs:integer($a) + 1 ]>

to get an integer

I'm considering having expressions passed to xquery or perhaps when evaluated to automatically be coerced to the apparent type, but I haven't worked out all the ramifications of this yet.

Monday, May 24, 2010

Better error diagnostics coming

In the next release of xmlsh (ETA end of May), there is a significant improvement to error diagnostics. This affects normal function errors, as well as the -v and -x options.

All command errors that produce a usage message or an exception, as well as -v and -x output,
now produce file/line diagnostics.

$ echo -foo
[stdin line: 1]
echo: Unknown option: foo

In scripts this includes the filename of the script and line number.

This output is also in -v and -x output (preceding the function)

$ set -v -x
$ echo foo

- [stdin line: 7]
echo foo
+ [stdin line: 7]
echo foo

Saturday, May 15, 2010

Functions and more functions

Thanks to some great discussions with Dave Pawson, I'm considering extending the syntax for function calls and the possible return values for functions, scripts, and commands.

Right now a command, script, or function can only "return" an integer, often known as the "Exit Status". Historically this is because C programs (and most OS programs) "exit" or "return" with a single integer value. On older systems this was actually limited to a byte. 0 means success, non-zero means fail.

So things have been for 40 years ...

But with xmlsh it's an artificial limitation. Why limit functions and scripts to int return values ? I can't do anything about external programs, but functions and scripts *could* return any XDM type. Why not let them ?

After much discussion I've concluded that the only thing in the way is a syntax to capture the return value which doesn't break compatibility with capturing the stdout.

The typical syntax for capturing the "output" of a command/script/function is $(cmd).
This captures the stdout of the command and converts it to a string.
XMLSH has extended this a bit with $<(cmd) for XML types and cmd >{var} as a synonym. But both extensions still only capture the stdout.
The only way to get the return value is either using $? or using the function in a boolean context like

if command ; then

In which case the exit status of the command is interpreted as a boolean (0=success).

But why not let functions (and possibly scripts and internal commands) "return" non-integer values ?

Like this

function concat() {
return "${1}${2}"
}

What are the implications ?

For one, $? could be any type, not just an integer. I don't foresee any big problems with that.
concat foo bar

produces "foobar" in $a

Existing code that assumed functions returned integers would still work as long as those functions or commands were not changed.

What about boolean context ?

if concat foo bar ; then ...

I'd need to define some compatible conversions from any XDM type to boolean.
These couldn't *quite* be the same as the xquery conversion to xs:boolean because I'd want integer 0 to be true and any non-zero integer to be false.
But other than that most conversions could be fairly obvious
() => false
non-empty sequence => true
xs:integer => see above (0=true)
"" => false
non empty string => true
node/element/document => true

should be OK.
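
To make the table above concrete, here is a minimal sketch in Python (not xmlsh code). It assumes an XDM sequence is modelled as a Python list and a single item is passed bare; the name to_shell_boolean is made up for illustration and is not part of xmlsh.

```python
def to_shell_boolean(value):
    """Model of the proposed XDM-to-boolean conversion (hypothetical helper)."""
    # xs:integer follows the exit-status convention: 0 = success = true
    if isinstance(value, int):
        return value == 0
    # "" => false, non-empty string => true
    if isinstance(value, str):
        return value != ""
    # () => false, non-empty sequence => true
    if isinstance(value, (list, tuple)):
        return len(value) > 0
    # anything else stands in for a node/element/document => true
    return True


print(to_shell_boolean(0))         # True
print(to_shell_boolean(2))         # False
print(to_shell_boolean("foobar"))  # True
print(to_shell_boolean([]))        # False
```

With a rule like this, "if concat foo bar ; then" would take the true branch, because the returned string "foobar" is non-empty.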

But why do all this ? One more step is desired. The ability to call functions and make them "look like function calls" ; instead of relying on $?

why not support something like
concat( foo , bar )

This could then nest
concat( concat( foo , bar ) , spam )

This syntax func(...) would evaluate to the return value of the function. The stdout would still go to the parent's stdout so you could pipe through them

func(a) | func2(b)

Then there are some details about globbing and argument lists. If I support "," as an argument separator (instead of the space used for command invocation) then this brings some interesting side effects

foo( *.xml ) => $1 becomes a sequence of xs:string
foo( *.xml , *.c ) => $1 , $2 both sequences

and so on ... this could have some really interesting side effects.
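
A rough model of that idea, written in Python rather than xmlsh: each comma-separated argument is globbed on its own and becomes one sequence. The function name expand_call_args and the fixed list of file names are assumptions for the sake of the sketch, which does not touch the real filesystem.

```python
from fnmatch import fnmatchcase


def expand_call_args(arg_text, filenames):
    """Split on commas, then glob each argument's words into one sequence."""
    args = []
    for raw in arg_text.split(","):
        sequence = []
        for word in raw.split():
            matches = [name for name in filenames if fnmatchcase(name, word)]
            # keep the word itself when nothing matches, as POSIX shells do by default
            sequence.extend(matches if matches else [word])
        args.append(sequence)
    return args


files = ["a.xml", "b.xml", "main.c"]
print(expand_call_args("*.xml", files))        # [['a.xml', 'b.xml']]
print(expand_call_args("*.xml , *.c", files))  # [['a.xml', 'b.xml'], ['main.c']]
```

The first call yields a single argument holding two items; the second yields two arguments, each its own sequence.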
Consider if commands as well as functions could be called this way. Why not ?

cat( foo.xml , bar.txt )

as syntactically equivalent to

cat foo.xml bar.txt

This would allow more "programming like" language style to be used. It also opens the doors for function signatures (or command signatures) for static or dynamic type checking and variable assignment.

For example, suppose you'd write a function now like

function copy() {
cp $1 $2
}

Adding a signature could be closer to xquery or java or C

function copy( from , to ) {
cp $from $to
}

Then callers of copy could produce an error automatically if they provided the wrong number of args

copy a b c # -> runtime error by interpreter
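
As a sketch of what the interpreter would have to do, here is the argument-count check modelled in Python. The helper name bind_arguments and the error wording are invented for illustration; nothing like this exists in xmlsh yet.

```python
def bind_arguments(name, params, args):
    """Bind call arguments to declared parameter names, or fail (hypothetical)."""
    if len(args) != len(params):
        raise TypeError(
            f"{name}: expected {len(params)} argument(s), got {len(args)}"
        )
    # each declared name becomes a variable holding the matching argument
    return dict(zip(params, args))


print(bind_arguments("copy", ["from", "to"], ["a", "b"]))
# {'from': 'a', 'to': 'b'}
```

Calling it with three arguments for the two-parameter copy raises the error before the function body ever runs.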

could add to that types eventually

function copy( from as xs:string , to as xs:string )
{ ... }

then the next obvious step is return types as part of the signature

Anyway something to think about ...
I suspect I'm going to try the first step and see what breaks. Let return take any type , and let $? take on any type.

Let's see if that breaks anything first ...

Comments welcome !

Thursday, May 6, 2010

Announce: xmlsh and marklogic extension update release

I have updated xmlsh (version 1.0.4) and the MarkLogic extension to xmlsh (version 1.1) on sourceforge.

I actually included a changelog.txt for the first time !

Tuesday, April 20, 2010

Updates coming, new commands, better servlet code, better MarkLogic

Just a quick note to say new commands are coming.

* base64 (posix)
* xmd5sum (md5 sums of files)
* xunzip (unzip and list zip archive in xml friendly way)

Improved servlet integration.
Much improved marklogic integration,
* multi-threaded put
* commands for properties and permission control

Sunday, March 21, 2010

xmlsh 1.0.3

Released xmlsh 1.0.3 today.

This includes

  • enhancements to xgetopts
  • enhancements to XML Servlet (headers and parameters)
  • new command httpsession to get/set session parameters
  • Many bug fixes
  • {expr} syntax in command line arguments to preserve sequences

Sunday, March 7, 2010

When are sequences too much ?

I'm struggling with (one of many) artifacts of allowing expressions (variables and positional parameters) to be XdmValues, which allow sequences.
In general this is a good, if not necessary, thing.
It allows sequences to be stored in variables and produced as expressions and be preserved as sequences. For example

A=<[ 1 , 2 ]>
xquery -f file.query -v sequence_var $A

Note that you can pass a single XdmValue ($A) as a parameter to Xquery.

The problem comes if you want to 'flatten' the sequence to separate positional params.


A=$(ls) # produces a single sequence of files

echo $A

This looks right but under the hood echo is getting a single argument (argv[1]) which is a sequence.

Suppose I want to delete all these files

posix:rm $A

Oops ... rm now gets 1 argument. It has to know that if the argument is a sequence it must iterate over each item, like this
for i in $A ; do posix:rm $i ; done

For many commands getting sequences where they expect items is problematic.
But on the other hand being able to preserve sequences is critical.

So What to do ?
I'm working on (and open to suggestion) a syntax which forcibly flattens sequences into positional parameters.

Something like maybe

although I'd like it to work with inline expressions as well
posix:rm $(ls) # How to get this to flatten ?
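
To pin down what "flatten" would mean, here is a small model in Python, not xmlsh. It assumes an argument list is a Python list in which a sequence-valued argument is a nested list; flatten_args is a made-up name.

```python
def flatten_args(args):
    """Expand sequence-valued arguments into separate positional parameters."""
    flat = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            # one level is enough: XDM sequences never nest
            flat.extend(arg)
        else:
            flat.append(arg)
    return flat


A = ["a.txt", "b.txt", "c.txt"]   # stands in for A=$(ls)
as_passed_today = [A]             # the command sees one argument
print(len(as_passed_today))       # 1
print(flatten_args(as_passed_today))        # ['a.txt', 'b.txt', 'c.txt']
print(len(flatten_args(as_passed_today)))   # 3
```

The open question is only the syntax that asks for this; the operation itself is simple.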

Comments welcome on this idea

Tuesday, February 23, 2010

Released xmlsh 1.0.2 and marklogic 1.0

I have released an update to xmlsh (1.0.2) and to marklogic extension (1.0).
These include support for the new help command and consolidated usage.
I'm still looking for volunteers to help add to the online help content. Right now only the synopsis is available for most commands, plus a link to the web site for details. I would like to add all the options to the built in help data ...

Many major bug fixes and improvements.

Wednesday, February 17, 2010

New help command

I'm just about to release 1.0.2, which includes a rudimentary "help" command.
I expect to expand on this greatly, but for now it prints out the URL associated with the command, and on browser enabled systems launches the default browser (using the Java Desktop class) pointing to the associated help page.

This works for all builtin, internal, and supplementary commands as well as extension modules.

I went back and forth literally dozens of times in my mind about how best to do this. I almost implemented it using Java annotations which would have been really cool ...
but got stuck on how to cleanly annotate script based commands. Some commands are actually .xsh scripts not .java classes so these would need an ancillary annotation method.
I finally tossed that whole idea (and implementation) and settled on a help.xml file per package. Right now this just contains the command name and URL, but I expect to enhance it to provide usage text (text based only) and maybe short help text for each command.

I don't expect to ever provide fully formatted help text (in pure text form) because the state-of-the-art for formatted documents has just evolved past plain text. Gone are the days of nroff, and -man. They still exist ... but I don't want to move back to that technology, especially since it's impossible to write a good terminal app using Java (it lacks the basics for console IO such as unbuffered character reading, clearing screens, and cursor positioning).

I want the best of both worlds ! A docbook sourced help system that can generate HTML, PDF and TEXT, yes, plain TEXT ! But alas ... it's even against my stated philosophy to go back to text. Can't win for losing :)

Also coming is a very simplistic more command. It's the best that can be done without unbuffered console IO (you have to hit ENTER to read characters in java, a 10 year old bug that Sun refuses to believe is important).

Sunday, February 14, 2010


Thanks to a suggestion from a new user, I realized I had not documented how the XPATH environment variable is implemented. I've corrected that now by documenting it, but it exposed some problems. Namely, XPATH cannot be set to more than 1 directory in the environment prior to calling xmlsh. XPATH is an XDM sequence variable (like an "array"), whereas PATH is a : or ; separated string. I think this makes much more sense but it leads to compatibility issues.

The worst is that you can't preset XPATH before calling xmlsh ! I never thought this through completely but thanks to user feedback I am now.
Also PATH and XPATH are treated differently, which is, well, inconsistent to say the least.

What I've decided to do is this.

On startup, xmlsh will read both PATH and XPATH and parse them into an XDM sequence according to the OS path separator (";" or ":") and then from then on they will remain sequence variables in xmlsh. You can operate on them like any other sequence. Directory separators will be converted to "/" (as is already done).

On calling subprocesses, both variables will be re-serialized as a single string using the same (but reverse) algorithm. The result is subprocesses will see the same single-string, path separated and native OS directory separator strings as were passed into xmlsh.

This way PATH and XPATH can be treated the same, and both can be initialized prior to invoking xmlsh using normal OS environment settings.
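
The round trip can be sketched as follows. This is a Python model under my own naming (parse_path and serialize_path are hypothetical, and the separators are passed in explicitly instead of being read from the OS), not the xmlsh implementation.

```python
def parse_path(value, path_sep, dir_sep):
    """Split an OS path string into a sequence, normalising directories to '/'."""
    if not value:
        return []
    return [entry.replace(dir_sep, "/") for entry in value.split(path_sep)]


def serialize_path(entries, path_sep, dir_sep):
    """Reverse of parse_path: join the sequence back into one native string."""
    return path_sep.join(entry.replace("/", dir_sep) for entry in entries)


windows_path = "C:\\bin;C:\\tools\\xmlsh"
sequence = parse_path(windows_path, ";", "\\")
print(sequence)  # ['C:/bin', 'C:/tools/xmlsh']
print(serialize_path(sequence, ";", "\\") == windows_path)  # True

# inside the shell the value is an ordinary sequence, so appending is a list operation
sequence.append("/mydir")
print(serialize_path(sequence, ":", "/"))  # C:/bin:C:/tools/xmlsh:/mydir
```

The point of the design is that a subprocess gets back exactly the kind of string it would have seen without xmlsh in the middle.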

The one problem is that this may break existing code in xmlsh which attempts to change the PATH variable using

Instead you'll need to do sequence operations like

PATH=($PATH /mydir)

PATH=<[ $PATH , "mydir" ]>

Since PATH will become a sequence variable, the old string assignment syntax won't produce the desired result.
It's possible I could try to hack this by parsing all string assignments to PATH, but I'm not excited about introducing that hack.

Suggestions anyone ? Will this break any of your existing code ?
I'm relying on the presumption that PATH will have been set prior to invoking xmlsh in most (all?) existing scripts.

Friday, February 5, 2010

xmlsh Phone home !

With 1.0 I'm focusing more on refinements, performance and usability than on feature enhancements.
It is critical to know how much xmlsh is being used 'in the wild' and what features are being used.
Unfortunately I have no idea. Sourceforge posts the # of downloads but I don't even know if those are new users or the same userbase downloading each new version.

Calabash (Norm Walsh's xproc implementation) has a "Phone Home" feature, enabled by default.
This keeps track of what steps were run and on exit 'phones home' (posts an XML file to a server) with the statistics. This is critical information to be able to analyze usage and focus optimizations.

I have not implemented such a thing in xmlsh. I am a bit shy of doing so because I don't want to offend people, and even though the data is anonymous it gives the impression of being an invasive thing to do. Plus there is a performance impact from both measuring the data (slight) and sending the results (more).

However, Norm has told me no one has yet complained about his "Phone Home" feature. And from it he has collected valuable stats which he can use to improve the product for everyone.

Any opinions on this ? Certainly it should be an optional feature. But if I made it "opt in" instead of "opt out" I suspect no one would bother to turn it on.

Maybe this is something that on first use only xmlsh could prompt for the option, as well as indicate how to turn it off in the future.

Any ideas anyone ? How to gather statistics for the good of all users, without being invasive.

Obsolete $_ syntax in <[ expr ]>

Is it too late to revoke a syntax feature ?
I've determined in hindsight that the $_ variable exposed in the <[ xquery expr ]> is not right.
This concatenates all positional parameters into a single sequence. If any of the positional parameters are a sequence > 1 then the result loses information. For example

set <[ 1,2,3 ]> 4 5
echo $#
echo <[ $_[2] ]>



Very non obvious. The fact that $1 is a sequence has been lost and the $_ is a concatenation of 1,2,3,4,5

To fix this I now predeclare distinct variables $_1 $_2 ... for all positional parameters.
This preserves the sequences in the positional parameters.

$ echo <[ $_1 ]>
1 2 3
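
The difference between the two schemes is easy to model. The sketch below is Python, not xmlsh, and assumes each positional parameter is a Python list; the variable names are mine.

```python
# Model of: set <[ 1,2,3 ]> 4 5
positional = [[1, 2, 3], [4], [5]]

# $# counts parameters, not items
count = len(positional)
print(count)  # 3

# $_ concatenates everything into one flat sequence, so the boundaries are gone
underscore = [item for param in positional for item in param]
print(underscore)     # [1, 2, 3, 4, 5]
print(underscore[1])  # 2  (what $_[2] selects, since XQuery indexes from 1)

# $_1, $_2, ... keep one variable per parameter, so $_1 is still the whole sequence
numbered = {f"_{i}": param for i, param in enumerate(positional, start=1)}
print(numbered["_1"])  # [1, 2, 3]
print(numbered["_2"])  # [4]
```

So the second item of $_ is 2, not the second positional parameter, which is exactly the surprise described above.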

This means you can access all positional parameters within the <[ ]> expression with no loss of fidelity. I'd like to get rid of $_ as it is now redundant. However someone might be using it ... so I'm keeping it in for now.

A side effect of this, however, is that there is no way to access $# within the <[ ]> or to iterate over all positional parameters in one query. This is a fundamental limit of xquery and XDM which do not allow nested sequences. It also means that if you actually assign a global variable _1 then it is overwritten by the positional parameter $1 within the <[ ]> expression.

So is it too late now that I've published "1.0" of xmlsh ? Even if there are efficiency and usability issues ? Would anyone be affected ?

I have no idea because I have no metrics of how much xmlsh is being used ....
which gets to my next post ...