XMLSH

Update to xmlsh 1.2.3

2013-03-25T05:04:00.002-07:00

Update to xmlsh

I recently pushed an update to xmlsh (v 1.2.3) and all dependent extension modules.

Full release notes for the core are here:

http://www.xmlsh.org/ReleaseNotes123

Just some random notes on improvements

New RANDOM32 and RANDOM64 variables
Improved http command by switching to Apache httpclient
MUCH better JSON support (to be documented)
A log window for xmlshui makes using it as a debugger much easier
Improvements to xurl and urlencode so that you can create safe URLs much more easily by passing in all the components of a URL and query as separate strings. Critical for creating complex REST calls.
various bug fixes
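
To illustrate why passing URL components separately matters, here is a minimal Python sketch (not xmlsh code, and not how xurl or urlencode are implemented). Each path segment and each query value is escaped on its own, so characters such as "/", "&", or "=" inside a value cannot change the structure of the URL. The host name, path, and parameters are made up for the example.

```python
from urllib.parse import quote, urlencode, urlunsplit


def build_url(scheme, host, path_segments, query):
    """Assemble a URL from separate components, escaping each one."""
    # Escape every path segment on its own so a "/" or space inside a
    # segment cannot change the structure of the path.
    path = "/" + "/".join(quote(seg, safe="") for seg in path_segments)
    # urlencode escapes each key and value of the query separately;
    # quote_via=quote encodes spaces as %20 rather than "+".
    query_string = urlencode(query, quote_via=quote)
    return urlunsplit((scheme, host, path, query_string, ""))


# Hypothetical REST call: the host and parameters are placeholders.
url = build_url(
    "https",
    "api.example.com",
    ["v1", "docs", "a b/c.xml"],
    {"q": "name=x&y", "format": "xml"},
)
print(url)
```

Building the same URL by string concatenation would require the caller to remember which characters are special in which part of the URL.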

In addition to core, some enhancements to aws, marklogic, and twitter extensions.

JSON Support

2013-01-22T05:20:00.000-08:00

Today I added the beginnings of native JSON support in xmlsh. This is in SVN only, no new releases yet. I am soliciting opinions on features for JSON.

Why JSON in xmlsh ? Well I have to learn to stop hating and "Love the bomb".
JSON is prevalent in the web world. I want to add better support for web services. While the existing json2xml command works fine it is a bit clumsy. So I decided to support native JSON parsing with "jsonread" and JSONPath with the "jsonpath" command. Also variables can be pure parsed JSON objects.

From here what ? This is a good start but I would like the equivalent of <[ xquery ]> but working on JSON. Maybe that's overkill ? How about simple array and member access natively, like $X.Y[2] ?
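
As a rough picture of what member and array access like $X.Y[2] would do, here is a small Python sketch. It is only an illustration of the idea: the path grammar (dotted names plus [n] indexes) and the function name json_get are assumptions, not xmlsh syntax or behaviour.

```python
import json
import re

# One step of a path: either a member name or an [index].
_STEP = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def json_get(value, path):
    """Walk a parsed JSON value using a path such as 'Y[2]' or 'a.b[0].c'."""
    pos = 0
    while pos < len(path):
        if path[pos] == ".":
            pos += 1
            continue
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"bad path at offset {pos}: {path!r}")
        name, index = match.groups()
        # A name selects an object member, an index selects an array item.
        value = value[name] if name is not None else value[int(index)]
        pos = match.end()
    return value


X = json.loads('{"Y": [10, 20, {"Z": "deep"}], "name": "xmlsh"}')
print(json_get(X, "Y[2].Z"))
print(json_get(X, "Y[1]"))
```

Anything beyond this (filters, wildcards, recursive descent) is where a full path language such as JSONPath starts to earn its keep.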

Not sure how far I want to go with this.

Suggestions welcome !

-David

Change is coming to the Markup World

2012-08-13T10:20:00.001-07:00

A small post about my experiences after attending Balisage 2012.

http://blog.calldei.com/2012/08/big-changes-in-th.html

Balisage 2012

2012-07-26T09:00:00.001-07:00

Balisage 2012 is quickly approaching !

The annual congregation of markup geeks is quickly approaching.

http://www.balisage.net/

I will be attending as well as presenting Wed August 8 at 2:00 pm (http://www.balisage.net/2012/Program.html). The schedule looks great.

In addition there will be a MarkLogic DemoJam on Tuesday Aug 7, free booze and food and a chance to win great prizes.

This is the Must Go To conference of the year. Hands Down.

What are you waiting for ! Register now.

Installer for xmlsh

2012-06-14T11:41:00.000-07:00

Installers are pretty cool things. They just do what you want and don't bother you with knowing how. However they are also "black boxes" and tend to be dependent on things like having a GUI or a particular OS and you never know what they actually do under the hood.

For a product like xmlsh I never thought much about an installer. It's so easy: just unzip the distribution, set a few env variables, maybe chmod a file, oh and don't forget the pixie dust. By not having an installer I can ignore all that and give you the warm fuzzy feeling that I'm not doing any magic dirty tricks behind your back. It's just files, right ?

But then it's not always obvious. An installer is, well, convenient! So I've been looking into Java installers and found a few. But before spending any time on it I was wondering if my wonderful captive audience could make any suggestions. Would an installer actually help anyone ? Were you daunted by the obscure installation instructions ? Do you always have a GUI available (X Windows or Mac or MS Windows) ? Does an installer feel more friendly or more opaque ?

Inquiring minds want to know !

Release 1.2.0

2012-06-14T11:40:00.003-07:00

At long last many things came together and I have released xmlsh 1.2.0 along with updates for all the extension modules. This is a really big release and I haven't had time to document it all ...
But just to get you excited

xmlshui - a simple GUI for xmlsh
Named Pipes - xmkpipe creates named pipes for either text or XDM Streaming
XDM Streaming - using named pipes or implicit pipes (| with the set -xpipe)
Streaming enhancements - Some of the core commands can now stream unserialized XDM through named or implicit pipes - xsplit, xsql, marklogic:put ... more on the way.
Lots of bug fixes and test case improvements.
Marklogic extension improvements - XDM Streaming, dynamic URI construction using {seq} or {random}

This is a solid new release with extremely powerful new features and improvements on the core features. I will be spending the next few weeks updating the docs to help explain how to make the best use out of the new features.

GUI GUI who wants a GUI ?

2012-05-31T17:07:00.001-07:00

At long last (years) I knuckled down and made a simple xmlsh GUI. This will be in the next release.
Why ? I have been opposing this for a long time for many reasons, not the least of which is I don't really like writing GUIs. But I got tired of the limited editing capabilities of DOS command shells and enjoy the very simple BSH UI GUI ...
I think what stopped me for so long is the slippery slope. Once you start with a GUI where do you stop ? Xmlsh is a command line shell and an embedded API, not a WYSIWYG tool.
But alas ... a simple GUI is useful sometimes. I played around with various toolkits and settled on using plain AWT and Swing. I found that Eclipse WindowBuilder supports simple AWT apps. Quite a nice tool. I tried SWT but while it is much more full-featured, it is very much tied to Eclipse and required a dozen more jar files to run even a basic window. With AWT I was able to do a functional GUI in 200 lines of code. This will likely expand to 20000 ... as feature creep sets in, but it's a start.

Here is a screenshot of a sample session "xmlshui"

Streaming Streaming Streaming ...

2012-05-16T05:24:00.000-07:00

Now that I work for MarkLogic I am dealing with more and more "Big" "Data" ... and as usual xmlsh + marklogic is a huge win. But as I start ramping up my use of large datasets, especially large numbers of small documents (millions, hundreds of millions ...), the old tricks don't work quite so well.

For example recently I needed to upload 3 million XML files to a ML server from a relational DB.
My first pass was my favorite tool for this ... xsql + xsplit + ml:put
Since I like to debug stuff as I build it ... the simple way is to do this.

xsql ... > bigfile.xml
xsplit -o xml bigfile.xml
cd xml
ml:put -baseuri /xxx/ -m 100 -maxthreads 4 *.xml

On my big beefy server box this worked, although a bit slowly. So OK, I now wanted to transfer this data to an EC2 instance. It's "only" 10G of data so I did this

tar -cvzf xml.tar.gz xml

then transferred the now compressed file to the EC2 machine.
Then on the EC2 machine I tried to replicate the above steps.

tar -xzf xml.tar.gz

I waited ... waited ... waited ... 3 DAYS and it wasn't done yet. Admittedly this was a medium instance of EC2 but it should have handled this. The problem seemed to be that the system was stuck at 90% system time.
My guess is the age old problem of lots of files in a directory. Especially over EBS ... it just doesn't perform well. Adding files to a directory gets disproportionately slower as the directory grows ... particularly nasty when the files are small, so the overhead of simply creating a file entry is much bigger than the file IO itself.

So what to do ... I did 2 things ... I restarted the EC2 instance as an m1.xlarge ... ($$$$ ka chink)
Then instead of pre extracting the xml to a directory in whole I used a new feature I recently added to ml:put ...

tar -xvzf xml.tar.gz | ml:put -baseuri /xxx/ -m 100 -maxthreads 4 -f - -delete

What this does is let tar still extract the files, but it also lists them to stdout as it goes.
From there ml:put reads the list of files as they are being extracted, batches them up, sends them to MarkLogic and then deletes them. The end result is that there are only about 500 or so files in the xml directory at any one time. This completed in about half an hour ... about 2000 docs/sec ... much better.
Of course this speedup was due to the larger instance as well as the technique ...
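
The batch-then-delete pattern itself is easy to sketch. The following Python is only an illustration of the idea, not ml:put's implementation: the send callback stands in for the upload, and the batch size and file names are made up for the demo. The list of names is consumed lazily, so it could just as well be lines arriving on stdin from tar.

```python
import os
import tempfile
from itertools import islice


def batches(names, size):
    """Yield lists of up to `size` names from any iterable, lazily."""
    iterator = iter(names)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def upload_and_delete(names, size, send):
    """Send files in batches, deleting each batch once it has been sent."""
    sent = 0
    for batch in batches(names, size):
        send(batch)            # hypothetical upload callback
        for name in batch:
            os.remove(name)    # free the directory entry right away
        sent += len(batch)
    return sent


# Demo: create 10 small files, then "upload" them 4 at a time.
workdir = tempfile.mkdtemp()
paths = []
for i in range(10):
    path = os.path.join(workdir, f"doc{i}.xml")
    with open(path, "w") as handle:
        handle.write(f"<doc id='{i}'/>")
    paths.append(path)

batch_sizes = []
total = upload_and_delete(paths, 4, lambda batch: batch_sizes.append(len(batch)))
print(total, batch_sizes, os.listdir(workdir))
```

Because files are removed as soon as their batch is sent, the directory never holds much more than what the extractor has produced since the last batch.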

But this gets me thinking ... Why do I need the overhead of writing to a temp directory for this ? It's still adding a significant unnecessary overhead. I should be able to send a bunch of XML files to ml:put in a stream and use no temporary files. In fact I should be able to do a full pipeline with no overhead like

xquery 'for $i in 1 to 10000000 return document { ... } ' | ml:put ...

or perhaps

xsql 'select * from table' | xsplit -stream | ml:put ....

The core problem here is the lack of a streaming interface for XDM. In order to send a bunch of XML files (or XDM values) through a stream (or to a file and back) they need to be packaged in something. Typically wrapped in a root element or maybe zipped or tar'd.

Zip is really lousy for this because its TOC is at the end so you can't stream unpack a zip file. Tar is good because each file entry is contiguous and you can stream unpack them. But what about cases where I just want to dynamically create (or transform) XML and spit it out like the first example
xquery 'for $i in 1 to 10000000 return document { ... } ' | ml:put ...
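
The point about tar being stream-friendly can be shown with Python's standard tarfile module, used here purely as an illustration. Opening an archive with mode "r|" reads it strictly front to back, which is what a pipe allows. The in-memory archive and document names below are made up for the demo.

```python
import io
import tarfile


def make_tar(documents):
    """Build an in-memory tar archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in documents.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def stream_documents(fileobj):
    """Yield (name, bytes) pairs from a tar stream without seeking."""
    # Mode "r|" tells tarfile to treat the input as a forward-only stream,
    # which is all a pipe or socket can offer.
    with tarfile.open(fileobj=fileobj, mode="r|") as archive:
        for member in archive:
            if member.isfile():
                # Each member must be read before moving on to the next.
                yield member.name, archive.extractfile(member).read()


data = make_tar({"a.xml": b"<a/>", "b.xml": b"<b/>"})
docs = list(stream_documents(io.BytesIO(data)))
print(docs)
```

A zip reader cannot work this way in general, because it needs the central directory at the end of the file before it can locate the entries.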

If I wrap this in a single document it becomes hard to stream. ml:put *could* have xsplit builtin ... but to keep to the tools approach I'd rather split the functionality. So say I put xsplit into the pipeline like the second example. How is xsplit to produce *multiple* documents on its output stream in a way that is readable ? We're back to a serialization format for XDM (http://xml.calldei.com/XDMSerialize)

This is a fundamental problem in traditional XML toolchains. There is simply no standard and efficient way to stream sequences. So what to do ?

I'm considering a 3 phase approach.
1) Implement an enhancement to xmlsh commands and pipes such that they can request, produce, and consume sequences through ports. So for example "xsplit -stream" could output the split documents all to stdout. But what would this look like ? How to implement it ?

2) For pipes implement an optional XDM stream pipe. This would allow streaming of XDM values (including sequences of documents), without serialization directly through the pipe. This does mean that the pipe might get large if the documents are large ... I may have to limit the pipe to a small number of values.

3) Implement some kind of text serialization for sequences. Essentially back to http://xml.calldei.com/XDMSerialize ... although I am not sure I like my proposal so much in the face of this use case. The original proposal does not consider streaming as the major use case. However the use cases it was designed for should overlap with streaming. I'm not even sure I need to support most of XDM ... falling back to what XProc does (streams of documents) may be sufficient although I abhor the restriction on purely theoretical grounds. But the fact is any text serialization of XDM will be lossy. It is just a matter of drawing the line somewhere, and maybe the most valuable use case is drawing the line at documents.
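
The bounded pipe from step 2 can be pictured with a small Python sketch (Python rather than xmlsh, purely as an illustration). A fixed-size in-memory queue passes already-parsed values between a producer and a consumer, so nothing is serialized and the writer blocks once the limit is reached. The limit of 4 and the dictionary standing in for a document are assumptions made for the demo, not xmlsh internals.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream


def producer(pipe, count):
    """Put `count` parsed values into the pipe, blocking when it is full."""
    for i in range(count):
        pipe.put({"doc": i})  # stands in for an in-memory document
    pipe.put(_END)


def consume(pipe):
    """Read values from the pipe until the end marker arrives."""
    received = []
    while True:
        item = pipe.get()
        if item is _END:
            return received
        received.append(item["doc"])


# maxsize bounds how many values can sit in the pipe at once.
pipe = queue.Queue(maxsize=4)
writer = threading.Thread(target=producer, args=(pipe, 100))
writer.start()
result = consume(pipe)
writer.join()
print(len(result), result[:3])
```

Limiting the pipe by number of values is the simplest rule. Limiting by total size would need some estimate of how large each in-memory value is.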

Well back to the drawing board. I'd like to implement this but still so many open issues !!!
Comments welcome.

Spring Update

2012-04-09T11:46:00.001-07:00

I have updated xmlsh to version 1.1.9. The main enhancement is an upgrade to Saxon 9.4.

Also updated the MarkLogic extension.

XMLSH v 1.1.8

2011-12-15T14:06:00.000-08:00

Winter Update: I've updated xmlsh and all the extension modules (both documented and non documented) including

MarkLogic
eXist
JSON/JXON
AWS
JMX
Calabash/XProc

Primarily a minor tweak and bug fix update, but it also includes new commands and features and updates to the latest run-times of all extension modules (MarkLogic 5.0, latest Calabash, latest AWS etc).

Definitely a suggested upgrade for all.

One caveat. Due to a suggestion from a reader, as well as my long-term wish, I've changed "xls" to use a different tag for files and directories. More consistent, didn't break any unit tests but might break your scripts. Sorry ...

GUI for xmlsh

2011-09-28T06:27:00.000-07:00

I've long considered implementing a GUI for xmlsh.
I didn't do so from the start because I didn't want xmlsh to *be* the GUI (or require one). But now that the core is stable and mature, there are times when a GUI would be very useful.

Suggestions (and help!) welcome for this upcoming project.

Ideas I have

* optional. Core xmlsh not affected adversely
* portable. Probably (java) means using AWT or Swing ?
* at least 1 mode that simulates a typical terminal with line editing

Onto things which might be really neat
* GUI view of variables
* debugger
* syntax sensitive source browser/editor
* multiple windows (one per thread ?)
* Eclipse plugin ?

A lot of things (and wasted time) could be put into this. I'd love feedback on whether any of this seems useful to you.

Released developer edition of eXist extension

2011-06-03T08:16:00.002-07:00

I have released the 0.1 "developer" edition of the eXist extension module for xmlsh. This is Pre-Alpha quality and should not be used in production.

http://www.xmlsh.org/ModuleExist

Comments welcome ! There's a lot left to go with this, but it supports the core REST operations exposed as DB operations put/get/invoke/query/del and one example list. From these I should be able to build a comprehensive set of tools for eXist.

Released update to MarkLogic extension

2011-06-02T08:56:00.000-07:00

Thanks to some help from the field I found and fixed a bug in the MarkLogic invoke command. When using the "-v" option to pass external variables to stored xquery's the arguments were being misread.

This has been updated as version 1.12 of the MarkLogic extension module ( 2011-05-02)

Release 1.1.5

2011-05-05T18:00:00.000-07:00

I've been really lazy and haven't posted on this blog for a while. Even skipped the last release.

Today I released xmlsh 1.1.5 as well as updates to the MarkLogic and Calabash extension modules. The main feature is updating to Saxon 9.3. Also includes some bug fixes, fixed demo app, improved test cases.

So what's coming up next ? I have a LOT of things in the pipeline. A major extension module I hope to release this year is the JSON extension module. This is an implementation of the JXON processor and associated tools. You may notice the json2xml and xml2json commands have been updated to use the JXML schema. This is just a tiny part of the JXON processor. I hope to release this in time for the upcoming Balisage 2011 conference. But if you're curious now, an early implementation is checked into sourceforge.

Also in the pipes are extension modules for the eXist XML DB and also for Amazon Web Services (AWS)

No ETA on these yet but I've started work. Volunteers are welcome !

I'm also experimenting with the PE and EE editions of Saxon. Some really great stuff in there, especially XQuery 3.0, XPath 3.0 and XSLT 3.0. Unfortunately these are all paid/licensed features so I am reluctant to require their use. However I have tested xmlsh with Saxon 9.3 in both PE and EE versions to make sure you can use these features. I'd really love to have them as part of the core technology but I'm not yet willing to pull the rug on a full free open source implementation.

XMLSH 1.1.3 released, 1.1.4 coming soon

2011-01-17T04:28:00.000-08:00

Thanks to some great feedback from users as well as my JSON/XML project I've been making significant improvements, mostly fixes, to xmlsh.
1.1.1, 1.1.2, and 1.1.3 were released in December and January, and 1.1.4 is coming soon.

xmlsh 1.1 released

2010-11-18T08:35:00.000-08:00

I've finally gotten xmlsh back into a good, robust state after adding a bunch of new features.

Version 1.1 is here !

I have not yet completed documenting all the new features, working on that over the next few days, but you can look in the test cases for examples.

Major new features include

* Scriptable Streaming XML with StAX functions (at the script level)

* Native functions

* Function call expression syntax. eg. echo foo( bar )

* Native Java object creation, variables, and method calls

* Significant performance optimizations including reworking many commands from non-streamable to streamable

* JSON / XML conversions (preview feature, still in progress)

Function syntax for native java objects

2010-10-12T10:31:00.000-07:00

I'm making some good progress on integrating java objects.

I now have function syntax working for object variables.

Here's an interesting example of mixing functions, jset, and the function syntax

# Define a function jnew which creates a String

$ jnew () {
    local _x ;
    jset -v _x -c java.lang.String "$@"
    return $_x;
}

# Create a string object

$ a=jnew("hi there")

# Check its value

$ echo $a

hi there

# Check its type

$ xtype $a

java.lang.String

# Call the length method

$ echo a.length()

# Call the concat method

$ echo a.concat(" some more")

hi there some more

import of jar files / setting classpath

2010-10-07T11:04:00.000-07:00

In combination with the support for native Java objects, the import command will now have a new option, "java". This appends to the classpath of the current shell and makes the jar file or directory available to all commands which use the classloader (such as xsql, jcall, jset, modules)

Example:

import java myfile.jar

jset -c mypackage.MyClass -v var

Will look in myfile.jar (in addition to the global classpath) for mypackage.MyClass

Native Java support

2010-10-07T10:55:00.000-07:00

A new project has led me to finally realize something I have wanted to do for some time. That is direct support for native java objects as expression values.

Currently there is the "jcall" command which supports calling the main method of a java program in the same JVM. This is useful to avoid a new java process overhead, but is not that useful to get at native java objects and methods which are not already exposed to the language.

There are also extension modules which allow you to write custom java code and expose it as commands. However the types of values going into and out of the modules are still limited to the native types of xmlsh (XDM values).

There is use for being able to pass in native java Objects to some commands. For example the xsql command currently requires connecting to the database every call, but if you could create, persist, and pass in a Connection object it could be reused for multiple calls.

Now you can. Not yet released, but checked into the source repository is support for allowing any java Object to be passed as an expression or stored in a variable.

But how do you *get* these java objects created and assigned ?

One way is to pass them in through the calling application, setting them as an XValue to the shell positional parameters or environment variables.

Another way is a new command "jset" which lets you create java objects, set them to variables and then call methods on the objects and assign those to other variables.

Example:

jset -v str -c java.lang.String "Hi There"

will create a java String object and assign it to $str

You can do the same for Date, for example

jset -v date -c java.util.Date

echo $date

You can also call methods (static or instance).

jset -v len -o $str -m length

sets $len to the result of String.length()
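
How jset works internally is not described here. As a loose analogy only, this Python sketch shows the general technique of creating an object from a class name given as a string and calling a method chosen by name at run time. The helper names create and call_method are invented for the illustration and are not part of xmlsh.

```python
import importlib


def create(class_path, *args):
    """Create an object from a dotted 'module.Class' name."""
    module_name, _, class_name = class_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(*args)


def call_method(obj, method_name, *args):
    """Look a method up by name at run time and call it."""
    return getattr(obj, method_name)(*args)


# Shell-style variable table holding arbitrary objects.
variables = {}
variables["str"] = create("builtins.str", "Hi There")
variables["date"] = create("datetime.date", 2010, 10, 7)
variables["upper"] = call_method(variables["str"], "upper")
variables["iso"] = call_method(variables["date"], "isoformat")
print(variables["upper"], variables["iso"])
```

The shell only needs to keep the object reference in its variable table. The class and method names stay plain strings until the moment they are resolved.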

I'm still working on a better syntax, possibly in combination with the "tie" command so you can tie method invocation to variable expansion.

I'd like to see something like ${str:length} call the length() method of the object in $str

but I need to flesh out issues like passing in arguments.

Be ready for a new release soon with this support ! Suggestions as always are welcome.

Functions are coming

2010-06-25T08:56:00.000-07:00

I've implemented an experimental function call expression syntax. This is coming in Version 1.0.7 (ETA by July).

See http://www.xmlsh.org/SyntaxFunction for some documentation on how this will work.

Considering also adding dynamic type coercion so that variables interact with xquery expressions in a more obvious way.

E.g.

a=1
echo <[ $a + 1 ]>

Fails because $a is an xs:string type.
You need to do

a=<[1]>
echo <[ $a + 1 ]>

to get an integer

I'm considering having values automatically coerced to their apparent type, either when they are passed to xquery or perhaps when they are evaluated, but I haven't worked out all the ramifications of this yet.
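
One way to picture "coerce to the apparent type" is the following Python sketch. The rules are an assumption for illustration only (integer-looking text becomes an integer, decimal-looking text becomes a decimal, everything else stays a string) and say nothing about what xmlsh will actually do.

```python
import re
from decimal import Decimal

# Deliberately strict patterns: optional sign, ASCII digits only.
_INTEGER = re.compile(r"[+-]?\d+", re.ASCII)
_DECIMAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+)", re.ASCII)


def coerce(text):
    """Return text as int or Decimal if it looks like one, else unchanged."""
    if _INTEGER.fullmatch(text):
        return int(text)
    if _DECIMAL.fullmatch(text):
        return Decimal(text)
    return text


a = "1"
print(coerce(a) + 1)
print(repr(coerce("2.50")), repr(coerce("foo.xml")))
```

The ramifications show up quickly: a value such as "007" or "1.0" changes its text when it round-trips through a number, which is one reason to think twice before coercing everywhere.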

Better error diagnostics coming

2010-05-24T18:39:00.001-07:00

In the next release of xmlsh (ETA end of May), there is a significant improvement to error diagnostics. This affects normal function errors, as well as the -v and -x options.

All command errors that cause a usage message or exception, as well as -v and -x output, now produce file/line diagnostics.
Example:

$ echo -foo
[stdin line: 1]
echo: Unknown option: foo

In scripts this includes the filename of the script and line number.

This output is also in -v and -x output (preceding the function)
Example:

$ set -v -x
$ echo foo

- [stdin line: 7]
echo foo
+ [stdin line: 7]
echo foo
foo

Functions and more functions

2010-05-15T12:07:00.000-07:00

Thanks to some great discussions with Dave Pawson, I'm considering extending the syntax for function calls and the possible return values for functions, scripts, and commands.

Right now a command, script, or function can only "return" an integer, often known as the "Exit Status". Historically this is because C programs (and most OS programs) "exit" or "return" with a single integer value. On older systems this was actually limited to a byte. 0 means success, non-zero means fail.

So things have been for 40 years ...

But with xmlsh it's an artificial limitation. Why make functions and scripts limited to int return values ? I can't do anything about external programs, but functions and scripts *could* return any XDM type. Why not let them ?

After much discussion I've concluded that the only thing in the way is a syntax to capture the return value which doesn't break compatibility with capturing the stdout.

The typical syntax for capturing the "output" of a command/script/function is $(cmd).
This captures the stdout of the command and converts it to a string.
XMLSH has extended this a bit with $<(cmd) for XML types and cmd >{var} as a synonym. But both extensions still only capture the stdout.
The only way to get the return value is either using $? or using the function in a boolean context like

if command ; then

In which case the exit status of the command is interpreted as a boolean (0=success).

But why not let functions (and possibly scripts and internal commands) "return" non-integer values ?

Like this

function concat()
{
return "${1}${2}"
}

What are the implications ?

For one, $? could be any type, not just integer. I don't foresee any big problems with that.
e.g.
concat foo bar
a=$?

produces "foobar" in $a

Existing code that assumed functions returned integers would still work as long as those functions or commands were not changed.

What about boolean context ?

if concat foo bar ; then ...

I'd need to define some compatible conversions from any XDM type to boolean.
These couldn't *quite* be the same as the xquery conversion to xs:boolean because I'd want 0=true and integer != 0 false.
But other than that most conversions could be fairly obvious
() => false
non-empty sequence => true
xs:integer => see above (0=true)
"" => false
non empty string => true
node/element/document => true

should be OK.
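
The conversion table above can be written down as a small function. This is a Python sketch of the proposed rules only: Python tuples and lists stand in for XDM sequences, and any other object stands in for a node, element, or document. Those stand-ins are assumptions for the illustration.

```python
def to_exit_boolean(value):
    """Map a returned value to shell-style success (True) or failure (False)."""
    # bool is a subclass of int in Python, so check it first.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return value == 0        # exit-status convention: 0 means success
    if isinstance(value, str):
        return value != ""       # "" => false, non-empty string => true
    if isinstance(value, (list, tuple)):
        return len(value) > 0    # () => false, non-empty sequence => true
    return True                  # stand-in for node/element/document


for sample in [(), ("a", "b"), 0, 1, "", "foobar", object()]:
    print(repr(sample), "=>", to_exit_boolean(sample))
```

The integer rule is the only one that runs against the usual truthiness conventions, which is exactly the compatibility wrinkle described above.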

But why do all this ? One more step is desired. The ability to call functions and make them "look like function calls" ; instead of relying on $?

why not support something like
concat( foo , bar )

This could then nest
concat( concat( foo , bar ) , spam )

This syntax func(...) would evaluate to the return value of the function. The stdout would still go to the parent's stdout so you could pipe through them

func(a) | func2(b)

Then there are some details about globbing and argument lists. If I support the "," as an argument separator (instead of space as in command invocation) then this brings some interesting side effects

foo( *.xml ) => $1 becomes a sequence of xs:string
foo( *.xml , *.c ) => $1 , $2 both sequences

and so on ... this could have some really interesting side effects.
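
A quick way to see the effect of comma-separated arguments combined with globbing is this Python sketch. It is an illustration of the idea only, not xmlsh's parser: the function name, the naive split on commas, and the fixed list of file names are all assumptions for the demo.

```python
from fnmatch import fnmatchcase


def expand_arguments(arg_text, names):
    """Split on commas, then glob each argument into its own sequence."""
    parameters = []
    for argument in arg_text.split(","):
        sequence = []
        for word in argument.split():
            matches = sorted(n for n in names if fnmatchcase(n, word))
            # Like a shell, keep the word itself when nothing matches.
            sequence.extend(matches or [word])
        parameters.append(sequence)
    return parameters


files = ["a.xml", "b.xml", "main.c", "notes.txt"]
print(expand_arguments("*.xml , *.c", files))
print(expand_arguments("*.xml", files))
```

With space-separated arguments the same call would flatten everything into one list, and the function could no longer tell where the XML files end and the C files begin.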
Consider if commands as well as functions could be called this way. Why not ?

cat( foo.xml , bar.txt )

as syntactically equivalent to

cat foo.xml bar.txt

This would allow more "programming like" language style to be used. It also opens the doors for function signatures (or command signatures) for static or dynamic type checking and variable assignment.

For example, suppose you'd write a function now like

function copy() {
from=$1
to=$2
cp $from $to
}

Adding a signature could be closer to xquery or java or C

function copy( from , to )
{
cp $from $to
}

Then callers of copy could produce an error automatically if they provided the wrong number of args

copy a b c # -> runtime error by interpreter
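
The arity check an interpreter could do with a declared signature is simple to sketch. The Python below is an illustration only: bind_arguments is an invented helper, and the error wording is made up rather than taken from xmlsh.

```python
def bind_arguments(name, parameters, arguments):
    """Bind positional arguments to parameter names, checking the count."""
    if len(arguments) != len(parameters):
        raise TypeError(
            f"{name}: expected {len(parameters)} arguments, got {len(arguments)}"
        )
    return dict(zip(parameters, arguments))


# copy a b      -> binds from=a, to=b
print(bind_arguments("copy", ["from", "to"], ["a", "b"]))

# copy a b c    -> rejected before the function body ever runs
try:
    bind_arguments("copy", ["from", "to"], ["a", "b", "c"])
except TypeError as error:
    print(error)
```

Once the names are bound in one place, adding per-parameter type checks is a matter of attaching a type to each entry in the parameter list.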

could add to that types eventually

function copy( from as xs:string , to as xs:string )
{ ... }

then the next obvious step is return types as part of the signature

Anyway something to think about ...
I suspect I'm going to try the first step and see what breaks. Let return take any type , and let $? take on any type.

Let's see if that breaks anything first ...

Comments welcome !

Announce: xmlsh and marklogic extension update release

2010-05-06T13:31:00.001-07:00

I have updated xmlsh (version 1.0.4) and the MarkLogic extension to xmlsh (version 1.1) on sourceforge.

https://sourceforge.net/projects/xmlsh/

I actually included a changelog.txt for the first time !

Updates coming, new commands, better servlet code, better MarkLogic

2010-04-20T08:57:00.001-07:00

Just a quick note to say new commands are coming.

* base64 (posix)

* xmd5sum (md5 sums of files)

* xunzip (unzip and list zip archive in xml friendly way)

Improved servlet integration.

Much improved MarkLogic integration:

* multi-threaded put

* commands for properties and permission control

xmlsh 1.0.3

2010-03-21T15:40:00.000-07:00

Released xmlsh 1.0.3 today.

This includes

enhancements to xgetopts
enhancements to XML Servlet (headers and parameters)
new command httpsession to get/set session parameters
Many bug fixes
{expr} syntax in command line arguments to preserve sequences