Monday, June 15, 2009

Shifting sequences

My other foray into bringing functional parity between positional parameters and sequence variables is "shift". I'd like a shift operator that operates on a sequence variable the same way shift works on positional parameters. Something like Perl's shift.

example

a=(foo bar spam)
shift a

now a is (bar spam)

Not sure if I should do this though. It is the equivalent of this not-too-ugly (but not too obvious either)

a=<[ $a[position() > 1] ]>
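
For comparison, here is the positional-parameter behaviour that the proposed "shift a" would mirror. This is only a minimal sketch in plain POSIX sh, not xmlsh, so it says nothing about how xmlsh sequences themselves would behave; the variable name "remaining" is just for illustration.

```shell
# Plain POSIX sh, not xmlsh: shift applied to positional parameters.
set -- foo bar spam     # $1=foo $2=bar $3=spam
shift                   # drop $1; the rest move down one position
remaining="$*"          # join what is left with spaces
echo "$remaining"       # prints: bar spam
```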

Expanding sequences into positional parameters.

I'm debating a syntax to convert a sequence into positional parameters.

The problem is this. Suppose I have a sequence expression, say a variable
a=(a b c d e)

then want to pass that sequence as separate parameters to a command. Using $a does NOT expand the sequence. For example
command $a

passes 1 argument of type sequence (unless command is an external program in which case the sequence is converted to args).

This is typically desirable. Sequence expressions should be able to be passed to commands without "flattening". This is particularly useful if you want to pass multiple sequences, for example as xquery or xslt parameters
xquery -q ... -v value1 $a -v value2 $b

a and b can be sequence values. But suppose I actually want a sequence to be flattened. I haven't figured out a clean way to do it in the current syntax. Posix shells don't help with this as they don't have sequences. Bash has arrays, which are similar, but you can't pass an array around to a command.

Eval can do this if the sequence contains simple unquoted strings like above
eval set $a
sets $1, $2, $3, ... to a, b, c, ... BUT

a=<[ <foo/> , 2 , <bar>spam</bar> ]>
eval set $a

is a syntax error.
I believe this works but haven't tried it in all cases
set --
for arg in $a ; do
set -- "$@" $arg
done

ugly though !
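
For contrast, the closest thing plain POSIX sh has to "flattening" is word splitting on an unquoted string, which only works for simple unquoted words (the same limitation as the eval example above). The sketch below is sh, not xmlsh; count_args is a hypothetical helper and the default IFS is assumed.

```shell
# Plain POSIX sh, not xmlsh. An unquoted expansion is split into words,
# so one variable becomes several arguments; quoting keeps it as one.
count_args() { echo "$#"; }     # hypothetical helper: prints its argument count

a="a b c d e"
flat=$(count_args $a)           # unquoted: split into 5 arguments
single=$(count_args "$a")       # quoted: passed as 1 argument
echo "$flat $single"            # prints: 5 1
```

In xmlsh, "command $a" behaves more like the quoted case: the sequence arrives as one argument.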


I'm considering the "all array" syntax to do this, like ${a[*]}
which, instead of returning all elements of the sequence as a single sequence value, would return them as multiple positional values, similar to $*

That might do the trick but I'm not sure I like it. The odd thing is that this feature is only really useful in the context of command argument expansion.
e.g.
b=${a[*]}

would do nothing different than
b=$a

This is a bit subtle but the difference is that single variables can only hold 1 level of sequence whereas positional parameters hold 2 levels. That is "$*" is a List of Values where each value can be a sequence.

I'm also considering either a prefix or a suffix modifier, such as ${+a} or ${a:+}

Of course, with any of these syntaxes it would not be obvious what they do until you learn them !

Wednesday, June 10, 2009

Limited background execution

I'm preparing test cases for the upcoming Balisage 2009 conference.

I'm re-encountering an age-old scripting problem/hack/idea.
In typical scripting languages (say unix shell scripts or CMD or other scripting languages), support for "background execution", if it exists at all, takes a very limited form.
Take the unix shells for example. You can run a program "in the background" (as a non-blocking process) with "&". On Windows (cmd.exe) you can use "start".
Xmlsh is the same as the unix shells: you use "&". That's all great when you're doing simple things. & + wait == job control. Great. But a common problem is that if you're running N background processes and N is large or unknown, it's hard to control. I've written shell scripts that can handle "N" (say 10) background processes and block if you try to start N+1, but it's very clumsy scripting, not something I recommend, and a very special use case. But in today's multi-core world, being able to run N background processes (or threads) is very useful, *extremely* useful if you can arrange for N not to become "huge".

Take a simple XML example. Suppose I have a directory of files and I want to run xslt on them all.

xmlsh script:

for i in *.xml ; do
xslt -s script.xslt < $i > output_dir/$i
done

But suppose I'm running on a 4 or 8 core CPU and this is a CPU bound process. I'd like to run these in some kind of parallelism.
This will work ...

for i in *.xml ; do
xslt -s script.xslt < $i > output_dir/$i &
done
wait

But if the number of files becomes large (say > 10) then this thrashes the system so badly that not only does it slow down, but it risks eating up system memory to the point that you can't guarantee completion.

If you were writing an enterprise type product you could use a Thread Pool or Worker Thread or Worker Process model, and then use a queuing system and send requests into the queue, and have say 10 worker threads/processes doing the work from the queue. This may actually be implementable in the ongoing experimental http server module in xmlsh, but suppose I wanted something less elegant but nearly as useful.

Imagine a syntax that says "Start a background thread but ONLY if there are < N outstanding background threads".

I can imagine a very simple syntax like maybe this

for i in *.xml ; do
xslt -s script.xslt < $i > output_dir/$i &[10]
done
wait

This would mean "start up to 10 concurrent xslt threads but no more"

It may not be quite as efficient as a worker pool but the syntax and implementation could be very simple. My estimation is that the end result would be nearly as efficient.
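
To show the kind of clumsy scripting this would replace, here is a minimal sketch in plain POSIX sh (not xmlsh) that caps the number of background jobs by running them in batches. It is cruder than the proposed &[N], because it waits for a whole batch to finish before starting the next one. The work function, the job list and the temporary directory are hypothetical stand-ins for the xslt command and its files.

```shell
# Plain POSIX sh sketch, not xmlsh: run at most $max background jobs at a time.
max=3
running=0
outdir=$(mktemp -d)                      # scratch directory for the demo output

# Hypothetical stand-in for "xslt -s script.xslt < $i > output_dir/$i".
work() { echo "done $1" > "$outdir/$1"; }

for i in 1 2 3 4 5 6 7; do
    work "$i" &                          # start the job in the background
    running=$((running + 1))
    if [ "$running" -ge "$max" ]; then
        wait                             # block until the current batch finishes
        running=0
    fi
done
wait                                     # collect the final partial batch

set -- "$outdir"/*                       # one positional parameter per output file
finished=$#
echo "finished: $finished"               # prints: finished: 7
rm -rf "$outdir"
```

Compare the bookkeeping here with the single &[10] in the loop above.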

Thursday, May 28, 2009

"World's Simplest Web Server"

Ok, that's a bold claim. I'm not going to do the research to prove it. But to me this is awesome.

I'm working on http client and server support for xmlsh so that I can use xmlsh to prototype a content server. Not quite complete yet, but for an example, this code implements a full HTTP server serving content from the current directory tree. I tested it out by cd'ing to the JDK docs/api directory and launching IE with "http://localhost/index.html" - works !

-- xmlsh code

get () { cat ${PWD}$1 ; }
httpserver -port 80 -get get start


--

That's it !
Every GET request gets executed by the local "get" function which cats out the file.

Speaking of "cat" though, I've decided that I need to implement basic unix commands natively in xmlsh. Originally I didn't want to do this because it was "reinventing the wheel", but the above example is a great case for it. In the above, the get() function actually has to spawn 3 threads and a subprocess just to cat the file.
There is no native syntax to stream from input to output without running a command.
If these were all xml files then I could use "xcat", which would be very efficient, or even "xread a < file ; xecho $a;", but for text files there is no builtin "cat" command so it's subprocess/thread time. Yuck !

The same goes for some of the basic unix commands like touch, mv, cp, cat, rm, mkdir, ls.
All of these require a unix subsystem currently (like a real unix OS or cygwin).
It would be very nice if these basics were 'built in'. I'm thinking of truly building them in as internal commands (with a very simplified option set) or as an "extension module". Comments appreciated.
The advantage would be not only performance on the basic commands, but also usability. A pure xmlsh script could depend on these basic commands existing.
For example, the test cases already check the environment for these commands. It would be nice if they could be relied on.

Friday, May 22, 2009

Wiki Spam

I've finally had to go authoritarian and turn off self registration on the wiki (http://www.xmlsh.org)

At first I noticed some new posts and was excited that someone was helping to add content to the site. Then I read closely and discovered it was subtle spam for a paid web service unrelated to xmlsh or even xml. I edited out that part, leaving in the good part which was added. Then today I discovered pure graffiti, a plain web link added to the top of the page.

This "community authoring" model might work for something heavily trafficked like wikipedia, but it's not working for me. I beg my friends to add content, but instead strangers log in and spam the site. I don't have the time to keep up with it so I've turned it off.

If you would like to add non-spam content or even correct bad spelling, please let me know and I'll gladly register you and send you the login. If you want to add spam you'll have to try a little harder now. Sorry.

Wednesday, May 13, 2009

Working on Documentation

As part of moving into Alpha2 and preparing for Beta, I'm slowly working on the documentation.  This is in wiki format on the main site (http://www.xmlsh.org)

Any comments or suggestions on how to improve the documentation are greatly welcome.
Any volunteers to help with documentation even more welcome !

There's a dual purpose for documentation.  First, of course, is to help document things so people can use it.  (Even for me: I actually just stumbled on a feature I forgot I implemented.)

But the other purpose is to flush out problems that are not obvious in the test cases or the code, but become obvious when documenting.  For example, I just realized (and fixed) that the only named port was "error"; I hadn't actually implemented the implicit stdin/stdout as named ports.  I only discovered this while cleaning up the port redirection page.

The problem with the 2nd part is that it's way too easy to get sidetracked and start working on implementation ... 1 minute of documentation can easily lead to hours of implementation ... and that's why the docs are in such bad shape :(

Suggestions on what to focus on and how to avoid getting caught up in implementation welcome.

Friday, May 8, 2009

Alpha 2 released

I think this is a major milestone. I released Alpha 2 today (0.0.2.0).
While these version numbers are somewhat arbitrary, they are a mental guide.
With Alpha 2 I have semi-formally "frozen" the syntax and will focus on stability, minor feature enhancements and command enhancements while attempting to not change the syntax, or at least not change it in an incompatible way. It would be foolish to promise I won't add new syntax (I'm thinking of try/catch, for example ...), but I am going to try to keep any syntax changes minimal and compatible.

I believe this release is ready for production environments, in controlled situations. At my day job, xmlsh has been running in production for about 9 months, so you can be assured that I'm not going to do anything that would break that.