Monday, June 15, 2009

Shifting sequences

My other foray into bringing functional parity between positional parameters and sequence variables is "shift". I'd like a shift operator that works on a sequence variable the way shift works on positional parameters, something like Perl's shift.

Example:

a=(foo bar spam)
shift a

Now a is (bar spam).

I'm not sure I should do this, though. It is equivalent to this not-too-ugly (but not too obvious either) expression:

a=<[ $a[position() > 1] ]>
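The following is a minimal Python sketch (not xmlsh code) of the intended semantics, modeling a sequence as a tuple. The helpers `shift` and `drop_first_by_position` are hypothetical names used only for illustration. Returning an empty sequence when shifting past the end is an assumption; POSIX shift treats a count larger than $# as an error.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def shift(seq: Sequence[T], n: int = 1) -> Tuple[T, ...]:
    """Hypothetical model of 'shift a': drop the first n items of a sequence.

    Assumption: shifting past the end yields an empty sequence rather than
    an error.
    """
    if n < 0:
        raise ValueError("shift count must be non-negative")
    return tuple(seq[n:])


def drop_first_by_position(seq: Sequence[T]) -> Tuple[T, ...]:
    """Mirror the XPath filter $a[position() > 1]; XPath positions are 1-based."""
    return tuple(item for pos, item in enumerate(seq, start=1) if pos > 1)


a = ("foo", "bar", "spam")
a = shift(a)
print(a)                                                # ('bar', 'spam')
print(drop_first_by_position(("foo", "bar", "spam")))  # ('bar', 'spam')
```

Both forms produce the same result; the operator version simply names the intent instead of spelling out the position filter.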

Expanding sequences into positional parameters

I'm debating a syntax for converting a sequence into positional parameters.

The problem is this. Suppose I have a sequence expression, say a variable:
a=(a b c d e)

and I then want to pass that sequence to a command as separate parameters. Using $a does NOT expand the sequence. For example,
command $a

passes one argument of type sequence (unless command is an external program, in which case the sequence is converted into separate arguments).

This is typically desirable. Sequence expressions should be passable to commands without "flattening". This is particularly useful if you want to pass multiple sequences, for example as xquery or xslt parameters:
xquery -q ... -v value1 $a -v value2 $b

Here a and b can be sequence values. But suppose I actually want a sequence to be flattened. I haven't figured out a clean way to do that in the current syntax. POSIX shells don't help here because they don't have sequences. Bash has arrays, which are similar, but you can't pass an array around to a command.
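The difference between internal and external commands can be sketched in Python as follows. This is an illustrative model only, not how xmlsh is implemented: `args_for_builtin` and `args_for_external` are hypothetical names, and a sequence is modeled as a tuple.

```python
from typing import List, Sequence, Union

# Hypothetical model: a value is either a single item or a sequence of items.
Item = Union[str, int]
Value = Union[Item, Sequence[Item]]


def args_for_builtin(*values: Value) -> List[Value]:
    """Internal commands receive each expanded variable as ONE argument,
    even when that argument is a whole sequence (no flattening)."""
    return list(values)


def args_for_external(*values: Value) -> List[str]:
    """External programs only understand strings, so each sequence is
    converted into one string argument per item."""
    argv: List[str] = []
    for v in values:
        if isinstance(v, (list, tuple)):
            argv.extend(str(item) for item in v)
        else:
            argv.append(str(v))
    return argv


a = ("a", "b", "c", "d", "e")
b = (1, 2)
print(len(args_for_builtin(a)))  # 1: "command $a" sees a single sequence argument
print(args_for_external(a))      # ['a', 'b', 'c', 'd', 'e']
print(len(args_for_builtin("-v", "value1", a, "-v", "value2", b)))  # 6
```

The last line shows why the unflattened form matters: each -v option keeps exactly one (sequence) value.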

Eval can do this if the sequence contains simple unquoted strings like the ones above:
eval set $a
sets $1, $2, $3, ... to a, b, c, ... but

a=<[ <foo/> , 2 , <bar>spam</bar> ]>
eval set $a

is a syntax error.
I believe this works, but I haven't tried it in all cases:
set --
for arg in $a ; do
    set -- "$@" $arg
done

Ugly, though!


I'm considering the "all array" syntax ${a[*]} for this.
Instead of returning all elements of the sequence as a single sequence value, it would return them as multiple positional values, similar to $*.

That might do the trick, but I'm not sure I like it. The odd thing is that this feature is only really useful in the context of command argument expansion.
For example,
b=${a[*]}

would do nothing different from
b=$a

This is a bit subtle, but the difference is that a single variable can hold only one level of sequence, whereas positional parameters hold two levels. That is, "$*" is a list of values, where each value can itself be a sequence.
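The two-level distinction can be modeled in a few lines of Python. This is a sketch under stated assumptions: `expand_plain` stands for $a, `expand_all` for the proposed ${a[*]}, and `assign` for what happens when an expansion is stored in a variable; all three names are hypothetical. The sketch assumes assignment collapses the parameter list back into one sequence, which is why b=${a[*]} and b=$a end up identical.

```python
from typing import List, Tuple

Seq = Tuple[object, ...]  # one level: what a variable can hold
Params = List[Seq]        # two levels: what "$*" holds


def expand_plain(var: Seq) -> Params:
    """$a: the whole sequence becomes ONE positional parameter."""
    return [var]


def expand_all(var: Seq) -> Params:
    """Proposed ${a[*]}: each item becomes its own positional parameter."""
    return [(item,) for item in var]


def assign(params: Params) -> Seq:
    """b=<expansion>: a variable holds only one level, so the
    positional-parameter list collapses back into a single sequence."""
    return tuple(item for seq in params for item in seq)


a: Seq = ("a", "b", "c")
print(expand_plain(a))  # [('a', 'b', 'c')]
print(expand_all(a))    # [('a',), ('b',), ('c',)]
print(assign(expand_all(a)) == assign(expand_plain(a)))  # True
```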

I'm also considering prefix and suffix modifiers such as ${+a} or ${a:+}.

Of course, none of these syntaxes would make it at all obvious what they do until you learn them!

Wednesday, June 10, 2009

Limited background execution

I'm preparing test cases for the upcoming Balisage 2009 conference.

I'm re-encountering an age-old scripting problem/hack/idea.
Typical scripting languages (say, Unix shell scripts, CMD, or others) support "background execution", if at all, only in a very limited form.
Take the Unix shells, for example. You can run a program "in the background" (as a non-blocking process) with "&". On Windows (cmd.exe) you can use "start".
Xmlsh works the same way as the Unix shells: you use "&". That's all great when you're doing simple things. & + wait == job control. Great. But a common problem is that if you're running N background processes and N is large or unknown, it's hard to control. I've written shell scripts that can handle N (say 10) background processes and block if you try to start N+1, but it's very clumsy scripting for a very special use case, not something I recommend. Yet in today's multi-core world, being able to run N background processes (or threads) is very useful, *extremely* useful if you can arrange for N not to become "huge".

Take a simple XML example. Suppose I have a directory of files and I want to run xslt on all of them.

xmlsh script:

for i in *.xml ; do
    xslt -s script.xslt < $i > output_dir/$i
done

But suppose I'm running on a 4- or 8-core CPU and this is a CPU-bound process. I'd like to run these with some degree of parallelism.
This will work:

for i in *.xml ; do
    xslt -s script.xslt < $i > output_dir/$i &
done
wait

But if the number of files becomes large (say, more than 10), this thrashes the system so badly that not only does it slow down, it also risks exhausting system memory, so you can't guarantee that every job completes.

If you were writing an enterprise-type product, you could use a thread pool, worker thread, or worker process model: send requests into a queue and have, say, 10 worker threads or processes doing the work from the queue. This may actually be implementable in the ongoing experimental HTTP server module in xmlsh, but suppose I wanted something less elegant but nearly as useful.
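For comparison, the classic worker model described above looks roughly like this in Python. It is a minimal sketch, not part of xmlsh: `run_with_worker_pool` is a hypothetical name, and the `work` callback stands in for the per-file xslt run. All jobs are queued up front, and a fixed number of threads drain the queue.

```python
import queue
import threading
from typing import Callable, List


def run_with_worker_pool(jobs: List[str], work: Callable[[str], str],
                         workers: int = 10) -> List[str]:
    """Fixed pool of worker threads pulling jobs from a shared queue."""
    q: "queue.Queue[str]" = queue.Queue()
    for job in jobs:
        q.put(job)  # every request is enqueued before the workers start

    results: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker exits
            out = work(job)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # like the shell's "wait"
    return results


files = [f"doc{i}.xml" for i in range(25)]
outputs = run_with_worker_pool(files, lambda name: "output_dir/" + name, workers=10)
print(len(outputs))  # 25
```

Results arrive in completion order, not input order; at most `workers` jobs ever run at once, no matter how many files there are.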

Imagine a syntax that says "Start a background thread but ONLY if there are < N outstanding background threads".

I can imagine a very simple syntax, maybe like this:

for i in *.xml ; do
    xslt -s script.xslt < $i > output_dir/$i &[10]
done
wait

This would mean "start up to 10 concurrent xslt threads, but no more".

It may not be quite as efficient as a worker pool, but the syntax and implementation could be very simple. My estimate is that the end result would be nearly as efficient.
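One way the proposed &[N] could behave is sketched below in Python, using a counting semaphore: each background job takes a slot before its thread starts and gives it back when it finishes, so the caller blocks whenever N jobs are already outstanding. This is an assumption about semantics, not xmlsh code; `BoundedBackground` is a hypothetical class, and `time.sleep` stands in for the xslt call.

```python
import threading
import time
from typing import Callable, List


class BoundedBackground:
    """Hypothetical model of '&[N]': start a background thread only when
    fewer than N are outstanding; otherwise block until one finishes."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._threads: List[threading.Thread] = []
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest number of concurrently running jobs observed

    def start(self, fn: Callable[[], None]) -> None:
        self._slots.acquire()  # blocks while N jobs are outstanding

        def run() -> None:
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                fn()
            finally:
                with self._lock:
                    self._running -= 1
                self._slots.release()  # free a slot for the next '&[N]'

        t = threading.Thread(target=run)
        self._threads.append(t)
        t.start()

    def wait(self) -> None:
        """Equivalent of the shell's 'wait': block until all jobs finish."""
        for t in self._threads:
            t.join()


bg = BoundedBackground(limit=10)
for _ in range(25):
    bg.start(lambda: time.sleep(0.01))  # stand-in for: xslt ... &[10]
bg.wait()
print(bg.peak <= 10)  # True
```

Unlike the worker pool, each job still gets its own thread; the throttling happens at the point where the job is launched, which is what keeps the script syntax as simple as a plain "&".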