Saturday, May 15, 2010

Functions and more functions

Thanks to some great discussions with Dave Pawson, I'm considering extending the syntax for function calls and the possible return values for functions, scripts, and commands.

Right now a command, script, or function can only "return" an integer, often known as the "Exit Status". Historically this is because C programs (and most OS programs) "exit" or "return" with a single integer value. On older systems this was actually limited to a byte. 0 means success, non-zero means fail.

So things have been for 40 years ...

But with xmlsh it's an artificial limitation. Why limit functions and scripts to int return values? I can't do anything about external programs, but functions and scripts *could* return any XDM type. Why not let them?

After much discussion I've concluded that the only thing in the way is the lack of a syntax for capturing the return value that doesn't break compatibility with capturing the stdout.

The typical syntax for capturing the "output" of a command/script/function is $(cmd).
This captures the stdout of the command and converts it to a string.
XMLSH has extended this a bit with $<(cmd) for XML types and cmd >{var} as a synonym. But both extensions still only capture the stdout.
The only way to get the return value is either using $? or using the function in a boolean context like

if command ; then

In which case the exit status of the command is interpreted as a boolean (0=success).
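For reference, here is a small sketch of those two existing channels in a plain POSIX-style shell (nothing xmlsh-specific). The function name greet and the status value 3 are made up for illustration.

```shell
# greet is a hypothetical function: it writes to stdout AND sets an exit status.
greet() {
    echo "hello $1"
    return 3
}

# $(...) captures only the stdout, as a string.
# The exit status has to be picked up separately through $?.
status=0
out=$(greet world) || status=$?

echo "stdout: $out"       # prints: stdout: hello world
echo "status: $status"    # prints: status: 3

# In a boolean context only the status matters: non-zero means "false".
if greet world >/dev/null; then echo "success"; else echo "failure"; fi
```

The string comes back one way and the integer another, and there is no slot for anything richer than an integer in the second one.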

But why not let functions (and possibly scripts and internal commands) "return" non-integer values ?

Like this

function concat()
{
return "${1}${2}"
}

What are the implications ?

For one, $? could be any type, not just an integer. I don't foresee any big problems with that.
e.g.
concat foo bar
a=$?

produces "foobar" in $a

Existing code that assumed functions returned integers would still work as long as those functions or commands were not changed.

What about boolean context ?

if concat foo bar ; then ...

I'd need to define some compatible conversions from any XDM type to boolean.
These couldn't *quite* be the same as the XQuery conversion to xs:boolean, because I'd want integer 0 to be true and any non-zero integer to be false.
But other than that, most conversions could be fairly obvious:
() => false
non-empty sequence => true
xs:integer => see above (0=true)
"" => false
non empty string => true
node/element/document => true

should be OK.
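As a rough sketch, the table above can be mimicked in a plain POSIX-style shell. The helper xdm_true is hypothetical and not part of xmlsh. It treats its argument list as the sequence, assumes that a single argument which looks like an integer is an xs:integer, and cannot model nodes at all.

```shell
# xdm_true: hypothetical model of the proposed conversion to boolean.
# Returns 0 (shell "true") or 1 (shell "false").
xdm_true() {
    if [ "$#" -eq 0 ]; then return 1; fi    # () => false
    if [ "$#" -gt 1 ]; then return 0; fi    # sequence of several items => true
    if [ -z "$1" ]; then return 1; fi       # "" => false
    case ${1#-} in
        ''|*[!0-9]*) return 0 ;;            # non-empty, non-integer string => true
        *) [ "$1" -eq 0 ] ;;                # integer: 0 => true, non-zero => false
    esac
}

for v in 0 1 foobar; do
    if xdm_true "$v"; then echo "$v => true"; else echo "$v => false"; fi
done
```

Under these assumptions the loop prints "0 => true", "1 => false" and "foobar => true". The one place this differs from the usual XQuery rule is the integer case, which is deliberately inverted to stay compatible with exit status.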


But why do all this? There is one more step I'd like: the ability to call functions so that they "look like function calls", instead of relying on $?.

Why not support something like
concat( foo , bar )

This could then nest
concat( concat( foo , bar ) , spam )

This syntax func(...) would evaluate to the return value of the function. The stdout would still go to the parent's stdout, so you could pipe through them

func(a) | func2(b)

Then there are some details about globbing and argument lists. If I support "," as an argument separator (instead of the space used for command invocation), this brings some interesting side effects

foo( *.xml ) => $1 becomes a sequence of xs:string
foo( *.xml , *.c ) => $1 , $2 both sequences

and so on ... this could have some really interesting side effects.
Consider if commands as well as functions could be called this way. Why not ?

cat( foo.xml , bar.txt )

as syntactically equivalent to

cat foo.xml bar.txt

This would allow more "programming like" language style to be used. It also opens the doors for function signatures (or command signatures) for static or dynamic type checking and variable assignment.

For example, suppose you'd write a function now like

function copy() {
from=$1
to=$2
cp $from $to
}


Adding a signature would bring this closer to XQuery, Java, or C

function copy( from , to )
{
cp $from $to
}

Then callers of copy could produce an error automatically if they provided the wrong number of args

copy a b c # -> runtime error by interpreter
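Until the interpreter does this, the same check has to be written by hand in each function. A minimal sketch in plain POSIX-style shell follows; the error message, the status value 2, and the temporary files are arbitrary choices for illustration.

```shell
copy() {
    # Emulate the signature copy( from , to ): exactly two arguments.
    if [ "$#" -ne 2 ]; then
        echo "copy: expected 2 arguments, got $#" >&2
        return 2
    fi
    from=$1
    to=$2
    cp "$from" "$to"
}

# Throwaway files so the example can run anywhere.
dir=$(mktemp -d)
echo data > "$dir/a"

copy "$dir/a" "$dir/b" && echo "copied"
copy a b c || echo "rejected with status $?"
rm -r "$dir"
```

A declared signature would move those first few lines of boilerplate out of every function and into the interpreter.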

Types could eventually be added to the signature as well

function copy( from as xs:string , to as xs:string )
{ ... }


Then the next obvious step is return types as part of the signature.



Anyway something to think about ...
I suspect I'm going to try the first step and see what breaks: let return take any type, and let $? take on any type.

Let's see if that breaks anything first ...

Comments welcome !

2 comments:

  1. I've implemented the beginnings of a function call expression syntax for functions only.

    $ function foo() {
    return <[ <xml>{$_1}</xml> ]>
    }
    $
    $ echo foo(bar)
    <xml>bar</xml>
    $ echo foo( foo( spam) )
    <xml>
    <xml>spam</xml>
    </xml>
    $ a=foo(bar)
    $ echo foo($a)
    <xml>
    <xml>bar</xml>
    </xml>
