Saturday, January 17, 2009

URI for CWD ?

A few weeks ago I added a feature where URIs can be used in place of filenames everywhere that filenames are used for input. This works for all internal and builtin commands, as well as in IO redirection.

For example, this works because IO redirection is done within xmlsh:

cat <

This works because xcat is an xmlsh command


But this does not work (because cat is not an xmlsh program)


I added this both to support easy access to web data and to be able to track the base URI, mainly for xproc support. Base URI support is useful not only for xproc but also for expanded entities, as in the following case:

xcat <

XML-oriented commands (xquery, xed, xslt) can work correctly with a default namespace. But what about a default base URI?
So I could do something like

declare base-uri
xcat books.xml

But now there is a conflict between the base-uri and the current directory.
How does the shell know to pull books.xml from the web and not from the filesystem? Once you set a base URI you can't get at files anymore.
This got me thinking more ... what is the base-uri except the current directory? What if they were the same? If you could "cd" to a web address, for example

cat books.xml

ftp could work too

This would actually be pretty easy to implement. And maybe useful?
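The resolution rule itself is only a few lines. A minimal sketch in plain POSIX shell, assuming a hypothetical WEB_CWD variable and uri_resolve helper (neither is real xmlsh syntax): relative names are joined to the web CWD, absolute URIs pass through untouched.

```shell
# Hypothetical sketch: WEB_CWD and uri_resolve are illustrative names,
# not actual xmlsh commands.
WEB_CWD='http://example.com/data/'    # set by a hypothetical 'cd http://...'

uri_resolve() {
  case $1 in
    http://*|https://*|ftp://*) printf '%s\n' "$1" ;;  # already absolute
    *) printf '%s%s\n' "$WEB_CWD" "$1" ;;              # relative: join to base
  esac
}

uri_resolve books.xml    # -> http://example.com/data/books.xml
```

A real implementation would use proper RFC 3986 relative-reference resolution rather than string concatenation, but the shape is the same.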
But the side effects could be weird. Questions arise if I did this :

What would * expand to ? ( echo *)
What would I set the current directory to for external processes ?
How would xls work ? ( I experimented with ftp directories and they may be parsable,
but most http directories are not).


  1. So 'cat' takes a file name that must be open()able. I don't think UNIX utils have any great way of reporting which of their arguments are expected to be files. I think this discussion goes beyond the shell itself, don't you? XML/OS in 2010!

    Of course, you COULD always have a means of determining (a command-line prefix, environment variable, or a table of magic knowledge) which commands are external and expecting open()able files, then find all the URLs in the command line, open sockets, and then replace them in argv with '/dev/fd/%d' (for the fd associated with the socket pointed at the web server, ready to receive).

    But that seems a little silly.
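    Silly, but the underlying trick is workable: any program that open()s a filename can be handed a name whose bytes actually come from another process. A portable sketch using a named pipe, where fetch() is just a stand-in for a real HTTP read:

```shell
# A filename that is really a stream: cat open()s the name, but the
# bytes come from another process. fetch() stands in for an HTTP read.
fetch() { printf '<books/>\n'; }

fifo=$(mktemp -u) && mkfifo "$fifo"
fetch > "$fifo" &     # writer blocks until a reader opens the name
cat "$fifo"           # prints <books/>; cat never knows it's not a file
rm -f "$fifo"
```

Bash's process substitution (`cat <(fetch)`) does the same thing with the '/dev/fd/%d' naming described above.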

    I think the listing problem can be resolved somewhat easily. Even if HTTP (or some standard we care about... *handwave* WebDAV?) doesn't offer any standard means of listing resources, consider this case:

    $ cd /tmp
    $ mkdir monkey-mojo
    $ ls -ld monkey-mojo
    drwxr-xr-x 2 chriscos chriscos 4096 Jan 22 00:03 monkey-mojo
    $ touch monkey-mojo/{f1,f2,f3,f4}
    $ ls monkey-mojo/
    f1 f2 f3 f4
    $ chmod a-r monkey-mojo
    $ cd monkey-mojo
    $ ls
    ls: .: Permission denied
    $ ls f1
    $ ls *
    ls: *: No such file or directory

    You don't have read access to the directory, but you do to the individual files, so being able to get a listing is not necessary and you can still use the 'base URI' concept for convenience.

    Btw, have you looked into KDE's 'ioslave' model at all?

  2. Thanks for the ideas !
    As for tracking what gets sent to external commands, I'm not going to do that :)

    The comment that being able to read the current directory is not a requirement is a good one; you are totally right, it's not necessary.

    Now, for running external commands I still have to set the CWD to SOMETHING ... so if you do


    that would have to print something ... a CWD is not optional on either unix or windows.
    (Although interestingly it doesn't exist at all on some platforms like Palm/OS and Windows Mobile.)

    I think that's solvable by maintaining 2 CWDs ...
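    A minimal sketch of what the two CWDs would mean, in plain shell terms (all names here are hypothetical, just to illustrate the split between what internal commands resolve against and what external processes are handed):

```shell
# Hypothetical sketch of keeping two CWDs: a filesystem one handed to
# external processes, and a URI one used by shell-internal commands.
FS_CWD=$(pwd)                         # what an external process would see
URI_CWD='http://example.com/data/'    # what internal commands resolve against

resolve_internal() { printf '%s%s\n' "$URI_CWD" "$1"; }
resolve_external() { printf '%s/%s\n' "$FS_CWD" "$1"; }

resolve_internal books.xml    # -> http://example.com/data/books.xml
resolve_external books.xml    # -> <current dir>/books.xml
```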

  3. Don't you think that might get a bit confusing?

    "I can xquery this file, but I can't grep it, dammit!"

    "What's your cwd?"


    "What's your OTHER cwd?"

    I'm betting that for Linux (and probably for Windows) there's some file system that allows you to mount HTTP resources that might solve this a bit more easily (except that it's no longer self-contained in your app/framework).

    You are absolutely right, this belongs at the OS layer. And I'm excited to hear you're volunteering to write a Web FS for Linux, Mac/OS and Win32! That's great!

    If you read the Philosophy page for xmlsh ( ) you will see that I hint that a goal of this project is to experiment with ideas for a whole new OS.

    "Ferris" seems to be a project with similar goals for a filesystem.
    I have not deeply investigated it yet.

    As for "confusing" ... it is indeed confusing. Similarly today in xmlsh you can do

    cat <

    but not


    Is this confusing? Yes. Is it bad?
    I don't know ... I sorta think not. The alternative is to remove the first feature; then they are consistent (both don't work).

    The difference between what the "shell" does, what the "commands" do, and what the "os" does has always been a confusing topic in the unix shells (in ALL OS shells, actually). In the Ideal OS and Shell it wouldn't matter.

  5. Redirection expressions and file name expressions are different in many shells.

    (This is from memory and the syntax may not be exactly right. See 'man ksh' if you don't already know what I mean.)

    exec 4< /etc/motd

    cat <&4 -- works

    cat &4 -- doesn't (the shell parses "&4" as a background operator plus the argument "4", not a redirection)
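    The distinction is easy to check in any POSIX shell, using a temp file in place of /etc/motd:

```shell
# Redirections are parsed by the shell; an '&4' argument is not.
tmp=$(mktemp)
printf 'hello motd\n' > "$tmp"

exec 4< "$tmp"    # bind fd 4 to the file
cat <&4           # redirect: cat reads fd 4 and prints 'hello motd'
exec 4<&-         # close fd 4
rm -f "$tmp"
```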

    Sure, I'll get on WebFS Everywhere (TM), just as soon as I finish porting CLR to those kernels.

    Redirection is different than the CWD. The CWD has a meaning in most OSes: it is explicitly the "current directory" in the process environment. I can override this meaning in my own universe (xmlsh), but when calling out to external processes I cannot. At best I can set the CWD to something they would understand.

    Lacking a Web filesystem that the OS understands, my hands are tied. This means that if I used a CWD that could be a web address within xmlsh, external processes would not "get it" ... but that may not be too bad.

    Analogy ... when you use IE and "browse" to a web page, then run an external command ... your CWD is typically the "Home" directory of your system ... there is no fundamental guarantee that the CWD makes sense as you cross process contexts. It's really nice if it does, but it's not a given.

    With that in mind, I've solved the "forking" problem a different way for now. There are now "Ports" (soon to be "Named Ports") which store content in variables. You can redirect stdin/out/err to these ports.


    xread doc < file.xml

    xquery / <{doc}

    This is equivalent to
    xquery -i $doc /

    It's an interesting difference, because this works across user commands and scripts, so that you can do
    my_script <{doc}

    which is equivalent to
    echo $doc | my_script

    but much more efficient. The port redirect is *direct* as long as the receiving side accepts the type of the port. No serialization or parsing is performed; the XML tree is passed as-is to the stdin of the pipeline.

    Similarly you can pipe OUT to a port using
    xls >{doc}

    Now the variable "doc" contains the output of xls. Similarly to the input case, this is equivalent to
    xls | xread doc

    but is more efficient because the data does not have to "pipe"; it is sent directly to the variable.
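    In plain POSIX shell terms, the two directions are roughly as follows, with the caveat that these analogues serialize through a pipe while the real ports pass the tree directly:

```shell
# Rough plain-shell analogues of the port redirects; these serialize,
# the real xmlsh ports pass the XML tree without serializing.
doc=$(echo 'hello')                  # 'xls >{doc}'  ~  capture into a variable
printf '%s\n' "$doc" | tr a-z A-Z    # 'cmd <{doc}'  ~  feed the variable back in
```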

    Ultimately, piping itself may be optimized to perform similarly, but for now these ports are the start of what is needed to support xproc's weird use of "streams" (quoted because they don't behave at all like "streams").


