XMLSH: URI for CWD ?

Saturday, January 17, 2009

URI for CWD ?

A few weeks ago I added the feature where URI's can be used in place of files everywhere that filenames are used for input. This works for all internal and builtin commands as well as in IO redirection.

For example this works because IO redirection is done within xmlsh

cat < http://test.xlmsh.org/data/books.xml

This works because xcat is an xmlsh command

xcat http://test.xlmsh.org/data/books.xml

But this does not work (because cat is not an xmlsh program)

cat http://test.xlmsh.org/data/books.xml

I added this both to support easy access to web data, as well as to be able to track the base URI mainly for xproc support. Base URI support is useful not only for xproc but for expanded entities, such as in the following case

xcat <> and xml oriented commands (xquery, xed, xslt) can work correctly with a default namespace. But what about a default base URI ?
So I could do something like

declare base-uri http://test.xmlsh.org/data
xcat books.xml

But now there is a conflict between the base-uri and the current directory.
How does the shell know to pull books.xml from the web not from the filesystem ? Once you set a base URI you cant get at files anymore.
This got me thinking more ... what is the base-uri except for the current directory ? What if they were the same. If you could "cd" to a web address, for example

cd http://test.xmlsh.org/data
cat books.xml

ftp could work too
cd ftp://test.xmlsh.org/data

This would actually be prety easy to implement. And maybe useful ?
But the side effects could be weird. Questions arise if I did this :

What would * expand to ? ( echo *)
What do I set for the current directory to external files ?
How would xls work ? ( I experimented with ftp directories and they may be parsable,
but not most http direcotries).

7 comments:

ChrisJanuary 21, 2009 at 10:16 PM
So 'cat' takes a file name that must be open()able. I don't think UNIX utils have any great way of reporting which of their arguments are expected to be files. I think this discussion goes beyond the shell itself, don't you? XML/OS in 2010!

Of course, you COULD always have a means of determining (a command-line prefix, environment variable, or a table of magic knowledge) which commands are external and expecting open()able files, then find all the URLs in the command line, open sockets, and then replace them in argv with '/dev/fd/%d' (for the fd associated with the socket pointed at the web server, ready to receive).

But that seems a little silly.

I think the listing problem can be resolved somewhat easily, if HTTP (or some standard we care about... *handwave* WebDAV?) doesn't offer any kind of standard means of listing resources, then consider this case:

$ cd /tmp
$ mkdir monkey-mojo
$ ls -ld monkey-mojo
drwxr-xr-x 2 chriscos chriscos 4096 Jan 22 00:03 monkey-mojo
$ touch monkey-mojo/{f1,f2,f3,f4}
$ ls monkey-mojo/
f1 f2 f3 f4
$ chmod a-r monkey-mojo
$ cd monkey-mojo
$ ls
ls: .: Permission denied
$ ls f1
f1
$ ls *
ls: *: No such file or directory

You don't have read access to the directory, but you do to the individual files, so being able to get a listing is not necessary and you can still use the 'base URI' concept for convenience.

Btw, have you looked into KDE's 'ioslave' model at all?
ReplyDelete
Replies
UnknownJanuary 22, 2009 at 3:54 AM
Thanks for the ideas !
as for tracking what gets sent to external commands, I'm not going to do that :)

The comment about it not being a requirement to be able to read the current directory is good, you are totally right, its not necessary.

Now for running external commands I still have to set the CWD to SOMETHING ... so if you do

cd http://test.xmlsh.org
/bin/pwd

that would have to print something ... a CWD is not optional on either unix or windows.
(although interestingly it doesnt exist at all on some platforms like Palm/OS and Windows Mobile).

I think thats solvable by mainting 2 CWD's ...
ReplyDelete
Replies
ChrisJanuary 22, 2009 at 9:24 AM
Don't you think that might get a bit confusing?

"I can xquery this file, but I can't grep it, dammit!"

"What's your cwd?"

"http://foo.example.com/"

"What's your OTHER cwd?"

I'm betting that for Linux (and probably for Windows) there's some file system that allows you to mount HTTP resources that might solve this a bit more easily (except that it's no longer self-contained in your app/framework).
ReplyDelete
Replies
UnknownJanuary 24, 2009 at 8:44 AM
You are absolutely right, this belongs at the OS layer. And I'm excited to hear your volutneering to write a Web FS for Linux, Mac/OS and Win32 ! Thats great!

If you read the Philosophy page for xmlsh ( http://www.xmlsh.org/Philosophy ) you will see that I hint that a goal of this project is to experiment with ideas for a whole new OS.

"Ferris" seems to be a project that has similar goals of a filesytsem. http://www.libferris.com/
I have not deeply investigated this yet.

As for "confusing" ... it is indeed confusing. Similarly today in xmlsh you can do

cat < http://test.xmlsh.org/

but not

cat http://test.xmlsh.org/

Is this confusing ? Yes. Is it bad ?
I dont know ... I sorta think not. The alternative is to remove the first feature then they are consistant (both dont work).

The difference between what the "shell" does and what the "commands" do and what the "os" does has always been a confusing topic in the unix shells. (in ALL OS shells actually). In the Ideal OS and Shell it wouldnt matter.
ReplyDelete
Replies
ChrisJanuary 24, 2009 at 12:03 PM
Redirection expressions and file name expressions are different in many shells.

(This is from memory and the syntax may not be exactly right. See 'man ksh' if you don't already know what I mean.)

exec 4 <& /etc/motd

cat 1<&4 -- works, or something like it does

cat &4 -- don't.

Sure, I'll get on WebFS Everywhere (TM), just as soon as I finish porting CLR to those kernels.
ReplyDelete
Replies
UnknownFebruary 2, 2009 at 10:56 PM
Redirection is different in ways then the CWD. the CWD has a meaning in most OS's which is explictly the "current directory" in the process environment. I can override this meaning in my own universe (xmlsh) but when calling out to external processes I cannot. At best I can set the CWD to something they would understand.

Lacking a Web filesystem that the OS understands, my hands are tied. This means if I used a CWD that could be a web address that xmlsh worked with, external processes would not "get it" ... but that may not be too bad.

Analogy ... when you use IE and "browse" to a web page, then run an external command ... your CWD is typically the "Home" directory of your system ... there is no fundimental guarentee that when switching out of a process context that the CWD makes sense as you cross contexts. Its really nice if it does, but its not a given.

With that in line, I've solved the "forking" problem a different way for now. There are now "Ports" (soon to be "Named Ports") which store content in variables. You can redirect stdin/out/err to these ports.

Example

xread doc < file.xml

xquery / <{doc}

This is equivilent to
xquery -i $doc /

Its an interesting difference because this works across user commands and scripts so that you can do
my_script <{doc}

which is equivilent to
echo $doc | my_script

but much more efficient. The port redirect is *direct* as long as the receiving side accepts the type of the port. No serialization or parsing is performed. The XML tree is passed as is to the stdin of the pipeline.

Similarly you can pipe OUT to a port using
xls >{doc}

now the variable "doc" contains the output of xls. This is equivilent similarly to the input case of
xls | xread doc

but is more efficient because the data does not have to "pipe" it is directly sent to the variable.

Ultimately piping itself may be optimized to be able to perform similarly but for now these ports are the start of what is needed to support xproc's wired use of "streams" (quoted because they dont behave at all like "streams").
ReplyDelete
Replies
CharleneJanuary 9, 2012 at 1:55 AM
Nice post, I was looking for it I can tell that you are a professional at the field! I’m launching an internet site soon & your data will be very ideal for me. Thanks for all of your help & wish you all the success.
ReplyDelete
Replies

Add comment

Due to comment spam, moderation is turned on. I will approve all non-spam comments.