## Tuesday, January 20, 2009

### Quoting is hard!

After all these years, I never really appreciated how hard quoting is. I don't mean the simple echo "foo bar" quoting, but all the nuances of backslash, single quote, variable expansion, wildcard expansion, etc. I'm getting close in xmlsh but not quite there. I think to go the last 10% I'm going to have to rewrite the entire word expansion module. One complication is that backslash quotes have to be recognized, and sometimes stripped out, but their effect is long-lasting.

Consider this.

echo \*

Seemingly simple ... but when this is split up into the many types of expansion, wildcard expansion actually occurs last, AFTER removing backslashes and then doing variable substitution.
So this can work:

a=*
echo $a

Variable substitution has to occur first ... which means backslash recognition has to occur before that ... so THIS can work:

a=*
echo \$a
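To see this ordering in action, here is a small sketch for a POSIX-style shell (sh or bash, not xmlsh specifically, which may differ in details). It runs in a scratch directory so the glob results are predictable. The function name `demo_expansion_order` and the file names `one` and `two` are made up for the example.

```shell
#!/bin/sh
# Sketch only: shows how backslash quoting, variable substitution and
# globbing interact in a POSIX-style shell. All names here are invented.
demo_expansion_order() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        touch one two    # two files for the glob to find

        a=*              # no globbing happens in an assignment
        echo \*          # backslash quotes the star:           *
        echo $a          # $a becomes *, which is then globbed: one two
        echo "$a"        # double quotes suppress globbing:     *
        echo \$a         # backslash quotes the dollar sign:    $a
    )
    rm -rf "$dir"
}

demo_expansion_order
```

The second and third `echo` lines carry the same variable value, yet only the unquoted one is globbed, which is why the expander has to remember how each character was quoted.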

It gets a lot harder than that. The difference between "foo\bar", "foo\\bar" and "foo\\\bar" is strange enough ... but add in '' and $ and then xml expressions and wildcards and it really gets strange. I think I'm going to have to shift over to a "color" model ... that is, during the expansion, keep track on a char-by-char basis of the quote "color" for each and every character individually. By color I mean \ vs " vs ' ... any given character could be colored by one or more of these attributes simultaneously, and it affects further processing ... even when the offending quote chars are removed.

#### 6 comments:

1. These last 3 weeks I've made several attempts at rewriting the quoting logic and failed miserably. In the end I settled for incremental improvements and added additional tests along the way. So far all the tests pass (don't they always!) but I'm still not happy. I will probably rewrite this module sometime. But for now ... c'est la vie.

2. So what are your test cases? It seems fairly simple, and I can't imagine the 'coloring' needs to go too deeply. To borrow from, let's say, Perl:

'"' -> "
"'" -> '
"\"" -> "
\' -> '
\" -> "
'\"' -> \" [single quoted strings do not have escape sequences expanded]

To get more complex:

'"\n"' -> "\n"
'"\\"' -> "\\"
"\\'" -> \'
'''''' ->
''''' -> unterminated literal. probably ends up looking like " quit exit :wq! ^C" ;)

Are you excited about ridding your environment of escape sequences and relying solely on XML entities?

3. Test cases for shells (xmlsh/sh/bsh/ksh) are a lot more complex than in most languages. Your above examples are pretty easy. The difficulty comes in mainly with variable expansion and wildcard "globbing". Those really screw things up. Especially wildcard globbing. This one is really tricky because quotes have to be removed AFTER variable expansion but before globbing, and globbing only occurs on unquoted values. I have to preserve where the quotes came from somehow in order to know when to expand "*".

a=*
echo $a # expands *
echo "$a" # does NOT expand *
echo *$a # expands like *'*'
echo *'*' # matches foo* but not foo

The quotes have to be removed BEFORE globbing but the globbing has to know what was quoted and what was not.

Perl doesn't have this problem. Shells do ... it's ugly.

It's hard to explain well until you really dig through the variations.

You can see the test cases that work in the tests/core/core_quotes.xsh script.

4. If you're REALLY interested, take a look at Expander.java and let me know what you think.

If you can improve or simplify it and it still passes all the test cases, please do send it and I'll check it out!

If you just have horrible things to say about how ugly the code is, post that as well.

5. I haven't forgotten about this, but I've got quite a bit to install on this new machine it seems... but it is open in Eclipse now. I forgot how fun parsing stuff like this is!

I'll install a build of it on my work laptop tomorrow and get to playing with it throughout the week as I have time. Looking at it, I might have a couple ideas, but they are certainly about 1/16th baked at this point.

6. Found and fixed yet another quoting bug.

"$*" was expanding as "$@" should and adding an extra empty argument.
e.g.

set a b c
cmd "$*"

was passing
"a" "b" "c" ""