Wednesday, April 15, 2009

XML Event Pipeline - Initial Results

At long last I got the code into shape so that I could implement "native" piping in xmlsh, using a binary event queue instead of serializing to text. This has been my goal from the beginning, but it's been more difficult than I thought. A primary complication is that not all commands input or output XML. Unlike, say, XProc, which can only have XML in its pipes, xmlsh must also work with text streams.
For example, "echo foo | cat" should work just as well as "xcat foo | xcat".
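
To make the requirement concrete, here is a minimal Java sketch of one pipe type that can carry both kinds of data. It is not xmlsh's actual code. The class and method names (MixedPipe, writeText, writeEvent, drain) are hypothetical, and it assumes a simple bounded BlockingQueue of Objects where each item is either a String chunk or a javax.xml.stream XMLEvent, with a sentinel object marking end of stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch, not xmlsh's real classes: an in-memory pipe whose items
// are either plain text chunks (String) or StAX XMLEvent objects.
public class MixedPipe {
    // Sentinel placed in the queue by the writer to signal end of stream.
    private static final Object EOF = new Object();

    private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024);

    public void writeText(String chunk) throws InterruptedException {
        queue.put(chunk);
    }

    public void writeEvent(XMLEvent event) throws InterruptedException {
        queue.put(event);
    }

    public void close() throws InterruptedException {
        queue.put(EOF);
    }

    /** Blocks for the next String or XMLEvent; returns null at end of stream. */
    public Object read() throws InterruptedException {
        Object item = queue.take();
        return item == EOF ? null : item;
    }

    // Drains the pipe, tagging each item by kind (XML items by StAX event type code).
    public static List<String> drain(MixedPipe pipe) throws InterruptedException {
        List<String> out = new ArrayList<>();
        for (Object item = pipe.read(); item != null; item = pipe.read()) {
            if (item instanceof XMLEvent) {
                out.add("event:" + ((XMLEvent) item).getEventType());
            } else {
                out.add("text:" + item);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        MixedPipe pipe = new MixedPipe();
        XMLEventFactory f = XMLEventFactory.newInstance();
        // Producer thread: a text chunk followed by a tiny element sent as events.
        Thread producer = new Thread(() -> {
            try {
                pipe.writeText("foo");
                pipe.writeEvent(f.createStartElement("", "", "foo"));
                pipe.writeEvent(f.createEndElement("", "", "foo"));
                pipe.close();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        System.out.println(drain(pipe)); // consumer runs on the main thread
        producer.join();
    }
}
```

In this sketch the consumer has to check the type of every item it takes from the queue, which is one cost of letting text and XML share the same pipe.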

I finally got it to work: truly event-driven pipes that can stream both text and XML events (StAX events). All the tests finally passed and I was feeling really good, until I ran the performance tests. Ugh. The event pipes are about 2x slower than text pipes, even though the text pipes include the overhead of serializing and parsing the text format! This is totally shocking to me, as I had always presumed that the majority of the overhead would be in serializing and parsing XML text. But nope: it turns out that creating the event objects to stick in the pipe is less efficient than serializing to text and then parsing the text on the other end. Certainly this is a consequence of a particular implementation, not a general statement. But still, it's not a result I expected, and performance analysis doesn't turn up any huge smoking guns.

The biggest issue seems to be in StAX event creation. I didn't fully realize until this that the XMLEvent model is asymmetric. That is, when writing events you don't write the same events as you receive. You write out StartElement, Namespace, and Attribute events, but when you read events they are consolidated into one StartElement event. The overhead during writing of consolidating these events turns out to be much bigger than serializing them to text and re-parsing the text.
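
The asymmetry is easy to see with the standard javax.xml.stream API that ships with the JDK. This is a small demonstration, not xmlsh code and not a benchmark; the class name and the element, prefix, namespace URI and attribute (p:doc, urn:example, id) are arbitrary choices. It writes one element as four separate events, then parses the resulting text and lists the element-level events that come back.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Demo only: names (p, urn:example, doc, id) are arbitrary.
public class StaxAsymmetry {
    // Write side: one element is emitted as four separate events
    // (StartElement, Namespace, Attribute, EndElement).
    static String writeSeparateEvents() throws Exception {
        XMLEventFactory f = XMLEventFactory.newInstance();
        StringWriter out = new StringWriter();
        XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        w.add(f.createStartElement("p", "urn:example", "doc"));
        w.add(f.createNamespace("p", "urn:example"));
        w.add(f.createAttribute("id", "1"));
        w.add(f.createEndElement("p", "urn:example", "doc"));
        w.flush();
        w.close();
        return out.toString();
    }

    static int count(Iterator<?> it) {
        int n = 0;
        for (; it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }

    // Read side: the parser folds namespace declarations and attributes into the
    // StartElement, so no standalone Namespace or Attribute events are reported.
    static List<String> readEventKinds(String xml) throws Exception {
        XMLEventReader r = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> kinds = new ArrayList<>();
        while (r.hasNext()) {
            XMLEvent e = r.nextEvent();
            switch (e.getEventType()) {
                case XMLEvent.START_ELEMENT: {
                    StartElement se = e.asStartElement();
                    kinds.add("StartElement(ns=" + count(se.getNamespaces())
                            + ", attrs=" + count(se.getAttributes()) + ")");
                    break;
                }
                case XMLEvent.END_ELEMENT: kinds.add("EndElement"); break;
                case XMLEvent.NAMESPACE:   kinds.add("Namespace"); break;
                case XMLEvent.ATTRIBUTE:   kinds.add("Attribute"); break;
                default: break; // ignore StartDocument, EndDocument, etc.
            }
        }
        r.close();
        return kinds;
    }

    public static void main(String[] args) throws Exception {
        String xml = writeSeparateEvents();
        System.out.println(xml);
        System.out.println(readEventKinds(xml));
    }
}
```

On the read side this should report a single StartElement carrying one namespace and one attribute, followed by an EndElement, even though four events went in on the write side.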

I'm going to put this aside for a while whilst I contemplate the error of my thinking. For now, if you want to experiment with the XML pipeline, the code is checked in and you can enable it by setting the "XPIPE" environment variable. I will probably change that in the future.
