Friday, March 27, 2009

Idea: a GUI tool for discovering class relationships

I work a lot with libraries of code that I did not write myself.
I run into the problem below with my own code too, but more often with sets of "foreign" code.
The problem is that I can end up with an 'impedance mismatch' between types and need to figure out how to match them up. Say I have class A but need a class B.
I'm hoping there is some way to convert A to B. A real recent example: I had an XdmNode and wanted an XMLStreamReader. In a simple case I would look in A for a method that returns B, or look in B for a constructor that takes A. But in the real world it's rarely that simple: there are frequently dependency issues and also chains of transformations. E.g., converting A to B might require a C, or might require going through an intermediary path like A->D->B.
In the real example above, the path ended up being:

XdmValue -> ValueRepresentation -> { Value | NodeInfo } -> SequenceIterator -> PullFromIterator -> PullToStax -> XMLStreamReader

That's 6 levels of transformation! I have been working in that codebase for several years and it still took some hints from the author, and then lots of luck, to figure it out.

This got me thinking... wouldn't it be nice if there were a tool for this task?
There are lots of Java class browsers (I use the Package Explorer in Eclipse),
but none that I know of do this job: given 2 classes, show all the paths between them as well as all the dependencies along those paths.
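The core search such a tool would do can be sketched with plain Java reflection. This is a minimal sketch under stated assumptions, not a design for the real tool: the candidate classes are supplied up front as a "universe" (a real tool would scan the classpath), and three kinds of edge count as a conversion: an upcast, a public constructor of the target that accepts the source type, and a public instance method on the source that returns the target. Any other parameters needed along the way are reported as dependencies. It uses breadth-first search, so it finds one shortest path rather than all paths; listing every path would need a depth-limited search instead. The names `ConversionPathFinder`, `Step` and `shortestPath` are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ConversionPathFinder {

    /** One conversion step, plus any extra argument types it needs (its "dependencies"). */
    public record Step(Class<?> from, Class<?> to, String via, List<Class<?>> needs) {}

    /** All single-step conversions out of 'type' that land on another class in 'universe'. */
    static List<Step> edgesFrom(Class<?> type, Set<Class<?>> universe) {
        List<Step> edges = new ArrayList<>();
        for (Class<?> target : universe) {
            if (target == type) continue;
            // 1. Upcast: 'type' is already a subtype of 'target'.
            if (target.isAssignableFrom(type)) {
                edges.add(new Step(type, target, "upcast", List.of()));
            }
            // 2. A public constructor of 'target' with a parameter that accepts 'type'.
            for (Constructor<?> ctor : target.getConstructors()) {
                List<Class<?>> params = new ArrayList<>(Arrays.asList(ctor.getParameterTypes()));
                for (int i = 0; i < params.size(); i++) {
                    if (params.get(i).isAssignableFrom(type)) {
                        params.remove(i); // whatever is left over is a dependency
                        edges.add(new Step(type, target, "new " + target.getSimpleName() + "(...)", params));
                        break;
                    }
                }
            }
        }
        // 3. A public instance method on 'type' whose return type is in the universe.
        for (Method m : type.getMethods()) {
            Class<?> ret = m.getReturnType();
            if (!Modifier.isStatic(m.getModifiers()) && ret != type && universe.contains(ret)) {
                edges.add(new Step(type, ret, type.getSimpleName() + "." + m.getName() + "(...)",
                        Arrays.asList(m.getParameterTypes())));
            }
        }
        return edges;
    }

    /** Breadth-first search: a shortest chain of steps from 'from' to 'to', or an empty list. */
    public static List<Step> shortestPath(Class<?> from, Class<?> to, Set<Class<?>> universe) {
        Map<Class<?>, Step> reachedBy = new HashMap<>(); // how each class was first reached
        Deque<Class<?>> queue = new ArrayDeque<>();
        Set<Class<?>> seen = new HashSet<>();
        queue.add(from);
        seen.add(from);
        while (!queue.isEmpty()) {
            Class<?> current = queue.removeFirst();
            if (current == to) break;
            for (Step step : edgesFrom(current, universe)) {
                if (seen.add(step.to())) {
                    reachedBy.put(step.to(), step);
                    queue.addLast(step.to());
                }
            }
        }
        // Walk back from 'to' to 'from' to rebuild the path in order.
        LinkedList<Step> path = new LinkedList<>();
        for (Class<?> c = to; reachedBy.containsKey(c); c = reachedBy.get(c).from()) {
            path.addFirst(reachedBy.get(c));
        }
        return path;
    }

    public static void main(String[] args) {
        Set<Class<?>> universe = Set.of(Integer.class, String.class, CharSequence.class, StringBuilder.class);
        for (Step s : shortestPath(Integer.class, StringBuilder.class, universe)) {
            System.out.println(s.from().getSimpleName() + " -> " + s.to().getSimpleName()
                    + " via " + s.via() + (s.needs().isEmpty() ? "" : " needs " + s.needs()));
        }
    }
}
```

With only JDK classes in the universe, the search goes from Integer to String via `toString()` and then to StringBuilder via one of its constructors. Pointed at a universe of Saxon classes, the same search would at least find routes that exist; as the next paragraph notes, it cannot tell whether a route is appropriate.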

Now, to be really useful, the routes need to be appropriate, not merely existent.
For example, in the case above I first found, and used, a different path from XdmValue to XMLStreamReader that turned out to use a class which was not entirely functional (it lost location information). I can't imagine a tool knowing this short of AI. Which leads to a whole different set of ideas: an AI that can understand software. Something you could ask "What's the best way to create an XMLStreamReader from an XdmNode in this context?" Or maybe "Will this code actually work right?", or "Please generate unit tests for this set of code, run them, and show me what broke and why".

This seems an awful lot like the "theorem prover" software that was worked on aggressively in the '70s and '80s, but which I haven't heard much about since.

Sunday, March 8, 2009

Text serialization

I'm working on adding more internal interfaces to the IPort hierarchy, which is used by OutputPort and InputPort (and hence IOEnvironment).
I'm adding a StAX interface in preparation for a true binary, event-based pipe.
As I add more interface options, the problems of both interop and serialization get worse. I'll leave interop out for now.
But for serialization... I'd like a common way of specifying text serialization options. Right now it's pretty much left up to the individual commands, and that's just not right.
For example, xpwd, xls, xcat, xquery etc. may use different internal interfaces (SAX, Saxon, DOM, StAX etc.). If their output ends up in a text file or stdout (text), the serialization may differ: they may or may not omit the XML declaration, use indentation, do namespace fixups, etc. It's all pretty hog-wild.
I've tried to standardize on a common serialization format, but it's not manageable in the long run. Users really need a way to set the serialization options explicitly, both per command and globally.

When I first started xmlsh I imagined some kind of common "output filter",
either explicitly as a pipe like
command | xformat -options

then maybe implicitly as a common filter on the output of all commands.
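The "common output filter" idea can be approximated with the JAXP identity transformer, whose standard `OutputKeys` (`indent`, `omit-xml-declaration`, `method`, `encoding`, ...) mirror the XSLT/XQuery serialization parameters. A minimal sketch, assuming the command's output is already available as XML text; the `OutputFilter` class and `reserialize` method are hypothetical, not part of xmlsh. Note that it re-parses the whole output, which is the kind of cost worth avoiding when no options are set.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputFilter {

    /** Re-serializes XML text, applying the given serialization options (keys from OutputKeys). */
    public static String reserialize(String xml, Properties options) throws TransformerException {
        // newTransformer() with no stylesheet is the identity transform: it copies input to output.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperties(options); // rejects keys it does not recognize
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Properties options = new Properties();
        options.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        options.setProperty(OutputKeys.INDENT, "yes");
        System.out.println(reserialize("<?xml version=\"1.0\"?><a><b/></a>", options));
    }
}
```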
Lately I've been thinking about building this in deeper, as a set of properties inherited via the environment variables: a "serialization" property set.
XProc has a well-defined set of these parameters, borrowed from XQuery.
There's a lot to these. The problem is related to the multiple interfaces.
Not all interfaces and APIs support all serialization parameters.
A simple example: StAX doesn't have a property to suppress the XML declaration. You can fake it by not calling writeStartDocument(), but there's no global way to set it. Similarly, StAX doesn't have an "indentation" property, although you can fake it with a filter. SAX, DOM and Saxon all support different sets of properties (Saxon's being the richest, or at least the most closely aligned with XQuery and XSLT).
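To illustrate the StAX workaround: the standard `XMLStreamWriter` API has no "omit declaration" switch, so the writing code itself has to decide whether to call `writeStartDocument()` at all. A minimal sketch using the JDK's default StAX implementation; the `StaxDeclaration` class and its `omitXmlDeclaration` flag are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxDeclaration {

    /** Writes a tiny document, emitting the XML declaration only when asked to. */
    static String write(boolean omitXmlDeclaration) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        if (!omitXmlDeclaration) {
            w.writeStartDocument(); // the only place StAX emits <?xml ...?>
        }
        w.writeStartElement("a");
        w.writeEmptyElement("b");
        w.writeEndElement();
        w.writeEndDocument(); // closes any open elements; writes no declaration itself
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write(false));
        System.out.println(write(true));
    }
}
```

The catch, as described above, is that this choice lives in every piece of code that drives the writer rather than in one global setting.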

So how to implement this ?
At the user level I think an XSERIALIZATION variable makes sense; it can be inherited and overridden for child processes/commands.
But internally... I have yet to figure out a way to apply this property consistently. I may have to filter all output through a serialization pipe/stream, which adds unnecessary overhead in the cases where it's not needed.
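One way to get the inherit-and-override behavior is the defaults chain in `java.util.Properties`: each child scope falls back to its parent for keys it doesn't set itself. A minimal sketch, assuming XSERIALIZATION has already been parsed into key/value pairs named after the XSLT/XQuery serialization parameters (`indent`, `omit-xml-declaration`, ...); `SerializationScope` and `needsFilter()` are hypothetical names. `needsFilter()` shows one way to skip the extra serialization stage entirely when nothing is set anywhere in the chain.

```java
import java.util.Properties;

public class SerializationScope {
    private final Properties options;

    /** Root scope holding the global settings (hypothetical stand-in for a parsed XSERIALIZATION). */
    public SerializationScope(Properties globals) {
        this.options = new Properties(globals); // copy-free: globals act as defaults
    }

    private SerializationScope(SerializationScope parent) {
        this.options = new Properties(parent.options); // lookups fall through to the parent
    }

    /** A scope for a child process/command: inherits everything, may override locally. */
    public SerializationScope child() {
        return new SerializationScope(this);
    }

    /** Overrides a setting for this scope and its children only. */
    public SerializationScope set(String key, String value) {
        options.setProperty(key, value);
        return this;
    }

    /** Searches this scope first, then each parent in turn; null if unset everywhere. */
    public String get(String key) {
        return options.getProperty(key);
    }

    /** True if any option is set in this scope or an ancestor (stringPropertyNames includes defaults). */
    public boolean needsFilter() {
        return !options.stringPropertyNames().isEmpty();
    }

    public static void main(String[] args) {
        Properties globals = new Properties();
        globals.setProperty("indent", "yes");                        // set globally for the shell
        SerializationScope shell = new SerializationScope(globals);
        SerializationScope cmd = shell.child().set("indent", "no");  // overridden for one command
        System.out.println(cmd.get("indent"));                       // no
        System.out.println(shell.get("indent"));                     // yes (parent unaffected)
        System.out.println(shell.child().get("indent"));             // yes (inherited)
    }
}
```

Because `Properties` consults its defaults lazily, a child scope costs almost nothing until it overrides something.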

Ideas very welcome !