cgp at star.le.ac.uk
Tue Jan 20 08:24:15 PST 2004
While we have the attention of the XML experts, can I raise a couple of
other questions that have been bothering me recently?
1. An XML document is a *file*, right? Its natural home is on disc. But
as far as I can see, any program that ingests one has to parse the whole
thing, and thus read the entire contents into memory - is this correct?
The FITS format was originally a tape format, as it happens, but it now
seems like its natural home is also on disc. All FITS libraries that I
have used allow you to parse just the headers of each HDU, and then decide
whether you want to read all or part or even none of the data unit.
Since we want our system to cope with data files (whatever format they
use) up to sizes that may be larger than available memory, this selective
parsing seems a very good feature. Is there any equivalent in the XML
world, apart from the expedient (or bodge, as it seems to me) of splitting
the metadata and data into separate files?
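For what it's worth, here is a rough sketch of the kind of incremental
parsing I have in mind, using the pull-style iterparse from Python's
standard ElementTree library (the element names are invented for
illustration - a real VOTable would look different):

```python
import io
import xml.etree.ElementTree as ET

# A toy document standing in for a large on-disc file.
doc = b"""<VOTABLE>
  <DESCRIPTION>metadata we do want</DESCRIPTION>
  <TABLE>
    <ROW>1</ROW>
    <ROW>2</ROW>
  </TABLE>
</VOTABLE>"""

description = None
rows = 0
# iterparse hands back each element as its end tag arrives, so the
# whole tree never has to sit in memory at once.
for event, elem in ET.iterparse(io.BytesIO(doc), events=("end",)):
    if elem.tag == "DESCRIPTION":
        description = elem.text
    elif elem.tag == "ROW":
        rows += 1
    elem.clear()  # discard the element's subtree once we have seen it
```

This reads the metadata and merely counts the data rows, without ever
building the full tree - which is roughly the FITS-style "parse the
headers, skim the data" behaviour I am after.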
2. I have seen a couple of complaints that when you wrap an XML document
in SOAP, the SOAP processor parses the entire file, which causes it to
run slowly, run out of memory, or both. My analogy is with internet
mail: it would be ridiculous for every computer and router that handles
a mail message to interpret all of its contents, when all that is really
needed is to interpret the *envelope* of the message and pass the rest
along unread. Our current solution for SOAP-wrapped XML seems to be to
obfuscate the payload (e.g. with gzip). Again, this seems just another
bodge. Is there really no better solution?
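To make the bodge concrete, the gzip trick amounts to something like the
following (the envelope element names here are invented, not real SOAP):

```python
import base64
import gzip

payload = b"<VOTABLE>" + b"<ROW/>" * 1000 + b"</VOTABLE>"

# Compress and base64-encode the XML so that any intermediate SOAP
# processor sees an opaque string rather than a parseable element tree.
wrapped = base64.b64encode(gzip.compress(payload)).decode("ascii")
envelope = "<Envelope><Body><Payload>%s</Payload></Body></Envelope>" % wrapped

# The final recipient reverses the two steps to recover the document.
recovered = gzip.decompress(base64.b64decode(wrapped))
```

The payload survives the round trip, but only because we have hidden it
from the very tools that are supposed to understand XML - which is
exactly why it feels like a bodge.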
Dept of Physics & Astronomy,
University of Leicester, Tel +44 116 252 3551
Leicester, LE1 7RH, U.K. Fax +44 116 252 3311