DM3 Notes by Norman Gray
Data Modelling, session 3. Friday 28 May

- Arnold Rots, STC
  - STC metadata appears in four contexts: STCResourceProfile, SearchLocation, CatalogEntryLocation, ObsDataLocation.
  - Three components: CoordinateSystem, CoordinateArea (volume), Coordinates (point). Need pointers to a particular coordinate system.
  - Coordinates contain 6 components with a common unit: Name (UCD?), Value, Error, Resolution, Size (depends on context: means size of `Area' in that context; size of a typical field in a Resource record; concedes that this needs more working out), Sampling (pixel) size. All may be values or IDREFs.
  - Working draft currently at 0.51, with description and UML representation.
  - New schemas, version 2.0
    - Generalised CoordSys and Coords; units must be homogeneous (ie, can't specify positions in degrees and sizes in arcsecs)
    - Examples. Contains any number of Frames of any kind, but AstroCoordSystem contains a specific set. UML diagrams.
    - Region: A region may be one of several shapes (circle, ellipse, polygon, sector, convex, convex hull), or the result of negation, intersection, or union on other regions.
    - The XML fragment shown is intended to be returned from a Resource, so it defines a sort of schema for the information which would be returned by that resource.
    - Coverage: shows what's in an archive -- information like bounds and coverage fractions
    - Other examples seem to be focused on providing information about a resource: ``if you get coordinates from this resource they will have this accuracy''.
  - Discussion about Resource vs VOTable
    - Arnold sees this STC schema being imported into VOTable
    - François Ochsenbein is nervous about that -- STC is complex, and possibly too complex for VOTable's needs
    - Ray Plante emphasised that the Resource schema needs this now, so it shouldn't be delayed
    - Norman Gray mentioned that AST has done work on reading and writing STC, and mentioned David Berry's Mapping proposal to the DM group.
    - Some consensus that Coordinate description and Mappings need to be kept orthogonal.
  - Discussion
    - François Bonnarel: profiling for DAL?
    - François said he had the same response as for the previous topic
    - Ray mentioned that the complexity added difficulty and cost for the data provider. Simple cases could be denoted by a URI.
    - Arnold: bad to take the attitude that ``all the data from this repository has the same properties, so we don't need to be explicit''. Returned to the point that a user interface might do a lot of the work of adding boilerplate and defaults, but that this should not be contained in the STC schema.
    - Anita: might be hard to get data providers to produce correct and full details for inclusion in these Resource descriptions. Arnold/Tony Linde: general agreement that we need to provide user interfaces which help.
    - Doug Mink: Analogy with Mark Calabretta's FITS-WCS library. He and Starlink both added layers round this.
    - Jonathan summary: there are clearly issues for a future version of the STC schema, but the current version should be pushed forward. We need software to make this usable, and a number of examples of common cases (VOTable). Need implementors. AST is doing that; Ray says they'll do something with the Registry. Asked Arnold to incorporate comments, and make it an IVOA Working Draft for general comment.
- Gerard Lemson
  - Goal was to find a place for seemingly disparate sub-areas in DM: quantity, observation, simulation, phenomenology, proposal.
  - Model has a different purpose to other models discussed here. Intended as an analysis model, as part of a structured software methodology. Aim to understand the ``universe of discourse'', or problem domain.
  - Benefits: patterns for design and implementation models, formalise comparability, use it as a type of Esperanto (cut down the n x n problem, which scales poorly).
  - Similar to an ontology, but not fully -- a step towards it.
  - Methodology
    - O-O analysis
    - Universe of discourse: list the concepts and relations. Chose UML (could have chosen other languages, but not XML). Normalised, explicit, and able to generate patterns.
    - Talk about ``the work that astronomers, astrophysicists and support scientists do, and the results they have obtained'', and do this in a rigorous manner.
    - Long list of concepts -- list of nouns
    - Analysis pattern: Observation and Measurement (Martin Fowler, Analysis Patterns, AW 1997). This text has an example of a Measurement/quantity, which is broadly similar to the WD/Brian version. Showed NIST definitions of units/quantities/measurement.
  - Domain model: packages
    - Phenomenology: things are comparable in principle if they correspond to the same phenomenon; thus magnitudes and fluxes are the same phenomenon -- and thus comparable -- but need conversion before they're directly comparable
    - Standard
    - Types
    - Values: what experiments produce
    - Experiments
    - Protocols: descriptions of procedures, such as a simulator code (implements Protocols::Protocol); a Protocols::Objective might be ``create a source catalogue''.
    - Products: physical artefacts, such as files, which actually store the results
    - Physics
    - [These are potential classes, but he is not necessarily advocating mapping these directly to actual classes]
  - Binding -- what you'd do with this
    - XMLSchema: formalise bindings as XSLT working on the XMI representation of UML diagrams. Or Java + Hibernate mapping files + AXIS. Wants to create a metadata repository. Can use Java2WSDL, and Hibernate takes care of persisting classes to a database.
  - This is indeed a complex model: true, but the world is complex, and we can simplify this with something analogous to the `view' in the RDB world -- a prepackaged SQL query. You can do something similar with models.
  - Questions
    - Jonathan: What's the path forward?
      Make XSLT to translate to and from this domain model?
      - Gerard: has some plans to do this, as proof of concept. Wants to go in the ontology direction.
    - Brian and Tony Linde agree that ontologies are useful. Aim for OWL. General discussion about RDF.
    - Ray: Good to use XSL for transformations. How do you handle information loss when you go between models?
      - It shows up -- at that point you know what further information to look for
    - Tamas: impressed; this agrees with his experience of the Sloan
    - François: in translation from a specific data model to this one using XSLT, what do you do with details that you can't map to this?
      - Lots of details aren't filled in. For example Provenance doesn't have a simple counterpart in Gerard's model
- Martin Hill: O-O design in 5 minutes
  - Gather requirements, do the design (components, interactions, relationships), factor out, test, reiterate forever. Example of designing classes for Apple, Orange, deduce superclass Fruit. Similarly model Passband.
  - Tricks: separate interface from submodel, model one thing at a time, move context-dependent data to the context, ignore representations, test with examples, use design patterns (composite, listener/observer, streams), keep diagrams simple, use activity diagrams.
  - Peter Lamb: are passbands enough for what we want to model?
    - This was just an example
  - David Giaretta: what do you see as the goal of modelling?
    - Want to be able to share data with a common vocabulary
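
The ``factor out'' step in Martin Hill's Apple/Orange example above can be sketched in a few lines of Python. This is only an illustration, not code shown in the session: the class names Fruit, Apple and Orange come from the notes, but the mass_g attribute and the needs_peeling and describe methods are hypothetical, invented here to give the classes something to share and something to differ on.

```python
from abc import ABC, abstractmethod


class Fruit(ABC):
    """Superclass deduced after noticing what Apple and Orange share.

    mass_g, needs_peeling and describe are hypothetical names; the
    notes mention only the three class names.
    """

    def __init__(self, mass_g: float) -> None:
        # Data common to every fruit is factored out into the superclass.
        self.mass_g = mass_g

    @abstractmethod
    def needs_peeling(self) -> bool:
        """Behaviour that differs between subclasses stays abstract."""

    def describe(self) -> str:
        # Shared behaviour is written once, in terms of the abstract method.
        prep = "peel first" if self.needs_peeling() else "eat as is"
        return f"{type(self).__name__}: {self.mass_g:g} g, {prep}"


class Apple(Fruit):
    def needs_peeling(self) -> bool:
        return False


class Orange(Fruit):
    def needs_peeling(self) -> bool:
        return True


if __name__ == "__main__":
    # "Test with examples": exercise the design with concrete instances.
    for fruit in (Apple(150), Orange(200)):
        print(fruit.describe())
```

The same move would apply to Passband: model the concrete cases first, then lift whatever they have in common into a shared superclass.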