From roy at cacr.caltech.edu Thu Oct 2 18:09:45 2003
From: roy at cacr.caltech.edu (Roy Williams)
Date: Thu, 2 Oct 2003 18:09:45 -0700
Subject: UCD document 1.9.4 for IVOA Interop
Message-ID: <053601c3894b$08d03080$6b91d783@cacr.caltech.edu>

Welcome back to the UCD Forum at the IVOA.

This message is to present the latest draft (1.9.4) of a paper that the UCD Steering Committee has been working on over the last few weeks (especially Derriere, Mann, McDowell, Ochsenbein, Osuna, and myself), and to request that you read it in preparation for the UCD session of the IVOA Interop meeting two weeks from today.

This is the time to make your opinions known. Please try to generate edits to the document in the form of replacement paragraphs, additional paragraphs or sentences. This will make your view much more likely to be incorporated than if you send comments or opinions.

After your revisions, we would like the paper to be labelled version 2.0, placed into the IVOA standards process as a definition of the new UCD structure (known as UCD2). This new structure has much in common with what was discussed at the last Interop at Cambridge, but it has significant changes, to bring the UCD in line with semantic web language. The idea from Cambridge (base+specifiers) has been replaced by (property+concepts).

The document is here:
http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.4.pdf

The UCD Twiki is at:
http://www.ivoa.net/twiki/bin/view/IVOA/IvoaUCD

Thank You
Roy Williams
Chair, UCD Steering Committee

--------
Caltech Center for Advanced Computing Research
roy at cacr.caltech.edu
626 395 3670

From rgm at roe.ac.uk Thu Oct 9 01:54:45 2003
From: rgm at roe.ac.uk (Bob Mann)
Date: Thu, 9 Oct 2003 09:54:45 +0100 (BST)
Subject: UCD document 1.9.4 for IVOA Interop
In-Reply-To: <053601c3894b$08d03080$6b91d783@cacr.caltech.edu>
Message-ID:

Hi folks,

I have only a few comments on v1.9.4 - I think it's a great improvement on the previous version, so Roy and Sebastien are to be congratulated.
In particular, Section 3.1 is very clear and lays out the path to UCD3 very well.

My first comment picks up something from the discussion leading up to this draft. It was said that we should emphasise that UCD2s are about discovering relevant data, not using them, as that requires more expressive power. If people still believe that, can a sentence to that effect be added in the introductory stuff?...it's in Section 4.1, of course, but I think that the restriction is important enough that it should come earlier, too.

My next comment is just to check something. It seems that "data" has been dropped from the list of basic elements, and "metadata" has arrived. Am I right to assume that "metadata" is now the branch under which I will find the UCDs for recording provenance information? - e.g. versions of data reduction software used, names of astrometric catalogues, etc. If so, that's fine - it's just that that has to go somewhere.

My final comment is my only real concern about this draft, which is whether the examples of Section 3.4 are too ambitious - especially the ones for the error on a right ascension of a galaxy. That seems to be introducing semantics by stealth, and without enough support - i.e. there seems to me to be too much structure in the relationships between those three quantities for UCD2. Maybe I missed it, but is there any semantic distinction to be made between the second and third terms in

stat.error, pos.eq.ra, src.galaxy ?

The first term relates to the second, which relates to the third, so if I swap the order of the second and third the meaning is lost.

On the other hand, I'm not sure that the example for photometric colour is good enough, since I don't see how to specify a generic flux ratio between two random passbands - I don't think it's good enough to enumerate the standard optical colours, since, when we federate multiwavelength datasets, I may well be interested in the ratio of a hard X-ray band flux and a K band magnitude, say.
cheers
Bob

From roy at cacr.caltech.edu Wed Oct 15 19:07:57 2003
From: roy at cacr.caltech.edu (Roy Williams)
Date: Wed, 15 Oct 2003 19:07:57 -0700
Subject: Version 1.9.9 of UCD definition
Message-ID: <02b701c3938a$b2c2b2e0$e5c54f82@Ropy>

There is a new version of the UCD specification document on the IVOA Twiki, labelled 1.9.9 in expectation of its promotion to version 2.0 at the IVOA interop meeting today in Strasbourg. The draft is available as pdf and MS word (links below).

Please try to read this before the meeting today.

Thank You
Roy Williams

http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9.pdf
http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9.doc

--------
Caltech Center for Advanced Computing Research
roy at cacr.caltech.edu
626 395 3670

From Thomas.A.McGlynn at nasa.gov Mon Oct 20 07:07:01 2003
From: Thomas.A.McGlynn at nasa.gov (Tom McGlynn)
Date: Mon, 20 Oct 2003 10:07:01 -0400
Subject: UCDS vs DM - Peacekeeping
In-Reply-To: <1066406745.3f90135915f9f@netmail.pipex.net>
References: <1066406745.3f90135915f9f@netmail.pipex.net>
Message-ID: <3F93EC05.8050108@nasa.gov>

Martin,

So far I've been a non-combatant in this war, but as you may know I had some reservations about the UCD2 framework presented at the interoperability meeting. Over the past two days I have been writing a revised proposal. This proposal includes a whole new discussion of the interaction between UCDs and table grouping constructs that I feel may play a major role in mediating among UCDs, data models and the DAL. I'll be publishing this within a day or two, but one key concept is the UCDTree, which shows the UCDs of a table in a structured way. It is my hope that methods of abstract data models can be translated into actions on data by straightforward analysis of the UCDTree.

I expect to send out the proposal today or tomorrow, after I have a chance to check that what I've written during the flight back makes sense when I read it in a non-sleep-deprived state.
Regards, Tom martin hill wrote: >I've been trying to sort out in my own head the differences between UCD2s and >data models. Particularly as one doesn't seem to work entirely without the >other. So donning my UN peacekeepers hat (which in the British case is a >tatty gardeners hat in a rather trendy camouflage, not a kevlar helmet): > >It strikes me that data models are about structure, and UCDs about describing >elements in that structure. > >Now it is probably possible that the way data models are defined could include >naming elements to define what they mean. I suggest that these should be (or >include as attributes) UCDs, at the very least so that we can compare data >items that have been formally modelled with those that haven't. > >For example, we can say (simplistically) that a coordinate is an RA, DEC, >error, and refers to some co-ordinate frame, and might look like this in XML: > > > > > 42 > 23 > > > 42 > 23 > > > > > 72.3 > 1 > > > etc > > >Now this is a horribly simple example (sorry about the mixed-up case >conventions) - how do people feel about it? It means that we should avoid >trying to describe structure/context in UCDs (which has the potential of >making them horribly long and complicated) and gives us an immediately useful >way of giving wider meaning to our data structures. > >It kind of implies that we then have a method for appending our UCDs up a data >model tree if we need to get more context for them. Thus we don't have to >have src.galaxy;phot.mag.ObscureOptical;error *as a defined UCD*. Instead >such strings are constructed out of individual UCDs as required by the program >that is investigating the data. > >It also means that UCDs don't have to be specific (which the UCD group are >avoiding cos it's a horrible task, small wonder) and yet I as a developer can >assemble specifics for doing cross comparisons. > >I've only had a pint and it still seems a good idea. It was a big pint though. 
> > > > From Thomas.A.McGlynn at nasa.gov Mon Oct 20 10:55:38 2003 From: Thomas.A.McGlynn at nasa.gov (Tom McGlynn) Date: Mon, 20 Oct 2003 13:55:38 -0400 Subject: UCDS vs DM - Peacekeeping In-Reply-To: <3F93EC05.8050108@nasa.gov> References: <1066406745.3f90135915f9f@netmail.pipex.net> <3F93EC05.8050108@nasa.gov> Message-ID: <3F94219A.4050901@nasa.gov> One thing I should have made clearer in the earlier message is that the revision I'm sending does not reflect any consensus on the correct approach. Rather it's a very detailed alternative to the approach presented by Roy at Strasbourg. It builds upon the same basic elements that Roy and Sebastien suggested but puts them together in -- for me at least -- a more coherent fashion. This approach does naturally lend itself to the concerns Martin raised, but it's certainly not a done deal. Tom Tom McGlynn wrote: > Over the > past two days I have been writing a revised proposal. > From mchill at dial.pipex.com Fri Oct 17 09:05:45 2003 From: mchill at dial.pipex.com (martin hill) Date: Fri, 17 Oct 2003 17:05:45 +0100 Subject: UCDS vs DM - Peacekeeping Message-ID: <1066406745.3f90135915f9f@netmail.pipex.net> I've been trying to sort out in my own head the differences between UCD2s and data models. Particularly as one doesn't seem to work entirely without the other. So donning my UN peacekeepers hat (which in the British case is a tatty gardeners hat in a rather trendy camouflage, not a kevlar helmet): It strikes me that data models are about structure, and UCDs about describing elements in that structure. Now it is probably possible that the way data models are defined could include naming elements to define what they mean. I suggest that these should be (or include as attributes) UCDs, at the very least so that we can compare data items that have been formally modelled with those that haven't. 
For example, we can say (simplistically) that a coordinate is an RA, DEC, error, and refers to some co-ordinate frame, and might look like this in XML: 42 23 42 23 72.3 1 etc Now this is a horribly simple example (sorry about the mixed-up case conventions) - how do people feel about it? It means that we should avoid trying to describe structure/context in UCDs (which has the potential of making them horribly long and complicated) and gives us an immediately useful way of giving wider meaning to our data structures. It kind of implies that we then have a method for appending our UCDs up a data model tree if we need to get more context for them. Thus we don't have to have src.galaxy;phot.mag.ObscureOptical;error *as a defined UCD*. Instead such strings are constructed out of individual UCDs as required by the program that is investigating the data. It also means that UCDs don't have to be specific (which the UCD group are avoiding cos it's a horrible task, small wonder) and yet I as a developer can assemble specifics for doing cross comparisons. I've only had a pint and it still seems a good idea. It was a big pint though. -- Martin Hill 07901 55 24 66 www.mchill.net From posuna at iso.vilspa.esa.es Tue Oct 21 01:50:24 2003 From: posuna at iso.vilspa.esa.es (posuna at iso.vilspa.esa.es) Date: Tue, 21 Oct 2003 10:50:24 +0200 Subject: UCDS and DM and "Catalogue" tables In-Reply-To: <3F94219A.4050901@nasa.gov> References: <1066406745.3f90135915f9f@netmail.pipex.net> <3F93EC05.8050108@nasa.gov> <3F94219A.4050901@nasa.gov> Message-ID: <20031021105024.0698fc07.posuna@iso.vilspa.esa.es> Dear all, the fact that I have been one of the authors of the UCD2 draft document (as a member of the UCD steering committee) made it very bizarre that I "voted" for the UCD1 option, together with other very few people. I think that if we use a Data Model to access metadata, then it is enough with the UCD1 that was created at the beginning. 
For me, there is no need to have more specificity, nor is there a need for matching functions and other more complex syntactical operations to be performed with the UCDs. For me, all that should be done through the Data Model and/or proper VOQL language.

However, I do appreciate that many providers do have their data in the form of catalogues (what I normally call 2D tables, or X-Y tables, as catalogue, for me, can be several different things). I understand that people having data in X-Y tables do NOT want to hear about data models: they only want to be able to perform operations with the columns of their tables.

It is in that context (X-Y table handling) that I believe the need for an overall complex UCD structure appears, as any comparison operation (or more complex ones such as addition, subtraction, ...) is something they want to do by comparing (or adding/subtracting/...) the columns directly.

As UCD2 is just adding more capabilities to UCD1 without chopping off any of the existing ones, and as we ("Data Model" oriented ones) can always use the UCDs in the limited way we want (i.e., just to describe the metadata we give back so that they are "Universally" understandable), I see no reason to stop UCD2 from including more syntactical and operational capabilities: "Data-Model oriented" people will make use of very limited UCD capabilities (as they don't need more) and "X-Y Table" oriented people will make use of as many UCD "functionalities" as possible.

In summary, I'm OK with the new functionalities of the UCDs, but I guess I'll make very limited use of them when I use the Data Model to access my data, hence my vote for the UCD1 "paradigm".

Maybe the UCD document should reflect two main "Areas of Interest", something like:

- Simple UCD handling: Data Model access
- Complex UCD handling: Syntactical operations on X-Y (catalogue) table columns

I wait for your comments...

Cheers,
Pedro.
On Mon, 20 Oct 2003 13:55:38 -0400
Tom McGlynn wrote:

> One thing I should have made clearer in the earlier message is that the revision I'm sending does not reflect any consensus on the correct approach. Rather it's a very detailed alternative to the approach presented by Roy at Strasbourg. It builds upon the same basic elements that Roy and Sebastien suggested but puts them together in -- for me at least -- a more coherent fashion. This approach does naturally lend itself to the concerns Martin raised, but it's certainly not a done deal.
>
> Tom
>
> Tom McGlynn wrote:
>
> > Over the past two days I have been writing a revised proposal.

--
Pedro Osuna Alcalaya

SOFTWARE Development Group
XMM-Newton Science Archive
e-mail: Pedro.Osuna at esa.int
Tel + 34 91 8131314

European Space Agency
VILLAFRANCA Satellites Tracking Station
P.O. Box 50727
E-28080 Villafranca del Castillo
MADRID - SPAIN

From cgp at star.le.ac.uk Tue Oct 21 02:15:35 2003
From: cgp at star.le.ac.uk (Clive Page)
Date: Tue, 21 Oct 2003 10:15:35 +0100 (BST)
Subject: UCDS and DM and "Catalogue" tables
In-Reply-To: <20031021105024.0698fc07.posuna@iso.vilspa.esa.es>
Message-ID:

On Tue, 21 Oct 2003 posuna at iso.vilspa.esa.es wrote:

> I understand that people having data in X-Y tables do NOT want to hear about data models: they only want to be able to perform operations with the columns of their tables.

I'm not sure that's entirely true. I think the problem is that three partially overlapping groups of people have been trying over several years to make sense of astronomical data, so they can classify or organise it to make data access simpler and more uniform.

(1) Those devising UCDs, which started out exclusively for tables in VizieR. I think it is now generally agreed that some hierarchical scheme should replace the original flat namespace (though designed with components which split naturally into layers).

(2) Those devising Data Models.
The problem for those outside this effort is that there have been so many data models, different in structure and detail, but all apparently equally valid. I didn't manage to attend the DM session in Strasbourg, but am pleased to hear of serious convergence.

(3) Those devising query languages and data access routines, who need some DM or UCD scheme to do the job properly, except for the serious problem that a UCD does not lead to a unique column in some (many?) tables.

I thought that Tom's ideas at Strasbourg sounded good, and look forward to seeing his proposal in print.

If it's true that there is now a single agreed Data Model, does that not suggest that UCDs should be assigned on the same hierarchical basis?

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester, Tel +44 116 252 3551
Leicester, LE1 7RH, U.K. Fax +44 116 252 3311

From amsr at jb.man.ac.uk Tue Oct 21 04:08:57 2003
From: amsr at jb.man.ac.uk (Anita Richards)
Date: Tue, 21 Oct 2003 12:08:57 +0100 (BST)
Subject: Version 1.9.9 of UCD definition
Message-ID:

At the interoperability workshop following ADASS, there was some mention of contradictions between the spectral divisions in the UCD2 document and the groupings in RM v0.82. There was also debate about whether and where an mm or sub-mm band should be introduced in RMv0.82.

I think that consistency between UCD2 and the RM (whatever is the stable outcome for defining the Registry) would be very useful, but in fact the discrepancies are minor. Personally, I hope that we do not spend much time discussing the exact divisions, as we cannot avoid splitting the coverage of some present instruments, let alone future ones. However it might be sensible to be guided by the current/immediate future major observatories. Perhaps the responses to the SSA / spectral data model questionnaire might help.
These are my suggestions:

The boundaries of the spectral UCD2s are consistent with fitting into the RM categories with minor changes:

Some of the UCD2 frequency ranges at the high freq end of IR have become transposed (I think - it is indeed hard to think in 3 sets of units!)

I apologise for having created confusion over the new mm band in RMv0.82. The proposed range of ALMA is 30 - 900 GHz according to the web page. In fact, I understand that initially the lower limit is more like 86 GHz. Therefore I suggest that consistency can be achieved by having instead:
SUBMM 100 micron - 3 mm (100 - 3000 GHz)

The RMv0.82 x-ray regime goes from 0.12 - 120 keV; the UCD2 regime goes from 0.12 - 12 keV.
According to their web pages, CHANDRA covers 0.1 - 10 keV (I have also been told 0.12 - 12...) and XMM covers 0.15 - 15 keV; ROSAT was within this. The CRO covered > 30 keV and SWIFT, 15 - 150 keV.
Thus, to keep the 'decades', this seems more suitable:
XRAY 0.12 - 12 keV (as per UCD2)
GAMMARAY > 12 keV
but maybe a high energy astronomer can advise....

OVERLAPPING DATA

Suppose that I have a catalogue with data taken around 1 micron. I hope I am right in thinking that the Registry entry can contain both Optical and IR as values of the relevant spectral coverage keyword, and that this works OK in searches etc.

How will the UCDs be used? As I understand it, in order to use Vizier or the Aladin SED tool to search for e.g. radio observations between 1.3 and 1.7 GHz (radio L-band), the software would look for UCDs em.radio.750-1500MHz and em.radio.1.5-3GHz. However there may be catalogues giving radio observations or flux densities between 1.4 and 1.7 GHz. Such a catalogue might or might not give the exact observing frequency, but for many purposes the general flux density or image would be useful. As I understand it, this would simply have the UCD em.radio and would not be found.
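To make the lookup just described concrete, here is a rough Python sketch. It is an illustration only: the band table holds just the two neighbouring bands named above, and the function names and example column names are hypothetical, not taken from the UCD drafts or from VizieR/Aladin.

```python
# Hypothetical band table: only the two bands named in the message above,
# with edges in GHz. A real UCD2 band list would have many more entries.
RADIO_BANDS = {
    "em.radio.750-1500MHz": (0.75, 1.5),
    "em.radio.1.5-3GHz": (1.5, 3.0),
}


def band_ucds_for_range(lo_ghz, hi_ghz, bands=RADIO_BANDS):
    """Return the band UCDs whose interval overlaps [lo_ghz, hi_ghz]."""
    return sorted(
        ucd
        for ucd, (band_lo, band_hi) in bands.items()
        if band_lo < hi_ghz and lo_ghz < band_hi
    )


def exact_match(column_ucds, wanted):
    """Columns found when the search accepts only the fine-grained UCDs."""
    wanted = set(wanted)
    return sorted(name for name, ucd in column_ucds.items() if ucd in wanted)


# Made-up columns: one tagged with a fine band, one with the bare em.radio.
columns = {
    "S_1400": "em.radio.750-1500MHz",
    "S_Lband": "em.radio",
}

wanted = band_ucds_for_range(1.3, 1.7)  # the 1.3 - 1.7 GHz query
found = exact_match(columns, wanted)
print(wanted)
print(found)
```

In this sketch the query maps onto both fine-grained band UCDs, yet the column tagged only em.radio is never returned, which is the gap raised above.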
This is _not_ a plea to divide the ranges differently, but a reflection of the fact that we are no longer restricted to observing in well-defined filters. Broad-band receivers and optical fibres in the radio, ALMA, space observatories etc. mean that we observe at all frequencies seamlessly, and sensitivity to higher redshifts means that lines no longer fall even in a single regime.

Maybe there is a solution already. If not, the only satisfactory things I can see are to do one of the following:

* allow bracketing UCDs e.g. em.radio.0.75-3GHz
* allow two (or more) UCDs to a column, where these are adjacent (up to a sensible max)
* not use the fine divisions in UCDs, but treat frequency like position and treat every query as a 1D cone search, ie a linear segment search.

cheers
a

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Anita M. S. Richards, AVO Astronomer
MERLIN/VLBI National Facility, University of Manchester,
Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K.
tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).

From derriere at newb6.u-strasbg.fr Tue Oct 21 03:40:21 2003
From: derriere at newb6.u-strasbg.fr (Sebastien Derriere)
Date: Tue, 21 Oct 2003 12:40:21 +0200
Subject: UCDS vs DM - Peacekeeping
References: <1066406745.3f90135915f9f@netmail.pipex.net>
Message-ID: <3F950D15.AEDBC59C@astro.u-strasbg.fr>

martin hill wrote:
>
> It strikes me that data models are about structure, and UCDs about describing elements in that structure.
>
> Now it is probably possible that the way data models are defined could include naming elements to define what they mean. I suggest that these should be (or include as attributes) UCDs, at the very least so that we can compare data items that have been formally modelled with those that haven't.

This kind of work has been done in the case of the IDHA model: trying to find a relevant UCD for the different model elements.
But I don't think that we should impose every element of every data model to be associated to a UCD. In fact, the link between UCD and DM can exist in both directions: - there can be a 'ucd' attribute for elements of data model, to indicate the corresponding UCD - the 'utype' attribute in VOTable allows to give a link to some data model element In the first approach, people building the data model make the effort to associate UCD to their view. The second approach can be used when there are no ucd attribute in the DM, to link the description of a dataset to a DM. Of course, both are not exclusive: there can be a ucd attribute in the DM, AND a utype for a or that would point, hopefully, to the same element of the DM ! Sebastien. -- _______ / ~ /, Sebastien Derriere mailto:derriere at astro.u-strasbg.fr / ~~~~ // Observatoire de Strasbourg Phone +33 (0) 390 242 444 /______// 11, rue de l'universite Telefax +33 (0) 390 242 417 (______(/ F-67000 Strasbourg France From pfo at star.le.ac.uk Tue Oct 21 04:45:18 2003 From: pfo at star.le.ac.uk (Patricio F. Ortiz) Date: Tue, 21 Oct 2003 12:45:18 +0100 (BST) Subject: UCDS and DM In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Anita Richards wrote: > Please can I clarify a possible application, which may initially be done > in an ad-hoc way until data models are fully harnessed: > > Supposing we are harvesting a catalogue which consists of radio flux > densities measured at 22 GHz. This is stated in the Abstract or ReadMe or > maybe even the title, but there is no table in the data saying 'observing > frequency'. As I understand it, we could add a virtual column > em.radio.12-25GHz, in which all the entries are the same. Can this be > done as a default, so that the data are recognised at their proper > frequency? 
Anita, the way UCD1 would handle this is that the UCD associated to the flux will reflect the region of the radio spectrum, eg, PHOT_FLUX_RADIO_11-25GHz, therefore, your catalogue will be discovered by the presence of that UCD without having to add a virtual column describing the observed frequency range. This case was very common, and not only in radio. There are hundreds of columns with "magnitude" as explanation, and you find the waveband in either the readme or the table title. UCDs are supposed to reflect the quantity as well as possible, that's why even with UCD1 one can launch discovery queries of this type and discover catalogues which have no indication of that part of their content in other pieces of meta-data information. Cheers, Patricio --- Patricio F. Ortiz pfo at star.le.ac.uk AstroGrid project Department of Physics and Astronomy University of Leicester Tel: +44 (0)116 252 2015 LE1 7RH, UK From cgp at star.le.ac.uk Tue Oct 21 04:27:48 2003 From: cgp at star.le.ac.uk (Clive Page) Date: Tue, 21 Oct 2003 12:27:48 +0100 (BST) Subject: Version 1.9.9 of UCD definition In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Anita Richards wrote: > Thus, to keep the 'decades', this seems more suitable: > XRAY 0.12 - 12 keV (as per UCD2) > GAMMARAY > 12 keV > but maybe a high energy astronomer can advise.... I think that boundary between the two is reasonable: some past instruments called 'X-ray' have some coverage up to maybe 15 keV, but most modern telescopes depend on grazing-incidence reflection, which practically stops working above about 10 or 12 keV. > OVERLAPPING DATA > How will the UCDs be used? As I understand it, in order to use Vizier or > the Aladin SED tool to search for e.g. radio observations between 1.3 and > 1.7 GHz (radio L-band), the software would look for UCDs > em.radio.750-1500MHz and em.radio.1.5-3GHz. 
I think the way it has to work is this: the UCD defined for some dataset should be as specific as possible, so if the data fall practically within a single band (say within 750-1500 MHz) then you declare the UCD as "em.radio.750-1500MHz". If not, then you have to fall back on a less specific UCD of "em.radio". A user wanting radio measurements at some frequency should be prepared to search for both _the_ most specific UCD and the less specific UCD, i.e. "em.radio". This may bring up some false positives, but so will any scheme that we can devise.

The alternative, as you say, is to apply two or more UCDs to a dataset. That might be better in principle, but difficult in practice.

> * not use the fine divisions in UCDs, but treat frequency like position and treat every query as a 1D cone search, ie a linear segment search.

Well that's similar to a proposal I made some time ago (maybe in a Data Models context). Datasets should specify the range of frequencies they cover (which requires two numbers, the min frequency and the max frequency; this doesn't imply that coverage between the limits is continuous, and gaps should be ignored); then users also specify the range of interest. It's then a trivial exercise for a computer to compare the two intervals and work out which resources overlap the range of interest. All this without us having to dream up any artificial divisions between wavebands. But that doesn't fit in with UCDs as devised up to now, and I don't see any easy way to make it fit.

--
Clive Page
Dept of Physics & Astronomy,
University of Leicester, Tel +44 116 252 3551
Leicester, LE1 7RH, U.K. Fax +44 116 252 3311

From pfo at star.le.ac.uk Tue Oct 21 05:52:27 2003
From: pfo at star.le.ac.uk (Patricio F. Ortiz)
Date: Tue, 21 Oct 2003 13:52:27 +0100 (BST)
Subject: Version 1.9.9 of UCD definition
In-Reply-To:
Message-ID:

Hi,

I'm glad we are going into this discussion, not because it's an easy subject, but because it is a relevant one.
The EMS is perhaps the most important 'tree' which we want to describe properly. I see several different regimes:

a) broad band observations (eg, Johnson's V, radio bands, X-ray bands)
b) narrow bands
c) line spectroscopy (H-alpha, 21cm, CO, OIII, etc)
d) continuum spectroscopy (eg, from blue to red)
e) flux ratios

The question of how to describe these quantities is strongly linked to what we want to do with the descriptors. What we assign to the column (the UCD) is one thing; the type of search we intend to perform is another.

If our goal is to find all radio fluxes, then we could cut the UCD at phot.flux.radio at the level of the search, so even if we have infinite granularity (eg, phot.flux.radio.22Ghz), this column will be recognized. If on the other hand we are interested in finding observations at 22GHz, having an assigned UCD phot.flux.radio assures us of finding the 22GHz data (eventually), but the S/N will be quite low: having to browse through hundreds of catalogues to find one is not desirable, and it's not what we had in mind when UCDs were introduced.

Specificity also means that one can discover quantities which are comparable, so that one will not mix a line measurement with a continuum one (despite their being very close in frequency).

What I think is important to decide now is to what extent the UCDs can do the job, without complicating them too much or oversimplifying them to the point of becoming useless for discovery purposes. One thing I wouldn't like to see is UCDs becoming as ambiguous as column descriptions.

It is quite possible that we should resort to another element of meta-information for the description of a quantity and for its discovery. I thought that 'utype' could be used to link observational windows with their description (data model?).
eg,
UCD="phot.mag.opt" utype="strongrem.B"
or
UCD="phot.flux.radio" utype="JBO.22Ghz"
or
UCD="phot.flux.radio" utype="VLA.6cm"

For accurate matching, one would have to use both pieces of information to avoid mixing apples with oranges.

The method that Anita and Clive mention (a fine piano-keyboard mask) is something I've considered as well. It is broader than the UCD, but requires accurate knowledge of each band's observation window (I'm not saying it's not doable). Users could then specify which part of the spectrum they are interested in and they will find the tables which contain data in those parts of the EMS.

This works quite well for broad bands, but it is not so great if one is interested in narrow bands. Imagine one searches for data around HBeta. Most catalogues with "blue broad band" observations will have 1's in that area of the spectrum, therefore the noise will be huge! The same is valid for lines in other areas of the spectrum (that's why I didn't pursue this model further). The same is valid for colour indices: looking for V-K in terms of its initial and final wavelength will bring a large number of undesired catalogues/columns.

One solution I've proposed is to stick to a very fine granularity at the UCD level, which will allow us to compare like quantities, but to create "virtual containers" in as many orthogonal directions as needed to solve the EM problem, as boundaries will continue shifting and being taste dependent.

Imagine for a second that we keep the fine granularity, having UCDs like

phot.jhn.B, phot.jhn.V, phot.jhn.R, phot.jhn.I, phot.str.B, .... phot.sloan.b (if exists)

The original UCD structure allowed users to retrieve any catalogue with Johnson's photometry (just search for /phot.jhn/ and voila), but there was no room to look for "catalogues with blue broad-band observations".

A "virtual container" for this example could be phot.opt.blue.bb.
No column will ever be assigned this VC-UCD, but a search engine will understand it as "hmm, phot.opt.blue.bb is not a UCD, I need to look for catalogues which contain any of the UCDs listed in its index".

To follow the example,

phot.opt.blue.bb := ("phot.jhn.B", "phot.str.B", "phot.sloan.b");

And if later, GAIA introduces its own blue filter, phot.gaia.B will be created as a UCD, and "phot.gaia.B" will become part of this list:

phot.opt.blue.bb := ("phot.jhn.B", "phot.str.B", "phot.sloan.b", "phot.gaia.B");

Finally, although a column should have one and only one UCD, nothing should prevent a UCD from belonging to several virtual containers.

UCDs would describe what quantities are; VCs are used to describe contexts or related concepts accepted by the community at any time, and quite possibly user-definable in the future.

Food for thought... after all, > 50% of the current and future quantities are related to the EMS!

Cheers,

Patricio

---
Patricio F. Ortiz
pfo at star.le.ac.uk
AstroGrid project
Department of Physics and Astronomy
University of Leicester Tel: +44 (0)116 252 2015
LE1 7RH, UK

From posuna at iso.vilspa.esa.es Tue Oct 21 05:35:45 2003
From: posuna at iso.vilspa.esa.es (posuna at iso.vilspa.esa.es)
Date: Tue, 21 Oct 2003 14:35:45 +0200
Subject: UCDS and DM
In-Reply-To:
References: <1066406745.3f90135915f9f@netmail.pipex.net> <3F950D15.AEDBC59C@astro.u-strasbg.fr>
Message-ID: <20031021143545.375a8cf4.posuna@iso.vilspa.esa.es>

Hi,

> I was slightly puzzled by comments at the interoperability workshop about data collections which do not have catalogues/tables. Surely even a collection of FITS spectra or images or etc. must be accessed via a descriptive list? Although I suppose a data collection could consist of a single image (or etc.).
In either case, though, is this > another example of where it would be useful to take UCDs from what > might be a single description ("images taken at 0.05 arcsec resolution > ...") of the whole data collection - and use them to form or augment a > table to allow access to the data? We do have quite complex data models for our data; you can see part of the ISO model in the attached figures called iso_cdm_obss.gif and iso_pdm_obss.gif which are part of our (database jargon) CDM (Conceptual Data Model) and PDM (Physical Data Model) documents (where you can see loads of very project-specific things). The access to our data is not done through a descriptive list, but through database queries. However, in the case that someone wants a catalogue/table from our data, we can provide one. This is what we do with CDS: we provide them with the table they store in Vizier, which can be used like the rest of the catalogues there for data mining (I'm not sure whether this is the word...I mean data discovery, etc.). You can find this table at: ftp://cdsarc.u-strasbg.fr/cats/B/iso/isolog.dat.gz However, that is only a very specific view of our data which can, otherwise, be accessed through other means, e.g., by SIAP or future SSAP, etc., or in the case of a proper general Data Model, through a proper VOQL to the Data Model (how we translate the future "Standard VO data Model" to ours is another story...(unfortunately, only up to us I'm afraid...)). Cheers, Pedro. -- Pedro Osuna Alcalaya SOFTWARE Development Group XMM-Newton Science Archive e-mail: Pedro.Osuna at esa.int Tel + 34 91 8131314 European Space Agency VILLAFRANCA Satellites Tracking Station P.O. Box 50727 E-28080 Villafranca del Castillo MADRID - SPAIN -------------- next part -------------- A non-text attachment was scrubbed... Name: iso_cdm_obss.gif Type: application/octet-stream Size: 29042 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: iso_pdm_obss.gif Type: application/octet-stream Size: 24526 bytes Desc: not available URL: From roy at cacr.caltech.edu Tue Oct 21 07:40:58 2003 From: roy at cacr.caltech.edu (Roy Williams) Date: Tue, 21 Oct 2003 07:40:58 -0700 Subject: UCD use cases References: Message-ID: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Please find below some possible use cases for UCD. -- How do the different proposed UCD/DM schemes help with these cases? -- What does the query look like in each case, and how would that query be coded? -- Can you supply additional use cases? Please try to stay within the realm of the possible here! -- If you refer to a Data Model, please try to be concrete about who/how/example. Please do not simply invoke "the Future IVOA Data Model". Thank you Roy -------------------------- (1) Cone search. How to decide which columns are the RA,Dec that was used in the search. What frame (B1950, J2000, ...) do these come from? If there are columns with ID, what sort of ID is it, and how do I resolve it? (2) SIAP search. Find the column that contains the URLs where the images are. Find out if there are other columns that have RA,Dec of the image center. (3) We have for example a crossmatch service that is clever enough to know about error ellipses. How does it get from a table the most sophisticated error info that is there: (a) position (b) circular error (c) ellipse error. (4) We want to compare photometry in two tables covering the same star cluster. How do I decide if they share measurements in the same filter? One has R band, the other has Halpha. What happens if fluxes are expressed differently -- eg number / energy / magnitude / luminosity density. (5) I want distances to stellar objects measured in meters, so I can make a 3D display for the children. How do I recognize a redshift (z) value, how do I recognize a radial velocity, how do I recognize an actual distance measure? (6) I am looking for supernovae that have both optical and Xray measurements. 
Can I (should I) use UCD to help my search? (7) How do I find 21cm observations (that may be redshifted), which also have polarization information? From pfo at star.le.ac.uk Tue Oct 21 08:46:37 2003 From: pfo at star.le.ac.uk (Patricio F. Ortiz) Date: Tue, 21 Oct 2003 16:46:37 +0100 (BST) Subject: UCD use cases In-Reply-To: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Message-ID: Hi Roy, here are my 2 cents. On Tue, 21 Oct 2003, Roy Williams wrote: > Please find below some possible use cases for UCD. > > -- How do the different proposed UCD/DM schemes help with these cases? > > -- What does the query look like in each case, and how would that query be > coded? > > -- Can you supply additional use cases? Please try to stay within the realm > of the possible here! > > -- If you refer to a Data Model, please try to be concrete about > who/how/example. Please do not simply invoke "the Future IVOA Data Model". > > Thank you > Roy > -------------------------- > > (1) Cone search. How to decide which columns are the RA,Dec that was used in > the search. What frame (B1950, J2000, ...) do these come from? If there are > columns with ID, what sort of ID is it, and how do I resolve it? pick RA and dec from UCDs: pos.eq.ra;main, pos.eq.dec;main If the source is a VOTable, one can use the COOSYS element to figure out the equinox/epoch. If the source is plain metadata, we're missing the equinox element by using the plain UCD. Within a DM, the equinox should be there and can be used to solve the ambiguity. Now, if they are different, the question is whether they should be precessed. I assume that you mean src.object_id... Hmmm, invoke simbad or ned or any name-solver with the appropriate output equinox. However, I would advocate at this point in time to strongly avoid the existence of catalogues where only the ID is provided. It would be an effort to solve those names for all these catalogues, but once done, we don't have to worry about it anymore. > (2) SIAP search. 
Find the column that contains the URLs where the images > are. Find out if there are other columns that have RA,Dec of the image > center. I can see your point of differentiating RA/dec for sources as opposed to observations in a log. RA/dec are coordinates regardless of what one measures, so UCD1 didn't do anything about it. In the image catalogues, pos.eq.[ra|dec];main represent the image pointing. > (3) We have for example a crossmatch service that is clever enough to know > about error ellipses. How does it get from a table the most sophisticated > error info that is there: (a) position (b) circular error (c) ellipse error. Thought about this one before :-) Most complete case: elliptic error: - semi-major axis error.pos.smaj - semi-minor axis error.pos.smin - ellipse orientation (PA) error.pos-ang Alternatively, one could have ellipticity (error.ellipt) xor axis-ratio (error.pos.ax-ratio) and major axis, in which case one has to compute the semi-minor axis. Circular, simpler, we only need err.pos.smaj (or err.pos.rad) Roy, please extend this case not just to errors, but to extended objects, as we could be talking about overlap between galaxies or molecular clouds or other extended structures/objects. > (4) We want to compare photometry in two tables covering the same star > cluster. How do I decide if they share measurements in the same filter? One > has R band, the other has Halpha. What happens if fluxes are expressed > differently -- eg number / energy / > magnitude / luminosity density. In a first approximation, I'd say stick to comparing apples with apples and oranges with oranges. UCDs should tell you which band is observed (oops, that's UCD1s), therefore it should be clear whether the two tables contain the same type of observations. One interesting scenario here is to have a real Xmatch machine which identifies the stars of one table with the ones in the other. I may want to produce a diagram type R vs Halpha.
If more than two tables are involved, and one chooses one filter, we are talking about forming light curves. > (5) I want distances to stellar objects measured in meters, so I can make a > 3D display for the children. How do I recognize a redshift (z) value, how do > I recognize a radial velocity, how do I recognize an actual distance > measure? Redshift, by its UCD: redshift.hc radial velocity: veloc.hc (beware of radial velocities of expanding stars) distance: phys.distance.true, drop anything measured in kpc or Mpc :-) beware of things measured in km or au Or, go to any catalogue measuring parallax and convert to distance. Convert your distances to meters (tool to be built) > (6) I am looking for supernovae that have both optical and Xray > measurements. Can I (should I) use UCD to help my search? Optical and Xray UCDs appear not only in catalogues related to supernovae. Look for catalogues with supernova in their title and then use the X-ray UCDs (SN are still discovered in optical, so they'll surely have optical fluxes). Look at a few catalogues with SNe for quantities which are specific to these objects (var.??) > (7) How do I find 21cm observations (that may be redshifted), which also > have polarization information? UCDs will help you with polarization. Try /pol./, 21cm for sure frequency-wise: phot.flux.radio.1.4G (that's why we didn't use '.' to separate the UCDs :-) but no assurance if the redshift is too high. Scan the table title for /21/ && /cm/ if the radio UCD. I'd expect the results to be nearly the same. Note: I just used http://barbara.star.le.ac.uk/datoz/mykats.html to perform the last search... I found a few cats with POL_ and RADIO_FLUX in 1.4GHz Sorry, no time to write the queries in a WLQL (wish list query language) :-) Cheers, Patricio --- Patricio F.
Ortiz pfo at star.le.ac.uk AstroGrid project Department of Physics and Astronomy University of Leicester Tel: +44 (0)116 252 2015 LE1 7RH, UK From gtr at ast.cam.ac.uk Tue Oct 21 08:29:14 2003 From: gtr at ast.cam.ac.uk (Guy Rixon) Date: Tue, 21 Oct 2003 16:29:14 +0100 (BST) Subject: UCD use cases In-Reply-To: <049b01c397e1$57b39ad0$6501a8c0@Ropy> References: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Message-ID: An additional case: I have a heap of data expressing modelled and observed values for a handful of quantities; celestial coords and proper motions would do as an example. I want to read out all the modelled versions of one of these quantities and compare them to assess the spread of the models. Then I want to do the same with the observations. What do I look for? On Tue, 21 Oct 2003, Roy Williams wrote: > Please find below some possible use cases for UCD. > > -- How do the different proposed UCD/DM schemes help with these cases? > > -- What does the query look like in each case, and how would that query be > coded? > > -- Can you supply additional use cases? Please try to stay within the realm > of the possible here! > > -- If you refer to a Data Model, please try to be concrete about > who/how/example. Please do not simply invoke "the Future IVOA Data Model". > > Thank you > Roy > -------------------------- > > (1) Cone search. How to decide which columns are the RA,Dec that was used in > the search. What frame (B1950, J2000, ...) do these come from? If there are > columns with ID, what sort of ID is it, and how do I resolve it? > > (2) SIAP search. Find the column that contains the URLs where the images > are. Find out if there are other columns that have RA,Dec of the image > center. > > (3) We have for example a crossmatch service that is clever enough to know > about error ellipses. How does it get from a table the most sophisticated > error info that is there: (a) position (b) circular error (c) ellipse error. 
> > (4) We want to compare photometry in two tables covering the same star > cluster. How do I decide if they share measurements in the same filter? One > has R band, the other has Halpha. What happens if fluxes are expressed > differently -- eg number / energy / > magnitude / luminosity density. > > (5) I want distances to stellar objects measured in meters, so I can make a > 3D display for the children. How do I recognize a redshift (z) value, how do > I recognize a radial velocity, how do I recognize an actual distance > measure? > > (6) I am looking for supernovae that have both optical and Xray > measurements. Can I (should I) use UCD to help my search? > > (7) How do I find 21cm observations (that may be redshifted), which also > have polarization information? > Guy Rixon gtr at ast.cam.ac.uk Institute of Astronomy Tel: +44-1223-337542 Madingley Road, Cambridge, UK, CB3 0HA Fax: +44-1223-337523 From amsr at jb.man.ac.uk Tue Oct 21 10:28:24 2003 From: amsr at jb.man.ac.uk (Anita Richards) Date: Tue, 21 Oct 2003 18:28:24 +0100 (BST) Subject: UCD use cases In-Reply-To: <049b01c397e1$57b39ad0$6501a8c0@Ropy> References: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Message-ID: Hi Roy, Great to see something practical happening. One comment - we do need to see the use of UCDs alongside the use of the Registry and data models (e.g. Pedro Osuna's very helpful reply which has indeed lifted my confusion about non-table data - or that bit of it anyway). Hence, when people provide implementations of use cases, it would be nice to see: Where/if UCDs are used in the Registry to select data-sets (and if only UCDs, not other entries e.g. spectral coverage, keywords, serve a particular purpose); How UCDs are used (ideally in the context of a data model) to enable the processing steps in the execution of a query.
Use case - if you are sick of Brown Dwarfs read no further: http://wiki.astrogrid.org/bin/view/Astrogrid/BrownDwarfRegistryRequirements The RegistryQuery steps use some things which could be UCDs like Proper Motion but they might also be key words. Specific colours are also used, but it might be sufficient to know that a catalogue contains optical and colour data. The DataSetQuery/Evaluation steps use these UCDs above and others. The new UCD structure enabling the use of partial matches would be a great help as the workflow contains choices where you use a quantity (e.g. colour) if already defined but if not you derive it ("you" being a data processing software agent). More examples could be drawn from the AstroGrid Ten... cheers a - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Dr. Anita M. S. Richards, AVO Astronomer MERLIN/VLBI National Facility, University of Manchester, Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K. tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax). From tam at lheapop.gsfc.nasa.gov Tue Oct 21 12:28:31 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Tue, 21 Oct 2003 15:28:31 -0400 Subject: A suggested revision for UCDs Message-ID: <3F9588DF.3030805@lheapop.gsfc.nasa.gov> A few minutes ago I uploaded a version of my suggested revised proposal for UCDs to the Twiki. This is just a Word version since I don't have a PDF generator handy. The URL is http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9b.doc This builds upon Roy and Sebastien's division of UCDs into concepts and properties but puts them together rather differently. In addition and largely independently, it includes a discussion of the use of UCDs in the context of groups of columns and tables as a whole. With the approach suggested, I hope that the ambiguities that have heretofore precluded considering using UCDs to mediate between data models and data can be removed and we may no longer require a utype parameter.
Some discussion of the document is included (in red) within the text. This mostly describes the relationship of this version to the 1.9.9 version. Sections 3 and 6 of the document are copied from the previous version (though section 6 was section 8 in that document). The abstract is altered to reflect the discussion of grouping constructs. Section 2 -- describing the status of the document -- has been changed to discuss the implications of the adoption of this recommendation upon existing software systems and protocols. I think that some such statement should be part of draft recommendations. Not sure if that's part of the recommendation process but it probably should be. Section 4 is the discussion of UCDs and UCD syntax. At the end of section 4 there is a long parenthetical discussion of some of the ways this proposal differs from 1.9.9 since people seemed to be concerned about that last Thursday. If they've gotten this far they probably don't need this anymore but it may be of interest. This section tries to be rather rigorous -- addressing a fair number of nits that had not been talked about in the earlier proposal even though it makes the proposal somewhat longer (e.g., the discussion of array valued cells, a much more detailed discussion of comparability of columns, how to ensure the uniqueness of UCDs, ...) The actual definition of all of the valid words for UCDs is deferred as it was in the earlier version, but a substantial number of examples are given. [In fact, most of the words should transfer between the two versions fairly transparently. The differences lie mostly in how they are put together.] Section 5 is the discussion of UCDs and grouping structures. I'm quite excited by this since I think it has real potential for helping to unify discussions of data, data models and data access.
The discussion in this chapter is less rigorous -- even if the basic idea is adopted I'm sure it will need substantially more work but I think it has real possibilities for linking data models and data. This chapter is why I've sent this message to the DAL and DM groups as well as to the UCD group. Apologies to all of you who get this twice or thrice! I trust there will be comments... Tom From tam at lheapop.gsfc.nasa.gov Tue Oct 21 12:51:17 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Tue, 21 Oct 2003 15:51:17 -0400 Subject: UCD use cases In-Reply-To: <049b01c397e1$57b39ad0$6501a8c0@Ropy> References: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Message-ID: <3F958E35.1090703@lheapop.gsfc.nasa.gov> Here are some thoughts about how the new UCD scheme I've proposed might help in Roy's scenarios. I'd sent this to Roy directly earlier without realizing that it had been sent to the UCD and DM groups.... Tom > > (1) Cone search. How to decide which columns are the RA,Dec that was used in > the search. What frame (B1950, J2000, ...) do these come from? If there are > columns with ID, what sort of ID is it, and how do I resolve it? > UCDtrees look something like: src;instance meta.id[.type];value pos;instance ra/dec, l/b columns pos.coosys;value Optional, either table coosys element of the table or a table wide parameter Look for highest pos;instance and meta.id;value to get the correct id and pos columns. If we want to handle B1950 coordinates then we need to include a table parameter or parameters including the coordinate system. Of course the cone search protocol requires that the output be in J2000, but it would be easy to support genericity if we want. I suggested that the coosys element in tables could have a UCD, but less controversial would be a table parameter giving the information. That's not quite as elegant. However I don't like duplicating metadata information. 
Maybe the best solution is to include a parameter but make it point to the coosys using the XML ID. Not sure what you mean by resolve the id, but assuming that this isn't addressed by specific subtypes of id, replace meta.id;value with meta.id;instance meta.id;value meta.id.resolver;value meta.id.resolver;value would presumably be a parameter but in principle if there was a different resolver for each row in the table it could be a column. > (2) SIAP search. Find the column that contains the URLs where the images > are. Find out if there are other columns that have RA,Dec of the image > center. Basic UCDTree is: obs;instance meta.id;value # The required id field meta.link;value # The link field pos;instance # The required position These are mandated columns in the SIAP protocol and the protocol should also specify that there is only one pos, meta.id and meta.link element at the top level. Note, by the by, that if we want to allow the SIAP protocol to return galactic positions, it's easy to do. [This isn't necessarily a good idea for SIAP, but it shows the expressivity of the UCDTrees.] Could there be an image center that is distinct from this image? This would require there to be an associated observation to the main observation. [If I really want to go overboard I might have made the table UCD obs;filter.cutout;instance but I don't think that that is going to be useful to most people, though it does show a nice use of my suggested filter hierarchy.] This could be suggested by adding the following to the tree obs;instance meta.id;value pos;instance The issue which is harder to address is understanding the relationship between this secondary observation and the primary observation. This is a semantic issue not a structural one, so a UCD must be involved. E.g., we need to define concepts for background, center, offset, ...
If we have done that (and I would suggest they belong in my meas tree since they have to do with the idea of a measurement) then the previous addendum becomes [and I've put in a background image for good measure] obs;instance meta.id;value obs;meas.center pos;instance obs;instance meta.id;value obs;meas.background pos;instance How do I find this... I look for associated observations and search for one which has the desired measurement attribute. I believe an XQuery expression can do this pretty easily. > > (3) We have for example a crossmatch service that is clever enough to know > about error ellipses. How does it get from a table the most sophisticated > error info that is there: (a) position (b) circular error (c) ellipse error. This is pretty easy... Looking just at a given position, we might have: pos;instance pos.eq.ra;value pos.eq.dec;value pos.eq.ra;meas.error Errors in the coordinates pos.eq.dec;meas.error pos;meas.2d.error This is a circular 2-d error pos;meas.2d.error.elliptical;instance pos;meas.2d.error.elliptical.x pos;meas.2d.error.elliptical.y pos;meas.2d.error.elliptical.posang which gives all three of these. While there may be a better name for the last two errors, I don't think it matters much. Whatever error there is is directly associated with the position and it's easy to find out what errors are available. The question of which error the user should use is not appropriate for the UCD scheme. That's up to the user. The UCDs indicate which errors are available. I'm not sure if we want to elaborate the measurement tree to this level, maybe it could be done more simply in some other way. However I don't think this is an unreasonable approach. > > (4) We want to compare photometry in two tables covering the same star > cluster. How do I decide if they share measurements in the same filter? One > has R band, the other has Halpha. What happens if fluxes are expressed > differently -- eg number / energy / > magnitude / luminosity density.
Well I don't think unit conversions are an issue that UCDs should address. So I'll pass on that. [That's what units keywords are for after all.] The columns should describe the bands in which they were taken. If we want to assure that we actually have the same filter, then we need UCDs that are specific down to the filter level. The old UCD tree (and I assume you've kept it in the current one) puts all of that info in the initial word. I'd tend to use the em modifier here to define the band. E.g., phot.mag;em.optical.filter.v.johnson;value phot.flux;em.optical.filter.v.johnson;value and phot.counts;em.optical.filter.v.johnson;value for magnitude, flux and counts (in a photon counting instrument) If the question is whether the filters overlap, then I don't think this is a question the UCDs answer directly. It's expert knowledge about the concepts (Just like the exact relationship between RA and Dec isn't specified in the UCD. That's expert knowledge too.) > (5) I want distances to stellar objects measured in meters, so I can make a > 3D display for the children. How do I recognize a redshift (z) value, how do > I recognize a radial velocity, how do I recognize an actual distance > measure? Units are not a UCD issue. The other questions simply need distinct UCDs. As I understand the schema they would just be different elements of the phys tree but that's a bit of a guess. Redshift is a different concept from distance as is a radial velocity. They can be converted into each other in certain circumstances, but it is not the role of UCDs to understand the transformation rules -- just as it is not the role of UCDs to understand how to convert from RA,Dec to L,B. 
I think it may be reasonable to have all of the base concepts: phys.distance A distance of some kind phys.distance.z A redshift in a cosmological context phys.distance.vrad A radial velocity in a cosmological context phys.z A redshift in a non-cosmological context phys.vrad A radial velocity in a non-cosmological context or to only have the single distance concept and leave it to the user to recognize that they are in a cosmological context so that radial velocities can be converted to distances. That's probably safer from the context of trying to ensure that there is only one UCD for a given concept so I'd tend to go that way. Users put z in a table rather than distance for a reason. UCDs describe what a column is, they need not and should not describe what you can do with a column. I gather that is the role of ontologies. > > (6) I am looking for supernovae that have both optical and Xray > measurements. Can I (should I) use UCD to help my search? You can certainly try. This depends a bit upon the specific UCD tree. If table UCDs are often of the form src.[type];instance you can search the registry for tables of the form src.supernova;instance. Similarly you can search the phot columns and look for optical and X-ray qualifiers. If you get tables that meet all the criteria you've got a great place to start. I.e., look for a UCDTree that matches the template src.supernova;instance //phot*;em.xray*;value //phot*;em.optical*;value [Where this is some Xqueryish kind of match to the UCDTree hierarchy] and you've got a great chance that this has the information you want. Even if there is no match, you can consider tables that are partial hits and look to see if you can join information appropriately using multiple tables. A real win here would be a VO service to enable comparison of class information among tables. Right now source classes are a real mess and we need a Simbad- or NED-like service to address it.
That will probably be needed if you want to interrogate the vast majority of UCDTrees that will have src;instance and some classification parameter inside. > > (7) How do I find 21cm observations (that may be redshifted), which also > have polarization information? > > Look for all tables that have columns that match flux.phot*;em.pol*;value to get polarization information. Limit to tables that have a flux value in the 21cm region -- don't know if that has its own special UCD -- or which have flux in the region from 10-100 cm and a redshift or radial velocity. This assumes there is some service that gives me the em qualifiers I need to look for given a specific energy/wavelength/frequency range. It wouldn't be too hard by hand either. This is a straightforward analysis of the column UCDs that doesn't need to worry about the structure UCDs at all. From hanisch at stsci.edu Wed Oct 22 10:15:37 2003 From: hanisch at stsci.edu (Robert Hanisch) Date: Wed, 22 Oct 2003 13:15:37 -0400 Subject: Fw: A suggested revision for UCDs Message-ID: <021901c398c0$1ec33c00$7deca782@stsci.edu> Here is a discussion that Tom and I had off-list, but I think a number of points of more general interest are raised. Warning -- it is quite long! Bob - - - - - Hi Bob, Thanks for the review and comments. I'm particularly interested in the areas that were unclear. It seemed to me that I needed to actually put the ideas out where I could get some detailed reactions. A fair number of typo issues were addressed in the version I uploaded to the Twiki and announced to the UCD, DM and DAL groups. Haven't heard of any reaction. I've responded to your comments below (there is a lot of detail but I thought it useful to think these things through). Tom Robert Hanisch wrote: > Hi Tom. I read through your revised UCD document this evening. Phew. > There is much in it I like, much I don't, and much I don't follow. Perhaps > the two latter categories mix together.
> > I guess my biggest problem is that the roles of concept, attribute, and > modifier are partly defined by syntax (where they appear in the string) and > partly by having to know what names (em, pos, flux) have been allocated to > which category. This seems very arbitrary (and very confusing) to me. > Although I have never written a parser in my life, it looks to me like a > parser for this would be a zillion if statements. Maybe this is fewer if > statements than for other approaches, but it still looks very complex. > I agree that this is a major issue, although my biggest concern with it is a little different. I'm giving a long answer to help me organize my thoughts. The writer of a table presumably has access to the documentation for UCDs so it shouldn't be a big problem dealing with the three types -- especially once there are examples. The problem is more in using UCDs when reading tables. In practice I'm not sure this would be a big deal for 'real' tools. E.g., something like VOPlot is going to need to know about the value and meas.error attributes internally so that it can plot values and error bars for a given quantity. I.e., it's just going to look for pairs of columns within the same group of the form: SomeString;value and SomeString;meas.error A spectral processing tool is going to look for pairs like phot.flux*;value and phys.wavelength;value. Specific tools internalize this kind of knowledge -- or even better read it in as a data model. These tools don't really know about how UCDs are organized. The organization is intended to make it easy for them to search for the appropriate strings, but they just take advantage of that. Generic tools for manipulating UCDs and for validating them are where the problems really begin to show up. Currently there are only 6 trees that are not basic concepts (em, frame and intent for modifiers and filter, stat and meas for attributes).
I think the single word attributes are important enough that they will not cause a problem. So a complete algorithm to determine what word belongs in what vocabulary is currently pretty easy... Pseudocode is just:

    # word is one semicolon-separated part of the UCD
    firstAtom = substring(word, 0, index(word, "."))   # the whole word if it has no "."
    switch (firstAtom) {
        'em', 'frame', 'intent':  return thisIsAModifier
        'stat', 'filter', 'meas': return thisIsAnAttribute
        'value', 'local', 'instance', 'multiplet', 'vector': return thisIsAnAttribute
    }
    return thisIsAConcept

Alternatively we're talking about validating UCDs against an IVOA schema to define the valid words and the match against this could give the type. There are other simple ways to deal with this: Begin all modifiers with m. and attributes with a. Or I've suggested in the draft that all modifiers could be in the frame tree -- the idea is that the role of modifiers is to limit the context to which the concept applies. I don't think the attribute trees join as easily but if it's important enough we could pick a name for all of the attribute trees. The biggest problem is non-standard namespaces. How do we handle a new UCD tree? In some sense the issue is moot. Non-standard words shouldn't be used outside of some developer's local context. They can be responsible for handling them. However I suspect that non-standard words will escape into the wild. The validate-against-the-schema approach still works, but it's impossible for writers of tables to know how to use these UCDs. There are some other ideas that might help address this issue: Your suggestion of another separator character is nice. I thought about it but decided that it was too radical a change. Maybe separate attributes and modifiers within themselves by commas but separate them by '-'s. e.g., a complex UCD might be: flux.phot-em.optical,intent.calculated-meas.error,stat.max I'd still like to keep the vocabularies separate, but now it's trivial to parse the UCD. For the moment I tried to minimize the change from the original proposal.
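The word-classification rule described in this message can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the three word lists are copied from the message, whereas the names classify_word and classify_ucd are made up here and do not come from any UCD draft. A real tool would presumably load the vocabularies from an IVOA-maintained list rather than hard-coding them.

```python
# Word lists as given in the message; hard-coded here only for illustration.
MODIFIER_TREES = {"em", "frame", "intent"}
ATTRIBUTE_TREES = {"stat", "filter", "meas"}
SINGLE_WORD_ATTRIBUTES = {"value", "local", "instance", "multiplet", "vector"}


def classify_word(word):
    """Return 'modifier', 'attribute' or 'concept' for one UCD word.

    The decision uses only the first atom, i.e. the text before the
    first '.', or the whole word when it contains no '.'.
    """
    first_atom = word.split(".", 1)[0]
    if first_atom in MODIFIER_TREES:
        return "modifier"
    if first_atom in ATTRIBUTE_TREES or first_atom in SINGLE_WORD_ATTRIBUTES:
        return "attribute"
    return "concept"


def classify_ucd(ucd):
    """Split a semicolon-separated UCD and classify each word (hypothetical helper)."""
    return [(word, classify_word(word)) for word in ucd.split(";")]


if __name__ == "__main__":
    for word, kind in classify_ucd("phot.flux;em.optical;intent.calculated;value"):
        print(word, kind)
```

Any word whose first atom is not in one of the lists falls through to "concept", so a non-standard tree would be silently treated as a concept; that is the weakness with non-standard namespaces raised in the message.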
Note that this is all much harder in the original proposal. There is no way to tell what anything after the first word is. In that proposal the first word is a property, but all subsequent words can be either properties or concepts. Nor is there any lexical definition of what a property is (i.e., any word can be a property).

> The document has a lot of signs of a rush job -- is it Uniform or Unified?
> (Unified, I think.)

I always thought it was Uniform so that wasn't a typo but an error on my part... Sigh...

> Is flux a 0-level concept? Or is it phot.flux?

That I think is fixed in the published version (it's always phot.flux).

> On p.3 you say that units are not part of UCDs, but on p.16 you create a UCD,
> phys.degrees;value, that is all about units.

I wasn't quite sure what the UCD should be there. Maybe phys.angle.separation;value?

> On p.12, I really like the typo(?) in 'pudding' (pubbing).

Alas that is also fixed. [That kind of error must reflect some curious things about the mind. I clearly picked the mirror image letter even though the typing motion for it is nothing like 'd']

> I'm not sure how others have reacted -- have not gone to the UCD list yet to
> see. But I was particularly confused by the following things.
>
> o p.4, you say that
>
> phot.flux;em.optical;intent.calculated;value
>
> is equivalent to
>
> phot.flux;intent.calculated;em.flux;value
>
> But there must be a mistake here. Shouldn't 'flux' in the second line be
> 'optical'? And isn't the first form illegal if alphabetical order is
> required?

The typos in the UCD were fixed and I hope that would help clarify what I was trying to say. The two UCDs should have been

    phot.flux;em.optical;intent.calculated;value

and

    phot.flux;intent.calculated;em.optical;value

The statement I was trying to make was that there is no natural reason to prefer one of these to the other, so we had to choose an arbitrary rule to try to ensure uniqueness of UCDs. Thus indeed the second is illegal.
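[Editorial sketch, not part of the original message.] The uniqueness rule discussed above can be illustrated with a small canonicalizer. This assumes the rule is simply "modifiers follow the concept in alphabetical order, attributes come last in their given order"; the function names and that exact reading of the rule are assumptions, not text from the draft.

```python
# Modifier trees as named earlier in the thread (em, frame, intent).
MODIFIER_TREES = {"em", "frame", "intent"}


def is_modifier(word):
    """True if the word's first atom belongs to one of the modifier trees."""
    return word.split(".", 1)[0] in MODIFIER_TREES


def canonicalize(ucd):
    """Return the UCD with its modifiers sorted alphabetically after the concept.

    Attribute order is left untouched, since the thread notes that attribute
    order can carry meaning.
    """
    words = ucd.split(";")
    concept, rest = words[0], words[1:]
    modifiers = sorted(w for w in rest if is_modifier(w))
    others = [w for w in rest if not is_modifier(w)]
    return ";".join([concept, *modifiers, *others])


def is_canonical(ucd):
    """A UCD is legal under the assumed rule only if it is already canonical."""
    return ucd == canonicalize(ucd)


if __name__ == "__main__":
    first = "phot.flux;em.optical;intent.calculated;value"
    second = "phot.flux;intent.calculated;em.optical;value"
    print(is_canonical(first), is_canonical(second))
    print(canonicalize(second))
```

Under this reading the two forms describe the same thing, and the arbitrary ordering rule exists only so that a plain string comparison is enough to match them.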
> I find the goal of brevity at conflict with the goal of clarity. What does
> 'em' mean to a human reader? Why 'src' and not 'source'? Why 'value' and
> not 'scalar' (parallel structure to 'vector')? Why default on 'value' in an
> otherwise well-defined ontology?

I can't really argue with most of these. The tension between the various goals is why I tried to list them all together. I would be happy to change to longer words. The default for value was just meant to be a convenience for writers of tables. If it confuses things I'm happy to drop it. I like value rather than scalar because a value can be a vector quantity. E.g., if we have a cell that contains an array of fluxes its UCD might be

    phot.spectrum;value

That's because the concept of spectrum is inherently non-scalar. A field that had a UCD of phot.spectrum;vector would imply that each cell contained an array of spectra (i.e., that the cell was presumably a 2-d array). However this is no big deal.

> I think if a clear distinction is to be made between attributes and
> modifiers, it must be encoded explicitly (i.e., not just based on a list of
> magic words). I do not like the semicolons as delimiters; this is not what
> they mean in English grammar. (The semicolon in the last sentence was used
> properly. The second clause is not necessarily a direct modifier of the
> first, but rather is related in some intimate way.)

This is fine by me -- I gave an example above using different separators. I think the grammar is just as simple.

> I don't understand how to use the concept 'concept' in a practical sense.

Well I tried to give two examples: If you have a VOTable in an editor how do you find the fields that don't have a defined concept? If a user simply omits the UCD field it's kind of painful to find them. However one can just do a string search for "concept" if the user has entered ucd='concept;value' to explicitly mark that the underlying UCD is unknown.
The real reason is given in the last example in section 5. When correlating two tables that describe different kinds of quantities, e.g., sources and observations, I need to be able to describe what the ouput table is. There are two objects in every row so it's a multiplet (in my scheme), but what kind of multiplet? I can't call it a source, and I can't call it an observation, so I need to go up to a more generic word, i.e., concept. Basically it just provides the root for entire concept hierarchy. If we really wanted to be regular, we could start all of the base concepts as using this word... > Your definition of 'pos' does not include solar or planetary coordinate > systems, though later you give an example that does. I don't know what the current hierarchy under pos is... What I'd guess is that it would contain something like: pos.body.lat and pos.body.lon and then the frame modifier would be used to specify which body. [Or maybe I left an inconsistency in from the previous version] > > 'intent' is defined as the 'human context' of the concept. Huh? How are > 'calculated', 'predicted', and 'simulated' anymore human concepts than > 'observed' or 'measured'? Observed and measured would be fine additions here except that they are likely to be considered the default. I.e., a time.exposure;value is assumed to be the measured time, so I don't need to put that in. [Note that meas is short for measurement]. The explanation probably needs to be better, but I think we need some kind of modifier that distinguishes between 'real' values and predicted, scheduled, calculated, ... values. This doesn't come up so much in VizieR tables, but many of the tables that I deal with are riddled with situations where I may have an allocated exposure time, a predicted exposure time and an actual exposure time. So something is assumed to be actual/measured/observed unless an intent is specified. 
> > In 4.4 you insist that full words should be used ('electron' instead of > 'el'), but at the same time assert that 'phys', 'temp', 'em', etc., are all > ok. I don't have a horse in this race... I tried to match the usage of the previous paper, but I'd be happy to go either way. > > Example 2 (p.14) does not convey to me anything semantically different if I > disregard your comments. How am I supposed to understand something about > guide stars and plate centers from the structure of the UCDs alone? I take > issue with your assertion that "both software and humans should have no > trouble distinguishing the very different semantics of the two tables." > Well... I'd hope that by looking at the table UCD, you would immediately note that one table returned source information and the other returned observation information. That's no small matter. The structure immediately shows which concept is subordinate to the other. The actual semantics of the relationship were not described. You could do that if you want that level of detail. I'm not sure what the right UCDs are. E.g., in the source table might have included (hope the indentation survives the mail): obs.instance meta.id;value pos;meas.center pos.eq.ra;value pos.eq.dec;value I guess if we really want to include the concept of a guide star in the UCD hierarchy, they probably belong in the base concept or maybe in frame somehow, but I think this is too detailed. If we went ahead with it... The guide star might be src;frame.usage.guiding;instance meta.id;value pos.instance Note that in the first case it's the position that got the extra information, because the observation is just a standard observation (as far as we know). In the second case we're suggesting that this is a special kind of source. But I don't think I want to put that in the relatively simple examples. 
What I was trying to show was how the need for main columns has disappeared and that we could get source or observation information from either table with equivalent ease. > I don't like 'arith' as a concept. 'math' would be ok. If we need it at > all. Well I did try to discourage it... I have no problem with math. > > I don't like 'soft' as a concept. Is it so bad to just say 'software'? All > this stuff will be encoded in XML, which is notoriously verbose. If we > chose unclear abbreviations we will obscure whatever semantic meaning is to > be found. Fine with me... > > OK, a lot of these criticisms are not really directed to you, but to the > predecessor document. I understood your presentation in Strasbourg (I > thought) but do not follow the document sufficiently well that I would ever > be comfortable promoting it forward. I did not like Roy and Sebatien's > premise that concept and property could morph, one into the other, depending > on context. I do like your attempt to structure things more rigidly. It > seems to me not rigid enough. And when I ran into phys.degrees I felt like > the whole thing was falling down around me. The concept is an angular > distance, which of course can be expressed in degrees, radians, arcsec, etc. Agreed... {see above) > > It might be worth our time to look at the AIPS++ measures definitions. If I > were to construct a quick hierarchy, what we are trying to do here is > distinguish various sorts of measurements, metadata about those > measurements, and metadata about the people/organizations associated with > those measurements. 
So our fundamental concept is a measurement, of which > there are various sorts: > > measurement > photometric > spectroscopic (which is just photometric per wavelength in an ordered sort > of way) > astrometric ('pos') > temporal > instrumental > > Ancillary information about measurements comes in the form of metadata: > > metadata > identifiers > people > organizations > > And we may have some special classes: > > software > source (to collect measurements of an object in space-time) > > Measurements are taken in bandpasses, and in certain coordinate frames, and > from either the real universe or from computer simulations. A bandpass is a > 'frame' restricting coverage in the em-spectrum. A coordinate frame > describes a restriction on the spatial coverage. The idea of 'intent' has > nothing to do with anything; it is simply a mode of collecting measurements. > > Allright, enough of my rantings for this evening. I applaud your attempt to > add rationality to Roy and Sebastien's work, but feel we still have some way > to go. > Thanks... I don't disagree with what you are saying and I hope that we can a least reopen the discussion. Tom From norman at astro.gla.ac.uk Wed Oct 22 10:43:48 2003 From: norman at astro.gla.ac.uk (Norman Gray) Date: Wed, 22 Oct 2003 18:43:48 +0100 (BST) Subject: A suggested revision for UCDs In-Reply-To: <3F9588DF.3030805@lheapop.gsfc.nasa.gov> Message-ID: Greetings, all, and Tom in particular. On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > A few minutes ago I uploaded a version of my suggested revised > proposal for UCDs to the Twiki. This is just a Word version since > I don't have a PDF generator handy. The URL is > http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9b.doc I've appended a (longish) set of comments below. I've just noticed that Bob has forwarded a long set of comments to the list. I haven't read those yet. 
By the way, I notice that this announcement/discussion has been posted to no fewer than _three_ lists, namely ucd, dm and dal. It would be at the very least neater if it were on only one -- ucd at ivoa.net is the obvious one. What do folk think -- are there folk on the other two lists who have an interest in this and aren't on the ucd at ivoa.net list? I'm sure I'm not the only one to find Tom's proposal very thought-provoking. The suggestions bring up several new use-cases; and the idea of the `local' atom in particular is valuable, and a gap in the 1.9.9 proposals (though I'd put it in a different place). I think there are very likely several places in the 1.9.9 proposals which are underspecified, and some where I personally would probably explain things slightly differently from Roy and Sebastien, but these are editorial matters. I have a few difficulties with some aspects of Tom's proposal, however, which I'll discuss here, and add a few more general remarks at the end. I'm speaking for myself of course, rather than the group of authors, and thus it's probable that my opinion and interpretation of some 1.9.9 points is at variance with others in the group, or goes beyond what the document aims to say (which would be a useful datapoint). Most urgent, I think, is Tom's discussion, in his section 4.5, of the distinction between his proposals and the 1.9.9 ones. These are crucial, since these criticisms are what would ultimately justify replacing the 1.9.9 proposals with Tom's more complicated ones. In the 1.9.9 proposals, the function of a word is always the same: some things such as `src' are concepts (and only concepts), and every other word names a property. The distinction is that concepts can't have a value, but can have properties; and a property always has a value. 
Now, the property;concept _pair_ also names a concept, which can therefore have properties in turn (this has the same potential as Tom's proposals for generating long UCDs in principle, but probably very unlikely in practice). There will doubtless be some rather formal language which makes this cast-iron, but it's actually fairly intuitive once you get the property/concept dichotomy and read `;' as `of a' or something like that. Section 3.1 in the 1.9.9 proposals -- the crucial section of the document, for which everything else is to some extent just scaffolding, and without which the rest of the document makes rather less sense -- is what attempts to describe this. Perhaps that explanation needs work. At any rate, I do not believe that one has to sign up to the (basically ontology-inspired) language in that section in order to use the UCDs thus justified. Indeed, it might be useful for that section to be split into two, one to communicate the underlying idea to folk who simply want to _use_ UCDs, and another to reexpress it more formally for the ontology enthusiasts. In his section 4.5, Tom also remarks that ``Indeed I'm not sure that any string of words can be determined to be illegal in the old scheme''. I'd probably agree in outline: there are significantly fewer rules necessary in the 1.9.9 proposals than in Tom's proposals. The only place a base concept can go is in the right-most position, and thus you can't have a concept sitting on its own, since the left-most position is the name of the property, the value of which is the number/column/whatever which has been annotated by this UCD (the syntactic mechanism for making that annotation is outside of scope for the UCD proposals, I'd think). Also, there are some property-concept pairs that make no sense, such as stat.err;src. But that's about it -- you don't need any more rules than that. Tom constructs an `arith.diff;arith.sum;phot.flux;...' UCD. 
That does look unwieldy (but note there's no need for parentheses in the 1.9.9 proposals), but I get the impression that the `arith' UCD tree was to some extent a kite being flown, and I for one would be surprised if it made it much beyond this version, partly because it would seem to encourage such odd-looking UCDs. Also, there's no tying of one table to another in the 1.9.9 proposals -- I'd think that was out of scope for UCD (and quite properly so: I'll mention this below). The 1.9.9 proposals allow no ambiguity in the way that UCDs are written: properties queue up in front of the single base concept, and ordering matters, so that stat.max;stat.err;phot.flux is different from stat.err;stat.max;phot.flux. More specific points in Tom's proposal, in document order rather than any other (section references are to Tom's document): Section 4.1: Bringing the number of terms up to three -- concept, attribute and modifier -- reminds me of the qualifier/modifier idea that was in previous versions of the draft, which I still think is an unstable distinction, and which Roy and Sebastien thankfully managed to get rid of by simplifying the syntax down to just concept plus properties (but see below). Also, there's no syntactic distinction between modifiers and attributes, so in order to apply the extra ordering rules for those, or even to break the UCD into its three parts, you have to know which words are of which type. That is, you can't do it at parse time. Section 4.1.2 (not an important point, I don't think): I'm puzzled at the requirement that words in the non-standard namespace must be distinct from all words in the IVOA namespace. The point of having a namespace is to make this possible, or (since such duplication would surely be condemned as bad practice) at least not an error. 
The rule also means that if a new word were added to the IVOA namespace which happened to match a word in a private namespace, the namespaced UCDs would thereby suddenly become invalid, with no change in the spec. Section 4.2.2: The `intent' modifier has no corresponding notion in the 1.9.9 proposals, but it's not clear to me where in those proposals this would fit in, and I think this is a _problem_ for the 1.9.9 proposals. I can see how it would fit in to what I take the underlying 1.9.9 model to be, but not into the serialisation of that model that the 1.9.9 syntax represents. I can see three approaches to this problem within the general framework of the 1.9.9 proposals. (i) Rule it out of scope: it's not UCD's problem to talk about what values are intended to be, since they're only for data discovery, and are not required to be capable of driving analysis, so that if this `intent' distinction matters to you, you're going to have to understand the utype somehow. (ii) Add modifiers like this to the 1.9.9 model and syntax: that's potentially quite a lot of work, since it would require thinking very clearly about just what the distinction is between modifiers and properties, _and_ working out a usable syntax for adding them in -- they _have_ to be distinguishable at parse time. (iii) Think about it more and discover a way they can be viewed as properties in a principled way. The point isn't just about this `intent' modifier: if we can convince ourselves that there are things like `intent' (and that they're in scope) which are in principle qualitatively distinct from properties (and I would at least dispute that `em' and `frame' count here), then that has to be dealt with. Perhaps this example will help us find the stable distinction between `qualifiers' and `modifiers' that escaped us in earlier versions. Section 4.2.3: The `value', `vector', `instance' and `multiplet' attributes seem overly complicated. 
The `value' attribute is not required in the 1.9.9 proposals because all properties have a value, namely the value they're being used to annotate. The other three seem artefacts of the `complex UCDs' which Tom is introducing in these proposals. These complex UCDs seem problematic to me because they seem tightly bound to VOTable. That destroys the orthogonality of the UCD and VOTable specs (the W3C has had _terrible_ trouble with non-orthogonal specs, tying itself in knots trying to resolve their dependencies on each other), and makes it harder to use UCDs in other contexts, such as queries. I feel that UCDs should be seen as annotating a `thing', whether that `thing' be a value, a column, a group, or a query `phrase', and it should be the responsibility of whatever defines the syntax of that annotation (that is, VOTable or SIA) to define precisely what the thing is that the annotation applies to. Thus, VOTable might say that when a UCD appears in a then it indicates a set of relationships between the corresponding entries of the table; when it appears in a it means something different; and so on. Dealing with the typing and complexity issues of this in a general way within the UCD spec would surely make it impossibly unwieldy and limit its scope. This is also a general worry for all of Tom's Section 5; I really think this should be out of scope for UCD, to the extent that Tom's ``The grouping does not describe the semantics of the relationship. That is the role of UCDs'' would be much better as ``The grouping describes (some of?) the semantics of the relationship. That is not the role of UCDs''. This is a can of worms. Section 4.2.3 (local): I agree this is a gap in the 1.9.9 proposals. Another way of dealing with it would be to say that a UCD `local.X' meant exactly the same as the `X', but was not comparable with it. More general points: Tom's document seems to discuss his proposals in object terms. 
However the property-concept parts of the UCD proposal are _not_ an object model, and if you cram them into an object model, they won't fit, and the result will inevitably look like a mess, and look backwards. The model is simpler than this, however: things which are purely concepts (such as `src') don't have values. Concepts do have properties though, and these properties have numeric values, namely the numeric values we're trying to annotate with this UCD. As regards ordering, yes, as Tom said, it doesn't fundamentally matter, and it's just a matter of syntax, rather than of the model. However having the property first seems natural, since it's this which posesses the numerical value which is being annotated, and so it's this which I would have thought it best would be shown up-front. Now, there is a _vague_ object model implicit in the construction of the UCD words like `pos.eq.ra', but this is only because, along with the replacement of underscores with dots, came the explicit freedom to crop each word at a dot from the right, and use the result as a UCD word also. This prompts a natural perception of the words as hierarchical, or object-oriented if you must. The actual words are basically little changed from the original UCDs, though there's a review of these under way. These words weren't the main point of the UCD2 proposals. At present these words are those mined from the column names actually occurring in the databases in the CDS collection; they are thus unprincipled. Whether this is a good or a bad thing is an open question. I'm sure it is this which causes some people (I'm thinking of Gerard Lemson and Pat Dowler) to gasp and, in their poster, pick out pos_eq_ra for special deprecation as incoherent. 
If you believe that principled generation of UCD words would be a Good Thing (and that would probably be my prejudice), then I suspect that paths in (say) Gerard and Pat's model would be a good way to do it (do Gerard and Pat claim that every UCD word is thus expressible?). If you believe, on the other hand, that the mined nature of the words is of primary importance (and I can see the force of that, too), then they might need little more than a review or tidy-up, to make sure that the `croppability' is reasonable in fact, and that the implications, or suggestions, of the words chosen do in fact fit in with a properties-based model (or whatever we end up with). Phew! I think that's probably quite enough for just now -- I should let someone else get a word in. All the best, Norman -- --------------------------------------------------------------------------- Norman Gray http://www.astro.gla.ac.uk/users/norman/ Physics and Astronomy, University of Glasgow, UK norman at astro.gla.ac.uk From Edward.J.Shaya.1 at gsfc.nasa.gov Wed Oct 22 11:35:46 2003 From: Edward.J.Shaya.1 at gsfc.nasa.gov (Ed Shaya) Date: Wed, 22 Oct 2003 14:35:46 -0400 Subject: UCD changes on top of McGlynn's changes Message-ID: <3F96CE02.8020804@gsfc.nasa.gov> I have made numerous tracked changes in the 1-9.9b of McGlynn and created a 1-9.9c. So if one has a recent version of Word one can accept or reject these changes as seen fit. I think someone will need to post this to the UCD and dal lists cause I am not on them. Here are some "highlights" -------------------------------------------------------------------------- The term property was used in a confusing manner. Everything was at some point in the specification referred to as a property including the basic concept as well as the attribute and modifier. So, I changed attribute property to attribute and modifier property to modifier. I think this is pretty good but it is missing something. 
The modifiers are in effect extending the hierarchical tree of basic concepts into the virtual tree of full concepts. This is not mentioned and probably should be. But, if it is true then sometimes the order of the modifiers should make a difference. I don't have an example but I am worried that there will be concepts that require A;B;C and other concepts that are A;C;B but they are not the same. I don't see how to ensure that that never happens.
------------------------------------------------------------
Word and atom were a bit confused. Atoms looked like words to me and Words were clearly composed of several words (not good). There was no atom in the Backus-Naur notation. But there were examples of atoms of the physics type (yikes). So I made the following simplification.

    atom -> word
    word -> term
    word-component -> word
-----------------------------------
I changed a few occurrences of "column" to "contents" so that it did not seem that this was for tables only. And so the Contents in "UCD" would be meaningful.
--------------------------------------
Why is there a meas.error and a stat.error, and one is a concept and one is an attribute? Perhaps this was supposed to be a stat:max?

    1 x1:experimental.quantity;x2:new.modifier;stat.error
----------------------------------------------------
Why not have a different symbol to separate the attributes from the base and modifiers?

    pos.eq;phys.electron#value;vector
    pos.eq;phys.electron#stat.error;vector

This is clearer. It says, if you are looking for instances of the concept of a positional measure of electrons, here it is. By the way it is in vector format and there is an error associated with it; you may need to transform this format. Queries will be keying on the concept and so it should be cleanly separated. If the query finds additional attribute information it may grab it for completeness even if it was not specified in the query.
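[Editorial sketch, not part of the original message.] The '#' separator suggested above makes it easy to key a query on the concept alone and pick up the attributes as a bonus. The function names split_ucd and find_columns and the sample column names below are hypothetical; only the two example UCDs come from the message.

```python
def split_ucd(ucd):
    """Split 'concept;modifier...#attribute;...' into (concept_words, attribute_words)."""
    concept_part, sep, attribute_part = ucd.partition("#")
    concept_words = concept_part.split(";")
    attribute_words = attribute_part.split(";") if sep and attribute_part else []
    return concept_words, attribute_words


def find_columns(columns, concept_query):
    """Return {column name: attributes} for every column whose concept part matches.

    `columns` maps a column name to its UCD; the query contains no attributes,
    so any attributes found are returned as extra information.
    """
    wanted = concept_query.split(";")
    hits = {}
    for name, ucd in columns.items():
        concept_words, attribute_words = split_ucd(ucd)
        if concept_words == wanted:
            hits[name] = attribute_words
    return hits


if __name__ == "__main__":
    # Hypothetical table: the column names are made up for illustration.
    columns = {
        "e_pos": "pos.eq;phys.electron#value;vector",
        "e_pos_err": "pos.eq;phys.electron#stat.error;vector",
        "flux": "phot.flux#value",
    }
    print(find_columns(columns, "pos.eq;phys.electron"))
```

The query never has to know which words are attributes; everything after '#' is simply carried along.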
---------------------------
Why not allow a namespaced term to reuse an existing term? That is what a namespace is for!

    1 phot.flux;x1:meas.error    Namespace reuses existing term
--------------------------
In the Group of pos.eq.ra and pos.eq.dec the UCD of the group should be pos.eq, not pos.instance. The "eq" is there because one should be as specific as possible. The instance should not be there since this is a table level term and is redundant here.
------------------------------------------------------------
I don't buy the idea that the main pos is always in the least indented or grouped column. This is an extremely fragile and restrictive way to go. There are many ways in which the targets can sit in more groups than the plate positions or the guide stars. What if the target stars have position groups and so one wants a second grouping of ra, dec, l, b, or what if the grouping of stars is by cluster or by spectral type or by accuracy of measurements etc.? That is not to say that I like the idea of a main UCD. Rather, the best way is to ensure that the structural container of the data has a way of referring the properties to the object that has these properties. A quantity needs to have an isPropertyOf attribute that refers to the object. So, a positional property column should have isPropertyOf="column(starName)". The default could be isPropertyOf=column(1).
---------------------------------------------------------
Finally, I find it curious that this system makes no discrimination between properties of objects (color, brightness, distance, size) and objects (electrons, atoms, planets, stars, galaxies). Every time one uses a UCD-property there must be implicitly a UCD-object that has been left off. A brightness is always of a star or a planet or a human. A query system must then be able to infer this from other metadata in the dataset. Therefore one needs to ensure that every data set has somewhere at least one UCD-object.
This will be hard to do if they are not somehow separated out I just wanted to point that out. It may or may not be a fatal flaw. Ed -------------- next part -------------- A non-text attachment was scrubbed... Name: UCD-1.9.9c.doc Type: application/msword Size: 387584 bytes Desc: not available URL: From patrick.dowler at nrc-cnrc.gc.ca Wed Oct 22 12:19:29 2003 From: patrick.dowler at nrc-cnrc.gc.ca (Patrick Dowler) Date: Wed, 22 Oct 2003 12:19:29 -0700 Subject: A suggested revision for UCDs In-Reply-To: References: Message-ID: <200310221219.29068.patrick.dowler@nrc-cnrc.gc.ca> On Wednesday 22 October 2003 10:43, Norman Gray wrote: > I'm sure it is this which causes some people (I'm thinking of Gerard > Lemson and Pat Dowler) to gasp and, in their poster, pick out pos_eq_ra > for special deprecation as incoherent. If you believe that principled > generation of UCD words would be a Good Thing (and that would probably > be my prejudice), then I suspect that paths in (say) Gerard and Pat's > model would be a good way to do it (do Gerard and Pat claim that every > UCD word is thus expressible?). That particular example was chosen to show that the different parts of the POS_EQ_RA_MAIN (UCD1) are quite different types of things with different realtionships to the thing being described. In the data model, POS is the "position" phenomenon, EQ is a particular ReferenceSystem, RA is one component of a point (the type used to represent a position in that ReferenceSystem). I still don't know that MAIN would belong in a data model. It isn't "incoherent" so much as it includes some very different kinds of things. In the concept;property style of UCD2, I don't think it is inconsistent with the data model we presented. 
The difference between UCD and DM is that in a model one explicitly states the relationship between the concept and the property. So, for example, in our DM we are explicitly saying that the relationship between a position and the ReferenceSystem is a different relationship than that between the position and the RA (a component of the data type/structure). UCD2 leaves the relationship implicit by having only one relationship: "propertyOf". Whether that's a good or bad thing is an open issue.

--
Patrick Dowler
Tel/Tél: (250) 363-6914 | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada | Gouvernement du Canada
5071 West Saanich Road | 5071, chemin West Saanich
Victoria, BC | Victoria (C.-B.)

From tam at lheapop.gsfc.nasa.gov Wed Oct 22 13:14:39 2003
From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn)
Date: Wed, 22 Oct 2003 16:14:39 -0400
Subject: A suggested revision for UCDs
In-Reply-To:
References:
Message-ID: <3F96E52F.8060402@lheapop.gsfc.nasa.gov>

Hi Norman,

Thanks for your comments. I've got responses to some of them, and I'm glad you found some good in this! In keeping with your suggestion I've only sent this to the UCD group. I sent the original proposal to the other groups since I thought the use of complex UCDs is relevant, but I'm sure we can live without three copies and hopefully they have subscribed so they can hear this interesting(!?) debate.

Tom

Norman Gray wrote:
...
> Most urgent, I think, is Tom's discussion, in his section 4.5, ...

That was really meant to be at the end of section 4, rather than part of 4.5...

> ... of the
> distinction between his proposals and the 1.9.9 ones. These are crucial,
> since these criticisms are what would ultimately justify replacing the
> 1.9.9 proposals with Tom's more complicated ones.
>
> In the 1.9.9 proposals, the function of a word is always the same:
> some things such as `src' are concepts (and only concepts), and
> every other word names a property. The distinction is that
> concepts can't have a value, but can have properties; and a property
> always has a value. Now, the property;concept _pair_ also names a
> concept, which can therefore have properties in turn (this has the
> same potential as Tom's proposals for generating long UCDs in
> principle, but probably very unlikely in practice). There will
> doubtless be some rather formal language which makes this cast-iron,
> but it's actually fairly intuitive once you get the property/concept
> dichotomy and read `;' as `of a' or something like that.

Well there are two distinct issues here. First, lexically, how can I tell what is a property and what is a concept? In 1.9.9 there is no way to tell. The only lexical statement it makes is that the first word is a property. What about the second word, or the third? The 1.9.9 proposal allows properties that modify other properties.

The second issue is the semantic confusion. The second word can be (according to section 4.2 in 1.9.9) a concept, another property, or information related to the primary word. In section 3.4, in defining the error in RA of a galaxy, we have the phrase

    We identify the central property as "error", and the concept as "right ascension", with a subsidiary word about "galaxy".

So in fact version 1.9.9 has all three of concept, property and modifier but just tries to hide the fact and doesn't give you any way of telling which is which... Suppose I give one a UCD of

    word1;word2;word3

Does word3 modify word2, or do both of them modify word1? No way to tell.
stat.error;phot.flux;em.optical (word3 modifies word2)
or
phot.flux;em.optical;src.galaxy (word3 modifies word1)

There is an explicit statement that some things (pos.eq.ra) are sometimes concepts and sometimes secondary words, and the rules make it trivial to build UCDs that are simply incomplete semantically. stat.error is a perfectly valid UCD but it has no rooted semantic content. You may say "Well, what about meta.id, isn't that the same?" Not really, because I can suggest (and indeed I do suggest) that UCDs need to be interpreted in the context of what the table is about. So for a source table meta.id refers to the id for a source; for an observation table meta.id refers to the id for an observation. But what does stat.error refer to.... The error in the source? That doesn't make sense.

> Section 3.1 in the 1.9.9 proposals -- the crucial section of the document, for which everything else is to some extent just scaffolding, and without which the rest of the document makes rather less sense -- is what attempts to describe this. Perhaps that explanation needs work. At any rate, I do not believe that one has to sign up to the (basically ontology-inspired) language in that section in order to use the UCDs thus justified. Indeed, it might be useful for that section to be split into two, one to communicate the underlying idea to folk who simply want to _use_ UCDs, and another to reexpress it more formally for the ontology enthusiasts.
>
> In his section 4.5, Tom also remarks that ``Indeed I'm not sure that any string of words can be determined to be illegal in the old scheme''. I'd probably agree in outline: there are significantly fewer rules necessary in the 1.9.9 proposals than in Tom's proposals.
The only place > a base concept can go is in the right-most position, and thus you can't > have a concept sitting on its own, since the left-most position is the > name of the property, the value of which is the number/column/whatever > which has been annotated by this UCD (the syntactic mechanism for making > that annotation is outside of scope for the UCD proposals, I'd think). > Also, there are some property-concept pairs that make no sense, such > as stat.err;src. But that's about it -- you don't need any more rules > than that. > > Tom constructs an `arith.diff;arith.sum;phot.flux;...' UCD. That does > look unwieldy (but note there's no need for parentheses in the 1.9.9 > proposals), but I get the impression that the `arith' UCD tree was to > some extent a kite being flown, and I for one would be surprised if it > made it much beyond this version, partly because it would seem to encourage > such odd-looking UCDs. Also, there's no tying of one table to another > in the 1.9.9 proposals -- I'd think that was out of scope for UCD (and > quite properly so: I'll mention this below). > Sorry that's a typo... Should have said tying of one column to another. > The 1.9.9 proposals allow no ambiguity in the way that UCDs are > written: properties queue up in front of the single base concept, and > ordering matters, so that stat.max;stat.err;phot.flux is different > from stat.err;stat.max;phot.flux. What I call attributes and the properties you specify here are indeed largely unambiguous in both cases. However what I call modifiers and what 1.9.9 calls either subsidiary words or 'information related to the primary word' are less clear. E.g., suppose I'm detecting circularly polarized light in the radio. 
The natural UCD for this would be:

phot.flux;em.radio;em.polarized;circular

or is it

phot.flux;em.polarized;circular;em.radio

or do we have to multiply the size of the vocabulary to add polarized and polarized.circular (and the other variants) to every wavelength spec we have? That seems silly... So we need to fix that. There's a similar problem with arith (another nail for the coffin perhaps): is it

arith.sum;property1;property2

or

arith.sum;property2;property1

And this general idea that properties can refer to other properties in an uncontrolled way... Here's a UCD describing the flux of galaxies...

phot.flux;em.optical;src.galaxy

or is it

phot.flux;src.galaxy;em.optical ?

E.g., suppose I have a column that is the maximum flux from any of three wavebands. Can I write

stat.max;phot.flux;em.optical;phot.flux;em.xray;phot.flux;em.radio

I hope not, but the document seems to encourage it. This would be illegal in the revision since it includes three base concepts.

> More specific points in Tom's proposal, in document order rather than any other (section references are to Tom's document):
>
> Section 4.1: Bringing the number of terms up to three -- concept, attribute and modifier -- reminds me of the qualifier/modifier idea that was in previous versions of the draft, which I still think is an unstable distinction, and which Roy and Sebastien thankfully managed to get rid of by simplifying the syntax down to just concept plus properties (but see below).

... but they haven't, they have just not told you about the difference. The words stat.max, em.optical, and phot.flux have distinct grammar rules in how they are used in 1.9.9 but you have no way to tell that. I.e., phot.flux can appear as the initial word or any subsidiary word but can never appear before stat.max or any other word of the class I would call attribute. em.optical can appear after words of the same class as phot.flux and possibly after words of the same class as itself.
stat.max can appear anywhere in a UCD but it really should appear before either a word of the class of phot.flux or a word of its own class. There are three kinds of words and we should just recognize that in the grammar.

> Also, there's no syntactic distinction between modifiers and attributes, so in order to apply the extra ordering rules for those, or even to break the UCD into its three parts, you have to know which words are of which type. That is, you can't do it at parse time.

Sure you can. At least if the number of modifiers remains small. Note that table writers should have access to appropriate documentation when writing their tables (or writing the software that writes tables) so even if it gets more complex the writers have no problems and the readers don't care. See my response to Bob on this issue. I've suggested that all modifiers be put in the frame tree, though, largely to address this issue.

> Section 4.1.2 (not an important point, I don't think): I'm puzzled at the requirement that words in the non-standard namespace must be distinct from all words in the IVOA namespace. The point of having a namespace is to make this possible, or (since such duplication would surely be condemned as bad practice) at least not an error. The rule also means that if a new word were added to the IVOA namespace which happened to match a word in a private namespace, the namespaced UCDs would thereby suddenly become invalid, with no change in the spec.

This idea is copied from the previous proposal. I think the idea is that we don't want proliferation of new uncontrolled UCDs. I put this in a separate section, but I believe the content is the same as the previous proposal. I leave it to others to decide which is right.

> Section 4.2.2: The `intent' modifier has no corresponding notion in the 1.9.9 proposals, but it's not clear to me where in those proposals this would fit in, and I think this is a _problem_ for the 1.9.9 proposals.
> I can see how it would fit in to what I take the underlying 1.9.9 model to be, but not into the serialisation of that model that the 1.9.9 syntax represents. I can see three approaches to this problem within the general framework of the 1.9.9 proposals. (i) Rule it out of scope: it's not UCD's problem to talk about what values are intended to be, since they're only for data discovery, and are not required to be capable of driving analysis, so that if this `intent' distinction matters to you, you're going to have to understand the utype somehow.

That's not acceptable for observation tables. We frequently have multiple columns in a table which differ only in intent (proposed and actual exposure times, predicted and actual times of events, predicted and actual fluxes) and we need to know which to use for various purposes. Spectral fitting will be sadly served if we can't distinguish the calculated and actual spectra. What happens when we want to compare simulated and real data?

> (ii) Add modifiers like this to the 1.9.9 model and syntax: that's potentially quite a lot of work, since it would require thinking very clearly about just what the distinction is between modifiers and properties, _and_ working out a usable syntax for adding them in -- they _have_ to be distinguishable at parse time.

I'm happy to trade intent for frame.human and put all the modifiers in frame.

> (iii) Think about it more and discover a way they can be viewed as properties in a principled way. The point isn't just about this `intent' modifier: if we can convince ourselves that there are things like `intent' (and that they're in scope) which are in principle qualitatively distinct from properties (and I would at least dispute that `em' and `frame' count here), then that has to be dealt with. Perhaps this example will help us find the stable distinction between `qualifiers' and `modifiers' that escaped us in earlier versions.
Personally I take a modifier as something that limits the context of a concept.

> Section 4.2.3: The `value', `vector', `instance' and `multiplet' attributes seem overly complicated. The `value' attribute is not required in the 1.9.9 proposals because all properties have a value, namely the value they're being used to annotate.

The word value is the price I pay for making sure attributes and concepts are distinct. Personally I think it's worth it.

> The other three seem artefacts of the `complex UCDs' which Tom is introducing in these proposals.

Vector is not... It's simply to warn the user that the column is a vector. While VOTables have an array attribute that does this, I don't want to tie this proposal to VOTables... More on that below.

> These complex UCDs seem problematic to me because they seem tightly bound to VOTable. That destroys the orthogonality of the UCD and VOTable specs (the W3C has had _terrible_ trouble with non-orthogonal specs, tying itself in knots trying to resolve their dependencies on each other), and makes it harder to use UCDs in other contexts, such as queries. I feel that UCDs should be seen as annotating a `thing', whether that `thing' be a value, a column, a group, or a query `phrase', and it should be the responsibility of whatever defines the syntax of that annotation (that is, VOTable or SIA) to define precisely what the thing is that the annotation applies to. Thus, VOTable might say that when a UCD appears in a then it indicates a set of relationships between the corresponding entries of the table; when it appears in a it means something different; and so on. Dealing with the typing and complexity issues of this in a general way within the UCD spec would surely make it impossibly unwieldy and limit its scope.
> This is also a general worry for all of Tom's Section 5; I really think this should be out of scope for UCD, to the extent that Tom's ``The grouping does not describe the semantics of the relationship. That is the role of UCDs'' would be much better as ``The grouping describes (some of?) the semantics of the relationship. That is not the role of UCDs''. This is a can of worms.

I think this is completely wrong. The grouping proposal has no special relationship to VOTables other than that they happen to support it. [Or they may soon!] Any other structure that supports groupings of tables would do just as well. This is a fairly natural attribute of object relational as well as hierarchical databases. It's just that VOTables have finally decided to enable the natural abilities that XML's hierarchical structure supports.

> Section 4.2.3 (local): I agree this is a gap in the 1.9.9 proposals. Another way of dealing with it would be to say that a UCD `local.X' meant exactly the same as the `X', but was not comparable with it.

That's essentially my proposal, modulo the order difference.

> More general points:
>
> Tom's document seems to discuss his proposals in object terms. However the property-concept parts of the UCD proposal are _not_ an object model, and if you cram them into an object model, they won't fit, and the result will inevitably look like a mess, and look backwards. The model is simpler than this, however: things which are purely concepts (such as `src') don't have values. Concepts do have properties though, and these properties have numeric values, namely the numeric values we're trying to annotate with this UCD.

Sounds like objects and attributes to me... What's the difference here? But the old proposal doesn't agree anyway! Is phot.flux a concept? Seems like it to me. But it's a property in 1.9.9. Sometimes... In the UCD phot.flux. But it's sort of a concept in stat.err;phot.flux. Or is it a property there?
I'm not sure and there is no way to tell! By specifying a value attribute I've cleared away this confusion. > > As regards ordering, yes, as Tom said, it doesn't fundamentally > matter, and it's just a matter of syntax, rather than of the model. > However having the property first seems natural, since it's this > which posesses the numerical value which is being annotated, and > so it's this which I would have thought it best would be shown > up-front. This is not critical, but since I believe the model is analogous to the object/attribute relationship using the same order that has conventionally been used there is helpful. > > Now, there is a _vague_ object model implicit in the construction of > the UCD words like `pos.eq.ra', but this is only because, along with > the replacement of underscores with dots, came the explicit freedom to > crop each word at a dot from the right, and use the result as a UCD > word also. This prompts a natural perception of the words as > hierarchical, or object-oriented if you must. Well I don't have to but I sure would like to! .. The actual words are > basically little changed from the original UCDs, though there's a > review of these under way. These words weren't the main point of the > UCD2 proposals. > > At present these words are those mined from the column names actually > occurring in the databases in the CDS collection; they are thus > unprincipled. Whether this is a good or a bad thing is an open question. > I'm sure it is this which causes some people (I'm thinking of Gerard > Lemson and Pat Dowler) to gasp and, in their poster, pick out pos_eq_ra > for special deprecation as incoherent. If you believe that principled > generation of UCD words would be a Good Thing (and that would probably > be my prejudice), then I suspect that paths in (say) Gerard and Pat's > model would be a good way to do it (do Gerard and Pat claim that every > UCD word is thus expressible?). 
> If you believe, on the other hand, that the mined nature of the words is of primary importance (and I can see the force of that, too), then they might need little more than a review or tidy-up, to make sure that the `croppability' is reasonable in fact, and that the implications, or suggestions, of the words chosen do in fact fit in with a properties-based model (or whatever we end up with).

As in 1.9.9 I didn't build a complete list but I agree that most words will transfer between the two proposals.

From roy at cacr.caltech.edu Wed Oct 22 13:32:32 2003
From: roy at cacr.caltech.edu (Roy Williams)
Date: Wed, 22 Oct 2003 13:32:32 -0700
Subject: UCD changes on top of McGlynn's changes
References: <3F96CE02.8020804@gsfc.nasa.gov>
Message-ID: <01c501c398db$9ef8ef80$6b91d783@cacr.caltech.edu>

All:

From now on, please do not cross-post UCD material to dm and dal. Please send UCD-related postings to ucd at ivoa.net only. Those of you on the other lists, please join the ucd list to continue reading UCD material.

Thank You
Roy
--------
Caltech Center for Advanced Computing Research
roy at cacr.caltech.edu
626 395 3670

From tam at lheapop.gsfc.nasa.gov Wed Oct 22 13:40:59 2003
From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn)
Date: Wed, 22 Oct 2003 16:40:59 -0400
Subject: UCD changes on top of McGlynn's changes
In-Reply-To: <3F96CE02.8020804@gsfc.nasa.gov>
References: <3F96CE02.8020804@gsfc.nasa.gov>
Message-ID: <3F96EB5B.8000001@lheapop.gsfc.nasa.gov>

Hi Ed,

Most of this sounds reasonable to me... Some short (just to show I can do it) comments.

Tom

Ed Shaya wrote:
...
> I think this is pretty good but it is missing something. The modifiers are in effect extending the hierarchical tree of basic concepts into the virtual tree of full concepts. This is not mentioned and probably should be. But, if it is true then sometimes the order of the modifiers should make a difference.
I don't have an example but I am worried > that there will be concepts that require A;B;C and other concepts that > are A;C;B but they are not the same. I don't see how to ensure that > that never happens. I just haven't seen where it happens... So I'm crossing my fingers! > -------------------------------------- > Why is there a meas.error and a stat.erro, and one is a concept and one > is an attribute? > Perhaps this was suppose to be a stat:max? > > 1 x1:experimental.quantity;x2:new.modifier;stat.error > Probably just missed it when I introduced the meas tree. > > Why not have a different symbol to separate the attributes from the base > and modifiers? > pos.eq;phys.electron#value;vector > pos.eq;phys.electron#stat.error;vector Bob and you both suggested that. Sounds good to me. I'd probably have written the first as just pos.eq;phys.electron#vector > --------------------------- > > Why not allow a namespaced term to reuse existing term? That is what > namespace is for! Talk to Ray and Sebastien. I think they feel that it's best for UCDs to be highly controlled so that namespaces are only used for experimental terms before they are introduced to the standard namespace. Both approaches seem reasonable to me. > ------------------------------------------------------------ > I don't buy the idea that the main pos is always in the least indented > or grouped column. This is a extremely fragile and restrictive way to > go. There are many ways that the targets are in a more groups than the > plate positions or the guide stars. I don't want to suggest that one cannot build tables that would break the proposal. But I was unable to find a circumstance where I couldn't use structure to make the main elements clear. I.e., I'm not trying to cater to every reasonable structure for tables, but trying to see if the proposal allows sufficient flexibility for tables to express mainness assuming they are written by friends. 
I tried to allude to that with the sense that we'll need templates for how to use structures.

> What if the target stars have position groups and so one wants a second grouping of
>
> ra, dec
>
> l,b
>
> or what if the grouping of stars is by cluster or by spectral type or by accuracy of measurements etc?

In these examples it would be the responsibility of the table writer to think about how the table could be written to express what readers need to know. If we're allowed the full flexibility of the VOTable grouping structures, then we might have the same fields referenced more than once: near the root so that they are seen to be main, and within some more nested structures. This would use the reference capability. However I'd be interested to know whether we have any actual instances of tables where this is needed or whether it's something that's possible but not very likely.

> That is not to say that I like the idea of a main UCD. Rather, the best way is to ensure that the structural container of the data has a way of referring the properties to the objects that have these properties. A quantity needs to have an isPropertyOf attribute that refers to the object. So, a positional property column should have isPropertyOf="column(starName)". The default could be isPropertyOf=column(1).

With references to virtual columns I suspect this is equivalent to what I suggest, but it might be cleaner.

> ---------------------------------------------------------
> Finally. I find it curious that this system makes no discrimination between properties of objects (color, brightness, distance, size) and objects (electrons, atoms, planets, stars, galaxies). Every time one uses a UCD-property there must be implicitly a UCD-object that has been left off. A brightness is always of a star or a planet or a human. A query system must then be able to infer this from other metadata in the dataset.
> Therefore one needs to ensure that every data set has somewhere at least one UCD-object. This will be hard to do if they are not somehow separated out. I just wanted to point that out. It may or may not be a fatal flaw.

I guess this is what I'm typically indicating as the table UCD. E.g., the table UCD is the source, but the column UCDs are the properties of the sources.

From roy at cacr.caltech.edu Wed Oct 22 18:02:10 2003
From: roy at cacr.caltech.edu (Roy Williams)
Date: Wed, 22 Oct 2003 18:02:10 -0700
Subject: is UCD out of control?
Message-ID: <05e001c39901$4a189d50$6501a8c0@Ropy>

I must say that I find Tom's version of the UCD paper has a number of definite improvements, such as the importance of Groups, with the child inheriting UCD from its parent.

However, I find the suggested syntax confusing and muddying. It seems to be going back to the old model of "base + other stuff" that we discussed in Cambridge. What I do not understand is how a machine would parse the other stuff, these modifiers and attribute properties and so on. I do not understand which is a modifier and which is an attribute. The reason we went with the new scheme was that we couldn't imagine writing code to disentangle the Cambridge scheme.

In the 1.9.9 document, the first word of the UCD corresponds to the thing that has the units. In "stat.variance; phys.length", we know that the unit is L*L (it's a variance). The second word was the concept to which this relates. Everything in UCD2 should be of the form "The <property> of the <concept>". Forget the attempts to justify three words. Leave that for UCD3. Every UCD has at most two words. Keep It Simple!

In the 1.9.9 document, we tried to keep as close as we can to the metadata mines -- the 3000 tables of Vizier from which all this comes. We thought that had more validity than somebody (anybody) sitting down and inventing structure.
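A minimal sketch of the two-word reading described above, assuming a UCD is split on ";" into a property followed by an optional concept. The names `TwoWordUCD`, `parse_ucd` and `describe` are hypothetical and chosen only for illustration; nothing here comes from the 1.9.9 document or from any existing library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TwoWordUCD:
    """Hypothetical container: a property and, optionally, the concept it belongs to."""
    prop: str
    concept: Optional[str] = None


def parse_ucd(text: str) -> TwoWordUCD:
    """Split on ';' and accept one or two non-empty words (assumed rule)."""
    words = [w.strip() for w in text.split(";")]
    if not 1 <= len(words) <= 2 or any(not w for w in words):
        raise ValueError(f"expected 'property' or 'property;concept', got {text!r}")
    concept = words[1] if len(words) == 2 else None
    return TwoWordUCD(prop=words[0], concept=concept)


def describe(ucd: TwoWordUCD) -> str:
    """Render the UCD in the 'The property of the concept' form."""
    if ucd.concept is None:
        return f"The {ucd.prop}"
    return f"The {ucd.prop} of the {ucd.concept}"


if __name__ == "__main__":
    ucd = parse_ucd("stat.variance; phys.length")
    print(describe(ucd))
    # Comparison is plain field equality, so whitespace around ';' does not matter.
    print(ucd == parse_ucd("stat.variance;phys.length"))
```

Under these assumptions a longer string such as "phot.flux; em.optical; intent.calculated; value" is simply rejected with a `ValueError`, which is the restriction being argued for rather than a statement about what any UCD standard requires.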
Look at the problems we get when we move away from mining real metadata: Tom thinks that "error" belongs in a tree called "measurement", and the earlier version put it in a tree called "statistics". There is no right or wrong here, just opinion. I pointed this out in the earlier document concerning the "equinox" concept, but that has been deleted. We must make every attempt to follow what 3000 published papers have done -- not push our own opinions.

In Tom's paper, there seem to be lots of new attributes (value, vector, multiplet, local, human, soft) that further stretch the scope of UCD. If there are multiple values in a table cell, then the VOTable will indicate this in other ways. Perhaps Tom can put in a few more attributes so we can find out if the data quantity is a float or an integer? UCD is about *semantic type*, not all this other stuff. What *real* tables use the "human" section? Are humans base, attribute, or modifier?

I think we can all agree that UCD as currently formulated cannot express the complexity inherent in its task. What is really needed is a well-thought-out RDF vocabulary of predicates and objects, and that is the idea of UCD3. The intention of UCD2 is to provide a stopgap that will be backward compatible when UCD3 arrives. We use only one predicate for now: "propertyOf". But Tom has chosen to remove all the discussion of why and what we are doing, where we are going, and driven instead down a road that tries to put a lot of complexity into this string representation. The result is something terribly complicated and not very understandable.

Of course the proof is in the pudding. As usual in the VO, we are making a language that is very expressive, and then hoping to eventually write the code that understands it. So let's think it through now. How do I construct code that "understands" something like "phot.flux; em.optical; intent.calculated; value".
I want to know what kind of data structure can be created from this, I want to know how to compare UCDs, and I want to know how to convert a UCD into a human-readable description of what it represents. I know how to do these things with the 2-word property/concept style, but not with this grab-bag of attributes and modifiers.

In conclusion, here is my IF ... ELSE clause:

IF {
we cannot find a killer app for UCD2, if we cannot write code to understand them, we should stick with UCD1, which has been improved and groomed in the last months. Then next year we can make UCD3.
} ELSE {
I like simplicity. I want to turn every table cell into "<property> of the <concept>" so that every UCD2 would have at most two words.
}

--------
Caltech Center for Advanced Computing Research
roy at cacr.caltech.edu
626 395 3670

From cgp at star.le.ac.uk Thu Oct 23 01:40:37 2003
From: cgp at star.le.ac.uk (Clive Page)
Date: Thu, 23 Oct 2003 09:40:37 +0100 (BST)
Subject: A suggested revision for UCDs
In-Reply-To: <3F9588DF.3030805@lheapop.gsfc.nasa.gov>
Message-ID:

On Tue, 21 Oct 2003, Thomas McGlynn wrote:

> A few minutes ago I uploaded a version of my suggested revised proposal for UCDs to the Twiki. This is just a Word version since I don't have a PDF generator handy.

An off-topic note to Tom (and others with the same problem in producing PDFs):

Many of us on Unix/Linux systems find Word documents inconvenient; I thought OpenOffice.org might cope with your file, but it produced illegible results without a lot of playing with fonts, and even then was very poor. Not long ago, however, I came across an on-line service for converting Word documents to PDF (or HTML) which seems quite useful:

http://www.gobcl.com/

You upload your document and they email you the PDF a couple of minutes later. I haven't received any obvious spam as a result, and don't know what their privacy policy is, but for documents to be published, this doesn't seem important.
I don't know why they provide this service free of charge, but I've found it useful. Actually I got an error message when I submitted Tom's recent UCD document, but the PDF came back anyway, and seems ok, still in glorious technicolor. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Tel +44 116 252 3551 Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 From jcm at head-cfa.cfa.harvard.edu Thu Oct 23 04:58:57 2003 From: jcm at head-cfa.cfa.harvard.edu (Jonathan McDowell) Date: Thu, 23 Oct 2003 07:58:57 -0400 (EDT) Subject: A suggested revision for UCDs Message-ID: <200310231158.h9NBwv9a025610@urania.cfa.harvard.edu> > Many of us on Unix/Linux systems find Word documents inconvenient I strongly second Clive's remarks; thanks for putting the 1.9.9b PDF version up, there's now a chance I'll read it before this morning's NVO telecon. We're trying for interoperability here, and Word isn't that. Jonathan From tam at lheapop.gsfc.nasa.gov Thu Oct 23 05:58:20 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Thu, 23 Oct 2003 08:58:20 -0400 Subject: A suggested revision for UCDs In-Reply-To: <200310231158.h9NBwv9a025610@urania.cfa.harvard.edu> References: <200310231158.h9NBwv9a025610@urania.cfa.harvard.edu> Message-ID: <3F97D06C.8000105@lheapop.gsfc.nasa.gov> Jonathan McDowell wrote: >>Many of us on Unix/Linux systems find Word documents inconvenient > > > I strongly second Clive's remarks; thanks for putting the 1.9.9b > PDF version up, there's now a chance I'll read it before this morning's > NVO telecon. We're trying for interoperability here, and Word isn't that. > Jonathan > > It looks like Marco Leoni posted a PDF version shortly after I uploaded the original... The URL is http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9b.pdf So it's there! 
Cheers, Tom From jcm at head-cfa.cfa.harvard.edu Thu Oct 23 06:54:18 2003 From: jcm at head-cfa.cfa.harvard.edu (Jonathan McDowell) Date: Thu, 23 Oct 2003 09:54:18 -0400 (EDT) Subject: A suggested revision for UCDs Message-ID: <200310231354.h9NDsIii025671@urania.cfa.harvard.edu> Tom, Now that I've read the document and absorbed at least some of the emails, I must say that I like your proposal a lot. Although there are certainly details to be cleaned up, and I agree with most of Norman's and some of Bob's comments, it feels to me a more solid basis for a UCD2. My biggest beef (which goes directly against Norman's prejudice that UCDs are not an object model!) is that I don't like the distinction between "value" and "instance", especially given that array-valued "values" like "spectrum.value" are mentioned by you. I think your instance is just the value of a higher level term, and I would like to replace "pos.instance" with "pos.value", and then immediately drop the optional ".value" and just say "pos". Why is that a bad idea? Thanks for doing this work! Jonathan From arots at head-cfa.cfa.harvard.edu Thu Oct 23 06:26:21 2003 From: arots at head-cfa.cfa.harvard.edu (Arnold Rots) Date: Thu, 23 Oct 2003 09:26:21 -0400 (EDT) Subject: A suggested revision for UCDs In-Reply-To: Message-ID: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> It's a simple choice: those who shelled out the money for a Word license can either try to force the rest of the community to do the same by sending around Word documents, or spend themselves a little more by buying an Acrobat license and thus allow everybody to spend his/her money as (s)he sees fit. Personally, I think Word should always be bundled with Acrobat, at least in our community. - Arnold Clive Page wrote: > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > A few minutes ago I uploaded a version of my suggested revised > > proposal for UCDs to the Twiki. This is just a Word version since > > I don't have a PDF generator handy. 
> > An off-topic note to Tom (and others with the same problem in producing > PDFs): > > Many of us on Unix/Linux systems find Word documents inconvenient; I > ... > > -- > Clive Page > Dept of Physics & Astronomy, > University of Leicester, Tel +44 116 252 3551 > Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 > -------------------------------------------------------------------------- Arnold H. Rots Chandra X-ray Science Center Smithsonian Astrophysical Observatory tel: +1 617 496 7701 60 Garden Street, MS 67 fax: +1 617 495 7356 Cambridge, MA 02138 arots at head-cfa.harvard.edu USA http://hea-www.harvard.edu/~arots/ -------------------------------------------------------------------------- From ael at star.le.ac.uk Thu Oct 23 07:29:05 2003 From: ael at star.le.ac.uk (Tony Linde) Date: Thu, 23 Oct 2003 15:29:05 +0100 Subject: Another pdf generator In-Reply-To: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> Message-ID: <007801c39972$031c1780$6124d28f@gnowee> A PDF generator that I use a lot is at http://www.pdf995.com/ - it is a printer driver that, afaik, is windows only but works well. Cheers, Tony. > -----Original Message----- > From: owner-ucd at eso.org [mailto:owner-ucd at eso.org] On Behalf > Of Arnold Rots > Sent: 23 October 2003 14:26 > To: Clive Page > Cc: ucd at ivoa.net > Subject: Re: A suggested revision for UCDs > > > It's a simple choice: those who shelled out the money for a > Word license can either try to force the rest of the > community to do the same by sending around Word documents, or > spend themselves a little more by buying an Acrobat license > and thus allow everybody to spend his/her money as (s)he sees > fit. Personally, I think Word should always be bundled with > Acrobat, at least in our community. > > - Arnold > > Clive Page wrote: > > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > > > A few minutes ago I uploaded a version of my suggested revised > > > proposal for UCDs to the Twiki. 
This is just a Word > version since I > > > don't have a PDF generator handy. > > > > An off-topic note to Tom (and others with the same problem in > > producing > > PDFs): > > > > Many of us on Unix/Linux systems find Word documents > inconvenient; I > > ... > > > > -- > > Clive Page > > Dept of Physics & Astronomy, > > University of Leicester, Tel +44 116 252 3551 > > Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 > > > -------------------------------------------------------------- > ------------ > Arnold H. Rots Chandra X-ray > Science Center > Smithsonian Astrophysical Observatory tel: +1 > 617 496 7701 > 60 Garden Street, MS 67 fax: +1 > 617 495 7356 > Cambridge, MA 02138 > arots at head-cfa.harvard.edu > USA > http://hea-www.harvard.edu/~arots/ > > -------------------------------------------------------------- > ------------ > From mchill at dial.pipex.com Thu Oct 23 07:11:26 2003 From: mchill at dial.pipex.com (martin hill) Date: Thu, 23 Oct 2003 15:11:26 +0100 Subject: Off topic document formats... In-Reply-To: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> References: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> Message-ID: <1066918286.3f97e18ebc274@netmail.pipex.net> Er, so everyone is forced to send around Acrobat documents? And buy Acrobat licences instead of Word ones? Isn't there a Word converter for Open Office? Even much more better still, how about posting as hmtl (if there are no diagrams)? Everyone can read it and almost everything has a converter to it. Or, since it's an AVO standard, I'm sure we could find some way of using VOTable... Cheers, Martin from the "Standards are evil" dept Quoting Arnold Rots : > It's a simple choice: those who shelled out the money for a Word > license can either try to force the rest of the community to do the > same by sending around Word documents, or spend themselves a little > more by buying an Acrobat license and thus allow everybody to spend > his/her money as (s)he sees fit. 
> Personally, I think Word should always be bundled with Acrobat, at > least in our community. > > - Arnold > > Clive Page wrote: > > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > > > A few minutes ago I uploaded a version of my suggested revised > > > proposal for UCDs to the Twiki. This is just a Word version since > > > I don't have a PDF generator handy. > > > > An off-topic note to Tom (and others with the same problem in producing > > PDFs): > > > > Many of us on Unix/Linux systems find Word documents inconvenient; I > > ... > > > > -- > > Clive Page > > Dept of Physics & Astronomy, > > University of Leicester, Tel +44 116 252 3551 > > Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 > > > -------------------------------------------------------------------------- > Arnold H. Rots Chandra X-ray Science Center > Smithsonian Astrophysical Observatory tel: +1 617 496 7701 > 60 Garden Street, MS 67 fax: +1 617 495 7356 > Cambridge, MA 02138 arots at head-cfa.harvard.edu > USA http://hea-www.harvard.edu/~arots/ > -------------------------------------------------------------------------- > > -- Martin Hill 07901 55 24 66 www.mchill.net From tam at lheapop.gsfc.nasa.gov Thu Oct 23 06:33:03 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Thu, 23 Oct 2003 09:33:03 -0400 Subject: is UCD out of control? In-Reply-To: <05e001c39901$4a189d50$6501a8c0@Ropy> References: <05e001c39901$4a189d50$6501a8c0@Ropy> Message-ID: <3F97D88F.9030501@lheapop.gsfc.nasa.gov> Hi Roy, I guess I find the current discussion stimulating and not a sign of a problem. One concern on my part.... My guess is that we only get to make one change to UCDs where the revision is incompatible with the previous set. I don't think the community will want to change their software twice while we're getting our act together. Even Microsoft can't get away with that! 
Absent your killer app, I'm not sure that a revision to UCDs is on the critical path for the VO so we should take the time to do it right and live with the current standard as long as needed. Personally, I think that UCDTrees or something like them (along with XQuery style queries on them) can be a link between abstract data models and concrete data representations. Tools that search for data matching a given data model could be that killer app. We'll see if this pans out when looked at in detail. Talk to you later. Tom From jcm at head-cfa.cfa.harvard.edu Thu Oct 23 08:03:40 2003 From: jcm at head-cfa.cfa.harvard.edu (Jonathan McDowell) Date: Thu, 23 Oct 2003 11:03:40 -0400 (EDT) Subject: Off topic document formats... Message-ID: <200310231503.h9NF3e82025726@urania.cfa.harvard.edu> > so everyone is forced to send around Acrobat documents? And buy Acrobat licences instead of Word ones? Well, ps2pdf and /usr/bin/acroread are free... I'm happy if you make it postscript or ascii, but PDF seems a reasonable compromise that's ok for people in both the Windows and Unix worlds. OpenOffice is still - at least for me - a continual struggle with not-quite-adequate compatibility, slow printing, yucky interface, and of course fails whenever MS updates Word. And I worry about .doc documents on the twiki being reliably readable in the future. > how about posting as hmtl (sic) Fine by me, although it's inconvenient to print a long document all at once. > I'm sure we could find some way of using VOTable Y'know, I was tempted to make this suggestion before, but manfully resisted... :-) Indeed the IVOA standard for documents, so I understand, is html. Perhaps this should be discussed more widely (in the standards WG, or whatever it's called?) In the meantime, perhaps I'll just post latex DVI files.... Jonathan From roy at cacr.caltech.edu Thu Oct 23 07:58:51 2003 From: roy at cacr.caltech.edu (Roy Williams) Date: Thu, 23 Oct 2003 07:58:51 -0700 Subject: Off topic document formats... 
References: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> <1066918286.3f97e18ebc274@netmail.pipex.net> Message-ID: <074601c39976$2cb69620$6501a8c0@Ropy> > Even much more better still, how about posting as hmtl (if there are no > diagrams)? Everyone can read it and almost everything has a converter to it. I think this is best, when it converges, I will have the UCD document converted to the clean HTML -- I mean the stuff that can be edited by a human! Roy From dtody at nrao.edu Thu Oct 23 07:53:38 2003 From: dtody at nrao.edu (Doug Tody) Date: Thu, 23 Oct 2003 08:53:38 -0600 (MDT) Subject: A suggested revision for UCDs In-Reply-To: Message-ID: Ok, we are off topic but... A simple solution is to run vmware (or something similar) on Linux, with Windows in the vm, using samba to share the unix file system. This provides the best of both worlds. Doug On Thu, 23 Oct 2003, Clive Page wrote: > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > A few minutes ago I uploaded a version of my suggested revised > > proposal for UCDs to the Twiki. This is just a Word version since > > I don't have a PDF generator handy. > > An off-topic note to Tom (and others with the same problem in producing > PDFs): > > Many of us on Unix/Linux systems find Word documents inconvenient; I > thought OpenOffice.org might cope with your file, but it produces > illegible results without a lot of playing with fonts, and even then was > very poor. > From arots at head-cfa.cfa.harvard.edu Thu Oct 23 07:17:43 2003 From: arots at head-cfa.cfa.harvard.edu (Arnold Rots) Date: Thu, 23 Oct 2003 10:17:43 -0400 (EDT) Subject: Off topic document formats... In-Reply-To: <1066918286.3f97e18ebc274@netmail.pipex.net> Message-ID: <200310231417.h9NEHhaS001976@xebec.cfa.harvard.edu> martin hill wrote: > Er, so everyone is forced to send around Acrobat documents? And buy Acrobat > licences instead of Word ones? Only those who pay for Word licenses in the first place. Acrobat reader is free. 
> > Isn't there a Word converter for Open Office? Doesn't always work well - see Clive's roiginal post. > > Even much more better still, how about posting as hmtl (if there are no > diagrams)? Everyone can read it and almost everything has a converter to it. Fine with me, though PDF is quite a reasonable standard for documents. > Or, since it's an AVO standard, I'm sure we could find some way of using VOTable... > > Cheers, > > Martin > from the "Standards are evil" dept > > > > Quoting Arnold Rots : > > > It's a simple choice: those who shelled out the money for a Word > > license can either try to force the rest of the community to do the > > same by sending around Word documents, or spend themselves a little > > more by buying an Acrobat license and thus allow everybody to spend > > his/her money as (s)he sees fit. > > Personally, I think Word should always be bundled with Acrobat, at > > least in our community. > > > > - Arnold > > > > Clive Page wrote: > > > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > > > > > A few minutes ago I uploaded a version of my suggested revised > > > > proposal for UCDs to the Twiki. This is just a Word version since > > > > I don't have a PDF generator handy. > > > > > > An off-topic note to Tom (and others with the same problem in producing > > > PDFs): > > > > > > Many of us on Unix/Linux systems find Word documents inconvenient; I > > > ... > > > > > > -- > > > Clive Page > > > Dept of Physics & Astronomy, > > > University of Leicester, Tel +44 116 252 3551 > > > Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 > > > > > -------------------------------------------------------------------------- > > Arnold H. 
Rots Chandra X-ray Science Center > > Smithsonian Astrophysical Observatory tel: +1 617 496 7701 > > 60 Garden Street, MS 67 fax: +1 617 495 7356 > > Cambridge, MA 02138 arots at head-cfa.harvard.edu > > USA http://hea-www.harvard.edu/~arots/ > > -------------------------------------------------------------------------- > > > > > > > -- > Martin Hill > 07901 55 24 66 > www.mchill.net > -------------------------------------------------------------------------- Arnold H. Rots Chandra X-ray Science Center Smithsonian Astrophysical Observatory tel: +1 617 496 7701 60 Garden Street, MS 67 fax: +1 617 495 7356 Cambridge, MA 02138 arots at head-cfa.harvard.edu USA http://hea-www.harvard.edu/~arots/ -------------------------------------------------------------------------- From dide at discovery.saclay.cea.fr Thu Oct 23 10:26:42 2003 From: dide at discovery.saclay.cea.fr (DIDELON Pierre) Date: Thu, 23 Oct 2003 19:26:42 +0200 (MEST) Subject: A suggested revision for UCDs Message-ID: <200310231726.h9NHQgH11982@rosetta.saclay.cea.fr> Hi Tom, Big and impressive work. I have read and try to absorb the document and most of the following mails. I try to digest all. It is not so obvious for a frenchy like me and I hope that my comments below are appropriate. I apologize for any misunderstanding or confusion, by jumping in the discussion, but a really want to make a few points. If a clearly understand the concern of Pedro Osuna, the actual discussion illustrate what he stressed in strasbourg and his further mail. UCD1 and a data model for context evaluation is sufficient, because DM is not available some structure is needed to be introduced and so UCD2 came. But unless we put DM in UCD (eventually using VOTable structure) it will be incomplete. Even UCDTree (which I really like) needed additional structure/template or external reference as stressed in the document 1.9.9b: p16 last paragraph, p18 last paragraph. 
I like several aspects of this proposal, and I agree with some of the comments from Norman, Ed and Jonathan, but as it is very tedious for me to write in English, I will not comment extensively on everything happening in the discussion and will go directly to the problems I had.

Instead of specific words or contextual meaning to distinguish between the three kinds of thing (concept, modifier property and attribute property), I would be more comfortable with a syntactic distinction and a specific separator (i.e. #). The complexity of the three trees with specific words seems very strange and confuses me.

My main concern is related to Attributes.

- I did not understand the meaning of local. You can always correlate a property with an identical or similar property; it depends on the purpose of your correlation, and without presupposition you cannot exclude a priori some interesting thing that can be extracted from the data. Taking your example: you can extract data where the temperature jitter of the data acquisition system is above a certain threshold and try to correlate these with phot.flux;error or any kind of measurement error available. Perhaps not used very often, but not forbidden, I hope?

- It seems to me that there is some dangerous redundancy in the basic words of the attribute tree. For example, value, vector, instance and multiplet are common data properties distinguished by their use or the context in which they appear. IMO there is no difference between vector and multiplet, and each time you use multiplet you could replace it by vector. No? Like Jonathan, if you put vectors in value, I do not see the need for instance. The relation of measurements, errors and all this kind of thing to the preceding is even more complicated. A value (I mean a real scalar) can be a unique measurement value, a statistical property of a measurement series (like mean, mode, error, std.dev., skewness...), a parameter, and perhaps other meanings I am not able to think of now.
Trying to be brief (but unfortunately incomplete), I feel that the problem comes from the fact that different data properties are mixed together (I believe that the same problem occurs in the DM group): mainly data structure, data meaning (which could be extended to purpose) and data representation (without speaking of format and location).

Data structure would include value (scalar), vector (avoiding multiplet) and tree, and could easily be extended with matrix or composite structures. For me instance is a structure, but which one? A free-format structure, a VOTable-formatted tree? It is not clear.

Data meaning is related to measurement, error and all statistical properties.

Data representation mixes both things: a measurement series can be represented by a vector (even in a VOTable cell), or by one or several statistical properties (mean, mean+std.dev.) structured in very different ways (mean and error in separate columns, mean+mode+median in one column ...).

I feel that refurbishing is needed here to clarify UCD usage more than UCD existence. I agree with earlier comments that stressed that the big advantage of UCD1 is that its UCDs do not come from an a priori/re-invented structure but reflect the existing data.

- I did not see the need for filter. It seems to me that it tries to capture a part of the data history, but it seems so restrictive that it will very soon be inappropriate, I bet.

- concept seems only due to the incompleteness of the words available in the concept tree root. In your example (p18) a word correlation (which would certainly be needed for the VO) would better match the needs.

I stop here because it's late and I would become confused, if I am not already.
Thanks for the food for thought, sincerely, Pierre ------------------------------------------------------------------------------- DIDELON e-mail : pdidelon_at_cea.fr CEA SACLAY - Service d'Astrophysique W3 : http://www-dapnia.cea.fr/Sap/ 91191 Gif-Sur-Yvette Cedex Phone : 33 (0)1 69 08 58 89 ------------------------------------------------------------------------------- From Edward.J.Shaya.1 at gsfc.nasa.gov Thu Oct 23 13:02:05 2003 From: Edward.J.Shaya.1 at gsfc.nasa.gov (Ed Shaya) Date: Thu, 23 Oct 2003 16:02:05 -0400 Subject: Use case: distance Message-ID: <3F9833BD.40604@gsfc.nasa.gov> An HTML attachment was scrubbed... URL: From tam at lheapop.gsfc.nasa.gov Wed Oct 29 10:38:28 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Wed, 29 Oct 2003 13:38:28 -0500 Subject: Further thoughts on UCDs Message-ID: <3FA00924.3040502@lheapop.gsfc.nasa.gov> One of my problems in understanding how or whether we should change the UCD strings is the indefiniteness of the current UCD2 proposals which do not specify the full list of UCDs. I've just spent a few minutes looking at the PHOT hierarchy. Probably Sebastien and Roy have done this with far greater care, but I've gone through the old UCD tree and tried to see what UCDs are suggested by what's in the PHOT hierarchy, where I have explicitly left out any band information from the UCDs assuming that to be supplied by qualifiers in the em hierarchy. There are only about 25 distinct phot words here versus just under 500 in the original UCD1 tree. Most of this savings is at the cost of having an extensive em tree describing the bands, but I think that's helpful since we want to be able to combine fluxes from different bands. I think it makes the photometry tree much more accessible. One thought expressed here is that we should not distinguish between flux, magnitudes and counts in the UCDs, but rather do that with the units. 
I'm not sure if this is a good idea but it seems to me that they are more 'similar' to each other than each is to say a fluence or a color.

There is a question as to whether we specify colors by providing special keywords for each color, or by using a pair of em qualifiers. In the latter case, then the order of the em qualifiers may be significant. If we go this route (and I think it makes more sense) then my proposal for well-formed UCDs would have to be modified to say something like: 'where the order of qualifiers is not significant, they should be given in alphabetical order'.

phot
phot.flux (distinctions between counts, magnitudes, fluxes, etc should be carried in the units not the UCD)
phot.flux.surfaceBrightness
phot.fluence
phot.flux.absolute
phot.color (This is really a ratio, but we use color as the traditional name)
phot.color.diff (A difference in colors, i.e., a kind of second derivative)
phot.color.excess
em.bolometric (Bolometric measurements are after all just specifying a very broad band)
phot.atmosphere
phot.atmosphere.airmass
phot.atmosphere.extinction
phot.class
phot.extinction.galactic
phot.extinction.internal
phot.extinction.ism
phot.extinction.total
phot.flux.isophotal
phot.flux.central
phot.correction.k
phot.flux.limit
phot.flux.offset
phot.profile
math/arith.ratio (for the phot_sd/b-bright, phot_tot-Bright/b-bright UCDs)
phot.system (is this different from phot.class?)
phot.parameter (some parameter of the photometric system)
phot.zeropoint
phot.spectrum (vector valued fields)
phot.timeseries (vector valued fields)
phot.image (vector valued fields)

This was just a quick exercise and doubtless there are better specific choices, but the phot hierarchy is one area where I imagine that our software may be very UCD aware. If we really can simplify it to this extent, then I think this alone is a major impetus for going to UCD2.
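The suggested well-formedness rule ('where the order of qualifiers is not significant, they should be given in alphabetical order') could be sketched roughly as below. This is only an illustration under stated assumptions, not part of any UCD draft: the ";" separator, the idea that the first word is the primary one, the function name normalize_ucd, and the band words em.opt.B and em.opt.V are all hypothetical.

```python
def normalize_ucd(ucd, ordered_primaries=("phot.color",)):
    """Sort qualifiers alphabetically unless their order carries meaning."""
    # Assumption: words are separated by ";" and the first word is primary.
    words = [w.strip() for w in ucd.split(";") if w.strip()]
    if not words:
        raise ValueError("empty UCD")
    primary, qualifiers = words[0], words[1:]
    # Assumption: for colours the qualifier order (first band, second band)
    # is significant, so it is left exactly as written.
    if not primary.startswith(ordered_primaries):
        qualifiers = sorted(qualifiers)
    return ";".join([primary] + qualifiers)


# A flux: qualifier order is not significant, so it is put in alphabetical order.
print(normalize_ucd("phot.flux;src.galaxy;em.opt.V"))
# A colour: the pair of em qualifiers keeps its order.
print(normalize_ucd("phot.color;em.opt.V;em.opt.B"))
```

With a rule like this, two providers who write the same qualifiers in a different order would end up with the same string for a flux, while a V-B colour would stay distinct from a B-V one.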
Tom
From rgm at roe.ac.uk Thu Oct 9 01:54:45 2003 From: rgm at roe.ac.uk (Bob Mann) Date: Thu, 9 Oct 2003 09:54:45 +0100 (BST) Subject: UCD document 1.9.4 for IVOA Interop In-Reply-To: <053601c3894b$08d03080$6b91d783@cacr.caltech.edu> Message-ID: Hi folks, I have only a few comments on v1.9.4 - I think it's a great improvement on the previous version, so Roy and Sebastien are to be congratulated. In particular, Section 3.1 is very clear and lays out the path to UCD3 very well. My first comment picks up something from the discussion leading up to this draft. It was said that we should emphasise that UCD2s are about discovering relevant data, not using them, as that requires more expressive power. If people still believe that, can a sentence to that effect be added in the introductory stuff?...it's in Section 4.1, of course, but I think that the restriction is important enough that it should come earlier, too. My next comment is just to check something. It seems that "data" has been dropped from the list of basic elements, and "metadata" has arrived. Am I right to assume that "metadata" is now the branch under which I will find the UCDs for recording provenance information? - e.g. versions of data reduction software used, names of astrometric catalogues, etc. If so, that's fine - it's just that that has to go somewhere. My final comment is my only real concern about this draft, which is whether the examples of Section 3.4 are too ambitious - especially the ones for the error on a right ascension of a galaxy. That seems to be introducing semantics by stealth, and without enough support - i.e.
there seems to me to be too much structure in the relationships between those three quantities for UCD2. Maybe I missed it, but is there any semantic distinction to be made between the second and third terms in stat.error, pos.eq.ra, src.galaxy ? The third term relates to the second which relates to the first, so if I swap the order of the second and third the meaning is lost. On the other hand, I'm not sure that the example for photometric colour is good enough, since I don't see how to specify a generic flux ratio between two random passbands - I don't think it's good enough to enumerate the standard optical colours, since, when we federate multiwavelength datasets, I may well be interested in the ratio of a hard X-ray band flux and a K band magnitude, say. cheers Bob From roy at cacr.caltech.edu Wed Oct 15 19:07:57 2003 From: roy at cacr.caltech.edu (Roy Williams) Date: Wed, 15 Oct 2003 19:07:57 -0700 Subject: Version 1.9.9 of UCD definition Message-ID: <02b701c3938a$b2c2b2e0$e5c54f82@Ropy> There is a new version of the UCD specification document on the IVOA Twiki, labelled 1.9.9 in expectation of its promotion to version 2.0 at the IVOA interop meeting today in Strasbourg. The draft is available as pdf and MS word (links below). Please try to read this before the meeting today.
Thank You Roy Williams http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9.pdf http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9.doc -------- Caltech Center for Advanced Computing Research roy at cacr.caltech.edu 626 395 3670 From Thomas.A.McGlynn at nasa.gov Mon Oct 20 07:07:01 2003 From: Thomas.A.McGlynn at nasa.gov (Tom McGlynn) Date: Mon, 20 Oct 2003 10:07:01 -0400 Subject: UCDS vs DM - Peacekeeping In-Reply-To: <1066406745.3f90135915f9f@netmail.pipex.net> References: <1066406745.3f90135915f9f@netmail.pipex.net> Message-ID: <3F93EC05.8050108@nasa.gov> Martin, So far I've been a non-combatant in this war, but as you may know I had some reservations about the UCD2 framework presented at the interoperability meeting. Over the past two days I have been writing a revised proposal. This proposal includes a whole new discussion of the interaction between UCDs and table grouping constructs that I feel may play a major role in mediating among UCDs, data models and the DAL. I'll be publishing this within a day or two, but one key concept is the UCDTree which shows the UCDs of a table in a structured way. It is my hope that methods of abstract data models can be translated into actions on data by straightforward analysis of UCDTree. I expect to be send out the proposal today or tomorrow after I have a chance to check that what I've written during the flight back makes sense when I read it in a non-sleep-deprived state. Regards, Tom martin hill wrote: >I've been trying to sort out in my own head the differences between UCD2s and >data models. Particularly as one doesn't seem to work entirely without the >other. So donning my UN peacekeepers hat (which in the British case is a >tatty gardeners hat in a rather trendy camouflage, not a kevlar helmet): > >It strikes me that data models are about structure, and UCDs about describing >elements in that structure. 
> >Now it is probably possible that the way data models are defined could include >naming elements to define what they mean. I suggest that these should be (or >include as attributes) UCDs, at the very least so that we can compare data >items that have been formally modelled with those that haven't. > >For example, we can say (simplistically) that a coordinate is an RA, DEC, >error, and refers to some co-ordinate frame, and might look like this in XML: > > > > > 42 > 23 > > > 42 > 23 > > > > > 72.3 > 1 > > > etc > > >Now this is a horribly simple example (sorry about the mixed-up case >conventions) - how do people feel about it? It means that we should avoid >trying to describe structure/context in UCDs (which has the potential of >making them horribly long and complicated) and gives us an immediately useful >way of giving wider meaning to our data structures. > >It kind of implies that we then have a method for appending our UCDs up a data >model tree if we need to get more context for them. Thus we don't have to >have src.galaxy;phot.mag.ObscureOptical;error *as a defined UCD*. Instead >such strings are constructed out of individual UCDs as required by the program >that is investigating the data. > >It also means that UCDs don't have to be specific (which the UCD group are >avoiding cos it's a horrible task, small wonder) and yet I as a developer can >assemble specifics for doing cross comparisons. > >I've only had a pint and it still seems a good idea. It was a big pint though. 
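The idea quoted above, that data-model elements carry individual UCDs and a program builds longer strings by collecting them up the tree, might be sketched as follows. The tags of the original XML example did not survive in the archive, so the element names, the `ucd` attribute and the sample values here are purely hypothetical stand-ins, and composed_ucds is an illustrative helper rather than anything defined by the UCD or data model groups.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-model instance: element names and the "ucd" attribute
# are assumptions made for this sketch only.
DOC = """
<Source ucd="src.galaxy">
  <Magnitude ucd="phot.mag.ObscureOptical">
    <Value>72.3</Value>
    <Error ucd="error">1</Error>
  </Magnitude>
</Source>
"""


def composed_ucds(root):
    """Map each UCD-tagged leaf to the UCDs collected from the root down to it."""
    result = {}

    def walk(elem, path):
        ucd = elem.get("ucd")
        if ucd:
            path = path + [ucd]
        children = list(elem)
        # Only leaves that carry a UCD of their own get a composed string.
        if not children and ucd:
            result[";".join(path)] = (elem.text or "").strip()
        for child in children:
            walk(child, path)

    walk(root, [])
    return result


composed = composed_ucds(ET.fromstring(DOC))
print(composed)
```

Here the long string is never defined as a UCD in its own right; it is assembled on demand from three short ones, which is the point being made in the message.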
> > > > From Thomas.A.McGlynn at nasa.gov Mon Oct 20 10:55:38 2003 From: Thomas.A.McGlynn at nasa.gov (Tom McGlynn) Date: Mon, 20 Oct 2003 13:55:38 -0400 Subject: UCDS vs DM - Peacekeeping In-Reply-To: <3F93EC05.8050108@nasa.gov> References: <1066406745.3f90135915f9f@netmail.pipex.net> <3F93EC05.8050108@nasa.gov> Message-ID: <3F94219A.4050901@nasa.gov> One thing I should have made clearer in the earlier message is that the revision I'm sending does not reflect any consensus on the correct approach. Rather it's a very detailed alternative to the approach presented by Roy at Strasbourg. It builds upon the same basic elements that Roy and Sebastien suggested but puts them together in -- for me at least -- a more coherent fashion. This approach does naturally lend itself to the concerns Martin raised, but it's certainly not a done deal. Tom Tom McGlynn wrote: > Over the > past two days I have been writing a revised proposal. > From mchill at dial.pipex.com Fri Oct 17 09:05:45 2003 From: mchill at dial.pipex.com (martin hill) Date: Fri, 17 Oct 2003 17:05:45 +0100 Subject: UCDS vs DM - Peacekeeping Message-ID: <1066406745.3f90135915f9f@netmail.pipex.net> I've been trying to sort out in my own head the differences between UCD2s and data models. Particularly as one doesn't seem to work entirely without the other. So donning my UN peacekeepers hat (which in the British case is a tatty gardeners hat in a rather trendy camouflage, not a kevlar helmet): It strikes me that data models are about structure, and UCDs about describing elements in that structure. Now it is probably possible that the way data models are defined could include naming elements to define what they mean. I suggest that these should be (or include as attributes) UCDs, at the very least so that we can compare data items that have been formally modelled with those that haven't. 
For example, we can say (simplistically) that a coordinate is an RA, DEC, error, and refers to some co-ordinate frame, and might look like this in XML: 42 23 42 23 72.3 1 etc Now this is a horribly simple example (sorry about the mixed-up case conventions) - how do people feel about it? It means that we should avoid trying to describe structure/context in UCDs (which has the potential of making them horribly long and complicated) and gives us an immediately useful way of giving wider meaning to our data structures. It kind of implies that we then have a method for appending our UCDs up a data model tree if we need to get more context for them. Thus we don't have to have src.galaxy;phot.mag.ObscureOptical;error *as a defined UCD*. Instead such strings are constructed out of individual UCDs as required by the program that is investigating the data. It also means that UCDs don't have to be specific (which the UCD group are avoiding cos it's a horrible task, small wonder) and yet I as a developer can assemble specifics for doing cross comparisons. I've only had a pint and it still seems a good idea. It was a big pint though. -- Martin Hill 07901 55 24 66 www.mchill.net From posuna at iso.vilspa.esa.es Tue Oct 21 01:50:24 2003 From: posuna at iso.vilspa.esa.es (posuna at iso.vilspa.esa.es) Date: Tue, 21 Oct 2003 10:50:24 +0200 Subject: UCDS and DM and "Catalogue" tables In-Reply-To: <3F94219A.4050901@nasa.gov> References: <1066406745.3f90135915f9f@netmail.pipex.net> <3F93EC05.8050108@nasa.gov> <3F94219A.4050901@nasa.gov> Message-ID: <20031021105024.0698fc07.posuna@iso.vilspa.esa.es> Dear all, the fact that I have been one of the authors of the UCD2 draft document (as a member of the UCD steering committee) made it very bizarre that I "voted" for the UCD1 option, together with other very few people. I think that if we use a Data Model to access metadata, then it is enough with the UCD1 that was created at the beginning. 
For me, there is no need to have more specificity, nor is there a need for matching functions and other more complex syntactical operations to be performed with the UCDs. For me, all that should be done through the Data Model and/or a proper VOQL language.

However, I do appreciate that many providers do have their data in the form of catalogues (what I normally call 2D tables, or X-Y tables, as catalogue, for me, can be several different things). I understand that people having data in X-Y tables do NOT want to hear about data models: they only want to be able to perform operations with the columns of their tables. It is in that context (X-Y table handling) where I believe the need for an overall complex UCD structure appears, as any comparison operation (or more complex ones such as addition, subtraction, ...) is something they want to do by comparing (or adding/subtracting/...) the columns directly.

As UCD2 is just adding more capabilities to UCD1 without chopping off any of the existing ones, and as we ("Data Model" oriented ones) can always use the UCDs in the limited way we want them (i.e., just to describe the metadata we give back so that they are "Universally" understandable), I see no reason to stop UCD2 from including more syntactical and operational capabilities: "Data-Model oriented" people will make use of very limited UCD capabilities (as they don't need them) and "X-Y Table" oriented people will make use of as many UCD "functionalities" as possible.

In summary, I'm OK with new functionalities of the UCDs but I guess I'll make very limited use of them in the case that I use the Data Model to access my data, and hence my vote for the UCD1 "paradigm".

Maybe the UCD document should reflect two main "Areas of Interest", something like:

- Simple UCD handling: Data Model access
- Complex UCD handling: Syntactical operations on X-Y (catalogue) table columns

I wait for your comments...

Cheers, Pedro.
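As a rough illustration of the "X-Y table" use case described above, the sketch below pairs up columns from two catalogues purely by their UCDs, so that they can then be compared or combined. It is a minimal sketch under assumptions: the column names, the two toy catalogues and the helper names are invented for the example, and the UCD strings are used only as opaque labels.

```python
def columns_by_ucd(table):
    """Index a table's column names by UCD; several columns may share one UCD."""
    index = {}
    for name, ucd in table.items():
        index.setdefault(ucd, []).append(name)
    return index


def comparable_columns(table_a, table_b):
    """Return, per shared UCD, the candidate columns from each table."""
    a, b = columns_by_ucd(table_a), columns_by_ucd(table_b)
    return {ucd: (a[ucd], b[ucd]) for ucd in sorted(a.keys() & b.keys())}


# Hypothetical catalogues, given as {column name: UCD}.
cat_a = {"RAJ2000": "pos.eq.ra", "DEJ2000": "pos.eq.dec", "Bmag": "phot.flux"}
cat_b = {"ra": "pos.eq.ra", "e_ra": "stat.error",
         "flux1": "phot.flux", "flux2": "phot.flux"}

matches = comparable_columns(cat_a, cat_b)
for ucd, (cols_a, cols_b) in matches.items():
    print(ucd, cols_a, cols_b)
```

Note that a shared UCD may select more than one column in a table (flux1 and flux2 here), so a real tool would still need qualifiers or a data model to decide which pairs are meaningful to compare.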
On Mon, 20 Oct 2003 13:55:38 -0400 Tom McGlynn wrote: > > One thing I should have made clearer in the earlier message is that > the revision I'm sending > does not reflect any consensus on the correct approach. Rather it's a > > very detailed alternative to the > approach presented by Roy at Strasbourg. It builds upon the same > basic elements that Roy and Sebastien suggested but puts them > together in -- for me at least -- > a more coherent fashion. This approach does naturally lend itself > to the concerns Martin raised, but > it's certainly not a done deal. > > Tom > > Tom McGlynn wrote: > > > Over the > > past two days I have been writing a revised proposal. > > > -- Pedro Osuna Alcalaya SOFTWARE Development Group XMM-Newton Science Archive e-mail: Pedro.Osuna at esa.int Tel + 34 91 8131314 European Space Agency VILLAFRANCA Satellites Tracking Station P.O. Box 50727 E-28080 Villafranca del Castillo MADRID - SPAIN From cgp at star.le.ac.uk Tue Oct 21 02:15:35 2003 From: cgp at star.le.ac.uk (Clive Page) Date: Tue, 21 Oct 2003 10:15:35 +0100 (BST) Subject: UCDS and DM and "Catalogue" tables In-Reply-To: <20031021105024.0698fc07.posuna@iso.vilspa.esa.es> Message-ID: On Tue, 21 Oct 2003 posuna at iso.vilspa.esa.es wrote: > I understand that people having data in X-Y tables do NOT want to hear > about data models: they only want to be able to perform operations with > the columns of their tables. I'm not sure that's entirely true. I think the problem is that three partially overlapping groups of people have been trying over several years to make sense of astronomical data, so they canclassify or organise it to make data access simpler and more uniform. (1) Those devising UCDs, which started out exclusively for tables in VizieR. I think it is now generally agreed that some hierarchical scheme should replace the original flat namespace (though designed with components which split naturally into layers). (2) Those devising Data Models. 
The problem for those outside this effort is that there have been so many data models, different in structure and detail, but all apparently equally valid. I didn't manage to attend the DM sesssion in Strasbourg, but am pleased to hear of serious convergence. (3) Those devising query languages and data access routines, who need some DM or UCD scheme to do the job properly, except for the serious problem that a UCD does not lead to a unique column in some (many?) tables. I thought that Tom's ideas at Strasbourg sounded good, and look forward to seeing his proposal in print. If it's true that there is now a single agreed Data Model, does that not suggest that UCDs should be assigned on the same hierarchical basis? -- Clive Page Dept of Physics & Astronomy, University of Leicester, Tel +44 116 252 3551 Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 From amsr at jb.man.ac.uk Tue Oct 21 04:08:57 2003 From: amsr at jb.man.ac.uk (Anita Richards) Date: Tue, 21 Oct 2003 12:08:57 +0100 (BST) Subject: Version 1.9.9 of UCD definition Message-ID: At the interoperability workshop following ADASS, there was some mention of contradictions between the spectral divisions in the UCD2 document and the groupings in RM v0.82. There was also debate about whether and where an mm or sub-mm band should be introduced in RMv0.82. I think that consistency between UCD2 and the RM (whatever is the stable outcome for defining the Registry) would be very useful but in fact the discrepancies are minor. Personally, I hope that we do not spend much time discussing the exact divisions as we cannot avoid splitting the coverage of some present instruments, let alone future ones. However it might be sensible to be guided by the current/immediate future major observatories. Perhaps the responses to the SSA / spectral data model questionnaire might help. 
These are my suggestions:

The boundaries of the spectral UCD2s are consistent with fitting into the RM categories with minor changes:

Some of the UCD2 frequency ranges at the high-frequency end of the IR have become transposed (I think - it is indeed hard to think in 3 sets of units!)

I apologise for having created confusion over the new mm band in RMv0.82. The proposed range of ALMA is 30 - 900 GHz according to the web page. In fact, I understand that initially the lower limit is more like 86 GHz. Therefore I suggest that consistency can be achieved by having instead:
SUBMM 100 micron - 3 mm (100 - 3000 GHz)

The RMv0.82 x-ray regime goes from 0.12 - 120 keV; the UCD2 regime goes from 0.12 - 12 keV. According to their web pages, CHANDRA covers 0.1 - 10 keV (I have also been told 0.12 - 12...) and XMM covers 0.15 - 15 keV; ROSAT was within this. The CRO covered > 30 keV and SWIFT, 15 - 150 keV.
Thus, to keep the 'decades', this seems more suitable:
XRAY 0.12 - 12 keV (as per UCD2)
GAMMARAY > 12 keV
but maybe a high energy astronomer can advise....

OVERLAPPING DATA

Suppose that I have a catalogue with data taken around 1 micron. I hope I am right in thinking that the Registry entry can contain both Optical and IR as values of the relevant spectral coverage keyword, and this is used OK in searches etc.

How will the UCDs be used? As I understand it, in order to use Vizier or the Aladin SED tool to search for e.g. radio observations between 1.3 and 1.7 GHz (radio L-band), the software would look for UCDs em.radio.750-1500MHz and em.radio.1.5-3GHz. However there may be catalogues giving radio observations or flux densities between 1.4 and 1.7 GHz. This might or might not give the exact observing frequency, but for many purposes the general flux density or image would be useful. As I understand it, this would simply have the UCD em.radio and would not be found.
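The lookup described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two band names are the ones quoted in the message, the GHz limits are simply read off those names, and the function names (`bands_for_range`, `exact_match`, `prefix_match`) are hypothetical, not part of any UCD draft or existing tool.

```python
# Hypothetical band table: only the two band UCDs named in the message.
# The (low, high) limits in GHz are read off the band names themselves.
BANDS_GHZ = {
    "em.radio.750-1500MHz": (0.75, 1.5),
    "em.radio.1.5-3GHz": (1.5, 3.0),
}


def bands_for_range(lo_ghz, hi_ghz):
    """Return the band UCDs whose interval overlaps the query range."""
    return sorted(
        ucd
        for ucd, (band_lo, band_hi) in BANDS_GHZ.items()
        if band_lo < hi_ghz and lo_ghz < band_hi
    )


def exact_match(column_ucd, wanted):
    """A column is found only if it carries one of the wanted band UCDs."""
    return column_ucd in wanted


def prefix_match(column_ucd, wanted):
    """Also accept a less specific UCD that is a parent of a wanted band."""
    return any(
        w == column_ucd or w.startswith(column_ucd + ".") for w in wanted
    )


wanted = bands_for_range(1.3, 1.7)      # radio L-band query
print(wanted)
print(exact_match("em.radio", wanted))   # column tagged only em.radio: missed
print(prefix_match("em.radio", wanted))  # found if parent UCDs are accepted
```

With an exact lookup the column tagged only `em.radio` is missed, which is the problem raised here; accepting parent UCDs finds it, at the price of also returning radio columns at unrelated frequencies.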
This is _not_ a plea to divide the ranges differently, but a reflection of the fact that we are no longer restricted to observing in well-defined filters. Broad-band receivers and optical fibres in the radio, ALMA, space observatories etc. mean that we observe at all frequencies seamlessly, and sensitivity to higher redshifts means that lines no longer fall even in a single regime. Maybe there is a solution already. If not, the only satisfactory things I can see are to do one of the following: * allow bracketting UCDs e.g. em.radio.0.75-3GHz * allow two (or more) UCDs to a column, where these are adjacent (up to a sensible max) * not use the fine divisions in UCDs, but treat frequency like position and treat every query as a 1D cone search, ie a linear segment search. cheers a - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Dr. Anita M. S. Richards, AVO Astronomer MERLIN/VLBI National Facility, University of Manchester, Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K. tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax). From derriere at newb6.u-strasbg.fr Tue Oct 21 03:40:21 2003 From: derriere at newb6.u-strasbg.fr (Sebastien Derriere) Date: Tue, 21 Oct 2003 12:40:21 +0200 Subject: UCDS vs DM - Peacekeeping References: <1066406745.3f90135915f9f@netmail.pipex.net> Message-ID: <3F950D15.AEDBC59C@astro.u-strasbg.fr> martin hill wrote: > > It strikes me that data models are about structure, and UCDs about describing > elements in that structure. > > Now it is probably possible that the way data models are defined could include > naming elements to define what they mean. I suggest that these should be (or > include as attributes) UCDs, at the very least so that we can compare data > items that have been formally modelled with those that haven't. This kind of work has been done in the case of the IDHA model: trying to find a relevant UCD for the different model elements. 
But I don't think that we should impose every element of every data model to be associated to a UCD. In fact, the link between UCD and DM can exist in both directions: - there can be a 'ucd' attribute for elements of data model, to indicate the corresponding UCD - the 'utype' attribute in VOTable allows to give a link to some data model element In the first approach, people building the data model make the effort to associate UCD to their view. The second approach can be used when there are no ucd attribute in the DM, to link the description of a dataset to a DM. Of course, both are not exclusive: there can be a ucd attribute in the DM, AND a utype for a or that would point, hopefully, to the same element of the DM ! Sebastien. -- _______ / ~ /, Sebastien Derriere mailto:derriere at astro.u-strasbg.fr / ~~~~ // Observatoire de Strasbourg Phone +33 (0) 390 242 444 /______// 11, rue de l'universite Telefax +33 (0) 390 242 417 (______(/ F-67000 Strasbourg France From pfo at star.le.ac.uk Tue Oct 21 04:45:18 2003 From: pfo at star.le.ac.uk (Patricio F. Ortiz) Date: Tue, 21 Oct 2003 12:45:18 +0100 (BST) Subject: UCDS and DM In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Anita Richards wrote: > Please can I clarify a possible application, which may initially be done > in an ad-hoc way until data models are fully harnessed: > > Supposing we are harvesting a catalogue which consists of radio flux > densities measured at 22 GHz. This is stated in the Abstract or ReadMe or > maybe even the title, but there is no table in the data saying 'observing > frequency'. As I understand it, we could add a virtual column > em.radio.12-25GHz, in which all the entries are the same. Can this be > done as a default, so that the data are recognised at their proper > frequency? 
Anita, the way UCD1 would handle this is that the UCD associated to the flux will reflect the region of the radio spectrum, eg, PHOT_FLUX_RADIO_11-25GHz, therefore, your catalogue will be discovered by the presence of that UCD without having to add a virtual column describing the observed frequency range. This case was very common, and not only in radio. There are hundreds of columns with "magnitude" as explanation, and you find the waveband in either the readme or the table title. UCDs are supposed to reflect the quantity as well as possible, that's why even with UCD1 one can launch discovery queries of this type and discover catalogues which have no indication of that part of their content in other pieces of meta-data information. Cheers, Patricio --- Patricio F. Ortiz pfo at star.le.ac.uk AstroGrid project Department of Physics and Astronomy University of Leicester Tel: +44 (0)116 252 2015 LE1 7RH, UK From cgp at star.le.ac.uk Tue Oct 21 04:27:48 2003 From: cgp at star.le.ac.uk (Clive Page) Date: Tue, 21 Oct 2003 12:27:48 +0100 (BST) Subject: Version 1.9.9 of UCD definition In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Anita Richards wrote: > Thus, to keep the 'decades', this seems more suitable: > XRAY 0.12 - 12 keV (as per UCD2) > GAMMARAY > 12 keV > but maybe a high energy astronomer can advise.... I think that boundary between the two is reasonable: some past instruments called 'X-ray' have some coverage up to maybe 15 keV, but most modern telescopes depend on grazing-incidence reflection, which practically stops working above about 10 or 12 keV. > OVERLAPPING DATA > How will the UCDs be used? As I understand it, in order to use Vizier or > the Aladin SED tool to search for e.g. radio observations between 1.3 and > 1.7 GHz (radio L-band), the software would look for UCDs > em.radio.750-1500MHz and em.radio.1.5-3GHz. 
I think the way it has to work is this: the UCD defined for some dataset should be as specific as possible, so if the data fall practically within a single band (say within 750-1500 MHz) then you declare the UCD as "em.radio.750-1500MHz". If not then you have to fall back on a less specific UCD of "em.radio". A user wanting radio measurements at some frequency should be prepared to search for both _the_ most specific UCD and the less specific UCD, i.e. "em.radio". This may bring up some false positives, but so will any scheme that we can devise.

The alternative, as you say, is to apply two or more UCDs to a dataset. That might be better in principle, but difficult in practice.

> * not use the fine divisions in UCDs, but treat frequency like position
> and treat every query as a 1D cone search, ie a linear segment search.

Well that's similar to a proposal I made some time ago (maybe in a Data Models context). Datasets should specify the range of frequencies they cover (which requires two numbers, the min frequency and the max frequency; this doesn't imply that coverage between the limits is continuous, and gaps should be ignored); then users also specify the range of interest. It's then a trivial exercise for a computer to compare the two intervals and work out which resources overlap the range of interest. All this without us having to dream up any artificial divisions between wavebands. But that doesn't fit in with UCDs as devised up to now, and I don't see any easy way to make it fit.

--
Clive Page
Dept of Physics & Astronomy, University of Leicester, Tel +44 116 252 3551
Leicester, LE1 7RH, U.K. Fax +44 116 252 3311

From pfo at star.le.ac.uk Tue Oct 21 05:52:27 2003
From: pfo at star.le.ac.uk (Patricio F. Ortiz)
Date: Tue, 21 Oct 2003 13:52:27 +0100 (BST)
Subject: Version 1.9.9 of UCD definition
In-Reply-To:
Message-ID:

Hi, I'm glad we are going into this discussion, not because it's an easy subject, but because it is a relevant one.
The EMS is perhaps the most important 'tree' which we want to describe properly. I see several different regimes: a) broad band observations (eg, Johnson's V, radio bands, X-ray bands) b) narrow bands c) line spectroscopy (H-alpha, 21cm, CO, OIII, etc) d) continuum spectroscopy (eg, from blue to red) e) flux ratios The question of how to describe these quantities is strongly linked to what we want to use with the descriptors. It is different what we assign to the column (the UCD) and the type of search we intend to perform. If our goal is to find all radio fluxes, then we could cut the UCD in phot.flux.radio at the level of search, so even if we have infinite granularity (eg, phot.flux.radio.22Ghz), this column will be recognized. If on the other hand we are interested in finding observations in 22GHz, having an assigned UCD phot.flux.radio assures us to find the 22GHz data (eventually), but the S/N will be quite low, having to browse through hundreds of catalogues to find 1 is not desirable, and it's not what we had in mind when UCDs were introduced. Specifity also means that one can discover quantities which are comparable, that one will not mix a line measurement with a continuum one (despite being very close in frequency). What I think is important to decide now is if up to what extent the UCDs can do the job, without complicating them too much or oversimplifying them to the point of becoming useless for discovery purposes. One thing I wouldn't like to see is to see UCDs become as ambiguous as column descriptions. It is quite possible that we should resource to another element of meta-information for the description of a quantity and for its discovery. I thought that 'utype' could be used to link observational windows with their description (data model?). 
eg,
UCD="phot.mag.opt" utype="strongrem.B" or
UCD="phot.flux.radio" utype="JBO.22Ghz" or
UCD="phot.flux.radio" utype="VLA.6cm"

For accurate matching, one would have to use both pieces of information to avoid mixing apples with oranges.

The method that Anita and Clive mention (a fine piano-keyboard mask) is something I've considered as well. It is broader than the UCD, but requires accurate knowledge of each band's observation window (I'm not saying it's not doable). Users could then specify which part of the spectrum they are interested in and they will find the tables which contain data in those parts of the EMS. This works quite well for broad bands, but it is not so great if one is interested in narrow bands. Imagine one searches for data around HBeta. Most catalogues with "blue broad band" observations will have 1's in that area of the spectrum, therefore, the noise will be huge! The same is valid for lines in other areas of the spectrum (that's why I didn't pursue this model further). The same is valid for colour indices: looking for V-K in terms of its initial and final wavelength will bring a large number of undesired catalogues/columns.

One solution I've proposed is to stick to a very fine granularity at the UCD level, which will allow us to compare like quantities, but to create "virtual containers" in as many orthogonal directions as needed to solve the EM problem, as boundaries will continue shifting and being taste dependent.

Imagine for a second that we keep the fine granularity, having UCDs like phot.jhn.B, phot.jhn.V, phot.jhn.R, phot.jhn.I, phot.str.B, .... phot.sloan.b (if exists)

The original UCD structure allowed users to retrieve any catalogue with Johnson's photometry (just search for /phot.jhn/ and voila), but there was no room to look for "catalogues with blue broad-band observations".

A "virtual container" for this example could be phot.opt.blue.bb.
No column will ever be assigned this VC-UCD, but a search engine will understand it as "hmm, phot.opt.blue.bb is not a UCD, I need to look for catalogues which contain any of the UCDs listed in its index". To follow the example,

phot.opt.blue.bb := ("phot.jhn.B", "phot.str.B", "phot.sloan.b");

And if later, GAIA introduces its own blue filter, phot.gaia.B will be created as a UCD, and "phot.gaia.B" will become part of this list:

phot.opt.blue.bb := ("phot.jhn.B", "phot.str.B", "phot.sloan.b", "phot.gaia.B");

Finally, although a column should have one and only one UCD, nothing should prevent a UCD from belonging to several virtual containers.

UCDs would describe what quantities are; VCs are used to describe contexts or related concepts accepted by the community at any time, and quite possibly user-definable in the future.

Food for thought... after all, > 50% of the current and future quantities are related to the EMS!

Cheers,
Patricio

---
Patricio F. Ortiz
pfo at star.le.ac.uk AstroGrid project
Department of Physics and Astronomy University of Leicester
Tel: +44 (0)116 252 2015 LE1 7RH, UK

From posuna at iso.vilspa.esa.es Tue Oct 21 05:35:45 2003
From: posuna at iso.vilspa.esa.es (posuna at iso.vilspa.esa.es)
Date: Tue, 21 Oct 2003 14:35:45 +0200
Subject: UCDS and DM
In-Reply-To:
References: <1066406745.3f90135915f9f@netmail.pipex.net> <3F950D15.AEDBC59C@astro.u-strasbg.fr>
Message-ID: <20031021143545.375a8cf4.posuna@iso.vilspa.esa.es>

Hi,

> I was slightly puzzled by comments at the interoperability workshop
> about data collections which do not have catalogues/tables. Surely
> even a collection of FITS spectra or images or etc. must be accessed
> via a descriptive list? Although I suppose a data collection could
> consist of a single image (or etc.).
In either case, though, is this > is another example of where it would be useful to take UCDs from what > might be a single description ("images taken at 0.05 arcsec resolution > ...") of the whole data collection - and use them to form or augment a > table to allow access to the data? We do have quite complex data models for our data; you can see part of the ISO model in the attached figures called iso_cdm_obss.gif and iso_pdm_obss.gif which are part of our (database jargon) CDM (Conceptual Data Model) and PDM (Physical Data Model) documents (where you can see loads of very project-specific things). The access to our data is not done through a descriptive list, but through database queries. However, in the case that someone wants a catalogue/table from our data, we can provide one. This is what we do with CDS: we provide them with the table they store in Vizier and that can be used as the rest of catalogues there for data mining (I'm not sure whether this is the word...I mean data discovery, etc.). You can find this table at: ftp://cdsarc.u-strasbg.fr/cats/B/iso/isolog.dat.gz However, that is only a very specific view of our data which can, otherwise, be accessed through other means, e.g., by SIAP or future SSAP, etc., or in the case of a proper general Data Model, through a proper VOQl to the Data Model (how we translate the future "Standard VO data Model" to ours is another story...(unfortunately, only up to us I'm afraid...)). Cheers, Pedro. -- Pedro Osuna Alcalaya SOFTWARE Development Group XMM-Newton Science Archive e-mail: Pedro.Osuna at esa.int Tel + 34 91 8131314 European Space Agency VILLAFRANCA Satellites Tracking Station P.O. Box 50727 E-28080 Villafranca del Castillo MADRID - SPAIN -------------- next part -------------- A non-text attachment was scrubbed... Name: iso_cdm_obss.gif Type: application/octet-stream Size: 29042 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: iso_pdm_obss.gif Type: application/octet-stream Size: 24526 bytes Desc: not available URL: From roy at cacr.caltech.edu Tue Oct 21 07:40:58 2003 From: roy at cacr.caltech.edu (Roy Williams) Date: Tue, 21 Oct 2003 07:40:58 -0700 Subject: UCD use cases References: Message-ID: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Please find below some possible use cases for UCD. -- How do the different proposed UCD/DM schemes help with these cases? -- What does the query look like in each case, and how would that query be coded? -- Can you supply additional use cases? Please try to stay within the realm of the possible here! -- If you refer to a Data Model, please try to be concrete about who/how/example. Please do not simply invoke "the Future IVOA Data Model". Thank you Roy -------------------------- (1) Cone search. How to decide which columns are the RA,Dec that was used in the search. What frame (B1950, J2000, ...) do these come from? If there are columns with ID, what sort of ID is it, and how do I resolve it? (2) SIAP search. Find the column that contains the URLs where the images are. Find out if there are other columns that have RA,Dec of the image center. (3) We have for example a crossmatch service that is clever enough to know about error ellipses. How does it get from a table the most sophisticated error info that is there: (a) position (b) circular error (c) ellipse error. (4) We want to compare photometry in two tables covering the same star cluster. How do I decide if they share measurements in the same filter? One has R band, the other has Halpha. What happens if fluxes are expressed differently -- eg number / energy / magnitude / luminosity density. (5) I want distances to stellar objects measured in meters, so I can make a 3D display for the children. How do I recognize a redshift (z) value, how do I recognize a radial velocity, how do I recognize an actual distance measure? (6) I am looking for supernovae that have both optical and Xray measurements. 
Can I (should I) use UCD to help my search? (7) How do I find 21cm observations (that may be redshifted), which also have polarization information? From pfo at star.le.ac.uk Tue Oct 21 08:46:37 2003 From: pfo at star.le.ac.uk (Patricio F. Ortiz) Date: Tue, 21 Oct 2003 16:46:37 +0100 (BST) Subject: UCD use cases In-Reply-To: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Message-ID: Hi Roy, here are my 2 cents. On Tue, 21 Oct 2003, Roy Williams wrote: > Please find below some possible use cases for UCD. > > -- How do the different proposed UCD/DM schemes help with these cases? > > -- What does the query look like in each case, and how would that query be > coded? > > -- Can you supply additional use cases? Please try to stay within the realm > of the possible here! > > -- If you refer to a Data Model, please try to be concrete about > who/how/example. Please do not simply invoke "the Future IVOA Data Model". > > Thank you > Roy > -------------------------- > > (1) Cone search. How to decide which columns are the RA,Dec that was used in > the search. What frame (B1950, J2000, ...) do these come from? If there are > columns with ID, what sort of ID is it, and how do I resolve it? pick RA and dec from UCDs: pos.eq.ra;main, pos.eq.dec;main If the source is a VOTable, one can use the COOSYS element to figure out the equinox/epoch. If the source is plain metadata, we're missing the equinox element by using the plain UCD. Within a DM, the equinox should be there and can be used to solve the ambiguity. Now, if they are different, the question is whether they should be precessed. I assume that you mean src.object_id... Hmmm, invoke simbad or ned or any name-solver with the appropriate output equinox. However, I would advocate at this point in time to strongly avoid the existence of catalogues where only the ID is provided. It would be an effort to solve those names for all these catalogues, but once done, we don't have to worry about it anymore. > (2) SIAP search. 
Find the column that contains the URLs where the images > are. Find out if there are other columns that have RA,Dec of the image > center. I can see your point of differentiating RA/dec for sources as opposed to observations in a log. RA/dec are coordinates regardless of what one measure, so UCD1 didn't do anything about it. In the image catalogues, pos.eq.[ra|dec];main represent the image pointing. > (3) We have for example a crossmatch service that is clever enough to know > about error ellipses. How does it get from a table the most sophisticated > error info that is there: (a) position (b) circular error (c) ellipse error. Thought about this one before :-) Most complete case: elliptic error: - semi-major axis error.pos.smaj - semi-minor axis error.pos.smin - ellipse orientation (PA) error.pos-ang Alternatively, one could have ellipticity (error.ellipt) xor axis-ratio (error.pos.ax-ratio) and major axis, in which case one has to compute the semi-minor axis Circular, simpler, we only need err.pos.smaj (or err.pos.rad) Roy, please extend this case not just to errors, but to extended objects, as we could be talking about overlap between galaxies or molecular clouds or other extended structures/objects. > (4) We want to compare photometry in two tables covering the same star > cluster. How do I decide if they share measurements in the same filter? One > has R band, the other has Halpha. What happens if fluxes are expressed > differently -- eg number / energy / > magnitude / luminosity density. In a first approximation, I'd say stick to compare apples with apples and oranges with oranges. UCDs should tell you which band is observed (oops, that's UCD1s), therefore it should be clear whether the two tables contain the same type of observations. One interesting scenario here is to have a real Xmatch machine which identifies the stars of one table with the ones with the other. I may want to produce a diagram type R vs Halpha. 
If more than two tables are involved, and one chooses one filter, we are talking about forming light curves. > (5) I want distances to stellar objects measured in meters, so I can make a > 3D display for the children. How do I recognize a redshift (z) value, how do > I recognize a radial velocity, how do I recognize an actual distance > measure? Redshift, by its UCD: redshift.hc radial velocity: veloc.hc (beware of radial velocities of expanding stars) distance: phys.distance.true, drop anything measured in kpc or Mpc :-) beware of things measured in km or au Or, go to any catalogue measuring parallax and convert to distance. convert your distances to meters (tool to be built) > (6) I am looking for supernovae that have both optical and Xray > measurements. Can I (should I) use UCD to help my search? Optical and Xray UCDs appear not only in catalogues related to supernovae. Look for catalogues with supernova in their title and then use the X-RAy UCDs (SN are still discovered in optical, so they'll surely have optical fluxes). Look at a few catalogues with SNe for quantities which are proper of these objects (var.??) > (7) How do I find 21cm observations (that may be redshifted), which also > have polarization information? UCDs will help you with polarization. Try /pol./, 21cm for sure frequency-wise: phot.flux.radio.1.4G (that's why we didn't use '.' to separate the UCDs :-) but no assurance if the redshift is too high. Scan the table title for /21/ && /cm/ if the radio UCD. I'd expect the results to be nearly the same. Note: I just used http://barbara.star.le.ac.uk/datoz/mykats.html to perform the last search... I found a few cats with POL_ and RADIO_FLUX in 1.4Ghz Sorry, no time to write the queries in a WLQL (wish list query language) :-) Cheers, Patricio --- Patricio F. 
Ortiz pfo at star.le.ac.uk AstroGrid project Department of Physics and Astronomy University of Leicester Tel: +44 (0)116 252 2015 LE1 7RH, UK From gtr at ast.cam.ac.uk Tue Oct 21 08:29:14 2003 From: gtr at ast.cam.ac.uk (Guy Rixon) Date: Tue, 21 Oct 2003 16:29:14 +0100 (BST) Subject: UCD use cases In-Reply-To: <049b01c397e1$57b39ad0$6501a8c0@Ropy> References: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Message-ID: An additional case: I have a heap of data expressing modelled and observed values for a handful of quantities; celestial coords and proper motions would do as an example. I want to read out all the modelled versions of one of these quantities and compare them to assess the spread of the models. Then I want to do the same with the observations. What do I look for? On Tue, 21 Oct 2003, Roy Williams wrote: > Please find below some possible use cases for UCD. > > -- How do the different proposed UCD/DM schemes help with these cases? > > -- What does the query look like in each case, and how would that query be > coded? > > -- Can you supply additional use cases? Please try to stay within the realm > of the possible here! > > -- If you refer to a Data Model, please try to be concrete about > who/how/example. Please do not simply invoke "the Future IVOA Data Model". > > Thank you > Roy > -------------------------- > > (1) Cone search. How to decide which columns are the RA,Dec that was used in > the search. What frame (B1950, J2000, ...) do these come from? If there are > columns with ID, what sort of ID is it, and how do I resolve it? > > (2) SIAP search. Find the column that contains the URLs where the images > are. Find out if there are other columns that have RA,Dec of the image > center. > > (3) We have for example a crossmatch service that is clever enough to know > about error ellipses. How does it get from a table the most sophisticated > error info that is there: (a) position (b) circular error (c) ellipse error. 
> > (4) We want to compare photometry in two tables covering the same star > cluster. How do I decide if they share measurements in the same filter? One > has R band, the other has Halpha. What happens if fluxes are expressed > differently -- eg number / energy / > magnitude / luminosity density. > > (5) I want distances to stellar objects measured in meters, so I can make a > 3D display for the children. How do I recognize a redshift (z) value, how do > I recognize a radial velocity, how do I recognize an actual distance > measure? > > (6) I am looking for supernovae that have both optical and Xray > measurements. Can I (should I) use UCD to help my search? > > (7) How do I find 21cm observations (that may be redshifted), which also > have polarization information? > Guy Rixon gtr at ast.cam.ac.uk Institute of Astronomy Tel: +44-1223-337542 Madingley Road, Cambridge, UK, CB3 0HA Fax: +44-1223-337523 From amsr at jb.man.ac.uk Tue Oct 21 10:28:24 2003 From: amsr at jb.man.ac.uk (Anita Richards) Date: Tue, 21 Oct 2003 18:28:24 +0100 (BST) Subject: UCD use cases In-Reply-To: <049b01c397e1$57b39ad0$6501a8c0@Ropy> References: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Message-ID: Hi Roy, Great to see something practical happening. One comment - we do need to see the use of UCDs alongside the use of the Registry and data models (e.g. Pedro Osuna's very helpful reply which has indeed lifted my confusion about non-table data - or that bit of it anyway). Hence, when people provide implimentation of use cases, it would be nice to see: Where/if UCDs are used in the Registry to select data-sets (and if only UCDs, not other entries e.g. spectral coverage, keywords, serve a particular purpose); How UCDs are used (ideally in the context of a data model) to enable the processing steps in the execution of a query. 
Use case - if you are sick of Brown Dwarfs read no further: http://wiki.astrogrid.org/bin/view/Astrogrid/BrownDwarfRegistryRequirements The RegistryQuery steps use some things which could be UCDs like Proper Motion but they might also be key words. Specific colours are also used, but it might be sufficient to know that a catalogue contains optical and colour data. The DataSetQuery/Evaluation steps use these UCDs above and others. The new UCD structure enabling the use of partial matches would be a great help as the workflow contains choices where you use a quantity (e.g. colour) if already defined but if not you derive it ("you" being a data processing software agent). More examples could be drawn from the AstroGrid Ten... cheers a - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Dr. Anita M. S. Richards, AVO Astronomer MERLIN/VLBI National Facility, University of Manchester, Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K. tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax). From tam at lheapop.gsfc.nasa.gov Tue Oct 21 12:28:31 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Tue, 21 Oct 2003 15:28:31 -0400 Subject: A suggested revision for UCDs Message-ID: <3F9588DF.3030805@lheapop.gsfc.nasa.gov> A few minutes ago I uploaded a version of my suggested revised proposal for UCDs to the Twiki. This is just a Word version since I don't have a PDF generator handy. The URL is http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9b.doc This builds upon Roy and Sebastien's division of UCDs into concepts and properties but puts them together rather differently. In addition and largely independently, it includes a discussion of the use of UCDs in the context of groups of columns and tables as a whole. With the approach suggested, I hope that the ambiguities that have heretofore precluded considering using UCDs to mediate between data models and data can be removed and we may no longer require a utype parameter.
Some discussion of the document is included (in red) within the text. This mostly describes the relationship of this version to the 1.9.9 version. Sections 3 and 6 of the document are copied from the previous version (though section 6 was section 8 in that document). The abstract is altered to reflect the discussion of grouping constructs. Section 2 -- describing the status of the document -- has been changed to discuss the implications of the adoption of this recommendation upon existing software systems and protocols. I think that some such statement should be part of draft recommendations. Not sure if that's part of the recommendation process but it probably should be. Section 4 is the discussion of UCDs and UCD syntax. At the end of section 4 there is a long parenthetical discussion of some of the ways this proposal differs from 1.9.9 since people seemed to be concerned about that last Thursday. If they've gotten this far they probably don't need this anymore but it may be of interest. This section tries to be rather rigorous -- addressing a fair number of nits that had not been talked about in the earlier proposal even though it makes the proposal somewhat longer (e.g., the discussion of array valued cells, a much more detailed discussion of comparability of columns, how to ensure the uniqueness of UCDs, ...). The actual definition of all of the valid words for UCDs is deferred as it was in the earlier version, but a substantial number of examples are given. [In fact, most of the words should transfer between the two versions fairly transparently. The differences lie mostly in how they are put together.] Section 5 is the discussion of UCDs and grouping structures. I'm quite excited by this since I think it has real potential for helping to unify discussions of data, data models and data access.
The discussion in this chapter is less rigorous -- even if the basic idea is adopted I'm sure it will need substantially more work but I think it has real possibilities for linking data models and data. This chapter is why I've sent this message to the DAL and DM groups as well as to the UCD group. Apologies to all of you who get this twice or thrice! I trust there will be comments... Tom From tam at lheapop.gsfc.nasa.gov Tue Oct 21 12:51:17 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Tue, 21 Oct 2003 15:51:17 -0400 Subject: UCD use cases In-Reply-To: <049b01c397e1$57b39ad0$6501a8c0@Ropy> References: <049b01c397e1$57b39ad0$6501a8c0@Ropy> Message-ID: <3F958E35.1090703@lheapop.gsfc.nasa.gov> Here are some thoughts about how the new UCD scheme I've proposed might help in Roy's scenarios. I'd sent this to Roy directly earlier without realizing that it had been sent to the UCD and DM groups.... Tom > > (1) Cone search. How to decide which columns are the RA,Dec that was used in > the search. What frame (B1950, J2000, ...) do these come from? If there are > columns with ID, what sort of ID is it, and how do I resolve it? > UCDtrees look something like: src;instance meta.id[.type];value pos;instance ra/dec, l/b columns pos.coosys;value Optional, either table coosys element of the table or a table wide parameter Look for highest pos;instance and meta.id;value to get the correct id and pos columns. If we want to handle B1950 coordinates then we need to include a table parameter or parameters including the coordinate system. Of course the cone search protocol requires that the output be in J2000, but it would be easy to support genericity if we want. I suggested that the coosys element in tables could have a UCD, but less controversial would be a table parameter giving the information. That's not quite as elegant. However I don't like duplicating metadata information. 
Maybe the best solution is to include a parameter but make it point to the coosys using the XML ID. Not sure what you mean by resolve the id, but assuming that this isn't addressed by specific subtypes of id, replace meta.id;value with

    meta.id;instance
        meta.id;value
        meta.id.resolver;value

meta.id.resolver;value would presumably be a parameter but in principle if there was a different resolver for each row in the table it could be a column.

> (2) SIAP search. Find the column that contains the URLs where the images
> are. Find out if there are other columns that have RA,Dec of the image
> center.

Basic UCDTree is:

    obs;instance
        meta.id;value       # The required id field
        meta.link;value     # The link field
        pos;instance        # The required position

These are mandated columns in the SIAP protocol and the protocol should also specify that there is only one pos, meta.id and meta.link element at the top level. Note, by the by, that if we want to allow the SIAP protocol to return galactic positions, it's easy to do. [This isn't necessarily a good idea for SIAP, but it shows the expressivity of the UCDTrees.]

Could there be an image center that is distinct from this image? This would require there to be an associated observation to the main observation. [If I really want to go overboard I might have made the table UCD obs;filter.cutout;instance but I don't think that that is going to be useful to most people, though it does show a nice use of my suggested filter hierarchy.] This could be suggested by adding the following to the tree

    obs;instance
        meta.id;value
        pos;instance

The issue which is harder to address is understanding the relationship between this secondary observation and the primary observation. This is a semantic issue not a structural one, so a UCD must be involved. E.g., we need to define concepts for background, center, offset, ...
If we have done that (and I would suggest they belong in my meas tree since they have to do with the idea of a measurement) then the previous addendum becomes [and I've put in a background image for good measure]

    obs;instance
        meta.id;value
        obs;meas.center
        pos;instance
    obs;instance
        meta.id;value
        obs;meas.background
        pos;instance

How do I find this... I look for associated observations and search for one which has the desired measurement attribute. I believe an XQuery expression can do this pretty easily.

> (3) We have for example a crossmatch service that is clever enough to know
> about error ellipses. How does it get from a table the most sophisticated
> error info that is there: (a) position (b) circular error (c) ellipse error.

This is pretty easy... Looking just at a given position, we might have:

    pos;instance
        pos.eq.ra;value
        pos.eq.dec;value
        pos.eq.ra;meas.error        Errors in the coordinates
        pos.eq.dec;meas.error
        pos;meas.2d.error           This is a circular 2-d error
        pos;meas.2d.error.elliptical;instance
            pos;meas.2d.error.elliptical.x
            pos;meas.2d.error.elliptical.y
            pos;meas.2d.error.elliptical.posang

which gives all three of these. While there may be a better name for the last two errors, I don't think it matters much. Whatever error there is is directly associated with the position and it's easy to find out what errors are available. The question of which error the user should use is not appropriate for the UCD scheme. That's up to the user. The UCDs indicate which errors are available. I'm not sure if we want to elaborate the measurement tree to this level, maybe it could be done more simply in some other way. However I don't think this is an unreasonable approach.

> (4) We want to compare photometry in two tables covering the same star
> cluster. How do I decide if they share measurements in the same filter? One
> has R band, the other has Halpha. What happens if fluxes are expressed
> differently -- eg number / energy /
> magnitude / luminosity density.
Well I don't think unit conversions are an issue that UCDs should address. So I'll pass on that. [That's what units keywords are for after all.] The columns should describe the bands in which they were taken. If we want to assure that we actually have the same filter, then we need UCDs that are specific down to the filter level. The old UCD tree (and I assume you've kept it in the current one) puts all of that info in the initial word. I'd tend to use the em modifier here to define the band. E.g., phot.mag;em.optical.filter.v.johnson;value phot.flux;em.optical.filter.v.johnson;value and phot.counts;em.optical.filter.v.johnson;value for magnitude, flux and counts (in a photon counting instrument) If the question is whether the filters overlap, then I don't think this is a question the UCDs answer directly. It's expert knowledge about the concepts (Just like the exact relationship between RA and Dec isn't specified in the UCD. That's expert knowledge too.) > (5) I want distances to stellar objects measured in meters, so I can make a > 3D display for the children. How do I recognize a redshift (z) value, how do > I recognize a radial velocity, how do I recognize an actual distance > measure? Units are not a UCD issue. The other questions simply need distinct UCDs. As I understand the schema they would just be different elements of the phys tree but that's a bit of a guess. Redshift is a different concept from distance as is a radial velocity. They can be converted into each other in certain circumstances, but it is not the role of UCDs to understand the transformation rules -- just as it is not the role of UCDs to understand how to convert from RA,Dec to L,B. 
I think it may be reasonable to have all of the base concepts:

    phys.distance         A distance of some kind
    phys.distance.z       A redshift in a cosmological context
    phys.distance.vrad    A radial velocity in a cosmological context
    phys.z                A redshift in a non-cosmological context
    phys.vrad             A radial velocity in a non-cosmological context

or to only have the single distance concept and leave it to the user to recognize that they are in a cosmological context so that radial velocities can be converted to distances. That's probably safer from the context of trying to ensure that there is only one UCD for a given concept so I'd tend to go that way. Users put z in a table rather than distance for a reason. UCDs describe what a column is, they need not and should not describe what you can do with a column. I gather that is the role of ontologies.

> (6) I am looking for supernovae that have both optical and Xray
> measurements. Can I (should I) use UCD to help my search?

You can certainly try. This depends a bit upon the specific UCD tree. If table UCDs are often of the form src.[type];instance you can search the registry for tables of the form src.supernova;instance. Similarly you can search the phot columns and look for optical and X-ray qualifiers. If you get tables that meet all the criteria you've got a great place to start. I.e., look for a UCDTree that matches the template

    src.supernova;instance
        //phot*;em.xray*;value
        //phot*;em.optical*;value

[Where this is some Xqueryish kind of match to the UCDTree hierarchy] and you've got a great chance that this has the information you want. Even if there is no match, you can consider tables that are partial hits and look to see if you can join information appropriately using multiple tables. A real win here would be a VO service to enable comparison of class information among tables. Right now source classes are a real mess and we need a Simbad- or NED-like service to address it.
That will probably be needed if you want to interrogate the vast majority of UCDTrees that will have src;instance and some classification parameter inside.

> (7) How do I find 21cm observations (that may be redshifted), which also
> have polarization information?

Look for all tables that have columns that match flux.phot*;em.pol*;value to get polarization information. Limit to tables that have a flux value in the 21cm region -- don't know if that has its own special UCD -- or which have flux in the region from 10-100 cm and a redshift or radial velocity. This assumes there is some service that gives me the em qualifiers I need to look for given a specific energy/wavelength/frequency range. It wouldn't be too hard by hand either. This is a straightforward analysis of the column UCDs that doesn't need to worry about the structure UCDs at all.

From hanisch at stsci.edu Wed Oct 22 10:15:37 2003
From: hanisch at stsci.edu (Robert Hanisch)
Date: Wed, 22 Oct 2003 13:15:37 -0400
Subject: Fw: A suggested revision for UCDs
Message-ID: <021901c398c0$1ec33c00$7deca782@stsci.edu>

Here is a discussion that Tom and I had off-list, but I think a number of points of more general interest are raised. Warning -- it is quite long! Bob

- - - - -

Hi Bob, Thanks for the review and comments. I'm particularly interested in the areas that were unclear. It seemed to me that I needed to actually put the ideas out where I could get some detailed reactions. A fair number of typo issues were addressed in the version I uploaded to the Twiki and announced to the UCD, DM and DAL groups. Haven't heard of any reaction. I've responded to your comments below (there is a lot of detail but I thought it useful to think these things through). Tom

Robert Hanisch wrote:
> Hi Tom. I read through your revised UCD document this evening. Phew.
> There is much in it I like, much I don't, and much I don't follow. Perhaps
> the two latter categories mix together.
> > I guess my biggest problem is that the roles of concept, attribute, and > modifier are partly defined by syntax (where they appear in the string) and > partly by having to know what names (em, pos, flux) have been allocated to > which category. This seems very arbitrary (and very confusing) to me. > Although I have never written a parser in my life, it looks to me like a > parser for this would be a zillion if statements. Maybe this is fewer if > statements than for other approaches, but it still looks very complex. > I agree that this is a major issue, although my biggest concern with it is a little different. I'm giving a long answer to help me organize my thoughts. The writer of a table presumably has access to the documentation for UCDs so it shouldn't be a big problem dealing with the three types -- especially once there are examples. The problem is more in using UCDs when reading tables. In practice I'm not sure this would be a big deal for 'real' tools. E.g., something like VOPlot is going to need to know about the value and meas.error attributes internally so that it can plot values and error bars for a given quantity. I.e., it's just going to look for pairs of columns within the same group of the form: SomeString;value and SomeString;meas.error A spectral processing tool is going to look for pairs like phot.flux*;value and phys.wavelength;value. Specific tools internalize this kind of knowledge -- or even better read it in as a data model. These tools don't really know about how UCDs are organized. The organization is intended to make it easy for them to search for the appropriate strings, but they just take advantage of that. Generic tools for manipulating UCDs and for validating them are where the problem really begin to show up. Currently there are only 6 trees that are not basic concepts (em, frame and intent for modifiers and filter, stat and meas for attributes). 
I think the single word attributes are important enough that they will not cause a problem. So a complete algorithm to determine what word belongs in what vocabulary is currently pretty easy... Pseudocode is just:

    firstAtom = substring(ucd, 0, index(ucd, "."))   # text before the first "."; the whole word if there is none
    switch (firstAtom) {
      'em', 'frame', 'intent':   return thisIsAModifier
      'stat', 'filter', 'meas':  return thisIsAnAttribute
      'value', 'local', 'instance', 'multiplet', 'vector':  return thisIsAnAttribute
    }
    return thisIsAConcept

Alternatively we're talking about validating UCDs against an IVOA schema to define the valid words and the match against this could give the type. There are other simple ways to deal with this: Begin all modifiers with m. and attributes with a. Or I've suggested in the draft that all modifiers could be in the frame tree -- the idea is that the role of modifiers is to limit the context to which the concept applies. I don't think the attribute trees join as easily but if it's important enough we could pick a name for all of the attribute trees.

The biggest problem is non-standard namespaces. How do we handle a new UCD tree? In some sense the issue is moot. Non-standard words shouldn't be used outside of some developer's local context. They can be responsible for handling them. However I suspect that non-standard words will escape into the wild. The validate-against-the-schema approach still works, but it's impossible for writers of tables to know how to use these UCDs.

There are some other ideas that might help address this issue: Your suggestion of another separator character is nice. I thought about it but decided that it was too radical a change. Maybe separate attributes and modifiers within themselves by commas but separate them by '-'s. E.g., a complex UCD might be:

    flux.phot-em.optical,intent.calculated-meas.error,stat.max

I'd still like to keep the vocabularies separate, but now it's trivial to parse the UCD. For the moment I tried to minimize the change from the original proposal.
Note that this is all much harder in the original proposal. There is no way to tell what anything after the first word is. In that proposal the first word is a property, but all subsequent words can be either properties or concepts. Nor is there any lexical definition of what a property is (i.e., any word can be a property).

> The document has a lot of signs of a rush job -- is it Uniform or Unified?
> (Unified, I think.)

I always thought it was Uniform so that wasn't a typo but an error on my part... Sigh...

> Is flux a 0-level concept? Or is it phot.flux?

That I think is fixed in the published version (it's always phot.flux).

> On p.
> 3 you say that units are not part of UCDs, but on p.16 you create a UCD,
> phys.degrees;value

I wasn't quite sure what the UCD should be there. Maybe phys.angle.separation;value?

> , that is all about units. On p.12, I really like the
> typo(?) in 'pudding' (pubbing).

Alas that is also fixed. [That kind of error must reflect some curious things about the mind. I clearly picked the mirror image letter even though the typing motion for it is nothing like 'd']

>
> I'm not sure how others have reacted -- have not gone to the UCD list yet to
> see. But I was particularly confused by the following things.
>
> o p.4, you say that
>
> phot.flux;em.optical;intent.calculated;value
>
> is equivalent to
>
> phot.flux;intent.calculated;em.flux;value
>
> But there must be a mistake here. Shouldn't 'flux' in the second line be
> 'optical'? And isn't the first form illegal if alphabetical order is
> required?

The typos in the UCD were fixed and I hope that would help clarify what I was trying to say. The two UCDs should have been

    phot.flux;em.optical;intent.calculated;value

and

    phot.flux;intent.calculated;em.optical;value

The statement I was trying to make was that there is no natural reason to prefer one of these to the other, so we had to choose an arbitrary rule to try to ensure uniqueness of UCDs. Thus indeed the second is illegal.
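[Illustration, not part of the original message.] A minimal Python sketch of the word classification and of the '-'/',' separator variant described above. The function names classify_word and split_dash_comma are hypothetical, the word lists are just the ones given in the pseudocode, and the splitter assumes that all three parts (concept, modifiers, attributes) are present.

```python
# Trees named in the message; anything else is treated as a concept.
MODIFIER_TREES = {"em", "frame", "intent"}
ATTRIBUTE_TREES = {"stat", "filter", "meas"}
SINGLE_WORD_ATTRIBUTES = {"value", "local", "instance", "multiplet", "vector"}


def classify_word(word):
    """Return 'modifier', 'attribute' or 'concept' for one UCD word."""
    # Text before the first '.', or the whole word if it has no '.'.
    first_atom = word.split(".", 1)[0]
    if first_atom in MODIFIER_TREES:
        return "modifier"
    if first_atom in ATTRIBUTE_TREES or first_atom in SINGLE_WORD_ATTRIBUTES:
        return "attribute"
    return "concept"


def split_dash_comma(ucd):
    """Split the suggested 'concept-modifiers-attributes' form.

    Assumption: exactly three '-'-separated parts, with modifiers and
    attributes each separated internally by commas.
    """
    concept, modifiers, attributes = ucd.split("-")
    return concept, modifiers.split(","), attributes.split(",")


if __name__ == "__main__":
    for w in ("phot.flux", "em.optical", "meas.error", "value"):
        print(w, "->", classify_word(w))
    print(split_dash_comma(
        "flux.phot-em.optical,intent.calculated-meas.error,stat.max"))
```

With the separator variant no word list is needed to find the three parts, which is the sense in which parsing becomes trivial; the word lists are only needed for the ';'-only syntax.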
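[Illustration, not part of the original message.] A small Python sketch of one possible uniqueness rule for the two orderings discussed above. It assumes the rule is "concept first, then modifiers in alphabetical order, then attributes in their given order", which is only what the exchange implies; the function names and the modifier tree list are hypothetical.

```python
# Modifier trees as listed elsewhere in this thread (an assumption here).
MODIFIER_TREES = {"em", "frame", "intent"}


def is_modifier(word):
    """True if the word's first atom names a modifier tree."""
    return word.split(".", 1)[0] in MODIFIER_TREES


def canonical_ucd(ucd):
    """Rewrite a ';'-separated UCD with its modifiers sorted alphabetically."""
    words = ucd.split(";")
    concept, rest = words[0], words[1:]
    modifiers = sorted(w for w in rest if is_modifier(w))
    attributes = [w for w in rest if not is_modifier(w)]
    return ";".join([concept] + modifiers + attributes)


def is_canonical(ucd):
    """A UCD is accepted only if it is already in canonical order."""
    return ucd == canonical_ucd(ucd)


if __name__ == "__main__":
    first = "phot.flux;em.optical;intent.calculated;value"
    second = "phot.flux;intent.calculated;em.optical;value"
    print(is_canonical(first), is_canonical(second))
    print(canonical_ucd(second))
```

Under this assumed rule the first form is accepted and the second is rejected, and both normalize to the same string, so two columns can be compared by comparing their canonical UCDs.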
> > I find the goal of brevity at conflict with the goal of clarity. What does > 'em' mean to a human reader? Why 'src' and not 'source'? Why 'value' and > not 'scalar' (parallel structure to 'vector')? Why default on 'value' in a > otherwise well-defined ontology? I can't really argue with most of these. The tension between various goals it why I tried to list them all together. I would be happy to change to longer words. The default for value was just meant to be a convenience for writers of tables. If it confuses things I'm happy to drop it. I like value rather than scalar because a value can be a vector quantity. E.g., if we have a cell that contains an array of fluxes it's UCD might be phot.spectrum;value That's because the concept of spectrum is inherently non-scalar. A field that had a UCD of phot.spectrum;vector would imply that each cell contained an array of spectra (i.e, that the cell was presumably a 2-d array). However this is no big deal. > > I think if a clear distinction is to be made between attributes and > modifiers, it must be encoded explicitly (i.e., not just based on a list of > magic words). I do not like the semicolons as delimiters; this is not what > they mean in English grammar. (The semicolon in the last sentence was used > properly. The second clause is not necessarily a direct modifier of the > first, but rather is related in some intimate way.) This is fine by me -- I gave an example above using different separators. I think the grammar is just as simple. > > I don't understand how to use the concept 'concept' in a practical sense. > Well I tried to give two examples: If you have a VOTable in an editor how do you find the fields that don't have a defined concept? If a user simply omits the UCD field it's kind of painful to find them. However one can just do a string search for "concept" if the user has entered ucd='concept;value' to explicitly mark that the underlying UCD is unknown. 
The real reason is given in the last example in section 5. When correlating two tables that describe different kinds of quantities, e.g., sources and observations, I need to be able to describe what the output table is. There are two objects in every row so it's a multiplet (in my scheme), but what kind of multiplet? I can't call it a source, and I can't call it an observation, so I need to go up to a more generic word, i.e., concept. Basically it just provides the root for the entire concept hierarchy. If we really wanted to be regular, we could start all of the base concepts with this word...

> Your definition of 'pos' does not include solar or planetary coordinate
> systems, though later you give an example that does.

I don't know what the current hierarchy under pos is... What I'd guess is that it would contain something like: pos.body.lat and pos.body.lon and then the frame modifier would be used to specify which body. [Or maybe I left an inconsistency in from the previous version]

>
> 'intent' is defined as the 'human context' of the concept. Huh? How are
> 'calculated', 'predicted', and 'simulated' any more human concepts than
> 'observed' or 'measured'?

Observed and measured would be fine additions here except that they are likely to be considered the default. I.e., a time.exposure;value is assumed to be the measured time, so I don't need to put that in. [Note that meas is short for measurement]. The explanation probably needs to be better, but I think we need some kind of modifier that distinguishes between 'real' values and predicted, scheduled, calculated, ... values. This doesn't come up so much in VizieR tables, but many of the tables that I deal with are riddled with situations where I may have an allocated exposure time, a predicted exposure time and an actual exposure time. So something is assumed to be actual/measured/observed unless an intent is specified.
> > In 4.4 you insist that full words should be used ('electron' instead of > 'el'), but at the same time assert that 'phys', 'temp', 'em', etc., are all > ok. I don't have a horse in this race... I tried to match the usage of the previous paper, but I'd be happy to go either way. > > Example 2 (p.14) does not convey to me anything semantically different if I > disregard your comments. How am I supposed to understand something about > guide stars and plate centers from the structure of the UCDs alone? I take > issue with your assertion that "both software and humans should have no > trouble distinguishing the very different semantics of the two tables." > Well... I'd hope that by looking at the table UCD, you would immediately note that one table returned source information and the other returned observation information. That's no small matter. The structure immediately shows which concept is subordinate to the other. The actual semantics of the relationship were not described. You could do that if you want that level of detail. I'm not sure what the right UCDs are. E.g., in the source table might have included (hope the indentation survives the mail): obs.instance meta.id;value pos;meas.center pos.eq.ra;value pos.eq.dec;value I guess if we really want to include the concept of a guide star in the UCD hierarchy, they probably belong in the base concept or maybe in frame somehow, but I think this is too detailed. If we went ahead with it... The guide star might be src;frame.usage.guiding;instance meta.id;value pos.instance Note that in the first case it's the position that got the extra information, because the observation is just a standard observation (as far as we know). In the second case we're suggesting that this is a special kind of source. But I don't think I want to put that in the relatively simple examples. 
What I was trying to show was how the need for main columns has disappeared and that we could get source or observation information from either table with equivalent ease.

> I don't like 'arith' as a concept. 'math' would be ok. If we need it at
> all.

Well I did try to discourage it... I have no problem with math.

>
> I don't like 'soft' as a concept. Is it so bad to just say 'software'? All
> this stuff will be encoded in XML, which is notoriously verbose. If we
> choose unclear abbreviations we will obscure whatever semantic meaning is to
> be found.

Fine with me...

>
> OK, a lot of these criticisms are not really directed to you, but to the
> predecessor document. I understood your presentation in Strasbourg (I
> thought) but do not follow the document sufficiently well that I would ever
> be comfortable promoting it forward. I did not like Roy and Sebastien's
> premise that concept and property could morph, one into the other, depending
> on context. I do like your attempt to structure things more rigidly. It
> seems to me not rigid enough. And when I ran into phys.degrees I felt like
> the whole thing was falling down around me. The concept is an angular
> distance, which of course can be expressed in degrees, radians, arcsec, etc.

Agreed... (see above)

>
> It might be worth our time to look at the AIPS++ measures definitions. If I
> were to construct a quick hierarchy, what we are trying to do here is
> distinguish various sorts of measurements, metadata about those
> measurements, and metadata about the people/organizations associated with
> those measurements.
So our fundamental concept is a measurement, of which > there are various sorts: > > measurement > photometric > spectroscopic (which is just photometric per wavelength in an ordered sort > of way) > astrometric ('pos') > temporal > instrumental > > Ancillary information about measurements comes in the form of metadata: > > metadata > identifiers > people > organizations > > And we may have some special classes: > > software > source (to collect measurements of an object in space-time) > > Measurements are taken in bandpasses, and in certain coordinate frames, and > from either the real universe or from computer simulations. A bandpass is a > 'frame' restricting coverage in the em-spectrum. A coordinate frame > describes a restriction on the spatial coverage. The idea of 'intent' has > nothing to do with anything; it is simply a mode of collecting measurements. > > Allright, enough of my rantings for this evening. I applaud your attempt to > add rationality to Roy and Sebastien's work, but feel we still have some way > to go. > Thanks... I don't disagree with what you are saying and I hope that we can a least reopen the discussion. Tom From norman at astro.gla.ac.uk Wed Oct 22 10:43:48 2003 From: norman at astro.gla.ac.uk (Norman Gray) Date: Wed, 22 Oct 2003 18:43:48 +0100 (BST) Subject: A suggested revision for UCDs In-Reply-To: <3F9588DF.3030805@lheapop.gsfc.nasa.gov> Message-ID: Greetings, all, and Tom in particular. On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > A few minutes ago I uploaded a version of my suggested revised > proposal for UCDs to the Twiki. This is just a Word version since > I don't have a PDF generator handy. The URL is > http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9b.doc I've appended a (longish) set of comments below. I've just noticed that Bob has forwarded a long set of comments to the list. I haven't read those yet. 
By the way, I notice that this announcement/discussion has been posted to no fewer than _three_ lists, namely ucd, dm and dal. It would be at the very least neater if it were on only one -- ucd at ivoa.net is the obvious one. What do folk think -- are there folk on the other two lists who have an interest in this and aren't on the ucd at ivoa.net list? I'm sure I'm not the only one to find Tom's proposal very thought-provoking. The suggestions bring up several new use-cases; and the idea of the `local' atom in particular is valuable, and a gap in the 1.9.9 proposals (though I'd put it in a different place). I think there are very likely several places in the 1.9.9 proposals which are underspecified, and some where I personally would probably explain things slightly differently from Roy and Sebastien, but these are editorial matters. I have a few difficulties with some aspects of Tom's proposal, however, which I'll discuss here, and add a few more general remarks at the end. I'm speaking for myself of course, rather than the group of authors, and thus it's probable that my opinion and interpretation of some 1.9.9 points is at variance with others in the group, or goes beyond what the document aims to say (which would be a useful datapoint). Most urgent, I think, is Tom's discussion, in his section 4.5, of the distinction between his proposals and the 1.9.9 ones. These are crucial, since these criticisms are what would ultimately justify replacing the 1.9.9 proposals with Tom's more complicated ones. In the 1.9.9 proposals, the function of a word is always the same: some things such as `src' are concepts (and only concepts), and every other word names a property. The distinction is that concepts can't have a value, but can have properties; and a property always has a value. 
Now, the property;concept _pair_ also names a concept, which can therefore have properties in turn (this has the same potential as Tom's proposals for generating long UCDs in principle, but probably very unlikely in practice). There will doubtless be some rather formal language which makes this cast-iron, but it's actually fairly intuitive once you get the property/concept dichotomy and read `;' as `of a' or something like that. Section 3.1 in the 1.9.9 proposals -- the crucial section of the document, for which everything else is to some extent just scaffolding, and without which the rest of the document makes rather less sense -- is what attempts to describe this. Perhaps that explanation needs work. At any rate, I do not believe that one has to sign up to the (basically ontology-inspired) language in that section in order to use the UCDs thus justified. Indeed, it might be useful for that section to be split into two, one to communicate the underlying idea to folk who simply want to _use_ UCDs, and another to reexpress it more formally for the ontology enthusiasts. In his section 4.5, Tom also remarks that ``Indeed I'm not sure that any string of words can be determined to be illegal in the old scheme''. I'd probably agree in outline: there are significantly fewer rules necessary in the 1.9.9 proposals than in Tom's proposals. The only place a base concept can go is in the right-most position, and thus you can't have a concept sitting on its own, since the left-most position is the name of the property, the value of which is the number/column/whatever which has been annotated by this UCD (the syntactic mechanism for making that annotation is outside of scope for the UCD proposals, I'd think). Also, there are some property-concept pairs that make no sense, such as stat.err;src. But that's about it -- you don't need any more rules than that. Tom constructs an `arith.diff;arith.sum;phot.flux;...' UCD. 
That does look unwieldy (but note there's no need for parentheses in the 1.9.9 proposals), but I get the impression that the `arith' UCD tree was to some extent a kite being flown, and I for one would be surprised if it made it much beyond this version, partly because it would seem to encourage such odd-looking UCDs. Also, there's no tying of one table to another in the 1.9.9 proposals -- I'd think that was out of scope for UCD (and quite properly so: I'll mention this below). The 1.9.9 proposals allow no ambiguity in the way that UCDs are written: properties queue up in front of the single base concept, and ordering matters, so that stat.max;stat.err;phot.flux is different from stat.err;stat.max;phot.flux. More specific points in Tom's proposal, in document order rather than any other (section references are to Tom's document): Section 4.1: Bringing the number of terms up to three -- concept, attribute and modifier -- reminds me of the qualifier/modifier idea that was in previous versions of the draft, which I still think is an unstable distinction, and which Roy and Sebastien thankfully managed to get rid of by simplifying the syntax down to just concept plus properties (but see below). Also, there's no syntactic distinction between modifiers and attributes, so in order to apply the extra ordering rules for those, or even to break the UCD into its three parts, you have to know which words are of which type. That is, you can't do it at parse time. Section 4.1.2 (not an important point, I don't think): I'm puzzled at the requirement that words in the non-standard namespace must be distinct from all words in the IVOA namespace. The point of having a namespace is to make this possible, or (since such duplication would surely be condemned as bad practice) at least not an error. 
The rule also means that if a new word were added to the IVOA namespace which happened to match a word in a private namespace, the namespaced UCDs would thereby suddenly become invalid, with no change in the spec. Section 4.2.2: The `intent' modifier has no corresponding notion in the 1.9.9 proposals, but it's not clear to me where in those proposals this would fit in, and I think this is a _problem_ for the 1.9.9 proposals. I can see how it would fit in to what I take the underlying 1.9.9 model to be, but not into the serialisation of that model that the 1.9.9 syntax represents. I can see three approaches to this problem within the general framework of the 1.9.9 proposals. (i) Rule it out of scope: it's not UCD's problem to talk about what values are intended to be, since they're only for data discovery, and are not required to be capable of driving analysis, so that if this `intent' distinction matters to you, you're going to have to understand the utype somehow. (ii) Add modifiers like this to the 1.9.9 model and syntax: that's potentially quite a lot of work, since it would require thinking very clearly about just what the distinction is between modifiers and properties, _and_ working out a usable syntax for adding them in -- they _have_ to be distinguishable at parse time. (iii) Think about it more and discover a way they can be viewed as properties in a principled way. The point isn't just about this `intent' modifier: if we can convince ourselves that there are things like `intent' (and that they're in scope) which are in principle qualitatively distinct from properties (and I would at least dispute that `em' and `frame' count here), then that has to be dealt with. Perhaps this example will help us find the stable distinction between `qualifiers' and `modifiers' that escaped us in earlier versions. Section 4.2.3: The `value', `vector', `instance' and `multiplet' attributes seem overly complicated. 
The `value' attribute is not required in the 1.9.9 proposals because all properties have a value, namely the value they're being used to annotate. The other three seem artefacts of the `complex UCDs' which Tom is introducing in these proposals. These complex UCDs seem problematic to me because they seem tightly bound to VOTable. That destroys the orthogonality of the UCD and VOTable specs (the W3C has had _terrible_ trouble with non-orthogonal specs, tying itself in knots trying to resolve their dependencies on each other), and makes it harder to use UCDs in other contexts, such as queries. I feel that UCDs should be seen as annotating a `thing', whether that `thing' be a value, a column, a group, or a query `phrase', and it should be the responsibility of whatever defines the syntax of that annotation (that is, VOTable or SIA) to define precisely what the thing is that the annotation applies to. Thus, VOTable might say that when a UCD appears in a then it indicates a set of relationships between the corresponding entries of the table; when it appears in a it means something different; and so on. Dealing with the typing and complexity issues of this in a general way within the UCD spec would surely make it impossibly unwieldy and limit its scope. This is also a general worry for all of Tom's Section 5; I really think this should be out of scope for UCD, to the extent that Tom's ``The grouping does not describe the semantics of the relationship. That is the role of UCDs'' would be much better as ``The grouping describes (some of?) the semantics of the relationship. That is not the role of UCDs''. This is a can of worms. Section 4.2.3 (local): I agree this is a gap in the 1.9.9 proposals. Another way of dealing with it would be to say that a UCD `local.X' meant exactly the same as the `X', but was not comparable with it. More general points: Tom's document seems to discuss his proposals in object terms. 
However the property-concept parts of the UCD proposal are _not_ an object model, and if you cram them into an object model, they won't fit, and the result will inevitably look like a mess, and look backwards. The model is simpler than this, however: things which are purely concepts (such as `src') don't have values. Concepts do have properties though, and these properties have numeric values, namely the numeric values we're trying to annotate with this UCD.

As regards ordering, yes, as Tom said, it doesn't fundamentally matter, and it's just a matter of syntax, rather than of the model. However having the property first seems natural, since it's this which possesses the numerical value which is being annotated, and so it's this which I would have thought it best would be shown up-front.

Now, there is a _vague_ object model implicit in the construction of the UCD words like `pos.eq.ra', but this is only because, along with the replacement of underscores with dots, came the explicit freedom to crop each word at a dot from the right, and use the result as a UCD word also. This prompts a natural perception of the words as hierarchical, or object-oriented if you must. The actual words are basically little changed from the original UCDs, though there's a review of these under way. These words weren't the main point of the UCD2 proposals.

At present these words are those mined from the column names actually occurring in the databases in the CDS collection; they are thus unprincipled. Whether this is a good or a bad thing is an open question. I'm sure it is this which causes some people (I'm thinking of Gerard Lemson and Pat Dowler) to gasp and, in their poster, pick out pos_eq_ra for special deprecation as incoherent.
If you believe that principled generation of UCD words would be a Good Thing (and that would probably be my prejudice), then I suspect that paths in (say) Gerard and Pat's model would be a good way to do it (do Gerard and Pat claim that every UCD word is thus expressible?). If you believe, on the other hand, that the mined nature of the words is of primary importance (and I can see the force of that, too), then they might need little more than a review or tidy-up, to make sure that the `croppability' is reasonable in fact, and that the implications, or suggestions, of the words chosen do in fact fit in with a properties-based model (or whatever we end up with).

Phew! I think that's probably quite enough for just now -- I should let someone else get a word in.

All the best,

Norman

-- 
---------------------------------------------------------------------------
Norman Gray http://www.astro.gla.ac.uk/users/norman/
Physics and Astronomy, University of Glasgow, UK norman at astro.gla.ac.uk

From Edward.J.Shaya.1 at gsfc.nasa.gov Wed Oct 22 11:35:46 2003
From: Edward.J.Shaya.1 at gsfc.nasa.gov (Ed Shaya)
Date: Wed, 22 Oct 2003 14:35:46 -0400
Subject: UCD changes on top of McGlynn's changes
Message-ID: <3F96CE02.8020804@gsfc.nasa.gov>

I have made numerous tracked changes in the 1-9.9b of McGlynn and created a 1-9.9c. So if one has a recent version of Word one can accept or reject these changes as seen fit. I think someone will need to post this to the UCD and dal lists cause I am not on them.

Here are some "highlights"

--------------------------------------------------------------------------
The term property was used in a confusing manner. Everything was at some point in the specification referred to as a property including the basic concept as well as the attribute and modifier. So, I changed attribute property to attribute and modifier property to modifier.

I think this is pretty good but it is missing something.
The modifiers are in effect extending the hierarchical tree of basic concepts into the virtual tree of full concepts. This is not mentioned and probably should be. But, if it is true then sometimes the order of the modifiers should make a difference. I don't have an example but I am worried that there will be concepts that require A;B;C and other concepts that are A;C;B but they are not the same. I don't see how to ensure that that never happens.

------------------------------------------------------------
Word and atom were a bit confused. Atoms looked like words to me and Words were clearly composed of several words (not good). There was no atom in the Backus-Naur notation. But there were examples of atoms of the physics type (yikes). So I made the following simplification.

atom -> word
word -> term
word-component -> word

-----------------------------------
I changed a few occurrences of "column" to "contents" so that it did not seem that this was for tables only. And so the Contents in "UCD" would be meaningful.

--------------------------------------
Why is there a meas.error and a stat.error, and one is a concept and one is an attribute? Perhaps this was supposed to be a stat:max?

1 x1:experimental.quantity;x2:new.modifier;stat.error

----------------------------------------------------
Why not have a different symbol to separate the attributes from the base and modifiers?

pos.eq;phys.electron#value;vector
pos.eq;phys.electron#stat.error;vector

This is clearer. It says, if you are looking for instances of the concept of a positional measure of electrons, here it is. By the way it is in vector format and there is an error associated with it, you may need to transform this format. Queries will be keying on the concept and so it should be cleanly separated. If the query finds additional attribute information it may grab them for completeness even if they were not specified in the query.
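
As a rough illustration of the `#' separator suggested above, here is a minimal Python sketch of how a reader of such UCDs might split off the concept part and match queries against it alone. The names split_ucd and matches_concept are hypothetical and appear in no UCD draft; the only thing taken from the message is the layout "base;modifiers#attributes".

```python
from typing import NamedTuple, Tuple


class SplitUCD(NamedTuple):
    concept: Tuple[str, ...]     # base concept plus modifiers (left of '#')
    attributes: Tuple[str, ...]  # attribute words (right of '#'), may be empty


def split_ucd(ucd: str) -> SplitUCD:
    """Split a UCD written in the suggested '#' style into its two parts."""
    concept_part, _, attribute_part = ucd.partition("#")
    concept = tuple(w for w in concept_part.split(";") if w)
    attributes = tuple(w for w in attribute_part.split(";") if w)
    return SplitUCD(concept, attributes)


def matches_concept(ucd: str, wanted: str) -> bool:
    """True when both UCDs share the same concept part, whatever their attributes."""
    return split_ucd(ucd).concept == split_ucd(wanted).concept


value = split_ucd("pos.eq;phys.electron#value;vector")
error = split_ucd("pos.eq;phys.electron#stat.error;vector")
print(value.concept)
print(error.attributes)
print(matches_concept("pos.eq;phys.electron#stat.error;vector", "pos.eq;phys.electron"))
```

With this split, a query that keys on pos.eq;phys.electron finds both columns, and the attribute words can be collected afterwards without taking part in the match.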
---------------------------
Why not allow a namespaced term to reuse an existing term? That is what a namespace is for!

1 phot.flux;x1:meas.error Namespace reuses existing term

--------------------------
In the Group of pos.eq.ra and pos.eq.dec the UCD of the group should be pos.eq, not pos.instance. The "eq" is there because one should be as specific as possible. The instance should not be there since this is a table level term and is redundant here.

------------------------------------------------------------
I don't buy the idea that the main pos is always in the least indented or grouped column. This is an extremely fragile and restrictive way to go. There are many ways that the targets are in more groups than the plate positions or the guide stars. What if the target stars have position groups and so one wants a second grouping of ra, dec l,b or what if the grouping of stars is by cluster or by spectral type or by accuracy of measurements etc?

That is not to say that I like the idea of a main UCD. Rather, the best way is to ensure that the structural container of the data has a way of referring the properties to the objects that have these properties. A quantity needs to have an isPropertyOf attribute that refers to the object. So, a positional property column should have isPropertyOf="column(starName)". The default could be isPropertyOf=column(1).

---------------------------------------------------------
Finally. I find it curious that this system makes no discrimination between properties of objects (color, brightness, distance, size) and objects (electrons, atoms, planets, stars, galaxies). Every time one uses a UCD-property there must be implicitly a UCD-object that has been left off. A brightness is always of a star or a planet or a human. A query system must then be able to infer this from other metadata in the dataset. Therefore one needs to ensure that every data set has somewhere at least one UCD-object.
This will be hard to do if they are not somehow separated out.

I just wanted to point that out. It may or may not be a fatal flaw.

Ed

-------------- next part --------------
A non-text attachment was scrubbed...
Name: UCD-1.9.9c.doc
Type: application/msword
Size: 387584 bytes
Desc: not available
URL:

From patrick.dowler at nrc-cnrc.gc.ca Wed Oct 22 12:19:29 2003
From: patrick.dowler at nrc-cnrc.gc.ca (Patrick Dowler)
Date: Wed, 22 Oct 2003 12:19:29 -0700
Subject: A suggested revision for UCDs
In-Reply-To:
References:
Message-ID: <200310221219.29068.patrick.dowler@nrc-cnrc.gc.ca>

On Wednesday 22 October 2003 10:43, Norman Gray wrote:
> I'm sure it is this which causes some people (I'm thinking of Gerard
> Lemson and Pat Dowler) to gasp and, in their poster, pick out pos_eq_ra
> for special deprecation as incoherent. If you believe that principled
> generation of UCD words would be a Good Thing (and that would probably
> be my prejudice), then I suspect that paths in (say) Gerard and Pat's
> model would be a good way to do it (do Gerard and Pat claim that every
> UCD word is thus expressible?).

That particular example was chosen to show that the different parts of the POS_EQ_RA_MAIN (UCD1) are quite different types of things with different relationships to the thing being described. In the data model, POS is the "position" phenomenon, EQ is a particular ReferenceSystem, RA is one component of a point (the type used to represent a position in that ReferenceSystem). I still don't know that MAIN would belong in a data model.

It isn't "incoherent" so much as it includes some very different kinds of things. In the concept;property style of UCD2, I don't think it is inconsistent with the data model we presented.
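
To make the decomposition of POS_EQ_RA_MAIN described above a little more concrete, here is a small Python sketch. It is only an illustration: the role labels are paraphrased from the message, while the ROLES table, the Part class and the decompose function are hypothetical and belong to neither the data model nor any UCD draft.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Part:
    token: str
    role: str  # the kind of thing the token stands for in the data model


# Roles as described in the message; MAIN is deliberately left out because
# the message does not say where (or whether) it belongs in a data model.
ROLES = {
    "POS": "phenomenon (position)",
    "EQ": "ReferenceSystem",
    "RA": "component of a point in that ReferenceSystem",
}


def decompose(ucd1: str) -> List[Part]:
    """Split a UCD1 word at underscores and attach the role of each token."""
    return [Part(tok, ROLES.get(tok, "unclassified")) for tok in ucd1.split("_")]


for part in decompose("POS_EQ_RA_MAIN"):
    print(f"{part.token:5s} {part.role}")
```

The point of the sketch is that each token needs a different role label, which a single flat word cannot express on its own.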
The difference between UCD and DM is that in a model one explicitly states the relationship between the concept and the property. So, for example, in our DM we are explicitly saying that the relationship between a position and the ReferenceSystem is a different relationship than that between the position and the RA (a component of the data type/structure). UCD2 leaves the relationship implicit by having only one relationship: "propertyOf". Whether that's a good or bad thing is an open issue.

-- 
Patrick Dowler
Tel/Tél: (250) 363-6914 | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre | Centre canadien de données astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada | Gouvernement du Canada
5071 West Saanich Road | 5071, chemin West Saanich
Victoria, BC | Victoria (C.-B.)

From tam at lheapop.gsfc.nasa.gov Wed Oct 22 13:14:39 2003
From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn)
Date: Wed, 22 Oct 2003 16:14:39 -0400
Subject: A suggested revision for UCDs
In-Reply-To:
References:
Message-ID: <3F96E52F.8060402@lheapop.gsfc.nasa.gov>

Hi Norman,

Thanks for your comments. I've got some responses for some, I'm glad you found some good in this!

In keeping with your suggestion I've only sent this to the UCD group. I sent the original proposal to the other groups since I thought the use of complex UCDs is relevant, but I'm sure we can live without three copies and hopefully they have subscribed so they can hear this interesting(!?) debate.

Tom

Norman Gray wrote:
...
>
> Most urgent, I think, is Tom's discussion, in his section 4.5, ...

That was really meant to be at the end of section 4, rather than part of 4.5...

>... of the
> distinction between his proposals and the 1.9.9 ones. These are crucial,
> since these criticisms are what would ultimately justify replacing the
> 1.9.9 proposals with Tom's more complicated ones.
> > In the 1.9.9 proposals, the function of a word is always the same: > some things such as `src' are concepts (and only concepts), and > every other word names a property. The distinction is that > concepts can't have a value, but can have properties; and a property > always has a value. Now, the property;concept _pair_ also names a > concept, which can therefore have properties in turn (this has the > same potential as Tom's proposals for generating long UCDs in > principle, but probably very unlikely in practice). There will > doubtless be some rather formal language which makes this cast-iron, > but it's actually fairly intuitive once you get the property/concept > dichotomy and read `;' as `of a' or something like that. Well there are two distinct issues here. First lexically how can I tell what is a property and what is a concept? In 1.9.9 there is no way to tell. The only lexical statement it makes is that the first word is a property. What about the second word, or the third? The 1.9.9 proposal allows properties that modify other properties. The second issue is the semantic confusion. The second word can be (according to section 4.2 in 1.9.9) a concept referred to another property referred to information related to the primary word. In section 3.4 in defining the error in RA of a galaxy we have the phrase We identify the central property as "error", and the concept as right ascension", with a subsidiary word about "galaxy". So in fact version 1.9.9 has all three of concept property and modifier but just tries to hide the fact and doesn't give you any way of telling which is which... Suppose I give one a UCD of word1;word2;word3 Does word3 modify word1 or do both of them modify word1? No way to tell. 
stat.error;phot.flux;em.optical (word3 modifies word2)
or
phot.flux;em.optical;src.galaxy (word3 modifies word1)

There is an explicit statement that some things (pos.eq.ra) are sometimes concepts and sometimes secondary words and the rules make it trivial to build UCDs that are simply incomplete semantically. stat.error is a perfectly valid UCD but it has no rooted semantic content. You may say "Well what about meta.id, isn't that the same?" Not really because I can suggest (and indeed I do suggest) that UCDs need to be interpreted in the context of what the table is about. So for a source table meta.id refers to the id for a source, for an observation table meta.id refers to the id for an observation. But what does stat.error refer to.... The error in the source? That doesn't make sense.

>
> Section 3.1 in the 1.9.9 proposals -- the crucial section of the document,
> for which everything else is to some extent just scaffolding, and without
> which the rest of the document makes rather less sense -- is what attempts
> to describe this. Perhaps that explanation needs work. At any rate, I do
> not believe that one has to sign up to the (basically ontology-inspired)
> language in that section in order to use the UCDs thus justified.
> Indeed, it might be useful for that section to be split into two, one to
> communicate the underlying idea to folk who simply want to _use_ UCDs,
> and another to reexpress it more formally for the ontology enthusiasts.
>
> In his section 4.5, Tom also remarks that ``Indeed I'm not sure that any
> string of words can be determined to be illegal in the old scheme''.
> I'd probably agree in outline: there are significantly fewer rules
> necessary in the 1.9.9 proposals than in Tom's proposals.
The only place > a base concept can go is in the right-most position, and thus you can't > have a concept sitting on its own, since the left-most position is the > name of the property, the value of which is the number/column/whatever > which has been annotated by this UCD (the syntactic mechanism for making > that annotation is outside of scope for the UCD proposals, I'd think). > Also, there are some property-concept pairs that make no sense, such > as stat.err;src. But that's about it -- you don't need any more rules > than that. > > Tom constructs an `arith.diff;arith.sum;phot.flux;...' UCD. That does > look unwieldy (but note there's no need for parentheses in the 1.9.9 > proposals), but I get the impression that the `arith' UCD tree was to > some extent a kite being flown, and I for one would be surprised if it > made it much beyond this version, partly because it would seem to encourage > such odd-looking UCDs. Also, there's no tying of one table to another > in the 1.9.9 proposals -- I'd think that was out of scope for UCD (and > quite properly so: I'll mention this below). > Sorry that's a typo... Should have said tying of one column to another. > The 1.9.9 proposals allow no ambiguity in the way that UCDs are > written: properties queue up in front of the single base concept, and > ordering matters, so that stat.max;stat.err;phot.flux is different > from stat.err;stat.max;phot.flux. What I call attributes and the properties you specify here are indeed largely unambiguous in both cases. However what I call modifiers and what 1.9.9 calls either subsidiary words or 'information related to the primary word' are less clear. E.g., suppose I'm detecting circularly polarized light in the radio. 
The natural UCD for this would be:

phot.flux;em.radio;em.polarized;circular

or is it

phot.flux;em.polarized;circular;em.radio

or do we have to multiply the size of the vocabulary to add polarized and polarized.circular (and the other variants) to every wavelength spec we have? That seems silly... So we need to fix that.

There's a similar problem with arith (another nail for the coffin perhaps): is it

arith.sum;property1;property2

or

arith.sum;property2;property1

And this general idea that properties can refer to other properties in an uncontrolled way... Here's a UCD describing the flux of galaxies...

phot.flux;em.optical;src.galaxy

or is it

phot.flux;src.galaxy;em.optical ?

E.g., suppose I have a column that is the maximum flux from any of three wavebands. Can I write

stat.max;phot.flux;em.optical;phot.flux;em.xray;phot.flux;em.radio

I hope not, but the document seems to encourage it. This would be illegal in the revision since it includes three base concepts.

>
>
>
>
> More specific points in Tom's proposal, in document order rather than
> any other (section references are to Tom's document):
>
> Section 4.1: Bringing the number of terms up to three -- concept,
> attribute and modifier -- reminds me of the qualifier/modifier idea
> that was in previous versions of the draft, which I still think is an
> unstable distinction, and which Roy and Sebastien thankfully managed
> to get rid of by simplifying the syntax down to just concept plus
> properties (but see below).

... but they haven't, they have just not told you about the difference. The words stat.max, em.optical, and phot.flux have distinct grammar rules in how they are used in 1.9.9 but you have no way to tell that. I.e., phot.flux can appear as the initial word or any subsidiary word but can never appear before stat.max or any other word of the class I would call attribute. em.optical can appear after words of the same class as phot.flux and possibly after words of the same class as itself.
stat.max can appear anywhere in a UCD but it really should appear either before either a word of the class of phot.flux or a word of its own class. There are three kinds of words and we should just recognize that in the grammar. Also, there's no syntactic distinction > between modifiers and attributes, so in order to apply the extra > ordering rules for those, or even to break the UCD into its three > parts, you have to know which words are of which type. That is, you > can't do it at parse time. Sure you can. At least if the number of modifiers remains small. Note that table writers should have access to appropriate documentation when writing their tables (or writing the software that writes tables) so even if it gets more complex the writers have no problems and the readers don't care. See my response to Bob on this issue. I've suggested that all modifiers be put in the frame tree, though largely to address this issue. > > Section 4.1.2 (not an important point, I don't think): I'm puzzled at > the requirement that words in the non-standard namespace must be > distinct from all words in the IVOA namespace. The point of having a > namespace is to make this possible, or (since such duplication would > surely be condemned as bad practice) at least not an error. The rule > also means that if a new word were added to the IVOA namespace which > happened to match a word in a private namespace, the namespaced UCDs > would thereby suddenly become invalid, with no change in the spec. > This idea is copied from the previous proposal. I think the idea is that we don't want proliferation of new uncontrolled UCDs. I put this in a separate section, but I believe the content is the same as the previous proposal. I leave it to others to decide which is right. > Section 4.2.2: The `intent' modifier has no corresponding notion in > the 1.9.9 proposals, but it's not clear to me where in those proposals > this would fit in, and I think this is a _problem_ for the 1.9.9 > proposals. 
I can see how it would fit in to what I take the
> underlying 1.9.9 model to be, but not into the serialisation of that
> model that the 1.9.9 syntax represents. I can see three approaches to
> this problem within the general framework of the 1.9.9 proposals. (i)
> Rule it out of scope: it's not UCD's problem to talk about what values
> are intended to be, since they're only for data discovery, and are not
> required to be capable of driving analysis, so that if this `intent'
> distinction matters to you, you're going to have to understand the utype
> somehow.

That's not acceptable for observation tables. We frequently have multiple columns in a table which differ only in intent (proposed and actual exposure times, predicted and actual times of events, predicted and actual fluxes) and we need to know which to use for various purposes. Spectral fitting will be sadly served if we can't distinguish the calculated and actual spectra. What happens when we want to compare simulated and real data?

> (ii) Add modifiers like this to the 1.9.9 model and syntax:
> that's potentially quite a lot of work, since it would require
> thinking very clearly about just what the distinction is between
> modifiers and properties, _and_ working out a usable syntax for adding
> them in -- they _have_ to be distinguishable at parse time.

I'm happy to trade intent for frame.human and put all the modifiers in frame.

> (iii)
> Think about it more and discover a way they can be viewed as
> properties in a principled way. The point isn't just about this
> `intent' modifier: if we can convince ourselves that there are things
> like `intent' (and that they're in scope) which are in principle
> qualitatively distinct from properties (and I would at least dispute
> that `em' and `frame' count here), then that has to be dealt with.
> Perhaps this example will help us find the stable distinction between
> `qualifiers' and `modifiers' that escaped us in earlier versions.
Personally I take a modifier as something that limits the context of a concept.

>
> Section 4.2.3: The `value', `vector', `instance' and `multiplet'
> attributes seem overly complicated. The `value' attribute is not
> required in the 1.9.9 proposals because all properties have a value,
> namely the value they're being used to annotate.

The word value is the price I pay for making sure attributes and concepts are distinct. Personally I think it's worth it.

> The other three seem
> artefacts of the `complex UCDs' which Tom is introducing in these
> proposals.

Vector is not... It's simply to warn the user that the column is a vector. While VOTables have an array attribute that does this, I don't want to tie this proposal to VOTables... More on that below.

> These complex UCDs seem problematic to me because they
> seem tightly bound to VOTable. That destroys the orthogonality of the
> UCD and VOTable specs (the W3C has had _terrible_ trouble with
> non-orthogonal specs, tying itself in knots trying to resolve their
> dependencies on each other), and makes it harder to use UCDs in other
> contexts, such as queries. I feel that UCDs should be seen as
> annotating a `thing', whether that `thing' be a value, a column, a
> group, or a query `phrase', and it should be the responsibility of
> whatever defines the syntax of that annotation (that is, VOTable or
> SIA) to define precisely what the thing is that the annotation applies
> to. Thus, VOTable might say that when a UCD appears in a then
> it indicates a set of relationships between the corresponding entries
> of the table; when it appears in a it means something
> different; and so on. Dealing with the typing and complexity issues
> of this in a general way within the UCD spec would surely make it
> impossibly unwieldy and limit its scope.
This is also a general worry
> for all of Tom's Section 5; I really think this should be out of scope
> for UCD, to the extent that Tom's ``The grouping does not describe the
> semantics of the relationship. That is the role of UCDs'' would be
> much better as ``The grouping describes (some of?) the semantics of
> the relationship. That is not the role of UCDs''. This is a can of
> worms.

I think this is completely wrong. The grouping proposal has no special relationship to VOTables other than that they happen to support it. [Or they may soon!] Any other structure that supports groupings of tables would do just as well. This is a fairly natural attribute of object relational as well as hierarchical databases. It's just that VOTables have finally decided to enable the natural abilities that XML's hierarchical structure supports.

>
> Section 4.2.3 (local): I agree this is a gap in the 1.9.9 proposals.
> Another way of dealing with it would be to say that a UCD
> `local.X' meant exactly the same as the `X', but was not
> comparable with it.
>

That's essentially what my proposal does, modulo the order difference.

>
>
>
>
> More general points:
>
> Tom's document seems to discuss his proposals in object terms.
> However the property-concept parts of the UCD proposal are _not_ an
> object model, and if you cram them into an object model, they won't
> fit, and the result will inevitably look like a mess, and look
> backwards. The model is simpler than this, however: things which are
> purely concepts (such as `src') don't have values. Concepts do have
> properties though, and these properties have numeric values, namely
> the numeric values we're trying to annotate with this UCD.

Sounds like objects and attributes to me... What's the difference here? But the old proposal doesn't agree anyway! Is phot.flux a concept? Seems like it to me. But it's a property in 1.9.9. Sometimes... In the UCD phot.flux. But it's sort of a concept in stat.err;phot.flux Or is it a property there?
I'm not sure and there is no way to tell! By specifying a value attribute I've cleared away this confusion. > > As regards ordering, yes, as Tom said, it doesn't fundamentally > matter, and it's just a matter of syntax, rather than of the model. > However having the property first seems natural, since it's this > which possesses the numerical value which is being annotated, and > so it's this which I would have thought it best would be shown > up-front. This is not critical, but since I believe the model is analogous to the object/attribute relationship, using the same order that has conventionally been used there is helpful. > > Now, there is a _vague_ object model implicit in the construction of > the UCD words like `pos.eq.ra', but this is only because, along with > the replacement of underscores with dots, came the explicit freedom to > crop each word at a dot from the right, and use the result as a UCD > word also. This prompts a natural perception of the words as > hierarchical, or object-oriented if you must. Well I don't have to but I sure would like to! .. The actual words are > basically little changed from the original UCDs, though there's a > review of these under way. These words weren't the main point of the > UCD2 proposals. > > At present these words are those mined from the column names actually > occurring in the databases in the CDS collection; they are thus > unprincipled. Whether this is a good or a bad thing is an open question. > I'm sure it is this which causes some people (I'm thinking of Gerard > Lemson and Pat Dowler) to gasp and, in their poster, pick out pos_eq_ra > for special deprecation as incoherent. If you believe that principled > generation of UCD words would be a Good Thing (and that would probably > be my prejudice), then I suspect that paths in (say) Gerard and Pat's > model would be a good way to do it (do Gerard and Pat claim that every > UCD word is thus expressible?). 
If you believe, on the other hand, that > the mined nature of the words is of primary importance (and I can see > the force of that, too), then they might need little more than a review > or tidy-up, to make sure that the `croppability' is reasonable in fact, > and that the implications, or suggestions, of the words chosen do in > fact fit in with a properties-based model (or whatever we end up with). > > As in 1.9.9 I didn't build a complete list but I agree that most words will transfer between the two proposals. From roy at cacr.caltech.edu Wed Oct 22 13:32:32 2003 From: roy at cacr.caltech.edu (Roy Williams) Date: Wed, 22 Oct 2003 13:32:32 -0700 Subject: UCD changes on top of McGlynn's changes References: <3F96CE02.8020804@gsfc.nasa.gov> Message-ID: <01c501c398db$9ef8ef80$6b91d783@cacr.caltech.edu> All: From now on, please do not cross-post UCD material to dm and dal. Please send UCD-related postings to ucd at ivoa.net only. Those of you on the other lists, please join the ucd list to continue reading UCD material. Thank You Roy -------- Caltech Center for Advanced Computing Research roy at cacr.caltech.edu 626 395 3670 From tam at lheapop.gsfc.nasa.gov Wed Oct 22 13:40:59 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Wed, 22 Oct 2003 16:40:59 -0400 Subject: UCD changes on top of McGlynn's changes In-Reply-To: <3F96CE02.8020804@gsfc.nasa.gov> References: <3F96CE02.8020804@gsfc.nasa.gov> Message-ID: <3F96EB5B.8000001@lheapop.gsfc.nasa.gov> Hi Ed, Most of this sounds reasonable to me... Some short (just to show I can do it) comments. Tom Ed Shaya wrote: ... > I think this is pretty good but it is missing something. The modifiers > are in effect extending the hierarchical tree of basic concepts into > the virtual tree of full concepts. This is not mentioned and probably > should be. But, if it is true then sometimes the order of the modifiers > should make a difference. 
I don't have an example but I am worried > that there will be concepts that require A;B;C and other concepts that > are A;C;B but they are not the same. I don't see how to ensure that > that never happens. I just haven't seen where it happens... So I'm crossing my fingers! > -------------------------------------- > Why is there a meas.error and a stat.error, and one is a concept and one > is an attribute? > Perhaps this was supposed to be a stat:max? > > 1 x1:experimental.quantity;x2:new.modifier;stat.error > Probably just missed it when I introduced the meas tree. > > Why not have a different symbol to separate the attributes from the base > and modifiers? > pos.eq;phys.electron#value;vector > pos.eq;phys.electron#stat.error;vector Bob and you both suggested that. Sounds good to me. I'd probably have written the first as just pos.eq;phys.electron#vector > --------------------------- > > Why not allow a namespaced term to reuse an existing term? That is what > namespace is for! Talk to Ray and Sebastien. I think they feel that it's best for UCDs to be highly controlled so that namespaces are only used for experimental terms before they are introduced to the standard namespace. Both approaches seem reasonable to me. > ------------------------------------------------------------ > I don't buy the idea that the main pos is always in the least indented > or grouped column. This is an extremely fragile and restrictive way to > go. There are many ways that the targets are in more groups than the > plate positions or the guide stars. I don't want to suggest that one cannot build tables that would break the proposal. But I was unable to find a circumstance where I couldn't use structure to make the main elements clear. I.e., I'm not trying to cater to every reasonable structure for tables, but trying to see if the proposal allows sufficient flexibility for tables to express mainness assuming they are written by friends. 
I tried to allude to that with the sense that we'll need templates for how to use structures. > What if the target stars have > position groups > and so one wants a second grouping of > > > > ra, dec > > > l,b > > > or what if the grouping of stars is by cluster or by spectral type or > by accuracy of measurements etc? In these examples it would be the responsibility of the table writer to think about how the table could be written to express what readers need to know. If we're allowed the full flexibility of the VOTable grouping structures, then we might have the same fields referenced more than once: near the root so that they are seen to be main, and within some more nested structures. This would use the reference capability. However I'd be interested if we have any actual instances of tables where this is needed or whether it's something that's possible but not very likely. > That is not to say that I like the idea of a main UCD. > Rather, the best way is to ensure that the structural container of the > data has a way of referring the properties to the objects that have these > properties. A quantity needs to have an isPropertyOf attribute that > refers to the object. So, a positional property column should have > isPropertyOf="column(starName)". The default could be > isPropertyOf=column(1). With references to virtual columns I suspect this is equivalent to what I suggest, but it might be cleaner. > --------------------------------------------------------- > Finally. I find it curious that this system makes no discrimination > between > properties of objects (color, brightness, distance, size) and objects > (electrons, > atoms, planets, stars, galaxies). Every time one uses a UCD-property there > must be implicitly a UCD-object that has been left off. A brightness is > always of a star or a planet or a human. A query system must then be > able to > infer this from other metadata in the dataset. 
Therefore one needs to > ensure that every data set has somewhere at least one UCD-object. This > will be hard to do if they are > not somehow separated out. I just wanted to point that out. > It may or may not be a fatal flaw. > I guess this is what I'm typically indicating as the table UCD. E.g., the table UCD is the source, but the column UCDs are the properties of the sources. From roy at cacr.caltech.edu Wed Oct 22 18:02:10 2003 From: roy at cacr.caltech.edu (Roy Williams) Date: Wed, 22 Oct 2003 18:02:10 -0700 Subject: is UCD out of control? Message-ID: <05e001c39901$4a189d50$6501a8c0@Ropy> I must say that I find Tom's version of the UCD paper has a number of definite improvements, such as the importance of Groups, with the child inheriting UCD from its parent. However, I find the suggested syntax confusing and muddying. It seems to be going back to the old model of "base + other stuff" that we discussed in Cambridge. What I do not understand is how a machine would parse the other stuff, these modifiers and attribute properties and so on. I do not understand which is a modifier and which is an attribute. The reason we went with the new scheme was that we couldn't imagine writing code to disentangle the Cambridge scheme. In the 1.9.9 document, the first word of the UCD corresponds to the thing that has the units. In "stat.variance; phys.length", we know that the unit is L*L (it's a variance). The second word was the concept to which this relates. Everything in UCD2 should be of the form "The of the ". Forget the attempts to justify three words. Leave that for UCD3. Every UCD has at most two words. Keep It Simple! In the 1.9.9 document, we tried to keep as close as we can to the metadata mines -- the 3000 tables of Vizier from which all this comes. We thought that had more validity than somebody (anybody) sitting down and inventing structure. 
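[Editorial sketch, not part of the original message.] The two-word scheme described above (first word the property that carries the units, optional second word the concept it relates to, at most two words) is simple enough to show in a few lines. The following Python is only an illustration under stated assumptions: the names `UCD2`, `parse_ucd` and `describe` are hypothetical, the semicolon is taken as the word separator as in the "stat.variance; phys.length" example, and no check is made against any controlled word list.

```python
from typing import NamedTuple, Optional


class UCD2(NamedTuple):
    """Hypothetical holder for a two-word UCD: a property and an optional concept."""
    prop: str                # first word: the thing that has the units
    concept: Optional[str]   # second word: the concept the property relates to


def parse_ucd(text: str) -> UCD2:
    """Split a UCD string on ';' and enforce the 'at most two words' rule."""
    words = [w.strip() for w in text.split(";")]
    if not 1 <= len(words) <= 2 or not all(words):
        raise ValueError(f"expected one or two non-empty words, got {text!r}")
    return UCD2(words[0], words[1] if len(words) == 2 else None)


def describe(ucd: UCD2) -> str:
    """Render the UCD as a human-readable 'property of the concept' phrase."""
    if ucd.concept is None:
        return f"The {ucd.prop}"
    return f"The {ucd.prop} of the {ucd.concept}"


ucd = parse_ucd("stat.variance; phys.length")
print(describe(ucd))
```

With this reading, comparing two UCDs reduces to comparing the two fields of the tuple, and a longer string such as the four-word example discussed later in this message is simply rejected rather than interpreted.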
Look at the problems we get when we move away from mining real metadata: Tom thinks that "error" belongs in a tree called "measurement", and the earlier version put it in a tree called "statistics". There is no right or wrong here, just opinion. I pointed this out in the earlier document concerning the "equinox" concept, but that has been deleted. We must make every attempt to follow what 3000 published papers have done -- not push our own opinions. In Tom's paper, there seem to be lots of new attributes (value, vector, multiplet, local, human, soft) that further stretch the scope of UCD. If there are multiple values in a table cell, then the VOTable will indicate this in other ways. Perhaps Tom can put in a few more attributes so we can find out if the data quantity is a float or an integer? UCD is about *semantic type*, not all this other stuff. What *real* tables use the "human" section? Are humans base, attribute, or modifier? I think we can all agree that UCD as currently formulated cannot express the complexity inherent in its task. What is really needed is a well-thought RDF vocabulary of predicates and objects, and that is the idea of UCD3. The intention of UCD2 is to provide a stopgap that will be backward compatible when UCD3 arrives. We use only one predicate for now, "propertyOf". But Tom has chosen to remove all the discussion of why and what we are doing, where we are going, and driven instead down a road that tries to put a lot of complexity into this string representation. The result is something terribly complicated and not very understandable. Of course the proof is in the pudding. As usual in the VO, we are making a language that is very expressive, then hope to eventually write the code that understands it. So let's think it through now. How do I construct code that "understands" something like "phot.flux; em.optical; intent.calculated; value". 
I want to know what kind of data structure can be created from this, I want to know how to compare UCDs, I want to know how to convert a UCD into a human-readable description of what it represents. I know how to do these things with the 2-word property/concept style, but not with this grab-bag of attributes and modifiers. In conclusion is my IF ... ELSE clause: IF { we cannot find a killer app for UCD2, if we cannot write code to understand them, we should stick with UCD1, that has been improved and groomed in the last months. Then next year we can make UCD3. } ELSE { I like simplicity. I want to turn every table cell into " of the " so that every UCD2 would have at most two words. } -------- Caltech Center for Advanced Computing Research roy at cacr.caltech.edu 626 395 3670 From cgp at star.le.ac.uk Thu Oct 23 01:40:37 2003 From: cgp at star.le.ac.uk (Clive Page) Date: Thu, 23 Oct 2003 09:40:37 +0100 (BST) Subject: A suggested revision for UCDs In-Reply-To: <3F9588DF.3030805@lheapop.gsfc.nasa.gov> Message-ID: On Tue, 21 Oct 2003, Thomas McGlynn wrote: > A few minutes ago I uploaded a version of my suggested revised > proposal for UCDs to the Twiki. This is just a Word version since > I don't have a PDF generator handy. An off-topic note to Tom (and others with the same problem in producing PDFs): Many of us on Unix/Linux systems find Word documents inconvenient; I thought OpenOffice.org might cope with your file, but it produces illegible results without a lot of playing with fonts, and even then was very poor. Not long ago, however, I came across an on-line service for converting Word documents to PDF (or HTML) which seems quite useful: http://www.gobcl.com/ You upload your document and they email you the PDF a couple of minutes later. I haven't received any obvious spam as a result, and don't know what their privacy policy is, but for documents to be published, this doesn't seem important. 
I don't know why they provide this service free of charge, but I've found it useful. Actually I got an error message when I submitted Tom's recent UCD document, but the PDF came back anyway, and seems ok, still in glorious technicolor. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Tel +44 116 252 3551 Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 From jcm at head-cfa.cfa.harvard.edu Thu Oct 23 04:58:57 2003 From: jcm at head-cfa.cfa.harvard.edu (Jonathan McDowell) Date: Thu, 23 Oct 2003 07:58:57 -0400 (EDT) Subject: A suggested revision for UCDs Message-ID: <200310231158.h9NBwv9a025610@urania.cfa.harvard.edu> > Many of us on Unix/Linux systems find Word documents inconvenient I strongly second Clive's remarks; thanks for putting the 1.9.9b PDF version up, there's now a chance I'll read it before this morning's NVO telecon. We're trying for interoperability here, and Word isn't that. Jonathan From tam at lheapop.gsfc.nasa.gov Thu Oct 23 05:58:20 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Thu, 23 Oct 2003 08:58:20 -0400 Subject: A suggested revision for UCDs In-Reply-To: <200310231158.h9NBwv9a025610@urania.cfa.harvard.edu> References: <200310231158.h9NBwv9a025610@urania.cfa.harvard.edu> Message-ID: <3F97D06C.8000105@lheapop.gsfc.nasa.gov> Jonathan McDowell wrote: >>Many of us on Unix/Linux systems find Word documents inconvenient > > > I strongly second Clive's remarks; thanks for putting the 1.9.9b > PDF version up, there's now a chance I'll read it before this morning's > NVO telecon. We're trying for interoperability here, and Word isn't that. > Jonathan > > It looks like Marco Leoni posted a PDF version shortly after I uploaded the original... The URL is http://www.ivoa.net/internal/IVOA/IvoaUCD/UCD-1.9.9b.pdf So it's there! 
Cheers, Tom From jcm at head-cfa.cfa.harvard.edu Thu Oct 23 06:54:18 2003 From: jcm at head-cfa.cfa.harvard.edu (Jonathan McDowell) Date: Thu, 23 Oct 2003 09:54:18 -0400 (EDT) Subject: A suggested revision for UCDs Message-ID: <200310231354.h9NDsIii025671@urania.cfa.harvard.edu> Tom, Now that I've read the document and absorbed at least some of the emails, I must say that I like your proposal a lot. Although there are certainly details to be cleaned up, and I agree with most of Norman's and some of Bob's comments, it feels to me a more solid basis for a UCD2. My biggest beef (which goes directly against Norman's prejudice that UCDs are not an object model!) is that I don't like the distinction between "value" and "instance", especially given that array-valued "values" like "spectrum.value" are mentioned by you. I think your instance is just the value of a higher level term, and I would like to replace "pos.instance" with "pos.value", and then immediately drop the optional ".value" and just say "pos". Why is that a bad idea? Thanks for doing this work! Jonathan From arots at head-cfa.cfa.harvard.edu Thu Oct 23 06:26:21 2003 From: arots at head-cfa.cfa.harvard.edu (Arnold Rots) Date: Thu, 23 Oct 2003 09:26:21 -0400 (EDT) Subject: A suggested revision for UCDs In-Reply-To: Message-ID: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> It's a simple choice: those who shelled out the money for a Word license can either try to force the rest of the community to do the same by sending around Word documents, or spend themselves a little more by buying an Acrobat license and thus allow everybody to spend his/her money as (s)he sees fit. Personally, I think Word should always be bundled with Acrobat, at least in our community. - Arnold Clive Page wrote: > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > A few minutes ago I uploaded a version of my suggested revised > > proposal for UCDs to the Twiki. This is just a Word version since > > I don't have a PDF generator handy. 
> > An off-topic note to Tom (and others with the same problem in producing > PDFs): > > Many of us on Unix/Linux systems find Word documents inconvenient; I > ... > > -- > Clive Page > Dept of Physics & Astronomy, > University of Leicester, Tel +44 116 252 3551 > Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 > -------------------------------------------------------------------------- Arnold H. Rots Chandra X-ray Science Center Smithsonian Astrophysical Observatory tel: +1 617 496 7701 60 Garden Street, MS 67 fax: +1 617 495 7356 Cambridge, MA 02138 arots at head-cfa.harvard.edu USA http://hea-www.harvard.edu/~arots/ -------------------------------------------------------------------------- From ael at star.le.ac.uk Thu Oct 23 07:29:05 2003 From: ael at star.le.ac.uk (Tony Linde) Date: Thu, 23 Oct 2003 15:29:05 +0100 Subject: Another pdf generator In-Reply-To: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> Message-ID: <007801c39972$031c1780$6124d28f@gnowee> A PDF generator that I use a lot is at http://www.pdf995.com/ - it is a printer driver that, afaik, is windows only but works well. Cheers, Tony. > -----Original Message----- > From: owner-ucd at eso.org [mailto:owner-ucd at eso.org] On Behalf > Of Arnold Rots > Sent: 23 October 2003 14:26 > To: Clive Page > Cc: ucd at ivoa.net > Subject: Re: A suggested revision for UCDs > > > It's a simple choice: those who shelled out the money for a > Word license can either try to force the rest of the > community to do the same by sending around Word documents, or > spend themselves a little more by buying an Acrobat license > and thus allow everybody to spend his/her money as (s)he sees > fit. Personally, I think Word should always be bundled with > Acrobat, at least in our community. > > - Arnold > > Clive Page wrote: > > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > > > A few minutes ago I uploaded a version of my suggested revised > > > proposal for UCDs to the Twiki. 
This is just a Word > version since I > > > don't have a PDF generator handy. > > > > An off-topic note to Tom (and others with the same problem in > > producing > > PDFs): > > > > Many of us on Unix/Linux systems find Word documents > inconvenient; I > > ... > > > > -- > > Clive Page > > Dept of Physics & Astronomy, > > University of Leicester, Tel +44 116 252 3551 > > Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 > > > -------------------------------------------------------------- > ------------ > Arnold H. Rots Chandra X-ray > Science Center > Smithsonian Astrophysical Observatory tel: +1 > 617 496 7701 > 60 Garden Street, MS 67 fax: +1 > 617 495 7356 > Cambridge, MA 02138 > arots at head-cfa.harvard.edu > USA > http://hea-www.harvard.edu/~arots/ > > -------------------------------------------------------------- > ------------ > From mchill at dial.pipex.com Thu Oct 23 07:11:26 2003 From: mchill at dial.pipex.com (martin hill) Date: Thu, 23 Oct 2003 15:11:26 +0100 Subject: Off topic document formats... In-Reply-To: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> References: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> Message-ID: <1066918286.3f97e18ebc274@netmail.pipex.net> Er, so everyone is forced to send around Acrobat documents? And buy Acrobat licences instead of Word ones? Isn't there a Word converter for Open Office? Even much more better still, how about posting as hmtl (if there are no diagrams)? Everyone can read it and almost everything has a converter to it. Or, since it's an AVO standard, I'm sure we could find some way of using VOTable... Cheers, Martin from the "Standards are evil" dept Quoting Arnold Rots : > It's a simple choice: those who shelled out the money for a Word > license can either try to force the rest of the community to do the > same by sending around Word documents, or spend themselves a little > more by buying an Acrobat license and thus allow everybody to spend > his/her money as (s)he sees fit. 
> Personally, I think Word should always be bundled with Acrobat, at > least in our community. > > - Arnold > > Clive Page wrote: > > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > > > A few minutes ago I uploaded a version of my suggested revised > > > proposal for UCDs to the Twiki. This is just a Word version since > > > I don't have a PDF generator handy. > > > > An off-topic note to Tom (and others with the same problem in producing > > PDFs): > > > > Many of us on Unix/Linux systems find Word documents inconvenient; I > > ... > > > > -- > > Clive Page > > Dept of Physics & Astronomy, > > University of Leicester, Tel +44 116 252 3551 > > Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 > > > -------------------------------------------------------------------------- > Arnold H. Rots Chandra X-ray Science Center > Smithsonian Astrophysical Observatory tel: +1 617 496 7701 > 60 Garden Street, MS 67 fax: +1 617 495 7356 > Cambridge, MA 02138 arots at head-cfa.harvard.edu > USA http://hea-www.harvard.edu/~arots/ > -------------------------------------------------------------------------- > > -- Martin Hill 07901 55 24 66 www.mchill.net From tam at lheapop.gsfc.nasa.gov Thu Oct 23 06:33:03 2003 From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn) Date: Thu, 23 Oct 2003 09:33:03 -0400 Subject: is UCD out of control? In-Reply-To: <05e001c39901$4a189d50$6501a8c0@Ropy> References: <05e001c39901$4a189d50$6501a8c0@Ropy> Message-ID: <3F97D88F.9030501@lheapop.gsfc.nasa.gov> Hi Roy, I guess I find the current discussion stimulating and not a sign of a problem. One concern on my part.... My guess is that we only get to make one change to UCDs where the revision is incompatible with the previous set. I don't think the community will want to change their software twice while we're getting our act together. Even Microsoft can't get away with that! 
Absent your killer app, I'm not sure that a revision to UCDs is on the critical path for the VO so we should take the time to do it right and live with the current standard as long as needed. Personally, I think that UCDTrees or something like them (along with XQuery style queries on them) can be a link between abstract data models and concrete data representations. Tools that search for data matching a given data model could be that killer app. We'll see if this pans out when looked at in detail. Talk to you later. Tom From jcm at head-cfa.cfa.harvard.edu Thu Oct 23 08:03:40 2003 From: jcm at head-cfa.cfa.harvard.edu (Jonathan McDowell) Date: Thu, 23 Oct 2003 11:03:40 -0400 (EDT) Subject: Off topic document formats... Message-ID: <200310231503.h9NF3e82025726@urania.cfa.harvard.edu> > so everyone is forced to send around Acrobat documents? And buy Acrobat licences instead of Word ones? Well, ps2pdf and /usr/bin/acroread are free... I'm happy if you make it postscript or ascii, but PDF seems a reasonable compromise that's ok for people in both the Windows and Unix worlds. OpenOffice is still - at least for me - a continual struggle with not-quite-adequate compatibility, slow printing, yucky interface, and of course fails whenever MS updates Word. And I worry about .doc documents on the twiki being reliably readable in the future. > how about posting as hmtl (sic) Fine by me, although it's inconvenient to print a long document all at once. > I'm sure we could find some way of using VOTable Y'know, I was tempted to make this suggestion before, but manfully resisted... :-) Indeed the IVOA standard for documents, so I understand, is html. Perhaps this should be discussed more widely (in the standards WG, or whatever it's called?) In the meantime, perhaps I'll just post latex DVI files.... Jonathan From roy at cacr.caltech.edu Thu Oct 23 07:58:51 2003 From: roy at cacr.caltech.edu (Roy Williams) Date: Thu, 23 Oct 2003 07:58:51 -0700 Subject: Off topic document formats... 
References: <200310231326.h9NDQLPR001855@xebec.cfa.harvard.edu> <1066918286.3f97e18ebc274@netmail.pipex.net> Message-ID: <074601c39976$2cb69620$6501a8c0@Ropy> > Even much more better still, how about posting as hmtl (if there are no > diagrams)? Everyone can read it and almost everything has a converter to it. I think this is best, when it converges, I will have the UCD document converted to the clean HTML -- I mean the stuff that can be edited by a human! Roy From dtody at nrao.edu Thu Oct 23 07:53:38 2003 From: dtody at nrao.edu (Doug Tody) Date: Thu, 23 Oct 2003 08:53:38 -0600 (MDT) Subject: A suggested revision for UCDs In-Reply-To: Message-ID: Ok, we are off topic but... A simple solution is to run vmware (or something similar) on Linux, with Windows in the vm, using samba to share the unix file system. This provides the best of both worlds. Doug On Thu, 23 Oct 2003, Clive Page wrote: > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > A few minutes ago I uploaded a version of my suggested revised > > proposal for UCDs to the Twiki. This is just a Word version since > > I don't have a PDF generator handy. > > An off-topic note to Tom (and others with the same problem in producing > PDFs): > > Many of us on Unix/Linux systems find Word documents inconvenient; I > thought OpenOffice.org might cope with your file, but it produces > illegible results without a lot of playing with fonts, and even then was > very poor. > From arots at head-cfa.cfa.harvard.edu Thu Oct 23 07:17:43 2003 From: arots at head-cfa.cfa.harvard.edu (Arnold Rots) Date: Thu, 23 Oct 2003 10:17:43 -0400 (EDT) Subject: Off topic document formats... In-Reply-To: <1066918286.3f97e18ebc274@netmail.pipex.net> Message-ID: <200310231417.h9NEHhaS001976@xebec.cfa.harvard.edu> martin hill wrote: > Er, so everyone is forced to send around Acrobat documents? And buy Acrobat > licences instead of Word ones? Only those who pay for Word licenses in the first place. Acrobat reader is free. 
> > Isn't there a Word converter for Open Office? Doesn't always work well - see Clive's original post. > > Even much more better still, how about posting as hmtl (if there are no > diagrams)? Everyone can read it and almost everything has a converter to it. Fine with me, though PDF is quite a reasonable standard for documents. > Or, since it's an AVO standard, I'm sure we could find some way of using VOTable... > > Cheers, > > Martin > from the "Standards are evil" dept > > > > Quoting Arnold Rots : > > > It's a simple choice: those who shelled out the money for a Word > > license can either try to force the rest of the community to do the > > same by sending around Word documents, or spend themselves a little > > more by buying an Acrobat license and thus allow everybody to spend > > his/her money as (s)he sees fit. > > Personally, I think Word should always be bundled with Acrobat, at > > least in our community. > > > > - Arnold > > > > Clive Page wrote: > > > On Tue, 21 Oct 2003, Thomas McGlynn wrote: > > > > > > > A few minutes ago I uploaded a version of my suggested revised > > > > proposal for UCDs to the Twiki. This is just a Word version since > > > > I don't have a PDF generator handy. > > > > > > An off-topic note to Tom (and others with the same problem in producing > > > PDFs): > > > > > > Many of us on Unix/Linux systems find Word documents inconvenient; I > > > ... > > > > > > -- > > > Clive Page > > > Dept of Physics & Astronomy, > > > University of Leicester, Tel +44 116 252 3551 > > > Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 > > > > > -------------------------------------------------------------------------- > > Arnold H. 
Rots Chandra X-ray Science Center > > Smithsonian Astrophysical Observatory tel: +1 617 496 7701 > > 60 Garden Street, MS 67 fax: +1 617 495 7356 > > Cambridge, MA 02138 arots at head-cfa.harvard.edu > > USA http://hea-www.harvard.edu/~arots/ > > -------------------------------------------------------------------------- > > > > > > > -- > Martin Hill > 07901 55 24 66 > www.mchill.net > -------------------------------------------------------------------------- Arnold H. Rots Chandra X-ray Science Center Smithsonian Astrophysical Observatory tel: +1 617 496 7701 60 Garden Street, MS 67 fax: +1 617 495 7356 Cambridge, MA 02138 arots at head-cfa.harvard.edu USA http://hea-www.harvard.edu/~arots/ -------------------------------------------------------------------------- From dide at discovery.saclay.cea.fr Thu Oct 23 10:26:42 2003 From: dide at discovery.saclay.cea.fr (DIDELON Pierre) Date: Thu, 23 Oct 2003 19:26:42 +0200 (MEST) Subject: A suggested revision for UCDs Message-ID: <200310231726.h9NHQgH11982@rosetta.saclay.cea.fr> Hi Tom, Big and impressive work. I have read and tried to absorb the document and most of the following mails. I try to digest all. It is not so obvious for a frenchy like me and I hope that my comments below are appropriate. I apologize for any misunderstanding or confusion, by jumping in the discussion, but I really want to make a few points. If I clearly understand the concern of Pedro Osuna, the actual discussion illustrates what he stressed in Strasbourg and his further mail. UCD1 and a data model for context evaluation is sufficient, because DM is not available some structure is needed to be introduced and so UCD2 came. But unless we put DM in UCD (eventually using VOTable structure) it will be incomplete. Even UCDTree (which I really like) needed additional structure/template or external reference as stressed in the document 1.9.9b: p16 last paragraph, p18 last paragraph. 
I like several aspects of this proposal, and I agree with some of the comments of Norman, Ed and Jonathan, but as it is very tedious for me to write in english, I will not do an extensive comment list of what is happening in the discussion but go directly to the pb I had. Instead of specific words or context meaning to distinguish between 3 kinds of thing: concept, modifier property and attribute property, I would be more comfortable with a syntax distinction and a specific separator (i.e. #). The complexity of the three trees with specific words seems very strange and is confusing me. My main concern is related to Attributes. - I did not understand the meaning of local. You can always correlate a property with an identical or similar property, it depends on the purpose of your correlation, and without presupposition you cannot exclude apriori some interesting things that can be extracted from data. Taking your example: you can extract data where temperature jitter of the data acquisition system is above a certain threshold and try to correlate these with phot,flux;error or any kind of measurement error available. Perhaps not used very often, but not forbidden I hope? - It seems to me that there are some dangerous redundancy in the basic word of the attribute tree. For example value, vector, instance, multiplet are common data properties distinguished by their use or the context in which they appear. IMO there is no difference between vector and multiplet and each time you use multiplet you could replace it by vector. No? Like Jonathan, if you put vectors in value, I did not see the need for instance. The relation of measurements, errors and all this kind of things with the preceding is even more complicated. A value (I mean a real scalar) can be a unique measurement value, a statistical property of a measurement series (like mean, mode, error, std.dev., skewness...), a parameter and perhaps other meanings I am not able to think of now. 

Trying to be brief (but unfortunately incomplete), I feel that the problem comes from the fact that different data properties are mixed together (I believe that the same problem occurs in the DM group): mainly data structure, data meaning (which could be extended to purpose) and data representation (not to mention format and location).

Data structure would include value (scalar), vector (avoiding multiplet) and tree, and could easily be extended with matrix or composite structures. For me, instance is a structure, but which one? A free-format structure, a VOTable-formatted tree? It is not clear.

Data meaning is related to measurement, error and all the statistical properties.

Data representation mixes both things: a measurement series can be represented by a vector (even in a VOTable cell), or by one or several statistical properties (mean, mean + std. dev.) structured in very different ways (mean and error in separate columns, mean + mode + median in one column, ...).

I feel that some refurbishing is needed here, to clarify UCD usage more than UCD existence. I agree with the earlier comments which stressed that the big advantage of the UCD1 words is that they do not come from an a priori/re-invented structure but reflect the existing data.

- I did not see the need for filter. It seems to me that it tries to capture a part of the data history, but it seems so restrictive that it will very soon be inappropriate, I bet.

- The need for concept seems to come only from the incompleteness of the words available at the root of the concept tree. In your example (p18), a word correlation (which would certainly be needed for the VO) would better match the needs.

I stop here because it's late and I would become confused, if I am not already.

Thanks for the food for thought,
sincerely,
Pierre
-------------------------------------------------------------------------------
DIDELON e-mail : pdidelon_at_cea.fr
CEA SACLAY - Service d'Astrophysique W3 : http://www-dapnia.cea.fr/Sap/
91191 Gif-Sur-Yvette Cedex Phone : 33 (0)1 69 08 58 89
-------------------------------------------------------------------------------

From Edward.J.Shaya.1 at gsfc.nasa.gov Thu Oct 23 13:02:05 2003
From: Edward.J.Shaya.1 at gsfc.nasa.gov (Ed Shaya)
Date: Thu, 23 Oct 2003 16:02:05 -0400
Subject: Use case: distance
Message-ID: <3F9833BD.40604@gsfc.nasa.gov>

An HTML attachment was scrubbed...
URL:

From tam at lheapop.gsfc.nasa.gov Wed Oct 29 10:38:28 2003
From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn)
Date: Wed, 29 Oct 2003 13:38:28 -0500
Subject: Further thoughts on UCDs
Message-ID: <3FA00924.3040502@lheapop.gsfc.nasa.gov>

One of my problems in understanding how or whether we should change the UCD strings is the indefiniteness of the current UCD2 proposals which do not specify the full list of UCDs.

I've just spent a few minutes looking at the PHOT hierarchy. Probably Sebastien and Roy have done this with far greater care, but I've gone through the old UCD tree and tried to see what UCDs are suggested by what's in the PHOT hierarchy, where I have explicitly left out any band information from the UCDs assuming that to be supplied by qualifiers in the em hierarchy.

There are only about 25 distinct phot words here versus just under 500 in the original UCD1 tree. Most of this savings is at the cost of having an extensive em tree describing the bands, but I think that's helpful since we want to be able to combine fluxes from different bands. I think it makes the photometry tree much more accessible.

One thought expressed here is that we should not distinguish between flux, magnitudes and counts in the UCDs, but rather do that with the units.
I'm not sure if this is a good idea but it seems to me that they are more 'similar' to each other than each is to, say, a fluence or a color.

There is a question as to whether we specify colors by providing special keywords for each color, or by using a pair of em qualifiers. In the latter case, the order of the em qualifiers may be significant. If we go this route (and I think it makes more sense) then my proposal for well-formed UCDs would have to be modified to say something like: 'where the order of qualifiers is not significant, they should be given in alphabetical order'.

  phot
  phot.flux (distinctions between counts, magnitudes, fluxes, etc. should be carried in the units, not the UCD)
  phot.flux.surfaceBrightness
  phot.fluence
  phot.flux.absolute
  phot.color (This is really a ratio, but we use color as the traditional name)
  phot.color.diff (A difference in colors, i.e., a kind of second derivative)
  phot.color.excess
  em.bolometric (Bolometric measurements are after all just specifying a very broad band)
  phot.atmosphere
  phot.atmosphere.airmass
  phot.atmosphere.extinction
  phot.class
  phot.extinction.galactic
  phot.extinction.internal
  phot.extinction.ism
  phot.extinction.total
  phot.flux.isophotal
  phot.flux.central
  phot.correction.k
  phot.flux.limit
  phot.flux.offset
  phot.profile
  math/arith.ratio (for the phot_sd/b-bright, phot_tot-Bright/b-bright UCDs)
  phot.system (is this different from phot.class?)
  phot.parameter (some parameter of the photometric system)
  phot.zeropoint
  phot.spectrum (vector valued fields)
  phot.timeseries (vector valued fields)
  phot.image (vector valued fields)

This was just a quick exercise and doubtless there are better specific choices, but the phot hierarchy is one area where I imagine that our software may be very UCD aware. If we really can simplify it to this extent, then I think this alone is a major impetus for going to UCD2.

Tom
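
[Editorial illustration, not part of the original message.] The well-formedness rule suggested above ('where the order of qualifiers is not significant, they should be given in alphabetical order') could be sketched roughly as follows. This is only a minimal sketch under several assumptions that the message does not settle: words are joined with ';', the first word is the main word and the rest are qualifiers, and qualifier order is treated as significant only under phot.color. The function name normalize_ucd and the band words em.opt.B and em.opt.V are hypothetical, chosen purely for illustration.

```python
# Hypothetical sketch of a "well-formed UCD" normalizer.
# Assumptions (not fixed by the thread): ';' separates words, the first
# word is the main word, and only phot.color words keep qualifier order.

ORDER_SIGNIFICANT_PREFIX = "phot.color"  # assumed: a color is "band1 minus band2"


def order_matters(main_word: str) -> bool:
    """Return True if the qualifier order must be preserved for this word."""
    return (main_word == ORDER_SIGNIFICANT_PREFIX
            or main_word.startswith(ORDER_SIGNIFICANT_PREFIX + "."))


def normalize_ucd(ucd: str, sep: str = ";") -> str:
    """Put qualifiers in alphabetical order unless their order is significant."""
    words = [w.strip() for w in ucd.split(sep) if w.strip()]
    if not words:
        raise ValueError("empty UCD")
    main_word, qualifiers = words[0], words[1:]
    if not order_matters(main_word):
        # Case-insensitive alphabetical order of the qualifiers.
        qualifiers = sorted(qualifiers, key=str.lower)
    return sep.join([main_word] + qualifiers)


if __name__ == "__main__":
    # A flux with two qualifiers: order is not significant, so they are sorted.
    print(normalize_ucd("phot.flux;em.opt.V;em.opt.B"))
    # A color given by a pair of em qualifiers: order is kept as written.
    print(normalize_ucd("phot.color;em.opt.V;em.opt.B"))
```

With such a rule, two tables that tag the same quantity with differently ordered qualifiers would compare equal as strings after normalization, while a color and its reverse would stay distinct.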