From andrea.preitemartinez at rm.iasf.cnr.it  Tue Jan  3 01:20:08 2006
From: andrea.preitemartinez at rm.iasf.cnr.it (Andrea Preite Martinez)
Date: Tue, 03 Jan 2006 10:20:08 +0100
Subject: UCD news...
Message-ID: <20060103102008.y85gtgkm7jf4844g@webmail.sic.rm.cnr.it>

Dear members of the WG,

first of all, a very happy new year to you all!

For a good start, I'm pleased to announce that the UCD1+ word-list has been an IVOA Recommendation since Dec 31 2005. You can find version 1.11 of the document at
http://www.ivoa.net/Documents/latest/UCDlist.html

Now that we finally have the first standard list of words, it's time to start wondering how to change it!

We already had a discussion on a Note on UCD1+ maintenance I submitted to the WG in October. I revised the Note according to some (if not all) of your remarks, and transformed the Note into the Working Draft that you can find at
http://www.ivoa.net/Documents/latest/UCDlistMaintenance.html .

I'm ready to move the document to the state of PR and request comments from the community, but before doing this I'd like to be sure that the discussion in our WG is over. So, if you still have comments or amendments to the text of the Working Draft, please send them soon.

A reasonable schedule could be to have an RFC in February and to submit the PR to the Exec before the InterOp in spring.

Cheers,
Andrea

==============================================================================
Andrea Preite Martinez              andrea.preitemartinez at rm.iasf.cnr.it
Istituto di Astrofisica Spaziale    Tel.:+39.06.4993.4641
Area di Ricerca di Tor Vergata      Fax.:+39.06.2066.0188
Via del Fosso del Cavaliere 100     Cell:+39.339.3817355
00133 Roma                          CDS :+33.3.90242473
==============================================================================

From tam at lheapop.gsfc.nasa.gov  Wed Jan 25 10:55:34 2006
From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn)
Date: Wed, 25 Jan 2006 13:55:34 -0500
Subject: Fitting HEASARC tables into the UCD framework.
Message-ID: <43D7C9A6.2060805@lheapop.gsfc.nasa.gov>

During the past few weeks, we have been trying to see how well the HEASARC's tables fit into the UCD framework as currently defined.
We've also looked at how well the current Web tools provided by the CDS do in determining the UCD automatically from our existing metadata. A report summarizing our experiences is attached.

The three-line summary is that the basic framework seemed to work well, but that there seemed to be some holes in the coverage of concepts related to the process of proposing, taking, analyzing and archiving observations.

We'd be very interested in comments on this. Assuming we don't learn that we've overlooked a whole slew of UCDs, we'll be going ahead and proposing some new words. (And if we have missed a bunch then we'll be trying to see how we might improve the documentation!) Eventually we anticipate providing full UCDs for all 400 HEASARC tables including the ~200 or so that are not duplicated in Vizier.

Regards,
Tom McGlynn and Michael Preciado
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UCD summary.doc
Type: application/msword
Size: 48128 bytes
Desc: not available
URL:

From andrea.preitemartinez at rm.iasf.cnr.it  Fri Jan 27 03:10:52 2006
From: andrea.preitemartinez at rm.iasf.cnr.it (Andrea Preite Martinez)
Date: Fri, 27 Jan 2006 12:10:52 +0100
Subject: Fitting HEASARC tables into the UCD framework.
In-Reply-To: <43D7C9A6.2060805@lheapop.gsfc.nasa.gov>
References: <43D7C9A6.2060805@lheapop.gsfc.nasa.gov>
Message-ID: <20060127121052.8wuuc6fb5g0ckk8w@webmail.sic.rm.cnr.it>

Tom, Michael,

first of all thank you for your report on the assignment of UCDs to HEASARC's tables. It is very useful, for me (author of the scripts behind the ucd-builder) and also for other potential users.

If you don't mind, I'll post your file on the twiki page of the IVOA UCD working group (http://www.ivoa.net/twiki/bin/view/IVOA/IvoaUCD ) as an example of application of UCD-tools.

I was thinking of organizing a session at next IVOA InterOp meeting in May on applications of ucd-tools to real cases and feedback from users.
I hope you'll be there to present your work.

Comments:

First of all, let me say that the public tools provided in the CDS page were not meant for massive use, like yours. Indeed you soon realised that the builder is only an interactive tool, and can only be (effectively) used on a one-by-one basis.

At CDS I am confronted with a task similar to yours, but the situation is worse: about 40.000 tables and 150.000 columns to assign UCDs to (just to update the old ucd1 to the new ucd1+), and then a steady workload of about 1.000 columns to assign per month.

In order to do the job in a reasonable time, I built around the basic scripts (find-ucd-word: fw, and build-ucd-from-words: b-ucd, both used by the public builder) a command-line assigner that accepts as input a file describing each table (a suitable version of the VIZIER read-me file). Now, in about 100s I can assign UCDs to more than 40.000 columns. But the real problem is not only time (because you have in any case the control-time to consider!). The assign tool can use more information to assign the ucd, based on column-name (most of them are standard names, more explicit than user descriptions!!) or units. A short description of the tool follows at the end.

The tools are continuously upgraded and improved (see the builder page at http://vizier.u-strasbg.fr/UCD/cgi-bin/descr2ucd, with the updated date), with the feedback of my work on Vizier tables and looking at the log of the builder on the CDS page. Thanks to your work I'll have additional feedback to work on!!

=====================================================================
> assign1p -h
assign1p = assign ucd1p-words from list of [key]words and build UCD1+ from word(s).
           If present, a suggested old UCD1 is default-translated.
USE: assign1p [options] [<] input-file.tsv

tab-separated input-file fields:
   0: Cat/Tab          (Table name)
   1: Data Type        (I,F,A)
   2: Col_Name         (Title of the column)
   3: Col_units        (no units= ---)
   4: Col_description  (Free text)
   5: UCD1/ucd1+ || notes

tab-separated output fields: [nn]0,1,2,ucd,3,4

Options:
   -h[elp] : this help
   -d      : print revised description used by FindWord after the
             application of syntax/semantic rules (def=print input descr.)
   -l      : list all words with matching score > 5
             (def=only top P/QECV/S scores)
   -k      : print elements of association-tables containing input
             description [key]words (def=no)
   -r      : do not apply syntax/semantic rules to list of keywords (def=yes)
   -s      : print some statistics at the end (def=no)
   -t      : find also the traditional UCD. Generates a duplicate output
             line with old UCD (def=no)
   -u      : do not force use of suggested ucd (def=yes)
   -v      : verbose output, sets also -l -u (def=no)
   -n      : prefix output with record-number (def=no)
   -nn     : start at record 'nn' in input-file (def=0)

Required: readtab.pl, f-word.pl, bucd.pl
Files:    U1Pdescr_w, U1-U1P.defaults

The result of the assign1p procedure is flagged in field 1, as (Type,flag),
where flag can be:
   nn = best score shown (if only description is used),
   _  = ColumnName used,
   %  = Units used,
   _% = CN+U used,
   f  = used suggested ucd,
   f1 = used suggested ucd1,
   ?  = unable to assign,
   !  = forced note.
=====================================================================

Regards,
Andrea

From tam at lheapop.gsfc.nasa.gov  Fri Jan 27 06:36:36 2006
From: tam at lheapop.gsfc.nasa.gov (Thomas McGlynn)
Date: Fri, 27 Jan 2006 09:36:36 -0500
Subject: Fitting HEASARC tables into the UCD framework.
In-Reply-To: <20060127121052.8wuuc6fb5g0ckk8w@webmail.sic.rm.cnr.it>
References: <43D7C9A6.2060805@lheapop.gsfc.nasa.gov> <20060127121052.8wuuc6fb5g0ckk8w@webmail.sic.rm.cnr.it>
Message-ID: <43DA2FF4.5080208@lheapop.gsfc.nasa.gov>

Andrea Preite Martinez wrote:
>
> If you don't mind, I'll post your file on the twiki page of the IVOA UCD
> working group (http://www.ivoa.net/twiki/bin/view/IVOA/IvoaUCD )
> as an example of application of UCD-tools.
>
> I was thinking of organizing a session at next IVOA InterOp meeting in May
> on applications of ucd-tools to real cases and feedback from users.
> I hope you'll be there to present your work.
>

We certainly plan to do so. Feel free to post our paper where you feel it's appropriate. Note that the testing of the tools was really the secondary issue for us. Our primary concern was whether the UCDs could adequately describe the content of a table column, and only then how we found that appropriate UCD.

> Now, in about 100s I can assign UCDs to more than 40.000 columns.
> But the real problem is not only time (because you have in any case the
> control-time to consider!).

We'll certainly be trying out the tools you describe. Even for our measly 400 tables it's quite a slog to get UCDs for all tables.
One thing that we will be looking at is how well the UCDs we generate compare with yours on a few of the tables we hold in common. Generally I think we'll defer to your selections there, but it will be an interesting experiment to see how well we match for at least a few test cases. (I think there is one common table in our trial set but I haven't looked at this yet.)

Not sure what you mean by control time -- is that the time you take to check the results? In that case I think I agree that that's the real rate limiting step. Today I think that a human has to look at every UCD to confirm that it is appropriate.

Tom
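
[Editorial sketch] The assign1p help text earlier in this thread describes a six-field, tab-separated input record and a set of result flags. The Python below is only an illustration of that layout, written from the help text alone; it is not part of assign1p or of the CDS scripts. The function names, the field labels and the table name "example/table1" are hypothetical, and treating the "nn" flag as a plain run of digits is an assumption.

```python
# Sketch based only on the assign1p help text quoted in this thread.
# All names here are hypothetical and do not come from the real tool.

# Field order as listed in the help text (fields 0..5).
INPUT_FIELDS = (
    "cat_tab",          # 0: Cat/Tab (table name)
    "data_type",        # 1: Data Type (I, F or A)
    "col_name",         # 2: Col_Name
    "col_units",        # 3: Col_units ("---" when there are no units)
    "col_description",  # 4: Col_description (free text)
    "ucd_or_notes",     # 5: suggested UCD1/ucd1+ or notes
)
NO_UNITS = "---"
DATA_TYPES = {"I", "F", "A"}

# Flag meanings copied from the help text.
FLAG_MEANINGS = {
    "_": "column name used",
    "%": "units used",
    "_%": "column name and units used",
    "f": "used suggested ucd",
    "f1": "used suggested ucd1",
    "?": "unable to assign",
    "!": "forced note",
}


def make_record(table, data_type, col_name, units, description, suggested=""):
    """Return one tab-separated input line with the six fields in order."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unexpected data type: {data_type!r}")
    fields = [table, data_type, col_name, units or NO_UNITS, description, suggested]
    for value in fields:
        # A tab or newline inside a field would break the record layout.
        if "\t" in value or "\n" in value:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields)


def describe_flag(flag):
    """Translate a result flag into the wording used by the help text."""
    if flag in FLAG_MEANINGS:
        return FLAG_MEANINGS[flag]
    if flag.isdigit():
        # Assumption: 'nn' in the help text stands for a numeric score.
        return f"best score {flag} (only the description was used)"
    return "unrecognised flag"


if __name__ == "__main__":
    line = make_record("example/table1", "F", "RAJ2000", "deg",
                       "Right ascension (J2000)")
    print(dict(zip(INPUT_FIELDS, line.split("\t"))))
    print(describe_flag("_%"))
```

How the (Type,flag) pair is actually serialised in the output is not shown in the help text, so the sketch decodes only the flag itself.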