From dtody at nrao.edu Mon Jul 4 13:53:12 2011
From: dtody at nrao.edu (Douglas Tody)
Date: Mon, 4 Jul 2011 14:53:12 -0600 (MDT)
Subject: [ObsCoreRFC]Minutes of the telco Monday June 6
In-Reply-To: <201106091956.p59Ju32S024292@xebec.cfa.harvard.edu>
References: <201106091956.p59Ju32S024292@xebec.cfa.harvard.edu>
Message-ID:

On Thu, 9 Jun 2011, Arnold Rots wrote:

> 3. dataproduct_type, dataproduct_subtype, access_format
> I still think the scheme that is proposed is incomplete since it is
> ill-suited (as currently defined) to accommodate datasets (i.e.,
> collections of files).
> I would like to suggest that it would be good to add a
> dataproduct_type "package" (or some such thing) that indicates that
> the client will be receiving not just a single file. However, the
> client will still want to know what is in the package, so maybe the
> subtype should contain a list of the science file data types?
> In access format we are running into a somewhat similar problem:
> it's nice (and necessary) to know that a tar file is coming, but it is
> equally important to know what kinds of formats are hidden inside that
> tar file: if it is, say, Cobol code, I am not interested. Should it be
> a comma separated list? Or something like "tar(fits,pdf,txt)"?

Complex datasets are handled by the scheme. It is true that we don't really have a way to define what is inside a tar, zip, FITS MEF, directory, etc.; that would be quite complex to attempt. However support for this use case is provided in two ways.

First, the subtype may be used to define what the data object is in collection or archive specific terms. For example if the data object is a tar file containing all the files comprising a ROSAT observation the data provider can define a subtype for this type of data. It is up to the client to understand what the content of the proprietary data product is, but if they are able to deal with such instrument-specific data they probably do know what it is.
Second, it is possible to expose the individual files comprising the complex dataset. Then all the metadata can be specified separately for each data product allowing a full description. All data products would share the same obs_id hence they are still associated as a complex dataset.

Which approach is better probably depends upon how one expects the data to be used. If the client will almost always want to get all the data elements at once (e.g. for custom reprocessing or analysis of instrument-specific data) then the first approach is probably preferable. If they are more likely to want only a higher level derived data product such as an image or spectrum, the second approach might be preferred. Combinations of the two approaches are also possible since obs_id can link multiple associated data products of any type.

On Thu, 9 Jun 2011, Arnold Rots wrote:

> Are you saying that it is unwise to include optional columns in a
> query, because it may cause them to error out?
> Then why do we bother with optional items?
> It seems to me that their use is discouraged. By not specifying how
> servers should handle them we render them useless, don't we?

Not at all. The optional columns are ignored by a generic query without error but are still useful to more fully describe the data to the client or user. Also, it is possible in a subsequent query to the specific service providing this extra metadata to reference the custom elements, and still have a well-formed query. In this way the general mechanism can be used to pose more precise archive-specific queries, but the ability to pose generic queries to a number of services has not been compromised.
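As an illustration only (nothing here comes from the ObsCore/ObsTAP documents), the following Python sketch models the combination described above: one observation is exposed both as a single package record, identified by a provider-defined subtype, and as individual derived products, all sharing one obs_id. The column subset, the subtype value "rosat-obs-package", the extra column "exposure_mode", and the example.org URLs are hypothetical. A generic client keeps only the standard columns it knows, so the optional column is simply ignored, and grouping on obs_id recovers the complex dataset.

```python
from collections import defaultdict

# Hypothetical subset of standard ObsCore columns a generic client relies on.
STANDARD_COLUMNS = ("obs_id", "dataproduct_type", "dataproduct_subtype",
                    "access_url", "access_format")

# Hypothetical rows. obs-0001 is exposed both as one package (first approach)
# and as individual products (second approach); all share the same obs_id.
# "exposure_mode" stands in for an archive-specific optional column.
RECORDS = [
    {"obs_id": "obs-0001", "dataproduct_type": None,
     "dataproduct_subtype": "rosat-obs-package",
     "access_url": "http://example.org/obs-0001.tar",
     "access_format": "application/x-tar", "exposure_mode": "pointed"},
    {"obs_id": "obs-0001", "dataproduct_type": "image",
     "dataproduct_subtype": None,
     "access_url": "http://example.org/obs-0001_img.fits",
     "access_format": "image/fits", "exposure_mode": "pointed"},
    {"obs_id": "obs-0001", "dataproduct_type": "spectrum",
     "dataproduct_subtype": None,
     "access_url": "http://example.org/obs-0001_spec.fits",
     "access_format": "application/fits", "exposure_mode": "pointed"},
    {"obs_id": "obs-0002", "dataproduct_type": "image",
     "dataproduct_subtype": None,
     "access_url": "http://example.org/obs-0002_img.fits",
     "access_format": "image/fits", "exposure_mode": "survey"},
]


def generic_view(record):
    """Keep only the standard columns; unknown optional columns are dropped."""
    return {col: record.get(col) for col in STANDARD_COLUMNS}


def group_by_obs_id(records):
    """Collect every data product that belongs to the same observation."""
    groups = defaultdict(list)
    for record in records:
        groups[record["obs_id"]].append(generic_view(record))
    return dict(groups)


if __name__ == "__main__":
    for obs_id, products in group_by_obs_id(RECORDS).items():
        # Label each product by its type, or by its subtype for the package.
        kinds = [p["dataproduct_type"] or p["dataproduct_subtype"]
                 for p in products]
        print(obs_id, kinds)
```

An archive-aware client could instead read the full records, including exposure_mode, from the specific service that publishes that column.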
- Doug

From arots at head.cfa.harvard.edu Tue Jul 5 08:51:57 2011
From: arots at head.cfa.harvard.edu (Arnold Rots)
Date: Tue, 5 Jul 2011 11:51:57 -0400 (EDT)
Subject: [ObsCoreRFC]Minutes of the telco Monday June 6
In-Reply-To:
Message-ID: <201107051551.p65Fpvbt025530@xebec.cfa.harvard.edu>

See below

Douglas Tody wrote:
> On Thu, 9 Jun 2011, Arnold Rots wrote:
>
> > 3. dataproduct_type, dataproduct_subtype, access_format
> > I still think the scheme that is proposed is incomplete since it is
> > ill-suited (as currently defined) to accommodate datasets (i.e.,
> > collections of files).
> > I would like to suggest that it would be good to add a
> > dataproduct_type "package" (or some such thing) that indicates that
> > the client will be receiving not just a single file. However, the
> > client will still want to know what is in the package, so maybe the
> > subtype should contain a list of the science file data types?
> > In access format we are running into a somewhat similar problem:
> > it's nice (and necessary) to know that a tar file is coming, but it is
> > equally important to know what kinds of formats are hidden inside that
> > tar file: if it is, say, Cobol code, I am not interested. Should it be
> > a comma separated list? Or something like "tar(fits,pdf,txt)"?
>
> Complex datasets are handled by the scheme. It is true that we don't
> really have a way to define what is inside a tar, zip, FITS MEF,
> directory, etc.; that would be quite complex to attempt. However
> support for this use case is provided in two ways.
>
> First, the subtype may be used to define what the data object is in
> collection or archive specific terms. For example if the data object is
> a tar file containing all the files comprising a ROSAT observation the
> data provider can define a subtype for this type of data.
> It is up to the client to understand what the content of the
> proprietary data product is, but if they are able to deal with such
> instrument-specific data they probably do know what it is.

This is precisely the case I was trying to solve: a tarfile containing a mix of data types: images, spectra, event lists. The way I would like to solve it is to allow "package" (or something similar) for the data type and enumerate the data files contained in the tarfile in the data subtype.

It still leaves a similar issue for the access format: that would be tar, but it would be nice to be able to enumerate the formats of the files in the tarfile in a similar format subtype - that also would allow one to indicate whether or not the content of the tarfile is gzipped (as opposed to gzipping the tarfile itself).

I realize that this constitutes a use of subtypes that is different from the original intent (at least, I think so), but it does seem a useful mechanism.

>
> Second, it is possible to expose the individual files comprising the
> complex dataset. Then all the metadata can be specified separately
> for each data product allowing a full description. All data products
> would share the same obs_id hence they are still associated as a complex
> dataset.

That is a trivial case that is completely covered and I have no issue with it.

>
> Which approach is better probably depends upon how one expects the data
> to be used. If the client will almost always want to get all the data
> elements at once (e.g. for custom reprocessing or analysis of
> instrument-specific data) then the first approach is probably
> preferable. If they are more likely to want only a higher level derived
> data product such as an image or spectrum, the second approach might be
> preferred. Combinations of the two approaches are also possible since
> obs_id can link multiple associated data products of any type.
>
> On Thu, 9 Jun 2011, Arnold Rots wrote:
> > Are you saying that it is unwise to include optional columns in a
> > query, because it may cause them to error out?
> > Then why do we bother with optional items?
> > It seems to me that their use is discouraged. By not specifying how
> > servers should handle them we render them useless, don't we?
>
> Not at all. The optional columns are ignored by a generic query without
> error but are still useful to more fully describe the data to the client
> or user. Also, it is possible in a subsequent query to the specific
> service providing this extra metadata to reference the custom elements,
> and still have a well-formed query. In this way the general mechanism
> can be used to pose more precise archive-specific queries, but the
> ability to pose generic queries to a number of services has not been
> compromised.

That's fine, but then I fail to understand the problem with having polarization metadata optional. That was the original issue and, if I understood the discussion correctly, the argument was made that if polarization was made optional, it would lead to many query errors.

>
> - Doug
>
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                 tel: +1 617 496 7701
60 Garden Street, MS 67                               fax: +1 617 495 7356
Cambridge, MA 02138                          arots at head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------

From seaman at noao.edu Tue Jul 5 09:15:13 2011
From: seaman at noao.edu (Rob Seaman)
Date: Tue, 5 Jul 2011 09:15:13 -0700
Subject: [ObsCoreRFC]Minutes of the telco Monday June 6
In-Reply-To: <201107051551.p65Fpvbt025530@xebec.cfa.harvard.edu>
References: <201107051551.p65Fpvbt025530@xebec.cfa.harvard.edu>
Message-ID: <91902DC5-54F4-40B3-9F89-1962817ABA2E@noao.edu>

On Jul 5, 2011, at 8:51 AM, Arnold Rots wrote:

> It still leaves a similar issue for the access format: that would be
> tar, but it would be nice to be able to enumerate the formats of the
> files in the tarfile in a similar format subtype - that also would
> allow one to indicate whether or not the content of the tarfile is
> gzipped (as opposed to gzipping the tarfile itself).

Could we please not assume that gzip is a magic Harry Potter shrinking spell, appropriate for all data types? Suggest support for tag/keyword/field/whatever to indicate a per-file compression algorithm/protocol/scheme.

Rob

From arots at head.cfa.harvard.edu Tue Jul 5 13:28:30 2011
From: arots at head.cfa.harvard.edu (Arnold Rots)
Date: Tue, 5 Jul 2011 16:28:30 -0400 (EDT)
Subject: [obs-tap]:updates on the Proposed recommendation
In-Reply-To: <20110620210846.gr5zd7xcg8g4gsk4@webmail.u-strasbg.fr>
Message-ID: <201107052028.p65KSU7I025623@xebec.cfa.harvard.edu>

Mireille,

Here are some items.

Ian Evans noticed the inconsistency in units for spatial resolution between the Tables 1, 4, 5, and 6 (arcsec vs. deg); what should it be? I assume deg?
See also s_stat_error in Table 5.

In addition, I noticed that Table 5 contained unit "day" that should be "d", Table 7 has erroneous unit "d" for data rights and is missing most units.
The section on obs_publisher_did is a bit murky and not quite consistent with the definition in the spectral data model where it expresses a strong preference for using the same DIDs as are being used in the journals. That implies that the data product the query result refers to may be a subset of what the DID stands for, as the current spectral draft affirms. On the other hand, I don't think the spectrum DM and the SSA DAL are quite consistent in this respect. It might be good to have a more thorough discussion on these DIDs and consistency between all PRs.

It also brings me back to the issue I have been harping on: what to do with packages of products pertaining to a single observation; I will not repeat that here.

However, there is also the reverse problem: what do we do with data products based on multiple observations? Do we allow ObsId to be a list of ObsIds?

I still find the bibcodes a bit problematic. The SSA DAL doc calls it a "curation reference", but in the text seems to imply that any publication mentioning the data is fair game. Is this really meant to be a reference to the data, or is it to be any paper that references the data? There is a difference between these two... I realize, though, that this is primarily an issue for the SSA DAL doc. But it has repercussions for this document as well.

Cheers,

- Arnold

Mireille Louys wrote:
> Arnold, Daniel,
> This is a second try. I mixed addresses with a strange copy/paste.
> Sorry, Mireille.
>
> ------------------------------------------
> Dear all,
>
> Here is an updated version of the ObsCore DM document.
> I tried to correct and modify the text according to:
> - typos and inconsistencies mentioned by Petr on RFC page
> - comments given in mails
> - actions listed during the telco on June 6th
>
> Modifications are highlighted to help you track the changes.
> *check pol_states*
> I maintained the first proposal of having a list with /, with a
> leading / to help
> distinguish Y from other combinations XY, YY, YX etc.
> --> Using a leading comma was too strange for me.
>
> Tables 6 and 7 use TAP defined columns: principal, indexed, standard.
> I filled them in with my personal understanding.
> It would be useful to have database system managers check this.
>
> To be inserted: update of section 5 for registering an ObsTAP service.
>
> Thanks for your reading and comments (the very last ones, I hope)
> Cheers, Mireille
>
[ Attachment, skipping... ]

From dtody at nrao.edu Tue Jul 5 13:56:36 2011
From: dtody at nrao.edu (Douglas Tody)
Date: Tue, 5 Jul 2011 14:56:36 -0600 (MDT)
Subject: [ObsCoreRFC]Minutes of the telco Monday June 6
In-Reply-To: <201107051551.p65Fpvbt025530@xebec.cfa.harvard.edu>
References: <201107051551.p65Fpvbt025530@xebec.cfa.harvard.edu>
Message-ID:

On Tue, 5 Jul 2011, Arnold Rots wrote:

>> First, the subtype may be used to define what the data object is in
>> collection or archive specific terms. For example if the data object is
>> a tar file containing all the files comprising a ROSAT observation the
>> data provider can define a subtype for this type of data. It is up to
>> the client to understand what the content of the proprietary data
>> product is, but if they are able to deal with such instrument-specific
>> data they probably do know what it is.
>
> This is precisely the case I was trying to solve: a tarfile containing
> a mix of data types: images, spectra, event lists.
> The way I would like to solve it is to allow "package" (or something
> similar) for the data type and enumerate the data files contained in
> the tarfile in the data subtype.
>
> It still leaves a similar issue for the access format: that would be
> tar, but it would be nice to be able to enumerate the formats of the
> files in the tarfile in a similar format subtype - that also would
> allow one to indicate whether or not the content of the tarfile is
> gzipped (as opposed to gzipping the tarfile itself).
>
> I realize that this constitutes a use of subtypes that is different
> from the original intent (at least, I think so), but it does seem a
> useful mechanism.

Arnold - I agree that in principle it would be useful to have this extra information. However we had to argue for quite a while to get support for instrumental data at this level included at all. One *can* expose this data with ObsTAP 1.0 as outlined in my earlier email; in particular exposing the individual data products separately allows them to be described if the data provider wants to do so. Even exposing only the tar/zip/MEF etc. file works so long as the client recognizes the subtype.

To attempt to describe the contents of arbitrary complex instrumental datasets is out of scope for ObsTAP, at least 1.0. Perhaps we can address this issue in the next phase of development where we prototype related mechanisms such as data linking.

> However, there is also the reverse problem: what do we do with data
> products based on multiple observations? Do we allow ObsId to be a
> list of ObsIds?

This was addressed in the document as I recall. In the case of complex data products which are derived from multiple inputs (e.g. multiple observations) we essentially have a new "software observation", and a new obs_id should be assigned. To say more about the derivation of a particular data product is complex and gets into the general issue of provenance which is being addressed separately.
Furthermore obs_id is a database key used to uniquely identify specific "observations" (usable as a foreign key in other tables for example) hence we cannot turn it into a list of obs_ids.

- Doug

From arots at head.cfa.harvard.edu Wed Jul 6 07:29:49 2011
From: arots at head.cfa.harvard.edu (Arnold Rots)
Date: Wed, 6 Jul 2011 10:29:49 -0400 (EDT)
Subject: [ObsCoreRFC]Minutes of the telco Monday June 6
In-Reply-To:
Message-ID: <201107061430.p66EU2fK003705@xebec.cfa.harvard.edu>

I think I am beginning to realize what it is that makes me so uncomfortable with ObsTAP and what makes it so hard to grasp the correct way to implement it: its ambivalence.

It is primarily intended (I think) as a data discovery interface. The problem is that it also doubles as a data access tool. I think it is the intertwining of these two functions that makes it murky. And I wish these two functions had been separated into separate interfaces. I know this is not an issue for some observatories (say, the ones that only produce simple 2-D images), but it makes life difficult for more complicated datasets.

As a data discovery tool, I would have expected its purpose to be:
- find available observations that fall within certain constraints in time, space, frequency, etc.
- tell me what kind of data products are available for each

For a data access tool:
- Give me the URL to a specific (set of) type(s) of data product for a specific (set of) observation(s)
For all I know, this role could be played by SIAP, SSAP, SCS, or whatever protocols are already in existence.

The trouble is that for Chandra data, the intertwining of the two functions requires us to duplicate each ObsCore record six times to enumerate, laboriously, the different data types we can provide.
When it comes to proper data discovery, it makes much more sense to return a single record with the ObsCore parameters and a list of available data product types (event lists, images, light curves, spectra, tarfiles with all of the above, etc.).

Btw, Use Case 1.6 misquotes MJD as Mean Julian Date. Should be Modified Julian Day.

I hope you don't mind these ruminations, but these are things that I am discovering as we are trying to implement this - and it is hard.

Cheers,

- Arnold

Douglas Tody wrote:
> On Tue, 5 Jul 2011, Arnold Rots wrote:
>
> >> First, the subtype may be used to define what the data object is in
> >> collection or archive specific terms. For example if the data object is
> >> a tar file containing all the files comprising a ROSAT observation the
> >> data provider can define a subtype for this type of data. It is up to
> >> the client to understand what the content of the proprietary data
> >> product is, but if they are able to deal with such instrument-specific
> >> data they probably do know what it is.
> >
> > This is precisely the case I was trying to solve: a tarfile containing
> > a mix of data types: images, spectra, event lists.
> > The way I would like to solve it is to allow "package" (or something
> > similar) for the data type and enumerate the data files contained in
> > the tarfile in the data subtype.
> >
> > It still leaves a similar issue for the access format: that would be
> > tar, but it would be nice to be able to enumerate the formats of the
> > files in the tarfile in a similar format subtype - that also would
> > allow one to indicate whether or not the content of the tarfile is
> > gzipped (as opposed to gzipping the tarfile itself).
> >
> > I realize that this constitutes a use of subtypes that is different
> > from the original intent (at least, I think so), but it does seem a
> > useful mechanism.
>
> Arnold - I agree that in principle it would be useful to have this extra
> information.
> However we had to argue for quite a while to get support
> for instrumental data at this level included at all. One *can* expose
> this data with ObsTAP 1.0 as outlined in my earlier email; in particular
> exposing the individual data products separately allows them to be
> described if the data provider wants to do so. Even exposing only the
> tar/zip/MEF etc. file works so long as the client recognizes the
> subtype.
>
> To attempt to describe the contents of arbitrary complex
> instrumental datasets is out of scope for ObsTAP, at least 1.0. Perhaps
> we can address this issue in the next phase of development where we
> prototype related mechanisms such as data linking.
>
> > However, there is also the reverse problem: what do we do with data
> > products based on multiple observations? Do we allow ObsId to be a
> > list of ObsIds?
>
> This was addressed in the document as I recall. In the case of complex
> data products which are derived from multiple inputs (e.g. multiple
> observations) we essentially have a new "software observation", and a
> new obs_id should be assigned. To say more about the derivation of a
> particular data product is complex and gets into the general issue of
> provenance which is being addressed separately. Furthermore obs_id is a
> database key used to uniquely identify specific "observations" (usable
> as a foreign key in other tables for example) hence we cannot turn it
> into a list of obs_ids.
>
> - Doug
>

From dtody at NRAO.EDU Wed Jul 6 08:54:58 2011
From: dtody at NRAO.EDU (Douglas Tody)
Date: Wed, 6 Jul 2011 09:54:58 -0600 (MDT)
Subject: [ObsCoreRFC]Minutes of the telco Monday June 6
In-Reply-To: <201107061430.p66EU2fK003705@xebec.cfa.harvard.edu>
References: <201107061430.p66EU2fK003705@xebec.cfa.harvard.edu>
Message-ID:

On Wed, 6 Jul 2011, Arnold Rots wrote:

> I think I am beginning to realize what it is that makes me so
> uncomfortable with ObsTAP and what makes it so hard to grasp the
> correct way to implement it: its ambivalence.
>
> It is primarily intended (I think) as a data discovery interface.
> The problem is that it also doubles as a data access tool.
> I think it is the intertwining of these two functions that makes it murky.
> And I wish these two functions had been separated into separate interfaces.
> I know this is not an issue for some observatories (say, the ones that
> only produce simple 2-D images), but it makes life difficult for more
> complicated datasets.
>
> As a data discovery tool, I would have expected its purpose to be:
> - find available observations that fall within certain constraints in
> time, space, frequency, etc.
> - tell me what kind of data products are available for each
>
> For a data access tool:
> - Give me the URL to a specific (set of) type(s) of data product for a
> specific (set of) observation(s)
> For all I know, this role could be played by SIAP, SSAP, SCS, or
> whatever protocols are already in existence.

ObsTAP is intended mainly to provide uniform global data discovery; it can find any type of data, even non-VO data formats.
The data access capabilities provided at this level are very limited, but can be used to retrieve static archive data files (the data product could actually be generated on the fly if desired, but the description at least is static). As you suggest, the idea is that for any non-trivial data access the typed interfaces would be used (SIA, SSA, etc.). So for example one could do global data discovery using ObsTAP and then follow up with one of the typed interfaces to get more complete object-specific metadata and do the actual data access, which for a typed/OO interface will often involve virtual data generation (subsetting, filtering, transforming, output format specification, etc.). Of course if just retrieving the static archive file is enough then that can be done with just the acref returned by ObsTAP.

> The trouble is that for Chandra data, the intertwining of the two
> functions requires us to duplicate each ObsCore record six times to
> enumerate, laboriously, the different data types we can provide.
> When it comes to proper data discovery, it makes much more sense to
> return a single record with the ObsCore parameters and a list of
> available data product types (event lists, images, light curves,
> spectra, tarfiles with all of the above, etc.).

True, but this is necessary to be consistent with the relational model and to provide a simple mechanism. For a Chandra observation one might return a set of records with the same obs_id, one being a tar.gz of the full instrumental dataset, the others being static images, spectra, etc. derived from that data. A query for a specific obs_id would thus describe all the data products available for the observation.

As you note it is necessary to duplicate some of the metadata in associated records, but much of the metadata will differ for each data product as well. So far as the archive goes one would probably want to autogenerate the ObsTAP table from more fundamental, fully normalized database tables.
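The arrangement described here, one ObsTAP row per data product generated from normalized tables, can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the table names observation and product, the example values, and the example.org URLs are hypothetical, and only a handful of ObsCore column names (obs_id, target_name, s_ra, s_dec, t_min, t_max, dataproduct_type, access_url, access_format) are used. Because obscore is a SQL view, the observation-level metadata repeated in each product row is stored only once, and correcting it in the underlying table changes every row of the view.

```python
import sqlite3

# Hypothetical normalized schema: observation metadata is stored once, and
# each observation may have several data products.
SCHEMA = """
CREATE TABLE observation (
    obs_id  TEXT PRIMARY KEY,
    target  TEXT,
    s_ra    REAL,
    s_dec   REAL,
    t_min   REAL,
    t_max   REAL
);
CREATE TABLE product (
    obs_id           TEXT REFERENCES observation(obs_id),
    dataproduct_type TEXT,
    access_url       TEXT,
    access_format    TEXT
);
-- ObsTAP-style table as a view: one row per data product, with the
-- observation metadata repeated in each row but stored only once.
CREATE VIEW obscore AS
SELECT p.obs_id, o.target AS target_name, o.s_ra, o.s_dec, o.t_min, o.t_max,
       p.dataproduct_type, p.access_url, p.access_format
FROM product AS p
JOIN observation AS o ON o.obs_id = p.obs_id;
"""


def build_archive():
    """Create an in-memory archive with one observation and three products."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO observation VALUES (?, ?, ?, ?, ?, ?)",
        ("obs-0001", "example target", 83.63, 22.01, 55000.0, 55000.5),
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?, ?)",
        [
            # Full instrumental dataset as one package (type left NULL here).
            ("obs-0001", None, "http://example.org/obs-0001.tar",
             "application/x-tar"),
            ("obs-0001", "image", "http://example.org/obs-0001_img.fits",
             "image/fits"),
            ("obs-0001", "spectrum", "http://example.org/obs-0001_spec.fits",
             "application/fits"),
        ],
    )
    return conn


def products_for(conn, obs_id):
    """Return (dataproduct_type, access_url, t_min) for each product of obs_id."""
    return conn.execute(
        "SELECT dataproduct_type, access_url, t_min FROM obscore "
        "WHERE obs_id = ? ORDER BY access_url",
        (obs_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_archive()
    for row in products_for(conn, "obs-0001"):
        print(row)
    # Correct the observation start time once, in the underlying table only...
    conn.execute(
        "UPDATE observation SET t_min = 54999.9 WHERE obs_id = 'obs-0001'")
    # ...and every obscore row for that observation reflects the change.
    print({row[2] for row in products_for(conn, "obs-0001")})
```

Where a live view is too slow for a large archive, a table regenerated after each update of the underlying tables would serve the same purpose; the point in either case is that edits are made only in the normalized tables.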
Any updates would be done only on the underlying tables (auto-updating the ObsTAP "view" after each such update). Then there should be no problem with the redundant metadata in the ObsTAP index table becoming inconsistent or whatever.

In addition to a few static images or spectra providing standard views of an observation one would ideally provide SIA, SSA, etc. services capable of accessing the event data and computing custom virtual data products on the fly. In the future the proposed data linking facilities would be able to point directly to such services. At present one would have to do a registry query to find the service and then use the publisher DID from the ObsTAP query to access the desired dataset.

> Btw, Use Case 1.6 misquotes MJD as Mean Julian Date. Should be
> Modified Julian Day.
>
> I hope you don't mind these ruminations, but these are things that I
> am discovering as we are trying to implement this - and it is hard.

Not at all; it is useful to have these discussions in the record for others later as well.

- Doug

> Cheers,
>
> - Arnold
>
>
> Douglas Tody wrote:
>> On Tue, 5 Jul 2011, Arnold Rots wrote:
>>
>>>> First, the subtype may be used to define what the data object is in
>>>> collection or archive specific terms. For example if the data object is
>>>> a tar file containing all the files comprising a ROSAT observation the
>>>> data provider can define a subtype for this type of data. It is up to
>>>> the client to understand what the content of the proprietary data
>>>> product is, but if they are able to deal with such instrument-specific
>>>> data they probably do know what it is.
>>>
>>> This is precisely the case I was trying to solve: a tarfile containing
>>> a mix of data types: images, spectra, event lists.
>>> The way I would like to solve it is to allow "package" (or something
>>> similar) for the data type and enumerate the data files contained in
>>> the tarfile in the data subtype.
>>>
>>> It still leaves a similar issue for the access format: that would be
>>> tar, but it would be nice to be able to enumerate the formats of the
>>> files in the tarfile in a similar format subtype - that also would
>>> allow one to indicate whether or not the content of the tarfile is
>>> gzipped (as opposed to gzipping the tarfile itself).
>>>
>>> I realize that this constitutes a use of subtypes that is different
>>> from the original intent (at least, I think so), but it does seem a
>>> useful mechanism.
>>
>> Arnold - I agree that in principle it would be useful to have this extra
>> information. However we had to argue for quite a while to get support
>> for instrumental data at this level included at all. One *can* expose
>> this data with ObsTAP 1.0 as outlined in my earlier email; in particular
>> exposing the individual data products separately allows them to be
>> described if the data provider wants to do so. Even exposing only the
>> tar/zip/MEF etc. file works so long as the client recognizes the
>> subtype.
>>
>> To attempt to describe the contents of arbitrary complex
>> instrumental datasets is out of scope for ObsTAP, at least 1.0. Perhaps
>> we can address this issue in the next phase of development where we
>> prototype related mechanisms such as data linking.
>>
>>> However, there is also the reverse problem: what do we do with data
>>> products based on multiple observations? Do we allow ObsId to be a
>>> list of ObsIds?
>>
>> This was addressed in the document as I recall. In the case of complex
>> data products which are derived from multiple inputs (e.g. multiple
>> observations) we essentially have a new "software observation", and a
>> new obs_id should be assigned. To say more about the derivation of a
>> particular data product is complex and gets into the general issue of
>> provenance which is being addressed separately.
>> Furthermore obs_id is a database key used to uniquely identify
>> specific "observations" (usable as a foreign key in other tables for
>> example) hence we cannot turn it into a list of obs_ids.
>>
>> - Doug
>>

From arots at head.cfa.harvard.edu Thu Jul 7 12:25:22 2011
From: arots at head.cfa.harvard.edu (Arnold Rots)
Date: Thu, 7 Jul 2011 15:25:22 -0400 (EDT)
Subject: [obs-tap]:updates on the Proposed recommendation
In-Reply-To: <201107052028.p65KSU7I025623@xebec.cfa.harvard.edu>
Message-ID: <201107071925.p67JPMmQ012511@xebec.cfa.harvard.edu>

Aside from what I reported in a previous message, quoted below, there are more discrepancies between Table 5 and Tables 6 and 7:
obs_creator_did is missing from Table 7
o_units in Table 5 should be o_unit
pol_states is missing from Table 6
facility_name and instrument_name are spelled differently; even though required, they show up in Table 7, rather than 6
em_unit is missing from Table 5
o_stat_error is missing from Table 7

Also, note the comment I made on MJD in use case 1.6 and on the uselessness of bib_reference because of its murky definition.

I still lament the fact that the data access functionality is compromising the self-consistency and usefulness of the data discovery function, but decided for our tarred packages to use:

dataproduct_type = NULL
dataproduct_subtype = package:event,image
access_format = application/x-tar

As far as I can tell, this is within the specifications.

o_stat_error is an interesting case.
Since our unit is counts, the proper value would be "POISSON"; I realize that that is not a double, but what else can we give as a value?

Please do not consider this list of corrections to be exhaustive.

Cheers,

- Arnold

Arnold Rots wrote:
> Mireille,
>
> Here are some items.
>
> Ian Evans noticed the inconsistency in units for spatial resolution
> between the Tables 1, 4, 5, and 6 (arcsec vs. deg); what should it be?
> I assume deg?
> See also s_stat_error in Table 5.
>
> In addition, I noticed that Table 5 contained unit "day" that should
> be "d", Table 7 has erroneous unit "d" for data rights and is missing
> most units.
>
> The section on obs_publisher_did is a bit murky and not quite
> consistent with the definition in the spectral data model where it
> expresses a strong preference for using the same DIDs as are being
> used in the journals.
> That implies that the data product the query result refers to may be a
> subset of what the DID stands for, as the current spectral draft
> affirms.
> On the other hand, I don't think the spectrum DM and the SSA DAL are
> quite consistent in this respect. It might be good to have a more
> thorough discussion on these DIDs and consistency between all PRs.
>
> It also brings me back to the issue I have been harping on: what to
> do with packages of products pertaining to a single observation;
> I will not repeat that here.
>
> However, there is also the reverse problem: what do we do with data
> products based on multiple observations? Do we allow ObsId to be a
> list of ObsIds?
>
> I still find the bibcodes a bit problematic. The SSA DAL doc calls it
> a "curation reference", but in the text seems to imply that any
> publication mentioning the data is fair game. Is this really meant to
> be a reference to the data, or is it to be any paper that references
> the data? There is a difference between these two...
> I realize, though, that this is primarily an issue for the SSA DAL doc.
> But it has repercussions for this document as well.
>
> Cheers,
>
>   - Arnold
>
>
> Mireille Louys wrote:
> > Arnold, Daniel,
> > This is a second try. I mixed up addresses with a strange copy/paste.
> > Sorry, Mireille.
> >
> > ------------------------------------------
> > Dear all,
> >
> > Here is an updated version of the ObsCore DM document.
> > I tried to correct and modify the text according to:
> > - typos and inconsistencies mentioned by Petr on the RFC page
> > - comments given in mails
> > - actions listed during the telco on June 6th
> >
> > Modifications are highlighted to help you track the changes.
> > *check pol_states*
> > I maintained the first proposal of having a list with /, with a
> > leading / to help
> > distinguish Y from other combinations XY, YY, YX etc.
> > --> Using a leading comma was too strange for me.
> >
> > Tables 6 and 7 use TAP-defined columns: principal, indexed, standard.
> > I filled them in with my personal understanding.
> > It would be useful to have database system managers check this.
> >
> > To be inserted: update of section 5 for registering an ObsTAP service.
> >
> > Thanks for your reading and comments (the very last ones, I hope).
> > Cheers, Mireille
> >
> > [ Attachment, skipping... ]

From mireille.louys at unistra.fr  Fri Jul  8 04:21:10 2011
From: mireille.louys at unistra.fr (Mireille Louys)
Date: Fri, 08 Jul 2011 13:21:10 +0200
Subject: [obs-tap]:updates on the Proposed recommendation + new document
In-Reply-To: <201107071925.p67JPMmQ012511@xebec.cfa.harvard.edu>
References: <201107071925.p67JPMmQ012511@xebec.cfa.harvard.edu>
Message-ID: <20110708132110.fc2fxvr4l0kw4g48@webmail.u-strasbg.fr>

Dear Arnold, Dear all,

Thanks very much for reporting these typos and inconsistencies.
I produced a new document with the corrections suggested by you and by
the RFC page inputs. They appear highlighted as tracked changes in the
.docx file attached below. This is just to help contributors follow
the changes.
You can provide comments until Monday; then I will integrate the final
changes in order to proceed to TCG review.

See my comments inserted in the text below.
Best regards,
Mireille

Arnold Rots wrote:
> Aside from what I reported in a previous message, quoted below, there
> are more discrepancies between Table 5 and Tables 6 and 7:
>
> obs_creator_did is missing from Table 7
> o_units in Table 5 should be o_unit
> pol_states is missing from Table 6
> facility_name and instrument_name are spelled differently;
>    even though required, they show up in Table 7, rather than 6
> em_unit is missing from Table 5
> o_stat_error is missing from Table 7
>
*included and corrected*

> Also, note the comment I made on MJD in use case 1.6
> and on the uselessness of bib_reference because of its murky
> definition

*MJD done*
bib_reference is an optional field that a data provider may use to
flag some data sets as the ones used and published together with a
scientific paper. It is not meant to behave like a citation index and
point to all papers mentioning a given data set.

> I still lament the fact that the data access functionality is
> compromising the self-consistency and usefulness of the data discovery
> function, but decided for our tarred packages to use:
>    dataproduct_type = NULL
>    dataproduct_subtype = package:event,image
>    access_format = application/x-tar
> As far as I can tell, this is within the specifications.
>
This seems a proper use of the Obs/TAP specification to expose your
data. "dataproduct_subtype" is an optional field that the data provider
can define. Possible values for this field should be clearly documented
by the service.

> o_stat_error is an interesting case. Since our unit is counts, the
> proper value would be "POISSON"; I realize that that is not a double,
> but what else can we give as a value?
>
This was meant only for quantitative estimation of the error and does
not cover the statistical properties of the signal.

> Please do not consider this list of corrections to be exhaustive.
>
> Cheers,
>
>   - Arnold
>
> Arnold Rots wrote:
>> Mireille,
>>
>> Here are some items.
>>
>> Ian Evans noticed the inconsistency in units for spatial resolution
>> between the Tables 1, 4, 5, and 6 (arcsec vs. deg); what should it be?
>> I assume deg?
>> See also s_stat_error in Table 5.
>>
We agreed in previous iterations to have resolution and errors in a
convenient and handy unit: arcsec for space and s for time, for
instance, because they are usually given that way in instrument
descriptions and scientific papers.

>> In addition, I noticed that Table 5 contained unit "day" that should
>> be "d", Table 7 has erroneous unit "d" for data rights and is missing
>> most units.
>>
*updated*

>> The section on obs_publisher_did is a bit murky and not quite
>> consistent with the definition in the spectral data model where it
>> expresses a strong preference for using the same DIDs as are being
>> used in the journals.
>> That implies that the data product the query result refers to may be a
>> subset of what the DID stands for, as the current spectral draft
>> affirms.
>> On the other hand, I don't think the spectrum DM and the SSA DAL are
>> quite consistent in this respect. It might be good to have a more
>> thorough discussion on these DIDs and consistency between all PRs.
>>
This is identified as work to do, consistent with the ongoing effort on
IVOA identifier definitions. We agreed during the last telecon to
reconsider it for a future version of Obs/TAP.

>> It also brings me back to the issue I have been harping on: what to
>> do with packages of products pertaining to a single observation;
>> I will not repeat that here.
>>
>> However, there is also the reverse problem: what do we do with data
>> products based on multiple observations? Do we allow ObsId to be a
>> list of ObsIds?
>>
>> I still find the bibcodes a bit problematic. The SSA DAL doc calls it
>> a "curation reference", but in the text seems to imply that any
>> publication mentioning the data is fair game.
>> Is this really meant to be a reference to the data, or is it to be
>> any paper that references the data? There is a difference between
>> these two...
>> I realize, though, that this is primarily an issue for the SSA DAL doc.
>> But it has repercussions for this document as well.
>>
>> Cheers,
>>
>>   - Arnold
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PR-ObsCore-v1.0-20110807.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 363043 bytes
Desc: not available
URL: 

From arots at head.cfa.harvard.edu  Fri Jul  8 06:36:59 2011
From: arots at head.cfa.harvard.edu (Arnold Rots)
Date: Fri, 8 Jul 2011 09:36:59 -0400 (EDT)
Subject: [obs-tap]:updates on the Proposed recommendation + new document
In-Reply-To: <20110708132110.fc2fxvr4l0kw4g48@webmail.u-strasbg.fr>
Message-ID: <201107081336.p68DaxoJ020641@xebec.cfa.harvard.edu>

Just two quick comments.
Both highlight how this standard is still heavily slanted toward
optical images.

  - Arnold

Mireille Louys wrote:
> Dear Arnold, Dear all,
>
> > o_stat_error is an interesting case. Since our unit is counts, the
> > proper value would be "POISSON"; I realize that that is not a double,
> > but what else can we give as a value?
> >
> This was meant only for quantitative estimation of the error and does
> not cover the statistical properties of the signal.

I realize that, but the problem is that one can't give a single
quantitative value in the case of Poisson noise. However, identifying
it as Poisson does immediately provide the value for each point in the
image.

>
> >>
> >> Ian Evans noticed the inconsistency in units for spatial resolution
> >> between the Tables 1, 4, 5, and 6 (arcsec vs. deg); what should it be?
> >> I assume deg?
> >> See also s_stat_error in Table 5.
> >>
> We agreed in previous iterations to have resolution and errors in a
> convenient and handy unit: arcsec for space and s for time, for
> instance, because they are usually given that way in instrument
> descriptions and scientific papers.

Too bad the same reasoning was not applied to the EM units; when I
complained about the requirement to use m, I was told that it's only a
number seen by software that can easily be turned into a more
convenient unit for display to the user :-P

>
>
> [ Attachment, skipping... ]

From seaman at noao.edu  Fri Jul  8 07:13:08 2011
From: seaman at noao.edu (Rob Seaman)
Date: Fri, 8 Jul 2011 07:13:08 -0700
Subject: [obs-tap]:updates on the Proposed recommendation + new document
In-Reply-To: <201107081336.p68DaxoJ020641@xebec.cfa.harvard.edu>
References: <201107081336.p68DaxoJ020641@xebec.cfa.harvard.edu>
Message-ID: <65324DBA-962E-4866-BECF-73497EA63CAA@noao.edu>

On Jul 8, 2011, at 6:36 AM, Arnold Rots wrote:

> Just two quick comments. Both highlight how this standard is still
> heavily slanted toward optical images.

Undoubtedly true and such biases should be minimized.

> Mireille Louys wrote:
>> Dear Arnold, Dear all,
>>
>>> o_stat_error is an interesting case. Since our unit is counts, the
>>> proper value would be "POISSON"; I realize that that is not a double,
>>> but what else can we give as a value?
>>
>>
>> This was meant only for quantitative estimation of the error and does
>> not cover the statistical properties of the signal.
> > I realize that, but the problem is that one can't give a single
> > quantitative value in the case of Poisson noise. However, identifying
> > it as Poisson does immediately provide the value for each point in the
> > image.

Well, it provides a heuristic for making estimates of the
error/noise/variance values. That is certainly better than picking a
scalar. Even in the optical a purely scalar error is an artificial
choice; see for example:

   http://arxiv.org/abs/0910.3733

Negative numbers are presumably non-physical and could be used to
encode special values corresponding to "POISSON" or even
"POISSON+GAUSSIAN", with the absolute value representing the scalar
Gaussian contribution (e.g., read noise + sky noise). Might consider
providing the error as a variance, too.

Rob

From arots at head.cfa.harvard.edu  Fri Jul  8 09:28:48 2011
From: arots at head.cfa.harvard.edu (Arnold Rots)
Date: Fri, 8 Jul 2011 12:28:48 -0400 (EDT)
Subject: [obs-tap]:updates on the Proposed recommendation + new document
In-Reply-To: <20110708132110.fc2fxvr4l0kw4g48@webmail.u-strasbg.fr>
Message-ID: <201107081628.p68GSmPR020791@xebec.cfa.harvard.edu>

em_calib_status is missing from Table 7.

From francois.bonnarel at astro.unistra.fr  Sat Jul  9 22:40:19 2011
From: francois.bonnarel at astro.unistra.fr (François Bonnarel)
Date: Sun, 10 Jul 2011 07:40:19 +0200
Subject: [ObsCoreRFC]Minutes of the telco Monday June 6
In-Reply-To: 
References: <201107061430.p66EU2fK003705@xebec.cfa.harvard.edu>
Message-ID: <4E193B43.4070400@astro.unistra.fr>

Arnold, Doug,
On 06/07/2011 17:54, Douglas Tody wrote:
> On Wed, 6 Jul 2011, Arnold Rots wrote:
>
>> I think I am beginning to realize what it is that makes me so
>> uncomfortable with ObsTAP and what makes it so hard to grasp the
>> correct way to implement it: its ambivalence.
>>
>> It is primarily intended (I think) as a data discovery interface.
>> The problem is that it also doubles as a data access tool.
>> I think it is the intertwining of these two functions that makes it
>> murky.
>> And I wish these two functions had been separated into separate
>> interfaces.
>> I know this is not an issue for some observatories (say, the ones that
>> only produce simple 2-D images), but it makes life difficult for more
>> complicated datasets.
>>
>> As a data discovery tool, I would have expected its purpose to be:
>> - find available observations that fall within certain constraints in
>>   time, space, frequency, etc.
>> - tell me what kind of data products are available for each
>>
>> For a data access tool:
>> - Give me the URL to a specific (set of) type(s) of data product for a
>>   specific (set of) observation(s)
>> For all I know, this role could be played by SIAP, SSAP, SCS, or
>> whatever protocols are already in existence.
>
> ObsTAP is intended mainly to provide uniform global data discovery; it
> can find any type of data, even non-VO data formats. The data access
> capabilities provided at this level are very limited, but can be used to
> retrieve static archive data files (the data product could actually be
> generated on the fly if desired, but the description at least is
> static).
>
> As you suggest, the idea is that for any non-trivial data access the
> typed interfaces would be used (SIA, SSA, etc.). So for example one
> could do global data discovery using ObsTAP and then follow up with one
> of the typed interfaces to get more complete object-specific metadata
> and do the actual data access, which for a typed/OO interface will often
> involve virtual data generation (subsetting, filtering, transforming,
> output format specification, etc.). Of course if just retrieving the
> static archive file is enough then that can be done with just the acref
> returned by ObsTAP.
>
>> The trouble is that for Chandra data, the intertwining of the two
>> functions requires us to duplicate each ObsCore record six times to
>> enumerate, laboriously, the different data types we can provide.
>> When it comes to proper data discovery, it makes much more sense to
>> return a single record with the ObsCore parameters and a list of
>> available data product types (event lists, images, light curves,
>> spectra, tarfiles with all of the above, etc.).
>
> True, but this is necessary to be consistent with the relational model
> and to provide a simple mechanism. For a Chandra observation one might
> return a set of records with the same obs_id, one being a tar.gz of the
> full instrumental dataset, the others being static images, spectra, etc.
> derived from that data. A query for a specific obs_id would thus
> describe all the data products available for the observation.
> As you note it is necessary to duplicate some of the metadata in
> associated records, but much of the metadata will differ for each data
> product as well.
>
> So far as the archive goes one would probably want to autogenerate the
> ObsTAP table from more fundamental, fully normalized database tables.
> Any updates would be done only on the underlying tables (auto-updating
> the ObsTAP "view" after each such update). Then there should be no
> problem with the redundant metadata in the ObsTAP index table becoming
> inconsistent or whatever.
>
> In addition to a few static images or spectra providing standard views
> of an observation one would ideally provide SIA, SSA, etc. services
> capable of accessing the event data and computing custom virtual data
> products on the fly. In the future the proposed data linking facilities
> would be able to point directly to such services. At present one would
> have to do a registry query to find the service and then use the
> publisher DID from the ObsTAP query to access the desired dataset.
>
A few words about the data linking facilities we have in mind
(presented in Nara and Napoli, for example)...
The obs_id can be used as a key (or an input parameter) to a table or
service containing or returning links to related data and metadata.
The basic idea is that it is not just an obs_id / acref association
(which we already have as a byproduct of ObsTAP), but that it also
provides a description of the link. What we have is a small DataLink
model with a few parameters:
  - the association meaning or nature (calibration files, dataset
    retrieval, a given X-ray band, etc.)
  - the nature of the link (simple URL, S*AP query or AccessData mode,
    etc.)
A small model of the internal path to a given file or subfile in the
global package is also proposed.

Thus we should be able to expose Chandra packages as a whole in the
ObsTAP service, and the various files in the packages can be accessed
via DataLink.
If some of the files have their own dataset type, they can be described
by an S*AP service (or ObsTAP again) given by the link, or accessed via
the AccessData method of the relevant S*AP service (again, the URL is
given by the link).

An IVOA note is in preparation on this.

François

>> Btw, Use Case 1.6 misquotes MJD as Mean Julian Date. Should be
>> Modified Julian Day.
>>
>> I hope you don't mind these ruminations, but these are things that I
>> am discovering as we are trying to implement this - and it is hard.
>
> Not at all; it is useful to have these discussions in the record for
> others later as well.
>
> - Doug
>
>
>> Cheers,
>>
>>   - Arnold
>>
>>
>> Douglas Tody wrote:
>>> On Tue, 5 Jul 2011, Arnold Rots wrote:
>>>
>>>>> First, the subtype may be used to define what the data object is in
>>>>> collection or archive specific terms. For example if the data object
>>>>> is a tar file containing all the files comprising a ROSAT observation
>>>>> the data provider can define a subtype for this type of data. It is
>>>>> up to the client to understand what the content of the proprietary
>>>>> data product is, but if they are able to deal with such
>>>>> instrument-specific data they probably do know what it is.
>>>>
>>>> This is precisely the case I was trying to solve: a tarfile containing
>>>> a mix of data types: images, spectra, event lists.
>>>> The way I would like to solve it is to allow "package" (or something
>>>> similar) for the data type and enumerate the data files contained in
>>>> the tarfile in the data subtype.
>>>>
>>>> It still leaves a similar issue for the access format: that would be
>>>> tar, but it would be nice to be able to enumerate the formats of the
>>>> files in the tarfile in a similar format subtype - that also would
>>>> allow one to indicate whether or not the content of the tarfile is
>>>> gzipped (as opposed to gzipping the tarfile itself).
>>>>
>>>> I realize that this constitutes a use of subtypes that is different
>>>> from the original intent (at least, I think so), but it does seem a
>>>> useful mechanism.
>>>
>>> Arnold - I agree that in principle it would be useful to have this
>>> extra information. However we had to argue for quite a while to get
>>> support for instrumental data at this level included at all. One *can*
>>> expose this data with ObsTAP 1.0 as outlined in my earlier email; in
>>> particular exposing the individual data products separately allows them
>>> to be described if the data provider wants to do so. Even exposing only
>>> the tar/zip/MEF etc. file works so long as the client recognizes the
>>> subtype.
>>>
>>> To attempt to describe the contents of arbitrary complex instrumental
>>> datasets is out of scope for ObsTAP, at least 1.0. Perhaps we can
>>> address this issue in the next phase of development where we
>>> prototype related mechanisms such as data linking.
>>>
>>>> However, there is also the reverse problem: what do we do with data
>>>> products based on multiple observations? Do we allow ObsId to be a
>>>> list of ObsIds?
>>>
>>> This was addressed in the document as I recall. Complex data products
>>> which are derived from multiple inputs (e.g. multiple observations)
>>> essentially constitute a new "software observation", and a new obs_id
>>> should be assigned. To say more about the derivation of a particular
>>> data product is complex and gets into the general issue of provenance,
>>> which is being addressed separately. Furthermore obs_id is a database
>>> key used to uniquely identify specific "observations" (usable as a
>>> foreign key in other tables for example) hence we cannot turn it into
>>> a list of obs_ids.
>>>
>>> - Doug
>>>

From arots at head.cfa.harvard.edu  Mon Jul 11 06:48:06 2011
From: arots at head.cfa.harvard.edu (Arnold Rots)
Date: Mon, 11 Jul 2011 09:48:06 -0400 (EDT)
Subject: [ObsCoreRFC]Minutes of the telco Monday June 6
In-Reply-To: <4E193B43.4070400@astro.unistra.fr>
Message-ID: <201107111348.p6BDm6vL014925@xebec.cfa.harvard.edu>

I hear what you are saying (I think), but in retrospect it would have
been much cleaner (and clearer) if the data discovery and data access
roles had not been combined in ObsTAP.
If ObsTAP had purely provided information on the availability of data,
observational metadata, the data product types, the data formats, and
the packaging, a comprehensive data access tool could have provided
the access URLs in response to a query that is based on the
information provided through ObsTAP.
As it is, the data discovery is somewhat compromised by the data
access features and the data access features themselves are not
very satisfactory.
I am afraid that this is a bit of a missed opportunity.

Cheers,

  - arnold

François Bonnarel wrote:
> Arnold, Doug,
> [... remainder of quoted message snipped ...]

From dtody at nrao.edu  Mon Jul 11 14:46:16 2011
From: dtody at nrao.edu (Douglas Tody)
Date: Mon, 11 Jul 2011 15:46:16 -0600 (MDT)
Subject: [ObsCoreRFC]Minutes of the telco Monday June 6
In-Reply-To: <201107111348.p6BDm6vL014925@xebec.cfa.harvard.edu>
References: <201107111348.p6BDm6vL014925@xebec.cfa.harvard.edu>
Message-ID: 

On Mon, 11 Jul 2011, Arnold Rots wrote:

> I hear what you are saying (I think), but in retrospect it would have
> been much cleaner (and clearer) if the data discovery and data access
> roles had not been combined in ObsTAP.
> If ObsTAP had purely provided information on the availability of data,
> observational metadata, the data product types, the data formats, and
> the packaging, a comprehensive data access tool could have provided
> the access URLs in response to a query that is based on the
> information provided through ObsTAP.
> As it is, the data discovery is somewhat compromised by the data
> access features and the data access features themselves are not
> very satisfactory.
> I am afraid that this is a bit of a missed opportunity.

Well then we would lose the ability to describe and point to non-VO
data like Chandra or ALMA observations, for which we have no VO data
services.
For images/spectra etc. it could work to not have any acref in the ObsTAP metadata, but the current scheme already makes it possible to ignore the acref returned by ObsTAP and go directly to a SIA or whatever service if such is provided. Even so it could be useful and convenient to get a reference or preview image back from the ObsTAP acref without having to go to the full-up data service. What probably would be the best approach for Chandra (or ALMA etc.) would be to have the level 0 or 1 observational data plus some standard data products such as reference images etc., all described in the ObsTAP QR with a shared obs_id. Then also provide a data link service such as Francois describes to fully resolve all the data products or other resources available for the observation. In a data discovery portal one would then be able to do discovery and preview the data, but then optionally retrieve and examine all the data links for a given observation, and possibly do full-up data access, invoke a pipeline reprocessing job, examine auxiliary information like logs or proposal cover pages, etc. - Doug > Cheers, > > - arnold > > François Bonnarel wrote: >> Arnold, Doug, >> On 06/07/2011 17:54, Douglas Tody wrote: >>> On Wed, 6 Jul 2011, Arnold Rots wrote: >>> >>>> I think I am beginning to realize what it is that makes me so >>>> uncomfortable with ObsTAP and what makes it so hard to grasp the >>>> correct way to implement it: its ambivalence. >>>> >>>> It is primarily intended (I think) as a data discovery interface. >>>> The problem is that it also doubles as a data access tool. >>>> I think it is the intertwining of these two functions that makes it >>>> murky. >>>> And I wish these two functions had been separated into separate >>>> interfaces. >>>> I know this is not an issue for some observatories (say, the ones that >>>> only produce simple 2-D images), but it makes life difficult for more >>>> complicated datasets.
>>>> >>>> As a data discovery tool, I would have expected its purpose to be: >>>> - find available observations that fall within certain constraints in >>>> time, space, frequency, etc. >>>> - tell me what kind of data products are available for each >>>> >>>> For a data access tool: >>>> - Give me the URL to a specific (set of) type(s) of data product for a >>>> specific (set of) observation(s) >>>> For all I know, this role could be played by SIAP, SSAP, SCS, or >>>> whatever protocols are already in existence. >>> >>> ObsTAP is intended mainly to provide uniform global data discovery; it >>> can find any type of data, even non-VO data formats. The data access >>> capabilities provided at this level are very limited, but can be used to >>> retrieve static archive data files (the data product could actually be >>> generated on the fly if desired, but the description at least is >>> static). >>> >>> As you suggest, the idea is that for any non-trivial data access the >>> typed interfaces would be used (SIA, SSA, etc.). So for example one >>> could do global data discovery using ObsTAP and then followup with one >>> of the typed interfaces to get more complete object-specific metadata >>> and do the actual data access, which for a typed/OO interface will often >>> involve virtual data generation (subsetting, filtering, transforming, >>> output format specification, etc.). Of course if just retrieving the >>> static archive file is enough then that can be done with just the acref >>> returned by ObsTAP. >>> >>>> The trouble is that for Chandra data, the intertwining of the two >>>> functions requires us to duplicate each ObsCore record six times to >>>> enumerate, laboriously, the different data types we can provide.
>>>> When it comes to proper data discovery, it makes much more sense to >>>> return a single record with the ObsCore parameters and a list of >>>> available data product types (event lists, images, light curves, >>>> spectra, tarfiles with all of the above, etc.). >>> >>> True, but this is necessary to be consistent with the relational model >>> and to provide a simple mechanism. For a Chandra observation one might >>> return a set of records with the same obs_id, one being a tar.gz of the >>> full instrumental dataset, the others being static images, spectra, etc. >>> derived from that data. A query for a specific obs_id would thus >>> describe all the data products available for the observation. As you >>> note it is necessary to duplicate some of the metadata in associated >>> records, but much of the metadata will differ for each data product as >>> well. >>> >>> So far as the archive goes one would probably want to autogenerate the >>> ObsTAP table from more fundamental, fully normalized database tables. >>> Any updates would be done only on the underlying tables (auto-updating >>> the ObsTAP "view" after each such update). Then there should be no >>> problem with the redundant metadata in the ObsTAP index table becoming >>> inconsistent or whatever. >>> >>> In addition to a few static images or spectra providing standard views >>> of an observation one would ideally provide SIA, SSA, etc. services >>> capable of accessing the event data and computing custom virtual data >>> products on the fly. In the future the proposed data linking facilities >>> would be able to point directly to such services. At present one would >>> have to do a registry query to find the service and then use the >>> publisher DID from the ObsTAP query to access the desired dataset. >>> >> A few words about these data linking facilities we have in mind >> (presentation in >> Nara and Napoli for example)...
>> The Obsid can be used as a key (or an entry parameter) to a table or >> service containing or returning links to related data and metadata. >> The basic idea is that it is not just an obsid/acref association (which we >> already have as a byproduct of ObsTAP), but that it also provides a description >> of the link. What we have is a little DataLink model with a few parameters: >> the association meaning or nature (calibration files, dataset retrieval, >> whatever >> X-ray band, etc.); >> the nature of the link (simple URL, S*AP query or AccessData mode, etc.). >> A little model of the internal path to a given file or subfile in the >> global package >> is also proposed. >> >> Thus we should be able to expose Chandra packages as a whole in the >> ObsTAP service, and the various files in the packages can be accessed >> via DataLink. If some of the files have their own dataset type they can >> be described by an S*AP service (or ObsTAP again) given by the link, >> or accessed via the AccessData method of the relevant S*AP service >> (again, URL given by the link). >> >> An IVOA note is in preparation on this. >> >> François >> >> >> >>>> Btw, Use Case 1.6 misquotes MJD as Mean Julian Date. Should be >>>> Modified Julian Day. >>>> >>>> I hope you don't mind these ruminations, but these are things that I >>>> am discovering as we are trying to implement this - and it is hard. >>> >>> Not at all; it is useful to have these discussions in the record for >>> others later as well. >>> >>> - Doug >>> >>> >>>> Cheers, >>>> >>>> - Arnold >>>> >>>> >>>> Douglas Tody wrote: >>>>> On Tue, 5 Jul 2011, Arnold Rots wrote: >>>>> >>>>>>> First, the subtype may be used to define what the data object is in >>>>>>> collection or archive specific terms. For example if the data >>>>>>> object is >>>>>>> a tar file containing all the files comprising a ROSAT observation >>>>>>> the >>>>>>> data provider can define a subtype for this type of data.
It is >>>>>>> up to >>>>>>> the client to understand what the content of the proprietary data >>>>>>> product is, but if they are able to deal with such >>>>>>> instrument-specific >>>>>>> data they probably do know what it is. >>>>>> >>>>>> This is precisely the case I was trying to solve: a tarfile containing >>>>>> a mix of data types: images, spectra, event lists. >>>>>> The way I would like to solve it is to allow "package" (or something >>>>>> similar) for the data type and enumerate the data files contained in >>>>>> the tarfile in the data subtype. >>>>>> >>>>>> It still leaves a similar issue for the access format: that would be >>>>>> tar, but it would be nice to be able to enumerate the formats of the >>>>>> files in the tarfile in a similar format subtype - that also would >>>>>> allow one to indicate whether or not the content of the tarfile is >>>>>> gzipped (as opposed to gzipping the tarfile itself). >>>>>> >>>>>> I realize that this constitutes a use of subtypes that is different >>>>>> from the original intent (at least, I think so), but it does seem a >>>>>> useful mechanism. >>>>> >>>>> Arnold - I agree that in principle it would be useful to have this >>>>> extra >>>>> information. However we had to argue for quite a while to get support >>>>> for instrumental data at this level included at all. One *can* expose >>>>> this data with ObsTAP 1.0 as outlined in my earlier email; in >>>>> particular >>>>> exposing the individual data products separately allows them to be >>>>> described if the data provider wants to do so. Even exposing only the >>>>> tar/zip/MEF etc. file works so long as the client recognizes the >>>>> subtype. >>>>> >>>>> Attempting to describe the contents of arbitrary complex >>>>> instrumental datasets is out of scope for ObsTAP, at least for 1.0. >>>>> Perhaps >>>>> we can address this issue in the next phase of development where we >>>>> prototype related mechanisms such as data linking.
>>>>> >>>>>> However, there is also the reverse problem: what do we do with data >>>>>> products based on multiple observations? Do we allow ObsId to be a >>>>>> list of ObsIds? >>>>> >>>>> This was addressed in the document as I recall. Complex >>>>> data products derived from multiple inputs (e.g. multiple >>>>> observations) essentially constitute a new "software observation", >>>>> and a >>>>> new obs_id should be assigned. To say more about the derivation of a >>>>> particular data product is complex and gets into the general issue of >>>>> provenance which is being addressed separately. Furthermore obs_id >>>>> is a >>>>> database key used to uniquely identify specific "observations" (usable >>>>> as a foreign key in other tables for example) hence we cannot turn it >>>>> into a list of obs_ids. >>>>> >>>>> - Doug >>>>> >>>> -------------------------------------------------------------------------- >>>> >>>> Arnold H. Rots Chandra X-ray Science >>>> Center >>>> Smithsonian Astrophysical Observatory tel: +1 617 496 >>>> 7701 >>>> 60 Garden Street, MS 67 fax: +1 617 495 >>>> 7356 >>>> Cambridge, MA 02138 >>>> arots at head.cfa.harvard.edu >>>> USA >>>> http://hea-www.harvard.edu/~arots/ >>>> -------------------------------------------------------------------------- >>>> >>>> >> > -------------------------------------------------------------------------- > Arnold H.
Rots Chandra X-ray Science Center > Smithsonian Astrophysical Observatory tel: +1 617 496 7701 > 60 Garden Street, MS 67 fax: +1 617 495 7356 > Cambridge, MA 02138 arots at head.cfa.harvard.edu > USA http://hea-www.harvard.edu/~arots/ > -------------------------------------------------------------------------- > From dtody at nrao.edu Mon Jul 11 15:03:34 2011 From: dtody at nrao.edu (Douglas Tody) Date: Mon, 11 Jul 2011 16:03:34 -0600 (MDT) Subject: [obs-tap]:updates on the Proposed recommendation In-Reply-To: <201107071925.p67JPMmQ012511@xebec.cfa.harvard.edu> References: <201107071925.p67JPMmQ012511@xebec.cfa.harvard.edu> Message-ID: On Thu, 7 Jul 2011, Arnold Rots wrote: > Aside from what I reported in a previous message, quoted below, there > are more discrepancies between Table 5 and Tables 6 and 7: > > obs_creator_did is missing from Table 7 > o_units in Table 5 should be o_unit > pol_states is missing from Table 6 > facility_name and instrument_name are spelled differently; > even though required, they show up in Table 7, rather than 6 > em_unit is missing from Table 5 > o_stat_error is missing from Table 7 > > Also, note the comment I made on MJD in use case 1.6 > and on the uselessness of bib_reference because of its murky > definition > > I still lament the fact that the data access functionality is > compromising the self-consistency and usefulness of the data discovery > function, but decided for our tarred packages to use: > dataproduct_type = NULL > dataproduct_subtype = package:event,image > access_format = application/x-tar > As far as I can tell, this is within the specifications. Well we don't specify what the subtypes you provide for your archive should be so I suppose you could get away with this, but this example is not at all what we had in mind. The subtype should be the science type of the specific data product, *not* details about the content of the data product. 
I would expect the type to be "event" (meaning "event data" not "event list") and the subtype to be something more like "chandra.hrc.package", "chandra.hrc.refimage" (or "rosat.XX" etc.). Note subtypes are supposed to be fixed strings so that one can search the local archive for a particular type of data product; if you try to describe what is included in a particular data product then such selection won't be possible. So for example a client will do a generic query to see what subtypes Chandra defines, and then they can pose a more specific query to get a certain type of Chandra-specific data product. Likewise for ALMA etc. Note you also have obs.title where you can provide a short description of the data product and for this you can provide whatever you want. - Doug From dtody at nrao.edu Mon Jul 11 15:14:38 2011 From: dtody at nrao.edu (Douglas Tody) Date: Mon, 11 Jul 2011 16:14:38 -0600 (MDT) Subject: [obs-tap]:updates on the Proposed recommendation + new document In-Reply-To: <20110708132110.fc2fxvr4l0kw4g48@webmail.u-strasbg.fr> References: <201107071925.p67JPMmQ012511@xebec.cfa.harvard.edu> <20110708132110.fc2fxvr4l0kw4g48@webmail.u-strasbg.fr> Message-ID: Hi Mireille - I completed one pass through the revised version. I will go through it more carefully, but the main thing I noticed was that we did not yet replace the use of "/" in pol_states with our standard list delimiter (comma) as was agreed in the telcon. Has a reason to use "/" (which is inconsistent) been identified, or is this just an oversight? - Doug On Fri, 8 Jul 2011, Mireille Louys wrote: > Dear Arnold, Dear all, > > Thanks very much for reporting these typos and inconsistencies. > I produced a new document with corrections suggested by you and the RFC > page inputs. They appear highlighted as tracked modifications in the .docx > file attached below. This is just to help contributors follow the changes.
> > You can provide comments until Monday; then I will integrate the final changes > in order to proceed to TCG review. > > See my comments inserted in the text below. > > Best regards, Mireille > > Arnold Rots wrote: > >> Aside from what I reported in a previous message, quoted below, there >> are more discrepancies between Table 5 and Tables 6 and 7: >> >> obs_creator_did is missing from Table 7 >> o_units in Table 5 should be o_unit >> pol_states is missing from Table 6 >> facility_name and instrument_name are spelled differently; >> even though required, they show up in Table 7, rather than 6 >> em_unit is missing from Table 5 >> o_stat_error is missing from Table 7 >> > > *included and corrected* > >> Also, note the comment I made on MJD in use case 1.6 >> and on the uselessness of bib_reference because of its murky >> definition > *MJD done* > bib-reference is an optional field that a data provider may use to flag some > data sets as the ones used and published together with a scientific paper. > This is not meant to behave like a citation index and point to all papers > mentioning this data set. > >> I still lament the fact that the data access functionality is >> compromising the self-consistency and usefulness of the data discovery >> function, but decided for our tarred packages to use: >> dataproduct_type = NULL >> dataproduct_subtype = package:event,image >> access_format = application/x-tar >> As far as I can tell, this is within the specifications. >> > This seems a proper use of the Obs/TAP specification to expose your data. > "dataproduct_subtype" is an optional field that the data provider can define. > Possible values for this field should be clearly documented by the service. >> o_stat_error is an interesting case. Since our unit is counts, the >> proper value would be "POISSON"; I realize that that is not a double, >> but what else can we give as a value?
>> > This was meant only for quantitative estimation of the error and does not > cover the statistical properties of the signal. > >> Please do not consider this list of corrections to be exhaustive. >> >> Cheers, >> >> - Arnold >> >> Arnold Rots wrote: >>> Mireille, >>> >>> Here are some items. >>> >>> Ian Evans noticed the inconsistency in units for spatial resolution >>> between the Tables 1, 4, 5, and 6 (arcsec vs. deg); what should it be? >>> I assume deg? >>> See also s_stat_error in Table 5. >>> > We agreed in previous iterations to have resolution and errors in a > convenient and handy unit (arcsec for space and s for time, for instance), > because they are usually given that way in instrument descriptions and > scientific papers. > >>> In addition, I noticed that Table 5 contained unit "day" that should >>> be "d", Table 7 has erroneous unit "d" for data rights and is missing >>> most units. >>> > *updated* >>> The section on obs_publisher_did is a bit murky and not quite >>> consistent with the definition in the spectral data model where it >>> expresses a strong preference for using the same DIDs as are being >>> used in the journals. >>> That implies that the data product the query result refers to may be a >>> subset of what the DID stands for, as the current spectral draft >>> affirms. >>> On the other hand, I don't think the spectrum DM and the SSA DAL are >>> quite consistent in this respect. It might be good to have a more >>> thorough discussion on these DIDs and consistency between all PRs. >>> > This is identified as future work, to be made compatible with the ongoing effort > on IVOA identifier definitions. > We agreed during the last telecon to reconsider it for a future version of > Obs/TAP. >>> It also brings me back to the issue I have been harping on: what to >>> do with packages of products pertaining to a single observation; >>> I will not repeat that here.
>>> >>> However, there is also the reverse problem: what do we do with data >>> products based on multiple observations? Do we allow ObsId to be a >>> list of ObsIds? >>> >>> I still find the bibcodes a bit problematic. The SSA DAL doc calls it >>> a "curation reference", but the text seems to imply that any >>> publication mentioning the data is fair game. Is this really meant to >>> be a reference to the data, or is it to be any paper that references >>> the data? There is a difference between these two... >>> I realize, though, that this is primarily an issue for the SSA DAL doc. >>> But it has repercussions for this document as well. >>> >>> Cheers, >>> >>> - Arnold > > From dtody at nrao.edu Mon Jul 11 15:52:34 2011 From: dtody at nrao.edu (Douglas Tody) Date: Mon, 11 Jul 2011 16:52:34 -0600 (MDT) Subject: [obs-tap]:updates on the Proposed recommendation In-Reply-To: References: <201107071925.p67JPMmQ012511@xebec.cfa.harvard.edu> Message-ID: More precisely what you might have is something like (display in a wide view):

ObsId  Type   Subtype               Level  Format                  Title
----------------------------------------------------------------------------------------------------------
123    event  chandra.hrc.pkg       1      application/x-tar-gzip  Chandra ACS-XYZ observation package (event,refimage)
123    image  chandra.hrc.refimage  2      image/fits              Chandra ACS-XYZ reference image
123    image  chandra.hrc.preview   2      image/jpeg              Chandra ACS-XYZ preview image
345    event  rosat.foo.pkg         1      application/x-tar-gzip  ROSAT whatever observation package (xxx)

and so forth. The subtype could in principle be more generic but will likely be instrument-specific for a level 1 observation. The Title should concisely describe the data product, e.g., origin, instrument, ID, what it is (observation package, calibration, standard view, etc.). The title string is what one normally wants to output on a displayed image or plot to identify to a human the data being shown.
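A minimal sketch of the two-step selection described in this thread, assuming Python's built-in sqlite3 as a stand-in for an ObsTAP service (a real service is queried in ADQL over TAP). The rows are those of the example table above; the prefix match on "chandra." is a hypothetical stand-in for a collection constraint, and obs_id 678 is invented to show why a subtype that lists package contents (such as package:event,image) cannot be selected by exact match.

```python
import sqlite3

# Example rows from the table above (ObsId, Type, Subtype, Level, Format).
# sqlite3 is only a stand-in for an ObsTAP service answering ADQL queries.
ROWS = [
    ("123", "event", "chandra.hrc.pkg", 1, "application/x-tar-gzip"),
    ("123", "image", "chandra.hrc.refimage", 2, "image/fits"),
    ("123", "image", "chandra.hrc.preview", 2, "image/jpeg"),
    ("345", "event", "rosat.foo.pkg", 1, "application/x-tar-gzip"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE obscore (obs_id TEXT, dataproduct_type TEXT, "
    "dataproduct_subtype TEXT, calib_level INTEGER, access_format TEXT)"
)
con.executemany("INSERT INTO obscore VALUES (?, ?, ?, ?, ?)", ROWS)

# Step 1: a generic query to learn which subtypes one archive defines.
# The "chandra." prefix match is a hypothetical collection constraint.
subtypes = [row[0] for row in con.execute(
    "SELECT DISTINCT dataproduct_subtype FROM obscore "
    "WHERE dataproduct_subtype LIKE 'chandra.%' ORDER BY dataproduct_subtype"
)]

# Step 2: an exact match on one fixed subtype string.
refimages = con.execute(
    "SELECT obs_id, access_format FROM obscore WHERE dataproduct_subtype = ?",
    ("chandra.hrc.refimage",),
).fetchall()

# A subtype that lists the package contents (hypothetical obs_id 678)
# cannot be found by an exact match on any single content type.
con.execute(
    "INSERT INTO obscore VALUES ('678', NULL, 'package:event,image', 1, 'application/x-tar')"
)
missed = con.execute(
    "SELECT count(*) FROM obscore WHERE dataproduct_subtype = 'package:event'"
).fetchone()[0]

print(subtypes)
print(refimages)
print(missed)
```

Fixed subtype strings keep step 2 a plain equality test; once the subtype encodes contents, a client would have to parse each value, which is the selection problem raised above.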
You can put whatever you want in there to describe the data product so long as it is concise (one line of text). - Doug On Mon, 11 Jul 2011, Douglas Tody wrote: > On Thu, 7 Jul 2011, Arnold Rots wrote: > >> Aside from what I reported in a previous message, quoted below, there >> are more discrepancies between Table 5 and Tables 6 and 7: >> >> obs_creator_did is missing from Table 7 >> o_units in Table 5 should be o_unit >> pol_states is missing from Table 6 >> facility_name and instrument_name are spelled differently; >> even though required, they show up in Table 7, rather than 6 >> em_unit is missing from Table 5 >> o_stat_error is missing from Table 7 >> >> Also, note the comment I made on MJD in use case 1.6 >> and on the uselessness of bib_reference because of its murky >> definition >> >> I still lament the fact that the data access functionality is >> compromising the self-consistency and usefulness of the data discovery >> function, but decided for our tarred packages to use: >> dataproduct_type = NULL >> dataproduct_subtype = package:event,image >> access_format = application/x-tar >> As far as I can tell, this is within the specifications. > > Well we don't specify what the subtypes you provide for your archive > should be so I suppose you could get away with this, but this example is > not at all what we had in mind. The subtype should be the science type > of the specific data product, *not* details about the content of the > data product. I would expect the type to be "event" (meaning "event > data" not "event list") and the subtype to be something more like > "chandra.hrc.package", "chandra.hrc.refimage (or "rosat.XX" etc.). > > Note subtypes are supposed to be fixed strings so that one can search > the local archive for a particular type of data product; if you try to > describe what is included in a particular data product then such > selection won't be possible. 
So for example a client will do a generic > query to see what subtypes Chandra defines, and then they can pose a > more specific query to get a certain type of Chandra-specific data > product. Likewise for ALMA etc. > > Note you also have obs.title where you can provide a short description > of the data product and for this you can provide whatever you want. > > - Doug > From arots at head.cfa.harvard.edu Tue Jul 12 06:40:51 2011 From: arots at head.cfa.harvard.edu (Arnold Rots) Date: Tue, 12 Jul 2011 09:40:51 -0400 (EDT) Subject: [obs-tap]:updates on the Proposed recommendation + new document In-Reply-To: <65324DBA-962E-4866-BECF-73497EA63CAA@noao.edu> Message-ID: <201107121340.p6CDepL6023133@xebec.cfa.harvard.edu> Rob Seaman wrote: > On Jul 8, 2011, at 6:36 AM, Arnold Rots wrote: > > > Just two quick comments. Both highlight how this standard is still > > heavily slanted toward optical images. > > Undoubtedly true and such biases should be minimized. > > > Mireille Louys wrote: > >> Dear Arnold, Dear all, > >> > >>> o_stat_error is an interesting case. Since our unit is counts, the > >>> proper value would be "POISSON"; I realize that that is not a double, > >>> but what else can we give as a value? > >> > >> > >> This was meant only for quantitative estimation of the error and does > >> not cover the statistical properties of the signal. > > > > I realize that, but the problem is that one can't give a single > > quantitative value in the case of Poisson noise. However, identifying > > it as Poisson does immediately provide the value for each point in the > > image. > > Well, it provides a heuristic for making estimates of the > error/noise/variance values. That is certainly better than picking a > scalar.
Even in the optical a purely scalar error is an artificial choice; see for example: http://arxiv.org/abs/0910.3733 > > Negative numbers are presumably non-physical and could be used to > encode special values corresponding to "POISSON" or even > "POISSON+GAUSSIAN" with the absolute value representing the scalar > gaussian contribution (e.g., read noise + sky noise). > > Might consider providing the error as a variance, too. > > Rob > So, which of the three options should I choose for o_stat_error: 1. Return "POISSON" 2. Return -1 (or some other negative value) that by convention will mean "POISSON" 3. Do not return o_stat_error (i.e., don't provide any information on it) - Arnold -------------------------------------------------------------------------- Arnold H. Rots Chandra X-ray Science Center Smithsonian Astrophysical Observatory tel: +1 617 496 7701 60 Garden Street, MS 67 fax: +1 617 495 7356 Cambridge, MA 02138 arots at head.cfa.harvard.edu USA http://hea-www.harvard.edu/~arots/ -------------------------------------------------------------------------- From seaman at noao.edu Tue Jul 12 07:40:00 2011 From: seaman at noao.edu (Rob Seaman) Date: Tue, 12 Jul 2011 07:40:00 -0700 Subject: [obs-tap]:updates on the Proposed recommendation + new document In-Reply-To: <201107121340.p6CDepL6023133@xebec.cfa.harvard.edu> References: <201107121340.p6CDepL6023133@xebec.cfa.harvard.edu> Message-ID: On Jul 12, 2011, at 6:40 AM, Arnold Rots wrote: > So, which of the three options should I choose for o_stat_error: > > 1. Return "POISSON" > 2. Return -1 (or some other negative value) that by convention will mean "POISSON" > 3. Do not return o_stat_error (i.e., don't provide any information on it) Number 3 is always an option, but the VO should provide interfaces that can do better. If the noise model is purely poisson then some token that can't be misunderstood should be designated. Others will have to comment on whether a non-numeric string is "legal".
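The sign convention Rob proposes for o_stat_error can be sketched as a small decoder. This illustrates a suggestion made in this thread, not the ObsCore definition of o_stat_error. The sentinel PURE_POISSON and the function pixel_sigma are hypothetical. The sketch assumes the value is a standard deviation rather than a variance, and that the signal is in counts; for data in DN a gain factor would be needed before the Poisson term applies.

```python
import math

# Hypothetical sentinel for "purely Poisson". The thread suggests a very
# large (or very small) negative number rather than -1; this value is an
# illustration, not part of any standard.
PURE_POISSON = -1.0e30


def pixel_sigma(o_stat_error: float, signal: float) -> float:
    """1-sigma error for one pixel under the convention proposed in this thread.

    o_stat_error >= 0 : purely Gaussian; the value is the error itself.
    PURE_POISSON      : variance equals the signal (in counts).
    other negative    : Poisson plus a Gaussian part of abs(o_stat_error);
                        the two variances add.
    Treating the value as a standard deviation (not a variance) is an
    assumption; the thread leaves that question open.
    """
    if o_stat_error >= 0:
        return o_stat_error
    poisson_variance = max(signal, 0.0)  # Poisson: variance = expected counts
    if o_stat_error == PURE_POISSON:
        return math.sqrt(poisson_variance)
    gaussian = abs(o_stat_error)
    return math.sqrt(poisson_variance + gaussian ** 2)


print(pixel_sigma(2.5, 100.0))
print(pixel_sigma(PURE_POISSON, 100.0))
print(pixel_sigma(-3.0, 16.0))
```

A client that ignores the convention and simply takes the absolute value still gets the Gaussian part, which matches the graceful degradation Rob describes.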
Otherwise, probably not -1, since such a value might be middle-of-range depending on the units. Use either a very small or a very large negative number. If the noise model is poisson+gaussian, what I was trying to suggest was that the value should be a negative number whose absolute value is the gaussian part, e.g., -13.7 would mean a gaussian "background" (whatever that means for these holdings) of 13.7 DNs (or whatever the unit is), but that it is to be understood that there is also a signal-dependent poisson component. The handling for the compound noise model would be application-dependent; the question here is how to express it. Applications that don't know poisson from "fish" could just take the absolute value. Applications that don't know gaussians would see the negative and know to take the square root of the signal. Those that know about both could do something more nuanced. There should also be a clear consensus on whether "error" means variance or rather "that thing that one would call standard deviation if the noise were interpreted as purely gaussian". How do other interfaces and statistical packages handle this issue? Rob From arots at head.cfa.harvard.edu Tue Jul 12 08:29:17 2011 From: arots at head.cfa.harvard.edu (Arnold Rots) Date: Tue, 12 Jul 2011 11:29:17 -0400 (EDT) Subject: [ObsCoreRFC]Minutes of the telco Monday June 6 In-Reply-To: Message-ID: <201107121529.p6CFTHuV023196@xebec.cfa.harvard.edu> Douglas Tody wrote:
> > If ObsTAP had purely provided information on the availability of data, > > observational metadata, the data product types, the data formats, and > > the packaging, a comprehensive data access tool could have provided > > the access URLs in response to a query that is based on the > > information provided through ObsTAP. > > As it is, the data discovery is somewhat compromised by the data > > access features and the data access features themselves are not > > very satisfactory. > > I am afraid that this is a bit of a missed opportunity. > > Well then we would lose the ability to describe and point to non-VO data > like Chandra or ALMA observations, for which we have no VO data > services. For images/spectra etc. it could work to not have any acref > in the ObsTAP metadata, but the current scheme already makes it possible > to ignore the acref returned by ObsTAP and go directly to a SIA or > whatever service if such is provided. Even so it could be useful and > convenient to get a reference or preview image back from the ObsTAP > acref without having to go to the full-up data service. > > What probably would be the best approach for Chandra (or ALMA etc.) > would be to have the level 0 or 1 observational data plus some standard > data products such as reference images etc., all described in the ObsTAP > QR with a shared obs_id. Then also provide a data link service such as > Francois describes to fully resolve all the data products or other > resources available for the observation. > > In a data discovery portal one would then be able to do discovery and > preview the data, but then optionally retrieve and examine all the data > links for a given observation, and possibly do full-up data access, > invoke a pipeline reprocessing job, examine auxiliary information like > logs or proposal cover pages, etc. > > - Doug > We wouldn't lose anything.
If ObsTAP just returned observational parameters and a list of available products (that's proper data discovery), the data access protocol would allow users to get a list of access URLs for specific data products and specific ObsIds, with information on file formats, etc. That would be a proper separation of data discovery and data access functions and it would in no way make us lose any capabilities. Preview images could be included in the table returned by the access service. Cheers, - Arnold > > > Cheers, > > > > - arnold > > > > François Bonnarel wrote: > >> Arnold, Doug, > >> On 06/07/2011 17:54, Douglas Tody wrote: > >>> On Wed, 6 Jul 2011, Arnold Rots wrote: > >>> > >>>> I think I am beginning to realize what it is that makes me so > >>>> uncomfortable with ObsTAP and what makes it so hard to grasp the > >>>> correct way to implement it: its ambivalence. > >>>> > >>>> It is primarily intended (I think) as a data discovery interface. > >>>> The problem is that it also doubles as a data access tool. > >>>> I think it is the intertwining of these two functions that makes it > >>>> murky. > >>>> And I wish these two functions had been separated into separate > >>>> interfaces. > >>>> I know this is not an issue for some observatories (say, the ones that > >>>> only produce simple 2-D images), but it makes life difficult for more > >>>> complicated datasets. > >>>> > >>>> As a data discovery tool, I would have expected its purpose to be: > >>>> - find available observations that fall within certain constraints in > >>>> time, space, frequency, etc. > >>>> - tell me what kind of data products are available for each > >>>> > >>>> For a data access tool: > >>>> - Give me the URL to a specific (set of) type(s) of data product for a > >>>> specific (set of) observation(s) > >>>> For all I know, this role could be played by SIAP, SSAP, SCS, or > >>>> whatever protocols are already in existence.
> >>> > >>> ObsTAP is intended mainly to provide uniform global data discovery; it > >>> can find any type of data, even non-VO data formats. The data access > >>> capabilities provided at this level are very limited, but can be used to > >>> retrieve static archive data files (the data product could actually be > >>> generated on the fly if desired, but the description at least is > >>> static). > >>> > >>> As you suggest, the idea is that for any non-trivial data access the > >>> typed interfaces would be used (SIA, SSA, etc.). So for example one > >>> could do global data discovery using ObsTAP and then followup with one > >>> of the typed interfaces to get more complete object-specific metadata > >>> and do the actual data access, which for a typed/OO interface will often > >>> involve virtual data generation (subsetting, filtering, transforming, > >>> output format specification, etc.). Of course if just retrieving the > >>> static archive file is enough then that can be done with just the acref > >>> returned by ObsTAP. > >>> > >>>> The trouble is that for Chandra data, the intertwining of the two > >>>> functions requires us to duplicate each ObsCore record six times to > >>>> enumerate, laboriously, the different data types we can provide. > >>>> When it comes to proper data discovery, it makes much more sense to > >>>> return a single record with the ObsCore parameters and a list of > >>>> available data product types (event lists, images, light curves, > >>>> spectra, tarfiles with all of the above, etc.). > >>> > >>> True, but this is necessary to be consistent with the relational model > >>> and to provide a simple mechanism. For a Chandra observation one might > >>> return a set of records with the same obs_id, one being a tar.gz of the > >>> full instrumental dataset, the others being static images, spectra, etc. > >>> derived from that data. A query for a specific obs_id would thus > >>> describe all the data products available for the observation. 
As you > >>> note it is necessary to duplicate some of the metadata in associated > >>> records, but much of the metadata will differ for each data product as > >>> well. > >>> > >>> So far as the archive goes one would probably want to autogenerate the > >>> ObsTAP table from more fundamental, fully normalized database tables. > >>> Any updates would be done only on the underlying tables (auto-updating > >>> the ObsTAP "view" after each such update). Then there should be no > >>> problem with the redundant metadata in the ObsTAP index table becoming > >>> inconsistent or whatever. > >>> > >>> In addition to a few static images or spectra providing standard views > >>> of an observation one would ideally provide SIA, SSA, etc. services > >>> capable of accessing the event data and computing custom virtual data > >>> products on the fly. In the future the proposed data linking facilities > >>> would be able to point directly to such services. At present one would > >>> have to do a registry query to find the service and then use the > >>> publisher DID from the ObsTAP query to access the desired dataset. > >>> > >> A few words about these data linking facilities we have in mind > >> (see the presentations in > >> Nara and Napoli, for example)... > >> The ObsId can be used as a key (or an entry parameter) to a table or > >> service containing or returning links to related data and metadata... > >> The basic idea is that it's not just an ObsId / acref association (which we > >> already have as a byproduct of ObsTAP), but that it also provides a description > >> of the link... What we have is a little DataLink model with a few parameters: > >> association meaning or nature (calibration files, dataset retrieval, > >> whatever > >> X-ray band, etc.), > >> the nature of the link (simple URL, S*AP query or AccessData mode, etc.). > >> A little model of the internal path to a given file or subfile in the > >> global package > >> is also proposed....
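Doug's suggestion in this exchange, that the ObsTAP table be autogenerated as a view over fully normalized archive tables so that the repeated per-observation metadata cannot become inconsistent, can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the two base tables, their trimmed column sets, the target name, the times and the example.org URLs are all hypothetical, and a real service would expose the result as the ivoa.ObsCore table through TAP/ADQL rather than through SQLite.

```python
import sqlite3

# Hypothetical, heavily trimmed schema: observation-level metadata is
# stored once; each data product row points back to its observation.
SCHEMA = """
CREATE TABLE observation (
    obs_id      TEXT PRIMARY KEY,
    target_name TEXT,
    t_min       REAL,
    t_max       REAL
);
CREATE TABLE product (
    obs_id              TEXT REFERENCES observation(obs_id),
    dataproduct_type    TEXT,
    dataproduct_subtype TEXT,
    calib_level         INTEGER,
    access_format       TEXT,
    access_url          TEXT
);
-- The ObsTAP-style table is derived, never edited directly: one row per
-- data product, with the observation metadata repeated on every row.
CREATE VIEW obscore AS
SELECT p.obs_id, o.target_name, o.t_min, o.t_max,
       p.dataproduct_type, p.dataproduct_subtype, p.calib_level,
       p.access_format, p.access_url
FROM product AS p JOIN observation AS o ON p.obs_id = o.obs_id;
"""


def build_archive() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO observation VALUES (?, ?, ?, ?)",
        ("12345", "example target", 55000.0, 55000.5),
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("12345", "event", "chandra.hrc.pkg", 1,
             "application/x-tar-gzip", "http://example.org/12345/pkg.tar.gz"),
            ("12345", "image", "chandra.hrc.refimage", 2,
             "image/fits", "http://example.org/12345/refimage.fits"),
        ],
    )
    return conn


def rename_target(conn: sqlite3.Connection, obs_id: str, name: str) -> None:
    # Updates touch only the normalized table; every view row sharing this
    # obs_id sees the change, so the repeated metadata cannot drift apart.
    conn.execute(
        "UPDATE observation SET target_name = ? WHERE obs_id = ?", (name, obs_id)
    )


conn = build_archive()
rename_target(conn, "12345", "renamed target")
rows = conn.execute(
    "SELECT obs_id, target_name, dataproduct_subtype FROM obscore "
    "WHERE obs_id = ? ORDER BY dataproduct_subtype",
    ("12345",),
).fetchall()
for row in rows:
    print(row)
```

A query on a single obs_id returns every product of the observation, each carrying the same, single-sourced observation metadata.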
> >> > >> Thus we should be able to expose Chandra packages as a whole in the > >> ObsTAP service... And the various files in the packages can be accessed > >> via DataLink... If some of the files have their own dataset type they can > >> be described by a S*AP service (or ObsTAP again) given by the link, > >> or accessed via the AccessData method of the relevant S*AP service > >> (again, with the URL given by the link). > >> > >> An IVOA note is in preparation on this. > >> > >> François > >> > >> > >> > >>>> Btw, Use Case 1.6 misquotes MJD as Mean Julian Date. Should be > >>>> Modified Julian Day. > >>>> > >>>> I hope you don't mind these ruminations, but these are things that I > >>>> am discovering as we are trying to implement this - and it is hard. > >>> > >>> Not at all; it is useful to have these discussions in the record for > >>> others later as well. > >>> > >>> - Doug > >>> > >>> > >>>> Cheers, > >>>> > >>>> - Arnold > >>>> > >>>> > >>>> Douglas Tody wrote: > >>>>> On Tue, 5 Jul 2011, Arnold Rots wrote: > >>>>> > >>>>>>> First, the subtype may be used to define what the data object is in > >>>>>>> collection or archive specific terms. For example if the data > >>>>>>> object is > >>>>>>> a tar file containing all the files comprising a ROSAT observation > >>>>>>> the > >>>>>>> data provider can define a subtype for this type of data. It is > >>>>>>> up to > >>>>>>> the client to understand what the content of the proprietary data > >>>>>>> product is, but if they are able to deal with such > >>>>>>> instrument-specific > >>>>>>> data they probably do know what it is. > >>>>>> > >>>>>> This is precisely the case I was trying to solve: a tarfile containing > >>>>>> a mix of data types: images, spectra, event lists. > >>>>>> The way I would like to solve it is to allow "package" (or something > >>>>>> similar) for the data type and enumerate the data files contained in > >>>>>> the tarfile in the data subtype.
> >>>>>> > >>>>>> It still leaves a similar issue for the access format: that would be > >>>>>> tar, but it would be nice to be able to enumerate the formats of the > >>>>>> files in the tarfile in a similar format subtype - that also would > >>>>>> allow one to indicate whether or not the content of the tarfile is > >>>>>> gzipped (as opposed to gzipping the tarfile itself). > >>>>>> > >>>>>> I realize that this constitutes a use of subtypes that is different > >>>>>> from the original intent (at least, I think so), but it does seem a > >>>>>> useful mechanism. > >>>>> > >>>>> Arnold - I agree that in principle it would be useful to have this > >>>>> extra > >>>>> information. However, we had to argue for quite a while to get support > >>>>> for instrumental data at this level included at all. One *can* expose > >>>>> this data with ObsTAP 1.0 as outlined in my earlier email; in > >>>>> particular > >>>>> exposing the individual data products separately allows them to be > >>>>> described if the data provider wants to do so. Even exposing only the > >>>>> tar/zip/MEF etc. file works so long as the client recognizes the > >>>>> subtype. > >>>>> > >>>>> Attempting to describe the contents of arbitrary complex > >>>>> instrumental datasets is out of scope for ObsTAP, at least for 1.0. > >>>>> Perhaps > >>>>> we can address this issue in the next phase of development where we > >>>>> prototype related mechanisms such as data linking. > >>>>> > >>>>>> However, there is also the reverse problem: what do we do with data > >>>>>> products based on multiple observations? Do we allow ObsId to be a > >>>>>> list of ObsIds? > >>>>> > >>>>> This was addressed in the document as I recall. Complex > >>>>> data products which are derived from multiple inputs (e.g. multiple > >>>>> observations) essentially constitute a new "software observation", > >>>>> and a > >>>>> new obs_id should be assigned.
To say more about the derivation of a > >>>>> particular data product is complex and gets into the general issue of > >>>>> provenance which is being addressed separately. Furthermore obs_id > >>>>> is a > >>>>> database key used to uniquely identify specific "observations" (usable > >>>>> as a foreign key in other tables for example) hence we cannot turn it > >>>>> into a list of obs_ids. > >>>>> > >>>>> - Doug > >>>>> > >>>> -------------------------------------------------------------------------- > >>>> > >>>> Arnold H. Rots Chandra X-ray Science > >>>> Center > >>>> Smithsonian Astrophysical Observatory tel: +1 617 496 > >>>> 7701 > >>>> 60 Garden Street, MS 67 fax: +1 617 495 > >>>> 7356 > >>>> Cambridge, MA 02138 > >>>> arots at head.cfa.harvard.edu > >>>> USA > >>>> http://hea-www.harvard.edu/~arots/ > >>>> -------------------------------------------------------------------------- > >>>> > >>>> > >> > > -------------------------------------------------------------------------- > > Arnold H. Rots Chandra X-ray Science Center > > Smithsonian Astrophysical Observatory tel: +1 617 496 7701 > > 60 Garden Street, MS 67 fax: +1 617 495 7356 > > Cambridge, MA 02138 arots at head.cfa.harvard.edu > > USA http://hea-www.harvard.edu/~arots/ > > -------------------------------------------------------------------------- > > > -------------------------------------------------------------------------- Arnold H. 
Rots Chandra X-ray Science Center Smithsonian Astrophysical Observatory tel: +1 617 496 7701 60 Garden Street, MS 67 fax: +1 617 495 7356 Cambridge, MA 02138 arots at head.cfa.harvard.edu USA http://hea-www.harvard.edu/~arots/ -------------------------------------------------------------------------- From arots at head.cfa.harvard.edu Tue Jul 12 09:06:13 2011 From: arots at head.cfa.harvard.edu (Arnold Rots) Date: Tue, 12 Jul 2011 12:06:13 -0400 (EDT) Subject: [obs-tap]:updates on the Proposed recommendation In-Reply-To: Message-ID: <201107121606.p6CG6DV0023225@xebec.cfa.harvard.edu> This is becoming unwieldy. Trying to make X-ray data (and I suspect the same is true for aperture synthesis data) fit into something that is designed with optical images in mind is reminiscent of round pegs and square holes. Service providers are free to define subtypes and titles, but you are saying that if they don't follow rules that are not spelled out, things won't work as envisaged. Also, if I understand the argument correctly, if data discovery software is to be helpful at all, it needs to be able to extract some information from the title field - but that is intended for human consumption. If I see this, it looks like I need to generate at least eight records for a single observation, some containing a mix of levels, and all duplicating pretty much the same metadata. This is not going to make it attractive to provide ObsTAP services. Maybe I should do what you did and provide an example of how I thought it should have worked. 
Here is how I would envisage data discovery of Chandra data to work: A single record per Obsid that provides the observational metadata and:

  ObsId                 12345
  Dataset Identifier    ivo://ADS/Sa.CXO#obs/12345
  Data Types available  Package
                        Event list
                        Image
  Calibration level     2
  Title                 Chandra/ACIS ObsId 12345

Then a data access protocol that allows querying the archive using any of the above in a where clause, with either ObsId or DID required, and returning:

  ObsId  DataType  Contents  Level  Format    URL
  -----------------------------------------------------------
  12345  Pkg_1     evt,img   2      tar       http://...
  12345  Pkg_2     evt,img   1      tar       http://...
  12345  Pkg_12    evt,img   2,1    tar       http://...
  12345  evt       evt       2      fits-bin  http://...
  12345  evt       evt       1      fits-bin  http://...
  12345  img       img       2      fits      http://...
  12345  img       img       2      jpg       http://...
  12345  img       img       2      fits      http://...
  12345  img       img       2      jpg       http://...

This is an example where the client specified ObsId or DID, but no data type or format. Never mind the terms and abbreviations I used - you get the picture. Cheers, - Arnold Douglas Tody wrote:
> More precisely what you might have is something like (display in a wide view):
>
> ObsId  Type   Subtype               Level  Format                  Title
> ----------------------------------------------------------------------------------------------------------
> 123    event  chandra.hrc.pkg       1      application/x-tar-gzip  Chandra ACS-XYZ observation package (event,refimage)
> 123    image  chandra.hrc.refimage  2      image/fits              Chandra ACS-XYZ reference image
> 123    image  chandra.hrc.preview   2      image/jpeg              Chandra ACS-XYZ preview image
> 345    event  rosat.foo.pkg         1      application/x-tar-gzip  ROSAT whatever observation package (xxx)
>
> and so forth. The subtype could in principle be more generic but will > likely be instrument-specific for a level 1 observation. > > The Title should concisely describe the data product, e.g., origin, > instrument, ID, what it is (observation package, calibration, standard > view, etc.).
The title string is what one normally wants to output on a > displayed image or plot to identify to a human the data being shown. > You can put whatever you want in there to describe the data product so > long as it is concise (one line of text). > > - Doug > > > > > On Mon, 11 Jul 2011, Douglas Tody wrote: > > > On Thu, 7 Jul 2011, Arnold Rots wrote: > > > >> Aside from what I reported in a previous message, quoted below, there > >> are more discrepancies between Table 5 and Tables 6 and 7: > >> > >> obs_creator_did is missing from Table 7 > >> o_units in Table 5 should be o_unit > >> pol_states is missing from Table 6 > >> facility_name and instrument_name are spelled differently; > >> even though required, they show up in Table 7, rather than 6 > >> em_unit is missing from Table 5 > >> o_stat_error is missing from Table 7 > >> > >> Also, note the comment I made on MJD in use case 1.6 > >> and on the uselessness of bib_reference because of its murky > >> definition > >> > >> I still lament the fact that the data access functionality is > >> compromising the self-consistency and usefulness of the data discovery > >> function, but decided for our tarred packages to use: > >> dataproduct_type = NULL > >> dataproduct_subtype = package:event,image > >> access_format = application/x-tar > >> As far as I can tell, this is within the specifications. > > > > Well, we don't specify what the subtypes you provide for your archive > > should be so I suppose you could get away with this, but this example is > > not at all what we had in mind. The subtype should be the science type > > of the specific data product, *not* details about the content of the > > data product. I would expect the type to be "event" (meaning "event > > data" not "event list") and the subtype to be something more like > > "chandra.hrc.package", "chandra.hrc.refimage" (or "rosat.XX" etc.).
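The subtype convention Doug recommends here (a fixed, archive-defined string naming the kind of product, rather than a description of what a given package contains) is what makes a two-step selection possible: a client first asks which subtype strings a service uses, then selects one of them by exact match, while rows sharing an obs_id still describe one complex dataset. The sketch below is a minimal illustration under stated assumptions: plain Python lists stand in for a TAP query result, the rows copy Doug's illustrative table, and the helper names are hypothetical.

```python
from collections import defaultdict

# Rows adapted from Doug's illustrative table:
# (obs_id, dataproduct_type, dataproduct_subtype, calib_level, access_format)
ROWS = [
    ("123", "event", "chandra.hrc.pkg", 1, "application/x-tar-gzip"),
    ("123", "image", "chandra.hrc.refimage", 2, "image/fits"),
    ("123", "image", "chandra.hrc.preview", 2, "image/jpeg"),
    ("345", "event", "rosat.foo.pkg", 1, "application/x-tar-gzip"),
]


def distinct_subtypes(rows):
    """Generic query: which subtype strings does the service use?"""
    return sorted({row[2] for row in rows})


def select_subtype(rows, subtype):
    """Specific follow-up query: exact match on one fixed subtype string."""
    return [row for row in rows if row[2] == subtype]


def products_by_obs(rows):
    """Rows sharing an obs_id together describe one (possibly complex) dataset."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[0]].append(row[2])
    return dict(grouped)


print(distinct_subtypes(ROWS))
print(select_subtype(ROWS, "chandra.hrc.refimage"))
print(products_by_obs(ROWS))

# By contrast, a subtype that lists package contents changes with every
# package, so one fixed string no longer selects "all packages".
CONTENT_SUBTYPES = ["package:event,image", "package:event", "package:image"]
print([s for s in CONTENT_SUBTYPES if s == "package:event"])
```

The last line finds only one of the three packages, which is the selection problem that content-list subtypes such as "package:event,image" introduce.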
> > > > Note subtypes are supposed to be fixed strings so that one can search > > the local archive for a particular type of data product; if you try to > > describe what is included in a particular data product then such > > selection won't be possible. So for example a client will do a generic > > query to see what subtypes Chandra defines, and then they can pose a > > more specific query to get a certain type of Chandra-specific data > > product. Likewise for ALMA etc. > > > > Note you also have obs.title where you can provide a short description > > of the data product and for this you can provide whatever you want. > > > > - Doug > > > -------------------------------------------------------------------------- Arnold H. Rots Chandra X-ray Science Center Smithsonian Astrophysical Observatory tel: +1 617 496 7701 60 Garden Street, MS 67 fax: +1 617 495 7356 Cambridge, MA 02138 arots at head.cfa.harvard.edu USA http://hea-www.harvard.edu/~arots/ -------------------------------------------------------------------------- From patrick.dowler at nrc-cnrc.gc.ca Tue Jul 12 10:30:43 2011 From: patrick.dowler at nrc-cnrc.gc.ca (Patrick Dowler) Date: Tue, 12 Jul 2011 10:30:43 -0700 Subject: [obs-tap]:updates on the Proposed recommendation + new document In-Reply-To: <201107121340.p6CDepL6023133@xebec.cfa.harvard.edu> References: <201107121340.p6CDepL6023133@xebec.cfa.harvard.edu> Message-ID: <201107121030.44226.patrick.dowler@nrc-cnrc.gc.ca> On 2011-07-12 06:40:51 Arnold Rots wrote: > So, which of the three options should I choose for o_stat_error: > > 1. Return "POISSON" > 2. Return -1 (or some other negative value) that by convention will > mean "POISSON" > 3. Do not return o_stat_error (i.e., don't provide any information on > it) o_stat_err has a data type of double, so option #1 is not legal. The optional columns are going to provide minimal value to users since, well, they are optional and therefore only some sites are going to have them. 
So, first of all, I would not expect the ObsCore model to satisfy all the detailed use cases of Chandra users, because that is not the purpose for which it is designed. You can and probably should extend the model with site-specific columns to make your extra details unambiguously clear to your users. Since the utype of this field is: Char.ObservableAxis.accuracy.statError.refVal.value the supposed ambiguity of the meaning of o_stat_err is something to be resolved in the CharDM. I would look there, expecting to find another field in the model that said what kind of statError this was, and add an extra column with a utype from CharDM for the type of error. Since ObsCore does not specify it, I would make up a column name, but try to stick to the style of ObsCore. Then in ObsCore-1.1 maybe it would become part of the standard. This is a perfectly acceptable use of ObsCore and TAP. If CharDM was completely lacking, I would probably choose option #3, as I would judge the concept here not usable at this time. I would not use some arbitrary convention involving negative values (option #2). my 2c, -- Patrick Dowler Tel/Tél: (250) 363-0044 Canadian Astronomy Data Centre National Research Council Canada 5071 West Saanich Road Victoria, BC V9E 2M7 Centre canadien de données astronomiques Conseil national de recherches Canada 5071, chemin West Saanich Victoria (C.-B.)
V9E 2M7 From francois.bonnarel at astro.unistra.fr Tue Jul 12 14:46:29 2011 From: francois.bonnarel at astro.unistra.fr (François Bonnarel) Date: Tue, 12 Jul 2011 23:46:29 +0200 Subject: [obs-tap]:updates on the Proposed recommendation + new document In-Reply-To: <20110708132110.fc2fxvr4l0kw4g48@webmail.u-strasbg.fr> References: <201107071925.p67JPMmQ012511@xebec.cfa.harvard.edu> <20110708132110.fc2fxvr4l0kw4g48@webmail.u-strasbg.fr> Message-ID: <4E1CC0B5.2080903@astro.unistra.fr> Hi Mireille, there are still a few things in the tables. Comparing table 1 of the main text and table 1 in Appendix B: dataproduct_type: type string versus enum; access_url: string versus "no type". Additional comment for table 1 in Appendix B: the "Description" of s_dec is missing. utypes: can we split the lines after a dot instead of in the middle of a term (examples such as .... Bounds Extent.diameter or SpatialAxis.calibStatus or SpatialAxis.Accuracy.staError, etc.)? A small discrepancy between the utypes in table 1 of Appendix B and table 7 of Appendix C: Curation .Rights versus Curation.rights; Curation.Reference versus Curation.reference. em_calib_status is missing in table 1 of Appendix B. That's it for now. Cheers, François Le 08/07/2011 13:21, Mireille Louys a écrit : > Dear Arnold, Dear all, > > Thanks very much for reporting these typos and inconsistencies. > I produced a new document with the corrections suggested by you and from the > RFC page inputs. They appear highlighted as tracked modifications in > the .docx file attached below. This is just to help contributors to > follow the changes. > > You can provide comments until Monday; then I will integrate the final > changes in order to proceed to TCG review. > > See my comments inserted in the text below.
> > Best regards, Mireille > > Arnold Rots a écrit : > >> Aside from what I reported in a previous message, quoted below, there >> are more discrepancies between Table 5 and Tables 6 and 7: >> >> obs_creator_did is missing from Table 7 >> o_units in Table 5 should be o_unit >> pol_states is missing from Table 6 >> facility_name and instrument_name are spelled differently; >> even though required, they show up in Table 7, rather than 6 >> em_unit is missing from Table 5 >> o_stat_error is missing from Table 7 >> > > *included and corrected* > >> Also, note the comment I made on MJD in use case 1.6 >> and on the uselessness of bib_reference because of its murky >> definition > *MJD done* > bib_reference is an optional field that a data provider may use to > flag some data sets as the ones used and published together with a > scientific paper. > This is not meant to behave like a citation index and point to all > papers mentioning this data set. > >> I still lament the fact that the data access functionality is >> compromising the self-consistency and usefulness of the data discovery >> function, but decided for our tarred packages to use: >> dataproduct_type = NULL >> dataproduct_subtype = package:event,image >> access_format = application/x-tar >> As far as I can tell, this is within the specifications. >> > This seems a proper use of the Obs/TAP specification to expose your data. > "dataproduct_subtype" is an optional field that the data provider can > define. > Possible values for this field should be clearly documented by the > service. > >> o_stat_error is an interesting case. Since our unit is counts, the >> proper value would be "POISSON"; I realize that that is not a double, >> but what else can we give as a value? >> > This was meant only for a quantitative estimate of the error and does > not cover the statistical properties of the signal. > >> Please do not consider this list of corrections to be exhaustive.
> >> >> Cheers, >> >> - Arnold >> >> Arnold Rots wrote: >>> Mireille, >>> >>> Here are some items. >>> >>> Ian Evans noticed the inconsistency in units for spatial resolution >>> between the Tables 1, 4, 5, and 6 (arcsec vs. deg); what should it be? >>> I assume deg? >>> See also s_stat_error in Table 5. >>> > We agreed in previous iterations to have resolution and errors in a > convenient and handy unit: arcsec for space and s for time, because > they are usually given that way in instrument > descriptions and scientific papers. > >>> In addition, I noticed that Table 5 contained unit "day" that should >>> be "d", Table 7 has erroneous unit "d" for data rights and is missing >>> most units. >>> > *updated* >>> The section on obs_publisher_did is a bit murky and not quite >>> consistent with the definition in the spectral data model where it >>> expresses a strong preference for using the same DIDs as are being >>> used in the journals. >>> That implies that the data product the query result refers to may be a >>> subset of what the DID stands for, as the current spectral draft >>> affirms. >>> On the other hand, I don't think the spectrum DM and the SSA DAL are >>> quite consistent in this respect. It might be good to have a more >>> thorough discussion on these DIDs and consistency between all PRs. >>> > This is identified as future work, to be made compatible with the ongoing > effort on IVOA identifier definitions. > We agreed during the last telecon to reconsider it for a future > version of Obs/TAP. >>> It also brings me back to the issue I have been harping on: what to >>> do with packages of products pertaining to a single observation; >>> I will not repeat that here. >>> >>> However, there is also the reverse problem: what do we do with data >>> products based on multiple observations? Do we allow ObsId to be a >>> list of ObsIds? >>> >>> I still find the bibcodes a bit problematic.
The SSA DAL doc calls it >>> a "curation reference", but in the text seems to imply that any >>> publication mentioning the data is fair game. Is this really meant to >>> be a reference to the data, or is it to be any paper that references >>> the data? There is a difference between these two... >>> I realize, though, that this is primarily an issue for the SSA DAL doc. >>> But it has repercussions for this document as well. >>> >>> Cheers, >>> >>> - Arnold > > From mireille.louys at unistra.fr Wed Jul 13 09:52:47 2011 From: mireille.louys at unistra.fr (Mireille Louys) Date: Wed, 13 Jul 2011 18:52:47 +0200 Subject: New release of ObsCore Proposed recommendation document Message-ID: <20110713185247.kdrcg2uw0k00wc4o@webmail.u-strasbg.fr> Dear all, I have attached a new release of the proposed recommendation for the ObsCore data model at http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/ObsCoreRFC Two files are available: in docx and in pdf format. I inserted typo corrections and some reformulations, and updated the items moved from optional to mandatory status, plus some editing of all tables to give a more coherent view. Thanks to you for providing feedback on this specification during the last 6 months. Cheers, Mireille From Jesus.Salgado at sciops.esa.int Thu Jul 21 02:57:10 2011 From: Jesus.Salgado at sciops.esa.int (Jesus Salgado) Date: Thu, 21 Jul 2011 11:57:10 +0200 Subject: ObsCore enters into the TCG review phase Message-ID: <2468_1311242225_4E27F7F1_2468_743309_1_1311242230.28932.256.camel@satl11.net4.lan> Dear all, After the latest updated version uploaded by Mireille some days ago, and a verification that this version covers the main issues found during the working group review on the DM side, ObsCore officially enters the TCG review phase.
The RFC page can be found here: http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/ObsCoreRFC As this RFC period will overlap the holiday period, it will be longer than 4 weeks: 21 July to 15 September. I would like to congratulate Mireille here for all the effort she has put into this process. Best Regards, -- Jesus J. SALGADO Jesus.Salgado at sciops.esa.int ESAC Science Archives Team European Space Astronomy Centre (ESAC) European Space Agency (ESA) European Space Agency/European Space Astronomy Centre P.O. Box 78 28691 Villanueva de la Canada Tel: +34 91 813 12 71 Madrid - SPAIN Fax: +34 91 813 13 08 ------------------------------------------------------------------- ================================================================================================ This message and any attachments are intended for the use of the addressee or addressees only. The unauthorised disclosure, use, dissemination or copying (either in whole or in part) of its content is not permitted. If you received this message in error, please notify the sender and delete it from your system. Emails can be altered and their integrity cannot be guaranteed by the sender. Please consider the environment before printing this email. ================================================================================================= From francois.bonnarel at astro.unistra.fr Fri Jul 29 17:11:13 2011 From: francois.bonnarel at astro.unistra.fr (Francois Bonnarel) Date: Sat, 30 Jul 2011 02:11:13 +0200 Subject: [obs-tap]:updates on the Proposed recommendation In-Reply-To: <201107121606.p6CG6DV0023225@xebec.cfa.harvard.edu> References: <201107121606.p6CG6DV0023225@xebec.cfa.harvard.edu> Message-ID: <4E334C21.3030106@astro.unistra.fr> Hi Arnold, all DM people, Let me go back to this, because apparently this discussion is going on underground. First, let me come back to the very beginning of the ObsTAP effort...
It was a strong commitment from the committee to build something fast, reusing the TAP protocol and the Observation/Characterisation data models, for data discovery covering most of the needs... From the very beginning it was also obvious that data links and virtual data access could not, and will not, be covered by ObsTAP. The DataLink method or service concept has been around in various DAL notes for years now. For my part, I made presentations at the last three Interop meetings (Victoria, Nara and Napoli; see e.g. the latter: http://www.ivoa.net/internal/IVOA/DAL-InteropMay2011/DataLink.pdf ). This concept exists because you cannot imagine providing both data discovery and complex linkage features (or linkage for complex data structures) in one step and in a SINGLE table (a single table is required by the TAP-ADQL protocol, as all may remember). So ObsTAP is there for data discovery... the only way you can imagine to provide access to the various datasets in an observation is to duplicate the observation rows until you reach full discovery of all observation-related products, as was already explained... This is verbose, but it works. So how can DataLink work in the future? See below on your use case... DataLink is now in the roadmap of the DAL working group, and an IVOA note is in preparation as a very first drafting effort of this new "protocol". The note will be available within 3 weeks or so. Arnold Rots a écrit : > This is becoming unwieldy. > Trying to make X-ray data (and I suspect the same is true for aperture > synthesis data) fit into something that is designed with optical > images in mind is reminiscent of round pegs and square holes. > > Service providers are free to define subtypes and titles, but you are > saying that if they don't follow rules that are not spelled out, > things won't work as envisaged.
> Also, if I understand the argument correctly, if data discovery > software is to be helpful at all, it needs to be able to extract some > information from the title field - but that is intended for human > consumption. > > If I see this, it looks like I need to generate at least eight records > for a single observation, some containing a mix of levels, and all > duplicating pretty much the same metadata. > > This is not going to make it attractive to provide ObsTAP services. > > > Maybe I should do what you did and provide an example of how I thought > it should have worked. > > Here is how I would envisage data discovery of Chandra data to work: > A single record per Obsid that provides the observational metadata and:
>   ObsId                 12345
>   Dataset Identifier    ivo://ADS/Sa.CXO#obs/12345
>   Data Types available  Package
>                         Event list
>                         Image
>   Calibration level     2
>   Title                 Chandra/ACIS ObsId 12345
>
DataLink is a method or a service that retrieves a table describing links between observations, identified by their ObsId, and any kind of data retrieval... An ObsId known from an ObsTAP discovery phase can of course be used directly to query such a service (and, by the way, if the ObsTAP service is a TAP-PQL service, the DataLink table could be attached to the main ObsTAP table in the same query response, because the single-table requirement no longer applies in that case). But it is a qualified link, which means that the semantics or type of the link is given in one field of the table, while the nature of the access is given in another field: this can tell us whether it is a simple retrieval, an SIA query service, an SSA AccessData method, etc. So in your use case we will get three different links for the same observation (ObsId):
the types (or semantics) will be package, event list and image, and the access nature could be, respectively: retrieval, retrieval and SIA query (for example). In addition, the "Access" package (group of access fields in the table) is proposed to be extended beyond the traditional "reference" and "format" to describe which part of a complex "file" is to be retrieved (path in a directory/tar file, extension in a MEF file, table name in a VOTable, etc.). A proposal for such an extended access package is currently described in the Characterisation 2 draft... Best regards, François > Then a data access protocol that allows querying the archive using any > of the above in a where clause, with either ObsId or DID required, and > returning:
>   ObsId  DataType  Contents  Level  Format    URL
>   -----------------------------------------------------------
>   12345  Pkg_1     evt,img   2      tar       http://...
>   12345  Pkg_2     evt,img   1      tar       http://...
>   12345  Pkg_12    evt,img   2,1    tar       http://...
>   12345  evt       evt       2      fits-bin  http://...
>   12345  evt       evt       1      fits-bin  http://...
>   12345  img       img       2      fits      http://...
>   12345  img       img       2      jpg       http://...
>   12345  img       img       2      fits      http://...
>   12345  img       img       2      jpg       http://...
> This is an example where the client specified ObsId or DID, but no > data type or format. > > Never mind the terms and abbreviations I used - you get the picture.
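François's description of DataLink as a table of qualified links, where one field gives the meaning of the linked resource and another the nature of the access, can be sketched as plain records keyed by ObsId. This is only an illustration under stated assumptions: the field names (semantics, access_nature, content_path), the URLs and the member path inside the tar file are hypothetical, and since the DataLink note he mentions was still in preparation, this is not its schema. The three links mirror his example for Arnold's ObsId 12345: a package and an event list fetched by retrieval, and an image reached through an SIA query.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Link:
    obs_id: str
    semantics: str        # what the linked resource is (package, event list, ...)
    access_nature: str    # how to reach it: "retrieval", "SIA query", ...
    access_url: str
    access_format: str
    content_path: Optional[str] = None  # optional member path inside a larger file


# Hypothetical links for ObsId 12345, following the example in the text.
LINKS = [
    Link("12345", "package", "retrieval",
         "http://example.org/12345.tar", "application/x-tar"),
    Link("12345", "event list", "retrieval",
         "http://example.org/12345.tar", "application/fits",
         content_path="12345/evt2.fits"),
    Link("12345", "image", "SIA query",
         "http://example.org/sia?obs_id=12345", "application/x-votable+xml"),
]


def links_for(obs_id: str, links: List[Link] = LINKS) -> List[Link]:
    """Resolve an obs_id (e.g. from an ObsTAP result) to its qualified links."""
    return [link for link in links if link.obs_id == obs_id]


def split_by_access(links: List[Link]) -> Tuple[List[Link], List[Link]]:
    """Separate links a client can fetch directly from those needing a service call."""
    direct = [link for link in links if link.access_nature == "retrieval"]
    follow_up = [link for link in links if link.access_nature != "retrieval"]
    return direct, follow_up


direct, follow_up = split_by_access(links_for("12345"))
for link in direct:
    print("fetch", link.semantics, link.access_url, link.content_path)
for link in follow_up:
    print("query", link.semantics, "via", link.access_nature, link.access_url)
```

Keeping the access nature in its own field lets a client decide per link whether a plain download is enough or a typed service (SIA, SSA) must be invoked, which is the separation between discovery and access that Arnold asks for.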
>
> Cheers,
>
>   - Arnold
>
>
> Douglas Tody wrote:
>> More precisely what you might have is something like (display in a wide view):
>>
>> ObsId  Type   Subtype               Level  Format                  Title
>> ----------------------------------------------------------------------------------------------------------
>> 123    event  chandra.hrc.pkg       1      application/x-tar-gzip  Chandra ACS-XYZ observation package (event,refimage)
>> 123    image  chandra.hrc.refimage  2      image/fits              Chandra ACS-XYZ reference image
>> 123    image  chandra.hrc.preview   2      image/jpeg              Chandra ACS-XYZ preview image
>> 345    event  rosat.foo.pkg         1      application/x-tar-gzip  ROSAT whatever observation package (xxx)
>>
>> and so forth.  The subtype could in principle be more generic but will
>> likely be instrument-specific for a level 1 observation.
>>
>> The Title should concisely describe the data product, e.g., origin,
>> instrument, ID, what it is (observation package, calibration, standard
>> view, etc.).  The title string is what one normally wants to output on a
>> displayed image or plot to identify to a human the data being shown.
>> You can put whatever you want in there to describe the data product so
>> long as it is concise (one line of text).
>>
>> - Doug
>>
>>
>> On Mon, 11 Jul 2011, Douglas Tody wrote:
>>
>>> On Thu, 7 Jul 2011, Arnold Rots wrote:
>>>
>>>> Aside from what I reported in a previous message, quoted below, there
>>>> are more discrepancies between Table 5 and Tables 6 and 7:
>>>>
>>>> obs_creator_did is missing from Table 7
>>>> o_units in Table 5 should be o_unit
>>>> pol_states is missing from Table 6
>>>> facility_name and instrument_name are spelled differently;
>>>>    even though required, they show up in Table 7, rather than 6
>>>> em_unit is missing from Table 5
>>>> o_stat_error is missing from Table 7
>>>>
>>>> Also, note the comment I made on MJD in use case 1.6
>>>> and on the uselessness of bib_reference because of its murky
>>>> definition
>>>>
>>>> I still lament the fact that the data access functionality is
>>>> compromising the self-consistency and usefulness of the data discovery
>>>> function, but decided for our tarred packages to use:
>>>>   dataproduct_type = NULL
>>>>   dataproduct_subtype = package:event,image
>>>>   access_format = application/x-tar
>>>> As far as I can tell, this is within the specifications.
>>>>
>>> Well we don't specify what the subtypes you provide for your archive
>>> should be so I suppose you could get away with this, but this example is
>>> not at all what we had in mind.  The subtype should be the science type
>>> of the specific data product, *not* details about the content of the
>>> data product.  I would expect the type to be "event" (meaning "event
>>> data" not "event list") and the subtype to be something more like
>>> "chandra.hrc.package", "chandra.hrc.refimage" (or "rosat.XX" etc.).
>>>
>>> Note subtypes are supposed to be fixed strings so that one can search
>>> the local archive for a particular type of data product; if you try to
>>> describe what is included in a particular data product then such
>>> selection won't be possible.  So for example a client will do a generic
>>> query to see what subtypes Chandra defines, and then they can pose a
>>> more specific query to get a certain type of Chandra-specific data
>>> product.  Likewise for ALMA etc.
>>>
>>> Note you also have obs.title where you can provide a short description
>>> of the data product and for this you can provide whatever you want.
>>>
>>> - Doug
>>>
>>>
> --------------------------------------------------------------------------
> Arnold H. Rots                                Chandra X-ray Science Center
> Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
> 60 Garden Street, MS 67                              fax:  +1 617 495 7356
> Cambridge, MA 02138                             arots at head.cfa.harvard.edu
> USA                                     http://hea-www.harvard.edu/~arots/
> --------------------------------------------------------------------------
>
>

-- 
=====================================================================
François Bonnarel                   Observatoire Astronomique de Strasbourg
CDS (Centre de données              11, rue de l'Université
astronomiques de Strasbourg)        F--67000 Strasbourg (France)
Tel: +33-(0)3 68 85 24 11           WWW: http://cdsweb.u-strasbg.fr/people/fb.html
Fax: +33-(0)3 68 85 24 25           E-mail: francois.bonnarel at astro.unistra.fr
---------------------------------------------------------------------

From francois.bonnarel at astro.unistra.fr  Sun Jul 31 10:15:34 2011
From: francois.bonnarel at astro.unistra.fr (François Bonnarel)
Date: Sun, 31 Jul 2011 19:15:34 +0200
Subject: Characterisation data model version 2.0 draft
Message-ID: <4E358DB6.5050101@astro.unistra.fr>

Dear all,

I would like to announce the very first release of the Characterisation version 2 Working Draft.

This effort started immediately after the release of version 1.0 of the model more than three years ago, because some extended features (level 4 characterisation, complex data) had been intentionally postponed to a future version. Since 2008, we have also received feedback on version 1 from data providers and other working groups.
From 2009 on, the ObsTAP effort led to a couple of additional minor changes to the model. Progress was slowed down mainly because most participants were concentrating on ObsTAP, but the work was nonetheless reported at every Interop from May 2008 to May 2011 (see, e.g., my Napoli presentation: http://www.ivoa.net/internal/IVOA/InteropMay2011DM/CharVersion2.0.pdf).

Besides the ObsTAP-motivated changes, the new version takes into account composite datasets, polarization and discrete-valued axes, redshift axes, sensitivity maps, PSF, resolution-variation maps, etc.

A prototype Characterisation 2 metadata service has been used by Igor in his spectrum "best fit" service, presented in Nara and Napoli. The XML schema is ready but will only be posted to this list in August.

The draft version is here:
http://www.ivoa.net/internal/IVOA/CharacterisationDataModel/NewChar2.pdf

There are probably still many editorial mistakes, but it is now time to discuss the ideas more widely.

The draft contains the description of an extended access package which is probably useful for the IVOA outside the context of Char 2 (within Provenance and DataLink, as I wrote in my email on this topic yesterday). We may have to decide to make a dedicated document for this extended access package.

Best regards,
François
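To make the qualified-link idea from François's DataLink reply earlier in this thread concrete, here is a minimal Python sketch of a DataLink-style table for Arnold's ObsId 12345 use case. It assumes one row per link, with the semantics of the link in one field and the nature of the access in another. The field names (`obs_id`, `semantics`, `access_type`, `access_ref`), the helper `links_for` and the example.org URLs are all hypothetical and are not taken from any DataLink specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One row of a hypothetical DataLink-style table (field names are assumptions)."""
    obs_id: str       # identifier obtained from an ObsTAP discovery query
    semantics: str    # what the link points to: package, event list, image, ...
    access_type: str  # how it is reached: retrieval, SIA query, SSA AccessData, ...
    access_ref: str   # URL or service endpoint (placeholder values below)


# Example rows for Arnold's ObsId 12345 use case, plus one unrelated observation.
LINKS = (
    Link("12345", "package", "retrieval", "http://example.org/12345/package.tar"),
    Link("12345", "event list", "retrieval", "http://example.org/12345/events.fits"),
    Link("12345", "image", "SIA query", "http://example.org/sia?obs_id=12345"),
    Link("67890", "image", "retrieval", "http://example.org/67890/image.fits"),
)


def links_for(obs_id, semantics=None, links=LINKS):
    """Return the links attached to obs_id, optionally keeping only one semantics."""
    return [
        link for link in links
        if link.obs_id == obs_id and (semantics is None or link.semantics == semantics)
    ]


for link in links_for("12345"):
    print(link.semantics, "->", link.access_type)
```

Called with `"12345"`, the helper returns the three links from the reply (package, event list, image), with access natures retrieval, retrieval and SIA query. Passing `semantics="image"` narrows the result to the SIA query link alone. Keeping semantics and access nature in separate fields is what lets a client filter on one without parsing the other.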
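Doug's argument earlier in this thread is that subtypes should be fixed strings. A client can then first ask which subtypes an archive defines and afterwards select on one of them. That two-step pattern can be sketched as two queries. This minimal sketch uses Python's built-in `sqlite3` as a stand-in for an ObsTAP service (a real service would be queried in ADQL over TAP). The reduced table, the `obs_collection` values and the helper names `subtypes` and `products` are assumptions. The rows reuse the values from Doug's example table.

```python
import sqlite3

# In-memory stand-in for an ObsTAP service (SQLite rather than ADQL over TAP;
# the reduced column set and the obs_collection values are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obscore (obs_id TEXT, obs_collection TEXT, dataproduct_type TEXT,"
    " dataproduct_subtype TEXT, calib_level INTEGER, access_format TEXT)"
)
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?, ?, ?, ?, ?)",
    [  # rows taken from Doug's example table
        ("123", "Chandra", "event", "chandra.hrc.pkg", 1, "application/x-tar-gzip"),
        ("123", "Chandra", "image", "chandra.hrc.refimage", 2, "image/fits"),
        ("123", "Chandra", "image", "chandra.hrc.preview", 2, "image/jpeg"),
        ("345", "ROSAT", "event", "rosat.foo.pkg", 1, "application/x-tar-gzip"),
    ],
)


def subtypes(collection):
    """Step 1: generic query listing the fixed subtype strings a collection uses."""
    rows = conn.execute(
        "SELECT DISTINCT dataproduct_subtype FROM obscore"
        " WHERE obs_collection = ? ORDER BY dataproduct_subtype",
        (collection,),
    )
    return [row[0] for row in rows]


def products(subtype):
    """Step 2: specific query selecting every product with one exact subtype."""
    rows = conn.execute(
        "SELECT obs_id, access_format FROM obscore"
        " WHERE dataproduct_subtype = ? ORDER BY obs_id",
        (subtype,),
    )
    return rows.fetchall()


print(subtypes("Chandra"))
print(products("chandra.hrc.refimage"))
```

The exact-match `WHERE` clause in step 2 is the reason the strings must be fixed. A content-describing value such as `package:event,image` would only be found by a client that already knew that exact string, which is the selection problem Doug points out.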
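The extended access package mentioned in this thread is an access record that points not only at a whole file but also at the part of it to retrieve. For the tar case, that idea can be illustrated with Python's standard `tarfile` module. This is a minimal sketch assuming a record that carries a hypothetical `part` field holding a member path. The field names, the helper functions and the example archive are illustrative assumptions, not the Characterisation 2 schema.

```python
import io
import tarfile


def make_demo_archive(files):
    """Build an in-memory gzipped tar archive from a {path: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def extract_part(archive_bytes, part_path):
    """Return the bytes of the member at part_path inside a tar archive.

    Raises KeyError if the archive has no member with that path.
    """
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        member = tar.getmember(part_path)
        handle = tar.extractfile(member)
        if handle is None:  # directories, links, etc. carry no file data
            raise ValueError(f"{part_path!r} is not a regular file")
        return handle.read()


# Hypothetical access record: "reference" locates the whole package and
# "part" locates one member inside it (both field names are assumptions).
access = {
    "reference": "http://example.org/12345/package.tar.gz",
    "format": "application/x-tar-gzip",
    "part": "12345/image.fits",
}

archive = make_demo_archive({
    "12345/events.fits": b"event data placeholder",
    "12345/image.fits": b"image data placeholder",
})
print(extract_part(archive, access["part"]))  # b'image data placeholder'
```

Fetching `access["reference"]` over the network is left out: the sketch starts from archive bytes already in hand. The same record shape could in principle carry an extension number for a MEF file or a table name for a VOTable, but those cases are not shown here.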