From msdemlei at ari.uni-heidelberg.de Mon Jun 6 03:28:07 2011
From: msdemlei at ari.uni-heidelberg.de (Markus Demleitner)
Date: Mon, 6 Jun 2011 12:28:07 +0200
Subject: Use Cases for new Registry search interface
Message-ID: <20110606102807.GG22886@ari.uni-heidelberg.de>

Dear RWGers,

On Tue, May 24, 2011 at 10:40:38AM +0100, Paul Harrison wrote:
>
> On 2011-05-23, at 19:05, Ray Plante wrote:
> > At Naples during our future directions discussion, we talked about a new
> > Registry search interface, and there was a call to collect use cases and
> > requirements. This includes collecting the kinds of queries we would like
> > to be able to make and process easily.
> that page seems like a pretty good summary to me.
>
> One complaint that we have often heard with the simple "keyword"
> style searches is that different registries ended up giving
> different answers - I think that this is probably a consequence of

Coming back from vacation, I've added some comments to the wiki page
and added what I think Paul's points are in two places -- so, Paul, if
I misrepresented you, sorry.

Executive summary of my changes:

I'd really like to see a TAP/ADQL interface to searchable registries,
but we shouldn't forget that there is OAI-PMH and VOSI. We may want
some IR/Text-Searching user defined function if we go ahead and use
ADQL as the query language. And, we really should keep XML
(unfortunately XSD-defined) as the primary format of VO resource
records, if only for interoperability with other bibliographic
engines.
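To illustrate what a TAP/ADQL interface to a searchable registry could look like from the client side, here is a minimal Python sketch that builds a synchronous TAP query URL using the TAP 1.0 request parameters (REQUEST, LANG, QUERY, FORMAT, MAXREC). The endpoint http://registry.example.org/tap and the table registry.resource with its ivoid and title columns are hypothetical placeholders, not an agreed registry schema.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def tap_sync_url(service_base: str, adql: str, fmt: str = "votable",
                 maxrec: Optional[int] = None) -> str:
    """Build a synchronous TAP query URL from TAP 1.0 request parameters."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql, "FORMAT": fmt}
    if maxrec is not None:
        params["MAXREC"] = str(maxrec)
    return service_base.rstrip("/") + "/sync?" + urlencode(params)


# Hypothetical endpoint, table and column names. Only the columns named in
# SELECT are returned, so a client can ask for just the fields it needs.
query = ("SELECT ivoid, title FROM registry.resource "
         "WHERE title LIKE '%quasar%'")
url = tap_sync_url("http://registry.example.org/tap", query, maxrec=100)

# Decode the query string again to show what the service would receive.
for key, values in parse_qs(urlparse(url).query).items():
    print(key, "=", values[0])
```

Because the transport is plain HTTP GET, any existing TAP client could send such a request unchanged; only the registry-side tables would be new.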
Cheers,

Markus

From paul.harrison at manchester.ac.uk Mon Jun 6 04:51:35 2011
From: paul.harrison at manchester.ac.uk (Paul Harrison)
Date: Mon, 6 Jun 2011 12:51:35 +0100
Subject: Use Cases for new Registry search interface
In-Reply-To: <20110606102807.GG22886@ari.uni-heidelberg.de>
References: <20110606102807.GG22886@ari.uni-heidelberg.de>

On 2011-06-06, at 11:28, Markus Demleitner wrote:

> Coming back from vacation, I've added some comments to the wiki page
> and added what I think Paul's points are in two places -- so, Paul, if
> I misrepresented you, sorry.

I was serious about providing a formal definition of the registry data model in a form other than the XML schema - but I tend to agree that these schemas should define the interchange format, at least so that that part of the registry infrastructure does not change. Having only the XML schema as the definition was actually at the heart of the different interpretations that various groups historically placed on these schemas, and the reason why there was no universally accepted searchable data model definition.

I have even taken a step on the path to creating an independent definition of the model using the vo-urp (http://code.google.com/p/vo-urp/) formalism developed by Gerard Lemson et al. I have semi-automatically reverse-engineered the existing schema to produce the MagicDraw XMI UML model

http://code.google.com/p/vo-urp/source/browse/trunk/input/registry/registry_DM.xml

This model is fairly comprehensive, with one exception: XML types that are derived from primitives such as xs:integer are not yet represented. This needs more work, but I am currently busy with my "day job".

The issues going forward on this are

1. The relational table model that will naturally come out of this will result in many joins being needed for fairly "simple" queries.
2. The XML representation of the model from vo-urp will not be exactly the same as the original schema.
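Issue 1 can be made concrete with a toy calculation: if every element that may occur an unbounded number of times is mapped to its own table, then each unbounded step on the path from a resource record to the wanted value costs one join. The Python sketch below uses illustrative cardinalities for the path to a service's access URL; they are assumptions for the example, not read from the actual schema files.

```python
from typing import List, Tuple

# One step per element on the way down from the resource record:
# (element name, maxOccurs). These cardinalities are illustrative
# assumptions, not a verbatim copy of the VOResource schema.
Path = List[Tuple[str, str]]


def joins_needed(path: Path) -> int:
    """Count the joins implied when every unbounded element gets its own table."""
    return sum(1 for _name, max_occurs in path if max_occurs == "unbounded")


# Finding a service's access URL: capability -> interface -> accessURL.
access_url_path: Path = [
    ("capability", "unbounded"),
    ("interface", "unbounded"),
    ("accessURL", "unbounded"),
]
# Capping accessURL at one occurrence lets it live in the interface table.
capped_path: Path = access_url_path[:2] + [("accessURL", "1")]

print(joins_needed(access_url_path))  # 3
print(joins_needed(capped_path))      # 2
```

The same counting argument is why reducing cardinalities, rather than only adding views, shortens the join chains for "simple" queries.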
I think that the first of these issues can probably be somewhat mitigated by some manual tweaking of the model (Gerard has experience of this from the theory work), and perhaps by providing some standardized views and ADQL functions. The second can probably be dealt with by some standardized XSL transformations. Anyway, I think that it is worth carrying on with this approach to see if it can produce an acceptable model with its associated relational and XML representations.

Paul.

From pierre.lesidaner at obspm.fr Mon Jun 6 06:51:17 2011
From: pierre.lesidaner at obspm.fr (Pierre Le Sidaner)
Date: Mon, 06 Jun 2011 15:51:17 +0200
Subject: Use Cases for new Registry search interface
In-Reply-To: <20110606102807.GG22886@ari.uni-heidelberg.de>
References: <20110606102807.GG22886@ari.uni-heidelberg.de>
Message-ID: <4DECDB55.7020101@obspm.fr>

On 06/06/2011 12:28, Markus Demleitner wrote:
> Dear RWGers,
>
> On Tue, May 24, 2011 at 10:40:38AM +0100, Paul Harrison wrote:
>> On 2011-05-23, at 19:05, Ray Plante wrote:
>>> At Naples during our future directions discussion, we talked about a new
>>> Registry search interface, and there was a call to collect use cases and
>>> requirements. This includes collecting the kinds of queries we would like
>>> to be able to make and process easily.
>> that page seems like a pretty good summary to me.
>>
>> One complaint that we have often heard with the simple "keyword"
>> style searches is that different registries ended up giving
>> different answers - I think that this is probably a consequence of
> Coming back from vacation, I've added some comments to the wiki page
> and added what I think Paul's points are in two places -- so, Paul, if
> I misrepresented you, sorry.
>
> Executive summary of my changes:
>
> I'd really like to see a TAP/ADQL interface to searchable registries,
> but we shouldn't forget that there is OAI-PMH and VOSI.
> We may want some IR/Text-Searching user defined function if we go ahead and use
> ADQL as the query language. And, we really should keep XML
> (unfortunately XSD-defined) as the primary format of VO resource
> records, if only for interoperability with other bibliographic
> engines.
>
> Cheers,
>
> Markus
>

I don't really see the point: why TAP/ADQL for the registry?
There are currently two methods: search and keyword search. From my point of view, searching in the registry has nothing to do with searching in a data service. The complexity of ADQL only makes the job much more complex to implement.

We should really think of a search method in the registry based on very limited SQL (LIKE, WHERE, AND, OR) inside search and keyword search.

Maintaining the XML schema will allow XML databases or mixed ones (PostgreSQL including XPath).

Pushing TAP into that will increase the implementation complexity much more, for no really useful feature.

--
-------------------------------------------------------------------------
Pierre Le Sidaner
Observatoire de Paris
Division Informatique de l'Observatoire
Observatoire Virtuel
01 40 51 20 89
61, avenue de l'Observatoire
75014 Paris
mailto:pierre.lesidaner at obspm.fr
http://vo-web.obspm.fr
--------------------------------------------------------------------------

From rplante at ncsa.uiuc.edu Mon Jun 6 07:14:10 2011
From: rplante at ncsa.uiuc.edu (Ray Plante)
Date: Mon, 6 Jun 2011 09:14:10 -0500 (CDT)
Subject: Use Cases for new Registry search interface
In-Reply-To: <4DECDB55.7020101@obspm.fr>
References: <20110606102807.GG22886@ari.uni-heidelberg.de> <4DECDB55.7020101@obspm.fr>

Hi folks,

On Mon, 6 Jun 2011, Pierre Le Sidaner wrote:

> I don't really see the point: why TAP/ADQL for the registry?

Personally, I have not formed a strong opinion on this yet, but I do see a motivation. In a VO where both server and client libraries are ubiquitous, it's not unreasonable to want a TAP interface to the registry.
For example, Topcat recently added support for pulling data from TAP services. If the registry had a TAP service, it would be trivial for Topcat to also use it to find TAP services.

It's also worth pointing out that a TAP interface would satisfy a key requirement requested by app developers: the ability to request a few selected pieces of information, that is, what the SELECT clause gives you.

I will note that in my initial write-up, I did not explicitly mention TAP as a requirement. Rather, I would prefer that this choice flow either directly from requirements or from requests from app developers.

cheers,
Ray

From rplante at ncsa.uiuc.edu Mon Jun 6 07:30:59 2011
From: rplante at ncsa.uiuc.edu (Ray Plante)
Date: Mon, 6 Jun 2011 09:30:59 -0500 (CDT)
Subject: Use Cases for new Registry search interface
References: <20110606102807.GG22886@ari.uni-heidelberg.de>

On Mon, 6 Jun 2011, Paul Harrison wrote:

> The issues going forward on this are
> 1. The relational table model that will naturally come out of this will
> result in many joins being needed for fairly "simple" queries.

This is the main difficulty I see with deriving tables from the model as it is currently defined. I would much rather see if we can come up with a relational model that is, say, 3 tables big (plus maybe some pre-joined views). This may well come at the expense of certain complex queries (e.g. against relationships or contact info in a non-information-losing way) that could then no longer be done. This simplified model should be considered a TAP data model like ObsCore -- views against some possibly more complex or extended structure underneath. Nevertheless, it would be good to examine what comes out of vo-urp.

> 2. The XML representation of the model from vo-urp will not be exactly
> the same as the original schema.

I don't see a need for changing the XML representation. Defining a new search interface will be disruptive enough.
Adopting a new XML representation would essentially mean throwing away most of the infrastructure we have built and have been using successfully.

cheers,
Ray

From msdemlei at ari.uni-heidelberg.de Mon Jun 6 23:51:31 2011
From: msdemlei at ari.uni-heidelberg.de (Markus Demleitner)
Date: Tue, 7 Jun 2011 08:51:31 +0200
Subject: Use Cases for new Registry search interface
In-Reply-To: <4DECDB55.7020101@obspm.fr>
References: <20110606102807.GG22886@ari.uni-heidelberg.de> <4DECDB55.7020101@obspm.fr>
Message-ID: <20110607065131.GD4844@ari.uni-heidelberg.de>

Dear RWG,

On Mon, Jun 06, 2011 at 03:51:17PM +0200, Pierre Le Sidaner wrote:
> On 06/06/2011 12:28, Markus Demleitner wrote:
> >I'd really like to see a TAP/ADQL interface to searchable registries,
> >but we shouldn't forget that there is OAI-PMH and VOSI. We may want
> I don't really see the point: why TAP/ADQL for the registry?

As Ray already pointed out, mainly to reuse a technology we've already developed and that is up to the task.

> We should really think of a search method in the registry based on very limited SQL (LIKE, WHERE, AND, OR) inside search and keyword search.

-- but that would mean defining *another* language *someone* will have to implement; on the other hand, people running searchable registries probably have a TAP service somewhere anyway.

> Maintaining the XML schema will allow XML databases or mixed ones (PostgreSQL including XPath).

That's another point -- I could very well see that we might not want a (purely) relational registry representation at all but rather some XML-based technology. Paul has given some reasons to do so. My take is that if Ray's "three-table-or-so" simplified registry model proves unfeasible, I'd definitely advocate looking into X again. Even then, I'd suggest maintaining TAP as "transport protocol" and just plugging in a different language than ADQL.
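A rough server-side sketch of "TAP as transport protocol with a pluggable language": the request parameters stay those of TAP, and only the LANG value selects the query back-end. The handler functions and the XPATH label are hypothetical placeholders, not anything TAP itself defines.

```python
from typing import Callable, Dict, Mapping


# Hypothetical back-ends; a real service would execute the query and
# return a VOTable rather than a string.
def run_adql(query: str) -> str:
    return "ADQL back-end: " + query


def run_xpath(query: str) -> str:
    return "XML back-end: " + query


# The transport (parameters, endpoints, result handling) stays TAP;
# only the LANG value picks the query language.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "ADQL": run_adql,
    "XPATH": run_xpath,  # hypothetical language label, not defined by TAP
}


def handle_sync(params: Mapping[str, str]) -> str:
    """Dispatch a TAP-style doQuery request on its LANG parameter."""
    if params.get("REQUEST") != "doQuery":
        raise ValueError("unsupported REQUEST")
    # Accept a versioned value such as "ADQL-2.0" by looking only at the
    # part before the first hyphen.
    lang = params.get("LANG", "").split("-", 1)[0].upper()
    handler = HANDLERS.get(lang)
    if handler is None:
        raise ValueError("unsupported LANG: " + params.get("LANG", ""))
    return handler(params["QUERY"])


print(handle_sync({"REQUEST": "doQuery", "LANG": "ADQL-2.0",
                   "QUERY": "SELECT TOP 5 ivoid FROM registry.resource"}))
```

Everything outside the dispatch table (parameter parsing, output formats, async jobs) would be shared, which is the implementation saving the paragraph above argues for.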
Defining some ad-hoc transport protocol might seem attractive at first, but in the end we'll end up solving quite a few problems that TAP already addresses (e.g., response format selection, paging[1], etc.).

But *if* we're going relational, let's use what we've already defined for accessing relational data (i.e., TAP with its default language ADQL), both for keeping our implementation effort low and for making the most of our users' efforts to learn our query language.

Cheers,

Markus

[1] OK, TAP doesn't do paging, but since you can access async results by HTTP and good servers can do partial transfers, you can, if you insist, get about the same effect.

From paul.harrison at manchester.ac.uk Thu Jun 9 03:36:48 2011
From: paul.harrison at manchester.ac.uk (Paul Harrison)
Date: Thu, 9 Jun 2011 11:36:48 +0100
Subject: VOResource Schema changes to easy relational mappings...

Hi

In order to make a better VOResource schema for mapping to a relational model there are a number of changes that could be made - every time the cardinality of an element is unbounded a table join strictly needs to be made, so it would be useful if we could decrease the cardinalities of some of the elements. I have identified a number of cases where this could be done at almost no cost.

1. We allow multiple Relationship elements, which can each have multiple RelatedResource elements within them. The purpose of this is to allow the related resources to be grouped by relationshipType; however, the same effect can be obtained by allowing only a single RelatedResource element within the Relationship element and then repeating Relationship elements. In the AstroGrid registry there are 1208 resources that use the relationship element, and of those only 2 use multiple relatedResource elements within a single relationship:

ivo://org.gavo.dc/__system__/tap/run
ivo://org.gavo.dc/lensunion/q/im

2. AccessURL - I remember the argument for having multiple accessURLs - i.e.
to be able to specify fallback URLs - however, there are no resources in the registry that make use of this. Perhaps this is outside the scope of what the registry should be trying to do - fallback on a single address should be done via network-layer techniques.

3. curation/date - do we need this at all? The resource record itself has attributes that seem to cover the needs (unless an audit trail of all the update dates is required). It is used by a lot of records (9888), but only

ivo://ivoa.net/rofr
ivo://CDS.VizieR
ivo://CDS.VizieR/registry

use it more than once. If it were removed, it would require automated editing of many records, but if it were given a cardinality of 1, only a handful of records would need hand editing...

Now for the even more controversial aspect (for the purists) - I suggest that we make a retrospective change to VOResource 1.0 without making a change to the namespace - this would be much less disruptive: a handful of resource records needing to be edited, compared with potentially many software systems if the namespace is changed. (Admittedly, depending on whether the software uses the live schema or not, it will need changes so that it does not allow the newly forbidden multiple cardinalities, but these can be done gradually, and the whole registry does not become instantly invalid/outdated.)

I might find more as I look further...

Paul.

From msdemlei at ari.uni-heidelberg.de Thu Jun 9 05:23:11 2011
From: msdemlei at ari.uni-heidelberg.de (Markus Demleitner)
Date: Thu, 9 Jun 2011 14:23:11 +0200
Subject: VOResource Schema changes to easy relational mappings...
Message-ID: <20110609122311.GA25360@ari.uni-heidelberg.de>

On Thu, Jun 09, 2011 at 11:36:48AM +0100, Paul Harrison wrote:
> In order to make a better VOResource schema for mapping to a
> relational model there are a number of changes that could be made -
> every time the cardinality of an element is unbounded a table join
[...]
> In the AstroGrid registry there are 1208 resources that use the
> relationship element and of those only 2 use multiple
> relatedResource elements within a single relationship
>
> ivo://org.gavo.dc/__system__/tap/run
> ivo://org.gavo.dc/lensunion/q/im

Since both of these are mine: I'd be fine with flattening out the relationships and could fix the RRs as soon as we decide to do this. Actually, the grouping of relationships of the same type was a bit painful in implementation, so simplifying this will actually help implementors IMHO.

> 2. AccessURL - I remember the argument for having multiple
> accessURLs - i.e. to be able to specify fallback URLs - however
> there are no resources in the registry that make use of this -
> perhaps this is outside the scope of what the registry should be
> trying to do - fallback on a single address should be done via
> network layer techniques.

Agreed.

> Now for the even more controversial aspect (for the purists) - I
> suggest that we make a retrospective change to VOResource 1.0
> without making a change to the namespace - this would be much less

I'm all for being a bit naughty here. The alternative would be a long process with uncertain results. As long as we only touch unused or little-used features, I think "retroactive changes" (great term!) are far preferable.
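The flattening discussed here (one relatedResource per relationship, repeating relationship elements as needed) is a mechanical rewrite of existing records. A minimal Python sketch with xml.etree.ElementTree, assuming a simplified, namespace-free content fragment and made-up example.org identifiers; real records would need their namespaces handled.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment; identifiers are made up.
SAMPLE = """<content>
  <relationship>
    <relationshipType>served-by</relationshipType>
    <relatedResource ivo-id="ivo://example.org/tap">Example TAP</relatedResource>
    <relatedResource ivo-id="ivo://example.org/ssa">Example SSA</relatedResource>
  </relationship>
</content>"""


def flatten_relationships(content: ET.Element) -> None:
    """Rewrite each multi-target relationship as one relationship per relatedResource."""
    for rel in list(content.findall("relationship")):
        related = rel.findall("relatedResource")
        if len(related) <= 1:
            continue  # already in the flattened form
        rel_type = rel.findtext("relationshipType")
        index = list(content).index(rel)
        content.remove(rel)
        # Insert the replacements where the grouped element used to be.
        for offset, rr in enumerate(related):
            new_rel = ET.Element("relationship")
            ET.SubElement(new_rel, "relationshipType").text = rel_type
            new_rel.append(rr)
            content.insert(index + offset, new_rel)


root = ET.fromstring(SAMPLE)
flatten_relationships(root)
for rel in root.findall("relationship"):
    print(rel.findtext("relationshipType"), rel.find("relatedResource").get("ivo-id"))
```

After this rewrite, each relationship maps to exactly one row of a (resource, type, related id) table, with no nested grouping left to join through.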
Cheers,

Markus

From paul.harrison at manchester.ac.uk Mon Jun 13 07:41:39 2011
From: paul.harrison at manchester.ac.uk (Paul Harrison)
Date: Mon, 13 Jun 2011 15:41:39 +0100
Subject: Relational Registry Resource DM

Hi,

I have made some progress on creating a relational model from the VOResource schema on

http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/RelationalRegistryDM

This model had the smallest number of tables and potential joins as its uppermost design aim, and I believe that it does achieve this aim if all the current cardinalities are to be supported*. I can see that there might be an argument for increasing this number by having explicit join tables, so that many-to-many relationships can be supported in order to be able to normalise the model properly, e.g. between Resource and Contact.

Any comments before I have a go at adding VODataService.xsd?

Paul.

*see http://www.ivoa.net/pipermail/registry/2011-June/004799.html
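For comparison with the roughly three-table model Ray suggested earlier in the thread, here is a hypothetical layout in SQLite via Python's sqlite3 module: one row per resource, one row per capability/interface with a single access URL, and one row per flattened relationship. Table and column names are invented for illustration and are not those of the RelationalRegistryDM wiki page; the point is only that "find all TAP services and their access URLs" then needs a single join.

```python
import sqlite3

# Hypothetical, deliberately small layout (names invented for illustration).
SCHEMA = """
CREATE TABLE resource (
    ivoid    TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    res_type TEXT
);
CREATE TABLE capability (
    ivoid          TEXT REFERENCES resource(ivoid),
    standard_id    TEXT,
    interface_type TEXT,
    access_url     TEXT   -- single accessURL per row
);
CREATE TABLE relationship (
    ivoid             TEXT REFERENCES resource(ivoid),
    relationship_type TEXT,
    related_ivoid     TEXT  -- one related resource per row
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO resource VALUES (?, ?, ?)", [
    ("ivo://example.org/tap", "Example TAP service", "vs:CatalogService"),
    ("ivo://example.org/cone", "Example cone search", "vs:CatalogService"),
])
conn.executemany("INSERT INTO capability VALUES (?, ?, ?, ?)", [
    ("ivo://example.org/tap", "ivo://ivoa.net/std/TAP",
     "vs:ParamHTTP", "http://example.org/tap"),
    ("ivo://example.org/cone", "ivo://ivoa.net/std/ConeSearch",
     "vs:ParamHTTP", "http://example.org/cone"),
])

# "Find all TAP services and where to reach them" needs just one join.
rows = conn.execute("""
    SELECT r.ivoid, r.title, c.access_url
    FROM resource AS r JOIN capability AS c ON r.ivoid = c.ivoid
    WHERE c.standard_id = 'ivo://ivoa.net/std/TAP'
""").fetchall()
print(rows)
```

The trade-off Ray mentions shows up directly: contacts, or several access URLs per interface, have no place in this layout unless further tables (and joins) are added.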