RWP04: Registry Replication

Ray Plante rplante at
Tue Apr 29 23:27:02 PDT 2003


Nice work on these requirements, Keith.  (I've had my head in prep for a 
conference, so sorry if I'm still catching up on the discussion.)

> 6. Flexible
>       * Registries should support rich data enquiries.
>       * A registry should also support "data harvesting" by other
>         registries. This allows registry data to be copied and expanded
>         as required as well as supporting data redundancy through data
>         replication.

It is not necessary that a registry support both functions.  There are 
good reasons why one might not want to support both a searchable interface 
and a harvestable one.  See below.

> 1. Distributed
>       * It probably doesn't make sense to store all of the registry data
>         in every registry. This implies a distributed model of registry
>         searching and maintenance.

Given the extensive discussion about possible designs for replication, I 
think we need to spell out *why* we think we need a distributed registry.  
What does it do for us?  Here are some possible reasons:

  a.  Redundancy:  guarantees that there is always a registry available  
         to applications looking for resources.

  b.  Specialization:  a registry can specialize in certain types of 
         resources, topics, frequency bands, or some idiosyncratic notion 
         of data quality.  

  c.  Scalability:  larger registries might be harder to maintain; 
         distributed registries might make better use of bandwidth; ...

  d.  Control and Maintenance:  it might be easier for a large-ish data
         provider to maintain/update its resource records if it ran its 
         own registry.

Now, personally, I do *not* think (c) is a good reason.  In my mind, the 
point of the registry is that users can go to one place to find other 
resources.  The data in a registry is meant to be light-weight.  So I 
don't see the need to go to multiple registries to find everything.

(d) is interesting but it doesn't require that the provider-run registry 
support search queries; the metadata just needs to be harvestable.  
In the OAI model, replication of the data is handled separately from 
search queries.  I think this is a good idea.
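
To make that separation concrete, here is a minimal sketch in Python.  It 
is only an illustration under my own assumptions: the class names, the 
record fields, and the list_records/harvest/search methods are all 
hypothetical and are not the OAI protocol's actual interface.  The point 
is just that the provider-run registry exposes harvesting and nothing 
else, while the searchable registry pulls records in and answers queries.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ResourceRecord:
    # Hypothetical minimal record; a real registry would carry far more metadata.
    identifier: str
    title: str
    updated: date


class ProviderRegistry:
    """Harvest-only registry run by a data provider: no search interface."""

    def __init__(self, records: Iterable[ResourceRecord]) -> None:
        self._records = list(records)

    def list_records(self, since: Optional[date] = None) -> List[ResourceRecord]:
        # Return every record, or only those updated on or after `since`.
        return [r for r in self._records if since is None or r.updated >= since]


class SearchableRegistry:
    """Registry that harvests from providers and answers search queries."""

    def __init__(self) -> None:
        self._index: Dict[str, ResourceRecord] = {}

    def harvest(self, source: ProviderRegistry, since: Optional[date] = None) -> int:
        # Replication step: copy records from the source, keyed by identifier.
        count = 0
        for record in source.list_records(since):
            self._index[record.identifier] = record
            count += 1
        return count

    def search(self, keyword: str) -> List[ResourceRecord]:
        # Query step: a naive case-insensitive match on the title.
        needle = keyword.lower()
        return [r for r in self._index.values() if needle in r.title.lower()]
```

An incremental harvest would pass the date of the previous harvest as 
`since`, so only new or changed records cross the wire.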

(a) and (b) sound pretty good to me, though.  Given this, I think the 
full-limited-private model seems pretty good for searchable registries.  
The "full" type claims that it tries to hold all (harvestable) resource 
metadata.  (d) corresponds to the "private" registry.  

> 2. Self maintaining
>       * The registries should place the absolute minimum burden upon the
>         data centres where they run. On-going maintenance costs are
>         mostly those relating to support personnel, so the less human
>         intervention the better.

Note that the overhead for searchable registries is going to be larger.  A 
more complex querying interface is necessary, and since it plays a more 
active role in applications, it needs to be more reliable.  On the other 
hand, if your registry is just meant for harvesting by searchable 
registries, then the demands are lighter.  For small registries, it may 
mean, for example, that running a database is not necessary.
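
As a rough sketch of what "no database" could look like (again my own 
assumptions, not a proposed design): each resource record is one JSON 
file in a directory, with an "updated" field holding an ISO date string 
(YYYY-MM-DD), and harvesting is just reading the files.  The function 
name, the file layout, and the field names are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional


def list_records(record_dir: str, since: Optional[str] = None) -> List[dict]:
    """Return the records stored as *.json files under record_dir.

    `since` is an ISO date string (YYYY-MM-DD); ISO dates sort correctly
    as plain strings, so a string comparison is enough here.
    """
    records = []
    for path in sorted(Path(record_dir).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        if since is None or record["updated"] >= since:
            records.append(record)
    return records
```

Since the records are ordinary files, the curator can edit them by hand 
and back them up like any other data.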

I suspect that the need for low cost is going to be greatest for providers 
who run a registry primarily for just getting their metadata out there.  
Providing a searchable interface for applications is less likely to be 
important (or useful).

>       * Registries should where possible manage their own data integrity
>         (backups etc)

Does this mean that a registry system has built into it the ability, say, 
to write its contents to tape?  I think backup should not be part of the 
design.  A lower-cost solution is to let the curator use whatever 
backup method they use for any other kind of data they maintain.  We do want 
to watch for requirements or design choices that make common backup 
methods difficult.  (Maybe I'm reading this requirement wrong.)

>       * Self maintenance almost certainly implies a peer network of
>         registries.

(Not sure why.)

