RWP04: Registry Replication
rplante at poplar.ncsa.uiuc.edu
Tue Apr 29 23:27:02 PDT 2003
Nice work on these requirements, Keith. (I've had my head in prep for a
conference, so sorry if I'm still catching up on the discussion.)
> 6. Flexible
> * Registries should support rich data enquiries.
> * A registry should also support "data harvesting" by other
> registries. This allows registry data to be copied and expanded
> as required as well as supporting data redundancy through data
It is not necessary that a registry support both functions. There are
good reasons why one might not want to support both a searchable interface
and a harvestable one. See below.
> 1. Distributed
> * It probably doesn't make sense to store all of the registry data
> in every registry. This implies a distributed model of registry
> searching and maintenance.
Given the extensive discussion about possible designs for replication, I
think we need to spell out *why* we think we need a distributed registry.
What does it do for us? Here are some possible reasons:
a. Redundancy: guarantees that there is always a registry available
to applications looking for resources.
b. Specialization: a registry can specialize in certain types of
resources, topics, frequency bands, or some idiosyncratic notion
of data quality.
c. Scalability: larger registries might be harder to maintain;
distributed registries might make better use of bandwidth; ...
d. Control and Maintenance: it might be easier for a large-ish data
provider to maintain/update its resource records if it ran its
own registry.
Now, personally, I do *not* think (c) is a good reason. In my mind, the
point of the registry is that users can go to one place to find other
resources. The data in a registry is meant to be light-weight. So I
don't see the need to go to multiple registries to find everything.
(d) is interesting but it doesn't require that the provider-run registry
support search queries; the metadata just needs to be harvestable.
In the OAI model, replication of the data is handled separately from
search queries. I think this is a good idea.
(a) and (b) sound pretty good to me, though. Given this, I think the
full-limited-private model seems pretty good for searchable registries.
The "full" type claims that it tries to hold all (harvestable) resource
metadata. (d) corresponds to the "private" registry.
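To make the split between harvesting and searching concrete, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration: the names Record, HarvestableRegistry, SearchableRegistry,
harvest, pull_from and search are hypothetical and come from neither OAI
nor any registry specification. The "private" registry only hands out its
records; the "full" registry copies them and is the only one that answers
queries.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical light-weight resource record."""
    identifier: str
    title: str
    updated: datetime


class HarvestableRegistry:
    """Hypothetical 'private' registry: harvestable, not searchable."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records: Dict[str, Record] = {r.identifier: r for r in records}

    def harvest(self, since: Optional[datetime] = None) -> List[Record]:
        # Return every record, or only those updated after `since`.
        return [r for r in self._records.values()
                if since is None or r.updated > since]


class SearchableRegistry:
    """Hypothetical 'full' registry: copies records, then answers searches."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def pull_from(self, source: HarvestableRegistry,
                  since: Optional[datetime] = None) -> int:
        # Replication step: copy (or overwrite) harvested records locally.
        harvested = source.harvest(since)
        for record in harvested:
            self._records[record.identifier] = record
        return len(harvested)

    def search(self, keyword: str) -> List[Record]:
        # Query step: a naive case-insensitive match on the title only.
        needle = keyword.lower()
        hits = [r for r in self._records.values()
                if needle in r.title.lower()]
        return sorted(hits, key=lambda r: r.identifier)
```

A provider would construct a HarvestableRegistry over its own records, and
a full registry would call pull_from on it periodically, passing the time
of its last harvest as `since`. The provider never has to implement search.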
> 2. Self maintaining
> * The registries should place the absolute minimum burden upon the
> data centres where they run. On-going maintenance costs are
> mostly those relating to support personnel, so the less human
> intervention the better.
Note that the overhead for searchable registries is going to be larger. A
more complex querying interface is necessary, and since a searchable
registry plays a more active role in applications, it needs to be more
reliable. On the other
hand, if your registry is just meant for harvesting by searchable
registries, then the demands are lighter. For small registries, it may
mean, for example, that running a database is not necessary.
I suspect that the need for low cost is going to be greatest for providers
who run a registry primarily to get their metadata out there. For them,
providing a searchable interface for applications is less likely to be
important (or useful).
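As an illustration of how light a harvest-only registry could be, here is
a rough sketch in Python that serves records straight from a directory of
files, with no database. The function name harvest_directory, the one
file per record layout, the .xml extension, and the use of file
modification times to decide what is new are all assumptions of mine, not
part of any proposed design.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def harvest_directory(directory: str,
                      since: Optional[float] = None) -> List[Tuple[str, str]]:
    """Return (record name, record text) for each *.xml file in `directory`.

    `since` is a timestamp in seconds since the epoch; when given, only
    files modified after it are returned, which is enough for a harvester
    to pick up just the changes since its last visit.
    """
    results: List[Tuple[str, str]] = []
    for path in sorted(Path(directory).glob("*.xml")):
        if since is None or path.stat().st_mtime > since:
            results.append((path.stem, path.read_text(encoding="utf-8")))
    return results
```

A harvesting registry would call this once with no `since` to get
everything, then again later with the time of its previous harvest.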
> * Registries should where possible manage their own data integrity
> (backups etc)
Does this mean that a registry system has built into it the ability, say,
to write its contents to tape? I think backup should not be part of the
design. A lower-cost solution is to let the curator use whatever
backup method they use for any other kind of data they maintain. We do want
to watch for requirements or design choices that make common backup
methods difficult. (Maybe I'm reading this requirement wrong.)
> * Self maintenance almost certainly implies a peer network of
(Not sure why.)