How should the Registry handle mirrors?

Clive Page cgp at star.le.ac.uk
Thu Jun 24 04:26:37 PDT 2004


Many important astronomical data resources are duplicated: for example,
there are 9 copies of the VizieR service, and two or more copies of quite
a number of other important data resources.  It seems to me quite
important that the VO should be able to handle them sensibly.  This
obviously involves the Registry, yet I could not find any reference to
mirrors in any registry-related documents that I scanned.

My first thoughts on functional requirements are set out here. The two
principal benefits that the user gets from having mirrored copies are (a)
increased availability in the face of services and links which are not
guaranteed to be on-line 24/7, and (b) increased performance by choosing
the copy with the best network links or smallest existing workload.  I can
see, however, that achieving these will be difficult in practice.

I think that the functionality of the Registry will need to depend on
whether it is being queried by a human or by a machine.

If a human sends a resource discovery query to the registry which finds
that two or more identical copies exist of the required resource, ideally
the registry should tell the user the best one to use.  Doing this in
practice will be very difficult, as it will depend on what subsequent
operations the user plans to carry out.  If it is a trivial query
returning a large volume of data, then the network link speed should be
given a large weight in making the decision, whereas if they want to
perform a substantial data mining query, then the current workload of the
various servers may be the determining factor.  My guess is that most
human users would like to know of the existence of all available mirrors
for the resource, so they can take the decision as to which one to use.
So at the most basic level the Registry should simply list them all. This
is, after all, what generally happens at present, e.g. if you access the
VizieR home page.
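
To make the weighting idea above concrete, here is a minimal sketch in
Python.  Everything in it is an assumption for illustration only: the
Mirror record, its link_mbps and load fields, and the 0.8/0.2 weights are
hypothetical, and nothing like this exists in any Registry document I know
of.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    name: str         # hypothetical label for one copy of the resource
    link_mbps: float  # estimated network throughput from the user to this copy
    load: float       # current server workload, 0.0 (idle) to 1.0 (saturated)


def rank_mirrors(mirrors, bulk_transfer):
    """Return the mirrors sorted best-first.

    bulk_transfer=True  : trivial query, large result, so link speed dominates.
    bulk_transfer=False : heavy data-mining query, so server workload dominates.
    The weights are illustrative guesses, not measured values.
    """
    if not mirrors:
        return []
    w_link, w_load = (0.8, 0.2) if bulk_transfer else (0.2, 0.8)
    # Normalise link speed to the fastest mirror so both terms lie in [0, 1].
    max_link = max(m.link_mbps for m in mirrors) or 1.0

    def score(m):
        return w_link * (m.link_mbps / max_link) + w_load * (1.0 - m.load)

    return sorted(mirrors, key=score, reverse=True)
```

With a fast but busy copy and a slow but idle one, the first comes out on
top for a bulk download and the second for a heavy query, which is the
behaviour described above.  A human user would still be shown the whole
ranked list rather than just the winner.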

The VO will provide added value if it can give information on which of the
mirrors is currently on-line (e.g. by doing a few pings before returning
results), and even more value if it can indicate the nearest in network
terms (e.g. by doing a few traceroutes), but this is clearly something
that we don't need to provide initially.
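
A rough sketch of the "is it on-line" check, in Python.  Note that it
substitutes a TCP connection attempt for a true ICMP ping, since the latter
usually needs special privileges; the function names, the default port 80
and the two-second timeout are my own assumptions.

```python
import socket


def is_reachable(host, port=80, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This stands in for a ping: it shows that something is listening, not
    that the service behind it is answering queries correctly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def annotate_mirrors(hosts, probe=is_reachable):
    """Pair each mirror host with its current availability, keeping the order."""
    return [(host, probe(host)) for host in hosts]
```

The probe is passed in as a parameter so that a more thorough check (for
example a trivial test query) could be swapped in later without changing
the listing code.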

If the query comes from a machine, e.g. as part of a complex workflow, the
situation is quite different.  The Registry *has* to choose one copy (or
else it will just put off the decision to the workflow engine, which
doesn't solve the problem but merely passes the buck).  In this situation it
is highly desirable that it chooses a copy of the resource which is
actually working, so here some pings will really be needed.  Ideally the
Registry might try to maintain an up-to-date table of available resources,
but this is surely more advanced functionality than we can contemplate at
present.
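
For what it is worth, the "up-to-date table" could start out as something
as simple as the following Python sketch: a cache of probe results that are
re-checked once they are older than some limit.  The class name, the
five-minute default and the injected probe function are all hypothetical.

```python
import time


class MirrorTable:
    """Cache of mirror availability, refreshed lazily when an entry is stale."""

    def __init__(self, probe, max_age=300.0, clock=time.monotonic):
        self._probe = probe      # callable: url -> truthy if the copy responds
        self._max_age = max_age  # seconds before a cached result is re-checked
        self._clock = clock      # injectable so the cache can be tested
        self._status = {}        # url -> (is_up, time_checked)

    def is_up(self, url):
        now = self._clock()
        cached = self._status.get(url)
        if cached is None or now - cached[1] > self._max_age:
            cached = (bool(self._probe(url)), now)
            self._status[url] = cached
        return cached[0]

    def choose(self, urls):
        """Return the first working copy in preference order, or None."""
        for url in urls:
            if self.is_up(url):
                return url
        return None
```

A workflow engine would then call choose() with the list of mirrors and
get back exactly one working copy, or None if every copy is down, in which
case the failure is at least reported early.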

I wonder if there is a third case in which the user wants to compare
mirrors by sending the same query to two (or more) of them?  This might
not be something the system should encourage, but it might be a nice
function for data centre administrators to have available, so as to check
on the validity of their own mirroring facilities.
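
Such a consistency check might look roughly like this in Python.  The
run_query callable is a placeholder for whatever actually sends the query
to a given mirror, and comparing order-independent SHA-256 digests of the
rows is just one possible choice of mine.

```python
import hashlib


def result_digest(rows):
    """Order-independent digest of a query result (an iterable of row tuples)."""
    h = hashlib.sha256()
    for text in sorted(repr(tuple(row)) for row in rows):
        h.update(text.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def compare_mirrors(run_query, mirrors, query):
    """Send the same query to every mirror and group mirrors by identical results.

    run_query(mirror, query) is a hypothetical callable returning the rows.
    A single group in the output means all the copies agree.
    """
    groups = {}
    for mirror in mirrors:
        digest = result_digest(run_query(mirror, query))
        groups.setdefault(digest, []).append(mirror)
    return list(groups.values())
```

If more than one group comes back, at least one copy is out of step, and
the administrator knows which mirrors to look at.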


Now a question to Bob and the drafters of the Resource and Service
Metadata: how do mirrors actually get registered and identified?  Is it
sufficient for the Title element to be identical (so that all Vizier clones
are simply called "Vizier" or "VizieR")?  The Identifier (URI) will
obviously be different, but what about the ShortName, and the Publisher?

Given the importance of mirrors in astronomical data provision, it
would be nice if the documentation could give clear guidance on these
matters.  We are already starting to see prototype registries being set
up, and mistakes at this stage could be hard to unwind later on.

Apologies all round if these issues have already been explored in the
mailing lists, and I've just failed to notice them.

-- 
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH,  U.K.



