VOResource 1.1: Mirrors?

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Tue May 31 13:53:06 CEST 2016


Dear Registry, Dear DAL,

[I'd suggest followups should go to registry]

While revising VOResource 1.1, I'd like to gather opinions on whether
VOResource (and, in consequence, the Registry) should support the
declaration of mirrors -- and if so, how.

If you think mirrors should be handled in VOResource, let me know
even if there's nothing else you'd like to say: given the
complexities discussed below, I'd not plan for it unless there's
(enough) interest in the first place.

If you don't care about mirrors and/or the registry, you can stop
reading now -- there's nothing else below.


(1) The problem

You could say we already support mirrors: if one and the same
interface is available at several places in the world, you can just
add several accessURLs to the interface element in your registry
record, like this:

  <capability standardID="ivo://example/whatever">
    <interface role="std">
      <accessURL use="base">http://example.eu/svc/stars?</accessURL>
      <accessURL use="base">http://example.za/svc/stars?</accessURL>
      <accessURL use="base">http://example.cn/svc/stars?</accessURL>
    </interface>
    <param ...
  </capability>

The problem is that this isn't good enough: there's no client
support for it at all.  Clients will therefore probably pick one of
the URLs more or less at random (depending on access modalities),
which means that users will, in all likelihood, be directed to a
mirror that's far away from them and probably running on a smaller
machine than the main site.
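To make the failure mode concrete, here's a minimal Python sketch of a mirror-unaware client (the function name naive_access_url is made up; attributes and the elided <param> are trimmed from the record so it parses).  Since nothing distinguishes the three URLs, the best it can do is take the first one in document order:

```python
import xml.etree.ElementTree as ET

# Essentially the capability from above, trimmed so that it parses.
CAPABILITY = """<capability standardID="ivo://example/whatever">
  <interface>
    <accessURL use="base">http://example.eu/svc/stars?</accessURL>
    <accessURL use="base">http://example.za/svc/stars?</accessURL>
    <accessURL use="base">http://example.cn/svc/stars?</accessURL>
  </interface>
</capability>"""


def naive_access_url(capability_xml: str) -> str:
    """What a mirror-unaware client can do: nothing marks any URL as
    preferred, so it takes the first accessURL in document order."""
    root = ET.fromstring(capability_xml)
    urls = [el.text.strip() for el in root.iter("accessURL")]
    return urls[0]


print(naive_access_url(CAPABILITY))  # whichever URL the record lists first
```

Which site that turns out to be depends only on how the record happens to be written.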

With RegTAP, there's additional pain.  For one, since nobody so
far has used multiple accessURLs and they'd have required an
additional join in almost every query (on top of the capability and
interface joins), RegTAP has said "if someone really does this, just
make a new interface per accessURL".  So you'd have lots of
interfaces, which is kind of ugly.

But worse, since RegTAP is about database tables, there's again no
telling in which order the various accessURLs would come out.  That
order would, however, stay constant for a while, and if "naive"
clients always used the first interface (I suspect that's what legacy
clients do), they'd *all* end up on some small mirror rather than on
the big main site during that time.
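Here's a sketch of that effect, assuming a drastically simplified rr.interface with just ivoid, intf_index and access_url, and a hypothetical ingestor that numbers the per-URL interfaces in the order it happens to see them:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InterfaceRow:
    # Drastically simplified stand-in for a RegTAP rr.interface row.
    ivoid: str
    intf_index: int
    access_url: str


def flatten(ivoid: str, access_urls: List[str]) -> List[InterfaceRow]:
    """One interface row per accessURL, numbered in whatever order the
    ingestor happens to see the URLs."""
    return [InterfaceRow(ivoid, i, url)
            for i, url in enumerate(access_urls, start=1)]


def legacy_pick(rows: List[InterfaceRow]) -> str:
    """A legacy client that always takes the first interface."""
    return min(rows, key=lambda row: row.intf_index).access_url


# Suppose the ingestor saw the South African mirror first this time
# (the ivoid is made up):
rows = flatten("ivo://example/stars", [
    "http://example.za/svc/stars?",
    "http://example.eu/svc/stars?",
    "http://example.cn/svc/stars?",
])
print(legacy_pick(rows))
```

With this ingestion order, every client behaving like legacy_pick ends up at example.za until the order changes.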

So, *if* we "officially" introduce mirror handling in VOResource, it
needs to be done with a bit of deliberation.


(2) Design goal

It'd be fairly important to me to keep "simple" service discovery
possible.  So, I'd say the design goal for mirrors in the Registry
would be

  "Let advanced clients or other parts of the VO infrastructure
  figure out the possible access URLs so they can select one close to
  them.  Plain clients should just be directed to a primary site."


(3) Alternatives

My suspicion is that the Registry is not the ideal component if your
goal is geographical load balancing or even some sort of fallback
scheme.  Here are some other approaches I'm aware of:

(a) I guess most commercial services use some sort of GeoIP, i.e., the
DNS responses depend on the client's geographic location.  So, for
instance, here in Baden-Württemberg www.google.de (at the moment)
resolves to 2a00:1450:4013:c01::5e, whereas in Saxony it is
2a00:1450:4001:817::2003.  I've never set something like that up
before, but I'd be surprised if it were hard.

reg.g-vo.org uses something like this for failover (except that we
switch manually, and at any given time everyone sees the same
address; but it's playing tricks with DNS nevertheless).

The advantage is that mirror selection is up to whoever maintains the
GeoIP mapping, so you could even do a bit of load balancing in this
way (which clearly wouldn't work when mirror selection is with the
clients).  Also, it's transparent to the clients, which is nice.

The disadvantage is that a client couldn't easily say "I want to go
to the main site" -- which it might, for instance, when it wants to
run a huge TAP job.
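Conceptually, the name server's decision boils down to a lookup like the following Python sketch.  The table, the function name mirror_for and the network prefixes (documentation ranges) are all made up; a real deployment would consult a GeoIP database inside the DNS server:

```python
import ipaddress

# Hypothetical geo table mapping client networks to the nearest mirror.
# The prefixes are documentation ranges, not real geography.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "example.za"),
    (ipaddress.ip_network("198.51.100.0/24"), "example.cn"),
    (ipaddress.ip_network("2001:db8::/32"), "example.za"),
]
PRIMARY_HOST = "example.eu"


def mirror_for(client_ip: str) -> str:
    """Return the mirror whose address a GeoIP-aware DNS would hand out
    to this client; everyone else gets the primary site."""
    addr = ipaddress.ip_address(client_ip)
    for network, host in GEO_TABLE:
        # Membership tests across IPv4/IPv6 simply return False.
        if addr in network:
            return host
    return PRIMARY_HOST


print(mirror_for("192.0.2.17"), mirror_for("203.0.113.5"))
```

Rebalancing means editing GEO_TABLE, and clients never notice; by the same token, nothing in this lookup lets a client ask for example.eu explicitly.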

(b) A redirector.  I think some content delivery networks work like
this.  The access URL then points to
http://redirector.example.org/svcs/stars, and based on arbitrary
heuristics, that server would respond with a 301 or 303 redirect
to a mirror.  In terms of advantages and disadvantages, that's a bit
like (a), except it would be easier for a client to insist on using
the main site: it would just go there directly.  If it's smart
enough to figure out there is a primary site, it stands to reason
that it knows its URL, too.
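A redirector could be as small as the following sketch, using Python's http.server.  The heuristic (guess_region), the path mapping and the mirror table are hypothetical placeholders; only the shape of the redirect response matters here:

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

PRIMARY = "http://example.eu"
MIRRORS = {"za": "http://example.za", "cn": "http://example.cn"}
# Redirector path -> path of the same service on the mirrors (assumed).
ROUTES = {"/svcs/stars": "/svc/stars"}


def guess_region(client_ip: str) -> Optional[str]:
    """Stand-in for the "arbitrary heuristics" (GeoIP, load, ...)."""
    return "za" if client_ip.startswith("192.0.2.") else None


def redirect_target(raw_path: str, region: Optional[str]) -> Optional[str]:
    """Map a request path (query string included) to a mirror URL;
    unknown regions go to the primary site, unknown paths to None."""
    path, sep, query = raw_path.partition("?")
    route = ROUTES.get(path)
    if route is None:
        return None
    return MIRRORS.get(region, PRIMARY) + route + sep + query


class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path,
                                 guess_region(self.client_address[0]))
        if target is None:
            self.send_error(404)
            return
        # 303 here: a 301 is permanent and cacheable by default, which
        # would defeat per-request mirror selection.
        self.send_response(303)
        self.send_header("Location", target)
        self.end_headers()
```

Serving it would be a matter of HTTPServer(("", 8080), Redirector).serve_forever(); a client insisting on the main site would simply skip the redirector and use http://example.eu/svc/stars? directly.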

(c) Do nothing.  Perhaps with faster networks and more fiber in the
oceans, there's not much point any more in putting a large effort
into mirrors and their maintenance, which always is a bit of a pain.

(d) You tell me.


(4) VOResource solutions

Here's what I've worked out so far on how mirrors could work within
the registry infrastructure.

(a) accessURL attributes

One could use an attribute to say what's the primary site and what's
a mirror.  So, perhaps:

  <capability standardID="ivo://example/whatever">
    <interface role="std">
      <accessURL use="base" priority="primary"
        >http://example.eu/svc/stars?</accessURL>
      <accessURL use="base" priority="fallback"
        >http://example.za/svc/stars?</accessURL>
      <accessURL use="base" priority="mirror"
        >http://example.cn/svc/stars?</accessURL>
    </interface>
    <param ...
  </capability>

(I've added the possibility of giving another service to use
when the primary site is unresponsive ("fallback"); I believe that's
a bad idea, but it may still be another use case for marking up
mirrors.)
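For illustration, a client honouring such an attribute might look like this Python sketch.  The priority attribute is only a proposal, the function names are made up, and treating a missing attribute as "primary" is an assumption made here:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The interface from the example above, with the proposed (not
# standardised) priority attribute.
INTERFACE = """<interface>
  <accessURL use="base" priority="primary">http://example.eu/svc/stars?</accessURL>
  <accessURL use="base" priority="fallback">http://example.za/svc/stars?</accessURL>
  <accessURL use="base" priority="mirror">http://example.cn/svc/stars?</accessURL>
</interface>"""


def urls_by_priority(interface_xml: str) -> Dict[str, List[str]]:
    """Group accessURLs by priority; a missing attribute counts as
    "primary" (an assumption, so that old records keep working)."""
    groups: Dict[str, List[str]] = {}
    for el in ET.fromstring(interface_xml).iter("accessURL"):
        groups.setdefault(el.get("priority", "primary"), []).append(
            el.text.strip())
    return groups


def choose(groups: Dict[str, List[str]], primary_up: bool = True) -> str:
    """Use the primary site if it is up, else a fallback, else a mirror."""
    order = ["primary", "fallback", "mirror"] if primary_up \
        else ["fallback", "mirror"]
    for level in order:
        if groups.get(level):
            return groups[level][0]
    raise LookupError("no usable accessURL")
```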

If that went into some searchable registry scheme as-is, legacy
clients would still choose random mirrors, so that's bad.  However,
standards for searchable registries could say that they, the
searchable registries, are to select one accessURL per interface,
based on some smart heuristics.  In that way, we could have regional
registries that have access URLs selected for a particular region.
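The ingestion-time selection could be sketched as follows; the region names, the preference table and select_for_region are hypothetical, and ranking by host name is just one conceivable heuristic:

```python
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical preferences of two regional searchable registries.
REGION_PREFERENCE: Dict[str, List[str]] = {
    "africa": ["example.za", "example.eu", "example.cn"],
    "asia": ["example.cn", "example.eu", "example.za"],
}


def select_for_region(access_urls: List[str], region: str) -> str:
    """Keep exactly one accessURL per interface at ingestion time, ranked
    by the registry's regional preference; unknown regions and hosts
    fall back to document order."""
    prefs = REGION_PREFERENCE.get(region, [])

    def rank(url: str) -> int:
        host = urlsplit(url).hostname
        return prefs.index(host) if host in prefs else len(prefs)

    # min() returns the first minimal item, so ties keep document order.
    return min(access_urls, key=rank)
```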

I'm not convinced we can pull that off, and even if we could, do we
want this, given that the primary site usually is better maintained
than the mirrors, and if the mirror is down you definitely want to go
to the primary site?

So, I'd say if we do something like this, all the mirrors should be in
"another table" (in RegTAP lingo).  That way legacy queries just keep
doing the right thing (for them): lead to the primary site.  Fancy
clients could check the mirrors table and work with what they get
from there.  But if we want this in a different table, perhaps it's a
different thing in the first place, and we should have

(b) a separate element, perhaps like this:

  <capability standardID="ivo://example/whatever">
    <interface role="std">
      <accessURL use="base">http://example.eu/svc/stars?</accessURL>
      <mirrorURL>http://example.za/svc/stars?</mirrorURL>
      <mirrorURL>http://example.cn/svc/stars?</mirrorURL>
    </interface>
    <param ...
  </capability>
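The appeal of a separate element is that legacy clients need no change at all.  A Python sketch of both kinds of client (function names made up, attributes trimmed from the record):

```python
import xml.etree.ElementTree as ET
from typing import List

INTERFACE = """<interface>
  <accessURL use="base">http://example.eu/svc/stars?</accessURL>
  <mirrorURL>http://example.za/svc/stars?</mirrorURL>
  <mirrorURL>http://example.cn/svc/stars?</mirrorURL>
</interface>"""


def legacy_url(interface_xml: str) -> str:
    """A client that only knows accessURL never even sees the mirrors."""
    return ET.fromstring(interface_xml).findtext("accessURL").strip()


def candidate_urls(interface_xml: str) -> List[str]:
    """A mirror-aware client: primary site first, then the mirrors."""
    root = ET.fromstring(interface_xml)
    mirrors = [el.text.strip() for el in root.findall("mirrorURL")]
    return [legacy_url(interface_xml)] + mirrors
```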

I guess something like that would be my winner if we decided to go
ahead with VOResource mirrors.  
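In RegTAP terms, the "another table" idea from above might look roughly like this sqlite3 sketch.  The tables are simplified stand-ins (the ivoid is made up), and the mirror table with its mirror_url column is hypothetical:

```python
import sqlite3

# Simplified stand-ins: a real RegTAP service has many more columns,
# and the mirror table is hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE interface (ivoid TEXT, intf_index INTEGER, access_url TEXT);
CREATE TABLE mirror (ivoid TEXT, intf_index INTEGER, mirror_url TEXT);
INSERT INTO interface VALUES
  ('ivo://example/stars', 1, 'http://example.eu/svc/stars?');
INSERT INTO mirror VALUES
  ('ivo://example/stars', 1, 'http://example.za/svc/stars?'),
  ('ivo://example/stars', 1, 'http://example.cn/svc/stars?');
""")

# Unchanged legacy query: still exactly one row, the primary site.
legacy = db.execute(
    "SELECT access_url FROM interface WHERE ivoid = ?",
    ("ivo://example/stars",)).fetchall()

# Mirror-aware query: join in the extra table.
mirrors = db.execute(
    """SELECT m.mirror_url FROM interface AS i
       JOIN mirror AS m
         ON m.ivoid = i.ivoid AND m.intf_index = i.intf_index
       WHERE i.ivoid = ?""",
    ("ivo://example/stars",)).fetchall()

print(legacy)
print(sorted(url for (url,) in mirrors))
```

The legacy query stays untouched and returns only the primary site; the extra join is paid only by clients that ask for mirrors.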

Is there any metadata that should go along with mirrorURL if we went
this way?  Perhaps something to help make a choice between the
mirrors without having to ping them all?
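For comparison, "pinging them all" on the client side might look like this sketch, which times a TCP connect to each candidate.  The function names are made up, the probe is crude (it ignores server load entirely), and it is injectable so it can be replaced or tested:

```python
import socket
import time
from typing import Callable, List, Optional
from urllib.parse import urlsplit


def tcp_connect_time(url: str, timeout: float = 2.0) -> Optional[float]:
    """Crude latency probe: seconds needed to open a TCP connection to
    the URL's host, or None if that fails."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    start = time.monotonic()
    try:
        with socket.create_connection((parts.hostname, port),
                                      timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def fastest(urls: List[str],
            probe: Callable[[str], Optional[float]] = tcp_connect_time
            ) -> Optional[str]:
    """Probe every candidate and return the quickest reachable one."""
    timings = [(probe(url), url) for url in urls]
    reachable = [(t, url) for t, url in timings if t is not None]
    return min(reachable)[1] if reachable else None
```

With n mirrors that means roughly n round trips, and possibly n timeouts, before the first real request; that's the cost some extra metadata could save.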


(5) Another issue with mirrors: Availability

If we decide mirrors need to be described interoperably (i.e., make
the VO mirror-aware), there's a second problem: VOSI availability,
i.e., the endpoint that says whether a given service is up and if
not, when one should try again.

Currently, it's modelled as a separate capability, i.e., the
capabilities of a service with mirrors would look like this:

  <capability standardID="ivo://example/whatever">
    <interface role="std">
      <accessURL use="base">http://example.eu/svc/stars?</accessURL>
      <mirrorURL>http://example.za/svc/stars?</mirrorURL>
      <mirrorURL>http://example.cn/svc/stars?</mirrorURL>
    </interface>
    <param ...
  </capability>

  <capability standardID="ivo://ivoa.net/std/VOSI#availability">
    <interface xsi:type="vs:ParamHTTP">
      <accessURL use="full">http://av.example.eu/av/stars</accessURL>
    </interface>
  </capability>

The availability schema doesn't let you specify the status of mirrors
(or, for that matter, of alternative interfaces) yet; simply adding
mirrorURLs to the availability capability probably isn't terribly
helpful, because it'd be hard to match each query URL with its
availability URL.
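Just to show what a mirror-aware client would do once that matching existed, here's a sketch that reads the available flag of VOSI availability documents.  The function names are made up, and the pairing of each service URL with its availability document is assumed, since that is precisely what the current schema can't express:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def is_available(availability_xml: str) -> bool:
    """Read the available flag of a VOSI availability document,
    matching on the local name so the namespace prefix doesn't matter."""
    for el in ET.fromstring(availability_xml).iter():
        if el.tag.rsplit("}", 1)[-1] == "available":
            return (el.text or "").strip() in ("true", "1")
    return False


def first_live(pairs: List[Tuple[str, str]]) -> Optional[str]:
    """Given (service URL, availability document) pairs, primary first,
    return the first service that reports itself as up."""
    for url, doc in pairs:
        if is_available(doc):
            return url
    return None
```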

If we were serious about mirrors, we'd hence need to fix
availability, too.  This would be a good moment for that, because VOSI
is being reviewed as we speak.  But someone would have to volunteer
to actually do it.

Thanks for making it down here,

         Markus

