VOResource 1.1: Mirrors?
Accomazzi, Alberto
aaccomazzi at cfa.harvard.edu
Tue May 31 20:21:26 CEST 2016
Hi Markus,
Apologies in advance for muddying the waters further, as I will provide
none of the feedback that your email was eliciting, but rather elaborate a
bit on a couple of issues related to this topic.
My first reaction to the whole issue of mirrors is "we don't really need
them anymore." But this is purely based on the fact that for most of us
everyday bandwidth has become abundant and well-connected data centers are
able to keep up with the demand of most users (this is certainly the case
for ADS and arXiv where payloads are not great but connections are).
Abstract the infrastructure further via cloud-hosting and the problem
seems solved for good from where I stand. Unfortunately, this analysis
may not reflect the needs and concerns of the big centers where data volume
and egress costs are significant, so I do hope that some of the big-data
projects chime in. IMHO it would not be terrible to let the end user
choose, rather than trying to complicate the protocol and infrastructure,
as an astronomer will probably know whether NRAO is better than ESO or NAOJ
as the source of ALMA data depending on her location. Individual projects
such as Vizier have their own way to provide load-balancing via redirection
when geolocation features are enabled, so maybe it's not really a VO
problem unless there is an upswell of demand with concrete use cases to
back it up.
The second thought follows from the first one, and relates to the ability
to identify mirror services. We are already seeing the replication of
services through data hosting, which may not produce exact clones but which
nonetheless often creates sites with semantically equivalent data and
services. Right now as far as I can tell there is no way to tell whether
the instances of the SDSS DR8 listed in the registry differ from each
other, and what the provenance of each one is. Since these correspond to
different IVOIDs there is no way right now to express their relatedness as
far as I can tell, but in an era of "data publishing" I think this should
be documented, and possibly made machine-readable for those cases where the
provenance is straightforward.
The last thing I wanted to mention was something which may sound
far-fetched today but which came up during a coffee break conversation in Cape
Town. The topic was deploying computing resources near large datasets for
data-intensive research and the desire to be able to provide a sandbox
environment where a script or pipeline which worked anywhere on the
internet could be run more efficiently in a container with access to local
data. Somebody (sorry, I forgot who -- maybe Dave Morris?) mentioned the
difficulty in having a VO application which uses the registry discover the
presence of a local data service (presumably something running on a local
port) if the application is written to "do the right thing" and look things
up in the registry. Is this a use case which should be considered as part
of this conversation?
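
To make that last use case a bit more concrete, here is a minimal sketch in Python of how a registry-based client might prefer a local copy of a service. Everything in it is an assumption made for illustration: the function name, the idea of a site-provided override table, the stand-in registry lookup, and the localhost URL. None of it is part of VOResource or any existing client library.

```python
from typing import Callable, Mapping, Optional


def resolve_access_url(
    ivoid: str,
    registry_lookup: Callable[[str], str],
    local_overrides: Optional[Mapping[str, str]] = None,
) -> str:
    """Return a local endpoint for ivoid if the site configured one,
    otherwise fall back to whatever the registry says."""
    # Hypothetical site-provided table: IVOID -> URL of a service running
    # next to the data. Keys are lower-cased so the comparison does not
    # depend on how the IVOID was capitalised.
    overrides = {k.lower(): v for k, v in (local_overrides or {}).items()}
    local_url = overrides.get(ivoid.lower())
    if local_url is not None:
        return local_url
    return registry_lookup(ivoid)


def fake_registry(ivoid: str) -> str:
    """Stand-in for a real registry query (e.g. via RegTAP)."""
    table = {"ivo://example/stars": "http://example.eu/svc/stars?"}
    return table[ivoid.lower()]


# Outside the sandbox: no overrides, so the registry answer is used.
remote = resolve_access_url("ivo://example/stars", fake_registry)

# Inside the sandbox: the container supplies an override table.
local = resolve_access_url(
    "ivo://example/stars",
    fake_registry,
    {"ivo://example/stars": "http://localhost:8080/svc/stars?"},
)
```

The point of the sketch is only that the script itself stays unchanged; the open question is where such an override table would come from and whether the registry should play any role in providing it.
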
Thanks,
-- Alberto
On Tue, May 31, 2016 at 7:53 AM, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> Dear Registry, Dear DAL,
>
> [I'd suggest followups should go to registry]
>
> While revising VOResource 1.1, I'd like to gather opinions on whether
> VOResource (and, in consequence, the Registry) should support the
> declaration of mirrors -- and if so, how.
>
> If you think mirrors should be handled in VOResource, let me know
> even if there's nothing else you'd like to say. Because, due to the
> complexities discussed below I'd not plan for it unless there's
> (enough) interest in the first place.
>
> If you don't care about mirrors and/or the registry, you can stop
> reading now -- there's nothing else below.
>
>
> (1) The problem
>
> You could say we already support mirrors -- if one and the same
> interface is available at several places in the world, you can just
> add some accessURLs to the interface element in your registry
> record, like this:
>
>   <capability standardID="ivo://example/whatever">
>     <interface std="True">
>       <accessURL use="base">http://example.eu/svc/stars?</accessURL>
>       <accessURL use="base">http://example.za/svc/stars?</accessURL>
>       <accessURL use="base">http://example.cn/svc/stars?</accessURL>
>     </interface>
>     <param ...
>   </capability>
>
> The problem is that that's not good enough. There's no client
> support for this at all. That's bad because clients will then
> probably choose a random mirror (depending on access modalities),
> which means that users will, in all likelihood, be directed to a
> mirror that's far away from them and probably on a smaller
> machine than the main site.
>
> With RegTAP, there's additional pain. For one, since nobody so
> far has used multiple accessURLs and they'd have required an
> additional join in almost every query (on top of the capability and
> interface joins), RegTAP has said "if someone really does this, just
> make new interfaces per accessURL". So, you'd have lots of
> interfaces, which is kind of ugly.
>
> But worse, since RegTAP is about database tables, there's again no
> telling in which order the various accessURLs would come out; for a
> while, though, the order would be constant, and if "naive" clients
> always used the first interface (I'd suspect that's what legacy
> clients do), they'd *all* end up on some small mirror rather than on
> the big main site for a while.
>
> So, *if* we "officially" introduce mirror handling in VOResource, it
> needs to be done with a bit of deliberation.
>
>
> (2) Design goal
>
> It'd be fairly important to me to keep "simple" service discovery
> possible. So, I'd say the design goal for mirrors in the Registry
> would be
>
> "Let advanced clients or other parts of the VO infrastructure
> figure out the possible access URLs so it can select one close to
> them. Plain clients should just be directed to a primary site."
>
>
> (3) Alternatives
>
> My suspicion is that the Registry is not the ideal component if your
> goal is geographical load balancing or even some sort of fallback
> scheme. Here's some other ways I'm aware of:
>
> (a) I guess most commercial services use some sort of GeoIP, i.e., the
> DNS responses depend on the geographic location. So, for instance
> here in Baden-Württemberg www.google.de (at the moment) resolves to
> 2a00:1450:4013:c01::5e, whereas in Saxony it is
> 2a00:1450:4001:817::2003. I've never set something like that up
> before, but I'd be surprised if it was hard.
>
> reg.g-vo.org uses something like this for failover (except we're, of
> course, switching manually and at any given time everyone sees the
> same address. But it's playing tricks with DNS nevertheless).
>
> The advantage is that mirror selection is up to whoever maintains the
> GeoIP mapping, so you could even do a bit of load balancing in this
> way (which clearly wouldn't work when mirror selection is with the
> clients). Also, it's transparent to the clients, which is nice.
>
> The disadvantage is that a client couldn't easily say "I want to go
> to the main site" -- which it might, for instance, when it wants to
> run a huge TAP job.
>
> (b) A redirector. I think some content delivery networks work like
> this. The access URL then points to
> http://redirector.example.org/svcs/stars, and based on some arbitrary
> heuristic, that one would then respond with a 301 or 303 redirect
> to a mirror. In terms of advantages and disadvantages, that's a bit
> like (a), except it would be easier for a client to insist on using
> the main site -- it just would go directly there. If it's smart
> enough to figure out there is a primary site, it stands to reason
> that it knows its URL, too.
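
As a rough illustration of the redirector idea in (b), the decision it has to make can be sketched as a small pure function in Python. This is only a sketch under stated assumptions: the `region_hint` argument (which might come from a GeoIP lookup), the mirror table, the function name, and the choice of 303 over 301 are all hypothetical and not taken from any existing service.

```python
from typing import Dict, Optional, Tuple

# Hypothetical deployment: one primary site and two mirrors.
PRIMARY = "http://example.eu/svc/stars"
MIRRORS: Dict[str, str] = {
    "za": "http://example.za/svc/stars",
    "cn": "http://example.cn/svc/stars",
}


def redirect_for(
    region_hint: Optional[str], query: str = ""
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a redirect response.

    Clients from a region with a mirror are sent there; everyone else,
    including clients with no usable hint, goes to the primary site.
    """
    target = MIRRORS.get(region_hint, PRIMARY)
    # Pass the original query string through unchanged.
    location = f"{target}?{query}" if query else target
    return 303, {"Location": location}
```

A client that wants the main site regardless would simply not go through the redirector and use the primary URL directly.
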
>
> (c) Do nothing. Perhaps with faster networks and more fiber in the
> oceans, there's not much point any more in putting a large effort
> into mirrors and their maintenance, which always is a bit of a pain.
>
> (d) you tell me.
>
>
> (4) VOResource solutions
>
> Here's what I've worked out so far on how mirrors could work within the
> registry infrastructure.
>
> (a) accessURL attributes
>
> One could use an attribute to say what's the primary site and what's
> a mirror. So, perhaps:
>
>   <capability standardID="ivo://example/whatever">
>     <interface std="True">
>       <accessURL use="base" priority="primary"
>         >http://example.eu/svc/stars?</accessURL>
>       <accessURL use="base" priority="fallback"
>         >http://example.za/svc/stars?</accessURL>
>       <accessURL use="base" priority="mirror"
>         >http://example.cn/svc/stars?</accessURL>
>     </interface>
>     <param ...
>   </capability>
>
> (I've added there the possibility of giving another service to use
> when the primary site is unresponsive ("fallback"); I believe that's
> a bad idea, but that may still be another use case for marking up
> mirrors).
>
> If that went into some searchable registry scheme as-is, legacy
> clients would still choose random mirrors, so that's bad. However,
> standards for searchable registries could say that they, the
> searchable registries, are to select one accessURL per interface,
> based on some smart heuristics. In that way, we could have regional
> registries that have access URLs selected for a particular region.
>
> I'm not convinced we can pull that off, and even if we could, do we
> want this, given that the primary site usually is better maintained
> than the mirrors, and if the mirror is down you definitely want to go
> to the primary site?
>
> So, I'd say if we do something like this, all the mirrors should be in
> "another table" (in RegTAP lingo). That way legacy queries just keep
> doing the right thing (for them): lead to the primary site. Fancy
> clients could check the mirrors table and work with what they get
> from there. But if we want this in a different table, perhaps it's a
> different thing in the first place, and we should have
>
> (b) a separate element, perhaps like this:
>
>   <capability standardID="ivo://example/whatever">
>     <interface std="True">
>       <accessURL use="base">http://example.eu/svc/stars?</accessURL>
>       <mirrorURL>http://example.za/svc/stars?</mirrorURL>
>       <mirrorURL>http://example.cn/svc/stars?</mirrorURL>
>     </interface>
>     <param ...
>   </capability>
>
> I guess something like that would be my winner if we decided to go
> ahead with VOResource mirrors.
>
> Is there any metadata that should go along with mirrorURL if we went
> this way? Perhaps something to help make a choice between the
> mirrors without having to ping them all?
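
To see how the design goal from (2) would play out with the separate-element variant, here is a minimal Python sketch of a client reading such a record. It assumes the hypothetical `mirrorURL` element exactly as written in the example above, leaves out namespaces and the elided `<param ...` line, and uses a made-up function name and `mirror_aware` flag.

```python
import xml.etree.ElementTree as ET
from typing import List

# The proposed record, minus the elided <param ...> line.
RECORD = """
<capability standardID="ivo://example/whatever">
  <interface std="True">
    <accessURL use="base">http://example.eu/svc/stars?</accessURL>
    <mirrorURL>http://example.za/svc/stars?</mirrorURL>
    <mirrorURL>http://example.cn/svc/stars?</mirrorURL>
  </interface>
</capability>
"""


def candidate_urls(capability_xml: str, mirror_aware: bool = False) -> List[str]:
    """List the URLs a client would consider for this capability.

    Plain clients only ever see accessURL, i.e. the primary site.
    Mirror-aware clients additionally get the mirrorURLs, primary first.
    """
    root = ET.fromstring(capability_xml)
    urls: List[str] = []
    for interface in root.iter("interface"):
        urls.extend(el.text.strip() for el in interface.findall("accessURL"))
        if mirror_aware:
            urls.extend(el.text.strip() for el in interface.findall("mirrorURL"))
    return urls


plain = candidate_urls(RECORD)
advanced = candidate_urls(RECORD, mirror_aware=True)
```

A legacy client that only knows about `accessURL` behaves like the `plain` case without any changes, which is what makes this variant attractive; how a mirror-aware client then chooses among the candidates is exactly the open metadata question.
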
>
>
> (5) Another issue with mirrors: Availability
>
> If we decide mirrors need to be described interoperably (i.e., make
> the VO mirror-aware), there's a second problem: VOSI availability,
> i.e., the endpoint that says whether a given service is up and if
> not, when one should try again.
>
> Currently, it's modelled as a separate capability, i.e., the
> capabilities of a service with mirrors would look like this:
>
>   <capability standardID="ivo://example/whatever">
>     <interface std="True">
>       <accessURL use="base">http://example.eu/svc/stars?</accessURL>
>       <mirrorURL>http://example.za/svc/stars?</mirrorURL>
>       <mirrorURL>http://example.cn/svc/stars?</mirrorURL>
>     </interface>
>     <param ...
>   </capability>
>
>   <capability standardID="ivo://ivoa.net/std/VOSI#availability">
>     <interface xsi:type="vs:ParamHTTP">
>       <accessURL use="full">http://av.example.eu/av/stars</accessURL>
>     </interface>
>   </capability>
>
> The availability schema doesn't let you specify the status of mirrors
> (or, for that matter, alternative interfaces) yet; including
> additional mirrorURLs probably isn't terribly helpful because it'd be
> hard to match query URL and availability URL.
>
> If we were serious about mirrors, we'd hence need to fix
> availability, too. This would be a good moment for that because VOSI
> is being reviewed as we speak. But someone would have to volunteer
> for actually doing it.
>
> Thanks for making it down here,
>
> Markus
>
--
Dr. Alberto Accomazzi
Principal Investigator
NASA Astrophysics Data System - http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics - http://www.cfa.harvard.edu
60 Garden St, MS 83, Cambridge, MA 02138, USA