dal Digest, Vol 80, Issue 26
gilles landais
gilles.landais at unistra.fr
Fri Jun 3 09:27:22 CEST 2016
Concerning the mirrors in the registry:
Services that manage their own mirror balancing transparently are
obviously what clients expect.
Currently VizieR uses redirection (the second of the alternatives Markus
lists, i.e. a redirector).
I'm not sure that this is the best way; we discovered in the past that
some clients did not support redirect responses.
GeoIP seems more attractive.
However, independently of the technology chosen, having the list of
mirrors is a nice, complementary method that clients could use, and I
think the registry is a good candidate to store that list. For my part,
I like proposal (b), which distinguishes accessURL from mirrorURL
(better for cross-version compatibility).
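To illustrate the compatibility argument, here is a minimal sketch of how a client might read a record written along the lines of proposal (b). It assumes the element names from Markus's example (mirrorURL is only proposed, not part of any published schema), uses his placeholder hosts, and ignores XML namespaces for brevity. A legacy client that only looks for accessURL still finds exactly one URL, the primary site.

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment modelled on proposal (b) in the quoted
# message; mirrorURL is a proposed element, not an existing one.
RECORD = """
<capability standardID="ivo://example/whatever">
  <interface>
    <accessURL use="base">http://example.eu/svc/stars?</accessURL>
    <mirrorURL>http://example.za/svc/stars?</mirrorURL>
    <mirrorURL>http://example.cn/svc/stars?</mirrorURL>
  </interface>
</capability>
"""


def split_urls(capability_xml):
    """Return (primary, mirrors) for the first interface in the fragment."""
    root = ET.fromstring(capability_xml)
    interface = root.find("interface")
    # The single accessURL is what a mirror-unaware client would use.
    primary = interface.findtext("accessURL").strip()
    # Mirror-aware clients can additionally collect the mirrorURLs.
    mirrors = [m.text.strip() for m in interface.findall("mirrorURL")]
    return primary, mirrors


primary, mirrors = split_urls(RECORD)
```

A mirror-aware client could then choose among `[primary] + mirrors`, while everything else keeps using `primary` unchanged.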
For information, concerning the VizieR mirrors:
VizieR has 7 mirrors today, each containing all or part of the VizieR
data (each mirror holds its own list of big catalogues, even though all
big catalogues can be queried from every mirror by redirection).
Mirrors of course improve preservation, but they also improve
service availability.
Currently VizieR combines two methods to manage mirror balancing.
The first is based on the GLU system, which detects the availability of
CDS services. The VizieR mirror with the best response-time score is
chosen and becomes the target linked from the CDS web pages.
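The selection step can be sketched as follows. This is not the GLU implementation: it assumes the score is simply a measured response time in seconds, with None standing for a mirror that did not answer, and the function name and mirror URLs are made up for the example.

```python
def pick_target(response_times):
    """Pick the mirror with the lowest response time.

    response_times maps a mirror URL to its measured response time in
    seconds, or to None when the mirror did not answer the probe.
    Returns the chosen URL, or None when no mirror answered at all.
    """
    alive = {url: t for url, t in response_times.items() if t is not None}
    if not alive:
        return None
    return min(alive, key=alive.get)


# Hypothetical probe results for three placeholder mirrors.
probe = {
    "http://example.eu/svc/stars?": 0.42,
    "http://example.za/svc/stars?": None,   # unreachable during the probe
    "http://example.cn/svc/stars?": 0.17,
}
target = pick_target(probe)
```

Because the choice is made on the service side and only the resulting link is published, clients need no mirror awareness for this method.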
The second concerns the VizieR service only (including VOTable output
and simple cone search results). When a failure is detected (a test runs
every 15 minutes), the mechanism redirects queries (HTTP code 302) to a
predefined VizieR mirror.
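A rough sketch of that failover logic, under stated assumptions: only the 15-minute test interval and the 302 status come from the description above, while the class, the host names and the way the health check result is recorded are hypothetical. The query path and parameters are preserved so the mirror receives the same request.

```python
from urllib.parse import urlsplit, urlunsplit

CHECK_INTERVAL_S = 15 * 60  # availability test period, as described above


class Failover:
    """Decide whether a query is served locally or redirected (sketch)."""

    def __init__(self, fallback_host):
        self.fallback_host = fallback_host  # predefined mirror (hypothetical)
        self.primary_ok = True

    def record_check(self, ok):
        """Store the outcome of the periodic availability test."""
        self.primary_ok = ok

    def route(self, url):
        """Return (status, location) for an incoming query URL."""
        if self.primary_ok:
            return 200, url
        # Keep scheme, path and query; only the host changes.
        parts = urlsplit(url)
        return 302, urlunsplit(parts._replace(netloc=self.fallback_host))


failover = Failover("mirror.example.org")
query = "http://main.example.org/svc/stars?RA=10&DEC=20&SR=0.1"
normal = failover.route(query)
failover.record_check(False)   # the periodic test found a failure
redirected = failover.route(query)
```

A client that does not follow the 302 gets no data in the failure case, which is the weakness of redirection mentioned at the start of this message.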
Gilles Landais (CDS)
On 31/05/2016 20:22, dal-request at ivoa.net wrote:
>
> Today's Topics:
>
> 1. VOResource 1.1: Mirrors? (Markus Demleitner)
> 2. Re: VOResource 1.1: Mirrors? (Accomazzi, Alberto)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 31 May 2016 13:53:06 +0200
> From: Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
> To: registry at ivoa.net, dal at ivoa.net
> Subject: VOResource 1.1: Mirrors?
> Message-ID: <20160531115306.GA24640 at victor>
> Content-Type: text/plain; charset=utf-8
>
> Dear Registry, Dear DAL,
>
> [I'd suggest followups should go to registry]
>
> While revising VOResource 1.1, I'd like to gather opinions on whether
> VOResource (and, in consequence, the Registry) should support the
> declaration of mirrors -- and if so, how.
>
> If you think mirrors should be handled in VOResource, let me know
> even if there's nothing else you'd like to say. Because, due to the
> complexities discussed below I'd not plan for it unless there's
> (enough) interest in the first place.
>
> If you don't care about mirrors and/or the registry, you can stop
> reading now -- there's nothing else below.
>
>
> (1) The problem
>
> You could say we already support mirrors -- if one and the same
> interface is available at several places in the world, you can just
> add some accessURLs to the interface element in your registry
> record, like this:
>
> <capability standardID="ivo://example/whatever">
> <interface std="True">
> <accessURL use="base">http://example.eu/svc/stars?</accessURL>
> <accessURL use="base">http://example.za/svc/stars?</accessURL>
> <accessURL use="base">http://example.cn/svc/stars?</accessURL>
> </interface>
> <param ...
> </capability>
>
> The problem is that that's not good enough. There's no client
> support for this at all. That's bad because clients will then
> probably choose a random mirror (depending on access modalities),
> which means that users will, in all likelihood, be directed to a
> mirror that's far away from them and probably on a smaller
> machine than the main site.
>
> With RegTAP, there's additional pain. For one, since nobody so
> far has used multiple accessURLs and they'd have required an
> additional join in almost every query (on top of the capability and
> interface joins), RegTAP has said "if someone really does this, just
> make new interfaces per accessURL". So, you'd have lots of
> interfaces, which is kind of ugly.
>
> But worse, since RegTAP is about database tables, there's again no
> telling in which order the various accessURLs would come out; for a
> while, though, the order would be constant, and if "naive" clients
> always used the first interface (I'd suspect that's what legacy
> clients do), they'd *all* end up on some small mirror rather than on
> the big main site for a while.
>
> So, *if* we "officially" introduce mirror handling in VOResource, it
> needs to be done with a bit of deliberation.
>
>
> (2) Design goal
>
> It'd be fairly important to me to keep "simple" service discovery
> possible. So, I'd say the design goal for mirrors in the Registry
> would be
>
> "Let advanced clients or other parts of the VO infrastructure
> figure out the possible access URLs so it can select one close to
> them. Plain clients should just be directed to a primary site."
>
>
> (3) Alternatives
>
> My suspicion is that the Registry is not the ideal component if your
> goal is geographical load balancing or even some sort of fallback
> scheme. Here's some other ways I'm aware of:
>
> (a) I guess most commercial services use some sort of GeoIP, i.e., the
> DNS responses depend on the geographic location. So, for instance
> here in Baden-Württemberg www.google.de (at the moment) resolves to
> 2a00:1450:4013:c01::5e, whereas in Saxony it is
> 2a00:1450:4001:817::2003. I've never set something like that up
> before, but I'd be surprised if it was hard.
>
> reg.g-vo.org uses something like this for failover (except we're, of
> course, switching manually and at any given time everyone sees the
> same address. But it's playing tricks with DNS nevertheless).
>
> The advantage is that mirror selection is up to whoever maintains the
> GeoIP mapping, so you could even do a bit of load balancing in this
> way (which clearly wouldn't work when mirror selection is with the
> clients). Also, it's transparent to the clients, which is nice.
>
> The disadvantage is that a client couldn't easily say "I want to go
> to the main site" -- which it might, for instance, when it wants to
> run a huge TAP job.
>
> (b) A redirector. I think some content delivery networks work like
> this. The access URL then points to
> http://redirector.example.org/svcs/stars, and based on some arbitrary
> heuristic, that one would then respond with a 301 or 303 redirect
> to a mirror. In terms of advantages and disadvantages, that's a bit
> like (a), except it would be easier for a client to insist on using
> the main site -- it just would go directly there. If it's smart
> enough to figure out there is a primary site, it stands to reason
> that it knows its URL, too.
>
> (c) Do nothing. Perhaps with faster networks and more fiber in the
> oceans, there's not much point any more in putting a large effort
> into mirrors and their maintenance, which always is a bit of a pain.
>
> (d) you tell me.
>
>
> (4) VOResource solutions
>
> Here's what I've worked out so far on how mirrors could work within
> the registry infrastructure.
>
> (a) accessURL attributes
>
> One could use an attribute to say what's the primary site and what's
> a mirror. So, perhaps:
>
> <capability standardID="ivo://example/whatever">
> <interface std="True">
> <accessURL use="base" priority="primary"
> >http://example.eu/svc/stars?</accessURL>
> <accessURL use="base" priority="fallback"
> >http://example.za/svc/stars?</accessURL>
> <accessURL use="base" priority="mirror"
> >http://example.cn/svc/stars?</accessURL>
> </interface>
> <param ...
> </capability>
>
> (I've added there the possibility of giving another service to use
> when the primary site is unresponsive ("fallback"); I believe that's
> a bad idea, but that may still be another use case for marking up
> mirrors).
>
> If that went into some searchable registry scheme as-is, legacy
> clients would still choose random mirrors, so that's bad. However,
> standards for searchable registries could say that they, the
> searchable registries, are to select one accessURL per interface,
> based on some smart heuristics. In that way, we could have regional
> registries that have access URLs selected for a particular region.
>
> I'm not convinced we can pull that off, and even if we could, do we
> want this, given that the primary site usually is better maintained
> than the mirrors, and if the mirror is down you definitely want to go
> to the primary site?
>
> So, I'd say if we do something like this, all the mirrors should be in
> "another table" (in RegTAP lingo). That way legacy queries just keep
> doing the right thing (for them): lead to the primary site. Fancy
> clients could check the mirrors table and work with what they get
> from there. But if we want this in a different table, perhaps it's a
> different thing in the first place, and we should have
>
> (b) have a separate element, perhaps like this:
>
> <capability standardID="ivo://example/whatever">
> <interface std="True">
> <accessURL use="base">http://example.eu/svc/stars?</accessURL>
> <mirrorURL>http://example.za/svc/stars?</mirrorURL>
> <mirrorURL>http://example.cn/svc/stars?</mirrorURL>
> </interface>
> <param ...
> </capability>
>
> I guess something like that would be my winner if we decided to go
> ahead with VOResource mirrors.
>
> Is there any metadata that should go along with mirrorURL if we went
> this way? Perhaps something to help make a choice between the
> mirrors without having to ping them all?
>
>
> (5) Another issue with mirrors: Availability
>
> If we decide mirrors need to be described interoperably (i.e., make
> the VO mirror-aware), there's a second problem: VOSI availability,
> i.e., the endpoint that says whether a given service is up and if
> not, when one should try again.
>
> Currently, it's modelled as a separate capability, i.e., the
> capabilities of a service with mirrors would look like this:
>
> <capability standardID="ivo://example/whatever">
> <interface std="True">
> <accessURL use="base">http://example.eu/svc/stars?</accessURL>
> <mirrorURL>http://example.za/svc/stars?</mirrorURL>
> <mirrorURL>http://example.cn/svc/stars?</mirrorURL>
> </interface>
> <param ...
> </capability>
>
> <capability standardID="ivo://ivoa.net/std/VOSI#availability">
> <interface xsi:type="vs:ParamHTTP">
> <accessURL use="full">http://av.example.eu/av/stars</accessURL>
> </interface>
> </capability>
>
> The availability schema doesn't let you specify the status of mirrors
> (or, for that matter, alternative interfaces) yet; including
> additional mirrorURLs probably isn't terribly helpful because it'd be
> hard to match query URL and availability URL.
>
> If we were serious about mirrors, we'd hence need to fix
> availability, too. This would be a good moment for that because VOSI
> is being reviewed as we speak. But someone would have to volunteer
> for actually doing it.
>
> Thanks for making it down here,
>
> Markus
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 31 May 2016 14:21:26 -0400
> From: "Accomazzi, Alberto" <aaccomazzi at cfa.harvard.edu>
> To: registry at ivoa.net, DAL mailing list <dal at ivoa.net>
> Subject: Re: VOResource 1.1: Mirrors?
> Message-ID:
> <CAOFyuCz+gQ6e7tGhM8oLOrmSXodGA5ZG=u1hpXx5Hpj=XS4Pow at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Markus,
>
> Apologies in advance for muddying the waters further, as I will provide
> none of the feedback that your email was eliciting, but rather elaborate a
> bit on a couple of issues related to this topic.
>
> My first reaction to the whole issue of mirrors is "we don't really need
> them anymore." But this is purely based on the fact that for most of us
> everyday bandwidth has become abundant and well-connected data centers are
> able to keep up with the demand of most users (this is certainly the case
> for ADS and arXiv where payloads are not great but connections are).
> Abstract the infrastructure further via cloud-hosting and the problem
> seems solved for good from where I stand. Unfortunately, this analysis
> may not reflect the needs and concerns of the big centers where data volume
> and egress costs are significant, so I do hope that some of the big-data
> projects chime in. IMHO it would not be terrible to let the end user
> choose, rather than trying to complicate the protocol and infrastructure,
> as an astronomer will probably know whether NRAO is better than ESO or NAOJ
> as the source of ALMA data depending on her location. Individual projects
> such as Vizier have their own way to provide load-balancing via redirection
> when geolocation features are enabled, so maybe it's not really a VO
> problem unless there is an upswell of demand with concrete use cases to
> back it up.
>
> The second thought follows from the first one, and relates to the ability
> to identify mirror services. We are already seeing the replication of
> services through data hosting, which may not produce exact clones but which
> nonetheless often creates sites with semantically equivalent data and
> services. Right now as far as I can tell there is no way to tell whether
> the instances of the SDSS DR8 listed in the registry differ from each
> other, and what the provenance of each one is. Since these correspond to
> different IVOIDs there is no way right now to express their relatedness as
> far as I can tell, but in an era of "data publishing" I think this should
> be documented, and possibly made machine-readable for those cases where the
> provenance is straightforward.
>
> The last thing I wanted to mention was something which may sound
> far-fetched today but which came up during a coffee break conversation in Cape
> Town. The topic was deploying computing resources near large datasets for
> data-intensive research and the desire to be able to provide a sandbox
> environment where a script or pipeline which worked anywhere on the
> internet could be run more efficiently in a container with access to local
> data. Somebody (sorry forgot who -- maybe Dave Morris?) mentioned the
> difficulty in having a VO application which uses the registry discover the
> presence of a local data service (presumably something running on a local
> port) if the application is written to "do the right thing" and look things
> up in the registry. Is this a use case which should be considered as part
> of this conversation?
>
> Thanks,
> -- Alberto
>
>
>
>
>