datalink questions

Fri Sep 4 17:16:35 CEST 2015

Hello Tom,

If we understand correctly, you want to let users pick individual files from within a file hierarchy.
Let's take an example sketching different ways to store HE (high-energy) data attached to one observation (based on my XMM experience):
- obs/exposure/[images,spectra,...]
     or
- obs/[spectra,images,...]/[exposures]
     or
- obs/level0/raw-data
      /level1/event-list1,....
      /level2/[spectra,images]/[exposures]

In this use case, the user wants to get the data attached to one given observation.
The user needs to browse the tree recursively, to understand the nature of the data at each tree level and, finally, 
to access the content of the selected files.

The way to do it is to enable your DL service to list the content (either files or subdirs) of a given node identified by some 
identifier.
The question of encoding the node path into this identifier is well covered by Markus's proposal.
If the node is a subdirectory, the DL request returns its content as a list of subdirs and files; if it is a single product 
file, it returns a direct download link to that file. Other links can be added, for instance one returning the whole directory 
content as a tarball, or one giving a more complete description of the dataset.
This way, the user can explore the whole hierarchy by running recursive DL requests (hopping from node to node).
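
To make the node-to-node hopping concrete, here is a minimal Python sketch of what a client could do. It is only an illustration under assumptions: the fetch function, the dictionary keys ("child_id", "url", "mime") and the in-memory fake service are all hypothetical, and a real client would parse the VOTable returned by the DL service instead. The only rule applied is that a link whose content type is a datalink document is treated as a sub-node, and anything else as a file.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# A link with this content type is assumed to point to another DL response
# (i.e. a sub-node); any other content type is treated as a plain file.
DATALINK_MIME = "application/x-votable+xml;content=datalink"

Link = Dict[str, str]  # hypothetical keys: "child_id", "url", "mime"


def walk(node_id: str,
         fetch_links: Callable[[str], List[Link]],
         depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, kind, url) for every link reachable from node_id."""
    for link in fetch_links(node_id):
        if link["mime"] == DATALINK_MIME:
            yield depth, "node", link["url"]
            # Hop to the child node by issuing a new DL request.
            yield from walk(link["child_id"], fetch_links, depth + 1)
        else:
            yield depth, "file", link["url"]


# In-memory stand-in for a DL service (hypothetical content).
FAKE_SERVICE: Dict[str, List[Link]] = {
    "obs": [
        {"child_id": "obs/level1", "url": "dl?ID=obs/level1",
         "mime": DATALINK_MIME},
        {"child_id": "obs/level2", "url": "dl?ID=obs/level2",
         "mime": DATALINK_MIME},
    ],
    "obs/level1": [
        {"child_id": "obs/level1/event-list1",
         "url": "files/obs/level1/event-list1",
         "mime": "application/fits"},
    ],
    "obs/level2": [],
}


def fake_fetch(node_id: str) -> List[Link]:
    """Return the links of a node, or an empty list for unknown nodes."""
    return FAKE_SERVICE.get(node_id, [])


if __name__ == "__main__":
    for depth, kind, url in walk("obs", fake_fetch):
        print("  " * depth + kind + ": " + url)
```

Deciding "node or file" from the content type alone is a shortcut taken for this sketch; a proper semantics term would be a cleaner criterion.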

In a DL response, the content of a node can be exposed either through a custom service or as a flat list of links, as suggested by Markus.
With a custom service, each item of the node content is exposed as one OPTION element:
<RESOURCE .....>
...
<PARAM name="accessURL" value="http://mycustomservice"/>
<GROUP name="inputParams">
<PARAM name="location" value="obs.level1"/>
<PARAM name="item">
<OPTION>event-list1</OPTION>
<OPTION>event-list2</OPTION>
<OPTION>....</OPTION>
</PARAM>
</GROUP>
</RESOURCE>

The URL http://mycustomservice?location=obs.level1&item=event-list1 will return either the content of the subdirectory 
"event-list1" or access to the file.
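
For illustration, a client could assemble such a URL from the access URL and the chosen parameter values roughly as follows. This is a sketch only: the helper name is made up, and the parameter names "location" and "item" are simply those appearing in the URL above.

```python
from urllib.parse import urlencode


def build_request_url(access_url: str, params: dict) -> str:
    """Append URL-encoded parameters to the access URL of a service."""
    # Use "&" if the access URL already carries a query string.
    separator = "&" if "?" in access_url else "?"
    return access_url + separator + urlencode(params)


url = build_request_url(
    "http://mycustomservice",
    {"location": "obs.level1", "item": "event-list1"},
)
print(url)
```

Going through urlencode rather than plain string concatenation keeps the request valid if an item name contains characters such as "&" or a space.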
The major drawback of this solution is that it provides no standard way to attach semantics to those OPTIONs.
Using DL responses with one link per item, as suggested by Markus, is therefore a better solution: each link carries an explicit semantics field.
However, this is still unsatisfactory for 2 reasons:
1) The current vocabulary is rather limited (see http://www.ivoa.net/rdf/datalink/). Proper semantics would be necessary to 
automate the exploration of such datasets, but the current vocabulary could be extended.
2) The semantics attached to each node (as opposed to an end product) must tell 2 different things:
  a) That the link is a particular kind of service (here, one listing the directory content)
  b) The rationale for grouping these items together

In conclusion, the standard is able to handle this use case, but we need to extend the vocabulary to allow some 
automation in the processing of the DL responses (humans can make do with the textual description field).

Laurent, François & Mireille

On 02/09/2015 10:00, Markus Demleitner wrote:
> Hi Tom, hi list,
>
> On Mon, Aug 31, 2015 at 09:29:25AM -0400, Tom McGlynn (NASA/GSFC Code 660.1) wrote:
>> I was on vacation last week, so sorry for the delay in
>> responding...  I'm still pretty confused but I've tried to clarify
>> my issues a little.  The biggest help would be a pointer to active
>> services that use both the service_def and which have multiple
>> fields used to point to the link.
>
> I'm not aware of a service that uses multiple fields, and I don't
> have a use case for anything like it.  Now that we have the feature,
> it'd be great if there was such a service, though, so client
> implementors failing to do what the spec says have a service they'll
> fail on.
>
> However, conceptually it wouldn't be very hard.  Consider, for
> example,
>
> http://dc.zah.uni-heidelberg.de/feros/q/ssa/ssap.xml?REQUEST=queryData&MAXREC=1
>
> (you'll want an XML pretty-printer for that).
>
> There are two datalink resources in there.  One is:
>
>    <RESOURCE type="meta" utype="adhoc:service">
>      <GROUP name="inputParams">
>        <PARAM arraysize="*" datatype="char" name="ID" ref="ssa_pubDID"
>          ucd="meta.id;meta.main" value=""/>
>      </GROUP>
>      <PARAM arraysize="*" datatype="char" name="standardID"
>        value="ivo://ivoa.net/std/DataLink#links-1.0"/>
>      <PARAM arraysize="*" datatype="char" name="accessURL"
>        value="http://dc.zah.uni-heidelberg.de/feros/q/sdl/dlmeta"/>
>    </RESOURCE>
>
> (I've removed a LINK that's in the live version for reasons of
> backward compatibility).
>
> As the standard id says, that points to the datalink service itself.
> It has one parameter (as behooves a datalink service), ID.  While you
> probably could add more parameters here, too, possibly even with ref,
> your service would still have to work only with ID, as that's what
> datalink wants.
>
> And then there's
>
>    <RESOURCE ID="apudntihmadn" type="meta" utype="adhoc:service">
>      [...]
>      <GROUP name="inputParams">
>        <PARAM arraysize="*" datatype="char" name="ID" ref="ssa_pubDID"
>            ucd="meta.id;meta.main" value="">
>          <DESCRIPTION>The publisher DID of the dataset of interest</DESCRIPTION>
>          <LINK content-role="ddl:id-source" value="#ssa_pubDID"/>
>        </PARAM>
>        <PARAM arraysize="*" datatype="char" name="FLUXCALIB" ucd="phot.calib"
>          utype="ssa:Char.FluxAxis.Calibration" value="">
>          <DESCRIPTION>Recalibrate the spectrum.  Right now, the only
>            recalibration supported is max(flux)=1 ('RELATIVE').</DESCRIPTION>
>          <VALUES>
>            <OPTION name="RELATIVE" value="RELATIVE"/>
>            <OPTION name="UNCALIBRATED" value="UNCALIBRATED"/>
>          </VALUES>
>        </PARAM>
>        [...]
>      </GROUP>
>      <PARAM arraysize="*" datatype="char" name="accessURL" ucd="meta.ref.url"
>        value="http://dc.zah.uni-heidelberg.de/feros/q/sdl/dlget"/>
>      <PARAM arraysize="*" datatype="char" name="standardID"
>        value="ivo://ivoa.net/std/SSDP#sync"/>
>    </RESOURCE>
>
> (where I've made up the standard id for what's now called AccessData
> and what should, I believe, really be called Server Side Data
> Processing or something else less generic).
>
> So, that's a service that lets you do cutouts, recalibrations, and
> similar operations.  And here, you could bind certain parameters to
> table columns.  For instance, you could say
>
>     <PARAM arraysize="*" datatype="char" name="FLUXCALIB" ucd="phot.calib"
>        utype="ssa:Char.FluxAxis.Calibration" value=""
>        ref="ssa_fluxcalib">
>
> and a client would, on service invocation, take the value of
> FLUXCALIB from the ssa_fluxcalib column.  For why one might want
> this, I'd like to defer to the champions of that feature.
>
>>>> When I invoke   http://xxx?id=1234  I want to get two rows back (see the
>>>> first question).  The first points to the URL for the observation,
>>>> the second will request subproducts at the next level in the recursion.
>>> Ok, this is the datalink call.
>> Right.
>>>> These would be given by the URL
>>>>     http://xxx?id=1234&products=sub1,sub2,sub3
>>> ...and that's now a call to a custom service that just happens to
>>> share its endpoint with a datalink service.
>> Did I get this right so far?
>>
>> Not really "just happens"...  The idea is that the http://xxx URL
>> is a generic data product service.  We give it a row identifier (in
>> practice we'd give both a table and row identifier) and it returns
>> data products associated with that row.  If no product is specified
>> it returns 'root' products.  But each root product may have one or
>> more subproducts that I'd like to link to.  Typically the root
>> product will be a directory and the subproducts subdirectories or
>> specific files, but that's not required.
>
> Being a fan of good-looking URLs I'm not sure this would be a design
> I'd choose, but one thing I'd really avoid is the "products=a,b,c"
> thing.  In the spirit of DALI and web forms, I'd much rather
> recommend products=a&products=b&products=c.  Added benefit: If
> a comma turns up in a, b, or c you're fine.
>
>>>> Two possibilities from the documentation....
>>>>
>>>>     <VOTABLE>... <FIELD="service_def"><FIELD="access_url">...
>>>> <TR><TD/><TD>http://heasarc/obs/1234</td>..
>>>> <TR><TD>id=1234&Products=sub1,sub2,sub3</TD><TD/>
>>>>     </VOTABLE>
>>> I don't quite get this one, I have to say.  But I guess this is meant
>>> to be the response on the *datalink* endpoint, right?  If so, why
>>> don't you just give the two URLs and be done with it?
>>
>> Then I'm not showing the hierarchical structure of the products. There can
>> be hundreds of different
>> product types for a given table.  Returning them all to the user without
>> providing any structure is
>> potentially confusing.  However if using a service_def I can point to just
>> the next level of hierarchy
>> of products then it keeps things comprehensible for the user.
>>
>> Regardless, the question is what string do I put in the service_def to point
>> to some more data products?
>> Can you provide a complete example?
>
> Let me try to describe what I think your use case is: You have n
> "datasets", each of which consists of 100s of files that allow some
> semantic grouping.  For the sake of examples, say it's herbiv, carniv,
> and plant.  So, it'd look like this:
>
> zoo1/
>    herbi/
>      cows/
>        file1.txt
>        pie.txt
>      geese/
>        wing.part
>        foot.part
>        otherfoot.part
>    carni/
>      lions/
>      tigers/
> zoo2/
>    herbi/
>      rabbit/
>    plant/
>      rose/
>      hyacinth/
>      daisy/
>
> If I had that structure, I'd not bother with service_def at all
> (beyond the basic descriptor as given above).
>
> Instead, I'd be a bit generous on the term "dataset" and define an id
> for each file in the hierarchy.  So, the root (and the actual
> dataset) for the first dataset would be
>
>    ivo://example.com/data?zoo1
>
> If your datalink service is http://example.com/dl, you would then
> return from http://example.com/dl?ID=ivo://example.com/data?zoo1
>
> something like
>
>    ID                 URL                                                          mime                                        semantics    description
>    ivo://(as above)   http://example.com/dl?ID=ivo://example.com/data?zoo1/herbi   application/x-votable+xml;content=datalink  #progenitor  What is being eaten
>    ivo://(as above)   http://example.com/dl?ID=ivo://example.com/data?zoo1/carni   application/x-votable+xml;content=datalink  #derivation  What eats the beasts
>
> -- so, you're just pointing back at the datalink service again.
> Incidentally, there's nothing that requires this parameter-based
> approach.  If I had to do such a thing, I'd be severely tempted to
> statically generate the datalink documents, put them into the respective
> directories as index.datalink, configure an apache to deliver them with
> the appropriate mime and as index documents, and I'd have beautiful
> links to the datalink documents (simply the directory URLs).  I like
> that idea so much I almost wish I had such data...
>
> Anyway, if you then retrieve the URL
> http://example.com/dl?ID=ivo://example.com/data?zoo1/herbi that's given
> up there, you'd get a datalink document that would look like this:
>
>    ID                    URL                                                               mime                                        semantics            description
>    ivo://...zoo1/herbi   http://example.com/dl?ID=ivo://example.com/data?zoo1/herbi/cows   application/x-votable+xml;content=datalink  http://e#animal      Moo
>    ivo://...zoo1/herbi   http://example.com/dl?ID=ivo://example.com/data?zoo1/herbi/geese  application/x-votable+xml;content=datalink  http://e#animal      Small but noisy
>
> Of course, you could have put in more than one ID here (which is an
> advantage over the fs-based approach outlined above).  For instance, one
> could finally call the datalink service with both cows and geese, which
> would then yield:
>
>    ID                    URL                                           mime                   semantics   description
>    ivo://...herbi/cows   http://example.com/herbi/cows/file1.txt       text/plain             #noise      It's not always quiet near cows
>    ivo://...herbi/cows   http://example.com/herbi/cows/pie.txt         poo/soft               #weight     Better than cold feet anyway
>    ivo://...herbi/geese  http://example.com/herbi/geese/wing.part      application/singlepart #auxiliary  No flying without
>    ivo://...herbi/geese  http://example.com/herbi/geese/foot.part      application/singlepart #auxiliary  No walking without
>    ivo://...herbi/geese  http://example.com/herbi/geese/otherfoot.part application/singlepart #auxiliary  No swimming without
>
> Does that help?
>
> Cheers,
>
>              Markus
>

-- 
jesuischarlie

Laurent Michel
SSC XMM-Newton
Tél : +33 (0)3 68 85 24 37
Fax : +33 (0)3 68 85 24 32
laurent.michel at astro.unistra.fr
Université de Strasbourg <http://www.unistra.fr>
Observatoire Astronomique
11 Rue de l'Université
F - 67200 Strasbourg
http://amwdb.u-strasbg.fr/HighEnergy/spip.php?rubrique34