Discovering Data Collections RFC

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Mar 6 10:58:19 CET 2017


Hi Marco,

On Thu, Mar 02, 2017 at 03:33:47PM +0100, Marco Molinaro wrote:
> I have a doubt: starting of Sec. 2.1.2 confuses me w.r.t.
> Sec. 2.4, Data model discovery subsection, when speaking of
> the capability xsi:type.

First off: this proposition is essentially orthogonal to the question
of xsi:type on the capability.  All it really talks about is
capability/@standardID (plus a bit of relationship).  In principle, a
later version could introduce one or more capability types that should
preferentially be used for auxiliary capabilities without affecting
the mechanisms proposed here.

This is in line with general practices -- nothing forces you to use a
sia:SimpleImageAccess capability just because your standard id
happens to be ivo://ivoa.net/std/sia; it may be a good idea because
the extra metadata might matter to clients, but Aladin will still
find your service and be able to use it even if you don't.
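To make this concrete, here is a minimal sketch of how a client finds services by standard id alone. It's Python that only builds the ADQL text. The function names are made up for illustration, and the RegTAP column and function names are as I remember them from RegTAP 1.0, so please check before relying on them. The point is that rr.capability.cap_type, which is where the xsi:type ends up, is not constrained at all. A service declaring ivo://ivoa.net/std/sia on a plain capability is therefore found just like one using sia:SimpleImageAccess.

```python
def adql_literal(value: str) -> str:
    """Quote a string as an ADQL character literal (doubling single quotes)."""
    return "'" + value.replace("'", "''") + "'"


def services_by_standard_id(standard_id: str) -> str:
    """Build a RegTAP query for all endpoints declaring standard_id.

    cap_type (the capability's xsi:type) is deliberately left
    unconstrained, so typed and untyped capabilities are both found.
    """
    # The trailing % lets ids with a version fragment appended match, too.
    pattern = adql_literal(standard_id + "%")
    return (
        "SELECT ivoid, access_url\n"
        "FROM rr.capability\n"
        "  NATURAL JOIN rr.interface\n"
        f"WHERE 1 = ivo_nocasematch(standard_id, {pattern})"
    )


query = services_by_standard_id("ivo://ivoa.net/std/sia")
print(query)
```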

> Considering that models will hopefully start to get attached to
> collections I'm not sure I agree with or understand the last
> sentence in §2.4.
> 
> I agree that data models apply to data collections, but letting
> that declaration fall off the service descriptor looks a bit strange,
> also considering the statement
>
> "data models are characteristics for service enumeration and not
> for data discovery"

What I wanted to say here is that I expect the typical query on
ObsTAP will be more like

  Give me all ObsTAP services out there

rather than

  Give me an ObsTAP service publishing data from the infrared and
  talking about spiral galaxies.

Something similar applies to RegTAP.

Both of these, however, are a bit of an odd case, because they've
been designed as singletons.
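For reference, the enumeration-style query ("all ObsTAP services") against the TAPRegExt dataModel declaration would, in RegTAP, go through rr.res_detail. Here is a sketch with the same caveats as above. The ObsCore identifier prefix is my assumption and should be checked against ObsCore and TAPRegExt. The function name is again hypothetical.

```python
def tap_services_with_data_model(dm_prefix: str) -> str:
    """Build a RegTAP query enumerating TAP services that declare a data model.

    This relies on TAPRegExt's capability/dataModel/@ivo-id, which RegTAP
    exposes as rows in rr.res_detail.
    """
    # Quote the LIKE pattern as an ADQL literal (doubling single quotes).
    pattern = "'" + (dm_prefix + "%").replace("'", "''") + "'"
    return (
        "SELECT DISTINCT ivoid, access_url\n"
        "FROM rr.capability\n"
        "  NATURAL JOIN rr.res_detail\n"
        "  NATURAL JOIN rr.interface\n"
        "WHERE 1 = ivo_nocasematch(standard_id, 'ivo://ivoa.net/std/tap%')\n"
        "  AND detail_xpath = '/capability/dataModel/@ivo-id'\n"
        f"  AND 1 = ivo_nocasematch(detail_value, {pattern})"
    )


# Assumed ObsCore identifier prefix; check it against the ObsCore spec.
print(tap_services_with_data_model("ivo://ivoa.net/std/obscore"))
```

Note that this says nothing about which table holds the data. For a singleton like ObsTAP, with its fixed ivoa.ObsCore table, that doesn't matter, which is why the service-level declaration has worked so far.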

EPN-TAP, as it's being specified right now (for those who've not come
across it:
https://voparis-confluence.obspm.fr/display/VES/EPN-TAP+V2.0+parameters),
is less of an oddball there; it essentially just says: "here's a
table structure, and there's no problem having lots of these in a
single service".  Indeed, there already are several services having
multiple EPN-TAP tables out there.

I'm pushing for using (in RegTAP terms) rr.res_table.utype for
discovery for them -- declaring support in TAPRegExt just isn't very
helpful: "Here's a service that contains one or more EPN-TAP tables".
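By contrast, table-level discovery would look roughly like the following sketch. As before, this is only ADQL text built in Python. I'm assuming the RegTAP 1.0 column is called table_utype. The utype prefix below is a hypothetical stand-in, not the identifier EPN-TAP actually defines.

```python
def tables_with_utype(utype_prefix: str) -> str:
    """Build a RegTAP query returning matching tables with their TAP endpoints.

    There is one row per matching table (and TAP interface), so the
    client learns which table to query, not just which service.
    """
    # Quote the LIKE pattern as an ADQL literal (doubling single quotes).
    pattern = "'" + (utype_prefix + "%").replace("'", "''") + "'"
    return (
        "SELECT ivoid, table_name, access_url\n"
        "FROM rr.res_table\n"
        "  NATURAL JOIN rr.capability\n"
        "  NATURAL JOIN rr.interface\n"
        "WHERE 1 = ivo_nocasematch(standard_id, 'ivo://ivoa.net/std/tap%')\n"
        f"  AND 1 = ivo_nocasematch(table_utype, {pattern})"
    )


# Hypothetical stand-in; use the table utype the EPN-TAP spec defines.
EPNTAP_UTYPE_PREFIX = "ivo://example.org/std/epn-tap"
print(tables_with_utype(EPNTAP_UTYPE_PREFIX))
```

A service with three EPN-TAP tables comes back as three rows, each naming a table to query. That is exactly the information a TAPRegExt declaration can't give.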

Since writing the text, I've convinced myself that ObsTAP and
RegTAP are the odd cases (by virtue of being singletons) and EPN-TAP
is actually the typical case.

Discovering EPN-TAP services is a classic data discovery use case,
rather than enumeration.  So, I guess what I'm saying is that I'd
retract the statement about "characteristic for service enumeration".

Considering this, I'd propose the following change (tentatively in
volute rev 3884)::

 consequence, a query of the type ``Give me all ObsTAP services with data
 from Instrument X'' will not work as expected.  This deficiency could be
 alleviated by recommending or requiring the use of the full, typed
-capabilities with an auxiliary standard id for such records; 
-at the time of writing, the hard
-couplings of standard identifier and extension type are being lifted for
-TAPRegExt \citep{std:TAPRegExt} and SimpleDALRegExt
-\citep{std:DALREGEXT} in their 1.1 versions, 
-so that no technical obstacles will prevent this in the near future.
-We believe, however, that constraints on
-data models are characteristic for service enumeration and not for data
-discovery.  Should this belief turn out to be erroneous, we believe the
-declaration of compliance to data models should be moved from the
-service to the data collection level, as that is where, arguably, the data model
-actually is expressed.
+capabilities with an auxiliary standard id for such records and
+keep using \xmlel{capability/dataModel} for data model discovery.
+However, we believe the anomaly actually is the result of a modelling
+error.  Adherence to a data model is not a property of a service, which
+potentially contains many data collections conforming to different data
+models.  The early examples (ObsTAP and RegTAP) have suggested the
+contrary only because they described singletons.  In reality, data model
+adherence needs to be declared where a data model is realised, i.e., at
+the level of table or schema.   We defer the details of
+discovering such data models to the respective specifications,
+mentioning EPN-TAP \citep{epntap} as an example of how it can be effected.
 
Opinions?  Should we take the opportunity to deprecate dataModel in
TAPRegExt?  Perhaps even propose an alternative model there?

        -- Markus


More information about the registry mailing list