RM v1.1 feedback loop from VOResource
Doug Tody
dtody at nrao.edu
Wed Jun 8 12:08:44 PDT 2005
Hi Ray -
Thanks, this addresses most of my concerns.
On Wed, 8 Jun 2005, Ray Plante wrote:
> On Wed, 8 Jun 2005, Doug Tody wrote:
> > An issue here is that in DAL we have dataset identifiers, used by the
> > creator, publisher, etc., to assign unique identifiers to datasets.
> > Hence we have at most one "creator dataset ID" and any number of
> > "publisher dataset IDs". The names currently used for these identifiers
> > are "CreatorID" and "PublisherID" (or "PubID").
>
> Just to clarify, in the RM "PublisherID" is "the ID of the Publisher",
> whereas in Doug's example, one should read "PublisherID" as
> "Publisher-assigned Identifier for a dataset". (yes?)
>
> I definitely agree that we should not use the same name for these two
> things and we should adopt some consistent conventions. In both cases
> (RM's and Doug's), the value is a valid URI; however, the restrictions on
> each are slightly different (Doug's will have a trailing "#{name}"). Of
> the two, only the RM's "PublisherID" is technically a legal IVOA
> Identifier according to the standard that is now before the IVOA Exec.
>
> Your argument about database IDs notwithstanding, use of "ID" in RM's
> "PublisherID" reflects the terminology set down by the IVOA Identifiers
> spec., so I think it is more appropriate than "PublisherURI". Given the
> maturity of the IVOA ID and RM specs compared to SSA, I would recommend
> changing the names of the DAL-related terms. How about "PublisherDID",
> "CreatorDID"?
Something along these lines would work fine. Consistency with current
specs argues for what you propose. The question I was raising was what
will be most natural when we consider datasets as well as meta-resources
like publishers and collections. The issue is not so much what is best
for computer names (here we can use something like PublisherDatasetID)
but what we want to have in the user interface.
Another approach is Publisher/PubID, which I think is the most common
convention for document publishing. The question is, if a user is
searching for data will they expect "PubID" to refer to the publisher
or a document published by the publisher?
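To make the distinction concrete: the quoted text above says a dataset identifier is a resource identifier with a trailing "#{name}". A minimal sketch of separating the two parts follows, assuming the fragment is simply everything after the first "#". The function name and the example resource key are hypothetical; only the "adil.ncsa" authority ID comes from this thread.

```python
def split_dataset_id(dataset_id):
    """Split a publisher-assigned dataset ID into (resource_id, dataset_name).

    Assumption: the dataset-specific part is whatever follows the first '#'.
    If there is no '#', the value is treated as a plain resource identifier
    and dataset_name is None.
    """
    resource_id, sep, dataset_name = dataset_id.partition("#")
    return resource_id, (dataset_name if sep else None)


# Hypothetical dataset ID built on the "adil.ncsa" authority ID.
resource_id, dataset_name = split_dataset_id("ivo://adil.ncsa/some/collection#image42")
print(resource_id)   # the part that identifies the registered resource
print(dataset_name)  # the part that identifies the individual dataset
```

Whatever names are chosen (PublisherDID, PublisherDatasetID, ...), a client could recover the identifier of the registered resource from the dataset identifier in this way.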
> > At the level of DAL we also want to identify resources such as a
> > collection, publisher, etc. In this case, since there can be many
> > individual datasets, there may be thousands or millions of records which
> > refer to a single such resource. Using a URI to identify such a global
> > resource results in poor information hiding; we are essentially embedding
> > a pathname in the data. In this case it might be better to refer to
> > the resource via a short name of some sort. We would then look up the
> > short name in the registry to get a full description of the resource.
>
> No. ShortName is not guaranteed to be unique! Identifier is. The whole
> point of an IVOA Identifier is to provide an unambiguous, globally unique
> identifier. It is designed for this very purpose.
>
> I'm unclear about the concern for "poor information hiding". Despite the
> fact that slashes are used in an IVOA Identifier, it is not a pathname.
> It is location-independent. The closest thing to location-dependence is
> in the authority ID (e.g. "adil.ncsa") which ties the resource to the
> organization providing the resource. (The issue of
> organization-dependence vs. independence is discussed somewhat in the
> Intro section of the Identifier spec.).
>
> Are you concerned about the relative sizes of an IVOA ID and a ShortName?
A use case is the query response for a DAL query. What is the value
of the "Collection" attribute? This same value may appear as dataset
metadata, e.g., in a FITS header, placed there by the data creator when
the dataset is generated. What a user might expect to see is something
such as "SDSS-DR2", rather than a full IVOA identifier. In the case
of a query parameter one would expect to specify something such as
"collection=SDSS-DR2". Ideally a short name such as this is defined by
the creator when the data collection is registered.
What I am trying to do here is look forward to the case where we have a
client application which does a query and gets back a list of candidate
datasets (an actual user interface is involved). We need to be able to tell
the user concisely what data collection a dataset belongs to, and allow them
to use this tag to easily refine the query.
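As a rough illustration of how the two views might be reconciled, a client could show the short name to the user but resolve it to the full identifier through the registry, and treat a non-unique short name as an error. This is only a sketch: the registry contents, identifiers, and function names below are invented for the example and do not correspond to any real registry interface.

```python
# Hypothetical registry contents: IVOA identifier -> ShortName.
# All identifiers here are invented for illustration.
REGISTRY = {
    "ivo://example.sdss/dr2": "SDSS-DR2",
    "ivo://example.mirror/sdss-dr2": "SDSS-DR2",  # same ShortName, different resource
    "ivo://example.archive/first": "FIRST",
}


def resolve_short_name(short_name, registry=REGISTRY):
    """Return every identifier registered under the given ShortName."""
    return sorted(ivoid for ivoid, name in registry.items() if name == short_name)


def resolve_unique(short_name, registry=REGISTRY):
    """Return the single identifier for a ShortName, or raise if ambiguous/unknown."""
    matches = resolve_short_name(short_name, registry)
    if len(matches) != 1:
        raise LookupError(
            f"ShortName {short_name!r} matches {len(matches)} resources: {matches}"
        )
    return matches[0]


print(resolve_unique("FIRST"))
print(resolve_short_name("SDSS-DR2"))  # two candidates: the user must choose
```

The short name stays in the user interface and in query parameters such as "collection=SDSS-DR2", while the identifier remains the unambiguous key.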
>
> > > o Add a new term, "ResourceValidatedBy", whose value is the IVOA
> > > identifier for the registry or organisation that set the
> > > "ResourceValidationLevel".
> > >
> > > This came out of our discussion of ResourceValidationLevel
> > > in Kyoto.
> >
> > What we really need here is not who performed the validation, but the
> > level of compliance as defined by the interface in question. If this
> > is well defined, it doesn't really matter who performed the validation.
>
> This is covered by "ResourceValidationLevel" which is already defined in
> the RM. In Kyoto, the Registry WG asked that we add a tag indicating who
> did the validation, because in practice that value may be set differently
> by different registries.
>
> > Note a complex resource such as a service is not merely valid/invalid,
> > rather there are levels of compliance. A valid resource meets the
> > criteria for some such level of compliance, for example a service which
> > supports all the MUST elements of the interface is said to be minimally
> > compliant; if in addition it supports all the SHOULD elements it is
> > fully compliant, etc.
>
> This distinction should be captured by service-specific capability
> metadata. For example, SkyNode descriptions indicate whether the
> implementation is a "full" or "basic" SkyNode. An automated SkyNode
> validator would score a ResourceValidationLevel=2 if the service actually
> complied at the level it claims to.
>
> In general, I would expect that the defined levels of compliance that you
> refer to will be dependent on the specific service standard; thus, we
> shouldn't try to incorporate that at the general RM level.
Ok - this sounds fine.
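For reference, the MUST/SHOULD notion of compliance described in the quoted text could be sketched as below. The element names are placeholders, and the real sets would come from each service standard rather than from the RM.

```python
def compliance_level(supported, must, should):
    """Classify a service by which interface elements it supports.

    supported, must, should are collections of element names (hypothetical).
    """
    supported = set(supported)
    if not set(must) <= supported:
        return "non-compliant"        # at least one MUST element is missing
    if set(should) <= supported:
        return "fully compliant"      # all MUST and all SHOULD elements present
    return "minimally compliant"      # all MUST elements, some SHOULD missing


# Placeholder element names, for illustration only.
MUST = {"query", "metadata"}
SHOULD = {"cutout"}

print(compliance_level({"query", "metadata"}, MUST, SHOULD))
print(compliance_level({"query", "metadata", "cutout"}, MUST, SHOULD))
```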
> > A service could potentially have more than one URL endpoint. It really
> > depends upon the protocol. In principle there could be one for each method;
> > if the service supports multiple protocols they might have different
> > service endpoints, etc.
>
> An important example of this is the description of registries in which the
> harvesting and search interfaces can have different endpoints.
>
> Multiple endpoints are currently supported by VOResource by allowing
> multiple interfaces to be described (although lumping multiple interfaces
> together may not always be wise). This issue will be examined more
> closely by the RWG-DM tiger team this month; however, those details should
> not affect RM or the VOResource core metadata.
I just wanted to make sure the issue was considered. Possibly this is
service-specific metadata but we should have a consistent scheme in mind.
- Doug