Charac Answer

Ed Shaya Edward.J.Shaya.1 at gsfc.nasa.gov
Fri Sep 2 09:23:23 PDT 2005


Anita Richards wrote:

>> I don't think I agree.
>> The data provider should not try to second-guess the needs of the
>> client.  What that means is that the metadata accompanying a dataset
>> should be as complete as possible.  It is up to the client to decide
>> what (s)he wants to use and what not.
>
>
> Arnold, I am not saying that any meta data should not be available.  
> But very often I just want to get an image or a spectrum and to know 
> what the resolution is, roughly how accurate the calibration is etc.  
> I will simply not bother if I have to read many pages of explanation 
> which will inevitably be in jargon.  As a data provider I know very 
> well that you have to offer layers.

No one is expected to read the XML.  The metadata will be viewed through 
some stylesheet that can offer various views on the metadata.  It is 
easy to provide you with an XSLT that reveals only the metadata that you 
think is important.  There can be links to other pages for when you want 
to look deeper.
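
A minimal sketch of what such a "view" amounts to, written in Python 
with ElementTree standing in for an XSLT stylesheet.  The element names 
and values in the record are hypothetical and not taken from any agreed 
model; the point is only that the full record stays intact while the 
reader sees a chosen subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; element names and values are illustrative only.
FULL_METADATA = """
<Characterisation>
  <SpatialResolution unit="arcsec">0.5</SpatialResolution>
  <CalibrationAccuracy unit="percent">5</CalibrationAccuracy>
  <ObservatoryLocation>
    <Longitude unit="deg">-2.3</Longitude>
    <Latitude unit="deg">53.2</Latitude>
  </ObservatoryLocation>
  <Provenance href="http://example.org/provenance.xml"/>
</Characterisation>
"""


def summary_view(xml_text, wanted):
    """Return only the top-level elements named in `wanted` as a dict."""
    root = ET.fromstring(xml_text)
    view = {}
    for child in root:
        if child.tag in wanted:
            view[child.tag] = (child.text or "").strip()
    return view


# The casual user asks for resolution and calibration accuracy only.
view = summary_view(FULL_METADATA, {"SpatialResolution", "CalibrationAccuracy"})
print(view)
```

A different user would simply pass a different set of names; nothing is 
removed from the underlying record.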

> Providing too much information up front is extremely counterproductive; 
> people think that you have to be an expert radio astronomer to begin 
> to look at the data.  What is necessary is to have a hierarchical 
> approach.  In my view the Char model should just contain as much 
> information as is necessary to supply a well-described data product.  
> If the user wants more information they should have a link to the 
> Provenance or Observation or whatever.  Similarly, the Registry needs 
> to know whether a particular data collection covers a certain bit of 
> the sky, but restrictions on an archive due to the geographic location 
> of the observatory should have been translated into sky limits for a 
> registry entry.

I agree.  It should be acceptable to fill in parts of the metadata with 
links or entities or includes of physical files external to the data 
file.  Information about the Observatory could be a link to an XML 
document held at the Observatory website.  It is still part of the 
logical data file but not in the physical data file.  The downside to 
this is the possibility that the Observatory someday is closed or has to 
close its website.
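
One way to picture the logical versus physical file, sketched with 
XInclude and Python's xml.etree.ElementInclude.  The URL, the element 
names and the in-memory REMOTE dictionary (which stands in for the 
Observatory website) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Stand-in for documents served by the Observatory website (hypothetical URL).
REMOTE = {
    "http://observatory.example/site.xml":
        "<Observatory><Name>Example Observatory</Name></Observatory>",
}

# The physical data file holds only a pointer to the Observatory description.
DATA_FILE = """
<Dataset xmlns:xi="http://www.w3.org/2001/XInclude">
  <Title>Example image</Title>
  <xi:include href="http://observatory.example/site.xml"/>
</Dataset>
"""


def loader(href, parse, encoding=None):
    """Fetch an included document; fails if the remote site has gone away."""
    if href not in REMOTE:
        raise OSError("cannot resolve " + href)
    return ET.fromstring(REMOTE[href])


def logical_file(xml_text):
    """Expand the includes so the physical file becomes the logical file."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=loader)
    return root


root = logical_file(DATA_FILE)
name = root.findtext("Observatory/Name")
print(name)
```

If the entry is deleted from REMOTE, logical_file() raises OSError, which 
is the closed-website problem in miniature.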

>> In the specific case at hand, the optical astronomer will most likely
>> ignore the ObservatoryLocation (e.g., because (s)he has no software to
>> handle it, probably because it is unimportant to the client).  But a
>> comet observer who happens to be interested in the dataset would want
>> to know about that location.
>
>
> But that does not mean that it all has to be in one monolithic model!

Isn't this just semantics?  If you have a bunch of small models and they 
are self-consistent with each other and are all useful for data files 
then they can be thought of as composing a larger (monolithic if you 
want) model.  It is just where you draw your box.

>
>> In my view, we can't rely in the VO anymore on the implied defaults
>> "that are obvious to everyone", since our clients come from different
>> communities that have different notions about what is obvious.
>
>
> Absolutely, which is why we need to have discrete but mutually 
> compatible models which describe things at the appropriate level of 
> detail.
>
>
>> Hence
>> it becomes dangerous to try to second guess our clients and the right
>> thing to do is attach all potentially relevant metadata.  If we don't
>> do that from the start, it will never happen and we'll get into
>> trouble half-way through.
>
>
> But attach it _where_? That is all I am arguing, that the VO will 
> never implement anything properly if we make it too unwieldy.  
> Moreover we will only get all the details of models right if we test 
> them against users' needs. As a one-time radio astronomer surely you 
> remember a certain software project which tried to be all things to 
> all astronomers....
>
Yes.  It should not be too unwieldy.  We should be constantly trying to 
simplify the model(s) where possible.  But, I don't see how we can shirk 
our responsibilities to make the data useable to any astronomer no 
matter what subfield and no matter what year.  I keep on hearing: let's 
just use RA and DEC, and assume the default equinox.  Well, in 2021, when 
the default equinox is 2020, will we be sure of the equinox of data 
published in late 2019?  I actually have this problem for 1924 data and 
have never been able to resolve it.
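
A small sketch of the alternative, assuming a hypothetical record 
layout with "ra", "dec" and "equinox" keys: the reader refuses to guess 
and insists that the equinox travel with the coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkyPosition:
    ra_deg: float
    dec_deg: float
    equinox: str  # deliberately no default: it must be stated by the provider


def parse_position(record):
    """Build a SkyPosition, rejecting records that rely on an implied equinox."""
    if "equinox" not in record:
        raise ValueError("equinox missing: position cannot be interpreted safely")
    return SkyPosition(float(record["ra"]), float(record["dec"]), record["equinox"])


# Explicit record: still unambiguous decades after publication.
explicit = parse_position({"ra": "10.5", "dec": "-3.25", "equinox": "J2000"})
print(explicit)

# Record relying on "the default": rejected rather than silently assumed.
try:
    parse_position({"ra": "10.5", "dec": "-3.25"})
except ValueError as err:
    print(err)
```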

> I am really concerned that it is time to stop adding bits to existing 
> models, since many of them duplicate - or rather, nearly-but-not-quite 
> replicate bits of other models - and start using them. I agree 
> completely that we attach all metadata but Characterisation is intended 
> to be a _part_ of a larger model, not replace STC, the Registry and 
> goodness knows what else.


As I see it, the problem arises from separate groups working in 
isolation from each other creating overlapping models that are not 
exactly consistent with each other.  The cure for that is to have a 
comprehensive look at all the models and find consistency.  But to avoid 
monolithic structure one does allow subfields to have their own 
namespaces to provide field-specific elements.  In some cases these terms 
should be permitted to override (i.e. take precedence over) the more 
general model.  Perhaps this is done by substitution groups so that the 
specific element can replace the more usual one.

>
>
>> And that is the philosophy behind STC: build a structure that ensures
>> that all potentially relevant metadata on the related spatial,
>> temporal, spectral, and redshift coordinate axes are provided with the
>> data.  Though I'll admit that that is different from the approach that
>> Characterisation has taken.
>
>
> That is why we need both!
>
> STC is to me a reference manual.  If the VO (or indeed internal 
> observatory data models) want to describe anything covered by STC 
> then it should be conformant.  But e.g. people designing registry 
> interfaces for data providers are putting effort into only presenting 
> selected defaults relevant to a type of data - the operative word 
> being default, just as with Vizier, you can 'view all fields' if you 
> want, but the VO is supposed to be able to deduce intelligent defaults 
> as a first offering.
>
>
> I don't really care if we include observatory location or not, but I 
> don't think that is what we are arguing about; it is indeed the 
> 'different approach'.  What I do care very strongly and negatively 
> about is adding location, and something else, and, and, and.... never 
> getting finished or having something which no-one has the time to 
> implement.
>
STC has not really grown very much in the last year or so.  It has been 
modified a bit.  So we should not blame Arnold for requirements creep. 
On the one hand it is good to have a simple descriptive language because 
it is less work to fill it out.  On the other hand it is good to have a 
full and rich language because you can then fully describe whatever you 
want, or more correctly, whatever you feel you must.  The second one 
has the advantage that it can be subsetted easily to create the first, 
as long as not every element is required.
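
To make the subsetting point concrete, here is a sketch in Python with 
invented element names; which elements are required and which are 
optional is purely an assumption for the example, not a statement about 
STC or Characterisation.

```python
# Hypothetical rich model: one required element, several optional ones.
REQUIRED = {"Coverage"}
OPTIONAL = {"Resolution", "ObservatoryLocation", "Provenance"}


def validate(record):
    """Accept any record that has every required element and no unknown ones."""
    names = set(record)
    missing = REQUIRED - names
    unknown = names - REQUIRED - OPTIONAL
    if missing:
        raise ValueError("missing required elements: " + ", ".join(sorted(missing)))
    if unknown:
        raise ValueError("unknown elements: " + ", ".join(sorted(unknown)))
    return True


def simple_subset(record):
    """Reduce a full description to the simple one by keeping required elements."""
    return {name: value for name, value in record.items() if name in REQUIRED}


full = {
    "Coverage": "example sky region",
    "Resolution": "0.5 arcsec",
    "ObservatoryLocation": "example site",
}
simple = simple_subset(full)

# Both the rich and the simple description are valid against the same model.
print(validate(full), validate(simple), simple)
```

Because the optional elements really are optional, the simple language 
is just the rich one with less filled in, and no second model is needed.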

Cheers,
Ed

>
> cheers
> a



