CreationType
Doug Tody
dtody at nrao.edu
Mon Dec 18 07:35:36 PST 2006
Hi Alberto -
It is tempting to try to describe all the processing done to produce
the extracted spectra, but this is beyond the scope of a single VO
service such as SSA. It seems to me that what you describe is a new
spectral data collection, little different from any other case of
extracted 1D spectra produced from some instrument and subsequently
archived. This is a new, archival data collection, probably with
DataSource 'pointed' (since the data products were produced by
uniform application of a pipeline to the underlying instrumental data
collection; 'custom' could also apply if processing was tailored for
individual data products). CreationType is then 'archival'.
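For concreteness, the per-dataset metadata might then be filled in roughly
as follows (just a sketch, written as a Python dict for illustration; the
field names follow the SSA data model and every identifier value here is
hypothetical):

    # Sketch of the DataID metadata for one extracted NICMOS grism spectrum.
    # Field names follow the SSA data model; all identifier values are made up.
    spectrum_dataid = {
        "DataID.Collection":   "HLA NICMOS grism 1D spectra",  # hypothetical name
        "DataID.Creator":      "ST-ECF",
        "DataID.CreatorDID":   "ivo://org.stecf/nicmos-grism#n4xe01abq_g141",  # made up
        "DataID.DataSource":   "pointed",   # uniform pipeline over pointed instrument data
        "DataID.CreationType": "archival",  # the SSA service returns the archived product as-is
    }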
Probably STECF is the Creator of this data collection. You should
define registry entries for Creator and Collection, and each spectrum
should have a unique CreatorDID within the collection. If we then
replicate the collection at MAST and CADC, those are Publishers
and have their own PublisherDID for each dataset, with Creator
and CreatorDID unchanged. Within STECF the Creator/Publisher and
CreatorDID/PublisherDID pairs may or may not be the same; the PublisherDID,
for example, may very well change, depending upon how the specific
archive is indexed.
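To make the replication case concrete (again only a sketch, with made-up
identifiers): the same spectrum served from two sites keeps its Creator and
CreatorDID, while each site supplies its own Publisher and PublisherDID:

    # One dataset replicated at two publishers. Creator/CreatorDID are fixed by
    # the creator of the collection; Publisher/PublisherDID differ per archive.
    # All identifier values here are hypothetical.
    creator_part = {
        "DataID.Creator":    "ST-ECF",
        "DataID.CreatorDID": "ivo://org.stecf/nicmos-grism#n4xe01abq_g141",
    }
    replicas = [
        {**creator_part,
         "Curation.Publisher":    "MAST",
         "Curation.PublisherDID": "ivo://mast.stsci.edu/hla#nicmos-grism/00123"},
        {**creator_part,
         "Curation.Publisher":    "CADC",
         "Curation.PublisherDID": "ivo://cadc.nrc.ca/hst#nicmos-grism/00123"},
    ]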
This is probably one of a family of related data collections associated
with NICMOS. Describing how these relate is more the "complex data"
problem, where we attempt to associate actual physical datasets which
are in some way related, e.g., as part of an observation or complex
data collection.
- Doug
On Mon, 18 Dec 2006, Alberto Micol wrote:
>
> Dear Doug,
>
> I'm back onto the DataID.CreationType definition.
>
> Within the Hubble Legacy Archive umbrella, the STECF is generating
> 1D extracted spectra out of slitless grism images taken with HST/NICMOS.
> We have a full pipeline to do a proper job; that is, those spectra
> are not generated on the fly by an SSA service. We are about to archive all
> those 1D spectra individually.
> Access is going to be provided via an SSA, *without any further
> manipulation*.
>
> So, the question is:
> what DataID.DataSource and DataID.CreationType to assign to these products?
>
> My immediate reaction was 'custom' and 'spectral extraction', but then I
> recalled the attached email exchange. (Recap: SSA 0.97 tells us to specify
> here only the type of processing actually applied by the VO service itself,
> which in this case is NONE.)
>
> What do you think would be the most logical (if any) DataSource and
> CreationType in this case?
>
> Many thanks,
>
> Alberto
>
>
> On Nov 22, 2006, at 04:42, Doug Tody wrote:
>
>>
>>> CreationType
>>> ------------
>>>
>>> The end user wants to know what kind of processing was applied
>>> to the data; hence the user should be told if the data were binned
>>> or mosaic'd, etc.
>>> What is not clear to me is why the SSA service should describe only
>>> the part of the processing it is responsible for, as the word "Typically"
>>> indicates in the second sentence of 2.4.2, and as indicated in the very
>>> last sentence of 2.4.2 (which actually contradicts that initial sentence
>>> by forcing CreationType to express ONLY the operations happening
>>> during the VO access).
>>>
>>> Wouldn't it be better to describe the entire end-to-end process that brought
>>> the data to the state they are in when they reach the user's disk?
>>> Otherwise, what is the value of such information?
>>>
>>> Unless the intention is to notify the VO user that the same data
>>> *in different form* exist somewhere else, in case s/he is not happy
>>> with it. If that is the case, then I would suggest a simpler "original"
>>> as opposed to "reprocessed" keyword, and forget all the quite artificial
>>> distinctions.
>>
>> This is one of the more difficult points of SSA (as is the next
>> one below). I agree that this is a difficult issue and am not yet
>> certain either what the best approach is.
>>
>> One point here is that often the user does know something about
>> the original data product, and may want to know what the service
>> has done to produce the data product which is actually delivered.
>> A use-case I had in mind here was access to complex data, e.g., a
>> spectral data cube. It is useful to know if a spectrum was produced
>> by on-demand extraction from a spectral data cube, as opposed to,
>> for example, returning an entire dataset from some well-known spectral
>> data collection (the "archival" case). In this case we have one well
>> defined "original" data product (the survey cube) and we can view
>> it in multiple ways, via 2-D or 3-D cutouts, via reprojection or a
>> general slice specified in 2-D, via filtering by spectral bandpass,
>> via extraction of a 1-D spectrum, and so forth. A good scheme which
>> describes the creation of data from a source data product can deal
>> with all these cases (this is more general than just SSA but that is
>> the point here as SIA V2 is next up).
>>
>> Another important case is where we have a well defined data collection
>> which has already been carefully processed - the usual survey or
>> instrument data collection for example - and the service generates
>> a virtual data product from this by either cutting out a subset, or
>> for example, reprojecting the data onto a standard coordinate system
>> (changing the spectral dispersion in this case). Which was done
>> is quite important to know: do we have the original pixels/samples
>> painstakingly generated by the well-known survey data collection,
>> or is the service filtering or interpolating, and thus degrading,
>> the data samples, to better represent what we asked for? (SIA V1
>> already addresses this in a rudimentary fashion by the way).
>>
>> On the other hand, I agree that in the most general case where the
>> original data (as defined by the DataID metadata in SSA) is not well
>> known, or we are doing a large scale automated analysis where knowledge
>> of well known data collections cannot easily be used, what one wants
>> to know is something about the overall processing done to get to
>> the data actually returned by the service. Of course, this can get
>> quite complex to describe, and if it gets too complex, it won't happen
>> and we fail. We can hope to describe what the service does, but we
>> aren't able yet to describe all the prior processing done as well.
>>
>> I don't have a perfect solution to this problem yet either. The scheme
>> proposed is more or less adequate to describe data access operations
>> upon well defined data collections, hence may be a good starting point;
>> however, I agree that we have not yet fully addressed this problem.
>
>
>