CreationType

Mon Dec 18 09:01:57 PST 2006

Thanks a lot Doug.

Hence, the CreationType actually tells what the data are, not
what the VO service does to them. If that is true, the SSA 2.4.2
description should be changed accordingly.

I know what you are after with this piece of information, but it
is very difficult to describe it with a single attribute.
Wouldn't it be better to describe the status of the data before and
after the VO service, to clarify the actual role of the service?
Something like:

vo service input creation type   = e.g. data cube
vo service output creation type  = e.g. spectral extraction

?
Or in my case:

vo service input creation type   = spectral extraction
vo service output creation type  = spectral extraction

Actually I would prefer a clearer:

vo service input creation type   = spectral extraction
vo service output creation type  = identity

where 'identity' is meant to say that nothing happened at the service
level.
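
As a minimal sketch of this two-attribute idea (the class, field, and
constant names below are hypothetical illustrations, not part of any SSA
draft), the pair could be modelled like this:

```python
from dataclasses import dataclass

# Hypothetical marker meaning "the service delivered the data unchanged".
IDENTITY = "identity"


@dataclass(frozen=True)
class ServiceCreation:
    """Illustrative pair of attributes; names are assumptions, not SSA keywords."""

    input_creation_type: str   # what the data are before the VO service acts
    output_creation_type: str  # what the service did, or IDENTITY if nothing

    def service_modified_data(self) -> bool:
        # True only when the service itself performed some operation.
        return self.output_creation_type != IDENTITY

    def describe(self) -> str:
        if self.service_modified_data():
            return (f"{self.input_creation_type} -> "
                    f"{self.output_creation_type} (done by the service)")
        return f"{self.input_creation_type}, delivered unchanged by the service"


# The two situations discussed above.
cube_case = ServiceCreation("data cube", "spectral extraction")
archived_case = ServiceCreation("spectral extraction", IDENTITY)

print(cube_case.describe())
print(archived_case.describe())
```

Keeping the two values separate would let a client distinguish "what the
data are" from "what the service did" without parsing a single combined
keyword.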

Just a thought...

Alberto

On Dec 18, 2006, at 16:35, Doug Tody wrote:

> Hi Alberto -
>
> It is tempting to try to describe all the processing done to produce
> the extracted spectra, but this is beyond the scope of a single VO
> service such as SSA.  It seems to me that what you describe is a new
> spectral data collection, little different than any other case of
> extracted 1D spectra produced from some instrument and subsequently
> archived.  This is a new, archival data collection, probably with
> DataSource 'pointed' (since the data products were produced by
> uniform application of a pipeline to the underlying instrumental data
> collection; 'custom' could also apply if processing was tailored for
> individual data products).  CreationType is then 'archival'.
>
> Probably STECF is the Creator of this data collection.  You should
> define registry entries for Creator and Collection, and each spectrum
> should have a unique CreatorDID within the collection.  If we then
> replicate the collection at MAST and CADC, those are Publishers
> and have their own PublisherDID for each dataset, with Creator
> and CreatorDID unchanged.  Within STECF the Creator/Publisher and
> CreatorDID/PublisherDID may or may not be the same; the PublisherDID
> for example may very well change, depending upon how the specific
> archive is indexed.
>
> This is probably one of a family of related data collections associated
> with NICMOS. Describing how these relate is more the "complex data"
> problem, where we attempt to associate actual physical datasets which
> are in some way related, e.g., as part of an observation or complex
> data collection.
>
> 	- Doug
>
>
> On Mon, 18 Dec 2006, Alberto Micol wrote:
>>
>> Dear Doug,
>>
>> I'm back onto the DataID.CreationType definition.
>>
>> Within the Hubble Legacy Archive umbrella, the STECF is generating
>> 1D extracted spectra out of slitless grism images taken with HST/NICMOS.
>> We have a full pipeline to be able to do a proper job, that is, those
>> spectra are not generated on-the-fly by an SSA service. We are about to
>> archive all those 1D spectra individually.
>> Access is going to be provided via an SSA, *without any further
>> manipulation*.
>>
>> So, the question is:
>> what DataID.DataSource and DataID.CreationType to assign to these
>> products?
>>
>> My immediate reaction was 'custom' and 'spectral extraction', but then
>> I recalled the attached email exchange. (Recap: SSA 0.97 tells us to
>> specify here only the type of processing actually applied by the VO
>> service itself, which in this case is NONE.)
>>
>> What do you think would be the most logical (if any) DataSource and
>> CreationType in this case?
>>
>> Many thanks,
>>
>> Alberto
>>
>>
>> On Nov 22, 2006, at 04:42, Doug Tody wrote:
>>
>>>> CreationType
>>>> ------------
>>>> The end user wants to know what kind of processing was applied
>>>> to the data; hence the user should be told if the data were binned
>>>> or mosaic'd, etc.
>>>> What is not clear to me is why the SSA service should describe only
>>>> the part of the processing it is responsible for, as the word
>>>> "Typically" indicates in the second sentence of 2.4.2, and as
>>>> indicated in the very last sentence in 2.4.2 (which actually
>>>> contradicts that initial sentence by forcing the creationtype to
>>>> express ONLY operations happening during the VO access).
>>>> Wouldn't it be better to describe the entire end-to-end process that
>>>> brought the data to the state they are in when they reach the user's
>>>> disk? Otherwise, what is the value of such information?
>>>> Unless the intention is to notify the VO user that the same data
>>>> *in different form* exist somewhere else, in case s/he is not happy
>>>> with it. If that is the case, then I would suggest a simpler
>>>> "original" as opposed to "reprocessed" keyword, and forget all the
>>>> quite artificial distinctions.
>>>
>>> This is one of the more difficult points of SSA (as is the next
>>> one below).  I agree that this is a difficult issue and am not yet
>>> certain either what is the best approach.
>>>
>>> One point here is that often the user does know something about
>>> the original data product, and may want to know what the service
>>> has done to produce the data product which is actually delivered.
>>> A use-case I had in mind here was access to complex data, e.g., a
>>> spectral data cube.  It is useful to know if a spectrum was produced
>>> by on-demand extraction from a spectral data cube, as opposed to,
>>> for example, returning an entire dataset from some well-known spectral
>>> data collection (the "archival" case).  In this case we have one well
>>> defined "original" data product (the survey cube) and we can view
>>> it in multiple ways, via 2-D or 3-D cutouts, via reprojection or a
>>> general slice specified in 2-D, via filtering by spectral bandpass,
>>> via extraction of a 1-D spectrum, and so forth.  A good scheme which
>>> describes the creation of data from a source data product can deal
>>> with all these cases (this is more general than just SSA but that is
>>> the point here as SIA V2 is next up).
>>>
>>> Another important case is where we have a well defined data collection
>>> which has already been carefully processed - the usual survey or
>>> instrument data collection for example - and the service generates
>>> a virtual data product from this by either cutting out a subset, or
>>> for example, reprojecting the data onto a standard coordinate system
>>> (changing the spectral dispersion in this case).  Which was done
>>> is quite important to know: do we have the original pixels/samples
>>> painstakingly generated by the well-known survey data collection,
>>> or is the service filtering or interpolating, and thus degrading,
>>> the data samples, to better represent what we asked for?  (SIA V1
>>> already addresses this in a rudimentary fashion by the way).
>>>
>>> On the other hand, I agree that in the most general case where the
>>> original data (as defined by the DataID metadata in SSA) is not well
>>> known, or we are doing a large scale automated analysis where knowledge
>>> of well known data collections cannot easily be used, what one wants
>>> to know is something about the overall processing done to get to
>>> the data actually returned by the service.  Of course, this can get
>>> quite complex to describe, and if it gets too complex, it won't happen
>>> and we fail.  We can hope to describe what the service does, but we
>>> aren't able yet to describe all the prior processing done as well.
>>>
>>> I don't have a perfect solution to this problem yet either.  The scheme
>>> proposed is more or less adequate to describe data access operations
>>> upon well defined data collections, hence may be a good starting point,
>>> however I agree that we have not yet fully addressed this problem.
>>
>>
>>
>