CreationType

Mon Dec 18 05:47:42 PST 2006

Dear Doug,

I'm back onto the DataID.CreationType definition.

Within the Hubble Legacy Archive umbrella, the STECF is generating
1D extracted spectra out of slitless grism images taken with HST/NICMOS.
We have a full pipeline to be able to do a proper job, that is, those
spectra are not generated on-the-fly by an SSA service. We are about to
archive all those 1D spectra individually.
Access is going to be provided via an SSA, *without any further
manipulation*.

So, the question is:
what DataID.DataSource and DataID.CreationType should we assign to
these products?

My immediate reaction was 'custom' and 'spectral extraction', but then
I recalled the attached email exchange. (Recap: SSA 0.97 tells us to
specify here only the type of processing actually applied by the VO
service itself, which in this case is NONE.)

What do you think would be the most logical (if any) DataSource and
CreationType in this case?

Many thanks,

Alberto

On Nov 22, 2006, at 04:42, Doug Tody wrote:

>
>> CreationType
>> ------------
>>
>> The end user wants to know what kind of processing was applied
>> to the data; hence the user should be told if the data were binned
>> or mosaic'd, etc.
>> What is not clear to me is why the SSA service should describe only
>> the part of the processing it is responsible for, as the word
>> "Typically" indicates in the second sentence of 2.4.2, and as
>> indicated in the very last sentence in 2.4.2 (which actually
>> contradicts that initial sentence by forcing the creationtype to
>> express ONLY operations happening during the VO access).
>>
>> Wouldn't it be better to describe the entire end-to-end process that
>> brought the data to the state they are in when they reach the user's
>> disk? Otherwise, what is the value of such information?
>>
>> Unless the intention is to notify the VO user that the same data
>> *in different form* exist somewhere else, in case s/he is not happy
>> with it. If that is the case, then I would suggest a simpler
>> "original" as opposed to "reprocessed" keyword, and forget all the
>> quite artificial distinctions.
>
> This is one of the more difficult points of SSA (as is the next
> one below).  I agree that this is a difficult issue and am not yet
> certain either what is the best approach.
>
> One point here is that often the user does know something about
> the original data product, and may want to know what the service
> has done to produce the data product which is actually delivered.
> A use-case I had in mind here was access to complex data, e.g., a
> spectral data cube.  It is useful to know if a spectrum was produced
> by on-demand extraction from a spectral data cube, as opposed to,
> for example, returning an entire dataset from some well-known spectral
> data collection (the "archival" case).  In this case we have one well
> defined "original" data product (the survey cube) and we can view
> it in multiple ways, via 2-D or 3-D cutouts, via reprojection or a
> general slice specified in 2-D, via filtering by spectral bandpass,
> via extraction of a 1-D spectrum, and so forth.  A good scheme which
> describes the creation of data from a source data product can deal
> with all these cases (this is more general than just SSA but that is
> the point here as SIA V2 is next up).
>
> Another important case is where we have a well defined data collection
> which has already been carefully processed - the usual survey or
> instrument data collection for example - and the service generates
> a virtual data product from this by either cutting out a subset, or
> for example, reprojecting the data onto a standard coordinate system
> (changing the spectral dispersion in this case).  Which was done
> is quite important to know: do we have the original pixels/samples
> painstakingly generated by the well-known survey data collection,
> or is the service filtering or interpolating, and thus degrading,
> the data samples, to better represent what we asked for?  (SIA V1
> already addresses this in a rudimentary fashion by the way).
>
> On the other hand, I agree that in the most general case where the
> original data (as defined by the DataID metadata in SSA) is not well
> known, or we are doing a large-scale automated analysis where
> knowledge of well known data collections cannot easily be used, what
> one wants to know is something about the overall processing done to
> get to the data actually returned by the service.  Of course, this can get
> quite complex to describe, and if it gets too complex, it won't happen
> and we fail.  We can hope to describe what the service does, but we
> aren't able yet to describe all the prior processing done as well.
>
> I don't have a perfect solution to this problem yet either.  The
> scheme proposed is more or less adequate to describe data access
> operations upon well defined data collections, hence may be a good
> starting point; however, I agree that we have not yet fully addressed
> this problem.