SED Data Model working draft (on behalf of D. Tody)

Thu Oct 25 06:27:09 PDT 2012

On Thu, 25 Oct 2012, Mark Cresitello-Dittmar wrote:

> Doug,
>
> On 10/24/2012 8:45 AM, Douglas Tody wrote:
>> 
>> Much of this discussion concerns the aggregate SED and segments. In
>> this most recent SEDDM it is proposed that a segment of the aggregate
>> SED is an externally viable, self-contained dataset, which is merely
>> stored in the serialized aggregate SED, with the VOTable or FITS MEF
>> serving as a container (pointing to an actual external dataset with a
>> URI is also permitted and might be desirable for large datasets).
>> Usually this externally viable dataset could indeed be an object derived
>> from SpectralDM and compliant with the VO models, but if we merely have
>> a simple container of segments this does not have to be the case. Since
>> the aggregate SED is used for SED building or editing it could be
>> desirable to include or reference native datasets which have not been
>> converted to a VO model.
> I just want to be clear: you want to allow the Segment to contain EITHER
> a reference to an external dataset OR an actual in-line serialization,
> right?

Yes, this was the intention.  It might be something large, or otherwise
awkward to copy/include.
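
As a rough sketch of this either/or arrangement (the class and field names
below are hypothetical and not part of the SEDDM draft, and the "exactly one
of the two" check is an assumption about how strictly EITHER/OR is read):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical aggregate-SED segment: in-line data or an external reference."""

    inline: Optional[bytes] = None  # serialized dataset stored in the container
    uri: Optional[str] = None       # pointer to an external (possibly large) dataset

    def __post_init__(self):
        # Assumption: exactly one of the two forms must be present.
        if (self.inline is None) == (self.uri is None):
            raise ValueError("Segment needs exactly one of inline or uri")

    @property
    def is_external(self) -> bool:
        """True when the segment only points at an external dataset."""
        return self.uri is not None
```

A container of such segments does not need to know whether the referenced
dataset follows a VO model or is in some native format.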

>> If instead we want to require that segments of the aggregate SED must be
>> derived from and compatible with the SpectralDM then the model might
>> indeed want to be changed as you both describe.  Even in this case
>> however it is possible to merely provide a container for aggregating
>> externally viable, standalone datasets, giving us the flexibility to
>> reference external data in some native format.
>> 
>> ** So the issue to be decided here is whether we permit the aggregate
>> SED to reference native datasets or restrict it to only data classes
>> derived from SDM **
>> 
>
> I can see the interest in allowing the external dataset to be relatively
> undefined.  The same 'issue' was discussed for PhotDM regarding the
> TransmissionCurve.
>
>>
>>>  "Uniform (or rebinned)" implies, I think, a synonym relationship, while
>>> the description that follows shows that they have different levels of
>>> processing.  Also, it says that rebinned handles overlaps.  Does that
>>> mean that Uniform does NOT, so that a single Uniform sequence can have
>>> overlapping data?
>> 
>> Yes, it is possible for segments in the uniform SED to overlap, e.g., a
>> spectrum may overlap another spectrum or photometry point from another
>> observation (segment).  Rebinning further normalizes the data to combine
>> any overlapped regions, which may or may not be desirable.  In both
>> cases we have a uniform SED however.  (This business of uniform vs
>> rebinning originated with Jonathan).
>> 
>
> I think I'm missing something in the picture then.  The Uniform SED has
> multiple Segments?  From the description, I'm getting:
>    + Uniform    = Char+Data
>    + Aggregate = Char+Segment[]

The uniform SED has all the segments from the aggregate SED transformed
to common spectral/flux units and collected in the Data portion of the
SED object.  Rebinning to avoid overlaps is optional.  In general there
can be overlaps due to overlapping spectral coverage of the original
segments, but since Data represents the spectral coord as a vector this
can be handled.
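
A minimal sketch of that transformation, assuming a toy representation in
which each segment carries its own units and a list of points (the unit
tables, the tuple layout, and the function name are illustrative only and do
not come from the draft):

```python
# Illustrative conversion factors to the assumed common units (metres, Jy).
TO_METRES = {"m": 1.0, "nm": 1e-9, "Angstrom": 1e-10}
TO_JY = {"Jy": 1.0, "mJy": 1e-3}


def to_uniform(segments):
    """Collect all segment points into one Data list in common units.

    Each segment is (spectral_unit, flux_unit, [(coord, flux), ...]).
    Overlapping points are kept as they are; no rebinning is done here.
    """
    data = []
    for spec_unit, flux_unit, points in segments:
        for coord, flux in points:
            data.append((coord * TO_METRES[spec_unit], flux * TO_JY[flux_unit]))
    # Sort by spectral coordinate so the Data vector is ordered.
    return sorted(data)


# A spectrum and a photometry point that falls inside the spectrum's range.
aggregate = [
    ("Angstrom", "Jy", [(5000.0, 1.0), (6000.0, 2.0)]),
    ("nm", "mJy", [(550.0, 1500.0)]),
]
uniform_data = to_uniform(aggregate)
```

Here the photometry point lands between the two spectrum samples in the
single Data vector; combining such overlapping regions would be the separate,
optional rebinning step.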

>>> Section 3.5.1:
>>>  + Utypes containing SED?  Isn't this contrary to the changes we did to 
>>> remove the 'Spectrum' node from Spectral utypes?
>> 
>> We decided earlier that we would have a core SpectralDM defining classes
>> which are common to all the models - Spectrum, SED, TimeSeries, etc.  So
>> Dataset, Char, DataID, etc. Utypes do not start with "Spectrum.",
>> "SED." and so on, and are the same for all classes of data derived from
>> the model.  We would then extend the core model by adding a class which
>> is for stuff which really is specific to the particular class of data
>> and not part of the generic core, inherited model.  Hence for TimeSeries
>> for example, all the usual SDM stuff is there unchanged, but we add a
>> new class TimeSeries which contains metadata specific to time series
>> data, e.g., information about the period or periods, trend removal,
>> whatever.  For SED we have a similar construct.  Most of the SDM is
>> inherited unchanged, the Utypes are generic, but stuff which is really
>> specific to SED, e.g., nSegments, is put into a SED.xx element.
>
> The way I see this, we are extending BaseSPS to the desired dataset
> (Spectrum, TimeSeries), and if we want to add content, it can go into a
> complex object (TimeProps, for example) or directly at the top level.
> The UType convention imposed on Spectral/Spectrum/PhotometryPoint is to
> NOT include the top node in the UType, so these tags do not appear
> before the top-level objects and attributes (Spectrum.Dataset).
>
> To do so with SED is breaking a convention you advocated heavily for the
> other BaseSPS extensions.

Well, if necessary we could change the class name of the "complex object"
to something other than "SED" / "TimeSeries" / "Image" etc. ("SedProps"
in your example) and accomplish the same goal.  But I don't see this as
in conflict with removing the top node name from the Utypes, since these
objects are defined as specific to the class of data and hence not
sharable.  These really are SED-specific, TimeSeries-specific, etc.
Utypes.  It is not much different from, e.g., Target, which contains only
Target-specific Utypes.  If we had a Sed.X class for SED-specific
metadata, but were including a "Sed" prefix in all Utypes (e.g.,
"Sed.Target.Name"), then applying the same convention we would have
"Sed.Sed.X" for the Utypes, which is not what is being suggested here.

So the proposal is

   SED object
     Dataset.X    generic Dataset metadata
     Sed.X        SED-specific metadata
     Target.X     Target metadata
     DataId.X     Dataset identification metadata
        etc.

But we could instead have SedProps or whatever for the class name if this
scheme is breaking some rule.
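
To make the resulting Utypes concrete, here is a small sketch that flattens
such an object into Utype strings without a top-node prefix.  Sed.nSegments
and Target.Name are taken from the discussion above; the Dataset and DataId
attribute names, and the helper itself, are only illustrative:

```python
def utypes(model):
    """Flatten {class_name: [attributes]} into 'Class.attr' Utypes.

    No top-node prefix is added, so the SED-specific class yields
    'Sed.nSegments' rather than 'Sed.Sed.nSegments'.
    """
    return [f"{cls}.{attr}" for cls, attrs in model.items() for attr in attrs]


sed_object = {
    "Dataset": ["Type"],   # illustrative attribute name
    "Sed": ["nSegments"],  # SED-specific metadata
    "Target": ["Name"],
    "DataId": ["Title"],   # illustrative attribute name
}

sed_utypes = utypes(sed_object)
```

Renaming the class to SedProps would only change the "Sed" key; the generic
Utypes would stay the same.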

 	- Doug

> Mark
>