ObsCore update discussion : adding Axes information in Obscore table

Laurent MICHEL laurent.michel at astro.unistra.fr
Thu Apr 16 21:13:37 CEST 2015


Hello Markus,

Let me express a bit of skepticism about the idea of gathering
information related to several columns in one encoded string (a RegExp,
what's more).
That looks like a job for a client that wants to hide overly long
ADQL queries, not like a model feature.
But you are right to mention the need for concrete use cases.
I have one in mind that I can sketch as "selecting datasets that have a
given number of time stamps and ....". That could make sense.
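
To make the first half of that use case concrete, here is a rough
sketch. It is only an illustration: an in-memory SQLite table stands in
for a TAP service, and the table layout, the rows, and the helper names
are all made up. It suggests that a numeric t_dim column can answer "at
least N time stamps", while an axes string can only say that a time
axis exists.

```python
import sqlite3

# Hypothetical stand-in for ivoa.obscore; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT, t_dim INTEGER, obs_axes TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?, ?)",
    [
        ("lc-short", 12, "/t/"),    # made-up light curve with 12 time bins
        ("lc-long", 5000, "/t/"),   # made-up light curve with 5000 time bins
        ("image", 1, "/s/s/"),      # made-up 2D image, degenerate time axis
    ],
)


def ids_with_min_time_bins(min_bins):
    """Return obs_ids having at least min_bins time stamps (uses t_dim)."""
    rows = conn.execute(
        "SELECT obs_id FROM obscore WHERE t_dim >= ? ORDER BY obs_id",
        (min_bins,),
    )
    return [row[0] for row in rows]


def ids_with_time_axis():
    """Return obs_ids whose axes string lists a time axis (uses obs_axes)."""
    rows = conn.execute(
        "SELECT obs_id FROM obscore WHERE obs_axes LIKE '%/t/%' ORDER BY obs_id"
    )
    return [row[0] for row in rows]
```

With these rows, the t_dim query can separate the long light curve from
the short one, whereas the obs_axes query returns both.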

Cheers
Laurent


On 16/04/2015 10:16, Markus Demleitner wrote:
> Dear Data Modellers,
>
> I've not closely followed the discussion, so this may be a dumb
> question, but let me still ask it:  What use cases is this to cover?
>
> Is it, in essence, "Give me all lightcurves/spectra/spectral
> cubes/polarisation images satisfying these conditions"?  If so, then
> I have to say I feel
>
> On Wed, Apr 15, 2015 at 07:50:33PM +0200, Louys Mireille wrote:
>>   * s_dim1, s_dim2 = the coverage in sampling elements ( pixels) for
>>     each spatial axis
>>   * em_dim = the coverage in spectral elements along the energy axis
>>   * t_dim = the coverage in the time axis, as number of time bins
>>   * pol_dim = the coverage in the polarization axis, as number of
>>     polarization states
>
> is both a bit too much and a bit too little.
>
> I believe it's too much because it's using 6 columns to convey the
> information, and it contains lots of information that's not actually
> necessary for the one use case I've outlined above.  Six columns may
> not seem much, but
>
> gavo=# select count(*) from tap_schema.columns where table_name='ivoa.obscore';
>   count
> -------
>      29
>
> in current DaCHS, so that's a 20% increase.  Non-IVOA people quite
> often complain that IVOA data models are too complex, so this is a
> non-trivial issue, and IMHO we should have strong ("we gain 20% in
> usefulness") use cases where people actually need the actual number
> of pixels for a common discovery operation.  Are we sure we have
> those?
>
> At the same time, I believe it's too little, as I can easily think of
> cubes that have axes that cannot be described in this way (in
> astroparticle physics, one axis might be particle type, for instance;
> for visibilities, I'd be reluctant to talk about spatial axes; you
> could easily have three spatial dimensions with density values --
> think GAIA --, etc).  I don't think we should plan on changing
> obscore every time new instruments producing interesting new data
> products come around.
>
> I liked much better the idea that has been suggested at some recent
> Interop.  Let me sketch it out here again (I don't know who to credit
> for it -- speak up, if you're reading this):
>
> Just add one column obs_axes (or whatever), which would contain a
> string like (RE syntax)
>
> /([a-z]+/)*
>
> For each (non-degenerate) axis actually present, we'd have one code,
> where the s, em, t, pol suggested by Mireille might suffice for now
> (though I'd like some guidance on what to do with visibilities).
>
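A service or client could check such strings before storing or parsing
them. The following is a minimal sketch, assuming the single-slash
layout used by the examples in this message (a leading slash, then each
code followed by exactly one slash, as in /s/s/em/). The function names
are hypothetical and not part of any proposal.

```python
import re

# Assumed layout: a leading slash, then zero or more lowercase codes,
# each followed by exactly one slash, e.g. "/s/s/em/".
OBS_AXES_RE = re.compile(r"/(?:[a-z]+/)*")


def is_valid_obs_axes(value):
    """Return True if the whole string follows the assumed layout."""
    return OBS_AXES_RE.fullmatch(value) is not None


def axis_codes(value):
    """Split a valid obs_axes string into its list of axis codes."""
    if not is_valid_obs_axes(value):
        raise ValueError("not a valid obs_axes string: %r" % (value,))
    # Splitting on "/" yields empty strings at both ends; drop them.
    return [code for code in value.split("/") if code]
```

For instance, axis_codes("/s/s/em/") gives the codes s, s, em, while a
string without the trailing slash is rejected.
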
> The examples provided would then look like this:
>
>>   * MUSE data cube
>>
>>      s_dim1   = 300
>>      s_dim2   = 300
>>      em_dim   = 3463
>>      pol_dim      = 1
>>      pol_state = I
>>      t_dim      = 1
>
> /s/s/em/
>
>>   * 2MASS: 2D image
>>
>>      s_dim1   = 300
>>      s_dim2   = 300
>>      em_dim   = 1
>>      pol_dim  = 1
>>      pol_state = I
>>      t_dim = 1
>
> /s/s/
>
>>   * STIS spectroscopy (1D):
>>
>>      s_dim1   = 1
>>      s_dim2   = 1
>>      em_dim   = 1024
>>      pol_dim  = 1
>>      pol_state = I
>>      t_dim = 1
>
> /em/
>
>>   * STIS spectroscopy (2D long slit):
>>
>>      s_dim1    = 1024
>>      s_dim2    = 1
>>      em_dim    = 1024
>>      pol_dim   = 1
>>      pol_state = I
>>      t_dim = 1
>
> /s/em/
>
>>   * ALMA:
>>
>>      s_dim1    = 1000
>>      s_dim2    = 1000
>>      em_dim   = 3000
>>      pol_dim  = 4
>>      pol_state = I/U/V/Q
>>      t_dim = 1
>
> /s/s/em/pol/
>
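The mapping used in the five examples above can be written down as a
small function: an axis gets a code only if its *_dim value is larger
than 1. This is a sketch under assumptions. The function name is
hypothetical, and the position of the t code (between em and pol) is a
guess, since none of the examples has a resolved time axis together
with other axes.

```python
def obs_axes_from_dims(s_dim1=1, s_dim2=1, em_dim=1, t_dim=1, pol_dim=1):
    """Build an obs_axes string from per-axis element counts.

    An axis is treated as non-degenerate when it has more than one
    element. Assumed code order: s, s, em, t, pol.
    """
    dims = [
        ("s", s_dim1),
        ("s", s_dim2),
        ("em", em_dim),
        ("t", t_dim),
        ("pol", pol_dim),
    ]
    codes = [code for code, n_elements in dims if n_elements > 1]
    # Leading slash, then every code followed by one slash.
    return "/" + "".join(code + "/" for code in codes)
```

Feeding in the values quoted for MUSE, 2MASS, STIS and ALMA reproduces
the strings given above, for instance /s/em/ for the long-slit case.
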
> I claim that's enough for typical discovery problems; for instance:
>
> * Give me spectral cubes:
>
>    WHERE obs_axes='/s/s/em/'
>
> * Give me anything that has a spectral axis
>
>    WHERE obs_axes LIKE '%/em/%'
>
> * Give me time series
>
>    WHERE obs_axes='/t/'
>
> * Give me things that have both resolved time and resolved
>    polarization
>
>    WHERE obs_axes LIKE '%/t/%' and obs_axes LIKE '%/pol/%'
>
> The one drawback I can see is that the prevalence of % at the
> beginning of patterns isn't really index-friendly, and hence queries
> with *only* constraints of this type may involve all-table seqscans.
> I'd claim that such queries would be fairly rare, since you'd usually
> have additional constraints on position or other fields.
>
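The four constraints can be tried out on a toy table. This is only a
sketch: SQLite stands in for a real TAP service, the obs_id values are
invented, and the last two rows (a light curve and a polarimetric time
series) are made up to give the time-related queries something to
match. It says nothing about index use or query plans on a production
database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obscore (obs_id TEXT PRIMARY KEY, obs_axes TEXT)")
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?)",
    [
        ("muse-cube", "/s/s/em/"),
        ("2mass-image", "/s/s/"),
        ("stis-1d", "/em/"),
        ("stis-longslit", "/s/em/"),
        ("alma-cube", "/s/s/em/pol/"),
        ("lightcurve", "/t/"),               # made-up row
        ("polarimetry-series", "/t/pol/"),   # made-up row
    ],
)


def discover(where_clause):
    """Run one discovery constraint and return matching obs_ids, sorted."""
    sql = "SELECT obs_id FROM obscore WHERE " + where_clause + " ORDER BY obs_id"
    return [row[0] for row in conn.execute(sql)]


spectral_cubes = discover("obs_axes='/s/s/em/'")
has_spectral_axis = discover("obs_axes LIKE '%/em/%'")
time_series = discover("obs_axes='/t/'")
time_and_pol = discover("obs_axes LIKE '%/t/%' and obs_axes LIKE '%/pol/%'")
```

Because every code is wrapped in slashes, a pattern such as '%/s/%'
cannot accidentally match inside a longer code.
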
> Again: If we have use cases that justify increasing field count by
> 20%, I retract this entire post.  All I'm saying is we shouldn't
> increase column count lightly.
>
> Cheers,
>
>             Markus
>

-- 
---- Laurent MICHEL              Tel  (33 0) 3 68 85 24 37
      Observatoire de Strasbourg  Fax  (33 0) 3 68 85 24 32
      11 Rue de l'Universite      Mail laurent.michel at astro.unistra.fr
      67000 Strasbourg (France)   Web  http://astro.u-strasbg.fr/~michel
---