ObsCore update discussion : adding Axes information in Obscore table
Laurent MICHEL
laurent.michel at astro.unistra.fr
Thu Apr 16 21:13:37 CEST 2015
Hello Markus,
Let me express a bit of skepticism about the idea of gathering
information related to several columns in one encoded string (and a
RegExp at that).
That looks like a job for a client that wants to hide overly long
ADQL queries, not like a model feature.
But you are right to ask for concrete use cases.
I have one in mind, which I can sketch as "selecting datasets that have a
given number of time stamps and ....". That could make sense.
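
To make that use case concrete, here is a minimal sketch using an
in-memory SQLite table as a stand-in for an ObsCore service. The table
layout, the rows, and the helper select_ids are invented for
illustration; only the column names obs_axes and t_dim come from this
thread. It shows that the encoded string can say a time axis exists,
while a numeric column can also constrain the number of time stamps.

```python
import sqlite3

# Hypothetical rows: (obs_id, obs_axes, t_dim). All values are invented.
ROWS = [
    ("lc-short", "/t/", 12),
    ("lc-long", "/t/", 5000),
    ("cube", "/s/s/em/", 1),
]


def select_ids(where_clause, params=()):
    """Run a SELECT on an in-memory stand-in table and return the obs_id values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE obscore (obs_id TEXT, obs_axes TEXT, t_dim INTEGER)"
    )
    conn.executemany("INSERT INTO obscore VALUES (?, ?, ?)", ROWS)
    query = "SELECT obs_id FROM obscore WHERE " + where_clause + " ORDER BY obs_id"
    ids = [row[0] for row in conn.execute(query, params)]
    conn.close()
    return ids


# The encoded string only tells us that a time axis is present.
has_time_axis = select_ids("obs_axes LIKE '%/t/%'")

# A numeric column can additionally constrain the number of time stamps.
many_time_stamps = select_ids("t_dim >= ?", (1000,))

print(has_time_axis)
print(many_time_stamps)
```

With these rows the first query returns both light curves and the
second only the long one.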
Cheers
Laurent
On 16/04/2015 10:16, Markus Demleitner wrote:
> Dear Data Modellers,
>
> I've not closely followed the discussion, so this may be a dumb
> question, but let me still ask it: What use cases is this to cover?
>
> Is it, in essence, "Give me all lightcurves/spectra/spectral
> cubes/polarisation images satisfying these conditions"? If so, then
> I have to say I feel
>
> On Wed, Apr 15, 2015 at 07:50:33PM +0200, Louys Mireille wrote:
>> * s_dim1, s_dim2 = the coverage in sampling elements ( pixels) for
>> each spatial axis
>> * em_dim = the coverage in spectral elements along the energy axis
>> * t_dim = the coverage in the time axis, as number of time bins
>> * pol_dim = the coverage in the polarization axis, as number of
>> polarization states
>
> is both a bit too much and a bit too little.
>
> I believe it's too much because it's using 6 columns to convey the
> information, and it contains lots of information that's not actually
> necessary for the one use case I've outlined above. Six columns may
> not seem much, but
>
> gavo=# select count(*) from tap_schema.columns where table_name='ivoa.obscore';
> count
> -------
> 29
>
> in current DaCHS, so that's a 20% increase. Non-IVOA people quite
> usually complain that IVOA data models are too complex, so this is a
> non-trivial issue, and IMHO we should have strong ("we gain 20% in
> usefulness") use cases where people actually need the actual number
> of pixels for a common discovery operation. Are we sure we have
> those?
>
> At the same time, I believe it's too little, as I can easily think of
> cubes that have axes that cannot be described in this way (in
> astroparticle physics, one axis might be particle type, for instance;
> for visibilities, I'd be reluctant to talk about spatial axes; you
> could easily have three spatial dimensions with density values --
> think GAIA --, etc). I don't think we should plan on changing
> obscore every time new instruments producing interesting new data
> products come around.
>
> I liked much better the idea that has been suggested at some recent
> Interop. Let me sketch it out here again (I don't know who to credit
> for it -- speak up, if you're reading this):
>
> Just add one column obs_axes (or whatever), which would contain a
> string like (RE syntax)
>
> /([a-z]+/)*
>
> For each (non-degenerate) axis actually present, we'd have one code,
> where the s, em, t, pol suggested by Mireille might suffice for now
> (though I'd like some guideline what to do with visibilities).
>
> The examples provided would then look like this:
>
>> * MUSE data cube
>>
>> s_dim1 = 300
>> s_dim2 = 300
>> em_dim = 3463
>> pol_dim = 1
>> pol_state = I
>> t_dim = 1
>
> /s/s/em/
>
>> * 2MASS: 2D image
>>
>> s_dim1 = 300
>> s_dim2 = 300
>> em_dim = 1
>> pol_dim = 1
>> pol_state = I
>> t_dim = 1
>
> /s/s/
>
>> * STIS spectroscopy (1D):
>>
>> s_dim1 = 1
>> s_dim2 = 1
>> em_dim = 1024
>> pol_dim = 1
>> pol_state = I
>> t_dim = 1
>
> /em/
>
>> * STIS spectroscopy (2D long slit):
>>
>> s_dim1 = 1024
>> s_dim2 = 1
>> em_dim = 1024
>> pol_dim = 1
>> pol_state = I
>> t_dim = 1
>
> /s/em/
>
>> * ALMA:
>>
>> s_dim1 = 1000
>> s_dim2 = 1000
>> em_dim = 3000
>> pol_dim = 4
>> pol_state = I/U/V/Q
>> t_dim = 1
>
> /s/s/em/pol/
>
> I claim that's enough for typical discovery problems; for instance:
>
> * Give me spectral cubes:
>
> WHERE obs_axes='/s/s/em/'
>
> * Give me anything that has a spectral axis
>
> WHERE obs_axes LIKE '%/em/%'
>
> * Give me time series
>
> WHERE obs_axes='/t/'
>
> * Give me things that have both resolved time and resolved
> polarization
>
> WHERE obs_axes LIKE '%/t/%' and obs_axes LIKE '%/pol/%'
>
> The one drawback I can see is that the prevalence of % at the
> beginning of patterns isn't really index-friendly, and hence queries
> with *only* constraints of this type may involve all-table seqscans.
> I'd claim that such queries would be fairly rare, since you'd usually
> have additional constraints on position or other fields.
>
> Again: If we have use cases that justify increasing field count by
> 20%, I retract this entire post. All I'm saying is we shouldn't
> increase column count lightly.
>
> Cheers,
>
> Markus
>
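
For reference, the quoted encoding and queries can be tried out with a
short Python sketch. It assumes that an axis counts as non-degenerate
when its *_dim value is greater than 1, and that codes appear in the
order s, s, em, t, pol; the thread fixes neither point. The helper names
obs_axes and matching are hypothetical, and an in-memory SQLite table
stands in for a real ObsCore service. The dimension values are the ones
from the quoted examples.

```python
import sqlite3

# (column, axis code) pairs; the ordering of codes is an assumption.
AXIS_COLUMNS = [
    ("s_dim1", "s"),
    ("s_dim2", "s"),
    ("em_dim", "em"),
    ("t_dim", "t"),
    ("pol_dim", "pol"),
]


def obs_axes(dims):
    """Build the axes string from *_dim values (hypothetical helper).

    An axis is treated as resolved when its dimension is greater than 1.
    A dataset with no resolved axis yields "/", an arbitrary choice.
    """
    codes = [code for column, code in AXIS_COLUMNS if dims.get(column, 1) > 1]
    return "/" + "".join(code + "/" for code in codes)


# Dimension values copied from the examples in the quoted message.
EXAMPLES = {
    "MUSE": dict(s_dim1=300, s_dim2=300, em_dim=3463, pol_dim=1, t_dim=1),
    "2MASS": dict(s_dim1=300, s_dim2=300, em_dim=1, pol_dim=1, t_dim=1),
    "STIS 1D": dict(s_dim1=1, s_dim2=1, em_dim=1024, pol_dim=1, t_dim=1),
    "STIS long slit": dict(s_dim1=1024, s_dim2=1, em_dim=1024, pol_dim=1, t_dim=1),
    "ALMA": dict(s_dim1=1000, s_dim2=1000, em_dim=3000, pol_dim=4, t_dim=1),
}

AXES = {name: obs_axes(dims) for name, dims in EXAMPLES.items()}


def matching(where_clause):
    """Return the names of the example datasets that satisfy a WHERE clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obscore (name TEXT, obs_axes TEXT)")
    conn.executemany("INSERT INTO obscore VALUES (?, ?)", list(AXES.items()))
    query = "SELECT name FROM obscore WHERE " + where_clause + " ORDER BY name"
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


for name, axes in AXES.items():
    print(name, axes)

spectral_cubes = matching("obs_axes = '/s/s/em/'")
with_spectral_axis = matching("obs_axes LIKE '%/em/%'")
time_series = matching("obs_axes = '/t/'")
print(spectral_cubes, with_spectral_axis, time_series)
```

Under those assumptions the derived strings agree with the ones given in
the quoted message, and none of the five examples qualifies as a time
series.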
--
---- Laurent MICHEL Tel (33 0) 3 68 85 24 37
Observatoire de Strasbourg Fax (33 0) 3 68 85 24 32
11 Rue de l'Universite Mail laurent.michel at astro.unistra.fr
67000 Strasbourg (France) Web http://astro.u-strasbg.fr/~michel