ObsCore update discussion: adding axes information in the ObsCore table

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Apr 27 10:14:51 CEST 2015


Hi,

[keeping Pat's restriction to DM to limit crossposting]

On Fri, Apr 24, 2015 at 12:41:32PM -0700, Patrick Dowler wrote:
> On 24/04/15 11:15 AM, Louys Mireille wrote:
> >if t_dim > 20, I may be interested in this time series aspect: enough
> >time samples, or I may discard it: not enough time samples...
> 
> From the provider point of view, they also don't really know how to draw the
> line when the number of samples is small-ish, so they won't know when to say
> that something has a "useful" time axis or not. For example, we have data
> (WirCam data from CFHT) where the raw data usually has t_dim = 4. That's not
> really a time series, but the product as-is isn't exactly a 2d image either.

Hmyeswell.  I'm still not convinced that use cases like these merit
a 20% increase in model size, but since nobody else seems to feel
that way, I'll stop grumbling.

But even then I'm worried that this way we'll have to add obscore
columns every time products with new axis types come along (which
has already happened in the few days this proposal has been out).

The sane mapping of the type+size metadata to a relational model
would again be a separate table, essentially as discussed in
http://mail.ivoa.net/pipermail/dm/2015-April/005154.html, just with
the additional column already mentioned there:

  /------ primary key -----\
  pubDID      | axis index | axis descriptor | axis length
  ivo://ab           1            s             200
  ivo://ab           2            t             4000
  ...
  ivo://ab           5            pol           2
  ivo://abc          1            t             567
  ...
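
To make the shape of that table concrete, here is a possible sketch
in SQL (the table and column names are made up for illustration, none
of this is in any spec):

  CREATE TABLE ivoa.obscore_axes (
    obs_publisher_did  VARCHAR,  -- points back into ivoa.obscore
    axis_index         INTEGER,  -- 1-based position of the axis
    axis_descriptor    VARCHAR,  -- e.g. 's', 't', 'pol'
    axis_length        BIGINT,   -- number of samples along that axis
    PRIMARY KEY (obs_publisher_did, axis_index)
  )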

Given that the discovery problems people want to solve with this
apparently are fairly complex, I'm almost tempted to propose having
that table, then.  If people come up with new axis descriptors, you
just need to define a value for "axis descriptor" and be done with
it.  And since it's relationally sound, query patterns can be
expected to be fairly simple, too.
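
For instance, the "time series with more than 50 samples" case would,
against such a table (again with the made-up names from the sketch
above), come out as roughly

  SELECT oc.*
  FROM ivoa.obscore AS oc
  JOIN ivoa.obscore_axes AS ax
    ON ax.obs_publisher_did = oc.obs_publisher_did
  WHERE ax.axis_descriptor = 't'
    AND ax.axis_length > 50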

Now, I admit introducing a second table to obscore is a bit drastic,
and opposition to that would be well justified. Which brings us back
to the options for denormalization considered in the mail referenced
above.

Arrays (one each for axis type and axis length) would be almost ideal
here, but I don't think we can really formally introduce array
processing in ADQL.  Can we?

Then there'd be the simulated arrays I had originally proposed for
just the axis labels; with axis length thrown in, you'd now have
axes="s/t/pol" *and* axis_lengths="300/5/2", which IMHO might
marginally work for human inspection ("Could *this* data set work for
me?"), but clearly does not work at all for constraining ("Give me
all data sets with time axes longer than 50").
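
To illustrate: about the best one could do against such hypothetical
string-valued columns in ADQL would be a substring match on the
label, something like

  SELECT *
  FROM ivoa.obscore
  WHERE axes LIKE '%t%'

which is already sloppy, since 't' might be a substring of some other
label, and which in any case gives no way to pick the matching entry
out of axis_lengths and compare it against 50.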

Another form of denormalization could be to limit the number of axes
and store labels and sizes within ivoa.obscore itself.  The
additional columns might look something like this:

  ax1_label  ax1_size  ax2_label  ax2_size  ax3_label  ax3_size
  s          200       s          200       sp         50

Apart from the question of what providers of 4d or 5d cubes would do,
the problem is that this would give extremely ugly query patterns;
the "time series with more than 50 samples" use case would work out
to something like

  (ax1_label='t' and ax1_size>50)
  or (ax2_label='t' and ax2_size>50)
  or (ax3_label='t' and ax3_size>50)

Not good either.  Even worse if we want to allow >3d.

So, I'm at a loss: For me, the +6-column proposal is much too
inflexible (again: What would be in there for my visibilities?  What
about Arnold's velocity axis?  What about instruments a couple of
years down the line?).  If it's not that, it seems to me from what I
can work out that we'd need the extra table.

Are you sure that filing the "discovery by number of samples" use
case under the "20% functionality that's 80% of the effort" category
and just contenting ourselves with the axis labels in obscore
*really* is not an option?

Cheers,

         Markus


