Implementation experience with SIA 2

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Mon Sep 29 18:49:25 CEST 2014


Reminder: the parameters you are critiquing are from the SIAv2 query 
capability. This capability is designed to query for datasets that match 
the specified conditions, so the parameters are inherently those that 
express the conditions and not (necessarily) those that describe some 
ideal (construct-able) data.

Comment/response inline

On 26/09/14 04:28 PM, Walter Landry wrote:
> Hello Everyone,
>
> Back in July, I sent a note to this list about some issues I had with
> SIA 2.0.  Since then, we have implemented a synthetic image generation
> service for the Planck satellite.  We tried to implement this in a
> way consistent with SIA 2.0, but we had some difficulties.
>
> 1) BAND
>
>     The Planck satellite detector bands are all specified in GHz: 30,
>     44, 70, etc.  These are nice, integer numbers.  Mapping to
>     wavelength leaves me with numbers that are not exactly
>     representable in floating point.  This means that every search has
>     to give a range.  It would be nice if I could specify the frequency
>     instead of the wavelength.  We ended up using the keyword FREQ in
>     MHz, since I do not know of any astronomical observations of EM
>     radiation that go lower than that.

We've had this argument so many times and it always comes down to "just
pick something". In ObsCore-1.0 the em_min and em_max fields are 
wavelength in meters, so that is what SIA-2.0 query uses to query those 
fields. No matter what you pick for the standard, someone has to 
transform nice looking values into something with scientific notation... 
Implementing FREQ in addition is fine for you; requiring it in the 
standard is more work for all services; implementing FREQ instead means 
services are not compatible or (at best) clients have to get/grok 
service capabilities before being able to call them.
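
To make the BAND trade-off concrete, here is a minimal sketch of the 
client-side conversion from an integer frequency in GHz to a wavelength 
interval in metres. The function names and the relative tolerance are 
assumptions for illustration only; nothing here is taken from the SIA-2.0 
or ObsCore documents beyond "wavelength in meters".

```python
# Sketch only: helper names and rel_tol are hypothetical, not part of any standard.
C = 299_792_458.0  # speed of light in m/s (exact by SI definition)


def freq_ghz_to_wavelength_m(freq_ghz):
    """Convert a frequency in GHz to a vacuum wavelength in metres."""
    return C / (freq_ghz * 1e9)


def band_range(freq_ghz, rel_tol=1e-6):
    """Return a (lo, hi) wavelength interval in metres around the frequency.

    The converted value is generally not exactly representable in floating
    point, so the query uses a narrow interval instead of a single value.
    """
    centre = freq_ghz_to_wavelength_m(freq_ghz)
    return centre * (1.0 - rel_tol), centre * (1.0 + rel_tol)


for ghz in (30, 44, 70):
    lo, hi = band_range(ghz)
    print(f"{ghz} GHz -> {lo:.9e} m .. {hi:.9e} m")
```

The interval width is a judgment call; it only needs to be wide enough to 
absorb rounding in the conversion on both the client and the service side.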

> 2) RANGE
>
>     As I mentioned in July, RANGE is prohibitively expensive for this
>     data set.  So we do not support it, will never support it, and I
>     still think it should not be part of the spec.

I don't disagree with this; at best it is a convenience for making some 
common (large) polygons.

> 3) POS
>
>     Since this is a synthetic image generation service, it would be
>     nice to make rectangular images.  The current SIA 2.0 spec has no
>     great circle rectangles.  The client has to construct the polygons
>     themselves, which is non-trivial.  Given the negative reaction I
>     got to Box's last time, we ended up ditching SIA 2.0 for this
>     entirely and using SIA 1 syntax: POS, SIZE, CFRAME, CDELT (though
>     SPATRES would have been fine).  I would really prefer a better
>     mechanism than this.

Well, I basically grok what you're trying to do and it is just using a
small bit of FITS WCS to describe what you want to get back. That seems
to me to fit much better in an AccessData-ish service and not in a pure 
data discovery service. Since it seems to be driven from real data, 
there is obviously some part of the usage that would involve 
discovery... let's make sure to discuss this kind of usage in Banff next 
week.

> 4) TARGET vs OBJECT
>
>     Why does SIA 2.0 use the TARGET parameter?  OBJECT is an existing
>     standard FITS convention.

The query result is ObsCore; in there the name of the field being 
constrained is target_name, hence TARGET.

> 5) Syntax
>
>     Consider these issues:
>
>     a) In July, I highlighted a problem with the syntax of POS
>        parameters.  It requires spaces, which must be URL encoded or
>        things silently break.  Silent breakage is the worst kind of
>        breakage.

Syntax requires encoding, yes. Even without syntax, parameter values 
must be encoded to be safe or strange things happen. How many times have 
I cursed the IAU naming convention that includes + sign? Lost count :-) 
  Must encode.
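
As an illustration of the silent breakage when a value is not encoded, 
here is a small sketch using Python's standard urllib.parse. The target 
name is a made-up example in the IAU style with a + sign, and TARGET is 
used only as a sample parameter name.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical IAU-style designation containing both a space and a '+' sign.
target = "SDSS J123456.78+012345.6"

# Unencoded: a form-style decoder turns the '+' into a space, silently.
raw_query = "TARGET=" + target
decoded_raw = parse_qs(raw_query)["TARGET"][0]

# Encoded: urlencode() escapes '+' as %2B and the space as '+'.
encoded_query = urlencode({"TARGET": target})
decoded_ok = parse_qs(encoded_query)["TARGET"][0]

print(encoded_query)
print(decoded_raw == target)  # False: the '+' was lost
print(decoded_ok == target)   # True: the value round-trips
```

The same applies to the spaces in a POS value: no error is raised in the 
unencoded case, the service simply sees a different string.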


>     b) Polygon searches use a straight list of numbers.  It would be
>        better to have a list of pairs to make typos more obvious.

It is possible to make syntax errors. More syntax solves it?

>     c) We need to be able to select multiple detectors at once, so we
>        would like to have an array of strings.

Not sure I follow... you said synthetic but now are talking about 
multiple detectors. If the underlying data is some kind of mosaic camera 
then you have several choices on what constitutes a single ObsCore 
entity (been there, we can discuss off-line), but describing the 
complexity of 1 observation -> N subarrays is not in the scope of 
ObsCore-1.0 so not in the scope of SIA-2.0 query. There is ObsCore-1.1 
work underway, plus the ImageDM and consequent SIAv2 "metadata" 
capability for exposing it.
>
>     d) There is no way to add arbitrary parameters.  COORD was the way
>        to do that in old versions of SIA 2.0.  Now I have to use up a
>        keyword and hope it does not accidentally conflict with new
>        versions of the standard.  This is not going to scale.

SELECT and COORD were never part of SIA-2.0 query; they were part of 
WD-AccessData-1.0 to show a way that SimDAL could be supported within 
that spec.


>     e) We have a smart client doing searches on behalf of the user.  In
>        general, we would like to set arbitrary metadata that are not
>        necessary for the search but convenient for the user.

You can always add custom fields to your ObsCore output. If they aren't
really custom, but just in the optional fields of the appendix, then
choosing standard names would be a good thing to do.


>     This prompted me to use a more general syntax to express queries.
>     Specifically, I used json5
>
>       https://github.com/aseemk/json5
>
>     It is an extension of JSON to make it friendlier to write.  It is a
>     strict superset of JSON and a strict subset of Javascript.  So
>     every valid JSON file is valid json5, and eval() will still work
>     for those of you foolish enough to run it on unverified user input ;)
>     To be specific, a sample query would be
>
>       http://irsa.ipac.caltech.edu/cgi-bin/Planck_TOI/nph-planck_toi_sia?POS=[0.053,-0.062]&CFRAME='GAL'&ROTANG=90&SIZE=1&CDELT=0.05&FREQ=44000&ITERATIONS=20&INSTRUMENT=['24m','24s']&TIME=[[0,55300],[55500,Infinity]]&USER_METADATA={CLIENT:'IRSA Smart Client'}
>
>     Note that the service is not public yet, so this URL will not work
>     for you yet.
>
>     Internally, every parameter is converted into a json5 element.  So
>     this would turn into the json5 document
>
>     {
>       POS:[0.053,-0.062],
>       CFRAME:'GAL',
>       ROTANG:90,
>       SIZE:1,
>       CDELT:0.05,
>       FREQ:44000,
>       ITERATIONS:20,
>       INSTRUMENT:['24m','24s'],
>       TIME:[[0,55300],[55500,Infinity]],
>       USER_METADATA:{CLIENT:'IRSA Smart Client'}
>     }
>
>     Modulo whitespace, this is just replacing '&' with ',' and '=' with
>     ':'.  We also support submitting a json5 document directly
>
>       http://irsa.ipac.caltech.edu/cgi-bin/Planck_TOI/nph-planck_toi_sia?{POS:[0.053,-0.062],CFRAME:'GAL',ROTANG:90,SIZE:1,CDELT:0.05,FREQ:44000,ITERATIONS:20,INSTRUMENT:['24m','24s'],TIME:[[0,55300],[55500,Infinity]],USER_METADATA:{CLIENT:'IRSA Smart Client'}}

This is interesting and I remember talking about json5 at the last 
interop. Hopefully we can see/discuss this further in Banff.
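
For reference, the parameter-to-document mapping described in the quoted 
text (replace '&' with ',' and '=' with ':', then wrap in braces) can be 
sketched as follows. This is a naive illustration and not the IRSA 
implementation: the function name is hypothetical, it assumes the query 
string has already been URL-decoded, and it assumes no '&' appears inside 
a quoted value. It only builds the json5 text and does not parse it.

```python
def query_to_json5(query):
    """Turn 'K1=V1&K2=V2' into the json5 text '{K1:V1,K2:V2}'.

    Naive sketch: splits on every '&' and on the first '=' of each item,
    so a '&' inside a quoted string value would break it.
    """
    members = []
    for item in query.split("&"):
        key, _, value = item.partition("=")
        members.append(f"{key}:{value}")
    return "{" + ",".join(members) + "}"


query = (
    "POS=[0.053,-0.062]&CFRAME='GAL'&ROTANG=90&SIZE=1&CDELT=0.05"
    "&FREQ=44000&ITERATIONS=20&INSTRUMENT=['24m','24s']"
    "&TIME=[[0,55300],[55500,Infinity]]"
    "&USER_METADATA={CLIENT:'IRSA Smart Client'}"
)
print(query_to_json5(query))
```

Applied to the sample query above, this yields the same single-line 
document as the one shown after the '?' in the second URL.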

>     On a side note, I have used JSON (not json5) as input for
>     simulations in geology [1].  The SAMRAI Adaptive Mesh Refinement
>     framework for massively parallel simulations [2] also uses a format
>     that is almost indistinguishable [3] from json5 for input files.
>     So I would claim that json5 would be able to cover any needs that
>     SIMDAL would need in specifying model parameters.
>
> Given all of the issues that I ran into, it is not clear to me that it
> would be a good idea to ratify SIA 2 as it is now.  Whether or not you
> like the changes I made, there seem to be some major deficiencies that
> need to be addressed before the standard can be accepted.
>
> Cheers,
> Walter Landry
>
>
> [1] http://geodynamics.org/cig/software/gale
> [2] https://computation-rnd.llnl.gov/SAMRAI/index.php
> [3] You can separate elements in an array or object with newlines instead of commas.
> .
>

-- 

Patrick Dowler
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2E7

250-363-0044 (office) 250-363-0045 (fax)

