II) Formats and vocabularies
bonnarel at alinda.u-strasbg.fr
Sun Oct 26 19:48:02 PDT 2008
Part A on UCDs and utypes
--------------------------------------------------------------------------------
*Step 1 FB answers to Fabien
---------------------------------------------------------------------------------
> 1.2 Stop using UCDs and utypes. What is needed is a unique,
> straightforward identifier. UCDs and utypes are derived from complex
> theoretical considerations making them difficult to parse and
> understand. In practice, experience shows that software developers
> use both of them only as a static string identifier to identify a
> field (i.e. software doesn't try to make use of the hierarchy of
> classes). For this purpose, the JSON variable name is just enough,
> and all it needs is to be clear and self-explanatory, e.g.
> 'instrument' instead of 'meta.id;instr' or 'ssa:DataID.Instrument',
> 'centralPosition' instead of 'pos.eq.ra;meta.main' and
> 'pos.eq.dec;meta.main', etc.
----> Oh!!! I strongly disagree, as you can imagine.
1) UCDs and utypes are not at all the same thing.
A UCD is a standard denomination for a physical quantity.
If you want to see it in utype terms, it is only the utype of the general
model of physical quantities. But many pieces of data/metadata structure
may have the same UCD. It is there mainly to allow comparison of values
in different fields/variables when they give the same physical
quantity. You cannot avoid them totally.
Utypes are totally different. They are there to point to
attributes/classes of a given data model in a non-hierarchical context:
ASCII, VOTable, relational database or FITS.
Just to take the latter, a utype in the FITS context is an
IVOA-standardized FITS keyword. See the FITS serialisation of Spectrum
for details.
The fact that there is no general standard for the utype
syntax, or that there are short versions (FITS keywords) and long
versions of utypes, is another issue, and MUST NOT be confused with
the usefulness of the utype concept. A discussion on syntax will take
place in Baltimore (Mireille?).
In a hierarchical context now (XML or JSON) you definitely
don't need utypes, and the XML indeed doesn't have them...
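[Editor's illustration] To make the distinction concrete, here is a minimal
Python sketch. The identifier strings are exactly those quoted in the message
above; the dictionary layout and field names are hypothetical, not part of any
standard:

# Illustrative only: the dict layout is hypothetical, but the identifier
# strings are the ones quoted in the discussion above.

# Two services may expose the instrument name under different variable names...
field_a = {"name": "instrument",
           "ucd": "meta.id;instr",             # UCD: which (meta)quantity this is
           "utype": "ssa:DataID.Instrument"}   # utype: where it sits in the SSA model
field_b = {"name": "instr_id",
           "ucd": "meta.id;instr",
           "utype": "ssa:DataID.Instrument"}

# ...yet a client can still recognise that both carry the same quantity:
# this comparison role is exactly what the UCD is for.
assert field_a["ucd"] == field_b["ucd"]

# The utype, by contrast, points at one attribute of one particular data
# model (here SSA's DataID.Instrument), whatever the serialisation is.
print(field_a["utype"])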
--------------------------------------------------------------------------------
*Step 2 Fabien answers to FB
-------------------------------------------------------------------------------
OK, I meant in a hierarchical context only indeed.
---------------------------------------------------------------------------------
*Step 3 Juan de Dios advice
--------------------------------------------------------------------------------
Let's get into the semantics: utypes are only needed if you are
serialising into a VOTable... and in any case you would need something
similar to utypes in that case. Of course, VOTable is not the best XML
you can do with Characterisation, which I believe must be provided,
unless otherwise required, in its own XML flavour. In that case, there
is nothing to be gained by the JSON representation versus the XML one.
What we _do need_ to do is promote Char with things which really answer
user needs.
And for leaving out UCDs... I think UCDs must evolve, but I do think
there is a lot of value in them, especially from the new suggestions
that arise, and from an intersection with the emerging IVOA
vocabularies.
----------------------------------------------------------------------------------
Part B Formats: XML, VOTable and JSON
-----------------------------------------------------------------------------------
*Step 1 FB answers to Fabien
-----------------------------------------------------------------------------------
> 1.1 For increasing readability allow a serialization in the JSON
> format. JSON is very easy to parse, and also to be read by humans.
> Note that it is possible to convert JSON to xml if needed. See
> json.org for more info.
----> Very good point. After the discussions we had in
Cambridge, Thomas Boch made an attempt to take a char example in XML and
convert it to JSON. The result is much less verbose (no closing tags, for
example) and more human-readable. Of course it is just an attempt, not a
standard yet. I attach Thomas' work here....
In my opinion JSON is JAS (Just Another Serialisation), not THES (THE
Serialisation!). If people have XML libraries and are ready to read the
XML, or are happy with VOTable, let them keep their preferred format.

{
    "characterisation": {
        "characterisationAxis": [
            {
                "accuracy": {
                    "statError": {
                        "ErrorRefVal": {
                            "stc:Error2": {
                                "stc:C1": ".000055",
                                "stc:C2": ".000055"
                            }
                        },
                        "flavor": "statistical"
                    }
                },
                "axisName": "spatial",
                "calibrationStatus": "CALIBRATED",
                "coordsystem": {
                    "id": "TT-ICRS-WAVELENGTH-TOPO",
                    "xlink:href": "ivo:\/\/STClib\/CoordSys#TT-ICRS-TOPO",
                    "xlink:type": "simple"
                },
                "coverage": {
                    "bounds": {
                        "limits": {
                            "Coord2VecInterval": {
                                "stc:HiLimit2Vec": {
                                    "stc:C1": "190.37601",
                                    "stc:C2": "11.369167"
                                },
                                "stc:LoLimit2Vec": {
                                    "stc:C1": "190.37157",
                                    "stc:C2": "11.364722"
                                }
                            },
                            "coord_system_id": "TT-ICRS-WAVELENGTH-TOPO"
                        }
                    },
                    "location": {
                        "coord": {
                            "coord_system_id": "TT-ICRS-WAVELENGTH-TOPO",
                            "stc:Position2D": {
                                "stc:Name1": "RA",
                                "stc:Name2": "Dec",
                                "stc:Value2": {
                                    "stc:C1": "190.37379",
                                    "stc:C2": "11.366944"
                                }
                            }
                        }
                    }
                },
                "independentAxis": "true",
                "numBins2": {
                    "I1": "16",
                    "I2": "16"
                },
                "regularsamplingStatus": "true",
                "resolution": {
                    "resolutionRefVal": {
                        "stc:Resolution2": {
                            "stc:C1": "1.4",
                            "stc:C2": "1.4"
                        }
                    },
                    "unit": "arcsec"
                },
                "samplingPrecision": {
                    "samplingPrecisionRefVal": {
                        "samplingPeriod": {
                            "stc:C1": "1.0",
                            "stc:C2": "1.0"
                        }
                    },
                    "unit": "arcsec"
                },
                "ucd": "pos",
                "undersamplingStatus": "false",
                "unit": "deg"
            },
            {
                "axisName": "time",
                "calibrationStatus": "UNCALIBRATED",
                "coordsystem": {
                    "idref": "TT-ICRS-WAVELENGTH-TOPO"
                },
                "coverage": {
                    "location": {
                        "coord": {
                            "coord_system_id": "TT-ICRS-WAVELENGTH-TOPO",
                            "stc:Time": {
                                "stc:TimeInstant": {
                                    "stc:ISOTime": "2004-05-24T22:23:58"
                                }
                            }
                        }
                    }
                },
                "independentAxis": "true",
                "numBins1": "1",
                "ucd": "time",
                "unit": "none"
            },
            {
                "accuracy": {
                    "statError": {
                        "ErrorRefVal": {
                            "stc:Error": "0.0001"
                        },
                        "flavor": "statistical"
                    }
                },
                "axisName": "spectral",
                "calibrationStatus": "CALIBRATED",
                "coordsystem": {
                    "idref": "TT-ICRS-WAVELENGTH-TOPO"
                },
                "coverage": {
                    "bounds": {
                        "limits": {
                            "CoordScalarInterval": {
                                "stc:HiLimit": "0.56548382",
                                "stc:LoLimit": "0.4140"
                            },
                            "coord_system_id": "TT-ICRS-WAVELENGTH-TOPO"
                        }
                    },
                    "location": {
                        "coord": {
                            "coord_system_id": "TT-ICRS-WAVELENGTH-TOPO",
                            "stc:Spectral": {
                                "stc:Value": "0.4858137"
                            }
                        }
                    }
                },
                "independentAxis": "true",
                "numBins1": "2048",
                "regularsamplingStatus": "false",
                "resolution": {
                    "resolutionBounds": {
                        "resolutionLimits1": {
                            "stc:HiLimit": "101.142",
                            "stc:LoLimit": "48.3233"
                        }
                    },
                    "resolutionRefVal": {
                        "stc:Resolution": "78.6162"
                    },
                    "unit": "km\/s"
                },
                "samplingPrecision": {
                    "samplingPrecisionRefVal": {
                        "samplingPeriod": "40.0000"
                    },
                    "unit": "km\/s"
                },
                "ucd": "em",
                "undersamplingStatus": "false",
                "unit": "um"
            },
            {
                "accuracy": {
                    "statError": {
                        "ErrorBounds": {
                            "ErrorLimits1": {
                                "stc:HiLimit": "1.12e-16",
                                "stc:LoLimit": "5.80e-19"
                            }
                        },
                        "ErrorRefVal": {
                            "stc:Error": "5.63e-17"
                        },
                        "flavor": "statistical"
                    }
                },
                "axisName": "flux",
                "calibrationStatus": "UNCALIBRATED",
                "coordsystem": {
                    "id": "UNKNOWN"
                },
                "coverage": {
                    "bounds": {
                        "limits": {
                            "CoordScalarInterval": {
                                "stc:HiLimit": "1.1838107e-14",
                                "stc:LoLimit": "-2.8933970e-15"
                            },
                            "coord_system_id": "UNKNOWN"
                        }
                    },
                    "location": {
                        "coord": {
                            "coord_system_id": "UNKNOWN",
                            "stc:ScalarCoordinate": {
                                "stc:Value": "2.3519587e-17"
                            }
                        }
                    }
                },
                "independentAxis": "false",
                "numBins1": "0",
                "regularsamplingStatus": "true",
                "ucd": "phot",
                "undersamplingStatus": "false",
                "unit": "counts"
            }
        ]
    }
}
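[Editor's illustration] For reference, here is a minimal sketch of the kind of
element-to-dictionary walk such an XML-to-JSON conversion needs (Python
standard library only; this is an illustration, not Thomas' actual converter,
and namespace prefixes such as stc: would need extra handling in ElementTree):

import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    # Map an XML element to a dict of its children; leaves become their text.
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    out = {}
    for child in children:
        value = element_to_dict(child)
        # Repeated tags (e.g. several characterisationAxis elements) become a list
        if child.tag in out:
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    # Attributes such as coord_system_id or xlink:href sit alongside the children
    out.update(elem.attrib)
    return out

# Usage sketch:
# root = ET.parse("characterisation.xml").getroot()
# print(json.dumps({root.tag: element_to_dict(root)}, indent=4, sort_keys=True))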
--------------------------------------------------------------------------------
*Step 2 Fabien answers FB
--------------------------------------------------------------------------------
No problem with that (at least for XML). VOTable simply doesn't fit the
need for structured information.
Fabien's metadata example in JSON, from the same mail ....
// Beginning of the main astrox structure
{
    // Unique identifier for this astrox file. It points to a valid
    // permanent URL where this file is located
    "astroxUri": "http://archive.eso.org/astrox/MY_NGASID",
    // Last modification date (UTC) of this file, useful for automatic
    // updating by e.g. crawlers
    "timeStamp": "2008-07-01 16:34:15",
    "title": "Observation of galaxy clusters blablabla",
    "creator": "ESO/ADP",
    "creationDate": "2006-01-01 16:34:15",
    "publisher": "ESO SAF",
    "copyrights": "ESO (C) 2008",
    "collection": "QC UVES Spectra",
    "accessRights": "Anonymous",
    "originalSource": "http://www.eso.org/astrox/MY_NGASIDXX",
    "instrumentSetup":
    {
        "facility": "ESO",
        "telescope": "VLT/UT1",
        "instrument": "WFI",
        "filter": "V",
        "mode": "Image",
        "mvm":
        {
            "param1": "maValeur"
        }
    },
    // Contains high-level meta-data on the characterization of the data set
    "characterization":
    {
        // Link to the binary low-level char file
        "lowlevelcharacterizationUrl": "http://www.eso.org/astrox/MY_NGASID.astrox",
        // 2D spatial direction Ra, Dec in ICRS (deg)
        "spaceAxis":
        {
            "centralPos": [10.15, -20.18],
            // Union of convex polygons containing all the relevant data
            "boundingBox": [[[-36.52279463, -0.93122968], [-36.73541852, -0.9630006],
                             [-36.767187, -0.75040453], [-36.55457444, -0.71863535]]],
            // Define a (possibly hierarchical) preview image using the format
            // implemented in Stellarium.
            // The description of the preview should not be considered as
            // scientifically valid characterization.
            // It is somewhat subjective and should be used only for display.
            "preview":
            {
                "credit": "Grasslands Observatory",
                "imageUrl": "nebulae/default/m2.png",
                "worldCoords": [[[-36.52279463, -0.93122968], [-36.73541852, -0.9630006],
                                 [-36.767187, -0.75040453], [-36.55457444, -0.71863535]]],
                "textureCoords": [[[0,0], [1,0], [1,1], [0,1]]],
                "minResolution": 0.2148810463,
                "maxBrightness": 13.9,
                "alphaBlend": true,
                "subTiles":
                [
                    "http://mySubTile1.json",
                    "http://mySubTile2.json"
                ]
            },
            // Define a (possibly hierarchical) footprint geometry. A footprint
            // is like the thresholding of the (spatial in this case)
            // transmission curve
            "footprint":
            {
                "worldCoords": [[[-36.52279463, -0.93122968], [-36.73541852, -0.9630006],
                                 [-36.767187, -0.75040453], [-36.55457444, -0.71863535]]],
                "minResolution": 0.2148810463,
                "subFootprint":
                [
                    "http://mySubFootprint1.json",
                    "http://mySubFootprint2.json"
                ]
            }
        },
        // lambda in heliocentric standard of rest (m)
        "wavelengthAxis":
        {
            "centralPos": 1235.5,
            // Union of ranges containing all the relevant data
            "boundingBox": [[1235.6, 1235.458]],
            // Transmission curve: two arrays, in m / value between 0 and 1.
            // Zero is assumed outside the array.
            // I provided this one as an example with values between 0 and 1,
            // but I think the correct way should be to give the value in
            // maximum SNR or error
            "transmission": [[12255, 12258, 122569, 12289], [0, 0.125, 0.56, 0.236, 0]]
        },
        // time in TT (s)
        "timeAxis":
        {
            // Union of ranges containing all the relevant data
            "boundingBox": [[256368.6, 256368.6]]
        },
        // intensity (W?)
        "intensityAxis":
        {
            // Union of ranges containing all the relevant data
            "boundingBox": [[123.6, 124.6]]
        },
        // 3D observer position X,Y,Z in ICRS heliocentric (m)
        "observerPosAxis":
        {
            // Union of 3D convex polygons containing the observer position
            "boundingBox": [[[10023.6, 124.6, 1235.0], [10023.6, 124.6, 1235.0],
                             [10023.6, 124.6, 1235.0], [10023.6, 124.6, 1235.0],
                             [10023.6, 124.6, 1235.0], [10023.6, 124.6, 1235.0],
                             [10023.6, 124.6, 1235.0], [10023.6, 124.6, 1235.0]]]
        }
    },
    // The list of all the data files (including calibration) associated
    // with this data set
    "dataSources":
    {
        "mainDataSet":
        {
            "format": "application/fits",
            "FITShdu": 1,
            "url": "http://data.eso.org/myFileID.fits",
            "fileSize": 512.12,
            // Follows the numpy array interface:
            // http://numpy.scipy.org/array_interface.shtml
            "shape": [1024, 256],
            "typestr": "f"
        }
    },
    // Other astrox instances which are logically children of the current
    // one, e.g. multi-extension FITS files or elements of a survey, etc.
    // The only constraint is that the characterization of the parent
    // 'contains' the union of the characterizations of the children.
    // This is the basic concept which allows searching through trees of
    // datasets using neutral characterization descriptors (a small crawling
    // sketch follows this example).
    "subAstrox":
    {
        "chip1": "http://www.eso.org/astrox/MY_OTHER_NGAS_ID",
        "chip2": "http://www.eso.org/astrox/MY_NGASID#chip2",
        "chip3":
        {
            // A full included astrox structure can be put here as well
        }
    },
    // Info on the target if any
    "target":
    {
        // TODO in relation with the Observation data model
    },
    // FITS keywords
    "FITS":
    {
        "NAXIS": 2,
        "NAXIS1": 500,
        "NAXIS2": 500,
        "CRVAL1": 10.23,
        "CRVAL2": 156.23,
        "CRPIX1": 12.3,
        "CRPIX2": 156.36,
        "CTYPE1": "RA---TAN",
        "CTYPE2": "DEC--TAN",
        "CD1_1": 15.2,
        "CD1_2": 0.02,
        "CD2_1": 269.2,
        "CD2_2": 0.01,
        "RADECSYS": "ICRS",
        "EPOCH": "2000",
        "EQUINOX": "J2000"
        // Etc.
    },
    // ESO specific meta-data
    "ESO":
    {
        "PIName": "Gerard Dupont",
        "transmissionCurveURL": "http://myCurveService.xml",
        "programID": "XXXX.DDD-ABC",
        "OBName": "string",
        "OBID": "1256",
        "category": "SCIENCE", // ESO DPR CAT
        "mode": "SPECTRUM", // ESO DPR TECH
        "type": "OBJECT", // ESO DPR TYPE
        "processingType": "highlyProcessed",
        "accessFlag": "Anonymous"
    }
}
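[Editor's illustration] As a sketch of the containment idea stated in the
subAstrox comments above, here is a small hypothetical Python crawler. The
field names ("characterization", "wavelengthAxis", "boundingBox", "astroxUri",
"subAstrox") come from Fabien's example; the crawler itself is not part of any
proposal and assumes the descriptions are served as strict JSON (without the
// annotations used above for explanation):

import json
import urllib.request

def overlaps(ranges, lo, hi):
    # True if any [a, b] interval of the union overlaps the requested [lo, hi]
    for a, b in ranges:
        a, b = min(a, b), max(a, b)
        if a <= hi and b >= lo:
            return True
    return False

def crawl(astrox, lo, hi, matches):
    # Collect datasets whose wavelength coverage overlaps [lo, hi]. Because a
    # parent 'contains' the union of its children's characterizations, a
    # non-overlapping parent can be pruned together with its whole subtree.
    ranges = astrox.get("characterization", {}).get("wavelengthAxis", {}).get("boundingBox")
    if ranges is not None and not overlaps(ranges, lo, hi):
        return
    if "astroxUri" in astrox:
        matches.append(astrox["astroxUri"])
    for child in astrox.get("subAstrox", {}).values():
        if isinstance(child, str):              # child given as a URL: fetch it
            with urllib.request.urlopen(child) as resp:
                child = json.load(resp)
        crawl(child, lo, hi, matches)

# Usage sketch (units are those of the file, metres in the example above):
# root = json.load(open("MY_NGASID.json"))
# found = []
# crawl(root, 1235.0, 1236.0, found)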
--------------------------------------------------------------------------------
*Step 3 Fabien's answer to Anita
--------------------------------------------------------------------------------
> VOTable is generally well-liked. Hence I am against adopting yet
> another language...
If VOTable allowed storing structured information, I would also be
against a new format. But there is a real new need here, and nothing is
currently used for it, except for prototypes. We basically have the
choice between XML, JSON and other serializations. Before the choice is
made, I just wanted to point out the qualities of JSON.
--------------------------------------------------------------------------------
*Step 4 Anita replies
----------------------------------------------------------------------------------
I appreciate that, Fabien, but it is a matter of who is going to have to
understand it. Astronomers can use VOTable precisely because it is
simple. VO engineers and software experts can use whatever is best
within their domains, but if anyone outside is to use it - even data
publishers, since most archives only have part-time maintainers at best
- then it has to be something already widely used. Astronomers are not
usually going to learn a new language just for the VO. We are not yet
seen as that indispensable.
--------------------------------------------------------------------------------
*Step 5 Fabien again
--------------------------------------------------------------------------------
If the VO worked, astronomers would not have to see how it works.
They would just use the tools. So the main users of such a format are
the engineers making the tools and the data providers who expose their
data.
But the main problem is not even there. The real problem is that VOTable
simply doesn't suit the need for characterization. VOTable is good for
tabular data, such as a SIA output, but not for structured data, which
is what we have here.
(A bit off topic: JSON is actually very widely used for web
applications, there are about 10 parser libraries for each major
programming language, and it took me half a day to code my own parser.)
------------------------------------------------------------------------------
*Step 6 Igor's comment
------------------------------------------------------------------------------
1) I'm strictly against JSON or whatever pseudo-XML. Characterisation
(or any other, e.g. STC) metadata is not supposed to be
human-readable. If it looks too complex for the data providers, they
have to fire their software engineers and hire more qualified ones.
---------------------------------------------------------------------------------
*Step 7 Fabien answers to Igor
-------------------------------------------------------------------------------
The only objective way to choose a serialization format is to measure
the pros and cons for each class of users. Ease of use and
human-readability are two very important criteria for engineers.
-------------------------------------------------------------------------------
*Step 8 Gretchen on VOTABLE
------------------------------------------------------------------------------
The VOTable data model is, however, in my view more a transport mechanism
that provides a simple framework for higher-level abstraction and
generalization. The specific data models which characterize region,
time, spectral distribution, etc. need to account for the complexity
and heterogeneity, or information is lost. I don't see how this is not
obvious.
-------------------------------------------------------------------------------
*Step 9 Juan de Dios comment
-------------------------------------------------------------------------------
But I don't think JSON is much better than XML for readability, and I
think it is more fragile than XML in case of partial truncation. And
relationships (hierarchical or purely relational) have to be specified
by foreign keys, which hampers readability.
In any case, you can see JSON notation and XML notation as
complementary, one syntactic sugar for the other, not as something that
really gains you anything from the implementation or human readability
point of view.
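[Editor's illustration] To make the foreign-key point concrete with the
characterisation JSON from Part B: the time and spectral axes carry only
"idref": "TT-ICRS-WAVELENGTH-TOPO", so a reader has to resolve that key
against the axis that defines the full coordsystem. A hypothetical Python
helper, not part of any proposal:

def resolve_coordsystems(axes):
    # Collect the coordsystems that are fully defined (those carrying an "id")...
    defined = {ax["coordsystem"]["id"]: ax["coordsystem"]
               for ax in axes if "id" in ax.get("coordsystem", {})}
    # ...then follow the idref "foreign keys" of the remaining axes.
    for ax in axes:
        ref = ax.get("coordsystem", {}).get("idref")
        if ref is not None:
            ax["coordsystem"] = defined[ref]
    return axes

# Given the parsed JSON from Part B as 'char':
# axes = char["characterisation"]["characterisationAxis"]
# resolve_coordsystems(axes)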
-----------------------------------------------------
*Step 10 FB eventually to Fabien
--------------------------------------------------------------------------
Model and FORMAT are not the same thing.
Contrary to Igor, I have nothing against a JSON format for a given data model.
But it's definitely too early (at least) to say that it should replace XML.
Why should we not just add JSON as a new serialisation BESIDE the
previous ones?
(I feel like I'm repeating what I already wrote, OK ....)
--------------------------------------------------------------------------------
*Step 11 FB on VOTABLE and datamodels
--------------------------------------------------------------------------------
Gathering all this discussion on formats, it occurs to me that I didn't
say a word in favor of VOTable.
I strongly disagree with the statement that VOTable is not suited to
transporting modelled metadata. Of course structuring has limits in
VOTable, but it is not at all impossible. This has been shown IN PRACTICE
several times.
Of course the VOTable "model" is nothing more than a static version of
the relational model, and the actual metadata model semantics are conveyed
by the values and not by the tags. But utypes, GROUP combinations with
RESOURCEs, and the key/reference mechanism indeed allow some structuring.
There is a LOT of VO literature and references, as well as implementations,
doing that.
Just a couple of them:
The recent Note by Ochsenbein, Rots and McDowell,
"Referencing STC in VOTable".
The SSA recommendation describes utypes and their meaning, and
the DAL extension mechanism and its relationships with models.
The Spectrum DM recommendation provides some rules to map a data model
into VOTable.
Older references are the IVOA Notes
"DAL Query Response with Extensions: Use Cases and Implementation
Rules. Example of SIAP"
and "Data Model Serialisation in VOTable".
So besides FITS, XML and maybe JSON, VOTable is still another serious
candidate for modelled metadata serialisation. Don't rule it out.
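[Editor's illustration] Purely as a sketch of the GROUP/utype mechanism
mentioned above (the utype and UCD strings reuse identifiers already quoted in
this thread; the layout is illustrative and does not follow any official
mapping document), a few lines of Python are enough to emit such a structured
VOTable fragment:

import xml.etree.ElementTree as ET

votable = ET.Element("VOTABLE", version="1.1")
resource = ET.SubElement(votable, "RESOURCE")
# A GROUP gathers the attributes of one data model class...
group = ET.SubElement(resource, "GROUP", name="DataID", utype="ssa:DataID")
# ...and each PARAM carries both a utype (place in the model) and a UCD (quantity).
ET.SubElement(group, "PARAM", name="instrument",
              utype="ssa:DataID.Instrument",
              ucd="meta.id;instr",
              datatype="char", arraysize="*", value="WFI")

print(ET.tostring(votable, encoding="unicode"))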