Collaboration on Source Catalogue DM, ADQL and SkyNodes
Maria A. Nieto-Santisteban
nieto at skysrv.pha.jhu.edu
Thu Dec 22 05:39:50 PST 2005
Hi,
I'm on "vacations" with a very limited internet connection, so I have tried
to summarize my comments in one single mail. After Dec 28th I will
be back on a fast link, in case my comments generate rivers of bits that I
cannot respond to :-)
Cheers
Maria
Legend:
M - Maria
P - Pedro
MT - Mark Taylor
C - Clive
>M - What does the Catalogue Data Model used look like, especially the
>M common set of attributes and the associated metadata?
>P The point is in the (Source) Catalogue Data Model, with emphasis on the
>P "Source" part. This one is the one I showed on behalf of the Catalogue
>P DM subgroup at our last interop meeting here at ESAC. I attach a pdf
>P with the initial proposal, but please use it only for temporary
>P reference, as the whole document will be changed (according to
>P requirements from Jonathan after the interop meeting).
>M Unfortunately, I'm on a dial-up connection and I cannot get the 6.6MB pdf,
>M but from Patricio's email and what I remember from the last IVOA I can imagine.
>M Being more specific, what I am interested in is how the mapping
>M "original catalog - SCDM" is done in its two aspects: scientific and technical.
>M By scientific I mean: How did you map USNOB and Tycho-2 columns into the model?
>M I'm very interested in seeing this mapping. This is the very first step to
>M having mechanisms that allow for common queries. If all columns are called the same
>M and represent the same thing, running engines asking the same ADQL question
>M is trivial.
>M By technical: Do the original catalogs remain the same and you compute the
>M new columns on the fly? I assume some "original-model" relationships will not be
>M direct. I personally would create new columns and pre-compute the transformations
>M to make things faster, but probably not all catalog providers are willing to do so.
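
To make the two options concrete, here is a minimal sketch of such a mapping. It assumes the model needs only ra, dec, ra_err and dec_err in degrees; the original column names and the milliarcsecond error units are hypothetical placeholders, not the real USNOB or Tycho-2 schemas and not the SCDM proposal.

```python
MAS_PER_DEG = 3_600_000.0  # milliarcseconds per degree


def identity(value):
    """Direct mapping: the original column already matches the model."""
    return value


def mas_to_deg(value):
    """Non-direct mapping: convert milliarcseconds to degrees."""
    return value / MAS_PER_DEG


# model column -> (original column, conversion).
# All original column names below are illustrative assumptions.
MAPPINGS = {
    "usnob": {
        "ra": ("usnob_ra_deg", identity),
        "dec": ("usnob_dec_deg", identity),
        "ra_err": ("usnob_e_ra_mas", mas_to_deg),
        "dec_err": ("usnob_e_dec_mas", mas_to_deg),
    },
    "tycho2": {
        "ra": ("tyc_ra_deg", identity),
        "dec": ("tyc_dec_deg", identity),
        "ra_err": ("tyc_e_ra_mas", mas_to_deg),
        "dec_err": ("tyc_e_dec_mas", mas_to_deg),
    },
}


def to_model(catalog, row):
    """On-the-fly option: compute the model columns for one original row."""
    spec = MAPPINGS[catalog]
    return {model_col: convert(row[src_col])
            for model_col, (src_col, convert) in spec.items()}


def precompute(catalog, rows):
    """Pre-computed option: store the model columns next to the originals."""
    return [{**row, **to_model(catalog, row)} for row in rows]
```

The on-the-fly option leaves the original table untouched and pays the conversion on every query; the pre-computed option pays it once at load time, at the cost of extra columns.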
>M - What are the plans about registration? Will these nodes (Basic?) be
>M registered and therefore accessible through Open SkyQuery? How many?
>P yes, they will. How many, I don't know. In Strasbourg, Inaki and
>P Aurelien worked on a couple of them, Tycho-2 and USNOB, but the CDS
>P colleagues will work on more.... Francois will answer this question
>P at some point I presume.
>M This is good but brings two issues:
>M - 1) If many Basic SkyNodes are going to be registered, we need to plan
>M how to do it.
>M - 2) Having a second USNOB skynode which is not exactly the same USNOB as
>M the one currently working.
>M Both issues, how to deal with many skynodes and how to deal with "mirrors", have
>M been "avoided", but it is about time we start attacking the problem.
>P n-catalogue cross-match is what we are trying to get at; it will be a
>P client based cross-match, and therefore the cross-match function will be
>P designed and run at the client side (i.e., servers do not need to worry
>P about implementing one specific cross-match or the other).
>M The client based cross-match is a good idea. You cannot be disappointed with
>M your own specific cross-match. However, I wonder what the plan is
>M to cross-match your own "big" source catalog (let's say 700.000 rows
>M as Mark mentions) against USNOB's 1000 million rows (if I remember correctly).
>M If your objects are in a region, I can see making 1 query and getting all objects inside a
>M region or a few, but without that ... I hope the idea is not to make 700.000 ADQL queries.
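
As an illustration of the "one region query, then match at the client" case, here is a minimal sketch of a purely positional client-side cross-match. It assumes the remote rows have already been fetched for the region as (ra, dec) pairs in degrees; the declination-zone bucketing is just one simple way to avoid comparing every pair, it ignores RA wrap-around optimizations, and it is not the matcher any of the tools mentioned in this thread actually use.

```python
import math
from collections import defaultdict


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def crossmatch(local, remote, radius_deg):
    """Return (i, j) index pairs whose separation is <= radius_deg.

    Remote rows are bucketed into declination zones of height radius_deg,
    so each local row is only compared with rows in three adjacent zones.
    """
    zones = defaultdict(list)
    for j, (_, dec) in enumerate(remote):
        zones[math.floor(dec / radius_deg)].append(j)

    pairs = []
    for i, (ra, dec) in enumerate(local):
        zone = math.floor(dec / radius_deg)
        for z in (zone - 1, zone, zone + 1):
            for j in zones.get(z, ()):
                rra, rdec = remote[j]
                if angular_sep_deg(ra, dec, rra, rdec) <= radius_deg:
                    pairs.append((i, j))
    return pairs
```

With this shape the client issues one query per region rather than one per source, and the per-source work happens locally.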
>P At the current status, the client sends an ADQL to the server to discern
>P which type of cross-match it can do with it (whether only positional,
>P positional with errors, etc.), and takes the corresponding action.
>M Let's see, ADQL is the language. In principle, an ADQL query will not
>M tell you what cross-match can be performed. You can use ADQL to gather the
>M information you are thinking of, like ra, dec, ra_err, dec_err, only
>M if the SkyNodes (databases) contain tables with this type of metadata. I hope
>M the proposal to make this mandatory is successful and publishers actually follow
>M it. In any case, what is mandatory are the Tables and Columns methods, which
>M should give you this information, but that is not ADQL. It is a call to a Web
>M service interface.
>MT STILTS provides this functionality from a command-line
>MT tool (tmatch2), but a public java API is also available for
>MT programs that want access to it within a JVM.
>M What would be worth a try is using Mark's library to set up a server that
>M does the cross-match when providers don't want to use a DBMS, because as
>M Clive mentioned "if the data are already in a relational DBMS
>M then by far the simplest way to do the cross-match, and in many cases
>M also the fastest, is to use R-tree indexing and a spatial join."
>M I will not get into the R-tree indexing, HTM, Zones, Healpix debate now, but
>M without question, if the data are already in a database then it will probably
>M be less of a burden for the system to do the job than to answer millions of
>M individual queries. This is the MyDATA skyNode approach which, putting aside
>M the problem of uploading big tables, is much more efficient.
>M However, I'm kind of interested (probably, easier than working on writing my thesis ;-))
>M in this other debate
>C Support for spatial indexing is now included in or readily available for
>C DB2, Oracle, Informix, Sybase, MySQL, and Postgres, i.e. just about all
>C the DBMS widely used in astronomy (with perhaps just one exception,
>C which Jim can tell you about :-).
>M It would be nice to know what exactly "widely" means.
>M So I volunteer to keep an inventory (catalog :-) ) with information
>M about
>M Catalog Name, Access point (URL), Default position, DBMS, Host Organization
>M This could give us an excellent test bed to compare data access and
>M cross-match functionalities provided by different DBMS and
>M organizations.
>M So if you guys send me a list with those 4 data points,
>M I will collect and make public the information. Since I'm a database girl,
>M please send me a file in CSV format if you have many catalogs and
>M I will import the data into a database.
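
For anyone preparing such a file, here is a minimal sketch of the import step, assuming a comma-separated file with a header row naming the fields listed above. The header spellings, the table name and the use of SQLite are assumptions made for illustration only.

```python
import csv
import io
import sqlite3

# Assumed header names for the comma-separated file (hypothetical spellings).
FIELDS = ("catalog_name", "access_url", "default_position",
          "dbms", "host_organization")


def load_inventory(csv_text, conn):
    """Import inventory rows from CSV text; return the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory ("
        "catalog_name TEXT, access_url TEXT, default_position TEXT, "
        "dbms TEXT, host_organization TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(record[field] for field in FIELDS) for record in reader]
    conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

A file with one line per catalog and those five headers would load as is, for example with `load_inventory(open("inventory.csv").read(), sqlite3.connect("inventory.db"))`, where both file names are placeholders.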
>J But, getting objects into a node dominates all other costs (moving
>J stuff thru xml is expensive).
>C Indeed that is a very serious problem. I wonder if we can't solve this
>C by using, instead of XML, some more efficient data format, e.g. one
>C which holds tabular data in binary form with just the metadata in plain
>C text.
>C There's something called the "FITS table" with exactly these properties
>C which perhaps astronomers should investigate :-)
>M I do agree something needs to be done about this as well.
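
To give a rough feel for the overhead being discussed, here is a small sketch comparing the same rows written as XML text and as packed binary. It is not a FITS or VOTable writer; it only mimics the two styles, with TABLEDATA/TR/TD element names on the text side and big-endian 8-byte doubles on the binary side, and the function names are made up.

```python
import struct


def as_xml(rows):
    """Serialize (ra, dec) rows as XML-style text, one TR element per row."""
    body = "".join(
        f"<TR><TD>{ra!r}</TD><TD>{dec!r}</TD></TR>" for ra, dec in rows
    )
    return f"<TABLEDATA>{body}</TABLEDATA>".encode("ascii")


def as_binary(rows):
    """Serialize (ra, dec) rows as big-endian doubles, 16 bytes per row."""
    return b"".join(struct.pack(">dd", ra, dec) for ra, dec in rows)


def size_ratio(rows):
    """Return how many times larger the XML form is than the binary form."""
    return len(as_xml(rows)) / len(as_binary(rows))
```

The binary form costs a fixed 16 bytes per row here, while the text form pays for markup and for every digit, before any parsing cost is counted.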
--
------------------------------------------------
Maria A. Nieto-Santisteban (nieto at pha.jhu.edu)
Johns Hopkins University
3400 N. Charles St.
Physics & Astronomy Department
Baltimore, MD 21218 (USA)
Tel: 1 410 516-7679 Fax: 1 410 516-5096