building a search engine

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Tue Oct 18 10:12:36 PDT 2005


In Madrid I brought up the topic of having a "last modification time" on
records returned from SSA and SIA. The intent is to allow querying on this to
get new or changed records - something needed to build a search engine,
for example. 

** building a search engine (SE)

To elaborate further, a useful SE on SSA and SIA would also need to find the 
following things for each record:

1. a unique identifier that could be used sometime later to get the AccessReference
(i.e. to get the data or let a user get the data):

- publisher ID is tied to the specific service, so one would need to keep the tuple of 
<resourceID, pubID> where resourceID lets you find the same service in the registry 
and pubID lets you find the record within that service.... Correct?

2. a globally unique "dataset ID" could be used, but the SE would still need to know
which service(s) can deliver the record and data... plus specific implementations of a
SE might need specific things from the record not supplied by everyone that can deliver
the dataset (eg. I need spatial support, time bounds, and energy bounds to build my 
search engine - someone else might need more or less).... 
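
As a rough sketch of what an SE might store per record under option 1, the
<resourceID, pubID> tuple can serve as the key of a local index. The names
RecordKey, IndexEntry and upsert are hypothetical and not part of SSA or SIA,
and the timestamp is kept as an opaque string because no format has been
agreed on.

```python
from typing import Dict, NamedTuple, Optional


class RecordKey(NamedTuple):
    """Hypothetical key: locates one record in one registered service."""
    resource_id: str  # finds the service in the registry
    pub_id: str       # finds the record within that service


class IndexEntry(NamedTuple):
    """Hypothetical per-record data kept by the search engine."""
    mtime: str                        # last modification time, opaque string
    access_reference: Optional[str]   # URL from the last query; may go stale


def upsert(index: Dict[RecordKey, IndexEntry],
           key: RecordKey,
           entry: IndexEntry) -> bool:
    """Store the entry; return True if the record is new or its mtime changed."""
    previous = index.get(key)
    index[key] = entry
    return previous is None or previous.mtime != entry.mtime
```

Only a changed mtime counts as a change here, so a publisher could hand out a
different AccessReference without forcing the SE to re-index the record.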

To support an SE, "mtime" needs to be a query parameter of the form mtime=MIN,MAX
with support for mtime=MIN, (for >=) and it has to be part of each record on output. Personally
I would like to see these as REQUIRED.
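
To make the proposed parameter concrete, here is a minimal sketch of how a
harvester might build such a query. The ISO 8601 timestamp format, the
function names and the example base URL are assumptions for illustration;
only the mtime=MIN,MAX and mtime=MIN, forms come from the proposal above.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode


def mtime_range(min_time: datetime, max_time: Optional[datetime] = None) -> str:
    """Format the proposed value: 'MIN,MAX', or 'MIN,' for an open upper bound."""
    lower = min_time.isoformat()  # ISO 8601 is an assumption, not part of the proposal
    upper = max_time.isoformat() if max_time is not None else ""
    return f"{lower},{upper}"


def harvest_url(base_url: str, last_harvest: datetime) -> str:
    """Build a query URL for records modified at or after last_harvest."""
    query = urlencode({"mtime": mtime_range(last_harvest)})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Hypothetical service URL; a real one would come from the registry.
    print(harvest_url("http://example.org/ssa", datetime(2005, 10, 18, 10, 12, 36)))
```

An SE would remember the time of its last successful harvest and pass it as
MIN on the next run, then read mtime from each returned record.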

** using/getting AccessReference

In addition, if I build an SE that stores <resourceID,pubID> then I would also like to have a
fast way to convert them into AccessReference (URLs). I'm assuming the AccessReference 
one gets from the query is currently valid but not guaranteed to be valid indefinitely (publishers
may want/need to change data delivery, which I don't think should mandate changing 
the modification time). Specifically, it would be nice to be able to pass a list of pubID values to
a service and get one response, rather than have to issue separate queries and get one response
(VOTable) per pubID with one record each. With http get, the length of the list would be limited, of
course. 

Logically, an SE will need pubID as a REQUIRED query and output parameter. List
support is an optimisation.
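
One way the list optimisation could work with http get is to split the pubID
values into batches that each fit in one URL. This is only a sketch: the
comma-separated pubID=a,b,c form, the 2000-character limit and the function
name are assumptions, since no list syntax has been defined.

```python
from typing import Iterator, List
from urllib.parse import quote


def batch_pubid_urls(base_url: str,
                     pub_ids: List[str],
                     max_url_len: int = 2000) -> Iterator[str]:
    """Yield query URLs, each carrying as many pubID values as fit in max_url_len."""
    prefix = f"{base_url}?pubID="
    batch: List[str] = []
    length = len(prefix)
    for pub_id in pub_ids:
        # Escape everything, so a comma inside an ID cannot look like a separator.
        encoded = quote(pub_id, safe="")
        extra = len(encoded) + (1 if batch else 0)  # +1 for the "," separator
        if batch and length + extra > max_url_len:
            yield prefix + ",".join(batch)
            batch, length = [], len(prefix)
            extra = len(encoded)
        batch.append(encoded)
        length += extra
    if batch:
        yield prefix + ",".join(batch)
```

Each URL would then return one VOTable with several records, instead of one
VOTable per pubID. An ID that is too long to fit on its own is still sent
alone rather than dropped.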

Thoughts? Comments?

I really hope this can get into SSA 1.0 and hence SIA 1.1.

-- 
Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)


