building a search engine

Markus Dolensky Markus.Dolensky at
Wed Oct 19 05:05:10 PDT 2005


Before commenting on Pat's search engine use case, here is where one can 
find the latest info:
The DAL presentations from the respective interop session at ESAC are here
- many thanks to the authors for promptly providing them. This includes 
the minutes with action items related to Pat's proposal.
Finally, note that Francesco has added the sample files of his demos.

Patrick Dowler wrote:
> In Madrid I brought up the topic of having a "last modification time" on
> records returned from SSA and SIA. The intent is to allow querying on this to
> get new or changed records - something needed to build a search engine,
> for example. 

My perception when adding your idea to the DAL minutes was that a query 
parameter MTIME=<interval> and a corresponding output parameter were 
generally considered an excellent enhancement, and that it's merely a 
matter of agreeing on how to do it.

> 1. unique identifier that could be used sometime later to get the AccessReference
> (ie to get the data or let a user get the data): 
> - publisher ID is tied to the specific service, so one would need to keep the tuple of 
> <resourceID, pubID> where resourceID lets you find the same service in the registry 
> and pubID lets you find the record within that service.... Correct?

There is an action to clarify the meaning of CREATORID and PUBID since 
Doug and Jonathan had slightly different expectations. Therefore, I'd 
like to ask them to agree on a (uniform) answer to point #1.

> 2. a globally unique "dataset ID" could be used, but the SE would still need to know
> which service(s) can deliver the record and data... plus specific implementations of a
> SE might need specific things from the record not supplied by everyone that can deliver
> the dataset (eg. I need spatial support, time bounds, and energy bounds to build my 
> search engine - someone else might need more or less).... 
> To support an SE, "mtime" needs to be a query parameter of the form mtime=MIN,MAX
> with support for mtime=MIN, (for >=) and it has to be part of each record on output. Personally
> I would like to see these as REQUIRED.

In general, this is how such range conditions should be specified:
example1: MTIME=lo,hi  # bounded range, lo and hi inclusive
example2: MTIME=lo,    # greater than or equal to lo
example3: MTIME=,hi    # less than or equal to hi
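
For illustration, the three forms above could be parsed on the service
side roughly as follows. This is only a sketch: the function names are
made up, and it assumes that bounds are ISO 8601 timestamps and that a
value without a comma is rejected, neither of which has been agreed.

```python
from datetime import datetime
from typing import Optional, Tuple

Bound = Optional[datetime]  # None stands for an open-ended side of the range


def parse_mtime(value: str) -> Tuple[Bound, Bound]:
    """Parse 'lo,hi', 'lo,' or ',hi' into a (lo, hi) pair.

    Assumption: bounds are ISO 8601 timestamps; the time format itself
    is not fixed by this thread.
    """
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'lo,hi', 'lo,' or ',hi', got {value!r}")
    lo, hi = (
        datetime.fromisoformat(p.strip()) if p.strip() else None for p in parts
    )
    if lo is None and hi is None:
        raise ValueError("at least one bound is required")
    if lo is not None and hi is not None and lo > hi:
        raise ValueError("lower bound is later than upper bound")
    return lo, hi


def matches(mtime: datetime, lo: Bound, hi: Bound) -> bool:
    """True if mtime lies in the range; both bounds are inclusive."""
    return (lo is None or mtime >= lo) and (hi is None or mtime <= hi)
```

A harvester would then send MTIME=<time of last harvest>, and keep every
returned record for which matches() holds.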

> ** using/getting AccessReference
> In addition, if I build an SE that stores <resourceID,pubID> then I will also like to have a
> fast way to convert them into AccessReference (URLs). I'm assuming the AccessReference 
> one gets from the query is currently valid but not guaranteed to be valid indefinitely (publishers
> may want/need to change data delivery, which I don't think should mandate changing 
> the modification time). Specifically, it would be nice to be able to pass a list of pubID values to
> a service and get one response, rather than have to issue separate queries and get one response
> (VOTable) per pubID with one record each. With http get, the length of the list would be limited, of
> course. 

> Logically, an SE will need pubID as a REQUIRED query and output parameter. List
> support is an optimisation.

Unless there are objections, I'll turn the parameter specification of 
PUBID and CREATORID into type 'comma separated list' in the SSA 
interface doc. This again requires a final word on the meaning of the 
two parameters. Presumably there is little chance that this will break 
already existing services(?)
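
On the client side, Pat's point about the HTTP GET length limit could be
handled by splitting a long list of IDs over several requests. The
following is a rough sketch under assumptions: the base URL, the 2000
character limit and the function names are hypothetical, and IDs are
assumed not to contain commas themselves.

```python
from typing import Iterable, Iterator, List
from urllib.parse import urlencode


def pubid_url(base_url: str, pub_ids: List[str]) -> str:
    """Build one query URL with PUBID as a comma separated list."""
    # safe="," keeps the list separator readable instead of %2C
    return base_url + "?" + urlencode({"PUBID": ",".join(pub_ids)}, safe=",")


def batched_pubid_urls(
    base_url: str, pub_ids: Iterable[str], max_len: int = 2000
) -> Iterator[str]:
    """Yield query URLs, each carrying as many IDs as fit within max_len.

    A single ID that is too long on its own is still sent in a URL of
    its own rather than dropped.
    """
    batch: List[str] = []
    for pub_id in pub_ids:
        candidate = batch + [pub_id]
        if batch and len(pubid_url(base_url, candidate)) > max_len:
            yield pubid_url(base_url, batch)
            batch = [pub_id]
        else:
            batch = candidate
    if batch:
        yield pubid_url(base_url, batch)
```

Each URL then yields one VOTable with several records instead of one
VOTable per pubID.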

Let me try to work out what REQUIRED means in this context:
A service needs to recognize query parameter MTIME. If there is no MTIME 
value - for instance, because a mosaic is computed on the fly  (=> 
virtual data) - then the service must not produce an error but ignore 

- Markus
