new RI document

Ray Plante rplante at ncsa.uiuc.edu
Fri Jun 16 04:50:28 PDT 2006


On Fri, 16 Jun 2006, Paul Harrison wrote:
> I think that I have to agree with Noel here - keyword searches that  
> give different results for different implementations over the same  
> data set are very confusing for the end user - it lowers the  
> confidence of the end user about the "completeness" of the registry  
> coverage of the available resources. 

I think we're making the wrong analogy here.  People don't expect Google 
to return the exact same results for a set of words as, say, Ask.com or 
Yahoo or any other search engine.  The reason is that each uses different
searching and sorting algorithms behind the scenes.  That is the very
reason we get different results from our registries.

(Nor, BTW, do users expect Google to return the same answer to the same 
keywords two days in a row.)

Furthermore, that difference in behavior is how they attract users--they 
try to give the best results for their target community.  Now we shouldn't
think of our registries as being in competition for users; however, registries
do need to be able to innovate and gradually improve the effectiveness of 
the keyword search.  And there are a number of very useful techniques that 
could be added, such as Soundex, as Noel suggested.  However, these can 
get complex, and there's no way we can mandate their use across different 
implementations and back-end databases.

I think that a keyword search is understood to be imprecise and, frankly, 
a bit magical.  People don't understand how Google works; however, most
people like its results more than those of other engines.  In the VO, if
we want more consistent and precise queries, we should use the advanced
interface.  I think the fact that our advanced query interface has to date
been largely unusable (for reasons of user-friendliness) has shifted more
of our expectations onto the keyword search interface.

> I think that keyword search ought to be either over a fixed mandatory  
> list of fields or perhaps as Noel suggests 'full text literal match  
> search' - In the Google age, this is what people expect of a query  
> that consists of a single word.

The latter is not possible, as we have said that registries are not
obligated to search extensions they do not understand.  However, searching
over more than the mandatory list is more useful than searching just the
minimum.
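
To illustrate why two compliant registries can legitimately differ on the
same keyword, here is a minimal sketch.  The field names, the sample record,
and the rule that every keyword must match are assumptions made up for the
example; they are not taken from the standard or from any implementation.

```python
# Hypothetical stand-in for the mandatory list; the real list is defined
# by the standard, not by this sketch.
MANDATORY_FIELDS = ("title", "description", "subject")


def keyword_match(record, keywords, extra_fields=()):
    """Return True if every keyword occurs in at least one searched field.

    A registry always searches MANDATORY_FIELDS; extra_fields models the
    additional fields that a particular implementation chooses to search.
    """
    fields = MANDATORY_FIELDS + tuple(extra_fields)
    text = " ".join(str(record.get(f, "")) for f in fields).lower()
    return all(k.lower() in text for k in keywords)


# Made-up resource record with one field outside the mandatory list.
record = {
    "title": "Sloan Digital Sky Survey",
    "description": "Optical imaging and spectroscopy",
    "subject": "galaxies",
    "instrument": "CCD mosaic camera",  # hypothetical extension field
}

# Registry A searches only the mandatory list; registry B searches more.
minimal = keyword_match(record, ["mosaic"])
extended = keyword_match(record, ["mosaic"], extra_fields=["instrument"])
```

Here the minimal registry misses the record and the extended one finds it,
yet both satisfy the same minimum requirement.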

The mandatory list was chosen in the spirit of defining the minimal
requirements we place on a registry to call it compliant and
interoperable.  We do want to encourage high quality implementations that
help sell the VO to users; however, the standard is not the right place to
do this.  I think well-maintained registries will drift toward consistency
as they share solutions that have been shown to be effective.  Poorly
maintained registries will simply lose users.

cheers,
Ray




