new RI document
Ray Plante
rplante at ncsa.uiuc.edu
Fri Jun 16 04:50:28 PDT 2006
On Fri, 16 Jun 2006, Paul Harrison wrote:
> I think that I have to agree with Noel here - keyword searches that
> give different results for different implementations over the same
> data set are very confusing for the end user - it lowers the
> confidence of the end user about the "completeness" of the registry
> coverage of the available resources.
I think we're making the wrong analogy here. People don't expect Google
to return the exact same results for a set of words as, say, Ask.com or
Yahoo or any other search engine. The reason is that each uses
different searching and sorting algorithms behind the scenes. That is the
very reason we have different results from our registries.
(Nor, BTW, do users expect Google to return the same answer to the same
keywords two days in a row.)
Furthermore, that difference in behavior is how they attract users--they
try to give the best results for their target community. Now we shouldn't
think of our registries as in competition for users; however, registries
do need to be able to innovate and gradually improve the effectiveness of
the keyword search. And there are a number of very useful techniques that
could be added, such as Soundex, as Noel suggested. However, these can
get complex, and there's no way we can mandate their use across different
implementations and back-end databases.
I think that a keyword search is understood to be imprecise and, frankly,
a bit magical. People don't understand how Google works; however, most
people like its results more than those of other engines. In the VO, if we
want more consistent and precise queries, we should use the advanced
interface.
I think that the fact that our advanced query interface to date has been
largely unusable (for user friendliness reasons) has shifted more of our
expectations onto the keyword search interface.
> I think that keyword search ought to be either over a fixed mandatory
> list of fields or perhaps as Noel suggests 'full text literal match
> search' - In the Google age, this is what people expect of a query
> that consists of a single word.
The latter is not possible as we have said that registries are not
obligated to search extensions they do not understand. However, searching
over more than the mandatory list is more useful than searching just the
minimum.
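To make the distinction concrete, here is a rough sketch of a keyword
match over a mandatory field list that an implementation may optionally
widen. The field names, the all-keywords-must-match rule, and the
keyword_match helper are assumptions made up for illustration; they are
not taken from the RI document or from any existing registry.

```python
# Hypothetical field names; the RI document defines the real mandatory list.
MANDATORY_FIELDS = ("title", "description", "subject")


def keyword_match(record, keywords, extra_fields=()):
    """Return True if every keyword occurs in at least one searched field.

    Matching is a case-insensitive substring test (an assumption; a real
    registry might tokenize, stem, or rank results instead).
    """
    fields = MANDATORY_FIELDS + tuple(extra_fields)
    text = " ".join(str(record.get(f, "")) for f in fields).lower()
    return all(k.lower() in text for k in keywords)


record = {
    "title": "Example Cone Search",
    "description": "Positions of radio sources",
    "subject": "radio astronomy",
    "facility": "VLA",  # stands in for a field outside the mandatory list
}

# A registry searching only the mandatory list misses the record ...
minimal = keyword_match(record, ["vla"])
# ... while one that also searches the extra field finds it.
extended = keyword_match(record, ["vla"], extra_fields=["facility"])
```

Both registries would be compliant in this picture, yet they return
different answers for the same keyword over the same record.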
The mandatory list was chosen in the spirit of defining the minimal
requirements we place on a registry to call it compliant and
interoperable. We do want to encourage high quality implementations that
help sell the VO to users; however, the standard is not the right place to
do this. I think well-maintained registries will drift toward consistency
as they share solutions that have been shown to be effective. Poorly
maintained registries will simply lose users.
cheers,
Ray