Severely suboptimal queries on reg.g-vo.org
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Tue May 21 13:44:49 CEST 2024
Dear Registry enthusiasts,
Since a couple of hours, someone queries reg.g-vo.org
programmatically from an amazon IP (35.173.228.154). The queries
look like pyvo queries for an ivoid constraint, and from the looks of
it you iterate over all ivoids.
For a one-shot operation, I don't mind that much. But this is severely
inefficient, and hence if you're building something that should run more
often, I'd appreciate if you did a TAP upload instead of the mass query
and the client-side loop. This will do the same thing, except it'll
only take a couple of seconds. Basically something like this (sketch,
probably not runnable as-is):
svc = pyvo.dal.TAPService("http://reg.g-vo.org")
identifiers = svc.run_sync("select ivoid from rr.resource").to_table()
query = """SELECT ivoid, res_type, short_name, res_title, content_level, res_description, reference_url, creator_seq, created, updated, rights, content_type, source_format, source_value, region_of_regard, waveband, string_agg(COALESCE ( access_url , '' ), ':::py VO sep:::') AS access_urls, string_agg(COALESCE ( standard_id , '' ), ':::py VO sep:::') AS standard_ids, string_agg(COALESCE ( intf_type , '' ), ':::py VO sep:::') AS intf_types, string_agg(COALESCE ( intf_role , '' ), ':::py VO sep:::') AS intf_roles
FROM rr.resource AS res
NATURAL LEFT OUTER JOIN rr.capability
NATURAL LEFT OUTER JOIN rr.interface
NATURAL LEFT OUTER JOIN rr.alt_identifier
NATURAL JOIN tap_upload.identifiers as ids
GROUP BY ivoid , res_type , short_name , res_title , content_level , res_description , reference_url , creator_seq , created , updated , rights , content_type , source_format , source_value , region_of_regard , waveband
LIMIT 20000"""
the_whole_mess = svc.run_async(query,
uploads={"identifiers": identifiers})
That's better for me (quite a bit less load), but in particular for you
(although your current method will make for really cool metrics on my
RegTAP endpoint:-).
Incidentally, when building bots of this kind, you're doing people like
me a favour if you give an indication of who you are in the user agent
string. In pyVO, you would do something like
pyvo.utils.http.DEFAULT_USER_AGENT = "GAVO foo harvester/0.1 (IVOA-test msdemlei at ari.uni-heidelberg.de)"
You'll find more on how this string should be formed in
https://ivoa.net/documents/Notes/softid/20210115/NOTE-softid-1.0-20210115.html#tth_sEc3.3
Thanks,
Markus
More information about the registry
mailing list