From gpdf at ipac.caltech.edu Tue Aug 9 12:13:36 2022 From: gpdf at ipac.caltech.edu (Dubois-Felsmann, Gregory P.) Date: Tue, 9 Aug 2022 10:13:36 +0000 Subject: UCDs and DataLink Message-ID: On https://github.com/ivoa-std/DataLink/issues/89, I raised, on behalf of my colleague Russ Allbery who had noticed the point, an issue about an inconsistency between the prescribed UCD of a column in the _output_ of "{links}" resources and its use as an _input_ to a SODA service. This came up when we were implementing both of these services in the Rubin Science Platform. Comments on the specific proposal I made in the GitHub issue should stay there, but in his response, Markus reacted to a more general point I made in the issue, and suggested that there might be something for us to discuss on a mailing list, regarding the larger issue of whether we care about such compatibility issues at all, and what the purpose of UCDs is. As I said on the GitHub issue, from a client software perspective, it seems plausible that a client should take a service descriptor's word as law, and not try to second-guess whether a column named by a service descriptor is UCD-compatible with the service parameter in which it is intended to be used. It might only be for validators to comment on such issues. But Markus made a stronger point, which I think I can go ahead and quote, since the issue is public: > I'd argue against making any requirements on UCDs [in these sorts of contexts]. > UCDs are mainly intended for ad-hoc or discovery use, like: > > * a client sees an arbitrary VOTable and wants to get an idea what > kind of physics is represented in order to suggest plots for it, > perhaps match it with columns in other tables or similar. > > * a client is looking for tables having a particular sort of data in > the registry. I'd like to push back a bit on how narrow that field of applicability is. 
In particular, in the context of service descriptors, we are finding,
as we actually implement the DataLink-intensive design of the Rubin
Science Platform (RSP), that client software often needs some hints as
to how to present, in a UI, the "optional" parameters to a service
named in a service descriptor. If a service descriptor represents a
service for which there is an IVOA standard, of course, the client
software UI can be written against the whole of that standard. But if
(as is _very_ frequently the case for us) the service descriptor points
to a custom data service, the UCDs can be useful in providing rendering
hints without our having to hard-code the client software against the
specifics of the custom data services.

We _could_ do the latter, but it creates tighter coupling and fragility
in our deployments, and I would also like non-RSP client software
(e.g., TOPCAT) to have a reasonable chance at providing good UIs for
our service descriptors.

Looking forward to the conversation...

Gregory Dubois-Felsmann

From m.b.taylor at bristol.ac.uk Fri Aug 12 10:54:27 2022
From: m.b.taylor at bristol.ac.uk (Mark Taylor)
Date: Fri, 12 Aug 2022 09:54:27 +0100 (BST)
Subject: DataLink local_semantics optional column proposal
Message-ID: <9d4d5bca-eb2a-47ca-b1f9-cc409e76c77@andromeda.star.bris.ac.uk>

Hi DAL,

This message is to propose a new optional column in the {links}
response table of the DataLink standard. I initially raised it as a
github issue; you can see a bit more discussion at
https://github.com/ivoa-std/DataLink/issues/88, and an earlier
incarnation on slide "7/8" of my Victoria 2018 presentation
https://wiki.ivoa.net/internal/IVOA/InterOpMay2018DAL/dlfeedback.pdf.

The problem I want to solve is to do with looking at links tables from
multiple different rows of a parent table; given a row from a links
table from one parent table row, how does a client identify the
corresponding row in the links table from a different parent table row?
For instance: Gaia DR3 queries on gaia_source can return a service
descriptor associating a links table with each row. For one source
(i.e. parent table row) that links table might look like:

   semantics, description,                                                    [other cols]
   ---------, -----------,                                                    ------------
   #this,     MCMC MSC source Gaia DR3 4040949706019490560,                   ...
   #this,     XP mean sampled spectra source Gaia DR3 4040949706019490560,    ...
   #this,     XP mean continuous spectra source Gaia DR3 4040949706019490560, ...
   #this,     MCMC GSP-Phot source Gaia DR3 4040949706019490560,              ...

and for another row like:

   semantics, description,                                                    [other cols]
   ---------, -----------,                                                    ------------
   #this,     MCMC MSC source Gaia DR3 4040165887420469760,                   ...
   #this,     XP mean continuous spectra source Gaia DR3 4040165887420469760, ...
   #this,     XP mean sampled spectra source Gaia DR3 4040165887420469760,    ...

If a user selects e.g. the "XP mean sampled spectra" datalink item for
the first source they will probably want the same thing when they look
at the next source, and it would be nice for a client like topcat to be
able to default to the corresponding links row rather than forcing the
user to select manually for each source. It's obvious to a human which
this corresponding row is, but at present there is no reliable way for
software to identify it (admission: topcat currently does some unholy
partial string matching on the description column hacked to do the
right thing for the ESA Gaia DR3 service).

So I'd like to see an additional column to facilitate this.
Markus has suggested the following definition for such a column:

   column name: local_semantics
   type: text
   UCD: meta.id.assoc
   description: An identifier that allows clients to associate rows from
      different datalink documents on the same service with each other.

The above examples might then look like:

   semantics, local_semantics, description,                                                    [other cols]
   ---------, ---------------, -----------,                                                    ------------
   #this,     1,               MCMC MSC source Gaia DR3 4040949706019490560,                   ...
   #this,     3,               XP mean sampled spectra source Gaia DR3 4040949706019490560,    ...
   #this,     4,               XP mean continuous spectra source Gaia DR3 4040949706019490560, ...
   #this,     2,               MCMC GSP-Phot source Gaia DR3 4040949706019490560,              ...

   semantics, local_semantics, description,                                                    [other cols]
   ---------, ---------------, -----------,                                                    ------------
   #this,     1,               MCMC MSC source Gaia DR3 4040165887420469760,                   ...
   #this,     3,               XP mean sampled spectra source Gaia DR3 4040165887420469760,    ...
   #this,     4,               XP mean continuous spectra source Gaia DR3 4040165887420469760, ...

The intention is that within a given context (parent table at least,
perhaps data service or similar) the same local_semantics value is
unique per links response table and always means the "same" thing
(corresponding type of data product). The content could be either
an opaque value like the numeric tokens in the example above or some
more descriptive text (I'd be inclined to allow any data type for this
column rather than requiring text content, but I'm not adamant).

Such a column would be strictly optional and supplied on a best efforts
basis by the service. It could be documented in a future version of
the DataLink standard, but until then data providers and consumers
could agree informally to make use of it. Since links response tables
are allowed to contain non-standard columns, this would not infringe
any standards.

Any comments? Assuming some agreement or lack of disagreement is
established here about the general idea and specifics, then if
at least one data provider implements this I will add code in
topcat to make use of it.
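To illustrate the matching a client could do once such a column exists, here is a minimal sketch in Python. It is not TOPCAT's implementation: the function name `find_corresponding_row` and the list-of-dicts representation of a links table are assumptions made for this example only, and the rows simply mirror the example tables above.

```python
# Hypothetical helper: links tables are modelled as lists of dicts,
# one dict per links row (an assumption made for this sketch).

def find_corresponding_row(selected_row, other_links_rows):
    """Return the row of other_links_rows whose local_semantics equals
    that of selected_row, or None if there is no unambiguous match."""
    key = selected_row.get("local_semantics")
    if key is None:
        # The service did not supply the optional column.
        return None
    matches = [row for row in other_links_rows
               if row.get("local_semantics") == key]
    # local_semantics is meant to be unique per links table; anything
    # other than exactly one hit is treated as "no match".
    return matches[0] if len(matches) == 1 else None


links_first = [
    {"semantics": "#this", "local_semantics": "1",
     "description": "MCMC MSC source Gaia DR3 4040949706019490560"},
    {"semantics": "#this", "local_semantics": "3",
     "description": "XP mean sampled spectra source Gaia DR3 4040949706019490560"},
    {"semantics": "#this", "local_semantics": "4",
     "description": "XP mean continuous spectra source Gaia DR3 4040949706019490560"},
    {"semantics": "#this", "local_semantics": "2",
     "description": "MCMC GSP-Phot source Gaia DR3 4040949706019490560"},
]

links_second = [
    {"semantics": "#this", "local_semantics": "1",
     "description": "MCMC MSC source Gaia DR3 4040165887420469760"},
    {"semantics": "#this", "local_semantics": "3",
     "description": "XP mean sampled spectra source Gaia DR3 4040165887420469760"},
    {"semantics": "#this", "local_semantics": "4",
     "description": "XP mean continuous spectra source Gaia DR3 4040165887420469760"},
]

# The user picked "XP mean sampled spectra" for the first source ...
sampled_match = find_corresponding_row(links_first[1], links_second)
# ... while the GSP-Phot product has no counterpart for the second source.
gsp_phot_match = find_corresponding_row(links_first[3], links_second)
```

Returning None for a missing or ambiguous match lets a client fall back to asking the user, which fits the best-efforts nature of the column.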
Thanks Mark -- Mark Taylor Astronomical Programmer Physics, Bristol University, UK m.b.taylor at bristol.ac.uk http://www.star.bristol.ac.uk/~mbt/ From m.b.taylor at bristol.ac.uk Fri Aug 12 11:25:40 2022 From: m.b.taylor at bristol.ac.uk (Mark Taylor) Date: Fri, 12 Aug 2022 10:25:40 +0100 (BST) Subject: Row count in TAP_SCHEMA In-Reply-To: References: Message-ID: It looks to me like agreement or lack of disagreement on this. I have provisionally added code to topcat to read and use the column "nrows" in tap_schema.tables if it is present (the value is displayed under the heading "Rows (approx)" in the Table tab of the TAP window Use Service panel). Pre-release with this feature available here: http://andromeda.star.bristol.ac.uk/releases/topcat/pre/topcat-full.jar If some TAP service implements this, please let me know and I'll check it works properly. Mark On Wed, 13 Jul 2022, Mark Taylor wrote: > That's fine by me. The nrows attribute in VODataService is documented: > > Meaning > The approximate size of the table in rows. > Comment > This is not expected to be exact. For instance, the estimates > on table sizes databases keep for query planning purposes are > suitable for this field. > > so a similar definition for the TAP_SCHEMA equivalent would make > good sense. > > Mark > > On Wed, 13 Jul 2022, Patrick Dowler wrote: > > > +1 on implementing now and adding to TAP-next (I intend to make a TAP_next > > wiki page asap; will announce) > > > > Can I assume that the definition of "nrows" would be that it is > > approximate? In most cases I would have to > > implement a periodic update to set the value based on current content so > > the value returned could be out of date > > wrt. reality (eg not agree with "select count(*) from ". > > > > I am thinking about cases where there are millions of rows and the count > > changes by thousands each day. My gut > > says a daily update would be feasible... maybe a few times per day. > > Probably not less frequent than 1/day. 
> >
> >
> > --
> > Patrick Dowler
> > Canadian Astronomy Data Centre
> > Victoria, BC, Canada
> >
> >
> > On Mon, 11 Jul 2022 at 08:30, Gregory MANTELET <
> > gregory.mantelet at astro.unistra.fr> wrote:
> >
> > > Hi Mark, DAL,
> > >
> > > I agree with the addition of this optional column to the TAP_SCHEMA.
> > >
> > > A little note from the ADQL side though. `size` is a reserved keyword in
> > > SQL/ADQL. It would be better to choose another one in order to avoid the
> > > annoying wrapping between double quotes.
> > >
> > > It was the main reason why I chose to call it `row_count` when I added
> > > this column in the TAP service of ARI-Gaia. The other reason was that
> > > the unit is immediately obvious, on the contrary to the generic keyword
> > > `size`.
> > >
> > > `nrows` seems to be a very nice alternative to me: short, explicit, not
> > > reserved and consistent with VODataService.
> > >
> > > Cheers,
> > > Grégory M.
> > >
> > >
> > > On 11/07/2022 16:34, Mark Taylor wrote:
> > > > Hi DAL,
> > > >
> > > > since VODataService v1.2 (see sec 3.3), the Table element has had
> > > > an optional attribute "nrows" which allows services to declare how
> > > > many rows a table has. That is useful information, and TOPCAT
> > > > displays it, if known, as part of the table metadata in its TAP window.
> > > >
> > > > However, there is currently no corresponding standard way to report
> > > > this information from TAP_SCHEMA. Topcat sometimes gets TAP service
> > > > metadata from the /tables endpoint (VODataService) and sometimes from
> > > > TAP_SCHEMA (depending on things like apparent service size); in the former
> > > > case it's able to report table sizes, but in the latter case it's not.
> > > >
> > > > So it would be nice to have a standard way in which TAP services
> > > > could report table size in TAP_SCHEMA if they wanted to.
> > > > This would just need to be a new optional column with an agreed
> > > > name in TAP_SCHEMA.tables.
> > > >
> > > > In fact some services already do this, but different column names
> > > > are in use. ARI-Gaia uses "row_count" and ESA uses "size"
> > > > (and also has "size_bytes" for size in bytes). "size" is a
> > > > somewhat problematic choice since it's an ADQL reserved word,
> > > > it's also not very explicit about what it means.
> > > > "row_count" is OK by me, though "nrows" would also be reasonable
> > > > for consistency with VODataService.
> > > >
> > > > Could we agree here on a suitable column name for this?
> > > > Next time there's a TAP update it could go in there,
> > > > but there's nothing to stop people agreeing on and implementing
> > > > best practice in the mean time; since the column would be optional,
> > > > and you're allowed to add non-standard columns in TAP_SCHEMA,
> > > > it doesn't break anything.
> > > >
> > > > Mark
> > > >
> > > > --
> > > > Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> > > > m.b.taylor at bristol.ac.uk http://www.star.bristol.ac.uk/~mbt/
> > > >
> > >
> >
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.taylor at bristol.ac.uk http://www.star.bristol.ac.uk/~mbt/
>

--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk http://www.star.bristol.ac.uk/~mbt/

From francois.bonnarel at astro.unistra.fr Thu Aug 11 13:20:40 2022
From: francois.bonnarel at astro.unistra.fr (BONNAREL FRANCOIS)
Date: Thu, 11 Aug 2022 13:20:40 +0200
Subject: UCDs and DataLink
In-Reply-To:
References:
Message-ID: <63d0671f-ef9c-605d-f0a7-9fd9df7e0e22@astro.unistra.fr>

Dear Gregory, dear all,

On 09/08/2022 at 12:13, Dubois-Felsmann, Gregory P. wrote:
> On https://github.com/ivoa-std/DataLink/issues/89, I raised, on behalf
> of my colleague Russ Allbery who had noticed the point, an issue about
> an inconsistency between the prescribed UCD of a column in the
> _output_ of "{links}" resources and its use as an _input_ to a SODA
> service.
> > This came up when we were implementing both of these services in the > Rubin Science Platform. > > Comments on the specific proposal I made in the GitHub issue should > stay there, but in his response, Markus reacted to a more general > point I made in the issue, and suggested that there might be something > for us to discuss on a mailing list, regarding the larger issue of > whether we care about such compatibility issues at all, and what the > purpose of UCDs is. > > As I said on the GitHub issue, from a client software perspective, it > seems plausible that a client should take a service descriptor's word > as law, and not try to second-guess whether a column named by a > service descriptor is UCD-compatible with the service parameter in > which it is intended to be used. It might only be for validators to > comment on such issues. > > But Markus made a stronger point, which I think I can go ahead and > quote, since the issue is public: >> I'd argue against making any requirements on UCDs [in these sorts of >> contexts]. >> UCDs are mainly intended for ad-hoc or discovery use, like: >> >> * a client sees an arbitrary VOTable and wants to get an idea what >> kind of physics is represented in order to suggest plots for it, >> perhaps match it with columns in other tables or similar. >> >> * a client is looking for tables having a particular sort of data in >> the registry. > I'd like to push back a bit on how narrow that field of applicability is. > > In particular, in the context of service descriptors, we are finding, > as we actually implement the DataLink-intensive design of the Rubin > Science Platform (RSP), that client software often needs some hints as > to how to present, in a UI, the "optional" parameters to a service > named in a service descriptor. If a service descriptor represents a > service for which there is an IVOA standard, of course, the client > software UI can be written against the whole of that standard. 
But if
> (as is _very_ frequently the case for us) the service descriptor
> points to a custom data service, the UCDs can be useful in providing
> rendering hints without our having to hard-code the client software
> against the specifics of the custom data services.

I would say it depends on what you want the UI to take into account.
If the quantity represented by the parameter is the only thing to be
taken into account, then the ucd should be perfectly fine. If something
like a role or a format is of interest, we can also use utype and xtype.

In a footnote of the SODA spec there is something about the relation
between the SODA input parameters and ObsCore utypes. ObsCore utypes
define the characterization of the datasets (in the sense of the char
datamodel). In the context of SODA, some input parameters force the
characterization of the cutout dataset.

I imagine that for services dealing with spectra, some utypes of the
currently revised version of SDM could help.

In conclusion, I think your concern can be solved by an appropriate
definition of a combination of ucd, utype, xtype and unit for the input
parameters to be described.

Best regards
François

PS: To be clear: the usage of utypes I am talking about is not related
to the issue of mapping datamodels to VOTable, for which MIVOT is the
right solution. In practice, in the IVOA, utypes are used to tag roles,
and a good way to do that is to point them to datamodel leaves when
available. This is different from a full mapping, as we know.

> We _could_ do the latter, but it creates tighter coupling and
> fragility in our deployments, and I would also like non-RSP client
> software (e.g., TOPCAT) to have a reasonable chance at providing good
> UIs for our service descriptors.
>
> Looking forward to the conversation...
> > Gregory Dubois-Felsmann From francois.bonnarel at astro.unistra.fr Thu Aug 11 15:46:37 2022 From: francois.bonnarel at astro.unistra.fr (BONNAREL FRANCOIS) Date: Thu, 11 Aug 2022 15:46:37 +0200 Subject: Test Message-ID: From msdemlei at ari.uni-heidelberg.de Mon Aug 15 10:50:45 2022 From: msdemlei at ari.uni-heidelberg.de (Markus Demleitner) Date: Mon, 15 Aug 2022 10:50:45 +0200 Subject: UCDs and DataLink In-Reply-To: References: Message-ID: <20220815085045.ybgkgebrqyw447jh@victor> Dear Gregory, dear DAL, On Tue, Aug 09, 2022 at 10:13:36AM +0000, Dubois-Felsmann, Gregory P. wrote: > But Markus made a stronger point, which I think I can go ahead and quote, since the issue is public: > > I'd argue against making any requirements on UCDs [in these sorts of contexts]. > > UCDs are mainly intended for ad-hoc or discovery use, like: > > > > * a client sees an arbitrary VOTable and wants to get an idea what > > kind of physics is represented in order to suggest plots for it, > > perhaps match it with columns in other tables or similar. > > > > * a client is looking for tables having a particular sort of data in > > the registry. > > I'd like to push back a bit on how narrow that field of applicability is. > > In particular, in the context of service descriptors, we are > finding, as we actually implement the DataLink-intensive design of > the Rubin Science Platform (RSP), that client software often needs > some hints as to how to present, in a UI, the "optional" parameters > to a service named in a service descriptor. If a service > descriptor represents a service for which there is an IVOA > standard, of course, the client software UI can be written against > the whole of that standard. But if (as is _very_ frequently the > case for us) the service descriptor points to a custom data > service, the UCDs can be useful in providing rendering hints > without our having to hard-code the client software against the > specifics of the custom data services. 
Well, that is *exactly* the kind of ad-hoc (that is: not governed by a standard on the, if you will, presentation or application layers in ISO/OSI lingo) use I was talking about, and to enable that kind of thing we indeed need to urge data providers to annotate their data with UCDs (and usually advise them as to what good UCDs might be). What I was arguing against is that standards require ("MUST") all of column name, utype, and UCD at the same time, as that has led to a continuous stream of errata while actual clients didn't actually care because they were using either the name (e.g., obscore) or the utype (e.g., SSAP). And rightly so. We should give *one* way to find columns, and only one (my take: column names are just fine). A side benefit of not having exact UCDs as requirements in standards is, by the way, that providers can attach richer semantics to their columns as appropriate. As an example, take SSAP, which says that services can give a column with the utype Target.Redshift. The current spec *forces* that column's UCD to be src.redshift. Now consider an SSA service with Gaia RP/BP (low-resolution) spectra; whatever redshifts you estimate from those probably won't count as spectroscopic, and so to reinforce that point, I might like to have a UCD of src.redshift.phot there. Since SSA has fixed UCDs, my service would (probably[1]) become invalid then -- for no good reason. That's where my proposal comes from: In future standards, let's give advice as to suitable UCDs, but let's not require (and not even recommend ("should"), as that would result in warnings) them. Whether we ought to open up existing standards as we revise them -- well, you'd certainly have my support... 
-- Markus [1] Though of course, given that that column is tagged optional, it is unclear whether a validator should raise an error if there is a column with utype Target.Redshift but some other UCD; repeating my usual pitch: Let's avoid optional features, and let's be clear on what breaks if a requirement is violated. If it turns out that nothing breaks, then let's drop the requirement. From msdemlei at ari.uni-heidelberg.de Tue Aug 23 08:42:38 2022 From: msdemlei at ari.uni-heidelberg.de (Markus Demleitner) Date: Tue, 23 Aug 2022 08:42:38 +0200 Subject: DataLink local_semantics optional column proposal In-Reply-To: <9d4d5bca-eb2a-47ca-b1f9-cc409e76c77@andromeda.star.bris.ac.uk> References: <9d4d5bca-eb2a-47ca-b1f9-cc409e76c77@andromeda.star.bris.ac.uk> Message-ID: <20220823064238.eiftdkx2q2ratquc@victor> Hi DAL, On Fri, Aug 12, 2022 at 09:54:27AM +0100, Mark Taylor wrote: > So I'd like to see an additional column to facilitate this. > Markus has suggested the following definition for such a column: > > column name: local_semantics > type: text > UCD: meta.id.assoc > description: An identifier that allows clients to associate rows from > different datalink documents on the same service with each other. > [...] > > Any comments? Assuming some agreement or lack of disagreement is > established here about the general idea and specifics, then if > at least one data provider implements this I will add code in > topcat to make use of it. I think it's a good thing to have that, and in my book it's fairly orthogonal to semantics (or any other feature we have in datalink so far). Hence, I've put it into DaCHS (SVN only so far; DaCHS operators: if you want this, let me know and I'll make a beta release), and I've taught a service where this looks reasonable to spit out local_semantics. 
That's PPAKM31, a service giving narrow-band maps of HII regions in M31 from cubes (reference URL: https://dc.zah.uni-heidelberg.de/browse/ppakm31/q); the datalink essentially links together the maps in the various bands extracted from each cube, and hence the local semantics is just an (opaque) label with the band info. To try this, go to the TAP service at and execute: select accref, imagetitle, cube_id from ppakm31.maps In TOPCAT, you can then configure an activation action "Invoke Service"; the default "View Datalink Table" is fine. When you then activate a row, you will see the local_semantics column already in non-aware TOPCATs. Mark's proposed functionality would then let people say "whatever band I'm actually looking at, I always want the [OIII]5007 image of the field in my extra datalink window". Which, I'd say, is a highly reasonable thing to want. So... you'd totally have my support for adding this to a Datalink-1_2-Next page on the Wiki. -- Markus From msdemlei at ari.uni-heidelberg.de Wed Aug 24 10:54:02 2022 From: msdemlei at ari.uni-heidelberg.de (Markus Demleitner) Date: Wed, 24 Aug 2022 10:54:02 +0200 Subject: ADQL 2.1: Grammar simplification? Message-ID: <20220824085402.qsodqjousqdyd6b6@victor> Dear DAL, I have to admit it is only now that I realise that ADQL 2.1 liberalises ORDER BY and GROUP BY to also accept value expressions. That's going beyond SQL92, which has, as ADQL 2.0, ::= | In ADQL 2.1, we now say: ::= | | I'm not *really* objecting to these changes, in particular because it's not a problem for DaCHS with its postgres backend (and DaCHS will support this from version 2.6.2 onwards). But since it's a fairly profound change beyond SQL92 that *may* give people a headache on simpler SQL engines I figured it should be mentioned on the mailing list at least once. Meanwhile, I do not understand why the ADQL 2.1 rule has both *and* as alternatives. Perhaps I'm missing something, but since ? ? ? ? ? ? ? expands to as is. 
Was the explicit inclusion of done on purpose? If not, can we drop
the explicit to avoid future confusion?

A similar consideration applies to

  ::= |

In this case, by the way, I'm particularly hesitant about endorsing the
change. You see, anything you do not group by can only be in the
select list via aggregate functions, and thus allowing -s here raises
the question of expression equivalence big time.

For instance, if this rule has to have any sense at all, then

  select nrows+1 from tap_schema.tables group by nrows+1

would be a valid statement (it is in postgres). Mathematically,

  select 1+nrows from tap_schema.tables group by nrows+1

would plausibly be the same thing (whenever + is a commutative
operator). Should engines realise that? Postgres, for one, does not.
If they should realise that, would they also have to work out that

  select nrows*nrows+2*nrows*table_index+table_index*table_index
  from tap_schema.tables
  group by power(nrows+table_index, 2)

is fine, too?

All that is so subtle, and you can easily avoid having to group by
expressions at all -- in the example, you'd just say

  select modrows from
    (select 1+nrows as modrows from tap_schema.tables) as q
  group by modrows

-- that I'd suggest reverting the GROUP BY grammar to what it was in
ADQL 2.0 unless someone remembers a strong reason for why it was
changed in the first place.

         -- Markus
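For anyone who wants to try the subquery rewrite on a toy table, the following is a small sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: SQLite stands in for a real ADQL/TAP backend, the table name `tap_tables` is a made-up substitute for tap_schema.tables, and the row counts are invented.

```python
import sqlite3

# In-memory stand-in for tap_schema.tables (table name and data are
# assumptions made for this sketch, not taken from any real service).
conn = sqlite3.connect(":memory:")
conn.execute("create table tap_tables (table_name text, nrows integer)")
conn.executemany(
    "insert into tap_tables values (?, ?)",
    [("a", 10), ("b", 10), ("c", 20), ("d", 30)],
)

# Variant 1: group directly by a value expression; this relies on the
# engine accepting expressions in GROUP BY (SQLite does).
by_expression = sorted(
    row[0]
    for row in conn.execute(
        "select nrows+1 from tap_tables group by nrows+1")
)

# Variant 2: name the expression in a subquery and group by the plain
# column, which needs nothing beyond a column reference in GROUP BY.
by_subquery = sorted(
    row[0]
    for row in conn.execute(
        "select modrows from "
        "(select 1+nrows as modrows from tap_tables) as q "
        "group by modrows")
)

print(by_expression, by_subquery)
```

Both variants produce the same groups here; the second one sidesteps the question of whether the engine can tell that the selected expression and the grouping expression are equivalent.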