<div dir="ltr"><div class="gmail_default" style="font-size:small">** use cases</div><div class="gmail_default" style="font-size:small"></div><div class="gmail_default" style="font-size:small">The primary use case for our youcat service is for projects to publish astronomical catalogues they create and curate. To that end, the tables are added to the tap_schema and are visible via the /tables endpoint. The users/projects manage the access control themselves, so they decide who can create tables, who can insert rows, and who can query (all using an external GMS service). The general usage pattern is for tables to be protected (only the group can see/query them) until the project publishes a paper, at which point they would make the table publicly queryable.</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">We do not (yet) put table metadata into the registry so I haven&#39;t thought that bit through, but probably only public tables should go there and I&#39;d probably make it an additional manual step to &quot;publish&quot; (to registry) and not just have it triggered by a project admin changing a table to public (and then back again a day later). <br></div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">If you look at the details of the bulk loading, you see that it is a streaming operation that directly inserts rows into the database. There&#39;s a lot that can go wrong there: transient network failures, an input row rejected because of invalid values or a duplicate key, etc. By streaming input directly into the tables, the client can see the error message from the failed insert and can immediately query to find the last row that was successfully inserted in order to resume. Any async process is going to make that much harder, and it would be very hard to standardise so that clients could automatically recover from content failures. 
<span class="gmail_default" style="font-size:small">It&#39;s hard to push 500e6 rows into a database table without failures, but that&#39;s what youcat users do and with the ability to diagnose and resume they can 
eventually succeed. </span></div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">Our secondary use case is at the other extreme: the DRAO Solar Flux Monitor (not yet public/operational). This is a set of instruments that record and persist a handful of measurements a few times each day. The process is to add a few rows each day, so it is still &quot;append rows to table&quot; but at a very small scale, and it never finishes. This use case is also very nicely satisfied by our current implementation, which allows the client to immediately detect failures and retry and, if they are feeling extra cautious, query for recently ingested measurements to verify success.</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">-- Definitely interested in more use cases for user-generated database content...<br></div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">** about vospace<br></div><div>Both the error handling/failure/resume of real bulk loading and the<span class="gmail_default" style="font-size:small"> trickle of measurements from sensors benefit from a synchronous direct-to-database approach that can be immediately queried via the TAP API. We do have a complete vospace service (vault) that could accept/stage catalogue content and we did look at those heady ideas, but that approach is at least as complex, or maybe more so. That&#39;s the primary roadblock for the &quot;vospace&quot; ideas and as far as I am aware, no one has ever made it work. 
We stopped thinking about that approach during the design phase when the list of &quot;vospace magic&quot; things that had to happen and the opaqueness of such a system grew too large in comparison with the vosi-tables + bulk load approach.<br></span></div><div><span class="gmail_default" style="font-size:small"><br></span></div><div><span class="gmail_default" style="font-size:small">I&#39;ve kind of skipped the whole topic of indices, but we do have an async (uws) job endpoint to run create index commands either before or after bulk loading. For the typical bulk loaded catalogue use case, we recommend people create a unique key index (pk) before loading and other indices after. So we are re-using existing APIs where they are applicable, but this part was looking much more complicated as &quot;vospace magic&quot;.<br></span></div><div><span class="gmail_default" style="font-size:small"><br></span></div><div><span class="gmail_default" style="font-size:small">The final thing about &quot;vospace magic&quot; is that for someone who is into TAP and catalogues, requiring a vospace implementation in order to get user content into a tap service is a big ask. First, it&#39;s an implementation/deployment/operational burden to require a vospace someone might not otherwise want to offer; that&#39;s a big barrier to adoption. Second, you need to either (i) have your vospace service connecting to your tap database or (ii) have some external agent with access to both vospace content and the tap database, which has big red bad/monolithic architecture flags all over it; that&#39;s obviously a personal opinion, but I see a lot of tight coupling between two services that are already individually complicated to operate and that&#39;s something I want to avoid. 
We also thought a little about simply repurposing some parts of the vospace api rather than having a complete vospace for this, but it just didn&#39;t seem to buy very much here even where the concepts are the same.<br></span></div><div><span class="gmail_default" style="font-size:small"><br></span></div><div><div style="font-size:small" class="gmail_default">-- Would like to stop hearing about how someone once thought vospace could do this :-) unless of course someone wants to show a working service and explain how they made it work...<br></div><br></div><div><br></div><div>--<br></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div>Patrick Dowler<br></div>Canadian Astronomy Data Centre<br></div>Victoria, BC, Canada<br></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 21 Mar 2022 at 09:51, Dave Morris &lt;<a href="mailto:dave.morris@metagrid.co.uk">dave.morris@metagrid.co.uk</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

This is indeed one of the use cases that we had in mind for VOSpace.<br>

<br>

A section of space in a VOSpace service where the directory structure <br>

maps to the catalog/schema/table hierarchy of a writable database.<br>

<br>

Creating a &#39;file&#39; called &#39;mytable&#39; in &#39;mycatalog/myschema&#39; would create <br>

a new table.<br>

<br>

All of the object construction and access control rules map fairly well <br>

onto a virtual directory structure and from a user&#39;s perspective it can <br>

be made really simple.<br>

<br>

To create a new database table, just drag a VOTable file from my desktop <br>

into &#39;mycatalog/myschema&#39;, and the service takes care of the rest.<br>

<br>

As a side effect, you get all of the 3rd party asynchronous transfer <br>

capabilities needed to transfer a multi-Tbyte result set from one <br>

service to another.<br>

<br>

Cheers,<br>

-- Dave<br>

<br>

--------<br>

Dave Morris<br>

Research Software Engineer<br>

Wide Field Astronomy Unit<br>

Institute for Astronomy<br>

University of Edinburgh<br>

--------<br>

<br>

On 2022-03-17 07:22, Markus Demleitner wrote:<br>

&gt; <br>

&gt; The thing that worries me a bit about the current proposal is that<br>

&gt; the operations *are* fairly similar to what we offer in VOSpace, and<br>

&gt; if we have two rather different APIs for what&#39;s straightforwardly<br>

&gt; subsumed as remote data management, I think we should have strong<br>

&gt; reasons.<br>

&gt; <br>

&gt; Have you considered employing VOSpace for this?  If so, why did you<br>

&gt; discard it?  Could it perhaps be fixed to work for this use case?<br>

&gt; <br>

</blockquote></div>