Microsoft buys GitHub

Petr Skoda skoda at sunstel.asu.cas.cz
Tue Jun 5 16:05:02 CEST 2018


On Tue, 5 Jun 2018, Dave Morris wrote:

> Ok, I'll admit it. Markus Demleitner was right, GitHub might not be the safe 
> place I thought it was.
>

Hi Dave and all IVOA colleagues,

I was shocked to read this on Google this morning, just 10 minutes before 
your email arrived!

But as I was eagerly following the IVOA interop, reading the slides almost 
in real time, I think the answer to why MS has done this is in the nice 
slides of Giuliano Taffoni in KDDIG - and IMHO it is HPDA - High 
Performance Data Analysis. Also the example of Kai Polsterer about an 
ML-based explorative web service, together with most presentations from 
GWS II, makes real sense of it.

Not only science needs an explorative, scalable big-data platform (data to 
processing) based on visual analysis and similarity search (query by 
example ...).

Also, Jupyter delivered on the cloud in Docker containers is very trendy 
today, but it has already been showing its bottlenecks. You may remember 
our VO-CLOUD, which started as the bachelor and later master thesis of my 
student Jakub Koza. It does exactly the same as what Gerard was showing 
(SciServer): for every call it creates a Docker container and runs Jupyter 
on my "big" server, including HDFS and Spark. BTW, Jakub and I were among 
the first who noticed the need for Apache Avro for packing the small FITS 
files from LAMOST for Spark processing. The slides about ZTF alerts by 
Matthew clearly show that it is the right way for large-scale ML (namely 
on Spark), due to the big overhead of HDFS.

But all of these fancy technologies are mostly on GitHub. It is easy to 
take them, but it is still very complicated to install the stuff and put 
it into operation (e.g. our VO-CLOUD has servers and workers, and needs a 
database, libraries, JupyterHub and authentication; when you upgrade the 
Linux version it always breaks ...).

And now imagine that you will be able (after paying a substantial amount 
into MS pockets) to deploy, after a few clicks, a complex platform with a 
deep learning algorithm freshly published on GitHub, on thousands of cores 
+ multiple GPUs in a cloud, interfacing this with DB and web servers with 
properly configured ports, and after a few minutes you have a new site 
delivering on-demand customized ML infrastructure that you may re-sell.

As you remember, I am involved in a COST action where geo and remote 
sensing are integrated with astronomy and (deep) ML. And we have seen 
examples of really big money waiting for those who are able to deploy a 
customized ML platform for processing remote sensing hyperspectral 
frames. A lot of start-ups are already arising to deliver timely warnings 
about diseases, pests and fungi showing up on fields, vineyards etc ...

The key issue is that it is TARGETED - e.g. the farmer will order alerts 
for a few bucks when a particular thing happens on his field. And he will 
receive on his mobile phone the current image of his field/vineyard with a 
highlighted region and a classification of the changes. Or he gets a 
recommendation for the right time, when and where to deposit fertilizer 
(and how much). And targeted agriculture is just one of the endless 
applications of a customized classifier ...

So building customized ML websites in the cloud might become a new kind of 
small-company business ...

I do not know if this is what MS had in mind, but surely they will soon 
recognize such potential. So having the results of all fresh research in 
ML (in the open source world) on company servers, ready for immediate 
deployment, is worth 7.5B$.

What to do as a community?

Currently Zenodo seems to be the stable non-profit rock, having had an 
interface with GitHub for some years - so probably a lot of GitHub content 
is already on Zenodo (there are features allowing automatic ingestion 
after GitHub updates). It might be feasible to build a free repository of 
code on Zenodo ...

If Jonathan Fay reads this - can you comment on the transaction (if you 
are allowed to ;-) ?

Best regards,

Petr

*************************************************************************
*  Petr Skoda                         Phone : +420-323-649201, ext. 361 *
*  Stellar Department                         +420-323-620361           *
*  Astronomical Institute CAS         Fax   : +420-323-620250           *
*  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
*  Czech Republic                             skoda at asu.cas.cz          *
*************************************************************************

