Microsoft buys GitHub
Petr Skoda
skoda at sunstel.asu.cas.cz
Tue Jun 5 16:05:02 CEST 2018
On Tue, 5 Jun 2018, Dave Morris wrote:
> Ok, I'll admit it. Markus Demleitner was right, GitHub might not be the safe
> place I thought it was.
>
Hi Dave and all IVOA colleagues,
I was shocked to read this on Google this morning, just 10 minutes
before your email arrived!
But as I was eagerly following the IVOA interop, reading the slides almost
in real time, I think the answer to why MS has done this is in the nice
slides of Giuliano Taffoni in KDDIG, and IMHO it is HPDA
(high-performance data analysis). The example by Kai Polsterer of an
ML-based explorative web service, together with most presentations from
GWS II, makes real sense of it.
It is not only science that needs an explorative, scalable big-data
platform (from data to processing) based on visual analysis and
similarity search (query by example ...).
Jupyter delivered in the cloud in Docker containers is also very trendy
today, but it has already been showing its bottlenecks. You may remember
our VO-CLOUD, which started as the bachelor's and later master's thesis
of my student Jakub Koza. It does exactly the same as what Gerard was
showing (SciServer): for every call it creates a Docker container and runs
Jupyter on my "big" server, including HDFS and Spark. BTW, Jakub and I were
among the first to notice the need for Apache Avro for packing the small
FITS files from LAMOST for Spark processing. The slides about ZTF alerts
by Matthew clearly show that this is the right way for large-scale ML
(namely on Spark), because of the limits imposed by the big overhead of
HDFS.
But all of these fancy technologies are mostly on GitHub. It is easy to
get them, but it is still very complicated to install the stuff and put it
into operation (e.g. our VO-CLOUD has servers and workers, and needs a
database, libraries, JupyterHub and authentication; when you upgrade the
Linux version it always breaks ...).
And now imagine that you will be able (after paying a substantial amount
into MS pockets) to deploy, after a few clicks, a complex platform with a
deep learning algorithm freshly published on GitHub, on thousands of cores
plus multiple GPUs in a cloud, interfacing this with DB and web servers
with properly configured ports, and after a few minutes you have a new
site delivering on-demand customized ML infrastructure that you may resell.
As you remember, I am involved in a COST action where geo and remote
sensing is integrated with astronomy and (deep) ML. And we have seen
examples of really big money waiting for those who are able to deploy
a customized ML platform for processing remote-sensing hyperspectral
frames. A lot of start-ups are already springing up to deliver timely
warnings about diseases, pests and fungi showing up on fields, vineyards
etc ...
The key word is TARGETED: e.g. the farmer will order alerts for a
few bucks when a particular thing happens on his field. And he will
receive on his mobile phone the current image of his field/vineyard with
the highlighted region and a classification of the changes. Or he gets
a recommendation for the right time and place to apply fertilizer
(and how much). And targeted agriculture is just one of the endless
applications of a customized classifier ...
So building customized ML websites in the cloud might become a new kind of
small-company business ...
I do not know if this is what MS had in mind, but surely they'll soon
recognize such potential. So having the results of all fresh research in ML
(in the open-source world) on company servers, ready for immediate
deployment, is worth $7.5B.
What to do as a community?
Currently Zenodo seems to be the stable non-profit rock, having had an
interface with GitHub for some years, so probably a lot of GitHub content
is already on Zenodo (there are features that allow automatic ingestion
after GitHub updates). It might be feasible to build a free repository of
code on Zenodo ...
If Jonathan Fay reads this - can you comment on the transaction (if you
are allowed to ;-) ?
Best regards,
Petr
*************************************************************************
* Petr Skoda Phone : +420-323-649201, ext. 361 *
* Stellar Department +420-323-620361 *
* Astronomical Institute CAS Fax : +420-323-620250 *
* 251 65 Ondrejov e-mail: skoda at sunstel.asu.cas.cz *
* Czech Republic skoda at asu.cas.cz *
*************************************************************************