[SEMANTICS] Simulation Categories and Standard Vocab words

Laurie Shaw lds at ast.cam.ac.uk
Fri Jul 21 11:47:41 PDT 2006


Dear theory group,

I would just like to make a few comments (rather belatedly) on
the Semantics discussion, especially the emails by Franck, Herve and
Miguel. I've listed my points by topic.

UCDs vs Standard Vocabulary
----

It seems that there is a little confusion about whether we are trying to
define new UCDs or words for the Standard Vocabulary. Initially, on the
twiki page
(http://www.ivoa.net/twiki/bin/view/IVOA/TheorySemanticVocabulary) we
outlined lists of words under various categories (e.g. physical process,
subject, algorithm, etc.) describing different types of simulation
properties. In his email, Franck Le Petit then defined a specific set of
categories that must be filled in to adequately describe the purpose and
operation of a piece of simulation code (to be published in the VO).

As far as I understand it, we will require there to be a UCD describing
each of the categories that we eventually come up with. However, the words
that we use to populate a category will come from the IVOA standard
vocabulary: the set of words that can be used as UCDs, plus all standard
terms currently in use in astronomy, written in UCD syntax.

So, once we have decided on a set of categories, we will need to propose
a UCD for each category, plus the standard vocabulary words to be used
within those categories that we consider necessary for most astrophysical
simulation codes to be adequately represented.


Simulation Categories
----

The simulation categories proposed by Franck, which must be filled in to
fully describe a simulation, are as follows:

1 - Name of the code
2 - Name of the developer / team / contact
3 - Version of the code
4 - Description of the code (ASCII text)
5 - Physical processes
6 - Subject
7 - Algorithm
8 - Time evolution
9 - Type of results
10 - Results format

So far, it seems that no-one has any problems with the first four, for
each of which I think a UCD already exists (meta.id, meta.curation,
meta.version, meta.note). I don't think that we need to define any new
standard vocabulary (SV) words for these, as they are all fairly specific
to each simulation, although it could be argued that the 'Name' should be
part of the SV, as simulated datasets may point to this tag.

The next four categories, on the other hand, seem to be the most important
(at least in terms of defining new SV words) as they entirely describe
what the code is trying to do and how it does it.

Taking the 'Subject' category first, which describes the astrophysical
objects that the code is primarily dealing with: looking at the IVOA
standard vocabulary as it is now
(http://www.ivoa.net/internal/IVOA/IvoaUCD/VO-standard-vocabulary_8.pdf)
it seems that many of the objects are already roughly accounted for (I've
only had a quick glance at this though). The only things that I can see
that are not there are dark matter halo (and subhalo) and volume of space.
Note that we can still use the UCD syntax and structure for SV words, so
'stellar cluster' is 'stars.globular.cluster' or 'stars.cluster'.

The 'physical process' and 'algorithm' categories are by far the most
complicated, due to the sheer range of physical processes that are
modelled, the level of detail (approximate or exact), their relative
impact on the results (does a process, e.g. stellar feedback, have a major
or minor effect on whatever is being simulated) and how each is
incorporated.

It seems to me that 'physical process' could mean two things here:

1)	the overall phenomenon that is being simulated (e.g. galaxy
*formation*, stellar *evolution*)
2)	the physics that is accounted for in doing so (e.g. radiative
transfer, GR, etc.)

We could either go the way of having two separate categories, one for
'process' and one for 'physics', or just keep 'physical process' and have
multiple entries that together define what is going on and the physics
that is making it happen. It should be noted here that there is already a
'process' field in the SV (e.g. process.accretion, process.emission, etc.),
so I guess we have to propose those that are missing (e.g.
process.radtransfer, process.evolution).

So for stellar evolution we might have 'stars' in the Subject category
and 'process.evolution' and 'process.radTransfer' in the Physical
Processes category (or, alternatively, have the last two in separate
Process and Physics categories).
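
To make the two options concrete, here is a rough Python sketch (purely
illustrative, not an agreed format). The dictionary layout and the helper
all_words() are my own assumptions; the category names and vocabulary
words are the ones suggested above.

```python
# Option A: a single 'Physical Processes' category with multiple entries.
option_a = {
    "Subject": ["stars"],
    "Physical Processes": ["process.evolution", "process.radTransfer"],
}

# Option B: separate 'Process' and 'Physics' categories.
option_b = {
    "Subject": ["stars"],
    "Process": ["process.evolution"],
    "Physics": ["process.radTransfer"],
}


def all_words(description):
    """Return every vocabulary word used, ignoring which category holds it."""
    return sorted({word for words in description.values() for word in words})
```

Both layouts carry the same set of words; they differ only in how the
words are grouped, which is really the decision we have to make.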

I'm wondering whether this approach gets past the 'Process or Subject?'
category problem for 'stellar evolution' and 'stellar population
synthesis' that Herve pointed out.



Algorithm
----

As others have pointed out, for this category I think we have to be
careful not to be too specific, or else there will end up being millions
of words in the SV list for 'comp.algorithm' (or whatever it ends up being
called). I think that the words required for this category should only
refer to the top-most level algorithm. Looking at the list on the twiki,
it seems that some of the words suggested are almost code Names or
Physical Processes -- I don't think 'collisionless' or 'Fuel consumption
theorem' are algorithms (although I could be wrong). Furthermore, tpm,
pppm, pm, pp, etc. are all types of Nbody code, so the word for tpm might
be 'comp.alg.Nbody.tpm', whilst I'm guessing 'adaptive refinement mesh'
would be 'comp.alg.mesh.adaptive-refinement', or something like that.

I'm thinking that in the near term most of the entries for algorithm will
be under 'mesh', 'hydro' (including sph) or 'nbody', with a few extras. If
we decide to get more specific than this, then the number of words we'll
require to describe different simulations to the same degree will increase
exponentially. If someone were to need more detail, they could always look
at the 'Description' category, or even a paper that it points to, having
first narrowed down the search to tree-sph, etc.
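
As a rough sketch of how such a top-level search might work with
dot-separated words, something like the following Python would do. The
code names (code_a, etc.) and the functions matches() and search() are
hypothetical, and 'comp.alg.Nbody.pm' is only my extrapolation from the
tpm example above.

```python
def matches(word, query):
    """True if `word` equals `query` or sits below it in the dot hierarchy."""
    return word == query or word.startswith(query + ".")


def search(codes, query):
    """Return the names of codes with at least one word at or below `query`."""
    return sorted(
        name
        for name, words in codes.items()
        if any(matches(word, query) for word in words)
    )


# Hypothetical registry: code name -> words in its Algorithm category.
codes = {
    "code_a": ["comp.alg.Nbody.tpm"],
    "code_b": ["comp.alg.mesh.adaptive-refinement"],
    "code_c": ["comp.alg.Nbody.pm"],
}

nbody_codes = search(codes, "comp.alg.Nbody")
```

Here a search on 'comp.alg.Nbody' picks up both the tpm and the pm code
without the user needing to know the more specific words.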

Time Evolution
----

Do we really need this? In terms of methods like 'leap frog', etc., it
seems like this is more for the Algorithm category (and a very specific
detail at that). I would have thought that the temporal resolution,
included under some kind of Parameters category for individual simulated
datasets, might be more relevant. I agree with Franck that this should at
most be a YES or NO flag indicating whether the code is time dependent
or not.


Type of Results and Results Format
----

I totally agree with Franck's suggestion here.


Code language and 'parallelism'
----

I also propose a new category (Language?) for the language (C++, Fortran,
etc.) in which the code is written and whether it is designed for use on
multi-processor machines or clusters. I guess these could also be under
two separate categories. Parallelism could be described either by protocol
(MPI, OpenMP) or by a flag (yes/no).


Single or Multiple Entries in each Category
----

I think that we should not limit any of the categories (except maybe
Version and Time Evolution) to just one entry. Doing so could cause
problems down the road as simulations get more and more powerful in size
and especially scope.
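
For illustration only, a check of this rule might look like the Python
below. The set SINGLE_ENTRY, the function over_limit() and the example
values (such as version '1.0') are assumptions of mine, not anything we
have agreed on.

```python
# Categories assumed to allow only one entry, as suggested above.
SINGLE_ENTRY = {"Version", "Time Evolution"}


def over_limit(description):
    """Return single-entry categories that have been given several entries."""
    return sorted(
        category
        for category, entries in description.items()
        if category in SINGLE_ENTRY and len(entries) > 1
    )


# Hypothetical description with several entries in the open categories.
example = {
    "Version": ["1.0"],
    "Time Evolution": ["yes"],
    "Subject": ["stars", "stars.cluster"],
    "Algorithm": ["comp.alg.Nbody.tpm", "comp.alg.mesh.adaptive-refinement"],
}

problems = over_limit(example)
```

Every other category is free to hold as many words as a code needs.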


Would be great to hear people's thoughts on these topics, and to make some
decisions so that I can write a proposal that we agree on!

Sorry for the long email,

Cheers,

Laurie


