Apps Messaging - Semantics of a Message

Fri Apr 6 16:41:01 PDT 2007

Hi Mike,

On 4/6/07, Mike Fitzpatrick <fitz at tucana.tuc.noao.edu> wrote:
>
> Hi All,
>
>         The thread has died down a bit and while I think we've agreed there
> needs to be some sort of "Hub" process (this is not uncommon in other
> messaging systems) and made some good progress on how an app might connect
> to a messaging system, there is much left to be discussed about "What is a
> Message" and "What does it mean".  Keeping in mind our desire to separate
> the message spec from any particular implementation or transport protocol,
> I offer the following fodder for discussion.

We haven't touched on the messages themselves at all yet, so thanks for
kicking this off.  There's a lot of work and some great stuff in here.
In the following I've used your new "mtype" term interchangeably with
"message ivorn" (our equivalent concept in Plastic), and also occasionally
with "message", which is a bit sloppy of me.

[]
>
>   Note that the 'Hub' and transport are not discussed; I'd like to focus
> on the semantics and purpose of the messages for the moment.

Although we want to keep them separate, they inevitably affect each other,
so I've just fired off some thoughts on a first cut at the hub protocol.

>         So, let the debate begin and thanks for your feedback....
>
> -Mike
>
> Message Concepts
> ----------------
>
>         A MESSAGE is an abstract container for the information we wish
> to send to another application.  Some attributes of a message are required
> to ensure proper delivery and handling, but there are also optional
> attributes that may only be used in a specific context (e.g. to return a
> result/response rather than to make an original request for some action).
> Attributes such as the type or ID of a message will be required to be
> delivered to all recipients; however, an attribute such as the sender of a
> message may be optional and depend on the implementation.

Could you explain a little what you mean by implementation here?  Do you
mean the implementation of a hub?

>         MESSAGEs can be described generally as being a NOTIFY, a REPLY,
> or a REQUEST message.  NOTIFY messages are purely informational and require
> no response or confirmation of delivery.  A REQUEST message is one that
> asks another application to perform some action; in this case the
> application SHOULD reply explicitly with a status message indicating
> whether the request was completed; however, it is not an error if it does
> not.  [Note: Applications may set a property in the Hub to request
> confirmation of delivery of every message and so are free to decide for
> themselves whether a missing reply is to be considered an error.] A REPLY
> message is one sent to return a status code or other result in response to
> an originating message; REPLY messages are tagged with the same message ID
> as the original and are returned to the sender of the originating message
> as well as apps that wish to monitor the message traffic.

The NOTIFY/REPLY/REQUEST distinction is useful for us as mtype authors to
understand new mtypes that we're trying to write, and learn (patterns?) from
existing mtypes.  Do you see this as something that machines will also need
to know?  e.g. a message has a parameter giving its type?  Surely if an
application understands a message, it will know whether or not a response is
needed, and what form that response will take?
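To make the NOTIFY/REQUEST/REPLY distinction concrete, here is a minimal sketch. It assumes the class is carried as an explicit field on the message, which is one of the two options the question above raises, and all class and field names are hypothetical. The key property it shows is that a REPLY reuses the originating msgid and is addressed back to the original sender.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MsgClass(Enum):
    NOTIFY = "notify"    # purely informational; no response expected
    REQUEST = "request"  # asks the recipient to act; a reply SHOULD follow
    REPLY = "reply"      # returns a status or result for an earlier message


@dataclass(frozen=True)
class Message:
    msg_class: MsgClass
    mtype: str
    args: tuple = ()
    msgid: Optional[str] = None      # assigned by the Hub when the message is sent
    sender: Optional[str] = None     # filled in by the Hub on delivery
    recipient: Optional[str] = None


def make_reply(original: Message, result: str) -> Message:
    """Build a REPLY that reuses the originating msgid and goes back to its sender."""
    if original.msg_class is MsgClass.NOTIFY:
        raise ValueError("NOTIFY messages do not expect a reply")
    if original.msgid is None:
        raise ValueError("the Hub has not assigned a msgid to the original message")
    return Message(
        MsgClass.REPLY,
        original.mtype,
        (result,),
        msgid=original.msgid,
        recipient=original.sender,
    )
```

If the class were instead implied by the mtype, as the reply suggests, the `msg_class` field would simply be dropped, and `make_reply` would rely on the application knowing that the mtype expects a response.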

>         More specifically, a MESSAGE contains an 'mtype' attribute that
> defines the semantic meaning of the message.  The concept behind the
> 'mtype' (described in more detail below) is based loosely on the use of
> UCDs in that a (small) controlled vocabulary is sufficient to describe the
> majority of concepts needed in applications messaging.  The mtype is made
> up of 'atoms' to construct 'words' that are not only meaningful to the
> developer, but also allow applications to use wildcard patterns to filter
> messages that may or may not match a specific capability in the
> application.  For example, an mtype expression of "display.*" might imply
> some sort of image display capability, where 'display.URL' means an app can
> specifically download/display an image from a URL.  A task may wish to
> connect by advertising interest in the more general display messages, but
> is free to reject specific messages it cannot handle.

I like this idea.  We had considered using pattern matching on message
ivorns in PLASTIC, but didn't pursue it.  One area where this way of
constructing mtypes could be really useful is a problem that we never really
solved in PLASTIC.  If you want to tell an application to load a spectrum
then it's not clear how to define the message.  Do you do it by data
format?  So your application supports the messages: loadVOTable,
loadFITS....etc.  All very well, but then what if someone sends you a
VOTable that isn't a spectrum?  Or do you do it by the semantic meaning of
the data?  So you have a loadSpectrum message, that takes an argument
describing the format.  Then what happens if you receive a format you don't
understand?  Or do you do both: loadSpectrumVOTable, loadSpectrumFITS.....?
Your mtypes might deal with this quite well.
Even if we didn't exploit the pattern matching aspect, it still offers a
consistent and logical way that people can build up new messages.
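The wildcard filtering described above can be sketched with Python's standard `fnmatch` module. This is a minimal illustration, not a proposed matching rule: the client names and subscription lists are hypothetical, and note that in `fnmatch` a `*` also matches dots, so "display.*" would match "display.image.url" as well as "display.url".

```python
from fnmatch import fnmatchcase


def accepts(subscriptions, mtype):
    """True if any subscribed wildcard pattern matches the mtype (case-sensitive)."""
    return any(fnmatchcase(mtype, pattern) for pattern in subscriptions)


def route(mtype, registry):
    """Return, sorted, the clients whose subscriptions match the mtype."""
    return sorted(name for name, subs in registry.items() if accepts(subs, mtype))


# Hypothetical clients and the patterns they advertised on connection.
registry = {
    "ds9": ["display.*"],
    "topcat": ["table.load.*", "app.*"],
    "logger": ["*"],
}
```

With this registry, `route("display.url", registry)` selects `ds9` and `logger`, while an mtype nobody advertised specifically, such as "spectrum.load.fits", reaches only the catch-all `logger`.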

>         Because we wish to loosely couple the capabilities one application
> is searching for with the details of what another may provide, we don't
> create a rigorous definition of a message and its behavior in an
> application.

Absolutely.

> Instead, the mtype is meant to create a "rough concept" of a
> message such as "display an image";

I'd go further.  As Alasdair has already mentioned, we've started to move
towards an even more general model where the sender just says "here is an
image: do something with it"...or "here's a couple of columns from that
table I sent you earlier".  In the latter case, Topcat might create a new
view on the table just showing those columns.  In response to the same
message, AstroWeka might run a kmeans algorithm on the two attributes
comprising those columns.  This sounds quite confusing: how is the user
supposed to know what will happen?   Well, we found a solution that seems
quite promising...it's quite involved to explain, so I've bracketed it
with =========== in case you want to skip past it.
=============================
We came up with the idea that a receiving application can annotate an mtype
with an arbitrary string describing what it would do on receipt of such a
message.  This annotation would be used by the _sending_ application to
inform the user what will happen if he sends that message.   In Plastic we
do it with URI fragments appended to the message ivorn (mtype in your
notation).  Here's a trivial example: consider the mtype we use to load a
VOTable: ivo://votech.org/loadVOTable

On receipt, application A might display the table, while application B might
convert it to a FITS table.  In that case, application A would declare that
it supported the mtype:
ivo://votech.org/loadVOTable#display
while B would declare that it supported
ivo://votech.org/loadVOTable#convertToFITS

Both mtypes are completely equivalent to the vanilla
ivo://votech.org/loadVOTable - they get sent with exactly the same arguments
(namely a VOTable), and the application that sends the message will have no
understanding of what the fragment means.  It will just use it to display to
the user.  For example it might populate a menu thus:
send table to -> Application A: display
                    -> Application B: convertToFITS

In practice, other tricks are used to make the menu more user-friendly
(supplying a human-readable name, description and icon for the action).  If
you want to see it in action try starting two instances of
http://plastic.sourceforge.net/tupperware/tabview/tabview.jnlp
along with Topcat
http://www.star.bris.ac.uk/~mbt/topcat/topcat-full.jnlp

Use Topcat's PLASTIC hub (see interop menu), and use it to send a VOTable to
the TabViews (e.g. use Topcat's built-in example data).  Then in one of
the TabViews send the VOTable to the other (using the PLASTIC menu).  You'll
see that you have two (trivial) options for sending the table: create a new
tab in the recipient TabView, or replace an existing one.  This is because
TabView has declared that it supports both:
ivo://votech.org/loadVOTable#createNew
ivo://votech.org/loadVOTable#overwrite

=============================
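The fragment-annotation scheme in the bracketed section can be sketched as follows. It is a minimal illustration assuming the annotations are plain URI fragments, so the standard `urllib.parse.urldefrag` can separate them from the underlying mtype. The function names are hypothetical. The sender compares only the part before "#" and uses the fragment purely as a menu label, which mirrors the point that the sender attaches no meaning to it.

```python
from urllib.parse import urldefrag


def split_annotation(declared: str) -> tuple:
    """Split a declared mtype into (plain mtype, annotation); the annotation may be ''."""
    plain, fragment = urldefrag(declared)
    return plain, fragment


def build_menu(declarations: dict, mtype: str) -> list:
    """List 'app: annotation' entries for every app declaring some variant of mtype."""
    entries = []
    for app, declared_mtypes in declarations.items():
        for declared in declared_mtypes:
            plain, note = split_annotation(declared)
            if plain == mtype:
                # Fall back to the bare mtype when no annotation was declared.
                entries.append(f"{app}: {note or mtype}")
    return entries
```

Fed the declarations from the example above (A: `#display`, B: `#convertToFITS`, TabView: `#createNew` and `#overwrite`), `build_menu` for `ivo://votech.org/loadVOTable` yields one menu entry per declared action.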

> The use of a specific mtype might also
> suggest the parameters required (e.g. a filename or URL) to form a valid
> request and so sending apps will have some hope that another listening app
> can do something sensible without knowing the detailed capabilities of the
> receiving application.  As a part of the specification an mtype such as
> 'display.url' may require that a URL be the only argument and expect no
> return value, however by sending a message to a *named* application we
> assume the sender wishes to use that app for some specific reason and may
> know the details of how a particular message is implemented, and so it is
> free to use an argument list peculiar to the target app and perhaps expect
> a reference of some kind to the downloaded image for later processing.

I think that for many messages we'll want to send a set of mandatory
parameters, plus a bunch of optional ones.  These optional ones could indeed
be application-specific, though even then you might have optional parameters
that are applicable to a certain class of applications (e.g. all
visualization apps might be expected to understand "color").  What I'm
saying is that we might sometimes want to write down and standardize
optional parameters.
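The idea of mandatory parameters plus a set of optional ones, some standardized and some application-specific, can be sketched as a small per-mtype check. This is a minimal illustration under assumptions: the `SPECS` table, its parameter names and the function name are all hypothetical. Missing mandatory parameters are rejected. Unrecognized extras are returned rather than rejected, because they may be private to the target application.

```python
# Hypothetical per-mtype parameter specifications (illustrative names only).
SPECS = {
    "display.url": {"mandatory": {"url"}, "optional": {"color", "frame"}},
}


def check_params(mtype: str, params: dict) -> list:
    """Raise if a mandatory parameter is missing; return unrecognized extras, sorted."""
    spec = SPECS.get(mtype)
    if spec is None:
        raise KeyError(f"no parameter spec for mtype {mtype!r}")
    missing = spec["mandatory"] - set(params)
    if missing:
        raise ValueError(f"{mtype}: missing mandatory parameter(s) {sorted(missing)}")
    known = spec["mandatory"] | spec["optional"]
    # Extras are not an error: they may be application-specific.
    return sorted(set(params) - known)
```

For example, `check_params("display.url", {"url": ..., "color": "red", "zoom": 2})` accepts the message and reports `zoom` as an application-specific extra.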

> Likewise, any two client apps are still free to use a mutually agreed
> private set of messages outside of the mtype vocabulary and exchange them
> using the same underlying system.  If we were to tag the message as
> something like 'generic;display.URL', an app MAY choose to provide only
> the functionality implied by the spec for that message (e.g. not save a
> result reference); otherwise it may always reply with a reference that
> a sender might ignore.

Like Alasdair, I feel this is essential.  I see our role as being the means
through which the community can standardize and record commonly-used
messages, not as a rigidly controlling body.  If someone wants to go off and
build a better loadTable mtype, then they're welcome to.  They'll probably
get better interop if they work with us though.
Not only can we work to create the mtypes to satisfy the current common use
cases, we can also (hopefully) put the experience to use in suggesting (not
mandating) good patterns and best practice.

> Message Attributes
> ------------------
>
>         The list of message attributes is intentionally small to maintain
> simplicity.  We will assume that the attribute names listed here are
> retained when a message is serialized or queried.  These include:

In my recent post on the messaging protocol, I mentioned that we need to
decide which parameters need to be part of the message, and which are part
of the messaging protocol.  My _feeling_ is that the mtype and the args[]
list are definitely part of the message, while the sender/receiver
information falls under the remit of the messaging protocol.

>     sender      The <appName> of the sender application.  The Hub will
>                 supply this attribute to the receiving application.  The
>                 implementation should not require that a "sender ID" be
>                 present in a message request, since this allows spoofing
>                 of the message; moreover, application IDs are more
>                 properly kept as an attribute in the Hub than known by an
>                 app that must learn what ID the Hub assigns.  Just as there
>                 may be multiple identical instances of a Recipient, a
>                 Sender should be likewise free from a specific instance
>                 defined by a Hub ID.

Sorry Mike - I don't quite understand this.

>     recipient   The recipient of the message.  This is a String value
>                 supplied by a sending application that may be one of the
>                 reserved words:
>
>                 Hub         Message is intended for the Hub only and
>                             will not be forwarded to other apps.  May
>                             only be used with SET and GET class messages.
>                             (Message classes are discussed below).

This could also be achieved by the hub having its own Id and being treated
like any other app.  If the hub always assigns itself the Id "Hub", I guess
it comes to the same thing.  Having a reserved word removes the need for a
"getHubId" operation in the messaging protocol, so I'm happy either way.

>                 Any         Sender wishes to broadcast to all clients
>                             currently connected.  These are typically
>                             used only with STATUS and EVENT class msgs.
>                             Applications that cannot, or choose not to,
>                             handle the message are free to ignore it
>                             without posting a reply notification.

Again, this could also be achieved by having a distinct broadcast operation
in the protocol, rather than using a reserved word. I would argue that if an
application doesn't declare that it can handle the message, then the hub
shouldn't forward it to it, even in broadcast mode.

>                 Additionally, the recipient may be one of:
>
>                 <appName>   Indicating the message should be sent to all
>                             clients with this <appName>.

We need to decide whether apps are identified by a session-unique,
hub-assigned key, or a (possibly non-unique) application-assigned name.  I'd
argue for the former as less likely to get us into trouble.  I do see the
benefits of your worker1 & worker2 use case below though.

>                 <pattern>   Indicating the message should be sent to all
>                             clients that have registered a capability to
>                             handle messages matching the specified
>                             pattern.  The use of '*' as a <pattern> is
>                             a special case of the reserved word 'Any'

Perhaps I'm missing something here, but isn't this implicit in the mtype of
the message being sent?  That is, if an application has registered its
interest in mtypes foo.*.bar it will receive messages foo.donkey.bar,
foo.duck.bar etc.

>                 In these last two cases, the Hub will first attempt to
>                 match based on the <appName> before the <pattern>.
>                 Wildcards are permitted in the <appName> to allow sending
>                 to a subset of apps (e.g. sending to "worker*" will deliver
>                 the message to identical instances of both the 'worker1'
>                 and 'worker2' apps as well as multiple instances of an app
>                 that connected simply using the name 'worker').
>
>     msgid       A unique id assigned to each message by the Hub and included
>                 as a message attribute for the RECEIVEd message.  This
>                 value is returned to the sender as a response to the SEND
>                 method; the msgid will remain the same in a REPLY method
>                 message so the sender can identify the originating message.
>
>     mtype       A UCD-like string indicating the request or message type.
>                 The semantics and syntax of the mtype are described in more
>                 detail below.
>
>     refID       An ID assigned by an app to be used as a reference to a
>                 result object in future messages it may receive.  In some
>                 cases this may be an opaque handle to something like "the
>                 image you loaded in the display", in other cases it can be
>                 de-referenced to an actual file/image that was created
>                 (e.g. the output of a "Save Image" message).  Applications
>                 may choose to not return a refID if it is a purely
>                 transient result (e.g. a plot of an image histogram where
>                 the data for the plot is not saved) or cannot meaningfully
>                 be referenced by a later message (e.g. a subset of table
>                 rows that may be highlighted but cannot be addressed as a
>                 new table).

A refID is essential in some cases, but I think it's mtype-specific.  Thus,
the mtype describing the loading of a table will need it, as you might want
to refer to the table again later (when selecting rows, for example).  But
it's not needed by an "application has registered" NOTIFY mtype, for
example.  As such, I think this is best bundled in with the other
arguments.  Nevertheless, the mtypes we define for loading tables, images
etc...will all have a refID and thus serve as patterns for other people who
define something similar.
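The recipient rules quoted above ("Hub" and "Any" as reserved words, then an <appName> match with wildcards, then a <pattern> match) can be sketched as a Hub-side resolver. This is a minimal illustration under two assumptions. First, clients are keyed by a Hub-assigned id and carry an appName plus a list of registered mtype capabilities. Second, a <pattern> recipient is matched against those capability strings, since the quoted text leaves the matching direction open. All names are hypothetical.

```python
from fnmatch import fnmatchcase


def resolve_recipients(recipient: str, clients: dict) -> list:
    """clients maps a Hub-assigned id to (appName, [registered mtype capabilities])."""
    if recipient == "Hub":
        return []  # handled by the Hub itself; never forwarded to a client
    if recipient in ("Any", "*"):
        return sorted(clients)  # '*' as a pattern is treated like 'Any'
    # First try the recipient as an (optionally wildcarded) appName...
    by_name = sorted(cid for cid, (name, _) in clients.items()
                     if fnmatchcase(name, recipient))
    if by_name:
        return by_name
    # ...and only then as a pattern over the registered capabilities.
    return sorted(cid for cid, (_, caps) in clients.items()
                  if any(fnmatchcase(cap, recipient) for cap in caps))


# Hypothetical connected clients, including the worker1/worker2/worker case.
clients = {
    "c1": ("worker1", ["compute.run"]),
    "c2": ("worker2", ["compute.run"]),
    "c3": ("worker", ["compute.run"]),
    "c4": ("ds9", ["display.url", "display.fits"]),
}
```

Here "worker*" reaches c1, c2 and c3 by name, while "display.*" matches no appName and falls through to the capability match, reaching only c4.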

>     arguments   A whitespace-delimited string specifying the arguments of
>                 the message.  In the case of a REPLY message this will be
>                 either a string containing the response value, or in the
>                 case of a request that does not return a value, one of the
>                 reserved strings:
>
>                 OK          Request was executed w/out error
>                 ERR         Request encountered an error
>                 REJECTED    Recipient refused to process the request
>                 DELIVERED   Recipient received the message but did not reply

Again, I see this more as a good pattern, rather than mandatory.  "If you
define a message that could be categorized as a REPLY, then we've found this
particular pattern to be effective..."
As for the argument list in general, I'd prefer not to tie it down to being
a white-space delimited string just yet.  In Plastic we've allowed arguments
to be a subset of the XML-RPC types, which is quite rich and includes such
useful things as structs and arrays.  That said, it might be a little too
general, as we do need to make sure that it can be mapped losslessly to and
from our wire protocols, and too rich a set of types would restrict our
choices.  Furthermore, we have experienced some problems with dynamic
languages such as Perl having trouble dealing with the strong typing implied
by the XML-RPC types.
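The whitespace-delimited argument string and its reserved reply words can be sketched as follows. This is a minimal illustration assuming a plain `str.split()` tokenizer, and the function name is hypothetical. It also makes the concern above concrete. A value containing spaces is silently split. A result that happens to be the string "OK" cannot be told apart from the status. A typed argument list such as the XML-RPC subset avoids both problems.

```python
RESERVED_STATUS = {"OK", "ERR", "REJECTED", "DELIVERED"}


def parse_reply(arguments: str) -> tuple:
    """Return (status, values): a lone reserved word is a status, else split values."""
    tokens = arguments.split()
    if len(tokens) == 1 and tokens[0] in RESERVED_STATUS:
        return tokens[0], []
    return None, tokens
```

For instance, `parse_reply("my image.fits")` returns two values, `"my"` and `"image.fits"`, rather than one filename.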

> Message Representation
> ----------------------
>
>         A MESSAGE can be represented in a number of ways suitable to the
> needs of the transport protocol and/or what is simplest for the sending or
> receiving client.

Yup.

> For example, as a simple string of keyword=value pairs
> listing the attributes of the message, as a serialized XML document, as an
> RPC call based on the message type and arguments, or as parameters for an
> HTTP GET request.  Attributes not explicitly present in a message are
> assumed to be undefined; required attributes may not be blank, and an
> application should trigger an error if they are.
>         Note that no requirement is made that ALL attributes of a MESSAGE
> are presented to the recipient.  Just as a sender may not care who gets
> the message, a recipient may not care who sent it -- however provisions
> should be made for clients to get this information if they choose.  This
> can be done e.g. by the recipient querying the Hub for a message attribute
> based on its message ID, or by setting a property in the Hub that requests
> messages be delivered with this extra information.

Why not just send the attributes anyway and the recipient can ignore them?
Simpler than adding an operation for the recipient to get the attribute
after the message has been delivered, or do you have some other use in mind
that I don't see?
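The keyword=value (HTTP GET style) representation and the rule on blank required attributes can be sketched with the standard `urllib.parse` helpers. This is a minimal illustration under assumptions: the `REQUIRED` tuple and the function names are hypothetical, and the quoted text does not fix which attributes are required. Absent optional attributes are simply undefined, while absent or blank required ones raise an error.

```python
from urllib.parse import parse_qsl, urlencode

# Assumption: which attributes count as required is not settled in the text.
REQUIRED = ("mtype", "msgid")


def encode_message(attrs: dict) -> str:
    """Serialize message attributes as a percent-encoded keyword=value string."""
    return urlencode(attrs)


def decode_message(text: str) -> dict:
    """Parse keyword=value pairs; raise if a required attribute is absent or blank."""
    attrs = dict(parse_qsl(text, keep_blank_values=True))
    blank = [key for key in REQUIRED if not attrs.get(key)]
    if blank:
        raise ValueError(f"required attribute(s) missing or blank: {blank}")
    return attrs
```

A round trip preserves values such as URLs, and `decode_message(...).get("sender")` is `None` when the sender was not included, matching the "assumed to be undefined" rule.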

> Message Delivery
> ----------------------
>
>         All messages are sent asynchronously from the Sender's perspective.
> The method that a client uses to send a message MUST return the msgID
> assigned by the Hub for that message.  Because a REPLY message is tagged
> with the originating msgID, a client that needs synchronous behavior can
> simply block until it receives a reply message with the same id.  This also
> allows for multiple recipients to reply individually without the sender
> knowing how many potential recipients are available and provides a
> "broadcast" behavior even in cases where a recipient app is named (i.e.
> because multiple instances of that app could be connected).
>
>         In this model, clients can determine for themselves how many other
> apps are available, what their capabilities are, etc.  This differs
> slightly from the PLASTIC model where messages are either synchronous
> (blocking the client or the Hub waiting for replies), or asynchronous (and
> thereby losing all replies) based on the delivery method invoked.  A Hub
> may still need to maintain information about attached clients in order to
> determine the recipients of a message, but the messages being exchanged
> and the number of apps involved in any one exchange are separate from the
> underlying transport.

An asynchronous-and-still-get-the-replies approach could be fudged in
Plastic by having the sending application include a messageId in the
arguments list for the recipients to use in a reply message.  However,
there's no foolproof way to guarantee the uniqueness of the messageId -
having the hub provide it as you suggest is much better.  I still think
there's a case for also allowing the synchronous model, and I set it out in
my other email.
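The delivery model above can be sketched as a toy, in-process stand-in for a Hub. Sending returns a Hub-assigned msgid immediately, recipients reply independently, and a sender wanting synchronous behavior blocks on replies carrying that msgid. This is a minimal illustration, not a transport design. `ToyHub`, its methods, and the use of plain Python functions as recipients are all hypothetical. A sender that does not know how many recipients exist could pass `count=1`, or wait with a timeout and take whatever has arrived.

```python
import itertools
import threading
from collections import defaultdict


class ToyHub:
    """In-process stand-in for a Hub: assigns msgids and collects replies by msgid."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._replies = defaultdict(list)   # msgid -> [(replier, result), ...]
        self._cond = threading.Condition()

    def send(self, mtype, args, recipients):
        """Asynchronous from the sender's side: returns the msgid at once."""
        msgid = f"msg-{next(self._ids)}"
        for handler in recipients:
            threading.Thread(target=self._deliver,
                             args=(handler, msgid, mtype, args)).start()
        return msgid

    def _deliver(self, handler, msgid, mtype, args):
        result = handler(mtype, args)       # each recipient replies on its own
        with self._cond:
            self._replies[msgid].append((handler.__name__, result))
            self._cond.notify_all()

    def wait_for_replies(self, msgid, count=1, timeout=None):
        """Block until `count` replies tagged with msgid arrive (or timeout)."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._replies[msgid]) >= count, timeout)
            return list(self._replies[msgid])


def worker1(mtype, args):
    return "OK"


def worker2(mtype, args):
    return "OK"
```

Usage: `mid = hub.send("compute.run", ("x",), [worker1, worker2])` returns without waiting, and `hub.wait_for_replies(mid, 2, timeout=5)` then gives synchronous behavior on top of the asynchronous send.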

> Message Types
> ----------------------
>
>         As mentioned above, the 'mtype' is a message type similar to the
> UCD+ convention of using a controlled vocabulary to build up more complex
> meaning.  For example, the mtype 'display.image' is logically understood to
> mean "display an image"; however, the actual actions taken by applications
> can be quite different in response to this message even though they "make
> sense" for that particular application.

Yes - agree completely with all this (cf my earlier ramblings).

 []

>
>         The list of controlled words and their hierarchy will no doubt
> suffer the same debates as with UCDs, but there is promise of consensus.

I think the fact that we are not saying: "you can only send messages of
mtypes made of these atoms" will make agreement easier.

> The initial list should be small and cover current usage but should also be
> extensible to apps and use-cases not currently in use.  Below we list a
> suggested hierarchy based on minimal consideration:
>
>     Notify (no response required)
>         app.*                           Application status msgs
>             connected
>             disconnected
>             error

+ hub shutdown message?
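A hierarchy like the one sketched above can be held as a nested tree of atoms and expanded into dotted mtypes. This is a minimal illustration: the tree contains only the quoted app.* branch plus a "hub.shutdown" entry standing in for the suggested hub shutdown message, whose name is an assumption. `is_standard` only reports membership in the controlled vocabulary. Other mtypes remain legal, in line with the point that the atoms are not a closed list.

```python
# Hypothetical vocabulary fragment; "hub.shutdown" is an assumed name.
VOCAB = {
    "app": ["connected", "disconnected", "error"],
    "hub": ["shutdown"],
}


def flatten(tree, prefix=""):
    """Expand a nested {atom: subtree} / [leaf atoms] tree into dotted mtypes."""
    if isinstance(tree, dict):
        mtypes = []
        for atom, subtree in tree.items():
            mtypes.extend(flatten(subtree, f"{prefix}{atom}."))
        return mtypes
    return [prefix + leaf for leaf in tree]


def is_standard(mtype, tree=VOCAB):
    """True if mtype is in the controlled vocabulary; other mtypes are still allowed."""
    return mtype in flatten(tree)
```

Deeper branches nest the same way, e.g. `{"display": {"image": ["url", "file"]}}` expands to "display.image.url" and "display.image.file".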

 []

>
>
> Rigorous Message Types
> ----------------------
>
>         The PLASTIC implementation of messages is based on the idea that a
> message type is an 'ivorn'.  The primary argument for this approach is that
> the Registry can be used as a central repository for a description of the
> message; however, there are several problems in a generalized system:
>
>     - The message ivorn is based on an ad-hoc agreement between developers
>       of specific apps and does not specify the semantics of what that
>       message might mean to later app developers.

That's our poor documentation.  If I'd got around to it, the message
semantics would be better documented.

>     - While an ivorn may be searchable in a registry, the description of that
>       ivorn isn't generally machine-parsable.

Agreed.  I should say that it wasn't intended to be - the idea was that we'd
register them to
a) provide the definitive human-readable definition of the mtype
b) get the uniqueness and namespacing benefits of ivorns
c) a human could search the registry for mtypes "to do with" (say) spectra,
recover the ivorns of the mtypes, and then search the registry again for
client-side applications that understood these mtypes.

>       This means an app can't effectively resolve an ivorn at runtime to
>       determine a needed capability, and even a search for capabilities
>       registered for a specific app would require a network connection for
>       the 'system' to operate at all.

This isn't quite what we intended.  The capability of an application was
defined by the set of mtypes/message ivorns that the application declared.
There was (and is) never any need to look them up in the registry (in fact,
none has ever been registered).

>       Apps should be able to make requests for capabilities
>       without an available Registry using only the locally running processes
>       (or tasks that can be invoked based on a requested message).
>     - IVORNs are more complicated to parse than UCD schemes and so the
>       filtering of messages is also more complicated in an app.
>
> That said, the current PLASTIC messages appear to have the same idea of a
> message hierarchy (e.g. by including "/fits" or "/table" in the path) even
> if these messages are implicitly tied to a specific implementation.
>
>
Thus far, we've never tried to parse them - they're just opaque strings.  We
did flirt with the idea of pattern matching, but never pursued it.  The
structure of the message ivorns, such as it is, was just an attempt by me to
have some sort of naming system.

John