WasDerivedFrom vs. WasGeneratedBy

Thu Oct 19 00:53:19 CEST 2017

Hi DM,

On Tue, Oct 17, 2017 at 02:04:44PM -0700, Tim Jenness wrote:
> As another data point, LSST will have the ability to attach a WCS
> to a raw image that is derived by looking at 1000 processed images.
> We will be tracking the provenance of that WCS and its inputs and
> have to attach it to the raw data as provenance. If someone asks
> for "all the inputs" they are not really going to want all 1000
> processed images. They need those to exactly reproduce the
> processed image they will generate from that updated raw image but
> it's clearly distinct in the provenance tree.
> 
> To be more concrete, if you now coadd two images that came from raw
> data that had WCS derived from 1000 other images, when someone says
> "what went into that coadd" they probably mean the two parent
> images and possibly the two raw data files.

But isn't the provenance structure in this case something like this
(notation contrived, roles suppressed in this graph; imagine labels on
the vertices if you will):

rawim2001 -- Photoproc ----- im2001 -,
              /                       \
  Flatfield and such                   \
              \                         \
rawim2002 -- Photoproc ----- im2002 ---- Coaddition --- coadd10001
                                        /
im1   --,                              /
...   ----- Calibration -- wcs -------/
im1000--/     /
        SExtractor conf

So if you just look at the immediate operation of the coaddition,
you'll succinctly see that there were two reduced images and a WCS
calibration coming in.  Only when you're interested in where that
calibration comes from do you see the 1000 images, as it should be,
just as you don't see the raw images as sources of the coaddition if
the stacking was performed on flat-fielded and dark-subtracted images.
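To make the point concrete, here is a minimal sketch in plain Python
(no PROV library) that encodes the graph above as activities with
role-keyed inputs. All identifiers and role names are hypothetical,
made up for illustration. It contrasts the immediate inputs of the
coaddition with the full transitive closure of everything upstream.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Activity:
    """One provenance activity: entities it used (keyed by role) and entities it generated."""
    name: str
    used: dict[str, list[str]]
    generated: list[str]


def build_graph() -> list[Activity]:
    # Hypothetical identifiers mirroring the diagram; role names are assumptions.
    calib_images = [f"im{i}" for i in range(1, 1001)]
    return [
        Activity("photoproc-2001", {"raw image": ["rawim2001"], "flatfield": ["flatfield"]}, ["im2001"]),
        Activity("photoproc-2002", {"raw image": ["rawim2002"], "flatfield": ["flatfield"]}, ["im2002"]),
        Activity("calibration", {"calibration image": calib_images, "config": ["sextractor-conf"]}, ["wcs"]),
        Activity("coaddition", {"reduced image": ["im2001", "im2002"], "wcs": ["wcs"]}, ["coadd10001"]),
    ]


def generator_of(activities: list[Activity], entity: str) -> Activity | None:
    """Return the activity that generated `entity`, or None if it is a primary input."""
    return next((a for a in activities if entity in a.generated), None)


def immediate_inputs(activities: list[Activity], entity: str) -> set[str]:
    """Everything the generating activity used, across all roles (one step back)."""
    act = generator_of(activities, entity)
    if act is None:
        return set()
    return {e for ents in act.used.values() for e in ents}


def all_inputs(activities: list[Activity], entity: str) -> set[str]:
    """Transitive closure: walk back through generating activities until nothing is left."""
    seen: set[str] = set()
    stack = [entity]
    while stack:
        for e in immediate_inputs(activities, stack.pop()):
            if e not in seen:
                seen.add(e)
                stack.append(e)
    return seen


graph = build_graph()
print(sorted(immediate_inputs(graph, "coadd10001")))  # the two reduced images and the WCS
print(len(all_inputs(graph, "coadd10001")))           # includes the 1000 calibration images
```

One step back from coadd10001 you get im2001, im2002 and wcs; the
1000 images only show up when you deliberately follow the calibration
activity upstream.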

Similarly, in Ole's example:

On Tue, 17 Oct 2017 11:24:57 +0200, Ole Streicher wrote:

> To give you a real-world use case, which is kind-of debugging: Someone
> detects an "interesting structure" on a science-ready exposure, and to
> be sure he wants to process the raw image with his own, alternative
> pipeline (which may or may not need the same kind of calibration). Then
> he has to find out "which is *the* raw image that I need to process?",
> and the answer is wasDerivedFrom (maybe recursively).

I argue it's more straightforward to inspect the photo processing
activity and figure out which of its inputs had the role "raw image".
After all, you might just as well suspect that the flat for that day
was flawed and want to drop in yesterday's flat instead, or that some
other gear in the provenance chain is at fault and want to replace
that.

Sure, you'll have to define roles in this world for all inputs to all
activities, but I'm sure you want that anyway.
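A minimal sketch of that role-based lookup, again in plain Python with
hypothetical entity ids and role names (nothing here is taken from an
actual provenance service or library). It assumes one entity per role
for simplicity, finds the "raw image" input of the activity that
generated im2001, and describes a re-run with the flat swapped out.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass
class Activity:
    name: str
    used: dict[str, str]  # role -> entity id (assumed: one entity per role)
    generated: tuple[str, ...]


# Hypothetical records for the two photo processing activities.
ACTIVITIES = [
    Activity("photoproc-2001", {"raw image": "rawim2001", "flatfield": "flat-today"}, ("im2001",)),
    Activity("photoproc-2002", {"raw image": "rawim2002", "flatfield": "flat-today"}, ("im2002",)),
]


def generating_activity(activities: list[Activity], entity: str) -> Activity | None:
    """Return the activity that generated `entity`, if it is recorded."""
    return next((a for a in activities if entity in a.generated), None)


def input_by_role(activities: list[Activity], entity: str, role: str) -> str | None:
    """Find the input that played `role` in the activity generating `entity`."""
    act = generating_activity(activities, entity)
    return None if act is None else act.used.get(role)


def rerun_with(act: Activity, role: str, new_entity: str) -> Activity:
    """Describe a re-run of `act` with one input swapped; its outputs are not known yet."""
    used = dict(act.used)  # copy so the recorded activity stays unchanged
    used[role] = new_entity
    return replace(act, name=act.name + "-rerun", used=used, generated=())


print(input_by_role(ACTIVITIES, "im2001", "raw image"))
rerun = rerun_with(generating_activity(ACTIVITIES, "im2001"), "flatfield", "flat-yesterday")
print(rerun.used)
```

The same lookup answers "which raw image?" and "which flat?", which is
the point: with roles on every input, no special wasDerivedFrom chain
is needed to pick out the one input you care about.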

          -- Markus