[evla-sw-discuss] SDM Flag table proposal

Mon Dec 7 18:46:08 EST 2009

----------------------------------------------------------------------------
                             FLAG TABLE PROPOSAL
----------------------------------------------------------------------------

v. 0.0  mrupen 11nov09: EVLA draft
v. 0.1  mrupen  4dec09: finally sent out!!
   Distribution: EVLA: bbutler, bclark, ksowinsk
v. 1.0  mrupen  7dec09: revised per bbutler, bclark, ksowinsk
     Other possibles: wbrisken, vdhawan, efomalon, egreisen

----------------------------------------------------------------------------

INTRODUCTION & CONTEXT
----------------------

   The SDM is intended to provide all the information necessary to
interpret and obtain useful scientific results from EVLA data.
In keeping with this, the Flag Table records mainly what should
be flagged, with enough information behind those flags to help the
sophisticated observer choose which flags to apply, and to inform
her thinking about other consequences of the error conditions behind
the flags recorded here.  The Flag Table is NOT intended to record
enough information for full debugging -- the flags here represent the
end results of hardware and software conditions which engineers and
staff scientists will track down in detail using the Monitor Database.
The exact boundaries implied by this general intention are of course
debatable.

   The Flag Table is to be used for flagging the visibility data, NOT
weather, Tsys, and other ancillary information.  Those ancillary data are
recorded in SDM Tables, and the thinking at the SDM review in Jan09 was
that bad values should simply not be recorded in those tables.  In the
case of Tsys and a few other values this may prevent the use of the
corresponding visibility data.  We may wish to re-visit the possibility of
flagging ancillary data as well as visibilities, but I consider that beyond
the scope of the Flag Table discussed here.  Again I would welcome
discussion.

   The flags stored in this Table are in addition to those produced by the
CBE and stored with the binary data (see the FLAGS binary component in
the BDF documentation).  The BDF's FLAGS component consists of 32 flag bits;
each bit corresponds to a different condition, and setting that bit means
that condition obtains.  The meaning of those bits is yet to be determined.
There is currently no concept of severity -- a BDF flag is either set or
not, and that's the end of it.  The dimensionality of the BDF flags may match
or be a subset of the dimensionality of the corresponding visibilities
(CROSS_DATA); but the time resolution must match that of the visibilities,
i.e., zero or one binary FLAGS component per integration.
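
   A minimal sketch of how a 32-bit FLAGS word of this kind could be
tested, assuming one bit per condition.  The bit names and positions
below are hypothetical, since the meaning of the bits is yet to be
determined.

```python
from typing import List

# Hypothetical bit assignments: the real meanings are not yet decided.
LOST_FRAMES = 1 << 0
LAG_SET_INCOMPLETE = 1 << 1


def is_set(flags_word: int, condition: int) -> bool:
    """True if the given condition bit is set in a 32-bit FLAGS word."""
    return (flags_word & condition) != 0


def set_bits(flags_word: int) -> List[int]:
    """Return the bit positions (0-31) that are set in the word."""
    return [bit for bit in range(32) if flags_word & (1 << bit)]


# A word with both hypothetical conditions set.  There is no severity:
# each bit is simply on or off.
word = LOST_FRAMES | LAG_SET_INCOMPLETE
```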

   [It is unclear whether the BDF flags also apply to the AUTO_DATA, though
Martin may know.  I suspect we might manage auto flags as a separate FLAGS
component, but have not thought about this much.]

   My current thinking is that the BDF flags are to be used for low-level
correlator flags, most obviously those which can only be detected by the
CBE (e.g., a lag set which cannot be formed for a given baseline at a given
time due to lost frames).  Whether the CRM or the ConfigMapper should
send higher-level flags (e.g., a bad chip or a bad board) to the CBE or
to MCAF is TBD.  I would personally prefer to make MCAF the repository of
such long-term, large-scale flags, and reserve CBE flagging for conditions
only the CBE can detect (of which I'm hoping there will be few).  A few
reasons for this:
   * The Flag Table in the SDM allows arbitrary time ranges, while
     long-lasting BDF flags must be repeated every integration.
   * The Flag Table records hardware flags from other parts of the system,
     and it seems reasonable to similarly record hardware flags from
     the correlator.  E.g., the loss of a StB is roughly equivalent to
     the loss of a receiver, though there's more redundancy built into the
     correlator.
   * The Flag Table will likely be implemented before the binary FLAGS
     component, and may allow the latter to be put off for some time (a year
     or more).  There are other drivers for other binary components (e.g.,
     data weights) but this still may give Martin some useful breathing room.
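
   To make the first point concrete, here is a rough sketch of the
bookkeeping, assuming the bad condition starts and ends on integration
boundaries.  The function name and the numbers are illustrative only.

```python
import math


def bdf_flag_components(duration_s: float, integration_s: float) -> int:
    """Number of per-integration FLAGS components needed to cover a
    condition lasting duration_s, at one component per integration."""
    if integration_s <= 0:
        raise ValueError("integration time must be positive")
    return math.ceil(duration_s / integration_s)


# Illustrative numbers: a 10-minute outage with 1-second integrations.
components = bdf_flag_components(600, 1)  # repeated every integration
flag_table_rows = 1                       # one row with startTime/endTime
```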

   Finally, a few words on SDM Tables. I have put some of the more useful
SDM documentation up on the Web, at
    http://www.aoc.nrao.edu/~mrupen/SDM/sdmdocs.html
for those brave or foolish enough to dive in.  The important point here
is the difference between required and optional data.  Quoting from the
SDM Intro document:

   As so often in the SDM, Required Data means rather more than it says:
   * Required fields must have explicit values in order for the
     corresponding row to be valid.
   * In addition, a set of required fields specifies a unique row in the
     table. One row may not differ from another row in a table, solely in
     its Optional Data; each row must differ in one of the required data
     fields.
   * Each such unique row has a corresponding unique key.

   Optional Data fields are those which can but do not have to have values
   for a given row, and which are not considered when determining
   uniqueness. Two rows in a table cannot differ only in their optional
   data --  every row must differ in at least one required field.

This is important because we have to be sure that we are happy having only
one row with a given set of Required Data elements.
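
   A small sketch of what this uniqueness rule implies for the Flag Table,
assuming the required data are reason, startTime and endTime as proposed
below.  The row dictionaries and the antenna names are hypothetical.

```python
from typing import Dict, List, Tuple

REQUIRED = ("reason", "startTime", "endTime")


def required_key(row: Dict) -> Tuple:
    """The tuple of required values that must be unique within the table."""
    return tuple(row[name] for name in REQUIRED)


def find_conflicts(rows: List[Dict]) -> List[Tuple[int, int]]:
    """Return (first_index, later_index) pairs of rows that differ at most
    in their optional data, which the SDM does not allow."""
    seen: Dict[Tuple, int] = {}
    conflicts = []
    for index, row in enumerate(rows):
        key = required_key(row)
        if key in seen:
            conflicts.append((seen[key], index))
        else:
            seen[key] = index
    return conflicts


# Two antennas off source over the same interval cannot be two rows;
# they would have to share one row via the antenna[] array.
rows = [
    {"reason": "ANTENNA_NOT_ON_SOURCE", "startTime": 100, "endTime": 200,
     "antenna": ["ea03"]},
    {"reason": "ANTENNA_NOT_ON_SOURCE", "startTime": 100, "endTime": 200,
     "antenna": ["ea07"]},
]
conflicts = find_conflicts(rows)
```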

----------------------------------------------------------------------------

FLAG TABLE: LIST OF DATA ELEMENTS
---------------------------------

Keys:
   flagId -- tag

Required data:
   reason     -- enumeration
   startTime  -- integer (MJD nanoseconds)
   endTime    -- integer (MJD nanoseconds)
                 should have a special value (0?) meaning "until the end
                 of this ExecBlock".  Currently this special value is
                 "a very large number".

Optional data:
   details    -- string (REQUIRED if reason=OTHER)

   severity   -- integer (0-15)  [not used by EVLA]

   module     -- string (e.g., T304)
     ... not sure we want this, as it may not be interesting for scientists

   $ Specifying the flags
   $ Mostly based on BDF axes: TIM BAL ANT BAB SPW SIB SUB BIN APC SPP STO POL

   $ - Antenna-based flags  (BDF's ANT)
   numAntenna    -- integer
   antenna[Nant] -- antennaId array

   $ - Baseline-based (mostly correlator) flags (BDF's BAL)

   $ - Receiver-based flags (have to convert to BDF values)
   numFeed       -- integer
   feed[Nfeed]   -- feedId
   numReceiver[Nfeed] -- integer array
   polarization[Nfeed, Nrec] -- Stokes of the receiver

   $ - Processor-based flags (have to convert to BDF values)
   correlator

   $ - Baseband-based flags (BDF's BAB)
   numBaseband     -- integer
   baseband[Nbase] -- integer array

   $ - Frequency-based flags: true frequencies [Hz]
   numFreq       -- integer
   frequency[Nfreq] -- float   -- should be ranges

   $ - Frequency-based flags: SpW and SPP (channels) (BDF's SPW, SPP)
   numSpWin      -- integer
   spWin[Nspwin] -- spectralWindowId array
   numChan[Nspwin] -- integer array
   chan[Nspwin,Nchan] -- integer array

   $ - Sideband-based flags (BDF's SIB)
   numSideband   -- integer
   sideband[Nside] -- integer array

   $ - Subband-based flags (BDF's SUB)
   numSubband    -- integer
   subband[Nsub] -- integer array

   $ - Bin-based flags (BDF's BIN)
   $   (pulsar phase bins for EVLA; nutator or freq switching for ALMA)
   numBin        -- integer
   bin[Nbin]     -- integer array

   $ - APC-based flags (Atmospheric Phase Correction) (BDF's APC)
   numAPC        -- integer
   APC[Napc]     -- integer array

   $ - Pol/Sto -- pol'n or Stokes-based flags (BDF's STO, POL)
   numStokes     -- integer
   stokes[Nsto]  -- integer array
   numPol        -- integer
   pol[Npol]     -- integer array
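
   For orientation, a minimal sketch of one Flag Table row as a Python
dataclass.  It covers only the key, the required data, and two of the
optional selectors; the class, its field names and the ID strings are
hypothetical and do not come from any SDM library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlagRow:
    flag_id: str                   # key (a tag)
    reason: str                    # required; a name from the enumeration
    start_time: int                # required; MJD nanoseconds
    end_time: int                  # required; MJD nanoseconds
    details: Optional[str] = None  # optional, except when reason is OTHER
    antenna: List[str] = field(default_factory=list)  # antennaId values
    sp_win: List[str] = field(default_factory=list)   # spectralWindowId values

    def __post_init__(self) -> None:
        # The proposal makes details mandatory only for reason=OTHER.
        if self.reason == "OTHER" and not self.details:
            raise ValueError("details is required when reason is OTHER")

    @property
    def num_antenna(self) -> int:
        """numAntenna is just the length of the antenna array."""
        return len(self.antenna)


row = FlagRow("Flag_0", "ANTENNA_NOT_ON_SOURCE", 1000, 2000,
              antenna=["Antenna_3", "Antenna_7"])
```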

----------------------------------------------------------------------------

FLAG TABLE: DESCRIPTION OF DATA ELEMENTS
----------------------------------------
   reason     -- enumeration
     * --> one we want for EVLA ASAP
     Antenna pointing flags:
     * ANTENNA_NOT_ON_SOURCE
         IDCAF has Az/El position error and source change in
         progress -- do we need both?
         Barry: source-change-in-progress not needed for EVLA
       ANTENNA_SHADOWED
       ANTENNA_NOT_IN_SUBARRAY
     * REFERENCED_POINTING_FAILED
         ...i.e., refptg was requested but could not be done
         ...separate flag for first & second order refptg? NO for now
       REFERENCED_POINTING_NOT_APPLIED
         ...i.e., it _was_ requested but _was not_ actually applied

     Antenna hardware flags:
     * SUBREFLECTOR_ERROR (prefer this to FRM_POSITION_ERROR)
         details= not at commanded position
     * FOCUS_ERROR
         details= not at commanded position

     Band selection, tuning, and related flags:
       BAND_SELECTION_ERROR
       TUNING_FAILED (or MIS_TUNED?)
         details= L301 or whatever
       RADIO_FREQUENCY_INTERFERENCE
         details= source?
         this will normally be a range of frequencies or channels

     Flags between receiver & correlator:
       ROUND_TRIP_PHASE_ERROR
       ROUND_TRIP_PHASE_NOT_APPLIED
       ATTENUATOR_ERROR
         this means BEFORE the correlator
       SAMPLER_ERROR

     * TSYS_TOO_HIGH

     Calibration flags: don't need these for some time
       DELAY_NOT_SET
       BANDPASS_NOT_SET
       FOCUS_NOT_SET
       COLLIMATION_NOT_SET
       ANTENNA_LOCATION_ERROR
         ...if antenna moved recently but position not updated yet
       WVR_ERROR
         ...when we have one ;) but ALMA will want this

       from TelCal:
         PHASE_CALIBRATION_ERROR
           - want this when phasing the array
         DELAY_CALIBRATION_ERROR
         FLUX_CALIBRATION_ERROR
         BANDPASS_CALIBRATION_ERROR
         - might also include worries that it's too long since the last
           calibrator, if this is supplied by the observatory.

     Processor flags:
     * CORRELATOR_LEVEL_TOO_HIGH
         requantizer power too high (e.g., RFI in subband)
     * CORRELATOR_LEVEL_TOO_LOW
         requantizer power too low (e.g., poorly set gain)
       CORRELATOR_HARDWARE_ERROR
         details= StB, BlB, XBB
       CORRELATOR_SOFTWARE_ERROR
         details= models, CMIB, etc.
       CORRELATOR_BACKEND_ERROR
       -- do we want more specifics?
       CORRELATOR_ATTENUATOR_ERROR
       -- e.g., attenuators haven't been set recently

     Post-correlation flags:
       ARCHIVE_ERROR

     Other flags:
       SILLY_OBSERVER
         ...doubtless the most common

       OTHER
         ...in which case details are *required*
         If OTHER shows up too much we should extend the enumeration to cover
         those forgotten flags

     Missing flags:
       Gain & Tsys-related flags
         TSYS_FLUCTUATING (see FILLM CPARM(2))
           -- Barry says not needed any more

   severity   -- 0-15  (low= minor, high= critical)
     Alternatives would be to have fewer values or make this an enumeration.
     FITS-IDI uses:
               -1        No severity level assigned
                0        Data are known to be useless
                1        Data are probably useless
                2        Data may be useless
     The VLA on-line system uses 4 bits per IF, with the bits as follows:
         0000 = 0        OK
         0001 = 1 (int)  Warning
         0010 = 2 (int)  Not so good
         0100 = 4 (int)  Bad
         1000 = 8 (int)  Extremely bad
       and allows combining these, e.g., 0101 is possible
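
     A short sketch of decoding the 4-bit VLA on-line severity listed
     above, including combined values such as 0101.  The function name is
     hypothetical; the bit labels are the ones in the list.

```python
from typing import List

# Bit value -> label, as listed for the VLA on-line system.
VLA_SEVERITY_BITS = {
    1: "Warning",
    2: "Not so good",
    4: "Bad",
    8: "Extremely bad",
}


def describe_vla_severity(value: int) -> List[str]:
    """Return the labels of all severity bits set in a 4-bit value."""
    if not 0 <= value <= 15:
        raise ValueError("severity must fit in 4 bits (0-15)")
    if value == 0:
        return ["OK"]
    return [label for bit, label in VLA_SEVERITY_BITS.items() if value & bit]


combined = describe_vla_severity(0b0101)
```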

   startTime, endTime
     ...would like a value meaning "all times"
     Should we be able to specify scan/subscan instead?
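
     A sketch of how a reader of the table might decide whether a flag
     applies to a given integration.  It assumes half-open intervals in MJD
     nanoseconds and takes 0 as the "until the end of this ExecBlock"
     value, which is only a suggestion above; both are assumptions.

```python
END_OF_EXEC_BLOCK = 0  # hypothetical sentinel; not a settled value


def flag_applies(flag_start: int, flag_end: int,
                 int_start: int, int_end: int) -> bool:
    """True if the flag interval [flag_start, flag_end) overlaps the
    integration interval [int_start, int_end)."""
    if flag_end == END_OF_EXEC_BLOCK:
        # Open-ended flag: anything ending after the flag starts is hit.
        return int_end > flag_start
    return flag_start < int_end and int_start < flag_end
```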

----------------------------------------------------------------------------

Misc.
-----

See http://www.aoc.nrao.edu/~mrupen/SDM/sdmdocs.html for various
documentation related to data formats.

CorrelatorMode Table gives axesOrderArray & numAxes (BDF info)
   BDF axes include: TIM BAL ANT BAB SPW SIB SUB BIN APC SPP STO POL

IDCAF flags (currently used) are:
   * reference pointing requested but not applied
   * AZ/EL position error
   * FRM position error
   * Total Power error (per IF)
   * LO mistuned (possibly per IF)
   * round trip phase error
   * band select switch incorrect
   * first integration of a scan (a VLA correlator specific flag)
   * antenna shadowed
   * antenna taken out of subarray
   * source change in progress

VLA export data format flags (currently used) are:
   Antenna flags
     source change in progress (used to indicate 1st record of each scan)
     sub-reflector not in position
     antenna off source
     L6 module not locked      (--> L304 for EVLA)
     L8 module not locked      (--> L305 for EVLA)
     back-end filters misset   (not relevant to EVLA)
     back-end total power out of range (--> corresponds to quantizer or
                                re-quantizer for EVLA)
     antenna flagged bad by Operator (not used any more)
     Tsys fluctuating          (--> probably not relevant for EVLA)
     first LO not locked       (--> L301 for EVLA?)

     Ken says this list takes care of all flags that are useful in
       flagging the data, and should be kept SHORT.

   IF flags
     noise tube is not both on and switching
     flagged bad by Operator

   Baseline flags -- can be applied to some or all channels for a baseline;
       not yet implemented
     frequency RMS too big
     time RMS too big
     value too big (clip)

----------------------------------------------------------------------------