[evla-sw-discuss] System design decisions needed for Widar

Michael Rupen mrupen at nrao.edu
Tue Sep 22 13:11:23 EDT 2009


A few comments on Barry's discussion points...

> There are several issues to be decided about what we want the system
> to look like in the long and intermediate term so we know what
> general direction to take in the short run.
>
> 1.  Flags from the telescopes.
> The long term route is clear - they should be collected by MCAF
> and inserted into the SDM.  There are problems with doing this
> in the short term because of a breakdown in the collaboration
> with ALMA.

This is not quite fair.  The ball is very definitely in our court
to propose a definition for the Flag Table.  Obviously that needs a higher 
priority, and should not be that hard.  I'll try to put something together
this week.

> Alternatives in the short term are
> A.  Have MCAF put the flags in a temporary flag table in the SDM
>     which may or may not morph into the final flag table when ALMA
>     is interested in talking about it.  Table would be handled by
>     the CASA filler in 'transparent mode', and a special CASA program
>     written to run after the filler to compute the flags and put them
>     in the measurement set.

Unknown tables can currently be passed transparently through the filler,
tho' we've not tested that yet.  I intend to try this as soon as we get data
again.

> B.  Writing a separate flag handling program to bundle them up, label
>     them appropriately, and ship them off to the CBE to apply as
>     baseline based flags.  (Suffers the disadvantage of creating a new
>     data route that will be abandoned in the long run.)

I am strongly in favor of option A.


> 2.  Setting the filter requantizer gains
> Alternatives are
> A.  Continuous ALC.  I would strongly recommend that the ALC loop be
>     based on an integration boxcar synchronous and conterminous with
>     integrations.  Anything else is just too hard to keep track of.
>     Then the station board product is the requantizer gain, which must
>     be recorded every integration to be able to tie things together.
> B.  ALC for each scan.  (My favorite.)  This is the way the attenuators
>     in the T304s are set.  In this case, the output of the station
>     board is the requantizer gain once per scan (used for bandpass
>     stitching), and the requantizer power.  This option raises the
>     question of by what route the station board is notified that a
>     new scan is occurring.

Ken points out that slew time could be a problem -- one would like to
do the ALC when the antennas reach the source (or at least a nearby
elevation).


> C.  Set gains to a standard value.  Then the output of the station board
>     is the requantizer power, used for converting to correlation
>     coefficients.  This runs into trouble when subbands have very different
>     power levels.

1- How hard is it for the StB to synchronize with the BlB dumps?  This is
   a general question, as it comes up for the Tsys and lag 0 as well.

2- I see this as a separate command to the correlator, not necessarily
   synchronous with a scan boundary (or anything else).  We should
   invalidate the data while the filters are being set; I'm not sure whether
   this is done at the moment.

3- I don't think we know yet how often we really have to tweak the filter
   gains.  Every scan seems excessive, though maybe that would be OK if the
   bandpass doesn't change much.  I was hoping to tweak the filter gains
   at each change of setup (frequency tunings or correlator changes), and
   possibly to maintain a history so we use standard values (a more
   sophisticated version of option C, with different settings depending on
   the placement of the subbands).

   How much difference does the filter gain make when we're using 8-bit
   samplers?  This is on our list of checks to do when we get data again.
   Presumably the 3-bit samplers will lead to somewhat different (and
   likely more important?) filter gain settings.  We should also check that
   they don't affect the bandpass, and that the resulting corr'n coefficient
   is as it ought to be, differing only in the noise rather than by any
   overall scaling.

4- In any case we need a much more efficient automated procedure than
   the interactive round-robin currently implemented.  If it takes several
   seconds to do the ALC, option A is clearly out the window.
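
As a strawman for "more efficient": if the output power scales as the
square of the requantizer gain, a single power measurement should let us
set the gain in one shot rather than iterating.  A minimal sketch (the
names, units, and target value are mine, not any actual StB interface):

    import math

    def one_shot_requantizer_gain(current_gain, measured_power,
                                  target_power=1.0):
        """Set the requantizer gain in a single step, assuming the
        output power goes as gain**2.  One measurement, one
        correction -- no interactive round-robin."""
        if measured_power <= 0.0:
            raise ValueError("bad power reading; leave gain unchanged")
        return current_gain * math.sqrt(target_power / measured_power)

A second pass would then only be a sanity check, not part of the loop.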

> In any of these cases we probably need a program to collect the outputs
> from the station boards, package them up and send somewhere useful.  The
> 'somewhere useful' probably includes MCAF to include in the SDM.  Even if
> we decide that these data products are to be used in the CBE, rather than
> CASA, they probably need to be included in the SDM to provide a record of
> what was done.

Option A looks pretty nasty then...lots of data!

> At a guess, we'll end up doing all three options above.  C is what we
> have now and is therefore the default, B produces the easiest data to
> process, so we are likely to want to do it for now, and A may be insisted
> upon by a solar observer if we ever have one.

Seems to me we should start by allowing the user to command a filter gain
check & setup via the VCI (probably originating in the OPT, passed along
via the Executor at observe time), at his/her convenience.  If solar folks
want this to happen all the time (assuming we can do the setting very
fast), maybe that's a special VCI command as part of the relevant scan(s).

Do we want a filter gain check option as well as the filter gain setting?
Perhaps we use standard setups (or "no change") unless those filter gains
get us too far from optimal, at which point we reset them.
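
The check-then-maybe-reset logic I'm imagining is nothing fancier than
this sketch (the 25% tolerance is a number I made up for illustration):

    def check_filter_gains(standard_gains, optimal_gains, tol=0.25):
        """Filter gain *check*: keep the standard settings unless some
        gain has drifted more than `tol` (fractional) from the measured
        optimum, in which case reset the whole set."""
        drifted = any(abs(s / o - 1.0) > tol
                      for s, o in zip(standard_gains, optimal_gains))
        return list(optimal_gains) if drifted else list(standard_gains)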


> 3.  System/Noise source application.
> I believe it is generally agreed that the ratio of system to cal power
> be included in the SDM for application during post processing.  This
> probably implies a listener within the correlator system to collect
> these, package them, and export to MCAF.  Again, I strongly urge that
> these be collected and recorded once per integration.

This was definitely the consensus of the discussion group (Ken, Barry,
Vivek, Michael).  The big question is how painful the synchronization is.

Bryan also raised the issue of whether we ever want Tcal/Tsys for real time
work (TelCal).  This could change how MCAF works, so it's a fairly big
issue.  My initial feeling is that we don't need Tsys unless we want some
standard data products for the operator.  I personally don't see this as
a show-stopper but others may disagree.

Another issue that came up was the relation of the Tcal/Tsys integration
period to the noise tube cycle.  I think we came to a consensus that we
wanted the Tcal/Tsys dump interval to be the maximum of {dump time out of
the CBE, 2x the NT period}, to be sure the Tcal/Tsys is meaningful.
Otherwise we'll have to keep track of where we are in the NT cycle.  I
would argue that we should further maintain a *minimum* dump time for
Tcal/Tsys of order a second, synchronized to the vis. dump time.  All of
this is in aid of sensible Tsys post-processing, e.g., time averaging,
applying Tsys from one subband to another, etc.
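
To pin the rule down, here's a sketch of the dump interval we converged
on (variable names are mine):

    import math

    def tcal_tsys_dump_interval(vis_dump_time, nt_period, min_interval=1.0):
        """Tcal/Tsys dump interval, in seconds: at least the CBE/vis.
        dump time, at least two full noise-tube cycles, and no shorter
        than ~1 s; then rounded up to a whole number of visibility
        dumps so the two streams stay synchronized."""
        interval = max(vis_dump_time, 2.0 * nt_period, min_interval)
        return math.ceil(interval / vis_dump_time) * vis_dump_time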

> 4.  Are correlation coefficients useful?
> From a sufficiently high level abstract viewpoint, the answer is no.
> Calibration is done by dividing each subband by the subband gain, and
> then multiplying that whole combined spectrum by a number that converts
> it to Janskys.  This number is derived by looking at the "best"
> subband (least interference, best known cal) and dividing by the
> measured cal power (that is, cal on minus cal off), and multiplying
> by the Tcal for that subband.  This converts the measured correlations
> into kelvin.  One then multiplies by the antenna effective area which
> converts them into Janskys.  (but see footnote)  The power level out
> of the requantizer nowhere enters in this description of calibration.
> On the other hand, for various intermediate operations, particularly
> in telcal, having correlation coefficients, derived by dividing the
> measured correlations by the power out of the requantizer, really
> simplifies things by not having to worry about changes in power level
> across the time of its calculations.

I would phrase the above as "Calibration CAN BE done..." rather than
"Calibration is [implying must be] done...".  Certainly, for instance, we
do not currently use the effective area for most VLA calibrations, instead
referencing sources of known flux density.
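
To illustrate the point: with visibilities stored as correlation
coefficients, bootstrapping the flux scale from a calibrator of known
flux density needs no effective area at all.  Schematically (a sketch
that ignores the per-antenna gain and Tsys variations the real
calibration handles):

    def flux_scale(rho_cal, s_cal_jy):
        """Jy per unit correlation coefficient, from a calibrator of
        known flux density: scale = S_cal / rho_cal.  Apply as
        S_target = scale * rho_target."""
        return s_cal_jy / rho_cal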

Here are my arguments for the utility of "true" (rather than scaled)
correlation coefficients:

1- Correlation coefficients are a standard data product which is easy to
   explain, and for which many people already have a gut-level understanding.

2- The SSRs for both ALMA and the EVLA have always required that we store
   the visibilities as correlation coefficients.  So does the BDF definition.
   Many other interferometers (including the current VLA and the VLBA)
   report correlation coefficients.  There has to be a mighty strong reason
   to move away from this, at this late date.

   - Both AIPS and CASA already happily deal with corr'n coefficients.

3- TelCal really needs some uniform scaling across all baselines, which
   at a minimum means correlation coefficients.  If we do not put these
   directly into the BDF we'll have to send the scalings across to TelCal
   and do them there.

4- If we do any averaging in the CBE (e.g., to give subscan averages to
   TelCal, or to limit data rates) it's more proper to average correlation
   coefficients.  [Barry points out that you've chosen your integration
   time assuming that nothing changes on shorter timescales, so if things
   do change very fast you've got bigger problems than improper averaging!]

5- We don't need the zero lags (power levels) for anything else, so why
   not use them where they're needed and then throw them away?
   [I stand ready to hear counter-arguments on this point!]


> If we decide correlation coefficients are useful, it raises the question
> of when they are calculated - in the CBE or later in the stream.  (If
> this division is done in the CBE, we will have to have the power of
> undoing it later, since, as noted above, correlation coefficients are
> not used in the calibration process.)

I do not understand this.  Currently we and many others happily give
corr'n coeffs, which we happily calibrate.  What is different about WIDAR?



Another question is how we should derive the normalization to get to corr'n
coeffs, assuming we want them.  Above I've implicitly assumed we're using
the filter-based power measurements, which requires some care to ensure
that everything (including data blanking) is treated identically, e.g.,
when using recirculation.  Another approach would be to use the
autocorrelation spectra directly, as calculated on the Baseline Board.
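
For concreteness, the normalization I have in mind is the standard one
(a sketch; whether the powers come from the filter measurements or from
the Baseline Board autocorrelations is exactly the open question above):

    import numpy as np

    def correlation_coefficient(cross_spec, power_i, power_j):
        """rho_ij = C_ij / sqrt(P_i * P_j): normalize the measured
        cross-correlation by the two station powers (zero lags or
        filter power measurements), with blanking treated identically
        in numerator and denominator."""
        return np.asarray(cross_spec) / np.sqrt(power_i * power_j)
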
>
> 5.  Delay residuals.
> The see-sawing of phase across the band due to the delay cogging is
> alleged to be a problem, and it probably is at some very high dynamic
> range, and becomes a nuisance in more ordinary observations.  We
> should plan to correct it, although it is not clear to me that there
> is a very high current priority on correcting it.

I agree.

> I can see two
> ways of making the correction.
> A.  Make available the delay polynomials used at observe time, and
>     replicate the calculations done on the station board to estimate
>     the mean phase slope residuals during the integration to calculate
>     the correction.  The delay polynomials may be captured from the
>     Executor multicast or relayed from the station board, or from the
>     polynomial distributor.
>
> B.  The station board may be taxed with the job of producing the average
>     and RMS delay error over the integration, and this can be provided as
>     one of the station board products.  A rough calculation indicates that
>     correcting the delay by the mean error and the amplitude by the RMS
>     error should be sufficient even for quite high dynamic ranges.
>
> In either case, the correction may be done in the CBE, or later, in CASA,
> determining whether the data stream is sent to the CBE or to the SDM.

The complexity of the calculation suggests that we do some checks in
post-processing at first, to be sure we've got things right.  With
Martin's new CBE setup we could record the data both with and without
this correction and compare.
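
For reference, here is the correction implied by option B as I understand
it, assuming the residual-delay phase error is roughly Gaussian over the
integration (my assumption; the StB products would be the mean and RMS
delay error):

    import numpy as np

    def correct_delay_residual(vis, freqs_hz, mean_delay_s, rms_delay_s):
        """Remove the mean residual delay (a phase slope across the
        band) and undo the decorrelation from its scatter within the
        integration.  For Gaussian phase error phi with mean mu and
        RMS sigma, <exp(1j*phi)> = exp(1j*mu - sigma**2/2), so rotate
        by -mu and boost the amplitude by exp(sigma**2/2)."""
        mu = 2.0 * np.pi * freqs_hz * mean_delay_s
        sigma = 2.0 * np.pi * freqs_hz * rms_delay_s
        return vis * np.exp(-1j * mu + 0.5 * sigma**2)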


> 6.  Tcal.
> We probably, in the long run, need cal spectra.  If we decide we do,
> probably the most reasonable place to store them is in a new table
> in the evlaparm database.  We might need a receiver serial number
> in the main parameters table to simplify the situation if a receiver
> is removed from one antenna and installed in another, though it would
> not be too complicated to keep track of this by hand.  The question
> that arises is who reads the table - Executor or MCAF.  (One might
> also palm this off on Scheduler.)  If the Executor, when should it
> be read - every scan?  However, as noted above, we really need a Tcal
> only for one subband for each sampler.  If we get properly organized,
> we can probably live with the existing system for a long time.

I would argue that proper organization and tracking is more difficult than
maintaining Tcal spectra!

I think we concluded that we don't need Tcal at all for OSRO.

I don't see the need for saving Tcal every scan.  Tcals are measured
quite infrequently; further, there just aren't that many numbers.  I've
been envisioning either sending the whole table (or maybe just the Tcals
for the receivers which will be used) for inclusion in each SDM at the
beginning of each Scheduling Block, or alternatively giving CASA (or maybe
the Archive) access to the Tcals when they're actually wanted.  I lean
towards putting everything in the SDM, but it's not a big deal.

Note that it's not entirely clear where to save Tcal in the SDM.
I'm looking into this...

> 7.  Scans and subscans.
> The CBE must know about scans and subscans, obviously.  It is currently
> told about them directly by the Executor.  Is this intended to be the
> final route, or is this a temporary expedient?  If permanent, we need
> to know the information to be added to let the CBE sort out the multiple
> subarray case.

If the Executor is NOT the sole source determining beginning/ending of
(sub)scans, who is?  Somehow the antenna hardware and the CBE have to be
synchronized, and surely that's the Executor's job.  I guess I don't
see the alternative.

Another question is how scans/subscans are numbered: is there some master
keeper of scan & subscan numbers, or does everyone "roll their own"?
If the former, should that keeper be the Executor?  I think the consensus
was that the answer is "yes" in both cases.  (See also footnote 2.)
At minimum the SDM and the BDF *must* use consistent (sub)scan numbers,
or we cannot properly associate individual binary data files with individual
rows in the SDM Tables.  A secondary requirement at the moment is that the
scan numbers for a given Scheduling Block begin at 1, as do the subscan
numbers within a given scan.  That secondary requirement appears in the
format definition documents; I don't know whether the filler actually
implements it, but we should either obey the format definitions or get
them changed.
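
The consistency requirements are simple enough to state as a check
(a sketch; the pairs are (scan, subscan) tuples as they'd appear in the
SDM and BDF for one Scheduling Block):

    def validate_scan_numbers(sdm_pairs, bdf_pairs):
        """(1) SDM and BDF must carry identical (scan, subscan) pairs;
        (2) scan numbers begin at 1 per Scheduling Block; (3) subscan
        numbers begin at 1 within each scan."""
        assert sdm_pairs, "empty scan list"
        assert sorted(sdm_pairs) == sorted(bdf_pairs), "SDM/BDF disagree"
        scans = sorted({s for s, _ in sdm_pairs})
        assert scans[0] == 1, "scan numbers must begin at 1"
        for s in scans:
            subs = sorted(sub for sc, sub in sdm_pairs if sc == s)
            assert subs[0] == 1, "subscan numbers must begin at 1"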


> Also, as mentioned above, it is perhaps expedient to be
> able to tell subband filters "This is a new scan - set your output power
> level to something sensible." This clearly has to go through the VCI
> interface - any problems with that?  We need to decide the message content.
> In particular, we would rather like to send a single message, and have
> the ConfigurationMapper keep track of what station boards to forward
> it to.
> (also see footnote 2)

Perhaps David H. can help out here.  My recollection is that you can send
an update to a configuration by name, possibly saying stations=all or some
such.

Also there's the question of whether we can send actual filter gain
settings.  I think we need to be able to do this, and we may want this as
the standard mode (as compared to ALC loops).


> 8.  How does correlator configuration information get from the OPT to
> the correlator?
> The plan is for the scheduler to produce an Executor script and
> correlator configuration documents.  I can see two ways for the
> documents to get to the correlator.
> A.  The scheduler may write files containing the documents and insert
>     the file names in the script.  The Executor would then send the
>     document to the VCI as needed, either through a socket or by passing
>     a file name, as the current human interface to the ConfigurationMapper
>     does.
> B.  The scheduler creates a queue of correlator configuration times,
>     and delayed processes that wake up and send configurations to the
>     VCI at the appropriate times.
> It seems to me that A is the best way to do this.  Among several other
> reasons, it trivially permits human edited scripts, which would be a
> bear via route B.

I thought we had already decided on A.  B basically re-implements what
Sonja has done with her various configuration queues in the Configuration
Mapper.

An addendum to A, which again I think is already agreed upon, was that the
Executor would know the name of the configuration described in the files
created by the OPT/Scheduler.  This allows the Executor to append
information only known at observe time, e.g., the antennas which will be used
in a particular subarray.

>
> 9.  How does configuration information get from the OPT to the CBE?

I see this as part of the overall correlator configuration covered in
question 8 above.  I think the rest of the discussion below is really about
who does the configuration of the CBE.  That's between Sonja and Martin;
all the OPT knows about is the need to send VCI commands saying "do this
somehow."  Maybe I'm missing something?


> This is the ConfigurationMapper, whether one chooses to call it that
> or not.  If you choose to write it as a separate entity, you will
> have to worry a lot about communication protocols between the two
> pieces.  From an ethereal plane, one might simply recommend
> combining ConfigurationMapper and the CBE master.  Even on a less
> ethereal plane, it would seem to me better to make it as much as
> possible one piece, so that things currently only in the CBE
> master, like the list of available NICs and ports, would be kept
> in the ConfigurationMapper.  Then the ConfigurationMapper can
> organize sending the addresses to the baseline boards.  It will
> also have to pass along to the CBE slave processes what it has
> done, and what processing they need to do.
> (Note:  here I am interfering in an area I know nothing about, so
> the content may be inane, but something needs to be decided and
> written down.)
>
> Footnote 1.  Instead of simply the switched power, sending along
> (Pon - Poff)/(Pon + Poff)*(Requantizer_Power)/Requantizer_Gain
> cancels a few mildly annoying fiddly corrections.
>
> Footnote 2.  Despite the fact that scan_number, subscan_number are
> hermetic within the BDF/SDM system and do not leak out, Rich and
> Martin have been urging us to have the Executor provide them, on the
> grounds that associating data via an integer is nicer than via the
> floating point time.  It is reasonable for the Executor to do this,
> but one would prefer not to unless the current direct communication
> scheme to the CBE is the long term one.
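
Regarding footnote 1: for what it's worth, the proposed product is a
one-liner to form on the collecting side (my variable names):

    def switched_power_product(p_on, p_off, rq_power, rq_gain):
        """Footnote-1 data product: the synchronously detected cal
        fraction, rescaled by the requantizer power and gain so the
        mildly annoying fiddly corrections drop out downstream:
        (Pon - Poff)/(Pon + Poff) * Requantizer_Power / Requantizer_Gain
        """
        return (p_on - p_off) / (p_on + p_off) * rq_power / rq_gain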


A couple other general observations & comments.

1- The current plan we've been carrying in our heads is that the CBE
   will eventually have to handle at least some station-based information.
   In the end the CBE is supposed to:
   * produce correlation coefficients, which requires station-based
     zero lags (powers)
   * stitch together subbands, which requires station-based filter gain
     settings and possibly Tcal/Tsys
   * correct for residual delays, which requires station-based delay model
     information of some kind
   * perform quantization corrections, which requires station-based knowledge
     of lag zero & level statistics
   * derive accurate visibility durations and (centroid) time stamps, which
     could involve knowing some antenna-based flags
   Barry has argued that we should avoid all station-based processing in the
   CBE, if at all possible.  This would certainly simplify the CBE, but
   would require a great deal of WIDAR-specific off-line processing.  All of
   these corrections _could_ be done off-line, with some pain and some loss
   of flexibility (e.g., no cross-subband averaging in the CBE).  I am leery
   of such a major change in approach at this late date, but we really need
   a fuller discussion of how we might get the StB data to the CBE and to the
   individual nodes before we can make any final decision.  I see the
   post-processing option as a fall-back rather than a desirable approach at
   this point, but others may disagree.  (A sketch of the implied
   per-integration pipeline follows below.)

2- We really need to decide what exactly goes into the SDM and how.
   This is touched on many times above but it needs some focussed effort,
   alas involving myself.
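
To make comment 1 concrete, here is the rough shape of the per-integration
processing the CBE would keep, reusing the correlation_coefficient() and
correct_delay_residual() sketches from earlier in this message.  Every name
is a placeholder, not a real CBE interface, and the quantization correction
(which belongs first, on the lags themselves) is omitted:

    def cbe_process_integration(cross_spec, power_i, power_j, freqs_hz,
                                mean_delay_s, rms_delay_s, filter_gain):
        """Per baseline and subband, per integration: normalize to a
        correlation coefficient, remove the residual delay, and scale
        out the filter gain so subband stitching can align the pieces."""
        rho = correlation_coefficient(cross_spec, power_i, power_j)
        rho = correct_delay_residual(rho, freqs_hz, mean_delay_s,
                                     rms_delay_s)
        return rho / filter_gain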

That's it for now...

          Michael


