[evla-sw-discuss] Simple subarrays: a modest proposal

Tue Apr 1 14:32:57 EDT 2014

[Daniel/Rick: please forward this to evla-sss-discuss, to which I'm not
allowed to submit...or maybe I have the name wrong??]

This is a proposal for a simple approach to scientific subarrays which I 
believe would handle the most scientifically interesting cases, without being 
too painful for the SSA group to implement (the latter based on conversations 
with Daniel).

1- Define a linked set of a master and multiple slave SBs, one SB
        per subarray.
        - This is done in the OPT.

        - Master and slave SBs are written independently by the observer, and
          use completely independent resources.

        - It is the observer's responsibility to obey subarray restrictions
          (in much the same way that she's responsible for putting in dummy scans,
          ensuring sufficient time on-source, etc., for regular observing).
          : If it's easy it would be a good idea to automatically check that
            the lengths of the SBs in the set are consistent, and to COMPLAIN
            LOUDLY (i.e., don't let the observer submit the SBs) if any
            slave SB is longer than the master SB.

        - The master and slave SBs are clearly marked as such in the OPT
          displays.

        - There is an additional display in the header tab of subarray SBs,
          showing which antennas to use for this subarray.  Ideally the OPT
          would check that these are consistent (i.e., that no antennas are
          duplicated, and that restrictions on the distribution of antennas
          between subarrays are obeyed), but this is a nicety.
          : The simplest display would just show the set of antennas with
            check boxes, ideally with checks of different colors for the
            different subarrays:

                  ea01 ... ea09
                  ea10 ... ea18
                  ea19 ... ea27
                  ea28     (PT, someday)

          : Another nicety would be to allow the observer to specify antennas
            by location instead of number.  Until this is available it would
            be nice to have a link to the antenna positions file, which is
            available on the VLA home page:
              http://www.vla.nrao.edu/operators/CurrentPos.pdf
            This is not ideal, as that page generally shows the current
            configuration rather than the configuration for which the SB is
            written (e.g., one might write an A cfg SB before that
            configuration actually begins), but that's life.

          : If listing of antennas is allowed we could accurately calculate
            the data output rates for the subarrays.  The SBs in the
            set could report both that SB's data rate and the total for the
            entire set; the latter would then be used to check whether the
            observer exceeds currently-allowed data rates.  We have already
            had one subarray observation which requested faster dump rates
            than the OPT/RCT allowed, but which were fine given the reduced
            number of baselines (Osten et al. stellar observations).
            ...This would require some interaction between OPT and RCT.  One
              approach would be to allow resources to be marked as "Subarray
              resources", and have the observer enter the maximum number of
              antennas to be used with that resource in the RCT.  The reported
              data rate (and RCT restrictions on that data rate) would then be
              calculated using that maximum number of antennas.  Such subarray
              resources could only be used in SBs which are part of a set of
              subarray SBs, and the maximum number of antennas allowed for
              that SB could be compared with the number of antennas actually
              selected for that SB, with errors resulting if they are not
              consistent.
              ...Such an approach could be readily extended to those wishing
                to trade antennas for integration time (e.g., to allow
                faster dumps as GO or OSRO for resources which use only
                20 antennas).
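
        The consistency checks described in item 1 could be sketched roughly
        as follows.  This is only an illustration under stated assumptions:
        SubarraySB, check_set and baseline_fraction are names invented here
        (they are not OPT or RCT code), "consistent" is taken to mean "no more
        antennas selected than the resource's declared maximum", and the data
        rate is assumed to scale with the number of cross-correlation
        baselines.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SubarraySB:
    """Hypothetical stand-in for one SB in a linked set (not an OPT class)."""
    name: str
    length_sec: float                    # total SB duration
    antennas: List[str]                  # e.g. ["ea01", "ea02"]
    max_antennas: Optional[int] = None   # limit entered for a "Subarray resource"


def check_set(master: SubarraySB, slaves: List[SubarraySB]) -> List[str]:
    """Return a list of problems; an empty list means the set may be submitted."""
    errors: List[str] = []
    owner: Dict[str, str] = {}
    for sb in [master] + slaves:
        # No slave SB may be longer than the master SB.
        if sb is not master and sb.length_sec > master.length_sec:
            errors.append(f"{sb.name} is longer than master {master.name}")
        # No antenna may appear in more than one subarray.
        selected = sorted(set(sb.antennas))
        for ant in selected:
            if ant in owner:
                errors.append(f"{ant} is in both {owner[ant]} and {sb.name}")
            else:
                owner[ant] = sb.name
        # Assumed meaning of "consistent": do not exceed the declared maximum.
        if sb.max_antennas is not None and len(selected) > sb.max_antennas:
            errors.append(f"{sb.name} selects {len(selected)} antennas, "
                          f"resource allows {sb.max_antennas}")
    return errors


def baseline_fraction(n_ant: int, n_full: int = 27) -> float:
    """Cross-correlation baselines for n_ant antennas, relative to n_full."""
    return (n_ant * (n_ant - 1)) / (n_full * (n_full - 1))
```

        Under that scaling assumption a 14-antenna subarray has roughly a
        quarter of the baselines of the full 27-antenna array, which is the
        kind of headroom that made the faster dumps acceptable.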

2- The master SB (only!) is used for scheduling the experiment.
        - The OST only looks at the master's scheduling requirements (weather,
          LST start times, etc.).
          : Ideally the slave SBs do not even show scheduling requirements
            as options in the OPT, instead showing "tied to Master SB #XXXXX".

        - When the OST displays the master (i.e., when it's put on the
          suggested schedule), the master is marked using a special
          (colorful?) label, to show this is a multi-SB script.

3- When the master is selected by the operator:
        - m2s is run automatically for all SBs in the set (both master and
          slaves) -- see also below!

        - The operator is then prompted as usual to run each SB in the set.
          Ideally the antennas in that subarray are pre-populated in the
          screen that pops up for each SB.
          : Daniel tells me this is easy.  If it's not, just do a separate
            popup for each SB, with some text telling the operator to use
            the list of antennas in that SB's comments (which could also pop
            up automagically).  This sounds unwieldy, but note that it's
            basically what we ask the operator to do now!

        - When the SBs are successfully executed, charge the observing program
          for the master SB's time (only).

4- Staggered starts & simultaneous reconfigurations

        The most common (only?) current failure mode for subarrays occurs
        when the correlator is reconfigured at the same time in multiple
        subarrays.  The worst case appears to be at the beginning of the
        script, when typical high-frequency subarrays have a dummy scan with
        one configuration followed by a dummy scan with another; the first
        scan of a script is special because both of those configurations are
        sent to the CM in rapid succession, so the CM (and the CBE and
        everything else) is very busy.

        Currently we try to minimize this problem by offsetting the start
        times of the various subarrays: we tell the operator in the nightly
        note to start SB 1 on these antennas, then wait 30sec before starting
        SB 2 on these other antennas, etc.

        Ideally we would fix the underlying problem, but that has proved
        elusive, and subarrays have only occasionally risen to a sufficient
        priority that we look seriously at debugging and/or fixing this.  So
        for now I assume we remain in the current state.  How do we handle
        this in the proposed scheme?

        I see several possible approaches.

        A. Do what we do now: when the operator selects a subarray experiment
          from the OST list, pop up a message telling him:
          (1) do not queue up these subarray SBs
          (2) wait 30secs (-ish) between submitting each SB in the set to
            the Executor

        B1. Modify m2s to automatically insert n*30sec of blank time (doing
          NOTHING -- definitely not talking to the CM!) in the .evla scripts
          for the slave SBs.
          ...Ken tells me this is simple to do.
          ...This has the advantage that the operator could submit them
            all at once, and in advance.

        B2.  Modify m2s to insert an n*30sec-long dummy scan using the
          same resource as the 1st scan of the observation, to avoid piling
          up multiple configuration messages at the start of an SB.
          ...Sounds good but I would want to test this before going too crazy.
          ...Like B1, the operator could then submit the SBs just like normal
            ones.

        C. Enforce non-overlapping reconfigurations in the OPT.
          ...This sounds nice but also sounds painful; and I really hope that
            in the end we eliminate this restriction, so this would become
            "throwaway code".

        There are doubtless many other possibilities.  I'm leaning towards B
        at the moment but we should do some tests to be sure this actually
        works.
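
        For B1 or B2, the blank (or dummy-scan) time that m2s would insert
        per SB is just n*30sec, with the master as n=0.  A minimal sketch,
        where start_offsets and STAGGER_SEC are made-up names and the actual
        m2s interface is not shown:

```python
STAGGER_SEC = 30  # the "30sec (-ish)" wait used today; a tunable assumption


def start_offsets(sb_names, stagger_sec=STAGGER_SEC):
    """Map each SB name to its inserted delay in seconds (master listed first)."""
    return {name: n * stagger_sec for n, name in enumerate(sb_names)}
```

        For a master and two slaves this gives delays of 0, 30 and 60 sec.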

        Note that both A and B assume that the subarrays are changing
        configurations (if at all) at the same time in the SB (before we put
        in any blanking).  If the subarrays are truly independent this is not
        a given.  However, all the scientific subarrays I've seen so far have
        indeed changed configurations in lock step, if they change
        configurations at all.

        Another problem with A and B is that the execution time of all the
        subarrays is increased by up to a couple of minutes, because
        we've added dead time.  We could calculate this in the OPT (by
        charging each "slave" SB the extra n*30sec) or just live with it until
        we fix the fundamental problem.  Those who know more about the OST
        will have to comment on any scheduling issues (e.g., we used to have
        to start SBs on 15min boundaries, which would mean that staggered starts
        effectively add 15min to the total observing time).
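
        The arithmetic behind the dead time can be sketched as below.  The
        function name and its arguments are hypothetical, and the 15min
        boundary is modeled simply as rounding the end of the set up to the
        next multiple of boundary_sec; the real OST behavior may differ.

```python
import math


def block_length_sec(lengths_sec, stagger_sec=30, boundary_sec=None):
    """Wall-clock length of a staggered set; lengths_sec lists the master first.

    SB number n starts n*stagger_sec late.  If boundary_sec is given, the
    result is rounded up to the next multiple of it.
    """
    end = max(n * stagger_sec + length for n, length in enumerate(lengths_sec))
    if boundary_sec:
        end = math.ceil(end / boundary_sec) * boundary_sec
    return end
```

        For three one-hour SBs, block_length_sec([3600, 3600, 3600]) is 3660
        sec, i.e. one extra minute; with boundary_sec=900 it becomes 4500 sec,
        which is the extra 15min mentioned above.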

5- Subarray resource restrictions
        I claim that our current restriction on subarrays (that they not share
        rows/columns on any Baseline Board) means that subarrays are truly
        independent in terms of correlator use -- subarrays can use any
        currently-allowed GO or OSRO (not RSRO!) setup, and those setups are
        completely independent from subarray to subarray.  Similarly I claim
        that there are no additional restrictions related to the crossbar
        boards. [But see below, as reality does intrude on these ideals...]

        - I do not include phased array or VDIF outputs in these claims.

        - Contrary to the above...
          : the CBE's assignment of BlB outputs to CBE node NICs might set
            some restrictions on the use of BlB stacking
          : the limited number of DUMPTRIGs (and current limits on routing those
            DUMPTRIGs) may lead to restrictions on recirculation (as well as on
            phase binning, intermittent dumps, and very fast dumps -- but those
            are all RSRO anyhow).

          For now I believe it's safe to allow any continuum experiments
          (3-bit or 8-bit), and I think we could write down some simple (and
          not overly onerous) restrictions on spectral line setups which would
          avoid these problems.  To my knowledge there have yet to be any
          approved spectral line subarray experiments, so simply not allowing
          those is probably fine for quite a long time.  [Basically this
          translates to disallowing recirculation and BlB stacking when using
          subarrays.  If we use the "subarray" tag in the RCT, as outlined
          above, the RCT could enforce those restrictions on all subarray
          resources.]

        - We may have to explicitly call out the autoCorrelation request in
          the .vci scripts for subarrays, to ensure they do not
          "leak over" into extra rows/columns.  I'm not sure where this stands
          at the moment.
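
        If the RCT did enforce the restrictions on resources tagged as
        "subarray", the check might look something like this.  It is a sketch
        only: Resource and its field names are invented for illustration and
        do not correspond to the actual RCT data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    """Hypothetical summary of an RCT resource; field names are invented."""
    name: str
    is_subarray: bool = False
    uses_recirculation: bool = False
    uses_blb_stacking: bool = False


def subarray_violations(res: Resource) -> List[str]:
    """List the restricted features that a subarray resource tries to use."""
    if not res.is_subarray:
        return []  # ordinary resources are not restricted by this check
    problems = []
    if res.uses_recirculation:
        problems.append("recirculation")
    if res.uses_blb_stacking:
        problems.append("BlB stacking")
    return problems
```

        An empty list means the resource is acceptable for subarray use under
        the two restrictions named above; continuum setups would pass as long
        as neither flag is set.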

I believe this proposal covers all science cases we have seen thus far
except those which violate non-subarray restrictions (e.g., very fast
dump times).  For non-RSRO experiments the proposed implementation:
        - avoids manual m2s/edits
        - allows dynamic scheduling
        - allows proper (automatic) accounting of observing hours
        - covers almost all of the interesting science cases
        - does not place too great a burden on our very limited programming
          and testing/commissioning staff
        - should be reasonably easy to explain to garden-variety observers

Comments?

                 Michael