[alma-config] thoughts on simulation metrics
Bryan Butler
bbutler at aoc.nrao.edu
Mon Jan 29 16:47:47 EST 2001
all,
please find below a summary of metrics discussions from my perspective,
with my own comments and thoughts interjected.
i make some recommendations at the end. if you disagree strongly with
one of the suggested metrics, or are upset that some other metric isn't
included (or even discussed), let me (and the group) know.
i think we should come to an agreement by the end of the phone
telecon on wednesday if at all possible.
-bryan
Introduction
We now have at least 3 methods of simulating ALMA imaging with
different configurations (Heddle's in AIPS, Viallefond/Guilloteau's in
GILDAS, and Holdaway/Morita's in SDE), with more possibly on the
horizon (AIPS++ hopefully). I may have left some out (miriad, e.g.),
but the point is just that we do have the capability now to do some
real imaging simulations and compare the results.
The question now is how do we make comparisons between the different
arrays, given the results of the simulations? One way, which has been
already used to some extent on the results of Heddle's simulations, is
to simply qualitatively examine the images and differenced images, and
make statements regarding the relative "quality" of the different
configurations based on that. All along, we have also had the idea
that eventually we come up with some standard _metrics_ which would be
quantitatively based and would be used for true comparison. We seem,
however, to have gotten bogged down in the details, and have failed to
come to any consensus as to which metric (or combination of them) to
use. This is partly because nobody has been willing to assume the
leadership in this respect, and so we have waffled along with little
progress.
Separately from this, there is the fact that a better metric of this
type does not necessarily imply a better overall configuration
design - most importantly because of what we have been calling
"operational issues." I don't know how to rationally address that
issue, because it quickly degenerates into a subjective and political
one, and is hard to get a good handle on. However, I will note that we
have had no good exposition of exactly what "operational issues" really
entails in its entirety (the one issue that gets stressed is how often
we move antennas, but what are the others?), much less whether these
issues favor one configuration type over another. Even after such an
exposition, it is still useful to have the information on which is the
better configuration "scientifically" (I use the term broadly), so that
some weighting of the relative importance of operational vs. scientific
quality can be specified, and a final decision on configuration type
can be made. I note that there are other issues which affect the
scientific quality of a configuration design which cannot be strictly
set down as a quantitative measure in the way that I'm discussing here
(as a simple e.g., repeatability of antenna positions for monitoring
observations), and they will have to be folded into the discussion in
the end. At any rate, I think it still behooves us to come up with the
quantitative metrics as soon as possible, and hence I will address some
of them here, and then make a real suggestion at the end.
Types of Metrics
They generally fall into 2 categories: uv-based metrics, and
image-based metrics. We have been recently concentrating on the
image-plane ones, but I don't think we should necessarily completely
forget about the uv-based ones.
uv-based Metrics
In my opinion, the attractiveness of uv-based metrics is that they are
not dependent on the source structure, but rather only on the source
declination and hour angle range of observation. Also, they tend to
be simple to calculate.
1. fraction of occupied cells
Simply count the number of occupied cells in the uv space for a
given observation. The selection of the uv cell size is
somewhat arbitrary, but should probably be roughly half the
antenna diameter?
The earliest place I can find where this metric is actually
explicitly calculated is Bob Hjellming's MMA memo 30, where he
tabulates Nocc/Ntheo, which is equivalent to the fraction of
occupied cells. However, the idea is certainly older than that,
i.e., that one desires "complete" uv coverage in observations
(Cornwell maybe instigated this in MMA thinking, but it is
probably much older than that, harkening back to the old Ryle
idea of complete coverage...). More recently, Holdaway & Morita
have used this measure.
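As a concrete sketch, the calculation might look like this (Python/numpy;
the function name, the choice of units, and the inclusion of the Hermitian
conjugate points are my own choices here, not anything we've agreed on):

```python
import numpy as np

def occupied_cell_fraction(u, v, umax, cell):
    """Fraction of uv cells (in a square of half-width umax) that
    contain at least one visibility sample. u, v, umax, and cell
    must all be in the same units (e.g., meters)."""
    n = int(np.ceil(2 * umax / cell))
    # include Hermitian conjugate points, since V(-u,-v) = V*(u,v)
    uu = np.concatenate([u, -u])
    vv = np.concatenate([v, -v])
    iu = np.floor((uu + umax) / cell).astype(int)
    iv = np.floor((vv + umax) / cell).astype(int)
    ok = (iu >= 0) & (iu < n) & (iv >= 0) & (iv < n)
    occ = np.zeros((n, n), dtype=bool)
    occ[iu[ok], iv[ok]] = True
    return occ.sum() / (n * n)
```

The cell size (half the antenna diameter, or whatever we settle on) just
goes in as the `cell` argument.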
2. large-scale "smoothness" of cell population (I borrowed the
terminology [and description] from Ed Fomalont).
from Ed's email:
I suggest 'gridding' the uv plane into large cells, say 10x10
original cells for the smaller configurations and 100x100 for the
larger configurations, and determining the data weight in each of
these big cells. By data weight I mean the integration time of
the data in each of these cells, but other weighting schemes
could be used.
For a 'good' array, this average uv coverage should be a smoothly
decreasing function of distance from zero spacing, the smoother
the better. Fit the distribution of these uv densities to the
best elliptical Gaussian (maybe something else is better like a
density related to the inverse distance from the center). The
rms deviation of the average uv distribution to this best
fitting Gaussian is a measure of the overall smoothness of the
actual uv coverage.
A normalized metric which measures this overall smoothness of the
uv coverage could be:
M2 = SUM(i){ [W(i) - E(i)]**2 } / (NT*NT)
W(i) is the weight of data in the ith big cell
E(i) is the weight of the best fitting elliptical Gaussian to
the distribution of W(i) over the uv plane.
(any tapering of the data included before this gridding)
NT is the total weight of data.
Of course, all array uv coverage will have a central hole with a
size of the diameter of the array telescope. This hole can or
can not be included in the calculation of this metric, whatever
is felt most appropriate.
end of this topic in Ed's email.
I note here that rather than fitting an elliptical Gaussian, one
might wish to use some other function, e.g., uniform (flat),
Blackman-Harris, Kaiser-Bessel, etc. Also, as a simplification,
one might wish to do azimuthal binning in uv space, to obtain a
single radial profile, rather than doing the full 2-D fit and
deviation.
3. detailed (smaller-scale) "lumpiness" of cell population
(terminology and description again borrowed from Ed).
from Ed's email:
For each of the big uv cells which were used in the above
smoothness calculation, calculate the following:
M3 = SUM(i){ SUM(j){ [W(i)/n - w(i,j)]**2 } } / (NT*NT)
w(i,j) is the weight of the jth uv cell in the ith big cell
n is the number of little cells in each big cell
end of this topic in Ed's email.
Another way of defining this is, e.g., calculating something like
what they did for the VLBA "quality metric" - calculate for each
uv cell the distance to the nearest uv data point, square it,
and sum this over all cells. Do this for various declinations,
then sum over those for the overall metric.
An even simpler proxy of this might be to calculate the size of
the largest "hole" in the uv plane. IIRC, Adrian Webster did
some of this when looking at designs of the most compact
configuration.
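The VLBA-style variant is easy to state in code (a sketch, in Python with
scipy's kd-tree for the nearest-neighbor searches; cell size and units are
again placeholders):

```python
import numpy as np
from scipy.spatial import cKDTree

def hole_metric(u, v, umax, cell):
    """For each uv cell center in a square of half-width umax,
    find the squared distance to the nearest visibility sample,
    and sum over cells. Larger values mean bigger unsampled
    holes in the uv plane."""
    # include Hermitian conjugate points
    uu = np.concatenate([u, -u])
    vv = np.concatenate([v, -v])
    tree = cKDTree(np.column_stack([uu, vv]))
    g = np.arange(-umax + cell / 2, umax, cell)   # cell centers
    gu, gv = np.meshgrid(g, g)
    centers = np.column_stack([gu.ravel(), gv.ravel()])
    d, _ = tree.query(centers)
    return np.sum(d ** 2)
```

The largest-hole proxy would just take `d.max()` instead of the sum of
squares.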
4. the Visibility SNR (VSNR) curve.
Defined in Cornwell et al. (1993) [but see also Holdaway 1990].
Take the FT of the difference image, average in radial bins,
and divide this into the radially binned FT of the model.
This is really a hybrid between uv- and image-based metrics, but
since what comes out is defined in the uv plane, I put it in here
with the uv-based metrics. Note that this metric *is* explicitly
source structure dependent, unlike the previous 3, and it is a bit
trickier/more complicated to compute.
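For definiteness, a sketch of the computation (Python/numpy; I've assumed
the images are 2-D arrays at matched resolution, and the binning details
are placeholders):

```python
import numpy as np

def vsnr_curve(model, reconstruction, nbins=16):
    """Visibility SNR curve: the radially binned FT amplitude of
    the model divided by that of the difference image
    (model - reconstruction), as a function of uv radius."""
    M = np.abs(np.fft.fftshift(np.fft.fft2(model)))
    D = np.abs(np.fft.fftshift(np.fft.fft2(model - reconstruction)))
    ny, nx = model.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(x - nx // 2, y - ny // 2)
    edges = np.linspace(0, r.max() * (1 + 1e-9), nbins + 1)
    idx = np.digitize(r.ravel(), edges) - 1
    msum = np.bincount(idx, M.ravel(), minlength=nbins)[:nbins]
    dsum = np.bincount(idx, D.ravel(), minlength=nbins)[:nbins]
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(dsum > 0, msum / dsum, np.inf)
```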
image-based Metrics
I would divide these into 2 subclasses: beam-based, and true
image-based.
beam-based
As with the uv-based metrics, the attractiveness of beam-based
metrics is that they are not dependent on the source structure, and
are also simple to calculate.
1. amplitude of maximum positive sidelobe.
Here, there can be a distinction between "near-in" and
"far-out" sidelobes if one wishes (and there probably should
be, I guess?). Note that the amplitude of the maximum
negative sidelobe is bounded by 1/(N-1) for N antennas - at
least in the case of natural weighting.
2. beam sidelobe rms
This can be analytically defined in uv space also, so is kind
of a hybrid between a uv- and a beam-based metric (see e.g.,
Cornwell 1984). As for the maximum sidelobe, the "near-in"
vs. "far-out" distinction can be made.
3. how close is the central lobe of the beam to a Gaussian?
4. what is the relative amount of "power" in the central lobe
to that in the sidelobes?
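Metrics 1 and 2 are straightforward given a dirty beam image; a sketch
(Python/numpy; the circular main-lobe exclusion region in pixels is my own
simplification of the "near-in" vs. "far-out" business):

```python
import numpy as np

def sidelobe_stats(beam, main_lobe_radius):
    """Peak positive sidelobe and sidelobe rms of a dirty beam,
    excluding a circular region of main_lobe_radius pixels around
    the beam peak. beam should be normalized to a peak of 1.0."""
    py, px = np.unravel_index(np.argmax(beam), beam.shape)
    y, x = np.indices(beam.shape)
    outside = np.hypot(x - px, y - py) > main_lobe_radius
    side = beam[outside]
    return side.max(), np.sqrt(np.mean(side ** 2))
```

Splitting near-in from far-out sidelobes would just mean applying two
different annular masks instead of one exclusion disk.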
image-based
In my opinion, the attractiveness of image-based metrics is that
they are more intuitive to us as astronomers, i.e., we are
accustomed to dealing with images, and used to seeing the errors
associated with, e.g., incomplete uv plane sampling. We are also
accustomed to dealing with some of these metrics (the dynamic range,
e.g.) directly. Also, at least some of them tend to be simple to
calculate.
1. Dynamic Range
Usually the astronomer defines this as the peak in the image
to the off-source rms. One could also do this using the
on-source rms, but that would be done separately to the
off-source calculation (it makes no sense to me to combine the
two).
We have had some discussion about how to define "on-source"
vs. "off-source". I must admit that I don't see the problem
in this - we have the models which went into the simulations,
so just pick some level which is less than the desired final
dynamic range (IIRC, we've spec'ed this at 10^6) and define
those pixels in the model with flux density > that cutoff as
"on-source". This is no more arbitrary than the definition of
the models which go into the simulation in the first place,
IMHO.
As a subclass of this, it might also be interesting to find
the peak (both positive and negative) off-source, in addition
to the rms (since [at least with VLA data] the off-source
noise is often non-Gaussian). This gives some indication of
the possibility of "false-detections."
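With the model-based cutoff definition of "off-source", the calculation is
trivial; a sketch (Python/numpy; names and the cutoff convention are mine):

```python
import numpy as np

def dynamic_range(image, model, cutoff):
    """Off-source dynamic range: image peak over the rms of the
    'off-source' pixels, defined as those where the (convolved)
    input model is below `cutoff`. Also returns the extreme
    off-source values, as an indicator of possible
    false detections."""
    off = model < cutoff
    rms = np.sqrt(np.mean(image[off] ** 2))
    return image.max() / rms, image[off].max(), image[off].min()
```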
2. Fidelity
A generic description of this quantity is that it is defined
for a given pixel as the ratio of the flux density in the input
model at that pixel to the absolute value of the flux density
in the difference image (the input model [convolved to the
correct resolution] minus the simulated/restored image) at
that pixel. In general, this only makes sense for "on-source"
pixels, and in practice, this might involve some lower level
cutoff in the difference image.
Now, combinations over pixels can be formed, in order to come
up with one (or a few) numbers which attempt to quantify the
whole image. In the simplest case, one might take all of the
"on-source" pixel fidelities, and take the median. In more
complicated cases, one could consider taking only pixels above
some flux density (probably ratioed to the peak), and
calculating the median of that set of pixels - repeat this for
many different levels and a histogram can be constructed.
I would also suggest that in each of these histogram bins we
calculate the min fidelity as well as the median. Mark
Holdaway has suggested a possible variant of this which he
calls "moment fidelity" - in this case, the fidelity is
weighted by the flux density at that pixel in the convolved
model:
f_i = w_i * model / abs( model - reconstruction )
where w_i = F_i / sum_j{F_j}, i.e., w_i is the flux density in
pixel i normalized by the total flux density summed over all
pixels. Another possibility is to take the mean fidelity at
different spatial scales - this allows you to find, e.g.,
striping (as the logical conclusion of this, just take the FT
of the difference image and analyze that).
3. fractional error
Just take, for each "on-source" pixel, the fractional error as
the inverse of the fidelity. Then, quantities as described
above for fidelity can be calculated in a similar way. This
avoids a divide by 0 problem when the reconstructed image is
exactly equal to the convolved model (infinite fidelity).
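A sketch of the histogram calculation, done in terms of fractional error
as I prefer (Python/numpy; the cutoff levels as fractions of the model
peak, and the median/max pair per bin, are my own illustrative choices):

```python
import numpy as np

def fractional_error_stats(model, reconstruction, levels):
    """For each flux-density cutoff in `levels` (as fractions of
    the model peak), take the 'on-source' pixels (model above the
    cutoff) and return (level, median, max) of the fractional
    error |model - reconstruction| / model, i.e., the inverse of
    the fidelity. model is the input model convolved to the
    restored resolution."""
    err = np.abs(model - reconstruction)
    peak = model.max()
    out = []
    for lev in levels:
        on = model > lev * peak
        fe = err[on] / model[on]
        out.append((lev, np.median(fe), fe.max()))
    return out
```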
4. Dave Woody has made a suggestion:
What about doing a simple linear fit of the (diff-map)^2 to
A + B*(original simulation image)^2 ?
1/sqrt(B) would be interpreted as the fidelity, i.e., the
errors in the map that are proportional to the image.
1/sqrt(A) would be the "off-source" dynamic range.
This fit should not be computationally time consuming or
difficult to code.
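Indeed it isn't; a sketch of Dave's fit as an ordinary linear least
squares problem (Python/numpy; the function name is mine):

```python
import numpy as np

def woody_fit(model, reconstruction):
    """Least-squares fit of the squared difference image to
    A + B * model**2. Returns (1/sqrt(A), 1/sqrt(B)):
    1/sqrt(A) plays the role of an off-source dynamic range,
    1/sqrt(B) of a fidelity (errors proportional to the image)."""
    d2 = ((model - reconstruction) ** 2).ravel()
    m2 = (model ** 2).ravel()
    # linear least squares: d2 ~ A + B*m2
    X = np.column_stack([np.ones_like(m2), m2])
    (A, B), *_ = np.linalg.lstsq(X, d2, rcond=None)
    dr = 1 / np.sqrt(A) if A > 0 else np.inf
    fid = 1 / np.sqrt(B) if B > 0 else np.inf
    return dr, fid
```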
5. ability to distinguish near-by multiple sources
This is a metric that we haven't really discussed before, but
is a very standard one in the discussion of filters in signal
processing. The point is that even though two configurations
may have the same "resolution" (which we generally take as the
full-width of the best fit Gaussian to the central lobe of the
synthesized beam), one may still be better than the other at
distinguishing two very near-by point sources. One might be
able to analytically define this in uv space, but it would be
relatively easy to make a simulation image which would test
this property (a modification of John Conway's DOTS image,
with more well-defined [rather than random] point source
placement). The metric which comes out of this is the minimum
detectable separation for two point sources.
A modification of this test is to have one of the point sources
be much stronger than the other (10000:1 or even more?). Also,
issues of whether having the point sources centered on pixels
or not could be explored.
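As a toy illustration of what the test measures (this is a crude 1-D
stand-in, not the image-plane simulation described above; the dip
criterion, threshold, and coarse pixel grid are all my own assumptions):

```python
import numpy as np

def resolvable(sep_pix, beam_sigma, ratio=1.0, thresh=0.95):
    """Two Gaussian point responses of flux 1 and `ratio`,
    separated by sep_pix pixels on a coarse 1-D grid; call them
    resolved if the profile at the midpoint dips below `thresh`
    times the lower of the two local peaks."""
    x = np.arange(-50.0, 50.0)
    g = lambda c, a: a * np.exp(-0.5 * ((x - c) / beam_sigma) ** 2)
    prof = g(-sep_pix / 2, 1.0) + g(sep_pix / 2, ratio)
    mid = prof[np.abs(x) < 1].min()       # value near the midpoint
    p1 = prof[x < 0].max()                # left-side peak
    p2 = prof[x > 0].max()                # right-side peak
    return mid < thresh * min(p1, p2)
```

The metric proper would be the smallest `sep_pix` for which this returns
True, as a function of the flux ratio.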
6. random uv generation
Mel Wright used a method where he randomly sampled the uv
plane with some number of data points, then compared that to
the original image (with both point source and eye chart
models). This suggests to me a possible metric, where a given
configuration is compared against either completely random uv
sampling, or possibly random antenna placement.
Recommendations
My recommendation is to use the following set of metrics:
1 - All uv-based except the VSNR, and numbers 1 and 2 of the
beam-based.
I think we should do these because they are easy, and they are
source independent. We can sort out the details after agreeing
that we actually want to calculate them (e.g., exactly how to pick
size and extent of uv cells, details of the "smoothness" and
"lumpiness" metrics, where to set the cutoff between near-in and
far-out sidelobes if we want to, etc.).
The real reason to include these is to see if there is a uv-based
metric that is a good proxy for the image-based metrics. In this
way, a uv-based metric might be identified that was as good at
distinguishing as any of the image-based ones, and the benefit of
source structure independence would then be retained through its
use.
2 - Dynamic range (using off-source rms).
The reason to include this is because it is something that
astronomers are used to, and is the only metric in real
observations that means anything (because we cannot measure true
"fidelity" in reality, e.g.). We can decide on details after
agreeing to calculate this beast (specifically how to specify
"on-source" vs. "off-source", e.g.).
3 - Histogram of fractional error vs. pixel flux density.
The reason to include this is because it is probably the best
indicator of true imaging quality. The problem is that the metric
is heavily biased toward the particular source being modeled (we
have already discussed the impact of this WRT short spacings).
Note that I prefer fractional error instead of fidelity, due to
reasons we have already discussed. We can decide on details after
agreeing to calculate this beast (how many bins, how to specify
them, etc.).
We should use these in a first iteration, then see if there is one (or
a few) that are particularly good at indicating "quality". This is a
bit slippery, as it is unclear how to absolutely define quality (which
is why we are having all of this extended discussion in the first
place), but I think we should start with a larger set of possible
metrics and gain some experience with them, and then narrow it down at
a (not so far away) future date.
References
Ryle & Hewish, The Synthesis of Large Radio Telescopes, MNRAS, 1960
Harris, On the Use of Windows for Harmonic Analysis with the Discrete
Fourier Transform, Proc. IEEE, 66, 51-83, 1978
Mutel & Gaume, A Design Study for a Dedicated VLBI Array,
VLBA Memo 84, 1982
Walker, Fast Quality Measure, VLBA Memo 144, 1982
Cornwell, Quality Indicators for the MM Array, MMA Memo 18, 1984
Hjellming, The 90 meter Configuration of the Proposed NRAO mm Array,
MMA Memo 30, 1985
Cornwell, Crystalline Antenna Arrays, MMA Memo 38, 1986
Holdaway, Imaging Characteristics of a Homogeneous Millimeter Array,
MMA Memo 61, 1990
Holdaway, Evaluating the MMA Compact Configuration Designs,
MMA Memo 81, 1992
Cornwell, Holdaway, & Uson, Radio-interferometric imaging of very large
objects: implications for array design, A&A, 271, 697-713, 1993
Holdaway, Foster, & Morita, Fitting a 12km Configuration on the
Chajnantor Site, MMA Memo 153, 1996
Holdaway, What Fourier Plane Coverage is Right for the MMA?,
MMA Memo 156, 1996
Keto, The Shapes of Cross-Correlation Interferometers, ApJ, 475,
843-852, 1997
Holdaway, Effects of Pointing Errors on Mosaic Images with 8m, 12m, and
15m Dishes, MMA Memo 178, 1997
Helfer & Holdaway, Design Concepts for Strawperson Antenna
Configurations for the MMA, MMA Memo 198, 1998
Holdaway, Hour Angle Ranges for Configuration Optimization,
MMA Memo 201, 1998
Wright, Image Fidelity, BIMA memo 73, 1999
there are a couple of more obscure references to Morita's work that I
couldn't get proper references for or full copies of but which probably
have relevant information:
Morita, Array Configuration of Large Radio Interferometers for
Astronomical Observations, National Astronomical Observatory,
NRO-TR-56, 1997
Morita, Ishiguro, & Holdaway, Array Configuration of the Large
Millimeter and Submillimeter Array (LMSA), URSI-GA, 1996