[alma-config] thoughts on simulation metrics
Bryan Butler
bbutler at aoc.nrao.edu
Mon Jan 29 16:47:47 EST 2001
all,
please find below a summary of metrics discussions from my perspective,
with my own comments and thoughts interjected.
i make some recommendations at the end. if you disagree strongly with
one of the suggested metrics, or are upset that some other metric isn't
included (or even discussed), let me (and the group) know.
i think we should come to an agreement by the end of the phone
telecon on wednesday if at all possible.
-bryan
Introduction
We now have at least 3 methods of simulating ALMA imaging with
different configurations (Heddle's in AIPS, Viallefond/Guilloteau's in
GILDAS, and Holdaway/Morita's in SDE), with more possibly on the
horizon (AIPS++ hopefully). I may have left some out (miriad, e.g.),
but the point is just that we do have the capability now to do some
real imaging simulations and compare the results.
The question now is how do we make comparisons between the different
arrays, given the results of the simulations? One way, which has been
already used to some extent on the results of Heddle's simulations, is
to simply qualitatively examine the images and differenced images, and
make statements regarding the relative "quality" of the different
configurations based on that. All along, we have also had the idea
that eventually we come up with some standard _metrics_ which would be
quantitatively based and would be used for true comparison. We seem,
however, to have gotten bogged down in the details, and have failed to
come to any consensus as to which metric (or combination of them) to
use. This is partly because nobody has been willing to assume the
leadership in this respect, and so we have waffled along with little
progress.
Separately from this, there is the fact that a better metric of this
type does not necessarily imply a better overall configuration
design - most importantly because of what we have been calling
"operational issues." I don't know how to rationally address that
issue, because it quickly degenerates into a subjective and political
one, and is hard to get a good handle on. However, I will note that we
have had no good exposition of exactly what "operational issues" really
entails in its entirety (the one issue that gets stressed is how often
we move antennas, but what are the others?), much less whether these
issues favor one configuration type over another. Even after such an
exposition, it is still useful to have the information on which is the
better configuration "scientifically" (I use the term broadly), so that
some weighting of the relative importance of operational vs. scientific
quality can be specified, and a final decision on configuration type
can be made. I note that there are other issues which affect the
scientific quality of a configuration design which cannot be strictly
set down as a quantitative measure in the way that I'm discussing here
(as a simple e.g., repeatability of antenna positions for monitoring
observations), and they will have to be folded into the discussion in
the end. At any rate, I think it still behooves us to come up with the
quantitative metrics as soon as possible, and hence I will address some
of them here, and then make a real suggestion at the end.
Types of Metrics
They generally fall into 2 categories: uv-based metrics, and
image-based metrics. We have been recently concentrating on the
image-plane ones, but I don't think we should necessarily completely
forget about the uv-based ones.
uv-based Metrics
In my opinion, the attractiveness of uv-based metrics is that they are
not dependent on the source structure, but rather only on the source
declination and hour angle range of observation. Also, they tend to
be simple to calculate.
1. fraction of occupied cells
Simply count the number of occupied cells in the uv space for a
given observation. The selection of the uv cell size is
somewhat arbitrary, but should probably be roughly half the
antenna diameter?
The earliest place I can find where this metric is actually
explicitly calculated is Bob Hjellming's MMA memo 30, where he
tabulates Nocc/Ntheo, which is equivalent to the fraction of
occupied cells. However, the idea is certainly older than that,
i.e., that one desires "complete" uv coverage in observations
(Cornwell maybe instigated this in MMA thinking, but it is
probably much older than that, harkening back to the old Ryle
idea of complete coverage...). More recently, Holdaway & Morita
have used this measure.
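As a concrete sketch, the calculation might look like this (Python/numpy;
the function name, the choice of units, and the inclusion of the Hermitian
conjugate points are my own choices here, not anything we've agreed on):

```python
import numpy as np

def occupied_cell_fraction(u, v, umax, cell):
    """Fraction of uv cells (in a square of half-width umax) that
    contain at least one visibility sample. u, v, umax, and cell
    must all be in the same units (e.g., meters)."""
    n = int(np.ceil(2 * umax / cell))
    # include Hermitian conjugate points, since V(-u,-v) = V*(u,v)
    uu = np.concatenate([u, -u])
    vv = np.concatenate([v, -v])
    iu = np.floor((uu + umax) / cell).astype(int)
    iv = np.floor((vv + umax) / cell).astype(int)
    ok = (iu >= 0) & (iu < n) & (iv >= 0) & (iv < n)
    occ = np.zeros((n, n), dtype=bool)
    occ[iu[ok], iv[ok]] = True
    return occ.sum() / (n * n)
```

The cell size (half the antenna diameter, or whatever we settle on) just
goes in as the `cell` argument.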
2. large-scale "smoothness" of cell population (I borrowed the
terminology [and description] from Ed Fomalont).
from Ed's email:
I suggest 'gridding' the uv plane into large cells, say 10x10
original cells for the smaller configurations and 100x100 for the
larger configurations, and determining the data weight in each of
these big cells. By data weight I mean the integration time of
the data in each of these cells, but other weighting schemes
could be used.
For a 'good' array, this average uv coverage should be a smoothly
decreasing function of distance from zero spacing, the smoother
the better. Fit the distribution of these uv densities to the
best elliptical Gaussian (maybe something else is better like a
density related to the inverse distance from the center). The
rms deviation of the average uv distribution to this best
fitting Gaussian is a measure of the overall smoothness of the
actual uv coverage.
A normalized metric which measures this overall smoothness of the
uv coverage could be:
M2 = SUM(i){ [W(i) - E(i)]**2 } / (NT*NT)
W(i) is the weight of data in the ith big cell
E(i) is the weight of the best fitting elliptical Gaussian to
the distribution of W(i) over the uv plane.
(any tapering of the data included before this gridding)
NT is the total weight of data.
Of course, all array uv coverage will have a central hole with a
size of the diameter of the array telescope. This hole can or
can not be included in the calculation of this metric, whatever
is felt most appropriate.
end of this topic in Ed's email.
I note here that rather than fitting an elliptical Gaussian, one
might wish to use some other function, e.g., uniform (flat),
Blackman-Harris, Kaiser-Bessel, etc. Also, as a simplification,
one might wish to do azimuthal binning in uv space, to obtain a
single radial profile, rather than doing the full 2-D fit and
deviation.
3. detailed (smaller-scale) "lumpiness" of cell population
(terminology and description again borrowed from Ed).
from Ed's email:
For each of the big uv cells which were used in the above
smoothness calculation, calculate the following:
M3 = SUM(i){ SUM(j){ [W(i)/n - w(i,j)]**2 } } / (NT*NT)
w(i,j) is the weight of the jth uv cell in the ith big cell
n is the number of little cells in each big cell
end of this topic in Ed's email.
Another way of defining this is, e.g., calculating something like
what they did for the VLBA "quality metric" - calculate for each
uv cell the distance to the nearest uv data point, square it,
and sum this over all cells. Do this for various declinations,
then sum over those for the overall metric.
An even simpler proxy of this might be to calculate the size of
the largest "hole" in the uv plane. IIRC, Adrian Webster did
some of this when looking at designs of the most compact
configuration.
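The VLBA-style variant is easy to state in code (a sketch, in Python with
scipy's kd-tree for the nearest-neighbor searches; cell size and units are
again placeholders):

```python
import numpy as np
from scipy.spatial import cKDTree

def hole_metric(u, v, umax, cell):
    """For each uv cell center in a square of half-width umax,
    find the squared distance to the nearest visibility sample,
    and sum over cells. Larger values mean bigger unsampled
    holes in the uv plane."""
    # include Hermitian conjugate points
    uu = np.concatenate([u, -u])
    vv = np.concatenate([v, -v])
    tree = cKDTree(np.column_stack([uu, vv]))
    g = np.arange(-umax + cell / 2, umax, cell)   # cell centers
    gu, gv = np.meshgrid(g, g)
    centers = np.column_stack([gu.ravel(), gv.ravel()])
    d, _ = tree.query(centers)
    return np.sum(d ** 2)
```

The largest-hole proxy would just take `d.max()` instead of the sum of
squares.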
4. the Visibility SNR (VSNR) curve.
Defined in Cornwell et al. (1993) [but see also Holdaway 1990].
Take the FT of the difference image, average in radial bins,
and divide this into the radially binned FT of the model.
This is really a hybrid between uv- and image-based metrics, but
since what comes out is defined in the uv plane, I put it in here
with the uv-based metrics. Note that this metric *is* explicitly
source structure dependent, unlike the previous 3, and it is a bit
trickier/more complicated to compute.
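For definiteness, a sketch of the computation (Python/numpy; I've assumed
the images are 2-D arrays at matched resolution, and the binning details
are placeholders):

```python
import numpy as np

def vsnr_curve(model, reconstruction, nbins=16):
    """Visibility SNR curve: the radially binned FT amplitude of
    the model divided by that of the difference image
    (model - reconstruction), as a function of uv radius."""
    M = np.abs(np.fft.fftshift(np.fft.fft2(model)))
    D = np.abs(np.fft.fftshift(np.fft.fft2(model - reconstruction)))
    ny, nx = model.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(x - nx // 2, y - ny // 2)
    edges = np.linspace(0, r.max() * (1 + 1e-9), nbins + 1)
    idx = np.digitize(r.ravel(), edges) - 1
    msum = np.bincount(idx, M.ravel(), minlength=nbins)[:nbins]
    dsum = np.bincount(idx, D.ravel(), minlength=nbins)[:nbins]
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(dsum > 0, msum / dsum, np.inf)
```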
image-based Metrics
I would divide these into 2 subclasses: beam-based, and true
image-based.
beam-based
As with the uv-based metrics, the attractiveness of beam-based
metrics is that they are not dependent on the source structure, and
are also simple to calculate.
1. amplitude of maximum positive sidelobe.
Here, there can be a distinction between "near-in" and
"far-out" sidelobes if one wishes (and there probably should
be, I guess?). Note that the amplitude of the maximum
negative sidelobe is bounded by 1/(N-1) for N antennas - at
least in the case of natural weighting.
2. beam sidelobe rms
This can be analytically defined in uv space also, so is kind
of a hybrid between a uv- and a beam-based metric (see e.g.,
Cornwell 1984). As for the maximum sidelobe, the "near-in"
vs. "far-out" distinction can be made.
3. how close is the central lobe of the beam to a Gaussian?
4. what is the relative amount of "power" in the central lobe
to that in the sidelobes?
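Metrics 1 and 2 are straightforward given a dirty beam image; a sketch
(Python/numpy; the circular main-lobe exclusion region in pixels is my own
simplification of the "near-in" vs. "far-out" business):

```python
import numpy as np

def sidelobe_stats(beam, main_lobe_radius):
    """Peak positive sidelobe and sidelobe rms of a dirty beam,
    excluding a circular region of main_lobe_radius pixels around
    the beam peak. beam should be normalized to a peak of 1.0."""
    py, px = np.unravel_index(np.argmax(beam), beam.shape)
    y, x = np.indices(beam.shape)
    outside = np.hypot(x - px, y - py) > main_lobe_radius
    side = beam[outside]
    return side.max(), np.sqrt(np.mean(side ** 2))
```

Splitting near-in from far-out sidelobes would just mean applying two
different annular masks instead of one exclusion disk.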
image-based
In my opinion, the attractiveness of image-based metrics is that
they are more intuitive to us as astronomers, i.e., we are
accustomed to dealing with images, and used to seeing the errors
associated with, e.g., incomplete uv plane sampling. We are also
accustomed to dealing with some of these metrics (the dynamic range,
e.g.) directly. Also, at least some of them tend to be simple to
calculate.
1. Dynamic Range
Usually the astronomer defines this as the peak in the image
to the off-source rms. One could also do this using the
on-source rms, but that would be done separately to the
off-source calculation (it makes no sense to me to combine the
two).
We have had some discussion about how to define "on-source"
vs. "off-source". I must admit that I don't see the problem
in this - we have the models which went into the simulations,
so just pick some level which is less than the desired final
dynamic range (IIRC, we've spec'ed this at 10^6) and define
those pixels in the model with flux density > that cutoff as
"on-source". This is no more arbitrary than the definition of
the models which go into the simulation in the first place,
IMHO.
As a subclass of this, it might also be interesting to find
the peak (both positive and negative) off-source, in addition
to the rms (since [at least with VLA data] the off-source
noise is often non-Gaussian). This gives some indication of
the possibility of "false-detections."
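With the model-based cutoff definition of "off-source", the calculation is
trivial; a sketch (Python/numpy; names and the cutoff convention are mine):

```python
import numpy as np

def dynamic_range(image, model, cutoff):
    """Off-source dynamic range: image peak over the rms of the
    'off-source' pixels, defined as those where the (convolved)
    input model is below `cutoff`. Also returns the extreme
    off-source values, as an indicator of possible
    false detections."""
    off = model < cutoff
    rms = np.sqrt(np.mean(image[off] ** 2))
    return image.max() / rms, image[off].max(), image[off].min()
```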
2. Fidelity
A generic description of this quantity is that it is defined
for a given pixel as the ratio of the flux density in the input
model at that pixel to the absolute value of the flux density
in the difference image (the input model [convolved to the
correct resolution] minus the simulated/restored image) at
that pixel. In general, this only makes sense for "on-source"
pixels, and in practice, this might involve some lower level
cutoff in the difference image.
Now, combinations over pixels can be formed, in order to come
up with one (or a few) numbers which attempt to quantify the
whole image. In the simplest case, one might take all of the
"on-source" pixel fidelities, and take the median. In more
complicated cases, one could consider taking only pixels above
some flux density (probably ratioed to the peak), and
calculating the median of that set of pixels - repeat this for
many different levels and a histogram can be constructed.
I would also suggest that in each of these histogram bins we
calculate the min fidelity as well as the median. Mark
Holdaway has suggested a possible variant of this which he
calls "moment fidelity" - in this case, the fidelity is
weighted by the flux density at that pixel in the convolved
model:
f_i = w_i * model / abs( model - reconstruction )
where w_i = F_i / sum_j{F_j}, i.e., w_i is the flux density in
pixel i normalized by the total flux density summed over all
pixels. Another possibility is to take the mean fidelity at
different spatial scales - this allows you to find, e.g.,
striping (as the logical conclusion of this, just take the FT
of the difference image and analyze that).
3. fractional error
Just take, for each "on-source" pixel, the fractional error as
the inverse of the fidelity. Then, quantities as described
above for fidelity can be calculated in a similar way. This
avoids a divide by 0 problem when the reconstructed image is
exactly equal to the convolved model (infinite fidelity).
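A sketch of the histogram calculation, done in terms of fractional error
as I prefer (Python/numpy; the cutoff levels as fractions of the model
peak, and the median/max pair per bin, are my own illustrative choices):

```python
import numpy as np

def fractional_error_stats(model, reconstruction, levels):
    """For each flux-density cutoff in `levels` (as fractions of
    the model peak), take the 'on-source' pixels (model above the
    cutoff) and return (level, median, max) of the fractional
    error |model - reconstruction| / model, i.e., the inverse of
    the fidelity. model is the input model convolved to the
    restored resolution."""
    err = np.abs(model - reconstruction)
    peak = model.max()
    out = []
    for lev in levels:
        on = model > lev * peak
        fe = err[on] / model[on]
        out.append((lev, np.median(fe), fe.max()))
    return out
```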
4. Dave Woody has made a suggestion:
What about doing a simple linear fit of the (diff-map)^2 to
A + B*(original simulation image)^2 ?
1/sqrt(B) would be interpreted as the fidelity, i.e., the
errors in the map that are proportional to the image.
1/sqrt(A) would be the "off-source" dynamic range.
This fit should not be computationally time consuming or
difficult to code.
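Indeed it isn't; a sketch of Dave's fit as an ordinary linear least
squares problem (Python/numpy; the function name is mine):

```python
import numpy as np

def woody_fit(model, reconstruction):
    """Least-squares fit of the squared difference image to
    A + B * model**2. Returns (1/sqrt(A), 1/sqrt(B)):
    1/sqrt(A) plays the role of an off-source dynamic range,
    1/sqrt(B) of a fidelity (errors proportional to the image)."""
    d2 = ((model - reconstruction) ** 2).ravel()
    m2 = (model ** 2).ravel()
    # linear least squares: d2 ~ A + B*m2
    X = np.column_stack([np.ones_like(m2), m2])
    (A, B), *_ = np.linalg.lstsq(X, d2, rcond=None)
    dr = 1 / np.sqrt(A) if A > 0 else np.inf
    fid = 1 / np.sqrt(B) if B > 0 else np.inf
    return dr, fid
```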
5. ability to distinguish near-by multiple sources
This is a metric that we haven't really discussed before, but
is a very standard one in the discussion of filters in signal
processing. The point is that even though two configurations
may have the same "resolution" (which we generally take as the
full-width of the best fit Gaussian to the central lobe of the
synthesized beam), one may still be better than the other at
distinguishing two very near-by point sources. One might be
able to analytically define this in uv space, but it would be
relatively easy to make a simulation image which would test
this property (a modification of John Conway's DOTS image,
with more well-defined [rather than random] point source
placement). The metric which comes out of this is the minimum
detectable separation for two point sources.
A modification of this test is to have one of the point sources
be much stronger than the other (10000:1 or even more?). Also,
issues of whether having the point sources centered on pixels
or not could be explored.
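As a toy illustration of what the test measures (this is a crude 1-D
stand-in, not the image-plane simulation described above; the dip
criterion, threshold, and coarse pixel grid are all my own assumptions):

```python
import numpy as np

def resolvable(sep_pix, beam_sigma, ratio=1.0, thresh=0.95):
    """Two Gaussian point responses of flux 1 and `ratio`,
    separated by sep_pix pixels on a coarse 1-D grid; call them
    resolved if the profile at the midpoint dips below `thresh`
    times the lower of the two local peaks."""
    x = np.arange(-50.0, 50.0)
    g = lambda c, a: a * np.exp(-0.5 * ((x - c) / beam_sigma) ** 2)
    prof = g(-sep_pix / 2, 1.0) + g(sep_pix / 2, ratio)
    mid = prof[np.abs(x) < 1].min()       # value near the midpoint
    p1 = prof[x < 0].max()                # left-side peak
    p2 = prof[x > 0].max()                # right-side peak
    return mid < thresh * min(p1, p2)
```

The metric proper would be the smallest `sep_pix` for which this returns
True, as a function of the flux ratio.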
6. random uv generation
Mel Wright used a method where he randomly sampled the uv
plane with some number of data points, then compared that to
the original image (with both point source and eye chart
models). This suggests to me a possible metric, where a given
configuration is compared against either completely random uv
sampling, or possibly random antenna placement.
Recommendations
My recommendation is to use the following set of metrics:
1 - All uv-based except the VSNR, and numbers 1 and 2 of the
beam-based.
I think we should do these because they are easy, and they are
source independent. We can sort out the details after agreeing
that we actually want to calculate them (e.g., exactly how to pick
size and extent of uv cells, details of the "smoothness" and
"lumpiness" metrics, where to set the cutoff between near-in and
far-out sidelobes if we want to, etc.).
The real reason to include these is to see if there is a uv-based
metric that is a good proxy for the image-based metrics. In this
way, a uv-based metric might be identified that was as good at
distinguishing as any of the image-based ones, and the benefit of
source structure independence would then be retained through its
use.
2 - Dynamic range (using off-source rms).
The reason to include this is because it is something that
astronomers are used to, and is the only metric in real
observations that means anything (because we cannot measure true
"fidelity" in reality, e.g.). We can decide on details after
agreeing to calculate this beast (specifically how to specify
"on-source" vs. "off-source", e.g.).
3 - Histogram of fractional error vs. pixel flux density.
The reason to include this is because it is probably the best
indicator of true imaging quality. The problem is that the metric
is heavily biased toward the particular source being modeled (we
have already discussed the impact of this WRT short spacings).
Note that I prefer fractional error instead of fidelity, due to
reasons we have already discussed. We can decide on details after
agreeing to calculate this beast (how many bins, how to specify
them, etc.).
We should use these in a first iteration, then see if there is one (or
a few) that are particularly good at indicating "quality". This is a
bit slippery, as it is unclear how to absolutely define quality (which
is why we are having all of this extended discussion in the first
place), but I think we should start with a larger set of possible
metrics and gain some experience with them, and then narrow it down at
a (not so far away) future date.
References
Ryle & Hewish, The Synthesis of Large Radio Telescopes, MNRAS, 1960
Harris, On the Use of Windows for Harmonic Analysis with the Discrete
Fourier Transform, Proc. IEEE, 66, 51-83, 1978
Mutel & Gaume, A Design Study for a Dedicated VLBI Array,
VLBA Memo 84, 1982
Walker, Fast Quality Measure, VLBA Memo 144, 1982
Cornwell, Quality Indicators for the MM Array, MMA Memo 18, 1984
Hjellming, The 90 meter Configuration of the Proposed NRAO mm Array,
MMA Memo 30, 1985
Cornwell, Crystalline Antenna Arrays, MMA Memo 38, 1986
Holdaway, Imaging Characteristics of a Homogeneous Millimeter Array,
MMA Memo 61, 1990
Holdaway, Evaluating the MMA Compact Configuration Designs,
MMA Memo 81, 1992
Cornwell, Holdaway, & Uson, Radio-interferometric imaging of very large
objects: implications for array design, A&A, 271, 697-713, 1993
Holdaway, Foster, & Morita, Fitting a 12km Configuration on the
Chajnantor Site, MMA Memo 153, 1996
Holdaway, What Fourier Plane Coverage is Right for the MMA?,
MMA Memo 156, 1996
Keto, The Shapes of Cross-Correlation Interferometers, ApJ, 475,
843-852, 1997
Holdaway, Effects of Pointing Errors on Mosaic Images with 8m, 12m, and
15m Dishes, MMA Memo 178, 1997
Helfer & Holdaway, Design Concepts for Strawperson Antenna
Configurations for the MMA, MMA Memo 198, 1998
Holdaway, Hour Angle Ranges for Configuration Optimization,
MMA Memo 201, 1998
Wright, Image Fidelity, BIMA memo 73, 1999
there are a couple of more obscure references to Morita's work that I
couldn't get proper references for or full copies of but which probably
have relevant information:
Morita, Array Configuration of Large Radio Interferometers for
Astronomical Observations, National Astronomical Observatory,
NRO-TR-56, 1997
Morita, Ishiguro, & Holdaway, Array Configuration of the Large
Millimeter and Submillimeter Array (LMSA), URSI-GA, 1996