[Difx-users] Debugging a Nightmare problem

Richard Dodson richard.dodson at uwa.edu.au
Wed Aug 24 05:01:33 EDT 2016


Hi Adam

 As I say, this is a mess. These are the first TianMa, ATCA & KaVA
observations. The Australians, the Chinese and the Japanese all have their
own `unique' systems. These (except ATCA) were then extracted for the KJJCC
and hardware correlated. This is the data that was exported for use with
DiFX.

There are at least 3 conversions between this file and the sky, all of
which could be wrong.

The BW should be 32 MHz: 8 IFs of L pol. T6 and KaVA have different
sidebands. ATCA has 64 MHz and dual pol (so only 50% coverage).

So VDIF_1280-1024-8-2 is what I have been using. You say "which you supply
to the v2d file". In which place? As the FORMAT field? I have used VDIF --
is this wrong?
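
For reference, the arithmetic behind that format name can be sketched as
below. This assumes the name follows the pattern
VDIF_<payload bytes>-<Mbps>-<channels>-<bits> and that the data are real
(not complex) Nyquist-sampled; the helper names are made up for
illustration and are not part of DiFX or mark5access.

```python
import re


def parse_vdif_format(name):
    """Split 'VDIF_<payload bytes>-<Mbps>-<channels>-<bits>' into integers."""
    m = re.fullmatch(r"VDIF_(\d+)-(\d+)-(\d+)-(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognised format name: {name!r}")
    payload, mbps, nchan, bits = map(int, m.groups())
    return {"payload_bytes": payload, "mbps": mbps,
            "channels": nchan, "bits": bits}


def bandwidth_per_channel_mhz(fmt):
    """Bandwidth per channel, assuming real Nyquist sampling (2 samples/Hz)."""
    msamples_per_sec = fmt["mbps"] / (fmt["channels"] * fmt["bits"])
    return msamples_per_sec / 2


def frames_per_second(fmt):
    """Frame rate implied by the total data rate and the payload size."""
    return fmt["mbps"] * 1_000_000 / (fmt["payload_bytes"] * 8)


eight = parse_vdif_format("VDIF_1280-1024-8-2")
one = parse_vdif_format("VDIF_1280-1024-1-2")
print(bandwidth_per_channel_mhz(eight))  # 32.0
print(bandwidth_per_channel_mhz(one))    # 256.0
print(frames_per_second(eight))          # 100000.0
```

So the 8-channel name is the one that matches 32 MHz per IF, and both
names imply the same 100000 frames per second.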

As an aside, the conversion to VDIF was wrong (in the invalid flag, the
day(!) and the no. of sidebands). These I _think_ I have fixed, but using
tools I don't understand.

I made spectra (with m5spec) of a few of the files and they looked OK.

I will get back to this later (tonight?). Juggling events. I'll check
bandpasses for a number of possible setups. If the bandpass looks right I
will know that the correct filters have been used.

   Thanks for the help. I am at sea at the mo'

         Richard

On Wed, Aug 24, 2016 at 4:20 PM, Adam Deller <deller at astron.nl> wrote:

> Hi Richard,
>
> I have a few observations for you:
>
> * Nothing strange in the file at a first glance - countVDIFPackets and
> printVDIF are happy with it.  It is 2 bit data.  Frame size is 1312 bytes,
> and the number of frames per second indicates that this is 1 Gbps data.
> * Using printVDIFheader tells me there are 8 channels in the single VDIF
> thread.  Combined with the other info, that implies the bandwidth per
> subband is 32 MHz? So then the format name (which you supply to the v2d
> file and hence the .input file) should be VDIF_1280-1024-8-2, I think.
>
> However, I then get funny results when I try to unpack the data using m5d
> and that format name.  It's happy for a while, and then starts to give
> unpack errors (which one usually gets if one mucks up the format name).  If
> I instead say the number of channels is 1 (so VDIF_1280-1024-1-2), which
> would mean a single 256 MHz wide channel, then it unpacks happily.
>
> So what's the deal with the number of subbands?  I think something is
> wrong somewhere: either 8 has been written into the header where 1 should
> have been, or something else like that.
>
> Cheers,
> Adam
>
> On Wed, Aug 24, 2016 at 4:31 AM, Richard Dodson <richard.dodson at uwa.edu.au
> > wrote:
>
>> Hi Adam
>>
>> vdifsummary seems to be a file in ~/Util in oper at KASI. I guess it is
>> something that Jan wrote. I will check.
>>
>> countVDIF is slow (took all night to finish) & I should have looked at
>> thread 1 not 0 (correct?). It is now running for thread 1. Nothing to
>> note so far, e.g.:
>>
>> For thread 1, at second 39896, read 29300000 frames, spotted 0 missing
>> frames
>>
>> The start of the VDIF file (1GB) is at:
>>  http://ict.icrar.org/store/staff/rdodson/k16mk02f_ktn_start.vdif
>>
>>   Thanks for your help
>>      Richard
>>
>>
>>
>>
>> On Mon, Aug 22, 2016 at 6:18 PM, Adam Deller <deller at astron.nl> wrote:
>>
>>> Hi Richard,
>>>
>>> Looks like there is a problem mid-file, and when it tries to re-sync,
>>> the header it finds is corrupted.  I can suggest a couple of things to try:
>>>
>>> You can run countVDIFPackets (a utility in vdifio), which is probably
>>> slower than vdifsummary (what utility is this?  I'm not aware of a
>>> "vdifsummary", there is a "vsum"...?) and is pretty basic, but it actually
>>> does check every packet and prints a message every time a problem is seen.
>>> That might give you some extra clues, so I'd try that first.  And if you
>>> really want to get blasted away by lots of logging, you can use printVDIF,
>>> which prints a little summary of each and every packet header.  You could
>>> pipe that to grep to look for anomalies.
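
A rough Python sketch of the same kind of per-header check, for anyone
without the vdifio tools to hand. It assumes the standard (non-legacy)
VDIF header layout with little-endian 32-bit words; decode_vdif_header and
scan_headers are illustrative names, not vdifio functions, and the frame
size has to be supplied by the caller.

```python
import struct


def decode_vdif_header(raw):
    """Decode the first four little-endian 32-bit words of a VDIF header."""
    w0, w1, w2, w3 = struct.unpack("<4I", raw[:16])
    return {
        "invalid": bool(w0 >> 31),
        "legacy": bool((w0 >> 30) & 1),
        "seconds": w0 & 0x3FFFFFFF,           # seconds since reference epoch
        "ref_epoch": (w1 >> 24) & 0x3F,       # 6-month periods since 2000
        "frame_num": w1 & 0xFFFFFF,           # frame number within the second
        "version": w2 >> 29,
        "channels": 1 << ((w2 >> 24) & 0x1F),  # stored as log2(channels)
        "frame_bytes": (w2 & 0xFFFFFF) * 8,   # stored in units of 8 bytes
        "bits": ((w3 >> 26) & 0x1F) + 1,      # stored as bits per sample - 1
        "thread": (w3 >> 16) & 0x3FF,
        "station": w3 & 0xFFFF,
    }


def scan_headers(path, frame_bytes, expect, limit=100000):
    """Yield (frame index, header) where any field differs from `expect`."""
    with open(path, "rb") as f:
        for i in range(limit):
            frame = f.read(frame_bytes)
            if len(frame) < 16:
                break
            hdr = decode_vdif_header(frame)
            if any(hdr[key] != value for key, value in expect.items()):
                yield i, hdr


# Hypothetical usage on a local copy of the file, 1312-byte frames assumed:
# for i, hdr in scan_headers("k16mk02f_ktn_start.vdif", 1312,
#                            {"invalid": False, "channels": 8, "bits": 2}):
#     print(i, hdr)
```

This only checks the fields asked for in `expect`; it does not look for
missing or out-of-order frames the way countVDIFPackets does.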
>>>
>>> Looks like the problem is very early in the file, so if you dd the first
>>> second or so and put it on an ftp server somewhere, I could also take a
>>> look.
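
Cutting out the first second could be sketched in Python as below, as an
alternative to dd. The 1312-byte frame size and 100000 frames per second
are taken from elsewhere in this thread and should be treated as
assumptions for any other file; copy_first_frames is an illustrative
helper, not an existing tool.

```python
def copy_first_frames(src, dst, frame_bytes=1312, frames=100000):
    """Copy the first `frames` frames of `src` to `dst`; return bytes written."""
    remaining = frame_bytes * frames
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while remaining > 0:
            # Read in chunks of at most 1 MiB so memory use stays small.
            chunk = fin.read(min(remaining, 1 << 20))
            if not chunk:
                break  # source is shorter than the requested span
            fout.write(chunk)
            written += len(chunk)
            remaining -= len(chunk)
    return written


# Hypothetical usage (file names are placeholders):
# copy_first_frames("k16mk02f_kava_miz.vdif", "first_second.vdif")
```

With the default values this writes 131200000 bytes, i.e. about 131 MB
for one second of data.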
>>>
>>> Cheers,
>>> Adam
>>>
>>> On Mon, Aug 22, 2016 at 10:57 AM, Richard Dodson <
>>> richard.dodson at uwa.edu.au> wrote:
>>>
>>>> Dear All
>>>>
>>>>  I have one of the usual nightmare twisted DiFX correlation problems.
>>>>
>>>>  I am trying to use DiFX on VDIF data which has been copied off the
>>>> VERA OCTAVE systems (and similar) and converted.
>>>>
>>>>   The problem is almost certainly in the data copying -- but I need to
>>>> provide some feedback on what is wrong for it to be fixed.
>>>>
>>>>   The first problem that I found was in the VDIF file: all the invalid
>>>> flags were set, the number of channels was wrong and the date was wrong by
>>>> 1 day. :(
>>>>
>>>>   Jan has a program to fix all of these :) -- but he is not around to
>>>> check if I have used this correctly :( :(
>>>>
>>>>    After these fixes the correlation runs, but the data file is empty.
>>>> What messages should I be checking to work out what is happening? I append
>>>> some messages which look suspicious but don't convey any information to me
>>>> ...
>>>>
>>>>         All the best
>>>>             Richard
>>>>
>>>> Comments:
>>>>   vdifsummary reports seem OK
>>>>
>>>> # vdifsummary /lustre/kjcc/k16mk02f/MIZ/k16mk02f_kava_miz.vdif
>>>> [1:1] check k16mk02f_kava_miz.vdif -> Good! it is a VDIF data scan ->
>>>> add to 1
>>>> k16mk02f_kava_miz.vdif   4,108,790,400,000   31317 sec( 8:41:57)
>>>> 57467 Mar 20 2016y080d 11:00:03 - 19:41:59  1312 100000
>>>> 3,827 GB(=  3.7 TB)(= 4,108,790,400,000 B)
>>>>
>>>> Log messages which might be relevant:
>>>>
>>>> 2016-08-22 16:30:32,548 DiFXAlert INFO    MPI[ 1] compute-0-28.local
>>>> k16mk02f_1   Datastream 1 has opened file index 0, which was
>>>> /lustre/kjcc/k16mk02f/MIZ/k16mk02f_kava_miz.vdif
>>>>
>>>> 2016-08-22 16:30:32,548 DiFXAlert VERBOSE MPI[ 2] compute-0-28.local
>>>> k16mk02f_1   input.bad() is 0, input.fail() is 0
>>>>
>>>> 2016-08-22 16:30:32,700 DiFXAlert ERROR   MPI[ 1] compute-0-28.local
>>>> k16mk02f_1   Lost Sync on segment 1! Will attempt to resync. Deltatime was
>>>> -1.13239e+09
>>>>
>>>> 2016-08-22 16:30:32,701 DiFXAlert INFO    MPI[ 1] compute-0-28.local
>>>> k16mk02f_1   Config has changed!
>>>>
>>>> 2016-08-22 16:30:32,702 DiFXAlert INFO    MPI[ 1] compute-0-28.local
>>>> k16mk02f_1   After regaining sync, the frame start day is 70573, the frame
>>>> start seconds is 70631, the frame start ns is -2147483648, readscan is 2,
>>>> readseconds is 1132388471, readnanoseconds is -2147483648
>>>>         note the 2^31 values !!!!
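
A quick arithmetic check on the numbers in those log messages, as a
sketch only. It assumes the 57467 in the vdifsummary line is the MJD of
the file start; it says nothing about what wrote the bad values.

```python
INT32_MIN = -(2 ** 31)
SECONDS_PER_DAY = 86400


def split_seconds(total_seconds):
    """Return (whole days, leftover seconds) for an offset in seconds."""
    return divmod(total_seconds, SECONDS_PER_DAY)


file_start_mjd = 57467      # assumed MJD, from the vdifsummary output
frame_start_day = 70573     # from the "After regaining sync" message
readseconds = 1132388471    # same message; matches |Deltatime| ~ 1.13239e+09
frame_start_ns = -2147483648

days, secs = split_seconds(readseconds)
print(frame_start_ns == INT32_MIN)               # True
print(days, secs)                                # 13106 30071
print(file_start_mjd + days == frame_start_day)  # True
```

So the nanosecond fields are exactly the smallest signed 32-bit value,
and the bogus start day is the file start MJD plus the whole days in
readseconds, which is consistent with one bad time being carried through
all of these fields.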
>>>>
>>>> _______________________________________________
>>>> Difx-users mailing list
>>>> Difx-users at listmgr.nrao.edu
>>>> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>>>>
>>>>
>>>
>>>
>>> --
>>> !=============================================================!
>>> Dr. Adam Deller
>>> Ph  +31 521595785 / Fax +31 521595101
>>> Staff Astronomer, Astronomy Group
>>> ASTRON, Oude Hoogeveensedijk 4
>>> 7991 PD Dwingeloo, The Netherlands
>>> !=============================================================!
>>>
>>
>>
>>
>> --
>> -------------------------
>> Dr Richard Dodson,
>> International Centre for Radio Astronomy Research
>> University of Western Australia
>> P: +8 6488 7842 E: richard.dodson at icrar.org
>>
>
>
>
> --
> !=============================================================!
> Dr. Adam Deller
> Ph  +31 521595785 / Fax +31 521595101
> Staff Astronomer, Astronomy Group
> ASTRON, Oude Hoogeveensedijk 4
> 7991 PD Dwingeloo, The Netherlands
> !=============================================================!
>



-- 
-------------------------
Dr Richard Dodson,
International Centre for Radio Astronomy Research
University of Western Australia
P: +8 6488 7842 E: richard.dodson at icrar.org