[fitsbits] start of Public Comment Period on compressed FITS image and tables

William Pence William.Pence at nasa.gov
Mon Jul 6 11:08:52 EDT 2015


I'm confused.  If you are talking about the details of a new type of 
multi-threaded compression algorithm, this is not something we would 
want to try to implement immediately, so this is not really relevant to 
the current discussion about the compression convention itself.  We 
could perhaps continue this discussion offline...

But on the other hand, the current convention as described does record 
all the keywords necessary to determine which rows and columns of pixels 
are stored in each tile.  This allows software to skip over the tiles 
that are not needed when reading just a part of the image or table.
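
As a rough illustration of the tile skipping described above (a sketch only, not CFITSIO code): assuming a 2-D image whose size is given by ZNAXIS1/ZNAXIS2 and whose tile size is given by ZTILE1/ZTILE2, and assuming the tiles are stored one per table row in the order their first pixel appears in the image (first axis varying fastest), a reader could work out which rows it needs roughly like this. The class name TileSelector and the method tilesForRegion are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class TileSelector {

    /**
     * Returns the 1-based table rows of the tiles that overlap the requested
     * pixel range of a 2-D image. Pixel coordinates are 1-based and inclusive.
     * Assumes one tile per row, first image axis varying fastest.
     */
    static List<Integer> tilesForRegion(int znaxis1, int znaxis2,
                                        int ztile1, int ztile2,
                                        int x1, int x2, int y1, int y2) {
        if (x1 < 1 || y1 < 1 || x2 > znaxis1 || y2 > znaxis2 || x1 > x2 || y1 > y2) {
            throw new IllegalArgumentException("region outside image");
        }
        // Number of tiles along the first axis (the last tile may be partial).
        int tilesPerRow = (znaxis1 + ztile1 - 1) / ztile1;

        // Zero-based tile indices covered by the region along each axis.
        int firstCol = (x1 - 1) / ztile1;
        int lastCol = (x2 - 1) / ztile1;
        int firstRow = (y1 - 1) / ztile2;
        int lastRow = (y2 - 1) / ztile2;

        List<Integer> rows = new ArrayList<>();
        for (int r = firstRow; r <= lastRow; r++) {
            for (int c = firstCol; c <= lastCol; c++) {
                rows.add(r * tilesPerRow + c + 1); // 1-based table row
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // 1000 x 1000 image, 100 x 100 tiles, columns 150..250 of rows 1..100.
        System.out.println(tilesForRegion(1000, 1000, 100, 100, 150, 250, 1, 100));
    }
}
```

For the example in main, only table rows 2 and 3 need to be uncompressed; the other 98 tiles can be skipped.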

-Bill

On 7/6/2015 1:27 AM, van Nieuwenhoven, Richard wrote:
> Hi,
>
> Maybe it would be good to also include the number of rows: one
> unsigned integer for the size and one unsigned short for the number of
> rows. If the program only needs to read part of the table/image,
> it can just fast-forward over the blocks that would be skipped anyway.
>
>       Ritchie
>
> On 2015-07-02 at 06:37, van Nieuwenhoven, Richard wrote:
>> Yes, by defining a "blocked" variant of every compression type. Or
>> just add a prefix/suffix to the compression algorithm identifier; that
>> way the case of a new compression type is also clearly defined. The
>> number of blocks is free for the user to define, but normally 16 would be
>> sufficient for most cases.
>>
>> In the attachment, as requested, a visual description.
>>
>>      Ritchie
>>
>>
>> On 2015-07-01 at 22:52, William Pence wrote:
>>> The most obvious way to make use of multiple cores with tile compressed
>>> images is to assign a different core to each tile and then uncompress
>>> multiple tiles in parallel.  CFITSIO does not currently make use of this
>>> technique, but it could be done.
>>>
>>> If I understand correctly, you are suggesting that it might also be
>>> beneficial to be able to use multiple cores when uncompressing a single
>>> tile.  This probably could be done and would only require defining one
>>> or more new compression algorithms that support multiple cores.
>>>
>>> -Bill
>>>
>>>
>>> On 7/1/2015 3:21 PM, van Nieuwenhoven, Richard wrote:
>>>> OK, on request of Tom I did some programming to test the benefits of
>>>> using blocked compression.
>>>> Using Java I have thrown together a very raw and basic implementation.
>>>> The results are very promising.
>>>>
>>>> What I did was a very simple extension of the current compression
>>>> system. The difference is that I wrote an uncompressed integer
>>>> (containing the block size) before the compressed data of each block,
>>>> and continued that until there is a 0 length.
>>>>
>>>> The speed gain using the fork-join pattern (every block is
>>>> decompressed by parallel threads) was 33% per extra core; without any
>>>> optimization my PC compressed and decompressed 3 times faster.
>>>> Probably with a more sophisticated implementation there would be more
>>>> to gain.
>>>>
>>>> If we specify in the standard that the blocks must be on row
>>>> boundaries, the row construction can also be done in parallel.
>>>>
>>>> A non parallel implementation would still be very similar to the
>>>> standard decompression.
>>>>
>>>> Any thoughts on this?
>>>>
>>>>         Ritchie
>>>> ________________________________________
>>>> From: fitsbits [fitsbits-bounces at listmgr.nrao.edu] on behalf of van
>>>> Nieuwenhoven, Richard [Richard.vanNieuwenhoven at adesso.at]
>>>> Sent: Friday, June 26, 2015 7:32 AM
>>>> To: fitsbits at nrao.edu
>>>> Subject: Re: [fitsbits] start of Public Comment Period on compressed
>>>> FITS image and tables
>>>>
>>>> As a programmer I have another concern: FITS files can get very big
>>>> and will become even bigger in the future. Today's computers gain more
>>>> power by using more cores instead of more speed per core. So it would be
>>>> good if the standard "helps" in the use of multiple cores to process and
>>>> decompress FITS files.
>>>>
>>>> The use of tiles already helps a lot because they can be handled in
>>>> parallel. But the compression algorithms do not help at all because
>>>> most of them cannot use multiple cores to do the job.
>>>>
>>>> One possibility to get around this is to use blocks of compressed data,
>>>> with every block compressed independently. Or to have some kind of index
>>>> with multiple entry points into the compressed data. This will be
>>>> difficult to bring in line with the currently used compressions. A
>>>> simple solution for that could be to add a blocked version of every
>>>> compression type, which then uses a predefined block size.
>>>>
>>>> A minor concern is that it would help if there was some kind of index or
>>>> other way to have jump points to the separate HDUs. Currently this is
>>>> only possible by calculating the size from the header data and then
>>>> jumping over the body to the next HDU. This could be solved by adding a
>>>> special index HDU to the end of the file where the entry points of the
>>>> different HDUs are stored.
>>>>
>>>> These suggestions would enable software to process FITS files a lot
>>>> faster and as the trend goes on, more cores but not much more speed per
>>>> core, the standard should prepare for it.
>>>>
>>>>        Ritchie
>>>>
>>>>
>>>>
>>>> On 2015-06-25 at 16:38, Tom McGlynn (NASA/GSFC Code 660.1) wrote:
>>>>> While I'm generally supportive of the compression proposal (at least for
>>>>> images), I feel that the current text reflects the sense of this being a
>>>>> convention rather than part of the standard.  By this I mean that if we
>>>>> are going to support compressed images and tables then they should be
>>>>> incorporated into the standard as first class objects.  The current text
>>>>> makes it clear that these compressed HDU's are compressed
>>>>> representations of virtual uncompressed images and tables.  Implicitly
>>>>> the idea is that the user converts from the compressed image to the
>>>>> uncompressed version and then processes that.  Instead we should
>>>>> recognize that a compressed image is just one of the ways that FITS
>>>>> allows one to store an image, just like a primary image array, an
>>>>> extension image or a vector value in a table.
>>>>>
>>>>> So I would suggest that the ZSIMPLE, ZEXTEND, ZBLOCKED and such keywords
>>>>> be made optional with wording something like:  "If a compressed image is
>>>>> being used to compress an existing FITS image extension, the ZXTENSION
>>>>> keyword MAY be used to contain the value of the original extension."
>>>>> I'd suggest that in future the use of these keywords be discouraged.
>>>>>
>>>>> The recommended practice would be that users treat the compressed image
>>>>> as the image and not worry about some
>>>>> intermediate image representation.
>>>>>
>>>>>
>>>>> My second major concern with this convention is that it does seem
>>>>> rather ad hoc.  I think that it would be much better if the proposal
>>>>> rigorously separated the algorithmic aspects from the non-algorithmic
>>>>> elements.  A mechanism for how additional compression techniques could
>>>>> be added should be noted.  E.g., the discussion of quantization should be
>>>>> part of the implementation of the lossy compression algorithms and the
>>>>> ZQUANTIZ parameter should probably be one of the ZVALn, ZNAMEn
>>>>> elements.  Table 36 should be titled something like:
>>>>>     Supported Compression Algorithms
>>>>> with the first column being the name of the compression algorithm, the
>>>>> second the value of ZCMPTYPE, and also including the ZNAMEs used
>>>>> (flagging the critical ones).
>>>>>
>>>>> I've almost no insight into table compression.  Given that no one seems
>>>>> to be using this convention, my suggestion would be that it's premature
>>>>> to add to the standard.
>>>>>
>>>>> Overall I suspect that the tiling capabilities are going to be
>>>>> increasingly essential for handling large images, so that at least that
>>>>> much needs to be made part of the standard.  However I don't feel this
>>>>> text is ready to be finalized.
>>>>>
>>>>>       Regards
>>>>>       Tom McGlynn
>>>>>
>>>>> Lucio Chiappetti wrote:
>>>>>> ANNOUNCEMENT:  START OF FORMAL PUBLIC COMMENT PERIOD
>>>>>>
>>>>>> This is to announce the official start of a 3-week formal Public
>>>>>> Comment Period on the incorporation of the Tiled Image Compression and
>>>>>> Tiled Table Compression conventions in the FITS Standard.
>>>>>>
>>>>>> This is part of a process to incorporate the most useful and widely
>>>>>> used registered conventions (which are valid FITS constructs) into the
>>>>>> official definition of the standard.
>>>>>>
>>>>>> Among these, the two compression conventions benefit from common
>>>>>> handling. Given their relative complexity they are better discussed
>>>>>> first, before other easier conventions.
>>>>>>
>>>>>> The proposed text consists
>>>>>>
>>>>>> - in the ADDITION of an entire new chapter (10) to the FITS Standard
>>>>>>     Document which describes the two conventions in a common
>>>>>>     prescriptive framework.
>>>>>> - It also includes the ADDITION of a new non-prescriptive Appendix I,
>>>>>> - plus the addition of the necessary bibliographic references,
>>>>>>
>>>>>>     and has been prepared by a technical team including L.Chiappetti,
>>>>>>     W.Pence, A.Dobrzycki, R.A.Shaw and W.Thompson (main editor Dick
>>>>>>     Shaw).
>>>>>>
>>>>>> - If the proposal is approved also Appendix C will be updated listing
>>>>>>     the new keywords, and a section H.3 will be added to Appendix H
>>>>>>     describing the updates and the differences with the registered
>>>>>>     convention.
>>>>>>
>>>>>>     All the updates are shown in blue colour in their current context
>>>>>>     (with the exception of the NEW chapter 10 which is black)
>>>>>>
>>>>>> The proposed draft text is available at
>>>>>> http://sax.iasf-milano.inaf.it/~lucio/FITS/Conventions/compression-upd2.pdf
>>>>>>
>>>>>>
>>>>>>
>>>>>> Supporting material is provided in the FITS Convention Registry at the
>>>>>> following URLs
>>>>>> http://fits.gsfc.nasa.gov/registry/tilecompression.html
>>>>>> http://fits.gsfc.nasa.gov/registry/tiletablecompression.html
>>>>>>
>>>>>> Considering that the convention(s) have been in use for several
>>>>>> years, are legal FITS, were discussed on FITSBITS when the conventions
>>>>>> were entered in the Registry and therefore their usage is well proven
>>>>>> (also as far as interoperability is concerned), the Public Comment
>>>>>> Period is reduced to 3 weeks.
>>>>>>
>>>>>> Also the review by the FITS Working Group Executive can be sped up and
>>>>>> handled in parallel or quickly after the conclusion of the Public
>>>>>> Comment Period.
>>>>>>
>>>>>> Please review the text carefully and post any comments, criticisms, or
>>>>>> suggestions on the FITSBITS mailing list (not on iauwfg or elsewhere)
>>>>>> ==================================================================
>>>>>>
>>>>>> The Public Comment Period starts today 16 June 2015 and will last
>>>>>> formally for 3 weeks until July 6
>>>>>>
>>>>>> ==================================================================
>>>>>> Background information on the FITS approval process
>>>>>>
>>>>>> Under the "Rules and Procedures" of the IAU FITS Working Group,
>>>>>> http://fits.gsfc.nasa.gov/iaufwg/iaufwg_rules.html, the first step in
>>>>>> the official approval process of any FITS proposal will be a formal
>>>>>> Public Comment Period to take place on the FITSBITS mailing list.
>>>>>> After that the IAU FITS Working Group Executive will review the
>>>>>> results. Following that the IAU FITS Working Group will then conduct a
>>>>>> final vote to approve or disapprove the proposal.
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> fitsbits mailing list
>>>>> fitsbits at listmgr.nrao.edu
>>>>> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>>>>
>>>>
>>>> --
>>>> BSc Richard van Nieuwenhoven
>>>> Software Architekt
>>>>
>>>> adesso Austria GmbH
>>>> floridotower 26. Stock              T +43 1 2198790-0
>>>> Foridsdorfer Hauptstr. 1            F +43 1 2198790-13
>>>> A-1210 Wien                         H +43 664 88614710
>>>>                                       E richard.vannieuwenhoven at adesso.at
>>>>                                       www.adesso.at
>>>> -------------------------------------------------------------
>>>>            >>> business. people. technology. <<<
>>>> -------------------------------------------------------------
>>>> adesso Austria GmbH mit Sitz in Wien
>>>> Handelsgericht Wien FN231467v
>>>>
>>>> _______________________________________________