[fitsbits] start of Public Comment Period on compressed FITS image and tables

Mon Jul 6 15:37:23 EDT 2015

OK, why don't we continue talking about this privately and maybe do some 
experiments to see if this technique proves useful.

-Bill

On 7/6/2015 3:04 PM, van Nieuwenhoven, Richard wrote:
> What I am aiming at is to use the blocking for two purposes:
> 1. decompression of one tile using multiple threads
> 2. skipping a part of the tile if it is not needed
>
> The normal compression algorithms can be used, but just on separate series of rows instead of on the whole tile at once.
>
> The blocking could be specified by a suffix or prefix to the algorithm specification.
>
> The integer for the block size and the short for the number of rows in the block are the only special features of the blocking type.
>
> This way the software reading the FITS files can get optimal performance from all the threads.
>
>       Ritchie
> ________________________________________
> From: William Pence [William.Pence at nasa.gov]
> Sent: Monday, July 06, 2015 5:08 PM
> To: van Nieuwenhoven, Richard; fitsbits at nrao.edu
> Subject: Re: [fitsbits] start of Public Comment Period on compressed FITS image and tables
>
> I'm confused.  If you are talking about the details of a new type of
> multi-threaded compression algorithm, this is not something we would
> want to try to implement immediately, so this is not really relevant to
> the current discussion about the compression convention itself.  We
> could perhaps continue this discussion offline...
>
> But on the other hand, the current convention as described does record
> all the keywords necessary to determine which rows and columns of pixels
> are stored in each tile.  This allows software to skip over the tiles
> that are not needed when reading just a part of the image or table.
>
> -Bill
>
> On 7/6/2015 1:27 AM, van Nieuwenhoven, Richard wrote:
>> Hi,
>>
>> maybe it would be good also to include the number of rows. So one
>> unsigned integer for the size and one unsigned short for the number of
>> rows. In case the program only needs to read a part of the table/image
>> it can just fast forward over the blocks that would be skipped anyway.
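
A rough sketch of the fast-forward idea, under an assumed layout that the thread does not fix in detail: each block starts with a 4-byte unsigned size and a 2-byte unsigned row count (big-endian, as elsewhere in FITS), followed by the compressed payload, and a size of 0 ends the sequence. The names BlockIndex, Block and blocksForRows are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

final class BlockIndex {
    /** Location of one compressed block inside a tile's byte stream. */
    static final class Block {
        final int offset;   // where the compressed payload starts
        final int length;   // compressed payload size in bytes
        final int firstRow; // first tile row held by this block (0-based)
        final int rowCount; // number of rows held by this block

        Block(int offset, int length, int firstRow, int rowCount) {
            this.offset = offset;
            this.length = length;
            this.firstRow = firstRow;
            this.rowCount = rowCount;
        }
    }

    /** Lists the blocks that overlap rows [fromRow, toRow) without decompressing anything. */
    static List<Block> blocksForRows(ByteBuffer tile, int fromRow, int toRow) {
        // Work on a duplicate so the caller's position is left untouched.
        ByteBuffer buf = tile.duplicate().order(ByteOrder.BIG_ENDIAN);
        List<Block> wanted = new ArrayList<>();
        int row = 0;
        while (buf.remaining() >= 4) {
            long size = Integer.toUnsignedLong(buf.getInt());
            if (size == 0) {
                break; // zero size terminates the block sequence
            }
            int rows = Short.toUnsignedInt(buf.getShort());
            if (size > buf.remaining()) {
                throw new IllegalArgumentException("truncated block");
            }
            int offset = buf.position();
            if (row < toRow && row + rows > fromRow) {
                wanted.add(new Block(offset, (int) size, row, rows));
            }
            // Fast-forward over the payload, whether it was selected or not.
            buf.position(offset + (int) size);
            row += rows;
        }
        return wanted;
    }
}
```

Only the six header bytes of each block are read during the scan, so locating the blocks for a row range costs far less than decompressing the whole tile.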
>>
>>        Ritchie
>>
>> On 2015-07-02 at 06:37, van Nieuwenhoven, Richard wrote:
>>> Yes, by defining a "blocked" variant of every compression type. Or
>>> just add a prefix/suffix to the compression algorithm identifier; that
>>> way the case of a new compression type is also clearly defined. The
>>> number of blocks is free for the user to define, but normally 16 would be
>>> sufficient for most cases.
>>>
>>> In the attachment, as requested, a visual description.
>>>
>>>       Ritchie
>>>
>>>
>>> On 2015-07-01 at 22:52, William Pence wrote:
>>>> The most obvious way to make use of multiple cores with tile compressed
>>>> images is to assign a different core to each tile and then uncompress
>>>> multiple tiles in parallel.  CFITSIO does not currently make use of this
>>>> technique, but it could be done.
>>>>
>>>> If I understand correctly, you are suggesting that it might also be
>>>> beneficial to be able to use multiple cores when uncompressing a single
>>>> tile.  This probably could be done and would only require defining one
>>>> or more new compression algorithms that support multiple cores.
>>>>
>>>> -Bill
>>>>
>>>>
>>>> On 7/1/2015 3:21 PM, van Nieuwenhoven, Richard wrote:
>>>>> OK, at Tom's request I did some programming to test the benefits of
>>>>> using blocked compression.
>>>>> Using Java I have thrown together a very raw and basic implementation.
>>>>> The results are very promising.
>>>>>
>>>>> What I did was a very simple extension of the current compression
>>>>> system. The difference is that I wrote an uncompressed integer
>>>>> (containing the block size) before the compressed data and continued
>>>>> that until there is a 0 length.
>>>>>
>>>>> The speed gain using the fork/join pattern (every block is
>>>>> decompressed by parallel threads) was 33% per extra core; without any
>>>>> optimisation my PC compressed and decompressed 3 times faster.
>>>>> Probably with a more sophisticated implementation there should be more
>>>>> to gain.
>>>>>
>>>>> If we specify in the standard that the blocks must be on row
>>>>> boundaries, the row construction can also be done in parallel.
>>>>>
>>>>> A non parallel implementation would still be very similar to the
>>>>> standard decompression.
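
The scheme described above might look roughly like the following. This is a sketch and not the prototype from the message: it uses java.util.zip Deflater/Inflater as a stand-in compressor, writes each block as a 4-byte length followed by the compressed bytes, ends with a zero length, and relies on parallel streams (which run on the common fork/join pool) to process blocks concurrently. BlockedCodec is a hypothetical name; choosing blockBytes as a multiple of the row length would keep blocks on row boundaries.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class BlockedCodec {

    /** Splits data into chunks of blockBytes and compresses each one independently. */
    static byte[] compress(byte[] data, int blockBytes) {
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < data.length; start += blockBytes) {
            int end = Math.min(start + blockBytes, data.length);
            chunks.add(Arrays.copyOfRange(data, start, end));
        }
        // Blocks are independent, so they can be compressed in parallel.
        List<byte[]> packed = chunks.parallelStream()
                .map(BlockedCodec::deflate)
                .collect(Collectors.toList());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : packed) {
            out.write(ByteBuffer.allocate(4).putInt(p.length).array(), 0, 4);
            out.write(p, 0, p.length);
        }
        out.write(new byte[4], 0, 4); // a zero length marks the end
        return out.toByteArray();
    }

    /** Reads the length-prefixed blocks, then decompresses them in parallel. */
    static byte[] decompress(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream);
        List<byte[]> packed = new ArrayList<>();
        for (int size = buf.getInt(); size != 0; size = buf.getInt()) {
            byte[] p = new byte[size];
            buf.get(p);
            packed.add(p);
        }
        List<byte[]> chunks = packed.parallelStream()
                .map(BlockedCodec::inflate)
                .collect(Collectors.toList()); // encounter order is preserved
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : chunks) {
            out.write(c, 0, c.length);
        }
        return out.toByteArray();
    }

    private static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[8192];
        while (!d.finished()) {
            out.write(tmp, 0, d.deflate(tmp));
        }
        d.end();
        return out.toByteArray();
    }

    private static byte[] inflate(byte[] packed) {
        Inflater inf = new Inflater();
        inf.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[8192];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(tmp);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IllegalStateException("truncated block");
                }
                out.write(tmp, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }
}
```

A single-threaded reader can walk the same stream with an ordinary loop, which matches the remark that a non-parallel implementation stays close to standard decompression. The price is a somewhat lower compression ratio, since each block is compressed without knowledge of the others.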
>>>>>
>>>>> any thoughts on this?
>>>>>
>>>>>          Ritchie
>>>>> ________________________________________
>>>>> From: fitsbits [fitsbits-bounces at listmgr.nrao.edu] on behalf of van
>>>>> Nieuwenhoven, Richard [Richard.vanNieuwenhoven at adesso.at]
>>>>> Sent: Friday, June 26, 2015 7:32 AM
>>>>> To: fitsbits at nrao.edu
>>>>> Subject: Re: [fitsbits] start of Public Comment Period on compressed
>>>>> FITS image and tables
>>>>>
>>>>> As a programmer I have another concern: FITS files can get very big
>>>>> and will become even bigger in the future. Today's computers gain more
>>>>> power by using more cores instead of more speed per core. So it would
>>>>> be good if the standard "helps" in the use of multiple cores to process
>>>>> and decompress the FITS files.
>>>>>
>>>>> The use of tiles already helps a lot because they can be handled in
>>>>> parallel. But the compression algorithms do not help at all because
>>>>> most of them cannot use multiple cores to do the job.
>>>>>
>>>>> One possibility to get around this is to use blocks of compressed data,
>>>>> where every block is compressed by itself. Or to have some kind of index
>>>>> with multiple entry points into the compressed data. This will be
>>>>> difficult to bring in line with the currently used compressions. A
>>>>> simple solution for that could be to add a blocked version of every
>>>>> compression type, which then uses a predefined block size.
>>>>>
>>>>> A minor concern is that it would help if there was some kind of index or
>>>>> other way to have jump points to the separate hdu's. Currently this is
>>>>> only possible by calculating the size from the header data and then
>>>>> jumping over the body to the next hdu. This could be solved by adding a
>>>>> special index hdu to the end of the file where the entry points of the
>>>>> different hdu's are stored.
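
The size calculation mentioned above can be sketched as follows, using the data-unit size rule from the FITS Standard: |BITPIX|/8 x GCOUNT x (PCOUNT + NAXIS1 x ... x NAXISn) bytes, with header and data each padded to a multiple of 2880 bytes. The sketch ignores the random-groups special case (NAXIS1 = 0), and the class and method names are hypothetical.

```java
final class HduSize {
    static final long FITS_BLOCK = 2880;

    /** Size in bytes of the data unit, before padding. */
    static long dataBytes(int bitpix, long[] naxes, long pcount, long gcount) {
        if (naxes.length == 0) {
            return 0; // NAXIS = 0: the HDU has no data unit
        }
        long elements = 1;
        for (long n : naxes) {
            elements *= n;
        }
        return (long) (Math.abs(bitpix) / 8) * gcount * (pcount + elements);
    }

    /** Rounds a byte count up to the next multiple of 2880. */
    static long padded(long bytes) {
        return (bytes + FITS_BLOCK - 1) / FITS_BLOCK * FITS_BLOCK;
    }

    /**
     * File offset of the HDU that follows the one starting at hduStart.
     * headerBytes is 80 times the number of header cards, including END.
     * For a primary array pass pcount = 0 and gcount = 1.
     */
    static long nextHduOffset(long hduStart, long headerBytes, int bitpix,
                              long[] naxes, long pcount, long gcount) {
        return hduStart + padded(headerBytes)
                + padded(dataBytes(bitpix, naxes, pcount, gcount));
    }
}
```

Walking a file this way still means reading every header in turn; an index HDU of the kind suggested above would store these offsets once so that a reader could seek directly.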
>>>>>
>>>>> These suggestions would enable software to process the fits files a lot
>>>>> faster and as the trend goes on, more cores but not much more speed per
>>>>> core, the standard should prepare for it.
>>>>>
>>>>>         Ritchie
>>>>>
>>>>>
>>>>>
>>>>> On 2015-06-25 at 16:38, Tom McGlynn (NASA/GSFC Code 660.1) wrote:
>>>>>> While I'm generally supportive of the compression proposal (at least for
>>>>>> images), I feel that the current text reflects the sense of this being a
>>>>>> convention rather than part of the standard.  By this I mean that if we
>>>>>> are going to support compressed images and tables then they should be
>>>>>> incorporated into the standard as first class objects.  The current text
>>>>>> makes it clear that these compressed HDU's are compressed
>>>>>> representations of virtual uncompressed images and tables.  Implicitly
>>>>>> the idea is that the user converts from the compressed image to the
>>>>>> uncompressed version and then processes that.  Instead we should
>>>>>> recognize that a compressed image is just one of the ways that FITS
>>>>>> allows one to store an image, just like a primary image array, an
>>>>>> extension image or vector value in a table.
>>>>>>
>>>>>> So I would suggest that the ZSIMPLE, ZEXTEND, ZBLOCKED and such keywords
>>>>>> be made optional with wording something like:  "If a compressed image is
>>>>>> being used to compress an existing FITS image extension, the ZTENSION
>>>>>> keyword MAY be used to contain the value of the original extension."
>>>>>> I'd suggest that the future use of these keywords be discouraged.
>>>>>>
>>>>>> The recommended practice would be that users treat the compressed image
>>>>>> as the image and not worry about some intermediate image representation.
>>>>>>
>>>>>>
>>>>>> My second major concern with this convention is that it does seem
>>>>>> rather ad hoc.  I think that it would be much better if the proposal
>>>>>> rigorously separated the algorithmic aspects from the non-algorithmic
>>>>>> elements.  A mechanism for how additional compression techniques could
>>>>>> be added should be noted.  E.g., the discussion of quantization should be
>>>>>> part of the implementation of the lossy compression algorithms and the
>>>>>> ZQUANTIZ parameter should probably be one of the ZVALn, ZNAMEn
>>>>>> elements.  Table 36 should be titled something like:
>>>>>>      Supported Compression Algorithms
>>>>>> with the first column being the name of the compression algorithm, the
>>>>>> second the value of ZCMPTYPE, and also including the ZNAMEs used
>>>>>> (flagging the critical ones).
>>>>>>
>>>>>> I've almost no insight into table compression.  Given that no one seems
>>>>>> to be using this convention, my suggestion would be that it's premature
>>>>>> to add to the standard.
>>>>>>
>>>>>> Overall I suspect that the tiling capabilities are going to be
>>>>>> increasingly essential for handling large images, so that at least that
>>>>>> much needs to be made part of the standard.  However I don't feel this
>>>>>> text is ready to be finalized.
>>>>>>
>>>>>>        Regards
>>>>>>        Tom McGlynn
>>>>>>
>>>>>> Lucio Chiappetti wrote:
>>>>>>> ANNOUNCEMENT:  START OF FORMAL PUBLIC COMMENT PERIOD
>>>>>>>
>>>>>>> This is to announce the official start of a 3-week formal Public
>>>>>>> Comment Period on the incorporation of the Tiled Image Compression and
>>>>>>> Tiled Table Compression conventions in the FITS Standard.
>>>>>>>
>>>>>>> This is part of a process to incorporate the most useful and widely
>>>>>>> used registered conventions (which are valid FITS constructs) into the
>>>>>>> official definition of the standard.
>>>>>>>
>>>>>>> Among these, the two compression conventions benefit from common
>>>>>>> handling. Given their relative complexity they are better discussed
>>>>>>> first, before other easier conventions.
>>>>>>>
>>>>>>> The proposed text consists
>>>>>>>
>>>>>>> - in the ADDITION of an entire new chapter (10) to the FITS Standard
>>>>>>>      Document which describes the two conventions in a common
>>>>>>>      prescriptive framework.
>>>>>>> - It also includes the ADDITION of a new non-prescriptive Appendix I,
>>>>>>> - plus the addition of the necessary bibliographic references,
>>>>>>>
>>>>>>>      and has been prepared by a technical team including L.Chiappetti,
>>>>>>>      W.Pence, A.Dobrzycki, R.A.Shaw and W.Thompson (main editor Dick
>>>>>>>      Shaw).
>>>>>>>
>>>>>>> - If the proposal is approved also Appendix C will be updated
>>>>>>>      listing the new keywords, and a section H.3 will be added to
>>>>>>>      Appendix H describing the updates and the differences with the
>>>>>>>      registered convention.
>>>>>>>
>>>>>>>      All the updates are shown in blue colour in their current context
>>>>>>>      (with the exception of the NEW chapter 10 which is black)
>>>>>>>
>>>>>>> The proposed draft text is available at
>>>>>>> http://sax.iasf-milano.inaf.it/~lucio/FITS/Conventions/compression-upd2.pdf
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Supporting material is provided in the FITS Convention Registry at the
>>>>>>> following URLs
>>>>>>> http://fits.gsfc.nasa.gov/registry/tilecompression.html
>>>>>>> http://fits.gsfc.nasa.gov/registry/tiletablecompression.html
>>>>>>>
>>>>>>> Considering that the convention(s) have been in use for several
>>>>>>> years, are legal FITS, were discussed on FITSBITS when the conventions
>>>>>>> were entered in the Registry and therefore their usage is well proven
>>>>>>> (also as far as interoperability is concerned), the Public Comment
>>>>>>> Period is reduced to 3 weeks.
>>>>>>>
>>>>>>> Also the review by FITS Working Group Executive can be speeded up and
>>>>>>> handled in parallel or quickly after the conclusion of the Public
>>>>>>> Comment Period.
>>>>>>>
>>>>>>> Please review the text carefully and post any comments, criticisms, or
>>>>>>> suggestions on the FITSBITS mailing list (not on iauwfg or elsewhere)
>>>>>>> ==================================================================
>>>>>>>
>>>>>>> The Public Comment Period starts today 16 June 2015 and will last
>>>>>>> formally for 3 weeks until July 6
>>>>>>>
>>>>>>> ==================================================================
>>>>>>> Background information on the FITS approval process
>>>>>>>
>>>>>>> Under the "Rules and Procedures" of the IAU FITS Working Group,
>>>>>>> http://fits.gsfc.nasa.gov/iaufwg/iaufwg_rules.html, the first step in
>>>>>>> the official approval process of any FITS proposal will be a formal
>>>>>>> Public Comment Period to take place on the FITSBITS mailing list.
>>>>>>> After that the IAU FITS Working Group Executive will review the
>>>>>>> results. Following that the IAU FITS Working Group will then conduct a
>>>>>>> final vote to approve or disapprove the proposal.
>>>>>>>
>>>>>> _______________________________________________
>>>>>> fitsbits mailing list
>>>>>> fitsbits at listmgr.nrao.edu
>>>>>> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>>>>>
>>>>> --
>>>>> BSc Richard van Nieuwenhoven
>>>>> Software Architekt
>>>>>
>>>>> adesso Austria GmbH
>>>>> floridotower 26. Stock              T +43 1 2198790-0
>>>>> Foridsdorfer Hauptstr. 1            F +43 1 2198790-13
>>>>> A-1210 Wien                         H +43 664 88614710
>>>>>                                        E richard.vannieuwenhoven at adesso.at
>>>>>                                        www.adesso.at
>>>>> -------------------------------------------------------------
>>>>>             >>> business. people. technology. <<<
>>>>> -------------------------------------------------------------
>>>>> adesso Austria GmbH mit Sitz in Wien
>>>>> Handelsgericht Wien FN231467v
>>>>>