
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Dec 9:2024.09.06.611720. Originally published 2024 Sep 11. [Version 2] doi: 10.1101/2024.09.06.611720

Massive Compression for High Data Rate Macromolecular Crystallography (HDRMX): Impact on Diffraction Data and Subsequent Structural Analysis

Herbert J Bernstein, Alexei S Soares, Kimberly Horvat, Jean Jakoncic
PMCID: PMC11418961  PMID: 39314279

Abstract

New higher-count-rate, integrating, large-area X-ray detectors with framing rates as high as 17,400 images per second are becoming available. These will soon be used for specialized macromolecular crystallography (MX) experiments but will require optimal lossy compression algorithms to enable systems to keep up with data throughput. Some information may be lost. Can we minimize this loss while keeping the impact on structural information acceptable? To explore this question, we have considered several approaches: summing short sequences of images, binning to create the effect of larger pixels, JPEG-2000 lossy wavelet-based compression, and Hcompress, a Haar-wavelet-based lossy compression borrowed from astronomy. We also explore the effect of combining summing, binning, and Hcompress or JPEG-2000. With either Hcompress or JPEG-2000, one can specify approximately how much the result should be compressed relative to the starting file size. They provide particularly effective lossy compressions that retain essential information for structure solution from Bragg reflections.
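As an illustration only, the three simplest operations named above (summing, binning, and a Haar-wavelet-style lossy step) can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the authors' pipeline and not the Hcompress or JPEG-2000 codecs: the function names `sum_frames`, `bin_pixels`, and `haar_lossy_roundtrip` and the quantization step `q` are hypothetical, images are plain nested lists of integer counts, and the wavelet step applies a single level of a 2x2 Haar transform with uniform quantization and no entropy coding.

```python
def sum_frames(frames, n):
    """Sum each consecutive group of n frames pixel by pixel.

    A trailing group with fewer than n frames is dropped (an assumption).
    """
    summed = []
    for start in range(0, len(frames) - n + 1, n):
        group = frames[start:start + n]
        rows, cols = len(group[0]), len(group[0][0])
        summed.append([[sum(f[r][c] for f in group) for c in range(cols)]
                       for r in range(rows)])
    return summed


def bin_pixels(image, b):
    """Sum b x b blocks of pixels to mimic larger pixels.

    Edge rows and columns that do not fill a whole block are cropped.
    """
    rows, cols = len(image) // b, len(image[0]) // b
    return [[sum(image[r * b + i][c * b + j]
                 for i in range(b) for j in range(b))
             for c in range(cols)]
            for r in range(rows)]


def haar_lossy_roundtrip(image, q):
    """Apply one level of a 2x2 Haar transform, quantize, and invert.

    Each coefficient is rounded to the nearest multiple of q, which is
    where information is lost. Image dimensions must be even. With q = 1
    and integer input the round trip is exact.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            e, d = image[r + 1][c], image[r + 1][c + 1]
            # Unnormalized Haar coefficients: block sum plus three differences.
            coeffs = (a + b + e + d, a - b + e - d,
                      a + b - e - d, a - b - e + d)
            # Uniform quantization followed by dequantization.
            s, h, v, g = (round(k / q) * q for k in coeffs)
            # Inverse transform recovers the four pixels of the block.
            out[r][c] = (s + h + v + g) / 4
            out[r][c + 1] = (s - h + v - g) / 4
            out[r + 1][c] = (s + h - v - g) / 4
            out[r + 1][c + 1] = (s - h - v + g) / 4
    return out
```

In this sketch the reconstruction error of any pixel is at most q/2, so `q` plays the role of a crude quality knob. A real codec would additionally encode the quantized coefficients compactly, which is what yields the actual reduction in file size.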

Synopsis

New higher-count-rate, integrating, large-area X-ray detectors with framing rates as high as 17,400 images per second are becoming available. These will soon be used for specialized macromolecular crystallography (MX) experiments but will require optimal lossy compression algorithms to enable systems to keep up with data throughput. Some information may be lost. Can we minimize this loss while keeping the impact on structural information acceptable?

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
