ISC17 Tutorial

Large-scale numerical simulations, observations and experiments are generating very large datasets that are difficult to analyze, store and transfer. This problem will be exacerbated for future generations of systems. Data compression is an attractive and efficient technique to significantly reduce the size of scientific datasets while remaining largely agnostic to the applications. This tutorial reviews the state of the art in lossless and lossy compression of scientific datasets, discusses in detail one lossless compressor (FPZIP) and two lossy compressors (SZ and ZFP), introduces compression error assessment metrics, and offers a hands-on session in which participants use SZ, FPZIP and ZFP as well as Z-checker, a tool to comprehensively assess the compression error. The tutorial addresses the following questions: Why compression, and in particular lossy compression? How does compression work? How can the compression error be measured and controlled? What is under the hood of some of the best compressors for scientific datasets? The tutorial uses examples of real-world compressors and scientific datasets to illustrate the different compression techniques and their performance. The tutorial is given by two of the leading teams in this domain and targets an audience of beginners and advanced researchers and practitioners in scientific computing and data analytics.

Content Level: 60% beginner, 30% intermediate, 10% advanced

Targeted Audience: This tutorial is for researchers, students and users of high-performance computing interested in lossy compression techniques to reduce the size of their datasets: researchers and students who use or develop new data reduction techniques; and users of scientific simulations and instruments who require significant data reduction.

Prerequisites: Participants are expected to bring their own laptop running Linux or Mac OS X. No previous knowledge of compression or of any programming language is needed.