8:30 AM Welcome by Anish Arora (Distinguished Professor of Engineering and Chair, Computer Science and Engineering; Faculty Director, 5G-OH Connectivity Center); intro/presentation of the schedule: 15 minutes
8:45 AM State of the Project/Vision (Chair: Franck)
9:00 AM Thrust I: Progress on Compression API and Generators (Chair: Robert): 75 minutes
Robert Underwood: Compiler Abstractions
Shihui Song: CERESZ-II
Yafan Huang: cuSZp and Compression for Light Sources
Discussion: Compiler Abstractions for Heterogeneous
5:00 PM Reception (Cost on your own; Rico’s Pizzeria & Pasta House, 5131 N Tamiami Trail, Sarasota, FL 34234)
Day 1 (Feb 14)
8:30 AM Doors open at the Chao lecture hall
8:45 AM Please arrive at the Chao lecture hall by 8:45 AM.
9:00 AM Workshop opening by Dr. Franck Cappello
9:05 AM Welcome talk by Dr. Stacey Patterson (Vice President for Research, Florida State University)
9:15 AM Welcome talk by Dr. Varun Chandola (NSF Program Director, CISE/OAC)
9:20 AM A general introduction of the workshop participants and their expertise.
9:40 AM Training on existing compressors (SZ, ZFP, LC, SPERR, LibPressio, etc.) (1h)
Overview (40 min)
Hands-on (20 min)
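As a flavor of the hands-on part, the sketch below shows an error-bounded compression round trip using ZFP's zfpy Python bindings; the array, shape, and tolerance are placeholders, and the actual training materials may use different compressors and settings.

    import numpy as np
    import zfpy  # ZFP's Python bindings

    # Placeholder dataset standing in for a real scientific field.
    data = np.linspace(0.0, 1.0, 1_000_000).reshape(100, 100, 100)

    # Fixed-accuracy mode: the absolute pointwise error is bounded by the tolerance.
    compressed = zfpy.compress_numpy(data, tolerance=1e-4)
    restored = zfpy.decompress_numpy(compressed)

    print("compression ratio:", data.nbytes / len(compressed))
    print("max abs error:", np.max(np.abs(data - restored)))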
10:40 AM Break (20 min)
11:00 AM A general introduction to the FZ project and its 3 thrusts (programming interface and specific compressor generation; building of the compression module library; visualization, quality assessment, and optimization) (1h)
12:00 PM Lunch (Cost on your own)
1:30 PM A presentation of the FZ project progress so far and the next milestones (1h)
2:30 PM A discussion about FZ module design with other compressors (30min)
3:00 PM Break (20 min)
3:45 PM Application session, part 1 (1h40min)
(4 application domains, 15 min each) Presentations of the application domain requirements and constraints concerning lossy compression, by the application attendees (1h)
5:00 PM End of day
Day 2 (Feb 15)
9:00 AM Application session, part 2 (1h40min)
(5 application domains, 12 min each) Presentations of the application domain requirements and constraints concerning lossy compression, by the application attendees (1h)
One-to-one break-out sessions with the application developers and users to collect (i) use-case requirements concerning compression ratio, speed, and accuracy criteria, and (ii) practical compression interface requirements, including APIs, I/O library integration, and shell commands (40 min)
Groups 1-4 (20 min)
Group application 1: Climate, lead: Robert + compressor developers
Group application 2: Seismology, lead: Dingwen + compressor developers
Group application 3: Quantum circuit, lead: Sheng + compressor developers
Group application 4: Fusion, lead: Hanqi + compressor developers
Groups 5-9 (20 min)
Group application 5: Cosmology, lead: Dingwen + compressor developers
Group application 6: Light sources, lead: Robert + compressor developers
Group application 7: Molecular Dynamics, lead: Kai + compressor developers
Group application 8: Combustion, lead: Hanqi + compressor developers
Group application 9: System logs, lead: Sheng + compressor developers
In parallel: preparation of the slides summarizing the discussion/test results for every application (Robert, Dingwen, Sheng, Kai, Hanqi; 2 application domains each).
10:40 AM Break (20 min)
11:00 AM Hackathon sessions where multiple existing compression schemes will be tested for every application, to identify relevant compression methods and gaps that could be addressed with lossy compressor customization (1h)
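In its simplest form, such a hackathon test could sweep error bounds for one compressor on a stand-in field and record compression ratio versus error. The sketch below does this with the zfpy bindings; the data, compressor choice, and bounds are placeholders (a framework such as LibPressio, which wraps many compressors behind one interface, would allow the same sweep across SZ, ZFP, and others).

    import numpy as np
    import zfpy  # one compressor for illustration

    # Stand-in application field; each application would supply real data.
    data = np.random.default_rng(0).random((256, 256))

    for tol in (1e-2, 1e-3, 1e-4, 1e-5):
        buf = zfpy.compress_numpy(data, tolerance=tol)
        out = zfpy.decompress_numpy(buf)
        print(f"tol={tol:.0e}  ratio={data.nbytes / len(buf):6.1f}  "
              f"max_err={np.max(np.abs(data - out)):.2e}")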
12:00 PM Lunch (Cost on your own)
1:00 PM Presentation of the discussion/test results for every application (1h30min)
2:30 PM Discussion/Reconciliation of the break-out session results
5:00 PM End of the workshop
Kickoff Meeting
The Kickoff Meeting will take place on September 15, 2023, at IUPUI.
Thank you for considering attending the FZ Kickoff Meeting!
All slides for talks in the meeting can be found in this shared folder.
Here is the schedule:
8:30 AM Welcome/intro/presentation of the schedule: 15 minutes
8:45 AM Review of the project objectives and deliverables: 15 minutes [slides]
Description of the general modular design (modules for pipeline generation, modules for quality assessment, modules for optimization)
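Purely as an illustration of what such module boundaries could look like (a hypothetical sketch, not the actual FZ design), the three kinds of modules might be expressed as interfaces along these lines:

    from typing import Protocol, Dict
    import numpy as np

    # Hypothetical module boundaries; names and signatures are illustrative only.

    class CompressionPipeline(Protocol):
        def compress(self, data: np.ndarray) -> bytes: ...
        def decompress(self, buf: bytes) -> np.ndarray: ...

    class QualityAssessor(Protocol):
        def assess(self, original: np.ndarray,
                   decompressed: np.ndarray) -> Dict[str, float]: ...

    class Optimizer(Protocol):
        def tune(self, pipeline: CompressionPipeline,
                 assessor: QualityAssessor,
                 data: np.ndarray) -> CompressionPipeline: ...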
9:00 AM Programming Interface and Compressor Generators: 75 minutes
Robert: 10 minutes to introduce the topic, discuss gaps and the development plan, and present progress [slides]
Dingwen: 10 minutes about some specifics of GPUs [slides]
10:15 AM Break: 15 minutes
10:30 AM Compression module library (modules for compression pipeline composition): 75 minutes
Kai: 10 minutes to introduce the topic, discuss gaps and the development plan, and present progress [slides]
The ISC Tutorials are interactive courses focusing on key topics of high performance computing, networking, storage, and data science. Renowned experts in their respective fields give attendees a comprehensive introduction to the topic as well as a closer look at specific problems. Tutorials are encouraged to include a “hands-on” component to allow attendees to practice the prepared materials.
The Tutorials will be held on Thursday, June 24, and on Friday, June 25, 2021.
The ISC 2021 Tutorials Committee is headed by Kevin Huck, University of Oregon, USA, with Kathryn Mohror, Lawrence Livermore National Laboratory, USA, as Deputy Chair.
International Workshop on Big Data Reduction (IWBDR)
Today’s applications are producing volumes of data too large to be stored, processed, or transferred efficiently. Data reduction is becoming an indispensable technique in many domains because it can reduce data size by one or even two orders of magnitude, significantly saving memory/storage space, mitigating the I/O burden, reducing communication time, and improving energy/power efficiency in various parallel and distributed environments, such as high-performance computing (HPC), cloud computing, edge computing, and the Internet of Things (IoT). An exascale HPC system, for instance, is expected to have a computational capability of 10^18 floating-point operations per second, and large-scale HPC scientific applications may generate vast volumes of data (several orders of magnitude larger than the available storage space) for post-analysis. Moreover, runtime memory footprint and communication could be non-negligible bottlenecks of current HPC systems.
Tackling big data reduction research requires expertise from computer science, mathematics, and application domains to study the problem holistically, develop solutions, and harden software tools that can be used by production applications. Specifically, the big-data computing community needs to understand a clear yet complex relationship between application design, data analysis and reduction methods, programming models, system software, hardware, and other elements of a next-generation large-scale computing infrastructure, especially given constraints on applicability, fidelity, performance portability, and energy efficiency. New data reduction techniques also need to be explored and developed continuously to suit emerging applications and diverse use cases.
There are at least three significant research questions that the community is striving to answer: (1) whether several orders of magnitude of data reduction are possible for extreme-scale sciences; (2) how to understand the trade-off between the performance and accuracy of data reduction; and (3) how to effectively reduce data size while preserving the information inside big datasets.
The goal of this workshop is to provide a focused venue for researchers in all aspects of data reduction in all related communities to present their research results, exchange ideas, identify new research directions, and foster new collaborations within the community.
Lossy Compression for Scientific Data - Success Stories
Compression for scientific data
Lossy Compression for scientific data
Large-scale numerical simulations and experiments are generating very large datasets that are difficult to analyze, store, and transfer. This problem will be exacerbated for future generations of systems. Data reduction becomes a necessity to minimize the time lost in data transfer and storage. Lossless and lossy data compression are attractive and efficient techniques that significantly reduce data sets while remaining rather agnostic to the application. This tutorial will review the state of the art in lossless and lossy compression of scientific data sets, discuss in detail two lossy compressors (SZ and ZFP), and introduce compression error assessment metrics. The tutorial will also cover the characterization of data sets with respect to compression and introduce Z-checker, a tool to assess compression error.
More specifically, the tutorial will introduce motivating examples as well as basic compression techniques, and cover the role of Shannon entropy, the different types of advanced data transformation, prediction, and quantization techniques, as well as some of the more popular coding techniques. The tutorial will use examples of real-world compressors (GZIP, JPEG, FPZIP, SZ, ZFP, etc.) and data sets coming from simulations and instruments to illustrate the different compression techniques and their performance. This half-day tutorial incorporates improvements based on the evaluations of the two highly attended and highly rated tutorials given on this topic at ISC17 and SC17.
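To make the prediction-and-quantization idea concrete, the toy sketch below implements a 1D predictor-quantizer in the spirit of SZ (an illustration, not SZ's actual algorithm): each value is predicted from the previous reconstructed value, and the residual is quantized into bins of width 2*eps, which bounds the pointwise error by eps; the resulting integer codes are what an entropy coder such as Huffman would then compress.

    import numpy as np

    def toy_compress(data, eps):
        """Quantize prediction residuals so each value stays within eps."""
        codes = np.empty(len(data), dtype=np.int64)
        prev = 0.0  # predictor state: previous *reconstructed* value
        for i, x in enumerate(data):
            q = int(np.rint((x - prev) / (2 * eps)))
            codes[i] = q
            prev += q * 2 * eps  # mirror what the decompressor reconstructs
        return codes  # a real compressor would entropy-code these integers

    def toy_decompress(codes, eps):
        out = np.empty(len(codes))
        prev = 0.0
        for i, q in enumerate(codes):
            prev += q * 2 * eps
            out[i] = prev
        return out

    data = np.sin(np.linspace(0, 10, 1000))
    restored = toy_decompress(toy_compress(data, 1e-3), 1e-3)
    assert np.max(np.abs(data - restored)) <= 1e-3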
Compression for scientific data
Large-scale numerical simulations, observations, and experiments are generating very large datasets that are difficult to analyze, store, and transfer. Data compression is an attractive and efficient technique to significantly reduce the size of scientific datasets. This tutorial reviews the state of the art in lossy compression of scientific datasets, discusses in detail two lossy compressors (SZ and ZFP), and introduces compression error assessment metrics and the Z-checker tool to analyze the difference between initial and decompressed datasets. The tutorial offers hands-on exercises using SZ, ZFP, and Z-checker. It addresses the following questions: Why lossless and lossy compression? How does compression work? How to measure and control compression error? The tutorial uses examples of real-world compressors and scientific datasets to illustrate the different compression techniques and their performance. Participants will also have the opportunity to learn how to use SZ, ZFP, and Z-checker on their own datasets. The tutorial is given by two of the leading teams in this domain and targets primarily beginners interested in learning about lossy compression for scientific data. This half-day tutorial incorporates improvements based on the evaluations of the highly rated tutorials given on this topic at ISC17, SC17, and SC18.
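The error assessment metrics mentioned above can be computed directly from the original and decompressed arrays; the sketch below shows a few of the core ones (Z-checker itself reports many more, including spectral and distribution-based metrics).

    import numpy as np

    def compression_metrics(original, decompressed, compressed_nbytes):
        """A few standard pointwise metrics for lossy compression."""
        diff = original.astype(np.float64) - decompressed.astype(np.float64)
        value_range = float(original.max() - original.min())
        mse = float(np.mean(diff ** 2))
        return {
            "compression_ratio": original.nbytes / compressed_nbytes,
            "max_abs_error": float(np.max(np.abs(diff))),
            "rmse": np.sqrt(mse),
            # PSNR relative to the data's value range, as commonly
            # used for scientific data.
            "psnr_db": 20 * np.log10(value_range) - 10 * np.log10(mse),
        }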
Compression for scientific data
SC17 Tutorial
ISC17 Tutorial
Large-scale numerical simulations, observations, and experiments are generating very large datasets that are difficult to analyze, store, and transfer. This problem will be exacerbated for future generations of systems. Data compression is an attractive and efficient technique to significantly reduce the size of scientific datasets while remaining rather agnostic to the applications. This tutorial reviews the state of the art in lossless and lossy compression of scientific datasets, discusses in detail one lossless compressor (FPZIP) and two lossy compressors (SZ and ZFP), introduces compression error assessment metrics, and offers a hands-on session allowing participants to use SZ, FPZIP, and ZFP as well as Z-checker, a tool to comprehensively assess the compression error. The tutorial addresses the following questions: Why compression, and in particular lossy compression? How does compression work? How to measure and control the compression error? What is under the hood of some of the best compressors for scientific datasets? The tutorial uses examples of real-world compressors and scientific datasets to illustrate the different compression techniques and their performance. The tutorial is given by two of the leading teams in this domain and targets an audience of beginners as well as advanced researchers and practitioners in scientific computing and data analytics.
Targeted Audience:
This tutorial is for researchers, students, and users of high-performance computing interested in lossy compression techniques to reduce the size of their datasets: researchers and students using or developing new data reduction techniques, and users of scientific simulations and instruments who require significant data reduction.
Prerequisites:
Participants should bring their own laptop running Linux or Mac OS X. No previous knowledge of compression or of any programming language is needed.