Abstract:
This work addresses the challenges associated with the analysis of data generated by
high performance computing (HPC) systems under data protection and privacy
requirements. HPC systems are the workhorses of simulation science, enabling
unique insights across many disciplines (climate modeling, life sciences, weather
forecasting, etc.). System monitoring and the analysis of monitoring data are
essential both for the efficient operation of HPC systems and for research into
their performance optimization. Such systems generate large and varied volumes of data as they
operate, constituting a case of Big Data that challenges key data protection and
privacy principles. This paper describes the Data Analysis for Improving High
Performance Computing Operations and Research (DA-HPC-OR) project funded
through the Eucor - The European Campus EVTZ via the Seed Money program.
The main goal of this project is to analyze data collected since July 2016 on
the HPC system (NEMO) at the University of Freiburg in order to improve its
research and operational activities. Data collected on the sciCORE cluster in Basel
will be used to validate the knowledge extracted from NEMO. This knowledge
will be used to improve the monitoring, operational, and research activities of the
three HPC systems (Freiburg, Basel, and Strasbourg). Data protection requires
legal monitoring of the relevant Swiss, German, and EU legislation. Compliance
with such laws will be ensured via data de-identification and anonymization prior
to analysis. We leverage the HPC, legal, and data analysis expertise of the
consortium to develop solutions that can be transferred to other Eucor members
without additional legal inquiries or overhead.