Data Challenge Track
The International Conference on Performance Engineering (ICPE) is hosting the third edition of its Data Challenge track. We invite everyone interested to apply their approaches and analyses to a common selection of performance datasets. The challenge is open-ended: participants can choose the research questions they find most interesting. The proposed approaches/analyses and their findings are discussed in short papers and presented at the main conference.
This year, the focus is on performance analysis of microservices systems. Given the increasing adoption of this architectural style, understanding the performance of microservices systems is becoming an essential task for performance engineers. Participants are invited to devise new research questions and approaches for microservices performance analysis. For their papers, participants must choose one or more datasets from a predefined list derived from prior academic and industry research. Participants are expected to use this year’s datasets to answer their research questions and report their findings in a four-page challenge paper. If the paper is accepted, participants will be invited to present their results at ICPE 2025 in London. Details on the datasets are provided below.
Datasets
This year’s ICPE data challenge is based on four datasets from academic and industry studies. Each dataset contains performance measurements gathered from industrial or open-source systems; the data format and content of each dataset are described in its respective repository.
- The first dataset is provided by the 2024 FGCS paper “The Globus Compute Dataset: An Open Function-as-a-Service Dataset from the Edge to the Cloud”. It encompasses 31 weeks (Nov. 2022 to Jul. 2023) of data from Globus Compute, comprising 2.1 million function invocations executed on 580 remote computing endpoints. For each submitted job, the data contains six different timestamps, the endpoint where the job was executed, and software measures of the invoked function. Globus Compute implements a unique federated FaaS model aimed primarily at scientific computing scenarios.
Repository: https://zenodo.org/records/10044780 (Upon request, a newer and longer version of the dataset, also recorded starting in Nov. 2022, can be made available.)
- The second dataset is introduced in the 2024 JSS paper “Enhancing empirical software performance engineering research with kernel-level events: A comprehensive system tracing approach”. It is a collection of kernel-level events, system calls, and system traces designed to facilitate advanced performance analysis research. The artifact includes a total of 24,263,691 events, capturing detailed interactions within a Linux system under varying workloads and noise conditions. It is particularly useful for studies of software system behavior under both light and heavy load, simulated with accompanying CPU, I/O, network, and memory noise to enhance realism.
Repository: https://github.com/mnoferestibrocku/dataset-repo/tree/main/KernelTracing
- The third dataset is AMTrace from the Alibaba Cluster Trace Microarchitecture, introduced in an ICPP paper. AMTrace is the first fine-grained, large-scale collection of microarchitectural metrics from Alibaba’s colocation datacenter, providing detailed insights into microarchitectural behavior within Alibaba’s computing clusters. The dataset offers extensive data on CPU and memory utilization, I/O operations, and network activity across different nodes, making it a valuable resource for researchers focusing on cloud computing performance, microarchitectural contention, memory bandwidth contention, workload management, and system optimization.
Repository: https://github.com/alibaba/clusterdata/tree/master/cluster-trace-microarchitecture-v2022
- The fourth dataset is provided by the IPDPS’20 paper “What does Power Consumption Behavior of HPC Jobs Reveal?: Demystifying, Quantifying, and Predicting Power Consumption Characteristics”. It contains power-consumption characteristics of over 80,000 HPC jobs executed over five months at two medium-scale European HPC production clusters with a total of 1,288 compute nodes. The dataset reports node-level power consumption counters collected via Intel’s Running Average Power Limit (RAPL) interface.
Repository: https://zenodo.org/records/3666632
Challenge
Possible high-level ideas for participants include, but are not limited to:
- Tailor visualization techniques to navigate the extensive data generated by microservices systems.
- Beschastnikh et al., 2020: https://doi.org/10.1145/3375633
- Silva et al., 2021: https://doi.org/10.1109/IV53921.2021.00028
- Anand et al., 2020: https://doi.org/10.48550/arXiv.2010.13681
- Develop automated techniques to identify patterns associated with performance degradation.
- Wang et al., 2022: https://dl.acm.org/doi/10.1145/3545008.3545026
- Traini and Cortellessa, 2023: https://doi.org/10.1109/TSE.2023.3266041
- Bansal et al., 2020: https://doi.org/10.1145/3377813.3381353
- Evaluate existing or novel root cause analysis techniques.
- Noferesti et al., 2024: https://doi.org/10.1016/j.jss.2024.112117
- Mariani et al., 2018: https://doi.org/10.1109/ICST.2018.00034
- Ma et al., 2020: https://doi.org/10.1145/3366423.3380111
- Model system performance using machine learning algorithms.
- Xiong et al., 2013: https://doi.org/10.1145/2479871.2479909
- Liao et al., 2020: https://doi.org/10.1007/s10664-020-09866-z
- Replicate a prior study or approach on a selected dataset.
- Bauer et al., 2024: https://doi.org/10.1016/j.future.2023.12.007
- Patel et al., 2020: https://doi.org/10.1109/IPDPS47924.2020.00087
Submission
A challenge paper should outline the findings of your research: start with an introduction to the problem tackled and its relevance to the field; detail the datasets used, the methods and tools applied, and the results achieved; discuss the implications of the findings; and highlight the paper’s contributions and their importance.
To maintain clarity and consistency across submissions, authors are required to specify precisely which dataset (or portion thereof) was used when detailing methodologies or presenting findings.
We strongly encourage including the solution’s source code with the submission (e.g., in a permanent repository such as Zenodo, potentially linked to a GitHub repository as described here), but this is not mandatory for acceptance of a data challenge paper.
The page limit for challenge papers is 4 pages (including all figures and tables), plus 1 additional page for references. Challenge papers will be published in the companion to the ICPE 2025 proceedings. All challenge papers will be reviewed by program committee members. Note that submissions to this track are double-blind; for details, see the Double Blind FAQ page. The best data challenge paper will receive an award selected by the track chairs and the program committee members.
Submissions are to be made via HotCRP by selecting the respective track.
The submission deadline can be found here.
Data Challenge Chairs
- André Bauer, University of Chicago, USA.
- Naser Ezzati-Jivan, Brock University, Canada.