Efficient Exploration and Analysis of Program Repair Search Spaces
Time: Fri 2024-11-22 10.00
Location: F3 (Flodis), Lindstedtsvägen 26, Sweden
Video link: https://kth-se.zoom.us/j/64148823838
Language: English
Subject area: Computer Science
Doctoral student: Khashayar Etemadi , Teoretisk datalogi, TCS
Opponent: Associate Professor Shane McIntosh, University of Waterloo, Waterloo, ON, Canada
Supervisor: Professor Martin Monperrus, Teoretisk datalogi, TCS; Professor Benoit Baudry, Université de Montréal, Montréal, QC, Canada; Assistant Professor Fernanda Madeiral Delfim, Teoretisk datalogi, TCS, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
QC 20241023
Abstract
The ubiquitous presence of software has made its failures a huge source of cost. To avoid this cost, fixing buggy programs is essential. As manual bug fixing, also known as debugging, is notoriously resource demanding, researchers have proposed to replace it with automated program repair (APR) approaches. As input, an APR approach takes a specification and a buggy program that violates the specification; it then tries to generate a patched program that complies with the specification and correctly implements the expected behavior. To ensure APR is practical and can replace manual debugging, we have to make it efficient.
APR works in two major phases: search space exploration, and search space analysis. In the search space exploration phase, the APR approach predicts the potentially faulty parts of the code (fault localization) and replaces them with various alternatives to generate patches (patch generation). In this phase, finding the best alternatives requires extensive and costly analysis, which makes patch generation a key to APR efficiency. In the search space analysis phase of APR, the patches generated in the exploration phase are analyzed to identify a correct patch with an automated patch assessment and a manual patch assessment component. The automated patch assessment aims at minimizing the number of patches that require manual assessment for identifying a correct patch. As the human expertise needed for manual patch assessment is the most valuable development resource, strong automated and manual patch assessment techniques are crucial for efficient APR.
In this thesis, we aim at improving the efficiency of three main components involved in the exploration and analysis of APR search spaces: patch generation, automated patch assessment, and manual patch assessment. For this purpose, the thesis makes five contributions as follows.
We make two contributions to make patch generation efficient. First, we introduce Sorald, a template-based APR approach for fixing SonarJava static warnings. Sorald employs accurately designed templates to generate exactly one patch that is highly likely to fix the bug. The lightweight patch generation technique and the small search space that needs little analysis makes Sorald an efficient APR approach. Our second contribution is CigaR, an iterative LLM-based APR tool that fixes bugs that cause a test failure. CigaR uses three carefully designed prompts and two prompting strategies to generate likely correct patches with minimal LLM token cost.
We also make two other contributions to make automated patch assessment efficient. First, we propose LighteR, a lightweight tool for estimating the potential of fix templates used by template-based APR approaches. LighteR compares fix templates against developer-made bug-fixes to assess if the templates follow the modification patterns used by experts. The result of this assessment is used to rank the patches based on the potential of templates used for their generation. This ranking is used to prioritize patches for manual assessment and thus, finding the correct patch with minimal manual analysis. Our second contribution to APR automated patch assessment is Mokav. Mokav is an execution-driven iterative LLM-based tool that generates tests to differentiate between patches. The generated tests help to group APR patches into separate clusters. Only one patch from each of the clusters requires manually analysis. This makes APR efficient by significantly reducing the manual patch assessment effort.
Our last contribution is Collector-Sahab, which aims at helping code reviewers better understand behavioral changes caused by patches. Given two versions of a program P \& Q, Collector-Sahab collects the execution trace of both P \& Q. It next compares the traces and identifies runtime differences at variable and return value level. Finally, it augments the code diff between P \& Q with a concise selection of extracted runtime differences. This code augmentation helps code reviewers to better understand the behavior of APR patches and thus, reduces human effort needed for manual patch assessment.
To sum up, in this thesis we aim at making APR useful in practice by improving its efficiency. For this purpose, we propose novel methods to make patch generation, automated patch assessment, and manual patch assessment efficient.