Augmenting Test Oracles with Production Observations
Time: Fri 2024-12-13, 14:00
Location: Kollegiesalen, Brinellvägen 8, Stockholm
Video link: https://kth-se.zoom.us/j/64605922145
Language: English
Doctoral student: Deepika Tiwari, Software and Computer Systems, SCS
Opponent: Professor Paolo Tonella, Università della Svizzera Italiana, Lugano, Switzerland
Supervisor: Professor Benoit Baudry, Theoretical Computer Science, TCS; Professor Martin Monperrus, Theoretical Computer Science, TCS
Abstract
Software testing is the process of verifying that a software system behaves as intended. Significant resources are invested in creating and maintaining strong test suites to ensure software quality. However, in-house tests seldom reflect all the scenarios that may occur as a software system executes in production environments. The literature on automated test generation proposes valuable techniques that assist developers with their testing activities, yet the gap between tested behaviors and field behaviors remains largely overlooked. Consequently, behaviors relevant for end users are not reflected in the test suite, and faults that surface in the field may remain undetected by developer-written or automatically generated tests.
This thesis proposes a novel framework for using production observations, made as a system executes in the field, in order to generate tests. The generated tests include test inputs that are sourced from the field, and oracles that verify behaviors exhibited by the system in response to these inputs. We instantiate our framework in three distinct ways.
First, for a target project, we focus on methods that are inadequately tested by the developer-written test suite. At runtime, we capture objects that are associated with the invocations of these methods. The captured objects are used to generate tests that recreate the observed production state and contain oracles that specify the expected behavior. Our evaluation demonstrates that this strategy results in improved test quality for the target project.
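As a simplified illustration of this first instantiation, the sketch below shows the shape of a test such a pipeline could emit. The class DiscountCalculator and the literal values are hypothetical placeholders, not artifacts of the thesis; in practice the receiver, the argument, and the expected return value would be recreated from objects serialized during a production invocation.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Hypothetical target class standing in for a production component.
class DiscountCalculator {
    private final double rate;
    DiscountCalculator(double rate) { this.rate = rate; }
    double applyDiscount(double orderTotal) { return orderTotal * (1.0 - rate); }
}

// Shape of a generated test: the receiver and the argument recreate the state
// captured for one production invocation, and the oracle asserts the return
// value that was observed for that same invocation.
class DiscountCalculatorGeneratedTest {

    @Test
    void applyDiscount_reproducesProductionInvocation() {
        // Receiver state as captured in production (values are placeholders)
        DiscountCalculator receiver = new DiscountCalculator(0.10);
        // Argument captured for the same invocation
        double capturedArgument = 250.0;

        double actual = receiver.applyDiscount(capturedArgument);

        // Oracle: the return value recorded for this invocation in the field
        assertEquals(225.0, actual, 1e-9);
    }
}
```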
With the second instantiation of our framework, we observe the invocations of target methods at runtime, as well as the invocations of methods called within the target methods. Using the objects associated with these invocations, we generate tests that use mocks, stubs, and mock-based oracles. We find that the generated oracles verify distinct aspects of the behaviors observed in the field, and also detect regressions within the system.
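Under the same caveat, the sketch below suggests what a mock-based test from this second instantiation could look like: CheckoutService, PriceRepository, and the stubbed values are hypothetical stand-ins, and Mockito is assumed as the mocking library. The final verify call is the mock-based oracle, checking that the nested invocation occurred with the captured parameters.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

// Hypothetical collaborator whose nested invocation was observed in the field.
interface PriceRepository {
    double priceOf(String sku);
}

// Hypothetical target class: checkout() invokes the collaborator internally.
class CheckoutService {
    private final PriceRepository prices;
    CheckoutService(PriceRepository prices) { this.prices = prices; }
    double checkout(String sku, int quantity) {
        return prices.priceOf(sku) * quantity;
    }
}

// Shape of a mock-based test that this instantiation could generate.
class CheckoutServiceGeneratedTest {

    @Test
    void checkout_replaysNestedInvocationWithMock() {
        // Stub the nested call with the value observed in production
        PriceRepository mockedPrices = mock(PriceRepository.class);
        when(mockedPrices.priceOf("SKU-7")).thenReturn(19.5);

        CheckoutService target = new CheckoutService(mockedPrices);

        // Invoke the target method with the captured arguments
        double actual = target.checkout("SKU-7", 2);

        // Output oracle: return value observed for this invocation in the field
        assertEquals(39.0, actual, 1e-9);
        // Mock-based oracle: the collaborator was called with the captured parameters
        verify(mockedPrices).priceOf("SKU-7");
    }
}
```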
Third, we adapt our framework to capture the arguments with which target methods are invoked, during the execution of the test suite and in the field. We generate a data provider using the union of captured arguments, which supplies values to a parameterized unit test that is derived from a developer-written unit test. Using this strategy, we discover developer-written oracles that are actually generalizable to a larger input space.
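A minimal sketch of this third instantiation, again with hypothetical names and values: a developer-written assertion is lifted into a JUnit 5 parameterized test, and the method source plays the role of the data provider holding the union of arguments captured during test-suite execution and in the field.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.stream.Stream;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.MethodSource;

// Hypothetical target: normalizes a whitespace-padded identifier.
class IdNormalizer {
    static String normalize(String raw) {
        return raw.trim().toLowerCase();
    }
}

// Parameterized test derived from a developer-written unit test; the data
// provider supplies the union of captured arguments.
class IdNormalizerParameterizedTest {

    // Captured arguments (the literal values here are placeholders)
    static Stream<String> capturedArguments() {
        return Stream.of("ABC-1", "  abc-1", "Abc-1  ");
    }

    @ParameterizedTest
    @MethodSource("capturedArguments")
    void normalize_generalizesDeveloperOracle(String captured) {
        // Developer-written oracle, reused unchanged for every captured argument
        assertEquals("abc-1", IdNormalizer.normalize(captured));
    }
}
```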
We evaluate the three instantiations of our proposed framework against real-world software projects exercised with production workloads. Our findings demonstrate that runtime observations can be harnessed to generate complete tests, with inputs and oracles. The generated tests are representative of real-world usage, and can augment developer-written test suites.