Writing a simple module for fuzzy testing in Python
by Tiago Tresoldi
This is a post that was drafted for a different blog, where it was never published. I decided to give it an ephemeral life here.
Fuzzying, or “fuzzy testing”, is a technique for testing software that involves running a program on a potentially infinite series of arbitrary inputs. In analogy to the theorem that infinite monkeys with infinite typewriters would eventually recreate Hamlet, by fuzzying we can observe all logic paths in our code — especially those not imagined or intended. Rather than confirming that the results are the expected ones, as in the conventional “unit testing”, the goal is to check whether and how a program handles unexpected data and flaws. The technique is common in software security for diagnosing problems like poor memory management or permission scaling, and has been used in web development as well: user behaviour can be so unexpected that the “fuzzer” just randomly clicks and enter inputs in a page, trying to bring the server down.
Despite being more prevalent in system software, we can adopt the technique in programs for the humanities. This is the case of the AlteruPhono library on which I have been working on, for automating processes of sound change. Despite a consensus on how encode these “rules”, there is no unified notation, so that every linguist who experimented with my library could use a slightly different syntax, certainly displeased if their habits bring down the system. Besides, the library is developed to assist in the identification of phonological changes, and a headache already in the first experiments were breaks caused by the exploration algorithm proposing unsupported candidates. In cases like these, the library should disclose that it cannot handle a request, being explicit about the problem, and never “just crash”.
None of the solutions for fuzzying that I found was really suitable.
Most were too complex. Several had too many dependencies. Many were
targeted at identifying problems due to malicious users. Some did not
allow structured input, merely applying one random string after the
other. I decided to write a small independent Python script to
demonstrate fuzzying in a simple way. The two central data of the
library (rules, like
p > b / _ V meaning that “/p/ changes into /b/
when preceding a vowel”, and sound sequences, like
# p a t e #) are
the starting point, with the tests resulting from random “disturbances”
(or “perturbations”): at each iteration, a string has characters
removed, replaced, swapped, or added. The fundamental structure is
maintained, better simulating the two real types of problems,
discrepancies in rule encoding and typing errors, while triggering the
desired “valid-but-unsupported” paths with a higher probability.
The function responsible for this loop operates on a list of
distribution probabilities that can be replaced by a better one (for
example, from a Poisson process via scipy).
Other improvements could be a measure of the distance between the
original element and the disturbed one, for example with edit distance.
More suitable modifications can be coded, such as following some phonotactics, but these results are adequate for our demonstration purposes:
The program’s main loop is a
while True construct that can be
interrupted with CTRL+C (demonstrating the simplest way to control the
SIGINT and SIGTERM signals in Python). In this loop, at each iteration
we draw a rule from the pool, apply the random disturbance function, and
attempt to parse it. The latter task can be successful (returning the
type of response we expect), fail in a controlled manner (when the code
identifies that the rule is malformed, throwing an AlteruPhonoError
exception), or fails in an unforeseen way (throwing some other Python
exception, like a ValueError). The first two events are successes, while
the latter is recorded as a failure.
If the parsing attempt yields an expected response, it is applied on a random sound sequence employing the two methods provided by the library, for forward and backward operation. Both tests help to confirm if the response is valid, as it may happen that our parsing provides a non-applicable rule, and the test is performed on sequences that are not associated to the original rule in the manual tests. This increases the unpredictability of all tests, as desired. After these operations on well-formed sequences, the code takes care of disturbing them as well, repeating the forward and backward transformations in a more demanding assessment of performance on non-applicable, invalid, and malformed sequences.
At each iteration we append the results to a simple record in tabular format, as in this screenshot:
Combined with tests written by hand, which focus on the linguistic aspects of the task, the fuzzy testing allowed to identify several problems in the code and data. For example, one case derived from the wrong treatment I made of sibilacity, evidenced during a random attempt to transform dorsal fricatives into coronal ones. The solution required more than a simple check, slightly changing the way our pyclts library and BIPA are integrated.
The code is available on GitHub and can serve as a template for similar experiments — it was trivial, for example, to tweak it in order to play with lingpy’s ngram functions for a different project. Even with the AlteruPhono library itself, the function for disturbing the input should be expanded with more “phonologically aware” code soon, for example by taking into account distinctive feature instead of manipulating their labels, thus operating on the tokens and not the strings. The basic operation, however, would be the same: take real inputs, change them, test with the modifications, and keep track of failures.
The technique has been useful in other programming aspects as expected, such as writing an adequate PEG grammar. However, what attracts me the most is its scientific value. It is too easy but worth to trace an analogy with the path of so many linguistic theories, especially “universal” ones. Formed on a restricted and well-behaved group of examples, they collapse in front of data from languages unknown or at first disregarded by the author(s) (for whom they operate in analogy with random inputs), leading to reformulations, scope restrictions, or even abandonment. For a library on phonological changes, dealing with abstractions and giving up any pretense of totality, the experiments allow to overcome a mere lamentation over ill-defined limitations, acknowledging and discussing them. After all, scientific progress is built upon unsuccessful tests, right?tags: python, - fuzzy