The central problem of text processing is pattern matching: given two strings (a pattern and a text), find all fragments of the text matching the pattern. In exact matching, solely equal strings match; however, this condition is relaxed in many variants of approximate pattern matching. In the k-mismatch problem, we allow for up to k substitutions, that is, strings of the same length are assumed to match even if there are at most k mismatching characters between them.
In the streaming model, the input data can be read only once in the left-to-right order and needs to be processed online in limited memory. In the case of pattern matching, the stream consists of the pattern followed by the text, and one has to report an occurrence before reading the subsequent characters. The state-of-the-art streaming algorithm for exact pattern matching uses O(1) time per character and O(log² n) bits of working space. Our streaming algorithm for the k-mismatch problem takes Õ(√k) time per character and O(k log² n) bits of working space. Up to logarithmic factors, this space usage is optimal, whereas the time complexity matches that of the fastest online algorithms with unbounded space consumption.
I will present this result, joint work with Raphaël Clifford (University of Bristol, UK) and Ely Porat (Bar Ilan University, Israel), on January 7th at the SODA’19 conference held in San Diego, California. The final version of the paper is available here.
Tomasz Kociumaka