Abstrakt: | In this diploma thesis we present data structure for representation of maximal repeats in strings - R3 tree, based on well known data structure - suffix tree. It requires O(n) space and it can be constructed in O(n) time and space for string of length n over constant-sized alphabet. We formalize repeat in string S as triple (p1, p2, l), where p1, p2 are two distinct positions in S and l is the length of the repeat. We formulate query for maximal repeats in S in the form of the function findPairs(p1, k, S) that returns all pairs (p2, l) such that (p1, p2, l) is maximal repeat in S with l >= k. R3 tree allows computation of findPairs queries in optimal time O(z), where z is the number of found pairs. We also describe design and functionality of R3lib - library written in C, for finding maximal repeats in arbitrary binary data, that works with proposed structure.
|
---|