EPR-dictionaries (Enhanced Prefixsum Rank dictionaries) by Pockrandt et al. are a practical and fast data structure that allow constant-time searching in both uni- and bidirectional FM indices. Its space consumption of O(log σ * n) + o(log σ * σ * n) is also an improvement of the implementation by Lam et al. This makes EPR-dictionaries a space/time-tradeoff compared to Wavelet trees. In practice EPR-dictionaries are faster than Wavelet trees by a factor of 2.2 – 4.2 depending on the alphabet size.
Table 1 shows the speed-ups for uni- and bidirectional FM indices. WT and 2WT refer to Wavelet tree implementations in SeqAn, 2SDSL and 2SCH refer to the bidirectional Wavelet tree implementations in the SDSL respectively the one by Schnattinger et al.
Our implementation of FM indices using EPR-dictionaries is the first constant-time implementation of a bidirectional FM index. Bidirectional indices have become of great importance in the past years and are crucial when it comes to computational expensive tasks such as approximate string searching and lncRna structural search.
EPR-dictionaries are part of our library since SeqAn 2.2.