Analysis of the Average Depth in a Suffix Tree under a Markov Model

English
11 Pages
Read an excerpt
Gain access to the library to view online
Learn more

Description

Niveau: Supérieur, Doctorat, Bac+8
Analysis of the Average Depth in a Suffix Tree under a Markov Model Julien Fayolle1 and Mark Daniel Ward2 1Projet Algorithmes, INRIA, F-78153 Rocquencourt, France. 2Department of Mathematics, Purdue University, West Lafayette, IN, USA. In this report, we prove that under a Markovian model of order one, the average depth of suffix trees of index n is asymptotically similar to the average depth of tries (a.k.a. digital trees) built on n independent strings. This leads to an asymptotic behavior of (logn)/h+C for the average of the depth of the suffix tree, where h is the entropy of the Markov model and C a constant. Our proof compares the generating functions for the average depth in tries and in suffix trees; the difference between these generating functions is shown to be asymptotically small. We conclude using the asymptotic behavior of the average depth in a trie under the Markov model found by Jacquet and Szpankowski ([JS91]). Keywords: Suffix trees, depth, average analysis, asymptotics, analytic methods Contents 1 Introduction 1 2 Main Results 2 3 Autocorrelation 4 4 On the Generating Functions 6 5 Asymptotics 7 5.1 Isolating the dominant pole . . . . . . .

  • words starting

  • markov model

  • hence dn

  • letter has

  • probability

  • pk?i ≤ pk?

  • minimal period

  • suffix trees


Subjects

Informations

Published by
Reads 43
Language English
Report a problem
Analysis of the Average Depth in a Suffix Tree under a Markov Model
1 2 Julien Fayolle and Mark Daniel Ward
1 Projet Algorithmes, INRIA, F78153 Rocquencourt, France.julien.fayolle@inria.fr 2 Department of Mathematics, Purdue University, West Lafayette, IN, USA.mward@math.purdue.edu
In this report, we prove that under a Markovian model of order one, the average depth of suffix trees of indexnis asymptotically similar to the average depth of tries (a.k.a. digital trees) built onnindependent strings. This leads to an asymptotic behavior of(logn)/h+Cfor the average of the depth of the suffix tree, wherehis the entropy of the Markov model andCa constant. Our proof compares the generating functions for the average depth in tries and in suffix trees; the difference between these generating functions is shown to be asymptotically small. We conclude using the asymptotic behavior of the average depth in a trie under the Markov model found by Jacquet and Szpankowski ([JS91]).
Keywords:Suffix trees, depth, average analysis, asymptotics, analytic methods
Contents
1
2
3
4
5
6
Introduction
Main Results
Autocorrelation
On the Generating Functions
Asymptotics 5.1 Isolating the dominant pole . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Computing residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Asymptotic behavior ofQn(u). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Conclusion
1
2
4
6
7 7 8 10
11
1 Introduction The suffix tree is a data structure that lies at the core of pattern matching, used for example in the lossless compression algorithm of Lempel and Ziv [ZL77]. Suffix trees are also utilized in bioinformatics to track