174 Pages
English

Frame synchronization processes in gene expression [electronic resource] / Johanna Weindl

Published 01 January 2008

Lehrstuhl für Nachrichtentechnik
Frame Synchronization Processes
in Gene Expression
Johanna Weindl
Complete reprint of the dissertation approved by the Faculty of Electrical Engineering and Information Technology
of the Technische Universität München for the award of the academic degree of
Doktor-Ingenieur (Dr.-Ing.).
Chair: Univ.-Prof. Dr. rer. nat. habil. B. Wolf
Examiners of the dissertation: 1. Univ.-Prof. Dr.-Ing., Dr.-Ing. E. h. J. Hagenauer (retired)
2. Univ.-Prof. Dr.-Ing. K. Diepold
The dissertation was submitted to the Technische Universität München on 10 June 2008
and accepted by the Faculty of Electrical Engineering and Information Technology on
28 November 2008.
Preface
This thesis was written between January 2006 and June 2008 during my time at the
Institute for Communications Engineering (LNT) of Technische Universität München. It
would not have been possible without the following people:
First, I would like to thank my supervisor Professor Joachim Hagenauer for his constant
support and guidance. The fruitful discussions with him during our frequent ComInGen
meetings have undoubtedly shaped this work. I also thank Professor Klaus
Diepold for acting as co-supervisor despite the interdisciplinary nature of this work.
Several other people contributed significantly to this thesis: Jakob Müller was a constant
help with biological questions, while Zaher Dawy and my colleagues Bernhard and
Janis spent many hours proofreading the thesis and discussing its technical aspects with
me. Furthermore, Torsten and Steffi were careful and precise proofreaders with regard to
language. I greatly value your opinion!
I was very lucky to have some excellent students working under my supervision. Nora
Tax, Nabeel Sulieman, Tobias Rehrl and Friedrich Kischkel were courageous enough to
face the risk of such an interdisciplinary topic and all have a major share in this thesis.
Last but most importantly, my friends and my family have made this thesis possible by
accompanying me through a time of stress and pressure (towards the end), high demands
and doubts (most of the time), overload and frustration (fortunately only from time to
time). Above all, these are my parents Jutta and Hugo, my brother Torsten, Wolfgang
as well as my close friends Sibylle, Philipp, Robert and Dominique. The importance of
my Habibi Bernhard can hardly be put into words and will therefore be expressed in personal
moments instead of in this preface. Without your support, love and understanding, this
thesis would not have become what it is now!
Johanna Weindl
München, June 2008
Contents
1 Introduction
2 Frame Synchronization in Continuous Transmission
2.1 Problem definition
2.1.1 Threshold detection
2.1.2 Maximum selection
2.2 Optimum sync word location rule
2.2.1 Periodically inserted sync words
2.2.2 Aperiodically inserted sync words
2.3 Synchronization performance and error sources
2.3.1 Sequence model: symbols independently and uniformly distributed
2.3.2 Sequence model: Markov chain
2.3.3 Threshold detection of periodically inserted sync words
2.3.4 Threshold detection of aperiodically inserted sync words
2.4 Sync word design
2.4.1 Random occurrence of the sync word
2.4.2 Shifted synchronization
2.5 Sync word families
2.5.1 Sync words for channels with phase ambiguities
2.5.2 Sync words for channels without phase ambiguities
2.5.3 Bifix-free sequences
2.5.4 Distributed sequences
2.6 Summary
3 Biological Background
3.1 The DNA as a digital signal
3.2 Historical steps in molecular biology
3.3 Terms and definitions
3.3.1 DNA and RNA
3.3.2 Mutations
3.3.3 Genes and proteins
3.3.4 Genome
3.3.5 Prokaryotic and eukaryotic organisms
3.4 Gene expression
3.4.1 Overview
3.4.2 Prokaryotic transcription
3.4.3 Eukaryotic transcription
3.4.4 Prokaryotic translation
3.4.5 Eukaryotic translation
3.5 Protein-DNA interactions
3.5.1 Changes in the DNA geometry
3.5.2 Major and minor groove
3.5.3 Fundamental interactions
3.5.4 Target search of proteins on the DNA
3.6 Gene expression as a communication system
3.6.1 Non-protein-coding DNA
3.6.2 Transcription
3.6.3 Translation
3.6.4 Mutations
3.6.5 Protein-DNA interactions
3.7 Summary
4 Analysis of Biological Synchronization Words in Bacteria
4.1 Promoter in Escherichia coli
4.1.1 Autocorrelation properties
4.1.2 Adapted autocorrelation function
4.1.3 Results
4.1.4 Interpretation
4.1.5 The promoter as a distributed synchronization sequence
4.1.6 Markov analysis
4.2 Translation initiator region in Escherichia coli
4.2.1 Sequence data
4.2.2 Kullback-Leibler divergence
4.2.3 Mutual information
4.2.4 Synchronization properties
4.3 Summary
5 Prokaryotic Transcription Initiation
5.1 Promoter detection in Escherichia coli
5.1.1 Weight matrix model of σ70
5.1.2 Synchronization algorithm
5.1.3 Average consideration
5.2 Results and interpretation
5.2.1 Additional synchronization signals
5.2.2 Energy landscape in the wider surrounding
5.2.3 Clustering of promoters
5.3 Kinetic analysis of promoter search by σ70
5.3.1 Arrhenius equation
5.3.2 Linear approximation of the energy landscape
5.3.3 Speed
5.3.4 Direction
5.3.5 Efficiency
5.3.6 Verification
5.4 Summary
6 Eukaryotic Transcription Initiation
6.1 Differences to bacterial transcription initiation
6.1.1 Protein-DNA interaction of the RNA polymerase
6.1.2 Promoter elements
6.1.3 Transcription factor binding sites
6.1.4 CpG islands
6.1.5 Chromatin
6.2 Information theoretic analysis
6.2.1 Weight matrix model
6.2.2 Mutual information
6.2.3 Kullback-Leibler divergence
6.3 Results and interpretation
6.3.1 Comparison of the information theoretic measures
6.3.2 Promoter surrounding
6.3.3 Promoter site
6.4 Clustering of promoters
6.4.1 Transcription-factor binding site
6.4.2 Nucleosome positioning
6.4.3 DNA bendability
6.5 Summary
7 Prokaryotic Translation Initiation
7.1 Detection of the Shine-Dalgarno sequence in Escherichia coli
7.1.1 Synchronization algorithm
7.1.2 Sequence data
7.1.3 Performance measure
7.1.4 13 bases complement model
7.1.5 Shine-Dalgarno sequence based model
7.1.6 May’s parity check model
7.1.7 16S rRNA based model
7.1.8 Detection signals
7.2 Energy metric
7.2.1 Watson-Crick base pairing
7.2.2 Including wobble base pairs
7.2.3 Including terminal mismatches
7.2.4 Comparison
7.3 Mutational analysis
7.3.1 Verification
7.3.2 Generalization to all bases
7.4 Summary
8 Eukaryotic Translation Initiation
8.1 Differences to prokaryotic translation initiation
8.1.1 Initiator region
8.1.2 mRNA modification for protection
8.1.3 Translation initiation factors
8.1.4 mRNA ring structure
8.1.5 Protein interactions during initiation
8.2 Information theoretic analysis
8.2.1 Kullback-Leibler divergence
8.2.2 Mutual information
8.3 Detection of the Kozak sequence
8.3.1 Codebook model
8.3.2 Results and interpretation
8.4 Summary
9 Conclusions
9.1 Summary
9.2 Achievements
9.3 Future research directions
A Notation and Symbols
A.1 Abbreviations
A.2 Symbols
B Sync Word Families
B.1 Barker sequences
B.2 Sequences found by Maury and Styles
B.3 Sequences found by Neuman and Hofman
B.4 Bifix-free sequences
B.5 Distributed sequences
C Sequence Data and Implementation Details
C.1 Datasets
C.1.1 Promoters of Escherichia coli
C.1.2 Eukaryotic promoters
C.1.3 mRNAs of Escherichia coli
C.1.4 Eukaryotic mRNAs
C.2 Data access and processing
C.3 Nucleotide composition of the eukaryotic promoter datasets
C.3.1 Human promoter surrounding
C.3.2 Arthropod promoter surrounding
D Derivations
D.1 Escape rate