182 Pages
English

Alternative splicing and protein structure evolution [electronic resource] / submitted by Fabian Birzele



Published 01 January 2008

Alternative Splicing and Protein Structure Evolution
Fabian Birzele
München 2008

Dissertation
at the Faculty of Mathematics, Computer Science and Statistics
of the Ludwig-Maximilians-Universität München
submitted by Fabian Birzele
from Würzburg
München, 09.12.2008
First reviewer: Prof. Dr. Ralf Zimmer
Second reviewer: Prof. Dr. Dmitrij Frishman
Date of the oral examination: 27.01.2009

Contents
Abstract xiii
Zusammenfassung xv
1 Introduction 1
I Protein Structure Comparison 7
2 Introduction to Protein Structure Analysis 9
2.1 Goals and open questions in protein structure analysis . . . . . . . . . . 9
2.2 Existing approaches to protein structure comparison . . . . . . . . . . 10
2.2.1 SCOP and CATH - the standard of truth . . . . . . . . . . 10
2.2.2 Automated protein structure alignment and comparison . . . . . . . . . . 11
3 A Systematic Comparison of SCOP and CATH 13
3.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.1 Mapping of domain assignments . . . . . . . . . . 14
3.1.2 inner nodes of the hierarchies . . . . . . . . . . 15
3.2 Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.2 Detailed Comparison of SCOP and CATH . . . . . . . . . . 16
3.3 Applications of the SCOP-CATH mapping . . . . . . . . . . 20
3.3.1 Benchmarking Structure Comparison methods . . . . . . . . . . 20
3.3.2 Inter-Fold Similarities revealed by Consistency Checks . . . . . . . . . . 24
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4 Protein Structure Alignment considering Phenotypic Plasticity 27
4.1 Outline of the method . . . . . . . . . . 29
4.2 Conclusion . . . . . . . . . . 30
5 Vorolign - fast protein structure alignment using Voronoi contacts 33
5.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1.1 Voronoi tessellation of protein structures . . . . . . . . . . 34
5.1.2 Properties of Voronoi cells . . . . . . . . . . 35
5.1.3 Similarity of Voronoi cells . . . . . . . . . . 36
5.1.4 Pairwise alignment of protein structures . . . . . . . . . . 37
5.1.5 Multiple of . . . . . . . . . . 38
5.1.6 Fast scan for family members: VorolignScan . . . . . . . . . . 39
5.1.7 Automatic detection of domains in multi-domain structures . . . . . . . . . . 40
5.1.8 Parameter Optimization . . . . . . . . . . 40
5.2 Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2.1 Family recognition on a test set of 979 query proteins . . . . . . . . . . 41
5.2.2 Detailed Evaluation on the SCOP-CATH set . . . . . . . . . . 43
5.2.3 Alignment quality . . . . . . . . . . 46
5.2.4 Specific examples for multiple alignment of protein structures . . . . . . . . . . 47
5.2.5 Large scale evaluation of multiple alignment quality . . . . . . . . . . 49
5.3 The AutoPSI database . . . . . . . . . . 52
5.3.1 The current database content . . . . . . . . . . 52
5.3.2 Database access . . . . . . . . . . 53
5.4 Voronoi contact patterns around functionally important sites . . . . . . . . . . 55
5.4.1 Outline of the method . . . . . . . . . . 56
5.4.2 Results: Case study of the Trypsin-like serine protease family . . . . . . . . . . 58
5.4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
II Alternative Splicing in the Context of Protein Structure 63
6 Introduction to the Analysis of Alternative Splicing 65
6.1 Fundamentals of alternative splicing . . . . . . . . . . 66
6.1.1 The splicing process in the cell . . . . . . . . . . 66
6.1.2 Different types of alternative splicing events . . . . . . . . . . 66
6.1.3 Regulation of alternative splicing . . . . . . . . . . 67
6.2 Biological functions of alternative splicing events . . . . . . . . . . 68
6.3 Alternative splicing and disease . . . . . . . . . . 69
7 Alternative Splicing and Protein Structure Evolution 71
7.1 Hypotheses and major concepts . . . . . . . . . . 72
7.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
7.2.1 Alternative splicing and literature data . . . . . . . . . . 73
7.2.2 Protein structure assignment . . . . . . . . . . 73
7.2.3 Assignment of protein structures to families . . . . . . . . . . 74
7.2.4 Multiple structure alignments and evolutionary "isoforms" . . . . . . . . . . 74
7.2.5 Alternative splicing and alternative structural models . . . . . . . . . . 74
7.3 Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.3.1 Structural complexity of functional isoforms . . . . . . . . . . 75
7.3.2 Interpretation of splicing events using evolutionary isoforms . . . . . . . . . . 79
7.3.3 Supporting hypotheses on fold evolution via alternative splicing data . . . . . . . . . . 80
7.3.4 Alternative splicing: a mechanism to explore the protein fold space? . . . . . . . . . . 81
7.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8 Alternative Splicing and Protein Repeats 87
8.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.1.1 Genomic data and annotated splice variants . . . . . . . . . . 88
8.1.2 Splicing events in Swissprot . . . . . . . . . . 88
8.1.3 Repeat assignment and data collection . . . . . . . . . . 89
8.1.4 Assessing the statistical significance of the results . . . . . . . . . . 89
8.2 Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.2.1 Frequency of repeats in Swissprot and Ensembl . . . . . . . . . . 89
8.2.2 Alternative splicing of proteins containing repeats . . . . . . . . . . 90
8.2.3 Alternative splicing of C2H2 Zinc-finger motifs . . . . . . . . . . 94
8.2.4 Sculpting protein binding domains - Ankyrin repeats . . . . . . . . . . 95
8.2.5 Alternative splicing of β-propellers . . . . . . . . . . 97
8.2.6 Importance of splicing for complex tissue organizations . . . . . . . . . . 97
8.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
III Genome-wide Detection and Analysis of Alternative Splicing 101
9 Detection and prediction of Alternative Splicing Events 103
10 The ProSAS-Database 107
10.1 The ProSAS pipeline . . . . . . . . . . 107
10.1.1 Genome and alternative splicing data . . . . . . . . . . 108
10.1.2 Protein structure prediction . . . . . . . . . . 108
10.1.3 Characterization of splicing events . . . . . . . . . . 109
10.1.4 Affymetrix data mapping . . . . . . . . . . 110
10.1.5 Further data sources . . . . . . . . . . 110
10.2 The ProSAS database content . . . . . . . . . . 110
10.3 The web interface . . . . . . . . . . 112
10.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
11 PASS - Identifying splicing events in Affymetrix Exon Arrays 117
11.1 Outline of the method . . . . . . . . . . 117
11.2 Results and Discussion . . . . . . . . . . 119
12 Alternative splicing and proteome complexity in mass spectrometry data 123
12.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
12.1.1 Identification of isoforms in precompiled MS peptide datasets . . . . . . . . . . 124
12.1.2 Expected versus observed unique isoform peptides . . . . . . . . . . 124
12.1.3 Sequence-based analysis of alternative splicing . . . . . . . . . . 125
12.1.4 Protein structure prediction pipeline . . . . . . . . . . 125
12.2 Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
12.2.1 Presence of isoforms in mass spectrometry data . . . . . . . . . . 126
12.2.2 Changing signal peptides, localization and domain composition via alternative splicing . . . . . . . . . . 128
12.2.3 Structural analysis of isoforms identified in mass spectrometry data . . . . . . . . . . 130
12.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
12.4 Outlook: Identification of novel splice variants in mass spectrometry data . . . . . . . . . . 135
IV Conclusions and Outlook 139
13 Conclusion and Outlook 141
Bibliography 147
Acknowledgements 163
Curriculum Vitae 165

List of Figures
3.1 Examples of interfold similarities detected by the systematic comparison of SCOP and CATH . . . . . . . . . . 21
3.2 Evaluation of the performance of TM-align on SCOP and the novel benchmark set of consistently classified pairs of structures . . . . . . . . . . 22
4.1 Phenotypic plasticity observed in the pheromone binding domain family (a.39.2.1) 28
4.2 Schematic overview on the single steps of the PPM method . . . . . . . . . . 29
5.1 Anomaly in a structural alignment due to structural divergence . . . . . . . . . . 34
5.2 Construction of a two-dimensional Voronoi tessellation . . . . . . . . . . 35
5.3 Basic principle of the Vorolign scoring function to score the similarity of two Voronoi cells . . . . . . . . . . 38
5.4 Detailed evaluation of the performance of PPM and Vorolign on the SCOP-CATH consensus set . . . . . . . . . . 45
5.5 Structural quality of Vorolign alignments across different sequence identities in more than 8000 pairs of structures from the same SCOP family . . . . . . . . . . 46
5.6 Flexible structural alignment of members of the Calmodulin family computed by Vorolign . . . . . . . . . . 47
5.7 Pairwise Vorolign alignment quality induced from multiple structure alignments of more than 800 SCOP families . . . . . . . . . . 49
5.8 Conservation rate of biologically relevant residues (CSA patterns and cysteine bridges) in multiple structure alignments . . . . . . . . . . 51
5.9 Conservation rate of biologically relevant residues (PROSITE patterns and Swissprot annotated functional residues) in multiple structure alignments . . . . . . . . . . 51
5.10 Detail view in the AutoPSI database web interface . . . . . . . . . . 54
5.11 Schematic example of the consistency check in the Vorolign Pattern identification method . . . . . . . . . . 57
5.12 PROSITE pattern matches of the catalytic triad on the Trypsin structure (PDB: 1a0j) . . . . . . . . . . 58
5.13 Initial Voronoi pattern in the Trypsin structure (PDB: 1a0j) . . . . . . . . . . 59
5.14 Final consensus Voronoi pattern for eukaryotic Trypsin-like serine proteases family (SCOP: b.47.1.2) . . . . . . . . . . 60
6.1 Important consensus splicing signals in intronic sequences . . . . . . . . . . 66
6.2 Different types of alternative splicing events . . . . . . . . . . 67
7.1 Schematic workflow of our approach for a structure-based analysis of splicing events . . . . . . . . . . 73
7.2 Distribution of 488 splicing events in domains, large parts (but no domain) as well as variable and conserved regions . . . . . . . . . . 75
7.3 Examples for the location of splicing events of different types, known to lead to functional proteins, visualized on the corresponding protein structure . . . . . . . . . . 77
7.4 Example how evolutionary isoforms can help to understand the outcome of splicing events . . . . . . . . . . 80
7.5 Examples for potential fold changing events mediated by alternative splicing . . . . . . . . . . 82
8.1 Evolutionary model for the enhanced generation of new protein variants by coupling repeat expansion (duplication) and alternative splicing . . . . . . . . . . 91
8.2 Schematic mechanism how to generate functional diversity through spliced repeat proteins . . . . . . . . . . 92
8.3 Examples for splicing events affecting repeats on the protein structure level . . . . . . . . . . 96
9.1 Experimental approaches to detect alternative splicing events . . . . . . . . . . 104
10.1 Data sources and their interdependency in the ProSAS database . . . . . . . . . . 108
10.2 ProSAS gene detail view (Transcript details) in the ProSAS web interface . . . . . . . . . . 113
10.3 transcript detail view in the ProSAS web interface . . . . . . . . . . 114
11.1 Examples for tissue-specific splice variants detected by PASS . . . . . . . . . . 121
12.1 Basic principle of our method to detect known isoforms in mass spectrometry data 126
12.2 Evaluation of the structural complexity of splicing events confirmed in mass spectrometry data . . . . . . . . . . 131
12.3 Examples of splicing events confirmed by mass spectrometry data visualized on the corresponding protein structure . . . . . . . . . . 133
12.4 Novel Vitamin D-binding protein splice junction and isoform identified in mouse plasma mass spectrometry data . . . . . . . . . . 138