Computational methods for the integration of biological activity and chemical space [Electronic resource] / submitted by Eugen Lounkine

English
145 Pages

Published 01 January 2009

Computational Methods for the
Integration of Biological Activity
and Chemical Space

Dissertation for the award of the doctoral degree (Dr. rer. nat.) of the
Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Mathematics and
Natural Sciences) of the
Rheinische Friedrich-Wilhelms-Universität Bonn

submitted by
Eugen Lounkine
from Moscow

Bonn 2009

Prepared with the approval of the Mathematisch-Naturwissenschaftliche
Fakultät of the Rheinische Friedrich-Wilhelms-Universität Bonn

1. Referee: Univ.-Prof. Dr. rer. nat. Jürgen Bajorath
2. Referee: Dr. rer. nat. Michael Gütschow
Date of doctoral examination: 29.10.2009
Year of publication: 2009

For my Parents

Abstract
One general aim of medicinal chemistry is to understand the
structure-activity relationships of ligands that bind to biological targets.
Advances in combinatorial chemistry and biological screening technologies allow
ligand-target relationships to be analyzed on a large scale. However, extracting
useful information from biological activity data requires computational methods
that link the activity of ligands to their chemical structure.

This thesis investigates how fragment-type descriptors of molecular structure
can be used to link activity and chemical ligand space. First, an activity
class-dependent hierarchical fragmentation scheme is introduced. It generates
fragmentation pathways that are aligned using established methods for the
multiple alignment of biological sequences. These alignments are then used to
extract consensus fragment sequences that serve as structural signatures of
individual biological activity classes.
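
The consensus step described above can be illustrated with a minimal sketch.
The abstract does not say how fragments are encoded or how the consensus is
derived, so everything here is an assumption: each fragment is a single
character, the rows are already aligned, "-" marks a gap, and a column is kept
when its most frequent symbol is not a gap and reaches a chosen fraction of the
rows. The names `consensus` and `min_fraction` are hypothetical.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol in the pre-aligned rows


def consensus(aligned_rows, min_fraction=0.5):
    """Return the column-wise majority symbols of pre-aligned rows.

    A column contributes its most common symbol only if that symbol is not
    a gap and occurs in at least `min_fraction` of the rows.
    """
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")
    kept = []
    for column in zip(*aligned_rows):
        symbol, count = Counter(column).most_common(1)[0]
        if symbol != GAP and count / len(aligned_rows) >= min_fraction:
            kept.append(symbol)
    return "".join(kept)


# Toy alignment of four fragmentation pathways (illustrative data only).
rows = ["AB-CD", "ABECD", "A--CD", "ABFC-"]
signature = consensus(rows)
```

With this toy alignment, the third column is dominated by gaps and is dropped,
while the remaining columns contribute their majority symbol.
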

It is also investigated how defined, chemically intuitive molecular fragments
can be organized based on their topological environment and co-occurrence in
compounds active against closely related targets. To this end, the Topological
Fragment Index is introduced, which quantifies the complexity of the topological
environment of a fragment in a given molecule and thus goes beyond fragment
frequency analysis. Fragment dependencies are established on the basis of common
topological environments, which facilitates the identification of activity
class-characteristic fragment dependency pathways that describe fragment
relationships beyond structural resemblance.

Because fragments often depend on each other in an activity class-specific
manner, the importance of defined fragment combinations for similarity searching
is further assessed. To this end, Feature Co-occurrence Networks are introduced
that allow the identification of feature cliques characteristic of individual
activity classes. Three differently designed molecular fingerprints are compared
for their ability to provide such cliques, and a clique-based similarity
searching strategy is established. For molecule- and activity class-centric
fingerprint designs, feature combinations are shown to improve similarity search
performance in comparison to standard methods. Moreover, it is demonstrated that
individual features can form activity class-specific combinations.
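
As a rough illustration of the idea of a feature co-occurrence network, the
sketch below links two fingerprint features when they occur together in a
sufficient fraction of the active compounds, enumerates maximal cliques with the
basic Bron-Kerbosch recursion, and scores a database compound by the number of
cliques it fully contains. The edge criterion, the `min_fraction` threshold, the
scoring rule, and all function names are assumptions made for this example; the
abstract does not specify how the thesis defines them.

```python
from itertools import combinations


def cooccurrence_graph(fingerprints, min_fraction=0.75):
    """Link two features if they co-occur in at least `min_fraction` of the fingerprints."""
    pair_counts = {}
    for features in fingerprints:
        for a, b in combinations(sorted(features), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1
    graph = {}
    for (a, b), count in pair_counts.items():
        if count / len(fingerprints) >= min_fraction:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    if not graph:
        return []
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node}, candidates & graph[node], excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques


def clique_score(features, cliques):
    """Count the cliques whose features are all present in a compound."""
    return sum(1 for clique in cliques if clique <= features)


# Toy fingerprints of active compounds, given as sets of feature indices.
actives = [{1, 2, 3, 7}, {1, 2, 3}, {1, 2, 3, 9}, {4, 5}]
cliques = maximal_cliques(cooccurrence_graph(actives))
```

In this toy data set, features 1, 2, and 3 co-occur in three of the four
actives and form the only clique, so a database compound is scored by whether it
carries all three features together rather than any one of them.
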

Extending the analysis of feature cliques characteristic of individual activity
classes, the distribution of defined fragment combinations among several
compound classes acting against closely related targets is assessed. Fragment
Formal Concept Analysis is introduced for flexible mining of complex
structure-activity relationships. It allows the interactive assembly of fragment
queries that yield fragment combinations characteristic of defined activity and
potency profiles. It is shown that pairs and triplets of fragments, rather than
individual fragments, distinguish between different activity profiles. A
classifier is built based on these fragment signatures that distinguishes
between ligands of closely related targets.
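
The following sketch illustrates, on invented toy data, the two derivation
operators of formal concept analysis and a simple query for fragment
combinations whose matching compounds all share one activity annotation. The
compound names, fragment names, annotations, and the `signatures` helper are
hypothetical and are not taken from the thesis; the scales and potency profiles
it mentions are not modeled here.

```python
from itertools import combinations

# Hypothetical toy context: compound -> (fragments, activity annotation).
CONTEXT = {
    "c1": ({"f1", "f2"}, "D1"),
    "c2": ({"f1", "f2", "f3"}, "D1"),
    "c3": ({"f1", "f3"}, "5-HT"),
    "c4": ({"f2", "f3"}, "5-HT"),
}


def extent(fragments):
    """Compounds that contain every fragment in `fragments`."""
    return {c for c, (frags, _) in CONTEXT.items() if fragments <= frags}


def intent(compounds):
    """Fragments shared by every compound in `compounds`."""
    sets = [CONTEXT[c][0] for c in compounds]
    return set.intersection(*sets) if sets else set()


def signatures(activity, size):
    """Fragment combinations of `size` whose non-empty extent has only `activity`."""
    all_fragments = sorted(set().union(*(frags for frags, _ in CONTEXT.values())))
    hits = []
    for combo in combinations(all_fragments, size):
        matched = extent(set(combo))
        if matched and all(CONTEXT[c][1] == activity for c in matched):
            hits.append(combo)
    return hits


single_hits = signatures("D1", 1)  # no single fragment is specific in this toy set
pair_hits = signatures("D1", 2)    # the pair (f1, f2) is specific
```

In this toy context every individual fragment also occurs in a compound with
the other annotation, whereas one fragment pair is confined to a single
annotation, which mirrors the kind of observation the abstract reports for pairs
and triplets.
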

Going beyond activity profiles, compound selectivity is also analyzed. To this
end, Molecular Formal Concept Analysis is introduced for the systematic mining
of compound selectivity profiles on a whole-molecule basis. Using this approach,
structurally diverse compounds are identified that share a selectivity profile
with selected template compounds. Structure-selectivity relationships of the
resulting compound sets are further analyzed.

Acknowledgments
I would like to thank my supervisor, Prof. Dr. Jürgen Bajorath, for his guidance
and help. I would also like to thank Prof. Dr. Michael Gütschow for his
willingness to be the co-referent. Special thanks go to Dr. José Batista for his
help and advice during the entire project and to Ye Hu and Felix Krüger for
their collaboration on individual studies. Finally, I would like to thank all my
colleagues from B-IT for the encouraging and friendly working atmosphere and all
my friends who have supported me.

Contents
1 Introduction
2 Molecular Fragmentation Approaches
2.1 Historical Overview of Fragment Design
2.2 Molecular Fragmentation Approaches
2.2.1 Knowledge-based
2.2.2 Hierarchical
2.2.3 Retrosynthetic
2.2.4 Random
2.3 Core Trees
2.3.1 Molecular Core Mapping
2.3.2 Core Trees
2.4 Summary
3 Topological Fragment Index
3.1 Topological Fragment Index Method
3.2 Application to RECAP Fragments
3.2.1 Data Sets
3.2.2 RECAP Fragmentation and Mapping
3.2.3 ToFI Calculation for RECAP Fragments
3.3 Hierarchical Organization of RECAP Fragments
3.3.1 Dependency Graph Calculation
3.3.2 Activity Class-Characteristic RECAP Fragments
3.3.3 Fragment Relationships
3.3.4 Fragment Topology Clusters
3.3.5 Distribution of ACCRF in Topology Clusters
3.4 Summary
4 Feature Combinations in Similarity Searching
4.1 Structural Fingerprints
4.2 Multiple Template Similarity Searching
4.2.1 Quantification of Fingerprint Overlap
4.2.2 Nearest Neighbor Searching
4.2.3 Centroid Searching
4.3 Feature Co-occurrence Networks
4.3.1 FCoN Generation and Clique Detection
4.3.2 Clique-Based Similarity Searching
4.4 Summary
5 Fragment Formal Concept Analysis
5.1 Formal Concept Analysis
5.1.1 Lattices
5.1.2 Scales
5.2 FragFCA
5.2.1 Formal Context
5.2.2 Fragment Formal Concept Analysis
5.2.3 Queries
5.2.4 Concluding Remarks
5.3 FragFCA Classifier
5.3.1 Fragment Generation
5.3.2 Scale and Query Design
5.3.3 Compound Classification
5.3.4 Similarity Searching
5.4 Summary
6 Molecular Formal Concept Analysis
6.1 Compound Selectivity
6.2 MolFCA
6.2.1 Compound Selectivity Annotation
6.2.2 MolFCA Scale Design
6.2.3 MolFCA Queries
6.3 Summary
7 Summary and Conclusions
A Software and Databases
B Additional Data
B.1 Feature Co-occurrence Networks
B.2 Fragment Formal Concept Analysis
B.3 Molecular Formal Concept Analysis

List of Figures
1.1 Molecular descriptors
1.2 Target-, ligand-, and target-ligand space
1.3 Activity cliffs
2.1 Non-drug-like chemical groups
2.2 Atom-centered fragments and atom pairs
2.3 Hierarchical fragmentation
2.4 RECAP fragmentation
2.5 Brownian processing
2.6 MolBlaster fragmentation
2.7 Activity class characteristic substructures
2.8 Molecular core mapping
2.9 Core-based fragmentation
2.10 Exemplary core tree
2.11 Fragmentation pathways
2.12 Fragment string similarity
2.13 Multiple core path alignment
3.1 Binary fingerprints and counts
3.2 ToFI calculation
3.3 Exemplary ToFI values
3.4 RECAP ToFI calculation
3.5 Dependency graph calculation
3.6 Exemplary ToFI dependency subgraph
3.7 ToFI fragment topology clusters
3.8 ACCRF topology cluster distribution
4.1 Fingerprint design strategies
4.2 Multiple template similarity searching
4.3 FCoN clique detection
4.4 Feature clique distribution in database
4.5 Feature clique search strategy
4.6 FCoN virtual screening trials
5.1 Formal concept analysis
5.2 FCA scales and scale combination
5.3 General GPCR scales
5.4 Specific GPCR scales
5.5 Redundancy filtering
5.6 D1 signature fragment distribution
5.7 D1 signature fragment combinations
5.8 α1 fragment distribution
5.9 Fragment combinations in α1 and serotonin antagonists
5.10 Fragment combinations specific for α1 and D2 against 5-HT antagonists
5.11 Fragment combinations specific for 5-HT1A antagonists
5.12 Fragment combinations specific for highly potent D4 antagonists
5.13 FragFCA classification results
5.14 FragFCAn ROC curves
6.1 MolFCA scales
6.2 Vorinostat selectivity profile query
6.3 Compounds matching the Vorinostat profile
6.4 Compounds deviating from the Vorinostat profile
6.5 Cilomilast profile query
6.6 Compounds matching the Cilomilast profile
6.7 Compounds matching the MPA profile
6.8 De novo MolFCA query design
6.9 Compounds matching the de novo MolFCA query
B.1 FCoN clique number distribution
B.2 FCoN size distribution