Equipe ComBi - sujet de thèse -2011
9 Pages
English
Downloading requires you to have access to the YouScribe library
Learn all about the services we offer

Equipe ComBi - sujet de thèse -2011

-

Downloading requires you to have access to the YouScribe library
Learn all about the services we offer
9 Pages
English

Description

LABORATOIRE D’INFORMATIQUE DE NANTES-ATLANTIQUE ÉCOLE DOCTORALE STIM, N. 503UMR 6241 « Sciences et technologiesde l’information et des mathématiques »Ph. D. Proposal for 2011NGS : fast and furious genomic data analysisPh. D. Thesis DirectorNAME, Surname : RUSU, IrenaResearch Team : ComBiLaboratory : LINA (UMR 6241)Affiliation : Univ. NantesE-mail : Irena.Rusu (at) univ-nantes.frPhone : (+33/0) 2 51 12 58 16Advising Rate : 50 %Current Ph. D. Students : 3Ph. D. adviserNAME, Surname : FERTIN, GuillaumeResearch Team : LINALaboratory : LINA (UMR 6241)Affiliation : Univ. NantesE-mail : Guillaume.Fertin (at) univ-nantes.frPhone : (+33/0) 2 51 12 58 14Advising Rate : 50 %Current Ph. D. Students : 3Possible Funding : MESRPh. D. Proposal for 2011NGS : fast and furious genomic data analysisAbstract. The analysis of partially sequenced (or draft) genomes is one of the very recentproblems brought in by the next generation sequencing technologies. We address here threeclasses of problems related to draft genomes: (1) Recovering mapping errors of fragments along adraft genome and correcting them; (2) Improving the assembly of draft genomes, by filling in thegaps between the fragments; (3) Designing algorithms for identifying genome rearrangements indraft genomes. These three classes of problems will be addressed using comparison, thatneeds to be reinvented within this new context.Keywords. next generation sequencing, draft genomes, genome ...

Subjects

Informations

Published by
Reads 7
Language English

Exrait

LABORATOIRE D’INFORMATIQUE DE NANTES-ATLANTIQUE ÉCOLE DOCTORALE STIM, N. 503
UMR 6241 « Sciences et technologies
de l’information et des mathématiques »
Ph. D. Proposal for 2011
NGS : fast and furious genomic data analysis
Ph. D. Thesis Director
NAME, Surname : RUSU, Irena
Research Team : ComBi
Laboratory : LINA (UMR 6241)
Affiliation : Univ. Nantes
E-mail : Irena.Rusu (at) univ-nantes.fr
Phone : (+33/0) 2 51 12 58 16
Advising Rate : 50 %
Current Ph. D. Students : 3
Ph. D. adviser
NAME, Surname : FERTIN, Guillaume
Research Team : LINA
Laboratory : LINA (UMR 6241)
Affiliation : Univ. Nantes
E-mail : Guillaume.Fertin (at) univ-nantes.fr
Phone : (+33/0) 2 51 12 58 14
Advising Rate : 50 %
Current Ph. D. Students : 3
Possible Funding : MESRPh. D. Proposal for 2011
NGS : fast and furious genomic data analysis
Abstract. The analysis of partially sequenced (or draft) genomes is one of the very recent
problems brought in by the next generation sequencing technologies. We address here three
classes of problems related to draft genomes: (1) Recovering mapping errors of fragments along a
draft genome and correcting them; (2) Improving the assembly of draft genomes, by filling in the
gaps between the fragments; (3) Designing algorithms for identifying genome rearrangements in
draft genomes. These three classes of problems will be addressed using comparison, that
needs to be reinvented within this new context.
Keywords. next generation sequencing, draft genomes, genome comparison
2Introduction
Context
Introduced three years ago, Next Generation Sequencing (NGS) technologies dramat-
ically changed the acquisition of genomics data. The new platforms offer much reduced
costs and an increased speed of data acquisition, allowing for instance the treatment of
many genomes simultaneously. However, the mapping and assembly of this data into
complete is much slower than data acquisition, especially when many genomes
are involved. As a result, an increasing number of genomes are being published in unfin-
ished (scaffold or contig) form following new standards [CGea09]. The analysis of those
draft genomes with the aim of either finishing the assembly, or recovering mapping errors
has thus become a very important issue in current genomics [JZSZdf, Mun10].
Problems
A lot of information is extracted, in genome analysis, by comparison of two genomes.
A genome is represented either as a sequence of nucleotides (denoted A, C, T, G) or as
a sequence of genes (usually denoted by signed integers +1, -2 etc.). The nucleotide se-
quence of a genome (for a given organism) is obtained by a technique called sequencing,
which consists in cutting many copies of the genome in small fragments, obtaining the
nucleotide sequence of each fragment, mapping each fragment on the genome and then
assembling the fragments. Very recently, the NGS techniques revolutionized genome se-
quencing by allowing us to obtain the nucleotide sequence of the fragments very quickly
and with low cost. However, the assembly of the fragments did not evolved at the same
rate, which implies that many genomes are only partially known. Such a genome ap-
pears as sequences of nucleotides, or sequences of genes, with missing elements. Thus,
the NGS data has a very different (with respect to classic sequencing) error model, re-
quiring modifications to classical algorithms. Moreover, the huge size of the data requires
the use of efficient algorithms, appropriate hardware, and effective implementations.
The analysis of partially sequenced (or draft) genomes is thus one of the very recent
problems brought in by the NGS. We address here three classes of problems, namely:
– Recovering mapping errors of fragments along a draft genome and correcting them
– Improving the assembly of draft genomes, by filling in the gaps between the frag-
ments
– Designing algorithms for identifying genome rearrangements in draft genomes.
3Expected work
Objectives
The three classes of problems presented above will be addressed using genome com-
parison, that needs to be reinvented within this new context. Indeed, the comparison
of partially sequenced genomes is a new topic, to which only a few studies have been
devoted to the date [JZSZdf, Mun10]. Its aim is to identify similar regions between two
or several genomes, based either on local structural differences between the genomic se-
quences or on rearrangement operations between genes in the genomes. In completely
+sequences genomes, such problems have been intensively studied [FLR 09], resulting in
algorithms or hardness proofs depending on the nature of the chosen model (i.e. defini-
tion of a similar region, genomes represented as permutations or sequences, hypothesis
on genome evolution etc.).
Working plan
The main steps of the work are (1) studying the large bibliography on completely
sequenced genome comparison and the little bibliography on draft genomes; (2) propos-
ing comparison methods adapted to draft genomes first in the simple case of genomes
represented as permutations, and then as sequences, and this under different models of
evolution; (3) developing an approach to compare a draft genome with one or several
phylogenetically close complete genomes; (4) proposing criteria to identify mapping er-
rors, to fill in the gaps between fragments and to find the rearrangement operations that
held between the draft genome and the complete ones during evolution; (5) devising
the corresponding algorithms for which the efficiency is essential; (6) implementing and
testing them on real data.
Such an approach allows us to perform an integrated analysis of the three issues
(fragment mapping, gaps between scaffolds, rearrangement events) simultaneously, thus
permitting a global (and thus optimized) treatment of errors and evolution events.
Applicants
Skills
The work is mainly algorithmic, and there is no pre-requisite in biology. The applicant
is therefore expected to have very good skills in algorithmics, as well as a real concern
for algorithmic complexity.
Applicants names and academic results (if any)
None yet.
4Bibliography
[CGea09] P.S. Chain, D.V. Grafham, and R.S. Fulton et al. Genome projects standards
in a new era of sequencing. Science, (326):236–237, 2009.
+[FLR 09] G. Fertin, A. Labarre, I. Rusu, E. Tannier, and S. Vialette. Combinatorics of
Genome Rearrangements. MIT Press, 2009.
[JZSZdf] H. Jiang, C. Zheng, D. Sankoff, and B. Zhu. Scaffold filling under the break-
point distance. In Proc. RECOMB-CG’10, LNBI series 6398, pages 83–92, 2010,
http: //www.cs.montana.edu/bhz/scaffold.pdf.
[Mun10] Scaffold filling, contig fusion and comparative gene order inference. (11),
2010.
5CV de I. Rusu
Position
– Professor in Computer Science
– Address: LINA, University of Nantes, 2, rue de la Houssinière, 44300 Nantes
– Contact: Tel : (+33) 2 51 12 58 16, Fax :: (+33) 2 51 12 58 00, e-mail :
– Home page :
Education
– 2000-: Professor in Computer Science, LINA, University of Nantes.
– 1995-2000: Assistant Professor, LIFO, University of Orleans.
– 1999: HDR “Aspects théoriques et algorithmiques des graphes parfaits”, LIFO, Uni-
versity of Orleans.
– 1994-1995: Temporary Research and Teaching Assistant, LRI, Paris-Sud University
(XI).
– 1992-1994: Ph. D. Thesis “Graphes parfaits: étude algorithmique et structurale”,
Paris-Sud University.
Service
– Head of the ComBi team at LINA
– Co-Chair, JOBIM 2009; Program Committee, JOBIM 2010
– Referee for ISBRA 2009, ISBRA 2010, RECOMB-CG 2008
– Expert for the NSA Mathematical Sciences Program standard grant, 2010
Publications
1. 29 articles in refeered international journals, 13 articles in refeered international
conferences, 1 book (co-author), 2 book chapters, 2 proceedings books (editor)
2. 5 significant publications :
– G. Fertin, A. Labarre, I. Rusu, E. Tannier, S. Vialette - Combinatorics of Genome
Rearrangements, book, MIT Press, 2009.
– L. Bulteau, G. Fertin, I. Rusu - Revisiting the Minimum Breakpoint Linearization
Problem, TAMC 2010, 163-174 (2010).
– G. del Mondo, D. Eveillard, I. Rusu - Homogeneous decomposition of protein in-
teraction networks: refining the description of intra-modular interactions, Bioin-
formatics 25(7), 926-932 (2009).
– F. Roussel, I. Rusu, H. Thuillier - The Strong Perfect Graph Conjecture: 40
years of attempts, and its resolution, Discrete Mathematics 309(20), 6092-6113
(2009).
6

– L. Bulteau, G. Fertin, I. Rusu - Maximal Strip Recovery Problem with Gaps: Hard-
ness and Approximation Algorithms, ISAAC 2009: 710-719 (2009).
7CV de G. Fertin
Position
– Professor in Computer Science
– Address: LINA, University of Nantes, 2, rue de la Houssinière, BP 92208, 44322
Nantes Cedex 3
– Tel: (+33) 2 51 12 58 12, Fax: (+33) 2 51 12 58 00, e-mail :

Education
– 2005-: Professor in Computer Science, LINA, University of Nantes.
– 2000-2005: Assistant Professor, LINA, University of Nantes.
– 2004: HDR “Algorithmique et Optimisation Combinatoire: Applications aux RÃc seaux
d’Interconnexion, à la Coloration de Graphes et à la Bio-Informatique”, LINA,
University of Nantes.
– 1999-2000: Temporary Research and Teaching Assistant, LaBRI, University Bor-
deaux 1.
– 1996-1999: Ph. D. Thesis “Etude des Communications dans les RÃc seaux d’Interconnexion”,
LaBRI, University Bordeaux 1.
Service
– Associate Editor for the journal BMC Bioinformatics
– Program Committee: JOBIM 2006/2010, ISBRA 2009/2010/2011, RECOMB-CG
2008/2009/2010, IEEE ICCABS 2011
– Referee for journals and conferences (approx. 10 per year), e.g. J. Comp. Biology,
IEEE Trans. Comp. Biology and Bioinformatics, J. Discrete Algorithms, J. Graph
Theory, CPM, ESA, MFCS.
Publications
1. 31 articles in refeered international journals, 39 articles in refeered international
conferences, 1 book (co-author), 1 book chapters
2. 5 significant publications :
– G. Fertin, A. Labarre, I. Rusu, E. Tannier and S. Vialette - Combinatorics of
Genome Rearrangements, book, MIT Press, 2009.
– S. Angibaud, G. Fertin, I. Rusu, A. Thévenin and S. Vialette - Efficient Tools for
Computing the Number of Breakpoints and the Number of Adjacencies between
two Genomes with Duplicate Genes. J. Comp. Biology, 15(8), pp. 1093–1115,
2008
– R. Dondi, G. Fertin and S. Vialette - Complexity issues in Vertex-Colored Graph
Pattern Matching. J. Discrete Algorithms. To appear.
8