SURF tutorial
3 Pages
English

1 Prelude

This document presents the SURF steps from instance creation to batch feature detection. You may add the SURF bin directory to your PATH variable (in your .cshrc file):

    setenv PATH ${PATH}:/project/surf/bin

In this tutorial we assume that SURF is installed in the /project directory and that the studied instance is called 'MyInstance'. SURF can process chromatogram, FASTA, DBEST and EMBL batches. We use a chromatogram batch in this tutorial in order to go through all of SURF's functionality.

2 Instance creation

SURF provides an instance creation program called 'create_instance.pl', which:
- creates the database schema (and creates the database itself if you have the required privileges);
- installs a file system ('MyInstance' directory) under /project/surf/data/.

We will use bovine sequences, so the species parameter is set to 'bos taurus':

    create_instance.pl --name MyInstance --species 'bos taurus'

The species parameter is used by RepeatMasker to look up NCBI taxonomy data, so you may use the full taxon syntax (e.g. bos taurus, sus scrofa). Do not forget the quotes, which keep any blanks inside the parameter value.

The create_instance.pl program installs some FASTA banks into the instance file system (/project/surf/data/MyInstance/banks). These banks are symbolic links to a global SURF banks repository (/project/surf/data/banks), but some of them (mito.fasta and ribo.fasta) are species specific, so you may update these links to point to the dedicated files. We assume that mito_cattle.tfa and ribo_cattle.tfa are in the /project/fasta_banks directory:

    cd /project/surf/data/MyInstance/banks/
    rm mito.fasta
    rm ribo.fasta
    ln -s /project/fasta_banks/mito_cattle.tfa mito.fasta
    ln -s /project/fasta_banks/ribo_cattle.tfa ribo.fasta

3 Library management

The link between sequences and libraries is made while a sequence batch is loaded. For each sequence, the library name is deduced from the file name (chromatogram or FASTA batches) or from fields of the sequence record (DBEST and EMBL batches).
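As an illustration only, the Python sketch below extracts a library code from a chromatogram file name with a regular expression and keeps it only if a library with that code has already been registered. The naming scheme (library code, plate, well, e.g. BTA01_P001_A01.ab1), the pattern and the function names are all hypothetical. SURF keeps its real patterns in the instance's regexpmodel_*.conf files, whose syntax is not shown in this tutorial, and this code is not part of SURF.

```python
import re
from typing import Dict, Iterable, Optional, Set

# Hypothetical naming scheme: <library code>_<plate>_<well>.<extension>,
# e.g. "BTA01_P001_A01.ab1". This is NOT SURF's regexpmodel format.
FILENAME_PATTERN = re.compile(
    r"^(?P<library>[A-Za-z0-9]+)_(?P<plate>P\d+)_(?P<well>[A-H]\d{2})\.\w+$"
)


def library_from_filename(filename: str) -> Optional[str]:
    """Return the library code encoded in a file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    return match.group("library") if match else None


def link_to_libraries(filenames: Iterable[str], known_codes: Set[str]) -> Dict[str, Optional[str]]:
    """Map each file name to an already registered library code, else None.

    A code with no registered library cannot be linked, which is why
    libraries have to exist before the batch is loaded.
    """
    links: Dict[str, Optional[str]] = {}
    for name in filenames:
        code = library_from_filename(name)
        links[name] = code if code in known_codes else None
    return links


if __name__ == "__main__":
    files = ["BTA01_P001_A01.ab1", "BTA02_P003_H12.scf", "readme.txt"]
    print(link_to_libraries(files, known_codes={"BTA01"}))
```

Here BTA02_P003_H12.scf parses correctly but stays unlinked because no BTA02 library exists, while readme.txt does not match the pattern at all.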
The library name of each sequence is extracted using regular expressions stored in instance-specific files (/project/surf/data/MyInstance/regexpmodel_*.conf). In order to build the sequence/library link, you should create libraries BEFORE loading sequence batches.

Libraries provide information about how the sequence clones were constructed (cloning vector, adapter, etc.). SURF tries to detect cloned inserts by using 'construction' features. A feature is considered a 'construction' feature when:
- the feature name equals 'vector', or
- the feature name starts with 'adapter' (e.g. adapter5, adapter3), or
- the feature name ends with a ! (e.g. primer!), or
- the feature name equals PolyA or PolyT.

If a sequence is not linked to a library (EMBL sequences) or is linked to a library with no vector feature (DBEST), vector feature detection is done using the generic univec bank.

Library management is done through a web interface, accessible from the SURF general menu: choose your instance, click on Library Management and add a library using the library form.
The Code field contains the string used to build the link with sequences. The Minmatch and Minscore parameters default to 10 and 15 respectively. If a feature is shorter than 16 bp, you may need to lower them so that cross_match can still detect it. A simple rule for features shorter than 16 bp is: Minscore = length - 1 and Minmatch = Minscore - 2 (for a 12 bp adapter, Minscore = 11 and Minmatch = 9).
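The library rules of this section can be summarised in a small Python sketch: the four naming rules for 'construction' features, the univec fallback for vector detection, and the Minmatch/Minscore rule for short features. It is not SURF code. The function names are hypothetical, the name comparisons are assumed to be case-sensitive, returning "library" when a vector feature exists is an assumption, and the guard against very short features is an addition of this sketch.

```python
from typing import Collection, Optional, Tuple

# Defaults quoted in the tutorial, used for features of 16 bp or more.
DEFAULT_MINMATCH = 10
DEFAULT_MINSCORE = 15


def is_construction_feature(name: str) -> bool:
    """Apply the four naming rules for 'construction' features (case-sensitive here)."""
    return (
        name == "vector"
        or name.startswith("adapter")
        or name.endswith("!")
        or name in ("PolyA", "PolyT")
    )


def vector_bank(library_features: Optional[Collection[str]]) -> str:
    """Return which source vector detection would use for a sequence.

    None stands for 'no library' (e.g. EMBL sequences). A library without a
    'vector' feature (e.g. DBEST) also falls back to the generic univec bank.
    Returning "library" otherwise is an assumption of this sketch.
    """
    if library_features is None or "vector" not in library_features:
        return "univec"
    return "library"


def cross_match_params(feature_length: int) -> Tuple[int, int]:
    """Return (minmatch, minscore) for a feature of the given length in bp."""
    if feature_length >= 16:
        return DEFAULT_MINMATCH, DEFAULT_MINSCORE
    if feature_length < 4:
        # Guard added for this sketch: the rule would give minmatch <= 0.
        raise ValueError("feature too short for the Minscore/Minmatch rule")
    minscore = feature_length - 1
    return minscore - 2, minscore


if __name__ == "__main__":
    for feature, length in [("adapter5", 12), ("vector", 40), ("insert", 30)]:
        print(feature, is_construction_feature(feature), cross_match_params(length))
```

With this rule a 12 bp adapter gets (Minmatch, Minscore) = (9, 11), while a feature of 16 bp or more keeps the defaults (10, 15).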
4 Batch load

Batches are loaded using shell scripts specific to each batch type. To load a chromatogram batch, run the run_chromato.sh program (the others are run_dbest.sh, run_embl.sh and run_fasta.sh) and answer its questions. A chromatogram archive must be zipped as a SINGLE-LEVEL archive (no subdirectory inside the archive). EMBL, FASTA and DBEST archives must be gzipped.

    <topaze>~>run_chromato.sh
    input instance :MyInstance
    input batch name :MyBatch
    input zipfile path:/path/to/zip/file
    input sequence type [cDNA]:
    input library [autodetect]:
    input plate name [autodetect]:
    input strand [autodetect]:
    clone name autodetection ? [yes]:
    row and column values autodetection ? [yes]:
    regular expression model name ? [default]:
    Workflow step number to start (input 1 to avoid error autodetection) ? [autodetect]:
    Address to send email notification when finished ? [leave empty for no mail]:me@us.org

You can track load progress in the web interface by using the Workflow link (look for the MyBatch_insertion line).
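Because a badly packed archive is easy to produce, it can help to check it before running the load script. The Python sketch below, which is not part of SURF, tests the two stated requirements: a chromatogram zip must contain no directory entries and no entries inside subdirectories, and the other batch types must be gzip files (recognised by the gzip magic bytes). The batch-type labels and function names are hypothetical.

```python
import gzip
import os
import tempfile
import zipfile


def is_single_level_zip(path: str) -> bool:
    """True if no entry in the zip archive is a directory or sits inside one."""
    with zipfile.ZipFile(path) as archive:
        # Zip entry names use "/" as separator; directory entries end with "/".
        return all("/" not in name for name in archive.namelist())


def is_gzipped(path: str) -> bool:
    """True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def archive_ok(path: str, batch_type: str) -> bool:
    """Check the archive format expected for a batch type (labels are hypothetical)."""
    if batch_type == "chromatogram":
        return zipfile.is_zipfile(path) and is_single_level_zip(path)
    if batch_type in ("embl", "fasta", "dbest"):
        return is_gzipped(path)
    raise ValueError(f"unknown batch type: {batch_type}")


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    flat = os.path.join(workdir, "flat.zip")
    nested = os.path.join(workdir, "nested.zip")
    fasta_gz = os.path.join(workdir, "batch.fasta.gz")
    with zipfile.ZipFile(flat, "w") as archive:
        archive.writestr("A01.ab1", b"trace")
    with zipfile.ZipFile(nested, "w") as archive:
        archive.writestr("plate1/A01.ab1", b"trace")
    with gzip.open(fasta_gz, "wb") as handle:
        handle.write(b">seq1\nACGT\n")
    print(archive_ok(flat, "chromatogram"))    # True
    print(archive_ok(nested, "chromatogram"))  # False
    print(archive_ok(fasta_gz, "fasta"))       # True
```

Zipping the chromatogram files from inside their directory (rather than zipping the directory itself) is the usual way to get a single-level archive.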
5 Batch feature detection

Feature detection is launched for a whole batch as follows:

    <topaze>~>run_feature.sh
    input instance :MyInstance
    input batch id :1
    Workflow step number to start (input 1 to avoid error autodetection) ? [autodetect]:
    Workflow step number to stop at (included). Input 0 for no stop? [0]:
    Address to send email notification when finished ? [leave empty for no mail]:me@us.org

You can track feature detection progress in the web interface by using the Workflow link (look for the MyBatch_Feature line).
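The start/stop questions can be read as selecting an inclusive range of workflow steps, where a stop value of 0 means "run to the end". The Python sketch below only illustrates that reading of the prompts. The total number of steps is a hypothetical parameter (the tutorial does not list the workflow steps), and the [autodetect] default for the start step is not modelled.

```python
from typing import List


def steps_to_run(start: int, stop: int, total_steps: int) -> List[int]:
    """Step numbers executed for the given start/stop answers.

    stop is inclusive; stop == 0 means 'no stop', i.e. run to the last step.
    total_steps is hypothetical: the tutorial does not list the workflow steps.
    """
    if not 1 <= start <= total_steps:
        raise ValueError("start must be between 1 and total_steps")
    last = total_steps if stop == 0 else stop
    if not start <= last <= total_steps:
        raise ValueError("stop must be 0 or between start and total_steps")
    return list(range(start, last + 1))


if __name__ == "__main__":
    print(steps_to_run(1, 0, 5))  # whole workflow
    print(steps_to_run(2, 3, 5))  # steps 2 and 3 only
```

Answering 1 for the start step and 0 for the stop step therefore runs the whole workflow from the beginning.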