New SpikeOMatic Tutorial
Christophe Pouzat
Copyright C Pouzat 2006
Contents

1 Introduction
    1.1 Never Used SpikeOMatic before?
    1.2 About this New Release
    1.3 General Remarks on the Approach
    1.4 A Sketch of the Procedure
    1.5 Your Job
    1.6 Acknowledgments
    1.7 Required R Packages
    1.8 License
    1.9 Using this tutorial
    1.10 R version used
    1.11 Setting some variables
2 Starting Up
    2.1 Generating and Loading Code
    2.2 Loading Data and Creating rawData Object
        2.2.1 rawData Object
        2.2.2 Creating Our First rawData Object
    2.3 Getting Info and “Interacting” with rawData Object
    2.4 See What’s in a rawData Object
        2.4.1 Show Method for rawData Objects
        2.4.2 Plot Method for rawData Objects
        2.4.3 Getting an Overall View on rawData Objects
        2.4.4 summary Method for rawData Objects
        2.4.5 Exploiting the “Matrix” Features of rawData Objects
3 Detecting Spikes
    3.1 Filtering rawData Objects
    3.2 Detecting Spikes
        3.2.1 The markedPP Class
        3.2.2 Checking Detection
        3.2.3 Detecting both Peaks and Valleys
        3.2.4 Getting Info on markedPP Objects
        3.2.5 Exploiting the “Matrix” Features of markedPP Objects
4 Extracting Sweeps
    4.1 cutEvents Method
    4.2 Visualizing Sweeps
        4.2.1 MPlot
        4.2.2 show method
        4.2.3 plot method for markedPP objects
    4.3 Getting Basic Summary Stats of Sweeps
5 Getting a Clean Sweep Sub-Sample
    5.1 setEnvelopePara Function
    5.2 selectEvents Function
6 Feature Space Dimension Reduction and Sample Visualization
    6.1 reduceFeatureSpaceD
    6.2 Exporting Data for GGobi
    6.3 Exploring Data with GGobi
        6.3.1 Changing colors and glyphs
        6.3.2 Rotations
        6.3.3 2D Tour
7 Some Examples of GGobi Use
    7.1 Extra Outliers Detection
    7.2 Manual Clustering
        7.2.1 Exporting Sorting Results
8 From GGobi to R
    8.1 Function readClassification
    8.2 Figuring Out the Correspondence Between GGobi Labels and Levels
        8.2.1 likeGGobi function
9 Automatic and Semi-Automatic Clustering
    9.1 Kmeans Clustering
        9.1.1 Kmeans Clustering Visualization
    9.2 Bagged Clustering
    9.3 Gaussian Mixture Based Clustering
        9.3.1 Mclust function
        9.3.2 EMclustN function
    9.4 What To Do With All That?: Example of Adjustment
10 Classification
    10.1 A Simple Approach Coping with Superpositions
        10.1.1 Getting a Labeled Sample
        10.1.2 Ideal Waveform Construction with Superpositions
        10.1.3 Nearest Neighbor Classification
        10.1.4 Checking Classification Results
        10.1.5 Extracting a markedPP Object from Classification Results
        10.1.6 Extracting a SpikeTrain Object from Classification Results
        10.1.7 Updating a modelCenter Model with Results from Classification
A Using Parallel Features
    A.1 Required Packages
    A.2 Setting some variables
    A.3 Starting Up a Snow Cluster
    A.4 A Parallel selectEvents Function
    A.5 Parallel Clustering Functions
        A.5.1 Parallel kmeans clustering
        A.5.2 Parallel EMclust and EMclustN
    A.6 Parallel Classification Functions
        A.6.1 Parallel modelCenter
        A.6.2 Parallel predict
        A.6.3 Parallel update
    A.7 Before Quitting R

1 Introduction

What follows is work in progress. I’m working on this new SpikeOMatic version basically every day, so you can expect regular updates.

1.1 Never Used SpikeOMatic before?

If you are brand new to SpikeOMatic, you need to know that to use what follows you will have to install the R software. It is free, open source and of course really great. You can get it from the R project site^1. You will also need another great, free and open source piece of software: the GGobi Data Visualization System^2. These two programs can seem a bit hard to use at first sight. R does not follow the nowadays common “point and click” paradigm.
That means that a bit of patience and a careful reading of the tutorials are de rigueur. R documentation is plentiful and goes from the very basic to the most advanced stuff. The “contributed documentation” page^3 is a good place to start. Look in particular at “R for Beginners” by Emmanuel Paradis (French and Spanish versions are also available) and “An Introduction to R: Software for Statistical Modelling & Computing” by Petra Kuhnert and Bill Venables [12]. Another good place is Ross Ihaka’s course “Information Visualisation”^4; the course uses R to generate actual examples of data visualization and is a wonderful introduction to its subject. Ross Ihaka, together with Robert Gentleman, is moreover one of the original R developers. The “two-day short course in R”^5 by Thomas Lumley from the R Core Development Team is also great. There is also an R Wiki site^6 which is worth looking at.

Windows users can enjoy the SciViews R GUI^7 developed by Philippe Grosjean & Eric Lecoutre and are strongly encouraged to use the Tinn-R editor^8 to edit R code, etc. Information on how to configure Tinn-R and R can be found in [12]. On Linux I’m using the emacs editor together with ESS: Emacs Speaks Statistics^9.

1.2 About this New Release

The major change compared to the previous SpikeOMatic version^10 is a full integration with R (not finished yet) and an implementation based on S4 classes and methods developed by John Chambers [3]^11. Don’t get scared reading that if you have no clue about S4 classes and methods; it means, if I don’t completely screw up (!), that SpikeOMatic should be easier to use in an efficient way by anyone.

In addition SpikeOMatic now makes use of GGobi. This has two consequences. It should first allow users to develop an “intuitive understanding” of the analysis process. GGobi provides numerous ways to display data (which in our case will be clusters). These displays can be dynamic and are interactive.
Second, it is also possible to perform the clustering “by hand” with GGobi. I do not recommend it in general, but it could and should help users make the transition from the hand clustering they are used to, to a more automatic one.

^1 http://www.r-project.org
^2 http://www.ggobi.org/
^3 http://cran.r-project.org/other-docs.html
^4 http://www.stat.auckland.ac.nz/~ihaka/120/
^5 You can find it at the following address: http://faculty.washington.edu/tlumley/b514/R-fundamentals.pdf.
^6 http://www.sciviews.org/_rgui/wiki/doku.php
^7 http://www.sciviews.org/SciViews-R/index.html
^8 http://www.sciviews.org/Tinn-R/index.html
^9 http://ess.r-project.org/
^10 You can get it at: http://www.biomedicale.univ-paris5.fr/SpikeOMatic/.
^11 See also: http://www.omegahat.org/RSMethods/index.html.

1.3 General Remarks on the Approach

This new release of SpikeOMatic, as mentioned in the previous section, makes heavy use of functions and packages already implemented in R. It can almost be seen as an interface between the data we, as electrophysiologists, are used to seeing and working with and R built-in functionality. In this tutorial we won’t have space to discuss the mathematical / statistical bases of the functions we will actually be using. If you want to know more about these, I strongly recommend checking Brian Ripley’s “Pattern Recognition and Neural Networks” [19] as well as “The Elements of Statistical Learning” by Trevor Hastie, Robert Tibshirani and Jerome Friedman [11].

A warning can be useful though. These books will be hard to go through for neurophysiologists. This is so, I think, because neuroscience became a realm of sophists. More precisely, concepts in statistics have a precise meaning, proofs are proofs and authors are expected to state the limits of their claims.
Doing otherwise is considered a professional mistake and the field has designed safeguard mechanisms to avoid overstatements (mainly by publishing papers together with discussions of these papers, or by having talks presented by someone discussed immediately by somebody else to whom the content of the talk was communicated days in advance). This situation generates the strange feeling of discovering a new (intellectual) land for the neuroscientist used to the keywords in a vacuum of his modern days “high impact” journals.

1.4 A Sketch of the Procedure

To start with we will distinguish 2 phases:

1. Reference sample generation and/or model estimation.
2. Classification.

The typical experimental paradigm I have in mind here is one where we record data for a long time (tens of minutes to hours) in relatively stable conditions. By stable conditions I mean that the drift of the recording electrode(s) is not fast compared to the time required by the least active neuron of interest to generate, say, 100 to 200 spikes. I also mean of course that the neuronal population seen by the recording electrodes does not change radically on the slow time scale we just defined. Then we typically have to solve two problems in a row. We first want to find out:

- How many neurons are active in the data set?
- What is a good way to label or classify the spikes generated by the neurons?

Answering these two questions is what I mean by “reference sample generation and/or model estimation”. This will be the most time consuming part of the analysis on the user side (but not necessarily on the computer side). In the good cases, and when we work on brain regions we know well, it will even be possible to do it automatically. In general though, user input will be required at some point(s). The amount of data we are going to use during this first stage will depend on the tissue we work on. A rule of thumb is that it would be nice to have roughly 100-200 (or more) spikes for each neuron.
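The rule of thumb can be turned into a rough estimate of how much data the first stage needs. The sketch below is only an illustration: the firing rates are made-up values, not measurements from the tutorial's data, and the variable names are hypothetical.

```r
# Hypothetical mean firing rates (in Hz) of three neurons; illustrative only.
rates <- c(slow = 0.5, medium = 5, fast = 40)

# Number of spikes wanted per neuron (upper end of the rule of thumb).
nSpikes <- 200

# Expected recording time (in s) each neuron needs to emit nSpikes spikes.
timeNeeded <- nSpikes / rates
timeNeeded

# The least active neuron sets the duration of the reference period.
refDuration <- max(timeNeeded)
refDuration
```

With these assumed rates the least active neuron needs 400 s to produce 200 spikes, so it alone fixes the amount of data to use, and the electrode drift should be slow on that time scale.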
Once this first stage is accomplished we can go on and use our reference sample / model to classify the spikes which were subsequently recorded during the experiment. In order to do that we will chop the data set into chunks (again with, say, 100-200 or more spikes per neuron) and classify the spikes of these chunks sequentially, allowing ourselves to adapt our model to correct for electrode drift (that will be done automatically).

1.5 Your Job

If you have time I would greatly appreciate it if you guys try this stuff out and let me know what you think^13. If you think something is not clearly explained, let me know. If you find a bug, instead of simply sending me an e-mail stating “it does not work!”, be cool and take time to write a short R script allowing you, and therefore me, to reproduce the problem. Then send me the stuff (with the associated data if necessary).

1.6 Acknowledgments

This work started a while ago when I was a post-doc in the lab of Gilles Laurent at Caltech^14. The first version of SpikeOMatic was developed in collaboration with Ofer Mazor^15 [17] and at that time was running on Igor^16. When I came back to Paris after taking a CNRS position I kept working on the problem with a PhD student, Matthieu Delescluse^17, and with Pascal Viot^18 from the “Laboratoire de Physique Théorique des Liquides et de la Matière Condensée” and Jean Diebolt from the “Laboratoire d’Analyse et de Mathématiques Appliquées”^19. This work resulted in the first publicly available release of SpikeOMatic in 2002. This one was running on Scilab^20. Last year we (Matthieu and I) made a switch to R and included in SpikeOMatic the MCMC approach we developed [16,18,6]. Two grants supported this work during this period: a pre-projet of the ACI (“Action Concertée Incitative”) “Neurosciences Computationnelles et Intégratives” (2001) and an inter EPST BioInformatique grant in 2003.

We are now supported by a Decrypthon Project^21 grant.
This project is a joint effort between the “Association Française contre les Myopathies”^22, the CNRS^23 and IBM^24.

Clearly data analysis software cannot be developed without (good) data to start with. Although I got my own data, together with Ofer, a few years ago, I am now much dependent on students and colleagues for that: Nicole Lindemann and Peter Kloppenburg from the Kloppenburg Lab in Cologne^25, Antoine Chaffiol and Hang Ung from my lab, Cyril Dejean and Thomas Boraud from the “Physiologie et Physiopathologie de la Signalisation Cellulaire” lab in Bordeaux^26, Clément Léna and Camille de Solages from the “Laboratoire de Neurobiologie Moléculaire et Cellulaire” in Paris^27, Jean-Yves Hogrel from the “Institut de Myologie”^28 at the Pitié-Salpêtrière hospital in Paris.

^13 My e-mail address is: christophe.pouzat@univ-paris5.fr
^14 http://marvin.caltech.edu/
^15 omazor@fas.harvard.edu
^16 http://www.wavemetrics.com/
^17 matthieudelescluse@hotmail.com
^18 http://www.lptl.jussieu.fr/users/viot/
^19 http://umr-math.univ-mlv.fr/
^20 http://www.scilab.org/
^21 http://www.decrypthon.fr
^22 http://www.afm-france.org
^23 http://www.cnrs.fr/
^24 http://www.ibm.com/fr/
^25 http://www.kloppenburg-lab.uni-koeln.de/content/index_eng.html
^26 http://www.umrnp.u-bordeaux2.fr/
^27 http://www.biologie.ens.fr/neuro/

Developing software is a tough job because users won’t say anything when it works but will pretty quickly send you harsh comments if it does not (or even if they have not taken the time to RTFM^29 and therefore do not know how to make it work). They probably forget that the same applies (or should apply) to the experimental results they publish. Have you ever tried to take a paper and set out to reproduce the results it reported? I must therefore warmly thank, among the SpikeOMatic users who are not my direct collaborators, Robert Steward from the University of Sheffield, for thanking me despite all the shortcomings of SpikeOMatic.
Thanks also to Arthur Leblois, from the “Laboratory of Neurophysics and Physiology”, and (again) to Antoine Chaffiol and Nicole Lindemann for testing this new version.

Finally I want to abuse my right to write what I want, because this is not to be published in a highbrow journal, to dedicate this work to the free software developers, in particular to the R core development team and to:

    The quiet statisticians [who] have changed our world – not by discovering new facts or technical developments but by changing the ways that we reason, experiment and form our opinions ...^30

1.7 Required R Packages

In order to run SpikeOMatic you will have to install the following packages from CRAN:

- ppc
- XML
- mclust
- e1071

1.8 License

SpikeOMatic is released under the GNU General Public License^31 except for functions:

- clusterEMclust
- clusterEMclustN

whose code is modified from EMclust and EMclustN of package mclust. The distribution and modification license of mclust^32 does therefore apply to clusterEMclust and clusterEMclustN. I thank Chris Fraley for authorizing me to distribute these functions.

^28 http://www.institut-myologie.org/
^29 Read The Fucking Manual
^30 Ian Hacking
^31 http://www.gnu.org/licenses/licenses.html
^32 http://www.stat.washington.edu/fraley/mclust/license.txt

1.9 Using this tutorial

In this tutorial, commands typed at the R command line appear like:

    > sqrt(9)

And R answers appear like:

    [1] 3

Unfinished commands are preceded by a + sign like the third command in Sec. 2.1. Some functions used in this tutorial call a (pseudo-)random number generator (RNG). If you want to get the exact same results and figures, you will have to initialize your RNG as shown in Sec. 1.11. That also means you will have to type in the same commands in the same sequence...
That’s kind of painful to do, so you can start by downloading the vignette of the tutorial, newSOMtutorial.Rnw^33, and then extract the actual commands by typing:

    > Stangle("newSOMtutorial.Rnw")

That will generate a newSOMtutorial.R file (in the folder from which you are running R) that you can edit. Using Copy and Paste should then save you a fair amount of time.

^33 http://www.biomedicale.univ-paris5.fr/physcerv/C_Pouzat/Code_folder/newSOMtutorial.Rnw

1.10 R version used

This tutorial was generated using the following R version:

    > R.version.string
    [1] "R version 2.2.1, 2005-12-20"

1.11 Setting some variables

In order to get the same sequences of (pseudo-)random numbers every time this tutorial is run, we set variables associated with the Random Number Generator (RNG):

    > set.seed(123, kind = "Mersenne-Twister")

2 Starting Up

2.1 Generating and Loading Code

We start by generating the .R source file from the vignette (SpikeOMatic.Rnw). This is done by calling the Stangle function after downloading the latest SpikeOMatic version with function download.file:

    > urlRoot <- "http://www.biomedicale.univ-paris5.fr/physcerv/C_Pouzat/"
    > sourceURL <- paste(urlRoot, "Code_folder/SpikeOMatic.Rnw", sep = "")
    > download.file(url = sourceURL, destfile = "SpikeOMatic.Rnw",
    +     quiet = TRUE)
    > Stangle("SpikeOMatic.Rnw")
    Writing to file SpikeOMatic.R

We then load the source file into the R workspace:

    > source("SpikeOMatic.R")

In addition the reference manual can be generated in the classical way with function Sweave:

    > Sweave("SpikeOMatic.Rnw")

That will generate a SpikeOMatic.tex file which can be processed with pdflatex (at the shell) to give the pdf version of the reference manual.

2.2 Loading Data and Creating rawData Object

2.2.1 rawData Object

Here is the first major “innovation” of the new SpikeOMatic. We will keep our raw data together with some information associated with them in a single object of class rawData.
By raw data I mean here the regularly sampled amplitudes of the extracellular potential on one or several recording sites^34. One way to look at the rawData class for the R expert is as a time series object with some slots specific to physiological experiments. So let’s take a quick look at what rawData objects are made of:

@Data: A data matrix (possibly with a single dimension). Each row corresponds to a recording site. The rows should be named (e.g., “site 1”, “site 3”, etc).
@start: A numeric containing the time in seconds at which data collection started. The time is measured with respect to the first data collection in the analyzed experiment.
@frequency: A numeric. The sampling rate in Hz at which data were collected.
@epochName: A character string used to label the recording epoch.

Here the actual data are contained in the Data matrix. The other slots allow plots and other functions to be performed in terms of seconds. That sounds stupid but it spares the user the mental gymnastics of always converting back and forth between sample points and actual time. As we will see below, rawData objects can, on many occasions, be manipulated as “normal” matrices.

2.2.2 Creating Our First rawData Object

We are going to use in this tutorial some Purkinje cell recordings performed by Matthieu Delescluse during his PhD. The data were recorded with a “linear Michigan probe” on a cerebellar slice from a young rat in the presence of 40 µM DHPG^35. These data are analyzed in [6]. They are made of four files: PK_1.fl.gz, PK_2.fl.gz, PK_3.fl.gz and PK_4.fl.gz. They contain 56.7 s of recording from the 4 sites. The sampling rate was 15000 Hz and each data point is stored in floating point representation with 32 bits (or 4 Bytes). The files are moreover compressed with gz.
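To make the slot description of Sec. 2.2.1 and the figures just quoted more concrete, here is a minimal sketch of an S4 class carrying the same four slots. It is a hypothetical stand-in: the class name rawDataSketch, the toy object and all variable names are assumptions made for this illustration, and the actual rawData definition in SpikeOMatic may differ.

```r
library(methods)  # S4 classes and methods

# Hypothetical stand-in for the rawData class, NOT the SpikeOMatic definition.
setClass("rawDataSketch",
         representation(Data = "matrix",
                        start = "numeric",
                        frequency = "numeric",
                        epochName = "character"))

# Toy object: 4 recording sites (rows), 10 sample points of made-up amplitudes.
toy <- new("rawDataSketch",
           Data = matrix(0, nrow = 4, ncol = 10,
                         dimnames = list(paste("site", 1:4), NULL)),
           start = 0,
           frequency = 15000,
           epochName = "toy epoch")

# Time (in s) of each sample point: start + (index - 1) / frequency.
sampleTimes <- toy@start + (seq_len(ncol(toy@Data)) - 1) / toy@frequency

# Sample points and uncompressed size per site for a 56.7 s recording
# sampled at 15000 Hz and stored with 4 Bytes per point.
nPoints <- 56.7 * 15000
bytesPerSite <- nPoints * 4
```

Keeping start and frequency next to the data is what lets the sample-index-to-seconds conversion happen in one place. For the recording described above the arithmetic gives 850500 sample points per site, that is about 3.4 MB per uncompressed file.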
^34 In the context of the present tutorial we will always think of our raw data as extracellular voltage, but the rawData class would work equally well with intracellular voltage or current traces or with calcium fluorescence measurements.
^35 For details check: http://www.biomedicale.univ-paris5.fr/physcerv/C_Pouzat/Data.html.

Downloading Files

We are first going to download the data to our working directory^36. To do that we will use function download.file (to know more about it, look at the help file of the function: ?download.file or help(download.file)). We have to give as arguments the URL of the file to download and the name of the file where the downloaded data will be stored. We will create a dataURL variable for the former and a dataNames variable for the latter. dataURL is built in 2 stages, first by extending our former urlRoot variable and then by combining it with our dataNames character vector, resulting in another character vector^37:

    > dataNames <- c("PK_1.fl.gz", "PK_2.fl.gz", "PK_3.fl.gz", "PK_4.fl.gz")
    > dataURL <- paste(urlRoot, "Data_folder/", sep = "")
    > dataURL <- paste(dataURL, dataNames, sep = "")
    > dataURL
    [1] "http://www.biomedicale.univ-paris5.fr/physcerv/C_Pouzat/Data_folder/PK_1.fl.gz"
    [2] "http://www.biomedicale.univ-paris5.fr/physcerv/C_Pouzat/Data_folder/PK_2.fl.gz"
    [3] "http://www.biomedicale.univ-paris5.fr/physcerv/C_Pouzat/Data_folder/PK_3.fl.gz"
    [4] "http://www.biomedicale.univ-paris5.fr/physcerv/C_Pouzat/Data_folder/PK_4.fl.gz"

To download the 4 files we are going to use an elegant R construct, sapply, which allows one to repeat an operation on the different elements of a list or vector. In the present case we indeed have to repeat the downloading four times, once for each file. Let’s do it:

    > sapply(1:4, function(i) {
    +     download.file(url = dataURL[i], destfile = dataNames[i],
    +         quiet = TRUE, mode = "wb")
    + })
    [1] 0 0 0 0

Each 0 is the value returned by download.file upon success. We have also defined our first (anonymous) function here with the construct: function(i) { }.
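The two constructs used above, the vectorized paste and sapply with an anonymous function, can be tried without any network access. The following is a small sketch under stated assumptions: the root URL and the file names are made up for the example and have nothing to do with the tutorial's data.

```r
# paste() is vectorized: the length-1 prefix is recycled over the 3 file names.
# The URL and file names are hypothetical.
root <- "http://example.org/data/"
fileNames <- c("a.dat", "b.dat", "c.dat")
urls <- paste(root, fileNames, sep = "")
urls

# sapply() calls the anonymous function once per index and
# collects the returned values in a vector.
nChars <- sapply(seq_along(urls), function(i) {
    nchar(urls[i])
})
nChars
```

The result of sapply has one element per call, which is why the download loop above prints four values.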
^36 To get your working directory, use function getwd().
^37 Check the documentation of function paste to be sure you understand what’s going on.

We can check that the files have been downloaded with function list.files which, as its name says, lists files in the current working directory:

    > list.files(pattern = ".fl.gz")
    [1] "PK_1.fl.gz" "PK_2.fl.gz" "PK_3.fl.gz" "PK_4.fl.gz"

In order to get a short list we have here specified a pattern which is matched by the file names returned by list.files.

Loading Files into R Workspace

We now want to load into the R workspace compressed floating point files whose elements are coded on 4 Bytes. To this end, we will successively use three R functions: gzfile, readBin and close. gzfile opens compressed files for reading and/or writing. readBin reads binary files. close closes open files. Here again we have to repeat the same job four times to load our four files. That will be the occasion to exploit the ease with which R can be programmed, that is, the ease with which functions can be created. We are now going to create a function which takes one argument, a file name, and which opens the corresponding file, reads it and closes it. All that assuming the file is compressed with gz, in floating point representation with 4 Bytes per element:
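The behaviour of gzfile, readBin and close can be checked on a small throwaway file, independently of the PK files. The sketch below is a hedged illustration and not the tutorial's own function: the helper name readGzFloat, its n argument and the test values are assumptions made for the example.

```r
# Hypothetical helper (not the tutorial's function): read n 4-Byte floats
# from a gz-compressed binary file.
readGzFloat <- function(fileName, n) {
    con <- gzfile(fileName, open = "rb")  # open the compressed file for reading
    on.exit(close(con))                   # close it even if readBin fails
    readBin(con, what = "numeric", n = n, size = 4)
}

# Write a small throwaway file to exercise the helper.
x <- c(0.5, -1.25, 3)                     # values exactly representable on 32 bits
tmp <- tempfile(fileext = ".fl.gz")
con <- gzfile(tmp, open = "wb")
writeBin(x, con, size = 4)                # 4 Bytes per element, gz compressed
close(con)

# Read the values back and clean up.
y <- readGzFloat(tmp, n = length(x))
unlink(tmp)
y
```

The size = 4 argument is what tells readBin that each element occupies 4 Bytes; without it numeric values are read as 8-Byte doubles. The test values were chosen so that the round trip through single precision is exact.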