Tutorial R : Spatial Analysis

Tutorial R : Spatial Analysis

English
14 Pages
Read
Download
Downloading requires you to have access to the YouScribe library
Learn all about the services we offer

Description

Tutorial using the free softwarePerforming a spatial Principal ComponentAnalysis using the R softwareT. JOMBARTNovember 2006For any questions or comments, please send an email to:jombart@biomserv.univ-lyon1.fr.Contents1 Introduction 22 Building a connection network 43 Performing the analyses 53.1 Testing for the existence of spatial structures . . . . . . . . . . . . . 63.2 Identifying and displaying the spatial structures . . . . . . . . . . . 83.2.1 The Principal Component Analysis . . . . . . . . . . . . . . 83.2.2 The spatial Principal Component Analysis . . . . . . . . . . 101T. JOMBART, A sPCA tutorial using R1 IntroductionThistutorialshowshowtoperformaspatial Principal Component Analysis (sPCA,Jombart et al., submitted), starting from a data set of allelic frequencies of geo-referenced individuals or groups of individuals. The second illustration presentedinthepreviouslycitedarticleisreproducedbelow. Itinvolvesgroupsofindividuals(cat colonies) data, but the process is exactly the same using individuals data.Before starting, please download the files Nancy cats.rda and map.ppm athttp://pbil.univ-lyon1.fr/R/html/additifs.html. The first file is a dataset with R format (’rda’), and the second one is a picture of the sampled area.We assume that both files were saved in your R working directory (which can beknown and set using the commands getwd() and setwd()). Several R packagesare also used to perform the analyses. These can be installed using ...

Subjects

Informations

Published by
Reads 98
Language English
Report a problem

Tutorial using the free software
Performing a spatial Principal Component
Analysis using the R software
T. JOMBART
November 2006
For any questions or comments, please send an email to:
jombart@biomserv.univ-lyon1.fr.
Contents
1 Introduction 2
2 Building a connection network 4
3 Performing the analyses 5
3.1 Testing for the existence of spatial structures . . . . . . . . . . . . . 6
3.2 Identifying and displaying the spatial structures . . . . . . . . . . . 8
3.2.1 The Principal Component Analysis . . . . . . . . . . . . . . 8
3.2.2 The spatial Principal Component Analysis . . . . . . . . . . 10
1T. JOMBART, A sPCA tutorial using R
1 Introduction
Thistutorialshowshowtoperformaspatial Principal Component Analysis (sPCA,
Jombart et al., submitted), starting from a data set of allelic frequencies of geo-
referenced individuals or groups of individuals. The second illustration presented
inthepreviouslycitedarticleisreproducedbelow. Itinvolvesgroupsofindividuals
(cat colonies) data, but the process is exactly the same using individuals data.
Before starting, please download the files Nancy cats.rda and map.ppm at
http://pbil.univ-lyon1.fr/R/html/additifs.html. The first file is a data
set with R format (’rda’), and the second one is a picture of the sampled area.
We assume that both files were saved in your R working directory (which can be
known and set using the commands getwd() and setwd()). Several R packages
are also used to perform the analyses. These can be installed using the following
commands:
> install.packages("ade4", dep = TRUE)
> install.packages("spdep", dep = TRUE)
> install.packages("pixmap", dep = TRUE)
We first load the data object:
> load("Nancy_cats.rda")
> names(Nancy_cats)
[1] "xy" "tab" "sampsizes"
This list contains a matrix of geographic coordinates of the 17 stray cats colonies
(Nancy_cats$xy), a table of allelic frequencies for 9 microsatellite markers
(Nancy_cats$tab), and the number of cats sampled for each colonies
(Nancy_cats$sampsizes, not used in this tutorial). We rename these objects
respectively to xy and tab. We also load the map of the sampled area, stored as
a ppm file, using the pixmap library.
> xy = Nancy_cats$xy
> colnames(xy) <- c("x", "y")
> rownames(xy) <- 1:nrow(xy)
> head(xy)
x y
1 263.3498 171.10939
2 183.5028 122.40790
3 391.1050 254.70148
4 458.6121 41.72336
5 182.7769 219.08398
6 335.2121 344.83557
2T. JOMBART, A sPCA tutorial using R
> tab = Nancy_cats$tab
> tab[1:5, 1:6]
L1.01 L1.02 L1.03 L1.04 L1.05 L1.06
P01 0.00000000 0.00000000 0.1000000 0.3000000 0.10000000 0.05000000
P02 0.00000000 0.27272727 0.0000000 0.5454545 0.04545455 0.00000000
P03 0.04166667 0.16666667 0.1666667 0.2083333 0.00000000 0.08333333
P04 0.02173913 0.08695652 0.1304348 0.4130435 0.00000000 0.06521739
P05 0.00000000 0.00000000 0.0000000 0.5333333 0.00000000 0.10000000
> library(pixmap)
> map <- read.pnm("map.ppm")
> plot(map, main = "Sampled area and positions \nof the colonies")
At this step, we need to express the spatial relationships between the colonies
in order to include spatial information in further analyses. This is achieved by
building a connection network (also called ’neighbouring graph’), which defines
which colonies are neighbours or not.
3T. JOMBART, A sPCA tutorial using R
2 Building a connection network
As we have no information about the real connectivity between these stray cat
colonies, we use an algorithm to build their connection network. The R software
offers many tools to perform this task (see Table 1). A remaining difficulty is
that these tools are spread throughout different packages, the main of which are
presented below. Note that each package produces different output types (called
classes in R framework); however, this difficulty is easily circumvented by ’trans-
lation’ functions. For instance, having an object of the class titi, it will often be
possible to get an object of the class toto using a function titi2toto().
We choose Gabriel criterion (Gabriel and Sokal, 1969). The connection
network is built using the function gabrielneigh, whose argument is a matrix of
the geographic coordinates of the points.
> library(spdep)
Package SparseM (0.71) loaded. To cite, see citation("SparseM")
> gab.graph <- gabrielneigh(Nancy_cats$xy)
> class(gab.graph)
[1] "Graph" "Gabriel"
Note that we could add or remove connections of the Gabriel graph using the
function fix.nb of the spdep package. The presentation of usual algorithms other
than Gabriel criterion (Table 1) is beyond the scope of this document. For more
details, a starting point can be found in (Legendre and Legendre, 1998, p.
752-756), although R documentation may be sufficient.
Table 1: Usual algorithms used to build connection networks, and the corresponding func-
tions in the R software, precising their library, class and the associated conversion functions.
Name of the algorithm Function Library Class Useful conversion functions
Delaunay triangulation tri2nb tripack nb nb2neig
Gabriel graph gabrielneigh spdep graph graph2nb, nb2neig
Relative neighbours relativeneigh spdep graph graph2nb, nb2neig
K nearest neighbours knearneigh spdep knn knn2nb, nb2neig
Neighbourhood by distance dnearneigh spdep nb nb2neig
Minimum spanning tree mstree ade4 neig neig2nb
Among the many R classes related to the connection networks, we are partic-
ularly interested in the the following: nb, which can be viewed as the reference
4T. JOMBART, A sPCA tutorial using R
class for connection networks, listw, which is used in computations (Moran’s I
and sPCA), and neig which is used for plotting the results in ade4 graphical func-
tions. As said previously, many functions allow to convert one class into another
(Table 1).
In this example, we need graph2nb to convert a graph object into the a nb
object, required for further operations. We call the resulting object cn.gab.nb.
Note that the argument sym=TRUE specifies that spatial connections are symetrical
(i is connected to j implies that j is connected to i). We also proceed to other
conversions, latter involved in computations and graphical display.
> library(spdep)
> library(ade4)
> cn.gab.nb <- graph2nb(gab.graph, sym = TRUE)
> class(cn.gab.nb)
[1] "nb"
> cn.gab.listw <- nb2listw(cn.gab.nb)
> class(cn.gab.listw)
[1] "listw" "nb"
> cn.gab.neig <- nb2neig(cn.gab.nb)
> class(cn.gab.neig)
[1] "neig"
3 Performing the analyses
Once the connection network is defined, we can proceed to the analyses, which
can be detailed into two main steps. First, the existence of spatial structures is
assessed in the raw data, using a spatial autocorrelation test. If this test detects
the presence of spatial patterns, then these patterns are searched and displayed
using the sPCA.
5T. JOMBART, A sPCA tutorial using R
3.1 Testing for the existence of spatial structures
The procedure proposed by Smouse and Peakall (1999) is used to test the
existence of spatial autocorrelation. Basically, as this test is based on Moran’s
I, it would not be able to differenciate between the absence of structures, and a
mixture of global and local structures. As such a case is less likely to occur in a
single locus, we perform the test for each locus separately.
Smouse and Peakall (1999) procedure was generalized by Ollier (2004),
and is available in the function multispati.randtest (package: ade4). We have
to split the main allelic frequencies table in order to have a table for each locus.
Here are the numbers of alleles per marker:
> table(substr(colnames(tab), 1, 2))
L1 L2 L3 L4 L5 L6 L7 L8 L9
11 18 10 9 12 8 16 12 12
> blocs <- as.numeric(table(substr(colnames(tab), 1, 2)))
Then we use the ade4 function ktab.data.frame to create one table per marker.
> tab_loc <- ktab.data.frame(tab, blocs, tabnames = paste("loc",
+ 1:9, sep = "."))
> names(tab_loc)[1:9]
[1] "loc.1" "loc.2" "loc.3" "loc.4" "loc.5" "loc.6" "loc.7" "loc.8" "loc.9"
Now, we perform the spatial autocorrelation test for each of the nine tables. Note
that here, the function dudi.pca is only used by the procedure to get the inertia
of each table.
> SPtest <- lapply(1:9, function(i) multispati.randtest(dudi.pca(tab_loc[[i]],
+ center = T, scale = F, scannf = F), cn.gab.listw, nrep = 9999))
> names(SPtest) <- names(tab_loc)[1:9]
> SPtest
$loc.1
Monte-Carlo test
Observation: -0.1228694
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
Simulated p-value: 0.7608
$loc.2
Monte-Carlo test
Observation: 0.1051725
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
6T. JOMBART, A sPCA tutorial using R
Simulated p-value: 0.0421
$loc.3
Monte-Carlo test
Observation: 0.06423195
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
Simulated p-value: 0.0858
$loc.4
Monte-Carlo test
Observation: 0.01707664
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
Simulated p-value: 0.2046
$loc.5
Monte-Carlo test
Observation: -0.1462053
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
Simulated p-value: 0.8424
$loc.6
Monte-Carlo test
Observation: -0.2374583
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
Simulated p-value: 0.9757
$loc.7
Monte-Carlo test
Observation: -0.09831204
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
Simulated p-value: 0.6916
$loc.8
Monte-Carlo test
Observation: 0.008750039
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
Simulated p-value: 0.1854
$loc.9
Monte-Carlo test
Observation: -0.1030899
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
Simulated p-value: 0.6401
Thep-valuescorrespondtotheprobabilitythatateststatistic(roughlyinterpreted
like Moran’s I) as high as the observed one occurs by chance. Thus, the second
locus is significantly positively autocorrelated, whereas the locus 6 displays nega-
tive spatial autocorrelation. As a consequence, the raw data contain both global
7T. JOMBART, A sPCA tutorial using R
and local structures.
3.2 Identifying and displaying the spatial structures
3.2.1 The Principal Component Analysis
Nowwecanstartthemultivariateanalysesusingthe ade4library(Chessel et al.,
2004), which offers functions for most of the multidimensional analyses, genetic
datahandlingandgraphicalrepresentations. First, weperformacentredPrincipal
Component Analysis of the allelic frequencies table, using the function dudi.pca
(see ?dudi for the other ordination methods). Note that the term ’dudi’ refers
to a the duality diagram, which is a general framework for ordination methods
(Escoufier, 1987). We tell the function to perform a centred (center=T) but
not scaled PCA (scale=F), not to display the screeplot (scannf=F), and directly
choose to retain the three highest eigenvalues (nf=3).
> pca1 <- dudi.pca(tab, center = T, scale = F, scannf = F, nf = 3)
> pca1
Duality diagramm
class: pca dudi
$call: dudi.pca(df = tab, center = T, scale = F, scannf = F, nf = 3)
$nf: 3 axis-components saved
$rank: 16
eigen values: 0.1118 0.1064 0.0945 0.08141 0.07786 ...
vector length mode content
1 $cw 108 numeric column weights
2 $lw 17 numeric row weights
3 $eig 16 numeric eigen values
data.frame nrow ncol content
1 $tab 17 108 modified array
2 $li 17 3 row coordinates
3 $l1 17 3 row normed scores
4 $co 108 3 column coordinates
5 $c1 108 3 column normed scores
other elements: cent norm
We then display the PCA results using the s.value function, which allows to
represent a quantitative variable on a map using black (positive values) and white
(negative values) squares whose sizes correspond to the absolute values. The row
scores, notedφ inJombart et al. (submitted), are stored in the matrix pca1$li.
For instance,φ is the first column (pca1$li[,1]),φ is in (pca1$li[,2]), etc.1 2
Wegiveoptionalargumentstothefunctioninordertochooseabackgroundpicture
(pix=map), to add a connection network (neig=nb2neig(cn.gab.nb)), and not to
include x y axes (addaxes=F), origin (include.ori=F), grid (grid=F) or legend
(cleg=0). The two last arguments are used to choose the subtitles and their size.
8'
T. JOMBART, A sPCA tutorial using R
> par(mfrow = c(2, 2))
> invisible(lapply(1:3, function(i) s.value(xy, pca1$li[, i], pix = map,
+ neig = nb2neig(cn.gab.nb), include.ori = F, grid = F, addaxes = F,
+ cleg = 0, sub = paste("PCA scores", (1:3)[i]), csub = 1.3)))
Are these scores displaying spatial structures? It seems not, but their spatial
autocorrelation can be quantified and tested using Moran’s I. The library spdep
proposes a simple permutation test for this statistic: the arguments of moran.mc
are the tested variable, the connection network and the number of simulations.
> lapply(1:3, function(i) moran.mc(pca1$li[, i], cn.gab.listw,
+ nsim = 9999))
[[1]]
Monte-Carlo simulation of Moran s I
data: pca1$li[, i]
weights: cn.gab.listw
number of simulations + 1: 10000
9'
'
T. JOMBART, A sPCA tutorial using R
statistic = -0.2048, observed rank = 2453, p-value = 0.7547
alternative hypothesis: greater
[[2]]
Monte-Carlo simulation of Moran s I
data: pca1$li[, i]
weights: cn.gab.listw
number of simulations + 1: 10000
statistic = 0.0727, observed rank = 7653, p-value = 0.2347
alternative hypothesis: greater
[[3]]
Monte-Carlo simulation of Moran s I
data: pca1$li[, i]
weights: cn.gab.listw
number of simulations + 1: 10000
statistic = -0.054, observed rank = 5256, p-value = 0.4744
alternative hypothesis: greater
We conclude that none of the scores are significantly spatially structured. Hence,
the PCA was unable to display the spatial patterns existing in the data.
3.2.2 The spatial Principal Component Analysis
We can therefore proceed to the sPCA. The function performing this analysis is
called multispati, for ’multivariate spatial’ analysis. This function requires the
original analysis (here, pca1), and a list of spatial weights of the class listw,
containing the same information as a row-standardized connection matrix L. We
tell the function not to display the screeplot (scannf=F), and to keep 2 global
(nfposi=2) as well as 2 local (nfnega=2) structures. The output is an object of
the class multispati.
> sPca1 <- multispati(pca1, cn.gab.listw, scannf = F, nfposi = 2,
+ nfnega = 2)
> sPca1
Multispati object
class: multispati
$call: multispati(dudi = pca1, listw = cn.gab.listw, scannf = F, nfposi = 2,
nfnega = 2)
$nfposi: 2 axis-components saved
$nfnega: 2 axis-components saved
Positive eigenvalues: 0.05265 0.03444 0.02745 0.01776 0.01127 ...
Negative eigenvalues: -0.05201 -0.03989 -0.03434 -0.02702 -0.02374 ...
10