CONTRIBUTED RESEARCH ARTICLES



Online Reproducible Research: An Application to Multivariate Analysis of Bacterial DNA Fingerprint Data
by Jean Thioulouse, Claire Valiente-Moro and Lionel Zenner
The R Journal Vol. 2/1, June 2010
ISSN 2073-4859

Abstract
This paper presents an example of online reproducible multivariate data analysis. This example is based on a web page providing an online computing facility on a server. HTML forms contain editable R code snippets that can be executed in any web browser thanks to the Rweb software. The example is based on the multivariate analysis of DNA fingerprints of the internal bacterial flora of the poultry red mite Dermanyssus gallinae. Several multivariate data analysis methods from the ade4 package are used to compare the fingerprints of mite pools coming from various poultry farms. All the computations and graphical displays can be redone interactively and further explored online, using only a web browser. Statistical methods are detailed in the duality diagram framework, and a discussion about online reproducibility is initiated.

Introduction
Reproducible research has gained much attention recently (see particularly http://reproducibleresearch.net/ and the references therein). In the area of Statistics, the availability of Sweave (Leisch (2002), http://www.stat.uni-muenchen.de/~leisch/Sweave/) has proved extremely useful and Sweave documents are now widely used. Sweave offers the possibility to have text, R code, and outputs of this code in the same document. This makes reproducibility of a scientific paper written in Sweave very straightforward.
However, using Sweave documents implies a good knowledge of R, of Sweave and of LaTeX. It also requires having R installed on one's computer. The installed version of R must be compatible with the R code in the Sweave document, and all the needed packages must also be installed. This may be a problem for some scientists, for example for many biologists who are not familiar with R and LaTeX, or who do not use them on a regular basis.
In this paper, we demonstrate an online reproducibility system that helps circumvent these problems. It is based on a web page that can be used with any web browser. It does not require R to be installed locally, nor does it need a thorough knowledge of R and LaTeX. Nevertheless it allows the user to present and comment R code snippets, to redo all the computations and to draw all the graphical displays of the original paper.
The example presented here relates to multivariate data analysis. Reproducibility of multivariate data analyses is particularly important because there is a large number of different methods, with poorly defined names, making it often difficult to know what has been done exactly. Many methods have several different names, according to the community where they were invented (or re-invented), and where they are used. Translation of technical terms into languages other than English is also difficult, as many are ambiguous and have "false friends". Moreover, data analysis methods make intensive use of graphical displays, with a large number of graphical parameters which can be changed to produce a wide variety of displays.
This situation is similar to what was described by Buckheit and Donoho (1995) in the area of wavelet research. Their solution was to publish the Matlab code used to produce the figures in their research articles. Rather than do this, we decided to set up a simple computer environment using R to offer online reproducibility. In 2002, we installed an updated version of the Rweb system (Banfield, 1999) on our department server (see http://pbil.univ-lyon1.fr/Rweb/Rweb.general.html), and we implemented several computational web services in the field of comparative genomics (Perrière et al. (2003), see for example http://pbil.univ-lyon1.fr/mva/coa.php).
This server now combines the computational power of R with simple HTML forms (as suggested by de Leeuw (2001) for Xlisp-Stat) and the ability to search online molecular databases with the seqinr package (Charif and Lobry, 2007). It is also used by several researchers to provide an online reproducibility service for scientific papers (see for example http://pbil.univ-lyon1.fr/members/lobry/).
The present paper provides an example of an application of this server to the analysis of DNA fingerprints by multivariate analysis methods, using the ade4 package (Chessel et al., 2004; Dray and Dufour, 2007). We have shown recently (Valiente-Moro et al., 2009) that multivariate analysis techniques can be used to analyse bacterial DNA fingerprints and that they make it possible to draw useful conclusions about the composition of the bacterial communities from which they originate. We also demonstrate the effectiveness of principal component analysis, between-group analysis and within-group analysis [PCA, BGA and WGA, Benzécri (1983); Dolédec and Chessel (1987)] to show differences in diversity between bacterial communities of various origins.
In summary, we show here that it is easy to set up a software environment offering full online reproducibility of computations and graphical displays of multivariate data analysis methods, even for users who are not familiar with R, Sweave or LaTeX.
Data and methods
In this section, we first describe the biological material used in the example data set. Then we present the statistical methods (PCA, BGA and WGA), in the framework of the duality diagram (Escoufier, 1987; Holmes, 2006). We also detail the software environment that was used.
Biological material
The poultry red mite, Dermanyssus gallinae, is a haematophagous mite frequently present in breeding facilities and especially in laying hen facilities (Chauve, 1998). This arthropod can be responsible for anemia, dermatitis, weight loss and a decrease in egg production (Kirkwood, 1967). It has also been involved in the transmission of many pathogenic agents responsible for serious diseases in both animals and humans (Valiente-Moro et al., 2005, 2007). The poultry red mite is therefore an emerging problem that must be studied to maintain good conditions in commercial egg production facilities. Nothing is known about its associated non-pathogenic bacterial community and how the diversity of the microflora within mites may influence the transmission of pathogens.
Most studies on insect microflora are based on isolation and culture of the constituent microorganisms. However, comparison of culture-based and molecular methods reveals that only 20–50% of gut microbes can be detected by cultivation (Suau et al., 1999). Molecular methods have been developed to analyse the bacterial community in complex environments. Among these methods, Denaturing Gradient Gel Electrophoresis and Temporal Temperature Gradient Gel Electrophoresis (DGGE and TTGE) (Muyzer, 1998) have already been used successfully.
Full details of the example data set used in this paper (mite sampling, DNA extraction, PCR amplification of 16S rDNA fragments and acquisition of the TTGE banding patterns) are given in Valiente-Moro et al. (2009). Briefly, 13 poultry farms were selected in the Bretagne region in France, and in each farm 15 single mites, five pools of 10 mites and one pool of 50 mites were collected. The results for single mites and mite pools were analysed separately, but only the results for mite pools are presented in this study, as they gave the most illustrative analysis. Banding patterns can be analysed as quantitative variables (intensity of bands) or as binary indicators (presence/absence of bands), but for the same reason only the presence/absence data were used in this paper.
TTGE banding patterns were collected in a data table, with bands in columns (55 columns) and mite pools in rows (73 rows). This table was first subjected to a principal component analysis (PCA) to get an overall idea of the structure of the banding patterns. Between-group analysis (BGA) was then applied to study the differences between poultry farms, and finally within-group analysis (WGA) was used to eliminate the farm effect and obtain the main characteristics of the common bacterial flora associated with D. gallinae in standard breeding conditions. A good knowledge of these characteristics would allow comparisons of the standard bacterial flora among various situations, such as geographic regions, type and location of breeding facilities (particularly organic farms), or developmental stages of D. gallinae.

Duality diagram of principal component analysis
Let $X = [x_{ij}]_{(n,p)}$ be the TTGE data table with $n$ rows (individuals = mite pools) and $p$ columns (variables = TTGE bands). Variables have mean $\bar{x}_j = \frac{1}{n}\sum_i x_{ij}$ and variance $\sigma_j^2 = \frac{1}{n}\sum_i (x_{ij} - \bar{x}_j)^2$. Individuals belong to $g$ groups (or classes), namely $G_1, \dots, G_g$, with group counts $n_1, \dots, n_g$, and $\sum_k n_k = n$.
Using duality diagram theory and triplet notation, the PCA of $X$ is the analysis of a triplet $(X_0, D_p, D_n)$. $X_0$ is the table of standardized values:
$$X_0 = [\tilde{x}_{ij}]_{(n,p)} \quad \text{with} \quad \tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j},$$
and $D_n$ and $D_p$ are the diagonal matrices of row and column weights: $D_p = I_p$ and $D_n = \frac{1}{n} I_n$.
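This information is summarized in the following mnemonic diagram, called the "duality diagram" because $\mathbb{R}^{p*}$ is the dual of $\mathbb{R}^p$ and $\mathbb{R}^{n*}$ is the dual of $\mathbb{R}^n$:
$$
\begin{array}{ccc}
\mathbb{R}^{p*} & \xrightarrow{\;D_p\;} & \mathbb{R}^{p} \\
X_0^{T} \Big\uparrow & & \Big\downarrow X_0 \\
\mathbb{R}^{n} & \xleftarrow{\;D_n\;} & \mathbb{R}^{n*}
\end{array}
$$
$X_0^T$ is the transpose of $X_0$. The analysis of this triplet leads to the diagonalization of matrix:
$$X_0^T D_n X_0 D_p$$
i.e., the matrix obtained by proceeding counter-clockwise around the diagram, starting from $\mathbb{R}^p$. Principal components (variable loadings and row scores) are computed using the eigenvalues and eigenvectors of this matrix.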
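As an illustration of the triplet $(X_0, D_p, D_n)$, the following base-R sketch builds $X_0$, $D_p$ and $D_n$ explicitly and diagonalizes $X_0^T D_n X_0 D_p$ with eigen(). It is not the code of the reproducibility page, which relies on ade4's dudi.pca(): the 73 × 55 presence/absence table is simulated, and the object names are our own. To our understanding, the unit-norm axes and the row scores computed here correspond to the $c1 and $li components of an ade4 dudi object, up to the sign of each axis.

```r
# Sketch of PCA as the duality-diagram triplet (X0, Dp, Dn), base R only.
# Hypothetical data: 73 "mite pools" x 55 binary "TTGE bands" drawn at random.
set.seed(1)
n <- 73; p <- 55
X <- matrix(rbinom(n * p, size = 1, prob = 0.4), nrow = n, ncol = p)

# Standardize with the 1/n variance of the text (sd() would divide by n - 1)
xbar  <- colMeans(X)
sigma <- sqrt(colMeans(sweep(X, 2, xbar)^2))
X0    <- sweep(sweep(X, 2, xbar), 2, sigma, "/")

Dp <- diag(p)          # column weights: I_p
Dn <- diag(1 / n, n)   # row weights: (1/n) I_n

# Matrix to diagonalize: X0^T Dn X0 Dp (symmetric here because Dp = I_p)
M   <- t(X0) %*% Dn %*% X0 %*% Dp
eig <- eigen(M, symmetric = TRUE)

nf   <- 2
axes <- eig$vectors[, 1:nf]          # unit-norm principal axes (one weight per band)
li   <- X0 %*% Dp %*% axes           # row scores of the pools on these axes
round(100 * eig$values[1:nf] / sum(eig$values), 1)   # % of inertia on axes 1 and 2
```

Because $D_p = I_p$ and $D_n = I_n/n$, the matrix diagonalized here is simply the correlation matrix of the bands, and its eigenvalues sum to $p$. Other diagonal weights would change the analysis without changing the code, which is the point of the triplet notation.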
Between-group analysis
The between-group analysis (BGA) is the analysis of triplet $(X_B, D_p, D_{n_k})$, where $X_B$ is the $(g, p)$ matrix of group means:
$$X_B = [\bar{x}_j^k]_{(g,p)}.$$
The term $\bar{x}_j^k = \frac{1}{n_k}\sum_{i \in G_k} \tilde{x}_{ij}$ is the mean of variable $j$ in group $k$. In matrix notation, if $B$ is the matrix of class indicators: $B = [b_{ik}]_{(n,g)}$, with $b_{ik} = 1$ if $i \in G_k$ and $b_{ik} = 0$ if $i \notin G_k$, then we have:
$$X_B = D_{n_k} B^T X_0.$$
Matrix $D_{n_k} = \mathrm{Diag}\left(\frac{1}{n_k}\right)$ is the diagonal matrix of (possibly nonuniform) group weights, and $B^T$ is the transpose of $B$.
BGA is therefore the analysis of the table of group means, leading to the diagonalization of matrix $X_B^T D_{n_k} X_B D_p$. Its aim is to reveal any evidence of differences between groups. The statistical significance of these differences can be tested with a permutation test. Row scores of the initial data table can be computed by projecting the rows of the standardized table $X_0$ onto the principal components subspaces.
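A base-R sketch of the BGA triplet follows, on simulated data (13 hypothetical farms with farm-specific band frequencies). It forms $B$, computes $X_B = D_{n_k} B^T X_0$ as in the text, and then diagonalizes $X_B^T W X_B$. One assumption to flag: for the weights in that diagonalization the sketch uses $W = \mathrm{Diag}(n_k/n)$, which, as far as we know, is the group-weight convention of ade4's between-group analysis and makes the eigenvalues add up to the between-group inertia; $\mathrm{Diag}(1/n_k)$ is used only to form the means.

```r
# Sketch of between-group analysis (BGA) in base R, on simulated data.
set.seed(2)
n <- 73; p <- 55; g <- 13
farm <- factor(rep(seq_len(g), length.out = n))    # hypothetical farm labels
prob <- matrix(runif(g * p, 0.1, 0.7), g, p)       # farm-specific band frequencies
X    <- matrix(rbinom(n * p, 1, prob[as.integer(farm), ]), n, p)

# Standardized table X0 (1/n variance, as in the PCA triplet)
xbar  <- colMeans(X)
sigma <- sqrt(colMeans(sweep(X, 2, xbar)^2))
X0    <- sweep(sweep(X, 2, xbar), 2, sigma, "/")

B  <- model.matrix(~ farm - 1)        # n x g class indicators b_ik
nk <- colSums(B)                      # group counts n_k
XB <- diag(1 / nk) %*% t(B) %*% X0    # X_B = Diag(1/n_k) B^T X0, the g x p group means

w   <- nk / n                         # assumed group weights n_k / n
eig <- eigen(t(XB) %*% diag(w) %*% XB, symmetric = TRUE)

between <- sum(eig$values)            # between-group inertia
total   <- sum(X0^2) / n              # total inertia (p for a standardized table)
ratio   <- between / total            # statistic used by the BGA permutation test

# Row scores of the pools: projection of the rows of X0 on the first two BGA axes
pool_scores <- X0 %*% eig$vectors[, 1:2]
ratio
```

Since the weighted group means sum to zero, at most $g - 1$ eigenvalues are non-zero, so BGA can be used even when there are far more bands than farms.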
Within-group analysis
The within-group analysis (WGA) is the analysis of triplet $(X_W, D_p, D_n)$, where $X_W$ is the $(n, p)$ matrix of the differences between the standardized values and the group means:
$$X_W = [\tilde{x}_{ij} - \bar{x}_{ij}^k]_{(n,p)} \quad \text{with} \quad \bar{x}_{ij}^k = \bar{x}_j^k, \; \forall i \in G_k.$$
In matrix notation $X_W = X_0 - \tilde{X}_B$, where $\tilde{X}_B$ is the matrix of group means, repeated inside each group.
WGA is therefore the analysis of the matrix of the residuals obtained by eliminating the between-group effect. It leads to the diagonalization of matrix $X_W^T D_n X_W D_p$. It is useful when looking for the main features of a data table after removing an unwanted characteristic.
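The within-group triplet can be sketched in base R as well. The farms, band frequencies and table below are simulated (hypothetical). ave() replaces each standardized value by the mean of its group, which gives $\tilde{X}_B$, and subtracting it from $X_0$ gives $X_W$; the eigen-decomposition then uses $D_n = I_n/n$ and $D_p = I_p$. The last line prints the total, between-group and within-group inertias, which should satisfy total = between + within.

```r
# Sketch of within-group analysis (WGA) in base R, on simulated data.
set.seed(3)
n <- 73; p <- 55; g <- 13
farm <- factor(rep(seq_len(g), length.out = n))    # hypothetical farm labels
prob <- matrix(runif(g * p, 0.1, 0.7), g, p)
X    <- matrix(rbinom(n * p, 1, prob[as.integer(farm), ]), n, p)

xbar  <- colMeans(X)
sigma <- sqrt(colMeans(sweep(X, 2, xbar)^2))
X0    <- sweep(sweep(X, 2, xbar), 2, sigma, "/")

# X_B tilde: group means repeated inside each group (computed column by column)
XBt <- apply(X0, 2, function(col) ave(col, farm))
XW  <- X0 - XBt                                   # X_W = X0 - X_B tilde

Dn  <- diag(1 / n, n)
eig <- eigen(t(XW) %*% Dn %*% XW, symmetric = TRUE)   # Dp = I_p is left out
li  <- XW %*% eig$vectors[, 1:2]                  # WGA row scores

total   <- sum(X0^2) / n
between <- sum(XBt^2) / n
within  <- sum(eig$values)
c(total = total, between = between, within = within)
```

The WGA row scores have zero mean inside every farm, which is why all farms are centered on the origin of the WGA factor map.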
Software
We used R version 2.11.0 (R Development Core Team, 2009) for all computations, and the ade4 package (version 1.4-14) for multivariate analyses (Chessel et al., 2004; Dray and Dufour, 2007; Thioulouse and Dray, 2007). PCA, BGA and WGA were computed with the dudi.pca, between and within functions of ade4. BGA permutation tests were done with the randtest function.


Online reproducibility
The statistical analyses presented in Valiente-Moro et al. (2009) are reproducible (Gentleman and Lang, 2004) via the web page http://pbil.univ-lyon1.fr/TTGE/.
This page presents the problem, gives access to the data set, and allows the user to execute R code snippets that reproduce all the computations and graphical displays of the original article (Valiente-Moro et al., 2009). The R code is explained, with a description of important variables and function calls, and the R outputs are documented. Links to the information pages of the main ade4 functions are also provided.
This page is therefore an example of online literate programming (Knuth, 1992), and also of an online dynamic document (Gentleman and Lang, 2004).
This is made possible using HTML forms containing R code stored in editable text fields. The code itself can be modified by the user and executed on demand. Code execution is achieved by sending it to the Rweb software (Banfield, 1999) running on our department server: http://pbil.univ-lyon1.fr/Rweb/Rweb.general.html.
Links to the complete R code and data sets are provided on the web page. They can be used to download these files and use them locally if desired. The full R code (the collection of all code snippets) is available directly from http://pbil.univ-lyon1.fr/TTGE/allCode.R.

PCA, BGA and permutation test
The first code snippet in the reproducibility page reads the TTGE data table, builds the factor describing the poultry farms, and computes the three analyses (PCA, BGA, and WGA). The PCA is first done using the dudi.pca function of the ade4 package, and the resulting object is passed to the between and within functions to compute BGA and WGA respectively. The BGA permutation test is then computed and the test output is plotted.
On the reproducibility web page, these successive steps are explained: links to the data files are presented, the R code is displayed and commented, and the "Do it again!" button can be used to redo the computations. The code can be freely modified by the user and executed again. For example, it is possible to change the scaling of the PCA, or the number of random permutations in the Monte Carlo test, to check the influence of these parameters on analysis outputs.
Figure 1 shows the result of the permutation test of the BGA. The null hypothesis is that there is no difference between farms. The test checks that the observed value of the ratio of between-group to total inertia (0.67) is much higher than expected under the null hypothesis. Under the null hypothesis, mite pools can be permuted randomly among farms without significantly changing the ratio of between-group to total inertia. To compute the test, the rows of the data frame are randomly permuted, and the ratio is computed again. This is done many times, to get an idea of the distribution of the between-group to total inertia ratio. Figure 1 shows that the observed value (black diamond, far on the right) is much higher than all the values obtained after permuting the pools. The p-value is 0.001, implying that the hypothesis of no differences between the farms can confidently be rejected.

Figure 1: Permutation test of the BGA. The observed value of the between-group to total inertia ratio is equal to 0.67 (black diamond on the right). The histogram on the left shows the distribution of 1000 values of this ratio obtained after permuting the rows of the data table.
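The Monte Carlo test described in the text can be sketched in a few lines of base R. This is not ade4's randtest(); it is an illustration on simulated data (hypothetical farms and bands) that permutes the farm labels, which is equivalent to permuting the rows of the table, recomputes the between-group to total inertia ratio each time, and uses the common (k + 1)/(nrepet + 1) p-value convention. With nrepet = 999, the smallest p-value this convention can give is 0.001.

```r
# Sketch of a permutation test for BGA in base R, on simulated data.
set.seed(4)
n <- 73; p <- 55; g <- 13
farm <- factor(rep(seq_len(g), length.out = n))    # hypothetical farm labels
prob <- matrix(runif(g * p, 0.1, 0.7), g, p)
X    <- matrix(rbinom(n * p, 1, prob[as.integer(farm), ]), n, p)

xbar <- colMeans(X)
X0   <- sweep(sweep(X, 2, xbar), 2, sqrt(colMeans(sweep(X, 2, xbar)^2)), "/")

# Between-group to total inertia ratio of a centred table for a given grouping:
# sum_k n_k * mean_k^2 = sum_k (group sum)^2 / n_k
between_ratio <- function(X0, fac) {
  sums <- rowsum(X0, fac)                       # g x p column sums per group
  sum(sums^2 / as.vector(table(fac))) / sum(X0^2)
}

obs    <- between_ratio(X0, farm)
nrepet <- 999
sim    <- replicate(nrepet, between_ratio(X0, sample(farm)))  # shuffled labels
pvalue <- (sum(sim >= obs) + 1) / (nrepet + 1)

hist(sim, xlim = range(c(sim, obs)), main = "Permutation test of the BGA",
     xlab = "between / total inertia")
points(obs, 0, pch = 18, cex = 2)               # observed value as a black diamond
c(observed = obs, p.value = pvalue)
```

Changing nrepet is the kind of modification the reproducibility page invites: more permutations give a finer resolution of small p-values at the cost of computing time.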
BGA plots
The second code snippet draws Figure 2, showing the factor maps of the BGA.
The loadings of TTGE bands are in the bga1$co data frame, and they are plotted using the s.label function (first panel). To get an idea of the dispersion of the six mite pools in each farm, we can plot the projection of each pool on the factor map (second panel). The row scores of the pools are in the bga1$ls data frame, and two graphs are superimposed: the graph of pool stars (with s.class), and the graph of convex hulls surrounding the pools belonging to the same farm (with s.chull). We can see that, as the permutation test had just shown, the farms are indeed very different.


Here also, the user can very easily change the R code to draw alternative graphs. It is possible, for example, to explore the effect of the arguments to the s.label and s.class functions.
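To make explicit what the superimposed pool-star and convex-hull graphs show, here is a rough base-R equivalent using grDevices::chull(). It is not a reimplementation of s.class() or s.chull(), whose many graphical arguments are not reproduced; the two-column score matrix is simulated and merely stands in for bga1$ls.

```r
# Base-R sketch of "stars" (pool -> farm centroid) and convex hulls per farm.
set.seed(5)
n <- 73; g <- 13
farm <- factor(rep(seq_len(g), length.out = n))    # hypothetical farm labels
# Hypothetical 2-D row scores standing in for bga1$ls
centre_true <- matrix(rnorm(g * 2, sd = 3), g, 2)
scores <- centre_true[as.integer(farm), ] + matrix(rnorm(n * 2), n, 2)

# Star centres: mean position of the pools of each farm
centres <- rowsum(scores, farm) / as.vector(table(farm))
# Hull vertices of each farm, as row indices of scores
hulls <- lapply(split(seq_len(n), farm),
                function(idx) idx[chull(scores[idx, , drop = FALSE])])

plot(scores, type = "n", asp = 1, xlab = "Axis 1", ylab = "Axis 2")
for (k in seq_len(g)) {
  members <- which(as.integer(farm) == k)
  segments(centres[k, 1], centres[k, 2], scores[members, 1], scores[members, 2],
           col = "grey")                                       # star branches
  polygon(scores[hulls[[k]], , drop = FALSE], border = k)      # convex hull
}
text(centres, labels = levels(farm))                           # farm labels
```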
Interpreting the differences between farms revealed in Figure 2 was not easy. All 13 farms are standard poultry farms, using exactly the same breeding conditions, and the differences between TTGE banding patterns could not be attributed to any other factors.
Since the aim of the study was to find the common bacterial flora associated with D. gallinae in standard breeding conditions, we decided to remove this unwanted between-farm effect by doing a within-group analysis.

WGA and cluster analysis plots
The third code snippet draws Figure 3, showing the factor maps of WGA and the dendrogram of the cluster analysis on TTGE band loadings.
The loadings of the 55 TTGE bands are in the wga1$co data frame, and they are plotted using the s.label function (top-left panel). The scores of the 73 mite pools are in the wga1$li data frame. They are grouped by convex hulls, according to the poultry farm from which they originate, using the s.class and s.chull functions (top-right panel). The row scores of the WGA are centered by group, so the 13 farms are centered on the origin (this corresponds to the fact that the "farm effect" has been removed in this analysis).
The TTGE bands corresponding to the common dominant bacterial flora associated with D. gallinae in standard breeding conditions were selected on Figure 3 (top-left panel), using cluster analysis (lower panel). This was done using the complete linkage algorithm, with Euclidean distances computed on WGA variable loadings on the first three axes (wga1$co) and not on raw data.
The reproducibility page can be used to compare these results with those obtained with other clustering algorithms or other distance measures. Another possibility is to check the effect of the number of axes on which variable loadings are computed (three axes in the R code proposed by default).
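The clustering step (complete linkage on Euclidean distances between band loadings on the first three WGA axes) maps directly onto base R's dist() and hclust(). In this sketch the 55 × 3 loading matrix is random and the band names are hypothetical placeholders for wga1$co; changing method in hclust(), the distance in dist(), or nf gives the alternative analyses mentioned above. The choice of k = 3 in cutree() is arbitrary.

```r
# Sketch of the cluster analysis on band loadings, base R only.
set.seed(6)
nf <- 3                                           # number of WGA axes kept
co <- matrix(rnorm(55 * nf), 55, nf,              # hypothetical loadings (wga1$co)
             dimnames = list(paste0("band", 1:55), paste0("Axis", 1:nf)))

d  <- dist(co[, 1:nf], method = "euclidean")      # distances between bands
hc <- hclust(d, method = "complete")              # complete linkage
plot(hc, cex = 0.6, main = "Complete linkage on WGA loadings")

groups <- cutree(hc, k = 3)                       # cut the tree into 3 groups
table(groups)
# Alternatives: hclust(d, method = "average"), dist(co, method = "manhattan"), ...
```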
We included the two leftmost outgroups (bands 0.15, 0.08, 0.11, 0.51, and 0.88) and the right group (0.38, 0.75, 0.03, 0.26, 0.41, 0.59, 0.13, 0.32, 0.23, 0.21, 0.31, 0.28, and 0.3). Band 0.39 was excluded because it was present in only two mite pools belonging to one single farm.
These results show that there is a strong between-farm effect in the TTGE banding patterns corresponding to the bacterial flora associated with D. gallinae in standard breeding conditions. However, it was not possible to give a satisfying biological interpretation of this effect: the farms singled out by the BGA did not show any particular characteristic. The WGA could remove this farm effect, and revealed the common dominant bacterial flora (Valiente-Moro et al., 2009). The knowledge of this flora will allow us to compare the variations observed in different conditions, such as different geographic regions, different types of breeding farms (for example organic farms), or different developmental stages of D. gallinae. Data from farms in different geographic regions and from organic farms are already available, and a paper presenting the results of their analysis has been submitted for publication.
Figure 2: Factor maps of BGA (x-axis = first principal component, y-axis = second principal component, inertia percentages: 20% and 17%). The scale is given by the value d (top-right corner) that represents the size of the background grid. The first panel shows the map of the 55 TTGE bands (labels correspond to the position of the band on the electrophoresis gel). The second panel shows the 73 mite pools, grouped by convex hulls according to the poultry farm from which they originate (labels correspond to the farms).
Discussion
Statistical methodology
From the point of view of statistical methodology, BGA appears as a robust method that can be used to reveal any evidence for, and test, a simple effect, namely the effect of a single factor, in a multivariate data table. Many recent examples of its use can be found in the area of Genomics (Culhane et al., 2002, 2005; Baty et al., 2005; Jeffery et al., 2007).
Furthermore, WGA can eliminate a simple effect from a multivariate data table. Like BGA, it can be used even when the number of cases is less than the number of variables. A recent example of WGA in the area of Genomics can be found in Suzuky et al. (2008).
BGA can be seen as a particular case of redundancy analysis [RDA, Stewart and Love (1968)] and WGA as a particular case of partial RDA [CANOCO (ter Braak and Šmilauer, 2002)]. Both cases correspond to covariates reduced to a single categorical variable (a factor, coded as dummy variables). This can be checked with vegan, another classical R package for multivariate ecological data analysis, using the rda function, as explained on the online reproducibility page in code snippet 4. Outputs are not presented here, but they are available on that site, where they may be interactively explored.
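The correspondence with RDA can also be checked without vegan, on simulated data: regressing every standardized band on the farm dummy variables gives fitted values equal to the group means (the RDA step), whose PCA has the same eigenvalues as the BGA, and residuals equal to $X_W$ (the partial RDA, i.e. WGA). The sketch assumes group weights $n_k/n$ for the BGA; vegan's rda() may use a different divisor for its eigenvalues, so its raw outputs are not expected to match these numbers directly.

```r
# Sketch: BGA = RDA on farm dummies, WGA = PCA of the RDA residuals (base R).
set.seed(7)
n <- 73; p <- 55; g <- 13
farm <- factor(rep(seq_len(g), length.out = n))    # hypothetical farm labels
prob <- matrix(runif(g * p, 0.1, 0.7), g, p)
X    <- matrix(rbinom(n * p, 1, prob[as.integer(farm), ]), n, p)
xbar <- colMeans(X)
X0   <- sweep(sweep(X, 2, xbar), 2, sqrt(colMeans(sweep(X, 2, xbar)^2)), "/")

# RDA step: multivariate regression of X0 on the farm factor (intercept + dummies)
Z    <- model.matrix(~ farm)
fit  <- lm.fit(Z, X0)
Yhat <- fit$fitted.values          # group means repeated within farms
Yres <- fit$residuals              # within-group residuals, i.e. X_W

rda_eig <- eigen(crossprod(Yhat) / n, symmetric = TRUE)$values

# BGA on the table of group means, with assumed group weights n_k / n
nk      <- as.vector(table(farm))
XB      <- rowsum(X0, farm) / nk
bga_eig <- eigen(t(XB) %*% (nk / n * XB), symmetric = TRUE)$values

# WGA table built directly from group means, for comparison with Yres
XW <- X0 - apply(X0, 2, function(col) ave(col, farm))
round(head(cbind(rda = rda_eig, bga = bga_eig), 3), 4)
```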
Computer setup
The reproducibility web page presented here is just an example of use of a simple setup that can be implemented easily on any web server. The current configuration of our server is an old Sun Fire 800, with 8 UltraSparc III processors at 900 MHz, 28 GB of memory and 6 x 36 GB disks. From the point of view of software, the server is running Solaris 10, the HTTP server is Apache 2.2.13, and we use Rweb 1.03 and R 2.11.0. Rweb is mainly a set of CGI scripts written in Perl, and the installed Perl version is v5.8.4. The page itself is written in plain HTML code, using standard HTML forms to communicate with Rweb CGI scripts.
Rweb source code is available from Montana State University (http://bayes.math.montana.edu/Rweb/Rweb1.03.tar.gz).
There have been many other attempts at mixing R and HTML on the web. Alternative solutions could for example make use of CGIwithR (Firth, 2003) and R2HTML (Lecoutre, 2003), or Rscript.
Running a server offering public computing services necessarily raises many security issues. The server that is used for Rweb at the PBIL (http://pbil.univ-lyon1.fr) also offers many other computing services in the area of genomic research, including complex database exploration and analysis of huge molecular databases (GenBank, EMBL, etc.). This server has been attacked and successfully compromised several times with common rootkits, but in eight years [we started the Rweb service in 2002, Perrière et al. (2003)], the Rweb server has never been used in these attacks. Rweb precludes the use of the most sensitive functions (particularly "system", "eval", "call", "sink", etc.) and an additional security measure taken on our server is that all computations are performed in a temporary directory that is cleaned up automatically after use.
Figure 3: Factor maps of WGA (x-axis = first principal component, y-axis = second principal component, inertia percentages: 14% and 8%). The scale is given by the value d (top-right corner) that represents the size of the background grid. Top-left panel: map of the 55 TTGE bands (see legend to Figure 2). Top-right panel: map of the 73 mite pools, grouped by farms. Bottom panel: dendrogram of the cluster analysis on WGA band loadings.
Online reproducibility
Sweave is a very good solution for the reproducibility of the statistical computations and graphical displays of scientific papers. It is more and more frequently used, and not only for reproducible research in the strictest sense. For example, it is used for quality control and for keeping up to date the documents used in the statistical teaching modules of the Biometry and Evolutionary Biology department at the University of Lyon, France: http://pbil.univ-lyon1.fr/R/enseignement.html (in French).
Using Sweave documents, however, is very demanding for users. LaTeX and R must be installed on the local computer, and users must have at least some notions of how to compile these documents. The version of R and the list of packages must be compatible with the R code included in the Sweave document. This can be a serious obstacle for many researchers, for example in Biology and Ecology.
The online reproducibility solution presented in this paper is much easier to use for people who do not know LaTeX and who have only vague notions of R. It can be used with any web browser and it can make the analyses used in an applied statistics paper accessible to anyone. It can even be a good tool to help people get a better knowledge of R and encourage them to install R and use it locally.
Rweb servers
Many examples of the use of the PBIL Rweb server are available on the home page of J. R. Lobry, http://pbil.univ-lyon1.fr/members/lobry/. The articles under the tag "[ONLINE REPRODUCIBILITY]" are reproducible through a reproducibility web page, using HTML forms and Rweb as explained in this paper. Writing such web pages is very easy, and does not require R, Rweb or any other particular software to be installed on the HTTP server. The HTML code below is a minimal example of such a page. It can be used with any web server using, for example, the apache2 HTTP server, or even locally in any web browser:
<html>
<head><title>Rweb</title></head>
<body>
<p style="font-size:30px">
Rweb server at PBIL</p>
<!-- The form posts the content of the text area to the Rweb CGI script -->
<form action="http://pbil.univ-lyon1.fr/cgi-bin/Rweb/Rweb.cgi"
      enctype="multipart/form-data"
      method="post">
<!-- Editable R code, sent to Rweb in the "Rcode" field -->
<textarea name="Rcode" rows="5" cols="80">
plot(runif(100))
</textarea><br/>
<input type="submit" value="Run it!">
</form>
</body>
</html>
The main feature of this code is the form tag that declares the CGI script of the Rweb server: http://pbil.univ-lyon1.fr/cgi-bin/Rweb/Rweb.cgi. The textarea tag in this form contains the R code: here by default it is just one plot function call that draws 100 random points. This default code can be changed in the HTML document, or modified interactively by the user when the page is loaded in a web browser. The input tag defines the "Run it!" button that can be used to execute the R code. When the user clicks on this button, the code is sent to the Rweb CGI script and executed by R. The browser display is updated, and the execution listing and the graph appear.
Another solution is to install Rweb and run a local (public or private) Rweb server.
Maintenance and durability
An important problem for reproducible research is durability. How long will a reproducible work stay reproducible?


The answer depends on the availability of the software needed to redo the computations and the graphics. R and contributed packages evolve, and R code can become deprecated if it is not updated on a regular basis. Sweave has been used mainly for just this reason in the documents used in the statistical teaching modules at the University of Lyon. All the documents are recompiled automatically to check their compatibility with the current version of R. For an online reproducibility system to endure, the same conditions apply. The Biometry and Evolutionary Biology department has been running an Rweb server continuously since 2002, regularly updating the various pieces of software on which it depends (R, contributed packages, Rweb, Perl, Apache, etc.).
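The automatic recompilation described in this paragraph can be approximated, for plain R scripts, by a small helper that runs each script in a fresh environment and reports which ones fail under the R version currently installed. The function check_scripts() and the two throw-away scripts are hypothetical; for Sweave documents, utils::Sweave() (followed by tools::texi2pdf() for the LaTeX step) would take the place of sys.source().

```r
# Hypothetical helper: re-run every R script of a directory with the current R.
check_scripts <- function(script_dir) {
  files <- list.files(script_dir, pattern = "\\.[Rr]$", full.names = TRUE)
  ok <- vapply(files, function(f) {
    tryCatch({
      sys.source(f, envir = new.env(parent = globalenv()))   # isolated run
      TRUE
    }, error = function(e) {
      message(basename(f), " failed: ", conditionMessage(e))
      FALSE
    })
  }, logical(1))
  names(ok) <- basename(files)
  ok
}

# Usage: one script that still runs, one that calls a function that does not exist
script_dir <- tempfile("scripts")
dir.create(script_dir)
writeLines("m <- mean(runif(10))", file.path(script_dir, "good.R"))
writeLines("z <- no_such_function(1)", file.path(script_dir, "bad.R"))
status <- check_scripts(script_dir)
print(R.version.string)
status
```

Run on a schedule after each update of R or of the contributed packages, such a check flags the snippets that need attention before users of the reproducibility page run into them.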
Reproducible research needs an ongoing effort; reproducibility ceases when this effort is not sustained.

Acknowledgment
We thank the Editor and two anonymous reviewers for many useful comments and suggestions that helped improve the first version of this paper.

Bibliography
J. Banfield. Rweb: web-based statistical analysis. Journal of Statistical Software, 4(1):1–15, 1999.
F. Baty, M. P. Bihl, G. Perrière, A. C. Culhane, and M. H. Brutsche. Optimized between-group classification: a new jackknife-based gene selection procedure for genome-wide expression data. Bioinformatics, 6:1–12, 2005.
J. P. Benzécri. Analyse de l'inertie intra-classe par l'analyse d'un tableau de correspondances. Les Cahiers de l'Analyse des données, 8:351–358, 1983.
J. B. Buckheit and D. L. Donoho. Wavelab and reproducible research. Tech. rep. 474, Dept. of Statistics, Stanford University, 1995. URL http://www-stat.stanford.edu/~wavelab/Wavelab_850/wavelab.pdf.
D. Charif and J. Lobry. SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. In H. R. U. Bastolla, M. Porto and M. Vendruscolo, editors, Structural approaches to sequence evolution: Molecules, networks, populations, Biological and Medical Physics, Biomedical Engineering, pages 207–232. Springer Verlag, New York, 2007. URL http://seqinr.r-forge.r-project.org/. ISBN: 978-3-540-35305-8.


C. Chauve. The poultry red mite Dermanyssus gallinae (De Geer, 1778): current situation and future prospects for control. Veterinary Parasitology, 79:239–245, 1998.
D. Chessel, A.-B. Dufour, and J. Thioulouse. The ade4 package - I - One-table methods. R News, 4:5–10, 2004.
A. C. Culhane, G. Perrière, E. C. Considine, T. G. Cotter and D. G. Higgins. Between-group analysis of microarray data. Bioinformatics, 18:1600–1608, 2002.
A. C. Culhane, J. Thioulouse, G. Perrière, and D. G. Higgins. MADE4: an R package for multivariate analysis of gene expression data. Bioinformatics, 21:2789–2790, 2005.
J. de Leeuw. Reproducible research: the bottom line. Technical report, UCLA Statistics Program, 2001. URL http://preprints.stat.ucla.edu/301/301.pdf.
S. Dolédec and D. Chessel. Rythmes saisonniers et composantes stationnelles en milieu aquatique I - description d'un plan d'observations complet par projection de variables. Acta Oecologica, Oecologia Generalis, 8(3):403–426, 1987.
S. Dray and A.-B. Dufour. The ade4 package: Implementing the duality diagram for ecologists. Journal of Statistical Software, 22(4):1–20, 2007. URL http://www.jstatsoft.org/v22/i04.
Y. Escoufier. The duality diagram: a means of better practical applications. In P. Legendre and L. Legendre, editors, Developments in numerical ecology, NATO Advanced Institute, Serie G, pages 139–156. Springer Verlag, Berlin, 1987.
D. Firth. CGIwithR: Facilities for processing web forms using R. Journal of Statistical Software, 8(10):1–8, 2003. URL http://www.jstatsoft.org/v08/i10.
R. Gentleman and D. T. Lang. Statistical analyses and reproducible research. Bioconductor Project Working Papers, May 2004. URL http://www.bepress.com/bioconductor/paper2.
S. Holmes. Multivariate analysis: The French way. In D. Nolan and T. Speed, editors, Festschrift for David Freedman, pages 1–14. IMS, Beachwood, OH, 2006.
I. B. Jeffery, S. F. Madden, P. A. McGettigan, G. Perrière, A. C. Culhane and D. G. Higgins. Integrating transcription factor binding site information with gene expression datasets. Bioinformatics, 23:298–305, 2007.
A. Kirkwood. Anaemia in poultry infested with the red mite Dermanyssus gallinae. The Veterinary Record, 80:514–516, 1967.


D. Knuth. Literate Programming. Center for the Study of Language and Information, Stanford, California, 1992.
E. Lecoutre. The R2HTML package. R News, 3:33–36, 2003.
F. Leisch. Sweave, Part I: Mixing R and LaTeX. R News, 2:28–31, 2002.
G. Muyzer and K. Smalla. Application of denaturing gradient gel electrophoresis (DGGE) and temperature gradient gel electrophoresis (TGGE) in microbial ecology. Antonie van Leeuwenhoek, 73:127–141, 1998.
G. Perrière, C. Combet, S. Penel, C. Blanchet, J. Thioulouse, C. Geourjon, J. Grassot, C. Charavay, M. Gouy, L. Duret, and G. Deléage. Integrated databanks access and sequence/structure analysis services at the PBIL. Nucleic Acids Research, 31:3393–3399, 2003.
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2009. URL http://www.R-project.org. ISBN 3-900051-07-0.
D. K. Stewart and W. A. Love. A general canonical correlation index. Psychological Bulletin, 70:160–163, 1968.
A. Suau, R. Bonnet, M. Sutren, J. Godon, G. R. Gibson, M. D. Collins, and J. Doré. Direct analysis of genes encoding 16S rRNA from complex communities reveals many novel molecular species within the human gut. Appl. Env. Microbiol., 65:4799–4807, 1999.
H. Suzuky, C. J. Brown, L. J. Forney, and E. M. Top. Comparison of correspondence analysis methods for synonymous codon usage in bacteria. DNA Research, (in press), 2008. doi: 10.1093/dnares/dsn028.
C. J. F. ter Braak and P. Šmilauer. CANOCO Reference Manual and CanoDraw for Windows User's Guide: Software for Canonical Community Ordination (version 4.5). Microcomputer Power, Ithaca NY, USA, 2002.
J. Thioulouse, D. Chessel, S. Dolédec, and J. M. Olivier. ADE-4: a multivariate analysis and graphical display software. Statistics and Computing, 7:75–82, 1997.
J. Thioulouse and S. Dray. Interactive multivariate data analysis in R with the ade4 and ade4TkGUI packages. Journal of Statistical Software, 22(5):1–14, 2007. URL http://www.jstatsoft.org/v22/i05.
C. Valiente-Moro, C. Chauve, and L. Zenner. Vectorial role of some dermanyssoid mites (Acari, Mesostigmata, Dermanyssoidea). Parasite, 12:99–109, 2005.


C. Valiente-Moro, C. Chauve, and L. Zenner. Experimental infection of Salmonella Enteritidis by the poultry red mite, Dermanyssus gallinae. Veterinary Parasitology, 31:329–336, 2007.
C. Valiente-Moro, J. Thioulouse, C. Chauve, P. Normand, and L. Zenner. Bacterial taxa associated with the hematophagous mite, Dermanyssus gallinae detected by 16S rDNA PCR amplification and TTGE fingerprint. Research in Microbiology, 160:63–70, 2009.

Jean Thioulouse
Université de Lyon, F-69000, Lyon; Université Lyon 1;
CNRS, UMR5558, Biométrie et Biologie Evolutive,
F-69622 Villeurbanne Cedex, France.
jean.thioulouse@univ-lyon1.fr


http://pbil.univ-lyon1.fr/JTHome/
Claire Valiente-Moro
Ecole Nationale Vétérinaire de Lyon,
1 avenue Bourgelat, 69280 Marcy l'Etoile, France.
Université de Lyon, F-69000, Lyon; Université Lyon 1;
CNRS, UMR5557, Écologie Microbienne,
F-69622 Villeurbanne Cedex, France.
claire.valiente-moro@univ-lyon1.fr

Lionel Zenner
Ecole Nationale Vétérinaire de Lyon,
1 avenue Bourgelat, 69280 Marcy l'Etoile, France.
Université de Lyon, F-69000, Lyon; Université Lyon 1;
CNRS, UMR5558, Biométrie et Biologie Evolutive,
F-69622 Villeurbanne Cedex, France.
l.zenner@vet-lyon.fr
