The statistical analysis of truncated and censored data under serial dependence [Electronic resource] / submitted by Ewa Strzalkowska-Kominiak
158 Pages
English
Published 01 January 2008
The Statistical Analysis of Truncated And
Censored Data Under Serial Dependence
Inaugural Dissertation
for the attainment of the doctoral degree
at the Faculties of Natural Sciences
(Mathematics)
of Justus-Liebig-Universität Gießen
submitted by
Ewa Strzalkowska-Kominiak
supervised by
Prof. Dr. Winfried Stute
Gießen, January 2008
D-26
Dean: Prof. Dr. Bernd Baumann
First referee: Prof. Dr. Winfried Stute
Second referee: Prof. Dr. Erich Häusler

I thank Prof. Dr. Stute for his excellent supervision of this doctoral thesis, his time and his help.
I would also like to warmly thank my parents and my sister for their support.

Contents
1 Introduction And Main Results
1.1 Introduction
1.2 Main Theorem
2 Simulations
2.1 The Marshall and Olkin Model
2.2 Simulations for the estimator of a d.f. for different $\alpha$ and $N$
2.3 Comparison between $F_n(x,y)$ and the standard empirical estimator
2.4 Estimation of correlation coefficients and expectations
3 Proofs
4 A Functional Central Limit Theorem
A Basic Properties of $F_{1n}(t)$
A.1 Bounds for $(F_{1n} - F_1)^2(t)$
A.2 Bounds for $(F_{1n} - F_1)^2(Z_k)$
A.3 Linearization of $F_{1n}$
Chapter 1
Introduction And Main Results
1.1 Introduction
Survival analysis is the branch of statistics in which the variable of interest can often be interpreted as the time elapsed between two events. Such "lifetimes" typically arise in a medical or an engineering context. For example, a quantity $U$ may denote the time between infection and the onset of a disease. In engineering, $U$ may be the time a technical unit was on test until failure occurred. Since in each case $U$ is a random variable, one may be interested in the distributional properties of $U$. A typical feature of such lifetime data is that, due to time limitations, $U$ may not always be observable. Hence the available data provide only partial information and, as a consequence, standard statistical procedures are not applicable. Perhaps the best-known example is random (right) censorship, where instead of $U$ one observes $\min(U,C)$ and $\delta = 1_{\{U \le C\}}$; here $C$ is a censoring variable and the indicator $\delta$ reveals which of $U$ and $C$ was actually observed. Another important example is random truncation, in which $U$ is observed only if $U \le D$, where $D$ is the associated truncating variable. In each case, standard empirical estimators that attach equal weights to the observations are inappropriate and need to be replaced by estimators that take the actual structure of the data into account. Typically, this results in a complicated reweighting of the observations, leading to estimators whose distributions are not easy to handle.
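The bias of equal-weight estimators can be seen in a small simulation. The sketch below is illustrative only: it assumes (our choice, not the author's) that $U$, $C$ and $D$ are independent standard exponentials, so $E[U] = 1$, and compares that with naive averages of the censored and the truncated data.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Assumption for illustration: U, C, D independent standard exponentials, E[U] = 1.
u = rng.exponential(1.0, N)   # lifetime of interest
c = rng.exponential(1.0, N)   # censoring variable
d = rng.exponential(1.0, N)   # truncating variable

# Random right censorship: observe min(U, C) and delta = 1{U <= C}.
observed = np.minimum(u, c)
delta = (u <= c).astype(int)

# Random right truncation: U is recorded only if U <= D.
truncated_sample = u[u <= d]

# Equal-weight (naive) estimates of E[U] from the incomplete data.
naive_censored_mean = observed.mean()
naive_truncated_mean = truncated_sample.mean()

print(f"true mean 1.0, censored naive {naive_censored_mean:.3f}, "
      f"truncated naive {naive_truncated_mean:.3f}")
```

Under these assumptions both naive averages settle near 0.5, half the true mean: $\min(U, C)$ is exponential with rate 2, and the density of $U$ given $U \le D$ is proportional to $e^{-u}e^{-u}$, which is again exponential with rate 2.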
In many situations, when patients are observed over time, one may be interested in consecutive times $X_1 \le X_2 \le X_3 \le \dots$ denoting the beginning of different phases in the development of a disease. For example, in HIV studies, $X_1$ could be the time of infection, $X_2$ the time when antibodies occur (seroconversion) and $X_3$ the time when AIDS is diagnosed. Let
$$U_1 = X_2 - X_1 \quad\text{and}\quad U_2 = X_3 - X_2$$
denote the lengths of the two periods. Typically, we may expect some dependence between $U_1$ and $U_2$. Let $F$ denote the unknown bivariate distribution function (d.f.) of $(U_1, U_2)$:
$$F(x_1, x_2) = P(U_1 \le x_1, U_2 \le x_2), \qquad x_1, x_2 \in \mathbb{R}.$$
More generally, we may be interested in integrals
$$I = \int \varphi \, dF$$
with respect to $F$, where $\varphi$ is a given score function. For example, taking $\varphi(x_1, x_2) = x_1 x_2$ yields $I = E[U_1 U_2]$, which is part of the covariance of $U_1$ and $U_2$. Given a sample $(U_{1i}, U_{2i})$, $1 \le i \le N$, of independent replicates of $(U_1, U_2)$, the standard empirical estimator of $I$ becomes
$$I_N = \frac{1}{N} \sum_{i=1}^{N} \varphi(U_{1i}, U_{2i}).$$
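For complete data, $I_N$ is just an average of scores. The sketch below uses a hypothetical helper `empirical_integral` and made-up numbers to evaluate it for two score functions: the product $x_1 x_2$ and the indicator of a rectangle, which turns $I_N$ into the empirical d.f. at $(x_1, x_2)$.

```python
import numpy as np

def empirical_integral(phi, u1, u2):
    """Standard empirical estimator I_N = (1/N) * sum_i phi(U_1i, U_2i)
    for complete (untruncated, uncensored) bivariate data."""
    u1 = np.asarray(u1, dtype=float)
    u2 = np.asarray(u2, dtype=float)
    return float(np.mean(phi(u1, u2)))

# Hypothetical complete sample of (U_1i, U_2i), N = 3.
u1 = np.array([1.0, 2.0, 3.0])
u2 = np.array([4.0, 5.0, 6.0])

# phi(x1, x2) = x1 * x2 estimates E[U_1 U_2], the mixed moment in Cov(U_1, U_2).
mixed_moment = empirical_integral(lambda a, b: a * b, u1, u2)   # (4 + 10 + 18) / 3
covariance = mixed_moment - u1.mean() * u2.mean()                # 32/3 - 2 * 5 = 2/3

# phi = indicator of (-inf, x1] x (-inf, x2] gives the empirical d.f. at (x1, x2).
x1, x2 = 2.0, 5.0
f_emp = empirical_integral(lambda a, b: ((a <= x1) & (b <= x2)).astype(float), u1, u2)
```

Here two of the three pairs lie in $(-\infty, 2] \times (-\infty, 5]$, so `f_emp` equals $2/3$.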
In a practical situation, theU’s may not be all observable. For example, ifE denotes the end of the
study, and if we set Z = E −X , then the patient becomes part of the study only if U ≤ Z. In1 1
other words,U may be truncated from the right byZ and hence gets lost ifU >Z. If no truncation1 1
occurs, bothU andZ will be observed. As toU , this variable will be available only ifU +U ≤Z.1 2 1 2
Otherwise we observeZ−U . Hence given thatU is not truncated,U is at risk of being censored.1 1 2
SinceU andU may be dependent we obtain some kind of dependent censorship.1 2
To summarize the data situation: for each person we have three sequentially observed times $X_1 \le X_2 \le X_3$, giving rise to $U_1$ and $U_2$. As before, let $E$ denote the end of the study. Figure 1.1 displays the possible data structures depending on the location of $E$.
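[Figure 1.1: Possible Data Structures. Time axis with $U_1$ spanning $X_1$ to $X_2$ and $U_2$ spanning $X_2$ to $X_3$. Panels: no truncation and no censoring ($X_1$, $X_2$, $X_3$, $E$); no truncation but censoring ($X_1$, $X_2$, $E$, $X_3$); truncation and censoring ($X_1$, $E$, $X_2$, $X_3$).]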
Let $N$ be the number of people at risk. Since $N$ is unknown under possible truncation, we also have to introduce $n$, the number of actually observed cases. Denoting by $\alpha = P(U_1 \le Z)$ the probability of non-truncation, $n$ is a binomial random variable with parameters $\alpha$ and $N$. Typically, for truncated data, the statistical analysis is carried out conditionally on a given $n$. In terms of $n$, we are given a sample $(U_{1i}, \tilde U_{2i})$, $Z_i$ and $\delta_i$, $1 \le i \le n$, where $\tilde U_{2i}$ equals $U_{2i}$ when no censoring occurs; otherwise
$$\tilde U_{2i} = Z_i - U_{1i}.$$
Finally, $\delta_i = 1_{\{U_{2i} \le Z_i - U_{1i}\}}$.
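The observation scheme can be simulated directly. The sketch below is a hypothetical data-generating model chosen only for illustration: $U_1$ and $Z$ are independent standard exponentials (so $\alpha = 1/2$) and dependence is introduced through $U_2 = 0.5\,U_1 + W$ with an independent standard exponential $W$. It returns the observed tuples $(U_{1i}, \tilde U_{2i}, Z_i, \delta_i)$.

```python
import numpy as np

def simulate_observed(N, rng):
    """Hypothetical model: (U1, U2) dependent, Z independent of (U1, U2).
    Returns only the non-truncated cases (U_1i, tilde U_2i, Z_i, delta_i)."""
    u1 = rng.exponential(1.0, N)
    u2 = 0.5 * u1 + rng.exponential(1.0, N)   # assumed dependence between U1 and U2
    z = rng.exponential(1.0, N)               # Z = E - X1, independent of (U1, U2)

    keep = u1 <= z                            # truncation: a case is seen only if U1 <= Z
    u1, u2, z = u1[keep], u2[keep], z[keep]

    u2_tilde = np.minimum(u2, z - u1)         # censoring of U2 by Z - U1 = E - X2
    delta = (u2 <= z - u1).astype(int)        # 1 if U2 is uncensored
    return u1, u2_tilde, z, delta

rng = np.random.default_rng(1)
N = 200_000
u1, u2_tilde, z, delta = simulate_observed(N, rng)
n = len(u1)   # n ~ Binomial(N, alpha); here alpha = P(U1 <= Z) = 1/2
```

Only `u1`, `u2_tilde`, `z` and `delta` would be available to the statistician; $N$ and the uncensored $U_{2i}$ are discarded, which mirrors the conditioning on $n$ described above.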
The goal of this work is to derive an estimator $F_n$ of $F$ from the above data. As a second step, we shall study estimators
$$I_n = \int \varphi \, dF_n$$
of $I$. To compute confidence intervals for $I$, one needs to compute, or at least approximate, the distribution of $I_n$. For this, we shall derive a representation of $I_n$ as a sum of i.i.d. summands plus a remainder. We may then apply the Central Limit Theorem (CLT) to the leading term to obtain asymptotic normality of $n^{1/2}(I_n - I)$.
1.2 Main Theorem
In this section we develop an estimator $I_n$ of $I$ through the identifiability of $F$, that is, we find a representation of $F$ in terms of estimable quantities.
For this, recall that $X_1 \le X_2 \le X_3$ are three consecutive times such that we are interested in
$$U_1 = X_2 - X_1 \quad\text{and}\quad U_2 = X_3 - X_2.$$
As before, denote by $F$ the distribution function (d.f.) of $(U_1, U_2)$:
$$F(x_1, x_2) = P(U_1 \le x_1, U_2 \le x_2).$$
Let
$$F_1(x_1) := P(U_1 \le x_1) \quad\text{and}\quad F_2(x_2) := P(U_2 \le x_2)$$
be the associated marginal d.f.'s. Let $E$ denote, as before, the end of the study, so that
$$Z := E - X_1 \sim G$$
denotes the time elapsed between $X_1$ and $E$. It is assumed throughout that $(U_1, U_2)$ is independent of $Z$ and that $Z$ is always observed when $U_1$ is observed, whether $U_2$ is censored or not. Note, however, that since $U_1$ is observed only when $U_1 \le Z$, truncation may induce some dependence between the actually observed $U_1$ and $Z$. As before, write
$$\alpha = P(U_1 \le Z)$$
for the probability that $(U_1, Z)$ can be observed. In addition to truncation, when $U_1 \le Z$, the random variable $U_2$ is at risk of being censored from the right by $Z - U_1 = E - X_2$. In other words, we only have access to
$$\tilde U_2 = \min(U_2, Z - U_1). \tag{1.1}$$
Since in general $U_1$ and $U_2$ will be dependent and, at the same time, the observed $U_1$ also depends on $Z$, equation (1.1) incorporates a kind of dependent censorship. Along with $(U_1, Z)$ and $\tilde U_2$, we also observe
$$\delta = 1_{\{U_1 + U_2 \le Z\}} = \begin{cases} 1, & \text{if } U_2 \text{ is uncensored},\\ 0, & \text{otherwise}. \end{cases}$$
The purpose of this work is to reconstruct $F$ from a sample of independent replicates of $(U_1, Z)$, $\tilde U_2$ and $\delta$. More precisely, our target is
$$I = \int \varphi \, dF,$$
where $\varphi$ is a given score function. In particular, when $\varphi$ is the indicator of the rectangle $(-\infty, x_1] \times (-\infty, x_2]$, we are back at $I = F(x_1, x_2)$.
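For the identifiability of $F$, we also need some sub-distributions connected with $F$ and $G$. Set
$$\begin{aligned}
H_2^1(x, y) &= P(U_1 \le x, \tilde U_2 \le y, \delta = 1 \mid U_1 \le Z)\\
&= P(U_1 \le x, U_2 \le y, U_1 + U_2 \le Z \mid U_1 \le Z)\\
&= \alpha^{-1} P(U_1 \le x, U_2 \le y, U_1 + U_2 \le Z)\\
&= \alpha^{-1} \int_{-\infty}^{x} \int_{-\infty}^{y} \bigl[1 - G((x_1 + x_2)-)\bigr] \, F(dx_1, dx_2),
\end{aligned}$$
where the third equality uses $\{U_1 + U_2 \le Z\} \subset \{U_1 \le Z\}$ (recall $U_2 \ge 0$), and the last equality follows from the independence of the original $(U_1, U_2)$ and $Z$.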
Hence, provided that $\operatorname{supp}(\varphi) \subset \{(x_1, x_2) : G((x_1 + x_2)-) < 1\}$, we obtain
$$I = \int \varphi \, dF = \int \varphi(x_1, x_2) \, \frac{\alpha}{1 - G((x_1 + x_2)-)} \, H_2^1(dx_1, dx_2).$$
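Furthermore,
$$\begin{aligned}
\alpha^{-1}\bigl(1 - G(x-)\bigr) &= \frac{P(Z \ge x)}{P(U_1 \le Z)} = \frac{P(Z \ge x, U_1 \le Z) + P(Z \ge x, U_1 > Z)}{P(U_1 \le Z)}\\
&= P(Z \ge x \mid U_1 \le Z) + \alpha^{-1} \int_{[x, \infty)} [1 - F_1(y)] \, G(dy).
\end{aligned}$$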
Set
$$A(x) = P(Z \ge x \mid U_1 \le Z)$$
and
$$B(x) = \alpha^{-1} \int_{[x, \infty)} [1 - F_1(y)] \, G(dy).$$
Hence
$$I = \int \varphi \, dF = \int \varphi(x_1, x_2) \, \frac{1}{A(x_1 + x_2) + B(x_1 + x_2)} \, H_2^1(dx_1, dx_2). \tag{1.2}$$
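A worked example can make the decomposition $\alpha^{-1}(1 - G(x-)) = A(x) + B(x)$ behind (1.2) concrete. The computation below assumes, purely for illustration, that $U_1$ and $Z$ are independent standard exponentials, so that $F_1(y) = 1 - e^{-y}$, $G$ is continuous and $\alpha = 1/2$; it also uses $A(x) = \alpha^{-1} \int_{[x,\infty)} F_1(z)\, G(dz)$, which holds because $P(U_1 \le z) = F_1(z)$.

```latex
\begin{aligned}
\alpha^{-1}\bigl(1 - G(x-)\bigr) &= 2e^{-x},\\
A(x) &= \alpha^{-1}\int_{[x,\infty)} F_1(z)\,G(dz)
      = 2\int_x^\infty \bigl(1 - e^{-z}\bigr)e^{-z}\,dz = 2e^{-x} - e^{-2x},\\
B(x) &= \alpha^{-1}\int_{[x,\infty)} \bigl[1 - F_1(y)\bigr]\,G(dy)
      = 2\int_x^\infty e^{-2y}\,dy = e^{-2x},\\
A(x) + B(x) &= 2e^{-x} = \alpha^{-1}\bigl(1 - G(x-)\bigr).
\end{aligned}
```

In this toy case the weight in (1.2) is therefore $1/(A + B) = e^{x_1 + x_2}/2$, which grows quickly in $x_1 + x_2$; this is one way to see why the support condition on $\varphi$ matters.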
The function $A$ can easily be estimated through the empirical d.f. of the observed $Z$-sample. In contrast, the function $B$ contains the unknown $\alpha$ and the unconditional d.f.'s $F_1$ and $G$ of $U_1$ and $Z$. To eliminate these terms, we introduce
$$G^*(y) = P(Z \le y \mid U_1 \le Z) = \alpha^{-1} \int_{(-\infty, y]} F_1(z) \, G(dz),$$
so that, since $G^*(dy) = \alpha^{-1} F_1(y) \, G(dy)$,
$$B(x) = \int_{[x, \infty)} \frac{1 - F_1(y)}{F_1(y)} \, G^*(dy).$$
The function $G^*$ is a conditional d.f. that is again easily estimable, while $F_1$ can be estimated through the well-known Lynden-Bell estimator for truncated data.
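The $G^*$ representation means that $B$ can be evaluated from the observed $Z$'s alone once $F_1$ is available. The Monte Carlo sketch below checks this under a toy model of our own choosing: $U_1$ and $Z$ are independent standard exponentials, for which direct integration gives $B(x) = e^{-2x}$. The true $F_1$ is plugged in only to isolate this step; in the text it is replaced by the Lynden-Bell estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 400_000

# Toy model (assumption): U1 and Z independent standard exponentials.
u1 = rng.exponential(1.0, N)
z = rng.exponential(1.0, N)
z_obs = z[u1 <= z]            # observed Z's after truncation; their d.f. is G*

def f1(y):
    """True F_1 under the toy model (stand-in for the Lynden-Bell estimate)."""
    return 1.0 - np.exp(-y)

x = 0.5
# B(x) = integral over [x, inf) of (1 - F_1(y)) / F_1(y) G*(dy),
# with G* replaced by the empirical d.f. of the observed Z's.
mask = z_obs >= x
b_hat = np.sum((1.0 - f1(z_obs[mask])) / f1(z_obs[mask])) / len(z_obs)
b_true = np.exp(-2.0 * x)     # closed form of B(x) under the toy model
```

Neither $\alpha$ nor $G$ enters the computation of `b_hat`; only the observed `z_obs` and $F_1$ do, which is exactly what the representation is meant to achieve. A point $x > 0$ is used because the ratio $(1 - F_1)/F_1$ blows up near zero.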
Our statistical analysis is based on a sample of $n$ replicates of $(U_1, Z)$, $\tilde U_2$ and $\delta$. More precisely, we assume that we are given $N$ independent random observations $(U_{1i}, U_{2i})$ from the d.f. $F$ and a sample $Z_i$ of $N$ independent random variables from the d.f. $G$, such that the $U$-sample is also independent of the $Z$-sample. We observe $(U_{1i}, Z_i)$ only if $U_{1i} \le Z_i$. Hence the number of actually observed data is
$$n = \sum_{i=1}^{N} 1_{\{U_{1i} \le Z_i\}}.$$
Note that $n$ is a binomial random variable with parameters $N$ and $\alpha$. Throughout this work, our statistical analysis is based on a given value of $n$. The distribution function of the observed $Z_i$ equals $G^*$ and can be estimated through
$$G_n^*(y) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{Z_i \le y\}}.$$
The d.f. of an actually observed $U_{1i}$ becomes
$$F_1^*(x_1) = P(U_1 \le x_1 \mid U_1 \le Z),$$
which may be estimated through
$$F_{1n}^*(x_1) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{U_{1i} \le x_1\}}.$$
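Both $G_n^*$ and $F_{1n}^*$ are ordinary empirical d.f.'s of the observed (post-truncation) samples. A minimal sketch, with a hypothetical helper `ecdf` and a made-up observed sample satisfying $U_{1i} \le Z_i$:

```python
import numpy as np

def ecdf(sample, y):
    """Empirical d.f. y -> (1/n) * #{i : sample_i <= y}, vectorised over y."""
    s = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the sample points that are <= y.
    return np.searchsorted(s, np.asarray(y, dtype=float), side="right") / len(s)

# Hypothetical observed sample (each U_1i <= Z_i, as truncation requires).
u1_obs = np.array([0.2, 0.5, 0.9, 1.4])
z_obs = np.array([1.0, 0.5, 2.0, 3.0])

g_star_n = ecdf(z_obs, [0.5, 1.0, 2.5])    # G*_n at three points
f1_star_n = ecdf(u1_obs, [0.5, 1.0])       # F*_1n at two points
```

With these numbers `g_star_n` is `[0.25, 0.5, 0.75]` and `f1_star_n` is `[0.5, 0.75]`. Sorting once and using binary search keeps evaluation at many points cheap.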
The empirical analogue of $A(x)$ becomes
$$A_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{Z_i \ge x\}}, \tag{1.3}$$
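while $B(x)$ is estimated through
$$\begin{aligned}
B_n(x) &= \int_{[x, \infty)} \frac{1 - F_{1n}(y)}{F_{1n}(y)} \, G_n^*(dy)\\
&= \frac{1}{n} \sum_{i=1}^{n} 1_{\{Z_i \ge x\}} \, \frac{1 - F_{1n}(Z_i)}{F_{1n}(Z_i)},
\end{aligned} \tag{1.4}$$
where $F_{1n}$ is the Lynden-Bell estimator of $F_1$ for right-truncated data. More precisely, since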
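$$F_1^*(x_1) = P(U_1 \le x_1 \mid U_1 \le Z) = \alpha^{-1} \int_{(-\infty, x_1]} \bigl(1 - G(u-)\bigr) \, F_1(du),$$
we have, writing $G_-(u) = G(u-)$,
$$\alpha \, dF_1^* = (1 - G_-) \, dF_1$$
and therefore
$$dF_1 = \frac{dF_1^*}{\alpha^{-1}(1 - G_-)}.$$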
Set
$$C(x) = P(U_1 \le x \le Z \mid U_1 \le Z). \tag{1.5}$$
Since
$$C(x) = \alpha^{-1} P(U_1 \le x \le Z) = \alpha^{-1} F_1(x) \bigl(1 - G(x-)\bigr),$$
we obtain
$$\frac{dF_1}{F_1} = \frac{dF_1^*}{C}. \tag{1.6}$$
F C1
dF1The cumulative hazard function associated with is defined as
F1Z
F (du)1
Λ(x) = .
F (u)1[x,∞)
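The product-integration formula then yields
$$F_1(t) = e^{-\Lambda^c(t)} \prod_{y > t} \bigl[1 + \Lambda\{y\}\bigr],$$
where $\Lambda^c$ denotes the continuous part of $\Lambda$ and $\Lambda\{y\}$ the jump of $\Lambda$ at $y$.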
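Since, by (1.6),
$$\Lambda(x) = \int_{[x, \infty)} \frac{dF_1^*}{C},$$
the empirical counterparts become
$$\Lambda_n(x) := \int_{[x, \infty)} \frac{dF_{1n}^*}{C_n} = \sum_{i=1}^{n} \frac{1_{\{U_{1i} \ge x\}}}{n \, C_n(U_{1i})}$$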
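and, if there are no ties,
$$F_{1n}(t) = \prod_{y > t} \bigl[1 + \Lambda_n\{y\}\bigr] = \prod_{U_{1i} > t} \Bigl(1 - \frac{1}{n \, C_n(U_{1i})}\Bigr), \tag{1.7}$$
where
$$C_n(x) = \frac{1}{n} \sum_{k=1}^{n} 1_{\{U_{1k} \le x \le Z_k\}}.$$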
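Formula (1.7) and the empirical quantities (1.3) and (1.4) combine into a plug-in estimate of the denominator $A(x) + B(x)$ in (1.2). The sketch below is a direct, unoptimised transcription under the no-ties assumption; the function names and the toy sample are hypothetical and not taken from the thesis.

```python
import numpy as np

def lynden_bell(u1_obs, z_obs, t):
    """Lynden-Bell estimator as in (1.7), assuming no ties among the U_1i:
        F_1n(t) = prod_{i: U_1i > t} (1 - 1 / (n * C_n(U_1i))),
    where n * C_n(x) = #{k : U_1k <= x <= Z_k}."""
    u1 = np.asarray(u1_obs, dtype=float)
    z = np.asarray(z_obs, dtype=float)
    # Entry [i, k] is True when U_1k <= U_1i <= Z_k; row sums give n * C_n(U_1i) >= 1.
    n_c = np.sum((u1[None, :] <= u1[:, None]) & (u1[:, None] <= z[None, :]), axis=1)
    factors = 1.0 - 1.0 / n_c
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.array([np.prod(factors[u1 > s]) for s in t])

def a_n(z_obs, x):
    """A_n(x) from (1.3): fraction of observed Z_i with Z_i >= x."""
    return float(np.mean(np.asarray(z_obs, dtype=float) >= x))

def b_n(u1_obs, z_obs, x):
    """B_n(x) from (1.4): (1/n) * sum_i 1{Z_i >= x} (1 - F_1n(Z_i)) / F_1n(Z_i)."""
    z = np.asarray(z_obs, dtype=float)
    z_tail = z[z >= x]
    f = lynden_bell(u1_obs, z_obs, z_tail)
    if np.any(f <= 0.0):
        raise ValueError("F_1n(Z_i) = 0 for some Z_i >= x, so B_n(x) is undefined")
    return float(np.sum((1.0 - f) / f) / len(z))

# Toy observed sample (hypothetical numbers; each U_1i <= Z_i as truncation requires).
u1_obs = np.array([0.2, 0.5, 0.9, 1.4])
z_obs = np.array([1.0, 0.5, 2.0, 3.0])

f1n = lynden_bell(u1_obs, z_obs, [0.1, 0.3, 1.0, 1.5])
denominator = a_n(z_obs, 1.0) + b_n(u1_obs, z_obs, 1.0)
```

For this sample $nC_n(U_{1i})$ equals 1, 2, 2, 2, so `f1n` is `[0, 0.125, 0.5, 1]` and `denominator` is $0.75 + 0.25 = 1$. A factor vanishes whenever $nC_n(U_{1i}) = 1$, which always happens at the smallest $U_{1i}$ and may happen elsewhere; the guard in `b_n` reports the case where this leaves the ratio in (1.4) undefined. The $O(n^2)$ count of $C_n$ is chosen for readability, not speed.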
Finally, the estimator of $H_2^1(x, y)$ becomes