BITE: Bayesian Intensity Estimator
Reference Manual
Tommi Härkänen*
March 24, 2004
The BITE software and this documentation are © Copyright 2000 by Tommi Härkänen,
Rolf Nevanlinna Institute. All rights reserved. Reproductions for personal use are allowed.
See also file copyright.txt. The software and the documentation are provided without
warranty of any kind.
* Rolf Nevanlinna Institute, P.O.B. 4, FIN-00014 University of Helsinki, Finland. E-mail:
Tommi.Harkanen@RNI.helsinki.fi, tel. +358-9-191 22784, fax +358-9-191 22779.
Contents

1 Introduction
2 Terminology
2.1 Filtering and censoring
2.2 Poisson process
2.3 Step functions
2.4 Covariates
2.5 Likelihood
3 Installation and description of commands
3.1 Installation and running BITE
3.2 Basic elements of the BITE syntax
3.3 Setting simulation limits
3.4 Loading data
3.5 Setting prior variables
3.6 Initializing model functions
3.7 Saving Markov chain
3.8 Intensity model
3.9 Monitoring updating of model functions
3.10 Calculating posterior expectation
3.11 Graphical output
3.12 Error messages
4 Prior distributions
4.1 Scalar parameters
4.2 Covariates and responses
4.3 Model functions
4.3.1 Gamma prior
4.3.2 Increasing gamma prior
4.3.3 Time independent prior
5 Markov chain Monte Carlo
6 Examples
6.1 Heart transplant survival data
6.1.1 Model and estimates
6.1.2 Data files for BITE
6.1.3 Command file
6.2 Survival analysis with a frailty model
6.3 Dental caries study I: standard frailty model
7 The output of BITE
8 Discussion
A The grammar of BITE
List of Figures

1 An example of a Markovian model.
2 An example of an intensity function $\lambda(t)$, and a possible realization from a corresponding Poisson process.
3 An example of a step function.
4 A template for a hierarchical intensity model accepted by BITE.
5 Example of point process data.
6 The estimates of the baseline hazards $\lambda$ and $f$ of the heart transplant data.
7 Baseline hazard estimate of the kidney example.
List of Tables

1 Poisson-likelihood contributions and the notations of the parameters in BITE.
2 Basic tokens of BITE grammar.
3 Internal operators.
4 Available dependencies of (prior) distributions.
5 The probability distributions recognized by BITE.
6 Estimates of the regression coefficients in the heart transplant data.
7 Data files of the heart transplant example.
8 The estimates of the Bayesian model with a nonparametric baseline hazard.
9 The data files of the kidney example.
10 Data files of caries I example.
11 Some software packages for Bayesian inference.
12 The tokens of BITE.
1 Introduction
Markov chain Monte Carlo (MCMC) methods have had a profound effect on statistical
methodology based on Bayesian inference (see, for example, Gelman et al. (1995)): analytical
methods provide solutions only to simple models, and other numerical methods are convenient
only in low-dimensional problems, so the development of efficient computers has allowed
statisticians to apply MCMC methods to solve more complicated (and more realistic)
problems.

Some software packages like BUGS (see Gilks et al. (1994)) have been developed for
Bayesian modelling, but they are generally unable to cope with nonparametric intensity
models. The variety of different (Bayesian) models is huge, thus building for all of them a
general software package which is easy to use, flexible, and efficient may be close to
impossible in the near future.
This package has been developed for estimating a subset of nonparametric Bayesian
intensity models, which provide a flexible framework for modelling various life history
phenomena with continuous distributions, such as survival analysis and event-history data.
It has been applied, for example, to heart transplant data (see section 6.1), dental caries
data (see Härkänen et al. (2000a) and Härkänen et al. (2000b)), infections in kidneys (see
section 6.2), and treatment effects on leukemia (see Härkänen (2000)). For a review of the
theoretical background of intensity models see, for example, Arjas (1989) or Andersen et al.
(1993), which contains more examples of intensity models and classical estimation procedures.
Section 2 contains an introduction to the notations. The response is a point process in
one-dimensional space (time) which may be independently and non-informatively censored
or filtered. The probabilistic model for the point process responses is a Poisson process
(which will be presented in section 2.2), parametrized by a non-negative intensity function
which is here approximated by a piecewise constant function (see subsection 2.3). The class
of piecewise constant functions provides a good framework for intensity models, because
addition or multiplication of those functions results again in a piecewise constant function. A
time independent intensity (yielding a homogeneous Poisson process) is a special case of a
piecewise constant function.

Some elements of the point process can be random, and also time independent covariates
and startpoints can be given prior distributions, with some limitations. This topic, as well
as the prior distributions for the intensity functions, will be introduced in section 4.
Variants of the Metropolis-Hastings algorithm (which is described in Gilks et al. (1996)
among other MCMC methods) are used for generating a sample from a Markov chain which
has the posterior distribution as its limiting (invariant) distribution. Posterior expectations
can then be approximated by appropriate averages of the sample. This approach (as well
as most numerical methods) has drawbacks, which will be discussed more in section 5.
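As a rough illustration of this idea only (a minimal random-walk Metropolis sketch in Python, not BITE's actual sampler; the target density and tuning constants below are made up), a posterior expectation is approximated by averaging a Markov chain whose invariant distribution is proportional to the unnormalized posterior:

    import math
    import random

    def metropolis(log_post, x0, n_iter=20000, step=0.5, burn_in=2000):
        """Random-walk Metropolis: the chain's limiting distribution is
        proportional to exp(log_post)."""
        x, lp = x0, log_post(x0)
        chain = []
        for _ in range(n_iter):
            y = x + random.gauss(0.0, step)            # symmetric proposal
            lp_y = log_post(y)
            if math.log(random.random()) < lp_y - lp:  # Metropolis accept/reject
                x, lp = y, lp_y
            chain.append(x)
        return chain[burn_in:]

    # Hypothetical unnormalized log-posterior of a scalar parameter:
    sample = metropolis(lambda x: -0.5 * (x - 1.0) ** 2, x0=0.0)
    posterior_mean = sum(sample) / len(sample)         # approximates E(x | data)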
Section 3 describes the syntax of the commands which can be used for describing (intensity)
models, and some examples of using BITE are presented in section 6. The output of BITE
is presented in section 7. The discussion in section 8 lists some software for Bayesian
inference.
2 Terminology
The basic unit for models is an individual $i = 0, 1, \ldots, N-1$, where $N$ is the number
of individuals. The individuals can experience events $(T, X)$, where $T$ is the event time
and $X$ the event type. The event types are called marks: the response is a marked point
process (MPP) with a finite mark space $X \in E := \{E_j;\ j \in J\}$, where the index set $J$
contains $M$ elements. The response data is considered to be an $N \times M$ matrix whose
elements contain the marked point process observations of individual $i$ and of mark
$j$: the elements are more complicated than just real numbers of an ordinary matrix, and
they are described in more detail below. In order to keep the notation more readable, the
observations corresponding to individual $i$ and mark $j$ are denoted by $D_{ij}$.
For example, in a simple survival analysis the mark space has only one element (death),
with just one event time for each individual, and after death the individual is no longer in
the risk set. In the case of repeating events, for example when observing epileptic seizures,
the mark space again has just one element (seizure), but there can be many event times, and
the individual remains in the risk set until the end of the follow-up.
Figure 1: An example of a Markovian model, with states $A_1$ = "healthy", $A_2$ = "ill",
$A_3$ = "dead" and transitions $E_{1,2}$, $E_{2,1}$, $E_{1,3}$, $E_{2,3}$.
Figure 1 presents a directed acyclic graph (DAG) of a sample illness-death model. At
each time point individual $i$ is in one of the following states: $A_1$ := "healthy", $A_2$ := "ill", and
$A_3$ := "dead". The events can be presented as transitions between states: $E_{k',k''}$ := "move
from $A_{k'}$ to $A_{k''}$", and the transition times are called event times $T_k$. Since events $E_{3,1}$ and
$E_{3,2}$ are not possible, and the transitions $E_{k',k'}$ are counted as "nothing happened", $M = 4$.
If an individual $i$ is in state $A_{k'}$ at time $t$, the transitions $(E_{k'',k'''})_{k'' \neq k'}$ are not possible in
the very near future $(t, t+dt)$: the individual is not in the risk sets of those event types
until individual $i$ moves to state $A_{k''}$.
A counting process $N(t)$ is an equivalent presentation of a marked point process, which
is a sequence of $K$ event times $T_k$ on $\mathbb{R}_+$. A counting process tells how many events of type
$j$ have been counted by time $t$:
\[
N(t) := \sum_{k=1}^{K} 1\{T_k \le t,\ X_k = j\}, \qquad N(0) := 0.
\]
The response is often thought of as counting processes $N(\cdot)$, as presented in Andersen et al.
(1993).
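As a small concrete illustration of this definition (plain Python with made-up data, not BITE syntax), the counting process for one mark can be evaluated directly from a list of (event time, mark) pairs:

    def counting_process(events, mark, t):
        """N(t): number of events of the given mark with T_k <= t."""
        return sum(1 for (T_k, X_k) in events if T_k <= t and X_k == mark)

    # Hypothetical observations (T_k, X_k) for one individual:
    events = [(0.8, "seizure"), (2.5, "seizure"), (4.1, "seizure")]
    print(counting_process(events, "seizure", 3.0))   # -> 2 events by time t = 3.0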
2.1 Filtering and censoring
Often a response is observed only incompletely, and three common missing data mechanisms
are filtering, censoring, and left truncation. In Andersen et al. (1993)[pp. 161-168] a
general filtering is defined as a union of temporal observation intervals $D_\tau$ (indexed by
$\tau = 1, 2, \ldots, T_{ij}$) which contain the endpoints of the intervals (initial points $t_{ij\tau 0}$ and
termination points $t_{ij\tau 1}$) and the $K_{ij\tau}$ event times (i.e. the points of the point process), as a
vector of two components: $D_\tau := \bigl((t_{ij\tau 0}, t_{ij\tau 1}],\ \bigcup_{l=1}^{K_{ij\tau}} \{T_{ij\tau l}\}\bigr)$. The observations corresponding to
individual $i$ and mark $j$ (omitting the indices $i$ and $j$) are the union of the observation
intervals
\[
D\ (= D_{ij}) := \Bigl( \bigcup_{\tau=1}^{T} (t_{\tau 0}, t_{\tau 1}],\ \bigcup_{\tau=1}^{T} \bigcup_{l=1}^{K_\tau} \{T_{\tau l}\} \Bigr),
\]
for which it holds that $t_{\tau 0} < T_{\tau 1} < T_{\tau 2} < \cdots < T_{\tau K_\tau} \le t_{\tau 1}$, and $t_{\tau 1} \le t_{\tau+1,0}$ for all $\tau$. Filtering
means that the events are recorded only on the observation intervals:
\[
N^F(t) := \int_0^t 1_{\bigcup_\tau (t_{\tau 0}, t_{\tau 1}]}(s)\, dN(s).
\]
The filtered event times $T_{\tau k}$ are a subset of all event times $T_k$: $\bigcup_\tau \{T_{\tau k}\}_k \subseteq \{T_k\}_k$, and
$\bigcup_{\tau=1}^{T} (t_{\tau 0}, t_{\tau 1}]$ is a subset of the time when individual $i$ is in the risk set for the event $E_j$.
Censoring means that the counting process is observed only outside the censoring time
set $S = \bigcup_k S_k$:
\[
N^C(t) := N\bigl(\sup\{s : s \in S^c \cap [0, t]\}\bigr).
\]
For example, $(t_{\tau 0}, t_{\tau 1}] = [0, U_i]$ and $S = (U_i, \infty)$ corresponds to right censoring; and $t_{\tau 1} = t_{\tau+1,0} = u_{i\tau}$
and $S = \bigcup_\tau (t_{\tau 0}, t_{\tau 1})$ (with $(K_\tau)$ known, but $(T_{\tau 1}, \ldots, T_{\tau K_\tau})$ unknown) to
interval censoring, where $U_i$ is the right censoring time and $(u_{i\tau})$ the times when the
counting process values are observed.
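To make the two mechanisms concrete, the following sketch (an illustration in Python with hypothetical data, not part of BITE) applies the filtering and right-censoring definitions above to a list of event times:

    def filtered_count(event_times, intervals, t):
        """N^F(t): only events falling in some observation interval (t0, t1] are counted."""
        return sum(1 for T in event_times if T <= t
                   and any(t0 < T <= t1 for (t0, t1) in intervals))

    def right_censored_count(event_times, U, t):
        """N^C(t) = N(min(t, U)) for a right-censoring time U."""
        return sum(1 for T in event_times if T <= min(t, U))

    # Hypothetical data: events at 1.2, 3.4, 5.0; observation intervals (0,2] and (3,4].
    events = [1.2, 3.4, 5.0]
    print(filtered_count(events, [(0.0, 2.0), (3.0, 4.0)], t=6.0))  # -> 2
    print(right_censored_count(events, U=4.0, t=6.0))               # -> 2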
Left truncation is typical in demography and epidemiology. It means that (loosely
speaking) the individuals in the available data have survived until the beginning $V_i$ of the
follow-up period, thus the results are conditional on the event $\{V_i < T_i\}$, where $T_i$ is the life
time of individual $i$. Note that left truncation is not the same thing as left filtration: in
both cases the number of events before time $V_i$ is unknown, but filtering does not exclude
individuals from the analysis.

Filtering, censoring, and left truncation are here assumed to be independent and non-informative
(that is, information about the process causing incompleteness in observations
just before time $t$ does not change the distribution of the counting process after that time,
and the parameters of that process can be viewed as nuisance parameters, respectively).
2.2 Poisson process
A Poisson process model is a way of modelling point process data $(T_k)_{k=1}^{K}$. A homogeneous
Poisson process is parametrized by a non-negative real parameter $\lambda$. The interarrival
times $T_k - T_{k-1}$ are exponentially distributed with the same $\lambda$. The higher the parameter
value is, the more event times the process generates per time unit. If the intensity
equals 0, then there are no events.
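A homogeneous Poisson process can therefore be simulated directly from this interarrival property. The sketch below (plain Python with an arbitrary rate, not BITE input) accumulates exponential gaps into event times:

    import random

    def homogeneous_poisson(lam, t_max, rng=random.Random(1)):
        """Event times on (0, t_max] with Exp(lam) interarrival times."""
        times, t = [], 0.0
        while True:
            t += rng.expovariate(lam)     # T_k - T_{k-1} ~ Exp(lam)
            if t > t_max:
                return times
            times.append(t)

    events = homogeneous_poisson(lam=0.5, t_max=10.0)  # about lam * t_max = 5 events on average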
Figure 2: An example of an intensity function $\lambda(t)$, and a possible realization from a
corresponding Poisson process (event times $T_1, \ldots, T_5$ on the time axis).
A generalization of that process is a non-homogeneous Poisson process whose parameter
is a non-negative intensity function $\lambda(t) \ge 0$ of time. A counting process $N$ is a Poisson
process with an intensity $\lambda(t)$ for all $t \ge 0$, if

1. $N$ has independent increments, and
2. $N(t) - N(s) \sim \mathrm{Poisson}\bigl(\int_s^t \lambda(u)\,du\bigr)$ for all $0 < s < t$.

By the properties of the Poisson distribution, the expected number of events on the interval
$(s, t]$ is $\int_s^t \lambda(u)\,du$, i.e., the volume of the area between $\lambda(t)$ and the time axis, restricted
by the times $s$ and $t$. Thus the intensity function gives an intuitive way of describing the
probability model: see Figure 2, in which the number of events is 5.
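These two properties also suggest a simple way of simulating such a process. The sketch below (an illustrative Python fragment, not BITE code) uses Lewis-Shedler thinning with a hypothetical piecewise constant intensity bounded by lam_max; the expected number of events on $(0, t_{\max}]$ is the integral of $\lambda$:

    import random

    def simulate_nhpp(intensity, lam_max, t_max, rng=random.Random(0)):
        """Thinning: candidate times from a rate-lam_max homogeneous process
        are kept with probability intensity(t) / lam_max."""
        times, t = [], 0.0
        while True:
            t += rng.expovariate(lam_max)
            if t > t_max:
                return times
            if rng.random() < intensity(t) / lam_max:
                times.append(t)

    # Hypothetical piecewise constant intensity: 0.5 on (0, 5], 2.0 on (5, 10].
    lam = lambda t: 0.5 if t <= 5.0 else 2.0
    events = simulate_nhpp(lam, lam_max=2.0, t_max=10.0)
    # E(number of events) = 0.5 * 5 + 2.0 * 5 = 12.5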
These classical Poisson processes give few possibilities for modelling the effects of different
events (and event times) observed by time $t$ on the future risk after time $t$. Another
(modern) definition is based on martingale theory, allowing an intensity function $\lambda(t)$ to
depend on the pre-$t$ history:

A counting process $N$ is a Poisson process with an intensity function $\lambda$, if
\[
N(t) - \int_0^t \lambda(u)\,du \qquad (1)
\]
is a martingale for all $t$. (See Arjas (1989) or Andersen et al. (1993) for a proper treatment
of the topic.) Loosely speaking, the compensator $\int_0^t \lambda(u)\,du$ (cumulative intensity) increases
on average at the same speed as the counting process $N(t)$, thus the difference (1) fluctuates
around zero.
The resulting model has many similarities with non-homogeneous Poisson processes.
The Poisson likelihood of the point process response $D$ ($= D_{ij}$, omitting the indices for
individual $i$ and mark $j$) given the hazard rate $\tilde{\lambda}_{ij}(\cdot)$ is
\[
P(D \mid \tilde{\lambda}(\cdot)) = \prod_{\tau=1}^{T} \Bigl[ \prod_{l=1}^{K_\tau} \tilde{\lambda}(T_{\tau l}) \Bigr] \exp\Bigl\{ -\int_{t_{\tau 0}}^{t_{\tau 1}} \tilde{\lambda}(s)\,ds \Bigr\}.
\]
The complete Poisson likelihood (assuming conditional independence) is constructed as a
product of the individual- and mark-specific likelihoods:
\[
P(D \mid \tilde{\lambda}) = \prod_{i=1}^{N} \prod_{j=1}^{M} P(D_{ij} \mid \tilde{\lambda}_{ij}(\cdot)).
\]
A hazard function $\tilde{\lambda}(t)$ gives the probability $\tilde{\lambda}(t)\,dt$ that an event $E_j$ occurs for individual
$i$ during $(t, t+dt]$, if the individual is at risk at time $t$. The value of the hazard function
at time $t$ can depend on the pre-$t$ observations, enabling the use of covariate information
and the event time history observed by that time. The intensity function equals the hazard
function if the individual is at risk, and zero if not:
\[
\lambda_{ij}(t) = \tilde{\lambda}_{ij}(t) \cdot 1_{R_{ij}}(t),
\]
where $R_{ij} = \bigcup_\tau (t_{\tau 0}, t_{\tau 1}]$ is the temporal set indicating when individual $i$ is in the risk set
for the event $E_j$.
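For a piecewise constant hazard, both the product over event times and the integral in the likelihood above have simple closed forms. The sketch below (an illustrative Python fragment under that assumption, not BITE's internal code) evaluates the log-likelihood contribution of one individual and mark from the jump points and levels of the hazard and the observed intervals with their event times:

    import math

    def hazard_value(jumps, levels, t):
        """Piecewise constant hazard: levels[k] on (jumps[k], jumps[k+1]], 0 outside."""
        for k in range(len(levels)):
            if jumps[k] < t <= jumps[k + 1]:
                return levels[k]
        return 0.0

    def cumulative_hazard(jumps, levels, a, b):
        """Integral of the hazard over (a, b]."""
        total = 0.0
        for k in range(len(levels)):
            lo, hi = max(a, jumps[k]), min(b, jumps[k + 1])
            if hi > lo:
                total += levels[k] * (hi - lo)
        return total

    def log_likelihood(jumps, levels, observations):
        """observations: list of ((t0, t1), [event times]) pairs, as in D_ij."""
        ll = 0.0
        for (t0, t1), event_times in observations:
            ll += sum(math.log(hazard_value(jumps, levels, T)) for T in event_times)
            ll -= cumulative_hazard(jumps, levels, t0, t1)
        return ll

    # Hypothetical hazard: 0.2 on (0, 5], 0.1 on (5, 10]; one interval (0, 8] with an event at 3.0.
    print(log_likelihood([0.0, 5.0, 10.0], [0.2, 0.1], [((0.0, 8.0), [3.0])]))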
2.3 Step functions

A step function $f$ (Ref.) with support on $(0, T_{\max}]$ is piecewise constant:
\[
f(t) := \begin{cases} \sum_{k=0}^{M} a_k\, 1\{t \in (T_k, T_{k+1}]\} & t \in (0, T_{\max}] \\ 0 & t \notin (0, T_{\max}]. \end{cases}
\]
During the interval $(T_k, T_{k+1}]$ the function gets the value $a_k$, and the jump points $T_k$ are an
increasing sequence of real numbers with $T_0 \ge 0$.

Figure 3: An example of a step function.
A step function is used as a nonparametric building block of a hazard function, because
it allows flexibility in building hazard functions. Let $G$ be the set of all non-negative step
functions and $f, g \in G$, $a, b \in [0, \infty)$, and $c \in \mathbb{R}$. Then $G$ is closed under the following
operations:

1. $af + bg \in G$ (linearity);
2. $fg \in G$ (multiplication);
3. $f \amalg g \in G$ (coproduct);
4. $S_c f \in G$ (shifting).
The coproduct operator is defined here as
\[
f \amalg g := \begin{cases} f(t) \cdot g(t) & \text{if } t \in \mathrm{supp}(f) \cap \mathrm{supp}(g) \\ f(t) + g(t) & \text{if } t \notin \mathrm{supp}(f) \cap \mathrm{supp}(g), \end{cases}
\]
where the function $\mathrm{supp}(\cdot)$ denotes the support of the argument function. $\amalg$ is commutative
($f \amalg g = g \amalg f$) and associative ($(f \amalg g) \amalg h = f \amalg (g \amalg h)$), and its precedence is higher
than that of multiplication: $f \cdot (g \amalg h) = f \cdot g \amalg h \neq (f \cdot g) \amalg h$. Note that
$(f+g) \amalg h \neq f \amalg h + g \amalg h$, and $(f \cdot g) \amalg h \neq f \amalg h \cdot g \amalg h$ in general. The coproduct
will be discussed more in subsection 3.8.
The shift operator $S_c$ shifts the trailing function to the right by $c$:
\[
S_c f(t) := f(t - c).
\]
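A minimal sketch of these operations on step functions represented by jump points and levels (illustrative Python only; the names and values are made up and this is not how BITE stores model functions):

    class StepFn:
        """Non-negative step function: value levels[k] on (jumps[k], jumps[k+1]],
        and 0 outside the support (jumps[0], jumps[-1]]."""
        def __init__(self, jumps, levels):
            self.jumps, self.levels = jumps, levels

        def __call__(self, t):
            for k, a_k in enumerate(self.levels):
                if self.jumps[k] < t <= self.jumps[k + 1]:
                    return a_k
            return 0.0

        def in_support(self, t):
            return self.jumps[0] < t <= self.jumps[-1]

    def shift(f, c):
        """S_c f(t) = f(t - c): move the jump points to the right by c."""
        return StepFn([T + c for T in f.jumps], f.levels)

    def coproduct(f, g):
        """f(t)*g(t) where both supports overlap, f(t)+g(t) elsewhere."""
        return lambda t: (f(t) * g(t) if f.in_support(t) and g.in_support(t)
                          else f(t) + g(t))

    # Hypothetical example: a baseline on (0, 10] combined with a shifted component on (2, 5].
    baseline = StepFn([0.0, 5.0, 10.0], [0.5, 1.0])
    component = StepFn([0.0, 3.0], [2.0])
    h = coproduct(baseline, shift(component, 2.0))
    print(h(4.0), h(7.0))   # 0.5*2.0 = 1.0 on the overlap; 1.0 + 0.0 = 1.0 outside it

A full implementation would merge the jump points so that the result is again a step function of the same form; here the coproduct is returned as a plain function for brevity.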
Later on in this text, a model function refers to a step function equipped with prior
distributions and covariate information.
2.4 Covariates
There are three covariate types which BITE can handle: constant covariates, time dependent
covariates, and startpoints.

constant This covariate does not depend on time, so covariates like gender can be expressed
in this manner. This is a special case of the timedependent covariates (see below),
but it is easier to use, and it can be assigned a prior distribution.
timedependent Let us assume that an individual can move between a finite number of
states, and the intensity always depends on the current state. Then this kind of
covariate can act like a timedependent indicator to select which function is taken
into the model. E.g., a smoking individual might have a high risk of dying, but that
risk decreases when he/she stops smoking. Also the risk might depend on age. In this
case the covariate is time dependent, having 2 states. Timedependent covariates are
treated here as piecewise constant.

An alternative to this is to have a marked point process with two event types: death
while being a smoker ($j = 2$), and death while not being a smoker ($j = 1$). Smoking
determines when the individual is in the risk set $j$: $R_j = \bigcup_\tau (t_{\tau 0}, t_{\tau 1}]$. If individual $i$ is
a smoker in the beginning, then he/she stops smoking at time $t_{i,1,1,1} = t_{i,2,1,0}$, starts
again at time $t_{i,2,1,1} = t_{i,1,2,0}$, etc. The union $\bigcup_{j=1}^{2} R_j$ covers the follow-up time.
startpoints When building hazard functions, all model function components need startpoints.
For example, a function which is a function of calendar time may contain a
model function component that depends on age. Then the time of birth is the
startpoint $c$ for that function. When modelling seasonal effects, the step function
needs to be copied to the intensity several times, and the startpoints are the initial
times $(c_{i1}, c_{i2}, \ldots, c_{i m_i})$ (where $m_i$ is the number of startpoints for individual $i$) of
the seasonal cycles. A startpoint is the argument $c$ of the shift operator $S_c$. See also
subsection 3.8. (A small illustration combining the three covariate types follows this list.)
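Continuing the step function sketch from subsection 2.3, the fragment below (illustrative Python with made-up component names, cut points, and values; not BITE commands) shows how the three covariate types can enter an individual hazard: a constant covariate as a multiplicative regression term, a timedependent indicator selecting an extra risk factor, and a startpoint as the argument of the shift operator:

    import math

    # Hypothetical components, reusing StepFn and shift from the subsection 2.3 sketch.
    baseline   = StepFn([0.0, 50.0, 100.0], [0.01, 0.03])   # on the calendar-time scale
    age_effect = StepFn([0.0, 20.0, 80.0], [1.0, 2.0])      # on the age scale
    smoking    = StepFn([30.0, 45.0], [1.0])                # 1 while smoking, 0 otherwise

    def individual_hazard(t, gender=1.0, beta=0.7, birth_time=25.0, smoker_rr=3.0):
        """Hypothetical hazard combining the three covariate types."""
        age_term = shift(age_effect, birth_time)(t)         # startpoint: shift by time of birth
        rr = smoker_rr if smoking(t) > 0 else 1.0           # timedependent indicator
        return baseline(t) * age_term * math.exp(beta * gender) * rr

    print(individual_hazard(60.0))   # baseline(60) * age_effect(35) * exp(0.7) * 1.0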