
Reference Manual

Tommi Härkänen*

March 24, 2004

The BITE software and this documentation are © Copyright 2000 by Tommi Härkänen, Rolf Nevanlinna Institute. All rights reserved. Reproductions for personal use are allowed. See also file copyright.txt. The

software and the documentation are provided without warranty of any kind.

*Rolf Nevanlinna Institute, P.O.B. 4, FIN-00014 University of Helsinki, Finland, E-mail:

Tommi.Harkanen@RNI.helsinki.fi, tel. +358-9-191 22784, FAX +358-9-191 22779.

Contents

1 Introduction 5

2 Terminology 6

2.1 Filtering and censoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 Poisson process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.3 Step functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.4 Covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.5 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Installation and description of commands 12

3.1 Installation and running BITE . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.2 Basic elements of the BITE syntax . . . . . . . . . . . . . . . . . . . . . . . 13

3.3 Setting simulation limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.4 Loading data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.5 Setting prior variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.6 Initializing model functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.7 Saving Markov chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.8 Intensity model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.9 Monitoring updating of model functions . . . . . . . . . . . . . . . . . . . . 19

3.10 Calculating posterior expectation . . . . . . . . . . . . . . . . . . . . . . . . 19

3.11 Graphical output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.12 Error messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4 Prior distributions 23

4.1 Scalar parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.2 Covariates and responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.3 Model functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.3.1 Gamma prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.3.2 Increasing gamma prior . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.3.3 Time independent prior . . . . . . . . . . . . . . . . . . . . . . . . . 27

5 Markov chain Monte Carlo 27

6 Examples 27

6.1 Heart transplant survival data . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6.1.1 Model and estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6.1.2 Data files for BITE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

6.1.3 Command file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

6.2 Survival analysis with a frailty model . . . . . . . . . . . . . . . . . . . . . . 32

6.3 Dental caries study I: standard frailty model . . . . . . . . . . . . . . . . . 36

7 The output of BITE 41

8 Discussion 43

A The grammar of BITE 45

List of Figures

1 An example of a Markovian model. . . . . . . . . . . . . . . . . . . . . . . 6

2 An example of an intensity function λ(t), and a possible realization from a

corresponding Poisson process. . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 An example of a step function . . . . . . . . . . . . . . . . . . . . . . . . . . 9

4 A template for a hierarchical intensity model accepted by BITE. . . . . . . . 11

5 Example of point process data. . . . . . . . . . . . . . . . . . . . . . . . . . 16

6 The estimates of the baseline hazards λ and f of the heart transplant data. 29

7 Baseline hazard estimate of the kidney example. . . . . . . . . . . . . . . . 33

List of Tables

1 Poisson-likelihood contributions and the notations of the parameters in BITE. 12

2 Basic tokens of BITE grammar. . . . . . . . . . . . . . . . . . . . . . . . . . 13

3 Internal operators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4 Available dependencies of (prior) distributions. . . . . . . . . . . . . . . . . 24

5 The probability distributions recognized by BITE. . . . . . . . . . . . . . . . 24

6 Estimates of the regression coefficients in the heart transplant data. . . . . 29

7 Data files of the heart transplant example. . . . . . . . . . . . . . . . . . . 30

8 The estimates of the Bayesian model with a nonparametric baseline hazard. 34

9 The data files of the kidney example. . . . . . . . . . . . . . . . . . . . . . 36

10 Data files of caries I example. . . . . . . . . . . . . . . . . . . . . . . . . . 37

11 Some software packages for Bayesian inference. . . . . . . . . . . . . . . . . 43

12 The tokens of BITE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

1 Introduction

Markov chain Monte Carlo (MCMC) methods have had a profound effect on statistical methodology based on Bayesian inference (see, for example, Gelman et al. (1995)): analytical methods provide solutions only to simple models, thus the development of efficient computers allowed statisticians to apply numerical methods to solve more complicated (and more realistic) problems, and other numerical methods are convenient only in low-dimensional problems.

Some software packages like BUGS (see Gilks et al. (1994)) have been developed for Bayesian modelling, but they are generally unable to cope with nonparametric intensity models. The variety of different (Bayesian) models is huge, thus building a general software package for all of them which is easy to use, flexible, and efficient at the same time may be close to impossible in the near future.

This package has been developed for estimating a subset of nonparametric Bayesian intensity models, which provide a flexible framework for modelling various life history phenomena with continuous distributions, such as survival analysis and event-history data. Examples include analyses of heart transplant data (see section 6.1), dental caries data (see Härkänen et al. (2000a) and Härkänen et al. (2000b)), infections in kidneys (see section 6.2), and treatment effects on leukemia (see Härkänen (2000)). For a review of the theoretical background of intensity models see, for example, Arjas (1989) or Andersen et al. (1993), which contains more examples of intensity models and classical estimation procedures.

Section 2 contains an introduction to the notations. The response is a point process in one-dimensional space (time) which may be independently and non-informatively censored or filtered. The probabilistic model for the point process responses is a Poisson process (which will be presented in section 2.2) parametrized by a non-negative intensity function, which is here approximated by a piecewise constant function (see subsection 2.3). The class of piecewise constant functions provides a good framework for intensity models, because addition or multiplication of such functions results again in a piecewise constant function. A time independent intensity (yielding a homogeneous Poisson process) is a special case of a piecewise constant function.

Some elements of the point process can be random, and also time independent covariates and startpoints can be given prior distributions, with some limitations. This topic, as well as the prior distributions for the intensity functions, will be introduced in section 4.

Variants of the Metropolis-Hastings algorithm (which is described in Gilks et al. (1996) among other MCMC methods) are used for generating a sample from a Markov chain which has the posterior distribution as its limiting (invariant) distribution. Posterior expectations can then be approximated by appropriate averages of the sample. This approach (as well as most numerical methods) has drawbacks, which will be discussed further in section 5.

Section 3 describes the syntax of the commands which can be used for describing (intensity) models, and some examples of using BITE are presented in section 6. The output of BITE is presented in section 7. The discussion in section 8 lists some software for Bayesian inference.


2 Terminology

The basic unit for models is an individual i = 0, 1, …, N−1, where N is the number of individuals. The individuals can experience events (T, X), where T is the event time and X the event type. The event types are called marks: the response is a marked point process (MPP) with a finite mark space X ∈ E := {E_j : j ∈ J}, where the index set J contains M elements. The response data is considered to be an N × M matrix whose elements contain the marked point process observations of individual i and of mark j: the elements are more complicated than just the real numbers of an ordinary matrix, and they are described in more detail below. In order to keep the notation more readable, the observations corresponding to individual i and mark j are denoted by D_ij.
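The N × M response layout can be sketched as a plain data structure. This is an illustration only; the field names (`intervals`, `events`) and the helper `make_observation` are hypothetical and do not reflect BITE's internal format.

```python
# A sketch of the N x M response layout: each cell D[i][j] holds the
# marked-point-process observations for individual i and mark j.
# Field names are illustrative only, not BITE's internal format.

def make_observation(intervals, events):
    """intervals: list of (t0, t1] observation windows; events: event times."""
    return {"intervals": list(intervals), "events": sorted(events)}

N, M = 2, 1  # two individuals, one mark (e.g. "death")
D = [[make_observation([(0.0, 5.0)], [3.2])],   # individual 0: event at t = 3.2
     [make_observation([(0.0, 5.0)], [])]]      # individual 1: no event observed

print(D[0][0]["events"])  # [3.2]
```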

For example, in a simple survival analysis the mark space has only one element (death)

with just one event time for each individual, and after death the individual is no longer in

the risk set. In a repeating events case, for example when observing epileptic seizures, the

mark space has again just one element (seizure), but there can be many event times, and

the individual remains in the risk set until the end of the follow-up.

[Figure: illness-death model with states A_1 = "healthy", A_2 = "ill", A_3 = "dead" and transitions E_{1,2}, E_{2,1}, E_{1,3}, E_{2,3}.]

Figure 1: An example of a Markovian model.

Figure 1 presents a directed acyclic graph (DAG) of a sample illness-death model. At each time point individual i is in one of the following states: A_1 = "healthy", A_2 = "ill", and A_3 = "dead". The events can be presented as transitions between states: E_{k',k''} := "move from A_{k'} to A_{k''}", and the transition times are called event times T_k. Since the events E_{3,1} and E_{3,2} are not possible, and the transitions E_{k',k'} are counted as "nothing happened", M = 4. If an individual i is in state A_{k'} at time t, the transitions E_{k'',k'''} (k'' ≠ k') are not possible in the very near future (t, t + dt): the individual is not in the risk sets of those event types until individual i moves to state A_{k''}.

A counting process N(t) is an equivalent presentation of a marked point process, which is a sequence of K event times T_k on R_+. A counting process tells how many events of type j are counted by time t:

    N_j(t) := Σ_{k=1}^{K} 1{T_k ≤ t, X_k = j},    N_j(0) := 0.

The response is often thought of as counting processes N_j(·), as presented in Andersen et al. (1993).
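The counting-process definition above translates directly into code; a minimal sketch:

```python
# Counting process N_j(t): the number of events of type j observed by time t,
# i.e. N_j(t) = sum_k 1{T_k <= t, X_k = j}.

def counting_process(event_times, marks, j, t):
    return sum(1 for T_k, X_k in zip(event_times, marks) if T_k <= t and X_k == j)

times = [0.8, 1.5, 2.3, 4.0]
marks = [1, 2, 1, 1]
print(counting_process(times, marks, 1, 3.0))  # 2: events at 0.8 and 2.3
```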

2.1 Filtering and censoring

Often a response is observed only incompletely, and three common missing data mechanisms are filtering, censoring, and left truncation. In Andersen et al. (1993)[pp. 161-168] a general filtering is defined as a union of temporal observation intervals D_τ (indexed by τ = 1, 2, …, T_ij) which contain the endpoints of the intervals (initial points t_{ijτ0} and termination points t_{ijτ1}), and the K_{ijτ} event times (i.e. the points of the point process) as a vector of two components: D_τ := ((t_{ijτ0}, t_{ijτ1}], ∪_{l=1}^{K_{ijτ}} {T_{ijτl}}). The observations corresponding to individual i and mark j (omitting the indices i and j) is the union of the observation intervals

    D_ij := ( ∪_{τ=1}^{T} (t_{τ0}, t_{τ1}], ∪_{τ=1}^{T} ∪_{l=1}^{K_τ} {T_{τl}} ),

for which holds t_{τ0} < T_{τ1} < T_{τ2} < ⋯ < T_{τK_τ} ≤ t_{τ1}, and t_{τ1} ≤ t_{τ+1,0} for all τ. Filtering means that the events are recorded only on the observation intervals:

    N^F(t) := ∫_0^t 1{∪_τ (t_{τ0}, t_{τ1}]}(s) dN(s).

The filtered event times T_{τk} are a subset of all event times T_k: ∪_τ {T_{τk}}_k ⊆ {T_k}_k, and ∪_{τ=1}^{T} (t_{τ0}, t_{τ1}] is a subset of the time when individual i is in the risk set for event E_j.

Censoring means that the counting process is observed only outside the censoring time set S = ∪_k S_k:

    N^C(t) := N(sup{s : s ∈ S^c ∩ [0, t]}).

For example, (t_{τ0}, t_{τ1}] = [0, U_i] and S = (U_i, ∞) corresponds to right censoring; and t_{τ1} = t_{τ+1,0} = u_{iτ} and S = ∪_τ (t_{τ0}, t_{τ1}) (with (K_τ) known, but (T_{τ1}, …, T_{τK_τ}) unknown) to interval censoring, where U_i is the right censoring time and (u_{it})_t the times when the counting process values are observed.

Left truncation is typical in demography and epidemiology. It means (loosely speaking) that the individuals in the available data have survived until the beginning V_i of the follow-up period, thus the results are conditional on the event {V_i < T_i}, where T_i is the lifetime of individual i. Note that left truncation is not the same thing as left filtering: in both cases the number of events before time V_i is unknown, but filtering does not exclude individuals from the analysis.

Filtering, censoring, and left truncation are here assumed to be independent and non-informative (that is, information about the process causing incompleteness in observations just before time t does not change the distribution of the counting process after that time, and the parameters of the process can be viewed as nuisance parameters, respectively).
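The right-censoring case above can be sketched minimally; `right_censor` is an illustrative helper, not a BITE routine.

```python
# Right censoring at U: events after U are unobserved, and the observation
# window (t0, t1] becomes (0, U]. Returns the observed events and N^C(U).

def right_censor(event_times, U):
    observed = [T for T in event_times if T <= U]
    return observed, len(observed)

events = [1.2, 3.4, 6.7]
obs, n = right_censor(events, U=5.0)
print(obs, n)  # [1.2, 3.4] 2 -- the event at 6.7 is censored
```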

2.2 Poisson process

A Poisson process model is a way of modelling point process data (T_k)_{k=1}^{K}. A homogeneous Poisson process is parametrized by a non-negative real parameter λ. The interarrival times T_k − T_{k−1} are exponentially distributed with the same λ. The higher the parameter value is, the more event times the process generates per time unit. If the intensity equals 0, then there are no events.
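Because the interarrival times are i.i.d. Exponential(λ), a homogeneous Poisson process can be simulated by accumulating exponential waiting times; a minimal sketch:

```python
import random

# Homogeneous Poisson process on (0, t_max]: interarrival times are
# i.i.d. Exponential(lam), so the event times are cumulative sums of them.

def simulate_homogeneous(lam, t_max, seed=1):
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)  # exponential interarrival, rate lam
        if t > t_max:
            return times
        times.append(t)

events = simulate_homogeneous(lam=2.0, t_max=10.0)
# The expected number of events is lam * t_max = 20.
print(len(events))
```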

[Figure: an intensity function λ(t) with event times T_1, …, T_5 marked on the time axis between t_0 and t_1.]

Figure 2: An example of an intensity function λ(t), and a possible realization from a corresponding Poisson process.

A generalization of that process is the non-homogeneous Poisson process, whose parameter is a non-negative intensity function λ(t) ≥ 0 of time. A counting process N is a Poisson process with an intensity λ(t) for all t ≥ 0, if

1. N has independent increments, and
2. N(t) − N(s) ~ Poisson(∫_s^t λ(u) du) for all 0 < s < t.

By the properties of the Poisson distribution, the expected number of events on the interval (s, t] is ∫_s^t λ(u) du, i.e., the volume of the area between λ(t) and the time axis, restricted by times s and t. Thus the intensity function gives an intuitive way of describing the probability model: see Figure 2, in which the number of events is 5.
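A non-homogeneous Poisson process can be simulated by thinning (the Lewis-Shedler method, which is not part of BITE itself but illustrates the definition above): generate candidates at a dominating constant rate λ_max ≥ λ(t) and keep each with probability λ(t)/λ_max.

```python
import random

# Simulating a non-homogeneous Poisson process with intensity lam(t) by
# thinning: candidates arrive at rate lam_max, and each candidate at time t
# is accepted with probability lam(t) / lam_max.

def simulate_nonhomogeneous(lam, lam_max, t_max, seed=0):
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)        # candidate from a rate-lam_max process
        if t > t_max:
            return times
        if rng.random() < lam(t) / lam_max:  # accept with prob lam(t)/lam_max
            times.append(t)

# Piecewise-constant intensity: 0.5 on (0, 5], 2.0 on (5, 10].
events = simulate_nonhomogeneous(lambda t: 0.5 if t <= 5 else 2.0, 2.0, 10.0)
print(len(events))
```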

These classical Poisson processes give few possibilities for modelling the effects of different events (and event times) observed by time t on the future risk after time t. Another (modern) definition is based on martingale theory, allowing an intensity function λ(t) to depend on the pre-t history:

A counting process N is a Poisson process with an intensity function λ, if

    N(t) − ∫_0^t λ(u) du    (1)

is a martingale for all t. (See Arjas (1989) or Andersen et al. (1993) for a proper treatment of the topic.) Loosely speaking, the compensator ∫_0^t λ(u) du (the cumulative intensity) increases on average at the same speed as the counting process N(t), thus the difference (1) fluctuates around zero.

The resulting model has many similarities with non-homogeneous Poisson processes. The Poisson likelihood of the point process response D = D_ij (omitting the indices for individual i and mark j), given the hazard rate λ̃_ij(·), is

    p(D | λ̃(·)) = ∏_{τ=1}^{T} [ ∏_{l=1}^{K_τ} λ̃(T_{τl}) ] exp{ −∫_{t_{τ0}}^{t_{τ1}} λ̃(s) ds }.

The complete Poisson likelihood (assuming conditional independence) is constructed as a product of individual- and mark-specific likelihoods:

    p(D | λ̃) = ∏_{i=1}^{N} ∏_{j=1}^{M} p(D_ij | λ̃_ij(·)).
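For a piecewise-constant hazard, the log of the likelihood above for one individual and mark is a sum of log-hazards at the event times minus the integrated hazard over each observation interval. A minimal sketch (the helper names are illustrative, not BITE's):

```python
import math

# Poisson log-likelihood for one individual and mark with a piecewise-
# constant hazard: sum of log-hazard at event times minus the integral of
# the hazard over each observation window (t0, t1].

def step_hazard(jumps, values):
    """Hazard with value values[k] on (jumps[k], jumps[k+1]]."""
    def h(t):
        for (a, b), v in zip(zip(jumps, jumps[1:]), values):
            if a < t <= b:
                return v
        return 0.0
    return h

def integral(jumps, values, t0, t1):
    """Integral of the step hazard over (t0, t1]."""
    return sum(v * max(0.0, min(b, t1) - max(a, t0))
               for (a, b), v in zip(zip(jumps, jumps[1:]), values))

def log_likelihood(jumps, values, intervals, events):
    h = step_hazard(jumps, values)
    ll = sum(math.log(h(T)) for T in events)
    ll -= sum(integral(jumps, values, t0, t1) for t0, t1 in intervals)
    return ll

# Hazard 0.2 on (0, 5], 0.8 on (5, 10]; one window (0, 10]; one event at 6.3:
# log L = log(0.8) - (0.2*5 + 0.8*5) = log(0.8) - 5.
ll = log_likelihood([0.0, 5.0, 10.0], [0.2, 0.8], [(0.0, 10.0)], [6.3])
print(round(ll, 4))  # -5.2231
```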

A hazard function λ̃(t) presents the probability λ̃(t)dt that an event E_j occurs for individual i during (t, t + dt], if the individual is at risk at time t. The value of the hazard function at time t can depend on the pre-t observations, enabling the use of covariate information and event time history observed by that time. The intensity function equals the hazard function if the individual is at risk, and zero if not:

    λ_ij(t) = λ̃_ij(t) · 1_{R_ij}(t),

where R_ij = ∪_τ (t_{τ0}, t_{τ1}] is the temporal set indicating when individual i is in the risk set for the event E_j.

2.3 Step functions

A step function f (see Figure 3) with a support on (0, T_max] is piecewise constant:

    f(t) := { Σ_{k=0}^{M} a_k 1{t ∈ (T_k, T_{k+1}]}   if t ∈ (0, T_max]
            { 0                                        if t ∉ (0, T_max].

During the interval (T_k, T_{k+1}] the function takes the value a_k, and the jump points T_k are an increasing sequence of real numbers with T_0 ≤ 0.

[Figure: a step function f(t) with jump points T_1, …, T_5 on (0, T_max].]

Figure 3: An example of a step function
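The definition above can be sketched directly; this is a simplified illustration in which the last interval is taken to end at T_max.

```python
# A step function per the definition above: value a_k on (T_k, T_{k+1}],
# zero outside (0, t_max]. Simplification: the final interval ends at t_max.

def step_function(T, a, t_max):
    """T: increasing jump points with T[0] <= 0; a: values, len(a) == len(T)."""
    knots = T + [t_max]
    def f(t):
        if not (0 < t <= t_max):
            return 0.0
        for k in range(len(a)):
            if knots[k] < t <= knots[k + 1]:
                return a[k]
        return 0.0
    return f

f = step_function([0.0, 2.0, 4.0], [1.0, 3.0, 0.5], t_max=6.0)
print(f(1.0), f(3.0), f(7.0))  # 1.0 3.0 0.0
```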

A step function is used as a nonparametric building block of a hazard function, because it allows flexibility in building hazard functions. Let G be the set of all non-negative step functions and f, g ∈ G, a, b ∈ [0, ∞), and c ∈ R. Then G is closed under the following operations:

1. af + bg ∈ G (linearity);
2. fg ∈ G (multiplication);
3. f ⊕ g ∈ G (coproduct);
4. S_c f ∈ G (shifting).

The coproduct operator ⊕ is defined here as

    f ⊕ g := { f(t) · g(t)   if t ∈ supp(f) ∩ supp(g)
             { f(t) + g(t)   if t ∉ supp(f) ∩ supp(g),

where the function supp(·) denotes the support of the argument function. ⊕ is commutative (f ⊕ g = g ⊕ f) and associative ((f ⊕ g) ⊕ h = f ⊕ (g ⊕ h)), and its precedence is higher than that of multiplication: f · (g ⊕ h) = f · g ⊕ h = (f · g) ⊕ h. Note that (f + g) ⊕ h ≠ f ⊕ h + g ⊕ h, and (f · g) ⊕ h ≠ f ⊕ h · g ⊕ h in general. The coproduct will be discussed further in subsection 3.8.
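The coproduct can be sketched on arbitrary functions of t; as a simplification, the support is detected here by a nonzero value at the point, which suffices for the strictly positive step functions in the example.

```python
# Coproduct of two functions per the definition above: pointwise product
# where both supports overlap, pointwise sum elsewhere. Simplification:
# membership in the support is approximated by a nonzero function value.

def coproduct(f, g):
    def h(t):
        ft, gt = f(t), g(t)
        in_both = ft != 0.0 and gt != 0.0
        return ft * gt if in_both else ft + gt
    return h

f = lambda t: 2.0 if 0 < t <= 4 else 0.0   # support (0, 4]
g = lambda t: 3.0 if 2 < t <= 6 else 0.0   # support (2, 6]
h = coproduct(f, g)
print(h(1.0), h(3.0), h(5.0))  # 2.0 6.0 3.0
```

On the overlap (2, 4] the functions multiply; outside it they add, so the result is again a step function, as claimed by closure property 3.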

The shift operator S_c shifts the trailing function to the right by c:

    S_c f(t) := f(t − c).

Later on in this text, a model function refers to a step function equipped with prior distributions and covariate information.

2.4 Covariates

There are three covariate types which BITE can handle: constant covariates, time-dependent covariates, and startpoints.

constant This covariate does not depend on time, so covariates like gender can be expressed

in this manner. This is a special case of the timedependent covariates (see below),

but it is easier to use, and it can be assigned a prior distribution.

timedependent Let us assume that an individual can move between a finite number of states, and the intensity always depends on the current state. Then this kind of covariate can act like a time-dependent indicator to select which function is taken into the model. E.g., a smoking individual might have a high risk of dying, but that risk decreases when he/she stops smoking. Also the risk might depend on age. In this case the covariate is time dependent, having 2 states. Timedependent covariates are treated here as piecewise constant.

An alternative to this is to have a marked point process with two event types: death while being a smoker (j = 2), and death while not being a smoker (j = 1). Smoking determines when the individual is in the risk set j: R_ij = ∪_τ (t_{τ0}, t_{τ1}]. If individual i is a smoker in the beginning, then he/she stops smoking at time t_{i,1,1,1} = t_{i,2,1,0}, starts again at time t_{i,2,1,1} = t_{i,1,2,0}, etc. The union ∪_{j=1}^{2} R_ij covers the follow-up time.

startpoints When building hazard functions, all model function components need startpoints. For example, a function of calendar time may contain a model function component that depends on age. Then the time of birth is the startpoint c for that function. When modelling seasonal effects, the step function needs to be copied to the intensity several times, and the startpoints are the initial times (c_{i1}, c_{i2}, …, c_{im_i}) (where m_i is the number of startpoints for individual i) of the seasonal cycles. A startpoint is the argument c of the shift operator S_c. See also subsection 3.8.
