STATISTICS IN MEDICINE Statist. Med. 2002; 21:1663–1680 (DOI: 10.1002/sim.1110)
Commentary on ‘Using inverse weighting and predictive inference to estimate the effects of time-varying treatments on the discrete-time hazard’
James M. Robins, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115-6096, U.S.A.
Correspondence to: James M. Robins, Department of Epidemiology, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115-6096, U.S.A. E-mail: robins@hsph.harvard.edu. Received February 2001; Accepted February 2001. Copyright © 2002 John Wiley & Sons, Ltd.

1. INTRODUCTION

I would like to begin by thanking the editor Ralph D’Agostino for his recognition of the importance of the topic considered in Dawson and Lavori’s paper and for inviting this commentary. Dawson and Lavori (DL) propose several methods for estimating the effect of a monotone treatment pattern on the hazard function of a discrete failure time variable and compare their approaches to others previously proposed. In my opinion, DL’s methods in their Sections 1–3 are interesting and correct but are not the best available. However, I believe the methods DL propose in Sections 4 and 5 are biased without the addition of further, rather implausible, assumptions. Additionally, I feel that certain of DL’s comments indicate they are not fully aware of the motivation for and the range of the different methods proposed by myself and colleagues. My commentary is organized as follows. I first review DL’s methodological proposals in the order in which they appear in their paper and compare other proposals with theirs. In my final section, I review DL’s critique of methodologies proposed by myself and co-authors.

2. IGNORABLE TREATMENT ASSIGNMENT

DL’s assumption of sequential strongly ignorable treatment assignment is stronger than is needed to obtain the results in their Sections 1–3. Rather, given that treatment $Z_t$ is monotone (that is, subjects off treatment never restart) and $Y_t$ is a survival indicator, it is sufficient to impose the weaker assumption that for all $t$, $t' > t$ and $M$ in the set $\{c, d_s;\ s = t, t+1, \ldots\}$ of treatment regimes

$$\mathrm{pr}(Z_t = 1 \mid Z_{t-1} = 1, \bar X_t, Y_t = 0, Y_{Mt'}) = \mathrm{pr}(Z_t = 1 \mid Z_{t-1} = 1, \bar X_t, Y_t = 0) > 0$$

where $\bar X_t = (X_0, X_1, \ldots, X_t)$. This assumption states that among subjects surviving to $t$ who have remained on treatment through $t-1$ ($Z_{t-1} = 1$), the conditional probability of staying on treatment at $t$ given past covariate history $\bar X_t$ does not further depend on the future potential (that is, counterfactual) survival outcomes $Y_{Mt'}$ for any treatment regime $M$ that specifies continuous treatment at least through time $t-1$. I have referred to this assumption as the assumption of sequential randomization with respect to counterfactual survival or, equivalently, as the assumption of no unmeasured confounders for potential survival outcomes [1, 2]. This assumption is weaker than DL’s sequential ignorability assumption in that it does not require the different regimes $M$ to be considered jointly. More importantly, in contrast to DL’s assumption, it does not require that there be no unmeasured confounders for the counterfactual time-dependent covariate outcomes $X_t$. In references [1] and [2], I consider substantive settings in which the assumption of no unmeasured confounders holds only for potential survival outcomes (so DL’s ignorability assumption fails). I show that in such settings the effect of treatment on potential survival outcomes is still given by the G-computation algorithm formula estimand. This implies that the results obtained by DL in Sections 1–3 hold under this weaker assumption, as their results follow mathematically from the G-computation algorithm formula. For example, with $c$ denoting the continuous treatment regime, their estimator (2) is converging to $\lambda_c(t) = \mathrm{pr}(Y_{c(t+1)} = 1 \mid Y_{ct} = 0)$, which is the ratio of the G-computation estimand for $\mathrm{pr}(Y_{c(t+1)} = 1, Y_{ct} = 0)$ to that for $\mathrm{pr}(Y_{ct} = 0)$. Note that the fact that DL’s estimand of interest is based on the G-computation algorithm formula does not imply that the most robust way to estimate this estimand is with the plug-in G-computation algorithm estimator. Indeed, beginning with my original derivation of the G-computation algorithm estimand in 1986 [3], I have stressed that the G-computation algorithm estimator is rarely a robust estimation method.
DL reiterate this point in their Section 3.3 and Appendix B. All the methods co-workers and I have developed since 1986 were motivated by the need to find more robust estimators.
3. DOUBLY ROBUST ESTIMATION OF THE HAZARD UNDER CONTINUOUS TREATMENT

In this section, we argue that, except when the sample size is very small, one should use one of the doubly robust locally efficient estimators initially proposed by colleagues and myself [4–7] instead of the estimator proposed by DL in their Section 3. A thorough discussion of double robustness can be found in reference [35].

3.1. The special case t = 1

To focus on the central issue, it will be useful to first consider the simplest case of DL’s set-up in which $t = 1$. Specifically, let $X_0$ denote the column vector of baseline covariates, $Z_0$ denote the indicator of treatment at time 0, and $Y_0$ and $Y_1$ denote the survival indicators at times 0 and 1 ($Y_j = 1$ if a failure and $Y_j = 0$ if a survivor). Let $Y_{c0}$ and $Y_{c1}$ be a subject’s survival indicators at times 0 and 1 had, possibly contrary to fact, the subject been treated at time 0. Our goal is to estimate the discrete hazard $\lambda_c = \mathrm{pr}[Y_{c1} = 1 \mid Y_{c0} = 0]$ at time 1 under treatment. Because all subjects are survivors at time 0, $Y_{c0} = Y_0 = 0$ with probability 1, so that our estimand is $\lambda_c = \mathrm{pr}[Y_{c1} = 1]$. Under DL’s ignorability assumption that $Y_{c1} \perp\!\!\!\perp Z_0 \mid X_0$, we have that $\mathrm{pr}[Y_{c1} = 1 \mid X_0] = \mathrm{pr}[Y_1 = 1 \mid X_0, Z_0 = 1]$. Therefore, $\lambda_c = \mathrm{pr}[Y_{c1} = 1]$ is given by the G-computation algorithm formula estimand $\lambda_c = \int \mathrm{pr}[Y_1 = 1 \mid X_0, Z_0 = 1]\, dF(X_0)$ or equivalently by the inverse probability of treatment weighting (IPTW) formula $\lambda_c = E[Z_0 Y_1 / e(X_0)]$, where $e(X_0) = \mathrm{pr}[Z_0 = 1 \mid X_0]$ is the propensity score. In Section 3.1, DL suggest fitting the model $\mathrm{logit}\,\mathrm{pr}[Y_1 = 1 \mid X_0, Z_0 = 1] = \beta' X_0$ and then estimating $\lambda_c$ with the regression estimator $\hat\lambda_{reg} = n^{-1} \sum_i \mathrm{expit}\{\hat\beta' X_{0i}\}$, where $i$ always indexes the $n$ study subjects, $\mathrm{expit}(x) = e^x/(1 + e^x)$, $\hat\beta$ is the maximum likelihood estimator (MLE) of $\beta$ among subjects with $Z_0 = 1$, and 1 is a component of $X_0$ so that $\beta' X_0$ includes an intercept term. Note that in the special case in which $t = 1$, DL’s regression estimator does not require an estimate of the propensity score because the conditional law of $X_0$ given $Z_0 = 1$ is equal to its marginal law.

In contrast, Rosenbaum [8], Robins and Rotnitzky [9], and Heyting et al. [10] specified a propensity score model $\mathrm{pr}[Z_0 = 1 \mid X_0] = e(X_0; \alpha)$ with $e(X_0; \alpha) = \mathrm{expit}(\alpha' X_0)$ and then estimated $\lambda_c$ with the IPTW estimator $\hat\lambda_{IPTW} = \{\sum_i Z_{0i} Y_{1i} / e(X_{0i}; \hat\alpha)\} / \{\sum_i Z_{0i} / e(X_{0i}; \hat\alpha)\}$, where $\hat\alpha$ is the MLE of $\alpha$. In their Section 3, DL also mention the IPTW estimator. Now, in general, $\hat\lambda_{IPTW}$ will be consistent for $\lambda_c$ only if the propensity model $e(X_0; \alpha)$ is correct, while $\hat\lambda_{reg}$ will be consistent only if the regression model $\mathrm{expit}\{\beta' X_0\}$ is correct. When both models are correct, Robins and Wang [11] point out that $\hat\lambda_{reg}$ has smaller asymptotic variance than $\hat\lambda_{IPTW}$ and will have better sampling properties in very small samples. DL stress this latter point. However, since with observational data one can never be sure that either the propensity model or the regression model is correct, the best that can be hoped for is to find an estimator that is consistent asymptotically normal (CAN) for $\lambda_c$ if either (but not necessarily both) of the models is correct. Robins et al. [6] refer to an estimator $\hat\lambda_{dr}$ with this property as doubly robust or doubly protected.
In the following subsection and Appendix D, we review arguments in references [4–7] to show that the following estimators are doubly robust:

(i) the augmented regression estimator

$$\hat\lambda_{dr,aug} = \hat\lambda_{reg} + \frac{\sum_i Z_{0i}\,\{Y_{1i} - \mathrm{expit}(\hat\beta' X_{0i})\}/e(X_{0i}; \hat\alpha)}{\sum_i Z_{0i}/e(X_{0i}; \hat\alpha)};$$

(ii) the doubly robust regression estimator $\hat\lambda_{reg,dr} = n^{-1} \sum_i \mathrm{expit}(\tilde\beta' X_{0i} + \tilde\phi / e(X_{0i}; \hat\alpha))$, where $(\tilde\beta, \tilde\phi)$ is the MLE of $(\beta, \phi)$ in the extended regression model $\mathrm{logit}\,\mathrm{pr}[Y_1 = 1 \mid X_0, Z_0 = 1] = \beta' X_0 + \phi\, e(X_0; \hat\alpha)^{-1}$ that adds the ‘robustifying covariate’ $e(X_0; \hat\alpha)^{-1}$ to the regression model $\beta' X_0$; and

(iii) the iterated IPTW estimator $\hat\lambda_{IPTW,dr} = \{\sum_i Z_{0i} Y_{1i} / e(X_{0i}; \tilde\alpha, \tilde\theta)\} / \{\sum_i Z_{0i} / e(X_{0i}; \tilde\alpha, \tilde\theta)\}$, where $(\tilde\alpha, \tilde\theta)$ are the values at convergence (that is, as $k \to \infty$) of the MLEs $(\tilde\alpha_k, \tilde\theta_k)$ of $(\alpha_k, \theta_k)$ in the extended model $\mathrm{pr}[Z_0 = 1 \mid X_0] = e(X_0; \alpha_k, \theta_k)$ with $e(X_0; \alpha_k, \theta_k) = \mathrm{expit}(\alpha_k' X_0 + \theta_k \{\mathrm{expit}(\hat\beta' X_0) - \hat\lambda_{reg}\} / e(X_0; \tilde\alpha_{k-1}, \tilde\theta_{k-1}))$ that adds at iteration $k$ the ‘robustifying covariate’ $\{\mathrm{expit}(\hat\beta' X_0) - \hat\lambda_{reg}\} / e(X_0; \tilde\alpha_{k-1}, \tilde\theta_{k-1})$ to the propensity model $e(X_0; \alpha) = \mathrm{expit}(\alpha' X_0)$. The iteration is initialized by taking $e(X_0; \tilde\alpha_{-1}, \tilde\theta_{-1})$ equal to $e(X_0; \hat\alpha)$.

All doubly robust estimators are also locally semiparametric efficient when both the regression model $\mathrm{expit}(\beta' X_0)$ and the propensity model $e(X_0; \alpha)$ are correctly specified. That is, when both models are correct, the estimators are equally efficient and no doubly robust estimator can have a smaller asymptotic variance. Now, one might view the double robustness property as being of only academic interest because, in reality, all models are misspecified. However, even when both the propensity and the regression models are misspecified, the bias of the doubly robust estimators (i) and (ii) will often be less than that of $\hat\lambda_{IPTW}$ or of DL’s estimator $\hat\lambda_{reg}$.
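To make the contrast among the regression, IPTW and augmented regression estimators concrete, the following Python sketch evaluates all three on a small artificial data set. Everything in it is an assumption made for illustration and is not taken from the commentary: the data, the function names, and the use of plain functions `m` and `e` standing in for the fitted regression and propensity models (no maximum likelihood fitting is performed). The counts are chosen so that the risk under treatment, averaged over the covariate, is exactly 0.4, and the "wrong" models are the intercept-only fits that ignore the covariate.

```python
# Illustrative data (hypothetical): rows are (x, z, y) with one binary
# baseline covariate x, treatment indicator z and failure indicator y.
# Counts give pr[Z=1|X=0]=0.8, pr[Z=1|X=1]=0.4,
# pr[Y=1|X=0,Z=1]=0.2 and pr[Y=1|X=1,Z=1]=0.6.
def make_data():
    rows = []
    rows += [(0, 1, 1)] * 8 + [(0, 1, 0)] * 32 + [(0, 0, 0)] * 10
    rows += [(1, 1, 1)] * 12 + [(1, 1, 0)] * 8 + [(1, 0, 0)] * 30
    return rows


def reg_estimate(rows, m):
    """Regression estimator: average the fitted risk m(x) over all subjects."""
    return sum(m(x) for x, _, _ in rows) / len(rows)


def iptw_estimate(rows, e):
    """Normalized IPTW estimator: weighted mean of y among the treated."""
    num = sum(z * y / e(x) for x, z, y in rows)
    den = sum(z / e(x) for x, z, y in rows)
    return num / den


def aug_estimate(rows, m, e):
    """Augmented regression estimator: the regression estimate plus a
    weighted mean of the residuals y - m(x) among the treated."""
    num = sum(z * (y - m(x)) / e(x) for x, z, y in rows)
    den = sum(z / e(x) for x, z, y in rows)
    return reg_estimate(rows, m) + num / den


def m_true(x):  # correct regression model
    return 0.6 if x == 1 else 0.2


def e_true(x):  # correct propensity model
    return 0.4 if x == 1 else 0.8


def m_bad(x):   # misspecified: crude risk among the treated, ignores x
    return 1.0 / 3.0


def e_bad(x):   # misspecified: crude treatment probability, ignores x
    return 0.6


rows = make_data()
print("reg,  wrong m:         ", reg_estimate(rows, m_bad))
print("IPTW, wrong e:         ", iptw_estimate(rows, e_bad))
print("aug,  wrong m, right e:", aug_estimate(rows, m_bad, e_true))
print("aug,  right m, wrong e:", aug_estimate(rows, m_true, e_bad))
print("aug,  both wrong:      ", aug_estimate(rows, m_bad, e_bad))
```

In this constructed example the regression and IPTW estimates under the misspecified models both equal the crude risk 1/3 among the treated, whereas the augmented estimator recovers 0.4 whenever at least one of the two models is correct and falls back to 1/3 only when both are wrong.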
To see why, let $\alpha^*$ and $\beta^*$ be the probability limits of the MLEs $\hat\alpha$ and $\hat\beta$ from the unextended propensity and regression models. Define the theoretical quantities $h_e(X_0)$ and $h_{reg}(X_0)$ by $h_e(X_0) = e(X_0)/e(X_0; \alpha^*) - 1$ and $h_{reg}(X_0) = \mathrm{pr}[Y_1 = 1 \mid X_0, Z_0 = 1] - \mathrm{expit}(\beta^{*\prime} X_0)$, and let $\delta_e = \sup |h_e(x_0)|$ and $\delta_{reg} = \sup |h_{reg}(x_0)|$ with the supremum taken over the support of $X_0$. Letting $\lambda^*$ denote the probability limit of an estimator $\hat\lambda$, the large sample absolute bias $|\lambda^*_{dr} - \lambda_c|$ of any doubly robust estimator is at most of order $\delta_e \delta_{reg}$. In contrast, the large sample biases $|\lambda^*_{IPTW} - \lambda_c|$ of $\hat\lambda_{IPTW}$ and $|\lambda^*_{reg} - \lambda_c|$ of $\hat\lambda_{reg}$ are generally of order $\delta_e$ and $\delta_{reg}$, respectively. Therefore, when both the propensity and regression models are approximately correct in the sense that $\delta_e$ and $\delta_{reg}$ are small, the large sample bias of the doubly robust estimators will often be smaller than that of the IPTW or regression estimator. When one model is grossly wrong and the other approximately correct (but it is not known which is the nearly correct model), only the use of the doubly robust estimator can guarantee small bias. Finally, the doubly robust estimators (i) and (ii) above have biases that are bounded even when both the propensity and outcome regression models are grossly incorrect.

3.2. Arbitrary t

In the case of arbitrary $t$, results of Robins [12] and Scharfstein et al. [4] can be used to derive doubly robust estimators of the hazard $\lambda_c(t)$ under continuous treatment. In the context of estimation of the cumulative (integrated) hazard of a continuous failure time variable, Robins [12] derived the analogue of $\hat\lambda_{dr,aug}$, and Robins et al. [5] derived the analogue of $\hat\lambda_{IPTW,dr}$. Here we provide the analogue of $\hat\lambda_{reg,dr}$. In Appendix D, we prove the double robustness of all three estimators. (Before beginning, we remark that for purposes of causal inference, one should generally make regime-specific survivor functions or cumulative hazards the ultimate targets of inference, rather than hazards or hazard ratios at particular times. This reflects the well-known fact that the potential survival time of every subject could be greater under a regime $a$ than under a regime $b$ and yet, at certain times $t$, the hazard of failure under regime $a$ can exceed that under regime $b$.)

Let $G_{(t+1)0} = Y_{t+1}$ and $G_{(t+1)1} = I(Y_t = 0)$. For $m = t, t-1, \ldots, 0$ and $j = 0, 1$, let $G_{mj} = E[G_{(m+1)j} \mid \bar X_m, Y_m = 0, Z_m = 1]\, I(Y_m = 0)$. Thus $G_{t0} = \lambda(t \mid \bar X_t, Z_t = 1)\, I(Y_t = 0)$ and $G_{t1} = I(Y_t = 0)$. Under DL’s sequential ignorability assumption, $G_{m0} = \lambda_c(t \mid \bar X_m)\, \mathrm{pr}[Y_{ct} = 0 \mid \bar X_m, Y_m = 0]\, I(Y_m = 0)$ and $G_{m1} = \mathrm{pr}[Y_{ct} = 0 \mid \bar X_m, Y_m = 0]\, I(Y_m = 0)$. In particular, note that

$$\frac{E[G_{00}]}{E[G_{01}]} = \frac{E\{\mathrm{pr}[Y_{c(t+1)} = 1, Y_{ct} = 0 \mid X_0]\}}{E\{\mathrm{pr}[Y_{ct} = 0 \mid X_0]\}} = \frac{\mathrm{pr}[Y_{c(t+1)} = 1, Y_{ct} = 0]}{\mathrm{pr}[Y_{ct} = 0]} = \lambda_c(t).$$

Our goal will be to estimate $E[G_{00}]/E[G_{01}]$, as $E[G_{00}]/E[G_{01}] = \lambda_c(t)$ even under the weaker assumption [1] of no unmeasured confounders for potential survival outcomes.

Given an estimate $\hat G_{(m+1)j}$ of $G_{(m+1)j}$ for $j = 0, 1$, we estimate $G_{mj}$ by specifying a model $\mathrm{expit}(\beta_{mj}' W_{mj} + \phi_{mj} \hat{\bar e}_m^{-1})$ for the conditional expectation of $\hat G_{(m+1)j}$ given $\bar X_m$ among subjects with $Y_m = 0$ and $Z_m = 1$, where $W_{mj}$ is a user-supplied vector function of the covariates $\bar X_m$ through time $m$ and $\hat{\bar e}_m = \hat e_{m|m-1} \times \hat e_{m-1|m-2} \times \cdots \times \hat e_{1|0} \times \hat e_{0|-1}$ is a model-based estimate of the propensity score product through time $m$, where $e_{m|m-1} = \mathrm{pr}[Z_m = 1 \mid Z_{m-1} = 1, \bar X_m, Y_m = 0]$ and $Z_{-1} \equiv 1$. We obtain estimates $(\hat\beta_{mj}, \hat\phi_{mj})$ as the solution to the following linear logistic score equation for the model $\mathrm{logit}\, E[\hat G_{(m+1)j} \mid \bar X_m, Z_m = 1, Y_m = 0] = \beta_{mj}' W_{mj} + \phi_{mj} \hat{\bar e}_m^{-1}$:

$$0 = \sum_i I(Y_{m,i} = 0)\, I(Z_{m,i} = 1)\, \{\hat G_{(m+1)j,i} - \mathrm{expit}(\beta_{mj}' W_{mj,i} + \phi_{mj} \hat{\bar e}_{m,i}^{-1})\}\, (W_{mj,i}', \hat{\bar e}_{m,i}^{-1})^T$$

and set $\hat G_{mj} \equiv G_{mj}(\hat\beta_{mj}, \hat\phi_{mj}) = \mathrm{expit}(\hat\beta_{mj}' W_{mj} + \hat\phi_{mj} / \hat{\bar e}_m)\, I(Y_m = 0)$. Even though $\hat G_{(m+1)j,i}$ can take on values intermediate between zero and one, a standard binomial regression software program can be tricked into solving this score equation by multiplying the $\hat G_{(m+1)j,i}$ by 1000, rounding to the nearest integer, treating the inflated counts as the number of binomial successes in 1000 trials, and specifying the canonical logistic link. The above recursion is initialized by setting $\hat G_{(t+1)0} = G_{(t+1)0} = Y_{t+1}$ and $\hat G_{(t+1)1} = G_{(t+1)1} = I(Y_t = 0)$.
(Note that this implies that $\hat G_{t1} = G_{t1} = I(Y_t = 0)$ as well.) Finally, $\hat\lambda_c(t) = \sum_i \hat G_{00,i} / \sum_i \hat G_{01,i}$. Robins et al. [13] and Robins [14] refer to recursive regression estimators like $\hat\lambda_c(t)$ as (ratios of) iterated conditional expectation (ICE) estimators. Now $\hat\lambda_c(t)$ is CAN when the models $\mathrm{expit}(\beta_{mj}' W_{mj} + \phi_{mj} \hat{\bar e}_m^{-1})$ for the conditional expectations of the $\hat G_{(m+1)j}$ are correct, because then $\sum_i \hat G_{00,i} / \sum_i \hat G_{01,i}$ is converging to $E[G_{00}]/E[G_{01}] = \lambda_c(t)$. In Appendix D we prove that $\hat\lambda_c(t)$ is doubly robust by showing it will be CAN if the model for the propensity score used to calculate $\hat{\bar e}_m$ is correct even if the models $\mathrm{expit}(\beta_{mj}' W_{mj} + \phi_{mj} \hat{\bar e}_m^{-1})$ are incorrect. We also provide doubly robust estimators of augmented regression estimator form and of the IPTW form as well.

In contrast to the doubly robust ICE estimator $\hat\lambda_c(t)$, DL’s estimator requires not only a correct model for the propensity scores but also a correct model $\mathrm{expit}(\beta_{t0}' W_{t0})$ for $E[G_{(t+1)0} \mid \bar X_t, Y_t = 0, Z_t = 1] = \lambda[t \mid \bar X_t, Z_t = 1]$. I wish to point out that Scharfstein et al. [4] have shown that, in contrast to propensity score models, it is not in general possible to specify mutually compatible logistic models $\mathrm{expit}(\beta_{mj}' W_{mj})$ or $\mathrm{expit}(\beta_{mj}' W_{mj} + \phi_{mj} \hat{\bar e}_m^{-1})$ for the conditional expectations $G_{(m+1)j}$. Specifically, they show there is no joint distribution for which all such models are correct with the $\beta_{mj}$ non-zero. However, although incorrect, the models, if sufficiently high dimensional, can still approximate the conditional expectations $\hat G_{(m+1)j}$ closely, in which case, as argued earlier, we would expect our doubly robust estimators to generally have smaller large sample bias than their competitors, provided flexible models for the propensity score are used as well.
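The device of multiplying the fractional responses by 1000 works because the logistic score equation is unchanged, up to rounding, when every response is rescaled to a count out of a common number of trials. The Python sketch below illustrates this under assumptions that are not in the commentary: a single scalar covariate `w` plus an intercept stands in for the full covariate vector, the function name `fit_fractional_logit` is hypothetical, the six responses are made up, and a hand-written Newton-Raphson iteration replaces a packaged binomial regression routine.

```python
import math


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_fractional_logit(w, g, trials=1, iters=100, tol=1e-12):
    """Solve sum_i (g_i - trials * expit(a + b * w_i)) * (1, w_i) = 0 for
    (a, b) by Newton-Raphson. With trials=1, g may hold any fractions in
    [0, 1]; with trials=1000, g holds the rounded 'success counts'."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        p = [expit(a + b * wi) for wi in w]
        # Score vector of the binomial log-likelihood.
        s0 = sum(gi - trials * pi for gi, pi in zip(g, p))
        s1 = sum((gi - trials * pi) * wi for gi, pi, wi in zip(g, p, w))
        # Entries of the 2 x 2 information matrix.
        v = [trials * pi * (1.0 - pi) for pi in p]
        h00 = sum(v)
        h01 = sum(vi * wi for vi, wi in zip(v, w))
        h11 = sum(vi * wi * wi for vi, wi in zip(v, w))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information times score.
        da = (h11 * s0 - h01 * s1) / det
        db = (h00 * s1 - h01 * s0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical fractional responses for six subjects in two covariate groups.
w = [0, 0, 0, 1, 1, 1]
g = [0.1, 0.3, 0.2, 0.6, 0.9, 0.6]

a, b = fit_fractional_logit(w, g)                        # direct solution
counts = [round(1000 * gi) for gi in g]                  # the rescaling device
a_k, b_k = fit_fractional_logit(w, counts, trials=1000)  # binomial(1000) form

print(expit(a), expit(a + b))          # fitted means for w = 0 and w = 1
print(expit(a_k), expit(a_k + b_k))    # fitted means via the count device
```

Because the model here is saturated in `w`, the fitted means are simply the group averages of the fractional responses, and both routes give the same coefficients. Software that accepts non-integer binomial responses can solve the score equation directly, which avoids the rounding step.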
Now with a good deal more effort it is possible to specify and fit mutually compatible models for the conditional expectations $\hat G_{(m+1)j}$; however, we doubt the extra effort will significantly improve on the performance obtained with mutually incompatible logistic models, provided these latter models are richly parameterized [35].

3.3. Improved inverse weighting

We next consider DL’s claim in their Section 3.2 that the small sample performance of their estimator can be improved by replacing the covariate $X_t$ in their discrete hazard model by an estimate of the propensity score $e_{t|t-1} = \mathrm{pr}[Z_t = 1 \mid \bar X_t, Z_{t-1} = 1, Y_t = 0]$. Returning to the special case that $t = 1$, DL’s claim is that the performance of $\hat\lambda_{reg}$ is inferior to that of the estimator $\hat\lambda_{reg,e} = n^{-1} \sum_i \mathrm{expit}(\hat\gamma_0 + \hat\gamma_1 e(X_{0i}; \hat\alpha))$, where $\hat\gamma_0$ and $\hat\gamma_1$ are the MLEs of $\gamma_0$ and $\gamma_1$ in the regression model $\mathrm{logit}\,\mathrm{pr}[Y_1 = 1 \mid e(X_0; \hat\alpha), Z_0 = 1] = \gamma_0 + \gamma_1 e(X_0; \hat\alpha)$. We disagree with this claim for the following reasons. First, we believe that the arguments that DL offer in their Section 3.2 and in the first third of their Appendix A fail to provide any support for, much less prove, their claim. Second, and more importantly, their estimator $\hat\lambda_{reg,e}$ will be consistent for $\lambda_c$ only if the propensity score model $e(X_0; \alpha)$ and the linear logistic regression model of $Y_1$ on $(1, e(X_0; \hat\alpha))$ are both correct, even in the special case $t = 1$. Thus, in our opinion, the estimator $\hat\lambda_{reg,e}$ should never be used, as it is less robust than even their original estimator $\hat\lambda_{reg}$, much less a doubly robust estimator $\hat\lambda_{dr}$.
4. ESTIMATING THE EFFECTS OF DURATION OF TREATMENT

In the first part of Section 4, DL propose a method for estimating the effect of treatment duration. We shall argue that it is preferable to estimate duration effects by either IPTW estimation of a marginal structural logistic discrete hazard model [15–17] or G-estimation of a structural nested failure time model [18, 19].

4.1. Marginal structural models

The problem with DL’s approach is that it does not contain regression parameters that serve to quantify biologically relevant causal effects. As a result, it is difficult to test important causal hypotheses. For example, suppose we wanted to test the causal null hypothesis of no effect of treatment on survival. That is, the hypothesis that the hazard $\lambda_c(t) = \mathrm{pr}(Y_{c(t+1)} = 1 \mid Y_{ct} = 0)$ had all subjects remained on treatment past $t$ is equal to the hazard $\lambda_{d_s}(t)$ had all subjects stopped treatment at time $s$ for $s = 0, 1, \ldots, t$, $t = 0, 1, \ldots, T$, where $T + 1$ is the maximum follow-up time. Then to estimate $\lambda_{d_s}(t)$, DL in their equation (4) propose fitting separately for $s = 0, 1, \ldots, t$, hazard models for the observed hazard at $t$ among those who stopped treatment at time $s$ of the form

$$\mathrm{logit}\,\lambda(t \mid \bar X_s, Z_s = 0, Z_{s-1} = 1) = f(t)^T \eta_s + X_s \zeta_s + X_0 \xi_s$$

where we have indexed the parameters by $s$ to indicate that they will generally vary with $s$, and, for expositional convenience, following DL, we have assumed the hazard depends on the entire covariate history only through $X_s$ and $X_0$. To estimate $\lambda_c(t)$ they proposed fitting $\mathrm{logit}\,\lambda(t \mid \bar X_t, Z_t = 1) = f(t)^T \eta_t + X_t \zeta_t + X_0 \xi_t$ to subjects who remained on treatment through $t$. Note that the null hypothesis of no treatment effect on the hazard at $t$ does not imply that the $\eta_k$, the $\zeta_k$ or the $\xi_k$ are equal for $k = 0, \ldots, t$. Rather, after inverse probability weighting by the appropriate function of the propensity scores, DL obtain estimates $\hat\lambda_{c,t}(t)$, $\hat\lambda_{d_s,t}(t)$, $s = 0, 1, \ldots, t$; $t = 0, 1, \ldots, T$. To test the null hypothesis of no treatment effect we must test the hypothesis that the $t + 2$ estimates at each $t$ are estimating the same number.
To do so we must extend DL’s approach by introducing models for the marginal counterfactual hazards. Then a test of the causal null hypothesis is a $\sum_{t=0}^{T} (t + 1) = (T^2 + 3T + 2)/2$ degree of freedom test of the hypothesis $\psi_{1ts} = 0$, $s = 0, 1, \ldots, t$, $t = 0, 1, 2, \ldots, T$ in the model $\mathrm{logit}\,\lambda_M(t) = \psi_{0t} + \psi_{1ts} I(s \le t)$ for $M$ in the set $\{d_s;\ s = 0, 1, \ldots, T, T + 1\}$, where $d_{T+1}$ is the continuous treatment regime $c$. Under this model, for $s$ greater than $t$, $\mathrm{logit}\,\lambda_{d_s}(t) = \mathrm{logit}\,\lambda_c(t) = \psi_{0t}$ since stopping treatment at time $s$ in the future does not affect the hazard at the earlier time $t$. Clearly such a test will have little power. To increase power, we would specify a submodel with fewer parameters, an extreme case being $\mathrm{logit}\,\lambda_M(t) = \psi_{0t} + \psi_1 (1 + t - s) I(s \le t)$, which says that the odds ratio at time $t$ comparing $\lambda_{d_s}(t)$ and $\lambda_c(t)$ is $e^{\psi_1 (1 + t - s)}$, so that the log odds ratio is simply a linear function of the time $(1 + t - s)$ since stopping treatment. Then a one degree of freedom test of $\psi_1 = 0$ is a test of the causal null hypothesis. To carry out this 1 d.o.f. test one would generally treat DL’s estimates $\mathrm{logit}\,\hat\lambda_c(t)$ and $\mathrm{logit}\,\hat\lambda_{d_s}(t)$ as multivariate normal with mean $\mathrm{logit}\,\lambda_c(t)$ and $\mathrm{logit}\,\lambda_{d_s}(t)$ and covariance matrix that would have to be estimated by the bootstrap. However, if, as would certainly be the case for large $T$, there are very few subjects who followed regime $d_s$ for many $s$, the corresponding estimates $\mathrm{logit}\,\hat\lambda_{d_s}(t)$ would fail to have an approximate normal sampling distribution and, as a consequence, the test would have the wrong level. Further, there might be a concern that the effect of treatment might be modified by a continuous pretreatment variable such as depression score $X_0^*$, a component of $X_0$, so that the optimal treatment might differ for those with different values of the depression score. Thus one might wish to fit the model $\mathrm{logit}\,\lambda_M(t \mid X_0^*) = \psi_{0t} + \psi_1 (1 + t - s) I(s \le t) + \psi_2 X_0^* + \psi_3 X_0^* (1 + t - s) I(s \le t)$ with $\psi_3$ quantifying the interaction. However, this is not possible using DL’s methodology.

All of these difficulties are solved by using IPTW estimation of marginal structural models (MSMs). Indeed, all three of the above models are examples of discrete hazard MSMs [15–17] as they model the marginal counterfactual hazards $\lambda_{d_s}(t)$ and $\lambda_c(t)$ (possibly within levels of baseline covariates) rather than the hazard of the observed failure time variable given past treatment history and baseline covariates. The first model is a saturated (non-parametric) MSM as it places no a priori restrictions on the values of $\mathrm{logit}\,\lambda_{d_s}(t)$ or $\mathrm{logit}\,\lambda_c(t)$. Under the assumption of no unmeasured confounders, the parameters of any discrete hazard MSM can be fit by IPTW. For example, to fit the second of the MSMs above, one fits the discrete hazard model $\mathrm{logit}\,\lambda(t \mid \bar Z_t = s) = \psi_{0t} + \psi_1 (1 + t - s) I(s \le t)$ to the observed data by pooled weighted logistic regression where each unit of person-time contributes a separate observation; if $s \le t$, $\bar Z_t = s$ means a subject first failed to take treatment at time $s$, $\bar Z_t$ is recorded as $t + 1$ if a subject continues on treatment through $t$, and the weight given to a person at time $t$ is

$$SW_t = \frac{\prod_{j=0}^{s-1} \hat e^{*}_{j|j-1}\, (1 - \hat e^{*}_{s|s-1})^{I(s \le t)}}{\prod_{j=0}^{s-1} \hat e_{j|j-1}\, (1 - \hat e_{s|s-1})^{I(s \le t)}}$$

where we set $s = t + 1$ if a subject stayed on treatment through time $t$, and $\hat e^{*}_{j|j-1}$ is the empirical probability of remaining on treatment at time $j$ among surviving subjects who were on treatment at time $j - 1$ (without correcting for past covariate history $\bar X_j$).
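The stabilized weight is a ratio of two probabilities of the same monotone treatment history: a numerator built from probabilities of staying on treatment that ignore covariates, and a denominator built from covariate-conditional (propensity model) probabilities. The short Python sketch below shows the arithmetic for one subject. The function names and the numerical probabilities are hypothetical and chosen only for illustration; in practice the denominator probabilities would come from a fitted propensity model evaluated at each subject's own covariate history.

```python
def history_probability(stay_probs, s, t):
    """Probability of a monotone treatment history through time t.

    stay_probs[j] is the probability of remaining on treatment at time j
    given treatment through time j - 1. s is the first time off treatment;
    s = t + 1 codes a subject who is still on treatment at time t."""
    if not 0 <= s <= t + 1:
        raise ValueError("s must lie between 0 and t + 1")
    prob = 1.0
    for j in range(s):              # stayed on treatment at times 0..s-1
        prob *= stay_probs[j]
    if s <= t:                      # stopped treatment at time s
        prob *= 1.0 - stay_probs[s]
    return prob


def stabilized_weight(num_probs, den_probs, s, t):
    """Ratio of the covariate-free to the covariate-conditional probability
    of the observed treatment history through time t."""
    return (history_probability(num_probs, s, t)
            / history_probability(den_probs, s, t))


# Hypothetical probabilities of staying on treatment at times 0, 1, 2.
numerator = [0.9, 0.8, 0.75]     # empirical, ignoring covariates
denominator = [0.95, 0.6, 0.5]   # from a propensity model, one subject

sw_stopped = stabilized_weight(numerator, denominator, s=1, t=2)
sw_stayed = stabilized_weight(numerator, denominator, s=3, t=2)
print(sw_stopped, sw_stayed)
```

For the subject who stops at time 1 the weight is (0.9 × 0.2)/(0.95 × 0.4), and for the subject who stays on treatment through time 2 it is (0.9 × 0.8 × 0.75)/(0.95 × 0.6 × 0.5). When the covariates do not predict treatment, numerator and denominator coincide and every weight equals one.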
Informally, the denominator of $SW_t$ is the probability that a subject had the treatment history through time $t$ that they did indeed have, given past treatment and covariate history. The resulting IPTW estimators of the parameters are CAN if the propensity model used to calculate the $\hat e_{j|j-1}$ is correct. Doubly robust IPTW estimators of regression form (that is, ICE), augmented regression form, and IPTW form that are analogous to those described earlier are available [7].

4.2. Dynamic treatment regimes and G-estimation of structural nested failure-time models

In this section, we continue to follow DL and restrict consideration to monotone treatments. However, in contrast to DL, we shall consider dynamic treatment regimes. A dynamic regime is one in which a subject’s covariate history $\bar X_s$ through time $s$ determines whether treatment should be stopped at time $s$. To select an optimal treatment strategy for a patient, it is usually necessary to consider dynamic treatment regimes. For example, the optimal regime will be dynamic when the treatment can cause serious toxicity, because the optimal strategy must stop the treatment when toxicity develops. Even in the absence of toxicity, the optimal regime may be dynamic. Indeed, the optimal regime will be dynamic whenever the effect of treatment at $s$ is qualitatively modified by time-varying covariates $X_s$. For example, in DL’s depression scenario, if antidepressant treatment at $s$ was of benefit to those who have experienced symptoms of depression in the past three weeks, but was harmful (due to the possibility of side-effects) to those without symptoms for three weeks, then the optimal regime might be to continue an antidepressant until the patient has been symptom-free for three weeks. Robins [21] showed that under DL’s sequential ignorability assumption IPTW estimators could