/******************************************************/ /* Data obtained from http://www-fars.nhtsa.dot.gov */ /* by requesting violations (v1, v2, and v3), */ /* injury severity (is), body type (bt), speed */ /* limit (sl), and restraint use for (rs) NJ, 1999. */ /* Other variables are a number for the accident (id),*/ /* the state (state), which in our case is always NJ, */ /* an identification number given to the vehicles */ /* in the accident (vh), and an id number given to */ /* person within vehicle (pn). */ /******************************************************/ options ls=80; data crash ; infile 'crash.dat'; input id state case vh pn sl bt v1 v2 v3 rs is; /******************************************************/ /* In order to get independent observations, delete */ /* all but first passenger. Also, delete pedestrians.*/ /******************************************************/ if pn>1 then delete; if vh=0 then delete; /******************************************************/ /* Redefine v1 to be 1 if there is a moving violation,*/ /* 0 otherwise, and let vio be a character description*/ /******************************************************/ if v1=. then v1=0; if v1=83 then v1=0; if abs(v1-75)<5 then v1=0; if v1=0 then vio="No violation"; if v1>0 then vio="Violation"; /******************************************************/ /* Define fataln to be 1 if there is a fatality, 0 */ /* otherwise, and let fatal be a character description*/ /******************************************************/ fatal='Not fatal'; fataln=0; if is=4 then do; fatal='Fatal'; fataln=1; end; /******************************************************/ /* Define beltn to be 1 if subject used restraint */ /* (or we don't know) and 0 if restraint not used. */ /* Define beltuse to be description. */ /******************************************************/ beltuse='Proper restraint use presumed'; beltn=1; if rs>12 or rs=0 then do; beltuse='Improper or no restraint use '; beltn=0; end; /******************************************************/ /* Define the body type of the vehicle */ /******************************************************/ vt="Car "; if bt>10 then vt="Light truck"; if bt>20 then vt="Van"; if bt>30 then vt="Truck"; run; data pairs; set crash; by case; retain belt1 fatal1; if vh>2 then delete; if pn>1 then delete; if first.case and last.case then delete; if vh=1 then do; belt1= beltn; fatal1=fataln; delete; end; if fatal1+fataln=1 then; else delete; if fatal1=1 then fbu=belt1; else nfbu=belt1; if fataln=1 then fbu=beltn; else nfbu=beltn; run; /*****************************************************/ /* Now we've got the quantities we need for McNemar's*/ /* test. Here data set pairs has one line for each */ /* accident in which the driver of car 1 died and the*/ /* driver of car 2 lived. There are two variables of*/ /* interest. fbu is whether the person who had the */ /* fatality had the seatbelt on (1) or off (0), and */ /* nfbu is whether the survivor had the seatbelt on */ /* or off. */ /*****************************************************/ title1 'McNemar test'; proc freq data=pairs; table fbu*nfbu/norow nocol nopct agree; run; data paired; set pairs; output; fataln=fatal1; beltn=belt1; output; run; /* You might be tempted to use the original crash data set in the regression below. This isn't a great idea, since paired differs from the original in two ways: It only has accidents in- volving two cars, and each accident has one (not zero or two) fatalities. Accidents with either all fatalities or all non- fatalities will have probabilities fit either to 0 or 1, causing the corresponding case variable to be set to -Infinity or Infinity respectively, and leave the belt use variable unchanged. So these cases don't matter, except that SAS won't be able to handle them efficiently.*/ /* procs genmod and phreg fit P[response=0]. Fix this for genmod by adding "descending" in the first clause. Fix it for the conditional regression by making a new variable that is 1- the old one.*/ /* The next logistic regression is wrong for two reasons: 1. It doesn't reflect the sampling scheme that forced there to be one and only one fatality per accident (ie, the case control design). 2. The presence of a very large number of nuisance parameters makes the usual maximum likelihood SE arguments fail.*/ title1 'Wrong Unconditional logistic regr. for crash data'; proc genmod data=paired ; class case; model fataln=beltn case/dist=bin; run; title1 'Wrong Unconditional probit regr. for crash data'; proc genmod data=paired ; class case; model fataln=beltn case/dist=bin link=probit; run; /* This regression is better: 1. is partially satisfied, because the sampling distribution is taken to be where half the responses are fatalities, and 2. is circumvented by ignoring accident. proc phreg below does the following: the first variable in the model statement specifies a time at which subjects have an event. We use it with two values, zeros and 1s. The variable after the * is an indicator for whether an individual is never observed to have an event, and the value in parentheses is the value indicating no event. The procedure takes each event time, and runs a conditional logistic regression with covariates after the =, with the data set being those who were around after the last event time, and forces the coefficients of the regressions to be the same. We only want this done once.*/ title1 'Conditional logistic regression for case-control'; proc phreg data=paired nosummary; model fataln*fataln(1)=beltn/ties=discrete; run; /* This regression is correct. We do the conditioning within cases.*/ title1 'Conditional logistic regression with matching'; proc phreg data=paired nosummary; strata case; model fataln*fataln(1)=beltn/ties=discrete; run; /* Unlike the unconditional model, conditional logistic regression is not impacted by homogeneous strata, since there's only one possible data point, and hence the likelihood gets a multiplier of 1. The next results are identical to the prior results.*/ title1 'Matched Conditional logistic regression, more data'; data paired; set crash; if vh>2 then delete; run; proc phreg data=paired nosummary; strata case; model fataln*fataln(1)=beltn/ties=discrete; run; /* The conditional logistic regression results are identical to the log odds ratio from log(n01/n10) from above.*/ /* This approach also lets us use data with larger strata, and covariates.*/ proc phreg data=crash nosummary; strata case; model fataln*fataln(1)=beltn v1/ties=discrete; run;