Wednesday, December 5, 2012

Multi-level modeling and its implementation in SAS

Multi-level modeling is widely used when samples are collected under a nested or multi-level group structure. For example, in a healthcare study, blood pressure is measured several times a day for each individual, and the study participants come from several regions. In this case, blood pressure samples are nested within individuals, and individuals are nested within sites (regions). Because of individual characteristics, blood pressure samples measured on the same individual may share common features, some measurable and some hidden; likewise, participants from the same region may be correlated because of shared characteristics, such as socioeconomic conditions and the natural environment.

A multi-level model is often used to account for the intra-class correlation of the collected samples; that is, the within-individual and within-site correlations are handled by including individual-level and site-level random effects. Because there are, in theory, multiple ways to assign random effects in a multi-level model, choosing a "reasonable" model specification requires prior knowledge and a preliminary analysis of the within-class and between-class variability of the samples.

OK, the main reason I am writing this post is the implementation of multi-level modeling. The paragraphs above are background knowledge that can be found in many places, for example the multi-level model article on Wikipedia. Again, I am using the case mentioned at the beginning. The case I usually meet is the simpler 2-level nested model, in which only the intra-class correlation of blood pressure samples nested within individuals is considered, because we enroll study participants either at one place or at multiple sites whose differences can be ignored. In particular, if we want to know how blood pressure changes over a day, we may fit a linear model with blood pressure as the dependent variable and the time (since wake-up) at which blood pressure is measured as the independent variable. We then need to add random effects to this model to capture the within-individual correlation of the blood pressure samples. For example, we can add an individual random effect to the intercept, because we believe the overall blood pressure level varies across individuals, and an individual random effect to the coefficient of time, because we believe the rate of blood pressure change also varies across individuals. The model also adjusts for other demographic factors: age, gender and race. Before going forward, I need to clarify that this model specification is not necessarily right in terms of medical knowledge; we just use it as an example to illustrate the implementation of a multi-level model in software, and here we focus on SAS.
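
Written out, the 2-level model described above might look like the following sketch. The notation is my own assumption rather than something taken from a particular reference: i indexes individuals, j indexes the measurements within an individual, and gender and race are treated as single numeric indicators for simplicity.

```latex
% Assumed notation: i = individual, j = measurement within individual i.
% gender and race are written as single numeric indicators (an assumption).
\begin{aligned}
\mathrm{BP}_{ij} &= (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\, t_{ij}
  + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{gender}_i + \beta_4\,\mathrm{race}_i
  + \varepsilon_{ij}, \\
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
  &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix} \right),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2).
\end{aligned}
```

Here \tau_{00} and \tau_{11} are the between-individual variances of the intercept and of the slope of time, \tau_{01} is their covariance, and \sigma^2 is the within-individual (residual) variance. An unstructured covariance for the random effects, which is what "type = UN" requests in proc mixed, leaves all three \tau parameters free to be estimated.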

The following is the SAS code to implement the model. The "proc mixed" part of the code can easily be found in the SAS Proc Mixed examples. Since I want both the model estimates and the variance components of this model (that is, the within-individual and between-individual variability), I use ODS to output the two pieces of information. You may refer to the ODS output tables of proc mixed to find more output results and their table names.


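/* SAS code to implement the 2-level model */
ODS output SolutionF(persist=proc)=est_output;  /* fixed-effect estimates */
ODS output CovParms(persist=proc)=cov_output;   /* variance components */
proc mixed data = datain covtest;
class id;  /* also list gender and race here if they are categorical rather than numeric indicators */
model BP = t age gender race / solution;  /* BP: blood pressure; t: time (since wake-up) when BP is measured */
random intercept t / subject = id type = UN;  /* random intercept and random slope of t for each individual */
run;
ODS output close;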


OK, I rushed through the part above because the interesting part of this post is the 3-level nested case (actually, the real reason is that I feel a little hungry now...). Above, we ignored the intra-class correlation introduced by a common site (region) effect. In some cases, the correlation of participants' blood pressure levels within the same region cannot be ignored. For example, suppose we collected these blood pressure measures at 20 sites around the world, located in East Asia, North America, Northern Europe, South Africa, and Australia (a very extreme example; it is hard to get support for such a big study across countries). In this case, blood pressure measures from the same individual are correlated because of individual characteristics, and individuals from the same site are also correlated. Therefore, besides the individual random effects in the intercept and the coefficient of time, we may need a site-level random effect in the intercept (and/or in the slope of time). Here we put the site-level random effect in the intercept, as we believe the average blood pressure varies across sites.
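
As a sketch, the 3-level specification just described could be written as follows. The notation is again my own assumption: k indexes sites, j indexes individuals within site k, i indexes the measurements of an individual, and gender and race are written as single numeric indicators.

```latex
% Assumed notation: k = site, j = individual within site k,
% i = measurement within individual j of site k.
\begin{aligned}
\mathrm{BP}_{ijk} &= (\beta_0 + v_k + u_{0jk}) + (\beta_1 + u_{1jk})\, t_{ijk}
  + \beta_2\,\mathrm{age}_{jk} + \beta_3\,\mathrm{gender}_{jk} + \beta_4\,\mathrm{race}_{jk}
  + \varepsilon_{ijk}, \\
v_k &\sim N(0, \tau_{\mathrm{site}}), \qquad
\begin{pmatrix} u_{0jk} \\ u_{1jk} \end{pmatrix}
  \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix} \right),
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2).
\end{aligned}
```

The only new term compared with the 2-level case is the site-level random intercept v_k, with variance \tau_{\mathrm{site}}; it shifts the average blood pressure of every participant at site k by the same amount. The individual-level effects u_{0jk} and u_{1jk} are defined within a site, which is the nesting of participants under sites.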

The following is the SAS code. As you can see, to specify random effects at two or more levels, you need to add another "random" statement. Also, if participants are nested within sites, you need to specify "subject=id(site)".


/* SAS code to implement the 3-level nested model */
ODS output SolutionF(persist=proc)=est_output;  /* fixed-effect estimates */
ODS output CovParms(persist=proc)=cov_output;   /* variance components */
proc mixed data = datain covtest;
class site id;  /* also list gender and race here if they are categorical rather than numeric indicators */
model BP = t age gender race / solution;  /* BP: blood pressure; t: time (since wake-up) when BP is measured */
random intercept / subject = site type = UN;  /* site-level random intercept */
random intercept t / subject = id(site) type = UN;  /* note: id(site) because participants are nested under sites */
run;
ODS output close;




Reference:
[1] Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Judith D. Singer, Journal of Educational and Behavioral Statistics, Vol. 24, No. 4, pp. 323-355, 1998

[2] Hierarchical Linear Models: Applications and Data Analysis Methods (2nd edition). Anthony S. Bryk; Stephen W. Raudenbush. SAGE, 2001.