
In a previous post, Using the Same Sample for Different Models in Stata, we examined how to use the same sample when comparing regression models. Using different samples in our models could lead to erroneous conclusions when interpreting results. But excluding observations can also produce inaccurate results.

The coefficient for the variable “frequent religious attendance” was negative 58 in model 3 and then rose to a positive 6 in model 4 when income was included. If we didn’t control for income, we might conclude that frequent religious attendance leads to a lower mental health composite score.

When we controlled for income, we noticed that our sample size decreased from 2,067 to 1,683. Using the same 1,683 observations in model 3a that we used in model 4 had a significant impact on the coefficient of the religious attendance variable for model 3.

Note: regression analysis in Stata drops all observations that have a missing value for any one of the variables used in the model. (This is known as listwise deletion or complete case analysis.) So a person who does not report their income level is included in model 3 but not in model 4.

Is there a big difference between the 1,683 observations used in model 4 and the 384 observations that were used in model 3 but not in model 4?

To examine the differences between the two samples, I ran model 3 once more and generated a new variable, “in_model_3”. I then created another variable that equals one if the observation is used in model 3 but not in model 4:

gen not_in_model4 = 1 if in_model_3==1 & in_model_4==0

I made the variable equal zero if the observation is used in both model 3 and model 4.
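
The steps described in this post (flagging each model's estimation sample, then building an indicator for observations used in model 3 but not in model 4) can be sketched in Stata roughly as follows. This is only an illustration, not the exact code behind the results above. The variable names mcs, attend_freq, age, female and income are hypothetical stand-ins, since the post does not list the actual variables in the models. The sketch also assumes that in_model_3 and in_model_4 do not already exist in the dataset.

```stata
* Hypothetical variable names: mcs (mental health composite score),
* attend_freq (frequent religious attendance), age, female, income.

* Model 4 includes income, so observations with missing income are dropped.
regress mcs attend_freq age female income
gen in_model_4 = e(sample)      // 1 if the observation was used, 0 otherwise

* Model 3 is the same model without income.
regress mcs attend_freq age female
gen in_model_3 = e(sample)

* 1 = used in model 3 but not in model 4
* 0 = used in both models
* missing = used in neither model
gen not_in_model4 = 1 if in_model_3==1 & in_model_4==0
replace not_in_model4 = 0 if in_model_3==1 & in_model_4==1

* Check how many observations fall into each group.
tabulate not_in_model4

* Model 3a: model 3 restricted to the observations used in model 4.
regress mcs attend_freq age female if in_model_4==1

* One way to compare the two groups on a given variable.
ttest attend_freq, by(not_in_model4)
```

Because e(sample) is reset by every estimation command, each gen line has to run immediately after the regression it refers to. The ttest at the end is just one possible comparison; any summary command that accepts a grouping variable could be used instead.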
