Comparing Regression Models: Different Dependent Variable

Well, this is my first post ever, so I decided to start with something light. For the past couple of days I have been studying Basic Econometrics by Gujarati again, just for fun (yes, for fun 🙂).

For those of you who have done regression analysis at some point, the issue of comparing models must have come up. And like a knight in shining armor, enter….. drum roll….. R-Squared. For those unaware, R-Squared measures the proportion of the variation in your dependent variable that is explained by the independent variable(s). The higher the R-Squared value, the better the fit of the model. But what if you wanted to compare the following two models:

$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad \text{Model 1}$

$\log Y_i = \beta_1 + \beta_2 \log X_i + u_i \qquad \text{Model 2}$

Now any standard textbook will tell you that you can't compare the R-Squared values of two models unless the following conditions hold:

1. The dependent variable is the same.
2. The sample size is the same.
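To see why the first condition matters here, the short sketch below (not from Gujarati) computes the total sum of squares, the denominator of R-Squared, for $Y_i$ and for $\log Y_i$ on the same made-up sample. The data-generating constants are arbitrary assumptions for illustration; the point is only that the two R-Squared values divide by quantities measured on different scales.

```python
import numpy as np


def tss(values):
    """Total sum of squares: squared deviations of `values` from their mean."""
    v = np.asarray(values, dtype=float)
    return float(np.sum((v - v.mean()) ** 2))


# Hypothetical sample; the constants 2.0, 1.5 and 0.1 are arbitrary choices.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=50)
Y = 2.0 * X ** 1.5 * np.exp(rng.normal(0.0, 0.1, size=50))

tss_levels = tss(Y)         # denominator of the R-Squared for Model 1
tss_logs = tss(np.log(Y))   # denominator of the R-Squared for Model 2
print(f"TSS in levels: {tss_levels:.2f}, TSS in logs: {tss_logs:.2f}")
```

Because the denominators differ, a higher R-Squared in one model says nothing directly about the other.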

Model 1 has $Y_i$ as its dependent variable while Model 2 has $\log Y_i$, so now we have a problem. Luckily, there is a way around it. As given in Gujarati, we follow these steps:

1. Estimate Model 1 to get:

$\widehat { Y } _{ i }={\widehat{ \beta } }_{ 1 }+\widehat{{ \beta }}_{ 2 }{ X }_{ i }$

2. Estimate Model 2 to get:

$\widehat{\log Y_i} = \widehat{\beta}_1 + \widehat{\beta}_2 \log X_i$, along with its R-Squared, $r^2$. (The $\widehat{\beta}$'s here are Model 2's estimates; they are not the same numbers as in Model 1.)

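3. Now take the log of the predicted values of Model 1, i.e.

$\log \widehat{Y}_i$. Notice the difference, because this is key: we are taking the log of the predicted values of Model 1, not the predicted values of $\log Y_i$ from Model 2. This only works if every $\widehat{Y}_i$ is positive, since the log of a non-positive number is undefined.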

4. Regress the log of the actual values, $\log Y_i$, on the log of the predicted values of Model 1 to get:

$\log Y_i = \alpha_1 + \alpha_2 \log \widehat{Y}_i + u_i \qquad \text{Model 3}$

The fitted model is:

$\widehat{\log Y_i} = \widehat{\alpha}_1 + \widehat{\alpha}_2 \log \widehat{Y}_i$, with R-Squared $\delta^2$.

Now this ${\delta}^{2}$ can be compared with ${r}^{2}$.
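The four steps can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not Gujarati's code: the helper names `ols_fit` and `compare_linear_vs_loglog` are hypothetical, the data are simulated with arbitrary constants, and R-Squared is computed as ESS/TSS exactly as in the formulas that follow.

```python
import numpy as np


def ols_fit(x, y):
    """Least-squares fit of y = b1 + b2 * x; returns (b1, b2, fitted, ESS/TSS)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    (b1, b2), *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = b1 + b2 * x
    ess = np.sum((fitted - y.mean()) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return b1, b2, fitted, ess / tss


def compare_linear_vs_loglog(X, Y):
    """Return (r2, delta2) following the four steps (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Step 1: Model 1, Y on X in levels; keep the fitted values Y_hat.
    _, _, Y_hat, _ = ols_fit(X, Y)
    # Step 2: Model 2, log Y on log X; keep its R-Squared r^2.
    _, _, _, r2 = ols_fit(np.log(X), np.log(Y))
    # Step 3: log of Model 1's fitted values (only defined if all are positive).
    if np.any(Y_hat <= 0):
        raise ValueError("Model 1 has non-positive fitted values; cannot take logs")
    log_Y_hat = np.log(Y_hat)
    # Step 4: Model 3, log Y on log(Y_hat); its R-Squared is delta^2.
    _, _, _, delta2 = ols_fit(log_Y_hat, np.log(Y))
    return r2, delta2


# Hypothetical data; the constants are arbitrary illustration values.
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 10.0, size=100)
Y = 3.0 * X ** 0.8 * np.exp(rng.normal(0.0, 0.2, size=100))
r2, delta2 = compare_linear_vs_loglog(X, Y)
print(f"r^2 = {r2:.3f}, delta^2 = {delta2:.3f}")
```

The explicit check in step 3 is a design choice: if Model 1 predicts a zero or negative value for any observation, the procedure cannot be carried out as written, so the sketch fails loudly rather than returning `nan`.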

This is where Gujarati signs off. When I saw this, the question that occurred to me was: why? Why should this work? What's the logic behind it? After a bit of soul searching (and lunch), I tried this:

Note: a hat over a variable denotes its predicted values, while a bar denotes its mean.

For Model 1, the R-Squared is:

${{r}^{*}}^{2}=\frac{ESS}{TSS}=\frac { \sum { { \left( { \widehat { Y } }_{ i }-\overline { Y } \right) }^{ 2 } } }{ \sum { { \left( { Y }_{ i }-\overline { Y } \right) }^{ 2 } } }$
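As a quick numerical check of this formula, the sketch below fits Model 1 with the closed-form two-variable OLS estimates and returns ESS/TSS. The function name `model1_r_squared` and the four-point data set are hypothetical, chosen so the result can be verified by hand ($\widehat{\beta}_2 = 0.7$, $\widehat{\beta}_1 = 2.0$, ESS = 2.45, TSS = 4.75).

```python
import numpy as np


def model1_r_squared(X, Y):
    """R-Squared of Model 1 (Y on X with an intercept), computed as ESS / TSS."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Closed-form OLS estimates for a two-variable regression.
    x_dev = X - X.mean()
    beta2 = np.sum(x_dev * (Y - Y.mean())) / np.sum(x_dev ** 2)
    beta1 = Y.mean() - beta2 * X.mean()
    Y_hat = beta1 + beta2 * X
    ess = np.sum((Y_hat - Y.mean()) ** 2)   # explained sum of squares
    tss = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
    return ess / tss


print(round(model1_r_squared([1, 2, 3, 4], [2, 4, 5, 4]), 4))  # 0.5158
```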

For Model 2, the R-Squared is:

$r^2 = \frac{ESS}{TSS} = \frac{\sum (\widehat{\log Y_i} - \overline{\log Y})^2}{\sum (\log Y_i - \overline{\log Y})^2}$

Now for the real thing. Consider the R-Squared for Model 3, whose fitted values are $\widehat{\alpha}_1 + \widehat{\alpha}_2 \log \widehat{Y}_i$:

$\delta^2 = \frac{ESS}{TSS} = \frac{\sum (\widehat{\alpha}_1 + \widehat{\alpha}_2 \log \widehat{Y}_i - \overline{\log Y})^2}{\sum (\log Y_i - \overline{\log Y})^2}$
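The sketch below, on simulated data with arbitrary constants, checks two things numerically: Models 2 and 3 share the same TSS (the denominator depends only on $\log Y_i$), and $\delta^2$ equals the squared sample correlation between $\log Y_i$ and $\log \widehat{Y}_i$, a standard property of any two-variable regression with an intercept. The helper `ess_tss` is hypothetical; `np.polyfit` and `np.corrcoef` are standard NumPy functions.

```python
import numpy as np


def ess_tss(x, y):
    """Fit y = a1 + a2 * x by OLS and return (ESS, TSS)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a2, a1 = np.polyfit(x, y, 1)   # polyfit returns the slope first
    fitted = a1 + a2 * x
    return np.sum((fitted - y.mean()) ** 2), np.sum((y - y.mean()) ** 2)


# Hypothetical data, roughly linear in levels.
rng = np.random.default_rng(2)
X = rng.uniform(1.0, 10.0, size=80)
Y = 5.0 + 2.0 * X + rng.normal(0.0, 1.0, size=80)

# Model 1 fitted values (positive for this data), then their logs.
b2, b1 = np.polyfit(X, Y, 1)
log_Y_hat = np.log(b1 + b2 * X)

ess2, tss2 = ess_tss(np.log(X), np.log(Y))   # Model 2
ess3, tss3 = ess_tss(log_Y_hat, np.log(Y))   # Model 3
r2, delta2 = ess2 / tss2, ess3 / tss3

# Same denominator: both use the deviations of log Y from its mean.
print(np.isclose(tss2, tss3))  # True
# Two-variable regression with intercept: R-Squared = squared correlation.
print(np.isclose(delta2, np.corrcoef(np.log(Y), log_Y_hat)[0, 1] ** 2))  # True
```

So $\delta^2$ measures how closely Model 1's predictions, once logged, track $\log Y_i$, on the same scale that $r^2$ uses.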

This is why we can compare the R-Squared values of Models 2 and 3: the dependent variable, $\log Y_i$, is the same for both, so the two ratios share the same denominator (TSS). The independent variable of Model 3, on the other hand, comes from Model 1, and that establishes the connection. The chain of reasoning is as follows.

1. Model 2 explains the variation in $\log Y_i$.
2. Model 1 explains the variation in $Y_i$.
3. We use the fitted values of Model 1, in logs, as the independent variable in Model 3 to explain the variation in $\log Y_i$.
4. Thus, in effect, we are using Model 1 to explain $\log Y_i$, through Model 3.
5. Hence, when we compare $\delta^2$ with $r^2$, we are effectively comparing Model 1 with Model 2.

That ends the reasoning and this blog post. Hope this helps someone. Feel free to comment and/or criticize as necessary.

Thanks for your time and patience. Ciao!!!

One thought on “Comparing Regression Models: Different Dependent Variable”

1. C. P. Gupta says:

Aman, it is simply great! First of all, let me congratulate you on doing a great job.

The topic you picked is good. Keep it up.

And now, think about this: what if in the second model it is X and not log X?

All the best…
