Easy White Test in R

One of the assumptions of the classical linear regression model is homoscedasticity of the errors, i.e. the variance of the error terms is constant. In practice, however, we often have to work with data that exhibit heteroscedasticity, i.e. non-constant variance. Many tests are available, but one of the most popular is the White test.

Now I will show you a simple way to implement the test in R. But first the data.


As can be seen from the scatter plot, the Y values appear to get more spread out as X increases, so heteroscedasticity may be present. But let's check to find out. First let's run the White test in Eviews to see what we get. To do that we first run a simple regression of the form Y ~ const + X, and then run the White test on its residuals.
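The data set behind the scatter plot is not reproduced in this post, so if you want to follow along, here is a minimal sketch that simulates a stand-in. Everything in it is an assumption made for illustration (the seed, the 50 observations, the coefficients and the way the error spread grows with X); it will not reproduce the numbers shown in this post.

```r
# Hypothetical stand-in for data1: the error standard deviation grows with X,
# so the scatter of Y widens as X increases (heteroscedasticity by construction).
set.seed(42)
n <- 50
X <- seq(1, 25, length.out = n)
Y <- 5 + 2 * X + rnorm(n, mean = 0, sd = 0.4 * X)
data1 <- data.frame(X = X, Y = Y)

# Scatter plot to eyeball the widening spread
plot(data1$X, data1$Y, xlab = "X", ylab = "Y",
     main = "Simulated data with non-constant variance")
```

The only names the rest of the post relies on are the data frame data1 and its columns X and Y.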


Since the null hypothesis of homoscedasticity is rejected, we have strong evidence of heteroscedasticity. But now let's do this in R.

The simplest way to do this is to use the bptest() function from the lmtest package.

Just run the following code to install the lmtest package:

install.packages("lmtest")

Then load the package using library():

library(lmtest)

Now run the code:

lm1 <- lm(Y ~ X, data = data1)  # Here data1 is the data frame containing X and Y for our case

bptest(lm1, ~ X + I(X^2), data = data1)

# Note it is necessary to enclose squares or powers of vectors in I() for the test to work properly

R Output is:

studentized Breusch-Pagan test

data:  lm1
BP = 21.332, df = 2, p-value = 2.333e-05
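The printed output is rounded, but bptest() returns a standard "htest" object, so the statistic, degrees of freedom and p-value can be pulled out at full precision. The sketch below assumes simulated stand-in data (not the data used in this post), and it only runs the test if lmtest is installed.

```r
# Hypothetical stand-in data; the post's actual data1 is not shown
set.seed(123)
data1 <- data.frame(X = seq(1, 25, length.out = 50))
data1$Y <- 5 + 2 * data1$X + rnorm(50, sd = 0.4 * data1$X)

lm1 <- lm(Y ~ X, data = data1)

if (requireNamespace("lmtest", quietly = TRUE)) {
  bp <- lmtest::bptest(lm1, ~ X + I(X^2), data = data1)

  bp_stat <- unname(bp$statistic)  # the BP test statistic
  bp_df   <- unname(bp$parameter)  # degrees of freedom
  bp_p    <- unname(bp$p.value)    # unrounded p-value

  # Show more digits than the default print method does
  print(format(c(statistic = bp_stat, p.value = bp_p), digits = 10))
}
```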

As we can see, the test statistic and p-value are the same as those produced by Eviews. Although Eviews displays the p-value as 0.0000, if you right-click and select the copy option you can choose to copy at high precision, and then you can see the actual value. The Eviews value is reproduced below:


Now this is all good. We have the flexibility to specify the exact form of the auxiliary regression in bptest() (the ~ X + I(X^2) part). But Eviews gave us the entire auxiliary regression, not just the test statistic, which can help us identify the weights to use if, say, we want to run a weighted least squares regression. bptest() did not.

Now we are in a pickle. But do not fear. We can get that in R and more. But to do that we have to understand the White Test a bit.

What the White test does is take the squared residuals of the OLS regression and run an auxiliary regression of them on the regressors, their squares and their cross-products, as follows. (The example is for the two-regressor case. With a single regressor there is no cross-product and no second variable, so only X and X^2 remain.)

\text{Residuals}^{2} = \alpha_{0} + \alpha_{1} X_{1} + \alpha_{2} X_{2} + \alpha_{3} X_{1}^{2} + \alpha_{4} X_{2}^{2} + \alpha_{5} X_{1} X_{2}

The R-squared from this auxiliary regression is then multiplied by the number of observations to get the test statistic. Under the null hypothesis of homoscedasticity, i.e. that all the slope coefficients of the auxiliary regression are jointly zero, this test statistic asymptotically follows a chi-square distribution with k - 1 degrees of freedom (k is the number of parameters estimated in the auxiliary regression including the constant, so for the case just above df = 6 - 1 = 5).
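For the two-regressor formula above, the same recipe might look like this in R. This is only a sketch on simulated data: the variable names X1, X2, Y and u2, the seed and the data-generating process are all assumptions made for illustration.

```r
# Hypothetical two-regressor data whose error spread grows with X1
set.seed(1)
n <- 100
d <- data.frame(X1 = runif(n, 1, 10), X2 = runif(n, 1, 10))
d$Y <- 1 + 2 * d$X1 - d$X2 + rnorm(n, sd = d$X1)

# Step 1: original regression and its squared residuals
fit <- lm(Y ~ X1 + X2, data = d)
d$u2 <- resid(fit)^2

# Step 2: auxiliary regression on levels, squares and the cross-product
aux_fit <- lm(u2 ~ X1 + X2 + I(X1^2) + I(X2^2) + X1:X2, data = d)

# Step 3: n * R-squared, with df = (number of estimated parameters) - 1
k       <- length(coef(aux_fit))  # 6, including the intercept
stat    <- nobs(aux_fit) * summary(aux_fit)$r.squared
df_aux  <- k - 1                  # 5
p_value <- pchisq(stat, df_aux, lower.tail = FALSE)

print(c(statistic = stat, df = df_aux, p.value = p_value))
```

Counting the coefficients with length(coef(aux_fit)) rather than typing the degrees of freedom by hand keeps the calculation right if terms are added to or dropped from the auxiliary regression.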

We can set up this regression ourselves as below:

R1 <- resid(lm1)  # Extract  the residuals

aux <- R1^2   # Square the residuals

aux_lm <- lm(aux ~ X + I(X^2), data = data1)           # Run the auxiliary regression

summary(aux_lm)       # Print the results

The output that R produces is given below:

Call:
lm(formula = aux ~ X + I(X^2), data = data1)

Residuals:
    Min      1Q  Median      3Q     Max 
-214.95  -29.44   -4.00   31.48  326.25 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 137.1712    46.2453   2.966  0.00473 ** 
X           -23.4446     6.9708  -3.363  0.00154 ** 
I(X^2)        1.0779     0.2383   4.523 4.13e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 86.66 on 47 degrees of freedom
Multiple R-squared:  0.4266,	Adjusted R-squared:  0.4022 
F-statistic: 17.49 on 2 and 47 DF,  p-value: 2.105e-06

Yay! The same output as Eviews.
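Because the auxiliary regression is now an ordinary lm object, its fitted values can be used as rough variance estimates for the weighted least squares idea mentioned earlier. The sketch below is one possible way to do that, not a recommended procedure: the data are simulated stand-ins, the column names u2, var_hat and w are made up, and replacing non-positive fitted variances with the smallest positive one is purely an assumption of this sketch so that the weights stay valid.

```r
# Hypothetical stand-in data; the post's actual data1 is not shown
set.seed(2024)
data1 <- data.frame(X = seq(1, 25, length.out = 50))
data1$Y <- 5 + 2 * data1$X + rnorm(50, sd = 0.4 * data1$X)

# OLS fit and auxiliary regression of the squared residuals
lm1 <- lm(Y ~ X, data = data1)
data1$u2 <- resid(lm1)^2
aux_lm <- lm(u2 ~ X + I(X^2), data = data1)

# Fitted values of the auxiliary regression as estimated error variances
data1$var_hat <- fitted(aux_lm)

# Assumption of this sketch: a fitted variance that is not positive is
# replaced by the smallest positive fitted variance
bad <- data1$var_hat <= 0
data1$var_hat[bad] <- min(data1$var_hat[!bad])

# WLS with weights inversely proportional to the estimated variance
data1$w <- 1 / data1$var_hat
wls <- lm(Y ~ X, data = data1, weights = w)

print(rbind(OLS = coef(lm1), WLS = coef(wls)))
```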

The test statistic can be calculated as:

obs <- length(aux)

data2 <- summary(aux_lm)

R_Squared <- data2$r.squared

Test_Stat <- obs*R_Squared

The value of test statistic is:

[1] 21.33167

The same! Of course, we knew it would be 🙂

Now to get the p-value. The code is:

pchisq(Test_Stat,2,lower.tail = FALSE)

Notice that we used 2, which is the degrees of freedom: the auxiliary regression has 3 parameters, namely the constant and the coefficients on X and X^2, so df = 3 - 1 = 2.

And the output:


So now we know how to run the White test in R.
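If you run the test often, the manual steps can be wrapped in a small helper. The function below is a hypothetical convenience wrapper, not something provided by lmtest or base R. It assumes the data have no missing values and that the auxiliary formula uses the reserved response name u2, and the example data are simulated stand-ins.

```r
# Hypothetical helper: White test via n * R-squared of the auxiliary regression.
# `aux_formula` must use `u2` (the squared residuals) as its response.
white_test <- function(model, aux_formula, data) {
  aux_data <- data
  aux_data$u2 <- resid(model)^2               # squared OLS residuals
  aux_fit <- lm(aux_formula, data = aux_data) # auxiliary regression
  stat <- nobs(aux_fit) * summary(aux_fit)$r.squared
  df <- length(coef(aux_fit)) - 1             # parameters minus the constant
  list(statistic = stat,
       df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE),
       aux_fit = aux_fit)                     # kept so its summary can be inspected
}

# Example on simulated data (assumed, not the post's data)
set.seed(7)
data1 <- data.frame(X = seq(1, 25, length.out = 50))
data1$Y <- 5 + 2 * data1$X + rnorm(50, sd = 0.4 * data1$X)
lm1 <- lm(Y ~ X, data = data1)

res <- white_test(lm1, u2 ~ X + I(X^2), data1)
print(res[c("statistic", "df", "p.value")])
```

Returning aux_fit alongside the statistic means summary(res$aux_fit) gives the full auxiliary regression, which is the part bptest() does not print.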

Well, that does it. Hope this comes in handy for someone.


