Improving Panel Regression

2.1 Boundary Violations

The panel regression is specified as
\begin{equation}
{{y}_{it}}-{{\bar{y}}_{i}}=\left( {{x}_{it}}-{{{\bar{x}}}_{i}} \right)\beta +\left( {{e}_{it}}-{{{\bar{e}}}_{i}} \right).
\tag{2.1}
\end{equation}
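In Stata (xtreg, fe), the grand means $\bar{\bar{y}}$ and $\bar{\bar{x}}$ are added back to the regression model so that an intercept can be estimated (Gould, 2011):
\begin{equation*}
{{y}_{it}}-{{\bar{y}}_{i}}+\bar{\bar{y}}=\alpha +\left( {{x}_{it}}-{{{\bar{x}}}_{i}}+\bar{\bar{x}} \right)\beta +\left( {{e}_{it}}-{{{\bar{e}}}_{i}}+\bar{\bar{e}} \right).
\end{equation*}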
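The transformation can be illustrated with a minimal NumPy sketch. It assumes the panel is stored as flat arrays with an integer group label per row; the function names are hypothetical, and the code mirrors the idea behind Stata's estimator rather than reproducing Stata's implementation. Adding the grand means back leaves the slope equal to the within estimator and makes the intercept equal to $\bar{\bar{y}}-\bar{\bar{x}}\hat{\beta}$.

```python
import numpy as np


def within_plus_grand_mean(y, x, groups):
    """Subtract group means from y and x, then add back the grand means.

    y: (n,) outcome, x: (n, k) covariates, groups: (n,) group labels.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    y_t, x_t = y.copy(), x.copy()
    for g in np.unique(groups):
        m = groups == g
        y_t[m] -= y[m].mean()        # y_it - ybar_i
        x_t[m] -= x[m].mean(axis=0)  # x_it - xbar_i
    # Adding the grand means back identifies an overall intercept.
    return y_t + y.mean(), x_t + x.mean(axis=0)


def fe_with_intercept(y, x, groups):
    """OLS of the transformed y on a constant and the transformed x."""
    y_t, x_t = within_plus_grand_mean(y, x, groups)
    design = np.column_stack([np.ones(len(y_t)), x_t])
    coef, *_ = np.linalg.lstsq(design, y_t, rcond=None)
    return coef[0], coef[1:]


if __name__ == "__main__":
    # Simulated panel: 5 groups observed 8 times each (illustrative data only).
    rng = np.random.default_rng(0)
    groups = np.repeat(np.arange(5), 8)
    x = rng.normal(size=(40, 2)) + groups[:, None]
    group_effects = rng.normal(size=5)[groups]
    noise = rng.normal(scale=0.1, size=40)
    y = 0.5 + x @ np.array([1.5, -0.7]) + group_effects + noise
    alpha, beta = fe_with_intercept(y, x, groups)
    # The intercept equals the grand mean of y minus the grand mean of x times beta.
    print(alpha, y.mean() - x.mean(axis=0) @ beta)
    print(beta)
```

Because each group's deviations sum to zero, the transformed variables have means $\bar{\bar{y}}$ and $\bar{\bar{x}}$, which is why the constant absorbs exactly the grand means.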

There are two kinds of boundary violations: empirical and theoretical out-of-bounds predictions. An empirical boundary violation is defined as
\begin{equation*}
\left( {{x}_{it}}-{{{\bar{x}}}_{i}} \right)\hat{\beta }<a \qquad \text{or} \qquad \left( {{x}_{it}}-{{{\bar{x}}}_{i}} \right)\hat{\beta }>b,
\end{equation*}
where $a$ and $b$ are the lower and upper bounds of the dependent variable ${{y}_{it}}$. In plain language, an empirical boundary violation occurs if the predicted value of any observed case falls outside the permissible range.⁷
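The definition can be expressed as a simple check, sketched here under the assumption that the covariates are held in an $(n, k)$ array in the same form used for estimation. For the definition above these are the demeaned covariates and the intercept is zero; the optional intercept argument and the function name are hypothetical additions for the grand-mean version described in footnote 7.

```python
import numpy as np


def empirical_violations(x, beta, a, b, intercept=0.0):
    """Boolean mask of observations whose predicted value lies outside [a, b].

    x: (n, k) covariates in the form used for estimation (for the definition
    above, the demeaned covariates x_it - xbar_i); beta: (k,) estimates;
    intercept: optional constant (0 for the purely demeaned model).
    """
    pred = intercept + np.asarray(x, dtype=float) @ np.asarray(beta, dtype=float)
    return (pred < a) | (pred > b)


if __name__ == "__main__":
    x = np.array([[0.2, 0.1], [1.0, 1.0], [-1.0, -0.5]])  # toy covariate rows
    beta = np.array([0.5, 0.3])                           # hypothetical estimates
    mask = empirical_violations(x, beta, a=0.0, b=1.0)
    print(mask, int(mask.sum()))
```

The mask marks every observed case that violates the bounds, so its sum counts empirical boundary violations.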

A theoretical boundary violation occurs when an out-of-bounds predicted value arises for any possible observation within the covariate space defined by the empirical data. For example, the ranges of the covariates Partisan composition and Turnout are [10.14%, 88.98%] and [20.37%, 100%], respectively. Given the estimates shown in Table 1, the predicted Democratic vote share is 128.96% for a case with Partisan composition=88.98% and Turnout=100% in the 1964 presidential election if the incumbent is a Republican. Likewise, the predicted vote share is -20.7% for a case with Partisan composition=10.14% and Turnout=20.37% in the 1972 presidential election if the incumbent is a Democrat. Both cases are theoretical boundary violations: these combinations of covariate values are logically possible even though they do not appear in the data.

The same problem appears in less extreme examples. In the 1964 presidential election, if the incumbent is a Republican and Partisan composition=60% and Turnout=60%, the predicted vote share is 108.22%. In the 1972 presidential election, if the incumbent is a Democrat and Partisan composition=40% and Turnout=40%, the predicted vote share is -2.62%. Again, neither case is an outlier: apart from the value of the GOP Incumbent variable, similar cases exist within the observed temporal domain. For the former, there are 37 cases in which both covariates lie within a 1% margin of 60%, found in all temporal units except '52, '84, and '88. For the latter, there are seven cases in which both covariates lie within a 1% margin of 40%, found in '56, '76, '88, '98, and '00.
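Because the predictor is linear in the covariates, its smallest and largest values over the observed covariate ranges occur at corners of that box, which is why the examples above pair extreme values of both covariates. The sketch below finds those corners directly. It assumes each year and incumbent scenario is folded into a single scenario intercept (constant plus year effect plus GOP Incumbent effect); the slope and intercept values are hypothetical placeholders, not the Table 1 estimates, and the function names are illustrative.

```python
import numpy as np


def prediction_range(intercept, beta, lower, upper):
    """Smallest and largest value of intercept + x @ beta over lower <= x <= upper.

    A linear predictor is monotone in each covariate, so the minimum takes each
    covariate at the bound that makes its term smallest (lower bound for a
    nonnegative coefficient, upper bound otherwise); the maximum does the reverse.
    Returns (min_value, argmin_corner, max_value, argmax_corner).
    """
    beta = np.asarray(beta, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    x_min = np.where(beta >= 0, lower, upper)
    x_max = np.where(beta >= 0, upper, lower)
    return intercept + x_min @ beta, x_min, intercept + x_max @ beta, x_max


def theoretical_violation(intercept, beta, lower, upper, a, b):
    """True if some covariate combination inside the box predicts outside [a, b]."""
    lo, _, hi, _ = prediction_range(intercept, beta, lower, upper)
    return bool(lo < a or hi > b)


if __name__ == "__main__":
    # Observed ranges of Partisan composition and Turnout (percent), from the text.
    lower, upper = [10.14, 20.37], [88.98, 100.0]
    # Hypothetical placeholders, NOT the Table 1 estimates.
    beta = [0.5, 0.15]
    for scenario_intercept in (40.0, 60.0):
        lo, x_lo, hi, x_hi = prediction_range(scenario_intercept, beta, lower, upper)
        flag = theoretical_violation(scenario_intercept, beta, lower, upper, 0.0, 100.0)
        print(scenario_intercept, lo, x_lo, hi, x_hi, flag)
```

Checking every year and incumbent scenario this way covers the whole covariate space without enumerating observations, since no interior point can produce a more extreme prediction than the corners.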

Some scholars may argue that only empirical observations count and that theoretical out-of-bounds cases need not be considered. However, such an argument contradicts the fundamental reasoning of statistical inference, that is, using a variable to decontextualize a concept for universal comparison (Kellstedt and Whitten, 2009: 7-14). If the GOP Incumbent variable in 1964 could only be 0 in order to reflect the historical truth, then all observations in our empirical data would have to be viewed as idiosyncratic.⁸ In that case, statistical inference would not be possible. For this reason, we should treat theoretical boundary violations as a sign of invalid parameter estimates.

____________________

Footnote

⁷The dependent variable has different meanings depending on the demeaning operation applied. If ${{y}_{it}}$ is used without further transformation, it denotes vote share and is bounded by $0$ and $1$. If ${{y}_{it}}$ is demeaned by ${{\bar{y}}_{i}}$, ${{y}_{it}}-{{\bar{y}}_{i}}$ represents within-group deviations. If the grand mean is added back, ${{y}_{it}}-{{\bar{y}}_{i}}+\bar{\bar{y}}$ represents within-group variation plus a baseline vote share. This is how Stata reports a fixed-effects panel regression with a constant term.

⁸We do not object to interpreting a time dummy as a composite estimate of the time effect in a given place; under that interpretation, all idiosyncratic effects have already been lumped into a single measure. However, this does not imply that time dummies are unique. Rather, the maximum or minimum of such measures reflects the greatest or least effect of the time factor that has ever occurred. Since such an effect has occurred before, we have no reason to rule out the possibility that it will occur again.

Solving Problems in the Panel Regression Model for Truncated
Dependent Variables: A Constrained Optimization Method