Optimization to the Truncated Regression Model

The current truncated regression model (TRM) suffers significantly from boundary violations, and under no circumstances can an ineligible solution achieve inferential validity. This article demonstrates that the problem is widespread in both the OLS and TRM models when regression analysis is applied to a truncated normal dependent variable. To resolve it, I propose a modified truncated regression model (TRMCO) that incorporates the techniques of constrained optimization, eliminates boundary violations, and generates admissible and interpretable results. The contribution of this article is twofold. First, applying the nonlinear programming method of sequential quadratic programming (SQP) solves the boundary violation problem in maximum likelihood parameter estimation. Second, the article provides simulation evidence and a replication study to demonstrate the superiority of TRMCO over the existing model.

The findings in this article have profound implications for statistical theory as well as empirical application. From a theoretical perspective, the plausibility of the distributional assumption for the dependent variable is critical to inferential validity. When boundary limits exist for a normal random variable, failing to specify boundary constraints leads to invalid statistical inference. Unlike TRMCO, the current model does not solve the problem of boundary violations, nor does the existing literature discuss how boundary violations affect the validity and interpretability of regression results. This article shows that boundary violations can be fixed, and hence researchers need not compromise by accepting ineligible results.

From an empirical perspective, this article demonstrates how to work directly with a truncated normal distribution by maximum likelihood within the framework of constrained optimization. This involves setting up boundary constraints under the centered or fixed model specification and applying the sequential quadratic programming algorithm. Together, these steps yield a new regression method that is free of boundary violations.
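The general idea can be illustrated with a minimal sketch, offered under stated assumptions rather than as the TRMCO implementation. It maximizes a truncated normal likelihood with SciPy's SLSQP routine (one SQP-type solver), and it assumes that a boundary constraint means every fitted mean must lie within the truncation limits [a, b]. The centered and fixed specifications discussed in this article are not reproduced, and the function names, simulated data, and parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, truncnorm


def neg_loglik(params, X, y, a, b):
    """Negative log-likelihood of a normal truncated to [a, b]."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)          # log-parameterized so sigma stays positive
    mu = X @ beta
    log_pdf = norm.logpdf((y - mu) / sigma) - log_sigma
    mass = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
    log_mass = np.log(np.maximum(mass, 1e-300))   # floor guards against log(0)
    return -np.sum(log_pdf - log_mass)


def fit_constrained(X, y, a, b):
    """Maximize the likelihood subject to a <= X @ beta <= b (assumed constraint)."""
    k = X.shape[1]
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    start = np.append(beta0, np.log(y.std()))
    constraints = [
        {"type": "ineq", "fun": lambda p: X @ p[:k] - a},  # fitted mean >= a
        {"type": "ineq", "fun": lambda p: b - X @ p[:k]},  # fitted mean <= b
    ]
    return minimize(neg_loglik, start, args=(X, y, a, b),
                    method="SLSQP", constraints=constraints)


# Hypothetical simulated data: a share-type outcome bounded in [0, 1].
rng = np.random.default_rng(0)
n, a, b, sigma_true = 200, 0.0, 1.0, 0.15
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
mu_true = 0.2 + 0.6 * x
y = truncnorm.rvs((a - mu_true) / sigma_true, (b - mu_true) / sigma_true,
                  loc=mu_true, scale=sigma_true, size=n, random_state=rng)

result = fit_constrained(X, y, a, b)
fitted = X @ result.x[:-1]
print("coefficients:", result.x[:-1], "sigma:", np.exp(result.x[-1]))
print("fitted means within bounds:", bool(fitted.min() >= a - 1e-6 and fitted.max() <= b + 1e-6))
```

In this sketch the constraints are imposed at the observed covariate values only; a different specification of the constraint set would change the `constraints` list but not the overall structure.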

Many studies in political science analyze truncated normal dependent variables. In addition to a party's vote share, any variable that uses a percentage measure is likely subject to boundary restrictions, such as voter turnout or a politician's approval rating. Other variables also have boundary restrictions, but these are largely neglected because the untruncated normal distribution works fairly well; examples include test scores and the effective number of parties. Still other variables, such as media exposure or formal education, have implicit boundary restrictions of which researchers are often unaware. Given these situations, researchers are strongly advised to compare the results of their original model with those of the TRMCO model and to check the robustness of their regression outcomes. Otherwise, inferential validity could be seriously compromised if boundary violations actually occur.