1.1 Panel Regression with Truncated Dependent Variables

Panel (or time-series-cross-section, TSCS hereafter) data are commonly used in political science studies (Wawro, 2002; Beck and Katz, 1995; 2007; Beck, 2007; Adolph, Butler, and Wilson, 2005). In particular, many of these studies analyze a truncated dependent variable (Gomez, Hansford, and Krause, 2007; Knack, 1995; Baek, 2009; Boyne et al., 2009). In American politics, the study of political participation relies on aggregate-level (state or county) data about voter turnout over multiple time periods, such as the Current Population Survey (CPS) (Sides, Schickler, and Citrin, 2008). In political economy, studies of sovereign bond ratings use data from S&P and Moody's that are measured on a limited-point scale across different countries in multiple years (Biglaiser, 2007). In world politics, cross-national military spending over the past two centuries is measured as a percentage of GDP in the Correlates of War Project (COW) (Fordham and Walker, 2005). In comparative politics, research on a party's vote share draws on electoral datasets, such as the Democratic Electoral Systems Around the World dataset (DESAW) (Golder, 2005). All of the aforementioned studies use data that possess both spatial and temporal characteristics, and in each case the target of investigation is a dependent variable that has boundary restrictions.1

The standard method of analyzing panel data is panel data regression (Greene, 2008: 180-213), in which the within- and between-groups estimators are applied to the fixed-effects or random-effects model, as implemented, for example, in the xtreg command in Stata (McCaffrey et al., 2010). The basic idea is to purge between-groups variance by subtracting the group means from the pooled regression, so that the OLS method can be applied to the within-groups regression. In other words, all the constants of the spatial units, which represent any omitted time-invariant variables, are canceled out by the de-meaning operation (Wooldridge, 2005). Hence, the OLS estimator is BLUE (Baltagi, 2011: 308). This approach is equivalent to least squares dummy variable (LSDV) estimation, typically known as the fixed-effects (FE) model (Hsiao, 2003: 30-33). Its advantage is that it avoids working with a large covariate matrix when the number of spatial units is large. The approach achieves this by exploiting an important property: the equivalence of differencing and dummying (Wicharaya, 1995: 200).
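The equivalence of differencing and dummying can be checked numerically. The following is a minimal sketch on simulated data, not the authors' replication code: the panel dimensions, the coefficient value, and all variable names are assumptions chosen for illustration. It estimates the slope once by OLS on de-meaned data and once by OLS with one dummy per spatial unit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced panel: 5 spatial units observed over 8 periods.
n_units, n_periods = 5, 8
unit = np.repeat(np.arange(n_units), n_periods)

# Unit-specific constants stand in for omitted time-invariant variables.
alpha = rng.normal(size=n_units)
x = rng.normal(size=n_units * n_periods) + alpha[unit]  # regressor correlated with alpha
beta_true = 2.0  # assumed slope, for illustration only
y = alpha[unit] + beta_true * x + rng.normal(scale=0.5, size=x.size)


def demean_by_unit(v, unit, n_units):
    """Subtract each unit's time mean from its own observations."""
    sums = np.bincount(unit, weights=v, minlength=n_units)
    counts = np.bincount(unit, minlength=n_units)
    return v - (sums / counts)[unit]


# Differencing: OLS on the de-meaned (within-groups) data, no intercept.
x_w = demean_by_unit(x, unit, n_units)
y_w = demean_by_unit(y, unit, n_units)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Dummying: OLS on x plus one indicator column per spatial unit (LSDV).
dummies = np.eye(n_units)[unit]
design = np.column_stack([x, dummies])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_lsdv = coef[0]

print("within estimate:", beta_within)
print("LSDV estimate:  ", beta_lsdv)
```

The two slope estimates agree up to floating-point error, while the within-groups regression never has to build the column of dummies, which is the computational advantage when the number of spatial units is large.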

However, when the dependent variable follows a truncated normal distribution, differencing is not equivalent to dummying, because the time mean of the truncated dependent variable is a biased estimate of the district-level location parameter (Hsiao, 2003: 243). This means that using the time mean to characterize the contextual effect of the spatial units is not valid (Alan et al., 2011). Therefore, a methodological problem arises if we apply panel data regression to the analysis of a truncated dependent variable.
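The bias of the time mean can be illustrated with a small simulation. This is a hedged sketch under assumed values: a single district with location parameter 0.9 and scale 0.2, with the dependent variable truncated to the interval [0, 1] (as a turnout rate would be). The helper names are hypothetical, and the sampler uses simple rejection sampling rather than any method from the paper.

```python
import math
import numpy as np


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_mean(mu, sigma, lower, upper):
    """Closed-form mean of a normal(mu, sigma) truncated to [lower, upper]."""
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    return mu + sigma * (normal_pdf(a) - normal_pdf(b)) / (normal_cdf(b) - normal_cdf(a))


def sample_truncated_normal(rng, mu, sigma, lower, upper, size):
    """Rejection sampling: draw normals and keep only those inside the bounds."""
    kept = np.empty(0)
    while kept.size < size:
        draws = rng.normal(mu, sigma, size=size)
        kept = np.concatenate([kept, draws[(draws >= lower) & (draws <= upper)]])
    return kept[:size]


rng = np.random.default_rng(1)

# Assumed values for one district; the bounds mimic a proportion in [0, 1].
mu, sigma, lower, upper = 0.9, 0.2, 0.0, 1.0

# A very long "time series" for the district, so sampling noise is negligible.
y = sample_truncated_normal(rng, mu, sigma, lower, upper, size=100_000)
time_mean = y.mean()
analytic_mean = truncated_normal_mean(mu, sigma, lower, upper)

print("location parameter:      ", mu)
print("time mean of truncated y:", time_mean)
print("analytic truncated mean: ", analytic_mean)
```

Even with a very long series, the time mean settles on the mean of the truncated distribution rather than on the location parameter, so subtracting it does not remove the district-level constant in the way de-meaning does in the untruncated case.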

____________________

Footnotes

1 Most of these studies do alert readers to the limited range of the dependent variable. However, few actually discuss the statistical properties of a truncated random variable. In fact, many convenient properties of a normal dependent variable do not hold for a truncated normal random variable. For example, a linear transformation of a truncated normal random variable does not generate a truncated normal random variable. It has been proven that simple arithmetic properties, such as additivity, do not hold for a truncated normal random variable. See Horrace (2005a; 2005b).

2 We do not intend to discuss all estimators for panel regression. Rather, we employ the simplest method to illuminate the basic problems. For further discussion of strategies for analyzing TSCS data, see Franzese (2005) and Franzese and Hays (2007).
