21st Century Political Science.pdf

59

LONGITUDINAL ANALYSIS

SHANE NORDYKE

University of South Dakota

Many of the most important questions that are addressed within the field of political science today involve changes and processes that occur over time. The passing of time is often an integral part of the question being answered. What is the effect of the International Monetary Fund on stability in developing countries? What are the short- and long-term effects of significant policy programs? How do different types of electoral systems affect voter turnout? What variables are important for understanding congressional behavior? How does decision making differ for representatives during election years? These are all important questions within various subfields of political science that in one way or another involve changes that take place over time. As Jennings and Niemi (1978) note, “Questions of persistence and change are also fundamental to an understanding of how people cope with and relate to political phenomena” (p. 333). Longitudinal data allow for analysis that is dynamic in nature and are often better able to address these questions than methods that are available for cross-sectional data. This chapter provides the following:

• A brief overview of longitudinal data

• A description of various methods that are available for its analysis

• A discussion of the benefits and challenges of using this type of data

• The historical development of these methods, including the significant contributions of political science scholars

• A brief description of statistical programs that have been developed specifically for longitudinal data

Longitudinal analysis isn’t a specific methodology or form of analysis. Rather, it refers to a broad category of empirical models and methods that have been developed for evaluating longitudinal data. Studies incorporating longitudinal data have been contributing to the field of knowledge in the social sciences for more than 50 years. Longitudinal data includes any data set that has a group of individual observations repeated at multiple times. A good example is the work by M. Kent Jennings on intra- and intergenerational effects on political attitudes. In 1965, Jennings and Niemi surveyed a national probability sample of high school seniors and their parents. Eight years later, they were able to collect data again from more than three-fourths of their original sample, providing a second time period of data for their large cross-sectional panel. For the next 40 years, they continued to collect data from this same group and eventually added the children of the original respondents to the sample as well. By taking repeated measures of the same individuals, Jennings (2002) was able to “observe how formative political experiences can affect intragenerational cleavages over the adult life space and how they reflect on intergenerational continuities” (p. 303) in ways that would not have been possible with a cross-sectional analysis of a group of individuals at only one time.

498 • POLITICAL SCIENCE METHODOLOGY

An analysis of longitudinal data can combine the benefits of cross-sectional analysis in its ability to make between-subject comparisons with the added value of observing changes through repeated observations. The development and expansion of these methods have often occurred in other fields, particularly biomedical sciences, psychology, economics, and education, where it is more common for multiple points of data across time to be collected from a specific identified population. However, it is an important method in the discipline of political science because it allows for dynamic analysis that can address changes over time and allows for more powerful inferences with regard to causality. Recent advances in statistical software programs, which allow for the statistical analysis of longitudinal data sets, and the availability of large-scale longitudinal data sets have made longitudinal analysis more accessible to a wider range of political science scholars.

Cross-Sectional Versus Longitudinal Analysis

Cross-sectional data analysis draws inferences about the population by using a large sample of observations to analyze correlations between variables of interest. Cross-sectional analysis provides a snapshot of the population at a single time. Time-series analysis focuses on analyzing single observations for multiple periods. In time-series analysis, time is the primary, and in some cases the only, independent variable of concern. The analysis of longitudinal data combines the contributions of both by evaluating the same cross section of data at multiple observation points. This is not the same as simply repeating a cross-sectional analysis at a different time; to be considered longitudinal data and to reap the advantages that it provides, the individuals or subjects evaluated must remain fixed over time.

An example can help to illustrate the differences. Suppose that an analyst is interested in understanding voter turnout in presidential elections in the United States. A cross-sectional analysis of voter turnout in a large sample of individual counties across the United States may allow the analyst to draw inferences about the effects of important independent variables such as age, median income, educational level, and party affiliation on voter turnout by analyzing the correlations in these variables between subjects (in this case counties). He or she would not, however, be able to analyze how turnouts have changed over time or how the effect of each of the individual variables changes over time. The most a cross-sectional analysis could say about the effect of age on voting behavior is that voters who are 60 years old this year are more or less likely to vote than those who are 30 years old this year, not how aging itself affects an individual’s voting behavior. A time-series analysis, on the other hand, might analyze the national turnout rate for every presidential election for the past 40 years. In doing so, the analyst discovers changes in the overall turnout rate over time (for example, in year t voter turnout is 5% greater than in year t + 4) but not how individual voters have changed or what caused the changes in turnout over time. Time-series analysis is also limited in that it can’t tell whether 60% of the people vote all the time or if everyone votes 60% of the time. Longitudinal data, analyzed properly, can answer all of these questions by making the most of the information about within-subject differences and between-subject differences. A panel designed to answer this question would likely consist of a large sample of individuals surveyed about their voting habits as well as the independent and control variables that are important to the researcher. The same individuals would again be surveyed repeatedly at intervals appropriate to the research question.
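The point about 60% turnout can be made concrete with a small sketch. The two panels below are invented purely for illustration (10 hypothetical voters, 5 elections); the function names are assumptions, not part of any library. Both panels produce the same aggregate turnout series, yet the individual histories differ completely, which only repeated observation of the same people can reveal.

```python
# Hypothetical data: 10 voters observed over 5 elections (1 = voted, 0 = did not).

def habitual_panel():
    """Six of ten people vote in every election; the other four never vote."""
    return [[1 if person < 6 else 0 for _ in range(5)] for person in range(10)]

def rotating_panel():
    """Every person votes in exactly three of the five elections."""
    return [[1 if (person + election) % 5 < 3 else 0 for election in range(5)]
            for person in range(10)]

def turnout_by_election(panel):
    """Aggregate view: share of people voting in each election (time series)."""
    n_people = len(panel)
    n_elections = len(panel[0])
    return [sum(row[t] for row in panel) / n_people for t in range(n_elections)]

def turnout_by_person(panel):
    """Longitudinal view: share of elections in which each person voted."""
    return [sum(row) / len(row) for row in panel]

habitual = habitual_panel()
rotating = rotating_panel()

# The aggregate series cannot tell the two electorates apart ...
print(turnout_by_election(habitual))
print(turnout_by_election(rotating))
# ... but the within-person histories can.
print(sorted(set(turnout_by_person(habitual))))
print(sorted(set(turnout_by_person(rotating))))
```

In this sketch the aggregate series is what a time-series analyst would see, while `turnout_by_person` uses the within-subject information that only a panel provides.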

In his article, Lubotsky (2007) is able to expand on previous research analyzing wage changes for immigrants. Previous studies had used averages from cross-sectional data taken from two different age-groups in order to estimate wage changes of immigrants over their first 20 years in the United States without taking into consideration changes in the composition of immigrant populations, which might also affect earnings. By using a longitudinal data set that tracked individuals for this period, Lubotsky was able to overcome analytical weaknesses of the cross-sectional data and find that “the actual earnings growth of immigrants who remain in the United States is considerably slower than that implied by comparisons across decennial censuses” (p. 824).

In addition to analyzing change, longitudinal analysis can also strengthen studies interested in causal relationships. A primary weakness of cross-sectional analyses is that even the most carefully designed studies can speak only to correlation between variables and use those correlations to make guarded inferences about causality. This is true even if the study is repeated several times with different samples:

Whereas the cross-sectional method infers causation from the presence of correlations, the longitudinal method permits the use of the far more dependable technique for inferring causation by watching the changes as the specified variables interact over a period of time. The longitudinal method can provide the materials for studying change because it obtains information for each individual over a period of time, in a sufficient and properly selected group, which is then combined with the information for other individuals into common classifications and appropriate summaries. The emphasis is placed on each individual history. (Goldfarb, 1960, p. 8)

With cross-sectional analysis, the strength of the results lies in the correlations between the independent and dependent variables. However, the analyst must constantly be on the lookout for spurious relationships: correlations that appear to exist between two variables that are unrelated but instead are both related to some third variable not included in the model. Even if there is a relationship between two variables, the nature, size, and strength of that relationship can be misestimated if unobserved variables that are related to both the independent and dependent variable are omitted. Perhaps the most often cited example is that of the relationship between education and job attainment or income. Both education and job attainment are likely related to individual aptitude or ability, which is difficult to observe or measure in any meaningful way. In a cross-sectional analysis, there isn’t a way to disentangle the two to determine the independent effect of additional education by controlling for individual aptitude or ability. A longitudinal analysis, though, which would track the same individual for multiple years, can hold the individual’s aptitude as a constant, since it is within that individual, and then observe only the effect of additional years of education on that individual’s job attainment or income. In other words, longitudinal analysis allows analysts to control for unobserved factors that may be specific to the individual subject in order to evaluate the independent variable of interest. In political science, the unobserved factors of a specific subject can often be of even greater concern. For example, a nation’s unique history, culture, composition, and geography will likely have an important effect on many dependent variables of interest that would be extremely difficult to incorporate into a formal model.
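The education and ability example can be sketched as a small simulation. Everything here is an assumption made for illustration: the data are simulated, the true effect of education is set to 2.0, and the coefficients linking ability to education and income are arbitrary. The sketch compares a pooled regression slope, which ignores the unobserved ability, with a slope computed after subtracting each person's own averages, which holds that person's constant ability fixed.

```python
import random

def simulate_panel(n_people=500, n_years=5, true_effect=2.0, seed=0):
    """Simulate (person, education, income) rows with an unobserved ability term."""
    rng = random.Random(seed)
    rows = []
    for person in range(n_people):
        ability = rng.gauss(0, 1)  # unobserved and constant within a person
        for year in range(n_years):
            # Abler people tend to have more education (the source of the bias).
            education = 12 + 0.5 * year + 1.5 * ability + rng.gauss(0, 1)
            income = 10 + true_effect * education + 3.0 * ability + rng.gauss(0, 1)
            rows.append((person, education, income))
    return rows

def slope(xs, ys):
    """Bivariate least squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def pooled_slope(rows):
    """Treat all person-years as one cross section, ignoring who is who."""
    return slope([r[1] for r in rows], [r[2] for r in rows])

def within_slope(rows):
    """Subtract each person's own means first, so constant ability drops out."""
    by_person = {}
    for person, education, income in rows:
        by_person.setdefault(person, []).append((education, income))
    xs, ys = [], []
    for observations in by_person.values():
        mean_x = sum(o[0] for o in observations) / len(observations)
        mean_y = sum(o[1] for o in observations) / len(observations)
        for education, income in observations:
            xs.append(education - mean_x)
            ys.append(income - mean_y)
    return slope(xs, ys)

panel = simulate_panel()
pooled = pooled_slope(panel)
within = within_slope(panel)
print(f"pooled estimate: {pooled:.2f}")   # overstates the effect of education
print(f"within estimate: {within:.2f}")   # close to the assumed true effect of 2.0
```

The within-person comparison is only one way to exploit repeated measures; it is shown here because it mirrors the verbal argument about holding aptitude constant.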

To summarize, there are several advantages of analyzing longitudinal data. It allows for the observation of changes and patterns in changes over time, including within-individual change and interindividual differences in change (Singer & Willett, 2003). It offers the ability to control for heterogeneity created by omitted variables that is typical in political science and economic data by dealing with intersubject variation, and it allows for much greater flexibility in modeling differences in behavior across individuals. Although longitudinal analysis can be a powerful tool for answering a multitude of questions within the discipline, it comes with heavy costs. Collecting the quantity and quality of data that is needed is time-consuming and often expensive. For future waves of data to be collected, the researcher must have a plan and make provisions for the cost of future observations. (Retrospective studies obviously wouldn’t face this particular challenge.) Another particular challenge of longitudinal methods is the loss of subjects. Subjects on a panel, particularly individuals who are asked to provide repeated measurements or take repeated surveys, will often drop out for a variety of reasons. Subjects may move out of the area of interest, die, simply stop participating out of boredom or disinterest, or otherwise leave the sample. This attrition can create missing data, which can have significant consequences for analyzing and interpreting the data. This is important to keep in mind when analysts are determining the frequency and length of surveys that will be taken. In some cases, it may be worth sacrificing the additional data that could come from longer surveys or more frequent measurements in order to prevent attrition among participants. In some cases, researchers may also have to worry about the possible effects of interviewing on the respondent if it will possibly have an impact on independent or dependent variables of interest (Goldfarb, 1960). It is not unreasonable to think that an individual’s level of interest in politics, political behavior or activism, and attitudes could all possibly be impacted by the experience of participating in several interviews regarding these subjects.

It is often the case that combining cross-sectional analysis with analysis of available longitudinal data can strengthen the confidence of conclusions in an efficient manner. In their study of the relationship between cumulative voting systems and voter turnout, Bowler, Brockington, and Donovan (2001) combine a cross-sectional and a longitudinal approach. For the cross-sectional analysis, they compare locations with cumulative voting systems with like locations using plurality elections to examine differences in voter turnout. Although they attempt to control as much as possible for differences between communities, their analysis is strengthened by using a longitudinal analysis design to evaluate data for turnout before and after cities changed from plurality systems to cumulative voting systems; they collect data from localities with cumulative voting systems for all of the elections since the conversion and the last three elections prior to transitioning. Data is collected at the city level for fixed time increments since they correlate with election periods. In this case, combining the methods helped to strengthen confidence in the outcomes. Goldfarb (1960) also suggests that it can be useful to conduct a cross-sectional analysis prior to embarking on longitudinal data collections because conducting the less expensive cross-sectional study first can help researchers to identify and develop important concepts and measures earlier on.

Historical Developments

As is the case with many statistical methods, including classic linear regression models, longitudinal analysis can trace its origins back to the field of astronomy (Fitzmaurice & Molenberghs, 2008). Within the last 40 years, methods for longitudinal analysis have advanced substantially, in part because of increased governmental funding for large-scale longitudinal studies and increased computing power and software sophistication. As might be expected, the earliest approaches to analyzing change through longitudinal data were grounded in analysis of variance (ANOVA) methods. In the late 1970s and early 1980s, significant works in the life sciences (Laird & Ware, 1982) and the humanities (Goldstein, 1979) demonstrated the usefulness of ANOVA methods for analyzing longitudinal data and brought increased attention to the development of the methods. As Fitzmaurice and Molenberghs (2008) note, “While ANOVA methods can provide a reasonable basis for a longitudinal analysis in cases where the study design is very simple, they have many shortcomings that have limited their usefulness in applications” (p. 7). To provide valid estimates, these models required adherence to a number of strict assumptions, including the assumption that variance remains constant over time. The models were designed to apply to experimental research designs where the number of observation points was fixed and common to all individuals. These restrictions were generally untenable with real-world data and models, particularly within political science. As Fitzmaurice and Molenberghs explain,

It was these features of longitudinal data that provided the impetus for statisticians to develop far more versatile techniques that can handle the commonly encountered problems of data that are unbalanced and incomplete, mistimed measurements, time-varying and time-invariant covariates, and responses that are discrete rather than continuous. (p. 7)

Current Approaches

Linear mixed-effects models are the most common current method for analyzing longitudinal data with a continuous dependent variable. Investigators in the biomedical sciences, education, psychology, and social sciences have contributed to the rapid development of algorithms and extensions of these types of models. As Fitzmaurice, Laird, and Ware (2004) note,

In the early 1980’s, Laird and Ware proposed the use of the EM algorithm to fit a class of linear mixed-effects models appropriate for the analysis of repeated measurements (Laird & Ware, 1982); Jennrich and Schluchter (1986) proposed a variety of alternative algorithms, including Fisher scoring and Newton-Raphson algorithms. Later in the decade Liang and Zeger introduced the generalized estimating equations in the biostatistical literature and proposed a family of generalized linear models for fitting repeated observations of binary and count data (Liang & Zeger, 1986; Zeger & Liang, 1986). (p. 2)

Today, these models can handle issues of unbalanced data; variables that do and do not vary with time; continuous, binary, and categorical dependent variables; relatively long or short panels; and a variety of other complications that are beyond the capacity of the earliest models. For an extensive and more thorough review of the evolution and history of these models, see Fitzmaurice and Molenberghs (2008).

Data Considerations

To be useful, longitudinal data must be carefully collected and organized. In cross-sectional analysis, individual data points are generally notated as x_i, with a single subscript i indicating the individual. Longitudinal data require that each data point have two identifiers, which can be notated as x_it, with an additional subscript t denoting the wave of measurement in time. The error term also would have two dimensions, one for the individual and one for the time period. There are two different arrangements for organizing longitudinal data.

A person-level data set has as many records as there are people in the sample. As you collect additional waves, the file gains new variables, not new cases. A person-period data set has many more records: one for each person-period combination. As you collect additional waves of data, the file gains new records, but no new variables. (Singer & Willett, 2003, p. 17)

 

Most of the commonly used statistical software packages can convert one form of data into the other with relative ease. Singer and Willett (2003) recommend, though, that when it comes to interpreting and analyzing data, the person-period data set is better for analysis of change over time. The STATA software package also requires that the data be in the person-period format to be recognized as panel data.
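The conversion between the two arrangements can be sketched in a few lines. This is a minimal illustration, not the routine any particular package uses: the function name, the `<variable>_<period>` column naming convention, and the sample values are all assumptions made for the example.

```python
def to_person_period(person_level, id_key, variables):
    """Reshape person-level (wide) records into person-period (long) records.

    Assumes wide columns are named '<variable>_<period>', e.g. 'turnout_2000'.
    """
    long_rows = []
    for record in person_level:
        # Collect every period that appears for any requested variable.
        periods = set()
        for key in record:
            if key != id_key and "_" in key:
                name, period = key.rsplit("_", 1)
                if name in variables:
                    periods.add(int(period))
        # Emit one record per subject-period combination.
        for period in sorted(periods):
            row = {id_key: record[id_key], "period": period}
            for name in variables:
                row[name] = record.get(f"{name}_{period}")  # None if missing
            long_rows.append(row)
    return long_rows

# Invented values: one record per state (person-level format).
wide = [
    {"state": "Alabama", "turnout_2000": 50.0, "turnout_2004": 57.0,
     "income_2000": 34.0, "income_2004": 37.0},
    {"state": "Alaska", "turnout_2000": 66.0, "turnout_2004": 69.0,
     "income_2000": 51.0, "income_2004": 54.0},
]

long_format = to_person_period(wide, "state", ["turnout", "income"])
for row in long_format:
    print(row)
```

With two states and two waves, the wide file has two records and the long file has four, which matches the description of new waves adding variables in one format and records in the other.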

An example of the organization of a typical longitudinal data set in person-period format is included below. In this example, states are the individual subjects, and years are the measure of time. There are three independent variables in the example.

 

 

 

 

 

 

State     Year   Dependent Variable   Independent Variable 1   Independent Variable 2   Independent Variable 3
Alabama   2000   X_1,1                Y1_1,1                   Y2_1,1                   Y3_1,1
Alabama   2001   X_1,2                Y1_1,2                   Y2_1,2                   Y3_1,2
Alabama   2002   X_1,3                Y1_1,3                   Y2_1,3                   Y3_1,3
...
Alabama   2008   X_1,9                Y1_1,9                   Y2_1,9                   Y3_1,9
Alaska    2001   X_2,1                Y1_2,1                   Y2_2,1                   Y3_2,1
Alaska    2002   X_2,2                Y1_2,2                   Y2_2,2                   Y3_2,2
...
Alaska    2008   X_2,9                Y1_2,9                   Y2_2,9                   Y3_2,9
...
Wyoming   2001   X_50,1               Y1_50,1                  Y2_50,1                  Y3_50,1
...
Wyoming   2008   X_50,9               Y1_50,9                  Y2_50,9                  Y3_50,9

 

 

 

 

 

 

Longitudinal data can be experimental, as is the case in many biomedical studies, or observational, as is more often the case in political science. The data can be collected prospectively after the research model has been designed or retrospectively by using existing historical data. Subjects can be individuals, families, organizations, states, and even nations, as is often the case in studies of international relations.

Time can be measured in a variety of ways as well. Pharmaceutical studies interested in the short-term effects of a drug may have observations every 15 minutes for a total of 2 hours. Some longitudinal data sets, however, have multiple years between observations. Most often, longitudinal data sets are wide (large number of subjects) but short (few time periods). Longer longitudinal data sets are now becoming more available, but analysis of these data sets may also have to address additional complications from using nonstationary data, a problem that has long been recognized in time-series analysis (Baltagi, 2005). Stationarity simply means that all the errors have the same expectation and a common variance that does not change over time (Fox, 1997).

There are a few additional ideas about the nature of the data that are worth noting. A balanced longitudinal data set is one in which every individual is observed for the same number of times and at the same intervals. This can often be the case in experimental designs where the entire group is measured, exposed to a particular treatment (perhaps a political campaign ad), and then measured a few subsequent times. If the time between intervals and the number of individuals is small, it may be relatively easy to ensure that they are all measured at the same intervals. In observational studies, which are more frequent in political science, unbalanced panels are much more common. The panel may be considered unbalanced because not all of the individuals are observed for the same number of times, there are missing data points for particular subjects during certain years, or the subjects are measured the same number of times but at differing intervals.
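A quick check for balance can be sketched as follows. The function names and the sample rows are hypothetical. Note that the sketch uses a slightly stricter rule than the definition above: it calls a panel balanced only when every subject is observed in exactly the same set of periods.

```python
def periods_by_subject(rows):
    """Map each subject to the set of periods in which it was observed."""
    observed = {}
    for subject, period in rows:
        observed.setdefault(subject, set()).add(period)
    return observed

def is_balanced(rows):
    """True when every subject is observed in exactly the same periods."""
    period_sets = list(periods_by_subject(rows).values())
    return all(periods == period_sets[0] for periods in period_sets)

def missing_cells(rows):
    """List the (subject, period) combinations absent from the panel."""
    observed = periods_by_subject(rows)
    all_periods = set().union(*observed.values()) if observed else set()
    return sorted((subject, period)
                  for subject, periods in observed.items()
                  for period in all_periods - periods)

# Invented person-period identifiers.
balanced_rows = [("Alabama", 2000), ("Alabama", 2002),
                 ("Alaska", 2000), ("Alaska", 2002)]
unbalanced_rows = [("Alabama", 2000), ("Alabama", 2002), ("Alaska", 2000)]

print(is_balanced(balanced_rows))      # every state observed in both years
print(is_balanced(unbalanced_rows))    # Alaska lacks a 2002 observation
print(missing_cells(unbalanced_rows))
```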

Although there are statistical methods for modeling and handling missing data, it is important that the analyst understand the nature of the missing data. Missing data is inherently a loss of information and can result in a loss of efficiency or a drop in precision. Perhaps more important, though, under certain circumstances, “the missing data can introduce bias and thereby lead to misleading inferences about changes in the mean response” (Fitzmaurice et al., 2004, p. 376). This is the most serious of the potential problems created by missing data. To know whether bias is likely to be introduced, it is important to understand the reason that the data is missing. If the dropped data points are randomly distributed and have occurred from random attrition among subjects, then it is likely not a concern; however, if those that have dropped out of the study are somehow systematically related to the outcome of interest, then bias is likely. For example, a researcher interested in civic participation may use a longitudinal survey study to evaluate commitment to or interest in civic participation in a sample of individuals. It is possible that those that drop out of the study may do so because they fail to recognize the value of studying civic participation and are likely less interested in civic participation in general. Missing data points that are somehow systematically related to variables of interest require greater care and should be addressed in the study design, or the systematic error should be quantified by the researcher.
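The difference between random and systematic attrition can be illustrated with a small simulation. The interest scores, dropout probabilities, and sample size below are invented assumptions, chosen only to show the direction of the problem: when less interested respondents are more likely to leave, the remaining sample overstates average civic interest.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate_attrition(n=5000, seed=1):
    """Return the full-sample mean and the means among two kinds of stayers."""
    rng = random.Random(seed)
    # Hypothetical civic-interest scores that would be measured at a later wave.
    interest = [rng.gauss(50, 10) for _ in range(n)]

    # Random attrition: everyone drops out with the same probability (0.3).
    random_stayers = [score for score in interest if rng.random() > 0.3]

    # Systematic attrition: less interested respondents drop out more often.
    systematic_stayers = [
        score for score in interest
        if rng.random() > (0.6 if score < 50 else 0.1)
    ]
    return mean(interest), mean(random_stayers), mean(systematic_stayers)

full_mean, random_mean, systematic_mean = simulate_attrition()
print(f"full sample:          {full_mean:.1f}")
print(f"random attrition:     {random_mean:.1f}")      # close to the full-sample mean
print(f"systematic attrition: {systematic_mean:.1f}")  # biased upward
```

Random attrition costs precision because fewer cases remain, but in this sketch only the systematic pattern shifts the estimated mean.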

Models for Longitudinal Analysis

Depending on the specific nature of the data being used, the research question being addressed, and often the discipline of the analysts, longitudinal analysis can be approached through one of several methodological choices. In some instances, longitudinal analysis is treated as a special case within time series. Bayesian models are still evolving in their ability to be applied to longitudinal data as well. Longitudinal data can also be treated as a special type of clustered or hierarchical data. Clustered data methods, such as robust regression, are typically used when individuals within the study are likely to be systematically correlated in meaningful clusters. Classic examples within the field of education are classrooms, schools, and districts. Students in the same class are likely to be related on the dependent variable in ways that are not captured by the independent variables in the model. In the case of longitudinal analysis, the clusters are made up of all of the observations for any one individual. The observations are likely to be more closely related to all other observations from that subject than to others.

Since they are perhaps the most commonly employed models within the political science literature, two types of linear regression models deserve particular attention: fixed effects and random effects. These are two different ways of modeling time and individual effects, and the choice between the two models depends on the nature of the data and the theories driving the model. The fixed effects model, also known as the least squares dummy variable (LSDV) model, generates a dummy variable for each cross-sectional unit (subject) to control for its individuality (again, think culture, unique history, or individual characteristics that are not incorporated into the model). This model arises from the assumption that the omitted effects are correlated with the included variables (Greene, 2008). Creating the dummy variables controls for this effect and ensures unbiased estimates of the relationship between the independent and dependent variables, but it does so at a heavy cost to efficiency. Each dummy variable results in a loss of a degree of freedom. When the number of subjects is very large, this loss of efficiency is substantial. Essentially, the fixed effects model uses dummy variables to throw out all of the between-subject variance, a step that is often essential to making causal arguments that need to be purged of third-variable (subject uniqueness) bias. This process also means that the fixed effects model cannot produce estimates for time-invariant variables, such as gender. If the variable does not change over time for the individual subject, or even if it changes very little, the fixed effects model will fail to provide precise coefficient estimates.

Because the random effects model is both more efficient and able to estimate time-invariant variables, it is used most often. However, the random effects model can produce biased estimates if the error terms, which are essentially a measure of all the latent or unmodeled omitted variables one tries to capture without modeling, are systematically related to the independent variables in the model. In other words, where states are the subject, a researcher needs to know if there is something about South Dakotans that he or she is not including in the model that will be correlated across all of the error components for South Dakota and that is also expected to be correlated with the independent variables in the model. One should address this question foremost with theory and an understanding of the data, but it can also be tested for empirically with the Hausman test.
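For readers who prefer notation, the two models can be written out as follows. This is a sketch in standard textbook notation, which is an assumption here since the chapter itself does not give equations: y_it is the dependent variable for subject i at time t, x_it the vector of independent variables, alpha_i a subject-specific intercept, and u_i a subject-specific random disturbance.

```latex
% Fixed effects (LSDV): a separate intercept \alpha_i for each subject
y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}

% Equivalent "within" form: subtracting subject means removes \alpha_i
% (and any variable that does not vary over time within a subject)
y_{it} - \bar{y}_{i} = (\mathbf{x}_{it} - \bar{\mathbf{x}}_{i})'\boldsymbol{\beta}
                       + (\varepsilon_{it} - \bar{\varepsilon}_{i})

% Random effects: the subject effect u_i is part of the error term and
% is assumed to be uncorrelated with the regressors
y_{it} = \alpha + \mathbf{x}_{it}'\boldsymbol{\beta} + u_i + \varepsilon_{it},
\qquad \operatorname{Cov}(u_i, \mathbf{x}_{it}) = 0
```

The within form makes clear why time-invariant variables cannot be estimated under fixed effects, and the last condition is the assumption whose failure produces bias under random effects.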

The Hausman test considers the null hypothesis that the difference in the coefficients is not systematic. As Yaffee (2005) notes,

The test for this correlation is a comparison of the covariance matrix of the regressors in the LSDV (Least Squares Dummy Variable) model with those in the random effects model. The null hypothesis is that there is no correlation. If there is no statistically significant difference between the covariance matrices of the two models, then the correlations of the random effects with the regressors are statistically insignificant. (p. 8)

If the null hypothesis is rejected, then the random effects model may be biased and is not appropriate (Gujarati, 2003), but if the null hypothesis is not rejected, the random effects model is more efficient and preferred because it allows for time invariance in the independent variables.
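The usual textbook form of the test statistic is sketched below. The notation is an assumption rather than the chapter's own: the b vectors hold the coefficient estimates on the time-varying regressors from the fixed effects (FE) and random effects (RE) models, and k is the number of those coefficients.

```latex
H = (\hat{\mathbf{b}}_{FE} - \hat{\mathbf{b}}_{RE})'
    \left[ \widehat{\operatorname{Var}}(\hat{\mathbf{b}}_{FE})
         - \widehat{\operatorname{Var}}(\hat{\mathbf{b}}_{RE}) \right]^{-1}
    (\hat{\mathbf{b}}_{FE} - \hat{\mathbf{b}}_{RE})
\;\sim\; \chi^2_{k} \quad \text{under the null hypothesis}
```

A large value of H relative to the chi-square distribution with k degrees of freedom indicates a systematic difference between the two sets of coefficients, which is the situation in which the random effects model is not appropriate.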

Extensions of Longitudinal Analysis

The models discussed so far have primarily focused on longitudinal analysis with a continuous dependent variable. Extensions of the longitudinal analysis model have been developed that allow for analysis of dichotomous and categorical data as well. The last 30 years have witnessed a substantial advancement of the methods for analyzing discrete longitudinal data in part because of the ability to readily harness high-speed computing resources (Fitzmaurice & Molenberghs, 2008). Over the years, many extensions of longitudinal analysis have been developed. Some are extensions of generalized linear models (GLMs). GLMs are a “unified class of models for regression analysis of independent observations of a discrete or continuous response” (Fitzmaurice & Molenberghs, p. 9). Likelihood-based approaches have also been explored, but their extension to the longitudinal analysis of discrete data has proved somewhat more challenging; alternative approaches have made progress. Greene (2008, chap. 23) offers a nice overview of explanations and applications for discrete choice models with respect to panel data, and for a more extensive review, Molenberghs and Verbeke (2005) is a good resource.

Event history analysis, also called survival analysis or duration analysis, is a special type of panel data analysis used primarily within the study of international relations and comparative politics where organizations, states, or governments can be the unit of analysis. In these models, rather than observations being scheduled at specified intervals, measurements are taken at each stage of a sequence of events. As Box-Steffensmeier (1997) notes,

Time plays a key role in politics and a class of econometric models, known collectively as event history analysis, can provide researchers leverage on the issue of the timing of political change. Event history analysis allows researchers to answer a more extensive set of questions than conventional analyses by using information on the number, timing, and sequence of changes in the dependent variable. (p. 1414)

Many of the earliest time-to-event data analyses were conducted in the biomedical or health sciences fields and often focused on morbidity. The publication of Cox’s seminal paper on proportional hazards models in 1972 contributed significantly to the method’s further development (Fitzmaurice et al., 2004). Today, event history analysis is used in a variety of fields to study the occurrence of distinct events such as wars, loan defaults, or significant political changes. In his article, Gasiorowski (1995) uses event history analysis to examine the relationship between economic crises and political regime change. In this panel data set, observations for a created dichotomous dependent variable that indicates whether a regime change occurred during a given year are taken for 97 countries. For more information on the specifications of these event history models, see Box-Steffensmeier and Jones (2004) and Singer and Willett (2003).
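A country-year data set with a dichotomous event indicator, of the general kind described above, can be sketched as follows. The spells are invented and the function names are hypothetical; this is not Gasiorowski's data or procedure. The second function computes a simple discrete-time hazard: among regimes that have survived into their k-th year, the share that change in that year.

```python
def to_country_years(spells):
    """Expand (country, first_year, last_year, ended_in_change) spells into
    one row per country-year with a dichotomous event indicator."""
    rows = []
    for country, first_year, last_year, ended_in_change in spells:
        for year in range(first_year, last_year + 1):
            event = 1 if (ended_in_change and year == last_year) else 0
            rows.append((country, year, event))
    return rows

def hazard_by_duration(spells):
    """Share of regimes at risk in their k-th year that change in that year."""
    at_risk, events = {}, {}
    for _, first_year, last_year, ended_in_change in spells:
        duration = last_year - first_year + 1
        for k in range(1, duration + 1):
            at_risk[k] = at_risk.get(k, 0) + 1
        if ended_in_change:
            events[duration] = events.get(duration, 0) + 1
    return {k: events.get(k, 0) / at_risk[k] for k in sorted(at_risk)}

# Invented regime spells; False means the spell was still ongoing when
# observation stopped (a censored case).
spells = [
    ("A", 1970, 1972, True),
    ("B", 1970, 1974, False),
    ("C", 1971, 1972, True),
    ("D", 1970, 1971, False),
]

country_years = to_country_years(spells)
hazard = hazard_by_duration(spells)
print(len(country_years), "country-year rows")
print(hazard)
```

Censored spells still contribute to the at-risk counts for the years they were observed, which is one reason event history methods are preferred over simply discarding cases in which no change was recorded.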

Statistical Programs

The increasing popularity of longitudinal analysis across multiple disciplines has led to the development and expansion of statistical options in almost all of the major statistical software programs. STATA, SAS, and LIMDEP all have the capacity to run fixed and random effects models, can handle the additional complexity of unbalanced panels, have options for one-way or two-way random and fixed effects models, and can correct for autocorrelation. LIMDEP and STATA also have procedures for limited dependent panel data analysis including negative binomial, logit, and probit. Both programs also provide models for analyzing panel stochastic frontier data (Yaffee, 2005). With the latest versions, Releases 10 and 11, STATA has published an entire manual devoted to the analysis of longitudinal and panel data. STATA provides options for an impressive variety of longitudinal models including the Arellano-Bond linear dynamic panel data model, linear panel data models for fixed effects, random effects and population averaged models, models for logit and probit analysis of panel data, stochastic frontier models, and Poisson models for panel data. The program also includes a broad range of hypothesis tests, diagnostics, graphing capabilities, and postestimation techniques. In 2008, Croissant and Millo introduced an add-on package called plm, which makes data management and estimation of linear panel data models fairly straightforward in R, an open-source software platform for statistical analysis. (For a detailed description and information on downloading the software, see Croissant & Millo, 2008.) There are also specialized software packages that were designed specifically for fitting multilevel models for change to data including HLM (Raudenbush, Bryk, Cheong, & Congdon, 2001), which is the program of choice for examples in Singer and Willett’s (2003) instructional text, Applied Longitudinal Data Analysis. With so many options, the choice is in large part user preference. There is some evidence that the different programs produce the same or very similar results (Singer & Willett), so analysts may choose between them based on familiarity with the overall program, ease of use, look of visual outputs, or software availability.

Future Directions

The same characteristics that make longitudinal data analysis so rich and interesting also create analytical complexities for statistical models. For example, the random effects model, which allows for time-invariant variables and offers greater efficiency, is often the most applicable to the types of data and research questions analyzed in political science, but it is also the most susceptible to bias. Methodologists continue to expand and refine these models to deal with greater complexity, and their solutions are being incorporated into statistical software, which is itself ever increasing in capacity, so that they are available to scholars and practitioners in the field. “Thus, the outlook is bright that modern methods for longitudinal analysis will be applied more widely and across a broader spectrum of disciplines” (Fitzmaurice & Molenberghs, 2008, p. 3).

An additional factor contributing to the increasing use of longitudinal methods is the increasingly large amount of longitudinal data available for analysis. Large-scale longitudinal data sets of interest used to be relatively hard to come by, but now panel data sets are available on a multitude of topics, and more continue to be developed. One prominent example of longitudinal data that can be applicable to research questions within the field is the Panel Study of Income Dynamics (PSID). As the study authors note, “The PSID is a nationally representative longitudinal study of nearly 9,000 U.S. families. Following the same families and individuals since 1968, the PSID collects data on economic, health, and social behavior” (Institute for Social Research, n.d., para. 1). Impressive in both its size and its quality of information, the National Longitudinal Survey of Youth (NLSY) is “a nationally representative sample of 12,686 young men and women who were 14–22 years old when they were first surveyed in 1979” (Bureau of Labor Statistics, n.d., para. 1). The study continued to interview the same individuals annually through 1994, providing 15 years of annual data, and currently interviews them every other year on employment, marital status, economic status, education, drug use, and health. In 2007 and 2008, the American National Election Studies (ANES) incorporated for the first time a 2-year panel study, which interviewed individuals in late 2007 (before the primaries), in the months running up to Election Day, in November 2008, and in May 2009. As ANES (n.d.) authors note, “The panel will allow scholars to study citizen politics in new ways and will illuminate how election year politics affect judgments of the new administration in the formative months of its term” (para. 4). The Correlates of War Project offers numerous longitudinal data sets that are frequently used within the study of international relations. This is just a short list of a rapidly growing collection of longitudinal data sets that are available for analysis within the discipline. However, with greater availability of data and easier-to-use programs, the likelihood of misuse of these analytical models will increase as well. Wilson and Butler (2007) examine 195 studies published in political science journals that employ linear panel data methods, evaluating whether the authors consider unit heterogeneity and dynamic specifications, and find that only 5% of the studies meet their limited criteria. They find a general “lack of attention to specification issues and a failure to adequately consider well-known models found in the literature” (p. 110). Hence, it is critical that as these models become more popular and are used with greater frequency, more time is spent educating scholars within the discipline on their proper use and interpretation.
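
The consequences of ignoring unit heterogeneity can be illustrated with a small sketch. The example below compares a pooled regression slope with a within (fixed effects) slope on a hypothetical two-unit panel in which the unit-specific intercepts are correlated with the regressor. The data, function names, and single-regressor setup are assumptions made purely for illustration; this is not the procedure used by Wilson and Butler.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope from a bivariate least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / sum((x - mean_x) ** 2 for x in xs)


def within_slope(rows):
    """Fixed effects slope: demean x and y inside each unit, then regress."""
    by_unit = defaultdict(list)
    for unit, x, y in rows:
        by_unit[unit].append((x, y))
    dx, dy = [], []
    for obs in by_unit.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        dx.extend(x - mean_x for x, _ in obs)
        dy.extend(y - mean_y for _, y in obs)
    return ols_slope(dx, dy)


# Hypothetical panel built as y = 2 * x + unit_effect, with no noise.
# Unit A has a large intercept and small x; unit B the reverse, so the
# unit effects are negatively correlated with the regressor.
unit_effect = {"A": 10.0, "B": 0.0}
panel = [(u, x, 2.0 * x + unit_effect[u])
         for u, xs in (("A", [1, 2, 3]), ("B", [4, 5, 6]))
         for x in xs]

pooled = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])
within = within_slope(panel)
print("pooled slope:", round(pooled, 3))
print("within slope:", round(within, 3))
```

In this constructed case the within estimator recovers the slope used to generate the data, while the pooled slope even takes the wrong sign, because the pooled regression attributes differences between units to the regressor. Real panels are noisier and the gap is rarely this stark, but the direction of the problem is the same.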

Conclusion

An increasing number of longitudinal studies and panel data collection efforts, together with the increased ability to track individual subjects over time, have provided the impetus for analysts to harness greater computing power and capitalize on the benefits of analyzing longitudinal data. By combining a dynamic element with the richness of cross-sectional observations, panel studies give “more informative data, more variability, less collinearity among variables, more degrees of freedom and more efficiency” (Gujarati, 2003). Developments in the analysis and collection of longitudinal data in the last 50 years have allowed political science scholars to address questions that were previously difficult to answer with much confidence or certainty. Greater computing power and the advances of statistical software, along with the exploding growth of longitudinal data sets, have made longitudinal data analysis one of the fastest growing methods in the field. Researchers now have a variety of tools at their disposal for dealing with a whole spectrum of complex questions and data. The popularity of the methods will likely continue to increase the availability and precision of these models. Although this offers much promise, it is also important that analysts take time to understand the differences, assumptions, and needs of the various models. This chapter has provided only a brief glimpse of the distinctions between available options; however, the references and further readings list offers several volumes that provide the technical, theoretical, and practical knowledge needed for the application and further development of these powerful tools.

References and Further Readings

American National Election Studies. (n.d.). About ANES. Ann Arbor, MI: Center for Political Studies. Retrieved May 15, 2009, from http://www.electionstudies.org/overview/overview.htm

Arellano, M. (2002). Panel data econometrics. Oxford, UK: Oxford University Press.

Baltagi, B. (2005). Econometric analysis of panel data (3rd ed.). New York: Wiley.

Bowler, S., Brockington, D., & Donovan, T. (2001). Election systems and voter turnout: Experiments in the United States. Journal of Politics, 63(3), 902–915.

Box-Steffensmeier, J. M. (1997). Time is of the essence: Event history models in political science. American Journal of Political Science, 41(4), 1414–1461.

Box-Steffensmeier, J. M., & Jones, B. S. (2004). Event history modeling: A guide for social scientists. New York: Cambridge University Press.

Bureau of Labor Statistics. (n.d.). National longitudinal surveys: NLSY79. Washington, DC: Author. Retrieved May 15, 2009, from http://www.bls.gov/nls/nlsy79.htm

Croissant, Y., & Millo, G. (2008). Panel data econometrics in R: The plm package. Journal of Statistical Software, 27(2), 1–43.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. Hoboken, NJ: John Wiley & Sons.

Fitzmaurice, G. M., & Molenberghs, G. (2008). Advances in longitudinal data analysis: An historical perspective. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis: Handbooks of modern statistical methods (pp. 3–27). Boca Raton, FL: CRC Press.

Fox, J. (1997). Applied regression analysis, linear models, and related methods. Thousand Oaks, CA: Sage.

Gasiorowski, M. J. (1995). Economic crisis and political regime change: An event history analysis. American Political Science Review, 89(4), 882–897.

Goldfarb, N. (1960). Longitudinal statistical analysis: The method of repeated observations of a fixed sample. Glencoe, IL: Free Press.

Goldstein, H. (1979). The design and analysis of longitudinal studies. London: Academic Press.

Greene, W. H. (2008). Econometric analysis (6th ed.). Upper Saddle River, NJ: Pearson Education.

Gujarati, D. N. (2003). Basic econometrics (4th ed.). New York: McGraw-Hill.

Institute for Social Research. (n.d.). Panel study of income dynamics. Ann Arbor: University of Michigan. Retrieved May 13, 2009, from http://psidonline.isr.umich.edu

Jennings, M. K. (2002). Generation units and the student protest movement in the United States: An intra- and intergenerational analysis. Political Psychology, 23(2), 303–324.

Jennings, M. K., & Niemi, R. G. (1968, July). The transmission of political values from parent to child. American Political Science Review, 62(1), 169–184.

Jennings, M. K., & Niemi, R. G. (1978, July). The persistence of political orientations: An over-time analysis of two generations. British Journal of Political Science, 8(3), 333–363.

Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38(4), 963–974.

Liang, K.-Y., Zeger, S. L., & Qaqish, B. (1992). Multivariate regression analyses for categorical data (with discussion). Journal of the Royal Statistical Society, Series B (Methodological), 54(1), 3–40.

Lubotsky, D. (2007). Chutes or ladders? A longitudinal analysis of immigrant earnings. Journal of Political Economy, 115(5), 820–867.

Molenberghs, G., & Verbeke, G. (2005). Models for discrete longitudinal data. New York: Springer.

Raudenbush, S. W., Bryk, A. S., Cheong, Y., & Congdon, R. (2001). HLM [Computer software]. Lincolnwood, IL: Scientific Software International. Available from http://www.ssicentral.com

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford University Press.

Wilson, S. E., & Butler, D. M. (2007). A lot more to do: The sensitivity of time-series cross-section analyses to simple alternative specifications. Political Analysis, 15(2), 101–123.

Wooldridge, J. (2002). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.

Yaffee, R. (2005). A primer for panel data analysis. Connect. New York: New York University. Retrieved May 29, 2009, from http://www.nyu.edu/its/pubs/connect/fa1103/yaffee primer.html