Big Data versus a Survey
Economists are shifting attention and resources from work on survey data to work on “big data.” This analysis is an empirical exploration of the trade-offs this transition requires. Parallel models are estimated using the Federal Reserve Bank of New York Consumer Credit Panel/Equifax and the Survey of Consumer Finances. After adjustments to account for different variable definitions and sampled populations, it is possible to arrive at similar models of total household debt. However, the estimates are sensitive to the adjustments. Little similarity is observed in parallel models of nonmortgage debt. While surveys intentionally collect theoretically related variables, it may be necessary to merge external data into commercial big data. In this example, some education and income measures are successfully integrated with the big data, but other external aggregates fail to adequately substitute for survey responses. Big data offers sample sizes, frequencies, and details that surveys cannot match. However, this example illustrates why caution is appropriate when attempting to substitute big data for a carefully executed survey.
Keywords: Big Data; Survey Data; Household Debt.
JEL Codes: C55, C81, D12.
Suggested citation: Whitaker, Stephan, 2014. “Big Data versus a Survey,” Federal Reserve Bank of Cleveland, working paper no. 14-40.