A New Tool for Robust Estimation and Identification of Unusual Data Points
Most consistent estimators are what Müller (2007) terms “highly fragile”: prone to total breakdown in the presence of a handful of unusual data points. This compromises inference. Robust estimation is a (seldom-used) solution, but commonly used methods have drawbacks. In this paper, building on methods that are relatively unknown in economics, we provide a new tool that produces robust estimates of mean and covariance, useful both for robust estimation and for detecting unusual data points. The tool is relatively fast and useful for large data sets. Our performance testing indicates that our baseline method performs on par with, or better than, two of the best currently available methods, and that it works well on benchmark data sets. We also demonstrate that the issues we discuss are not merely hypothetical by re-examining a prominent economic study and showing that its central results are driven by a set of unusual points.
JEL codes: C3 C4 C5.
Keywords: big data, machine learning, outlier identification, fragility, robust estimation, detMCD, RMVN.
Suggested citation: Garciga, Christian, and Randal Verbrugge. 2020. “A New Tool for Robust Estimation and Identification of Unusual Data Points.” Federal Reserve Bank of Cleveland, Working Paper No. 20-08. https://doi.org/10.26509/frbc-wp-202008.