## Faster Two-Way Fixed Effects Estimator in R

In a current project, we need to run linear models on a large data frame (~70 million rows) with fixed effects for both place (e.g., grid cell) and time (e.g., year) – a common specification for difference-in-differences models with multiple periods and groups. Base R's lm function choked on the task. The excellent lfe package by Gaure can estimate the models, but the run times are long.

In an attempt to speed up our estimation routines, I built a new function that uses the data.table and RcppEigen packages to partial out the fixed effects and then run OLS on the de-meaned data. I include the code below in hopes that it might be useful to others facing a similar task. Note: the function assumes that your panel is balanced.
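A minimal sketch of the approach, not necessarily the exact function used in the project: the name `twfe_demean` and its arguments are hypothetical, and it assumes a balanced panel with no missing values. It subtracts unit means and then period means with data.table, which for a balanced panel equals the usual two-way within transformation, and passes the result to `RcppEigen::fastLmPure`.

```r
library(data.table)
library(RcppEigen)

# Hypothetical helper: two-way within transformation, then OLS.
# Only valid for a balanced panel (every unit observed in every period).
twfe_demean <- function(data, y, xs, unit, time) {
  vars <- c(y, xs)
  # Work on a copy of the needed columns so the caller's data is untouched
  d <- copy(as.data.table(data)[, c(unit, time, vars), with = FALSE])
  # Store as double so the in-place updates below never truncate integers
  d[, (vars) := lapply(.SD, as.numeric), .SDcols = vars]
  # Subtract unit means, then period means. With a balanced panel this gives
  # x_it - mean_i(x) - mean_t(x) + mean(x)
  d[, (vars) := lapply(.SD, function(v) v - mean(v)), by = c(unit), .SDcols = vars]
  d[, (vars) := lapply(.SD, function(v) v - mean(v)), by = c(time), .SDcols = vars]

  # OLS on the de-meaned data; no intercept is needed after de-meaning
  X <- as.matrix(d[, xs, with = FALSE])
  fit <- fastLmPure(X, d[[y]])

  # fastLmPure does not know about the absorbed fixed effects, so rescale
  # its standard errors to the correct residual degrees of freedom
  n <- nrow(d)
  p <- ncol(X)
  df_fe <- n - p - uniqueN(d[[unit]]) - uniqueN(d[[time]]) + 1
  list(
    coefficients = setNames(as.numeric(fit$coefficients), xs),
    se           = setNames(as.numeric(fit$se) * sqrt((n - p) / df_fe), xs),
    df.residual  = df_fe
  )
}
```

With a data frame holding, say, columns `y`, `x1`, `x2`, `id`, and `year`, a call would look like `twfe_demean(df, y = "y", xs = c("x1", "x2"), unit = "id", time = "year")`. The standard errors here are the classical homoskedastic ones; clustering is left out of the sketch.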

To start, we need to create some fake data for analysis:
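One way to simulate such a panel is sketched below. The sizes, coefficients, and column names (`id`, `year`, `x1`, `x2`, `y`) are all made up for illustration and are far smaller than the real data.

```r
library(data.table)

set.seed(42)
n_units   <- 1000L  # hypothetical number of grid cells
n_periods <- 10L    # hypothetical number of years

# CJ() builds every unit-by-period combination, so the panel is balanced
panel <- CJ(id = seq_len(n_units), year = seq_len(n_periods))

# One fixed effect per unit and per period
unit_fe <- rnorm(n_units)
time_fe <- rnorm(n_periods)

# Regressors are correlated with the fixed effects, so pooled OLS
# without fixed effects would be biased
panel[, x1 := rnorm(.N) + unit_fe[id]]
panel[, x2 := rnorm(.N) + time_fe[year]]
panel[, y := 1.5 * x1 - 0.5 * x2 + unit_fe[id] + time_fe[year] + rnorm(.N)]
```

Scaling `n_units` and `n_periods` up gives a sense of how run time and memory grow before trying the full data.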

I then check equivalence of the estimates and benchmark this function against both the felm and lm functions.
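A small-scale version of this kind of check can be sketched in base R. The toy panel and the helper `within2` are hypothetical; the felm comparison runs only if lfe is installed, and `system.time` stands in for a proper benchmarking package.

```r
set.seed(1)
# Small balanced toy panel so the dummy-variable lm() stays fast
d <- expand.grid(id = factor(1:50), year = factor(1:8))
d$x1 <- rnorm(nrow(d)) + as.integer(d$id) / 50
d$x2 <- rnorm(nrow(d)) + as.integer(d$year) / 8
d$y  <- 1.5 * d$x1 - 0.5 * d$x2 +
  rnorm(50)[as.integer(d$id)] + rnorm(8)[as.integer(d$year)] + rnorm(nrow(d))

# Two-way within transformation for a balanced panel:
# subtract unit and period means, add back the grand mean
within2 <- function(v, unit, time) v - ave(v, unit) - ave(v, time) + mean(v)
dm <- data.frame(
  y  = within2(d$y,  d$id, d$year),
  x1 = within2(d$x1, d$id, d$year),
  x2 = within2(d$x2, d$id, d$year)
)

# Reference: lm() with explicit dummies for both sets of fixed effects
t_dummy <- system.time(
  b_dummy <- coef(lm(y ~ x1 + x2 + id + year, data = d))[c("x1", "x2")]
)
# Candidate: OLS on the de-meaned data, no intercept
t_within <- system.time(
  b_within <- coef(lm(y ~ 0 + x1 + x2, data = dm))
)

print(all.equal(b_dummy, b_within))
print(rbind(dummy = t_dummy, within = t_within)[, "elapsed"])

# Optional comparison against felm, only when lfe is available
if (requireNamespace("lfe", quietly = TRUE)) {
  b_felm <- as.numeric(coef(lfe::felm(y ~ x1 + x2 | id + year, data = d)))
  print(all.equal(b_felm, unname(b_within)))
}
```

On a panel this small the timings are not informative; the point estimates are what the check is for, and the timing comparison only becomes meaningful as the number of units grows.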

Here’s the punchline: