Author manuscript; available in PMC: 2024 Aug 17.
Published in final edited form as: Hist Methods. 2023 Aug 17;56(3):138–159. doi: 10.1080/01615440.2023.2239699

Table 5.

Representativeness of Linked G2 Records

| | (1) G2-Sibling | (2) G2-1940 Census | (3) G2-Death | (4) G2-Marriage | (5) G2-G3 |
|---|---|---|---|---|---|
| Day of Birth | 0.00000485*** | −0.000246*** | −0.0000265*** | 0.00000955*** | −0.0000140*** |
| | (0.00000144) | (0.00000190) | (0.00000188) | (0.00000172) | (0.00000200) |
| Number of Siblings | 0.0674*** | 0.00619*** | 0.00908*** | 0.00589*** | 0.0135*** |
| | (0.0000675) | (0.0000854) | (0.0000862) | (0.0000791) | (0.0000920) |
| Length of Name | 0.00101*** | 0.0104*** | 0.00853*** | 0.00711*** | 0.0165*** |
| | (0.0000401) | (0.0000457) | (0.0000458) | (0.0000407) | (0.0000477) |
| Length of Father’s Name | 0.00781*** | 0.00187*** | 0.00168*** | 0.000294*** | 0.00101*** |
| | (0.0000408) | (0.0000548) | (0.0000540) | (0.0000482) | (0.0000587) |
| Length of Mother’s Name | 0.0146*** | −0.000297*** | 0.00186*** | 0.0000652* | 0.00201*** |
| | (0.0000328) | (0.0000441) | (0.0000444) | (0.0000377) | (0.0000468) |
| Share of Birth Records with Misspelled Father’s Name | 0.299*** | −0.00351*** | 0.00525*** | −0.0133*** | −0.0158*** |
| | (0.000687) | (0.00109) | (0.00110) | (0.00101) | (0.00116) |
| Share of Birth Records with Misspelled Mother’s Name | 0.382*** | 0.0526*** | 0.0292*** | 0.0574*** | 0.0269*** |
| | (0.000646) | (0.000978) | (0.000978) | (0.000916) | (0.00104) |
| Link in Ohio | 0.0751*** | 0.0848*** | 0.00233*** | 0.103*** | 0.0577*** |
| | (0.000397) | (0.000458) | (0.000506) | (0.000394) | (0.000513) |
| Female | 0.0000647 | −0.0409*** | −0.0870*** | 0.0174*** | −0.0194*** |
| | (0.000304) | (0.000394) | (0.000394) | (0.000362) | (0.000420) |
| Constant | −0.0239*** | 0.00497*** | 0.0471*** | −0.0614*** | −0.109*** |
| | (0.000778) | (0.000956) | (0.000941) | (0.000823) | (0.000993) |
| Observations | 4,297,569 | 4,297,569 | 4,297,569 | 4,297,569 | 4,297,569 |
| R² | 0.511 | 0.033 | 0.027 | 0.029 | 0.043 |
| F-statistic | 553612.5 | 19394.9 | 14700.0 | 19632.3 | 25877.8 |

Notes: Each column reports the regression coefficients from a linear probability model estimated on the sample of all G2s in the LIFE-M data, regardless of whether the observation is linked. The dependent variable is an indicator equal to one if the G2 record is linked to the data source named in the column title and zero otherwise. The covariates for G2 links include the birthday of G2 children, number of children in the G1-G2 family, name length of G2 children, name length of G1 father, name length of G1 mother, share of birth records with misspelled father’s name, share of birth records with misspelled mother’s name, and dummy variables for sex and state. Regressions pool Ohio and North Carolina as well as men and women. F-statistics are reported for a heteroskedasticity-robust Wald test of joint significance of the covariates. Robust standard errors are reported in parentheses. *** indicates the variable is statistically significant at the one-percent level.
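The notes describe a linear probability model with heteroskedasticity-robust standard errors. The following is a minimal sketch of that idea for a single regressor, not the authors' estimation code. The function names `lpm_simple` and `two_sided_p` are hypothetical, the toy data are invented for illustration, and the HC0 variance estimator and the normal approximation are assumptions, since the notes do not say which robust variant was used.

```python
import math


def lpm_simple(x, y):
    """OLS of a 0/1 outcome y on a constant and one regressor x.

    Returns (intercept, slope, robust_se_slope). The standard error uses
    the HC0 heteroskedasticity-robust estimator (an assumption here).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of the regressor.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 "meat": squared deviations weighted by squared residuals.
    meat = sum(((xi - x_bar) ** 2) * (ei ** 2) for xi, ei in zip(x, resid))
    robust_se = math.sqrt(meat) / sxx
    return intercept, slope, robust_se


def two_sided_p(coef, se):
    """Return (z, p) for coef/se under a standard normal approximation."""
    z = coef / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# Toy data, invented for illustration only (not LIFE-M records).
female = [0, 0, 0, 0, 1, 1, 1, 1]   # regressor: 1 if female
linked = [1, 1, 1, 0, 1, 0, 0, 0]   # outcome: 1 if the record is linked
intercept, slope, robust_se = lpm_simple(female, linked)

# Coefficient and standard error copied from Table 5, column (4),
# row "Length of Mother's Name".
z_stat, p_value = two_sided_p(0.0000652, 0.0000377)
```

On the toy data the slope equals the difference in link rates between the two groups (0.25 minus 0.75, so −0.5), which is how a dummy coefficient in a linear probability model is read. By the same logic, the Female coefficient of −0.0409 in column (2) means that women are about 4.1 percentage points less likely than men to be linked to the 1940 Census, holding the other covariates fixed. For the table entry, the ratio of coefficient to standard error is about 1.73, giving a two-sided p-value between 0.05 and 0.10. That is consistent with the cell carrying a single asterisk rather than three, although the notes define only the three-asterisk level.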