Abstract
Advances in high-throughput technology and the ease with which we can generate massive amounts of data have created excellent opportunities to decipher the various paths to aging by integrating genetic, genomic, and longitudinal data in large observational studies. Integrating data from different sources is not an easy statistical task: it requires combining traditional statistical methods with machine learning methods that are often considered “black boxes”. I will use an example that integrates genetic and genomic data with changes in cognitive and physical function in participants of the New England Centenarian Study to show some of the challenges, but also the opportunities, of this kind of data analysis. The example will show how machine learning approaches to data analysis that are grounded in the principle of automation could naturally foster reproducibility and transparency in aging research, but rigor, reproducibility, and transparency require code sharing in addition to data sharing.