The Hendricks team’s mission is to be the linchpin between biomedical research and statistical and machine learning method development. Sitting at the interface between the applied and the theoretical enables our team to develop and apply methods that improve the utility and equity of large, publicly available genetic data resources, identify the biological mechanisms of healthy diets, and elucidate the genomic underpinnings of conditions and traits. We follow best practices for reproducible, robust science by creating open-source, well-documented software and releasing all data and code used in our studies. Our team is highly collaborative, working with people from a variety of backgrounds and education levels. We are always learning, improving, and pushing ourselves and others to be our best. In doing so, we produce first-class research for the broader community and train the next generation of genomics and health data scientists.
The increasing abundance of ’omics data (e.g., genetic, metabolomic) holds great potential for advancing research and precision medicine. Already, individuals’ genetic information is being used to inform drug selection and dosing, and high-throughput metabolomics is informing precision nutrition. However, challenges remain in using these data rigorously, reproducibly, and representatively. Thoughtful study design, data use, and method development can help address these challenges. This presentation showcases two ’omics informatics opportunities: detecting and leveraging genetic substructure from summary data, and identifying metabolite-derived food biomarkers and their associations with health.
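To make the first opportunity concrete, below is a minimal sketch of one common approach to estimating substructure from summary-level data: solving for the mixture of reference-panel allele frequencies that best matches the observed frequencies, via constrained least squares. This is an illustrative example under stated assumptions, not necessarily the method presented, and all names and data in it are hypothetical.

```python
# Sketch: estimate ancestry-group mixing proportions from summary allele
# frequencies by constrained least squares. Assumes reference allele
# frequencies are available for K groups at the same variants.
import numpy as np
from scipy.optimize import minimize

def estimate_proportions(obs_af, ref_af):
    """Find proportions pi (length K) with pi >= 0 and sum(pi) = 1
    such that ref_af @ pi approximates obs_af.

    obs_af : (n_snps,) observed allele frequencies in the target data
    ref_af : (n_snps, K) reference allele frequencies per ancestry group
    """
    n_snps, k = ref_af.shape
    x0 = np.full(k, 1.0 / k)  # start from equal proportions

    def loss(pi):
        # sum of squared differences between observed and mixed frequencies
        return np.sum((obs_af - ref_af @ pi) ** 2)

    result = minimize(
        loss,
        x0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,
        constraints=[{"type": "eq", "fun": lambda pi: pi.sum() - 1.0}],
    )
    return result.x

# Toy check with two hypothetical reference groups mixed 70/30
rng = np.random.default_rng(0)
ref = rng.uniform(0.05, 0.95, size=(1000, 2))
observed = ref @ np.array([0.7, 0.3])
print(estimate_proportions(observed, ref))  # approximately [0.7, 0.3]
```

Because it needs only summary statistics, this style of estimate can be computed directly on public allele-frequency resources without individual-level data.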
Genome-wide association studies using large-scale genome and exome sequencing data have become increasingly valuable for identifying associations between genetic variants and disease, transforming basic research and translational medicine. However, this progress has not been shared equally across all people and conditions, in part due to limited resources. Leveraging publicly available sequencing data as external common controls, rather than sequencing new controls for every study, can better allocate resources by augmenting control sample sizes or providing controls where none existed. However, common-control studies must be carefully planned and executed, as even small differences in sample ascertainment and processing can introduce substantial bias. Here, we discuss challenges and opportunities for the robust use of common controls in high-throughput sequencing studies, including study design, quality control, and statistical approaches. Thoughtful generation and use of large and valuable genetic sequencing datasets will enable investigation of a broader and more representative set of conditions, environments, and genetic ancestries than would otherwise be possible.
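As a concrete illustration of the basic statistical setup, the sketch below runs a per-variant test of cases against external common controls using a simple 2x2 chi-square on allele counts. It is a minimal, hypothetical example only; as discussed above, a real common-control analysis requires careful quality control and methods that account for differences in ascertainment, sequencing platform, and variant calling between the case and control datasets.

```python
# Sketch: per-variant case vs. external-common-control association test
# on summary allele counts. Counts below are hypothetical.
from scipy.stats import chi2_contingency

def allele_test(case_alt, case_total, ctrl_alt, ctrl_total):
    """Chi-square test on alternate vs. reference allele counts.

    case_total / ctrl_total are total allele counts
    (2 * sample size for diploid autosomal variants).
    """
    table = [
        [case_alt, case_total - case_alt],  # cases: alt, ref alleles
        [ctrl_alt, ctrl_total - ctrl_alt],  # external controls: alt, ref
    ]
    chi2, p, dof, _expected = chi2_contingency(table)
    return chi2, p

# Hypothetical variant: 40/2000 alt alleles in cases vs. 55/10000 in a
# public control resource
chi2, p = allele_test(40, 2000, 55, 10000)
print(f"chi2={chi2:.2f}, p={p:.3g}")
```

A naive test like this is exactly where common-control bias enters: systematic differences in coverage or variant calling between datasets inflate the apparent case-control difference, which is why the quality-control and study-design steps discussed above matter.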