Presented by a multi-institutional, interdisciplinary research team composed of an information theorist, a neuroscientist, and a postdoc with a background in both computer science and neuroscience.
Cannabis is the most commonly abused illicit drug in the U.S., with approximately 10% of users developing cannabis use disorder (CUD). Previous genetic studies have shown that CUD is up to 60% heritable, warranting a deeper understanding of its genetic contributions, which can be examined most thoroughly through genome-wide association study (GWAS) data. However, the large number of single nucleotide polymorphisms (SNPs) captured per subject in GWAS data (800,000) inflates the rate of false positives, so large sample sizes are needed for any high-dimensional inference task. Low minor allele frequencies also necessitate large sample sizes, as a risk allele for a disease may occur in only a fraction of a percent of a population. As a result, much GWAS data is uninterpretable using naive statistical methods because sample sizes are insufficient. It nevertheless remains necessary to determine which alleles confer propensity toward substance use disorders in general and toward specific substance use disorders. To this end, our aim is to predict the severity of CUD from an individual's SNPs, using behavioral assessments of CUD severity as the outcome. We will use the least absolute shrinkage and selection operator (LASSO), whose L1 penalty both selects the SNPs that contribute most to the model and regularizes the fit to prevent overfitting. This project will identify genetic variability that is unique to CUD and provide insight into how different SNPs contribute to behaviorally assessed CUD severity. In addition to identifying individuals at risk for CUD, our proposed methods will provide a potential solution for analyzing GWAS data with relatively small sample sizes.
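To make the role of LASSO concrete, the following is a minimal sketch of L1-penalized regression fitted by cyclic coordinate descent, using only the Python standard library. It is not the project's pipeline: the genotypes are synthetic, and the sample size, SNP count, minor allele frequency, causal SNP indices, effect sizes, and penalty `lam` are all assumptions chosen for illustration.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values in [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows with centred columns; y is a centred list.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # SNP with no variation carries no information
            # Covariance of SNP j with the residual, adding back its own fit.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta

# --- Synthetic data (all values below are illustrative assumptions) ---
rng = random.Random(0)
n_subjects, n_snps, maf = 200, 50, 0.3
causal = {3: 1.5, 17: -1.0}  # hypothetical causal SNP index -> effect size

# Genotype = minor allele count (0, 1, or 2) at each SNP.
G = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
     for _ in range(n_subjects)]
# Severity score = sum of causal effects plus Gaussian noise.
severity = [sum(w * row[j] for j, w in causal.items()) + rng.gauss(0.0, 0.5)
            for row in G]

# Centre each SNP column and the outcome so no intercept is needed.
col_mean = [sum(row[j] for row in G) / n_subjects for j in range(n_snps)]
X = [[row[j] - col_mean[j] for j in range(n_snps)] for row in G]
y_mean = sum(severity) / n_subjects
y = [s - y_mean for s in severity]

beta = lasso_coordinate_descent(X, y, lam=0.2)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("SNPs with non-zero coefficients:", selected)
```

The soft-thresholding step is what sets most coefficients to exactly zero, which is why the same penalty performs both SNP selection and regularization. In practice the penalty would be chosen by cross-validation rather than fixed, and a real analysis would need to handle far more SNPs than subjects.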