Rebecca Barter - Assessing The Fit of a 'Big Data' Model: The Clustered Residual Plot
Monday, October 26, 2015 1:00 AM - 2:00 AM EDT
Online
In the modern era of computational data generation and collection, we are able to produce a multitude of large and complex datasets. As a result, it is becoming increasingly difficult to adequately visualize, and thus understand, the data at our disposal. In this talk I will introduce the clustered plot, a graph that allows for a more in-depth exploration of the relationships between variables in a dataset. Its applications include exploratory data analysis to identify clusters in the data, simultaneous exploration of the relationships between different variables, and examination of how these relationships change between clusters. In the context of model fitting, the plot can be used to assess properties such as the underlying model assumptions, the adequacy of the model fit, or its predictive accuracy. We will focus on two case studies: the first examines crime in US communities and the second explores neural activity in the visual cortex. These examples will demonstrate the practicality of clustered plots and will be accompanied by details on implementing them in a general setting using the clusterplot package, written in the R programming language.
Bio: Rebecca is currently pursuing a PhD in Statistics at UC Berkeley. She graduated from The University of Sydney, Australia, with a Bachelor of Science (Advanced) with Honours, majoring in Mathematics and Statistics. During her Honours year she conducted research in statistical bioinformatics, focusing on improving prognosis prediction in Stage III melanoma. Her current research interests lie in applied statistics, including statistical modeling, data visualization, causal inference, and the design and analysis of experiments.