Talk Abstract:
Do you encounter missing values in your model features, but don’t give them much thought? I have two goals in this talk: 1) use my work with sort algorithms at Tripadvisor to show how ad-hoc imputation of missing values severely hurts the performance of real-world ML models, and 2) cast the missing value problem as a probabilistic model that one can solve through Bayesian inference. I will end by showing that the most widely used missing value imputation technique in the statistics community (Multiple Imputation by Chained Equations, or MICE, which scikit-learn implements in its IterativeImputer) can be better understood as approximate Bayesian inference in a simple probabilistic model.
This talk should appeal to data and ML researchers and practitioners of all skill levels. For those newer to the field, part 1 will demonstrate why it is important to think carefully about missing values during feature engineering, and how to examine their role in a model’s predictive performance. For more experienced attendees, part 2 will use a Bayesian lens to build a bridge between the statistical literature on missing value imputation and the world of the machine learning practitioner.
Speaker Bio:
Narendra is a longtime Bayesian interested in the connections between statistics, causal inference and machine learning. He is currently a Machine Learning Scientist at Tripadvisor, based at the company's global headquarters in Needham, MA. His work at Tripadvisor spans the full range of customer-centric ML problems, from recommendation engines to probabilistic models of user-generated content creation. To learn more about Narendra, visit his webpage: https://narendramukherjee.github.io
Disclaimer: All views, thoughts, and opinions expressed in the webinar belong solely to the panelists, and not to the panelists’ employer, organization, committee, or any other group or individual.