Deep Dive into Synthetic Data Generation for Bias Mitigation

Presented by

Dr. Emma Beauxis-Aussalet, Sarah-Jane van Els & Triveni Gandhi

About this talk

As we saw in episode 1 of this series, the bias inherent in historical data often cannot be corrected simply by collecting more data, or more representative data. If nobody from a certain group has ever applied for a particular kind of loan or type of job, there may simply be no data to collect. If we accept defeat on this, there is a real risk that AI models will refuse to make predictions for these groups with missing data, reinforcing the very problem that got us here in the first place. One promising solution is synthetic data: anonymised cases generated by combining the data of real cases, with properties that match the underlying population, “filling in the gaps” in historical data. In this session, we discuss a concrete use case developed by the ICAI lab in collaboration with Randstad and explore the promise and limits of this approach.

Speaker bios:

Dr. Emma Beauxis-Aussalet is an assistant professor of ethical computing at the Vrije Universiteit Amsterdam (VU) and lab manager of the Civic AI Lab. In 2019, she obtained her doctorate from Utrecht University with a dissertation on AI bias, based on her work at the Centrum Wiskunde & Informatica (CWI). Drawing on her multidisciplinary experience, she has been researching computational methods, statistics, user interfaces and data visualizations that enable transparent and controllable AI systems. Modelling and visualizing AI errors is one of her main research topics. For her achievements in this field, she was named one of the 100 Brilliant Women in AI Ethics in 2021. She also received the 3rd WomENcourage Prize for her contributions to the development of AI literacy and bias awareness through lectures and workshops.

Sarah-Jane van Els is a recent MSc Information Sciences graduate and holds a BSc in Business Administration from the Vrije Universiteit Amsterdam. She conducted her master's thesis research at Randstad Groep Nederland, investigating how synthetic data can be used to identify bias in recommender systems for recruitment.

Dataiku is the world’s leading platform for Everyday AI, systemizing the use of data for exceptional business results. Organizations that use Dataiku elevate their people (whether technical and working in code or on the business side and low- or no-code) to extraordinary, arming them with the ability to make better day-to-day decisions with data. More than 450 companies worldwide use Dataiku to systemize their use of data and AI, driving diverse use cases from fraud detection to customer churn prevention, predictive maintenance to supply chain optimization, and everything in between.