
Synthetic Data: How do We Manage Them?

Synthetic data is becoming a prevalent technique for circumventing a major difficulty in organizations: how to manage and share good-quality data without disclosing personal and sensitive information.
Synthetic data solves this problem by generating fake data that preserves most of the statistical properties of the original data. In this session, we will introduce some metrics to quantify similarity, quality, and privacy.
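
A minimal sketch of what one such similarity metric can look like (an illustrative assumption, not Hazy's actual methodology): compare each real column against its synthetic counterpart with a two-sample Kolmogorov-Smirnov test.

    # Hedged sketch: column-wise similarity via the KS statistic
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    real = rng.normal(50, 10, size=1000)       # stand-in for a real column
    synthetic = rng.normal(51, 11, size=1000)  # stand-in for its synthetic twin

    # A KS statistic near 0 means the synthetic distribution closely matches
    # the real one; privacy needs separate checks (e.g. that no synthetic
    # record is a near-copy of a real record).
    stat, p_value = ks_2samp(real, synthetic)
    print(f"KS statistic: {stat:.3f} (p-value: {p_value:.3f})")
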
Recorded: Jul 3 2019 21 mins
Presented by
Armando Vieira, Data Scientist, Hazy

  • Crash Course in Data Architecture Oct 16 2019 3:00 pm UTC 45 mins
    Jesse Bishop, Solutions Architect, Dataiku
    Data architecture is the foundation of every organization’s data strategy, but it's not just for CIOs and data architects: everyone at a data-powered organization can benefit from understanding how data moves between teams and flows into data projects to yield insights. In our Crash Course, we’ll cover key architecture terms and highlight different priorities regarding security and scalability. We’ll also discuss ways to strategize and align architectural concerns with business priorities.

    Jesse Bishop works with a wide variety of Fortune 500 clients and specializes in helping large organizations operationalize their AI workflow. Jesse is an Insight Data Science Fellow in New York City. He previously worked for the Federal Trade Commission developing models to predict the impact of mergers in a wide variety of industries including Energy, Semiconductors, and E-commerce. Jesse earned his Ph.D. in Applied Microeconomics from the University of Minnesota.
  • Use Case Demo: Machine Learning Based Fraud Detection Oct 1 2019 3:00 pm UTC 45 mins
    Kevin Graham, Advanced Technology Strategist for Financial Services, and Will Nowak, Solutions Architect
    Fraudulent actors are always looking for new ways to subvert legitimate transaction systems; traditional rules-based approaches are no longer sufficient (or efficient enough) to combat fraud. In this webinar, we’ll discuss best practices and examples on how machine learning can improve fraud detection capabilities.

    Data Scientists, Quants, and Analysts in the banking sector can benefit from expert best practices on tackling fraud detection. We’ll include a brief use case demo to concretely ground the discussion and discuss real-time considerations for detection. Kevin’s financial expertise and Will’s diverse implementation experience make them the perfect team to explore the host of factors that go into a machine learning fraud detection model (a minimal illustrative sketch follows the speaker bios below). We’ll host a Q&A after the demo, so make sure to join us live.

    Kevin Graham is a Dataiku Account Executive with nearly 10 years of experience across financial services and technology. He started his career in Sales & Trading before moving into a technology sales capacity at Oracle and Merlon Intelligence. At Merlon, Kevin focused on how AI and machine learning could help solve complex challenges within financial crime compliance. He is currently part of a financial-services-focused sales team covering the Eastern United States and Canada at Dataiku.

    Will Nowak is a solutions architect at Dataiku, where he helps Fortune 500 companies improve data science operations. Previously, he engineered machine learning models for several Y Combinator startups, learning the pitfalls and challenges to productionalizing machine learning. Will holds a bachelor’s in Math and Economics from Northwestern University and a Master’s in Organizational Leadership from Columbia University.
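
    As a hedged illustration of the kind of approach the webinar covers (the features, data, and model choice below are assumptions, not the actual demo), unsupervised anomaly detection can flag suspicious transactions:

        # Hedged sketch: flagging anomalous transactions with scikit-learn
        import numpy as np
        from sklearn.ensemble import IsolationForest

        rng = np.random.default_rng(0)
        # Hypothetical transactions: [amount, hour_of_day]
        normal = np.column_stack([rng.gamma(2.0, 30.0, 1000),
                                  rng.integers(8, 22, 1000)])
        fraud = np.array([[5000.0, 3.0], [7200.0, 4.0]])  # large late-night transfers
        transactions = np.vstack([normal, fraud])

        model = IsolationForest(contamination=0.01, random_state=0).fit(transactions)
        flags = model.predict(transactions)  # -1 = anomalous, 1 = normal
        print("flagged:", transactions[flags == -1])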
  • How Machine Learning Helps Levi’s Leverage Data to Enhance E-Commerce Experience Recorded: Sep 19 2019 72 mins
    An AWS and Dataiku Partnership
    Levi Strauss & Co. (Levi’s) had already migrated its store and data science applications to the cloud. It needed a way to quickly create prototypes and put them into production to create different meaningful customer experiences on the website.

    Levi’s used Dataiku Data Science Studio (DSS) and Amazon Web Services (AWS) to create a recommendation system aligned to the customer journey, such as showing best-selling products in their region to new customers or displaying complementary items that complete an outfit to returning customers.

    Watch this webinar to learn how machine learning enables Levi’s to easily and quickly leverage its data to create new products for its customers.

    Watch to learn how to:

    - Try different algorithms and ways of connecting data together through data pipelines to move beyond experimentation into operations
    - Use Amazon SageMaker for model training (a minimal sketch follows this list)
    - Create prototypes in Dataiku DSS and use AWS to put them into production
    - Run multiple processes in parallel
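
    A minimal sketch of the SageMaker training step mentioned in the list above (the container image, IAM role, and S3 paths are placeholder assumptions, not Levi’s actual setup):

        # Hedged sketch: launching a training job with the SageMaker Python SDK (v2-style)
        import sagemaker
        from sagemaker.estimator import Estimator

        session = sagemaker.Session()
        role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical role

        estimator = Estimator(
            image_uri="<training-image-uri>",      # your algorithm container
            role=role,
            instance_count=1,
            instance_type="ml.m5.xlarge",
            output_path="s3://my-bucket/models/",  # hypothetical bucket
            sagemaker_session=session,
        )

        # Launch a training job on data already staged in S3
        estimator.fit({"train": "s3://my-bucket/data/train/"})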
  • Data Transformation at GE Aviation Recorded: Aug 26 2019 19 mins
    Somesh Saxena and Jon Tudor, GE Aviation
    GE Aviation has implemented its own version of a self-service data system that now has more than 1,800 users throughout the company, allowing them to use real-time data at scale to make better and faster decisions.

    This webinar is a Q&A with Jon Tudor, Sr. Manager of Self-Service Data and Analytics, and Somesh Saxena, Product Owner of Dataiku and Alation, at GE Aviation. It covers the how and why behind the company's data transformation.
  • Defining ROI for Data Initiatives Recorded: Jul 30 2019 42 mins
    Mike Bukowski, VP Sales, Dataiku
    Calculating the Return on Investment (ROI) of your data initiatives is critical to activating your data; if you can't show that data initiatives are valuable, there will be resistance to implementation across the organization, causing you to miss out on opportunities and lag behind the competition. ROI isn't a simple calculation, but rather one that requires an in-depth understanding of your business needs and pain points. Mike has years of experience demonstrating the value data initiatives can unlock and will share tips, tricks, and best-practice guidelines to help you understand your own ROI metrics.
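
    For concreteness, the core ROI arithmetic itself is simple; the hard part the talk addresses is estimating the inputs. A toy sketch with hypothetical figures:

        # Hedged sketch: basic ROI arithmetic with hypothetical figures
        cost = 250_000     # platform, staffing, training
        benefit = 400_000  # incremental revenue plus savings over one year

        # Real-world ROI also needs harder-to-quantify inputs such as risk,
        # opportunity cost, and time-to-value.
        roi = (benefit - cost) / cost
        print(f"ROI: {roi:.0%}")  # 60%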
  • EGG London 2019 Highlights Recorded: Jul 16 2019 3 mins
    Dataiku
    Take a look at the best of EGG 2019 London!
  • Toward Ethical AI: Inclusivity as a Messy, Difficult, but Promising Answer Recorded: Jul 5 2019 15 mins
    Larry Orimoloye, Solutions Architect, Dataiku
    AI technologists must consider the ethical implications of what we’re building. This talk explores AI within a broader discussion of the ethics of technology, arguing that inclusivity and collaboration are a necessary answer.
  • GDPR and the ICO's Proposed AI Auditing Framework Recorded: Jul 5 2019 26 mins
    Ali Shah, Head of Technology Policy, Information Commissioner's Office
    The use of AI in industry and society is growing, and so are concerns about its impact. The Information Commissioner’s Office (ICO) is responsible for protecting individuals’ data protection rights and has been at the forefront of tackling complex privacy challenges with significant societal implications, including the Facebook/Cambridge Analytica investigation.

    The ICO has made AI a priority for the organisation and, with the new powers given to us through the GDPR, we are developing an enhanced supervisory framework for assessing the risks and potential harms to people’s data protection rights that can occur when AI is used. The framework will cover many of the hot risk topics that are the focus of the EGG conference, including interpretability, bias, and fairness, among others.

    This talk will inform the audience about the ICO’s work on AI, give an overview of our current thinking on some of the steps organisations can take to navigate AI’s data protection challenges, and encourage the audience to feed into the framework’s ongoing consultation.
  • Data Science Beyond Business: Navigating the Gender Pay Gap Recorded: Jul 5 2019 19 mins
    Ben Montgomery, Head of Technical Sales, UK Northern Europe & Middle East, Dataiku
    There are plenty of studies that discuss the gender pay gap. But these studies can be frustrating because they don’t allow us to dig into the data to better understand how the gender pay gap manifests itself. This talk will explore the relevant datasets, let them speak, and help us better understand what we mean when we talk about the gender pay gap.
  • Building a Brilliant Shoe Discovery and Highlighting the AI in Retail Recorded: Jul 4 2019 14 mins
    Antonis Argyros, VP Product and Growth, SafeSize
    After scanning more than fifteen million feet and two and a half million shoes from almost every brand, and receiving more than two million feedback points from consumers all over the world, SafeSize has managed to build a brilliant shoe discovery solution for omnichannel retail.

    In this session, Antonis will share with the audience the key learnings and the main challenges that SafeSize has faced over the last decade, while also explaining how AI solutions can be implemented in the retail market today.
  • Democratizing Automated Forecasting at Mercedes-Benz Recorded: Jul 4 2019 18 mins
    Lukas Stroemsdoerfer, Lead Data Scientist, Mercedes-Benz
    Future outlooks of financial KPIs are key to steering the Mercedes-Benz business. In the past, these outlooks were tediously collected and calculated with a lot of manual effort. Lately, such outlooks are often generated by machine-learning-based forecasts. After successfully working on more than 25 financial KPI forecasts, we packed all our knowledge into a software package. Using Dataiku DSS, we can put our package at the fingertips of our business users, enabling them to combine their business expertise with state-of-the-art machine learning.
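
    A minimal sketch of an automated KPI forecast in this spirit (not Mercedes-Benz's actual package; the Holt-Winters model and the synthetic series are illustrative assumptions):

        # Hedged sketch: a reusable forecasting helper built on statsmodels
        import numpy as np
        import pandas as pd
        from statsmodels.tsa.holtwinters import ExponentialSmoothing

        # Hypothetical monthly KPI with trend and yearly seasonality
        idx = pd.date_range("2015-01-01", periods=48, freq="MS")
        kpi = pd.Series(100 + np.arange(48)
                        + 10 * np.sin(np.arange(48) * 2 * np.pi / 12), index=idx)

        def forecast_kpi(series: pd.Series, horizon: int = 12) -> pd.Series:
            """Fit a Holt-Winters model and return a point forecast."""
            model = ExponentialSmoothing(series, trend="add",
                                         seasonal="add", seasonal_periods=12)
            return model.fit().forecast(horizon)

        print(forecast_kpi(kpi))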
  • Ethics, Data Science, and Public Service Media Recorded: Jul 4 2019 19 mins
    Ben Fields, Lead Data Scientist, BBC News
    In this talk, I will look at the contributions public service media organizations can make to the emerging understanding of responsible and ethical data science practice. We will look at some specific project examples: what works and where we can improve.

    Among them are automated decision-making processes, which are coming to public service media (PSM) because they offer competitive advantage and potential. But PSM is about making sure that people have a shared understanding of the world around them. How can you balance these two different expectations?

    By the end of the talk, the audience should have examples and principles they can apply in their own data science practice.
  • What Is To Be Done When Absolutely Everything Can Be Faked? Recorded: Jul 4 2019 18 mins
    Shaun McGirr, Head of Data Science & Business Intelligence, Cox Automotive UK
    We like to believe it was once easy to distinguish authentic images from those doctored for a political purpose. We recall an age of so few news sources that none of them were fake. The rose-tinted glasses that enable our nostalgia hide an important, enduring lesson: the most important things you can learn about your data, you cannot learn from your data.

    Thankfully, salvation is at hand if we only pay more attention to the second word in "data science". The scientific method is more than a technique for generating hypotheses to A/B test; it offers everything we need to use data responsibly, regardless of which algorithms are in fashion. In this talk, I share practical knowledge drawn from a background in social science that can help any practitioner embed ethics in their daily work.
  • Ethical Enterprise AI - A Guideline or Compass? Recorded: Jul 4 2019 25 mins
    Martin Leijen, Architect Data & Intelligence Lab, Rabobank
    How are Rabobank's privacy and ethical standards determined and protected, and how can we maintain our morals while encouraging a strong drive to optimize business opportunities and profitability?

    In this session, we will look at a Privacy by Design canvas for Rabobank and at examples where it may not be immediately clear how to act. These include cases such as the prediction of life-changing events (pregnancy, divorce, illness, or worse) that motivated the enterprise discussion on Ethical AI.
  • Generative Adversarial Networks for Finance Recorded: Jul 3 2019 19 mins
    Alexandre Hubert, Lead Data Scientist, Dataiku
    The Gaussian assumption in the Black-Scholes formula for option pricing has proven its limits. Today, Generative Adversarial Networks (GANs) are the new gold standard for simulation. They have worked wonders in image generation, but can they be applied to option pricing? Here is the story of how two data scientists (including a former trader) deployed a GAN for option pricing in real time, in 10 days.
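
    A minimal sketch of such a GAN for simulating one-dimensional returns (the architecture, hyperparameters, and data are illustrative assumptions, not the deployed system):

        # Hedged sketch: a tiny GAN over 1-D returns in PyTorch
        import torch
        import torch.nn as nn

        latent_dim = 8
        gen = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
        disc = nn.Sequential(nn.Linear(1, 32), nn.ReLU(),
                             nn.Linear(32, 1), nn.Sigmoid())
        opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
        opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
        bce = nn.BCELoss()

        real_returns = 0.01 * torch.randn(1024, 1)  # stand-in for historical returns

        for step in range(1000):
            # Discriminator: tell real returns from generator samples
            fake = gen(torch.randn(64, latent_dim)).detach()
            real = real_returns[torch.randint(0, 1024, (64,))]
            loss_d = (bce(disc(real), torch.ones(64, 1))
                      + bce(disc(fake), torch.zeros(64, 1)))
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()

            # Generator: try to fool the discriminator
            loss_g = bce(disc(gen(torch.randn(64, latent_dim))), torch.ones(64, 1))
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()

        with torch.no_grad():  # simulated returns, e.g. for Monte Carlo pricing
            samples = gen(torch.randn(10000, latent_dim))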
  • How to Make A Success of Data Science: Rendezvous Architecture for Data Science Recorded: Jul 3 2019 21 mins
    Jan Teichmann, Senior Data Scientist, Zoopla
    Making data science a success is really hard: according to Gartner, up to 85% of projects and initiatives around big data and data science fail. The reasons are complex but often misunderstood. What is so different about data science that it needs new approaches?

    A 2016 survey of data scientists concluded that 80% of data science is preparing and cleaning data (the infamous 80/20 rule), and that finding caught on as the widely recognized problem statement for data science. But productionisation of models is the TOUGHEST problem in data science. This talk will introduce the unique requirements of data science, explain why they matter for the long game, and present the Rendezvous Architecture, a proven solution for integrating data science and enterprise requirements harmoniously at scale.
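
    A minimal sketch of the rendezvous idea (the models and logging sink below are hypothetical): every request is scored by all candidate models, one designated model's answer is served, and the rest are logged for offline comparison.

        # Hedged sketch: serve one model's score, log every model's score
        import logging
        from typing import Callable, Dict

        logging.basicConfig(level=logging.INFO)

        def champion(features: Dict) -> float:    # hypothetical production model
            return 0.8 * features["x"]

        def challenger(features: Dict) -> float:  # hypothetical candidate model
            return 0.75 * features["x"] + 0.05

        MODELS: Dict[str, Callable[[Dict], float]] = {
            "champion": champion,
            "challenger": challenger,
        }

        def rendezvous_score(features: Dict, serve: str = "champion") -> float:
            """Score with every model, return one result, log the rest."""
            scores = {name: model(features) for name, model in MODELS.items()}
            for name, score in scores.items():
                logging.info("model=%s score=%.4f", name, score)  # offline eval
            return scores[serve]

        print(rendezvous_score({"x": 1.0}))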
  • Leveraging the Law of Averages to Deal with Data Science Frustration Recorded: Jul 3 2019 18 mins
    Adrian Badi, Lead Data Analyst, Demant
    Working as a data scientist can be fun and exciting. You get hired, and you have all these great ideas and projects you would like to try out with all that big data lying around. A couple of weeks or even months later, things start to change. You start getting data engineering tasks, your projects and ideas get down-prioritized, and you end up doing reporting and dashboards for executives. Does this sound familiar? If you are anything like Adrian, you might feel constrained by your environment, not to mention used well below your full potential.

    In this session, Adrian will talk about his experiences and how a very old and unexpected book helped him understand the mechanisms he can use to kill frustration, reduce stress, and learn to give the maximum of his capabilities while adding value to the organization. He cannot promise a one-size-fits-all session, but what he DOES promise is an engaging, funny, and honest session that will help you put things into perspective.
  • Convenient and flexible ML pipelines with Kubeflow Recorded: Jul 3 2019 19 mins
    Mattias Arro, Machine Learning Engineer, Subspace AI
    It is still early days for open source solutions for productionalising and deploying machine learning (ML) models and for managing scalable data pipelines and data science experiments. Kubeflow is a collection of tools that is perfect for these use cases and is gaining popularity for good reason.

    This talk describes a system built on top of Kubeflow that is generic enough to manage ML pipelines of various shapes and sizes, yet flexible enough to allow entirely custom workflows. At its core is a set of conventions that determine where data is read from and written to, together with a way of expressing data preprocessing and models as a configuration of composable objects and functions. This approach makes it trivial to add new models, datasets, and training objectives to a production system, and it enables training and deploying stacked models of arbitrary complexity.
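
    A minimal sketch of a Kubeflow pipeline in the kfp v1 SDK style of that era (the container images, bucket, and arguments are placeholder assumptions, not the speaker's system):

        # Hedged sketch: a two-step Kubeflow pipeline (preprocess -> train)
        import kfp
        from kfp import dsl

        @dsl.pipeline(name="train-pipeline", description="Preprocess, then train.")
        def train_pipeline(data_path: str = "gs://my-bucket/raw"):  # hypothetical bucket
            preprocess = dsl.ContainerOp(
                name="preprocess",
                image="gcr.io/my-project/preprocess:latest",  # hypothetical image
                arguments=["--input", data_path, "--output", "/out/clean"],
                file_outputs={"clean": "/out/clean"},
            )
            train = dsl.ContainerOp(
                name="train",
                image="gcr.io/my-project/train:latest",       # hypothetical image
                arguments=["--data", preprocess.outputs["clean"]],
            )

        if __name__ == "__main__":
            kfp.compiler.Compiler().compile(train_pipeline, "train_pipeline.yaml")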
  • The Spirit of the City: Capturing Network-Generated Data for Better Cities Recorded: Jul 3 2019 20 mins
    Luca Maria Aiello, Senior Research Scientist, Nokia Bell Labs
    The corporate smart-city rhetoric is about efficiency, predictability, and security. "You'll get to work on time, no queue when you go shopping, and you are safe because of CCTV cameras around you". All these things make a city acceptable, but they don't make a city great.

    Goodcitylife.org is a global group of like-minded people who are passionate about building technologies whose focus is to give a good life to city dwellers; because the future of the city is, first and foremost, about people.

    We go beyond the creation of a smart city by using digital data to measure intangible aspects of the urban space: the spirit of the city. We will show how this acquired knowledge can be leveraged to build new tools for both citizens and municipalities. Can we rethink existing mapping tools? Is it possible to capture smellscapes of entire cities and celebrate good odors? Can you measure a city's cultural capital?

    We will see how a creative use of network-generated data can tackle hitherto unanswered research questions.
Dataiku
Dataiku is the centralized data platform that moves businesses along their data journey from analytics at scale to enterprise AI. By providing a common ground for data experts and explorers, a repository of best practices, shortcuts to machine learning and AI deployment/management, and a centralized, controlled environment, Dataiku is the catalyst for data-powered companies.

Customers like Unilever, GE, BNP Paribas, and Santander use Dataiku to ensure they are moving quickly and growing exponentially along with the amount of data they’re collecting. By removing roadblocks, Dataiku ensures more opportunity for business-impacting models and creative solutions, allowing teams to work faster and smarter.
