How to Process Tons of Data for Cheap with Spark + Kubernetes

This webinar has something for everyone, whether technical or not. On the business side, you'll get an overview of how to better manage your infrastructure spend while still providing the compute your analysts and data scientists need, plus a practical demonstration of how autoscaling your data processing infrastructure gives you the horsepower you need without breaking the bank.

On the technical side, if you like Spark for processing big data and Kubernetes for scaling and managing containers but haven't yet run Spark on Kubernetes, this webinar is for you. In this one-hour session, you'll learn:

- Why Kubernetes is a great scheduler for Spark jobs
- How to quickly spin up a managed Kubernetes cluster on AWS and run your first Spark job from your environment
- How Dataiku lets data scientists spin up Kubernetes clusters and run Spark jobs with just a few mouse clicks
Recorded Apr 6 2020 6:00 pm, 64 mins
Presented by
Gus Cavanaugh, Sales Engineer @ Dataiku
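As a rough illustration of running a first Spark job on a managed Kubernetes cluster, here is a minimal Python sketch that assembles a `spark-submit` command for Spark's native Kubernetes scheduler (added in Spark 2.3, with PySpark support from 2.4). The `k8s://` master URL prefix, `--deploy-mode cluster`, and the `spark.kubernetes.*` and `spark.executor.instances` settings are standard Spark options. The API server endpoint, container image, namespace, service account name, and application path are hypothetical placeholders, not values from the webinar.

```python
"""Sketch: build a spark-submit command targeting Spark's native Kubernetes scheduler."""
import shlex


def spark_submit_k8s_args(api_server, image, app, executors=2,
                          namespace="default", service_account="spark",
                          app_name="first-spark-job"):
    """Return the argv list that submits `app` to a Kubernetes cluster.

    Every default here is an illustrative assumption; adjust for your cluster.
    """
    confs = {
        # Image that runs both the driver and executor pods.
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.namespace": namespace,
        # Service account the driver pod uses to create executor pods.
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
        # Fixed executor count (no dynamic allocation in this sketch).
        "spark.executor.instances": str(executors),
    }
    args = [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--name", app_name,
    ]
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # application file, resolved inside the container image
    return args


if __name__ == "__main__":
    argv = spark_submit_k8s_args(
        api_server="https://my-cluster-endpoint.example.com:443",  # hypothetical endpoint
        image="my-registry/spark-py:latest",                       # hypothetical image
        # Assumed location of the bundled Pi example inside the image.
        app="local:///opt/spark/examples/src/main/python/pi.py",
    )
    print(shlex.join(argv))
```

Running the printed command assumes a reachable cluster, a service account allowed to create pods, and a Spark image the cluster can pull. Building the argument list in code rather than by hand makes it easy to vary settings such as the executor count from job to job.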

  • Big Data and AI in Finance ft. RBC, ATB Financial, and Scotiabank Jun 26 2020 11:00 pm UTC 75 mins
    Fouad Yousif, Dhruv Mayank, Zain Abbas
    Financial institutions use AI in various ways to improve their operations. Its diverse applications affect both the sell side, which includes investment banks & stockbrokers, and the buy side, which includes asset managers & hedge funds. In this talk, we'll dive into how various financial services leverage machine learning, alternative data sets, & other advanced analytics to tackle issues such as credit decisions, risk management, fraud prevention, trading, personalized banking, process automation, & more. We'll touch on use cases, the impact of big data, & the benefits that big data & AI can bring to financial services.

    Zain is a Sr. Data Scientist in the Customer Insights, Data & Analytics group at Scotiabank, where he leverages machine learning & other advanced analytics to enhance Scotiabank's knowledge of its customers & deliver a better customer experience. He has a PhD in Soft Computing & worked in academia before starting his current role.

    Dhruv is a data scientist on ATB Financial's AI R&D team. He studied machine learning for financial engineering at the University of Toronto, has worked with NLP for sentiment analysis on Twitter, and developed a model & app that predicts how replaceable jobs are by automation & the gig economy. At ATB, his team works to push the boundaries of how machine learning can be used in the financial industry.

    Fouad has over 12 years of experience in data science & educational outreach. He currently works as a Data Scientist at RBC, developing ML models that predict risk & provide business insights. Previously, he was a data scientist at the Ontario Institute for Cancer Research, where he contributed to many computational projects aimed at improving cancer prognosis.
  • Governance and Trustworthy AI: From Reflection to Execution Jun 25 2020 9:00 am UTC 60 mins
    Grégory Abisror (Partner, Risk Advisory at Deloitte), Sophie Dionnet (VP Strategy at Dataiku)
    Identifying and preventing the risks associated with using AI in business processes is a major challenge for companies. It will become even more so with the upcoming European Commission regulations and growing public distrust. That is why, in this webinar hosted by Pierre Carrere, Dataiku and Deloitte share the keys to help you realize your AI governance ambitions.

    Through the discussion between Grégory Abisror (Partner, Risk Advisory, Deloitte France) and Sophie Dionnet (VP Strategy, Dataiku), you will learn what is at stake with AI governance in France and which best practices address these issues efficiently. Following this advice will help you accelerate your thinking and approach the implementation of your AI governance with greater confidence.
  • Towards More Transparency in Your Projects with Explainable AI Jun 23 2020 6:00 am UTC 60 mins
    Virginie Mathivet, AI specialist, Teamwork
    Explainable AI is not just a new buzzword. It is about being able to explain why a model behaves as it does, not only at the global level but also for each individual prediction. In certain fields, being able to explain results is necessary, whether for legal compliance or for ethical reasons.

    In this webinar, AI specialist Virginie Mathivet will present what Explainable AI is, why it is so important, and how Dataiku DSS can help.
  • Advanced Data Analytics and ML for Patient Care and Hospital Acquired Conditions Jun 19 2020 5:00 pm UTC 45 mins
    Emma Irwin, Sales Engineer, Dataiku
    Hospital-acquired conditions (HACs) represent a major strain on hospitals. Infections, surgical errors, and falls in medical facilities lead to further medical treatment that payers (insurers, public health programs) will often refuse to cover. While HACs are inevitable at even the best-run facilities, minimizing them can go a long way toward improving a hospital's financial position and patient care. Some providers have made significant progress in identifying sources of HACs by leveraging advanced data analytics and ML.

    In this second episode of our three-part series on data and analytics use cases for the healthcare industry, Emma Irwin will demonstrate how Dataiku's Data Science Studio can be used to quickly and easily design, train, and deploy accurate models that identify sources of HACs, and to implement strategies that minimize their occurrence.
  • Design, train and deploy accurate churn prediction models Jun 18 2020 5:00 pm UTC 45 mins
    Greg Cashman, Senior Sales Engineer, Dataiku
    Given that it costs 5-10 times more to acquire a new customer than to retain an existing one, it seems obvious that all businesses should engage in some level of churn prevention. Because of its business impact and relative ease of execution, churn prediction is, for many types of business, a great first project to tackle with machine learning and AI.

    In this talk, Greg Cashman will demonstrate how Dataiku's Data Science Studio can be used to quickly and easily design, train, and deploy accurate churn prediction models. These models will not only help your organization intervene proactively to retain valued customers but also reveal the key drivers of churn.
  • Customer Service, Satisfaction & Churn Prevention Jun 18 2020 12:00 pm UTC 60 mins
    Frank Oechsle (Head of Data Science, esentri), Matthias Wurdig (Senior Consultant Data Science, esentri)
    ... in a closed loop for successful CRM.

    How can customer service and satisfaction data be used to rethink classic churn prediction? Which models and methods work? And what does it really take to make churn prevention successful? The key word here is uplift modelling.

    Together, Frank and Matthias have more than 20 years of experience in customer relationship management, and they will show how active churn management succeeds with uplift models.

    Frank Oechsle:
    Frank Oechsle has applied data science methods in analytical CRM for more than 10 years, primarily in retail and telecommunications, including as a manager with business-plan responsibility for customer churn. A graduate in business mathematics, he is also pursuing a doctorate in artificial intelligence at KIT, which keeps him at the cutting edge. He is passionate about measuring the added value that data science applications can generate.

    Matthias Wurdig:
    Matthias Wurdig has worked as a data scientist since 2011, specializing in analytical customer relationship management (CRM). A business informatics graduate, he has supervised and led numerous data science projects in a wide range of industries, from automotive to IT and telecommunications. From marketing and sales and customer experience to churn management, he always keeps the customer journey in view from a data science perspective.

    Please note that by registering for this webinar, you agree that your data will be shared with Dataiku's partner esentri.
  • Artificial Intelligence in Banking and Finance Jun 16 2020 3:00 pm UTC 27 mins
    Alexandre Hubert, Sales Engineering Director
    Enterprise AI is at peak hype, and although the Banking, Financial Services and Insurance industry is starting to embrace it, there are still some challenges to overcome before you can reach your full AI potential.

    Join us as we explore what AI means for the Banking and Financial Services industry: the current environment, use cases, challenges, possible solutions, and the future of AI.
  • How Can You Improve Your Forecasting by Adopting a Data Science Approach? Jun 11 2020 9:00 am UTC 60 mins
    Ludovic Blusseau, Head of Sales Engineering, Dataiku
    Join this webinar presented by Ludovic Blusseau, Head of Sales Engineering at Dataiku, to discover what a data science approach brings to forecasting.

    Forecasting has been used since the 1950s to anticipate risks and support decision-making. But in the era of AI and algorithms, older modeling techniques cannot integrate the many data sources needed to produce results accurate enough for the modern enterprise.

    This webinar offers an overview of forecasting with Dataiku. A sales forecasting example will concretely illustrate the steps for combining business expertise with data science techniques. You will then understand how to refine your forecasts, automate them, and scale them across the many use cases where forecasting applies.
  • Top 3 Use Cases for Data Science in Marketing Jun 11 2020 6:00 am UTC 32 mins
    Ryan Morris, Account Executive, Dataiku
    What options does data science offer to support marketing projects? How can the success of campaigns and marketing activities be measured and continuously improved using data? In this webinar, we present the top 3 use cases, “Churn Prediction”, “Segmentation” and “Recommendation Engine”, using real examples from companies that make data-driven decisions.
  • Data Science Disruption Across Industries ft. Hulu, Nordstrom, & Atomwise Jun 10 2020 6:00 pm UTC 75 mins
    Skander Hannachi (Nordstrom), Jared Thompson (Atomwise), & Moe Lotfy (Hulu)
    Tentative Schedule:
    2:00pm: Intro
    2:05pm: Data Science Disruption Across Industries ft. Skander Hannachi (Sr. Data Scientist / ML Solutions Architect @ Nordstrom), Jared Thompson (Sr. Machine Learning Engineer @ Atomwise), and Moe Lotfy (Sr. Product Data Scientist @ Hulu)
    2:45pm: Q&A

    Talk Abstract:

    The impact of data science has been felt across a range of industries, including retail, pharmaceuticals, and media. As the retail sector strives to stay technologically relevant while meeting customer demands, data science has emerged as a lifeline that can be used to predict customer likes and dislikes, visualize customer behavior, and make informed decisions. In this fireside chat, we will cover how investments in data science on the part of the retail industry have extended retailers' capabilities beyond data collection and analysis.

    Adjacent to healthcare, the pharma industry is another field where data science is finding growing application. We will touch on applications of data science in pharmaceuticals, ranging from identifying suitable candidates for trials based on their physiological chemical structure, medical history, and other important characteristics, to optimizing the evaluation of trial results.

    The media and entertainment industry has also moved into a digitally driven space, and the amount of available customer data has grown exponentially. Most data work in media and entertainment today is dedicated to understanding audiences and answering questions such as “What are people reading, listening to, and watching?” We will discuss how modern applications of data science in media and entertainment establish new rules and demand extra creative thinking from industry players.
  • Reducing AI Bias and Optimizing Data Labeling Frameworks w/ Appen Jun 4 2020 6:00 pm UTC 75 mins
    Monchu Chen, Principal Data Scientist @ Appen
    Tentative Schedule:
    2:00pm: Intro
    2:05pm: Reducing AI Bias and Optimizing Data Labeling Frameworks w/ Appen by Monchu Chen
    2:45pm: Q&A

    Talk Abstract:

    Bias in machine learning has become a significant concern as AI technology spreads to more application domains. While some bias is a consequence of limits in design and tooling, bias in the training data itself is much more common. Skewed training data often produces AI models that discriminate and amplify human prejudices.

    In this talk, we present a framework, developed at Appen, to minimize bias. This framework operates by routing data labeling tasks to the right labelers to avoid introducing bias. It also optimizes the process by determining a proper distribution of labelers for a given task.

    Our speaker, Monchu Chen, will review some use cases where this framework has been applied, and discuss results that show how the optimization process minimizes skew in the training data. Chen will also discuss extending this approach to other use cases and review the implications of this work.

    Speaker bios:

    Monchu Chen has worked in human-computer interaction for more than two decades. He has helped corporations and startups apply user insights to product innovation in multiple application domains. Monchu now focuses on the human aspects of AI as the Principal Data Scientist for Appen's ML team, building models and systems to improve annotation quality and efficiency and to reduce AI bias. Dr. Chen holds a PhD from Carnegie Mellon University. He previously held a tenured faculty position at Carnegie Mellon Portugal and is the author of more than 70 publications and patents.
  • Operationalization: The Next Frontier Jun 4 2020 9:00 am UTC 30 mins
    Vincent Rijnbeek, Territory Account Executive
    The wider use of Machine Learning Platforms contributes to the proliferation of analytical assets and models. The biggest challenge is therefore to deploy and operationalize these models at scale. What are the best practices and the pitfalls to avoid for a successful operationalization of data projects?
  • Strategy and Marketing Attribution Models with Squarespace Recorded: Jun 1 2020 64 mins
    Braden Purcell, Omar Abboud (Data Scientists @ Squarespace), Nate Lawson (Data & Operations Lead @ Squarespace)
    Tentative Schedule:

    2:00pm EST: Intro
    2:05pm EST: Strategy and Marketing Attribution Models with Squarespace by Braden Purcell, Omar Abboud, and Nate Lawson from Squarespace
    2:45pm EST: Q&A

    Squarespace’s Data Science team helps the company make better strategic decisions using data. The Data Science and Marketing teams collaborate closely to quantify the impact of their marketing decisions and use data to optimize spend allocation. In the fireside chat, they will be asked to give a broad overview of marketing data science at Squarespace, and to give examples of how they have improved the quality of the checkout survey that serves as a key input to their marketing attribution model. To wrap up, they will hang out for a Q&A session with members of both the data science and marketing teams to answer questions about how they work together.
  • How Data Can Propel Business Growth & Drive Cost Reduction w/ Airbnb Recorded: May 29 2020 46 mins
    Cassie Cao (Data Science Manager @ Airbnb)
    Tentative Schedule:
    7:00pm EST: Intro
    7:05pm EST: How Data Can Propel Business Category Growth & Drive Cost Reduction with Airbnb by Cassie Cao
    7:45pm EST: Q&A

    Talk Abstract:

    At Airbnb, Cassie helped launch new business categories such as Airbnb for Work and the Luxury segment. Currently, she leads a Data Science team focused on reducing Customer Service costs. From new business category growth to cost reduction, data can be a powerful driver. Cassie will share her key learnings about the different data science tactics used to address these two very different problems.

    Speaker bios:

    Cassie Cao is a Data Science Manager at Airbnb. She is deeply passionate about tackling challenging business problems using data. She built the Airbnb Luxury Data Science team from scratch and recently transitioned to creating a new DS team focused on cost reduction. Cassie has a master’s degree in Statistics from Harvard and was a research associate at HBS.
  • AI for the Supply Chain Recorded: May 28 2020 53 mins
    Fabrice Simon (Data Science & Innovation Expert at Avisia), Nicolas Bétin (Data Science Consultant at Avisia)
    If we are to believe Michel Serre, who said “Intelligence is the unpredictable” … then it takes a hefty dose of intelligence to work in a supply chain department right now!

    Avoiding stock-outs of products whose demand is exploding in the current situation, such as pasta or flour … identifying the risk of shutdowns among the suppliers we depend on … forecasting how inventory left on standby will sell through when activity resumes ...

    These are exactly the cases where AI can prove useful, by integrating multiple dimensions and improving the ability to react to unforeseen events.

    So it is no accident that "80% of supply chain decision-makers are well aware of this potential," according to a study by INCISIV.

    Yes, but ... with interpretable AI!

    While the usefulness of AI is beyond doubt … you still need to be well equipped in terms of tools and skills. In a period of uncertainty, embarking on an uncontrolled or outsourced use of AI can quickly lead to the pitfalls of the "black box".

    The key, then, is knowing how to answer the following question: how can you optimize your supply chain with interpretable AI?

    About the Webinar
    This webinar will let us share and understand the main mechanics of AI projects in order to break down how they work.
    We will discuss a use case with you, drawing on lessons learned about the interpretability of a sales forecast for new products in the supply chain.
    Finally, we will present the challenges, key success factors, and results of this experience.

    By registering for this webinar, you agree that your information will be shared with Dataiku's partner Avisia.
  • Level Up! Elevating AI in the Enterprise - Exclusive with IDC Recorded: May 28 2020 65 mins
    Chris Marshall, AVP, IDC | Richard Jones, VP APAC, Dataiku | Ewen Plougastel, Managing Director, Accenture
    Exclusive with IDC

    These days, artificial intelligence (AI) has the attention of business leaders across functions and industries around the world. But surprisingly few firms in Asia/Pacific have progressed to true enterprise AI, embedding AI into every aspect of their business model. Why? What stops them?

    In this 60-minute interactive webinar, we investigate the different strategies enterprises adopt toward AI, and when they work and when they don’t. Don’t miss it!

    Ewen Plougastel, Managing Director, Accenture will be talking to Dr. Chris Marshall, a Senior Analyst from IDC covering AI in Asia and Richard Jones, VP APAC of Dataiku.

    Chris will review regional patterns in deploying AI while Richard will discuss the challenges facing data scientists on the ground.
  • ML-based Fraud Detection and Prevention in Healthcare. Use Case Deep Dive + Q&A Recorded: May 26 2020 46 mins
    Emma Irwin, Solutions Architect, Dataiku
    Accurate fraud detection in healthcare has the potential to make medicine better, more affordable, and more accessible. Join us for a new episode of Caffeinated Data | Healthcare Edition (Episode 1/3): a data science project showcase followed by Q&A with one of our data scientists. In this first episode, we'll show how to build ML-based fraud detection and prevention models, or how to optimize your existing ones.
  • Enterprise AI, from Cost to Revenue Center Recorded: May 20 2020 44 mins
    Alexis Fournier
    In the coming years, the ability of organizations to pivot their activities around Enterprise AI will fundamentally determine their fate. Those able to efficiently leverage ML techniques to improve business operations and processes will get ahead of the competition. Of course, the key word here is efficiently; it’s not enough for organizations to simply leverage ML techniques at any price. Eventually, in order for Enterprise AI strategy to be truly sustainable, one must consider the economics: not just gains, but cost.

    Common sense and economics tell us not to start from scratch every time we have a new idea or a new use case to test or implement. Reuse is the simple concept of avoiding unnecessary rework in AI projects, and the concept of capitalization in Enterprise AI takes reuse to another level.

    This session will explain how Dataiku enables every organization to benefit from AI by allowing people within the organization to scale, providing transparency and reproducibility throughout - and across - teams.
  • How to Get Started With NLP Recorded: May 19 2020 66 mins
    Katie Gross, Lead Data Scientist @ Dataiku
    Natural Language Processing (NLP), the branch of machine learning and AI which deals with bridging the gap between human language and computer understanding, is all the rage right now. Once a relatively niche topic, in the past few years landmark new models and applications have brought NLP to the center-stage of real-world enterprise data science and AI.

    This webinar will give data scientists a framework for getting started with NLP projects. It will go over:
    • What exactly NLP is and how it’s used
    • How to clean and pre-process text for machine learning projects
    • An overview of some of the main NLP algorithms and how they work
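    The second topic, cleaning and pre-processing text, can be illustrated with a minimal, standard-library-only Python sketch: Unicode normalization, lowercasing, punctuation removal, whitespace tokenization, and stop-word filtering. The tiny stop-word list is an illustrative assumption; real projects typically use a fuller list from an NLP library and may add steps such as stemming or lemmatization.

```python
"""Minimal text-cleaning sketch for NLP preprocessing (standard library only)."""
import re
import unicodedata

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it"}


def preprocess(text):
    """Normalize, lowercase, strip punctuation, tokenize, and drop stop words."""
    # NFKC folds compatibility characters (e.g. full-width letters) to standard forms.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("The model is GREAT, and fast!"))  # ['model', 'great', 'fast']
```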

    Katie Gross is a Lead Data Scientist at Dataiku, where she helps clients across industries develop AI solutions using Dataiku DSS. Previously, she worked as a data scientist at Schireson, a marketing science firm, and did freelance data science work for IBM and for Radiate, a dating app. Prior to her data science life, Katie spent three years as a CPG consultant at Nielsen. Katie holds a BA in Economics from Colgate University.
  • Transition from Business Intelligence (BI) to Analytics Recorded: May 19 2020 50 mins
    Timm Grosser, Senior Analyst, BARC
    Data bring real added value when they lead to action and thus make an active contribution. For many companies, the backward-looking view offered by classic Business Intelligence (BI) is no longer sufficient. They want faster (re)actions, and even predictions, in order to act effectively. Advanced Analytics provides tools to achieve decision automation.

    Advanced Analytics is not the same as BI, and its approach is different too. This raises the big question of how Advanced Analytics and BI should work together or be combined. This requires a strategic definition for BI, for Advanced Analytics and, beyond that, for data, because data is the essential basis. If you want to be data-driven, you need a data strategy that goes hand in hand with a change in culture.

    In this webinar, Timm Grosser, Senior Analyst at BARC, will take a deep dive into the differences and address three essential aspects of the transition from BI to Analytics:

    1) Data governance as the predominant instrument for implementing a data strategy
    2) Operationalization of analytics to bring added value
    3) Technology to support the integration, preparation, analysis and security of data as well as to promote cross-functional collaboration
Enabling Your Path to Enterprise AI
Dataiku is the centralized data platform that moves businesses along their data journey from analytics at scale to enterprise AI. By providing a common ground for data experts and explorers, a repository of best practices, shortcuts to machine learning and AI deployment/management, and a centralized, controlled environment, Dataiku is the catalyst for data-powered companies.

Customers such as Unilever, GE, BNP Paribas, and Santander use Dataiku to ensure they are moving quickly and growing exponentially along with the amount of data they’re collecting. By removing roadblocks, Dataiku creates more opportunities for business-impacting models and creative solutions, allowing teams to work faster and smarter.
