How to Process Tons of Data for Cheap with Spark + Kubernetes
This webinar has something for everyone, technical or not. On the business side, you’ll get an overview of how to better manage your infrastructure spend while providing the compute your analysts and data scientists need. You’ll also see a practical demonstration of how autoscaling your data processing infrastructure provides the horsepower you need without breaking the bank.
On the technical side, if you like Spark for processing big data and Kubernetes for scaling and managing containers but you haven’t run Spark on Kubernetes yet, this is the webinar for you. In this one hour session, you’ll learn:
- Why Kubernetes is a great scheduler for Spark jobs
- How to quickly spin up a managed Kubernetes cluster on AWS and run your first Spark job from your environment
- How Dataiku lets data scientists spin up Kubernetes clusters and run Spark jobs with just a few mouse clicks
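To make the second bullet above concrete, here is a minimal sketch of what submitting a first Spark job to a Kubernetes cluster can look like. It only assembles a `spark-submit` command and prints it; nothing is executed. The API server URL, container image, and jar path are hypothetical placeholders, and the helper `build_spark_submit` is an illustration, not something shown in the webinar. A real cluster usually also needs a driver service account and credentials, which are omitted here.

```python
import shlex


def build_spark_submit(api_server, image, app_jar, executors=2, max_executors=10):
    """Assemble a spark-submit argument list that targets a Kubernetes cluster."""
    return [
        "spark-submit",
        # Kubernetes masters are addressed as k8s://<api-server-url>
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--name", "spark-pi",
        "--class", "org.apache.spark.examples.SparkPi",
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        # Dynamic allocation lets Spark add and release executor pods with load,
        # which is one way to get the autoscaling behavior described above.
        "--conf", "spark.dynamicAllocation.enabled=true",
        "--conf", "spark.dynamicAllocation.shuffleTracking.enabled=true",
        "--conf", f"spark.dynamicAllocation.maxExecutors={max_executors}",
        app_jar,
    ]


cmd = build_spark_submit(
    api_server="https://example-eks-endpoint:443",  # hypothetical endpoint
    image="example.registry/spark:latest",          # hypothetical image
    app_jar="local:///opt/spark/examples/jars/spark-examples.jar",  # hypothetical path
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each flag and value separate, so it can be passed safely to `subprocess.run` once the placeholders are replaced with real values.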
Recorded Sep 15, 2021, 64 mins
Many organizations believe that they need to have all their data ducks lined up before they attempt AI analytics. They believe they need to have conquered traditional or BI analytics first, including data catalogs, data lineage, master data management, big data, etc. before planning for AI. While this conventional thinking has merits, it results in high opportunity costs and carries risks. Join Jerry Hartanto, AI Strategist at Dataiku, to debunk this common assumption and uncover how organizations can establish capabilities for traditional analytics and experiment with AI analytics, leveraging analytics capabilities frameworks and tools that excel for both traditional and AI analytics.
At any given time, organizations are attempting to transform their business (think business process, digital, management, organizational, and cultural transformations) with the common end goals of operational change, business model innovation, and domain expansion. Now is the time to use AI-enabled solutions to drive business transformation, but how is that done in practice? Join Jerry Hartanto, AI Strategist at Dataiku, for an overview of how AI mitigates business transformation risks, accelerates the time to value, and drives tangible outcomes (before it’s too late!).
Doug Bryan, AI Strategist, Dataiku / Aaron McClendon, Head of Data Science, Aimpoint Digital
Efficient supply chain management is essential for organizations to provide the right products and services to their customers in the right place and at the right time. In this webinar, Dataiku and Aimpoint Digital will share how teams are effectively developing, deploying, and automating scalable Demand Forecasting models, helping to significantly improve their supply chain analytics initiatives and harmonize the demand-driven supply chain vision.
Triveni Gandhi, Data Scientist & Paul-Marie Carfantan, AI Governance Manager
Responsible AI is a topic of growing interest for data practitioners in both research and industry, especially in the context of ML Fairness. While implementing standardized fairness techniques in existing pipelines can be a challenge, Dataiku offers strategic resources for data scientists and analysts to seamlessly incorporate ML Fairness techniques into their workflows.
Join Triveni Gandhi, Senior Industry Data Scientist, and Paul-Marie Carfantan, AI Governance Manager, for this webinar to learn about practical applications of ML Fairness and how they support broader Governance, Responsible AI, and MLOps concepts in the organization.
Rohit Bhattacharjee, Data Science Team Lead at Maaloomatiia / Layla Sabbouh, Data Scientist at Maaloomatiia
Logistics and supply chain executives who manage complex worldwide operations are under pressure to meet production demand, reduce costs and maintain high standards of customer satisfaction, all in the face of increasing tensions in global trade and commerce as well as supplier and vendor constraints.
One of the best ways forward is to develop AI-enabled solutions and applications to make the supply chain more agile and resilient. Some of the most promising use cases with the highest ROI revolve around AI for Inventory Management, AI for Mitigating Supply Chain Risks, and AI for Production Schedule Optimization. All three use cases improve visibility, predictability, and flexibility while making recommendations and adjustments in real time.
Some of the substantial benefits that AI on Dataiku offers to the supply chain sector are listed below:
- 15-20% reduction in inventory holding costs
- 5-7% increase in OTIF performance
- 3-5% increase in product availability
In this webinar, we will discuss the high-level capabilities of each of these applications and take you through an end-to-end demo of them on Dataiku. All in all, this is a very attractive value proposition, and we look forward to having you join us.
- Rohit Bhattacharjee, Data Science Team Lead @ Maaloomatiia
- Layla Sabbouh, Data Scientist @ Maaloomatiia
Please be aware that by registering for this webinar, you agree to have your personal information shared with Dataiku's partner Maaloomatiia. They may contact you with information that could be of interest to you.
Minosh Salam, Director, Business and Strategy @ DataQraft, Umut Şatir Gürbüz, Senior Sales Engineer @ Dataiku
This webinar covers the must-know trends that will define where Enterprise AI is headed next.
Democratized Data Quality, AI Governance, Self-Service Analytics, MLOps, Responsible AI, and Edge Computing are some of the popular concepts that we will discuss in detail. In the second part of the webinar, we will showcase the platform's capabilities with a live demo.
- Minosh Salam, Director, Business and Strategy at DataQraft
- Umut Şatir Gürbüz, Senior Sales Engineer at Dataiku
Please be aware that by registering for this webinar, you agree to have your personal information shared with Dataiku's partner DataQraft. They may contact you with information that could be of interest to you.
Dr. Emma Beauxis-Aussalet, Sarah-Jane van Els & Triveni Gandhi
As we saw in episode 1 of this series, the bias inherent in historical data is often not correctable by simply collecting more or more representative data. If nobody from a certain group has ever applied for this kind of loan or that type of job, there may simply be no data to collect. If we accept defeat on this, there is a real risk AI models will refuse to make predictions on these groups with missing data, reinforcing the problem that got us here in the first place. One solution with promise is synthetic data, generated by combining the data of real cases to produce anonymised cases with properties that match the underlying population, “filling in the gaps” in historical data. In this session, we discuss a concrete use case developed by the ICAI lab in collaboration with Randstad and explore the promise and limits of this approach.
Dr. Emma Beauxis-Aussalet is an assistant professor of ethical computing at the Vrije Universiteit Amsterdam (VU). She is also lab manager of the Civic AI Lab. In 2019 Emma obtained her doctorate at Utrecht University with a dissertation on AI bias, for her work at the Centrum Wiskunde & Informatica (CWI). With her multidisciplinary experience, she has been researching computational methods, statistics, user interfaces and data visualizations that enable transparent and controllable AI systems. Modelling and visualizing AI errors is one of her main research topics. For her achievements in this field, she was named one of the 100 Brilliant Women in AI Ethics in 2021. She also received the 3rd WomENcourage Prize for her contributions to the development of AI literacy and bias awareness in lectures and workshops.
Sarah-Jane is a recent MSc Information Sciences graduate with a BSc in Business Administration from the Vrije Universiteit Amsterdam. She conducted her master thesis at Randstad Groep Nederland, researching synthetic data to identify bias in recommender systems for recruitment.
Many firms have a large document corpus made up of both digitized and raw images. Now more than ever, financial institutions are turning towards unstructured data sources to capture additional attributes in order to, ultimately, adjust or confirm their analyses and discover new trends and insights. Many organizations rely on individuals to read sections of these documents or search for relevant materials in an ad hoc manner, with no systematic way of categorizing and understanding the information and trends.
Join us for this Dataiku session on interactive document intelligence, where we will showcase a modular and reusable pipeline to rapidly and automatically digitize documents, extract text, and consolidate data into a unified and searchable database. We will focus on NLP techniques applied to prepare, categorize, and analyze textual data based on themes of interest (in this project: ESG), with additional theme modules available. Lastly, we will demo a purpose-built dashboard to provide business users with a simple and interactive tool to analyze high-level trends and drill down into aggregated insights.
Dominick Rocco, General Manager of Machine Learning, phData / Doug Bryan, AI Strategist, Dataiku
HR is an often overlooked but rich source of valuable AI use cases such as writing better job postings, identifying key attributes of successful new hires, and attrition management. There is potential for huge benefits when it comes to AI in HR to support collaborative teams and employee retention in addition to keeping job listings competitive.
Join Dataiku and phData as we walk you through a use case of a human resources team at a major medical device manufacturer that needed a more robust data analytics solution as they looked for ways to accurately predict manager performance. phData built a fully functioning model that delivers measurable business value, complete with visualizations and executive dashboards.
Supply difficulties, stockouts, longer delivery times, dissatisfied customers... More than ever, the COVID crisis has highlighted the need for companies to monitor and optimize their supply and distribution chains in order to anticipate potential disruptions and draw up operational action plans.
To address increasingly complex challenges, Eulidia is convinced that artificial intelligence is an essential driver of supply chain transformation and a lever for innovation and competitiveness for the market players who decide to invest!
With the support of Dataiku, this webinar will shed light on the key steps of this modernization program:
- Modern AI: lessons learned from the field
- Flexibility and performance delivered by an analytics platform such as Dataiku / Snowflake
- The keys to modernizing the supply chain
- Supply Chain Analytics: which use cases and what benefits?
Michael Ernest, Director of Solution Architecture at Dataiku
In this demo we'll walk through a DSS project that features a variety of integrations with the AWS platform as well as several key services. The demonstration will highlight key performance benefits built into DSS when operating in the AWS environment, and integrations with AWS services, including Redshift, EMR, and EKS.
Jerry Hartanto, AI Strategist & Evangelist @ Dataiku with Guest Asha Dinesh, Market Impact Consultant @ Forrester &
When it comes to building a modern AI platform, organizations shouldn’t spend time, energy, and resources cobbling together tools across the AI lifecycle, which ultimately means losing the larger picture of the full data pipeline (not to mention adding technical debt).
This webinar unpacks the results of The Total Economic Impact™ Of Dataiku study (conducted by Forrester Consulting and commissioned by Dataiku) that quantifies and solidifies some of the benefits that Dataiku customers experience in leveraging one, central platform to systemize the use of data for Everyday AI, including:
- 423% ROI over three years (with a payback period of < 6 months)
- 75% time savings for data engineers and data scientists
- 90% reduction in manual, repeated reporting tasks
Do you want to know what your customers are saying about your airline? Which services do they like? And what puts them off? We developed a workflow in Dataiku DSS that uses NLP to determine the sentiment behind tweets, plus a webapp that lets the end user view the results. We will walk you through how we collected the tweets, cleaned the text, and built models to classify the sentiment. If time permits, we will also cover web scraping to extract online airline reviews.
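As a rough illustration of the text cleaning and sentiment classification steps described above, here is a minimal sketch in plain Python. It is not the workflow built in Dataiku DSS: the tweets are invented toy examples, `clean_tweet` and `NaiveBayes` are hypothetical names, and a small multinomial Naive Bayes classifier with add-one smoothing stands in for whatever models the presenters actually trained.

```python
import math
import re
from collections import Counter, defaultdict


def clean_tweet(text):
    """Lowercase, drop URLs and @mentions, strip punctuation, return tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only (hashtag words stay)
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(clean_tweet(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = clean_tweet(text)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus smoothed log likelihood of each token
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data, for illustration only.
train = [
    ("@ExampleAir thanks for the great service and friendly crew!", "positive"),
    ("Loved the smooth flight, great crew https://example.com/x", "positive"),
    ("@ExampleAir my flight was delayed again, terrible service", "negative"),
    ("Lost my bag and the staff were rude. Terrible.", "negative"),
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("Great friendly crew"))
print(model.predict("Terrible, delayed flight"))
```

With real data, the same two stages (cleaning, then a trained classifier) would typically use a much larger labeled set and a held-out test split to check accuracy.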
Dan Darnell, Head of Product Marketing, Dataiku and James Kobielus, Senior Research Director, Data Management
Organizations everywhere are automating the development, deployment, monitoring, and governance of mission-critical machine learning (ML) and other artificial intelligence (AI) applications.
Operational data science is a collaborative process that increasingly goes under the name of MLOps. Organizations are bringing the latest MLOps into their data science workflows to augment the productivity of data engineers, statistical modelers, and other highly skilled personnel. Mature enterprise MLOps processes leverage cloud-native infrastructure to scale the deployment, monitoring, and management of statistical models and code builds into production applications.
Join Dan Darnell from Dataiku and TDWI’s senior research director James Kobielus for this webinar to learn how enterprises can succeed in using mature MLOps practices across their entire data science pipelines to speed deployment of their most sophisticated AI applications.
Key topics that they will discuss include:
- Business opportunities that are driving demand for MLOps
- Key investments in data ingestion, cleansing, preparation, and modeling technologies that are essential for organizations to succeed with MLOps
- Challenges that organizations face when implementing MLOps within their established data science processes
- Principal operational metrics that organizations must monitor and track to ensure the success of their MLOps initiatives while mitigating associated operational, legal, and regulatory risks
Maria Prosviryakova - Senior ML Engineer at Skyscanner
Looking for places to travel next in these uncertain times? Interested in finding great deals in safe destinations? Skyscanner's personalised recommendations can save you precious decision time! In this talk, Maria Prosviryakova, Senior Machine Learning Engineer at Skyscanner, will share the journey from a simple yet impactful collaborative filtering model to deep learning-powered destination recommendations. Maria will touch upon the architecture of the real-time recommender system that relies on ML pipelines and MLflow and is orchestrated using Apache Airflow. She will also discuss challenges faced on the road to production, and how the personalised recommendations increased Skyscanner's engagement metrics.
Maria Prosviryakova is currently working as a Senior ML Engineer at Skyscanner. Maria holds an MS degree in Statistics and has 7+ years of experience working as a data scientist across different industries and locations: finance in New York, e-commerce in Buenos Aires and the travel industry in Barcelona. She now specialises in recommender systems and ML in production.
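For readers new to the "simple yet impactful collaborative filtering model" mentioned in the talk description, here is a minimal sketch of one common variant, item-based collaborative filtering with cosine similarity. It is not Skyscanner's model: the users, destinations, and the function names `item_similarities` and `recommend` are all hypothetical, and the data is a tiny invented set of searches.

```python
import math
from collections import defaultdict

# Hypothetical implicit-feedback data: user -> destinations they searched for.
searches = {
    "u1": {"Lisbon", "Porto", "Madrid"},
    "u2": {"Lisbon", "Porto"},
    "u3": {"Madrid", "Rome"},
    "u4": {"Lisbon", "Rome"},
}


def item_similarities(interactions):
    """Cosine similarity between destinations over binary user vectors."""
    users_by_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_by_item[item].add(user)
    sims = defaultdict(dict)
    for a, users_a in users_by_item.items():
        for b, users_b in users_by_item.items():
            if a != b:
                overlap = len(users_a & users_b)
                if overlap:
                    sims[a][b] = overlap / math.sqrt(len(users_a) * len(users_b))
    return sims


def recommend(user, interactions, sims, k=3):
    """Score unseen destinations by summed similarity to the user's history."""
    seen = interactions[user]
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims[item].items():
            if other not in seen:
                scores[other] += sim
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


sims = item_similarities(searches)
print(recommend("u2", searches, sims))
```

Destinations that are often searched by the same users end up with high similarity, so a user who looked at Lisbon and Porto is nudged toward places that co-occur with those two.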
Dataiku is the world’s leading platform for Everyday AI, systemizing the use of data for exceptional business results. Organizations that use Dataiku elevate their people (whether technical and working in code or on the business side and low- or no-code) to extraordinary, arming them with the ability to make better day-to-day decisions with data.
More than 450 companies worldwide use Dataiku to systemize their use of data and AI, driving diverse use cases from fraud detection to customer churn prevention, predictive maintenance to supply chain optimization, and everything in between.