Recipes for Success: How to Build Continuous Ingest Pipelines

Modern data infrastructures are fed by vast volumes of data, streamed from an ever-changing variety of sources. Standard practice has been to store the data as ingested and force data cleaning onto each consuming application. This approach saddles data scientists and analysts with substantial work, creates delays in getting to insights, and makes real-time or near-real-time analysis practically impossible.
Recorded Aug 27 2019 62 mins
Presented by
Arvind Prabhakar - Co-Founder and CTO - StreamSets
  • DataOps - Buzz Word or Buzzworthy? Mar 24 2020 2:00 pm UTC 44 mins
    Kirk Borne, Principal Data Scientist and Executive, Booz Allen Hamilton
    With its streamlined approach to developing and delivering software applications, DevOps has dazzled the IT world. Now, DataOps is poised to remake data analytics. It relies on similar cross-functional principles, delivering agile data-based projects through a mix of automation, orchestration, and closer collaboration between once-separate teams.

    It’s about ensuring tight communication, collaboration, and integration between the data team and the operations team. But it’s also important not to over-formalize that communication: not every small change needs to go through a change review board. When you get hung up on that kind of analysis paralysis, you never get anything done.
  • Data Ingestion to Visualization Journey - "The DoD Story" Mar 17 2020 2:00 pm UTC 24 mins
    Sandeep Dorawala, Vice President & Graham Evans, Principal Director (Booz Allen Hamilton)
    In 2018 the Department of Defense completed its first-ever full audit, covering approximately $2.8 trillion in total assets and accounting for over 70 percent of the U.S. Government’s assets. The FY 2018 DoD consolidated audit is arguably one of the largest and most complex financial statement audits ever undertaken. The challenge is exacerbated by the hundreds of business and financial systems needed to build the universe of transactions under audit. DoD’s financial management community has created and sustained modern, effective, and cost-conscious financial management processes, data, systems, policies, and workforce. This is achieved through a combination of big data, data fusion, and analytics. Booz Allen Hamilton supports DoD and other federal customers in leveraging modern data platforms and enterprise analytics techniques that drive these critical capabilities.

    In one engagement with a large DoD client, Booz Allen implemented ~1,200 StreamSets pipelines to move data from more than 200 enterprise systems to a cloud-based data lake for analysis and reporting. The engagement reinforced the principles of data operations and the need for a flexible framework aligned with DoD goals.

    From ingestion to visualization, they are asked to create a variety of information technology solutions to help the DoD manage its data and resources, with the core belief that operating more efficiently saves taxpayer money and helps prevent warfighter casualties. In this session, the Booz Allen team will walk through some of the learnings and experiences from rolling out the DataOps approach to meet OUSD needs.
  • DataOps: Go Fast and Be Confident Mar 10 2020 2:00 pm UTC 22 mins
    Girish Pancha, CEO (StreamSets)
    The next wave of Digital Transformation is upon us. Enterprises are reliant on software more than ever, and tomorrow's applications are both singularly focused and intelligent.

    Business expectations are higher than ever before, and teams are expected to be agile while guaranteeing that the software is always reliable.

    DataOps is the foundation upon which all software will be built in the future, bringing order and discipline to the chaos that agile, autonomous decision-making can otherwise create.
  • DataOps Panel: How thought leaders are embarking on the DataOps journey Mar 3 2020 3:00 pm UTC 27 mins
    Mark Ramsey (GSK), Anne-Britton Arnett and Phani Konduru (Humana), John Felahi (Solera) & Sandeep Dorawala (Booz Allen Hamilton)
    Join a thought-provoking panel of distinguished leaders who are driving transformational projects through the adoption of DataOps. Learn why they are doing it, how they secured buy-in to start these initiatives, and why legacy solutions cannot deliver the results needed in this dynamic, ever-changing IT landscape.

    Whether you are a CxO, an enterprise architect, a team leader, or a practitioner of data engineering or data science, there are great insights to be gained from this panel.
  • Digital Transformation in Healthcare Feb 25 2020 3:00 pm UTC 25 mins
    Anne-Britton Arnett and Phani Konduru (Humana)
    The US healthcare system is going through a major transformation. Now more than ever, with rising costs and patient care requirements, technology has to be the change agent. At Humana, a key driver is to become much more than a claims payer. The mission is to be an integrated care delivery provider, with a focus on well-being and improving the health of the communities Humana serves.

    At the heart of this transformation is a real-time, patient-centric architecture and system that will integrate the company, its diverse functions, and its recent acquisitions. Learn from thought leaders on the analytical, operational, and architectural sides how they are collaborating to develop the next-generation system and processes.
  • Testing and Monitoring Machine Learning Pipelines at Slack Feb 18 2020 3:00 pm UTC 21 mins
    Josh Wills, Director of Data Engineering, Slack
    At Slack, we build a variety of offline models for search ranking, anomaly detection, channel recommendations, and conversion prediction.

    As we have encountered different kinds of failure modes in the model development process, we have implemented a variety of tests and alerts to detect common failure modes that we have experienced.

    In this talk, I will walk through our framework for testing and monitoring model building pipelines and highlight some of the more unusual kinds of errors that we handle via our automated detection and remediation system.
  • Databricks and StreamSets: Manage Big Data Pipelines in the Cloud Feb 11 2020 3:00 pm UTC 59 mins
    Hiral Jasani, Senior Partner Marketing Manager (Databricks) Rupal Shah, Director of Cloud Services (StreamSets)
    Whether you are cloud-native or migrating to the cloud, enterprises are looking for speed and agility. Databricks and StreamSets have partnered to bring rapid data pipeline design and testing to critical cloud workloads. Together, they bring the power of Apache Spark™ to a broad audience with a logical and visual, UI-based pipeline development tool. This allows more users to leverage Apache Spark™ and Delta Lake with confidence, reliability and unmatched performance in the cloud.

    In this webinar, we will discuss:

    -Using a drag-and-drop interface for pipeline development to continuously ingest and stream data into Delta Lake on Databricks
    -How Delta Lake helps make cloud data more reliable with features like ACID-compliant transactions, schema enforcement and scalable metadata handling
    -How to migrate on-prem data lake workloads (e.g. Hadoop) to cloud services and easily manage compute resources using Databricks’ optimized auto-scaling
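Delta Lake's schema enforcement rejects writes whose records do not match the table's declared schema, so bad data is caught at ingest rather than at query time. As a rough plain-Python illustration of the idea only (this is not the Delta Lake API; the schema and function names here are invented for the example):

```python
# Illustrative sketch of schema-on-write enforcement, in the spirit of
# Delta Lake: records that don't match the declared schema are rejected
# before they ever land in the table. Hypothetical schema and names.

SCHEMA = {"event_id": int, "user": str, "amount": float}  # invented table schema

def enforce_schema(record: dict, schema: dict = SCHEMA) -> dict:
    """Return the record unchanged if it matches the schema, else raise."""
    if set(record) != set(schema):
        raise ValueError(f"unexpected fields: {set(record) ^ set(schema)}")
    for field, expected in schema.items():
        if not isinstance(record[field], expected):
            raise TypeError(f"{field}: expected {expected.__name__}")
    return record

# A conforming record passes through; a malformed one is rejected.
good = enforce_schema({"event_id": 1, "user": "ana", "amount": 9.5})
try:
    enforce_schema({"event_id": "oops", "user": "bo", "amount": 1.0})
    rejected = ""
except TypeError as err:
    rejected = str(err)
```

The real feature additionally handles nested types, nullability, and optional schema evolution; the sketch only conveys the reject-on-mismatch behavior.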
  • Creating a Modern DataOps Environment to Drive Analytics Feb 4 2020 3:00 pm UTC 30 mins
    Mark Ramsey, PhD (Former Chief Data & Analytics Officer, GSK)
    The application of AI and machine learning to tasks such as drug discovery, customer insight or clinical trial optimization is a popular media topic. An area with far less coverage is the application of these technologies to building a modern DataOps environment.

    This session will highlight how a top-10 pharmaceutical company implemented a large-scale, production-class big data and analytics platform in less than a year, leveraging bots, machine learning and real-time pipelines. Learn how the technology mix was applied to the data sources, ingestion and rationalization processes to accelerate the delivery of analytics-ready data to the business.
  • Delivering AI at Enterprise Scale Recorded: Jan 21 2020 25 mins
    Dan Jeavons, General Manager of Data Science (Shell)
    Shell has been an early adopter of artificial intelligence (AI), as it attempts to speed up its digital transformation. From machine learning to computer vision, deep learning to virtual assistants and autonomous vehicles to robotics, Shell has been focused on a range of technologies that have supported advances in AI.

    Dan Jeavons joins us to detail Shell’s journey to data analytics excellence, focusing on the deployment of self-service analytics both for the data science teams who look to deploy mission-critical models for real-time use and for the longer-term strategy of filling the “data science skills gap.” Dan Jeavons is the General Manager of Shell’s Data Science CoE within the company’s central Digital Technology organisation. He is currently part of a leadership team tasked with leading Shell’s digital transformation.
  • Accelerating Digital Transformation through DataOps Recorded: Jan 14 2020 33 mins
    Arvind Prabhakar, CTO StreamSets
    Enterprise leaders who are working on establishing a DataOps practice face challenges at various levels. They must enable a clear line of sight from business to data, while embracing rapidly evolving requirements across all systems. At the project level, the challenge is the plethora of technologies to choose from. And at the implementation level, it is the learning curve of a complex analytics stack and its impact on delivery timelines and maintenance costs. Addressing these challenges is a prerequisite for Digital Transformation. In this keynote, StreamSets’ CTO Arvind Prabhakar will share how StreamSets is enabling enterprises to accelerate their Digital Transformation using the industry’s first DataOps platform.
  • DataOps Panel: How Thought Leaders are Embarking on the DataOps Journey Recorded: Jan 7 2020 35 mins
    Alex Gorelik, Sandeep Dorawalla, Joe Hellerstein, John Schmidt, Amr Awadallah
    Join an insightful panel of IT leaders who are working on creating and promoting the world of DataOps. Learn how data wrangling, Spark creators, cloud service providers, and data catalogue leaders are all a part of the broader new ecosystem and how the collective innovation is providing solutions to yesterday’s complex and difficult problems.

    In this panel, moderated by Shekhar Iyer, President of StreamSets, you will learn lessons from early adopters and best practices for combining the various solutions to help you achieve DataOps success and drive business impact and transformation.
  • Machine Learning with Tensorflow and Apache Kafka Recorded: Sep 19 2019 25 mins
    Clarke Patterson - Head of Product Marketing - StreamSets
    According to the 2018 Apache Kafka Report, 94% of organizations plan to deploy new applications or systems using Kafka this year. At the same time, 77% of those same organizations say that staffing Kafka projects has been somewhat or extremely challenging.

    In this multi-part webinar series, StreamSets will take learnings from our customers and share practical tips for making headway with Kafka. Each session will discuss common challenges and provide step-by-step details for how to avoid them. By the end of the series you'll have many more tools at your disposal for ensuring your Kafka project is a success.

    Kafka and Tensorflow can be used together to build comprehensive machine learning solutions on streaming data. Unfortunately, both can become black boxes and it can be difficult to understand what's happening as pipelines are running. In this talk we'll explore how StreamSets can be used to build robust machine learning pipelines with Kafka.

    In this session you'll learn:
    -How to easily build pipelines with Tensorflow and Kafka
    -Visualizing data in Tensorflow pipelines
    -Creating reusable code fragments for standardizing pipeline best practices
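A common pattern when feeding a streaming source such as Kafka into a model built with TensorFlow is to group records into fixed-size micro-batches before each training or inference call. A minimal, framework-free sketch of that batching step (the function name is ours, not a StreamSets, Kafka, or TensorFlow API):

```python
from itertools import islice
from typing import Iterable, Iterator, List

def micro_batches(records: Iterable, size: int) -> Iterator[List]:
    """Group a (possibly unbounded) record stream into fixed-size batches.

    The final batch may be shorter if the stream ends mid-batch.
    """
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Each batch would be handed to the model as one call; here we just
# demonstrate the grouping on a small finite stream.
batches = list(micro_batches(range(5), 2))  # -> [[0, 1], [2, 3], [4]]
```

Because the generator pulls lazily, the same function works on an unbounded consumer loop as well as on a finite replay.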
  • Monitoring and Protecting Data in Apache Kafka Recorded: Sep 17 2019 45 mins
    Clarke Patterson - Head of Product Marketing - StreamSets
    According to the 2018 Apache Kafka Report, 94% of organizations plan to deploy new applications or systems using Kafka this year. At the same time, 77% of those same organizations say that staffing Kafka projects has been somewhat or extremely challenging.

    In this multi-part webinar series, StreamSets will take learnings from our customers and share practical tips for making headway with Kafka. Each session will discuss common challenges and provide step-by-step details for how to avoid them. By the end of the series you'll have many more tools at your disposal for ensuring your Kafka project is a success.

  • Stream into Kafka Series: Dead Easy Kafka Pipeline Development Recorded: Sep 12 2019 53 mins
    Clarke Patterson, Head of Product Marketing @ StreamSets & Pat Patterson, Technical Director @ StreamSets
    According to the 2018 Apache Kafka Report, 94% of organizations plan to deploy new applications or systems using Kafka this year. At the same time, 77% of those same organizations say that staffing Kafka projects has been somewhat or extremely challenging.

    In this multi-part webinar series, StreamSets will take learnings from our customers and share practical tips for making headway with Kafka. Each session will discuss common challenges and provide step-by-step details for how to avoid them. By the end of the series you'll have many more tools at your disposal for ensuring your Kafka project is a success.

    Getting started with Kafka can be harder than it needs to be. Building a cluster is one thing, but ingesting data into that cluster can require a lot of experience and often a lot of rework. During this session we'll demystify the process of creating pipelines for Apache Kafka and show how you can create Kafka pipelines in minutes, not hours or days.

    In this session you'll learn:
    -Designing any-to-any Kafka pipelines in minutes
    -Snapshotting and monitoring data in Kafka
    -Editing pipelines quickly and easily without major disruption
  • 5 Ways to Scale Kafka with StreamSets Recorded: Sep 10 2019 30 mins
    Clarke Patterson, Head of Product Marketing @ StreamSets & Bryan Duxbury, Chief Technologist @ StreamSets
    According to the 2018 Apache Kafka Report, 94% of organizations plan to deploy new applications or systems using Kafka this year. At the same time, 77% of those same organizations say that staffing Kafka projects has been somewhat or extremely challenging.

    In this multi-part webinar series, StreamSets will take learnings from our customers and share practical tips for making headway with Kafka. Each session will discuss common challenges and provide step-by-step details for how to avoid them. By the end of the series you'll have many more tools at your disposal for ensuring your Kafka project is a success.

    When it comes to scaling out Apache Kafka, there is often a trade-off between complexity, performance and cost. In this session, we'll look at five different ways to scale up to handle massive message throughput with Kafka and StreamSets.

    In this session you'll learn:
    -Scaling pipelines vertically and horizontally
    -Getting scale by streaming in a cluster
    -Leveraging Kubernetes for elastic scaling
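Horizontal scaling with Kafka typically comes from consumer groups: a topic's partitions are divided among the consumers in a group, so adding consumers adds read parallelism, up to the partition count. The sketch below mimics a simple round-robin assignment purely to illustrate the principle; it is not Kafka's actual assignor, and the names are invented:

```python
from collections import defaultdict
from typing import Dict, List

def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread topic partitions across the consumers in a group, round-robin."""
    assignment: Dict[str, List[int]] = defaultdict(list)
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)

# Six partitions over three consumers: each consumer reads two partitions
# in parallel. Adding a fourth consumer would rebalance the load, but a
# seventh consumer on a six-partition topic would sit idle.
plan = assign_round_robin(list(range(6)), ["c0", "c1", "c2"])
```

This is why partition count is the practical ceiling on horizontal scale for a single consumer group, and why vertical scaling or cluster streaming becomes relevant beyond it.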
  • How Cox Automotive Democratized Data with a Self-Service Data Exchange Recorded: Aug 29 2019 62 mins
    Nathan Swetye - Sr. Manager of Platform Engineering - Cox Automotive
    Cox Automotive comprises more than 25 companies dealing with different aspects of the car ownership lifecycle, with data as the common language they all share. The challenge for Cox Automotive was to create an efficient engine for the timely and trustworthy ingest of an unknown but large number of data assets from practically any source. Working with StreamSets, they are populating a data lake to democratize data, allowing analysts easy access to data from other companies and producing new data assets unique to the industry.

    In this webinar, Nathan Swetye from Cox Automotive will discuss how they:

    -Took on the challenge of ingesting data at enterprise scale and the initial efficiency and data consistency struggles they faced.
    -Created a self-service data exchange for their companies based on an architecture that decoupled data acquisition from ingestion.
    -Reduced data availability times from weeks to hours and cut developer time by 90%.
  • Recipes for Success: How to Build Continuous Ingest Pipelines Recorded: Aug 27 2019 62 mins
    Arvind Prabhakar - Co-Founder and CTO - StreamSets
    Modern data infrastructures are fed by vast volumes of data, streamed from an ever-changing variety of sources. Standard practice has been to store the data as ingested and force data cleaning onto each consuming application. This approach saddles data scientists and analysts with substantial work, creates delays in getting to insights, and makes real-time or near-real-time analysis practically impossible.
  • Ultralight Data Movement for IoT with SDC Edge Recorded: Aug 22 2019 34 mins
    Guglielmo Iozzia - Big Data Lead - Optum | Pat Patterson - Technical Director - StreamSets
    Edge computing and the Internet of Things bring great promise, but often just getting data from the edge requires moving mountains.

    During this webinar, we will discuss:

    -How to make edge data ingestion and analytics easier using StreamSets Data Collector Edge, an ultralight, platform-independent, small-footprint open source solution for streaming data from resource-constrained sensors and personal devices (such as medical equipment or smartphones) to Apache Kafka, Amazon Kinesis and many others.

    -An overview of the SDC Edge main features, supported protocols and available processors for data transformation; insights into how it solves challenges of traditional approaches to data ingestion; pipeline design basics; a walk-through of practical applications (Android devices and Raspberry Pi); and its integration with other technologies such as StreamSets Data Collector and Apache Kafka.
  • Modern Streaming Data Stack with Kinetica & StreamSets Recorded: Aug 20 2019 60 mins
    Matt Hawkins - Principal Solutions Architect - Kinetica | Mark Brooks - Solutions Architect - StreamSets
    Enterprises are now faced with wrangling massive volumes of complex, streaming data from a variety of different sources, a new paradigm known as extreme data. However, the traditional data integration model that’s based on structured batch data and stable data movement patterns makes it difficult to analyze extreme data in real-time.

    Join Matt Hawkins, Principal Solutions Architect at Kinetica and Mark Brooks, Solution Engineer at StreamSets as they share how innovative organizations are modernizing their data stacks with StreamSets and Kinetica to enable faster data movement and analysis.

    During this webinar, we will discuss:

    -The modern data architecture required for dealing with extreme data
    -How StreamSets enables continuous data movement and transformation across the enterprise
    -How Kinetica harnesses the power of GPUs to accelerate analytics on streaming data
    -A live demo of the StreamSets connector for Kinetica, enabling high-speed data ingestion, queries and data visualization
  • Modernize Cybersecurity Threat Detection Recorded: Aug 15 2019 55 mins
    Nathan Necaise - VP of Data Sciences Emerging Services - Optiv
    The convergence of streaming data platforms with cyber security solutions presents real opportunity for combating and predicting future threats. Join StreamSets and Optiv as we discuss common use cases and architectural patterns used by leading Fortune 500 organizations to modernize their cyber architecture.

    During this webinar, we will discuss:

    -Common challenges facing today’s SIEMs and how to effectively augment them with streaming data platforms
    -Customer examples demonstrating how these patterns lead to transformative effects
    -How to optimize security architectures that use technologies like Splunk using StreamSets
Continuous Dataflows that Unleash Pervasive Intelligence
The StreamSets DataOps platform enables companies to build, execute, operate and protect batch and streaming dataflows. It is powered by StreamSets Data Collector, award-winning open source software with approximately 2,000,000 downloads to date from thousands of companies. The commercial StreamSets Control Hub is the platform's cloud-native control plane through which enterprises design, monitor and manage complex data movement that is executed by multiple Data Collectors. Unique Intelligent Pipeline technology automatically inspects the data in motion, detecting unexpected changes, errors and sensitive data in-stream.

Global 2000 customers use StreamSets for data lake ingestion, Apache Kafka enablement, cybersecurity, IoT, customer 360, GDPR compliance and more. In 2017, the company tripled its customer count and quadrupled revenues.

  • Title: Recipes for Success: How to Build Continuous Ingest Pipelines
  • Live at: Aug 27 2019 5:00 pm
  • Presented by: Arvind Prabhakar - Co-Founder and CTO - StreamSets