Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events

Presented by

Joshua Robinson - Founding Engineer, FlashBlade

About this talk

Learn how Pure Storage engineering streams 190B log events per day and puts that deluge of data to use in our continuous integration (CI) pipeline. Our test infrastructure runs over 70,000 tests per day, creating a triage problem that would otherwise require at least 20 triage engineers. Instead, Spark's flexible computing platform lets us write a single application for both streaming and batch jobs, giving our team of 3 triage engineers a view of the state of the CI pipeline. Using encoded patterns, Spark indexes log data for real-time reporting (streaming), applies machine learning for performance modeling and prediction (batch), and finds previous matches for newly encoded patterns (batch). Resource allocation in this mixed environment can be challenging; a containerized Spark cluster deployment and disaggregated compute and storage layers allow us to programmatically shift compute resources between the streaming and batch applications. This talk covers the design decisions in hardware, data layout, access patterns, and container strategy that let us meet the SLAs of both streaming and batch workloads. We will also go over the challenges, lessons learned, and best practices for this kind of setup.
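
As a rough illustration of the "single application for both streaming and batch" idea, here is a minimal sketch in plain Python rather than Spark. It assumes that an "encoded pattern" can be modeled as a named regular expression; the pattern names, the regexes, and the helper functions `tag_line`, `index_batch`, and `index_stream` are all hypothetical and are not taken from the talk. The point is only that one matching function can be shared by a batch path over historical logs and a streaming path over arriving lines.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical failure signatures; the talk does not list its actual patterns.
PATTERNS = {
    "oom": re.compile(r"OutOfMemoryError"),
    "timeout": re.compile(r"timed out after \d+s"),
}


def tag_line(line: str) -> List[str]:
    """Return the names of all patterns that match one log line."""
    return [name for name, rx in PATTERNS.items() if rx.search(line)]


def index_batch(lines: Iterable[str]) -> Counter:
    """Batch path: scan a finite set of historical lines and count matches."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(tag_line(line))
    return counts


def index_stream(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Streaming path: lazily yield (line, tags) for each matching line."""
    for line in lines:
        tags = tag_line(line)
        if tags:
            yield line, tags


# Back-of-the-envelope rate implied by the headline figure:
# 190 billion events per day is roughly 2.2 million events per second.
EVENTS_PER_DAY = 190_000_000_000
events_per_second = EVENTS_PER_DAY / 86_400
```

Because both paths call the same `tag_line`, a newly added pattern is picked up by live reporting and by a backfill over old logs without duplicating the matching logic, which is the property the abstract attributes to writing one Spark application for both modes.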