Find the Balance Between MPP Databases and Spark for Analytical Processing

Presented by

Paige Roberts, Open Source Relations Manager, Vertica, and David Menninger, SVP & Research Director, Ventana Research

About this talk

Both Apache Spark and massively parallel processing (MPP) databases are designed for the demands of analytical workloads. Each has strengths across the full data science workflow, from consolidating data from many silos to deploying and managing machine learning models. Understanding the power of each technology and the cost and performance trade-offs between them can help you optimize your analytics architecture to get the best of both. Learn when Spark accelerates data processing, and when a Spark deployment sprawls into more infrastructure than you want to maintain. Learn when an MPP database can provide blazing fast analytics, and when it can fail to meet your needs. Most of all, learn how these two powerful technologies can combine to strike the right balance of power, cost, and performance.
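To make that hybrid pattern concrete, here is a minimal sketch of one common way to combine the two: push heavy SQL aggregation down into the MPP database (Vertica here) and pull the much smaller result into Spark for further processing. It uses Spark's generic JDBC data source; the host, credentials, and table and column names are illustrative placeholders, and the Vertica JDBC driver is assumed to be on the Spark classpath.

```python
# A minimal sketch: aggregate inside the MPP database, analyze the result in Spark.
# Host, credentials, and table/column names below are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vertica-spark-hybrid").getOrCreate()

# Pushdown query: the GROUP BY runs inside Vertica, so only the
# aggregated rows cross the wire into Spark.
agg_query = """
    SELECT region, product_id, SUM(amount) AS total_sales
    FROM sales
    GROUP BY region, product_id
"""

regional_totals = (
    spark.read.format("jdbc")
    .option("url", "jdbc:vertica://vertica-host:5433/analytics")  # placeholder host/db
    .option("driver", "com.vertica.jdbc.Driver")  # Vertica JDBC driver on the classpath
    .option("query", agg_query)
    .option("user", "dbadmin")
    .option("password", "secret")
    .load()
)

# Continue in Spark with whatever the database is less suited for,
# e.g. feeding the aggregates into a Spark ML pipeline.
regional_totals.show()
```

The design point is that only aggregated rows leave the database, so Spark's cluster can stay small while the MPP engine does the scan-heavy work it was built for.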


About this channel
The Vertica Unified Analytics Platform is built to handle the most demanding analytic use cases and is trusted by thousands of leading data-driven enterprises around the world, including Etsy, Bank of America, Uber, and more. Based on a massively scalable architecture with a broad set of analytical functions spanning event and time series, pattern matching, geospatial, and built-in machine learning capabilities, Vertica enables data analytics teams to apply these powerful functions to large and demanding analytical workloads. Vertica runs across the major public clouds and on-premises data centers as needed, and integrates data in cloud object storage and HDFS without forcing any data movement. Available as a SaaS option or as a customer-managed system, Vertica helps teams combine growing data silos for a more complete view of available data. Vertica features separation of compute and storage, so teams can spin up storage and compute resources as needed, then spin them down afterwards to reduce costs.
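As one illustration of the built-in machine learning mentioned above, the sketch below trains and applies a linear regression entirely inside Vertica using the open source vertica-python client, so no training data ever leaves the database. LINEAR_REG and PREDICT_LINEAR_REG are Vertica's documented in-database ML functions; the connection details, model, table, and column names are placeholders to verify against your own environment.

```python
# A minimal sketch of Vertica in-database ML via the vertica-python client.
# Connection details and table/column names are illustrative placeholders.
import vertica_python

conn_info = {
    "host": "vertica-host",
    "port": 5433,
    "user": "dbadmin",
    "password": "secret",
    "database": "analytics",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()

    # Train a linear regression model inside the database; the training
    # data in sales_history never leaves Vertica.
    cur.execute("""
        SELECT LINEAR_REG('sales_model', 'sales_history',
                          'total_sales', 'ad_spend, store_count')
    """)

    # Score new rows with the stored model, still inside the database.
    cur.execute("""
        SELECT store_id,
               PREDICT_LINEAR_REG(ad_spend, store_count
                                  USING PARAMETERS model_name='sales_model') AS forecast
        FROM upcoming_quarter
    """)
    for store_id, forecast in cur.fetchall():
        print(store_id, forecast)
```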