Solving the "Middle Data" problem

Presented by

Natalino Busa, Head of Applied Data Science at Teradata

About this talk

We are well aware that companies like Facebook, Twitter, and WhatsApp deal with datasets in the range of hundreds of petabytes and more. However, not all datasets are that big. Did you know that all English-language pages of Wikipedia amount to just 49 GB of uncompressed text? Likewise, there is a large number of datasets, ranging from customer data to events and transactions, which do not exceed the low-terabyte range. In this webinar we will discuss how to process data in this range, both for interactive queries and for batch processing. We will look at what trade-offs can be made by tuning the architecture with SSD and RAM, and which distributed computing paradigms work best for these datasets and their typical workloads. We will revisit the concepts of data locality, data replication, and parallel computing for this specific class of datasets.
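To give a feel for the scale the talk addresses, here is a minimal back-of-envelope sketch in Python. All hardware figures (RAM per node, per-node scan bandwidth, the 75% usable-RAM headroom) are illustrative assumptions, not numbers from the talk; the point is that low-terabyte datasets can fit entirely in the RAM of a small cluster, which changes the locality and replication trade-offs.

```python
import math

def nodes_needed(dataset_gb: float, ram_per_node_gb: float,
                 usable_fraction: float = 0.75) -> int:
    """Nodes required to hold a dataset entirely in RAM,
    leaving headroom for the OS and the query engine."""
    usable_gb = ram_per_node_gb * usable_fraction
    return math.ceil(dataset_gb / usable_gb)

def full_scan_seconds(dataset_gb: float, gb_per_sec_per_node: float,
                      nodes: int) -> float:
    """Time for a parallel full scan, assuming the data is evenly
    partitioned across nodes with no skew or coordination overhead."""
    return dataset_gb / (gb_per_sec_per_node * nodes)

# A 2 TB dataset on hypothetical nodes with 256 GB RAM each:
dataset_gb = 2000
cluster = nodes_needed(dataset_gb, 256)              # 11 nodes
ram_scan = full_scan_seconds(dataset_gb, 10, cluster)  # ~18 s from RAM
ssd_scan = full_scan_seconds(dataset_gb, 2, cluster)   # ~91 s from SSD
```

Under these assumed figures, eleven commodity nodes hold the whole dataset in memory, and the same full scan is roughly five times faster from RAM than from SSD; the 49 GB of Wikipedia text mentioned above fits comfortably on a single node.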
