Tame Your Duplicate Data and Lower Operating Cost and Risk

Presented by

Madhup Mishra, Glen Martin and John Stegeman, Hitachi Vantara

About this talk

Is duplicate data weighing you down? For many organizations, duplicate data makes up a significant portion of their data lake operating cost. It arises organically – analysts and scientists improve a data set they’re working on, save it, continue, save it again. Their peers do the same. The next thing you know, you have 10, 25, 100 copies of a data set, with minimal true differences. Updating one copy without changing the others leads to bogus results, leaves sensitive data unsecured, and adds unnecessary cost pressures. In the past, the costs of maintaining duplicate data were covered by IT. But with moves to the public cloud, funded from your op-ex budget, these duplicates become prohibitively expensive. Duplicate data is also a risk for data breaches and other regulatory violations: it does no good to defend the “production” version if the other copies are less protected. In this session we’ll discuss Lumada Data Catalog, and how its Data Rationalization feature can help you discover and tame your duplicate data.
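To make the problem concrete, the simplest form of the discovery step described above is finding byte-identical copies of a file by content fingerprint. The sketch below is purely illustrative and is not how Lumada Data Catalog works internally; the helper names `fingerprint` and `find_exact_duplicates` are made up for this example.

```python
# Illustrative sketch only -- NOT Lumada Data Catalog's actual method.
# Finds byte-identical file copies by grouping files on a SHA-256
# content fingerprint; real rationalization tools also detect
# near-duplicates, which this simple approach cannot.
import hashlib
from collections import defaultdict
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group all files under `root` by content hash.

    Any group with more than one member is a set of exact copies.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            groups[fingerprint(p)].append(p)
    return {h: ps for h, ps in groups.items() if len(ps) > 1}
```

Note that hashing only catches exact copies; the “minimal true differences” case mentioned above (a copy with one edited row) requires fuzzier techniques such as schema and column-profile comparison, which is the kind of analysis a catalog product performs.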
