Kirk Borne, Principal Data Scientist, Booz Allen Hamilton
As data analytics becomes more embedded within organizations, as an enterprise business practice, the methods and principles of agile processes must also be employed.
Agile includes DataOps, which refers to the tight coupling of data science model-building and model deployment. Agile can also refer to the rapid integration of new data sets into your big data environment for "zero-day" discovery, insights, and actionable intelligence.
The Data Lake is an advantageous approach to implementing an agile data environment, primarily because of its focus on "schema-on-read", thereby skipping the laborious, time-consuming, and fragile process of database modeling, refactoring, and re-indexing every time a new data set is ingested.
Another huge advantage of the data lake approach is the ability to annotate data sets and data granules with intelligent, searchable, reusable, flexible, user-generated, semantic, and contextual metatags. This tag layer makes your data "smart" -- and that makes your agile big data environment smart also!