This talk shows how to build a scalable data science platform using only free, commercially friendly open source software. The end-to-end architecture covers interactive queries & visualization, machine learning & data mining, deploying models to production, and a full 24x7 operations toolset.
The platform meets the requirements enterprises typically have: strong security (authentication, authorization, audit, encryption, multi-tenancy), active monitoring of both systems & data, backup & restore, user management (with LDAP integration), distributed deployment on commodity hardware, auto scaling, and self-healing when containers or services go down. Technologies covered include Spark, Hadoop, Elasticsearch, Kibana, Jupyter notebooks, TensorFlow, Openscoring, Docker Swarm, and supporting tools.
This talk is intended for practicing architects and technology leaders who need to understand how best to leverage the open source ecosystem in this space, and what it takes to integrate the available cutting-edge technologies into a cohesive, enterprise-grade, production-grade architecture.
David Talby is Atigeo’s senior vice president of engineering, leading the R&D, product management, and operations teams. David has extensive experience in building and operating web-scale analytics and business platforms, as well as building world-class, agile, distributed teams.