Running Hadoop and Spark on Docker: Challenges and Lessons Learned
Watch this on-demand webinar to learn how to run Hadoop and Spark on Docker in an enterprise deployment.
Today, most applications can be “Dockerized”. However, there are unique challenges when deploying a Big Data framework such as Spark or Hadoop on Docker containers in a large-scale production environment.
In this webinar, we discussed:
-Practical tips on how to deploy multi-node Hadoop and Spark workloads using Docker containers
-Techniques for multi-host networking, secure isolation, QoS controls, and high availability with containers
-Best practices to achieve optimal I/O performance for Hadoop and Spark using Docker
-How a container-based deployment can deliver greater agility, cost savings, and ROI for your Big Data initiative
Don’t miss this webinar on how to "Dockerize" your Big Data applications in a reliable, secure, and high-performance environment.
Recorded: Aug 18, 2016 · 62 mins
See an overview of the container-based BlueData software platform from Hewlett Packard Enterprise. Learn how you can spin up instant machine learning, deep learning, and analytics environments – while ensuring enterprise-grade security and performance. Provide your data science teams with on-demand access to the tools and data they need – whether on-premises, in the public cloud, or in a hybrid cloud architecture.
Mike Leone, Sr. Analyst, Enterprise Strategy Group; Victor Ghadban, Field CTO AI / ML, BlueData (HPE)
Join our webinar with an industry analyst from Enterprise Strategy Group (ESG), and learn how you can accelerate your AI initiative.
Enterprises in all industries are recognizing the game-changing business impact of Artificial Intelligence (AI) and Machine Learning (ML).
To be successful in your AI initiative, your data science teams need the ability to quickly build and deploy ML models in large-scale distributed environments. But this is easier said than done.
In this webinar, we'll share industry research from ESG and discuss how to:
- Address the skills gap in data science and AI / ML
- Utilize containers to accelerate your AI / ML deployment
- Run AI / ML workloads either on-premises or in the cloud
- Achieve faster time-to-value for your AI initiative
Nanda Vijaydev, Sr. Director, Solutions, BlueData; John Spooner, Director of Solution Engineering, H2O.ai
Watch this webinar to learn how you can accelerate your deployment of H2O and AI / ML in Financial Services.
Keeping pace with new technologies for data science, machine learning, and deep learning can be overwhelming. And it can be challenging to deploy and manage these tools – including H2O and many others – for data science teams in large-scale distributed environments.
This webinar will discuss how to deploy H2O and other ML / DL tools in Financial Services. Learn about:
-Example use cases for AI / ML / DL in Financial Services
-Using H2O and other ML / DL tools with containers
-Overcoming deployment challenges for distributed environments
-How to ensure enterprise-grade security, high performance, and faster time-to-value
Join this webinar to learn how you can accelerate innovation using AI / ML / DL in Healthcare and Life Sciences.
Healthcare professionals and researchers have access to immense volumes of data from a variety of sources. Early adopters of Machine Learning (ML) and Deep Learning (DL) are uncovering new insights from this data to improve patient care and transform the industry with AI-driven innovations.
But it can be challenging to deploy and manage these tools – including TensorFlow and many others – for data science teams in large-scale distributed environments.
In this webinar, we'll discuss:
- Example AI use cases – including precision medicine, drug discovery, and claims management
- Data access, data security, and other key requirements for implementing AI in Healthcare and Life Sciences
- How to overcome deployment challenges for distributed ML / DL environments using containers
- How to ensure enterprise-grade security, high performance, and faster time-to-value for ML / DL
Tom Phelan, Chief Architect, BlueData; Nanda Vijaydev, Director, Solutions, BlueData
Join this webinar to learn how you can accelerate your deployment of TensorFlow and AI / ML in Financial Services.
Keeping pace with new technologies for data science, machine learning, and deep learning can be overwhelming. And it can be challenging to deploy and manage these tools – including TensorFlow and many others – for data science teams in large-scale distributed environments.
This webinar will discuss how to deploy TensorFlow and other ML / DL tools in the Banking, Insurance, and Capital Markets industries. Learn about:
-Example use cases for AI / ML / DL in Financial Services – with an enterprise case study
-Using TensorFlow and other ML / DL tools with GPUs and containers
-Overcoming deployment challenges for distributed environments – including operationalization
-How to ensure enterprise-grade security, high performance, and faster time-to-value
Join this webinar to learn about deploying H2O in large-scale distributed environments using containers.
Artificial intelligence and machine learning are now a top priority for most enterprises. But it can be challenging to implement multi-node AI / ML environments for data science teams in large-scale enterprise deployments.
Together, BlueData and H2O.ai deliver a game-changing solution for AI / ML in the enterprise. In this webinar, discover how you can:
-Quickly spin up containerized H2O and Driverless AI environments, whether for dev/test or production
-Ensure seamless support for H2O running on CPUs or GPUs, and provide a secure connection to your data lake
-Operationalize your distributed machine learning pipelines and deliver faster time-to-value for your AI initiative
Find out how to run AI / ML on containers while ensuring enterprise-grade security, performance, and scalability.
Lynn Calvo, AVP Emerging Data Technology, GM Financial; Nick Chang, Head of Customer Success, BlueData
Watch this on-demand webinar for a case study with GM Financial on deploying Machine Learning and Deep Learning applications using a flexible container-based architecture.
GM Financial, the wholly-owned captive finance subsidiary of General Motors, is a global enterprise in a highly regulated industry. Learn about their journey in implementing Machine Learning, Deep Learning, and Natural Language Processing – including how they’ve kept up with the blistering pace of change, while delivering immediate value and managing costs.
In this webinar, GM Financial will discuss some of their challenges, technology choices, and initial successes:
- Addressing a wide range of Machine Learning use cases, from credit risk analysis to improving customer experience
- Implementing multiple different tools (including TensorFlow™, Apache Spark™, Apache Kafka®, and Cloudera®) for different business needs
- Deploying a multi-tenant hybrid cloud environment with containers, automation, and GPU-enabled infrastructure
Don’t miss this webinar! Gain insights from an enterprise case study, and get perspective on Kubernetes® and other game-changing technology developments.
Tom Phelan, Chief Architect, BlueData; Yaser Najafi, Big Data Solutions Engineer, BlueData
Watch this on-demand webinar to learn about using Kubernetes with stateful applications for AI and Big Data workloads.
Kubernetes is now the de facto standard for container orchestration. And while it was originally designed for stateless applications and microservices, it's gaining ground in support for stateful applications as well.
But distributed stateful applications – including analytics, data science, machine learning, and deep learning workloads – are still complex and challenging to deploy with Kubernetes.
In this webinar, we'll discuss considerations for running stateful applications on Kubernetes:
-Unique requirements for multi-service stateful workloads including Hadoop, Spark, Kafka, and TensorFlow
-Persistent Volumes, StatefulSets, Operators, Helm, and other Kubernetes capabilities for stateful applications
-Technical gaps in Kubernetes deployment patterns and tooling, including security and networking
-Options and strategies to deploy distributed stateful applications in containerized environments
Learn about a new open source project focused on deploying and managing stateful applications with Kubernetes.
Radhika Rangarajan, Director, Data Analytics and AI, Intel; Nanda Vijaydev, Director, Solutions, BlueData
Watch this on-demand webinar to learn how you can accelerate your AI initiative and deliver faster time-to-value with machine learning.
AI has moved into the mainstream. Innovators in every industry are adopting machine learning for AI and digital transformation, with a wide range of different use cases. But these technologies are difficult to implement for large-scale distributed environments with enterprise requirements.
This webinar discusses:
-The game-changing business impact of AI and machine learning (ML) in the enterprise
-Example use cases: from fraud detection to medical diagnosis to autonomous driving
-The challenges of building and deploying distributed ML pipelines and how to overcome them
-A new turnkey solution to accelerate enterprise AI initiatives and large-scale ML deployments
Find out how to get up and running quickly with a multi-node sandbox environment for TensorFlow and other popular ML tools.
Tom Phelan, Chief Architect, BlueData; Nanda Vijaydev, Director - Solutions, BlueData
Watch this on-demand webinar to learn about deploying deep learning applications with GPUs in a containerized multi-tenant environment.
Keeping pace with new technologies for data science and machine learning can be overwhelming. There are a plethora of open source options, and it's a challenge to get these tools up and running easily and consistently in a large-scale distributed environment.
This webinar will discuss how to deploy TensorFlow and Spark clusters running on Docker containers, with a shared pool of GPU resources. Learn about:
-Quota management of GPU resources for greater efficiency
-Isolating GPUs to specific clusters to avoid resource conflict
-Attaching and detaching GPU resources from clusters
-Transient use of GPUs for the duration of the job
Find out how you can spin up (and tear down) GPU-enabled TensorFlow and Spark clusters on-demand, with just a few mouse clicks.
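The quota, isolation, and attach/detach ideas listed above can be illustrated with a toy sketch. This is a hypothetical bookkeeping model written for this page, not BlueData's actual implementation: a shared GPU pool that enforces per-tenant quotas and returns GPUs to the pool when a transient job finishes.

```python
# Toy model (illustrative only, not BlueData's implementation) of a shared
# GPU pool with per-tenant quotas and attach/detach semantics.
class GpuPool:
    def __init__(self, total_gpus, quotas):
        self.free = set(range(total_gpus))          # unassigned GPU ids
        self.quotas = dict(quotas)                  # tenant -> max GPUs allowed
        self.assigned = {t: set() for t in quotas}  # tenant -> GPUs held

    def attach(self, tenant, count):
        """Attach GPUs to a tenant's cluster, enforcing its quota and
        isolating the GPUs from other tenants while held."""
        if len(self.assigned[tenant]) + count > self.quotas[tenant]:
            raise ValueError("quota exceeded for tenant %r" % tenant)
        if count > len(self.free):
            raise ValueError("not enough free GPUs in the pool")
        gpus = {self.free.pop() for _ in range(count)}
        self.assigned[tenant] |= gpus
        return sorted(gpus)

    def detach(self, tenant):
        """Release all of a tenant's GPUs back to the shared pool, e.g.
        when a transient training job completes."""
        self.free |= self.assigned[tenant]
        self.assigned[tenant] = set()
```

Under this model, a tenant can hold GPUs only up to its quota, no two tenants ever share a GPU id, and detaching at job completion makes the GPUs immediately available to other clusters.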
Nick Chang, Head of Customer Success, BlueData; Yaser Najafi, Big Data Solutions Engineer, BlueData
Watch this on-demand webinar to learn about use cases for Big-Data-as-a-Service (BDaaS) – to jumpstart your journey with Hadoop, Spark, and other Big Data tools.
Enterprises in all industries are embracing digital transformation and data-driven insights for competitive advantage. But embarking on this Big Data journey is a complex undertaking and deployments tend to happen in fits and spurts. BDaaS can help simplify Big Data deployments and ensure faster time-to-value.
In this webinar, you'll hear about a range of different BDaaS deployment use cases:
-Sandbox: Provide data science teams with a sandbox for experimentation and prototyping, including on-demand clusters and easy access to existing data.
-Staging: Accelerate Hadoop / Spark deployments, de-risk upgrades to new versions, and quickly set up testing and staging environments prior to rollout.
-Multi-cluster: Run multiple clusters on shared infrastructure. Set quotas and resource guarantees, with logical separation and secure multi-tenancy.
-Multi-cloud: Leverage the portability of Docker containers to deploy workloads on-premises, in the public cloud, or in hybrid and multi-cloud architectures.
Watch this on-demand webinar to learn how separating compute from storage for Big Data delivers greater efficiency and cost savings.
Historically, Big Data deployments dictated the co-location of compute and storage on the same physical server. Data locality (i.e. moving computation to the data) was one of the fundamental architectural concepts of Hadoop.
But this assumption has changed – due to the evolution of modern infrastructure, new Big Data processing frameworks, and cloud computing. By decoupling compute from storage, you can improve agility and reduce costs for your Big Data deployment.
In this webinar, we discussed how:
- Changes introduced in Hadoop 3.0 demonstrate that the traditional Hadoop deployment model is changing
- New projects by the open source community and Hadoop distribution vendors give further evidence to this trend
- By separating analytical processing from data storage, you can eliminate the cost and risks of data duplication
- Scaling compute and storage independently can lead to higher utilization and cost efficiency for Big Data workloads
Learn how the traditional Big Data architecture is changing, and what this means for your organization.
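The independent-scaling argument above can be made concrete with a toy sizing calculation. The node capacities and workload figures below are hypothetical examples chosen for illustration, not numbers from the webinar: a co-located cluster must be sized for its scarcest resource, stranding the other, while decoupled tiers are sized independently.

```python
# Illustrative sketch (hypothetical capacities): sizing a co-located
# Hadoop cluster vs. separate compute and storage tiers.
import math

def coupled_nodes(cpu_needed, storage_needed, cpu_per_node, tb_per_node):
    """Co-located model: every node adds both CPU and storage, so the
    cluster must be sized for whichever resource is scarcer."""
    return max(math.ceil(cpu_needed / cpu_per_node),
               math.ceil(storage_needed / tb_per_node))

def decoupled_nodes(cpu_needed, storage_needed, cpu_per_node, tb_per_node):
    """Decoupled model: compute and storage tiers are sized independently."""
    return (math.ceil(cpu_needed / cpu_per_node),
            math.ceil(storage_needed / tb_per_node))

# Compute-heavy example: 640 cores and 100 TB needed, with 32-core /
# 10 TB nodes. The coupled cluster needs 20 nodes (carrying 200 TB of
# storage, half of it stranded); the decoupled design provisions exactly
# 20 compute nodes and 10 storage nodes, each tier fully utilized.
coupled = coupled_nodes(640, 100, 32, 10)
compute_tier, storage_tier = decoupled_nodes(640, 100, 32, 10)
```

Because each tier scales only with its own demand, utilization stays high on both sides, which is the cost-efficiency point the webinar makes.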
Darren Darnell, Jim Foppe, and Mike Steimel (Panera Bread); Nanda Vijaydev (BlueData)
Watch this on-demand webinar to learn how Panera Bread uses Big Data analytics to drive their business, with #1 ranked customer loyalty.
Panera Bread – with over 2,000 locations and 25 million customers in its loyalty program – relies on analytics to fine-tune its menu, operations, marketing, and more. Find out how they solve key business challenges using Hadoop and next generation Big Data technologies, including real-time data to analyze consumer behavior.
In this webinar, Panera Bread discussed how they:
-Use a data-driven approach to improve customer acquisition, customer retention, and operational efficiency
-Spin up instant clusters for rapid prototyping and exploratory analytics, with real-time streaming platforms like Kafka
-Operationalize their data science and data pipelines in a hybrid deployment model, both on-premises and in the cloud
Don’t miss this case study webinar. Discover your own recipe for success with Big Data analytics and data science!
Watch this on-demand webinar and learn how a leading healthcare company is yielding big dividends from Big Data.
Advisory Board, a healthcare firm serving 90% of U.S. hospitals, has multiple different business units and data science teams within their organization. In this webinar, they share how they use technologies like Hadoop and Spark to address the diverse use cases for these different teams – with a highly flexible and elastic platform leveraging Docker containers.
In this webinar, Advisory Board discussed how they:
-Migrated their analytics from spreadsheets and RDBMS to a modern architecture using tools such as Hadoop, Spark, H2O, Jupyter, RStudio, and Zeppelin.
-Provided the ability to spin up instant clusters for greater agility, with shared and secure access to a treasure trove of data in their HDFS data lake.
-Shortened time-to-insights from days to minutes, slashed infrastructure costs by more than 80 percent, and freed up staff to innovate and build new capabilities.
Don’t miss this case study webinar. Find out how you can improve agility, flexibility, and ROI for your Big Data journey.
Watch this on-demand webinar to learn the key considerations and options for container orchestration with Big Data workloads.
Container orchestration tools such as Kubernetes, Marathon, and Swarm were designed for a microservice architecture with a single, stateless service running in each container. But this design is not well suited for Big Data clusters constructed from a collection of interdependent, stateful services. So what are your options?
In this webinar, we discussed:
- Requirements for deploying Hadoop and Spark clusters using Docker containers
- Container orchestration options and considerations for Big Data environments
- Key issues such as management, security, networking, and petabyte-scale storage
- Best practices for a scalable, secure, and multi-tenant Big Data architecture
Don’t miss this webinar on container orchestration for Hadoop, Spark, and other Big Data workloads.
Watch this video to find out how Nasdaq improves agility and reduces costs for their Big Data infrastructure, while ensuring performance and security. To learn more about the BlueData software platform, visit www.bluedata.com
The BlueData EPIC software platform makes deployment of Big Data infrastructure and applications easier, faster, and more cost-effective – whether on-premises or on the public cloud.
With BlueData EPIC on AWS, you can quickly and easily deploy your preferred Big Data applications, distributions and tools; leverage enterprise-class security and cost controls for multi-tenant deployments on the Amazon cloud; and tap into both Amazon S3 and on-premises storage for your Big Data analytics.
Sign up for a free two-week trial at www.bluedata.com/aws
Faster Time-to-Value for AI / ML and Big Data Analytics
BlueData is transforming how enterprises deploy AI / Machine Learning (ML) and Big Data analytics. BlueData’s container-based software platform makes it easier, faster, and more cost-effective for enterprises to innovate with AI / ML and Big Data technologies – whether on-premises, in the public cloud, or in a hybrid architecture. With BlueData, our customers can spin up containerized environments within minutes, providing their data scientists with on-demand access to the applications, data, and infrastructure they need. Founded in 2012 by VMware veterans and headquartered in Santa Clara, California, BlueData was recently acquired by Hewlett Packard Enterprise.
Running Hadoop and Spark on Docker: Challenges and Lessons Learned
Tom Phelan, Chief Architect, BlueData; Anant Chintamaneni, VP of Products, BlueData
62 mins