Running Hadoop and Spark on Docker: Challenges and Lessons Learned
Watch this on-demand webinar to learn how to run Hadoop and Spark on Docker in an enterprise deployment.
Today, most applications can be “Dockerized”. However, there are unique challenges when deploying a Big Data framework such as Spark or Hadoop on Docker containers in a large-scale production environment.
In this webinar, we discussed:
-Practical tips on how to deploy multi-node Hadoop and Spark workloads using Docker containers
-Techniques for multi-host networking, secure isolation, QoS controls, and high availability with containers
-Best practices to achieve optimal I/O performance for Hadoop and Spark using Docker
-How a container-based deployment can deliver greater agility, cost savings, and ROI for your Big Data initiative
Don’t miss this webinar on how to "Dockerize" your Big Data applications in a reliable, secure, and high-performance environment.
Recorded Aug 18 2016, 62 mins
Lynn Calvo, AVP Emerging Data Technology, GM Financial; Nick Chang, Head of Customer Success, BlueData
Join this webinar for a case study in deploying Machine Learning applications using a flexible container-based architecture.
GM Financial, the wholly-owned captive finance subsidiary of General Motors, is a global enterprise in a highly regulated industry. Learn about their journey in implementing Machine Learning, Deep Learning, and Natural Language Processing – including how they’ve kept up with the blistering pace of change, while delivering immediate value and managing costs.
In this webinar, GM Financial will discuss some of their challenges, technology choices, and initial successes:
- Addressing a wide range of Machine Learning use cases, from credit risk analysis to improving customer experience
- Implementing multiple tools (including TensorFlow™, Apache Spark™, Apache Kafka®, and Cloudera®) for different business needs
- Deploying a multi-tenant hybrid cloud environment with containers, automation, and GPU-enabled infrastructure
Don’t miss this webinar! Gain insights from an enterprise case study, and get perspective on Kubernetes® and other game-changing technology developments.
Tom Phelan, Chief Architect, BlueData; Yaser Najafi, Big Data Solutions Engineer, BlueData
Join this webinar to learn about using Kubernetes with stateful applications for AI and Big Data workloads.
Kubernetes is now the de facto standard for container orchestration. Although it was originally designed for stateless applications and microservices, it is steadily gaining support for stateful applications as well.
But distributed stateful applications – including analytics, data science, machine learning, and deep learning workloads – are still complex and challenging to deploy with Kubernetes.
In this webinar, we'll discuss considerations for running stateful applications on Kubernetes:
-Unique requirements for multi-service stateful workloads including Hadoop, Spark, Kafka, and TensorFlow
-Persistent Volumes, StatefulSets, Operators, Helm, and other Kubernetes capabilities for stateful applications
-Technical gaps in Kubernetes deployment patterns and tooling, including security and networking
-Options and strategies to deploy distributed stateful applications in containerized environments
Learn about a new open source project focused on deploying and managing stateful applications with Kubernetes.
Radhika Rangarajan, Director, Data Analytics and AI, Intel; Nanda Vijaydev, Director, Solutions, BlueData
Watch this on-demand webinar to learn how you can accelerate your AI initiative and deliver faster time-to-value with machine learning.
AI has moved into the mainstream. Innovators in every industry are adopting machine learning for AI and digital transformation across a wide range of use cases. But these technologies are difficult to implement in large-scale distributed environments with enterprise requirements.
This webinar discusses:
-The game-changing business impact of AI and machine learning (ML) in the enterprise
-Example use cases: from fraud detection to medical diagnosis to autonomous driving
-The challenges of building and deploying distributed ML pipelines and how to overcome them
-A new turnkey solution to accelerate enterprise AI initiatives and large-scale ML deployments
Find out how to get up and running quickly with a multi-node sandbox environment for TensorFlow and other popular ML tools.
Tom Phelan, Chief Architect, BlueData; Nanda Vijaydev, Director - Solutions, BlueData
Watch this on-demand webinar to learn about deploying deep learning applications with GPUs in a containerized multi-tenant environment.
Keeping pace with new technologies for data science and machine learning can be overwhelming. There is a plethora of open source options, and it's a challenge to get these tools up and running easily and consistently in a large-scale distributed environment.
This webinar will discuss how to deploy TensorFlow and Spark clusters running on Docker containers, with a shared pool of GPU resources. Learn about:
*Quota management of GPU resources for greater efficiency
*Isolating GPUs to specific clusters to avoid resource conflict
*Attaching and detaching GPU resources from clusters
*Transient use of GPUs for the duration of the job
Find out how you can spin up (and tear down) GPU-enabled TensorFlow and Spark clusters on-demand, with just a few mouse clicks.
Nick Chang, Head of Customer Success, BlueData; Yaser Najafi, Big Data Solutions Engineer, BlueData
Watch this on-demand webinar to learn about use cases for Big-Data-as-a-Service (BDaaS) – to jumpstart your journey with Hadoop, Spark, and other Big Data tools.
Enterprises in all industries are embracing digital transformation and data-driven insights for competitive advantage. But embarking on this Big Data journey is a complex undertaking and deployments tend to happen in fits and spurts. BDaaS can help simplify Big Data deployments and ensure faster time-to-value.
In this webinar, you'll hear about a range of different BDaaS deployment use cases:
-Sandbox: Provide data science teams with a sandbox for experimentation and prototyping, including on-demand clusters and easy access to existing data.
-Staging: Accelerate Hadoop / Spark deployments, de-risk upgrades to new versions, and quickly set up testing and staging environments prior to rollout.
-Multi-cluster: Run multiple clusters on shared infrastructure. Set quotas and resource guarantees, with logical separation and secure multi-tenancy.
-Multi-cloud: Leverage the portability of Docker containers to deploy workloads on-premises, in the public cloud, or in hybrid and multi-cloud architectures.
Watch this on-demand webinar to learn how separating compute from storage for Big Data delivers greater efficiency and cost savings.
Historically, Big Data deployments dictated the co-location of compute and storage on the same physical server. Data locality (i.e. moving computation to the data) was one of the fundamental architectural concepts of Hadoop.
But this assumption has changed – due to the evolution of modern infrastructure, new Big Data processing frameworks, and cloud computing. By decoupling compute from storage, you can improve agility and reduce costs for your Big Data deployment.
In this webinar, we discussed how:
- Changes introduced in Hadoop 3.0 demonstrate that the traditional Hadoop deployment model is changing
- New projects by the open source community and Hadoop distribution vendors give further evidence to this trend
- By separating analytical processing from data storage, you can eliminate the cost and risks of data duplication
- Scaling compute and storage independently can lead to higher utilization and cost efficiency for Big Data workloads
Learn how the traditional Big Data architecture is changing, and what this means for your organization.
Darren Darnell, Jim Foppe, and Mike Steimel (Panera Bread); Nanda Vijaydev (BlueData)
Watch this on-demand webinar to learn how Panera Bread, ranked #1 in customer loyalty, uses Big Data analytics to drive its business.
Panera Bread – with over 2,000 locations and 25 million customers in its loyalty program – relies on analytics to fine-tune its menu, operations, marketing, and more. Find out how they solve key business challenges using Hadoop and next generation Big Data technologies, including real-time data to analyze consumer behavior.
In this webinar, Panera Bread discussed how they:
-Use a data-driven approach to improve customer acquisition, customer retention, and operational efficiency
-Spin up instant clusters for rapid prototyping and exploratory analytics, with real-time streaming platforms like Kafka
-Operationalize their data science and data pipelines in a hybrid deployment model, both on-premises and in the cloud
Don’t miss this case study webinar. Discover your own recipe for success with Big Data analytics and data science!
Watch this on-demand webinar and learn how a leading healthcare company is yielding big dividends from Big Data.
Advisory Board, a healthcare firm serving 90% of U.S. hospitals, has multiple business units and data science teams within its organization. In this webinar, they share how they use technologies like Hadoop and Spark to address the diverse use cases of these teams – with a highly flexible and elastic platform leveraging Docker containers.
In this webinar, Advisory Board discussed how they:
-Migrated their analytics from spreadsheets and RDBMS to a modern architecture using tools such as Hadoop, Spark, H2O, Jupyter, RStudio, and Zeppelin.
-Provide the ability to spin up instant clusters for greater agility, with shared and secure access to a treasure trove of data in their HDFS data lake.
-Shortened time-to-insights from days to minutes, slashed infrastructure costs by more than 80 percent, and freed up staff to innovate and build new capabilities.
Don’t miss this case study webinar. Find out how you can improve agility, flexibility, and ROI for your Big Data journey.
Watch this on-demand webinar to learn the key considerations and options for container orchestration with Big Data workloads.
Container orchestration tools such as Kubernetes, Marathon, and Swarm were designed for a microservice architecture with a single, stateless service running in each container. But this design is not well suited for Big Data clusters constructed from a collection of interdependent, stateful services. So what are your options?
In this webinar, we discussed:
- Requirements for deploying Hadoop and Spark clusters using Docker containers
- Container orchestration options and considerations for Big Data environments
- Key issues such as management, security, networking, and petabyte-scale storage
- Best practices for a scalable, secure, and multi-tenant Big Data architecture
Don’t miss this webinar on container orchestration for Hadoop, Spark, and other Big Data workloads.
Watch this video to find out how Nasdaq improves agility and reduces costs for their Big Data infrastructure, while ensuring performance and security. To learn more about the BlueData software platform, visit www.bluedata.com
The BlueData EPIC software platform makes deployment of Big Data infrastructure and applications easier, faster, and more cost-effective – whether on-premises or on the public cloud.
With BlueData EPIC on AWS, you can quickly and easily deploy your preferred Big Data applications, distributions and tools; leverage enterprise-class security and cost controls for multi-tenant deployments on the Amazon cloud; and tap into both Amazon S3 and on-premises storage for your Big Data analytics.
Sign up for a free two-week trial at www.bluedata.com/aws
The BlueData software platform is a game-changer for Big Data analytics. Watch this video to see how BlueData makes it easier, faster, and more cost-effective to deploy Big Data infrastructure and applications on-premises.
With BlueData, you can spin up Hadoop or Spark clusters in minutes rather than months – at a fraction of the cost and with far fewer resources. Leveraging Docker containers and optimized to run on Intel architecture, BlueData’s software delivers agility and high performance for your Big Data analytics.
Matt Maccaux, Global Big Data Lead, Dell EMC; Anant Chintamaneni, Vice President, Products, BlueData
Watch this on-demand webinar to learn how to deploy a scalable and elastic architecture for Big Data analytics.
Hadoop and related technologies for Big Data analytics can deliver tremendous business value, and at a lower cost than traditional data management approaches. But early adopters have encountered challenges and learned lessons over the past few years.
In this webinar, we discussed:
-The five worst practices in early Hadoop deployments and how to avoid them
-Best practices for the right architecture to meet the needs of the business
-A case study of the Big Data journey of a large global financial services organization
-How to ensure highly scalable and elastic Big Data infrastructure
Discover the most common mistakes for Hadoop deployments – and learn how to deliver an elastic Big Data solution.
Watch this on-demand webinar to learn how to deploy Hadoop, Spark, and other Big Data tools in a hybrid cloud architecture.
More and more organizations are using AWS and other public clouds for Big Data analytics and data science. But most enterprises have a mix of Big Data workloads and use cases: some on-premises, some in the public cloud, or a combination of the two. How do you support the needs of your data science and analyst teams to meet this new reality?
In this webinar, we discussed how to:
-Spin up instant Spark, Hadoop, Kafka, and Cassandra clusters – with Jupyter, RStudio, or Zeppelin notebooks
-Create environments once and run them on any infrastructure, using Docker containers
-Manage workloads in the cloud or on-prem from a common self-service user interface and admin console
-Ensure enterprise-grade authentication, security, access controls, and multi-tenancy
Don’t miss this webinar on how to provide on-demand, elastic, and secure environments for Big Data analytics – in a hybrid architecture.
Nanda Vijaydev, Director of Solutions Management, BlueData; Anant Chintamaneni, Vice President, Products, BlueData
Watch this on-demand webinar to learn how to bring DevOps agility to data science and big data analytics.
It’s no longer just about building a prototype, or provisioning Hadoop and Spark clusters. How do you operationalize the data science lifecycle? How can you address the needs of all your data science users, with various skillsets? How do you ensure security, sharing, flexibility, and repeatability?
In this webinar, we discussed best practices to:
-Increase productivity and accelerate time-to-value for data science operations and engineering teams.
-Quickly deploy environments with data science tools (e.g. Spark, Kafka, Zeppelin, JupyterHub, H2O, RStudio).
-Create environments once and run them everywhere – on-premises or on AWS – with Docker containers.
-Provide enterprise-grade security, monitoring, and auditing for your data pipelines.
Don’t miss this webinar. Learn about data science operations – including key roles, tools, and tips for success.
So you want to use Cloudera, Hortonworks, and MapR on AWS. Or maybe Spark with Jupyter or Zeppelin; plus Kafka and Cassandra. Now you can, all from one easy-to-use interface. Best of all, it doesn't require DevOps or AWS expertise.
In this webinar, we discussed:
-Onboarding multiple teams onto AWS, with security and cost controls in a multi-tenant architecture
-Accelerating the creation of data pipelines, with instant clusters for Spark, Hadoop, Kafka, and Cassandra
-Providing data scientists with choice and flexibility for their preferred Big Data frameworks, distributions, and tools
-Running analytics using data in Amazon S3 and on-premises storage, with pre-built integration and connectors
Don’t miss this webinar on how to quickly and easily deploy Spark, Hadoop, and more on AWS – without DevOps or AWS-specific skills.
Nanda Vijaydev, Director of Solutions Management, BlueData; and Anant Chintamaneni, VP of Products, BlueData
Implementing data science and machine learning at scale is challenging for developers, data engineers, and data analysts. Methods used on a single laptop need to be redesigned for a distributed pipeline with multiple users and multi-node clusters. So how do you make it work?
In this on-demand webinar, hear a real-world use case and learn about:
- Requirements and tools such as R, Python, Spark, H2O, and others
- Infrastructure complexity, gaps in skill sets, and other challenges
- Tips for getting data engineers, SQL developers, and data scientists to collaborate
- How to provide a user-friendly, scalable, and elastic platform for distributed data science
Learn how to get started with a large-scale distributed platform for data science and machine learning.
Krishna Mayuram, Lead Architect for Big Data, Cisco; Anant Chintamaneni, VP of Products, BlueData
Watch this on-demand webinar with Cisco and BlueData to learn how to deliver greater agility and flexibility for Big Data analytics with Big-Data-as-a-Service.
Your data scientists and developers want the latest Big Data tools for iterative prototyping and dev/test environments. Your IT teams need to keep up with the constant evolution of new tools including Hadoop, Spark, Kafka, and other frameworks.
In other areas of software development, the DevOps approach is helping to bridge this gap between developers and IT teams. Can DevOps agility and automation be applied to Big Data?
In this webinar, we discussed:
-A way to extend the benefits of DevOps to Big Data, using Docker containers to provide Big-Data-as-a-Service.
-How data scientists and developers can spin up instant self-service clusters for Hadoop, Spark, and other Big Data tools.
-The need for next-generation, composable infrastructure to deliver Big-Data-as-a-Service in an on-premises deployment.
-How BlueData and Cisco UCS can help accelerate time-to-deployment and bring DevOps agility to your Big Data initiative.
BlueData is transforming how enterprises deploy Big Data analytics and machine learning. BlueData’s Big-Data-as-a-Service software platform leverages Docker container technology to make it easier, faster, and more cost-effective for enterprises to innovate with Big Data and AI technologies, either on-premises, in the public cloud, or in a hybrid architecture. With BlueData, our customers can spin up containerized environments within minutes, providing their data scientists with on-demand access to the applications, data, and infrastructure they need. Founded in 2012 by VMware veterans and headquartered in Santa Clara, California, BlueData is backed by investors including Amplify Partners, Atlantic Bridge, Dell Technologies Capital, Ignition Partners, and Intel Capital.
Running Hadoop and Spark on Docker: Challenges and Lessons Learned
Tom Phelan, Chief Architect, BlueData; Anant Chintamaneni, VP of Products, BlueData
62 mins