Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership.
Brian Dirking, Senior Director of Partner Marketing at Databricks, and Nauman Fakhar, System Architect at Databricks
Learn the basics of Spark on Azure Databricks, including:
· RDDs, DataFrames, Datasets, and other fundamentals of Spark.
· How to quickly set up Azure Databricks, and how Azure Databricks is a fully managed platform, relieving you of DataOps duties.
· How to use Notebooks, which provide a collaborative space for your entire analytics team, and how you can schedule Notebooks, immediately putting your work into production.
Building multiple ETL pipelines is complex and time-consuming, making it an expensive endeavor. As the number of data sources and the volume of data grow, ETL time also increases, delaying when an enterprise can derive value from its data.
Join Prakash Chockalingam, Product Manager and data engineering expert at Databricks, to learn how to avoid the common pitfalls of data engineering and how the Databricks Unified Analytics Platform can ensure performance and reliability at scale to lower total cost of ownership (TCO).
In this webinar, you will learn how Databricks can help to:
- Remove infrastructure configuration complexity to reduce DevOps efforts
- Optimize your ETL data pipelines for performance without compromising reliability
- Unify data engineering and data science to accelerate innovation for the business
Data scientists and data engineers need a secure and scalable platform to collaborate on analytics. Register for this webinar and see how Azure Databricks provides a platform that enables teams to accelerate innovation, providing:
- A collaborative workspace to experiment with models and datasets, and then put jobs into action instantly.
- An automated infrastructure that enables you to autoscale compute and storage independently.
The live demo portion of the webinar will show how Azure Databricks can bring in streaming data, run it through a machine learning model, and then output the results to Power BI for visualization.
The upcoming Spark 2.3 release marks a big step forward in speed, unification, and API support.
Reynold Xin and Jules Damji from Databricks will walk through how you can benefit from the upcoming improvements:
- New DataSource APIs that enable developers to more easily read and write data for Continuous Processing in Structured Streaming.
- PySpark support for vectorization, giving Python developers the ability to run native Python code fast.
- Improved performance by taking advantage of NVMe SSDs.
- Native Kubernetes support, marrying the best of container orchestration and distributed data processing.
Enterprise data science teams are driving big innovations in machine learning, but this has put them under increased pressure to deliver more models, more frequently, and more rapidly.
In this webinar, Forrester VP & Principal Analyst, Mike Gualtieri, will share data on the top trends in machine learning and lay out what data science teams need to do in order to maximize their output.
Chris Robison, Head of Data Science, and Craig Kelly, Group Product Manager, both at Overstock.com, will showcase how they used big data and machine learning to:
- Create a one-to-one personalized shopping experience.
- Decrease the cost of moving models to production by nearly 50%.
- Stand up new models 5x faster than before.
iPass is the world’s largest wifi network, serving over 160 network providers with more than 60 million hotspots in airports, hotels, airplanes, and public spaces in 120 countries across the globe.
Analyzing the state of the world’s wifi in real time is a daunting task fraught with unpredictable challenges that can impact performance, reliability, and security. Join this webinar to learn why iPass moved from an on-premises Hadoop system to Databricks in the cloud and how they are able to deliver ground-breaking results with a small and nimble team.
With Databricks, iPass can now focus on scalable business logic rather than building infrastructure. This newfound freedom has allowed their team to:
- Monitor the performance of millions of wifi hotspots around the world.
- Leverage machine learning and real-time analytics to understand the health of access points.
- Make recommendations to customers on the best access point to use to ensure optimal performance.
Continuous integration and continuous delivery (CI/CD) enables an organization to rapidly iterate on software changes while maintaining stability, performance, and security. Many organizations have adopted various tools to follow the best practices around CI/CD to improve developer productivity, code quality, and software delivery. However, following the best practices of CI/CD is still challenging for many big data teams.
This webinar will highlight:
* Key challenges in building a data pipeline for CI/CD.
* Key integration points in a data pipeline's CI/CD cycle.
* How Databricks facilitates iterative development, continuous integration and build.
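One concrete practice in this space is keeping transformation logic in pure functions so a CI server can unit-test every commit without spinning up a cluster. The function and test below are an illustrative sketch, not code from the webinar:

```python
# etl.py -- transformation kept free of I/O so CI can test it without a cluster
# (illustrative sketch, not code from the webinar)
def clean_records(records):
    """Drop records missing an id and normalize names to lowercase."""
    return [
        {**r, "name": r["name"].strip().lower()}
        for r in records
        if r.get("id") is not None
    ]

# test_etl.py -- run by the CI pipeline (e.g. `pytest`) on every commit
def test_clean_records_drops_missing_ids_and_normalizes():
    raw = [{"id": 1, "name": "  Alice "}, {"id": None, "name": "Bob"}]
    assert clean_records(raw) == [{"id": 1, "name": "alice"}]
```

Because the function touches no files, databases, or Spark sessions, the test runs in milliseconds on every commit, catching logic regressions before the pipeline ever reaches a cluster.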
Current data management architectures are a complex combination of siloed, single-purpose tools. There are data lakes that offer low-cost storage but are difficult to use for data discovery; data warehouses that are reliable and optimized for fast queries but are costly to scale; and various streaming and batch systems that shuffle data between them, oftentimes resulting in data integrity issues.
Businesses have to create a patchwork of different tools, skillsets, and expertise just to solve one fundamental problem: How can I make data-driven decisions faster?
Join this webinar to learn how Databricks Delta — a new unified data management system — takes advantage of the scale of a data lake, the reliability and performance of a data warehouse, and the low-latency updates of a streaming system, all in a unified and fully managed fashion.
This webinar will cover:
- How the need to process batch and streaming data creates challenges for enterprises with complex data architectures.
- How Databricks Delta takes the best of data warehouses, data lakes, and streaming systems to provide a highly scalable, performant, and reliable data management system.
- A live demonstration of Databricks Delta to showcase how easy it is to cost-efficiently scale without impacting query performance.
Machine learning has unlocked new possibilities that deliver significant business value. However, most companies don’t have the resources either to build and maintain the supporting infrastructure or to apply data science to build a smarter solution.
Join us for this webinar and hear from John Huang, engineering and data analytics lead at Handshake, as he shares how he quickly and cost-effectively scaled a small engineering team to build a machine learning-powered recommendation engine that profiles users and behaviors to present relevant next steps. In this webinar you will learn how to:
- Simplify and accelerate data engineering processes, including data ingest and ETL
- Incorporate machine learning into your production application without an army of data scientists
- Choose an analytics engine that will enable key analytics such as attribution, step analysis, and linear regression
- Embed visualizations into your application that drive stickiness
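The core idea behind a behavior-based recommender like the one described can be sketched in a few lines of plain Python: score users by the cosine similarity of their interaction vectors, then suggest items the nearest neighbor engaged with. This toy is purely illustrative and is not Handshake's implementation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length interaction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(target_user, users, k=1):
    """Suggest up to k items the most-similar user has but the target lacks.

    `users` maps user id -> per-item interaction counts (same item order
    for every user).
    """
    # Rank all other users by similarity to the target.
    scored = sorted(
        ((cosine(users[target_user], vec), uid)
         for uid, vec in users.items() if uid != target_user),
        reverse=True,
    )
    _, best = scored[0]
    # Recommend item indices the neighbor engaged with but the target did not.
    return [i for i, (mine, theirs) in
            enumerate(zip(users[target_user], users[best]))
            if mine == 0 and theirs > 0][:k]
```

A production system would use a distributed library (for example Spark MLlib's ALS) over far larger interaction matrices, but the similarity-then-suggest shape is the same.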
A key obstacle to doing data engineering at scale is having a robust distributed infrastructure on which frameworks like Apache Spark can run efficiently. Beyond building that infrastructure, keeping it running automatically is another critical piece for production workloads.
Join this webinar to learn how Databricks’ Unified Analytics Platform can help simplify your data engineering problems by configuring your distributed infrastructure to be in autopilot mode. Learn how:
- Databricks’ automated infrastructure will allow you to autoscale compute and storage independently.
- To significantly reduce cloud costs through cutting-edge cluster management features.
- To control cluster management features and strike a balance between ease of use and manual control.
CardinalCommerce was acquired by Visa earlier this year for its critical role in payments authentication. Through predictive analytics and machine learning, Cardinal measures performance and behavior of the entire authentication process across checkout, issuing and ecosystem partners to recommend actions, reduce fraud and drive frictionless digital commerce.
With Databricks, CardinalCommerce simplified data engineering to improve the performance of their ETL pipeline by 200% while reducing operational costs significantly via automation, seamless integration with key technologies, and improved process efficiencies.
Join this webinar to learn how CardinalCommerce was able to:
- Simplify access to data across the organization
- Accelerate data processing by 200%
- Reduce EC2 costs through faster performance and automated infrastructure
- Visualize performance metrics for customers and stakeholders