Tune in to hear open data lake platform leaders and engineers discuss continuous data engineering on data lakes for machine learning, streaming analytics, ad-hoc analytics and data exploration in the cloud.

The interactive talks are designed for data engineers, data analysts and data scientists who want to learn about the challenges and solutions for use cases seen in data-driven organizations.

Learn more about Qubole: http://bit.ly/AboutQubole

In the final session of Data Platforms Online 2017, Ashish Thusoo will offer some of his highlights from the week’s sessions, pick out some emerging themes and trends, and answer questions from the audience. Ashish built the original data team at Facebook, is a co-author of Apache Hive, co-author of “Building a Data Driven Enterprise with DataOps” and CEO of Qubole. He’ll be moderated by Horia Margarit, resident Data Scientist at Qubole. Get your questions ready for what will be a lively and entertaining discussion!

Fireside Chat: Lessons Learned from Facebook

Cloud has changed the game when it comes to data analytics. Previously, organizations had to lock themselves into a particular architecture and level of capacity for three to seven years and do all the lifting themselves. Cloud, on the other hand, allows them to experiment with different hardware and software options, get more of the solution as a service, and scale up and down at will to meet project spikes and accelerate busy jobs. This makes it much more viable for any company to get the advantages of advanced analytics against large data sets, without an oversized IT staff or huge capital investments.

Oracle Cloud is specifically designed to help enterprises take advantage of cloud for data analytics: it offers massive, non-variable performance, predictable low cost and a broad choice of deployment and software options. Oracle and Qubole work together to deliver a new breed of data platform, one capable of taming the scale, performance, cost and complexity issues associated with gaining business insight from data of all types.

Watch this webinar to understand:
- Summary of industry trends for big data on the cloud
- How Oracle Cloud Infrastructure is optimized for big data workloads from a cost, performance and flexibility perspective
- How Oracle Cloud Big Data solutions compare with on-premises and competing cloud options

Big Data Trends & Oracle Cloud Infrastructure

The question is not so much whether to migrate to the cloud or not. That question has likely already been answered by many organizations, and the answer is a resounding full steam ahead. But the start of the journey can be daunting, especially with a lot of ‘as-a-service’ terminology floating around. Please join James Curtis, senior analyst at 451 Research, as he discusses not only some industry trends and what many organizations are doing but also a simplified approach to understanding cloud services and how that might best fit your organization. Because it’s not so much buyer beware; it’s more about buyer understand.

Untangling the Cloud Services Hairball

There is a surge in hype around Artificial Intelligence. Startups are raising hundreds of millions of dollars by bedazzling investors with Deep Learning, word embeddings, and reinforcement learning. This is a distraction from the very real problems that data and AI can solve if done right. By working across dozens of machine learning problems that are live in the real world, I’ve worked out the most common problems encountered and recurring design patterns on how to solve real-world problems using AI as a tool. This talk will arm you with a perspective on how to get pragmatic solutions with AI today.

BS-free Data Science

This webinar is focused on:
- Increasing collaborative friction between engineers, analysts, and business
- Process-driven iteration i.e. balancing agility with discipline
- Making the quantitative business case for moving from big bang to continuous enhancement (convincing your CFO/CIO to shift from CapEx to OpEx)
- Case studies and outcomes from our clients

Get "Datopia": Transforming to an Agile Data Culture

Today’s organizations are tasked with managing multiple data types, coming from a wide variety of sources. Faced with massive volumes and heterogeneous types of data, organizations are finding that in order to deliver insights in a timely manner, they need a data storage and analytics solution that offers more agility and flexibility than traditional data management systems. Every use case might be different, and different use cases might need different tools. AWS provides a variety of options for your needs, including RDS, EMR, Redshift, Athena and QuickSight. In this talk we will discuss the different technologies available on AWS and their applications.

Modern Data Architecture with AWS

Once considered the "black magic" of digitally-born retailers like Amazon, personalizing the customer experience has now become table stakes for any retailer interested in surviving in the era of digital transformation. The techniques, tools and scalable platforms necessary to optimize customer interactions are now available and accessible for use by companies of any shape or size. We'll discuss how the use of big data technology in the cloud eases the implementation of common retail use cases as well as how it helps to avoid typical pitfalls.

Amplifying Retail with Big Data and The Cloud

We will look at how DevOps is making Data Science more mainstream with automation, release trains, agility and operational readiness. In this talk, we will look at the various tools and techniques in building a successful Data Science practice and how DevOps can be introduced to provide Continuous Integration, Delivery, and Deployment of Data Science Models.

Bringing DevOps to the World of Data Science

Continuous streams of data are generated in every industry from sensors, manufacturing IoT devices, business transactions, social media, network devices, clickstream logs, and more. Found within these streams of data are critical business insights that are waiting to be unlocked. Attend this session and learn how customers are creating solutions for fleet monitoring, smart grid, network monitoring, recommendations, and other real-time scenarios that turn multiple concurrent streams of data in motion into insights and actions for competitive advantage.

In this session you will see demos and learn how services like Azure Event Hubs, Stream Analytics, Machine Learning, and other Azure services work seamlessly together to create your end-to-end real-time analytics solutions.

Real-time Analytics on Streaming Data with Azure Stream Analytics

This session will cover how Azure Machine Learning and R can help data scientists overcome the following challenges:
- Development Time - Dramatically reduce the time of running initial ML experiment validations.
- Performance - Option for best in class performance.

For deeper data science needs the session will explore how hard core data scientists can leverage R to attack the most complex scenarios.

Azure Machine Learning and R to Speed-up Data Science Projects

Data governance, discovery and lineage help data scientists find and integrate data of interest to uncover otherwise hidden trends, anomalies, and powerful predictors of business successes and failures. Comcast’s Streaming Data Platform comprises a wide variety of ingest, transformation, and storage services. Peer-reviewed Apache Avro schemas support end-to-end data governance. Apache Atlas is our metadata repository for data discovery and lineage. We have extended Atlas with custom data and process types, e.g., Avro schemas, AWS S3 buckets and prefixes, Kafka topics, and Kinesis streams. Custom asynchronous messaging libraries notify Atlas of new data and schema entities and lineage links as they are created.

Data Governance, Discovery, & Lineage in a Heterogeneous Streaming Platform

At BloomReach we process around 100 million products every day across all of our customers. For each customer, the feed processing needs to be fast and reliable, and indexing shouldn't have any impact on serving. We will walk through how we've built this at BloomReach while also making sure that the cost is minimal.

How We Built a Scalable, Fast, & Reliable Indexing Infrastructure

Looker is the modern data platform that democratizes data analytics, creates meaningful insights, and powers critical business actions. The Looker platform allows you to analyze your data and act on it within a single interface, enabling both business users and analysts to add maximum value where it matters most. Stop jumping between tabs and tools - do it all in Looker.

Creating Real Value: Bridging the Gap Between Analysis and Action

Today, 91% of companies are ingesting data from third-party partners to run their businesses. Additionally, 70% of companies either currently send or plan to send data to partners. This inter-company data collaboration powers insights, machine learning, and better consumer experiences. But it also increases workloads for strapped engineering teams and creates challenges to data access. Learn how companies are streamlining and even automating their Data Operations to accelerate the time from data to business value.

Accelerate Time to Value with Data Operations

Enterprises are building big data solutions with Azure Data Lake, an on-demand, real-time stream processing service with a no-limits data lake built to support massively parallel analytics. Patterns of enterprise solutions are emerging and evolving as customers migrate their analytics workloads to the cloud and embrace new business opportunities. With an overview of Azure Data Lake, this webinar briefly explores some of the choices customers are making in building big data solutions with Azure Data Lake.

On-Demand Analytics: Building Big Data Solutions with Azure Data Lake

Talend provides the data agility businesses need to use the latest cloud technologies to act with insight across their organization and win in an economy being deeply transformed by exploding data volumes, technology innovation, and fundamental changes to the IT infrastructure. Join us to learn how Talend and Qubole together help companies’ business users execute data preparation workloads in the cloud at a fraction of the cost and resources.

Qubole and Talend: A Match Made in the Cloud

In the biological sciences, hypothesis-driven experiments and bottom-up design experiments rely on predicting what will happen with new cells and molecules. Machine learning excels at prediction and has become more democratized, making it an important component in the biotech toolkit. We use Merck's Kaggle competition as a representative task in this domain that involves predicting molecular activity from numeric descriptors of chemical structure. Our approach utilizes deep neural networks using the Keras library in a Qubole notebook, which is conveniently attached to an autoscaled Spark cluster. We use Spark to distribute the hyperparameter search for optimizing the neural net.
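
The distributed hyperparameter search mentioned above can be sketched roughly as follows. This is a minimal illustration and not the speakers' code: a local thread pool stands in for the Spark cluster, the hypothetical `validation_loss` function is a toy stand-in for training and scoring a Keras network, and the grid values are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical search grid; the talk does not list the real hyperparameters.
GRID = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_units": [64, 128, 256],
}


def validation_loss(params):
    """Toy stand-in for training a network and returning its validation loss."""
    lr, units = params["learning_rate"], params["hidden_units"]
    # Arbitrary smooth function with its minimum at lr=0.01, units=128.
    return (lr - 0.01) ** 2 + ((units - 128) / 1000) ** 2


def grid_search(grid, score, max_workers=4):
    """Score every grid point in parallel and return (best_params, best_loss)."""
    names = sorted(grid)
    candidates = [dict(zip(names, values))
                  for values in product(*(grid[n] for n in names))]
    # Each candidate is independent, so the work maps cleanly onto workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(score, candidates))
    best = min(range(len(candidates)), key=losses.__getitem__)
    return candidates[best], losses[best]


best_params, best_loss = grid_search(GRID, validation_loss)
print(best_params, best_loss)
```

Because every candidate is scored independently, the same shape should carry over to a cluster, where the candidate list would be spread across executors instead of threads.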

Deep Learning for Biotechnology on Qubole

Saavn is India’s leading music streaming service. Since context is key to music, we have built a system called Sniper that lets us identify cohorts of users in real-time and target them for marketing, advertising and recommendation purposes. This system allows us to understand user behavior by quantifying their engagement characteristics such as stream consumption, affinities or ads. Speed and scalability are critical to its design. This talk will cover our motivations behind building such a system and how big data technologies have helped us architect it.

How We Built a Scalable, Real-time User Targeting System

For years IT has been tasked to produce, gather, and store large volumes of internal and external data. We’re now engaged in developing the infrastructure to support analysis of that data. A cloud-based, self-service, big-data model can be the answer and provide numerous benefits and efficiencies. But with those benefits, there are cultural, organizational and architectural hurdles to clear. We will discuss the challenges we faced at Scripps Networks Interactive, and the successful team and architectural outcomes that emerged.

Building a Culture to Support a Big Data Model in the Cloud

In any enterprise, one of the most prevalent security risks revolves around who has access to which resources. Whether data is being stored in a cloud solution or on-premises, there is a large challenge in knowing how to provide the correct privileges to associates. By using machine learning and clustering algorithms like the Louvain Method, we can group similar users in the Capital One network and create two valuable features: (1) automated onboarding and (2) automated “rogue access” detection. By applying machine learning, we have allowed Capital One to become a more well-managed company, and have reduced a major cybersecurity threat. This talk will be a deep dive into the model, data engineering and productionization of the web application interface.
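
The Louvain Method groups nodes by greedily maximizing a score called modularity. The sketch below computes only that score for a given grouping; it is not the speakers' model. The user names, the edges (assumed here to mean "these two users share access to some resource") and the team labels are all hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    intra_edges = defaultdict(int)  # edges with both ends in the community
    degree_sum = defaultdict(int)   # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            intra_edges[community_of[u]] += 1
    # Q = sum over communities of (intra-edge share - expected share).
    return sum(intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical access graph: two tightly knit teams joined by one link.
EDGES = [
    ("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
    ("dave", "erin"), ("erin", "frank"), ("dave", "frank"),
    ("carol", "dave"),
]
TWO_TEAMS = {"alice": "A", "bob": "A", "carol": "A",
             "dave": "B", "erin": "B", "frank": "B"}
ONE_GROUP = {user: "all" for user in TWO_TEAMS}

q_two_teams = modularity(EDGES, TWO_TEAMS)
q_one_group = modularity(EDGES, ONE_GROUP)
print(q_two_teams, q_one_group)
```

The two-team grouping scores higher than lumping everyone together, which is the kind of split a modularity-maximizing method would favor on this toy graph.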

Using Machine Learning to Manage User Access

Discover the evolution of data science workflows implemented at Expedia with a special emphasis on Learning to Rank problems. This session will explore the process of industrializing the data science workflow and best practices on how to keep your data productive, or even pull your organization out of the data swamp.

Industrializing Data Science Workflows

Under Armour built the world's most comprehensive health and wellness application: the Connected Fitness Data Platform. It consists of event streaming pipelines and processing using big data technologies like Hive, Presto and Spark to derive the insights needed to keep their users fit and healthy. Discover their step-by-step process.

A Data Platform To Enable Intelligent Features

Data Platforms Virtual Summit
