The cloud has the potential to deliver on the promise of big data processing for machine learning and analytics, helping organizations become more data-driven. However, it presents its own set of challenges.
This webinar covers best practices in areas such as:
- Using automation in the cloud to derive more value from big data by delivering self-service access to data lakes for machine learning and analytics
- Enabling collaboration among data engineers, data scientists, and analysts for end-to-end data processing
- Implementing financial governance to ensure a sustainable program
- Managing security and compliance
- Realizing business value through more users and use cases
In addition, this webinar provides an overview of the capabilities of Qubole’s cloud-native data platform in the areas described above.
About Our Speaker:
James Curtis is a Senior Analyst for the Data, AI & Analytics Channel at 451 Research. He has experience covering the BI reporting and analytics sector and currently covers Hadoop, NoSQL and related analytic and operational database technologies.
James has over 20 years' experience in the IT and technology industry, serving in a number of senior roles in marketing and communications, touching a broad range of technologies. At iQor, he served as a VP for an upstart analytics group, overseeing marketing for custom, advanced analytic solutions. He also worked at Netezza and later at IBM, where he was a senior product marketing manager with responsibility for Hadoop and big data products. In addition, James has worked at Hewlett-Packard managing global programs and as a case editor at Harvard Business School.
James holds a bachelor's degree in English from Utah State University, a master's degree in writing from Northeastern University in Boston, and an MBA from Texas A&M University.
Data lakes are centralized data repositories. Data needed by data scientists is physically copied to a data lake, which serves as a single storage environment. This way, data scientists can access all the data from one entry point – a one-stop shop for the right data. However, such an approach is not always feasible for all the data, and it limits the lake’s use solely to data scientists, making it a single-purpose system.
So, what’s the solution?
A multi-purpose data lake allows a broader and deeper use of the data lake without minimizing the potential value for data science and without making it an inflexible environment.
Attend this session to learn:
• Disadvantages and limitations that weaken or even kill the potential benefits of a data lake.
• Why a multi-purpose data lake is essential in building a universal data delivery system.
• How to build a logical multi-purpose data lake using data virtualization.
Do not miss this opportunity to make your data lake project successful and beneficial.
How do you keep your enterprise data lake from turning into a so-called data swamp? The explosion of structured, unstructured and streaming data can be overwhelming for data lake users and make the lake unmanageable for IT. Without scalable, repeatable, and intelligent mechanisms for cataloguing and curating data, the advantages of data lakes diminish. Informatica’s answer to the data-swamp problem is a metadata-driven approach that leverages intelligent methods to automatically discover, profile and infer relationships about data assets, enabling business analysts and citizen integrators to quickly find, understand and prepare the data they are looking for.
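Vendor specifics aside, the core of automated data profiling can be illustrated with a short, framework-free sketch: for each column, infer a type, measure completeness, and flag candidate unique keys. All function and field names below are illustrative, not part of any product API.

```python
# Minimal illustration of automated data profiling. Not any vendor's API;
# a real catalog would also sample, infer relationships, and persist metadata.

def infer_type(values):
    """Guess a column type from its non-null values (illustrative heuristic)."""
    non_null = [v for v in values if v not in ("", None)]
    if non_null and all(v.replace("-", "", 1).isdigit() for v in non_null):
        return "integer"
    return "string"

def profile(rows, columns):
    """Build a simple profile per column: type, completeness, key candidacy."""
    stats = {}
    for i, col in enumerate(columns):
        values = [row[i] for row in rows]
        non_null = [v for v in values if v not in ("", None)]
        stats[col] = {
            "type": infer_type(values),
            "completeness": len(non_null) / len(values),
            # A column whose non-null values are all distinct and fully
            # populated is a candidate unique key.
            "candidate_key": len(set(non_null)) == len(values),
        }
    return stats

rows = [("1", "alice"), ("2", "bob"), ("3", "")]
print(profile(rows, ["id", "name"]))
```

Even this toy version surfaces the warning signs a catalog cares about: the `name` column is only two-thirds complete, while `id` is fully populated and unique.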
Consumers are engaging with brands across multiple touchpoints, channels, and devices, generating massive amounts of valuable data. Organizations are quickly adopting a number of solutions to keep up with this explosion of customer data and better capture and correlate user behavior.
Two common solutions brands are leveraging to house and analyze all of this customer data are Enterprise Data Warehouses (EDW) and Data Lakes. Register now for this 30-minute webinar and learn:
- Key benefits of each and which is best for your brand
- Why pairing your enterprise data storage solution with customer data initiatives makes your tech stack even more powerful
- How an automated data supply chain fits in a modern EDW and data lake environment
- And more!
The webinar will conclude with a live Q&A Chat with questions from the audience on all things enterprise data storage.
This is the second webinar in our “Citizen Data Science” series. Our first webinar, “Getting Started with Citizen Data Scientists,” covered the importance of citizen data scientists, how to get them enabled and how to empower them.
To some citizen data scientists, data is a new language that they do not yet know how to translate into insights. Others are eager to dive in and curious to explore, but reluctant to communicate their findings because they lack experience translating data into tactical operations.
These new business-focused analysts need to be confident that they’re using data properly. To empower these new data divers, Kyle Dempsey, senior professional services engineer at Periscope Data, pulled together a collection of tips to make sure first-time analysts are doing data right.
Join Kyle on September 27th, 2018 at 10 a.m. PDT as he walks through how to:
- Define your analysis and outcomes
- Understand what data is available and know how to ask for more
- Ask questions in a way that allows data to answer them
- Use data to inform decisions
- Enable collaboration between technical and nontechnical teams
Is Your Data Ready for GDPR?
As the deadline for GDPR approaches, it is time to get practical about protecting personal data.
We break down the steps for turning a data lake into a data hub with appropriate data management and governance activities: from capturing and reconciling personal data to providing for consent management, data anonymization, and the rights of the data subject.
A smart approach to GDPR compliance lays a foundation for personalized and profitable customer and employee relations.
Watch as experts from MapR and Talend show you how to:
- Diagnose the maturity of your GDPR compliance;
- Set up milestones and priorities to reach compliance;
- Create a foundation to manage personal data through a data lake;
- Master compliance operations - from data inventory to data transfers to individual rights management.
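One common building block of the data-management activities listed above is pseudonymization: replacing a personal identifier with a stable keyed-hash token, so records can still be joined without storing the raw value. The sketch below uses illustrative names and a hard-coded key; a production system would manage key storage, rotation, and re-identification controls.

```python
import hmac
import hashlib

# Illustrative only -- in production the key lives in a secrets manager
# and is rotated under a documented policy.
SECRET_KEY = b"rotate-me-regularly"

def pseudonymize(value: str) -> str:
    """Replace a personal identifier with a stable keyed-hash token."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

record = {"email": "jane@example.com", "purchase": "book"}
safe = {**record, "email": pseudonymize(record["email"])}

# The same input always yields the same token, so joins across
# data sets still work -- without exposing the raw identifier.
assert pseudonymize("jane@example.com") == safe["email"]
print(safe)
```

Note that keyed hashing is pseudonymization, not anonymization: whoever holds the key can re-identify, which is exactly why GDPR treats pseudonymized data as still personal.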
Achieving actionable insights from data is the goal of any organization. To help in this regard, data catalogs are being deployed to build an inventory of data assets that provides both business and IT users a way to discover, organize and describe enterprise data assets. This is a good first step that helps all types of users easily find relevant data to extract insights from.
Increasingly, end users want to take the next step of provisioning or procuring this data into a sandbox or analytics environment for further use. Attend this session to see how organizations are looking to build actionable data catalogs via a data marketplace that allows self-service access to data without sacrificing data governance and security policies.
Learn how to provide governed access and visibility to the data lake while still staying on track and within budget. Join Scott Gidley, Zaloni’s Vice President of Product, as he discusses:
- Architecting your data lake to support next-gen data catalogs
- Rightsizing governance for self-service data
- Where a data catalog falls short and how to address the gaps
- Successful use cases
Anyone who's ever analyzed data knows the pain of digging in only to find that it is poorly structured, full of inaccuracies, or just plain incomplete. But "dirty data" isn't just a pain point for analysts; it can have a major financial and cultural impact on an organization.
Attend this live webinar to learn four actionable ways to overcome common data preparation issues including how to establish a company standard for "clean data" and how to democratize data prep across your organization.
- Andy Cotgreave, Technical Evangelism Director, Tableau
- Jason Harmer, Data Analytics and Visualization Consultant, Nationwide Insurance
- Gordon Strodel, Information Management and Analytics Consultant, Slalom
Today's enterprises need broader access to data for a wider array of use cases to derive more value from data and get to business insights faster. However, it is critical that companies also ensure the proper controls are in place to safeguard data privacy and comply with regulatory requirements.
What does this look like? What are best practices to create a modern, scalable data infrastructure that can support this business challenge?
Zaloni partnered with industry-leading insurance company AIG to successfully implement a data lake that tackles this very problem. During this webcast, AIG's VP of Global Data Platforms, Carlos Matos, and Zaloni CEO Ben Sharma will share insights from their real-world experience and discuss:
- Best practices for architecture, technology, data management and governance to enable centralized data services
- How to address lineage, data quality and privacy and security, and data lifecycle management
- Strategies for developing an enterprise-wide data lake service for advanced analytics that can bridge the gaps between different lines of business and financial systems, and drive shared data insights across the organization
The shelf life of data is shrinking. A streaming shift is taking place, and use cases such as IoT-connected cars, real-time fraud detection and predictive maintenance using streaming analytics are becoming commonplace. You too can switch to the fast data lane with Informatica, leveraging Kafka and other big data technologies. So shift gears and change lanes with us while we take you on a journey into the world of streaming data.
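Kafka specifics aside, the pattern behind streaming use cases like real-time fraud detection is a windowed aggregation over an unbounded event stream. Here is a minimal, framework-free sketch; the threshold, window size, and event shape are illustrative assumptions, not part of any product.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 3  # illustrative rule: flag >3 transactions per card per window

# Per-card sliding window of event timestamps.
windows = defaultdict(deque)

def process(event):
    """Consume one (timestamp, card_id) event; return card_id if suspicious."""
    ts, card = event
    q = windows[card]
    q.append(ts)
    # Evict events that have fallen out of the time window.
    while q and q[0] <= ts - WINDOW_SECONDS:
        q.popleft()
    return card if len(q) > THRESHOLD else None

# Four rapid transactions on card "A", then one much later.
stream = [(1, "A"), (5, "A"), (8, "A"), (9, "A"), (200, "A")]
alerts = [c for c in map(process, stream) if c]
print(alerts)  # ['A'] -- the fourth rapid event trips the threshold
```

A real deployment would consume events from a Kafka topic and partition state by key, but the eviction-and-count logic per window is the same idea.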
Data is the new currency for most organizations, and data volumes continue to grow at an explosive rate. While the advantage of collecting such large volumes of data is obvious, protecting this data from cybercriminals and malicious actors is becoming increasingly difficult. Conventional security mechanisms are failing, and large-scale security breaches are becoming commonplace despite increased security spending. This, along with increasingly stringent regulatory requirements and privacy laws, has brought “data security” – protection of the data itself, whether at rest, in use or in motion – into strong focus. While there has been an increasing focus on data-centric security, the solution landscape is fractured, and enterprises are still struggling to identify and deploy long-term solutions with minimal disruption to existing investments and processes. This talk will focus on the current state of data security and offer pointers on how organizations can embark on a long-term data-centric journey that truly adds business value.
If you feel like you don’t trust your data, there’s probably a good reason. It happens all the time; companies implement analytics, customize their solutions and don’t audit the implementation to ensure ongoing data accuracy. This leads to multiple inaccuracies, gaps in tracking, and — even worse — information that’s simply missing. Inaccurate data can send a brand down the wrong path, leading to bad decisions and additional costs for tools and resources that could have otherwise been avoided.
Join and learn:
- What data quality is and why it’s critical to an organization's overall success
- Why your data is a mess and how to identify the warning signs of poor data quality
- Best practices to ensure clean and quality data and how to take back control
- And so much more!
The webinar will conclude with a Fireside Chat with live questions from the audience on all things data quality.
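The warning signs of "dirty data" described above can be made concrete with a few automated checks. The rules below (missing contact fields, duplicate IDs, non-positive amounts) are illustrative examples of a company standard for clean data, not an exhaustive one.

```python
# Illustrative "dirty data" checks: completeness, duplicates, and range rules.

def quality_report(records):
    """Return counts of common data-quality warning signs in order records."""
    seen = set()
    report = {"missing_email": 0, "duplicate_id": 0, "bad_amount": 0}
    for r in records:
        if not r.get("email"):
            report["missing_email"] += 1       # completeness check
        if r["id"] in seen:
            report["duplicate_id"] += 1        # uniqueness check
        seen.add(r["id"])
        if r.get("amount", 0) <= 0:
            report["bad_amount"] += 1          # range/validity check
    return report

orders = [
    {"id": 1, "email": "a@x.com", "amount": 10.0},
    {"id": 1, "email": "", "amount": -5.0},  # dup id, missing email, bad amount
]
print(quality_report(orders))
```

Running checks like these on every load, rather than once during implementation, is what catches the silent tracking gaps the blurb warns about.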
As data grows at an exponential rate, organizations are increasingly looking to leverage streaming data from mobile devices, wearable technology and sensors for real-time processing and analytics. Gartner estimates that “by 2020, 70% of organizations will adopt data streaming to enable real-time analytics.” However, implementing real-time data ingest, processing and delivery of insights at scale requires infrastructure with very low latency and easy access to information when it is required.
In the webinar, we’ll discuss:
- Adopting Modern Data Lake with the Hortonworks Data Platform (HDP)
- Accelerating real-time data analytics with Hortonworks DataFlow (HDF) and Attunity to build a data lake
- Solving challenges with real-time data ingest and managing data in motion workloads
Join subject matter experts from IBM and Hortonworks for a joint webcast to help you accelerate real-time data analytics and manage your data workloads efficiently.
As data analytics becomes more embedded within organizations as an enterprise business practice, the methods and principles of agile processes must also be employed.
Agile includes DataOps, which refers to the tight coupling of data science model-building and model deployment. Agile can also refer to the rapid integration of new data sets into your big data environment for "zero-day" discovery, insights, and actionable intelligence.
The data lake is an advantageous approach to implementing an agile data environment, primarily because of its focus on "schema-on-read," which skips the laborious, time-consuming, and fragile process of database modeling, refactoring, and re-indexing every time a new data set is ingested.
Another huge advantage of the data lake approach is the ability to annotate data sets and data granules with intelligent, searchable, reusable, flexible, user-generated, semantic, and contextual metatags. This tag layer makes your data "smart" -- and that makes your agile big data environment smart also!
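The schema-on-read and tag-layer ideas above can be sketched in a few lines: raw records land in the lake untouched, and each consumer applies its own schema (and tag filters) only at query time. The lake structure and field names below are illustrative assumptions.

```python
import json

# Raw events land in the lake as-is: no upfront schema, just user-generated
# semantic tags attached at ingest (illustrative structure).
lake = [
    {"data": '{"user": "u1", "temp_c": 21.5}', "tags": {"sensor", "indoor"}},
    {"data": '{"user": "u2", "clicks": 7}',    "tags": {"web"}},
]

def read(lake, tag, schema):
    """Schema-on-read: parse and project the raw data only at query time,
    using the tag layer to find relevant data sets."""
    out = []
    for item in lake:
        if tag in item["tags"]:
            record = json.loads(item["data"])
            out.append({field: record.get(field) for field in schema})
    return out

# Two consumers apply two different schemas to the same raw lake:
print(read(lake, "sensor", ["user", "temp_c"]))  # [{'user': 'u1', 'temp_c': 21.5}]
print(read(lake, "web", ["user", "clicks"]))     # [{'user': 'u2', 'clicks': 7}]
```

Because no schema was fixed at write time, ingesting a new data set is just an append plus tags; no refactoring or re-indexing of existing consumers is required.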
Machine learning model accuracy depends on the quality of data. In data science, quality of data means data consistency, data completeness and data correctness, which are all part of data integrity. In this session we will talk about how machine learning models can be adopted for data integration. Some machine learning models also assume that data is normally distributed or that data elements are appropriately scaled; however, this is not always true. Hence, data has to be transformed by normalizing it without losing its integrity, which is a big challenge in data science. Data integrity is maintained with the help of integrity constraints – rules designed to keep data consistent and correct. In this session we will discuss some of the techniques and methods used for data integration, data transformation and normalization while ensuring data integrity, and we will walk you through the steps involved with the help of examples.
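As a small sketch of the transformation step described here, min-max scaling brings data elements onto a common [0, 1] scale, while integrity constraints check that the transform did not corrupt the data. The constraints chosen below (same length, rank order preserved, values in range) are illustrative examples, not the session's specific method.

```python
def scale_min_max(values):
    """Scale numeric values to [0, 1] without changing their relative order."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread
    return [(v - lo) / (hi - lo) for v in values]

def check_integrity(original, scaled):
    """Illustrative integrity constraints for the transform:
    same length, rank order preserved, all values in [0, 1]."""
    assert len(original) == len(scaled)
    rank = lambda xs: sorted(range(len(xs)), key=lambda i: xs[i])
    assert rank(original) == rank(scaled)          # consistency: order kept
    assert all(0.0 <= v <= 1.0 for v in scaled)    # correctness: in range

ages = [18, 35, 62, 24]
scaled = scale_min_max(ages)
check_integrity(ages, scaled)
print(scaled)
```

Running the constraint check after every transformation is one simple way to normalize data "without losing its integrity," as the abstract puts it.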
This 1-hour webinar from GigaOm Research brings together leading minds in cloud data analytics, featuring GigaOm analyst Andrew Brust, joined by guests from cloud big data platform pioneer Qubole and cloud data warehouse juggernaut Snowflake Computing. The roundtable discussion will focus on enabling Enterprise ML and AI by bringing together data from different platforms, with efficiency and common sense.
In this 1-hour webinar, you will discover:
- How the elasticity and storage economics of the cloud have made AI, ML and data analytics on high-volume data feasible, using a variety of technologies
- Why the key to success in this new world of analytics is integrating platforms so they can work together and share data
- How this enables building accurate, business-critical machine learning models and produces the data-driven insights that customers need and the industry has promised
- How to make the lake, the warehouse, ML and AI technologies and the cloud work together, technically and strategically
Register now to join GigaOm Research, Qubole and Snowflake for this free expert webinar.
Ensure data privacy and protection across the enterprise, lowering risk associated with data governance initiatives.
Getting your company ready for GDPR isn’t about putting a few new processes in place — it’s about rethinking your entire approach to personal data, including how to get value from it. For decades, companies have collected and stored all kinds of personal information “just in case” they ever needed it.
GDPR requires a different approach. You need to be proactive in thinking about how to get value from your data, and you need to understand exactly what your company is doing with personal data and why.
Join Jill Reber and Kevin Moos of Primitive Logic to learn:
- How to work with third parties who process personal data on your behalf
- How preparing for GDPR helps you understand your data on a whole new level (and why that’s a good thing)
Did you know that your existing investments in Informatica PowerCenter can fast-track you to big data and data lake technologies? We will demonstrate why our customers are moving from data warehouses to data lakes, leveraging big data and cloud ecosystems, and how to do this rapidly by leveraging your existing investments in Informatica technology.
The data contained in the data lake is too valuable to restrict its use to just data scientists. The investment in a data lake would be more worthwhile if the target audience could be enlarged without hindering the original users. However, this is not the case today: most data lakes are single-purpose. Also, the physical nature of data lakes has potential disadvantages and limitations that weaken the benefits and can even kill a data lake project entirely.
A multi-purpose data lake allows a broader and greater use of the data lake investment without minimizing the potential value for data science or making it a less flexible environment. Multi-purpose data lakes are data delivery environments architected to support a broad range of users, from traditional self-service BI users to sophisticated data scientists.
Attend this session to learn:
* The challenges of a physical data lake
* How to create an architecture that makes a physical data lake more flexible
* How to drive the adoption of the data lake by a larger audience