Designing Data Lakes: Architecture Options with Open Source Tools
The concept of data lakes evolved to address the challenges and opportunities of managing big data.
Organizations are investing massive amounts of time and money to upgrade existing data infrastructures and build data lakes, whether on-premises or in the cloud.
This talk will discuss architectures and design options for implementing data lakes with open source tools. It also covers the challenges of upgrading and migrating from existing data warehouses, metadata management, supporting self-service, and managing production deployments.
Recorded: Apr 17 2019, 62 mins
Join this live panel of industry experts and learn what steps businesses are taking to streamline their adoption of self-service analytics tools and techniques.
- How companies are approaching their in-house analytics strategy to ensure their analytics tools are meeting their needs.
- Best practices for managing your data to optimize analytics insights.
- How data visualization and data storytelling can enrich your self-service analytics journey.
Tune in to this live panel of industry experts to learn how enterprises are adopting AI and machine learning technology to enhance their analytics strategy.
- How businesses are applying AI in a variety of analytics use cases across multiple verticals
- The business benefits of utilizing advanced analytics techniques.
- How you can optimize your approach to machine learning and AI for the enterprise to get the most out of your data.
Giovanni Lanzani, Chief Science Officer at GoDataDriven
Now that the data science hype is levelling out, many companies are wondering what went wrong, as they could not extract value from their data science efforts.
In this webinar we will explore what it takes to apply data science and machine learning in the real world.
Key takeaways include:
- How can you go beyond the traditional data warehouse when doing machine learning
- How should you adapt your processes to keep monetizing your data
- How to close the feedback loop between your customers and your machine learning models
- What kind of profiles are essential to successfully become a data driven organization
Yaroslav Nedashkovskyi, System Architect at SoftElegance
We will discuss a case study of a unified data lake for the oil industry: a software architecture and a set of microservices used to extract business value from the data generated during oil production. Mathematical models were developed to predict rod pump failures during artificial lift.
We used the capabilities of a modern big data architecture, based on the Apache Spark family of technologies, together with machine learning, archived data, and streaming data from wells, to build a unified mathematical model that predicts failures of this kind of industrial equipment.
Join this webinar to learn:
-- How machine learning can help to predict failure of industrial equipment
-- Architecture to handle near real-time data-flow from oil wells
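As a rough illustration of the kind of check a near real-time pipeline might run on well sensor data, the sketch below flags readings that deviate sharply from a short rolling window. It is not the speakers' model: the function name `flag_anomalies`, the window size, the threshold, and the sample readings are all hypothetical, and a rolling z-score stands in here for whatever failure-prediction model the talk describes.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def flag_anomalies(
    readings: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the rolling mean.

    Hypothetical helper: a rolling z-score over the previous `window`
    readings, consumed one value at a time as a stream would deliver them.
    """
    history: Deque[float] = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip flat windows, where the z-score is undefined.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        history.append(value)


# Made-up sensor readings with one obvious spike at index 6.
sample = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0]
flagged = list(flag_anomalies(sample))
```

Because the function is a generator, it can be fed from any iterator, including one that reads from a message queue, without holding more than `window` values in memory.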
Charlie Leahy, Head of Software Architecture and Data Science (Hufsy)
Banks have a vast wealth of mineable data available to them, but traditionally have provided their customers with little feedback beyond a balance and list of transactions.
In this talk Charles Leahy, Tech Lead at Hufsy, looks at ways in which tools such as visualisation and machine learning can be employed to give users meaningful insights, helping them make the most of their money.
Erin Junio (BrightTALK), Darren Plumlee (IBM), Nandu Patil (CVS Health), Tanvi Shah (Slalom)
When it comes to large volumes of data from a multitude of sources, many businesses struggle to make sense of a seemingly endless and unstructured wealth of information. Thankfully, with the right approach to data visualization, companies can provide context for their data and construct a narrative that lends incredible insight into the patterns and trends affecting key business initiatives.
Join this live panel of industry experts and discover how data visualization is making the data storytelling process more intuitive and precise than ever.
Erin Junio (BrightTALK), Aaron Kalb (Alation), Ashley Howard (Tableau)
As data visualization technology continues to evolve, it gives businesses the edge they need to dive deeper into their data with analytics, enriching the data discovery process and revealing answers to questions that analysts might not have even known they had. Properly executed visual analytics can help organizations discover patterns and insights that lead to tangible results in today's competitive marketplace.
Join this live panel and discover the latest tools, techniques and strategies industry experts are using to visually transform their data for compelling business intelligence and analytics use cases.
Carl Allchin & Andy Kriebel, Asst. Head Coach & Head Coach (The Information Lab Data School); Eva Murray, Head of BI (Exasol)
Carl Allchin (The Information Lab Data School) shares practical advice and real-life scenarios for developing a successful and sustainable self-service analytics environment.
Carl has several years of experience as an analytics and BI consultant, coach and trainer. He focuses on operational improvement and customer experience measurement, two key components for any self-service strategy.
Join us for this webinar and take away a number of tips and ideas, along with the pitfalls to avoid, when setting up self-service BI in your organization.
Maurice Flynn, Director of Research, The FF Foundation
Most business executives involved in this area spend most of their time on this part of the process, despite wanting to get on to the results stage. In fact, most experts agree this is probably the most important part to get right. Fortunately, new tools and techniques are making this easier than ever. However, awareness is still low and confusion is high. In this session we aim to clarify and show the way forward.
Brian Lange, Partner and Data Scientist, Datascope
Good applications of machine learning and AI can be difficult to pull off. Join Brian Lange, Partner and Data Scientist at data science firm Datascope, as he discusses a variety of ways machine learning and AI can fail (from technical to human factors) so that you can avoid repeating them yourself.
Apache Spark for Big Data Analysis combined with Apache Zeppelin for Visualization is a powerful tandem that eases the day to day job of Data Scientists.
In this webinar, you will learn how to:
+ Collect streaming data from the Twitter API and store it in an efficient way
+ Analyse and Display the user interactions with graph-based algorithms wi.
+ Share and collaborate on the same note with peers and business stakeholders to get their buy-in.
Akmal Chaudhri, Technology Evangelist, GridGain Systems
Attend this session to learn how to easily share state in-memory across multiple Spark jobs, either within the same application or between different Spark applications using an implementation of the Spark RDD abstraction provided in Apache Ignite. During the talk, attendees will learn in detail how IgniteRDD – an implementation of native Spark RDD and DataFrame APIs – shares the state of the RDD across other Spark jobs, applications and workers. Examples will show how IgniteRDD, with its advanced in-memory indexing capabilities, allows execution of SQL queries many times faster than native Spark RDDs or Data Frames.
Akmal Chaudhri has over 25 years' experience in IT and has previously held roles as a developer, consultant, product strategist and technical trainer. He has worked for several blue-chip companies such as Reuters and IBM, and also the Big Data startups Hortonworks (Hadoop) and DataStax (Cassandra NoSQL Database). He holds a BSc (1st Class Hons.) in Computing and Information Systems, an MSc in Business Systems Analysis and Design and a PhD in Computer Science. He is a Member of the British Computer Society (MBCS) and a Chartered IT Professional (CITP).
Susanne Chishti, FINTECH Circle | Luca Romagnoli, Salesforce | Sune Gabelgård, Nets | Philip Pointner, Jumio | Paul Hamilton
Data is the new oil. How can financial services organisations use data to revolutionise their offerings and provide a better customer experience?
Join this panel to find out:
- How can banks help customers put their data to good use? What have we learned from Open Banking and PSD2?
- How can AI and predictive intelligence be used to avoid risks and provide a more seamless experience?
- Challenges with data management and aggregation and how to solve them
- How data analytics is impacting financial services
Susanne Chishti, Founder of FINTECH Circle
Luca Romagnoli, Director of Regulated Industries at Salesforce
Sune Gabelgård, Head of Digital Fraud, Intelligence & Research, Nets
Philip Pointner, Chief Product Officer, Jumio
Paul Hamilton, Innovation and FS Expert, PA Consulting
This talk tells the story of the implementation and optimization of a sparse logistic regression algorithm in Spark. I would like to share the lessons I learned and the steps I had to take to improve the speed of execution and convergence of my initial naive implementation. The aim isn’t to convince the audience that logistic regression is great and my implementation is awesome; rather, the talk gives details about how it works under the hood, and general tips for implementing an iterative parallel machine learning algorithm in Spark.
The talk is structured as a sequence of “lessons learned” that are shown in the form of code examples building on the initial naive implementation. The performance impact of each “lesson” on execution time and speed of convergence is measured on benchmark datasets.
You will see how to formulate logistic regression in a parallel setting, how to avoid data shuffles, when to use a custom partitioner, how to use the ‘aggregate’ and ‘treeAggregate’ functions, how momentum can accelerate the convergence of gradient descent, and much more. I will assume a basic understanding of machine learning and some prior knowledge of Spark. The code examples are written in Scala, and the code will be made available for each step in the walkthrough.
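To make the ideas in this abstract concrete, here is a minimal sketch of parallel-style sparse logistic regression with momentum. It is not the speaker's code (the talk's examples are in Scala on Spark) and it runs without Spark: plain Python lists stand in for partitions, and the hypothetical `tree_combine` helper only imitates the pairwise merging pattern that a tree-style aggregation uses. All names, the toy data, and the hyperparameters are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]        # feature index -> value
Example = Tuple[SparseVec, int]     # (features, label in {0, 1})
Partial = Tuple[SparseVec, int]     # (summed gradient, example count)


def sigmoid(z: float) -> float:
    # Branch on the sign of z to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], x: SparseVec) -> float:
    return sigmoid(sum(w[i] * v for i, v in x.items()))


def partition_gradient(part: List[Example], w: List[float]) -> Partial:
    """Local step: sum log-loss gradients over one partition, no data movement."""
    grad: SparseVec = {}
    for x, y in part:
        err = predict(w, x) - y
        for i, v in x.items():
            grad[i] = grad.get(i, 0.0) + err * v
    return grad, len(part)


def combine(a: Partial, b: Partial) -> Partial:
    grad = dict(a[0])
    for i, v in b[0].items():
        grad[i] = grad.get(i, 0.0) + v
    return grad, a[1] + b[1]


def tree_combine(results: List[Partial]) -> Partial:
    """Merge partial results pairwise, level by level, instead of all at once."""
    while len(results) > 1:
        results = [
            combine(results[k], results[k + 1]) if k + 1 < len(results) else results[k]
            for k in range(0, len(results), 2)
        ]
    return results[0]


def train(partitions: List[List[Example]], dim: int,
          lr: float = 0.5, momentum: float = 0.8, iters: int = 300) -> List[float]:
    w = [0.0] * dim
    velocity = [0.0] * dim
    for _ in range(iters):
        grad, n = tree_combine([partition_gradient(p, w) for p in partitions])
        for i in range(dim):
            # Momentum: keep a decaying running direction, then step along it.
            velocity[i] = momentum * velocity[i] - lr * grad.get(i, 0.0) / n
            w[i] += velocity[i]
    return w


# Toy data: feature 0 is a bias term, feature 1 marks positives, feature 2 negatives.
partitions = [
    [({0: 1.0, 1: 1.0}, 1), ({0: 1.0, 2: 1.0}, 0)],
    [({0: 1.0, 1: 2.0}, 1), ({0: 1.0, 2: 2.0}, 0)],
    [({0: 1.0, 1: 1.5}, 1)],
]
weights = train(partitions, dim=3)
```

Only the small per-partition gradient summaries are merged, never the examples themselves, which is the property that lets this formulation avoid shuffling the training data between iterations.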
Lorand is a data scientist working on risk management and fraud prevention for the payment processing system of Zalando, the leading fashion platform in Europe. Previously, Lorand has developed highly scalable low-latency machine learning algorithms for real-time bidding in online advertising.
Robert Cruz, Senior Director, Information Governance Practice, Smarsh
During this presentation, Robert Cruz, Senior Director/Information Governance Practice at Smarsh will discuss the governance challenges of today’s data – namely, the growth of social, mobile, and rich, dynamic content. Cruz will also look at the process of governing your data in the cloud and provide tips and best practices for qualifying cloud services providers for data availability, performance and extraction.
Romain Fouache, Dataiku | Maciej Dabrowski, Genesys | Richard Corderoy, Oakland Data | Kevin Hannon, VP, Unravel Data
The market for big data technologies continues to accelerate as big data becomes an increasingly integral part of business operations worldwide. And, as data analytics tools and solutions have matured, businesses have been able to leverage the insights from their data at a faster pace than ever before.
Discover the ways in which businesses are applying their big data insights to achieve real-world results.
- How successful businesses are incorporating big data analytics into their digital strategy and seeing real results
- How to overcome common challenges and pitfalls when implementing your big data analytics solutions
- How emerging technologies like machine learning and AI are evolving big data insights
- and more!
Bas Geerdink, Technology Lead, Labs Innovation Office, ING
Romain Fouache, VP Strategy, Dataiku
Maciej Dabrowski, Chief Data Scientist, Genesys
Richard Corderoy, Chief Data Officer, Oakland Data and Analytics
Kevin Hannon, VP, Unravel Data
Leveraging data lakes is no longer a technical challenge: AWS, Azure and GCP make it easy to provision and harness the technical infrastructure needed to leverage your data. The issues that prevent data exploitation are all of the non-technical drivers that we have been talking about for years. In this talk we will run through the following provocations:
1. We don't need governance, we have a data lake.
2. We have all our data in one place so we know what data we have.
3. We have a data lake so we now have a data strategy.
4. We have shiny new tools, so we don't need to think about data engineers.
5. We do agile so we will get results.
Register for our webinar and get the inside track on how to leverage cloud data lakes.
Gary Richardson, MD, Emerging Technology
Gary is the Managing Director for Emerging Technology at 6point6. With over 17 years of consulting experience, Gary leads a team of data scientists and data engineers in the agile development of Blockchain, AI and Machine Learning solutions. The focus of the team is bringing a collaborative approach to analytics, underpinned by machine learning and data engineering. He believes mainstream business adoption of AI solutions is the key to accelerating innovation, enabling businesses to compete, reduce cost and ensure compliance.
Prior to joining 6point6, Gary was the Head of Data Engineering at a Big 4 consulting firm, focussing on blockchain and bringing sound data engineering to the world of AI.
Managing and analyzing data to inform business decisions
Data is the foundation of any organization, so it is paramount that it be managed and maintained as a valuable resource.
Subscribe to this channel to learn best practices and emerging trends in a variety of topics including data governance, analysis, quality management, warehousing, business intelligence, ERP, CRM, big data and more.