Memory is the New Disk: Fast Data Analytics with Apache Spark
Apache Spark (http://spark.apache.org/) is emerging as the next generation of technology for data processing. Compared to Hadoop MapReduce, Spark is easier to use and maintain, and it excels at in-memory computation, where it can run many times faster than MapReduce; both factors are driving its popularity. This speed has also made Spark a de facto platform for machine learning algorithms at scale.
In this presentation, the Spark platform will be introduced to the audience by:
• Walking through interesting features of Spark
• Comparing and contrasting it with Hadoop
• Highlighting Spark use cases and its ecosystem
Recorded: Mar 12 2015, 39 mins
Hélène Lyon, IBM, Distinguished Engineer, IBM Z Solutions Architect
IT is a key player in the digital and cognitive transformation of business processes, delivering solutions that improve business value with analytics. This session explains, step by step, the journey to secure production while adopting new analytics technologies that leverage core mainframe business assets.
Data Scientists are rare and highly valued individuals, and for good reason: making sense of data, and using the machine learning libraries requires an unusual blend of advanced skills. Why is it then that Data Scientists spend the majority of their time getting data ready for models, and a fraction actually doing the high value work?
In this talk we introduce the concept of Data Fabric, a new way to provide a self-service model for data, where data scientists can easily discover, curate, share, and accelerate data analysis using Python, R, and visualization tools, no matter where the data is managed, no matter the structure, and no matter the size.
We will talk through the role of Apache Arrow, the in-memory columnar data standard that is accelerating analytics for GPU-based processing, as well as the role of Pandas and Arrow in providing unprecedented speed in accessing datasets from Python.
Dr Tom Parsons, Research Director and Mitchell Murphy, Data Scientist from Spotlight Data
Researchers generate huge amounts of valuable unstructured data and articles from research every day. The potential for this information is huge: cancer and pharmaceutical breakthroughs, advances in technology and cultural research that can improve the world we live in.
This webinar discusses how text mining and Machine Learning can be used to make connections across this broad range of files and help drive innovation and research. We discuss using Kubernetes microservices to analyse the data and then applying Machine Learning and graph databases to simplify the reuse of the data.
JS Gourevitch, Luca Schnettler, Petra Wildermann, Anil Celik, Thomas Lethenborg
The data economy and digital technologies are deeply transforming almost all areas of our lives. Insurance and healthcare are among the most heavily transformed, with a number of interesting developments that may redefine the way we take care of ourselves and the way we consume and use insurance.
This webinar explores how data is being used to: better support mental health patients, carers and medical personnel with treatment; assess the risk of developing a broad range of illnesses and engage users with personalised healthy-life plans; track down and prepare for epidemics with big data and analytics; better cover cars and drivers with car insurance; and help insurers engage with customers through social media data. It offers a fascinating exploration of the opportunities, risks and new models supporting the digital transformation in banking.
Moderated by Jean-Stéphane Gourévitch
Luca Schnettler, CEO and Founder, HealthyHealth (UK)
Petra Wildermann, Business Development Director, Metabiota (Switzerland)
Anil Celik, Co-founder and CEO Urbanstats (US)
Thomas Lethenborg, CEO, Monsenso (Denmark)
Public cloud deployments have become irresistible in terms of flexibility, low barriers to entry, security, and developer friendliness. But the sheer inertia of traditional data lakes makes them difficult to transition to the cloud. In this talk we'll look at examples of how leading companies have made the transition using open source technologies and hybrid strategies.
Rather than following a "lift and shift" strategy for moving data lake workloads to the cloud, teams should weigh considerations unique to the cloud alongside traditional approaches to compute (e.g., GPU, FPGA), storage (object store vs. file store), integrations, and security.
Viewers will take away techniques they can immediately apply to their own projects.
Maloy Manna, PM Engineering, AXA Data Innovation Lab, Paris
The concept of data lakes evolved to address challenges and opportunities in managing big data.
Organizations are investing massive amounts of time and money to upgrade existing data infrastructures and build data lakes whether on-premises or in the cloud.
This talk will discuss architectures and design options to implement data lakes with open source tools. Also covered are challenges of upgrade & migration from existing data warehouses, metadata management, supporting self-service and managing production deployments.
Hélène Lyon, IBM, Distinguished Engineer, IBM Z Solutions Architect
As an enterprise customer, you are potentially using IBM Z in a hybrid cloud implementation. Let's understand how to benefit from cloud access to mainframe data without moving it outside Z, thereby improving security, reducing integration challenges and answering your GDPR auditor's needs.
Iver van de Zand will present and demo the latest SAP innovations for analytics in the cloud. Key themes are live connectivity and the closed loop of combined business intelligence, planning and predictive analytics, all in one environment that is fully ready for big data.
David Siegel, Blockchain, decentralization and business agility expert
Still confused about this whole Blockchain thing? Interested in investing in digital currencies, but not sure where to start? Want to get a better idea of the threats and opportunities?
David Siegel is a Blockchain, decentralization and business agility expert who has been a high-level management & strategy consultant to companies like Sony, Hewlett Packard, Amazon, NASA, Intel, and many start-ups. David has been praised for being able to explain Blockchain in the most simple and interesting way.
What you will learn:
-What is Bitcoin?
-What is the blockchain?
-What is Ethereum? What is Ether?
-What is a distributed application?
-What is a smart contract?
-What is a triple ledger?
-What about identity and security?
-What business models are at risk?
-What are the opportunities?
-What should we do?
Vivek Bajaj, Global VP of Solutions for IBM Financial Services
Today the payments industry faces a rebirth by necessity. Financial institutions process massive volumes of customer and payments transaction data, much of it unstructured and untapped.
Cognitive systems have the ability to understand, reason and learn. In financial services, applying cognitive capabilities to real-world payments issues such as safer and faster payments is yielding significant results. Furthermore, risk and compliance and segment-of-one engagement are areas where ROI is tremendous when advanced analytics and artificial intelligence are used together.
Learn from real world use cases of how financial institutions globally have gained significant competitive advantage by becoming a truly Cognitive Bank.
HDFS on Kubernetes: Lessons Learned is a webinar presentation intended for software engineers, developers, and technical leads who develop Spark applications and are interested in running Spark on Kubernetes. Pepperdata has been exploring Kubernetes as a potential Big Data platform with several other companies as part of a joint open source project.
In this webinar, Kimoon Kim will show you how to:
–Run Spark applications natively on Kubernetes
–Enable Spark on Kubernetes to read and write data securely on HDFS protected by Kerberos
Dr. Umesh Hodeghatta Rao, CTO, Nu-Sigma Analytics Labs
Data visualization must be intuitive in order for non-IT business leaders to see data patterns. Representing data in a graphical or pictorial format is easy, but constructing the data in the best and most logical way can be tricky.
In this session, Umesh will talk about how to represent data simply to make quicker and better business decisions. He will walk through several data visualization techniques through business cases and examples. By the end of the session, you will not only know different data visualization techniques, but also have an understanding of circumstances under which each technique should be used and the best way to represent particular data sets for different business cases.
Predictive Analytics - everyone is talking about it and many organisations claim to be doing it. But are they? And what insights do they gain to then make tactical or strategic changes? How can analysts work with decision makers by sharing results in a visually effective and meaningful way while also informing them about possible courses of action?
This webinar is presented by Andy Kriebel, Head Coach at the Data School and Eva Murray, Tableau Evangelist at Exasol. Our guest speaker on Predictive Analytics is Benedetta Tagliaferri, Consulting Analyst at The Information Lab.
The webinar will look at some examples of predictive analysis and will show data visualization examples that are actionable and can drive further questions and discussions in an organisation.
Carl Edwards, BI Consultant, Brett Churchill, BI Consultant
Looking to take your graphs to the next level? Want to make sure you choose the right visualization? Plagued by the challenges of geospatial heat maps?
Get your questions ready and join this session where data experts Carl and Brett will go over the common questions they get asked and answer all the data visualization issues you've been plagued with, including how to:
-Use location-based data to put your visualization on the map
-Uncover new relationships, patterns and opportunities
-Identify emerging trends
-Answer comparative business questions with set analysis
-Understand best practices for creating an aesthetically-pleasing and useful visualization
When analysis is used by decision makers who didn’t create it, the communication of the information and the message it conveys become critical. There is a plethora of ways to lay out reports and dashboards, even within a single company.
Enter the SUCCESS formula, that “lightbulb” moment.
Introduced by the IBCS Association (International Business Communication Standards) the SUCCESS formula provides conceptual, perceptual and semantic rules that enable faster, better, and less-costly results in all stages of business communications and decision-making processes.
This webinar will introduce the 7 Rules of SUCCESS, which provide a toolkit to help analysts design visualisations that reach their target audience and support better decisions.
The webinar will also introduce the Philips journey to implementing IBCS principles in their global "Accelerate!" initiative.
Marwa Ayad Mohamed (Founder of YourChildCode, Team Lead Software Engineer, Women Techmakers Cairo Lead)
TensorFlow is an open source software library for numerical computation and machine learning.
Join this session where Marwa will discuss:
-Introduction to Artificial intelligence, machine learning and deep learning
-Sample of machine learning applications
-TensorFlow story, model and Windows installation steps, with an object recognition demo
Arinze Akutekwe, PhD Data Scientist, BAS EMEIA – Intelligent Enterprise - Analytics at Fujitsu
Artificial intelligence has greatly changed the way we live since the 20th century. It involves the science and engineering of making machines intelligent and autonomous using computer programs.
The processing power of computers has been increasing exponentially while the cost of processors and storage has been decreasing. This has made research and development efforts in AI areas such as deep learning, once thought to be impossible, possible.
In this webinar, we will examine current methods, application domains of specific methods, their impacts on our daily lives and try to answer questions on ethics of applying these technologies.
Fraud detection is a classic adversarial analytics challenge: As soon as an automated system successfully learns to stop one scheme, fraudsters move on to attack another way. Each scheme requires looking for different signals (i.e. features) to catch; is relatively rare (one in millions for finance or e-commerce); and may take months to investigate a single case (in healthcare or tax, for example) – making quality training data scarce.
This talk will cover, through a code walk-through, the key lessons learned while building such real-world software systems over the past few years. We'll look for fraud signals in public email datasets, using IPython and popular open-source libraries (scikit-learn, statsmodels, nltk, etc.) for data science and Apache Spark as the compute engine for scalable parallel processing.
David will iteratively build a machine-learned hybrid model – combining features from different data sources and algorithmic approaches, to catch diverse aspects of suspect behavior:
- Natural language processing: finding keywords in relevant context within unstructured text
- Statistical NLP: sentiment analysis via supervised machine learning
- Time series analysis: understanding daily/weekly cycles and changes in habitual behavior
- Graph analysis: finding actions outside the usual or expected network of people
- Heuristic rules: finding suspect actions based on past schemes or external datasets
- Topic modeling: highlighting use of keywords outside an expected context
- Anomaly detection: fully unsupervised ranking of unusual behavior
Apache Spark is used to run these models at scale – in batch mode for model training and with Spark Streaming for production use. We’ll discuss the data model, computation, and feedback workflows, as well as some tools and libraries built on top of the open-source components to enable faster experimentation, optimization, and productization of the models.
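As a rough illustration of the hybrid idea described in this abstract, the sketch below blends two of the listed signals: a keyword heuristic and an unsupervised ranking of unusual message volume. It is not the speaker's code. The watch-list, the function names, the z-score spike measure and the 50/50 weighting are all assumptions made for this example, and it uses only the Python standard library rather than scikit-learn or Spark.

```python
from statistics import mean, pstdev

# Hypothetical watch-list; a real system would derive terms from past schemes.
SUSPECT_KEYWORDS = {"offshore", "shred", "backdate"}


def keyword_score(text):
    """Fraction of watch-list keywords present (naive whitespace tokenization)."""
    words = set(text.lower().split())
    return len(words & SUSPECT_KEYWORDS) / len(SUSPECT_KEYWORDS)


def volume_zscores(daily_counts):
    """Z-score of each day's message count against the sender's own history."""
    if not daily_counts:
        return []
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)
    if sigma == 0:
        # Perfectly regular behavior: nothing stands out.
        return [0.0 for _ in daily_counts]
    return [(count - mu) / sigma for count in daily_counts]


def rank_senders(messages, daily_counts, keyword_weight=0.5):
    """Rank senders by a weighted blend of keyword and volume-anomaly signals.

    messages:     dict mapping sender -> list of message strings
    daily_counts: dict mapping sender -> list of per-day message counts
    """
    scores = {}
    for sender, texts in messages.items():
        # Heuristic signal: strongest keyword hit across the sender's messages.
        kw = max((keyword_score(t) for t in texts), default=0.0)
        # Anomaly signal: largest positive volume spike, squashed into [0, 1).
        spike = max(volume_zscores(daily_counts.get(sender, [])), default=0.0)
        spike = max(spike, 0.0)
        anomaly = spike / (1.0 + spike)
        scores[sender] = keyword_weight * kw + (1 - keyword_weight) * anomaly
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_senders` with a dictionary of messages and a dictionary of daily counts per sender returns `(sender, score)` pairs sorted from most to least suspicious. In a Spark setting, the per-sender scoring would be the natural unit to parallelize.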