How to Share State Across Multiple Apache Spark Jobs using Apache Ignite

Attend this session to learn how to share state in memory across multiple Spark jobs, either within the same application or between different Spark applications, using an implementation of the Spark RDD abstraction provided by Apache Ignite. The talk explains in detail how IgniteRDD, an implementation of the native Spark RDD and DataFrame APIs, shares the state of an RDD across Spark jobs, applications, and workers. Examples show how IgniteRDD, with its in-memory indexing capabilities, can execute SQL queries many times faster than native Spark RDDs or DataFrames.
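
IgniteRDD itself is used from Scala or Java, so the following is only a conceptual sketch in plain Python, not Ignite code. Under stated assumptions, it models the two ideas in the abstract. First, state lives in a store that outlives the job that wrote it, so a second job can query that state without recomputing it. Second, a sorted index answers a range predicate by binary search instead of scanning every entry. `SharedStore`, `writer_job`, and `reader_job` are hypothetical names.

```python
import bisect


class SharedStore:
    """Hypothetical in-memory key-value store that outlives the jobs using it.

    Alongside the key -> value dict it keeps a sorted index of (value, key)
    pairs, so a range predicate on the value can be answered by binary search.
    """

    def __init__(self):
        self._data = {}
        self._index = []  # sorted list of (value, key) pairs

    def put(self, key, value):
        if key in self._data:
            # Drop the stale index entry before overwriting the value.
            old = (self._data[key], key)
            del self._index[bisect.bisect_left(self._index, old)]
        self._data[key] = value
        bisect.insort(self._index, (value, key))

    def scan_range(self, lo, hi):
        """Unindexed query: examines every entry to find lo < value < hi."""
        return sorted(k for k, v in self._data.items() if lo < v < hi)

    def index_range(self, lo, hi):
        """Indexed query: binary-searches the sorted index for lo < value < hi."""
        start = bisect.bisect_right(self._index, (lo, float("inf")))
        end = bisect.bisect_left(self._index, (hi, float("-inf")))
        return sorted(k for _, k in self._index[start:end])


def writer_job(store):
    """First 'job': materializes state into the shared store."""
    for i in range(1, 1001):
        store.put(i, i)


def reader_job(store):
    """Second 'job': queries state it did not compute itself."""
    return store.index_range(10, 100)


store = SharedStore()
writer_job(store)
result = reader_job(store)
print(result[:3], len(result))  # → [11, 12, 13] 89
```

In Ignite's own Scala API, roughly the same roles are played by `IgniteContext`, `IgniteRDD.savePairs`, and `IgniteRDD.sql`, with the data held in Ignite nodes rather than in a Python object. Check the Ignite documentation for the exact signatures.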

Akmal Chaudhri has over 25 years of experience in IT and has previously held roles as a developer, consultant, product strategist, and technical trainer. He has worked for several blue-chip companies, such as Reuters and IBM, and for the Big Data startups Hortonworks (Hadoop) and DataStax (Cassandra NoSQL database). He holds a BSc (1st Class Hons.) in Computing and Information Systems, an MSc in Business Systems Analysis and Design, and a PhD in Computer Science. He is a Member of the British Computer Society (MBCS) and a Chartered IT Professional (CITP).
Recorded Mar 28 2018 42 mins
Presented by
Akmal Chaudhri, Technology Evangelist, GridGain Systems

Other webinars in this channel:
  • Fight gaming fraud with AI and machine learning Jul 31 2018 5:00 pm UTC 60 mins
    Jeff Sakasegawa, Trust and Safety Architect, Sift Science
    Globally, there are 2.2 billion active gamers, and 47 percent of them shell out cash while they play. And all of them are at risk from fraudsters who rip off everything from a gamer’s identity and credit cards to online goods and trust in your company. With every instance of fraud, your reputation takes a nosedive, driving away customers and directly impacting your bottom line.

    But fraud is notoriously difficult to combat. Legacy rules-based approaches have never been able to keep up with fraudsters, who constantly evolve their techniques using sophisticated technology like automated scripts and bots.

    That’s why machine learning and artificial intelligence are being leveraged to detect fraud before it affects your company and end users. Machine learning can sift through billions of game events and analyze vast streams of data in real time to stop fraud in its tracks.

    To learn more about how machine learning and AI can keep your game and players safe from increasingly aggressive online criminals, don’t miss this VB Live event!

    In this webinar, you'll learn:
    * How the gaming industry can secure gamer data and build trust
    * How account takeover, fake licensing, spam, and scams pose a particular challenge to gamers and gaming platforms
    * What policies your company should have in place around data breach ransom
    * How to combat trolling

    Speakers:
    * Jeff Sakasegawa, Trust and Safety Architect, Sift Science
    * Dean Takahashi, Lead Writer, GamesBeat
    * Scott Adams, CEO, FraudPvP.com; Former Director of Fraud & Risk, Riot Games
    * Rachael Brownell, Moderator, VentureBeat

    Sponsored by Sift Science
  • Does it matter if an algorithm can't explain how it knows what it knows? Recorded: May 24 2018 34 mins
    Beau Walker, Founder, Method Data Science
    With the General Data Protection Regulation (GDPR) becoming enforceable in the EU on May 25, 2018, many data scientists are worried about how this regulation, and similar initiatives in other countries that give consumers a "right to explanation" of decisions made by algorithms, will affect the field of predictive and prescriptive analytics.

    In this session, Beau will discuss the role of interpretable algorithms in data science as well as explore tools and methods for explaining high-performing algorithms.

    Beau Walker has a Juris Doctor (law degree) and BS and MS degrees in Biology and in Ecology and Evolution. Beau has worked in many domains, including academia, pharma, healthcare, life sciences, insurance, legal, financial services, marketing, and IoT.
  • Semantic AI: Bringing Machine Learning and Knowledge Graphs Together Recorded: May 23 2018 64 mins
    Kirk Borne, Principal Data Scientist, Booz Allen Hamilton & Andreas Blumauer, CEO and Managing Partner, Semantic Web Company
    Implementing AI applications based on machine learning is a significant topic for organizations embracing digital transformation. According to Gartner’s Top 10 Strategic Technology Trends for 2018: Intelligent Apps and Analytics, 30% of CIOs will include AI in their top five investment priorities by 2020. But to deliver on the AI promise, organizations need to generate good-quality data to train the algorithms. Failing to do so leads to the following scenario: "When you automate a mess, you get an automated mess."

    This webinar covers:

    - An introduction to machine learning use cases and challenges provided by Kirk Borne, Principal Data Scientist at Booz Allen Hamilton and top data science and big data influencer.
    - How to achieve good data quality based on harmonized semantic metadata presented by Andreas Blumauer, CEO and co-founder of Semantic Web Company and a pioneer in the application of semantic web standards for enterprise data integration.
    - How to apply a combined approach when semantic knowledge models and machine learning build the basis of your cognitive computing. (See Attachment: The Knowledge Graph as the Default Data Model for Machine Learning)
    - Why a combination of machine and human computation approaches is required, not only from an ethical but also from a technical perspective.
  • How AI is Changing Marketing Recorded: May 17 2018 43 mins
    Gil Allouche, CEO, Metadata.io
    In this webinar, Metadata.io CEO Gil Allouche will talk about the different ways AI is being used by marketers. From analyzing data to orchestrating new marketing campaigns, AI is powering marketing activities in new and exciting ways and affecting interactions throughout the entire customer lifecycle. As an example of how AI can have a tremendous impact on marketing practices, Gil will focus on its role in lead generation. Webinar attendees will learn:

    - What Machine Learning is in relation to AI and how it connects your data to find patterns
    - Examples of how machine learning can identify target audiences, including the 20 percent that creates 80 percent of your revenue
    - How AI technology can help marketers prioritize their budgets to focus on the most effective programs
    - Why starting with small, iterative uses of AI in marketing can be the most effective way to understand what will yield the most ROI

    Gil Allouche founded Metadata.io to make demand generation easy for non-technical marketers. The Metadata.io platform and AI Operator evolved from Gil's experiences hacking various marketing and CRM systems to get the solutions he needed.
  • Customer-Centered AI: A Radical Strategy Recorded: May 16 2018 34 mins
    Geordie Kaytes, Partner, Heroic
    AI is a powerful tool, but companies often get more excited about their technology than about the customer value they’re creating. Geordie Kaytes will share a framework for building customer-centered AI products. You’ll learn how to craft a far-reaching vision and strategy centered around customer needs and how to balance that vision with the day-to-day needs of your company.

    Learn a framework for creating and communicating a vision that describes the overall direction of your AI product, a defined product strategy, a cross-functional roadmap aligned with the strategy, and a list of metrics that track progress towards the strategy.

    About the Speaker: Geordie Kaytes is the director of UX strategy for Boston-area UI/UX studio Fresh Tilled Soil and a partner at Heroic (https://www.heroicteam.com), a design leadership coaching firm that helps growing companies scale their digital product capabilities. A digital product design leader with deep experience in design process transformation and cross-functional expertise in design, strategy, and technology, Geordie has helped companies in a broad range of industries develop a 360-degree view of their product design processes. Previously, he did his obligatory tour of duty in management consulting. He holds a BA from Yale in political science. He is a coauthor of the Medium publication Radical Product.
  • Intelligent Agents and a New Class of Perceived Errors Recorded: May 16 2018 47 mins
    Dennis R. Mortensen, CEO and Founder, x.ai
    As we move to conversational UIs and take advantage of NLP and AI in general, the way we interact with technology changes dramatically. The standard GUI is often eliminated entirely, leading to novel UX challenges. Invisible or seamless software removes tasks from the user’s oversight, and the output is not always what the user expected. Sometimes, though, that output is correct within the parameters given and is simply perceived as an error.

    Dennis will talk through where x.ai has encountered error perception issues as we seek to develop frictionless software, how we thought about the problem and the communication strategies we’re exploring to resolve it.
  • Network Telemetry & Analytics in the Age of Big Data & AI Recorded: May 15 2018 35 mins
    Ruturaj Pathak, Senior Product Manager, Networking BU, Inventec
    We are seeing a sea change in networking. SDN has enabled improvements in network telemetry and analytics.

    In this presentation, I will talk about the current challenges that are out there and how the technology change is helping us to improve the overall network telemetry. Furthermore, I will share how deep learning techniques are being used in this field. Please join this webinar to understand how the field of network telemetry is changing.
  • The Predictive Bank of the Future: How AI will Change Banking Forever Recorded: May 15 2018 47 mins
    Tariq Ali Asghar, CEO, Emerging Star Investment Group
    This webinar explains how Big Data, Artificial Intelligence, and Machine Learning are going to transform the banking industry. Banks that manage this Big Data evolution successfully will survive and thrive, delivering more holistic and personalized customer service and thereby increasing their revenues tremendously.

    The key takeaway from this Webinar is that “Right information at the right place and the right time is going to be the real money and will shape the future of Banking Industry.”

    Tariq is a fintech expert, writer, and thinker based in Toronto, Canada, and is currently working on an initiative to disrupt the conventional banking industry with his startup’s “Big Data Predictive Analytics Model”.
  • An Introduction to Deep Learning Recorded: May 15 2018 64 mins
    Mustafa Kabul, Principal Data Scientist, SAS
    In this webinar, Mustafa Kabul, Principal Data Scientist, SAS, will provide an introduction to deep learning and its applications.

    Mustafa is a data scientist in Artificial Intelligence and Machine Learning R&D at SAS, where he leads innovative projects for SAS’s next-generation AI-enabled analytics products, including applications of deep learning. His current focus is on applying deep reinforcement learning to operational problems in the CRM and IoT spaces. An operations research expert working at the interface of machine learning and optimization, he previously developed distributed, large-scale integer optimization algorithms for marketing optimization problems. Ever the optimization enthusiast, Mustafa is always looking for ways to improve algorithms; nowadays his favorites are distributed stochastic gradient and online learning methods. Mustafa holds a PhD from the University of North Carolina at Chapel Hill, where his research focused on game-theory models of supply chains selling to strategic customers.
  • Implementing a Sparse Logistic Regression Algorithm in Apache Spark Recorded: Mar 29 2018 39 mins
    Lorand Dali, Data Scientist, Zalando
    This talk tells the story of implementing and optimizing a sparse logistic regression algorithm in Spark. I would like to share the lessons I learned and the steps I had to take to improve the execution speed and convergence of my initial naive implementation. The aim isn’t to convince the audience that logistic regression is great and my implementation is awesome; rather, the talk gives details about how it works under the hood, along with general tips for implementing an iterative parallel machine learning algorithm in Spark.

    The talk is structured as a sequence of “lessons learned” that are shown in the form of code examples building on the initial naive implementation. The performance impact of each “lesson” on execution time and speed of convergence is measured on benchmark datasets.

    You will see how to formulate logistic regression in a parallel setting, how to avoid data shuffles, when to use a custom partitioner, how to use the ‘aggregate’ and ‘treeAggregate’ functions, how momentum can accelerate the convergence of gradient descent, and much more. I will assume a basic understanding of machine learning and some prior knowledge of Spark. The code examples are written in Scala, and the code for each step in the walkthrough will be made available.
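
    Here is a minimal single-machine sketch of these ideas in plain Python (not Spark, and not the speaker's Scala code). Each "partition" computes a partial logistic-loss gradient over sparse rows. The partial results are then combined pairwise in a tree, which is the idea behind Spark's `treeAggregate`, and the weights are updated with heavy-ball momentum. The function names, the toy data, and the hyperparameters (`lr`, `momentum`, `steps`) are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partition_gradient(weights, partition):
    """Summed logistic-loss gradient over one partition.

    Each row is (features, label): features is a sparse dict {index: value},
    label is 0 or 1. Only indices that occur in the rows are touched.
    """
    grad = {}
    for features, label in partition:
        margin = sum(weights.get(i, 0.0) * x for i, x in features.items())
        error = sigmoid(margin) - label
        for i, x in features.items():
            grad[i] = grad.get(i, 0.0) + error * x
    return grad


def merge(a, b):
    """Add two sparse gradient dicts."""
    out = dict(a)
    for i, g in b.items():
        out[i] = out.get(i, 0.0) + g
    return out


def tree_aggregate(values, combine):
    """Combine partial results pairwise, level by level (about log2(n) levels)."""
    level = list(values)
    while len(level) > 1:
        merged = [combine(level[j], level[j + 1]) for j in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # odd element passes through to the next level
        level = merged
    return level[0]


def train(partitions, n_rows, steps=200, lr=0.5, momentum=0.9):
    """Full-batch gradient descent on the mean loss with heavy-ball momentum."""
    weights, velocity = {}, {}
    for _ in range(steps):
        grad = tree_aggregate((partition_gradient(weights, p) for p in partitions), merge)
        for i, g in grad.items():
            velocity[i] = momentum * velocity.get(i, 0.0) - lr * g / n_rows
            weights[i] = weights.get(i, 0.0) + velocity[i]
    return weights


def predict(weights, features):
    return sigmoid(sum(weights.get(i, 0.0) * x for i, x in features.items()))


# Toy data: feature 0 is a constant bias term, feature 1 is the signal.
partitions = [
    [({0: 1.0, 1: -2.0}, 0), ({0: 1.0, 1: 1.0}, 1)],
    [({0: 1.0, 1: -1.0}, 0)],
    [({0: 1.0, 1: 2.0}, 1)],
]
weights = train(partitions, n_rows=4)
print(predict(weights, {0: 1.0, 1: 2.0}) > 0.5)  # → True
```

    In Spark, the per-partition gradients would be computed on executors. `treeAggregate` adds intermediate combining levels, so the driver does not have to receive every partition's result directly.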

    Lorand is a data scientist working on risk management and fraud prevention for the payment processing system of Zalando, the leading fashion platform in Europe. Previously, Lorand has developed highly scalable low-latency machine learning algorithms for real-time bidding in online advertising.
  • Having fun with Raspberry(s) and Apache Projects Recorded: Mar 29 2018 49 mins
    Jean-Frederic Clere, Manager, Software Engineering, Red Hat
    You can do a lot with a Raspberry Pi and ASF projects, from a tiny object connected to the internet to a small server application. The presentation will explain and demo the following:

    - Raspberry Pi as a small server and captive portal using httpd/Tomcat.
    - Raspberry Pi as an IoT sensor collecting data and sending it to ActiveMQ.
    - Raspberry Pi as a Modbus supervisor controlling an Industruino (industrial Arduino) and connected to ActiveMQ.
  • Comparing Apache Ignite & Cassandra for Hybrid Transactional Analytical Apps Recorded: Mar 28 2018 61 mins
    Denis Magda, Director of Product Management, GridGain Systems
    Over the last decade, 10x growth in transaction volumes, 50x growth in data volumes, and the drive for real-time visibility and responsiveness have pushed traditional technologies, including databases, beyond their limits. Your choices are either to buy expensive hardware to accelerate the wrong architecture, or to do what other companies have started to do and invest in technologies used for modern hybrid transactional/analytical processing (HTAP) applications.

    Learn some of the current best practices in building HTAP applications, and the differences between two of the more common technologies companies use: Apache® Cassandra™ and Apache® Ignite™. This session will cover:

    - The requirements for real-time, high volume HTAP applications
    - Architectural best practices, including how in-memory computing fits in and has eliminated tradeoffs between consistency, speed and scale
    - A detailed comparison of Apache Ignite and Apache Cassandra for HTAP applications

    About the speaker: Denis Magda is the Director of Product Management at GridGain Systems and Vice President of the Apache Ignite PMC. He is an expert in distributed systems and platforms who actively contributes to Apache Ignite and helps companies and individuals deploy it for mission-critical applications. You can often find Denis at conferences, workshops, and other events, sharing his knowledge about use cases, best practices, and implementation tips and tricks for building efficient applications with in-memory data grids, distributed databases, and in-memory computing platforms, including Apache Ignite and GridGain.

    Before joining GridGain and becoming part of the Apache Ignite community, Denis worked for Oracle, where he led the Java ME Embedded Porting Team, helping bring Java to IoT.
  • Neural Networks/Deep Learning to Transform Modern AI Platform Recorded: Feb 22 2018 64 mins
    Dr. Umesh Hodeghatta Rao, CTO, Nu-Sigma Analytics Labs
    AI is changing the way organizations do business and how they interact with customers, and it continues to drive that change. Deep Learning and Natural Language Processing will become standard in AI solutions. Deep Learning is based on brain simulations and uses deep neural networks. AlphaGo is the first AI system to defeat a professional human Go player, the first program to defeat a Go world champion, and arguably the strongest Go player in history. Baidu improved speech recognition accuracy from 89% to 99% using Deep Learning. Today, every AI and machine learning scientist is expected to know Deep Learning tools.

    In this session, we will discuss what Deep Learning is and why it is gaining popularity, and we will explain AI solutions built with Deep Learning through a practical example. Deep Learning has an edge over other machine learning techniques because its performance keeps improving as the volume of data grows. Further, Deep Learning enables hierarchical feature learning, i.e., learning feature hierarchies.
  • Image recognition with deep learning Recorded: Oct 11 2017 39 mins
    Layla Tadjpour, Data Science Consultant, Ph.D. in Electrical Engineering from University of Southern California.
    In this webinar, we will learn about image recognition with deep learning. After a brief overview of what deep learning is and why it matters, we will learn how to tell dogs from cats, that is, how to train a model to distinguish dog images from cat images.

    We use Keras, an easy-to-use Python deep learning library that sits on top of TensorFlow, and “fine-tuning”, a very important skill for any deep learning practitioner, to train a model to classify the images.

    Once we have trained our model to distinguish dog images from cat images with high accuracy, we dig into the details of the trained model and look at its building blocks, i.e., Convolutional Neural Networks (CNN), the fully connected block, and activation functions, to develop an understanding of how the deep learning model works.
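
    To make those building blocks concrete, here is a toy forward pass in plain Python (not Keras) through a convolution, a ReLU activation, and a fully connected layer. The 4x4 "image", the edge-detecting kernel, and the dense weights are made-up illustrative values. In Keras, these pieces correspond to layers such as `Conv2D` and `Dense`, which learn their weights during training instead of having them fixed by hand.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN convolution layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU activation: negative responses become 0."""
    return [[max(0, v) for v in row] for row in feature_map]


def dense(inputs, weights, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


# A 4x4 "image" that is dark on the left and bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

feature_map = relu(conv2d(image, edge_kernel))
print(feature_map)  # → [[0, 2, 0], [0, 2, 0], [0, 2, 0]]

flat = [v for row in feature_map for v in row]
output = dense(flat, weights=[[1.0] * len(flat)], bias=[0.0])
print(output)  # → [6.0]
```

    The strong response appears only in the middle column, where the kernel straddles the edge. Stacking such layers is what lets a CNN build up the feature hierarchies described above.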
  • Data scientists: Can't live with them, can't live without them. Recorded: Aug 24 2017 45 mins
    Wyatt Benno, CEO, DataHero
    There has been a flood of publicity around big data, data processing, and the role of predictive analytics in businesses of the future.
    As business operators, how do we get access to these valuable business insights, even when there is no data analyst around to walk us through the results?

    - Should your software emulate a data scientist?
    - Learn about the power of data visualizations.
    - Learn about creating value from disparate data sets.
  • Hunting Criminals with Hybrid Analytics, Semi-supervised Learning, & Feedback Recorded: Aug 23 2017 62 mins
    David Talby, CTO, Pacific AI
    Fraud detection is a classic adversarial analytics challenge: as soon as an automated system successfully learns to stop one scheme, fraudsters move on to attack another way. Each scheme requires looking for different signals (i.e., features) to catch; is relatively rare (one in millions for finance or e-commerce); and may take months to investigate a single case (in healthcare or tax, for example), making quality training data scarce.

    This talk is a code walk-through of the key lessons learned while building such real-world software systems over the past few years. We'll look for fraud signals in public email datasets, using IPython and popular open-source libraries (scikit-learn, statsmodels, nltk, etc.) for data science and Apache Spark as the compute engine for scalable parallel processing.

    David will iteratively build a machine-learned hybrid model, combining features from different data sources and algorithmic approaches to catch diverse aspects of suspect behavior:

    - Natural language processing: finding keywords in relevant context within unstructured text
    - Statistical NLP: sentiment analysis via supervised machine learning
    - Time series analysis: understanding daily/weekly cycles and changes in habitual behavior
    - Graph analysis: finding actions outside the usual or expected network of people
    - Heuristic rules: finding suspect actions based on past schemes or external datasets
    - Topic modeling: highlighting use of keywords outside an expected context
    - Anomaly detection: Fully unsupervised ranking of unusual behavior

    Apache Spark is used to run these models at scale: in batch mode for model training and with Spark Streaming for production use. We’ll discuss the data model, computation, and feedback workflows, as well as some tools and libraries built on top of the open-source components to enable faster experimentation, optimization, and productization of the models.
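
    As a tiny illustration of the "fully unsupervised ranking of unusual behavior" item, the sketch below scores each entity's latest activity count against that entity's own history using a z-score, then ranks the results. This is a deliberately simple baseline chosen for illustration, not the hybrid model from the talk, and the entity names and counts are invented.

```python
import statistics


def anomaly_scores(history):
    """Rank entities by how far their latest count is from their own past.

    history maps an entity id to a list of counts, oldest first, with at least
    two values; the last value is the one being scored. Returns
    (entity, z_score) pairs sorted from most to least unusual.
    """
    scores = []
    for entity, counts in history.items():
        past, latest = counts[:-1], counts[-1]
        mean = statistics.mean(past)
        spread = statistics.pstdev(past) or 1.0  # flat history: avoid dividing by zero
        scores.append((entity, abs(latest - mean) / spread))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical daily email counts per sender; the last day is scored.
history = {
    "alice": [10, 12, 11, 9, 10, 40],
    "bob": [5, 6, 5, 6, 5, 6],
    "carol": [20, 22, 19, 21, 20, 21],
}
for entity, score in anomaly_scores(history):
    print(f"{entity}: {score:.2f}")
```

    No labels are needed, which is why this kind of score can flag new schemes. The trade-off is that it also flags benign changes in habit, which is one reason the talk combines it with other signals.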
  • Analytics Nightmares and How You Can Prevent Them Recorded: Aug 22 2017 49 mins
    Meta S. Brown, Author, Data Mining for Dummies and President, A4A Brown, Inc.
    Analytics risks can keep you up at night. What if…
    · We make a big investment and don’t break even?
    · Management doesn’t trust the results?
    · Analysts cross data privacy boundaries?

    What a dilemma! You see the perils, yet you want the rewards that analytics can bring. The appropriate process enables you to dramatically reduce risks and maximize returns on your data and analytics investment.

    In this presentation, you will learn:
    · What causes most analytics failures
    · How you can diminish risk and maximize returns through strong analytics process
    · Why you (yes, you!) have a pivotal opportunity to establish high standards for analytics process right now
  • Radiant, a powerful open source Shiny application for business analytics Recorded: Jul 11 2017 58 mins
    Ali Marami, Chief Data Scientist
    Radiant is a robust tool for business analytics and running sophisticated models without any need for code development. It leverages the functions and tools in R and at the same time provides a user-friendly interface. With Radiant, you can manipulate and visualize your data, run different models from simple OLS to decision trees (CART) and neural networks, and evaluate your results.

    The application is based on the Shiny package and can be run locally or on a server. Radiant was developed by Vincent Nijs. In this webinar, we review the tools available in Radiant and show how easily you can use this tool without any setup or installation on your system.

    Radiant key features:

    • Explore: Quickly and easily summarize, visualize, and analyze your data
    • Run different models: OLS, GLM, Neural Networks, Naïve Bayes and CART.
    • Cross-platform: It runs in a browser on Windows, Mac, and Linux
    • Reproducible: Recreate results and share work with others as a state-file or an Rmarkdown report
    • Programming: Integrate Radiant's analysis functions with your own R-code
    • Context: Data and examples focus on business applications

    In this webinar, you will learn:

    • Data manipulation and running different models
    • How to run advanced analytics in a browser on any device, even a tablet or iPad.


    Presenter bio:

    Ali has a Ph.D. in Finance from the University of Neuchatel in Switzerland and a BS in Electrical Engineering. He has extensive experience in financial modeling, quantitative modeling, and financial risk management in several US banks.
  • Binomial and Multinomial Logistic Regressions in R Recorded: Jun 29 2017 49 mins
    Ali Marami, Chief Data Scientist
    Logistic regression is one of the basic building blocks of machine learning. In this webinar, we discuss binomial and multinomial logistic regression, how to implement them in R, and how to test their performance. We will also review a few examples of their use in industry. In addition, you will learn how to use the R-Brain advanced IDE when implementing the model.

    - Logistic regressions fundamentals and how to interpret estimates
    - Binomial and Multinomial logistic regressions
    - Implement logistic regressions in R
    - Performance measurement in logistic regressions
    - Generating and understanding ROC curve
    - Building a confusion matrix and understanding its elements
    - Examples of model application in industry
    - Learn about the new advanced IDE
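
    The webinar works in R. As a language-neutral illustration, this plain-Python sketch builds a confusion matrix at one threshold. It then computes ROC points by sweeping the threshold over every distinct score and takes the area under the curve with the trapezoidal rule. The labels and scores are made-up example data.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Count TP/FP/TN/FN when scores >= threshold are predicted positive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, s in zip(labels, scores):
        if s >= threshold:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return counts


def roc_points(labels, scores):
    """(false positive rate, true positive rate) at every distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        cm = confusion_matrix(labels, scores, t)
        points.append((cm["FP"] / negatives, cm["TP"] / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(confusion_matrix(labels, scores))  # → {'TP': 3, 'FP': 1, 'TN': 2, 'FN': 0}
print(round(auc(roc_points(labels, scores)), 3))  # → 0.889
```

    The AUC of 8/9 here equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, which is a useful sanity check on any ROC implementation.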

    Presenter bio:

    Ali has a Ph.D. in Finance from the University of Neuchatel in Switzerland and a BS in Electrical Engineering. He has extensive experience in financial modeling, quantitative modeling, and financial risk management in several US banks.

Title: How to Share State Across Multiple Apache Spark Jobs using Apache Ignite
Live at: Mar 28 2018 10:00 am
Presented by: Akmal Chaudhri, Technology Evangelist, GridGain Systems