Building Your Next Generation Data Architecture with Hadoop

Building Your Next Generation Data Architecture is a webinar co-hosted by Birst and Altiscale, featuring Brad Peters, Birst Founder and Chief Product Officer, and Raymie Stata, Founder and CEO of Altiscale. You’ll hear examples of how customers have operationalised Hadoop in the enterprise, overcoming major obstacles to make data in Hadoop available to broad sets of users across their companies.

This webinar reveals:

• How organisations are transitioning to the next generation data architecture
• Recommendations for how IT organisations can maximise the value of their existing data architectures
• How to overcome hurdles when operationalising Hadoop in the enterprise
• Examples of customer reference architectures
Recorded: Aug 10 2016 54 mins
Presented by
Brad Peters, Birst Founder and Chief Product Officer, and Raymie Stata, Founder and CEO of Altiscale

  • RoCE vs. iWARP Aug 22 2018 5:00 pm UTC 75 mins
    Tim Lustig, Mellanox; Fred Zhang, Intel; John Kim, Mellanox
    Network-intensive applications, like networked storage or clustered computing, require a network infrastructure with high bandwidth and low latency. Remote Direct Memory Access (RDMA) supports zero-copy data transfers by enabling movement of data directly to or from application memory. This results in high bandwidth, low latency networking with little involvement from the CPU.

    In the next webcast in the SNIA ESF “Great Storage Debates” series, we’ll examine two commonly known RDMA protocols that run over Ethernet: RDMA over Converged Ethernet (RoCE) and the IETF-standard iWARP. Both are Ethernet-based RDMA technologies that reduce the amount of CPU overhead in transferring data among servers and storage systems.

    The goal of this presentation is to provide a solid, vendor-neutral foundation on both RDMA technologies, covering the capabilities and use cases of each so that attendees can make informed, educated decisions.

    Join to hear the following questions addressed:

    • Both RoCE and iWARP support RDMA over Ethernet, but what are the differences?
    • What are the use cases for RoCE and iWARP, and what differentiates them?
    • UDP/IP and TCP/IP: which protocol uses which, and what are the advantages and disadvantages?
    • What are the software and hardware requirements for each?
    • What are the performance and latency differences between them?

    Join our SNIA experts as they answer all these questions and more in this next Great Storage Debate.
  • Cloud Mobility and Data Movement Aug 7 2018 5:00 pm UTC 75 mins
    Eric Lakin, University of Michigan; Michelle Tidwell, IBM; Alex McDonald, NetApp
    We’re increasingly in a multi-cloud environment, with potentially multiple private, public and hybrid cloud implementations in support of a single enterprise. Organizations want to leverage the agility of public cloud resources to run existing workloads without having to re-plumb or re-architect them and their processes. In many cases, applications and data have been moved individually to the public cloud. Over time, some applications and data might need to be moved back on premises, or moved partially or entirely from one cloud to another.

    That means simplifying the movement of data from cloud to cloud. Data movement and data liberation – the seamless transfer of data from one cloud to another – have become a major requirement.

    In this webcast, we’re going to explore some of these data movement and mobility issues with real-world examples from the University of Michigan. Register now for discussions on:

    • How do we secure data both at rest and in transit?
    • Why is data so hard to move? What cloud processes and interfaces should we use to make data movement easier?
    • How should we organize our data to simplify its mobility? Should we use block, file or object technologies?
    • Should the application of the data influence how (and even if) we move the data?
    • How can data in the cloud be leveraged for multiple use cases?
  • How to Leverage Big Data for Customers: Lessons from a Purpose-Driven Bank Recorded: Jul 17 2018 42 mins
    Paul Clark, CTO, Tandem
    For years, banks have been sitting on a goldmine of customer data. Only recently have they started exploiting it, and, not surprisingly, largely for their own benefit.
    Personal data can give great insights to drive bank outcomes by decreasing credit losses and reducing fraud losses. In this webinar Paul Clark, CTO, will look at how we can use customer data to:
    * Drive the customer’s own advantage
    * Avoid slip-ups
    * Dodge nasty charges
    * Optimise the customer’s finances end to end.
  • Nemertes Conversations: Is Your Data Ready for GDPR? Recorded: May 24 2018 53 mins
    Co-presented by: Julie McCoy, Solutions Engineer, AvePoint; and Irwin Lazar, VP & Service Director, Nemertes Research
    GDPR requires organizations to identify, classify, and protect personal information, but how do you prepare and protect against a possible breach if you don't know what data you have, where it lives, or how it's classified?

    In this informative webinar we'll discuss:
    • GDPR data classification requirements
    • How to incorporate GDPR data analysis into your breach prevention and reaction plan
    • How to classify and protect information across multiple data stores
    • Solutions for automating classification and information protection

    We look forward to sharing this information with you!
  • Does it matter if an algorithm can't explain how it knows what it knows? Recorded: May 24 2018 34 mins
    Beau Walker, Founder, Method Data Science
    With the General Data Protection Regulation (GDPR) becoming enforceable in the EU on May 25, 2018, many data scientists are worried about the impact that this regulation, and similar initiatives in other countries that give consumers a "right to explanation" of decisions made by algorithms, will have on the field of predictive and prescriptive analytics.

    In this session, Beau will discuss the role of interpretable algorithms in data science as well as explore tools and methods for explaining high-performing algorithms.

    Beau Walker has a Juris Doctor (law degree) and BS and MS degrees in Biology and in Ecology and Evolution. Beau has worked in many domains including academia, pharma, healthcare, life sciences, insurance, legal, financial services, marketing, and IoT.
  • Semantic AI: Bringing Machine Learning and Knowledge Graphs Together Recorded: May 23 2018 64 mins
    Kirk Borne, Principal Data Scientist, Booz Allen Hamilton & Andreas Blumauer, CEO, Managing Partner Semantic Web Company
    Implementing AI applications based on machine learning is a significant topic for organizations embracing digital transformation. By 2020, 30% of CIOs will include AI in their top five investment priorities according to Gartner’s Top 10 Strategic Technology Trends for 2018: Intelligent Apps and Analytics. But to deliver on the AI promise, organizations need to generate good quality data to train the algorithms. Failure to do so will result in the following scenario: "When you automate a mess, you get an automated mess."

    This webinar covers:

    - An introduction to machine learning use cases and challenges, provided by Kirk Borne, Principal Data Scientist at Booz Allen Hamilton and a top data science and big data influencer.
    - How to achieve good data quality based on harmonized semantic metadata, presented by Andreas Blumauer, CEO and co-founder of Semantic Web Company and a pioneer in the application of semantic web standards for enterprise data integration.
    - How to apply a combined approach in which semantic knowledge models and machine learning form the basis of your cognitive computing. (See attachment: The Knowledge Graph as the Default Data Model for Machine Learning)
    - Why a combination of machine and human computation approaches is required, not only from an ethical but also from a technical perspective.
  • Audit Ex Machina: Digital Learning Systems and Transactional Data Recorded: May 17 2018 44 mins
    Erik McBain, Strategic Account Manager, MindBridge Ai
    How are financial service firms around the world using machine learning systems today to identify and address risk in transactional datasets?

    This webinar will look at a new approach to transaction analysis and illustrate how the combination of traditional rules-based approaches can be augmented with next-generation machine learning systems to uncover more in the data, faster and more efficiently.
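    As a purely illustrative sketch of that combination (the fields, thresholds and scoring below are invented and are not MindBridge's product logic), a rules pass and a simple statistical pass can run side by side and have their flags merged:

      case class Txn(id: String, amount: Double)

      object HybridScreening extends App {
        val txns = Seq(Txn("a", 120.0), Txn("b", 95.0), Txn("c", 20000.0))

        // Rule-based pass: anything over a fixed reporting threshold.
        val ruleFlags = txns.filter(_.amount > 10000.0).map(_.id).toSet

        // Statistical pass: amounts more than one standard deviation from the
        // mean, standing in here for a learned anomaly score.
        val amounts = txns.map(_.amount)
        val mean = amounts.sum / amounts.size
        val std = math.sqrt(amounts.map(a => math.pow(a - mean, 2)).sum / amounts.size)
        val statFlags = txns.filter(t => math.abs(t.amount - mean) > std).map(_.id).toSet

        println(s"flagged for review: ${ruleFlags ++ statFlags}")
      }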

    We will span the various applications in banking, payments, trading, and compliance, looking at a variety of use cases from bank branch transaction analysis to trading data validation.

    Anyone interested in financial technology, next-generation machine learning systems and the future of the financial services industry will find this webinar of specific interest.

    About the speaker:
    Erik McBain, CFA, is a Strategic Account Manager for MindBridge Ai, where he specializes in the deployment of emerging technologies such as artificial intelligence and machine learning systems in global financial institutions and corporations. Over his 10-year career in banking and financial services (Deutsche Bank, CIBCWM, Central Banking), Erik has been immersed in the trading, analysis, and sale of financial instruments and the deployment of new payment, banking and intelligent technologies. Erik's focus is identifying the various opportunities created through technological disruption, creating partnerships, and applying a client-centered innovation process to create transformative experiences, products, and services for his clients.
  • The Teslification of Banking: The Role of Ethical AI in Sustainable Finance Recorded: May 17 2018 37 mins
    Richard Peers, Director Financial Services Industry, Microsoft
    Artificial Intelligence has a huge role to play in banking, nowhere more so than in sustainable finance. However, data is very patchy, and much of the source data needed to inform sustainable finance is not available. The challenge as we set off on this new journey is to make sure that the data and algorithms used are transparent and unbiased.

    In this session, Richard Peers, Director of Financial Services Industry at Microsoft, will share how disruption and new entrants are bringing new business models and technology into play in banking, as in other industries such as the auto industry.

    One new area is sustainable finance, a voluntary initiative under the COP agreement on climate change, but the data needed to inform the markets is a challenge. Big data, machine learning and AI can help resolve this.

    But with such important issues at stake, this session will outline how AI must be designed according to ethical principles.

    Tune in to this session for a high-level view of some key trends and technologies in banking. Get insight into sustainable finance; why AI can help and why Ethical AI is important; and the Microsoft principles for Ethical AI.
  • Open Banking - Data, Analytics and the Tragedy of the Commons Recorded: May 15 2018 59 mins
    Dr Louise Beaumont (techUK), Natasha Kyprianides (Hellenic Bank), Tony Fish (AMF Ventures), Katrina Cruz (Anthemis Group)
    The tragedy of the commons, first described by biologist Garrett Hardin in 1968, describes how shared resources are overused and eventually depleted. He compared shared resources to a common grazing pasture; in this scenario, everyone with rights to the pasture acting in self-interest for the greatest short-term personal gain depletes the resource until it is no longer viable.

    The banking ecosystem and the data that binds it together are not all that different. For many years, mis-selling scandals, cookie-cutter products and dumb mass-marketing have seen players acting in their own interest, according to what they believe the ecosystem should look like, how it should evolve and who should control it.

    But with the introduction of open banking, there are signs that new banking ecosystems are set to thrive. Taking Hardin’s notion, collaboration in the open banking future could benefit everyone in the ecosystem – the traditional banks, the FinTechs, the tech titans with their expertise in delivering services at scale, and yet-to-be-defined participants, likely to include the large data players such as energy firms, retailers and telcos.

    Join me to explore the Open Future.
  • What’s Next in Storage: Analysts and Experts Share their Predictions Recorded: May 2 2018 43 mins
    Greg McSorley, SNIA Technical Council (non-voting); Rick Kutcipal, President, STA; Don Jeanette of TRENDFOCUS
    You won’t want to miss the opportunity to hear leading data storage experts provide their insights on prominent technologies that are shaping the market. With the exponential rise in demand for high capacity and secured storage systems, it’s critical to understand the key factors influencing adoption and where the highest growth is expected. From SSDs and HDDs to storage interfaces and NAND devices, get the latest information you need to shape key strategic directions and remain competitive.
  • Implementing a Sparse Logistic Regression Algorithm in Apache Spark Recorded: Mar 29 2018 39 mins
    Lorand Dali, Data Scientist, Zalando
    This talk tells the story of the implementation and optimization of a sparse logistic regression algorithm in Spark. I would like to share the lessons I learned and the steps I had to take to improve the speed of execution and convergence of my initial naive implementation. The message isn’t to convince the audience that logistic regression is great and my implementation is awesome; rather, the talk gives details about how it works under the hood, along with general tips for implementing an iterative parallel machine learning algorithm in Spark.

    The talk is structured as a sequence of “lessons learned” that are shown in the form of code examples building on the initial naive implementation. The performance impact of each “lesson” on execution time and speed of convergence is measured on benchmark datasets.

    You will see how to formulate logistic regression in a parallel setting, how to avoid data shuffles, when to use a custom partitioner, how to use the ‘aggregate’ and ‘treeAggregate’ functions, how momentum can accelerate the convergence of gradient descent, and much more. I will assume a basic understanding of machine learning and some prior knowledge of Spark. The code examples are written in Scala, and the code will be made available for each step in the walkthrough.
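    For a flavour of the treeAggregate pattern the talk covers, here is a minimal sketch (not the speaker's code; the sparse representation, step size and names are illustrative) of one gradient-descent step for logistic regression on an RDD:

      import org.apache.spark.rdd.RDD

      object LogRegSketch {
        // Hypothetical sparse row: (label, sequence of (featureIndex, value)).
        type Example = (Double, Seq[(Int, Double)])

        // One step of batch gradient descent. treeAggregate merges the
        // per-partition gradients pairwise instead of funnelling every
        // partial vector straight to the driver.
        def step(data: RDD[Example], w: Array[Double], lr: Double): Array[Double] = {
          val n = data.count().toDouble
          val grad = data.treeAggregate(new Array[Double](w.length))(
            seqOp = (g: Array[Double], ex: Example) => {
              val (y, x) = ex
              val margin = x.map { case (i, v) => w(i) * v }.sum
              val p = 1.0 / (1.0 + math.exp(-margin)) // predicted probability
              x.foreach { case (i, v) => g(i) += (p - y) * v }
              g
            },
            combOp = (g1: Array[Double], g2: Array[Double]) => {
              var i = 0; while (i < g1.length) { g1(i) += g2(i); i += 1 }; g1
            }
          )
          w.indices.map(i => w(i) - lr * grad(i) / n).toArray
        }
      }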

    Lorand is a data scientist working on risk management and fraud prevention for the payment processing system of Zalando, the leading fashion platform in Europe. Previously, Lorand has developed highly scalable low-latency machine learning algorithms for real-time bidding in online advertising.
  • Having fun with Raspberry(s) and Apache Projects Recorded: Mar 29 2018 49 mins
    Jean-Frederic Clere, Manager, Software Engineering, Red Hat
    You can do a lot with a Raspberry Pi and ASF projects, from a tiny object connected to the internet to a small server application. The presentation will explain and demo the following:

    - Raspberry as a small server and captive portal using httpd/Tomcat.
    - Raspberry as an IoT sensor collecting data and sending it to ActiveMQ (a minimal sketch of such a sender follows this list).
    - Raspberry as a Modbus supervisor controlling an Industruino (an industrial Arduino) and connected to ActiveMQ.
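    As a taste of the ActiveMQ piece, here is a hedged sketch of a sensor reading being published over JMS; the broker URL, queue name and payload are invented for illustration and are not the demo's actual code:

      import javax.jms.Session
      import org.apache.activemq.ActiveMQConnectionFactory

      object SensorPublisher extends App {
        // Assumed broker location; a Raspberry Pi would point at the real host.
        val factory = new ActiveMQConnectionFactory("tcp://localhost:61616")
        val connection = factory.createConnection()
        connection.start()

        val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
        val producer = session.createProducer(session.createQueue("sensors"))

        // One reading; a real sensor loop would poll the hardware periodically.
        producer.send(session.createTextMessage("""{"sensor":"temp1","value":21.5}"""))

        connection.close()
      }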
  • Comparing Apache Ignite & Cassandra for Hybrid Transactional Analytical Apps Recorded: Mar 28 2018 61 mins
    Denis Magda, Director of Product Management, GridGain Systems
    The 10x growth in transaction volumes, 50x growth in data volumes, and drive for real-time visibility and responsiveness over the last decade have pushed traditional technologies, including databases, beyond their limits. Your choices are either to buy expensive hardware to accelerate the wrong architecture, or to do what other companies have started to do and invest in technologies used for modern hybrid transactional/analytical processing (HTAP) applications.

    Learn some of the current best practices in building HTAP applications, and the differences between two of the more common technologies companies use: Apache® Cassandra™ and Apache® Ignite™. This session will cover:

    - The requirements for real-time, high volume HTAP applications
    - Architectural best practices, including how in-memory computing fits in and has eliminated tradeoffs between consistency, speed and scale
    - A detailed comparison of Apache Ignite and GridGain® for HTAP applications

    About the speaker: Denis Magda is the Director of Product Management at GridGain Systems, and Vice President of the Apache Ignite PMC. He is an expert in distributed systems and platforms who actively contributes to Apache Ignite and helps companies and individuals deploy it for mission-critical applications. You can be sure to come across Denis at conferences, workshops and other events, sharing his knowledge about use cases, best practices, and implementation tips and tricks on how to build efficient applications with in-memory data grids, distributed databases and in-memory computing platforms including Apache Ignite and GridGain.

    Before joining GridGain and becoming a part of Apache Ignite community, Denis worked for Oracle where he led the Java ME Embedded Porting Team -- helping bring Java to IoT.
  • How to Share State Across Multiple Apache Spark Jobs using Apache Ignite Recorded: Mar 28 2018 42 mins
    Akmal Chaudhri, Technology Evangelist, GridGain Systems
    Attend this session to learn how to easily share state in memory across multiple Spark jobs, either within the same application or between different Spark applications, using an implementation of the Spark RDD abstraction provided in Apache Ignite. During the talk, attendees will learn in detail how IgniteRDD – an implementation of the native Spark RDD and DataFrame APIs – shares the state of the RDD across other Spark jobs, applications and workers. Examples will show how IgniteRDD, with its advanced in-memory indexing capabilities, allows execution of SQL queries many times faster than native Spark RDDs or DataFrames.
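    As a hedged sketch of the pattern (the cache name and bare-bones configuration are illustrative, not GridGain's sample code), state written through an IgniteRDD by one job can be read back by another, because the underlying Ignite cache outlives any single Spark job:

      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.ignite.spark.IgniteContext
      import org.apache.ignite.configuration.IgniteConfiguration

      object SharedStateSketch extends App {
        val sc = new SparkContext(new SparkConf().setAppName("ignite-share").setMaster("local[*]"))
        val ic = new IgniteContext(sc, () => new IgniteConfiguration())

        // Job 1: write pairs into the Ignite cache-backed RDD.
        ic.fromCache[Int, Double]("sharedState")
          .savePairs(sc.parallelize(1 to 1000).map(i => (i, i * 2.0)))

        // Job 2 (or a separate Spark application): read the same state back.
        val total = ic.fromCache[Int, Double]("sharedState").map(_._2).sum()
        println(s"sum of shared values: $total")

        sc.stop()
      }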

    Akmal Chaudhri has over 25 years' experience in IT and has previously held roles as a developer, consultant, product strategist and technical trainer. He has worked for several blue-chip companies such as Reuters and IBM, as well as the big data startups Hortonworks (Hadoop) and DataStax (Cassandra NoSQL database). He holds a BSc (1st Class Hons.) in Computing and Information Systems, an MSc in Business Systems Analysis and Design, and a PhD in Computer Science. He is a Member of the British Computer Society (MBCS) and a Chartered IT Professional (CITP).
  • Scalable Monitoring for the Growing CERN Infrastructure Recorded: Mar 28 2018 45 mins
    Daniel Lanza Garcia, Big Data Engineer, CERN
    When monitoring an increasing number of machines, the infrastructure and tools need to be rethought. A new tool, ExDeMon, has been developed for detecting anomalies and triggering actions, and to perform well on this growing infrastructure. Considerations from the development and implementation will be shared.

    Daniel has been working at CERN for more than 3 years as a big data developer, implementing different tools for monitoring the organisation's computing infrastructure.
  • The Data Lake for Agile Ingest, Discovery, & Analytics in Big Data Environments Recorded: Mar 27 2018 58 mins
    Kirk Borne, Principal Data Scientist, Booz Allen Hamilton
    As data analytics becomes more embedded within organizations as an enterprise business practice, the methods and principles of agile processes must also be employed.

    Agile includes DataOps, which refers to the tight coupling of data science model-building and model deployment. Agile can also refer to the rapid integration of new data sets into your big data environment for "zero-day" discovery, insights, and actionable intelligence.

    The Data Lake is an advantageous approach to implementing an agile data environment, primarily because of its focus on "schema-on-read", thereby skipping the laborious, time-consuming, and fragile process of database modeling, refactoring, and re-indexing every time a new data set is ingested.
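    As a minimal illustration of schema-on-read (the path and setup are invented for this sketch), a Spark job can infer the schema of raw JSON sitting in the lake at read time, with no upfront table modelling:

      import org.apache.spark.sql.SparkSession

      object SchemaOnReadSketch extends App {
        val spark = SparkSession.builder.appName("lake-read").master("local[*]").getOrCreate()

        // No upfront DDL: the schema is derived from the files as they are read.
        val events = spark.read.json("/datalake/raw/events/")
        events.printSchema()

        events.createOrReplaceTempView("events")
        spark.sql("SELECT COUNT(*) FROM events").show()

        spark.stop()
      }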

    Another huge advantage of the data lake approach is the ability to annotate data sets and data granules with intelligent, searchable, reusable, flexible, user-generated, semantic, and contextual metatags. This tag layer makes your data "smart" -- and that makes your agile big data environment smart also!
  • Is the Traditional Data Warehouse Dead? Recorded: Mar 27 2018 61 mins
    James Serra, Data Platform Solution Architect, Microsoft
    With new technologies such as Hive LLAP or Spark SQL, do you still need a data warehouse or can you just put everything in a data lake and report off of that? No! In the presentation, James will discuss why you still need a relational data warehouse and how to use a data lake and an RDBMS data warehouse to get the best of both worlds.

    James will go into detail on the characteristics of a data lake and its benefits and why you still need data governance tasks in a data lake. He'll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution, and he will put it all together by showing common big data architectures.
  • Customer Support through Natural Language Processing and Machine Learning Recorded: Feb 22 2018 60 mins
    Robin Marcenac, Sr. Managing Consultant, IBM, Ross Ackerman, Dir. Digital Support Strategy, NetApp, Alex McDonald, SNIA CSI
    Watson is a computer system capable of answering questions posed in natural language. Watson was named after IBM's first CEO, Thomas J. Watson. The computer system was specifically developed to answer questions on the quiz show Jeopardy! (where it beat its human competitors) and was then used in commercial applications, the first of which was helping with lung cancer treatment.

    NetApp is now using IBM Watson in Elio, a virtual support assistant that responds to queries in natural language. Elio is built using Watson’s cognitive computing capabilities. These enable Elio to analyze unstructured data by using natural language processing to understand grammar and context, understand complex questions, and evaluate all possible meanings to determine what is being asked. Elio then reasons and identifies the best answers to questions with help from experts who monitor the quality of answers and continue to train Elio on more subjects.

    Elio and Watson represent an innovative and novel use of large quantities of unstructured data to help solve problems, on average, four times faster than traditional methods. Join us at this webcast, where we’ll discuss:

    • The challenges of utilizing large quantities of valuable yet unstructured data
    • How Watson and Elio continuously learn as more data arrives, and navigate an ever-growing volume of technical information
    • How Watson understands customer language and provides understandable responses

    Learn how these new and exciting technologies are changing the way we look at and interact with large volumes of traditionally hard-to-analyze data.

    After the webcast, check out the Q&A blog: http://www.sniacloud.com/?p=296
  • Q&T SIG Talk Series Ep. 8: Using Splunk to Analyze Performance Metrics Recorded: Feb 13 2018 18 mins
    Chris Trimper
    Chris Trimper will show you how to leverage Splunk for storing and analyzing performance data from LoadRunner. This allows for easy trending and correlation with other application metrics stored in Splunk. We will also look at building self-service dashboards showing application performance metrics.

    You will be hearing about:

    • Introduction to Splunk
    • Process of preparing LR data for Splunk consumption
    • Building dashboards and performing analytics in Splunk

    Join us for the next SIG Talk on Tuesday, March 13, 2018: http://www.vivit-worldwide.org/events/EventDetails.aspx?id=1073255&group=.
  • FC vs. iSCSI Recorded: Jan 31 2018 63 mins
    Fred Knight, NetApp, John Kim, SNIA ESF Chair, Mellanox, Alex McDonald, SNIA ESF Vice Chair, NetApp
    In the enterprise, block storage typically handles the most critical applications such as database, ERP, product development, and tier-1 virtualization. The dominant connectivity option for this has long been Fibre Channel SAN (FC-SAN), but recently many customers and block storage vendors have turned to iSCSI instead. FC-SAN is known for its reliability, lossless nature, 2x FC speed bumps, and carefully tested interoperability between vendors. iSCSI is known for running on ubiquitous Ethernet networks, 10x Ethernet speed bumps, and supporting commodity networking hardware from many vendors.

    iSCSI and FCoE deliver increasing performance as Ethernet speeds increase, and Fibre Channel likewise delivers increasing performance as FC speeds increase. Historically, FC delivered speed bumps at a more rapid interval (2x bumps) while Ethernet delivered its speed bumps at a slower pace (10x bumps), but that has changed recently with Ethernet adding 2.5G, 5G, 25G, 40G, and 50G to the traditional 1G, 10G, 100G timeline.

    As the storage world moves to more flash and other non-volatile memory, more cloud, and more virtualization (or more containers), it’s time to revisit one of the great IT debates: Should you deploy Fibre Channel or iSCSI? Attend this SNIA Ethernet Storage Forum webcast to learn:
    • Will Fibre Channel or iSCSI deliver faster performance? Does it depend on the workload?
    • How is the wire-speed race going between FC and iSCSI? Does anyone actually run iSCSI on 100GbE? When will 128Gb Fibre Channel arrive?
    • Do Linux, Windows, or hypervisors have a preference?
    • Is one really easier to install and manage than the other, or are they just different?
    • How does the new NVMe over Fabrics protocol affect this debate?

    Join SNIA experts as they compare FC vs. iSCSI and argue, in an energetic yet friendly way, about the differences and merits of each.

    After you watch the webcast check out the Q&A blog http://sniaesfblog.org/?p=680
Making data intelligent
You've got data. It's time to manage it. Find information here on everything from data governance and data quality, to master and metadata management, data architecture, and the thing that was just invented ten seconds ago.
