Analytical Zen: Keeping it Simple yet Powerful

What makes one visualisation more impactful than another? Why do some chart types struggle to tell a story? In this session we explore how the Zen of Analysis is shaped by the types of visualisation we use, and the ways in which we can harness human perception to tell our data story better. We cover the design principles that any analyst should be aware of when designing worksheets and dashboards, and tie the concepts of visual best practice to issues of speed and performance.
Recorded: Feb 25 2015 43 mins
Presented by
Jeff Mills, Director of Analytics at Tableau

More webinars from this channel:
  • Implementing a Sparse Logistic Regression Algorithm in Apache Spark Mar 29 2018 12:00 pm UTC 60 mins
    Lorand Dali, Data Scientist, Zalando
    This talk tells the story of implementing and optimizing a sparse logistic regression algorithm in Spark. I would like to share the lessons I learned and the steps I had to take to improve the speed of execution and convergence of my initial naive implementation. The goal isn’t to convince the audience that logistic regression is great and my implementation is awesome; rather, the talk gives details about how it works under the hood, along with general tips for implementing an iterative parallel machine learning algorithm in Spark.

    The talk is structured as a sequence of “lessons learned” that are shown in the form of code examples building on the initial naive implementation. The performance impact of each “lesson” on execution time and speed of convergence is measured on benchmark datasets.

    You will see how to formulate logistic regression in a parallel setting, how to avoid data shuffles, when to use a custom partitioner, how to use the ‘aggregate’ and ‘treeAggregate’ functions, how momentum can accelerate the convergence of gradient descent, and much more. I will assume a basic understanding of machine learning and some prior knowledge of Spark. The code examples are written in Scala, and the code will be made available for each step in the walkthrough. (A minimal sketch of the treeAggregate pattern appears at the end of this entry.)

    Lorand is a data scientist working on risk management and fraud prevention for the payment processing system of Zalando, the leading fashion platform in Europe. Previously, Lorand developed highly scalable, low-latency machine learning algorithms for real-time bidding in online advertising.
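
    A minimal, hypothetical Scala sketch of the pattern described above (not the speaker's actual code): one gradient-descent step for logistic regression in which partial gradients are summed with ‘treeAggregate’, so the training data is never shuffled and only a model-sized vector returns to the driver. The LabeledPoint case class, Breeze vectors and learning-rate handling are illustrative assumptions.

    import org.apache.spark.rdd.RDD
    import breeze.linalg.{DenseVector => BDV}

    // Hypothetical record type; real code might reuse MLlib's LabeledPoint.
    case class LabeledPoint(label: Double, features: BDV[Double])

    def gradientStep(data: RDD[LabeledPoint], w: BDV[Double], lr: Double): BDV[Double] = {
      val n = data.count()
      // treeAggregate combines partial gradients in a multi-level tree, easing
      // pressure on the driver compared with a flat aggregate, and it avoids
      // any shuffle of the training data itself.
      val grad = data.treeAggregate(BDV.zeros[Double](w.length))(
        seqOp = (acc, p) => {
          val margin = w.dot(p.features)
          val prob = 1.0 / (1.0 + math.exp(-margin))
          acc += p.features * (prob - p.label) // accumulate partial gradient in place
          acc
        },
        combOp = (a, b) => a += b,
        depth = 2
      )
      w - grad * (lr / n.toDouble)
    }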
  • Having fun with Raspberry(s) and Apache Projects Mar 29 2018 10:00 am UTC 60 mins
    Jean-Frederic Clere, Manager, Software Engineering, Red Hat
    You can do a lot with a Raspberry Pi and ASF projects, from a tiny object connected to the internet to a small server application. The presentation will explain and demo the following:

    - Raspberry Pi as a small server and captive portal using httpd/Tomcat.
    - Raspberry Pi as an IoT sensor collecting data and sending it to ActiveMQ (see the sketch after this entry).
    - Raspberry Pi as a Modbus supervisor controlling an Industruino (an industrial Arduino) and connected to ActiveMQ.
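
    As a taste of the second demo, a minimal, hypothetical Scala sketch of a Raspberry Pi-style collector publishing a sensor reading to ActiveMQ over JMS; the broker URL, queue name and JSON payload are illustrative assumptions.

    import org.apache.activemq.ActiveMQConnectionFactory
    import javax.jms.{DeliveryMode, Session}

    object SensorPublisher {
      def main(args: Array[String]): Unit = {
        // Assumed broker address; a real deployment would configure this.
        val factory = new ActiveMQConnectionFactory("tcp://broker.local:61616")
        val connection = factory.createConnection()
        connection.start()
        try {
          val session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
          val producer = session.createProducer(session.createQueue("sensors.temperature"))
          producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT)
          // One reading, serialized as a simple text message.
          producer.send(session.createTextMessage("""{"sensor":"pi-01","tempC":21.5}"""))
        } finally {
          connection.close()
        }
      }
    }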
  • Comparing Apache Ignite & Cassandra for Hybrid Transactional Analytical Apps Mar 28 2018 9:00 pm UTC 60 mins
    Denis Magda, Director of Product Management, GridGain Systems
    The 10x growth of transaction volumes, 50x growth in data volumes, and the drive for real-time visibility and responsiveness over the last decade have pushed traditional technologies, including databases, beyond their limits. Your choices are either to buy expensive hardware to accelerate the wrong architecture, or to do what other companies have started to do and invest in the technologies being used for modern hybrid transactional analytical (HTAP) applications.

    Learn some of the current best practices in building HTAP applications, and the differences between two of the more common technologies companies use: Apache® Cassandra™ and Apache® Ignite™. This session will cover:

    - The requirements for real-time, high volume HTAP applications
    - Architectural best practices, including how in-memory computing fits in and has eliminated tradeoffs between consistency, speed and scale
    - A detailed comparison of Apache Ignite and GridGain® for HTAP applications

    About the speaker: Denis Magda is the Director of Product Management at GridGain Systems, and Vice President of the Apache Ignite PMC. He is an expert in distributed systems and platforms who actively contributes to Apache Ignite and helps companies and individuals deploy it for mission-critical applications. You can be sure to come across Denis at conferences, workshops and other events, sharing his knowledge about use cases, best practices, and implementation tips and tricks for building efficient applications with in-memory data grids, distributed databases and in-memory computing platforms, including Apache Ignite and GridGain.

    Before joining GridGain and becoming a part of Apache Ignite community, Denis worked for Oracle where he led the Java ME Embedded Porting Team -- helping bring Java to IoT.
  • How to Share State Across Multiple Apache Spark Jobs using Apache Ignite Mar 28 2018 10:00 am UTC 60 mins
    Akmal Chaudhri, Technology Evangelist, GridGain Systems
    Attend this session to learn how to easily share state in memory across multiple Spark jobs, either within the same application or between different Spark applications, using the implementation of the Spark RDD abstraction provided in Apache Ignite. During the talk, attendees will learn in detail how IgniteRDD – an implementation of the native Spark RDD and DataFrame APIs – shares the state of the RDD across other Spark jobs, applications and workers. Examples will show how IgniteRDD, with its advanced in-memory indexing capabilities, allows execution of SQL queries many times faster than native Spark RDDs or DataFrames. (A brief sketch of the pattern follows this entry.)

    Akmal Chaudhri has over 25 years' experience in IT and has previously held roles as a developer, consultant, product strategist and technical trainer. He has worked for several blue-chip companies such as Reuters and IBM, as well as the Big Data startups Hortonworks (Hadoop) and DataStax (Cassandra NoSQL database). He holds a BSc (1st Class Hons.) in Computing and Information Systems, an MSc in Business Systems Analysis and Design, and a PhD in Computer Science. He is a Member of the British Computer Society (MBCS) and a Chartered IT Professional (CITP).
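
    A minimal, hypothetical Scala sketch of the IgniteRDD pattern the abstract describes: one job writes pairs into an Ignite cache, and a later job (even a different application) reads the same cache back and queries it with SQL. The cache name is an assumption, and the SQL step assumes the cache is configured with indexed types.

    import org.apache.ignite.configuration.IgniteConfiguration
    import org.apache.ignite.spark.IgniteContext
    import org.apache.spark.sql.SparkSession

    object SharedStateExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("ignite-shared-rdd").getOrCreate()
        val ic = new IgniteContext(spark.sparkContext, () => new IgniteConfiguration())

        // Job A: persist key/value pairs into an Ignite cache that outlives this job.
        val sharedRDD = ic.fromCache[Int, Int]("shared")
        sharedRDD.savePairs(spark.sparkContext.parallelize(1 to 1000).map(i => (i, i * i)))

        // Job B (possibly another application): read the same state back and
        // run a SQL query that can use Ignite's in-memory indexes.
        ic.fromCache[Int, Int]("shared")
          .sql("select _val from Integer where _val > 500000")
          .show()
      }
    }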
  • Scalable Monitoring for the Growing CERN Infrastructure Mar 28 2018 8:00 am UTC 60 mins
    Daniel Lanza Garcia, Big Data Engineer, CERN
    When monitoring an increasing number of machines, the infrastructure and tools need to be rethought. A new tool, ExDeMon, for detecting anomalies and triggering actions, has been developed to perform well on this growing infrastructure. Considerations from its development and implementation will be shared.

    Daniel has been working at CERN for more than 3 years as a Big Data developer, implementing different tools for monitoring the computing infrastructure of the organisation.
  • The Data Lake for Agile Ingest, Discovery, & Analytics in Big Data Environments Mar 27 2018 9:00 pm UTC 60 mins
    Kirk Borne, Principal Data Scientist, Booz Allen Hamilton
    As data analytics becomes more embedded within organizations as an enterprise business practice, the methods and principles of agile processes must also be employed.

    Agile includes DataOps, which refers to the tight coupling of data science model-building and model deployment. Agile can also refer to the rapid integration of new data sets into your big data environment for "zero-day" discovery, insights, and actionable intelligence.

    The Data Lake is an advantageous approach to implementing an agile data environment, primarily because of its focus on "schema-on-read", thereby skipping the laborious, time-consuming, and fragile process of database modeling, refactoring, and re-indexing every time a new data set is ingested. (A short schema-on-read sketch follows this entry.)

    Another huge advantage of the data lake approach is the ability to annotate data sets and data granules with intelligent, searchable, reusable, flexible, user-generated, semantic, and contextual metatags. This tag layer makes your data "smart" -- and that makes your agile big data environment smart also!
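
    A minimal, hypothetical Scala/Spark sketch of schema-on-read: raw JSON lands in the lake as-is, and structure is inferred only when the data is read, so no database modeling or re-indexing precedes ingest. The path and the ‘type’ field are illustrative assumptions.

    import org.apache.spark.sql.SparkSession

    object SchemaOnRead {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

        // Schema is inferred at read time from the raw files themselves.
        val events = spark.read.json("s3a://example-lake/raw/events/")
        events.printSchema()

        // "Zero-day" discovery query against the inferred schema.
        events.createOrReplaceTempView("events")
        spark.sql("select type, count(*) from events group by type").show()
      }
    }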
  • Is the Traditional Data Warehouse Dead? Mar 27 2018 3:00 pm UTC 60 mins
    James Serra, Data Platform Solution Architect, Microsoft
    With new technologies such as Hive LLAP or Spark SQL, do you still need a data warehouse or can you just put everything in a data lake and report off of that? No! In the presentation, James will discuss why you still need a relational data warehouse and how to use a data lake and an RDBMS data warehouse to get the best of both worlds.

    James will go into detail on the characteristics of a data lake and its benefits and why you still need data governance tasks in a data lake. He'll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution, and he will put it all together by showing common big data architectures.
  • Customer Support through Natural Language Processing and Machine Learning Recorded: Feb 22 2018 60 mins
    Robin Marcenac, Sr. Managing Consultant, IBM, Ross Ackerman, Dir. Digital Support Strategy, NetApp, Alex McDonald, SNIA CSI
    Watson is a computer system capable of answering questions posed in natural language. Watson was named after IBM's first CEO, Thomas J. Watson. The computer system was specifically developed to answer questions on the quiz show Jeopardy! (where it beat its human competitors) and was then used in commercial applications, the first of which was helping with lung cancer treatment.

    NetApp is now using IBM Watson in Elio, a virtual support assistant that responds to queries in natural language. Elio is built using Watson’s cognitive computing capabilities. These enable Elio to analyze unstructured data by using natural language processing to understand grammar and context, understand complex questions, and evaluate all possible meanings to determine what is being asked. Elio then reasons and identifies the best answers to questions with help from experts who monitor the quality of answers and continue to train Elio on more subjects.

    Elio and Watson represent an innovative and novel use of large quantities of unstructured data to help solve problems, on average, four times faster than traditional methods. Join us at this webcast, where we’ll discuss:

    • The challenges of utilizing large quantities of valuable yet unstructured data
    • How Watson and Elio continuously learn as more data arrives, and navigate an ever-growing volume of technical information
    • How Watson understands customer language and provides understandable responses

    Learn how these new and exciting technologies are changing the way we look at and interact with large volumes of traditionally hard-to-analyze data.

    After the webcast, check out the Q&A blog: http://www.sniacloud.com/?p=296
  • Ep. 8 Using Splunk to Analyze Performance Metrics: Q&T SIG Talk Recorded: Feb 13 2018 18 mins
    Chris Trimper
    Chris Trimper will show you how to leverage Splunk for storing and analyzing performance data from LoadRunner. This allows for easy trending and correlation with other application metrics stored in Splunk. We will also look at building self-service dashboards showing application performance metrics. (A small sketch of sending metrics to Splunk follows this entry.)

    You will be hearing about:

    • Introduction to Splunk
    • Process of preparing LR data for Splunk consumption
    • Building dashboards and performing analytics in Splunk

    Join us for the next upcoming SIG Talk on Tuesday, March 13, 2018: http://www.vivit-worldwide.org/events/EventDetails.aspx?id=1073255&group=.
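
    A minimal, hypothetical Scala sketch of one way to get LoadRunner-style measurements into Splunk: posting a transaction timing to Splunk's HTTP Event Collector. The host, token and field names are illustrative assumptions.

    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets

    object SplunkHecExample {
      def main(args: Array[String]): Unit = {
        // Assumed HEC endpoint; the token below is a placeholder.
        val url = new URL("https://splunk.example.com:8088/services/collector/event")
        val payload =
          """{"sourcetype":"loadrunner:transaction",
            |"event":{"transaction":"login","responseTimeMs":842,"status":"Pass"}}""".stripMargin

        val conn = url.openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setRequestProperty("Authorization", "Splunk 00000000-0000-0000-0000-000000000000")
        conn.setRequestProperty("Content-Type", "application/json")
        conn.setDoOutput(true)
        conn.getOutputStream.write(payload.getBytes(StandardCharsets.UTF_8))
        println(s"HEC response: ${conn.getResponseCode}") // 200 means the event was accepted
        conn.disconnect()
      }
    }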
  • FC vs. iSCSI Recorded: Jan 31 2018 63 mins
    Fred Knight, NetApp, John Kim, SNIA ESF Chair, Mellanox, Alex McDonald, SNIA ESF Vice Chair, NetApp
    In the enterprise, block storage typically handles the most critical applications such as database, ERP, product development, and tier-1 virtualization. The dominant connectivity option for this has long been Fibre Channel SAN (FC-SAN), but recently many customers and block storage vendors have turned to iSCSI instead. FC-SAN is known for its reliability, lossless nature, 2x FC speed bumps, and carefully tested interoperability between vendors. iSCSI is known for running on ubiquitous Ethernet networks, 10x Ethernet speed bumps, and supporting commodity networking hardware from many vendors.

    FCoE also delivers increasing performance as Ethernet speeds increase, and Fibre Channel likewise delivers increasing performance as FC speeds increase. Historically, FC delivered speed bumps at a more rapid interval (2x bumps), while Ethernet delivered its speed bumps at a slower pace (10x bumps), but that has changed recently with Ethernet adding 2.5G, 5G, 25G, 40G, and 50G to the traditional 1G, 10G, 100G timeline.

    As the storage world moves to more flash and other non-volatile memory, more cloud, and more virtualization (or more containers), it’s time to revisit one of the great IT debates: Should you deploy Fibre Channel or iSCSI? Attend this SNIA Ethernet Storage Forum webcast to learn:
    • Will Fibre Channel or iSCSI deliver faster performance? Does it depend on the workload?
    • How is the wire speed race going between FC and iSCSI? Does anyone actually run iSCSI on 100GbE? When will 128Gb Fibre Channel arrive?
    • Do Linux, Windows, or hypervisors have a preference?
    • Is one really easier [to install/manage] than the other, or are they just different?
    • How does the new NVMe over Fabrics protocol affect this debate?

    Join SNIA experts as they compare FC vs. iSCSI and argue, in an energetic yet friendly way, about the differences and merits of each.

    After you watch the webcast, check out the Q&A blog: http://sniaesfblog.org/?p=680
  • Enterprise Analytics Journey, the IBM point of view for IBM Z customers Recorded: Dec 14 2017 41 mins
    Hélène Lyon, IBM, Distinguished Engineer, IBM Z Solutions Architect
    IT is a key player in the digital and cognitive transformation of business processes, delivering solutions for improved business value with analytics. This session will explain, step by step, the journey to secure production while adopting new analytics technologies that leverage mainframe core business assets.
  • Big Data Analytics vs Privacy: Risks and Opportunities Recorded: Dec 14 2017 58 mins
    Rob Anderson, Head of Field Operations (Privitar),Tim Hickman, Associate (White & Case)
    Today's modern businesses gain competitive edge and remain innovative by using advanced analytics and machine learning. Utilising big data can build customer loyalty through improved personalised marketing campaigns, optimise fraud detection, and improve products and services through advanced testing. However, the data sets required for advanced analytics are often sensitive, containing personal customer information, and therefore come with an inherent set of privacy risks and concerns.

    This roundtable will cover a few key questions on data utility and privacy:

    - In what ways can advanced analytics help businesses gain a competitive edge?

    - What is defined as sensitive data?

    - Will GDPR affect the way you're allowed to use customer data?

    - What opportunities are there to utilise sensitive data?

    Unlocking the data’s true value is a challenge, but there are a range of tools and techniques that can help. This live discussion will focus on the data analytics landscape, compliance considerations, and opportunities for improving data utility in 2018 and beyond.

    Key takeaways:

    - A view of the data protection landscape

    - How to remain compliant with GDPR when using customer data

    - Use cases for advanced analytics and machine learning

    - Opportunities for maximising data utility in 2018
  • Multi-Cloud Storage: Addressing the Need for Portability and Interoperability Recorded: Dec 12 2017 43 mins
    John Webster, Senior Partner, Evaluator Group, Mark Carlson, SNIA CSI Co-Chair, Alex McDonald, SNIA CSI Chair
    In a recent survey of enterprise hybrid cloud users, the Evaluator Group saw that nearly 60% of respondents indicated that lack of interoperability is a significant technology-related issue that they must overcome in order to move forward. In fact, lack of interoperability was chosen above public cloud security and network security as significant inhibitors. This webcast looks at enterprise hybrid cloud objectives and barriers with a focus on cloud interoperability within the storage domain and the SNIA’s Cloud Storage Initiative to promote interoperability and portability of data stored in the cloud.
  • From Big Data to AI: Building Machine Learning Applications Recorded: Dec 12 2017 49 mins
    Maloy Manna Data engineering PM, AXA Data Innovation Lab
    The newest buzzword after Big Data is AI. From Google search to Facebook Messenger bots, AI is everywhere.

    Machine learning has gone mainstream. Organizations are trying to build competitive advantage with AI and Big Data.

    But, what does it take to build Machine Learning applications? Beyond the unicorn data scientists and PhDs, how do you build on your big data architecture and apply Machine Learning to what you do?

    This talk will discuss technical options to implement machine learning on big data architectures and how to move forward.
  • Applying Machine Learning to unstructured files and data for research Recorded: Dec 12 2017 45 mins
    Dr Tom Parsons, Research Director and Dr Stuart Bowe, Data Scientist from Spotlight Data
    Researchers generate huge amounts of valuable unstructured data and articles from research every day. The potential for this information is huge: cancer and pharmaceutical breakthroughs, advances in technology and cultural research that can improve the world we live in.

    This webinar discusses how text mining and Machine Learning can be used to make connections across this broad range of files and help drive innovation and research. We discuss using Kubernetes microservices to analyse the data and then applying Machine Learning and graph databases to simplify the reuse of the data.
  • Unprotected Files on a Public Cloud Server: Live Panel on the NSA Data Leak Recorded: Dec 8 2017 59 mins
    Chris Vickery, George Crump, David Linthicum, Charles Goldberg, Mark Carlson
    Public, private and hybrid cloud are nothing new, but protecting sensitive data stored on these servers is still of the utmost concern. The NSA is no exception.

    It recently became publicized that the contents of a highly sensitive hard drive belonging to the NSA (National Security Agency) were compromised. The virtual disk containing the sensitive data came from an Army Intelligence project and was left on a public AWS (Amazon Web Services) storage server, not password-protected.

    This is one of at least five leaks of NSA-related data in recent years, to say nothing of the significant number of breaches and hacks we’ve seen lately, including Yahoo!, Equifax, WannaCry, Petya, and more.

    The culprit in this case? Unprotected storage buckets. They have played a part in multiple other recent exposures, and concern is on the rise. When it comes to storing data on public cloud servers like AWS, Azure, Google Cloud, Rackspace and more, what are the key responsibilities of Storage Architects and Engineers, CIOs and CTOs to avoid these types of data leaks?

    Tune in with Chris Vickery, Director of Cyber Risk Research at UpGuard and the one who discovered the leak, along with George Crump, Chief Steward, Storage Switzerland, David Linthicum, Cloud Computing Visionary, Author & Speaker, Charles Goldberg, Sr. Director of Product Marketing, Thales e-Security, and Mark Carlson, Co-Chair, SNIA Technical Council & Cloud Storage Initiative, for a live panel discussion on this ever-important topic.
  • The New Business Reality of GDPR with David Siegel and Michael Shea Recorded: Nov 30 2017 52 mins
    David Siegel and Michael Shea (20|30)
    The new business reality of GDPR and how you use customer data is inexorably approaching: if you work in the EU or do business with anyone there, you must deal with this regulation.

    With data protection, there are really only two options: protecting data through ever-greater centralization and security, or turning the customer data paradigm on its head and decentralizing the data.

    We have a new model: give your customers full control over their data, gain their trust, and lower your costs with the open-source Pillar Business Wallet. Join our conversation Thursday, 30th of November.
  • From Data with Love: How the data economy is impacting the insurance sector Recorded: Nov 20 2017 60 mins
    JS Gourevitch, Luca Schnettler, Petra Wildermann, Anil Celik, Thomas Lethenborg
    The data economy and digital technologies are deeply transforming almost all areas of our lives. Among the most heavily transformed are insurance and healthcare, with a number of really interesting developments possibly redefining the way we take care of ourselves and the way we consume and use insurance.

    From harnessing the power of data to better help mental health patients, carers and medical personnel with their treatments; to assessing the risk of developing a broad range of illnesses and engaging with users to propose personalised healthy-living plans; to using big data and analytics to track and prepare for epidemics; to using data to better cover cars and drivers with car insurance; and to using social media data so that insurers can better engage with customers, this webinar proposes a fascinating exploration of the opportunities, risks and new models supporting the digital transformation of insurance.

    Moderated by Jean-Stéphane Gourévitch
    Luca Schnettler, CEO and Founder, HealthyHealth (UK)
    Petra Wildermann, Business Development Director, Metabiota (Switzerland)
    Anil Celik, Co-founder and CEO Urbanstats (US)
    Thomas Lethenborg, CEO, Monsenso (Denmark)
  • Transactional Models & Their Storage Requirements Recorded: Oct 31 2017 63 mins
    Alex McDonald, SNIA-ESF Vice Chair, NetApp, J Metz, SNIA Board of Directors, Cisco
    We’re all accustomed to transferring money from one bank account to another; a debit to the payer becomes a credit to the payee. But that model uses a specific set of sophisticated techniques to accomplish what appears to be a simple transaction. We’re also aware of how today we can order goods online, or reserve an airline seat over the Internet. Or even simpler, we can update a photograph on Facebook. Can these applications use the same models, or are new techniques required? (A small sketch of the classic transfer appears at the end of this entry.)

    One of the more important concepts in storage is the notion of transactions, which are used in databases, financials, and other mission critical workloads. However, in the age of cloud and distributed systems, we need to update our thinking about what constitutes a transaction. We need to understand how new theories and techniques allow us to undertake transactional work in the face of unreliable and physically dispersed systems. It’s a topic full of interesting concepts (and lots of acronyms!). In this webcast, we’ll provide a brief tour of traditional transactional systems and their use of storage, we’ll explain new application techniques and transaction models, and we’ll discuss what storage systems need to look like to support these new advances.

    And yes, we’ll explain all the acronyms and nomenclature too.

    You will learn:

    • A brief history of transactional systems from banking to Facebook
    • How the Internet and distributed systems have changed how we view transactions
    • An explanation of the terminology, from ACID to CAP and beyond
    • How applications, networks and particularly storage have changed to meet these demands

    After watching, check out our webcast Q&A blog: http://sniaesfblog.org/?p=662
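
    A minimal, hypothetical Scala sketch of the classic transfer from the abstract, expressed as one ACID unit of work over JDBC; the connection URL, table and column names are illustrative assumptions.

    import java.sql.DriverManager

    object TransferExample {
      def main(args: Array[String]): Unit = {
        val conn = DriverManager.getConnection("jdbc:postgresql://localhost/bank", "app", "secret")
        conn.setAutoCommit(false) // group both updates into a single transaction
        try {
          val debit = conn.prepareStatement("update accounts set balance = balance - ? where id = ?")
          debit.setBigDecimal(1, java.math.BigDecimal.valueOf(100)); debit.setLong(2, 1L)
          debit.executeUpdate()

          val credit = conn.prepareStatement("update accounts set balance = balance + ? where id = ?")
          credit.setBigDecimal(1, java.math.BigDecimal.valueOf(100)); credit.setLong(2, 2L)
          credit.executeUpdate()

          conn.commit() // both changes become visible atomically
        } catch {
          case e: Exception =>
            conn.rollback() // on failure, neither change survives
            throw e
        } finally {
          conn.close()
        }
      }
    }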
  • Three Ways To Accelerate Your Data Lake Migration To Cloud Recorded: Oct 25 2017 45 mins
    Kelly Stirman, VP Strategy, Dremio
    Public cloud deployments have become irresistible in terms of flexibility, low barriers to entry, security, and developer friendliness. But the sheer inertia of traditional data lakes makes them difficult to transition to the cloud. In this talk we'll look at examples of how leading companies have made the transition using open source technologies and hybrid strategies.

    Instead of following a "lift and shift" strategy for moving data lake workloads to the cloud, there are new factors unique to the cloud that should be weighed alongside traditional approaches related to compute (e.g., GPU, FPGA), storage (object store vs. file store), integrations, and security.

    Viewers will take away techniques they can immediately apply to their own projects.
Making data intelligent
You've got data. It's time to manage it. Find information here on everything from data governance and data quality to master and metadata management, data architecture, and the thing that was just invented ten seconds ago.
