Pre-Processing Big Data: Techniques to improve quality of big data analysis

Data in the real world is almost always dirty, incomplete, scattered or inconsistent. For data scientists, 'janitor work' is a key hurdle to data insights.

Whether you use big data for analytics or data science, the increasing variety and velocity of big data can make pre-processing the most time-consuming step in your data pipeline.

Featuring engineering concepts and practical examples in Python and R, this webinar will focus on technical considerations and data engineering techniques to optimise data preparation to get the most value from your big data pipeline.
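
As a rough illustration of the kind of clean-up described above (not taken from the presentation itself), the following minimal Python sketch tidies a handful of dirty, incomplete and inconsistent records. The sample records, the field names, the `MISSING` markers and the `clean` helper are all hypothetical, and filling gaps with the median is only one assumed imputation choice among many.

```python
from statistics import median

# Hypothetical raw records, invented purely for illustration.
raw_records = [
    {"id": "1", "city": " Paris ", "age": "34"},
    {"id": "2", "city": "paris", "age": ""},
    {"id": "2", "city": "paris", "age": ""},   # repeated id, treated as a duplicate
    {"id": "3", "city": "LYON", "age": "41"},
    {"id": "4", "city": "Lyon", "age": "n/a"},
]

# Assumed spellings of "no value" in the age field.
MISSING = {"", "n/a", "na", "null", "none"}


def clean(records):
    """Deduplicate by id, normalise city names and fill missing ages."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # keep only the first record for each id
        seen_ids.add(rec["id"])
        age_text = rec["age"].strip().lower()
        cleaned.append({
            "id": int(rec["id"]),
            "city": rec["city"].strip().title(),  # trim whitespace, unify casing
            "age": None if age_text in MISSING else int(age_text),
        })

    # Fill missing ages with the median of the known ones, if any exist.
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    fill_value = median(known_ages) if known_ages else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill_value
    return cleaned


rows = clean(raw_records)
for row in rows:
    print(row)
```

On real big data volumes the same steps would normally run in a distributed or columnar engine rather than in plain Python lists; the sketch only shows the logic of each step.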
Recorded Oct 12 2016 48 mins
Presented by
Maloy Manna, Engineering, AXA Data Innovation Lab

  • Enterprise Analytics Journey, the IBM point of view for IBM Z customers Dec 14 2017 1:00 pm UTC 45 mins
    Hélène Lyon, IBM, Distinguished Engineer, IBM Z Solutions Architect
    IT is a key player in the digital and cognitive transformation of business processes, delivering solutions for improved business value with analytics. This session will explain, step by step, the journey to secure production while adopting new analytics technologies that leverage mainframe core business assets.
  • Mixed Reality - Market Potential and Use Cases Dec 14 2017 9:00 am UTC 60 mins
    Adnyesh Dalpati, Solutions Architect
    Mixed reality is the result of blending the physical world with the digital world. Though it is a relatively new technology and its adoption is still in its initial stages, Mixed Reality devices and applications are projected to be the next technological era after smartphones.

    The webinar will give a brief overview of potential Mixed Reality use cases that provide not only an immersive experience but also revenue streams for their creators.
  • Data Fabric: A New Paradigm For Self-Service Data & Data Scientists Dec 12 2017 5:00 pm UTC 45 mins
    Kelly Stirman, VP Strategy, Dremio
    Data Scientists are rare and highly valued individuals, and for good reason: making sense of data and using machine learning libraries requires an unusual blend of advanced skills. Why is it then that Data Scientists spend the majority of their time getting data ready for models, and only a fraction actually doing the high-value work?

    In this talk we introduce the concept of Data Fabric, a new way to provide a self-service model for data, where data scientists can easily discover, curate, share, and accelerate data analysis using Python, R, and visualization tools, no matter where the data is managed, no matter the structure, and no matter the size.

    We will talk through the role of Apache Arrow, the in-memory columnar data standard that is accelerating analytics for GPU-based processing, as well as the role of Pandas and Arrow in providing unprecedented speed in accessing datasets from Python.
  • The evolution of AI and how it can improve investment outcomes Dec 12 2017 4:00 pm UTC 45 mins
    David Itzkovits, CFA, Chief Executive Officer, Sanlam Global Investment Solutions
    The majority of industries are embracing artificial intelligence (AI) and machine learning (ML) to ensure their clients’ experience is the best it can be. However, the mainstream asset management industry is way behind the curve.

    This presentation will give an introduction to AI & ML, looking at what it is and how it is being used with real world examples. The presentation will then look at how the financial industry is currently using AI and the developments in progress.

    We will end with how Sanlam believes AI can help the asset and wealth management businesses.
  • The Human Role in AI Dec 12 2017 3:00 pm UTC 45 mins
    Peter Bruce, President and Founder, The Institute for Statistics Education at Statistics.com
    Artificial Intelligence (AI) is a hot topic, and there is widespread alarm that AI will replace humans in the analytical process. Adam Selipsky, the CEO of Tableau, terms this a myth, and said recently that AI's role will remain that of an assistant to the analytics professional.

    In this talk we go beyond that, and look at some interesting aspects of the human role as an integral component of machine learning and statistical modeling.

    We discuss how human expertise "supervises" machine learning, how reliance on multiple sources can deliver surprising expertise, and when that system can go wrong.
  • Applying Machine Learning to unstructured files and data for research Dec 12 2017 11:00 am UTC 45 mins
    Dr Tom Parsons, Research Director and Mitchell Murphy, Data Scientist from Spotlight Data
    Researchers generate huge amounts of valuable unstructured data and articles from research every day. The potential for this information is huge: cancer and pharmaceutical breakthroughs, advances in technology and cultural research that can improve the world we live in.

    This webinar discusses how text mining and Machine Learning can be used to make connections across this broad range of files and help drive innovation and research. We discuss using Kubernetes microservices to analyse the data and then applying Machine Learning and graph databases to simplify the reuse of the data.
  • IOT Future Roadmap - Adoption & Standardization Nov 10 2017 9:00 am UTC 60 mins
    Adnyesh Dalpati, Solutions Architect
    The Internet of Things (IoT) envisions that everything in the physical world is connected seamlessly and securely integrated through the Internet. New products are being developed under the umbrella of IoT, opening up new opportunities. This webinar will discuss the future potential of IoT and the direction in which it is moving in terms of adoption and standardisation.
  • Predicting Football Results With Statistical Modelling Nov 9 2017 1:00 pm UTC 60 mins
    David Sheehan, Senior Customer Scientist, MoneySuperMarket
    Football (or soccer to any American readers) is full of clichés: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”.

    This webinar will introduce some basic statistical models that have been devised to predict the results of football matches. It won't help you make lots of money, but you will learn about programming, statistics and modelling through this fun, intuitive topic.
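
    To make the independent-Poisson idea quoted above concrete, here is a minimal Python sketch. It is not the speaker's model: the scoring rates are hypothetical values chosen for illustration, the function names are invented, and scorelines above `max_goals` per team are simply ignored, so the three probabilities sum to slightly less than 1.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals when goals arrive at an average of `rate` per match."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def match_probabilities(home_rate: float, away_rate: float, max_goals: int = 10):
    """Return (home win, draw, away win) probabilities.

    Each team's goal count is treated as an independent Poisson variable,
    so the probability of a scoreline is the product of the two marginals.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Hypothetical average goals per match for each side.
home, draw, away = match_probabilities(home_rate=1.6, away_rate=1.1)
print(f"home win {home:.1%}, draw {draw:.1%}, away win {away:.1%}")
```

    In practice the two rates would be estimated from historical results (for example from each team's attacking and defensive strength) rather than set by hand.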
  • Setting up BI in FinTech companies: challenges and opportunities Nov 2 2017 4:00 pm UTC 45 mins
    Mari Hermanns, Head of Business Intelligence at Solaris Bank
    Today most companies collect more data than ever and, as we all know, data is the new oil. However, gaining insights and turning them into action is easier said than done. In my experience this is a challenge for many companies, including innovative FinTechs.

    In order to create a data-driven business and organisational culture, it is important to integrate data collection and an appreciation for data-driven truth from the start of a venture. This webinar is a brief overview of the hurdles and challenges BI faces in growing FinTech companies and how they can be overcome. Furthermore, this webinar will briefly mention new BI trends and tools and how they could impact businesses.
  • Effective High-Speed Multi-Tenant Data Lakes Oct 25 2017 9:00 pm UTC 60 mins
    Sean Suchter, CTO and founder, Pepperdata
    Big Data has increased the demand for big data management solutions that operate at scale and meet business requirements. Big Data organizations realize quickly that scaling from small, pilot projects to large-scale production clusters involves a steep learning curve. Despite tremendous progress, critically important areas including multi-tenancy, performance optimization, and workflow monitoring remain areas where the operations team still needs management help.

    Intended for enterprises who already have a data lake or are setting up their first data lake, this presentation will discuss how to implement data lakes with operations tools that automatically optimize clusters with solutions for monitoring, performance tuning, and troubleshooting in production environments.

    Sean is the co-founder and CTO of Pepperdata. Previously, Sean was the founding GM of Microsoft’s Silicon Valley Search Technology Center, where he led the integration of Facebook and Twitter content into Bing search. Prior to Microsoft, Sean managed the Yahoo Search Technology team, the first production user of Hadoop. Sean joined Yahoo through the acquisition of Inktomi, and holds a B.S. in Engineering and Applied Science from Caltech.
  • Three Ways To Accelerate Your Data Lake Migration To Cloud Oct 25 2017 4:00 pm UTC 45 mins
    Kelly Stirman, VP Strategy, Dremio
    Public cloud deployments have become irresistible in terms of flexibility, low barriers to entry, security, and developer friendliness. But the sheer inertia of traditional data lakes makes them difficult to transition to cloud. In this talk we'll look at examples of how leading companies have made the transition using open source technologies and hybrid strategies.

    Instead of following a "lift and shift" strategy for moving data lake workloads to the cloud, there are new factors unique to cloud that should be considered alongside traditional approaches related to compute (e.g., GPU, FPGA), storage (object store vs. file store), integrations, and security.

    Viewers will take away techniques they can immediately apply to their own projects.
  • Becoming Data Driven: Building the Foundation of Digital Success Oct 25 2017 2:00 pm UTC 60 mins
    Nigel Turner
    Many organisations aspire to become digital, data driven enterprises. In these organisations data is viewed as a critical asset, both to generate new digitally based products and services, and to guide and improve business operations and decision making. But many companies are failing to live up to this aspiration. They struggle to develop and implement data strategies that align with, and help to deliver, new business strategies.


    This webinar will explore what becoming ‘data driven’ really means, examine some of the reasons why many organisations are failing to realise their ambitions, and propose ways of overcoming the challenges. Key to these is a strong emphasis on the increasingly critical importance of established data management disciplines, especially Data Governance, Data Quality and MDM, which all have a critical role to play in the digital business of the future.

    This session will explore:


    • What is a data driven organisation and how does it differ from a traditional company?
    • The main challenges of creating a data driven organisation
    • Building a data driven capability - the role of business and IT
    • The central importance of a business aligned Data Strategy and how to achieve it
    • Why a successful data strategy needs an integrated focus on Data Governance, Data Quality and MDM
  • Designing Data Lakes: Architecture options with open source tools Oct 25 2017 12:00 pm UTC 60 mins
    Maloy Manna, PM Engineering, AXA Data Innovation Lab, Paris
    The concept of data lakes evolved to address challenges and opportunities in managing big data.

    Organizations are investing massive amounts of time and money to upgrade existing data infrastructures and build data lakes whether on-premises or in the cloud.

    This talk will discuss architectures and design options to implement data lakes with open source tools. Also covered are challenges of upgrade & migration from existing data warehouses, metadata management, supporting self-service and managing production deployments.
  • Virtual Data Lake: A Reality Oct 25 2017 10:00 am UTC 45 mins
    Hélène Lyon, IBM, Distinguished Engineer, IBM Z Solutions Architect
    As an Enterprise customer, you are potentially using IBM Z in a hybrid cloud implementation. Let's understand how to benefit from cloud access to mainframe data without moving it outside Z, thereby improving security, reducing integration challenges and answering your GDPR auditor's needs.
  • Bias in the Web Oct 24 2017 5:00 pm UTC 60 mins
    Ricardo Baeza-Yates, CTO, NTENT; ACM Fellow; IEEE Fellow
    The Web is the most powerful communication medium and the largest public data repository that humankind has created. Its content ranges from great reference sources such as Wikipedia to ugly fake news. Indeed, social (digital) media is just an amplifying mirror of ourselves. Hence, the main challenge of search engines and other websites that rely on web data is to assess the quality of such data. However, as all people have their own biases, web content, as well as our web interactions, are tainted with many biases.

    Data bias includes redundancy and spam, while interaction bias includes activity and presentation bias. In addition, sometimes algorithms add bias, particularly in the context of search and recommendation systems. As bias generates bias, we stress the importance of de-biasing data as well as using the context and other techniques such as explore & exploit, to break the filter bubble.

    The main goal of this talk is to make people aware of the different biases that affect all of us on the Web. Awareness is the first step to be able to fight and reduce the vicious cycle of bias.

    Ricardo Baeza-Yates's areas of expertise are web search and data mining, information retrieval, data science, and algorithms. He has been CTO of NTENT, a semantic search technology company based in California, USA, since 2016. Before that, he was VP of Research at Yahoo Labs, based first in Barcelona, Spain, and later in Sunnyvale, California, from January 2006 to February 2016. He is also a part-time Professor at DTIC of the Universitat Pompeu Fabra in Barcelona, Spain, as well as at DCC of Universidad de Chile in Santiago.
  • What Advanced Analytics Will Look Like in the Future Oct 24 2017 3:00 pm UTC 60 mins
    Adam Blau - Product Marketing, Sisense
    Join us for the next webinar in the Bright Talk series of Advanced Analytics where we will discuss the future of advanced analytics, and how it can be shaped for everyone, regardless of technical expertise.

    In this webinar, Michal Becker, Business Analyst from QbeeQ, will give a futuristic view of:

    • Advanced Analytics - how far away are we?
    • What are the steps needed to achieve the future of advanced analytics?
    • How to make advanced analytics more tangible and reachable for the average business user
    • How will AI and Machine learning bring us closer to achieving these goals?

    This webinar will also include practical, hands-on examples from Adam Blau, Product Manager, Sisense, who will discuss:

    • Use case of machine learning, bots, and physical indicators to advance analytics
    • Easy to consume data that will allow advanced calculations to be sent to a wider audience
    • Advanced ranking mechanisms that helped a UK health organization improve operations

    Save your place in the future now.
  • Visualizing what the web knows about you Oct 24 2017 2:00 pm UTC 60 mins
    Eva Murray, Exasol & Andy Kriebel, The Data School
    In this talk we visualize geolocation data collected by our devices to find out more about ourselves.

    In the course of our digital lives we share a lot of information about the things we do. We give our data freely in exchange for discount vouchers, free wifi or a scoop of ice-cream.
    It's not just businesses who can access this information about us to better target their products.
    In this webinar we look at geolocation data collected by our devices and bring it to life with visualisations to find out more about ourselves. We invite you to join us for this session and take what you learn to play with your own data.
  • Using Docker to realise a modular Big Data Platform & Leveraging SANSA Stack Oct 24 2017 12:00 pm UTC 45 mins
    Dr. Hajira Jabeen, Senior Researcher at the University of Bonn
    Join this webinar where senior researchers will present:

    1) Big Data Integrator Platform
    - Use of Docker and Docker swarm to realize a modular Big Data Platform

    2) Semantic Analytics Stack
    - Use of Big data distributed processing engines to leverage Scalable processing for the Semantic Web (RDF data representation, Querying, Inference, and Machine Learning)

    3) Seven societal challenges in Big Data Europe
    - Combination of different Big Data tools to create Big Data Value Chain (Pipeline) for different Use cases representing the societal challenges
  • Advanced Analytics in the Cloud - the latest cloud analytics innovations Oct 24 2017 8:00 am UTC 60 mins
    Iver van de Zand
    Iver van de Zand will talk about and demo the latest SAP innovations for analytics in the cloud. Keywords are live connectivity and the closed loop of combined business intelligence, planning and predictive analytics, all in one environment that is fully prepared for big data.
  • Blockchain: Making It More Than Just a Trend Recorded: Oct 12 2017 61 mins
    Travin Keith, Sergey Ivliev, Vlad Dramaliev, Makoto Takemiya, Kat Kirilova
    Perhaps one of the buzziest of buzzwords for the past few years has been "Blockchain". But for those who are well-versed in the technology and the goings on of the industry, it is definitely more than just hype.

    Join this session where a panel of experienced Blockchain specialists will discuss:

    -Viability of Use Cases - Just because a centralized database can be more efficient doesn't mean it's the most viable

    -Preventing Blockchain from Being Just a Trend - if blockchain isn't viable as the solution, it shouldn't be used

    -Public vs Private Blockchain Use

    -State of the Industry - Understanding that a lot of the platforms and frameworks are in their early stage of development and that support may not be readily available

    Moderator: Travin Keith (Agavon),
    Panelists: Sergey Ivliev (Lykke), Vlad Dramaliev (aeternity), Makoto Takemiya (Soramitsu), Kat Kirilova (Crypto Tickets)
Managing and analyzing data to inform business decisions
Data is the foundation of any organization, and it is therefore paramount that it is managed and maintained as a valuable resource.

Subscribe to this channel to learn best practices and emerging trends in a variety of topics including data governance, analysis, quality management, warehousing, business intelligence, ERP, CRM, big data and more.

  • Title: Pre-Processing Big Data: Techniques to improve quality of big data analysis
  • Live at: Oct 12 2016 12:00 pm
  • Presented by: Maloy Manna, Engineering, AXA Data Innovation Lab