How to do Large Scale Data Research on a Slurm HPC Cluster with OmniSci

Harvard’s Center for Geographic Analysis was established in 2006 to support research and teaching across all disciplines in the University as they relate to geospatial technology and methods. The Center recently launched an initiative that has made the OmniSci platform available to all researchers at Harvard, utilizing the university’s Slurm HPC cluster. In this talk, Devika and Ben will share why OmniSci is relevant to their work, the kinds of challenges facing researchers in general, and some of the projects that are specifically benefiting from OmniSci at Harvard. They will also explain how they got OmniSci running within their Slurm cluster, and share the scripts they’ve developed so that researchers can get an OmniSci instance for large-scale research with a few simple clicks.
Recorded May 20 2020 45 mins
Presented by
Devika Kakkar, Geospatial Data Scientist, Harvard University & Ben Lewis, Geospatial Technology Manager, Harvard University

  • Predict and Change the Future of Your Network with Multiple Data Sources Recorded: Jun 10 2020 25 mins
    Fraser Pajak, President, Fraser Pajak Consulting and Eric Kontargyris, Director of Sales Engineering, OmniSci
    Telco markets worldwide are highly competitive. Consumers have more options than ever to satisfy their telecommunications needs. This means providers are competing harder, and paying closer attention to critical factors like cost of customer acquisition, average revenue per unit (ARPU), and customer churn. The smartest providers understand the tight relationship between these issues and their massive, and growing, stockpiles of data.

    Fortunately, there are new technologies that can handle billions of rows of data, joined across multiple datasets, with millisecond filtering and visualization times. With OmniSci, massive customer data is an asset instead of a barrier.

    In this webinar, you will learn:
    → How OmniSci can query billions of rows of data and power interactive dashboards at the speed of your curiosity
    → How OmniSci can bring multiple cohorts of data together in one easy-to-interpret geospatial display
    → How this analysis capability, along with AI and ML, can unlock previously hidden insights, helping you predict problems and mitigate them before they affect your networks or your customers
  • Applying Data to the COVID-19 Pandemic Recorded: May 20 2020 38 mins
    Todd Mostak, CEO & Co-Founder, OmniSci
    Data and analytics have played a central role in the first wave of the global response to COVID-19. From the earliest days of the pandemic, we’ve had up-to-date reports of case counts, fatalities and recoveries gathered by industrious volunteers everywhere. See how OmniSci and our partners at AWS, Safegraph, Veraset, and X-Mode are using anonymized, data-driven methods to contribute to relief efforts at a national scale for the next phase of the COVID-19 response efforts.
  • OmniSci in Action: Learn from a Telecom Leader Recorded: May 20 2020 37 mins
    Jared Ritter, Sr. Director Analytics & Automation, Charter Communications
    Charter Communications is using Data Science to enhance Systems/Network Operations. Charter will present their evolution to a “Data-First” operations model. Q&A will follow, as well as a live demo of some of Charter’s Dashboard/Tools.
  • Demo With OmniSci Expert - Public Sector: Opioid Crisis Recorded: May 20 2020 11 mins
    Vlaunir da Silva
    New data shows that more than 100 billion opioid pills were distributed nationwide from 2006 to 2014. Experience the power of accelerated analytics to unlock and visualize socially impactful insights about a national public health crisis.
  • What's New in OmniSci 5.2 Recorded: May 19 2020 10 mins
    Venkat Krishnamurthy, VP Product, OmniSci
    We heard your feedback, and the Immerse SQL Editor has been much improved for OmniSci Release 5.2. Today, you can do more in the SQL Editor by running selected queries, incorporating query snippets, and re-running previous SQL statements. Text highlighting, query logging, and sample queries make data analysis easier and faster. In this session we will take a closer look at the SQL Editor in Immerse 5.2.
  • Exploring Without Moving: Further Adventures w/ Ibis, Altair and OmniSci, part 2 Recorded: May 19 2020 30 mins
    Venkat Krishnamurthy, VP Product, OmniSci
    Following on from Saul Shanabrook's great session introducing the value of tools like Ibis, Altair, Vega and the ibis-vega-transform, we'll dive in a little deeper to see how we can use these tools to explore multiple large datasets with minimal forklifting. Looking at GitHub data again, we'll pull in additional data from disparate sources into an integrated visualization inside JupyterLab to provide more context. We'll show how Ibis provides a seamless analytical API layer across sources; with particularly fast sources like OmniSci, this opens up a whole world of exploratory analysis.
  • Aligning OmniSci with Interactive Computing Standards & Open Source Communities Recorded: May 19 2020 34 mins
    Tony Fast, Data Scientist, Quansight
    At Quansight, we’ve been working closely with OmniSci to create interfaces with interactive scientific Python computing technologies. Throughout this work we have built healthy and sustainable relationships with open source communities that complement the abilities of OmniSci. In this talk, we’ll explore these abilities to describe, query, and visualize data in OmniSci databases in Python, and interface with a wider scientific computing community. It will highlight Quansight’s efforts to bring OmniSci data and visualizations to interactive scientific computing workflows in Jupyter. Further, we’ll acknowledge the benefits of working with open source communities building open standards and technology.
  • Demo with OmniSci Expert: Forecasting Analysis for Mobile Data Access Points Recorded: May 19 2020 17 mins
    Erik Schultz, Sales Engineer, OmniSci
    During this demo we will show a dataset about Mobile Data Access Points in OmniSci, then switch over to Jupyter, where we will invoke a predictive model that shows how data usage is forecast to change in the future.
  • How LLVM is Used to Compile SQL Queries in the OmniSci Database Recorded: Apr 30 2020 35 mins
    Alex Baden, Technical Lead, OmniSci
    In this session, Alex Baden, technical lead for OmniSci's accelerated database, will give an overview of how we use LLVM to compile SQL queries. He will cover the internals of the OmniSci architecture and the benefits of tapping into modern CPU and GPU hardware to deliver on our vision.
  • Visualizing & Communicating the Water Pipe Replacement Program Progress in Flint Recorded: Apr 30 2020 38 mins
    Jared Webb, BlueConduit, and Dr. Stacy Woods, Natural Resources Defense Council
    In this session, we’ll discuss how a predictive algorithm helped the City of Flint focus their service line (water pipe) investigations in the areas at highest risk for having lead or galvanized steel service lines. We’ll discuss our work (in progress) to create a public map using best visual data and public health communication practices that, when completed, will allow Flint residents to visualize the predictive model outcomes and the pipe replacement progress in the city.
  • NYC Taxi Rides Demo Recorded: Apr 30 2020 14 mins
    Erik Schultz, OmniSci
    Explore every taxi ride in NYC over a 7-year period with this NYC taxi data visualization, constituting 1.2 billion trips, joined to the building footprint of every store within 30 meters of a pickup or dropoff. Commercial point-of-interest (POI) data courtesy of Factual.
  • Finding Purposefully Hidden Sites with GPUs and ML Recorded: Apr 29 2020 23 mins
    Mike Flaxman, Spatial Data Science Practice Lead, OmniSci and Adam Edelman, Sales Engineer, OmniSci
    Finding purposely-hidden nuclear sites is hard. But new tools and datasets allow analysts to interactively explore huge geotemporal datasets. OmniSci has recently partnered with the Center for Nonproliferation Studies (CNS) and Planet to demonstrate how daily satellite imagery, machine learning for feature extraction, and interactive analytics can help make the world safer. CNS continually assesses potential nuclear missile production sites. It has found that in North Korea these are often hidden at the ends of new mountain roads. How can we turn this insight into actionable data?

    OmniSci’s GPU database technology lets us combine several factors into a suitability model considering roads and their relationships to terrain. We leveraged an amazing new machine learning product from Planet: a monthly road change dataset at 5-meter resolution. We combined this with absolute elevation, percent slope and topographic position. Since there are fewer than 20 known sites, we elected to use a “human in the loop” process to empower analysts to assess the parameters of known sites semi-manually, and then to search for similar sites across the full country. This allowed us to discover hundreds of potential new sites, which CNS plans to further explore and then monitor.
  • Applying Data to the COVID-19 Pandemic with OmniSci Recorded: Apr 29 2020 38 mins
    Todd Mostak, Co-Founder and CEO, OmniSci and Ray Falcione, VP Public Sector, OmniSci
    Data and analytics have played a central role in the first wave of the global response to COVID-19. From the earliest days of the pandemic, we’ve had up-to-date reports of case counts, fatalities and recoveries gathered by industrious volunteers everywhere. See how OmniSci and our partners at AWS, Safegraph, Veraset, and X-Mode are using anonymized, data-driven methods to contribute to relief efforts at a national scale for the next phase of the COVID-19 response efforts.
  • US Airline Flights Demo Recorded: Apr 29 2020 10 mins
    Michael McCraken, OmniSci
    View historical flight data such as delays and other activity from almost 3 decades, and see which airlines got you there on time. Only OmniSci makes this demanding level of flight data processing possible.
  • COVID-19: A Data Scientist's Response to Working from Home Recorded: Apr 29 2020 44 mins
    Jared Dame, Director of AI & Data Science, HP
    COVID-19 has most of the world shifting to remote work, including data science teams. Jared Dame, an HP data scientist and Director of AI and Data Science Business, will discuss how to navigate remote work and how to effectively manage data science teams remotely. This presentation will outline home workplace specifics including CPU, GPU, and storage options, and how to use ZCentral Remote Boost to collaborate with your remote team from the workstation.
  • Using Altair, Ibis, and Vega for interactive exploration of OmniSci Recorded: Apr 28 2020 41 mins
    Saul Shanabrook, Software Developer, Quansight
    Altair is a lovely tool that lets you build up complex interactive charts in Python. Ibis is also a lovely tool that lets you use a Pandas-like API to compose SQL expressions in OmniSci and other backends. By tying them together you can use the familiar syntax of Pandas, combined with the expressive power of Vega and Vega-Lite, to visualize large amounts of data stored in OmniSci. This talk will walk through a number of examples of using this pipeline and then go through how it works.
  • OmniSci Shipping Traffic Demo Recorded: Apr 28 2020 12 mins
    Vlaunir da Silva, OmniSci
    Track ships through US coastal waters using a distributed cluster of OmniSciDB, featuring over 11 billion rows of AIS (Automatic Identification System) telemetry data, courtesy of the U.S. Coast Guard. Gain insights into the navigation patterns of shipping traffic around major US ports, or see how fishing boats follow the seasonal movement of their catch.
  • Rock On Todd! Effortless BI on billions of rows Recorded: Apr 27 2020 18 mins
    Todd Mostak, Co-Founder and CEO, OmniSci
    From HP Rock On 2020: Todd Mostak (CEO, OmniSci) demonstrates how it's now possible to effortlessly perform BI and analytics on data sets of 4.5 billion rows or more, from national flights to retail and CPG.
  • How to Improve the Telecommunications Network with Fast, Scalable Data Recorded: Apr 26 2020 22 mins
    Herfini Haryono, Vice President, Industry Verticals, OmniSci and Mike Flaxman, Spatial Data Science Practice Lead, OmniSci
    Operating and monitoring a modern telecommunications network, especially with the ongoing rollout of 5G, requires teams of Network Performance Management Engineers to find and pinpoint anomalies, trace the root causes, and make quick decisions with enormous collections of data. Fortunately, there are new technologies that can handle billions of rows of data with millisecond filtering and visualization times, that can make that massive telecom data an asset instead of a barrier. In this webinar, we’ll take a look at three common use cases facing telecom companies today: network performance management, optimizing 5G network signal propagation, and mobile data offloading. We’ll show how an analytics platform that is accelerated with GPUs can handle the outsized data volume, performance and geospatial features inherent to these use cases, and we’ll demo the solution on the OmniSci platform.
Massively Accelerated Analytics and Data Science
Interactively query, visualize, and power location intelligence workflows over billions of records.

  • Title: How to do Large Scale Data Research on a Slurm HPC Cluster with OmniSci
  • Live at: May 20 2020 8:00 pm
  • Presented by: Devika Kakkar, Geospatial Data Scientist, Harvard University & Ben Lewis, Geospatial Technology Manager, Harvard University