Case Study in Big Data and Data Science: University of Georgia

Join this webinar to learn how the University of Georgia (UGA) uses Apache Spark and other tools for Big Data analytics and data science research.

UGA needs to give its students and faculty the ability to do hands-on data analysis, with instant access to their own Spark clusters and other Big Data applications.

So how do they provide on-demand Big Data infrastructure and applications for a wide range of data science use cases? How do they give their users the flexibility to try different tools without excessive overhead or cost?

In this webinar, you’ll learn how to:

- Spin up new Spark and Hadoop clusters within minutes, and quickly upgrade to new versions

- Make it easy for users to build and tinker with their own end-to-end data science environments

- Deploy cost-effective, on-premises elastic infrastructure for Big Data analytics and research
Recorded May 11 2016 61 mins
Presented by
Shannon Quinn, Assistant Professor at University of Georgia; and Nanda Vijaydev, Director of Solutions Management at BlueData
  • Hunting Criminals with Hybrid Analytics, Semi-supervised Learning, & Feedback Aug 23 2017 5:00 pm UTC 60 mins
    David Talby, CTO, Atigeo
    Fraud detection is a classic adversarial analytics challenge: As soon as an automated system successfully learns to stop one scheme, fraudsters move on to attack another way. Each scheme requires looking for different signals (i.e., features) to catch; is relatively rare (one in millions for finance or e-commerce); and may take months to investigate a single case (in healthcare or tax, for example), making quality training data scarce.

    This talk will cover, through a code walk-through, the key lessons learned while building such real-world software systems over the past few years. We'll look for fraud signals in public email datasets, using IPython and popular open-source libraries (scikit-learn, statsmodels, nltk, etc.) for data science, and Apache Spark as the compute engine for scalable parallel processing.

    David will iteratively build a machine-learned hybrid model, combining features from different data sources and algorithmic approaches to catch diverse aspects of suspect behavior:

    - Natural language processing: finding keywords in relevant context within unstructured text
    - Statistical NLP: sentiment analysis via supervised machine learning
    - Time series analysis: understanding daily/weekly cycles and changes in habitual behavior
    - Graph analysis: finding actions outside the usual or expected network of people
    - Heuristic rules: finding suspect actions based on past schemes or external datasets
    - Topic modeling: highlighting use of keywords outside an expected context
    - Anomaly detection: fully unsupervised ranking of unusual behavior

    Apache Spark is used to run these models at scale – in batch mode for model training and with Spark Streaming for production use. We’ll discuss the data model, computation, and feedback workflows, as well as some tools and libraries built on top of the open-source components to enable faster experimentation, optimization, and productization of the models.
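As a rough illustration of the hybrid-model idea described above (an assumed sketch, not the speaker's actual code), scikit-learn's `FeatureUnion` can combine features from two of the listed approaches, statistical NLP via TF-IDF and a heuristic keyword rule, into a single classifier. The suspect-word list and toy messages below are hypothetical.

```python
# Sketch of a hybrid fraud classifier: TF-IDF text features plus a
# heuristic keyword-count feature, combined via FeatureUnion.
# Illustrative only -- the keyword list and data are made up.
import numpy as np
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression

SUSPECT_WORDS = {"offshore", "shell", "invoice"}  # hypothetical heuristic list

def keyword_counts(texts):
    # Heuristic-rule feature: number of suspect keywords in each message.
    return np.array([[sum(w in t.lower() for w in SUSPECT_WORDS)]
                     for t in texts])

model = Pipeline([
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer()),                      # statistical NLP
        ("keywords", FunctionTransformer(keyword_counts)), # heuristic rule
    ])),
    ("clf", LogisticRegression()),
])

texts = ["please wire payment to the offshore shell company",
         "lunch at noon tomorrow?",
         "fake invoice attached, route through offshore account",
         "meeting notes from yesterday"]
labels = [1, 0, 1, 0]  # 1 = suspect, 0 = benign (toy data)

model.fit(texts, labels)
print(model.predict(["another offshore invoice to process"]))
```

In a production setting of the kind the talk describes, each bullet above would contribute its own feature columns to the union, and Spark would parallelize the feature extraction and scoring.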
  • Toward Internet of Everything: Architectures, Standards, & Interoperability Recorded: Jun 21 2017 63 mins
    Ram D. Sriram, Chief of the Software and Systems Division, IT Lab at National Institute of Standards and Technology
    In this talk, Ram will provide a unified framework for Internet of Things, Cyber-Physical Systems, and Smart Networked Systems and Societies, and then discuss the role of ontologies for interoperability.

    The Internet, which has spanned several networks in a wide variety of domains, is having a significant impact on every aspect of our lives. These networks are currently being extended to have significant sensing capabilities, with the evolution of the Internet of Things (IoT). With additional control, we are entering the era of Cyber-physical Systems (CPS). In the near future, the networks will go beyond physically linked computers to include multimodal-information from biological, cognitive, semantic, and social networks.

    This paradigm shift will involve symbiotic networks of people (social networks), smart devices, and smartphones or mobile personal computing and communication devices that will form smart net-centric systems and societies (SNSS) or the Internet of Everything. These devices – and the network – will be constantly sensing, monitoring, interpreting, and controlling the environment.

    A key technical challenge for realizing SNSS/IoE is that the network consists of things (both devices & humans) which are heterogeneous, yet need to be interoperable. In other words, devices and people need to interoperate in a seamless manner. This requires the development of standard terminologies (or ontologies) which capture the meaning and relations of objects and events. Creating and testing such terminologies will aid in effective recognition and reaction in a network-centric situation awareness environment.

    Before joining the Software and Systems Division (his current position), Ram was the leader of the Design and Process group in the Manufacturing Systems Integration Division, Manufacturing Engineering Lab, where he conducted research on standards for interoperability of computer-aided design systems.
  • The Ways Machine Learning and AI Can Fail Recorded: Apr 13 2017 48 mins
    Brian Lange, Partner and Data Scientist, Datascope
    Good applications of machine learning and AI can be difficult to pull off. Join Brian Lange, Partner and Data Scientist at data science firm Datascope, as he discusses a variety of ways machine learning and AI can fail (from technical to human factors) so that you can avoid repeating them yourself.
  • Logistics Analytics: Predicting Supply-Chain Disruptions Recorded: Feb 16 2017 47 mins
    Dmitri Adler, Chief Data Scientist, Data Society
    If a volcano erupts in Iceland, why is Hong Kong your first supply chain casualty? And how do you figure out the most efficient route for bike share replacements?

    In this presentation, Chief Data Scientist Dmitri Adler will walk you through some of the most successful use cases of supply-chain management, the best practices for evaluating your supply chain, and how you can implement these strategies in your business.
  • Unlock real-time predictive insights from the Internet of Things Recorded: Feb 16 2017 60 mins
    Sam Chandrashekar, Program Manager, Microsoft
    Continuous streams of data are generated in every industry from sensors, IoT devices, business transactions, social media, network devices, clickstream logs, etc. Within these streams of data lie insights that are waiting to be unlocked.

    This session with several live demonstrations will detail the build out of an end-to-end solution for the Internet of Things to transform data into insight, prediction, and action using cloud services. These cloud services enable you to quickly and easily build solutions to unlock insights, predict future trends, and take actions in near real-time.

    Samartha (Sam) Chandrashekar is a Program Manager at Microsoft. He works on cloud services to enable machine learning and advanced analytics on streaming data.
  • Bridging the Data Silos Recorded: Feb 15 2017 48 mins
    Merav Yuravlivker, Chief Executive Officer, Data Society
    If a database is filled automatically, but it's not analyzed, can it make an impact? And how do you combine disparate data sources to give you a real-time look at your environment?

    Chief Executive Officer Merav Yuravlivker discusses how companies are missing out on some of their biggest profits (and how some companies are making billions) by aggregating disparate data sources. You'll learn about data sources available to you, how you can start automating this data collection, and the many insights that are at your fingertips.
  • Strategies for Successful Data Preparation Recorded: Feb 14 2017 33 mins
    Raymond Rashid, Senior Consultant Business Intelligence, Unilytics Corporation
    As data scientists know, data visualizations don't materialize out of thin air, unfortunately. Some of the most vital preparation tactics, and most dangerous moments, occur in the ETL process.

    Join Ray to learn the best strategies that lead to successful ETL and data visualization. He'll cover the following and what it means for visualization:

    1. Data at Different Levels of Detail
    2. Dirty Data
    3. Restartability
    4. Processing Considerations
    5. Incremental Loading

    Ray Rashid is a Senior Business Intelligence Consultant at Unilytics, specializing in ETL, data warehousing, data optimization, and data visualization. He has expertise in the financial, manufacturing and pharmaceutical industries.
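As one illustration of points 3 and 5 in the list above (an assumed sketch, not material from the webinar itself), incremental loading is commonly implemented with a high-water mark: each run extracts only rows newer than the last committed watermark, and the watermark advances only after the load succeeds, which also gives a simple form of restartability.

```python
# Minimal watermark-based incremental load (a common ETL pattern; the
# data and table names are hypothetical). The watermark is advanced only
# after the load step completes, so a failed run can simply be retried.
state = {"watermark": 0}   # persisted between runs in a real pipeline
target = []                # stands in for the warehouse table

def run_incremental_load(source_rows):
    wm = state["watermark"]
    batch = [r for r in source_rows if r["updated_at"] > wm]
    if not batch:
        return 0
    target.extend(batch)                                      # load step
    state["watermark"] = max(r["updated_at"] for r in batch)  # commit mark
    return len(batch)

source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
run_incremental_load(source)   # first run loads both rows
source.append({"id": 3, "updated_at": 30})
run_incremental_load(source)   # second run picks up only the new row
print(len(target), state["watermark"])
```

A real pipeline would persist the watermark transactionally with the load, so a crash between the two steps cannot skip or duplicate rows.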
  • HPE ALM Standardization as a Precursor for Data Warehousing Recorded: Feb 9 2017 59 mins
    Tuomas Leppilampi, Assure
    Agenda:
    Data warehousing at a glance
    Wild West vs Enterprise HPE ALM Template
    Planning and configuring the template
    Customer use case: Standardization project walkthrough
    How to maintain a standardized environment
    Next steps with HPE ALM
  • Building Enterprise Scale Solutions for Healthcare with Modern Data Architecture Recorded: Nov 10 2016 47 mins
    Ramu Kalvakuntla, Sr. Principal, Big Data Practice, Clarity Solution Group
    We are all aware of the challenges enterprises face with growing data and siloed data stores. Businesses cannot make reliable decisions with untrusted data, and on top of that, they don't have access to all the data within and outside their enterprise to stay ahead of the competition and make key decisions for their business.

    This session will take a deep dive into the challenges healthcare businesses face today, as well as how to build a Modern Data Architecture using emerging technologies such as Hadoop, Spark, NoSQL datastores, and MPP data stores, along with scalable, cost-effective cloud solutions such as AWS, Azure, and BigStep.
  • Data at the corner of SAP and AWS Recorded: Nov 9 2016 48 mins
    Frank Stienhans, CTO, Ocean9
    Past infrastructures provided compute, storage and network enabling static enterprise deployments which changed every few years. This talk will analyze the consequences of a world where production SAP and Spark clusters including data can be provisioned in minutes with the push of a button.

    What does it mean for the IT architecture of an enterprise? How to stay in control in a super agile world?
  • 3 Critical Data Preparation Mistakes and How to Avoid Them Recorded: Oct 20 2016 32 mins
    Mark Vivien, Business Development, Big Data
    Whether you're just starting out or a seasoned solution architect, developer, or data scientist, there are key mistakes that you've probably made in the past, may be making now, or could make in the future. In fact, these same mistakes are likely impacting your company's overall success with its analytics program.

    Join us for our upcoming webinar, 3 Critical Data Preparation Mistakes and How to Avoid Them, as we discuss three of the most critical, fundamental pitfalls and more!

    • Importance of early and effective business partner engagement
    • Importance of business context to governance
    • Importance of change and learning to your development methodology
  • Practical Data Cleaning Recorded: Oct 13 2016 38 mins
    Lee Baker, CEO, Chi-Squared Innovations
    The basics of data cleaning are remarkably simple, yet few take the time to get organized from the start.

    If you want to get the most out of your data, you're going to need to treat it with respect, and by getting prepared and following a few simple rules your data cleaning processes can be simple, fast and effective.

    The Practical Data Cleaning webinar is a thorough introduction to the basics of data cleaning and takes you through:

    • Data Collection
    • Data Cleaning
    • Data Classification
    • Data Integrity
    • Working Smarter, Not Harder
  • Self-service BI for SAP and HANA – Dream or Reality? Recorded: Sep 14 2016 48 mins
    Swen Conrad, CEO, Ocean9
    Gartner predicts that “analytics will be pervasive … for decisions and actions across the business.” Sounds like analytics nirvana with instant access for any analysis you want to do, in other words self-service BI. Is this dream or reality?

    Join this webinar to find out how clouds like AWS or Azure are moving the industry close to this nirvana today through simple assembly of cloud services combined with the appropriate consumption model of these services.

    We will demonstrate how easy it is to provision your high-end SAP HANA Database right next to your BI Analytics tier.

    Maybe we are closer to this nirvana than you think?
  • The Role of FPGAs in Spark Accelerators Recorded: Aug 29 2016 61 mins
    Shreyas Shah, Principal Data center Architect, Xilinx
    In the cloud computing era, data growth is exponential. Every day, billions of photos are shared and large amounts of new data are created in multiple formats. Within this cloud of data, the relevant data with real monetary value is small. To extract the valuable data, big data analytics frameworks like Spark are used. Spark can run on top of a variety of file systems and databases. To accelerate Spark by 10–1000x, customers are creating solutions like log file accelerators, storage layer accelerators, MLlib (one of the Spark libraries) accelerators, SQL accelerators, etc.

    FPGAs (Field Programmable Gate Arrays) are an ideal fit for these types of accelerators, where the workloads are constantly changing. For example, they can accelerate different algorithms on different data based on end users and the time of day, while keeping the same hardware.

    This webinar will describe the role of FPGAs in Spark accelerators and give Spark accelerator use cases.
  • Using Predictive Analytics to optimize Application operations: Can you dig it? Recorded: Jul 22 2016 23 mins
    Lesley-Anne Wilson, Group Product Rollout & Support Engineer, Digicel Group
    Many studies have examined the benefits of Predictive Analytics for customer engagement, in order to change customer behaviour. However, the less romanticized side is the benefit to IT operations, as it is sometimes difficult to turn the focus from direct revenue-impacting gains to the more indirect revenue gains that can come from optimization and proactive issue resolution.

    I will be speaking, from an application operations engineer's perspective, on the benefits to the business of using Predictive Analytics to optimize applications.
  • Predictive and Prescriptive Power Discovery from Fast, Wide, Deep Big Data Recorded: Jul 22 2016 45 mins
    Kirk Borne, Principal Data Scientist, Booz Allen Hamilton
    I will summarize the stages of analytics maturity that lead an organization from traditional reporting (descriptive analytics: hindsight), through predictive analytics (foresight), and into prescriptive analytics (insight). The benefits of big data (especially high-variety data) will be demonstrated with simple examples that can be applied to significant use cases.

    The goal of data science in this case is to discover predictive power and prescriptive power from your data collections, in order to achieve optimal decisions and outcomes.
  • Live Webinar: Overcoming the Storage Challenges Cassandra and Couchbase Create Recorded: Jun 30 2016 53 mins
    George Crump, Storage Switzerland
    NoSQL databases like Cassandra and Couchbase are quickly becoming key components of the modern IT infrastructure. But this modernization creates new challenges – especially for storage, in the broad sense. In-memory databases perform well when there is enough memory available. However, when data sets get too large and need to access storage, application performance degrades dramatically. Moreover, even if enough memory is available, persistent client requests can bring the servers to their knees.

    Join Storage Switzerland and Plexistor where you will learn:

    1. What Cassandra and Couchbase are
    2. Why organizations are adopting them
    3. What storage challenges they create
    4. How organizations attempt to work around these challenges
    5. How to design a solution to these challenges instead of a workaround
  • Big-Data-as-a-Service: On-Demand Elastic Infrastructure for Hadoop and Spark Recorded: Jun 22 2016 56 mins
    Kris Applegate, Big Data Solution Architect, Dell; Tom Phelan, Chief Architect, BlueData
    Watch this webinar to learn about Big-Data-as-a-Service from experts at Dell and BlueData.

    Enterprises have been using both Big Data and Cloud Computing technologies for years. Until recently, the two have not been combined.

    Now the agility and efficiency benefits of self-service elastic infrastructure are being extended to big data initiatives – whether on-premises or in the public cloud.

    In this webinar, you’ll learn about:

    - The benefits of Big-Data-as-a-Service – including agility, cost-savings, and separation of compute from storage
    - Innovations that enable an on-demand cloud operating model for on-premises Hadoop and Spark deployments
    - The use of container technology to deliver equivalent performance to bare-metal for Big Data workloads
    - Tradeoffs, requirements, and key considerations for Big-Data-as-a-Service in the enterprise
  • The Big Data decision path incorporating SAP landscapes Recorded: Jun 8 2016 49 mins
    Swen Conrad, CEO, Ocean9
    Leading companies derive big data technology choices from business needs instead of technology merits. With the variety of possible use cases, either Hadoop, Spark or SAP HANA may provide the best fit to solve business challenges and create value.

    Sounds easy, but managing a variety of big data solutions within a single company puts a skills and cost premium on the organization.

    This session will guide you to the right big data technology according to business needs and highlights the fastest path to adoption.
  • Case Study in Big Data and Data Science: University of Georgia Recorded: May 11 2016 61 mins
    Shannon Quinn, Assistant Professor at University of Georgia; and Nanda Vijaydev, Director of Solutions Management at BlueData
    Join this webinar to learn how the University of Georgia (UGA) uses Apache Spark and other tools for Big Data analytics and data science research.

    UGA needs to give its students and faculty the ability to do hands-on data analysis, with instant access to their own Spark clusters and other Big Data applications.

    So how do they provide on-demand Big Data infrastructure and applications for a wide range of data science use cases? How do they give their users the flexibility to try different tools without excessive overhead or cost?

    In this webinar, you’ll learn how to:

    - Spin up new Spark and Hadoop clusters within minutes, and quickly upgrade to new versions

    - Make it easy for users to build and tinker with their own end-to-end data science environments

    - Deploy cost-effective, on-premises elastic infrastructure for Big Data analytics and research