NoSQL, Hadoop and MapReduce: Building a Modern Data Infrastructure that Works

Jeffrey Kelly, Wikibon; Joey Jablonski, Kitenga; Christopher Biow, 10gen; Ron Bodkin, Think Big Analytics; John Akred, SVDS
In a whirlwind of big data tools like MapReduce, NoSQL, Hadoop, and their many cousins, it’s difficult to understand the stack you need to make your data as useful as possible. How do you decide which tools to use, and once you do, how do you make the jump?

Join this roundtable led by big data infrastructure experts to:
*Understand the ingredients of a modern data infrastructure
*Learn how to assess your needs
*Make a blueprint for building a modern data architecture that works for you
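The MapReduce model named in the title can be sketched in a few lines of plain Python. This toy word count (illustrative data only, not from the session) shows the map and reduce phases the roundtable refers to:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Map step: emit a (word, 1) pair for every word in a document.
    return [(word, 1) for word in doc.lower().split()]

def reduce_phase(pairs):
    # Reduce step: sum the counts for each key (word).
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["hadoop stores data", "mapreduce processes data"]
word_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
# word_counts -> {'hadoop': 1, 'stores': 1, 'data': 2, 'mapreduce': 1, 'processes': 1}
```

In a real Hadoop cluster, the framework runs the map step on each node where the data lives and shuffles the pairs by key before the reduce step; the program logic stays this simple.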
Aug 21 2013
60 mins
hadoop mapreduce nosql
More from this community:

Business Intelligence and Analytics

  • After building an initial prototype of the application for a limited preview, it is time for the team to harden the architecture, making it more robust and fault tolerant before the official launch to the public.

    This chapter covers AWS infrastructure concepts such as regions and Availability Zones, and explains how to use these features to increase the application's fault tolerance.

    Services and features covered
    •Key infrastructure concepts (regions and Availability Zones)
    •Elastic Load Balancing
    •Amazon RDS

    Demonstration
    •Creating an AMI from a running instance
    •Creating and configuring an Elastic Load Balancer
    •Multiple Availability Zones with Amazon RDS
    •Alarms with Amazon CloudWatch
  • Having successfully expanded data center capacity to Amazon Web Services for development and test environments, the IT team faces a new capacity challenge: how to store the ever-growing amount of data generated by business applications while keeping costs down. They also face the challenge of keeping proper backups of that data.

    This chapter addresses both issues with services such as Amazon S3 and Amazon Glacier.

    Demonstration:

    •AWS Storage Gateway
    •From Amazon S3 to Amazon Glacier

    Services and features covered:
    •Amazon S3
    •Amazon Glacier
    •AWS Storage Gateway
    •AWS Import / Export
  • Join backup and recovery experts to find out how to build your backup and recovery requirements checklist. By the end of this session, you’ll learn how you can:

    -Cut storage requirements by up to 80%
    -Reduce storage costs and avoid performance hits to your network.
    -Leverage near-instant recovery technology for protected virtual machines or servers.
    -Automate application-aware backups and testing for data corruption.
  • This is the first episode in a series of webinars illustrating the different ways agile development teams use AWS. Each episode follows a startup opening a new line of business, showing the advantages of using AWS. The startup may be a brand-new venture or an innovation center inside an existing company, for example one created to launch a new product.

    In this episode we describe the main advantages of AWS for startups and agile IT teams, focusing on how the team rapidly built a working prototype using the platform's various services.
  • Join AWS for this Building Scalable Web Applications webinar where we will explain the key architectural patterns used to build applications in the AWS cloud, and how to leverage cloud fundamentals to build highly available, cost effective web-scale applications.

    You will also learn how to design for elasticity and availability within AWS using a common web architecture as a reference point and discuss strategies for scaling, security, application management and global reach. If you want to know how to make your applications truly scale then join this webinar to learn more.

    Reasons to attend:

    • Understand the architectural properties of powerful, scalable and highly available applications in the Amazon cloud
    • Learn about Amazon regions and services that operate within them that enable you to leverage cloud scaling
    • Discover how to manage data with services like Amazon S3, Amazon DynamoDB and Amazon Elastic MapReduce to remove constraints from your applications as you achieve web-scale data volumes
    • Hear about customer case studies and real-world examples of scaling from a handful of resources to many thousands in response to customer demand

    Who should attend?

    • Developers, operations, engineers and IT architects who want to learn how to get the best from their applications in AWS
Round table sponsored by: 10gen | The MongoDB Company
  • How to Design Your Digital Ecosystem? Oct 8 2014 10:00 am UTC 45 mins
    Digitization of products and services is changing industry supply chains, markets and jobs. How businesses and the providers of IT services and systems respond increasingly depends on which digital ecosystems they join and which roles they play. How do you plan and optimize your digital options? Your brand, market strategy and operating performance are all affected by digital channels. Existing competitors may be moving to new digital models, while new entrants in products and services, together with the IT enablers of mobility, connected sensors, social, cloud and big data, may be creating step-change performance.

    The explosion of data and the drive for ever more customer-driven experience and choice are creating an increasingly price-competitive digital footprint to be mindful of.
    Can digital ecosystems be “designed” if many of the actors, online social behaviors and large cloud platforms and networks appear to drive their own agendas?

    What characteristics are important to establishing a competitive product and service in a digitalized marketplace and industry?
    How do you balance investing in your own digital footprint against partnering with, or consuming, often co-competing digital platform offerings?

    This session will explore these themes:
    · Defining your digital ecosystems

    · How to design digital workspaces in your digital ecosystem

    · Managing change in your organization to align with and exploit your digital ecosystem presence

    · The liminality of these digital ecosystems

    See Mark's forthcoming book, “Digital Ecosystems”. For announcements and more details, see his blog: http://www.markskilton.com
  • Cognitive Computing: What Is It and Why Should You Care May 20 2014 3:00 pm UTC 45 mins
    Cognitive computing is a way of processing data that is neither linear nor deterministic. It uses the ideas behind neuroscience and psychology to augment human reasoning with better pattern matching while determining the optimal information a person needs to make decisions. Cognitive computing is different from other forms of software. Instead of shepherding data through pre-determined pathways, it finds the previously unknown paths and patterns through the data. This is ultimately a more scalable model than relying on experts to synthesize data, since there are too few experts of any sort available at any one time. Cognitive computing doesn’t try to fit data into an existing model; it looks at the data and figures out what the model is first.

    This webinar provides an overview of this new field of computing. It describes what it is, why it has value, and how it applies to the current business climate.
  • Efficiency from Data among Diverse Lines of Business in the Public Sector May 15 2014 7:00 pm UTC 45 mins
    Large organizations manage many lines of business that can lead to difficulty in efficiently allocating resources. Governments often maintain extensive amounts of data but seldom leverage the value in the data to improve operational efficiency. Discover how the Office of the Utah State Auditor now leverages data resources across thousands of local governments, dozens of state agencies and universities to uncover fraud, waste, and the inefficient allocation of scarce resources.
  • Moving Healthcare Data to the Cloud: Opportunities and Challenges May 15 2014 5:00 pm UTC 45 mins
    Cloud based services are gaining traction in the healthcare community. The combination of reliability, cost effectiveness and security has prompted many healthcare organizations to move data to the cloud for storage and analysis. New cloud opportunities are arising in promoting collaboration among care givers and in analyzing large data sets (eg genomics). This talk will review the possibilities and challenges of moving healthcare data and analysis to the cloud.
  • BI's Big Lie: The Distinction Between Visualization and Analysis May 14 2014 5:00 pm UTC 45 mins
    Business intelligence tools were born to query and report. But now analysts and business users don't just want dashboards, they want to dive deep into ad hoc analyses, to explore dozens of hypotheses in minutes. The BI industry is responding by tacking on better visualization and calling it analysis.

    But visualization and analysis are very different. If they weren't, why do most analysts prefer to query data with BI tools, then do their actual analysis in Excel (or statistical tools)? Or why do Tableau's help documents literally suggest you pull out a calculator if you'd like to run a correlation?

    For many companies, this misunderstood distinction is the final barrier to reaching the promised land of data for all. We'll explore the distinction, as well as the growing divide between exploratory analysis tools and predictive analysis tools. We'll also talk about the reasons that cloud-based analysis tools will leave the rest further and further behind.
  • Considerations for Ramping to a Big Data Network Monitoring Architecture, Part 2 May 7 2014 5:00 pm UTC 60 mins
    This is a continuation of our 2-part series on Big Data Visibility with Network Packet Brokers (NPBs).

    Big data techniques and technologies can be powerful tools for scaling network monitoring and forensics. They can also facilitate new use cases for network data, potentially beyond the scope of Operations.

    Gordon Beith, Director of Product Management at VSS Monitoring, will discuss practical considerations for migrating to a Big Data Visibility Architecture, including:
    • Accommodating network volume, velocity and variety using sophisticated hardware preprocessing and APIs
    • Metadata versus flow statistics versus full packet capture – considerations and use cases for each
    • Open versus proprietary formats for storage
    • Pros and cons of integrated capture/storage/analysis solutions versus separate capture/ storage solutions coupled with virtualized analysis probes
    • Addressing retrieval in an “open” forensics model
    • Leveraging a distributed computing framework for processing large-scale data stores
  • Leveraging a Big Data Model in the IT domain, Part 1 Apr 30 2014 5:00 pm UTC 60 mins
    This is part 1 of our 2-part series on Big Data Visibility with Network Packet Brokers (NPBs).

    Even as network data has exploded in volume, velocity and variety, network monitoring solutions have been behind the curve in adopting new technologies and approaches to cost-effectively scale and accommodate a widening virtualization trend. Customers are demanding greater freedom in how applications are deployed and are moving to a consolidated, shared model of data using big data frameworks, such as Hadoop, which enable large-scale processing and retrieval for multiple stakeholders.

    Join Andrew R. Harding, VP of Product Line Management at VSS Monitoring, as he discusses:
    - Big data and its implications for network monitoring and forensics
    - Why network monitoring solutions are lagging from a virtualization standpoint and why this is a problem for network owners
    - How certain traditional network monitoring functions will eventually be offloaded to adjacent technologies
    - How Network Packet Brokers can accelerate the adoption of virtualized probes, “open” storage, and big data technologies within network management / monitoring
    - How a Big Data Visibility architecture can enable network data to become part of the “big data store,” allowing it to integrate with the rest of enterprise data
  • How Codenomicon Discovered Heartbleed: Solutions for Protecting Your Organization Apr 24 2014 4:00 pm UTC 60 mins
    Presented by the experts with the facts.

    The Inside Story of the Discovery, the Timeline and Solutions to Protect Your Organization. Finally, All of Your Questions Answered.
  • Big Data In The Age Of The Customer Recorded: Apr 17 2014 41 mins
    Big data has received a lot of media attention in the last two years, but the noise is drowning out the real opportunity: using more of the available data to help your business win, serve and retain increasingly powerful customers. This webinar will present Forrester's point of view on why most firms are missing the true nature of big data, and illustrate how leaders are using data that is more diverse, messy, and large to make better competitive decisions.
  • Turn Big Social Data Into Little Actionable Data Recorded: Apr 16 2014 21 mins
    Chances are that Justin Bieber is not the top influencer for your business, but chances are also good that he has more likes and followers than the top influencer for your business. Every day your influencers are generating signals in social media for your business and it's up to you to identify, interpret and leverage these signals. With billions of posts every day, social data is big data and it is continuing to grow.

    Join us for our live webinar and you will learn how to:

    -Design and identify key influencer indicators in social data
    -Create a unique influencer model for your business
    -Identify your business's top influencers
    -Leverage your influencers for marketing, brand advocacy, and more
  • Beware the Giraffes in your Data! Recorded: Apr 16 2014 35 mins
    Savvy marketers spend a lot of their time analyzing big data, on the lookout for exciting new insights which can translate into action items and strategic advantage. Unfortunately, “giraffes” in their data – portions of data which dominate the rest of it – often hide important insights and lead to erroneous strategic decision making. In this webinar, we will discuss how to spot giraffes in your data and how to make sure they’re not misleading you.
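The “giraffe” idea (a few segments dominating the totals) can be made concrete with a minimal sketch; the 50% threshold and the campaign figures below are illustrative assumptions, not from the webinar:

```python
def find_giraffes(category_totals, threshold=0.5):
    """Return categories whose share of the grand total exceeds `threshold`.

    Such dominant "giraffes" can mask trends in the remaining data,
    so analysts often examine them separately before aggregating.
    """
    total = sum(category_totals.values())
    return [c for c, v in category_totals.items() if v / total > threshold]

revenue_by_campaign = {"brand": 900, "search": 60, "social": 40}
giraffes = find_giraffes(revenue_by_campaign)
# giraffes -> ['brand']  (900 of 1000 total dwarfs the other campaigns)
```

A sensible follow-up is to re-run any trend analysis with the giraffes excluded, so the smaller segments become visible.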
  • Hadoop for Big Data Storage and Processing in the Enterprise Recorded: Apr 15 2014 46 mins
    A modern Hadoop-based data platform is a combination of multiple source projects brought together, tested, and integrated to create an enterprise-grade platform. In this session, we will review the Hadoop projects required in a Windows Hadoop platform and drill down into Hadoop integration and implementation with Windows, Microsoft Azure, and SQL Server.
  • Building a Hadoop Data Warehouse with Impala Recorded: Apr 14 2014 38 mins
    Impala raises the bar for SQL query performance on Apache Hadoop. With Impala, you can query Hadoop data – including SELECT, JOIN, and aggregate functions – in real time to do BI-style analysis. As a result, Impala makes a Hadoop-based enterprise data hub function like an enterprise data warehouse for native Big Data.
    In this webinar featuring Impala architect Marcel Kornacker, you will explore:

    * How Impala's architecture supports query speed over Hadoop data that not only convincingly exceeds that of Hive, but also that of a proprietary analytic DBMS over its own native columnar format.
    * The current state of, and roadmap for, Impala's analytic SQL functionality.
    * An example configuration and benchmark suite that demonstrate how Impala offers a high level of performance, functionality, and ability to handle a multi-user workload, while retaining Hadoop’s traditional strengths of flexibility and ease of scaling.
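As a rough illustration of the BI-style SQL the webinar describes, here is a SELECT/JOIN/aggregate query. The sqlite3 module stands in for Impala purely to keep the example self-contained, so only the query pattern, not the engine or the schema, reflects the webinar:

```python
import sqlite3

# In-memory database with a toy fact table (orders) and dimension (regions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region_id INTEGER, amount REAL);
    CREATE TABLE regions (id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
    INSERT INTO regions VALUES (1, 'east'), (2, 'west');
""")

# The kind of SELECT/JOIN/aggregate query Impala runs directly on Hadoop data.
rows = conn.execute("""
    SELECT r.name, SUM(o.amount) AS total
    FROM orders o JOIN regions r ON o.region_id = r.id
    GROUP BY r.name ORDER BY total DESC
""").fetchall()
# rows -> [('east', 150.0), ('west', 75.0)]
```

Against Impala, the same statement would run over tables stored in HDFS, with no intermediate MapReduce job.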
  • Building a Hadoop Data Warehouse: Hadoop 101 for EDW Professionals Recorded: Apr 10 2014 67 mins
    Dr. Ralph Kimball describes how Apache Hadoop complements and integrates effectively with the existing enterprise data warehouse. The Hadoop environment's revolutionary architectural advantages open the door to more data and more kinds of data than are possible to analyze with conventional RDBMSs, and additionally offer a whole series of new forms of integrated analysis.

    Dr. Kimball explains how Hadoop can be both:

    - A destination data warehouse, and
    - An efficient staging and ETL source for an existing data warehouse

    You will also learn how enterprise conformed dimensions can be used as the basis for integrating Hadoop and conventional data warehouses.
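A minimal sketch of what integration via a conformed dimension means in practice: both systems share the same dimension keys, so facts from each side can be combined row by row. The table names and data here are hypothetical, and plain Python stands in for the Hadoop and warehouse engines:

```python
# The conformed dimension: the same customer keys and names on both sides.
customer_dim = {101: "acme", 102: "globex"}

warehouse_sales = [(101, 500.0), (102, 200.0)]   # facts from the EDW
hadoop_clicks = [(101, 42), (101, 8), (102, 5)]  # facts staged in Hadoop

# Aggregate the Hadoop-side facts to the same grain (one row per customer).
clicks_by_customer = {}
for cust_id, clicks in hadoop_clicks:
    clicks_by_customer[cust_id] = clicks_by_customer.get(cust_id, 0) + clicks

# Combine: one row per conformed customer, with a fact from each source.
report = [(customer_dim[cid], sales, clicks_by_customer.get(cid, 0))
          for cid, sales in warehouse_sales]
# report -> [('acme', 500.0, 50), ('globex', 200.0, 5)]
```

Without the shared keys and attribute definitions of a conformed dimension, the two fact sources could not be aligned this directly.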
  • Is your Data Center Ready for Big Data? Recorded: Apr 10 2014 45 mins
    The C-suite in every organization is obsessed with the buzz around Big Data, and according to industry pundits almost 9 out of 10 organizations have included this growing trend in their IT plans for 2014. But when it comes to execution and extracting true business value from Big Data, only a fraction of companies are successful. Infrastructure platforms play a key role in delivering blazing-fast analytics: it makes all the difference whether a query is answered in 3 seconds or 3 hours!

    Welcome to the new style of IT and a paradigm shift toward converged infrastructure, or what IDC calls the “3rd platform,” where you are no longer bound by the limitations of your traditional data center. Instead of plumbing or retrofitting your existing landscape, you now have proven alternatives: augment your legacy environment with innovative, purpose-built platforms that are seamlessly integrated and can be deployed in days rather than months. Learn from the best practices of customers who have already embarked on this journey and paved the way for handling Big Data!
  • How to get started on Big Data Analytics with Google BigQuery Recorded: Apr 10 2014 44 mins
    Google BigQuery is an analytics service offered in the cloud as part of the Google Cloud Platform portfolio. In this webinar we will explore how customers are using BigQuery to jumpstart their Big Data analytics initiatives. We'll also provide details on how you can get started with BigQuery.
  • Help Yourself to Big Data: The Keys to Using Self-Service BI Recorded: Apr 10 2014 42 mins
    Embracing big data allows you to improve business performance. During this webinar, we will discuss ways to use newer concepts, capabilities and technologies available to empower business users with business intelligence (BI) and analytics on big data -- all based upon field tested deployments and customer-proven success. Among the questions we will answer:

    - How do you change the relationship between users and big data, giving them the means to embrace it and work with it interactively, in near-instantaneous fashion?

    - How can you enable users to apply more sophisticated BI (i.e. advanced analytics), even on vast amounts of data?

    - How do you reconcile and enable various levels of analytics maturity between users, from simple executive-ready dashboards, all the way to sophisticated programmers and data scientists?

    Join us to learn how to address volume, variety, velocity of big data, without compromising speed to results, or sophistication of analytics.
  • Rethink Analytics with an Enterprise Data Hub Recorded: Apr 10 2014 57 mins
    Cloudera's Director of Data Science Josh Wills and Senior Manager of Solutions Marketing Sandy Lii explain how advanced analytics with an enterprise data hub will allow you to use all of your data, do more with your data, and deliver insights sooner. This breaks down the barriers caused by increasingly high data storage and processing costs, siloed data sources, complex management and security, and a lack of analytics agility.
  • Leveraging Social Data For More Informed Business Decisions Recorded: Apr 10 2014 45 mins
    Social data is exploding and companies of all sizes are struggling to understand how to make sense of the data. Even the largest and most sophisticated brands are just scratching the surface of social business potential.

    DataSift is powering the next generation of social business for leading brands, social technology companies and agencies around the globe. We are transforming how organizations convert social data into key insights that drive better decisions for brand leaders and market disrupting companies.

    Join Chris Parsons, Product Marketing Manager at DataSift, for a live webinar on Thursday, April 10, where he will share how to unlock the potential of social data, including:

    *Making social make sense in the real world – what the core tenets of a successful social strategy are and where you should start

    *How to move from data to insight: A closer look at common use cases and brand best practices

    *How to get started and get to fast impact: Actionable steps and best practices to unlock social insight and value for your business
  • Solve for Ambiguity: Using Design to Create Value from Data Recorded: Apr 9 2014 43 mins
    Regardless of whether you call it "business intelligence", "big data", "analytics" or just plain old "math", we have many tried and true techniques for dealing with uncertainty. But ambiguity is a separate matter and, at least in my experience, is the hardest part of creating value from data.

    During this talk, I will illustrate how the design process can be used to solve ambiguous problems by drawing on projects we've done at Datascope.
Managing and analyzing data to inform business decisions
Data is the foundation of any organization, so it is paramount that it be managed and maintained as a valuable resource.

Subscribe to this channel to learn best practices and emerging trends in a variety of topics including data governance, analysis, quality management, warehousing, business intelligence, ERP, CRM, big data and more.
  • Title: NoSQL, Hadoop and MapReduce: Building a Modern Data Infrastructure that Works
  • Live at: Aug 21 2013 6:00 pm
  • Presented by: Jeffrey Kelly, Wikibon; Joey Jablonski, Kitenga; Christopher Biow, 10gen; Ron Bodkin, Think Big Analytics; John Akred, SVDS