Join us for this next segment of “Under the Hood”, which focuses on the Database Designer feature of HPE Vertica.
Learn how the schema designs created by Database Designer provide optimal query performance for your most challenging analytic workloads. Database Designer uses smart strategies to create efficient schema designs that can be deployed, changed and re-deployed by almost anyone, even those without advanced database knowledge.
Earlier this year, the open source community delivered the Stinger Initiative to improve speed, scale, and SQL semantics in Apache Hive. Now Stinger.next is underway to build on those initial successes.
Join this 30-minute webinar with Hortonworks co-founder Alan Gates and Hortonworks Hive product manager Raj Baines to discuss SQL queries in HDP 2.2: ACID transactions and the cost-based optimizer. You will also hear about the road ahead for the Stinger.next initiative.
Owen O’Malley and Carter Shanklin host the second of our seven "Discover HDP 2.1" webinars. They discuss the Stinger Initiative and the improvements to Apache Hive that are included in HDP 2.1: faster queries with Hive on Tez, new SQL semantics, and more.
IBM has taken query tuning to a new level with IBM Data Studio. More detail is available than ever before. However, the tool does take some getting used to, especially for folks who are used to a green-screen-based query tuning experience. This presentation introduces you to IBM Data Studio and gets you started tuning queries.
Analysing big data quickly and efficiently requires a data warehouse optimised to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyse big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to minimise I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas and tune query and database performance.
• Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
• Learn how to migrate from existing data warehouses, optimise schemas and load data efficiently
• Learn best practices for managing workload, tuning your queries and using Amazon Redshift's interleaved sorting features
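The I/O savings behind Redshift’s columnar layout come from reading only the columns a query references rather than every field of every row. As a rough conceptual sketch (plain Python with invented table and column names, not Redshift code):

```python
# Conceptual sketch: a row store must read every field of every row,
# while a column store reads only the columns a query touches.

row_store = [
    {"user_id": 1, "region": "eu", "spend": 120.0, "notes": "..."},
    {"user_id": 2, "region": "us", "spend": 80.0,  "notes": "..."},
    {"user_id": 3, "region": "eu", "spend": 200.0, "notes": "..."},
]

# The same data laid out column by column.
col_store = {
    "user_id": [1, 2, 3],
    "region":  ["eu", "us", "eu"],
    "spend":   [120.0, 80.0, 200.0],
    "notes":   ["...", "...", "..."],
}

# SELECT sum(spend) WHERE region = 'eu' -- touches 2 of the 4 columns.
total = sum(s for r, s in zip(col_store["region"], col_store["spend"])
            if r == "eu")
print(total)  # 320.0
```

Redshift combines this layout with sort keys and parallel scans across slices, which is what the schema-design and tuning techniques in the session build on.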
Who Should Attend:
• Data Warehouse Developers, Big Data Architects, BI Managers and Data Engineers
Other sessions on the AWS Big Data Webinar Day - 28 September:
10:00 - 11:00 GMT | Getting Started with Big Data on AWS
Register » https://www.brighttalk.com/webcast/9019/221047?utm_campaign=Brighttalk
11:15 - 12:15 GMT | Architectural Patterns for Big Data on AWS
Register » https://www.brighttalk.com/webcast/9019/221063?utm_campaign=Brighttalk
12:30 - 13:30 GMT | Building Big Data Solutions with Amazon EMR and Amazon Redshift
Register » https://www.brighttalk.com/webcast/9019/221145?utm_campaign=Brighttalk
Part three in a five-part series, this webcast will be a demonstration of the integration of Hortonworks HDB and Apache Hadoop YARN. YARN provides the global resource management for HDB for cluster-level hardware efficiency, while the in-database resource queues and operators provide the database- and query-level resource management for workload prioritization and query optimization. This webinar will focus on demonstrating the installation process, as well as discussing the various YARN and HDB parameters and best-practice settings.
Turbo-Charge BI on Hadoop: The Time is Now
Want to turn your Hadoop cluster into a super-powerful, analytics data warehouse? Need to run BI queries on Hadoop at top speed?
Watch this recording of a live best practice session. You'll see how leading companies are super-charging their BI on Hadoop by combining the power of Tableau with the scale of Impala, and accelerating it all with AtScale. In this session, leaders from Cloudera and Tableau share a real-world perspective on:
• How to get super-fast performance from BI queries on Hadoop
• How to deliver powerful self-service visualization directly on Hadoop
• How to leverage existing BI and Hadoop investments to deliver more value to more users
We will show you how Kudu makes it easier for you to perform both real-time monitoring and ad hoc analytic queries on the same set of data.
GraphFrames bring the power of Apache Spark DataFrames to interactive analytics on graphs.
Expressive motif queries simplify pattern search in graphs, and DataFrame integration allows seamlessly mixing graph queries with Spark SQL and ML. By leveraging Catalyst and Tungsten, GraphFrames provide scalability and performance. Uniform language APIs expose the full functionality of GraphX to Java and Python users for the first time.
In this talk, the developers of the GraphFrames package will give an overview, a live demo, and a discussion of design decisions and future plans. This talk will be generally accessible, covering major improvements from GraphX and providing resources for getting started. A running example of analyzing flight delays will be used to explain the range of GraphFrame functionality: simple SQL and graph queries, motif finding, and powerful graph algorithms.
For experts, this talk will also include a few technical details on design decisions, the current implementation, and ongoing work on speed and performance optimizations.
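A motif query such as "(a)-[e]->(b); (b)-[e2]->(a)" asks for pairs of vertices connected in both directions. The idea can be illustrated in a few lines of plain Python on a toy edge list (GraphFrames evaluates the same pattern at scale as a distributed DataFrame join):

```python
# Toy illustration of motif finding: locate mutual edges,
# i.e. the pattern (a)-[e]->(b); (b)-[e2]->(a).
edges = [("alice", "bob"), ("bob", "alice"),
         ("bob", "carol"), ("carol", "dave"), ("dave", "carol")]

edge_set = set(edges)
mutual = sorted({tuple(sorted((a, b)))     # dedupe (a,b) vs (b,a)
                 for (a, b) in edges
                 if (b, a) in edge_set})   # reverse edge must exist
print(mutual)  # [('alice', 'bob'), ('carol', 'dave')]
```

In GraphFrames the result of such a pattern comes back as a DataFrame, which is what allows the seamless mixing with Spark SQL described above.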
Learn How to Store and Query Time Series Data in NoSQL and Other Use Cases
Zoomdata, developers of the world’s fastest big data exploration, visualization & analytics platform, lets business users see and interact with data in all new ways.
Designed mobile- and touch-first, its patented micro-query architecture delivers results on billions of records in seconds and gives users a single plane of access for bridging old data and new data.
Zoomdata is backed by Accel Partners, B7, Columbus Nova Technology Partners, NEA and Razors Edge Ventures.
What you will learn in this webinar:
-Learn how Big Data is not just about Hadoop, but the wide range of new and existing frameworks inside and outside your enterprise.
-Learn how Zoomdata can query across multiple data sources to bring a single view of data across disparate data sources.
-See how business users can combine multiple sources without waiting for a data architect to set it up.
-See how the power of Apache Spark enables Zoomdata Fusion at Big Data scale.
-Learn how to access Zoomdata Fusion and more cutting-edge features in the Zoomdata Early Access Program.
Join DDN experts to see how organizations are leveraging developments in storage infrastructure to extract the greatest possible value from their data. Material covered will include general architectural concepts on building storage infrastructure for big data analytics, as well as a detailed discussion of real world applications and benchmarking results with SAS and Vertica platforms. Specifics on the impact to data ingest speed, query performance, flexibility, ease of management and overall scalability will also be covered.
This Tech Talk continues the "deep dive" on all the new IBM DB2 10 and IBM InfoSphere Warehouse 10 features. Matthias Nicola from IBM labs explains the new Time Travel Query feature, which is a collection of bitemporal data management capabilities. These capabilities include temporal tables, temporal queries and updates, temporal constraints, and other functionality to manage data as of past or future points in time. Time Travel Query helps improve data consistency and quality across the enterprise and provides a cost-effective means to address auditing and compliance issues. As a result, organizations can reduce their risk of noncompliance and achieve greater business accuracy.
The presentation will discuss:
· How to create and manage temporal tables in DB2 10
· How to insert, update, delete, and query data for different points in the past, present, or future
· How to use DB2 as a time machine
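The "time machine" idea can be sketched outside DB2: keep every version of a row together with the system-time interval during which it was current, and answer point-in-time queries by filtering on that interval. A minimal conceptual illustration in plain Python (invented data; DB2 10 maintains these intervals automatically and expresses the query declaratively with a SYSTEM_TIME period clause):

```python
from datetime import date

# Each version of a row carries the interval during which it was current.
policy_versions = [
    {"id": 1, "premium": 100,
     "sys_start": date(2011, 1, 1), "sys_end": date(2011, 7, 1)},
    {"id": 1, "premium": 120,
     "sys_start": date(2011, 7, 1), "sys_end": date(9999, 12, 31)},
]

def as_of(rows, point):
    """Return the row versions that were current at `point`
    (the conceptual equivalent of a temporal AS OF query)."""
    return [r for r in rows if r["sys_start"] <= point < r["sys_end"]]

print(as_of(policy_versions, date(2011, 3, 15))[0]["premium"])  # 100
print(as_of(policy_versions, date(2012, 1, 1))[0]["premium"])   # 120
```

The same interval-filtering logic is what makes temporal updates, constraints, and audits tractable: every past state remains queryable.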
Please note that this webcast is conducted at 12:30 PM ET; the listed time may be translated into your local time zone.
Join us for this segment of “Under the Hood” of HPE’s Big Data Platform to learn about preaggregating data to accelerate popular queries in the Vertica SQL database.
While HPE’s Vertica database can aggregate billions of rows per second, sometimes there's no substitute for having the answers to common queries precomputed and "ready to go". Learn about Live Aggregate Projections, how they are implemented, and what functionality is supported in the new "Excavator" release, so you can take full advantage of them in dashboards, reports, and other "serve" use cases that demand subsecond response times.
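The core idea behind pre-aggregation is that the aggregate is maintained incrementally as rows are loaded, so a query reads a precomputed answer instead of scanning and grouping the raw rows. A minimal sketch of that mechanism in plain Python (invented names; Vertica implements this inside the storage layer, not in application code):

```python
from collections import defaultdict

class LiveAggregateSketch:
    """Maintains SUM(amount) GROUP BY key as rows arrive,
    so a 'query' is a lookup rather than a scan."""
    def __init__(self):
        self.sums = defaultdict(float)

    def insert(self, key, amount):
        self.sums[key] += amount   # aggregate updated at load time

    def query(self, key):
        return self.sums[key]      # precomputed answer, no scan

agg = LiveAggregateSketch()
for key, amount in [("us", 10.0), ("eu", 5.0), ("us", 2.5)]:
    agg.insert(key, amount)
print(agg.query("us"))  # 12.5
```

The trade-off is extra work and storage at load time in exchange for subsecond reads, which is why this pattern suits the dashboard and reporting use cases mentioned above.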
The world of commercial banking moves swiftly. B2B clients have complex needs and offer great opportunity for banks that can move fast, resolve queries quickly and provide a premium service. If relationship managers aren’t anticipating and responding to their clients’ every need, then business can easily be taken elsewhere. However, with hundreds of clients to manage at once, it is often impossible to keep them all happy.
For one of the largest commercial banks in the UK, Tableau provided the perfect solution to create client dashboards to help relationship managers, product partners, and service and operational staff to all easily access and take action on client feedback, review product opportunities and keep up to date with client industry news.
Learn to Store and Query Time Series Data in NoSQL and Other Use Cases
Part two in a five-part series, this webcast will be a demonstration of Pivotal Extension Framework (PXF), an extensible framework that allows Hortonworks HDB to query external system data. This is useful both for loading data and for querying data in place when it doesn’t need to reside within the database instance. PXF includes built-in connectors for accessing data inside HDFS files, Hive tables via Catalog, and HBase tables.
Chief Technologist Ruhollah Farchtchi gives a presentation at Spark Summit East 2016 on the interactive visualization of streaming data powered by Spark.
Much of the discussion on real-time data today focuses on the machine processing of that data. But helping humans visualize real-time streams is just as important. Visualizing real-time data introduces new UX and usability challenges for any developer embedding analytics into applications, especially when the target end users are business users and not data scientists. Self-service, interactive, subsecond response time to ad hoc queries — these are the new UX requirements for any enterprise visualizing real-time data. Streaming data also lends itself to new paradigms of interaction with the stream itself, like being able to pause, rewind and replay a stream.
This talk is a case study in how and why Zoomdata built a “Data DVR” capability using Spark and Spark Streaming. We will describe the required user experience, the overall architecture and the specific use of Spark and Spark Streaming. We will describe the design considerations that led us to choose Spark Streaming over alternatives like Storm. We will show how end users configure the real-time increment and a historical retention window without writing any code themselves. We will also show how pause, rewind, replay is implemented in Spark and how the solution supports both real-time and historical analysis in the same architecture.
Attendees will walk away with knowledge of Spark Streaming and how users can interactively work with streaming data. They will develop familiarity with the challenges of a lambda architecture and providing a consistent analytic experience over streaming and historical data.
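The pause/rewind/replay behavior described above can be sketched with a bounded buffer over the stream: events keep arriving into a fixed-size retention window, and playback re-reads from that window. A simplified stand-in in plain Python (a conceptual toy, not Zoomdata's or Spark Streaming's actual implementation):

```python
from collections import deque

class DataDVRSketch:
    """Toy pause/rewind/replay buffer over a stream. Only the last
    `retention` events are kept, like a DVR's retention window."""
    def __init__(self, retention=100):
        self.buffer = deque(maxlen=retention)
        self.paused = False

    def ingest(self, event):
        # Arriving data is always buffered, even while playback is paused.
        self.buffer.append(event)

    def pause(self):
        self.paused = True

    def replay(self, last_n):
        """Re-deliver the most recent `last_n` buffered events."""
        return list(self.buffer)[-last_n:]

dvr = DataDVRSketch(retention=3)
for e in range(5):        # events 0..4; retention keeps only 2, 3, 4
    dvr.ingest(e)
print(dvr.replay(2))      # [3, 4]
```

In a lambda-style system the same retention window bridges the real-time increment and historical storage, which is the consistency challenge the talk addresses.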
In this webcast, Patrick Wendell from Databricks will be speaking about Apache Spark's new 1.6 release.
Spark 1.6 will include (but is not limited to) a type-safe API called Dataset, built on top of DataFrames, that leverages all the work in Project Tungsten for more robust and efficient execution (including memory management, code generation, and query optimization) [SPARK-9999]; adaptive query execution [SPARK-9850]; and unified memory management that consolidates cache and execution memory [SPARK-10000].