Data lakes are centralized data repositories. Data needed by data scientists is physically copied to a data lake, which serves as a single storage environment. This way, data scientists can access all the data from a single entry point – a one-stop shop for the right data. However, such an approach is not always feasible for all the data, and it limits its use to data scientists alone, making it a single-purpose system.
So, what’s the solution?
A multi-purpose data lake allows a broader and deeper use of the data lake without minimizing the potential value for data science and without making it an inflexible environment.
Attend this session to learn:
• Disadvantages and limitations that are weakening or even killing the potential benefits of a data lake.
• Why a multi-purpose data lake is essential in building a universal data delivery system.
• How to build a logical multi-purpose data lake using data virtualization.
Do not miss this opportunity to make your data lake project successful and beneficial.
How do you avoid your enterprise data lake turning into a so-called data swamp? The explosion of structured, unstructured and streaming data can be overwhelming for data lake users, and make it unmanageable for IT. Without scalable, repeatable, and intelligent mechanisms for cataloguing and curating data, the advantages of data lakes diminish. The key to solving the problem of data swamps is Informatica’s metadata-driven approach, which leverages intelligent methods to automatically discover, profile and infer relationships about data assets, enabling business analysts and citizen integrators to quickly find, understand and prepare the data they are looking for.
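The automated discovery and profiling the abstract describes can be illustrated with a minimal sketch. This is not Informatica's API; it is a hypothetical profiler that infers each column's dominant type and null rate from a sample of records:

```python
from collections import Counter

def profile(records):
    """Infer a simple per-column profile (dominant type, null rate) from dict records."""
    columns = {}
    for rec in records:
        for col, val in rec.items():
            stats = columns.setdefault(col, {"types": Counter(), "nulls": 0, "count": 0})
            stats["count"] += 1
            if val is None:
                stats["nulls"] += 1
            else:
                stats["types"][type(val).__name__] += 1
    return {
        col: {
            "inferred_type": s["types"].most_common(1)[0][0] if s["types"] else "unknown",
            "null_rate": s["nulls"] / s["count"],
        }
        for col, s in columns.items()
    }

# Hypothetical sample pulled from a data lake file.
sample = [
    {"id": 1, "amount": 9.99, "email": "a@x.com"},
    {"id": 2, "amount": None, "email": "b@x.com"},
    {"id": 3, "amount": 4.50, "email": None},
]
print(profile(sample)["amount"])
```

A real metadata catalog would also infer cross-table relationships and lineage, but even this sketch shows how a profile lets analysts judge a data set before using it.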
The shelf life of data is shrinking. A streaming shift is taking place and use cases such as IoT connected cars, real-time fraud detection and predictive maintenance using streaming analytics are becoming commonplace. You too can switch to the fast data lane with Informatica, leveraging Kafka and other big data technologies. So shift gears and change lanes with us while we take you on a journey into the world of streaming data.
Data is the new currency for most organizations, and data volumes are continuing to grow at an explosive rate. While the advantage of collecting such large volumes of data is obvious, protecting this data from cybercriminals and malicious actors is becoming increasingly difficult. Conventional security mechanisms are failing, and large-scale security breaches, despite increased security spending, are becoming commonplace. This, along with increasingly stringent regulatory requirements and privacy laws, has brought “Data Security” – protection of the data itself, whether at rest, in use or in motion – into strong focus. While there has been an increasing focus on Data Centric Security, the solution landscape is fractured, and enterprises are still struggling to identify and deploy long-term solutions with minimal disruption to existing investments and processes. This talk will focus on the current state of data security and offer pointers to how organizations can embark on a long-term Data Centric journey which truly adds business value.
Today's enterprises need broader access to data for a wider array of use cases to derive more value from data and get to business insights faster. However, it is critical that companies also ensure the proper controls are in place to safeguard data privacy and comply with regulatory requirements.
What does this look like? What are best practices to create a modern, scalable data infrastructure that can support this business challenge?
Zaloni partnered with industry-leading insurance company AIG to implement a data lake to tackle this very problem successfully. During this webcast, AIG's VP of Global Data Platforms, Carlos Matos, and Zaloni CEO, Ben Sharma, will share insights from their real-world experience and discuss:
- Best practices for architecture, technology, data management and governance to enable centralized data services
- How to address lineage, data quality and privacy and security, and data lifecycle management
- Strategies for developing an enterprise-wide data lake service for advanced analytics that can bridge the gaps between different lines of business and financial systems, and drive shared data insights across the organization
If you feel like you don’t trust your data, there’s probably a good reason. It happens all the time; companies implement analytics, customize their solutions and don’t audit the implementation to ensure ongoing data accuracy. This leads to multiple inaccuracies, gaps in tracking, and — even worse — information that’s simply missing. Inaccurate data can send a brand down the wrong path, leading to bad decisions and additional costs for tools and resources that could have otherwise been avoided.
Join and learn:
- What data quality is and why it’s critical to an organization's overall success
- Why your data is a mess and how to identify the warning signs of poor data quality
- Best practices to ensure clean and quality data and how to take back control
- And so much more!
The webinar will conclude with a Fireside Chat with live questions from the audience on all things data quality.
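The "warning signs of poor data quality" the agenda mentions can be made concrete with a minimal audit sketch. The check below is illustrative only (the field names are hypothetical); it counts missing values in required columns and exact duplicate rows, two of the inaccuracies and gaps the abstract describes:

```python
def audit(rows, required):
    """Report missing-value counts and exact duplicate rows -- two common warning signs."""
    missing = {col: sum(1 for r in rows if not r.get(col)) for col in required}
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"missing": missing, "duplicates": duplicates}

# Hypothetical CRM export with one duplicate and one missing region.
rows = [
    {"customer": "Acme", "region": "EMEA"},
    {"customer": "Acme", "region": "EMEA"},
    {"customer": "Initech", "region": ""},
]
report = audit(rows, required=["customer", "region"])
print(report)
```

Running a check like this after every implementation change is one way to catch the untracked drift the abstract warns about.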
As data is growing at an exponential rate, organizations are increasingly looking to leverage streaming data from mobile devices, wearable technology and sensors for real-time processing and analytics. Gartner estimates that, “By 2020, 70% of organizations will adopt data streaming to enable real-time analytics.” However, implementing real-time data ingest, processing and delivering insights at scale requires infrastructure with zero latency and easy access to information when it is required.
In the webinar, we’ll discuss:
- Adopting Modern Data Lake with the Hortonworks Data Platform (HDP)
- Accelerating real-time data analytics with Hortonworks DataFlow (HDF) and Attunity to build a data lake
- Solving challenges with real-time data ingest and managing data in motion workloads
Join subject matter experts from IBM and Hortonworks for a joint webcast to help you accelerate real-time data analytics and manage your data workloads efficiently.
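The real-time use cases above (fraud detection, predictive maintenance) usually boil down to evaluating each event against a rolling view of recent data. This sketch uses no HDF or Attunity APIs; it is a generic sliding-window anomaly check over a simulated sensor feed:

```python
from collections import deque

def flag_anomalies(stream, window=5, threshold=3.0):
    """Flag readings more than `threshold` times the mean of the previous window."""
    recent = deque(maxlen=window)
    flagged = []
    for value in stream:
        if len(recent) == window and value > threshold * (sum(recent) / window):
            flagged.append(value)
        recent.append(value)
    return flagged

# Simulated sensor feed: steady readings with one spike.
readings = [10, 11, 9, 10, 10, 95, 10, 11]
print(flag_anomalies(readings))  # [95]
```

In a production pipeline the `stream` argument would be a consumer over a message bus rather than a list, but the per-event logic is the same.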
As data analytics becomes more embedded within organizations as an enterprise business practice, the methods and principles of agile processes must also be employed.
Agile includes DataOps, which refers to the tight coupling of data science model-building and model deployment. Agile can also refer to the rapid integration of new data sets into your big data environment for "zero-day" discovery, insights, and actionable intelligence.
The Data Lake is an advantageous approach to implementing an agile data environment, primarily because of its focus on "schema-on-read", thereby skipping the laborious, time-consuming, and fragile process of database modeling, refactoring, and re-indexing every time a new data set is ingested.
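"Schema-on-read" can be shown in a few lines. In this hedged sketch (the field names are hypothetical), the lake stores raw JSON lines untouched, and each reader applies only the schema it cares about at read time, so ingesting a new data set requires no remodeling:

```python
import json

# A reader-defined schema applied at read time; the raw store stays untouched.
SCHEMA = {"device_id": str, "temperature": float}

def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines, casting only the fields this reader cares about."""
    for line in raw_lines:
        rec = json.loads(line)
        yield {field: cast(rec[field]) for field, cast in schema.items() if field in rec}

# Raw records as they landed in the lake; note the extra "firmware" field
# a different reader might care about.
raw = ['{"device_id": "a1", "temperature": "21.5", "firmware": "v2"}',
       '{"device_id": "b7", "temperature": "19.0"}']
rows = list(read_with_schema(raw, SCHEMA))
print(rows)
```

A second consumer could define its own schema over the same raw lines, which is the flexibility the schema-on-write approach gives up.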
Another huge advantage of the data lake approach is the ability to annotate data sets and data granules with intelligent, searchable, reusable, flexible, user-generated, semantic, and contextual metatags. This tag layer makes your data "smart" -- and that makes your agile big data environment smart also!
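The tag layer described above can be approximated with a tiny in-memory catalog. The paths and tags here are hypothetical, and a real catalog would persist tags and index them, but the idea is the same: user-generated tags are searched instead of the underlying files:

```python
# A tiny tag layer: user-generated tags attached to data sets (paths are hypothetical),
# searchable without touching the underlying data.
catalog = {
    "s3://lake/claims/2018/": {"insurance", "pii", "curated"},
    "s3://lake/sensors/raw/": {"iot", "streaming", "raw"},
}

def find(*tags):
    """Return every data set carrying all of the requested tags."""
    wanted = set(tags)
    return sorted(path for path, t in catalog.items() if wanted <= t)

print(find("pii"))  # data sets an analyst must handle with care
```

Because tags are reusable and composable, a query like `find("iot", "raw")` narrows the lake to exactly the granules a new project needs.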
This 1-hour webinar from GigaOm Research brings together leading minds in cloud data analytics, featuring GigaOm analyst Andrew Brust, joined by guests from cloud big data platform pioneer Qubole and cloud data warehouse juggernaut Snowflake Computing. The roundtable discussion will focus on enabling Enterprise ML and AI by bringing together data from different platforms, with efficiency and common sense.
In this 1-hour webinar, you will discover:
- How the elasticity and storage economics of the cloud have made AI, ML and data analytics on high-volume data feasible, using a variety of technologies.
- That the key to success in this new world of analytics is integrating platforms, so they can work together and share data
- How this enables building accurate, business-critical machine learning models and produces the data-driven insights that customers need and the industry has promised
- How to make the lake, the warehouse, ML and AI technologies and the cloud work together, technically and strategically.
Register now to join GigaOm Research, Qubole and Snowflake for this free expert webinar.
Getting your company ready for GDPR isn’t about putting a few new processes in place — it’s about rethinking your entire approach to personal data, including how to get value from it. For decades, companies have collected and stored all kinds of personal information “just in case” they ever needed it.
GDPR requires a different approach. You need to be proactive in thinking about how to get value from your data, and you need to understand exactly what your company is doing with personal data and why.
Join Jill Reber and Kevin Moos of Primitive Logic to learn:
- How to work with third parties who process personal data on your behalf
- How preparing for GDPR helps you understand your data on a whole new level (and why that’s a good thing)
Did you know that your existing investments in Informatica PowerCenter can fast-track you to big data and data lake technologies? We will demonstrate why our customers are moving from data warehouses to data lakes, leveraging big data and cloud ecosystems, and how to do this rapidly while leveraging your existing investments in Informatica technology.
The data contained in the data lake is too valuable to restrict its use to just data scientists. It would make the investment in a data lake more worthwhile if the target audience could be enlarged without hindering the original users. However, this is not the case today; most data lakes are single-purpose. Also, the physical nature of data lakes has potential disadvantages and limitations, weakening the benefits and possibly even killing a data lake project entirely.
A multi-purpose data lake allows a broader and greater use of the data lake investment without minimizing the potential value for data science or making it a less flexible environment. Multi-purpose data lakes are data delivery environments architected to support a broad range of users, from traditional self-service BI users to sophisticated data scientists.
Attend this session to learn:
* The challenges of a physical data lake
* How to create an architecture that makes a physical data lake more flexible
* How to drive the adoption of the data lake by a larger audience
With new technologies such as Hive LLAP or Spark SQL, do you still need a data warehouse, or can you just put everything in a data lake and report off of that? No! In this presentation, James will discuss why you still need a relational data warehouse and how to use a data lake and an RDBMS data warehouse together to get the best of both worlds.
James will go into detail on the characteristics of a data lake and its benefits and why you still need data governance tasks in a data lake. He'll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution, and he will put it all together by showing common big data architectures.
Selling your house in financial-crisis-stricken Greece is to this day a great ordeal. When faced with such a challenge, I was baffled by the sparsity of conclusive data on land value in my birthplace, the city of Thessaloniki. Embarking on a personal mission, and collecting and processing more than 10K online housing ads together with open data, I managed to render an insightful interactive visualization of actual real estate values at borough and city-block level that was published through the Greek media. Join me on this thought-process journey to find out how to
o Gather vast online data with simple scripting
o Combine your data with open data into meaningful structures
o Create interactive data visualizations that have an actual impact @ infographeo.com
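Once the ads are gathered, the core of the analysis is grouping them into meaningful structures such as price per square metre by borough. This is a minimal sketch with made-up figures and borough names standing in for the scraped data set; it is not the speaker's actual pipeline:

```python
from statistics import median

# Hypothetical scraped ads: (borough, asking price in EUR, area in sqm).
ads = [
    ("Kalamaria", 120000, 85),
    ("Kalamaria", 98000, 70),
    ("Toumba", 65000, 72),
    ("Toumba", 70000, 80),
]

def price_per_sqm_by_borough(ads):
    """Median asking price per square metre, grouped by borough."""
    by_borough = {}
    for borough, price, sqm in ads:
        by_borough.setdefault(borough, []).append(price / sqm)
    return {b: round(median(v), 2) for b, v in by_borough.items()}

print(price_per_sqm_by_borough(ads))
```

Using the median rather than the mean keeps a few outlier listings from distorting a borough's value, which matters when ads are self-reported.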
This will be an interactive session, so please feel free to bring your thoughts and questions to share during the session.
Join our Big Data Activation Report Webinar where our CEO Ashish Thusoo will go in-depth into our 2018 Qubole Big Data Activation Report findings and share how customers are using multiple engines to get the most out of their big data.
The report analyzes usage data from over 200 Qubole customers to provide answers to key questions such as:
- How fast is usage of open source big data engines like Apache Spark, Presto and Apache Hive/Hadoop growing?
- What engines are used most and for what?
- What engines and big data tools are rising stars?
- How successful are companies at providing their users access to data?
- What are the cost saving benefits of doing big data in the cloud?
You'll come away with both hard data and a few ideas for how to get more out of your big data initiatives.
This webinar is part of BrightTALK's Ask the Expert Series.
Join Christopher Brown, CTO of Uptime Institute and Kelly Harris, Senior Content Manager at BrightTALK, as they take a technical deep dive into data center infrastructure management in 2018.
Chris will answer questions related to trends from the field:
- What really makes a well-run data center?
- The changes we are seeing in the industry
- What Tier level do I need for my data center(s)?
- What can you tell us about the typical issues we see every day?
- What are the challenges ahead for data centers?
Audience members are encouraged to send questions to the expert, which will be answered during the live session.
Data visualization requires data to be prepared before any meaningful analysis can be conducted. Finding insights, making correct observations and taking actions to drive outcomes therefore don't just depend on the way information is communicated but also on the preparation preceding the analysis.
In this webinar we discuss the key steps for data preparation to enable effective analysis and visual exploration of the data. We will show practical examples from projects we have worked on as well as share some simple data preparation ideas from our Makeover Monday challenges.
Lastly, we will show an example of how data preparation can enrich a dataset and enable further analysis.
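A typical preparation step of the kind the webinar covers is reshaping a wide export into tidy rows with proper types. The data and field names below are hypothetical, not the webinar's example:

```python
from datetime import date

# Hypothetical raw export: wide layout, month columns, numbers stored as strings.
raw = [
    {"store": "Berlin", "2018-01": "1200", "2018-02": "1350"},
    {"store": "Munich", "2018-01": "900",  "2018-02": "1100"},
]

def prepare(rows):
    """Reshape wide month columns into long rows and cast types --
    the shape most visualization tools expect."""
    out = []
    for row in rows:
        for key, value in row.items():
            if key == "store":
                continue
            year, month = map(int, key.split("-"))
            out.append({"store": row["store"],
                        "month": date(year, month, 1),
                        "sales": int(value)})
    return out

tidy = prepare(raw)
print(tidy[0])
```

With the dates and sales figures typed, time-series charts and month-over-month calculations become straightforward, which is the payoff of preparing before visualizing.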