Data can create new opportunities and revenue models for organisations, but how to manage all the data generated and derive useful insights from it is something IT leaders are still grappling with. Data lakes continue to get a lot of attention, and for good reason, but misconceptions remain. Read these five data lake myths before taking the plunge.
Myth one: A data lake is a product which you can buy
A data lake is a reference architecture that is independent of technology. It’s an approach that an organisation can use to put data at the heart of its operations. It includes governance, quality and management of data, and so enables self-service analytics for all consumers of data.
As helpful as it would be, a data lake is not a product that you can just purchase. You can’t just buy any data warehouse solution and call it a data lake.
Myth two: There is only one data lake solution
A data lake can be built on many different relational database management systems. You’re not tied to the prominent names; there are lots of vendors and systems available.
A data lake combines a variety of technologies to establish systems of insight, giving data scientists agile data exploration with which to address business needs.
Myth three: Data lakes are for dumping data (and forgetting about governance)
While software and hardware are key components of a data lake solution, equally important is the cataloguing of data, quality of data, and data governance and management processes.
Just as some data warehouses have become massive black holes from which vast amounts of data never escape, a data lake can become a data swamp if good governance policies are not applied.
All data in a data lake must be catalogued, accessible, trusted, and usable; active governance, quality and information management are indispensable parts of the data lake.
Myth four: Delivering access to the data lake is a measure of success
Having data in a central location is not a true analytics solution. The goal is to run data analyses that produce meaningful business insights, such as new revenue streams, customer retention models or product extensions.
But that data must be trusted, relevant, and available to all consumers of data. A data lake needs an intelligent metadata catalogue that can relate data to business terminology, taking cryptically coded data and making it more understandable with context. The catalogue should also record the source and quality of data from both structured and unstructured information assets, and a governance fabric should ensure that information is protected, standardised, efficiently managed, and trustworthy.
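To make the idea of a metadata catalogue more concrete, here is a minimal Python sketch. It is an illustration only, not a reference to any particular product: the class names, the column names such as `CUST_CHRN_FLG`, the source systems and the quality scores are all hypothetical, and a real catalogue would do far more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """One catalogue record for a stored column (all fields are illustrative)."""
    physical_name: str    # cryptic name as stored in the source system
    business_term: str    # name a business user would recognise
    description: str      # plain-language context for the data
    source_system: str    # where the data came from
    quality_score: float  # 0.0 (untrusted) to 1.0 (fully validated)


class MetadataCatalogue:
    """Hypothetical in-memory catalogue keyed by physical column name."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.physical_name] = entry

    def describe(self, physical_name):
        """Translate a cryptic column name into business language."""
        entry = self._entries.get(physical_name)
        if entry is None:
            return f"{physical_name}: not catalogued"
        return (
            f"{entry.business_term} ({entry.description}); "
            f"source: {entry.source_system}; quality: {entry.quality_score:.0%}"
        )

    def trusted(self, minimum_quality=0.8):
        """List business terms whose quality score meets the threshold."""
        return sorted(
            e.business_term
            for e in self._entries.values()
            if e.quality_score >= minimum_quality
        )


catalogue = MetadataCatalogue()
catalogue.register(CatalogueEntry(
    "CUST_CHRN_FLG", "Customer churn flag",
    "1 if the customer cancelled in the last 90 days",
    "crm_warehouse", 0.95,
))
catalogue.register(CatalogueEntry(
    "TW_SENT_SCR", "Social media sentiment score",
    "average sentiment of recent public posts",
    "social_feed", 0.60,
))

print(catalogue.describe("CUST_CHRN_FLG"))
print(catalogue.trusted())
```

Even in this toy form, the point of the catalogue is visible: a consumer of data can look up what a field means, where it came from and how far it can be trusted, without needing to know the source system.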
Myth five: The data lake is a replacement for a data warehouse
The data lake can incorporate multiple enterprise data warehouses (EDW), plus other data sources such as those from social media or IoT. These all come together in the data lake where governance can be embedded, simplifying trusted discovery of data for users throughout the organisation.
Therefore, a data lake augments EDW environments to empower data scientists and analysts to explore their data easily, discover new perspectives and insights, and accelerate innovation and business growth.
Thanks to mobile devices, apps and IoT, for example, the amount of unstructured data is growing exponentially, so it’s no wonder the demand for data storage is intensifying. According to IDC, data lake adoption is expected to rise from 30% of organisations worldwide to 90% within the next three years. To see if this approach could be a solution for your organisation, download our latest infographic. If you’d like to know more and identify some practical next steps, why not register for one of our Data Services Workshops?