Hadoop, now 10 years old, has matured quickly, but companies are still struggling to keep up with it.
Consider a few industry leaders that are doing impressive things with data lakes:
- Capital One has spoken at multiple events about its Spark- and Hadoop-based fraud detection, its security work, and its big data analytics.
- Ford relies on Hadoop for its connected-car capabilities. Critical data points are uploaded for consolidated insight and analysis, while filtering and decision-making are done at the sensor and car level.
- Macy's is doing sophisticated cross-channel analysis, helping customers obtain out-of-stock and online-only items at Macys.com and driving tailored campaigns that encourage online customers to shop in stores.
Three gaps, namely data cataloging and metadata management, data management and governance, and data discovery with self-service data prep, are mostly being filled by vendors that complement what is trending in Hadoop. Collibra, Podium Data, Alation, and Waterline are a few of the next-generation vendors that have emerged in the big data era. These vendors bring automation and repeatability to governance and data lake management, and they also make the data lake more accessible.
A data lake is not a replacement for a traditional enterprise data warehouse, but a few data analytics and data processing workloads are moving to Hadoop.