What is a Data Lake?

A data lake is a storage repository, usually paired with processing engines, that holds large amounts of raw data in its native format. Data lakes are data agnostic: they hold structured, semi-structured, and unstructured data and can process it in many different ways. Here are some key characteristics of a data lake:

  • Data lakes are indifferent to data structure and store all types of data. This stands in contrast to data warehouses, which require data to be modeled before it is written (schema-on-write).
  • They use schema-on-read processing, which allows data to be stored without a predefined structure. In other words, a schema is applied to the data only when it is read.
  • Storage is large, scalable, and low-cost. Data lakes often use distributed storage on low-cost commodity hardware, which allows large data sets to be stored and processed.
  • Data lakes are complex systems and therefore have a narrow user base. A typical business user often cannot navigate a data lake unaided, so making full use of one generally requires data scientists or other technically skilled users.
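The contrast between schema-on-read and schema-on-write can be illustrated with a small sketch. This is a toy model under stated assumptions, not how any particular data lake product works: an in-memory Python dict stands in for distributed storage, and the names `lake`, `ingest`, `read_sales`, `SALES_SCHEMA`, and `warehouse_insert` are all hypothetical. Raw CSV, JSON, and free-text payloads are stored as bytes with no validation, and a schema (field names and types) is imposed only when the sales data is read. A warehouse-style insert is included for contrast; it rejects rows that do not fit the schema at write time.

```python
import csv
import io
import json
from datetime import date

# Hypothetical "lake": raw payloads kept as bytes in their native formats,
# keyed by an object path. An in-memory dict stands in for distributed storage.
lake: dict[str, bytes] = {}


def ingest(path: str, payload: bytes) -> None:
    """Store raw bytes as-is; no schema is checked at write time."""
    lake[path] = payload


# Structured, semi-structured, and unstructured data all land side by side.
ingest("sales/2024-01.csv", b"order_id,amount,day\n1,19.99,2024-01-05\n2,5.00,2024-01-06\n")
ingest("sales/2024-02.json", b'[{"order_id": "3", "amount": "12.50", "day": "2024-02-01"}]')
ingest("support/ticket-17.txt", b"Customer reports a late delivery.")

# Schema-on-read: the reader, not the writer, decides field names and types.
SALES_SCHEMA = {"order_id": int, "amount": float, "day": date.fromisoformat}


def read_sales(path: str) -> list[dict]:
    """Parse a raw payload by its format, then apply SALES_SCHEMA to each row."""
    raw = lake[path].decode("utf-8")
    if path.endswith(".csv"):
        rows = list(csv.DictReader(io.StringIO(raw)))
    elif path.endswith(".json"):
        rows = json.loads(raw)
    else:
        raise ValueError(f"no sales reader for {path}")
    return [{field: cast(row[field]) for field, cast in SALES_SCHEMA.items()} for row in rows]


def warehouse_insert(table: list[dict], row: dict) -> None:
    """Schema-on-write contrast: a row must fit SALES_SCHEMA before it is stored.

    A row missing a field raises KeyError; a badly typed value raises ValueError.
    """
    table.append({field: cast(row[field]) for field, cast in SALES_SCHEMA.items()})


if __name__ == "__main__":
    sales = read_sales("sales/2024-01.csv") + read_sales("sales/2024-02.json")
    print(round(sum(r["amount"] for r in sales), 2))  # 37.49
```

Note that the support ticket is stored without complaint even though it matches no schema; it simply has no reader yet. In the warehouse-style path, the same text could not be inserted into the sales table at all.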

