What is Data Warehouse Analytics?
Data warehouse analytics leverages large volumes of disparate data which has been centralized in a single repository, known as a data warehouse, for use in data analysis, data discovery, and self-service analytics. The emergence of data warehouses has largely been driven by the need for a higher level view of a business metrics and the need to de-compartmentalization business applications used in different departments.
We’ll talk a little bit more about what a data warehouse actually is, and how it differs from some of the other data repository technologies of today, but first let’s discuss the goal of data warehouse analytics in the context of an actual business example:
Let’s imagine that you create an enterprise resource planning software. For those who are unfamiliar with ERP systems, their purpose is to help manage important parts of a business from planning, inventory, sales, finance, to human resources. They provide value to organizations by simplifying the majority of business workflows in a business, but also provide the added benefit of providing high-level views of a business through the data it collects. This data can be extremely useful for identifying potential areas of improvement in the business.
One of the difficulties of developing and implementing an ERP system is the inclusion of outside data systems. For example, a customer may use your system for planning, inventory, finance, and human resources, but may use a separate system for sales. So how do we help include the information from that separate system in our system? What if your customer had other disparate data they wanted to include in this centralized system for use in decision making and the discovery of new patterns in their business? Adding access to that information can be done through the application layer, but that requires developing complex API connections. Adding access can also be done in the data layer, and that’s where a data warehouse comes in.
So What is Data Warehouse?
A data warehouse is a repository for storage of disparate data that an organization’s business systems collect and use for guided decision making.
The main goal of a data warehouse is to house a lot of data from a variety of sources for reporting and analysis. The benefit of a data warehouse is that it can accommodate any number of applications and integrate their data for a “single version of the truth.” This allows analysis from any point of your organization to be consistent and all your users are on the same page.
Data warehouses are built with the single version of the truth in mind. However, they can include an end user point-of-view known as a data mart, a subset of a data warehouse dedicated to a specific line-of-business. When considering the creation of a data warehouse, there are two approaches: top down and bottom up. The top-down approach focuses on creating the central data warehouse first and then building data marts for specific user groups. In contrast, the bottom-up approach focuses on building dedicated data marts first and then combining them to create the central data warehouse.
What’s the Real Difference between a Database, Data Warehouse, and Big Data?
A data warehouse is just one type of data repository, and as the growth of data being generated by modern applications increases, it’s important to know the distinction and usage of the most commonly used data repositories for analytics.
The most common type of data repository is a relational database. A relational database collects data and organizes it into formal tables which can be transitionally accessed and connected to in different ways depending on the need of the person using the data. The way information is pulled from a relational database is via structured query language (SQL). SQL statements, to put it simply, are defined queries which specify what tables or what parts of tables should be pulled. SQL is very flexible and can be used for a wide variety of purposes.
Relational databases can be easily used by most BI applications and depending on the application you can mash up different databases for deeper data discovery. But relational databases do have their limitations in this use case specifically. The functional construct which allows for data mashup of relational databases inside of a BI application (that is if the BI application isn’t utilizing a data warehouse), is a metadata layer. A metadata layer describes information about other data, and when used in this context means that the data is not physically combined but rather the definitions are stored in one single place and the data from different sources can be used in one place. The physical structure of the data too remains unchanged; meaning the ability to draw connections between data sources can be limited in some instances. For the vast majority of use cases though, relational databases provide more than enough performance and capabilities for use in analytics.
Where data warehouse analytics comes in, is as previously described, when the physical structure of a multitude of data resources is disparate and too complex to easily combine in a metadata definition layer. Data warehouses overcome this problem by physically combining these data sources into a single data repository. The backend infrastructure of which may contain tools like ETL tools which allow different data structures to be transformed into a more easily understood data vault. In scenarios with many disparate datasources, this optimizes the act of pulling out information for analysis and is thereby used for this purpose. Data warehouses are also designed to be able to query larger amounts of structured data than utilizing multiple databases. This is because the more complex the query the more processing power is needed to perform it. When all of your data is in a single source and structure, your queries become simpler.
Big data sources, as its name implies, also hold large amounts of data. The difference being that data help in a big data source is normally unstructured or has only some semi-formal structure. This has its own benefits in and of itself in that analysis which requires multiple iterations of querying (i.e. querying different levels of data in one statement) can be simplified.
- Data warehouse analytics enables you to find meaningful patterns in your data through combining multiple data sources into a more easily understood data source.
- The ability to join multiple data sources in a data warehouse can come with performance gains when compared to utilizing multiple relational databases with widely different data structures.