What is a Data Warehouse?
A data warehouse is a data repository that captures important data about the business. Modern usage of the term data warehouse includes several key features:
A data warehouse is a database that is:
- Separate from operational applications
- Organized by subject and usually in a traditional database schema form
- Integrated and federated such that inbound data is transformed to meet structured warehouse data standards
- “Immutable” in that no data is overwritten allowing for “time travel”
Why Bother With a Data Warehouse?
Applications databases are busy supporting the applications themselves. For example, lets say a company is running SAP. The business wants some information from the SAP database to be joined with data from two other applications.
Application databases are busy supporting their applications. Their schemas are also designed to support the application, not our new report. Therefore the purpose of a data warehouse is to move operational data into one location such that the business can interrogate that data without disrupting the day to day business.
This was really important through the 80s, 90s, and 00s. Relational databases (or SQL databases) became the industry standard for employees to organize and retrieve information from client servers on their computers.
As technology continued to grow, businesses adopted different databases and applications, which resulted in fragmented data and inconsistent information in different locations.
Organizations saw a need to consolidate and store information electronically, pulled from various sources, which could then be searched to extract data for reporting and analytics: enter the data warehouse.
Data warehouses were created to aggregate operational data into one location, so that insights gained from all of the historical information could be used to support business intelligence decisions.
Application databases are busy supporting their applications. Their schemas are also designed to support the application, not our new report. Therefore the purpose of a data warehouse is to move operational data into one location such that the business can interrogate that data without disrupting the day to day business
Data Warehouse Evolution
As data warehouses grew to accumulate more information, expectations evolved beyond just consolidating information sources for examination. The evolution of Business Intelligence tools such as Tableau, Microsoft Power B.I, Qlik and others enabled business operations to build reports and dashboards much faster and easier than ever before. This caused a major shift from older OLAP reporting tools onto the “Do it yourself” business intelligence tools.
Likewise businesses began using and experimenting with NoSQL, or non-relational database systems, proved useful for processing Big Data, more quickly sorting through large amounts of changing unstructured, structured, and semi-structured data.
This spawned the concept of a data lake. It emerged as a data warehouse alternative prioritizing scalability and extensibility over structure and control. In other words data lakes eschew the organization and federation of traditional data warehouses in favor of “just getting the data saved.”
Data lakes are reliant on strong metadata management to be truly effective, which can be challenging for organizations.
The Modern Data Warehouse
Ultimately, a data warehouse is meant to help transform complex data into useful insights that drive meaningful business intelligence.
The cloud-based data warehouse systems that exist today tout the features that many organizations need:
- Massive, scalable, affordable storage.
- No hardware to buy or software to manage.
- Aggregates structured data, semi-structured data, streaming data, etc.
- Supports both business intelligence and data science.
- Additional analytics from other analytics services.
- High performance business intelligence on larger, more complex data sets.
Getting Started With a Data Warehouse
The process of transitioning your organization to a data warehouse can be overwhelming, especially when you consider possible API changes, reliable data access, and multiple source applications.
The help of an ETL tool can make all the difference in the accuracy of your data integration. Learn how to get started with efficiently transitioning your data into a data warehouse on our next post “How ETL and connectors are Used To Move Data Into Data Warehouses and Data Lakes”.