The Data Warehouse
Data Warehouses vary enormously in size, number of data sources, load frequency etc
Data warehouses exist for a number of reasons. Common factors that would make you decide
that a data warehouse is appropriate for your organisation are:
- Data exists in a variety of formats and locations and you need ‘one version of the truth’
- Your data changes on a frequent basis, making it hard to compare moments in time
- The kind of queries and reports you run take time to complete, during which the whole system slows down
- Quite often, all of the above
Data Warehouse design requirements
In situations like these it makes sense to separate, collate and reorganise your data in such a way
that you can retrieve the information you need.
The job of the data warehouse designer is two-fold. The first is to locate all disparate sources
of information and ensure that the data in them is extracted with the right frequency. Then they will start the
job of collating the data, stripping out bad or corrupt data for manual inspection and subsequently
loading it into the data warehouse. The tools for this process are called ETL tools (Extract, Transform, Load).
Objectivity has written its own in-house ETL tool that is freely available for our customers.
The second task is to organise the data warehouse in such a way that the types of queries you are
most interested in execute in the fastest possible way. This means reorganising the data, not in a
traditional relational way, but in a multi-dimensional way. The result is a so-called ‘OLAP Cube’.