The Data Warehouse

 

Data Warehouses vary enormously in size, number of data sources, load frequency etc

 

Data warehouses exist for a number of reasons. Common factors that would make you decide that a data warehouse is appropriate for your organisation are:

  • Data exists in a variety of formats and locations and you need ‘one version of the truth’
  • Your data changes on a frequent basis, making it hard to compare moments in time
  • The kind of queries and reports you run take time to complete, during which the whole system slows down
  • Quite often, all of the above

Data Warehouse design requirements

 

In situations like these it makes sense to separate, collate and reorganise your data in such a way that you can retrieve the information you need.

The job of the data warehouse designer is two-fold. The first is to locate all disparate sources of information and ensure that the data in them is extracted with the right frequency. Then they will start the job of collating the data, stripping out bad or corrupt data for manual inspection and subsequently loading it into the data warehouse. The tools for this process are called ETL tools (Extract, Transform, Load). Objectivity has written its own in-house ETL tool that is freely available for our customers.

The second task is to organise the data warehouse in such a way that the types of queries you are most interested in execute in the fastest possible way. This means reorganising the data, not in a traditional relational way, but in a multi-dimensional way. The result is a so-called ‘OLAP Cube’.