Sunday 8 January 2012

What is a Data Warehouse ?

Data Warehousing is a  relational database storage of data for analysis purpose. Its usually separate from transaction processing system. There is usually an OLAP system that sits on top of this. Data warehouse is separated from the online transactional database because online transaction processing database should not be impacted in terms of performance by running analytical queries.Moreover, historical data are periodically removed from live transactional database to maintain its performance. These historical data are then migrated into a data warehouse.  Datawarehouse is used to study the past behaviour of the data and develop strategies based on the findings. For example: Sales of product is found to be higher during christmas period by analysing last  3 years of data. Businesses can now make a decision to increase the production  accordingly to meet the demand made by  increase in sales.
In many companies, transactional data arrives to them from other providers on a flat/delimited file. There is usually an ETL process to load these flat files into the Datawarehouse. Datastage, Informatica are the modern popular ETL tools. There are also less popular tools such as Sagent in the market that is being used by a few companies. SQL Server DTS and JCL are also still being used by a lot of companies for ETL processing.
There are many open source ETL tools out there. I have myself used a couple of them. They do the job very well. They are as user friendly as expensive tools such as Datastage and Informatica. So if you are thinking of installing new ETL tool in your company and do not want to go with buying expensive tools, I recommend using Open Source tools.
Cognos and Hyperion are among most common OLAP tool used in the market. They can process huge amount of data. All they basically do is help you to slice and dice data to extract some information. You can think of them as Pivot Table function in Microsoft Excel.  Microsoft Excel 2007 has a limitation on the number of records it can handle. It can handle only 1,000,000 rows. Cognos can handle much more records than this. But I think newer version of Excel can handle more records. I have not used them myself so I cannot confirm that.  There are also other cheaper and free open source OLAP tools out there. They do the job as well as the expensive tools. I am a big fan of open source tools.

Please let me know if you have a question and I will be happy to answer it for you.

2 comments: