Thing which seemed very Thingish inside you is quite different when it gets out into the open and has other people looking at it

Thursday, August 21, 2008

Introduction to Data Warehousing

Data warehousing is a very useful process to support business and organizational decision making activities to store organizational historical data which can be later presented using a portal.

Data warehousing is mainly done in three basic steps

Data warehouse mainly depends on the Extraction Transformation Load process

Extraction – Extract the data from the source systems and converts them into a format for transformation process

Resources used for extraction task.

DW can use source system to get data to the system. Such as employee allocation information, employee human resource information, and financial information ect. These resources can come in many forms such as excel/csv/rdbms ect.

These data can be extracted to the warehouses using DTS packages or SSIS packages.

Transformation – Apply set of rules and conditions to the source data and transform filtered data to the data warehouse.

Set of rules which can be applied in the transformation tasks.

    • Aggregation

    • Filtering

    • Joining

    • Sorting

    • Pivoting -Turning multiple columns into multiple rows or vice versa

Loading – Load the filtered data in to the tables arrange them accordingly to the data warehousing schema with the usage of facts and dimensions. Facts store measures calculated, usually business process is represented by a fact table. Dimension tables contain component attributes, which represents descriptive information about a fact.

Data warehouse schema is put to data cubes and they are access by the internal portal

Useful Technologies for Data Warehousing

Data can be extracted through DTS Packages and SSIS packages.

Schedule tasked can be assigned to extract data daily using scheduled job using SQL server 2000/2005.

Data transformation can be done using views and stored procedures designed accordingly and executed those using DTS packages as well as SSIS packages

Loading can be done using data cubes architecture . Table data is loaded to the cubes using SQL 2005 Business Intelligent Studio which are arrange accordingly to the relevant DW schema. These cubes can be access in three ways

  1. Access through the reports which uses SSRS 2005 MDX coding and VB.Net custom codes.

  2. Access data through the portal which uses share point technology and .Net framework.

  3. Generating pivots using Microsoft Excel

Friday, August 1, 2008

Adding A New Cube To The Warehouse

This post explains the tasks which were done/followed when adding a new data cube to an already existing data warehouse. This more like a guide line to follow when you are creating a new cube to the system.

To add a new cube to the system table structure should be created. Following steps are taken to add brand new set of tables from the sources to data warehouse

1 Extraction process

  • Create relevant tables in extraction

  • Create views for all those created tables

  • Create the DTS Package to transform data from source to extraction

2 Transformation process

  • Create the relevant fact table to store the measures

  • Create the needed dimensions to store new attributes

  • Create views for newly created tables

  • Create SPs to apply set of rules to filter information and create calculations for the fact tables

  • Create DTS packages to call the SPs.

All stored procedures and queries can be written using SQL using Microsoft SQL query analyzer.

3 Cube Structure Creations (Loading)

Create cube for the relevant fact table including the needed dimensions using Microsoft SQL business intelligent studio(2005)

After cube structure is created following task should be followed

  • Adjust the dimension property values in the cube structure

  • Under the calculation tab create an MDX query in order to create the calculation.

  • Under newly created dimensions create attribute hierarchy and adjust the properties accordingly.

  • Process the cube.

4 Unit Testing

Last but not least Unit testing. Unit testing is a very crucial task in data warehousing. Since we calculate sensitive information. You can do the unit testing process using SQL coding to check the transformation process is properly done

  • Source to extraction testing

    • Test all source records are extracted to Ext Database

  • Extraction to fact table testing

    • Test fact table calculations are properly calculated.

  • Fact tables to cubes

    • Create a pivot to the newly created cube compare data with the fact table, according to the cube calculation.