[Bytes Link logo]

The Data Warehouse Toolkit

by Ron Rose, NOCCC , bf799@lafn.org - July 08, 2002 at 16:39:48:


The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses by Ralph Kimball

A data warehouse is a copy of transaction data specifically structured for query and analysis. It should reflect the business plan for answering the necessary questions.Generally a structured query language (SQL) is used in the data queries. Even though SQL can do rather complicated queries, that is not easy or the most stable way to get the answers.The structure described in this book is a dimensional model. It is also call the star join schema. The central tables are fact tables consisting of numerical data that can be added. Dimension tables that relate to the fact tables contain dimensions only, no data.

The relation between the data and dimensions is established by the use of index keys. Some examples found in a dimension table would be the dimensional index keys: timeKey, dayOfWeek, month, year, holidayFlag; productKey, description, brand, category; storeKey, storeName, address, floorPlanType. The sales fact table would then have: timeKey, productKey, storeKey, dollarsSold, unitsSold, dollarsCost. Note that the fact table contains only keys and additive quantities.

The "grain" in the example is the day. A much finer grain of time would be the hour. One wants to construct with the largest grain and still be able to find the needed answers.

A standard SQL query would be similar to the following excluding the [comments]: SELECT p.brand, SUM(f.dollars), SUM(f.units) [select-list] FROM salesfact f, product p, time t [f,p,t are used to represent full table names (aliases)] WHERE f.productKey = p.productKey [join constraint] AND f.timeKey = t.timeKey [join constraint] AND t.month = ‘MAY 2000' [dimension value] GROUP BY p.brand [group by clause] ORDER BY p.brand [order by clause].

Throughout the text, key words are highlighted, and they are also found and defined in appendix C.

A number of data warehouses of different businesses are described each with their own structure and variation. Some of them are the grocery store chain, the distribution warehouse, and the shipments systems. Their dimensions are part of the value chain. The demand side and the supply side are explained and organized.

Throughout the book, he gives design principles for a dimensional data warehouse which are collected by chapter and listed in appendix A. At the conclusion of the design description of each type of business, such as insurance or voyage, he shows how to estimate the disk space required to store the data base.

In chapter 12, he gives the details in building a data warehouse. Starting with the nine decision points, on to interviewing the end users and database administrators, assembling the team, filling in the details of the tables, and how to choose hardware and software. He has a chapter using aggregates, which are the precalculated and prestored summaries that are in the data warehouse and are used to improve query performance. Included with the book is a CD-ROM containing "Star Tracker" a query tool made to perform the needed functions to analyze data in the star schema data bases.

Star Tracker explicitly supports Red Brick, Oracle, Sybase, and Informix using Microsoft's Open Database Connectivity (ODBC) server software. In the PC, MS Access is the example used in the book. The installation and how to use is given in appendix D. It tells how to assemble complex reports, and do things such as, drill-down and drill-across plus many more. If you have a business that involves thousands of items/combinations and need to track which gives greater and lesser value, then this book describes how to make data bases that can give the answers. The book was written for owners, managers, and implementers of a data warehouse. Even though it may appear to be highly technical, it is a fairly easy read.

If you are into the web aspects of the above, check out: wiley.com, search: kimball for a recent book involving the web.

The book was copyrighted in 1996 by John Wiley & Sons, Inc. and published by the same. ISBN 0-471-15337-0. Pages 388. CD-ROM included. The cost is $55.



Return to Listing
Home | About NOCCC | Special Interest Groups | Calendar | Membership Information
Meeting Location | Links | Orange Bytes Newsmagazines | Classified Ads | Search the Web

[------STRIPE-----]


Site Disclaimer Suggestions? E-Mail to webmaster@noccc.org
Content suggestions? editor@noccc.org
Last update: 7/08/2002

Copyright © 1995-2002 by North Orange County Computer Club. All rights reserved. Articles by NOCCC authors may be reprinted by other user groups without permission provided they are unaltered and the publication acknowledges the author thereof and NOCCC. Articles contained herein by authors from other organizations retain their original copyright.
Site assistance by CitiVU.