Object Oriented Data Warehouse Design: Building a Star Schema
by William Giovinazzo
"In order for an organization not to just survive but to thrive, a strategic plan must guide its actions. The business strategist defines the strategy in the context of the environment in which the organization competes." Where is a good source for the organized information the strategist needs? Enter the data warehouse.
"The data warehouse is the heart of BI [business intelligence]. It is more than just an archive of operational data. W. H. Inmon defines the data warehouse as ‘a subject oriented, integrated, nonvolatile, time variant collection of data in support of management's decisions.' The data warehouse derives data from the operational environment. As the data is brought into the warehouse, it is transformed into a single cohesive image. From this image the business strategist through a decision support tool is able to analyze behaviors, giving insight into current trends. This enables the strategist to project future conditions."
This book is about data warehouse design. The design is object oriented using the development spiral. This spiral starts with definition, followed by analysis, design, development, implementation, maintenance, evaluation, and back through definition adding and refining on each go around.
The data has data about itself, its context, relations to other data, in other words ‘metadata'. Two types of metadata are static (structural) and dynamic (operational). The static consists of name, format (data type, picture), relation, domain, and business rule (rules on the data element). The dynamic consists of data quality, statistical, usage and status."The question facing the data warehouse architect is how to discover and document this metadata in a way that is easily understood. The answer is ‘objects'."
"What is an object? It is simple enough to describe an object as a thing." Objects can be tangible or intangible like concepts. "... the analyst can begin to view the organization in terms of objects." Objects can be categorized / grouped into classes. A classification of classes would be superclasses, such as, cars and people. In the next section the author diagramed the superclass as a rectangle with rounded corners (people) branching to two rectangles (male, female) classes.
The other superclass diagramed is cars branching to sports car, practical car, and economy car classes. The people and cars are connected. "People drive cars", and "cars transport people". An instance of the people class is the individual person, such as, "Lisa Miller, Jed Perkins". "The object model makes clear the attributes and behaviors of objects as well as the relationships between objects."
In chapter 3 is a case study that starts with the definition phase. The example business does brokerage, property management, appraisal, reality advice, and investments.
What is their mission statement?
Analysis phase consists of the interview process: the project sponsor, the management, the users, the information technology staff, and asking the right questions.
The analysis model shown consists of arrow lines showing the activity between the circles (the objects). In the design phase, superclass, class, subclass, and aggregation (summed objects) are identified. These are connected by relations of the type, exactly-one, one-to-many, zero-to-one, and zero-to-many. A layout is made using the above. The objects are the tables and the attributes are the fields or columns which have metadata consisting of field name, format (data type), null option (is null allowed), comments (descriptions) and system of record (where from).
Multidimensions: time is generally a given. In an auto sales example: time=365, products=200 and dealerships=100 cc 3 dimensions, a cube of 7,300,000 cells already. Time, products, dealerships and customers cc 2 cubes. Time, products, dealerships, customers and payment cc 4 cubes, the number of cells is getting out of control.
Fortunately in the star schema data base, many fewer cells (fields) require data. The non-normalized dimension tables, which have mostly descriptions and primary keys, would be ‘time' with 365 rows per year, ‘products with 200 rows, ‘dealerships with 100 rows and ‘customers being continually added. The rows in these dimension tables have few changes (updates). The fact table ‘payment contains only the above dimensional foreign (secondary) keys and payment columns. The payment column rows are addible which is why it is called a fact table. Dimension tables have nothing that is addible. Additional subjects covered include shadow dimensional tables (non-analytical), snowflake dimensional tables (normalized), degenerate dimensions and facts.
Implementation considerations of parallel processing, symmetric multiprocessing, bitmapped indexing, summation tables, and web-enabled data warehousing showing SQL (Structured Query Language) statements conversion to use summary tables instead of using raw data, are some of the topics covered. The appendixes cover a spatially enabled data warehouse, extraction, transformation and loading of data, metadata standards, and conclude with conventional wisdom, tips, hints, and general advice.
Publisher: Prentice HallI SBN 0-13-085081-0 QA76.9.D37 G56 2000 Price: $49
![]()
Site Disclaimer Suggestions? E-Mail to webmaster@noccc.org