The main subject of this book is data warehousing. A data warehouse is a special kind of database that, in recent years, has attracted a great deal of interest in the information technology industry. Quite a few books have been published about data warehousing generally, but very few have focused on the design of data warehouses. There are some notable exceptions, and these will be cited in this book, which concentrates, principally, on the design aspects of data warehousing.
Data warehousing is all about making information available. No one doubts the value of information, and everyone agrees that most organizations have a potential “Aladdin's Cave” of information that is locked away within their operational systems. A data warehouse can be the key that opens the door to this information.
There is strong evidence to suggest that our early foray in the field of data warehousing, what I refer to as first-generation data warehouses, has not been entirely successful. As is often the case with new ideas, especially in the information technology (IT) industry, the IT practitioners were quick to spot the potential, and they tried hard to secure the competitive advantage for their organizations that the data warehouse promised. In doing so I believe two points were overlooked. The first point is that, at first sight, a data warehouse can appear to be quite a simple application. In reality it is anything but simple. Quite apart from the basic issue of sheer scale (data warehouse databases are amongst the largest on earth) and the consequent performance difficulties presented by this, the data structures are inherently more complex than the early pioneers of these systems realized. As a result, there was a tendency to over-simplify the design so that, although the database was simple to understand and use, many important questions could not be asked.
The second point is that data warehouses are unlike other operational systems in that it is not possible to define the requirements precisely. This is at odds with conventional systems where it is the specification of requirements that drives the whole development lifecycle. Our approach to systems design is still, largely, founded on a thorough understanding of requirements–the “hard” systems approach. In data warehousing we often don't know what the problems are that we are trying to solve. Part of the role of the data warehouse should be to help organizations to understand what their problems are.
Ultimately it comes down to design and, again, there are two main points to consider. The first concerns the data warehouse itself. Just how do we ensure that the data structures will enable us to ask the difficult questions? Secondly, the hard systems approach has been shown to be too restrictive and a softer technique is required. So not only do we need to improve our design of data warehouses, we also need to improve the way in which we approach the design.
It is in response to these two needs that this book has been written.