DATA WAREHOUSE

DATA WAREHOUSE: THE BUILDING BLOCKS

In operational systems, we store data by individual applications. In the data sets for an order

processing application, we keep the data for that particular application. These data sets

provide the data for all the functions for entering orders, checking stock, verifying customer’s

credit, and assigning the order for shipment. But these data sets contain only the

data that is needed for those functions relating to this particular application. We will have

some data sets containing data about individual orders, customers, stock status, and detailed

transactions, but all of these are structured around the processing of orders.

Similarly, for a banking institution, data sets for a consumer loans application contain

data for that particular application. Data sets for other distinct applications of checking

accounts and savings accounts relate to those specific applications. Again, in an insurance

company, different data sets support individual applications such as automobile insurance,

life insurance, and workers’ compensation insurance.

In every industry, data sets are organized around individual applications to support

those particular operational systems. These individual data sets have to provide data for

the specific applications to perform the specific functions efficiently. Therefore, the data

sets for each application need to be organized around that specific application.

In striking contrast, in the data warehouse, data is stored by subjects, not by applications.

If data is stored by business subjects, what are business subjects? Business subjects

differ from enterprise to enterprise. These are the subjects critical for the enterprise. For a

manufacturing company, sales, shipments, and inventory are critical business subjects.

For a retail store, sales at the check-out counter is a critical subject.

Integrated Data

For proper decision making, you need to pull together all the relevant data from the various

applications. The data in the data warehouse comes from several operational systems.

Source data are in different databases, files, and data segments. These are disparate applications,

so the operational platforms and operating systems could be different. The file

layouts, character code representations, and field naming conventions all could be different.

In addition to data from internal operational systems, for many enterprises, data from

outside sources is likely to be very important. Companies such as Metro Mail, A. C.

Nielsen, and IRI specialize in providing vital data on a regular basis. Your data warehouse

may need data from such sources. This is one more variation in the mix of source data for

a data warehouse.

Time-Variant Data

For an operational system, the stored data contains the current values. In an accounts receivable

system, the balance is the current outstanding balance in the customer’s account.

In an order entry system, the status of an order is the current status of the order. In a consumer

loans application, the balance amount owed by the customer is the current amount.

Of course, we store some past transactions in operational systems, but, essentially, operational

systems reflect current information because these systems support day-to-day current

operations.

On the other hand, the data in the data warehouse is meant for analysis and decision

making. If a user is looking at the buying pattern of a specific customer, the user needs

data not only about the current purchase, but on the past purchases as well. When a user

wants to find out the reason for the drop in sales in the North East division, the user needs

all the sales data for that division over a period extending back in time. When an analyst in

a grocery chain wants to promote two or more products together, that analyst wants sales

of the selected products over a number of past quarters.

A data warehouse, because of the very nature of its purpose, has to contain historical data, not just current values. Data is stored as snapshots over past and current periods.Every data structure in the data warehouse contains the time element. You will find historical snapshots of the operational data in the data warehouse. This aspect of the data warehouse is quite significant for both the design and the implementation phases.

For example, in a data warehouse containing units of sale, the quantity stored in each

file record or table row relates to a specific time element. Depending on the level of the

details in the data warehouse, the sales quantity in a record may relate to a specific date,

week, month, or quarter.

The time-variant nature of the data in a data warehouse

Allows for analysis of the past
Relates information to the present
Enables forecasts for the future

Non-Volatile Data

Data extracted from the various operational systems and pertinent data obtained from

outside sources are transformed, integrated, and stored in the data warehouse. The data

in the data warehouse is not intended to run the day-to-day business. When you want to

process the next order received from a customer, you do not look into the data warehouse

to find the current stock status. The operational order entry application is meant

for that purpose. In the data warehouse, you keep the extracted stock status data as snapshots

over time. You do not update the data warehouse every time you process a single

order.

Data from the operational systems are moved into the data warehouse at specific intervals.

Depending on the requirements of the business, these data movements take place

twice a day, once a day, once a week, or once in two weeks. In fact, in a typical data warehouse,

data movements to different data sets may take place at different frequencies. The

changes to the attributes of the products may be moved once a week. Any revisions to geographical

setup may be moved once a month. The units of sales may be moved once a

day. You plan and schedule the data movements or data loads based on the requirements of

your users.

Data Granularity

In an operational system, data is usually kept at the lowest level of detail. In a point-of sale

system for a grocery store, the units of sale are captured and stored at the level of

units of a product per transaction at the check-out counter. In an order entry system, the

quantity ordered is captured and stored at the level of units of a product per order received

from the customer. Whenever you need summary data, you add up the individual transactions. If you are looking for units of a product ordered this month, you read all the orders

entered for the entire month for that product and add up. You do not usually keep summary

data in an operational system.

When a user queries the data warehouse for analysis, he or she usually starts by looking

at summary data. The user may start with total sale units of a product in an entire region.

Then the user may want to look at the breakdown by states in the region. The next

step may be the examination of sale units by the next level of individual stores. Frequently,

the analysis begins at a high level and moves down to lower levels of detail.

In a data warehouse, therefore, you find it efficient to keep data summarized at different

levels. Depending on the query, you can then go to the particular level of detail and

satisfy the query. Data granularity in a data warehouse refers to the level of detail. The

lower the level of detail, the finer the data granularity. Of course, if you want to keep data

in the lowest level of detail, you have to store a lot of data in the data warehouse. You will

have to decide on the granularity levels based on the data types and the expected system

performance for queries.

NEXT BLOG>>

DATA WAREHOUSE

Pages

Tuesday, 26 May 2015

DATA WAREHOUSE: THE BUILDING BLOCKS

Integrated Data

Time-Variant Data

Non-Volatile Data

Data Granularity

1 comment: