Reading Data from Files, 6 of 9

Reading and Maintaining Dimension Values

Reading records only for existing dimension values

The records in a data file often contain dimension values, which are used to identify the cell in which the data values should be stored. When all of the dimension values in the file already exist in your analytic workspace, you can use the default attribute MATCH in the dimension field description. MATCH accepts only dimension values that already are in the analytic workspace.

When FILEREAD finds an unrecognized value, the command signals an error that warns you about the bad data. Your data-reading program can handle the error by skipping the data and continuing processing, or by halting the processing and letting you check the validity of the data file.

Example: Reading records only for existing dimension values

File layout

The following example shows a data file that contains 6-character values for the dimension PRODUCTID, names for each product, and the number of units sold.

     1234AA00CHOCOLATE CHIP COOKIES        123
     1099BB00OATMEAL COOKIES               145
     2344CC00SUGAR COOKIES                 223
     3222DD00BROWNIES                      432
     5553EE00GINGER SNAP COOKIES           233

OLAP DML objects used by the program

The following OLAP DML objects are used by the example program.

DEFINE PRODUCTID DIMENSION ID
DEFINE PRODUCTNAME VARIABLE TEXT <PRODUCTID>
DEFINE UNITS.SOLD VARIABLE INTEGER <MONTH PRODUCTID>

Program description

The DR.PROG program reads the file. The values of PRODUCTID with the associated product name are already part of the analytic workspace, so the program uses the PRODUCTID values only to set status and assign the units data to the right cells of the UNITS.SOLD variable.

The MATCH attribute is left out of the field description because it is the default. When the program finds a value for PRODUCTID that is not in the analytic workspace, it branches to the trap label. If the user interrupts the program (that is, the error name is ATTN) or the data file cannot be opened, then the program ends. Otherwise, the program resets the error trap and branches back to FILEREAD to continue with the next record.

Program code

The example program, named DR.PROG, has the following definition.

DEFINE DR.PROG PROGRAM
LD Reads a file with existing dimension values
PROGRAM
variable funit integer
trap on error
pushlevel 'save'
push month productid
limit month to first 1
funit = fileopen('dr.dat' read)
next:
fileread funit -
   column 1 width 6 productid -
   column 39 width 3 units.sold
fileclose funit
poplevel 'save'
return
error:
"Skip current record and continue processing
if funit ne na and errorname ne 'ATTN'   
   then do  
     trap on error
     goto next
     doend
"Close the file
if funit ne na                        
   then fileclose funit
poplevel 'save'
END

Adding new dimension values from a data file

When your data file contains a mixture of existing and new dimension values, you can add the new values and all the associated data to the analytic workspace by using the APPEND attribute in the field description.

Example: Adding new dimension values from a data file

Program description

The first FILEREAD command in the DR.PROG2 program uses APPEND to add any new PRODUCTID values to the analytic workspace. The second FILEREAD command includes a field to read the product name so the new data will be complete.

The dimension maintenance performed by APPEND might be done in the same FILEREAD command that reads the data, but that would cause inefficient handling of the data. The data is handled more efficiently when the dimension maintenance and data reading are performed in two separate passes over the file.

The error processing in this version is shorter because there is no need to skip nonexistent product values and branch back. If there is an error, then the program closes the file, restores any pushed values, and terminates.

Program code

The program, named DR.PROG2, has the following definition.

DEFINE DR.PROG2 PROGRAM
LD Reads a file with new dimension values
PROGRAM
variable funit integer
trap on error
pushlevel 'save'
push month productid
limit month to first 1
funit = fileopen('dr.dat' read)
fileread funit column 1 append width 6 productid
fileclose funit
funit = fileopen('dr.dat' read)
fileread funit -
   column 1 width 6 productid -
   column 9 width 30 productname -
   column 39 width 3 units.sold
fileclose funit
poplevel 'save'
return
error:
if funit ne na
   then fileclose funit
poplevel 'save'
END

Reading dimension values by position

If the target dimension has a data type of TEXT or ID or a time data type (DAY, WEEK, MONTH, QUARTER, or YEAR) and the input field in the file contains dimension position numbers (rather than dimension values), then you must specify a conversion type of INTEGER in the field description. The conversion type specifies how input data should be converted to values of the target dimension.

Example: Reading dimension values by position

Suppose the target dimension is MONTH, then you can use the following command to read input values that represent positions within the default status of MONTH.

fileread unit column 1 width 8 integer month

When the input field contains position numbers, you cannot use the APPEND keyword to add new values to a target dimension.

Converting time dimension values

When the records in a file contain values that represent time periods, you may need to specify a conversion type in the field description. The input values can be in any of the following formats:

The VNF of the target time dimension, or the default VNF for the target dimension when it does not have a VNF of its own. For a VNF format, the conversion type is VNF, which is the default for time dimensions.
Any of the input styles that are valid for dates. For date-style values, you must specify DATE as the conversion type.
An integer that represents the number of days since December 31, 1899, or a negative integer for earlier dates (1 = January 1, 1900). For values in an integer-style date format, you must specify RAW DATE as the conversion type.

The FILEREAD command converts position values to time dimension values when you specify INTEGER as the conversion type.

Adding values to a time dimension

For input values that are in any format except position values, you can use the APPEND attribute to add values to a time dimension. Any gaps are filled in automatically between the values that already exist in your analytic workspace and the values that are read from the file. For example, when the MONTH dimension in your analytic workspace contains values for January 1995 through December 1996, and the file contains a value for March 1997, the months January 1997 through March 1997 are added automatically to the MONTH dimension.

Regardless of how time periods are represented in the file, they are used and stored internally as integers, and they are shown in output according to the VNF of the target time dimension.

The use of composites

Composites are automatically maintained. The way in which you define and use composites can dramatically improve or hinder performance. The more you know about analytic workspace design, especially in regard to the applications that will be used with an analytic workspace, the more effective your use of composites will be.

Reading and maintaining conjoint dimensions

When you have conjoint dimensions in your analytic workspace, you can set the status of those dimensions while reading a file with the FILEREAD command. Typically, the records in the data file will have a separate field for each base dimension of your conjoint dimension. For example, a file might have a market name in the first field, a product name in the second, and then one or more fields containing sales data.

Example: Reading and maintaining conjoint dimensions

To read the sales data into a variable dimensioned by a conjoint dimension, for example MARKPROD, you can use a FILEREAD command as follows.

fileread funit markprod -
   = <w 8 market w 8 product> w 10 sales

This command will read a value of the MARKET dimension from the first 8-character field of the record and a value of the PRODUCT dimension from the next 8-character field.

The command will then use the results to set the status of MARKPROD, which is a conjoint dimension defined as follows.

DEFINE MARKPROD DIMENSION <MARKET PRODUCT>

The command then reads the last field and assigns the value to the variable SALES, which is dimensioned by MARKPROD.

By including the APPEND keyword in the field description, you can add new values to MARKET, PRODUCT, and MARKPROD, when the FILEREAD command encounters values in the file that do not match existing dimension values.

fileread funit append markprod -
   = <w 8 append market w 8 append product> w 10 sales

Translating coded dimension values

The fields containing dimension information in your data file might have values that are not identical to the dimension values in your analytic workspace. The file values might be abbreviated or otherwise encoded. For time dimensions, the file values might not be in a format that can be interpreted as valid time periods for the target dimension. The way you translate a coded dimension value varies depending on whether the code is merely an abbreviation (for example, "P" for PRODUCT) or if the code is more complicated.

Translating dimension values with abbreviated codes

When the file contains an abbreviated code, you can sometimes complete the value by using the RSET or LSET attribute to add text to the right or left of the value in the file.

For example, products in the file might be identified by all-numeric product numbers, while in your analytic workspace, the values of the PRODUCT dimension might be these same product numbers preceded by the letter P. In this case, you can use the LSET attribute to add the letter P to the values in the file.

fileread funit column 1 width 6 lset 'P' product

The letter P is added when the value is read from the file; it is not added when the modified value is matched with or assigned to the PRODUCT dimension.

Translating dimension values with complicated codes

To correctly read values that have less straightforward codes, you can set up another dimension containing the coded values found in the data file, along with a relation to the real dimension. FILEREAD can then use the relation to determine the actual dimension value. Or you can use any OLAP DML function to alter or manipulate the coded value to make it match a value in your analytic workspace.

Using an assignment statement in the field description

When reading coded data that must be manipulated in some way before being stored in the target, use an assignment statement (shown below) in the field description.

target = expression

The expression argument specifies the processing or calculation to be performed. If you want to include the value just read from the file as part of the expression, then use the VALUE keyword.

Both of the following field descriptions function identically.

COLUMN n WIDTH n target
     target = COLUMN n WIDTH n VALUE

Example: Translating codes into dimension values

This example illustrates the use of an expression for translating codes into dimension values.

File layout

The following example shows the data file, which has 3-character codes for months, and 2-character codes for districts and products.

     SEP BO CH 113945 115
     OCT BO CH 118934 115
     SEP BO CO  92013 119
     OCT BO CO  95820 119
     SEP BO WI  83201 110
     OCT BO WI  82986 110
     SEP DA CH 111792 115
     OCT DA CH 136031 114
     SEP DA CO  91641 121
     OCT DA CO  96347 120
     SEP DA WI  89734 109
     OCT DA WI  88264 109

OLAP DML objects used by the program

The following OLAP DML objects are used by the example program.

DEFINE DISTCODE DIMENSION ID
DEFINE DISTRICT.DCODE RELATION DISTRICT <DISTCODE>
DEFINE PRODCODE DIMENSION ID
DEFINE PRODUCT.PCODE RELATION PRODUCT <PRODCODE>

Program code

The example program, named DR.PROG3, has the following definition.

DEFINE DR.PROG3 PROGRAM
LD Translates coded values into valid dimension values
PROGRAM
variable funit int
funit = fileopen('dr3.dat' read)
fileread funit -
   column 1 width 3 append rset '96' month
fileclose funit
funit = fileopen('dr3.dat' read)
fileread funit -
   column 1 width 3 rset '96' month - 
   column 5 width 2 district = district.dcode -
     (distcode value) -
   column 8 width 2 product = product.pcode -
     (prodcode value) -
   column 11 width 6 strip units -
   column 18 width 3 scale 2 price
fileclose funit
END

Program description

The program translates the 2-character codes for districts and products into values of a DISTRICT dimension and a PRODUCT dimension. The program also illustrates how to translate 3-character month codes into values that can be recognized as valid time periods in a MONTH dimension.

When reading from the file, the program changes the 3-character month read from the file into a value that can be interpreted as a value of the MONTH dimension. Because a MONTH dimension value is identified by a month and a year, the program converts the format used in the file by attaching the year (96) at the end of each 3-character month value.

In the first FILEREAD command, the APPEND keyword is used so that new months are added to the MONTH dimension.

fileread fileunit column 1 width 3 append rset '96' month

For the district and product fields, the program reads the value from the data file and finds the corresponding dimension value using the relations DISTRICT.DCODE and PRODUCT.PCODE.

column 5 width 2 district = district.dcode distcode value)
column 8 width 2 product = product.pcode (prodcode value)

The program uses a QDR with the keyword VALUE representing the code read from the data file. For the districts, the distcode value QDR modifies the relation DISTRICT.DCODE, which holds district names. It specifies the district that corresponds to the value of DISTCODE just read from the data file. The QDR for PRODUCT works the same way.

The program assumes the PRODUCT and DISTRICT dimension values are already in the analytic workspace, along with the DISTCODE and PRODCODE dimensions and the relations connecting them to DISTRICT and PRODUCT. Once the coded values have been processed, the resulting values of DISTRICT and PRODUCT are used to limit the dimension status so that the data is put in the right cells of the UNITS and PRICE variables.

Finally, you can see in the data file that the price data, which starts in column 18, does not have a decimal point. The SCALE attribute on the last line of the FILEREAD command puts two decimal places in each price value.

Related information

For more information, see the following table.

IF you want documentation about . . . THEN see . . .

the VNF command,

the topic for the command in the OLAP DML Reference

the DATEORDER option,

the topic for the option in the OLAP DML Reference

IF you want documentation about . . .	THEN see . . .
the VNF command,	the topic for the command in the OLAP DML Reference
the DATEORDER option,	the topic for the option in the OLAP DML Reference