Reading Data from Files, 6 of 9

Reading and Maintaining Dimension Values

The records in a data file often contain dimension values, which are used to identify the cell in which the data values should be stored. When all of the dimension values in the file already exist in your analytic workspace, you can use the default attribute MATCH in the dimension field description. MATCH accepts only dimension values that already are in the analytic workspace.

When FILEREAD finds an unrecognized value, the command signals an error that warns you about the bad data. Your data-reading program can handle the error by skipping the data and continuing processing, or by halting the processing and letting you check the validity of the data file.

Example 11-3 Reading Records Only for Existing Dimension Values

The following example shows a data file that contains 6-character values for the dimension productid, names for each product, and the number of units sold.

     1234AA00CHOCOLATE CHIP COOKIES        123
     1099BB00OATMEAL COOKIES               145
     2344CC00SUGAR COOKIES                 223
     3222DD00BROWNIES                      432
     5553EE00GINGER SNAP COOKIES           233

The following workspace objects are used by the example program.

DEFINE productid DIMENSION ID
DEFINE productname VARIABLE TEXT <productid>
DEFINE units.sold VARIABLE INTEGER <month productid>

The dr.prog program reads the file. The values of productid with the associated product name are already part of the analytic workspace, so the program uses the productid values only to set status and assign the units data to the right cells of the units.sold variable.

The MATCH attribute is left out of the field description because it is the default. When the program finds a value for productid that is not in the analytic workspace, it branches to the trap label. If the user interrupts the program (that is, the error name is attn) or the data file cannot be opened, then the program ends. Otherwise, the program resets the error trap and branches back to FILEREAD to continue with the next record.

The example program, named dr.prog, has the following definition.

DEFINE dr.prog PROGRAM
LD Reads a file with existing dimension values
PROGRAM
VARIABLE funit INTEGER
TRAP ON error
PUSHLEVEL 'save'
PUSH month productid
LIMIT month TO FIRST 1
funit = FILEOPEN('olapfiles/dr.dat' READ)
next:
FILEREAD funit -
   COLUMN 1 WIDTH 6 productid -
   COLUMN 39 WIDTH 3 units.sold
FILECLOSE funit
POPLEVEL 'save'
RETURN
error:
"Skip current record and continue processing
IF funit NE na and ERRORNAME NE 'ATTN'   
   THEN DO  
     TRAP ON error
     GOTO next
     DOEND
"Close the file
IF funit NE na
   THEN FILECLOSE funit
POPLEVEL 'save'
END

Adding New Dimension Values from a Data File

When your data file contains a mixture of existing and new dimension values, you can add the new values and all the associated data to the analytic workspace by using the APPEND attribute in the field description.

Example 11-4 Adding New Dimension Values from a Data File

The first FILEREAD command in the dr.prog2 program uses APPEND to add any new productid values to the analytic workspace. The second FILEREAD command includes a field to read the product name so the new data will be complete.

The dimension maintenance performed by APPEND might be done in the same FILEREAD command that reads the data, but that would cause inefficient handling of the data. The data is handled more efficiently when the dimension maintenance and data reading are performed in two separate passes over the file.

The error processing in this version is shorter because there is no need to skip nonexistent product values and branch back. If there is an error, then the program closes the file, restores any pushed values, and terminates.

The program, named dr.prog2, has the following definition.

DEFINE dr.prog2 PROGRAM
LD Reads a file with new dimension values
PROGRAM
VARIABLE funit INTEGER
TRAP ON error
PUSHLEVEL 'save'
PUSH month productid
LIMIT month TO FIRST 1
funit = FILEOPEN('olapfiles/dr.dat' READ)
FILEREAD funit COLUMN 1 APPEND WIDTH 6 productid
FILECLOSE funit
funit = FILEOPEN('olapfiles/dr.dat' READ)
FILEREAD funit -
   COLUMN 1 WIDTH 6 productid -
   COLUMN 9 WIDTH 30 productname -
   COLUMN 39 WIDTH 3 units.sold
FILECLOSE funit
POPLEVEL 'save'
RETURN
error:
IF funit NE na
   THEN FILECLOSE funit
POPLEVEL 'save'
END

Reading Dimension Values by Position

If the target dimension has a data type of TEXT, NTEXT, or ID and the input field in the file contains dimension position numbers (rather than dimension values), then you must specify a conversion type of INTEGER in the field description. The conversion type specifies how input data should be converted to values of the target dimension.

Suppose the target dimension is month, then you can use the following command to read input values that represent positions within the default status of month.

FILEREAD unit COLUMN 1 WIDTH 8 INTEGER month

When the input field contains position numbers, you cannot use the APPEND keyword to add new values to a target dimension.

The Use of Composites

Composites are automatically maintained. The way in which you define and use composites can dramatically improve or hinder performance. The more you know about analytic workspace design, especially in regard to the applications that will be used with an analytic workspace, the more effective your use of composites will be.

Reading and Maintaining Conjoint Dimensions

When you have conjoint dimensions in your analytic workspace, you can set the status of those dimensions while reading a file with the FILEREAD command. Typically, the records in the data file will have a separate field for each base dimension of your conjoint dimension. For example, a file might have a market name in the first field, a product name in the second, and then one or more fields containing sales data.

Example 11-5 Reading and Maintaining Conjoint Dimensions

To read the sales data into a variable dimensioned by a conjoint dimension, for example markprod, you can use a FILEREAD command as follows.

FILEREAD funit markprod -
   = <W 8 market W 8 product> W 10 sales

This command will read a value of the market dimension from the first 8-character field of the record and a value of the product dimension from the next 8-character field.

The command will then use the results to set the status of markprod, which is a conjoint dimension defined as follows.

DEFINE markprod DIMENSION <market product>

The command then reads the last field and assigns the value to the variable sales, which is dimensioned by markprod.

By including the APPEND keyword in the field description, you can add new values to market, product, and markprod, when the FILEREAD command encounters values in the file that do not match existing dimension values.

FILEREAD funit APPEND markprod -
   = <W 8 APPEND market W 8 APPEND product> W 10 sales

Translating Coded Dimension Values

The fields containing dimension information in your data file might have values that are not identical to the dimension values in your analytic workspace. The file values might be abbreviated or otherwise encoded. The way you translate a coded dimension value varies depending on whether the code is merely an abbreviation (for example, "P" for product) or if the code is more complicated.

When the file contains an abbreviated code, you can sometimes complete the value by using the RSET or LSET attribute to add text to the right or left of the value in the file.

For example, products in the file might be identified by all-numeric product numbers, while in your analytic workspace, the values of the product dimension might be these same product numbers preceded by the letter P. In this case, you can use the LSET attribute to add the letter P to the values in the file.

FILEREAD funit COLUMN 1 WIDTH 6 LSET 'P' product

The letter P is added when the value is read from the file; it is not added when the modified value is matched with or assigned to the product dimension.

To correctly read values that have less straightforward codes, you can set up another dimension containing the coded values found in the data file, along with a relation to the real dimension. FILEREAD can then use the relation to determine the actual dimension value. Or you can use any OLAP DML function to alter or manipulate the coded value to make it match a value in your analytic workspace.

When reading coded data that must be manipulated in some way before being stored in the target, use an assignment statement (shown below) in the field description.

target = expression

The expression argument specifies the processing or calculation to be performed. If you want to include the value just read from the file as part of the expression, then use the VALUE keyword.

Both of the following field descriptions function identically.

COLUMN n WIDTH n target

target = COLUMN n WIDTH n VALUE

Example 11-6 Translating Codes into Dimension Values

This example illustrates the use of an expression for translating codes into dimension values.

The following example shows the data file, which has 3-character codes for months, and 2-character codes for districts and products.

     SEP BO CH 113945 115
     OCT BO CH 118934 115
     SEP BO CO  92013 119
     OCT BO CO  95820 119
     SEP BO WI  83201 110
     OCT BO WI  82986 110
     SEP DA CH 111792 115
     OCT DA CH 136031 114
     SEP DA CO  91641 121
     OCT DA CO  96347 120
     SEP DA WI  89734 109
     OCT DA WI  88264 109

The following OLAP DML objects are used by the example program.

DEFINE distcode DIMENSION ID
DEFINE district.dcode RELATION district <distcode>
DEFINE prodcode DIMENSION ID
DEFINE Product.pcode RELATION product <prodcode>

The example program, named dr.prog3, has the following definition.

DEFINE dr.prog3 PROGRAM
LD Translates coded values into valid dimension values
PROGRAM
VARIABLE funit INT
funit = FILEOPEN('olapfiles/dr3.dat' READ)
FILEREAD funit -
   COLUMN 1 WIDTH 3 APPEND RSET '96' month
FILECLOSE funit
funit = FILEOPEN('olapfiles/dr3.dat' READ)
FILEREAD funit -
   COLUMN 1 WIDTH 3 RSET '96' month - 
   COLUMN 5 WIDTH 2 district = district.dcode -
     (distcode VALUE) -
   COLUMN 8 WIDTH 2 product = product.pcode -
     (prodcode VALUE) -
   COLUMN 11 WIDTH 6 STRIP units -
   COLUMN 18 WIDTH 3 SCALE 2 price
FILECLOSE funit
END

The program translates the 2-character codes for districts and products into values of a district dimension and a product dimension. The program also appends a 2-digit year to the months.

In the first FILEREAD command, the APPEND keyword is used so that new months are added to the MONTH dimension.

FILEREAD fileunit COLUMN 1 WIDTH 3 APPEND RSET '96' month

For the district and product fields, the program reads the value from the data file and finds the corresponding dimension value using the relations district.dcode and product.pcode.

COLUMN 5 WIDTH 2 district = district.dcode distcode VALUE)
COLUMN 8 WIDTH 2 product = product.pcode (prodcode VALUE)

The program uses a QDR with the keyword VALUE representing the code read from the data file. For the districts, the distcode VALUE QDR modifies the relation district.dcode, which holds district names. It specifies the district that corresponds to the value of distcode just read from the data file. The QDR for product works the same way.

The program assumes the product and district dimension values are already in the analytic workspace, along with the distcode and prodcode dimensions and the relations connecting them to district and product. Once the coded values have been processed, the resulting values of district and product are used to limit the dimension status so that the data is put in the right cells of the units and price variables.

Finally, you can see in the data file that the price data, which starts in column 18, does not have a decimal point. The SCALE attribute on the last line of the FILEREAD command puts two decimal places in each price value.