|Oracle9i OLAP Services Developer's Guide to the OLAP DML
Release 1 (9.0.1)
Part Number A86720-01
Reading Data from Files, 6 of 9
The records in a data file often contain dimension values, which are used to identify the cell in which the data values should be stored. When all of the dimension values in the file already exist in your analytic workspace, you can use the default attribute MATCH in the dimension field description. MATCH accepts only dimension values that already are in the analytic workspace.
When FILEREAD finds an unrecognized value, the command signals an error that warns you about the bad data. Your data-reading program can handle the error by skipping the data and continuing processing, or by halting the processing and letting you check the validity of the data file.
The following example shows a data file that contains 6-character values for the dimension PRODUCTID, names for each product, and the number of units sold.
1234AA00CHOCOLATE CHIP COOKIES 123 1099BB00OATMEAL COOKIES 145 2344CC00SUGAR COOKIES 223 3222DD00BROWNIES 432 5553EE00GINGER SNAP COOKIES 233
The following OLAP DML objects are used by the example program.
DEFINE PRODUCTID DIMENSION ID DEFINE PRODUCTNAME VARIABLE TEXT <PRODUCTID> DEFINE UNITS.SOLD VARIABLE INTEGER <MONTH PRODUCTID>
The DR.PROG program reads the file. The values of PRODUCTID with the associated product name are already part of the analytic workspace, so the program uses the PRODUCTID values only to set status and assign the units data to the right cells of the UNITS.SOLD variable.
The MATCH attribute is left out of the field description because it is the default. When the program finds a value for PRODUCTID that is not in the analytic workspace, it branches to the trap label. If the user interrupts the program (that is, the error name is ATTN) or the data file cannot be opened, then the program ends. Otherwise, the program resets the error trap and branches back to FILEREAD to continue with the next record.
The example program, named DR.PROG, has the following definition.
DEFINE DR.PROG PROGRAM LD Reads a file with existing dimension values PROGRAM variable funit integer trap on error pushlevel 'save' push month productid limit month to first 1 funit = fileopen('dr.dat' read) next: fileread funit - column 1 width 6 productid - column 39 width 3 units.sold fileclose funit poplevel 'save' return error: "Skip current record and continue processing if funit ne na and errorname ne 'ATTN' then do trap on error goto next doend "Close the file if funit ne na then fileclose funit poplevel 'save' END
When your data file contains a mixture of existing and new dimension values, you can add the new values and all the associated data to the analytic workspace by using the APPEND attribute in the field description.
The first FILEREAD command in the DR.PROG2 program uses APPEND to add any new PRODUCTID values to the analytic workspace. The second FILEREAD command includes a field to read the product name so the new data will be complete.
The dimension maintenance performed by APPEND might be done in the same FILEREAD command that reads the data, but that would cause inefficient handling of the data. The data is handled more efficiently when the dimension maintenance and data reading are performed in two separate passes over the file.
The error processing in this version is shorter because there is no need to skip nonexistent product values and branch back. If there is an error, then the program closes the file, restores any pushed values, and terminates.
The program, named DR.PROG2, has the following definition.
DEFINE DR.PROG2 PROGRAM LD Reads a file with new dimension values PROGRAM variable funit integer trap on error pushlevel 'save' push month productid limit month to first 1 funit = fileopen('dr.dat' read) fileread funit column 1 append width 6 productid fileclose funit funit = fileopen('dr.dat' read) fileread funit - column 1 width 6 productid - column 9 width 30 productname - column 39 width 3 units.sold fileclose funit poplevel 'save' return error: if funit ne na then fileclose funit poplevel 'save' END
If the target dimension has a data type of TEXT or ID or a time data type (DAY, WEEK, MONTH, QUARTER, or YEAR) and the input field in the file contains dimension position numbers (rather than dimension values), then you must specify a conversion type of INTEGER in the field description. The conversion type specifies how input data should be converted to values of the target dimension.
Suppose the target dimension is MONTH, then you can use the following command to read input values that represent positions within the default status of MONTH.
When the input field contains position numbers, you cannot use the APPEND keyword to add new values to a target dimension.
When the records in a file contain values that represent time periods, you may need to specify a conversion type in the field description. The input values can be in any of the following formats:
The FILEREAD command converts position values to time dimension values when you specify INTEGER as the conversion type.
For input values that are in any format except position values, you can use the APPEND attribute to add values to a time dimension. Any gaps are filled in automatically between the values that already exist in your analytic workspace and the values that are read from the file. For example, when the MONTH dimension in your analytic workspace contains values for January 1995 through December 1996, and the file contains a value for March 1997, the months January 1997 through March 1997 are added automatically to the MONTH dimension.
Regardless of how time periods are represented in the file, they are used and stored internally as integers, and they are shown in output according to the VNF of the target time dimension.
Composites are automatically maintained. The way in which you define and use composites can dramatically improve or hinder performance. The more you know about analytic workspace design, especially in regard to the applications that will be used with an analytic workspace, the more effective your use of composites will be.
When you have conjoint dimensions in your analytic workspace, you can set the status of those dimensions while reading a file with the FILEREAD command. Typically, the records in the data file will have a separate field for each base dimension of your conjoint dimension. For example, a file might have a market name in the first field, a product name in the second, and then one or more fields containing sales data.
To read the sales data into a variable dimensioned by a conjoint dimension, for example MARKPROD, you can use a FILEREAD command as follows.
This command will read a value of the MARKET dimension from the first 8-character field of the record and a value of the PRODUCT dimension from the next 8-character field.
The command will then use the results to set the status of MARKPROD, which is a conjoint dimension defined as follows.
The command then reads the last field and assigns the value to the variable SALES, which is dimensioned by MARKPROD.
By including the APPEND keyword in the field description, you can add new values to MARKET, PRODUCT, and MARKPROD, when the FILEREAD command encounters values in the file that do not match existing dimension values.
The fields containing dimension information in your data file might have values that are not identical to the dimension values in your analytic workspace. The file values might be abbreviated or otherwise encoded. For time dimensions, the file values might not be in a format that can be interpreted as valid time periods for the target dimension. The way you translate a coded dimension value varies depending on whether the code is merely an abbreviation (for example, "P" for PRODUCT) or if the code is more complicated.
When the file contains an abbreviated code, you can sometimes complete the value by using the RSET or LSET attribute to add text to the right or left of the value in the file.
For example, products in the file might be identified by all-numeric product numbers, while in your analytic workspace, the values of the PRODUCT dimension might be these same product numbers preceded by the letter P. In this case, you can use the LSET attribute to add the letter P to the values in the file.
The letter P is added when the value is read from the file; it is not added when the modified value is matched with or assigned to the PRODUCT dimension.
To correctly read values that have less straightforward codes, you can set up another dimension containing the coded values found in the data file, along with a relation to the real dimension. FILEREAD can then use the relation to determine the actual dimension value. Or you can use any OLAP DML function to alter or manipulate the coded value to make it match a value in your analytic workspace.
When reading coded data that must be manipulated in some way before being stored in the target, use an assignment statement (shown below) in the field description.
The expression argument specifies the processing or calculation to be performed. If you want to include the value just read from the file as part of the expression, then use the VALUE keyword.
Both of the following field descriptions function identically.
This example illustrates the use of an expression for translating codes into dimension values.
The following example shows the data file, which has 3-character codes for months, and 2-character codes for districts and products.
SEP BO CH 113945 115 OCT BO CH 118934 115 SEP BO CO 92013 119 OCT BO CO 95820 119 SEP BO WI 83201 110 OCT BO WI 82986 110 SEP DA CH 111792 115 OCT DA CH 136031 114 SEP DA CO 91641 121 OCT DA CO 96347 120 SEP DA WI 89734 109 OCT DA WI 88264 109
The following OLAP DML objects are used by the example program.
DEFINE DISTCODE DIMENSION ID DEFINE DISTRICT.DCODE RELATION DISTRICT <DISTCODE> DEFINE PRODCODE DIMENSION ID DEFINE PRODUCT.PCODE RELATION PRODUCT <PRODCODE>
The example program, named DR.PROG3, has the following definition.
DEFINE DR.PROG3 PROGRAM LD Translates coded values into valid dimension values PROGRAM variable funit int funit = fileopen('dr3.dat' read) fileread funit - column 1 width 3 append rset '96' month fileclose funit funit = fileopen('dr3.dat' read) fileread funit - column 1 width 3 rset '96' month - column 5 width 2 district = district.dcode - (distcode value) - column 8 width 2 product = product.pcode - (prodcode value) - column 11 width 6 strip units - column 18 width 3 scale 2 price fileclose funit END
The program translates the 2-character codes for districts and products into values of a DISTRICT dimension and a PRODUCT dimension. The program also illustrates how to translate 3-character month codes into values that can be recognized as valid time periods in a MONTH dimension.
When reading from the file, the program changes the 3-character month read from the file into a value that can be interpreted as a value of the MONTH dimension. Because a MONTH dimension value is identified by a month and a year, the program converts the format used in the file by attaching the year (96) at the end of each 3-character month value.
In the first FILEREAD command, the APPEND keyword is used so that new months are added to the MONTH dimension.
For the district and product fields, the program reads the value from the data file and finds the corresponding dimension value using the relations DISTRICT.DCODE and PRODUCT.PCODE.
column 5 width 2 district = district.dcode distcode value) column 8 width 2 product = product.pcode (prodcode value)
The program uses a QDR with the keyword VALUE representing the code read from the data file. For the districts, the
distcode value QDR modifies the relation DISTRICT.DCODE, which holds district names. It specifies the district that corresponds to the value of DISTCODE just read from the data file. The QDR for PRODUCT works the same way.
The program assumes the PRODUCT and DISTRICT dimension values are already in the analytic workspace, along with the DISTCODE and PRODCODE dimensions and the relations connecting them to DISTRICT and PRODUCT. Once the coded values have been processed, the resulting values of DISTRICT and PRODUCT are used to limit the dimension status so that the data is put in the right cells of the UNITS and PRICE variables.
Finally, you can see in the data file that the price data, which starts in column 18, does not have a decimal point. The SCALE attribute on the last line of the FILEREAD command puts two decimal places in each price value.
For more information, see the following table.
|IF you want documentation about . . .||THEN see . . .|
the VNF command,
the topic for the command in the OLAP DML Reference
the DATEORDER option,
the topic for the option in the OLAP DML Reference