
ETL Hashed Files

This table provides answers to questions about ETL hashed files.

Question

Answer

How are hash files used and for what purpose?

Hashed files improve the performance of ETL jobs. They are typically used for lookups within a job.

EPM includes jobs that initialize hashed files. These jobs create the hashed files before the jobs that need them for lookups are executed. The hashed files are also updated after the ETL job loads the target table. This method lets multiple jobs use the same hashed file, as long as they require the same structure.

Another method is to load the hashed file within the same job that uses it as a lookup. This method requires the hashed file to be reloaded every time the job executes.
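As a rough illustration of the first method, the sketch below uses a Python dictionary as a stand-in for a hashed file. It is an analogy only, not EPM or ETL tool code: the function names, the dept_id key, and the sid column are all hypothetical. It shows a lookup being initialized from a base table, used for keyed lookups during a load, and updated as new rows reach the target.

```python
# Illustrative analogy only: a Python dict stands in for a hashed file.
# All table, column, and function names here are hypothetical.

def build_hashed_file(base_rows, key_column):
    """Initialize the lookup from the base table, keyed on key_column."""
    return {row[key_column]: row for row in base_rows}

def run_job(source_rows, hashed_file, key_column, next_sid):
    """Look up each source row; add unseen keys to the target and the lookup."""
    target_rows = []
    for row in source_rows:
        match = hashed_file.get(row[key_column])  # keyed lookup, no table scan
        if match is None:
            match = {key_column: row[key_column], "sid": next_sid}
            next_sid += 1
            hashed_file[row[key_column]] = match  # keep lookup in sync with target
        target_rows.append({**row, "sid": match["sid"]})
    return target_rows, next_sid

# Initialization step: build the lookup once, before the dependent job runs.
base_table = [{"dept_id": "D10", "sid": 1}, {"dept_id": "D20", "sid": 2}]
lookup = build_hashed_file(base_table, "dept_id")

# Dependent job: reuses the lookup and updates it while loading the target.
loaded, next_sid = run_job(
    [{"dept_id": "D20"}, {"dept_id": "D30"}], lookup, "dept_id", next_sid=3
)
```

Under the second method, the equivalent of build_hashed_file would run inside the job itself on every execution, which is simpler but repeats the load cost each time.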

See ETL Hashed Files, Understanding Data Validation and Error Handling in the ETL Process.

What should I keep in mind when managing my hash files?

By default, hashed files are project specific and cannot be shared across projects. The validity of a hashed file depends on the base table from which it is generated. The base table should be updated only by the ETL jobs provided in EPM. Otherwise, the hashed file and the table will be out of sync, which may result in faulty data when the hashed file is used in an ETL job.

EPM provides several hashed file utilities, located in the Utilities\Hash_Utils category.
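To make the out-of-sync risk concrete, the following sketch compares the keys of a base table with the keys held in a lookup. It is not one of the EPM utilities mentioned above; the function name, the dept_id key, and the sample data are hypothetical, and a Python dictionary again stands in for the hashed file.

```python
# Illustrative analogy only: detects key drift between a base table and a
# dict standing in for a hashed file. All names here are hypothetical.

def find_out_of_sync(base_rows, hashed_file, key_column):
    """Report keys present on one side but not the other."""
    base_keys = {row[key_column] for row in base_rows}
    cached_keys = set(hashed_file)
    return {
        "missing_from_hashed_file": sorted(base_keys - cached_keys),
        "stale_in_hashed_file": sorted(cached_keys - base_keys),
    }

# The base table was changed outside the ETL jobs: D30 removed, D40 added.
base_table = [{"dept_id": "D10"}, {"dept_id": "D20"}, {"dept_id": "D40"}]
hashed_file = {
    "D10": {"dept_id": "D10"},
    "D20": {"dept_id": "D20"},
    "D30": {"dept_id": "D30"},
}

report = find_out_of_sync(base_table, hashed_file, "dept_id")
```

In this example a lookup for D40 would fail and a lookup for D30 would return a row that no longer exists in the base table, which is the kind of faulty data described above.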

Can I customize the storage location for hash files?

Yes. You can customize the storage location for hashed files by specifying a directory path. Set the path in the environment parameter #$HASHED_FILE_DIRECTORY#; this parameter is used by all hashed files.

How do I recover data from corrupted hash files?

Generally, a corrupted hash file must be reloaded from the base table. EPM provides utilities to back up and recover DateTime and SurrogateKey hashed files.