Using DataStage Administrator

DataStage Administrator enables you to specify general server defaults, administer projects, and set project properties.

The DataStage Administrator window consists of the General and Projects tabs.

This chapter discusses how to:

Set DataStage server properties.

Set project properties.

Note. This chapter does not discuss all the features available for DataStage Administrator. For a complete view of DataStage Administrator functionality, please see the delivered IBM WebSphere documentation.

Setting DataStage Server Properties

Access the DataStage Administrator - General tab to set DataStage server properties.

You can change the following server-wide properties:

NLS

Configure National Language Support (NLS). DataStage supports the language you specify during installation without any further configuration. However, if your requirements change, you can use DataStage Administrator to reconfigure NLS to support different languages. Note. In DataStage Administrator you can change only the NLS character set; NLS support itself is enabled or disabled during installation.

Inactivity Timeout

Enter the number of seconds of inactivity allowed before the connection between the DataStage client and server times out.

Note. Server-wide property changes made by an administrator affect all projects on the server.

Setting Project Properties

Access the DataStage Administrator - Projects tab.

Using the DataStage Administrator - Projects tab, administrators can navigate to projects and:

Project Properties - General Tab

Access the Project Properties - General tab (click the Properties button on the DataStage Administrator - Projects tab).

The Project Properties - General tab includes the following options:

Enable job administration in Director

Select to make the Cleanup Resources and Clear Status File options available in the Job menu of DataStage Director.

Enable Runtime Column Propagation in Parallel Jobs

If you have parallel jobs, select to enable stages to handle undefined columns during the job run. This setting propagates these columns throughout the rest of the job.

Enable remote execution of Parallel Jobs

Select to specify that parallel jobs in the project are deployed on USS (z/OS UNIX System Services) systems.

Auto-purge of job log

Select to automatically delete the logs generated when you run a job, according to the criteria you select in the Auto purge action group box.

Up to previous (job runs) and Over (days old)

Select one of these options to purge job logs based either on the number of previous job runs whose logs you want to retain or on the age of the log entries in days. Enter the appropriate value in the adjacent field.
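The two purge criteria can be illustrated with a minimal Python sketch. This is not how DataStage implements auto-purge; the `LogEntry` record and both function names are hypothetical, and the sketch assumes that each log entry carries the number of the run that wrote it and a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical job log record: which run wrote it and when."""
    run_id: int            # assumption: a higher run_id means a more recent run
    written_at: datetime


def purge_up_to_previous(entries, runs_to_keep):
    """Keep entries from the most recent `runs_to_keep` runs and drop the rest."""
    recent_runs = sorted({e.run_id for e in entries}, reverse=True)[:runs_to_keep]
    return [e for e in entries if e.run_id in recent_runs]


def purge_over_days_old(entries, max_age_days, now):
    """Keep entries written within the last `max_age_days` days."""
    cutoff = now - timedelta(days=max_age_days)
    return [e for e in entries if e.written_at >= cutoff]
```

The first function corresponds to the Up to previous (job runs) option and the second to the Over (days old) option; only one of the two would apply to a given project.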

Protect Project

If you have Production Manager permissions, click to convert the project to a protected project to prevent its modification.

Generate Operational Metadata

Select this check box if you want parallel and server jobs in your project to generate operational metadata.

You can override this setting in individual jobs if desired.

Setting Environment Variables

Click the Environment button on the Project Properties - General tab to set project-wide environment variables.

DataStage Administrator enables you to create user-defined environment variables and assign default values for existing variables used throughout a project.

Changing an environment variable affects all of the jobs in the project. To set a variable on a per-job basis instead, leave the Value column empty and specify the value in a job parameter. You can also override the value when the job runs.
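The relationship between the three places where a value can come from can be sketched as a lookup order. The following Python fragment is an illustration only: the function name, the variable names, and the assumption that the most specific setting wins are not taken from the DataStage documentation.

```python
def resolve_env_var(name, project_defaults, job_parameters=None, runtime_overrides=None):
    """Return the effective value of `name` for one job run.

    Assumed lookup order, most specific first: run-time override,
    job parameter, project-wide default. An empty string counts as
    "not set", mirroring an empty Value column in the project settings.
    """
    for source in (runtime_overrides or {}, job_parameters or {}, project_defaults):
        value = source.get(name)
        if value:  # skip missing keys and empty values
            return value
    return None
```

For example, with a hypothetical variable `SRC_DIR` left empty at project level, `resolve_env_var("SRC_DIR", {"SRC_DIR": ""}, {"SRC_DIR": "/data/in"})` falls through to the job parameter.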

To set a default value for an environment variable, select the variable type from the Environment Variable Tree in the left pane, and then enter a value in the right pane.

To create a new variable, select User Defined in the Environment Variable Tree, and then enter a new variable name, prompt, and value in the right pane.

Click Set to Default to set the selected variable to its installed default value.

Click All to Default to set all currently visible variables to their installed default values.

Click Variable Help to get information about the selected variable.

Setting Environment Variables - Example

To configure the delivered environment parameters:

  1. Open DataStage Administrator and select your project.

  2. Note the project path name of the selected project and close DataStage Administrator.

  3. Navigate to the folder identified by the project path.

    The DSPARAM file should be located in that folder.

  4. Open the DSPARAM file in Notepad.

  5. Search for [EnvVarDefns].

  6. Open the ENV_PARAM.txt file, and then select and copy its contents.

    You can copy specific entries based on the product.

  7. Paste the copied contents into the DSPARAM file.

    Paste the contents directly below the line that contains the [EnvVarDefns] text.

  8. Save the DSPARAM file.

  9. Open DataStage Administrator, navigate to the Environment Variables window, and select the User Defined category.

    Assign values to the environment parameters; ETL jobs need these values to run successfully.
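Steps 4 through 8 above can also be scripted. The following Python sketch is one possible approach, not a delivered utility: the function names are hypothetical, the file paths are supplied by the caller, and it assumes that both files are plain text readable with the platform's default encoding. It writes a backup copy first because the DSPARAM file is edited in place.

```python
from pathlib import Path

SECTION_HEADER = "[EnvVarDefns]"


def insert_env_params(dsparam_text, env_param_text, header=SECTION_HEADER):
    """Return dsparam_text with env_param_text inserted right below `header`.

    Blank lines in env_param_text are skipped. Raises ValueError if the
    header line is not present.
    """
    lines = dsparam_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == header:
            new_lines = [entry for entry in env_param_text.splitlines() if entry.strip()]
            merged = lines[: index + 1] + new_lines + lines[index + 1:]
            return "\n".join(merged) + "\n"
    raise ValueError(f"{header} not found")


def update_dsparam_file(dsparam_path, env_param_path):
    """Back up the DSPARAM file, then write the merged contents in place."""
    dsparam_path = Path(dsparam_path)
    original = dsparam_path.read_text()
    # Keep a copy next to the original (hypothetical ".bak" naming).
    dsparam_path.with_name(dsparam_path.name + ".bak").write_text(original)
    merged = insert_env_params(original, Path(env_param_path).read_text())
    dsparam_path.write_text(merged)
```

As in the manual procedure, close DataStage Administrator before running such a script, and assign values to the new parameters afterward.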

Project Properties - Permissions Tab

Access the Project Properties - Permissions tab.

Before users can access WebSphere DataStage, they must be defined in the Suite Administrator tool as a DataStage Administrator or a DataStage User. As a DataStage administrator, you can define whether a DataStage user can access a project and, if so, what category of access the user has.

Use the Permissions tab to add groups and assign users to groups. These groups are in turn allocated the role of DataStage Administrator or DataStage User. Any user who belongs to an administrator group can administer WebSphere DataStage. You can also grant a user group access to a project and assign a role to the group.

The users and groups that you set up must also have the correct permissions at the operating system level to access the folders in which the projects reside.

The Permissions page contains the following controls:

Project Properties - Tracing Tab

Access the Project Properties - Tracing tab.

Use the Project Properties - Tracing tab to enable or disable tracing, and view or delete trace files.

Enabling tracing activity on the server helps diagnose project problems. By default, server tracing is disabled.

When you enable tracing, server activity attached to a specific project is written to trace files. Users can use the information saved in trace files to identify the cause of a project problem.

Project Properties - Schedule Tab

Access the Project Properties - Schedule tab.

Use the Project Properties - Schedule tab to change the user name under which scheduled jobs run. DataStage uses the Microsoft Windows Schedule service to schedule jobs. By default, scheduled jobs run under the Microsoft Windows system authority user name. However, this user name may not have sufficient rights to run the jobs, so you may need to assign a different user name.

To verify that the user name exists, click the Test button. The system schedules and runs a job using the name that you entered.

Note. The Schedule tab is only available on Microsoft Windows.

Project Properties - Tunables Tab

Access the Project Properties - Tunables tab.

Use the Project Properties - Tunables tab to set up caching for hashed file stages and row buffering, both of which can improve the performance of server jobs.

When data is referenced repeatedly, for instance in a lookup, storing the data in memory rather than on disk can improve performance. To support this, when a hashed file stage writes records to a hashed file, the data can be cached rather than written to the file immediately. Similarly, when a hashed file stage reads a hashed file, you can preload the file into memory, which makes subsequent access to the data faster. The hashed file stage area of the Tunables tab enables you to adjust the sizes of both the read cache and the write cache.
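The effect of the two caches can be illustrated with a toy Python model. It is a sketch of the general idea only and says nothing about how DataStage implements hashed files: the class name is hypothetical, a dictionary stands in for the file on disk, and the cache limit is expressed as a record count rather than a size in megabytes.

```python
class CachedHashedFile:
    """Toy model of a keyed file with a write cache and a preloaded read cache."""

    def __init__(self, disk, write_cache_limit):
        self.disk = disk                    # stands in for the file on disk
        self.write_cache = {}               # records not yet written to disk
        self.write_cache_limit = write_cache_limit
        self.read_cache = None              # filled by preload()
        self.disk_writes = 0                # counts flushes, for illustration

    def write(self, key, record):
        """Buffer the record; flush to disk only when the cache is full."""
        self.write_cache[key] = record
        if len(self.write_cache) >= self.write_cache_limit:
            self.flush()

    def flush(self):
        """Write all buffered records to disk in one pass."""
        if self.write_cache:
            self.disk.update(self.write_cache)
            if self.read_cache is not None:
                self.read_cache.update(self.write_cache)
            self.write_cache.clear()
            self.disk_writes += 1

    def preload(self):
        """Copy the whole file into memory so that lookups avoid disk access."""
        self.flush()
        self.read_cache = dict(self.disk)

    def lookup(self, key):
        """Check pending writes first, then memory if preloaded, then disk."""
        if key in self.write_cache:
            return self.write_cache[key]
        source = self.read_cache if self.read_cache is not None else self.disk
        return source.get(key)
```

In this model a larger write cache means fewer flushes, and a preloaded read cache means repeated lookups never touch the simulated disk, which is the trade-off of memory for speed that the cache sizes control.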

Another way to improve performance is with the use of row buffering. Row buffering enables connected active stages to pass data by using buffers (memory) rather than passing data row by row.
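The difference between passing data row by row and passing it through a buffer can be sketched in Python. This is an illustration of the concept under simplifying assumptions, not DataStage behavior: the function names are hypothetical, and the buffer is measured in rows rather than in kilobytes.

```python
from itertools import islice


def buffered(rows, buffer_size):
    """Yield lists of up to `buffer_size` rows instead of one row at a time."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, buffer_size))
        if not chunk:
            return
        yield chunk


def run_link(rows, transform, buffer_size):
    """Pass rows from an upstream stage to `transform` one buffer at a time.

    Returns the transformed rows and the number of hand-offs that were needed.
    """
    output, handoffs = [], 0
    for chunk in buffered(rows, buffer_size):
        handoffs += 1
        output.extend(transform(row) for row in chunk)
    return output, handoffs
```

With ten rows, a buffer of four rows needs three hand-offs between the stages, whereas a buffer of one row (equivalent to row-by-row passing) needs ten; the output is the same in both cases.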

Project Properties - Sequence Tab

Access the Project Properties - Sequence tab.

Use the Project Properties - Sequence tab to add checkpoints to a job sequence and enable automatic handling of failures during sequence runs.

You can insert checkpoints in job sequences to enable the sequence to be restarted if one of the jobs in the sequence fails. Checkpoints enable you to see where the problem is, fix it, and then rerun the sequence from the point at which it left off.
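The restart behavior that checkpoints provide can be sketched in a few lines of Python. The sketch is a conceptual illustration, not the job sequencer's implementation: the function name is hypothetical, and it assumes that a checkpoint is simply a record of which jobs in the sequence have already completed.

```python
def run_sequence(jobs, completed):
    """Run `jobs` in order, skipping any whose name is already in `completed`.

    `jobs` is a list of (name, callable) pairs. `completed` is a set that acts
    as the checkpoint record; it is updated in place after each success.
    Returns the name of the failed job, or None if the sequence finished.
    """
    for name, job in jobs:
        if name in completed:
            continue              # checkpointed on an earlier run
        try:
            job()
        except Exception:
            return name           # stop here; earlier checkpoints are preserved
        completed.add(name)
    return None
```

If the second of three jobs fails, the function returns that job's name and leaves the first job recorded in `completed`. After the problem is fixed, calling the function again with the same set skips the first job and resumes at the one that failed.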