- DataStage Job to Identify Job ID
- Script to clear the RT_LOG and RT_STATUS
- Downloadable Modules
DataStage job logs and status files can grow very large because of system issues or poor design, and this degrades job performance. Clearing these logs manually from DataStage Director or Administrator is cumbersome when many jobs in the same folder have huge logs.
- Example of a system issue: a DataStage job is deployed as an ISD (Information Services Director) service, starts looping, and keeps creating log entries for each instance.
- Example of poor design: logging a few thousand rows to the job log using a Peek stage.
We will go over a process to execute CLEAR.FILE RT_LOG and CLEAR.FILE RT_STATUS automatically.
- The source code for the .sh script and the .dsx job export is attached at the end of this post.
- This post is for IBM DataStage Developers with DataStage and basic shell scripting knowledge.
- This script and job have been tested with DataStage version 8.7. Both may need changes if IBM changes the log file format.
DataStage Job to Identify Job ID
Every DataStage job in a project has a unique Job ID, and this Job ID is required to clear the job's RT_LOG and RT_STATUS. You can view the Job ID manually in the job log (Figure: 3.a). To retrieve the Job ID automatically, we created a DataStage server job (ClearDS_RTLogRTStatus) that uses the predefined routine UtilityHashLookup("DS_JOBS", input.JobName, 5). The job accepts a sequential file of job names as input (Figure: 3.b) and creates an output file (Figure: 3.c) containing each job name and its Job ID. The input file listing the jobs whose logs need to be cleared is a prerequisite for this job.
Figure: 3.a – Job ID in Job log
Figure: 3.b – Input
Figure: 3.c – Output
Script to clear the RT_LOG and RT_STATUS
We created a script (ClearDS_RTLogRTStatus.sh) that calls CLEAR.FILE for RT_LOG and RT_STATUS. It loops over the output of the DataStage job and clears the RT_LOG and RT_STATUS of every job listed there.
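The loop described above can be sketched roughly as follows. This is not the attached script, only a minimal illustration under stated assumptions: the function name build_clear_commands is hypothetical, each line of the job output is assumed to be comma-delimited in the form JobName,JobID, and the Job ID is assumed to be appended directly to RT_LOG and RT_STATUS to form the file names. The sketch only prints the commands and does not run them.

```shell
#!/bin/sh
# Sketch only: print the CLEAR.FILE commands for every job in the listing.
# Assumption: each line looks like "JobName,JobID" (comma-delimited).
# Assumption: build_clear_commands is a hypothetical helper name.
build_clear_commands() {
    listing="$1"
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS=',' read -r job_name job_id || [ -n "$job_name" ]; do
        # Drop spaces and carriage returns (for files saved on Windows).
        job_id=$(printf '%s' "$job_id" | tr -d ' \r')
        # Skip blank lines and rows without a purely numeric Job ID.
        case "$job_id" in
            ''|*[!0-9]*) continue ;;
        esac
        echo "CLEAR.FILE RT_LOG${job_id}"
        echo "CLEAR.FILE RT_STATUS${job_id}"
    done < "$listing"
}
```

Printing the commands first gives a dry run that can be reviewed before anything is cleared. Actually running them requires the DataStage engine shell in the project directory, which depends on the installation and is not shown here.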
The requirements of this script are:
- An input folder must exist at the same directory level as the script, and it must hold the output of the DataStage job ClearDS_RTLogRTStatus (Figure: 4.a).
- The script must be called with two parameters, DATASTAGE_PROJECT_NAME and INPUT_FILE_NAME. Example: ClearDS_RTLogRTStatus.sh ISD_DEV ISD_Jobs_Listing_OUT.txt
- DATASTAGE_PROJECT_NAME is the project in which your jobs reside. INPUT_FILE_NAME is the output of the DataStage job ClearDS_RTLogRTStatus.
- The script execution logs are captured in the logs folder (the logs and temp folders are generated automatically during the first run).
Figure : 4.a – Script Location and input directory
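The parameter and folder requirements listed above could be checked along these lines. This is a hedged sketch rather than the attached script: the function name prepare_run, the messages, and the return codes are hypothetical, and only the folder names input, logs, and temp come from the requirements.

```shell
#!/bin/sh
# Sketch only: validate the two parameters and prepare the working folders.
# Assumption: prepare_run, its messages, and its return codes are hypothetical.
prepare_run() {
    script_dir="$1"   # directory that holds the script
    project="$2"      # DATASTAGE_PROJECT_NAME
    input_file="$3"   # INPUT_FILE_NAME

    if [ -z "$project" ] || [ -z "$input_file" ]; then
        echo "Usage: ClearDS_RTLogRTStatus.sh DATASTAGE_PROJECT_NAME INPUT_FILE_NAME" >&2
        return 1
    fi
    # The job output must sit in the input folder next to the script.
    if [ ! -f "$script_dir/input/$input_file" ]; then
        echo "Input file not found: $script_dir/input/$input_file" >&2
        return 2
    fi
    # Created on the first run; mkdir -p does nothing if they already exist.
    mkdir -p "$script_dir/logs" "$script_dir/temp"
}
```

A script could call it as prepare_run "$(dirname "$0")" "$1" "$2" and stop if the return code is not zero.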