ClearDS_RTLogRTStatus - Script & DataStage job to Clear DataStage RTLOG and RTSTATUS
DataStage job logs and status files can grow very large because of system issues or poor design, and this degrades job performance. Clearing these logs manually from DataStage Director or Administrator is cumbersome when many jobs in the same folder have huge logs.
- Example of a system issue: DataStage jobs defined as ISD (Information Services Director) services start looping and keep creating logs for each instance.
- Example of poor design: Logging a few thousand rows to the job log using a Peek stage.
We will go over a process to execute CLEAR.FILE RT_LOG and CLEAR.FILE RT_STATUS automatically.
- Please find the source code for .sh and .dsx attached at the end of this post.
- This post is for IBM DataStage Developers with DataStage and basic shell scripting knowledge.
- This script and job have been tested with DataStage version 8.7. The script and the DataStage job may need changes if IBM changes the log file format.
DataStage Job to Identify Job ID
Every DataStage job in a project has a unique Job ID, and this Job ID is required to clear the RT_LOG and RT_STATUS. You can view the Job ID manually in the job log (Figure: 3.a). To pull the Job ID automatically, I created a DataStage server job (ClearDS_RTLogRTStatus) that uses the predefined routine UtilityHashLookup("DS_JOBS", input.JobName, 5). The job accepts a sequential file of job names as input (Figure: 3.b) and creates an output file (Figure: 3.c) with each job name and its Job ID. The input file listing the jobs whose logs need to be cleared is a prerequisite for this job.
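Because the input file is a prerequisite, it can help to tidy a hand-written list before handing it to the job. The sketch below is only an illustration: it assumes the file holds one job name per line (the layout in Figure 3.b is not reproduced here), and the function name clean_job_list and the sample job names are made up.

```shell
#!/bin/sh
# Sketch: normalise a list of job names before feeding it to the DataStage job.
# Assumption: one job name per line.

clean_job_list() {
    # Strip carriage returns and surrounding whitespace, drop blank lines,
    # and remove duplicates while keeping first-seen order.
    tr -d '\r' < "$1" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        awk 'NF && !seen[$0]++'
}

# Example run against a throwaway file with made-up job names.
raw=$(mktemp)
printf 'JobA\n  JobB  \n\nJobA\n' > "$raw"
clean_job_list "$raw"
rm -f "$raw"
```

Redirecting the function's output to a new file would give a list with no blank lines or repeated names.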
Figure: 3.a – Job ID in Job log
Figure: 3.b – Input
Figure: 3.c – Output
Script to clear the RT_LOG and RT_STATUS
I created a script (ClearDS_RTLogRTStatus.sh) that calls CLEAR.FILE for RT_LOG and RT_STATUS. It loops over the output of the DataStage job and clears the RT_LOG and RT_STATUS of every job listed.
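The loop can be pictured with a small dry-run sketch. It is not the attached script: it only prints the commands it would issue, and it rests on two assumptions that should be checked against your environment. First, that the job output holds one comma-separated "JobName,JobID" pair per line. Second, that the per-job files are named RT_LOG and RT_STATUS followed by the Job ID. The function name build_clear_commands and the sample values are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print the CLEAR.FILE commands instead of executing them.
# Assumption: input lines look like "JobName,JobID".
# Assumption: per-job files are named RT_LOG<JobID> and RT_STATUS<JobID>.

build_clear_commands() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS=, read -r job_name job_id || [ -n "$job_name" ]; do
        # Skip blank lines and rows whose job ID is missing or not numeric.
        case $job_id in
            ''|*[!0-9]*) continue ;;
        esac
        printf 'CLEAR.FILE RT_LOG%s\n' "$job_id"
        printf 'CLEAR.FILE RT_STATUS%s\n' "$job_id"
    done < "$1"
}

# Example run against a throwaway file with made-up job names and IDs.
sample=$(mktemp)
printf 'JobA,101\nJobB,202\n' > "$sample"
build_clear_commands "$sample"
rm -f "$sample"
```

Printing the commands first makes it easy to review the list before anything is cleared. Passing them to the engine's command shell from the project directory is environment specific and is left out of this sketch.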
The requirements for this script are:
- There should be an input folder at the same directory level as the script, and it should hold the output of the DataStage job ClearDS_RTLogRTStatus (Figure: 4.a).
- The script should be called with two parameters, DATASTAGE_PROJECT_NAME and INPUT_FILE_NAME. Example: ClearDS_RTLogRTStatus.sh ISD_DEV ISD_Jobs_Listing_OUT.txt
- DATASTAGE_PROJECT_NAME: the project in which your jobs reside.
- INPUT_FILE_NAME: the output of the DataStage job ClearDS_RTLogRTStatus.
- The script execution logs are captured in the logs folder (the logs and temp folders are generated automatically during the first run).
Figure: 4.a – Script Location and input directory
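The start-up requirements listed above could be checked along these lines. This is a sketch rather than the attached script: it assumes the folders are literally named input, logs and temp, and the function name prepare_run and its return codes are hypothetical.

```shell
#!/bin/sh
# Sketch of the start-up checks: two parameters, an input file under
# <script dir>/input, and logs/temp folders created on first run.
# Assumption: the folder names are exactly "input", "logs" and "temp".

prepare_run() {
    base_dir=$1      # directory that holds the script
    project=$2       # DATASTAGE_PROJECT_NAME
    input_name=$3    # INPUT_FILE_NAME
    if [ -z "$project" ] || [ -z "$input_name" ]; then
        echo "Usage: ClearDS_RTLogRTStatus.sh DATASTAGE_PROJECT_NAME INPUT_FILE_NAME" >&2
        return 1
    fi
    if [ ! -f "$base_dir/input/$input_name" ]; then
        echo "Input file not found: $base_dir/input/$input_name" >&2
        return 2
    fi
    # mkdir -p is a no-op when the folders already exist.
    mkdir -p "$base_dir/logs" "$base_dir/temp"
}

# Example run in a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/input"
: > "$demo/input/ISD_Jobs_Listing_OUT.txt"
prepare_run "$demo" ISD_DEV ISD_Jobs_Listing_OUT.txt && echo "ready"
rm -rf "$demo"
```

In a real script the first argument would usually be the directory of the script itself, and the two remaining arguments would come straight from the command line.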
See datastage-examples repo for the code mentioned in this article.
Hope this was helpful. Did I miss something? Let me know in the comments or in the forum section.