DataStage Log Analyzer - FindJobsByKeyWord Utility

Sunday, May 15, 2016

DataStage Log Analyzer - FindJobsByKeyWord Utility

This is a utility to scan the DataStage job run logs to find the usage of a given table OR file. We can find where a specific table OR file is used just by running a select query after this utility finishes its initial log load.

Lets see an example, I need to find the list of jobs which hits the table TEMP_FINDJOBSBYKEYWORD and I have to do is run below query.

SELECT *
  FROM DATASTAGE.TEMP_FINDJOBSBYKEYWORD A
 WHERE UPPER(A.DS_LOG_DETAILS) LIKE '%TEMP_FINDJOBSBYKEYWORD%'

This will give the results with the list of jobs which uses the table and the DS_LOG_DETAIL will have the detailed log with the usage information.




Utility has three steps(Sub Jobs with scripts) which needs to be executed in sequence and below is details of each of steps.

Step 01 - Execute FindJobsByKeyWord010ext

Here we will execute a DataStage job FindJobsByKeyWord010ext, This DS job job will execute the script FindJobsByKeyWord010ext to create log event listing.
  • Script uses IBM DataStage command "dsjob -lprojects", "dsjob -ljobs" and "dsjob -logsum" to list all the log summary for all jobs in your server.
  • After the successful run of the script, DataStage job filters and keeps on the most recent log entries in the output file. One of the main reason for this filter is to reduce run time of this process and to reduce loading the redundant log if the job had multiple runs.
  • Script needs input directory in the same level as script is. This is one of the prerequisite to run this script, The temp and logs directory will be auto created in same directory level as script is.
  • Script needs 3 parameters as input - ENVIRONMENT NAME, JOB TYPE and LOG TYPE.
    • "ENVIRONMENT NAME" will be the environment or region(could be DEV, PROD as per your shop) and this will be the dynamic part of project name. 
    • "JOB TYPE", As of now "ALL" all is the only valid value.
    • "LOG TYPE", This parameter will tell job scan only a specific type of logs. The valid values are "ALL", "ANY", "INFO", "WARNING", "FATAL", "REJECT", "STARTED", "RESET", "BATCH". ALL and ANY are one and the same, Default is ALL.
  • Output and Logs files of this script will be captured in temp and logs dir respectively, Below is the snippet of a sample output file from this step.

Step 02 - Execute FindJobsByKeyWord020ext

Here we will execute a DataStage job FindJobsByKeyWord020ext, This DS job job will execute the script FindJobsByKeyWord020ext to get the detailed log.
  • Script uses IBM DataStage command "dsjob -logdetail" to pull the detailed log for all the log
  • Script needs 3 parameters as input - "ENVIRONMENT NAME", "EVENT LOG LISTING" and "KEY WORD LISTING"
    • "ENVIRONMENT NAME" will be the environment or region(could be DEV, PROD as per your shop) and this will be the dynamic part of project name. 
    • "EVENT LOG LISTING", This will be the output(currentlogeventlisting.dat) of FindJobsByKeyWord010ext.
    • "KEY WORD LISTING" is the name of file(say KeyWordList.txt) in input directory with the list of key words which needs to be searched initially.
Step 03 - Execute FindJobsByKeyWord030load

Here we will execute a DataStage job FindJobsByKeyWord030load. This is used to load the detailed log into a ORACLE so that we can query the table to analyze and find information for the logs.


Please find the dsx export and script downloadable here. Give it a try and let me know your comments and suggestions for future enhancements.


No comments

Post a Comment

Error 404

The page you were looking for, could not be found. You may have typed the address incorrectly or you may have used an outdated link.

Go to Homepage