DataStage Log Analyzer - FindJobsByKeyWord Utility

  • Home
  • /
  • DataStage Log Analyzer - FindJobsByKeyWord Utility
DataStage Log Analyzer - FindJobsByKeyWord Utility

DataStage Log Analyzer - FindJobsByKeyWord Utility

Data stage 15 May 2016
Overview

This is a utility to scan the DataStage job run logs to find the usage of a given table OR file. We can find where a specific table OR file is used just by running a select query after this utility finishes its initial log load.

Lets see an example, I need to find the list of jobs which hits the table TEMP_FINDJOBSBYKEYWORD and I have to do is run below query.

SELECT \*
FROM DATASTAGE.TEMP\_FINDJOBSBYKEYWORD A
WHERE UPPER(A.DS\_LOG\_DETAILS) LIKE '%TEMP\_FINDJOBSBYKEYWORD%'

This will give the results with the list of jobs which uses the table and the DS_LOG_DETAIL will have the detailed log with the usage information.

post thumb

Utility has three steps(Sub Jobs with scripts) which needs to be executed in sequence and below is details of each of steps.

Step 01 - Execute FindJobsByKeyWord010ext

Here we will execute a DataStage job FindJobsByKeyWord010ext, This DS job job will execute the script FindJobsByKeyWord010ext to create log event listing.

  • Script uses IBM DataStage command dsjob -lprojects, dsjob -ljobs and dsjob -logsum to list all the log summary for all jobs in your server.
  • After the successful run of the script, DataStage job filters and keeps on the most recent log entries in the output file. One of the main reason for this filter is to reduce run time of this process and to reduce loading the redundant log if the job had multiple runs.
  • Script needs input directory in the same level as script is. This is one of the prerequisite to run this script, The temp and logs directory will be auto created in same directory level as script is.
post thumb
  • Script needs 3 parameters as input - ENVIRONMENT NAME, JOB TYPE and LOG TYPE.
    • “ENVIRONMENT NAME” will be the environment or region(could be DEV, PROD as per your shop) and this will be the dynamic part of project name.
    • “JOB TYPE”, As of now “ALL” all is the only valid value.
    • “LOG TYPE”, This parameter will tell job scan only a specific type of logs. The valid values are “ALL”, “ANY”, “INFO”, “WARNING”, “FATAL”, “REJECT”, “STARTED”, “RESET”, “BATCH”. ALL and ANY are one and the same, Default is ALL.
post thumb
  • Output and Logs files of this script will be captured in temp and logs dir respectively, Below is the snippet of a sample output file from this step.
Step 02 - Execute FindJobsByKeyWord020ext

Here we will execute a DataStage job FindJobsByKeyWord020ext, This DS job job will execute the script FindJobsByKeyWord020ext to get the detailed log.

  • Script uses IBM DataStage command dsjob -logdetail to pull the detailed log for all the log
  • Script needs 3 parameters as input - “ENVIRONMENT NAME”, “EVENT LOG LISTING” and “KEY WORD LISTING”
    • “ENVIRONMENT NAME” will be the environment or region(could be DEV, PROD as per your shop) and this will be the dynamic part of project name.
    • “EVENT LOG LISTING”, This will be the output(currentlogeventlisting.dat) of FindJobsByKeyWord010ext
    • “KEY WORD LISTING” is the name of file(say KeyWordList.txt) in input directory with the list of key words which needs to be searched initially.
Step 03 - Execute FindJobsByKeyWord030load

Here we will execute a DataStage job FindJobsByKeyWord030load. This is used to load the detailed log into a ORACLE so that we can query the table to analyze and find information for the logs.

Please find the dsx export and script downloadable here.

Give it a try and let me know your comments and suggestions for future enhancements.

Top

Share: