Datastage Parallel Routine using C Program

Wednesday, September 3, 2014

Routines are custom-developed functions in DataStage. DataStage lets you create custom parallel routines from C/C++ code. Although DataStage provides most of the essential functions, routines are useful for implementing very specific logic (e.g. wildcard comparison of two strings) that the built-in functions do not cover.

I had to create one to do a wildcard comparison between two strings in DataStage. The C source code was taken from an open-source code blog.

In this example we will walk through the process of creating a DataStage parallel routine from a custom C function.

Notes
*** For this tutorial I will use a simple test function (not the wildcard compare; details about the wildcard compare are at the bottom of this page).

*** You don't have to be a C/C++ expert to follow this tutorial, but if you need to create a complex C function or DataStage routine, it's better to get help from a C/C++ expert.

Steps to create C routine

Step 1:- Create the C function and make sure it works outside of DataStage. Visual Studio or an online compiler works well for this; I used the online compiler "http://www.compileonline.com/compile_c_online.php".
Here is a simple sample C function that returns n+1 when the integer n is passed as input. Note that the function itself has no main, so to test it you need to wrap it in a C program that has a main function.

C Function
int my_funct(int x)
{
    return x + 1;
}

C Program to test the Function
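#include <stdio.h>

/* Prototype of the function under test (defined below main). */
int my_funct(int);

int main(void)
{
    /* Call the function with a known input; 5 should give 6. */
    int result = my_funct(5);

    printf("%s\n", "The Output is : ");
    printf("%i\n", result);
    return 0;
}

/* Returns its argument plus one. */
int my_funct(int x)
{
    return x + 1;
}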

Test Output from Online compiler

The Output is :
6

Step 2:- Copy the working code to a directory on the Linux machine (our DataStage runs on Linux, so these steps are Linux-specific). I saved the above C function as "TestRoutine.C".

Step 3:- Compile the C function on the Linux machine to create the object file. Make sure to use the same compile command that is configured in DataStage Administrator for your project.

You can find the compile options at the following location in DataStage Administrator.

InfoSphere DataStage Administration --> Properties --> General --> Environment --> Compiler --> APT_COMPILER/APT_COMPILEOPT



Based on the compiler options at our IT shop, my compile command was "g++ -c -O -fPIC -Wno-deprecated -m64 -mtune=generic -mcmodel=small TestRoutine.C". Run it (for example, from a PuTTY session) in the directory that contains the C function, and it will create the object file with a .o extension.



Step 4:- Create the Routine in Datastage
  • File --> New --> Routine --> Parallel Routine

  • Update the General and Arguments tabs as given below.
  Routine name: Name for the DataStage routine (say, CTestRoutine)
  External subroutine name: C function name (in this case, my_funct)
  Type: External Function
  Object type: Object
  Return type: Must match the return type of the C function (in this case, int)
  Library path: Location of the compiled C object (normally the same directory as in Step 2)

  Argument name: Same as the argument in the C function (in this case, x)
  Native type: Same as the data type in the C function (in this case, int)




Step 5:- Create a test DataStage job to validate the function. Add a dummy output column, select the routine you just created for it, and give the job a test run. It should run fine if the above steps were followed correctly.





C Function for wild card Compare
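#include <stddef.h>  /* NULL */

/* Returns 1 if regstring matches the pattern in wildstring, 0 otherwise.
   '*' matches any run of characters (including none); '?' matches exactly one. */
int WCCompare(char *wildstring, char *regstring) {

  char *cp = NULL, *mp = NULL;

  /* Treat a NULL argument as "no match". */
  if (wildstring == NULL || regstring == NULL)
     return 0;

  /* Match character by character (or via '?') up to the first '*'. */
  while ((*regstring) && (*wildstring != '*')) {
    if ((*wildstring != *regstring) && (*wildstring != '?')) {
      return 0;
    }
    wildstring++;
    regstring++;
  }

  while (*regstring) {
    if (*wildstring == '*') {
      /* A trailing '*' matches whatever is left of regstring. */
      if (!*++wildstring) {
        return 1;
      }
      /* Remember where to resume if the match after '*' fails. */
      mp = wildstring;
      cp = regstring+1;
    } else if ((*wildstring == *regstring) || (*wildstring == '?')) {
      wildstring++;
      regstring++;
    } else {
      /* Mismatch: backtrack and let the last '*' absorb one more character. */
      wildstring = mp;
      regstring = cp++;
    }
  }

  /* Any leftover pattern must consist only of '*' to count as a match. */
  while (*wildstring == '*') {
    wildstring++;
  }
  return !*wildstring;

}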

Use this C function to create the DataStage parallel routine for wildcard comparison in DataStage. Below is a screenshot of the General and Arguments tabs of my wildcard compare routine.

