Tuesday, October 21, 2008

6. Performing lookups in Datastage

Designing jobs - looking up data using hash files


The data in Datastage can be looked up from a hashed file or from a database (ODBC/ORACLE) source. Lookups are always managed by the transformer stage.

A Hashed File is a reference table based on key fields which provides fast access for lookups. They are very useful as a temporary or non-volatile program storage area. An advantage of using hashed files is that they can be filled up with remote data locally for better performance.
To increase performance, hashed files can be preloaded into memory for fast reads and support write-caching for fast writes.

There are also situations where loading a hashed file and using it for lookups is much more time consuming than accessing directly a database table. It usually happens where there is a need to access more complex data than a simple key-value mapping, for example what the data comes from multiple tables, must be grouped or processed in a database specific way. In that case it's worth considering using ODBC or Oracle stage.

Please refer to the examples below to find out what is the use of lookups in Datastage

In the transformer depicted below there is a lookup into a country dictionary hash file. If a country is matched it is written to the right-hand side column, if not - a "not found" string is generated.

Design of a datastage transformer with lookup



In the job depicted below there is a sequential file lookup, linked together with a hash file which stores the temporary data.
Sequential file lookup

1 comment:

sridevi said...

Hi Subbu,

I am new to Data stage and trying to develop server job with reference link to oracle stage and primary input link is Xml input stage. Can you provide me more details on how to do look up operation using reference link to oracle stage instead of Hash file.


Thanks,
Sridevi.