Tuesday, October 21, 2008

3. Extracting and loading data - ODBC and ORACLE stages

Designing jobs - ODBC and ORACLE stages


ODBC stages are used to allow Datastage to connect to any data source that represents the Open Database Connectivity API (ODBC) standard.
ODBC stages are mainly used to extract or load the data. However, ODBC stage may also be very helpful when aggregating data and as a lookup stage (in that case it can play role of aggregator stage or a hash file and can be used instead).
Each ODBC stage can have any number of inputs or outputs.
The input links specify the data which is written to the database (they act as INSERT, UPDATE or DELETE statements in SQL). Input link data can be defined in various ways: using an SQL statement constructed by DataStage, a user-defined SQL query or a stored procedure.
Output links specify the data that are extracted (correspond to the SQL SELECT statement). The data on an output link is passed through ODBC connector and processed by an underlying database.

If a processing target is an Oracle database, it may be worth considering use of ORACLE (ORAOCI9) stage. It has a significantly better performance than ODBC stage and allows setting up more configuration options and parameters native to the Oracle database.
There’s a very useful option to issue an SQL before or after main dataflow operations (Oracle stage properties -> Input -> SQL). For example, when loading a big chunk of data into an oracle table, it may increase performance to drop indexes in a ‘before SQL’ tab and create indexes and analyze table in a ‘after SQL’ tab ('ANALYZE TABLE xxx COMPUTE STATISTICS' SQL statement).

Update actions in Oracle stage
The destination table can be updated using various Update actions in Oracle stage. Be aware of the fact that it's crucial to select the key columns properly as it will determine which column will appear in the WHERE part of the SQL statement. Update actions available from the drop-down list:

* Clear the table then insert rows - deletes the contents of the table (DELETE statement) and adds new rows (INSERT).
* Truncate the table then insert rows - deletes the contents of the table (TRUNCATE statement) and adds new rows (INSERT).
* Insert rows without clearing - only adds new rows (INSERT statement).
* Delete existing rows only - deletes matched rows (issues only the DELETE statement).
* Replace existing rows completely - deletes the existing rows (DELETE statement), then adds new rows (INSERT).
* Update existing rows only - updates existing rows (UPDATE statement).
* Update existing rows or insert new rows - updates existing data rows (UPDATE) or adds new rows (INSERT). An UPDATE is issued first and if succeeds the INSERT is ommited.
* Insert new rows or update existing rows - adds new rows (INSERT) or updates existing rows (UPDATE). An INSERT is issued first and if succeeds the UPDATE is ommited.
* User-defined SQL - the data is written using a user-defined SQL statement.
* User-defined SQL file - the data is written using a user-defined SQL statement from a file.

Examples of the use of ODBC and ORACLE stages in Datastage
SQL generated by ODBC stage



ODBC columns view



Oracle destination stage update action

1 comment:

Unknown said...


Datastage Tutorial is the best place to learn datastage here.