Friday, 22 March 2013

PL/SQL with Multithreading



We had a piece of PL/SQL code that copies data from remote view to a local table. This code executes once a day and took about 4 hours. As if this is not bad enough, we received a request to run this piece of code 3-4 times daily which will total up to about 16 hours per day.

The original code to copy data from a remote view consists of 3 steps:
  1. Copy over employee numbers (one column only) from remote view to local temp table. This step takes few seconds only.
  2. Use employee numbers copied locally to do a lookup serially (lookup row by row) from remote view. This step takes about 4 hours to copy 20,000 rows. The reason it takes that long is one column from the view is an actual function. Most of the time is spent executing this function and for so many reasons (not all are technical) we can’t get to function to tune it.
  3. Commit one time at the end.
Here how the original code looks like:
– Step 1:
– truncate local temp table so we can copy employee numbers
– copy over employees numbers to local temp table from the remote view 
1execute immediate 'truncate table temp_emp_no';
2execute immediate 'truncate table employees';
3 
4insert into temp_emp_no select distinct EMPLOYEE_NUMBER from remote_employee_view@remote_link;
5commit;

– Step 2:
– Loop through copied over employee numbers and do a remote lookup 1 row at a time
01cursor c1 is
02   select emp_no from temp_emp_no;
03 
04for i in c1 loop
05   exit when c1%notfound;
06   insert into employees
07   select EMPLOYEE_NUMBER, first_name, last_name, RATE, PERIOD, total_rate
08     from remote_employee_view@remote_link;
09   where EMPLOYEE_NUMBER = i.emp_no;
10end loop;
11 
12commit;

At this point we were left with one option only which is run the PL/SQL code in parallel (multithreading), but as you might now, PL/SQL doesn’t support multithreading natively,

To simulate multithreading we need to accomplish 2 steps:
  1. Break the job in multiple pieces (threads).
  2. Schedule every thread to run concurrently.
Remember first step which is coping employee numbers only from remote view to local table. We hash-partitioned this local table into 4 partitions. Each of these partitions will translate into a thread as you will see.
1CREATE TABLE vb_emp_no
2   (emp_no  VARCHAR2(30))
3    PARTITION BY HASH (EMP_NO)
4     PARTITIONS 4
5/

This is the thread code (stored procedure) which takes in a partition name as a parameter and copies employee data from remote view for that partition only.
01CREATE procedure refresh_employee_data_part (p_name in varchar2 )
02  authid definer
03is
04  TYPE EmpCurTyp  IS REF CURSOR;
05  v_emp_cursor    EmpCurTyp;
06  sql_stmt varchar2(2048);
07  v_emp_no varchar2(30);
08 
09begin
10  sql_stmt := 'select emp_no from vb_emp_no partition(' || p_name || ')';
11  open v_emp_cursor for sql_stmt ;
12 
13loop
14  fetch v_emp_cursor into v_emp_no;
15  EXIT WHEN v_emp_cursor%NOTFOUND;
16  insert into employees
17          (EMPLOYEE_NUMBER,
18           VACATION_BALANCE,
19           RATE, PERIOD, TOTAL_RATE)
20  select distinct employee_number, vacation_balance, rate, period, total_rate
21    from   remote_employee_view@remote_link;
22  where  EMPLOYEE_NUMBER = v_emp_no;
23end loop;
24 
25commit;
26close v_emp_cursor;
27end;
28/

This procedure glues all parts together. We schedule the previous procedure 4 times and achieve concurrency.
01CREATE procedure refresh_employees
02  authid definer
03is
04begin
05  execute immediate 'truncate table vb_emp_no';
06  insert into vb_emp_no
07  select distinct employee_number from remote_view@db_link;
08  commit;
09  execute immediate 'truncate table employees';
10 
11dbms_scheduler.create_job(job_name => dbms_scheduler.generate_job_name('VB_P1_'),
12   job_type => 'PLSQL_BLOCK',
13   job_action => 'begin refresh_employee_data_part(''VB_EMP_NO_P1''); end;',
14   comments => 'Thread 1 to refresh employees',
15   enabled => true,
16   auto_drop => true);
17 
18dbms_scheduler.create_job(job_name => dbms_scheduler.generate_job_name('VB_P2_'),
19   job_type => 'PLSQL_BLOCK',
20   job_action => 'begin refresh_employee_data_part(''VB_EMP_NO_P2''); end;',
21   comments => 'Thread 2 to refresh employees',
22   enabled => true,
23   auto_drop => true);
24 
25dbms_scheduler.create_job(job_name => dbms_scheduler.generate_job_name('VB_P3_'),
26   job_type => 'PLSQL_BLOCK',
27   job_action => 'begin refresh_employee_data_part(''VB_EMP_NO_P3''); end;',
28   comments => 'Thread 3 to refresh employees',
29   enabled => true,
30   auto_drop = true);
31 
32dbms_scheduler.create_job(job_name => dbms_scheduler.generate_job_name('VB_P4_'),
33   job_type => 'PLSQL_BLOCK',
34   job_action => 'begin refresh_employee_data_part(''VB_EMP_NO_P4''); end;',
35   comments => 'Thread 4 to refresh employees,
36   enabled => true,
37   auto_drop => true);
38 
39end;
40/

The procedure would finish immediately even if there are errors. To check the execution status look under DBA_SCHEDULER_JOB_RUN_DETAILS to look for errors and running time.

End Result the job finished in 50 minutes more than 4x faster.

Hazem Ameen
Senior Oracle DBA

0 comments:

Post a Comment