.comment-link {margin-left:.6em;}

Oracle Sponge -- Now Moved To Wordpress

Please use http://oraclesponge.wordpress.com

Thursday, September 15, 2005

Optimizing Materialized Views Part II: The Direct Path Insert Enhancement

This follows on from some recent thoughts about the need to find ways of optimizing fast refresh against multiple materialized views.

In the documentation for both 9i and 10g you will find references to the optimization of fast refresh materialized views when direct-path insert is applied to the master table. For example here it states that "Fast refresh can perform significant optimizations if it finds that only direct loads have occurred...". As far as I can tell the documentation gives no further information, but a couple of tests on the internal refresh mechanism are very revealing.

Here is how conventional, non-Partition Change Tracking fast refresh works with conventional insert (two uses for the word "conventional" -- that won't be confusing at all):
  1. Bulk conventional insert into master_table (Insert Into master_table ... select ... or other methods such as SQL*Loader conventional path)
  2. Recursive SQL maintains MV log through multiple Insert Into MLOG$... VALUES ()
  3. Commit
  4. MV maintained by Merge Into MV Using (Select ... From MLOG$...)...
Here is the direct path enhancement:
  1. Direct-path insert into master_table (Insert /*+ APPEND */ Into master_table ... select ... or other methods such as SQL*Loader direct path)
  2. Commit
  3. MV maintained by Merge Into MV Using (Select ... From ALL_SUMDELTA, Master_Table...)...


The Difference


Here are the highlights of the differences:
  • Although some recursive SQL is performed on an empty MLOG, there is no insert activity into the MLOG table at all.
  • The MV's are not updated from the MLOG. Instead, the ALL_SUMDELTA view identifies the ranges of rowid's in the master_table that were just inserted, and a join between ALL_SUMDELTA and master_table is all that is needed to supply the change vectors required for the MV fast refresh. Recall that direct path insert creates contiguous blocks of records above the table's high water mark, and thus the new records are not interspersed with old records.
The loss of MLOG activity is probably the major reason for the performance boost (I'll pick the bones out of the trace file hopefully to confirm this next time). The multiple Insert Into ... Values ... statements are an enormous overhead (measurable with SQL tracing), and they both prevent the insert into the MLOG from being a nologging operation and they also lose the benefits of a single bulk operation over multiple single operations. The lack of rows in the table for subsequent recursive SQL against MLOG also makes those SQL's an insignificant use of resources.

The identification of the change vectors for the refresh process may also show improvement. In particular, where the MLOG represents a substantial proportion of the master table's columns and the master table is compressed (not uncommon in fact tables) then the size of the change vectors stored in the MLOG is likely to exceed that of the new records in the table. On the other hand, the MLOG blocks are possibly still in the buffer cache at the time that the MV refresh takes place, which would not be the case with the new rows in the master table (which bypass the buffer cache).


Further Thoughts

One inference that I'm drawing from all this is that the advantages of Partition Change Tracking fast refresh over conventional fast refresh are eroded enormously, which is a relief because I'm not a PCTFR fan at all for reasons that I think I've mentioned here before. If not, let me know.

And here's another thought: if your intention is that MV refresh must never be based on conventional path insert because of the enormous performance loss, then could you place a trigger on the MLOG table to raise an error before insert?

There's a little more work needed there to iron out the exact costs and benefits on the MV refresh process, but the performance increase on the lack of MLOG maintenance alone is an enormous boost.

So to summarize:

"The best way to improve the performance of a task is not to do it at all"

"Bulk operations beat single-row operations"


Some Results


Now then, I don't want to suggest that the following results are formal and repeatable, but on my own test hardware I see the conventional method coming in at 17 minutes for the insert and 12 minutes for a refresh, in contrast to the direct method's 1 second for the insert and 4 seconds for the refresh. These number appear to be an improvement of such magnitude that I'm almost loath to publish them, so let's place a big "YOUR RESULTS MAY VARY" label on them for now.


Next Time


Anyway, that's all for now. I'm going to have a dig through the trace files for more detailed information.

Oh by the way, this extremely powerful optimization is patented by the Oracle Corporation, so hands off it ;)


The Test Script

drop table master;
drop table master_test_data;
drop materialized view master_mv1;
drop materialized view master_mv2;

create table master
(
location_cd number not null,
tx_timestamp date not null,
product_cd number not null,
tx_qty number not null,
tx_cst number not null
)
pctfree 0
nologging
/

create materialized view log on master
with rowid
(
location_cd,
tx_timestamp,
product_cd,
tx_qty,
tx_cst
)
including new values
/

create materialized view master_mv1
Using No Index
Refresh Fast On Commit
Enable Query Rewrite
As
Select
location_cd,
trunc(tx_timestamp,'MM') tx_month,
sum(tx_qty) s_tx_qty,
sum(tx_cst) s_tx_cst,
count(*) c_star
from
master
group by
location_cd,
trunc(tx_timestamp,'MM')
/

create materialized view master_mv2
Using No Index
Refresh Fast On Commit
Enable Query Rewrite
As
Select
product_cd,
trunc(tx_timestamp,'MM') tx_month,
sum(tx_qty) sum_tx_qty,
sum(tx_cst) sum_tx_cst,
count(*) c_star
from
master
group by
product_cd,
trunc(tx_timestamp,'MM')
/

create table master_test_data
(
location_cd not null,
tx_timestamp not null,
product_cd not null,
tx_qty not null,
tx_cst not null
)
pctfree 0
nologging
as
select
trunc(dbms_random.value(1,10)),
to_date('01-jan-2005','DD-Mon-YYYY')
+ dbms_random.value(1,31),
trunc(dbms_random.value(1,1000)),
trunc(dbms_random.value(1,10)),
trunc(dbms_random.value(1,100),2)
from
dual
connect by
1=1 and
level <= 100000 / commit
/
select count(*) from master_test_data
/

alter session set events '10046 trace name context forever, level 8'
/
set timing on

insert /*+ append */ into master select * from master_test_data
where rownum <= 50000 / commit
/
insert /*+ append */ into master select * from master_test_data
/
commit
/
exit

0 Comments:

Post a Comment

<< Home