
Oracle Sponge -- Now Moved To Wordpress

Please use http://oraclesponge.wordpress.com

Thursday, September 15, 2005

Vacation Time

But it's just Dayton, OH, to cheer on the wife's running of the Air Force (half) marathon.

See you next week.

Optimizing Materialized Views Part II: The Direct Path Insert Enhancement

This follows on from some recent thoughts about the need to find ways of optimizing fast refresh against multiple materialized views.

In the documentation for both 9i and 10g you will find references to the optimization of fast refresh materialized views when direct-path insert is applied to the master table. For example here it states that "Fast refresh can perform significant optimizations if it finds that only direct loads have occurred...". As far as I can tell the documentation gives no further information, but a couple of tests on the internal refresh mechanism are very revealing.

Here is how conventional, non-Partition Change Tracking fast refresh works with conventional insert (two uses for the word "conventional" -- that won't be confusing at all):
  1. Bulk conventional insert into master_table (Insert Into master_table ... select ... or other methods such as SQL*Loader conventional path)
  2. Recursive SQL maintains MV log through multiple Insert Into MLOG$... VALUES ()
  3. Commit
  4. MV maintained by Merge Into MV Using (Select ... From MLOG$...)...
Here is the direct path enhancement:
  1. Direct-path insert into master_table (Insert /*+ APPEND */ Into master_table ... select ... or other methods such as SQL*Loader direct path)
  2. Commit
  3. MV maintained by Merge Into MV Using (Select ... From ALL_SUMDELTA, Master_Table...)...
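
To make the difference concrete, here is a hedged sketch of what you can observe after each style of insert (using the MASTER table from the test script below; MLOG$_MASTER is the default name Oracle gives its log table, and the ALL_SUMDELTA column names are as documented -- treat this as an illustration, not the internal refresh SQL):

-- Conventional path: recursive SQL fills the log row by row. Before the
-- commit fires the on-commit refresh, the logged rows are visible:
insert into master select * from master_test_data;
select count(*) from mlog$_master;   -- roughly one logged row per inserted row
commit;                              -- the refresh reads, then purges, the log

-- Direct path: the log stays empty; instead ALL_SUMDELTA records the
-- rowid range(s) of the blocks loaded above the high water mark
-- (visibility of pending entries pre-commit is worth verifying):
insert /*+ append */ into master select * from master_test_data;
select sd.dmltype, sd.beg_rowid, sd.end_rowid
from   all_sumdelta sd, user_objects uo
where  uo.object_name = 'MASTER'
and    sd.tableobj#   = uo.object_id;
commit;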


The Difference


Here are the highlights of the differences:
  • Although some recursive SQL is performed on an empty MLOG, there is no insert activity into the MLOG table at all.
  • The MV's are not updated from the MLOG. Instead, the ALL_SUMDELTA view identifies the ranges of rowid's in the master_table that were just inserted, and a join between ALL_SUMDELTA and master_table is all that is needed to supply the change vectors required for the MV fast refresh. Recall that direct path insert creates contiguous blocks of records above the table's high water mark, and thus the new records are not interspersed with old records.
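
In other words, under the enhancement the refresh's source query has roughly this shape (a hedged approximation for MASTER_MV1 from the test script below -- the real recursive SQL is more elaborate, and also restricts ALL_SUMDELTA to the master table's object id and load timestamp):

select m.location_cd,
       trunc(m.tx_timestamp,'MM')  tx_month,
       sum(m.tx_qty)               s_tx_qty,
       sum(m.tx_cst)               s_tx_cst,
       count(*)                    c_star
from   master m,
       all_sumdelta sd
where  m.rowid between sd.beg_rowid and sd.end_rowid
group by
       m.location_cd,
       trunc(m.tx_timestamp,'MM');
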
The loss of MLOG activity is probably the major reason for the performance boost (hopefully I'll confirm this by picking the bones out of the trace file next time). The multiple Insert Into ... Values ... statements are an enormous overhead (measurable with SQL tracing): they prevent the insert into the MLOG from being a nologging operation, and they forgo the benefits of a single bulk operation over multiple single-row operations. The lack of rows in the MLOG also makes the subsequent recursive SQL against it an insignificant use of resources.

The identification of the change vectors for the refresh process may also show improvement. In particular, where the MLOG captures a substantial proportion of the master table's columns and the master table is compressed (not uncommon in fact tables), the size of the change vectors stored in the MLOG is likely to exceed that of the new records in the table. On the other hand, the MLOG blocks are possibly still in the buffer cache at the time that the MV refresh takes place, which would not be the case with the new rows in the master table (which bypass the buffer cache).


Further Thoughts

One inference that I'm drawing from all this is that the advantages of Partition Change Tracking fast refresh over conventional fast refresh are eroded enormously, which is a relief because I'm not a PCTFR fan at all for reasons that I think I've mentioned here before. If not, let me know.

And here's another thought: if your intention is that MV refresh must never be based on conventional path insert because of the enormous performance loss, then could you place a trigger on the MLOG table to raise an error before insert?
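
A minimal sketch of that idea (untested; it assumes the default log table name MLOG$_MASTER, and you would want to verify that nothing in the refresh process itself ever inserts into the log):

-- Hypothetical guard trigger: fail any conventional-path DML that tries
-- to write change records into the MV log.
create or replace trigger master_mlog_guard
before insert on mlog$_master
begin
  raise_application_error(-20001,
    'Conventional-path DML against MASTER is blocked: use direct path.');
end;
/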

There's a little more work needed there to iron out the exact costs and benefits of the MV refresh process, but the saving from the lack of MLOG maintenance alone is an enormous boost.

So to summarize:

"The best way to improve the performance of a task is not to do it at all"

"Bulk operations beat single-row operations"


Some Results


Now then, I don't want to suggest that the following results are formal and repeatable, but on my own test hardware I see the conventional method coming in at 17 minutes for the insert and 12 minutes for a refresh, in contrast to the direct method's 1 second for the insert and 4 seconds for the refresh. These numbers represent an improvement of such magnitude that I'm almost loath to publish them, so let's place a big "YOUR RESULTS MAY VARY" label on them for now.


Next Time


Anyway, that's all for now. I'm going to have a dig through the trace files for more detailed information.

Oh by the way, this extremely powerful optimization is patented by the Oracle Corporation, so hands off it ;)


The Test Script

drop table master;
drop table master_test_data;
drop materialized view master_mv1;
drop materialized view master_mv2;

create table master
(
location_cd number not null,
tx_timestamp date not null,
product_cd number not null,
tx_qty number not null,
tx_cst number not null
)
pctfree 0
nologging
/

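-- Log rowids and all five columns, including new values, so that both
-- of the aggregate MVs below can fast refresh on commit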
create materialized view log on master
with rowid
(
location_cd,
tx_timestamp,
product_cd,
tx_qty,
tx_cst
)
including new values
/

create materialized view master_mv1
Using No Index
Refresh Fast On Commit
Enable Query Rewrite
As
Select
location_cd,
trunc(tx_timestamp,'MM') tx_month,
sum(tx_qty) s_tx_qty,
sum(tx_cst) s_tx_cst,
count(*) c_star
from
master
group by
location_cd,
trunc(tx_timestamp,'MM')
/

create materialized view master_mv2
Using No Index
Refresh Fast On Commit
Enable Query Rewrite
As
Select
product_cd,
trunc(tx_timestamp,'MM') tx_month,
sum(tx_qty) sum_tx_qty,
sum(tx_cst) sum_tx_cst,
count(*) c_star
from
master
group by
product_cd,
trunc(tx_timestamp,'MM')
/

create table master_test_data
(
location_cd not null,
tx_timestamp not null,
product_cd not null,
tx_qty not null,
tx_cst not null
)
pctfree 0
nologging
as
select
trunc(dbms_random.value(1,10)),
to_date('01-jan-2005','DD-Mon-YYYY')
+ dbms_random.value(1,31),
trunc(dbms_random.value(1,1000)),
trunc(dbms_random.value(1,10)),
trunc(dbms_random.value(1,100),2)
from
dual
connect by
1=1 and
level <= 100000
/
commit
/
select count(*) from master_test_data
/

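-- Trace the refresh internals: event 10046 at level 8 includes wait events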
alter session set events '10046 trace name context forever, level 8'
/
set timing on

insert /*+ append */ into master select * from master_test_data
where rownum <= 50000
/
commit
/
insert /*+ append */ into master select * from master_test_data
/
commit
/
exit

Wednesday, September 14, 2005

From the "I'm no expert, but..." file

* Ahem *

It would take a blind man, or one with work or hobbies to get on with instead, to avoid noticing all the talk about the legal liability that folks might incur as the "owner"/"publisher" of a blog or forum. Serious stuff really, when some people say that just by allowing comments on your blog you are potentially opening yourself up to liability for actionable defamation and such things.

My policy has been to allow comments by anyone, and so far the only ones I've removed have been spam. I must have a well-behaved readership, or something. :D

Anyhoo, here's a resource that gives some pretty good news on the subject: the Electronic Frontier Foundation's Legal Guide For Bloggers. The best news is that even if you don't have a blog it's an interesting read all the same. For example, you can learn why a person included on a list of "Top Ten Dumbasses" lost their case for libel. More seriously, the page on Section 230 is an essential read.

Like I say though, I'm no expert. But these people certainly seem to know what they're talking about.

Thursday, September 08, 2005

Webpage Guru

I don't want to speak too soon, but I seem to have managed to alter the blog template to make the text area wider, through a technical process of mucking about with likely-looking parameters (in many ways similar to network administration). Best of all, nothing seems to be broken.

If you have problems viewing this then let me know and I'll re-tweak it.

Next challenge -- alter the background colour to an eye-watering bath sponge yellow.

Wednesday, September 07, 2005

Optimizing Materialized Views Part I: Principles For Analysis

Some Background

For a variety of reasons that are too painful to relate, query rewrite against some largish fact tables has been unsuccessful in the past, and manual builds of summary tables have been combined with the aggregate awareness of Business Objects to produce the same effect.

The major drawback to this was the extraordinary number of hoops that had to be jumped through to make a change. The creation of a new summary table required coordinated efforts by a database designer, a DBA, a Business Objects designer and an ETL designer. As we all know the difficulty of coordinating such effort is related to something like the 7th power* of the number of people involved, so in other words by the time you have four people involved the job will take forever to get done.

In other other-words, new summary tables were never added.

However now that we have an apparently robust and automatic method for maintaining the integrity of the materialized view refresh process and the resultant query rewrite ability, we find ourselves with a different choke-point on the creation of new materialized view-based summaries: the duration of the daily fast refresh process that accompanies the arrival of new fact table data. So here is a thought experiment that I'm going through at the moment.

"What techniques do we have for optimizing the refresh of materialized views?"

For my own selfish reasons I'm thinking about conventional fast refresh of materialized views based on a single table. Any resemblance between this scenario and others is purely coincidental, but may exist all the same.

The Scenario

So let us examine a scenario where we have multiple fast-refreshable materialized views based on a single table. Personally when thinking about fast-refreshable MV's I prefer to consider that they are based on a materialized view log, not the table itself. It is fairly trivial to demonstrate that a fast refresh of a materialized view is based entirely on the MV log, without requiring access to the master table -- I did it for Oracle Support once by creating a fast refresh-on-demand MV and taking the tablespace with the master table offline before successfully refreshing the MV.
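
A hedged reconstruction of that demonstration, with illustrative names (it assumes an on-demand MV called MASTER_MV, and that the MV log and the MV itself live in tablespaces other than the master table's):

alter tablespace master_data offline;

begin
  -- 'F' requests a fast refresh; it succeeds because only the MV log is read
  dbms_mview.refresh('MASTER_MV', method => 'F');
end;
/

alter tablespace master_data online;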

The MV log table serves as a change-capture mechanism for its master table. As for the materialized views themselves: some of them represent conventional summaries of the fact table. Some of them represent lists of distinct values of particular attribute columns in the table (such as Date_Of_Day, for example) in order to support fast retrieval of such values for the user interface. Some of these materialized views could logically be based on others; for example, a list of Date_Of_Day values could be based on a summary table that stored metrics at a dimensionality of (Date_Of_Day, Location_Cd, Product_Cd).

I suppose that we had also best define what we mean by "minimizing work". Again for my own selfish reasons I'm talking here about minimizing disk reads and writes, and about making those that occur as efficient as possible.

The issue here is in the multiple scans of a materialized view log that are required in order to support the maintenance of materialized views. If you have seven materialized views based directly on a fact table then I don't see that Oracle can avoid seven consecutive scans of the log in order to maintain them.

Reducing The Workload

In approaching this problem I'm dividing the approaches up into a number of categories.
  1. More efficient modifications to the materialized view log.
  2. More efficient reads of the materialized view log.
  3. Fewer reads of the materialized view log.
  4. More efficient update of the materialized views.
  5. Fewer materialized views.
  6. Erm ...
  7. That's it.
Now it seems to me that any one of these categories, with the possible exception of the last two, holds promise for more efficient MV maintenance -- and when they are combined we ought to be able to do something worthwhile.

Methodology


I think that I can say without any fear of contradiction that if you want to make a process more efficient then you have to start by fully understanding it, and to my knowledge there are few tools more powerful for rummaging in the internals of an obscure Oracle process than a SQL trace.

So here's the plan.
  1. Set up a representative scenario involving conventional fast refresh of multiple materialized views against a single master table, in the form of a SQL script.
  2. Execute the script with SQL tracing turned on.
  3. Extract from the trace file any statement that touches the materialized view log or the materialized views.
  4. Look for methodologies for optimizing each statement.
  5. Look for opportunities for reducing the number of statements executed.
  6. Measure.
  7. Change.
  8. Re-execute.
  9. Re-measure.
  10. Compare benefits.
  11. Et cetera.
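
For step 3, something along these lines should do the job (file names are illustrative):

-- In the test session, before running the scenario script:
alter session set events '10046 trace name context forever, level 8';

-- Then at the OS prompt, summarize the trace sorted by elapsed time; the
-- statements against the MLOG$ table and the MVs stand out near the top:
-- tkprof orcl_ora_12345.trc refresh.prf sys=no sort=exeela,fchela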
Well, as I mentioned previously these blogs have a habit of growing out of control, and personally I have the attention span of a goldfish, so I'm going to split this up into a number of blogs. Hopefully this will also give anyone who spots an error or bad assumption the chance to leap in before I plow on down the Path of Wrongness.

So in the next blog I'll set up the representative scenario, run a SQL trace on it, and carve out the most tender parts of the internals for our later consumption.

continued here ...


* This rule is based on a mixed bag of Democrat, Republican, and Anarcho-syndicalist technicians with varying number of children, wives, previous work commitments and deadlines. Your mileage may vary.

Tuesday, September 06, 2005

An Apology (no, not from me)

Just dabbling over at the website of TriCare, and came across what I thought was an extraordinary statement.

The registration functionality will continue to be available until further notice. We will provide an update as soon as a new date is established for disabling the registration functionality. We apologize for any inconvenience this may cause.

Um ... they appear to be apologising for their continued inability to prevent online registration. Shurely shome mishtake?

What I'm Thinking About At The Moment

... is how to improve the performance of multiple materialized view fast refreshes based on a single master table.

Hmmm.

One of the values of this blogging thing is that you start with a problem, and then develop a plan to tackle it a certain way, and then you snap your fingers and think, "Best blog this right away!". Then as you write your thoughts down you realise, "What if someone suggests .... or what if they ask ... best write something about that option as well then!"

Hence a brief note about your plan of action could turn into a near-total examination of the subject, some of which is not relevant to your situation anyway. In some cases you never see an end in sight -- in order to do justice to the topic (even one as narrowly focused as my current concern) in any rigorous sense you can see a twenty thousand word task ahead of you complete with scripts to justify every assumption and display every facet of the argument. This turns out to be so overwhelming that the blog collapses half-built under the pressure of actually having to do something in a hurry.

It's a hard life, eh? Well, I'm picking the bones out of my materialized view refresh issue before it gets turned into a problem, so hopefully I'll get to do justice to the topic.

Thursday, September 01, 2005

Trivia for the day: the History Of The Ampersand

http://www.artlebedev.com/mandership/112/