.comment-link {margin-left:.6em;}

Oracle Sponge -- Now Moved To Wordpress

Please use http://oraclesponge.wordpress.com

Wednesday, April 12, 2006

A Quick Materialized View Performance Note

I was asked in an email this morning about how to improve materialized view refresh performance, and it's something that appears in Google hits for the blog quite frequently. So I thought I'd write a quick-and-dirty posting to summarise some of the research (and the resulting practical applications) that I've been fiddling with recently.

Here are some bullet points:
  1. The materialized view fast refresh mechanism is a one-size-fits-all solution, and is probably not efficient for 99% of summary table maintenance operations.
  2. The join of the aggregated change data to the MV is function-based, as the columns of both relations are wrapped in the Sys_Op_Map_NonNull() function that allows "null = null" joins. I think that it is extremely unlikely that anyone has nullable attribute columns in their fact or summary tables, so this (and the composite function-based index required to support it) are a waste of resources.
  3. Because of the nature of the join it seems to be extremely unlikely that partition pruning of the summary table could take place.
  4. The join mechanism promotes nested loop joins, where a hash join is probably more efficient (that's technically an outer join in the merge, of course).
  5. The refresh mechanism assumes that a merge will be required, when sometimes an insert is not only possible but is very much more efficient.
If performance (and robustness, IMHO) are an issue for you then I would advise that you do the following:
  1. Use materialized views only for enabling query rewrite (which means creating them on a prebuilt summary table, and unless you are loading to the summary by partition exchange then you have to drop the MV, refresh the summary table, and recreate the MV). In 10g it is much more easy to use the DBMS_Advanced_Rewrite package instead of MV's.
  2. Write your own refresh code, based on the usual principles of writing good SQL. If you don't need a merge then don't use it. If you don't need to join to dimension tables to get higher attributes then don't do it.
  3. Leverage different levels of aggregation to help produce higher levels. For a series of hierarchical summaries, multi-level aggregations can be extremely beneficial.
  4. Consider storing the refresh and MV definition SQL in CLOB columns of a summary management table, so they can be tuned and edited without needing to open up package code to do so.
  5. Consider using a complete refresh, either through MV's or manually, for higher aggregation levels, particularly when you can reference another summary table to do so.
My practical experience of taking this path delivered a reduction in the refresh time of a set of seven materialized views based on a single fact table, from a couple of hours down to six minutes. The publication of this is an example of what my client's technical lead calls "polishing the crown" :D

Anyway, this is all a sort-of abstract from a whitepaper that I'm working on right now, which will include all sorts of proofs and demonstrations of the above, plus some process and risk management advice, but it doesn't look like it'll be ready very soon. Feel free to comment or ask questions on the points raised above though -- I don't want anyone to be kept in suspenders 'till the paper is finished.

8 Comments:

At 11:03 AM, Anonymous Anonymous said...

David,

I found a lot of information on materialized views in this site. You have spend a lot of time documenting the details about MVs. This is really helpfull and I really I appreciate that. I have a question about creating the materialized view on prebuilt table option. I noticed you are able to create the materialized view with a different name than the name of the table. Is this a 10g feature?

 
At 11:09 AM, Blogger David Aldridge said...

Can you post a demonstration script for that? If you can do it then I don't think it's documented, as it would require an extra clause in the Create MV DDL to specify which prebuilt table the MV is to be created on.

 
At 11:41 AM, Anonymous Anonymous said...

David,
Thanks for the quick reply. I think I didn't explain the question clearly. If is about an example from

http://oraclesponge.blogspot.com/2005/12/ora-12034-materialized-view-log.html
Please see the script below.
In this example the name of the prebuilt table is m_mv_cstar and the name of the mv is m_mv. I get an error when I tried to run these scripts in oracle 9.2. Is this a typo or is this a 10g feature?

"Doubtless you script hounds will be wanting to see a test of this, huh?

create table m (col1 number);

create materialized view log on mwith rowid (col1)including new values;

create table m_mv_cstar
as
select count(*) c_star
from m;

create materialized view m_mv on prebuilt table refresh fast on commit
as
select count(*) c_starfrom m;

insert into m values (1);
commit;
That gets us a successfully fast refreshing materialized view. Now we break it."

 
At 12:03 PM, Blogger David Aldridge said...

Oh , quite right ... how the dickens did that slip thorugh? It is indeed a typo ... i'll correct it.

Thanks a lot. I'm glad someone's paying attention!

 
At 2:11 PM, Blogger Pete Scott said...

We have 'fast refreshed' a set of mviews (on Oracle 9.2) for a customer the way you suggested for about 2 years now - it works for us and is very fast. On 9.2 we wrote our own rewrite equivalence by not having the mview definition the same as our update code. And yes, we do store a refresh queries in the database. Our code is partition aware and is based on the "truncate partition and insert append" principle - bit of overkill to update1 row but magic to merge a new day into a week level summary
The other saving refresh time is not having to clear down the mview logs after refresh; when we had logs and did partition maintenance it took minutes to run a drop partition, with out the log it was instant.
You are quite right, there are probably no nulls in the dimensions so this function based join is not really needed (and nor is the default mview index!)

There could be a problem (or two problems) with query rewrite if you have nested (mviews built on mviews) mviews for the higher level aggregations. The query is initially rewritten against the "best match" first-level mview and that may not be the parent of the 'best fit' mview - so you are stuck! And when the first re-write happens it then tries to find a better rewrite against the remaining views but fails as the one-to-one join to the denormailised dimension table is no longer there; you can get around this by snowflaking or better still use re-write equivalence.

biqiguy - anyone for a hobnob?

 
At 2:35 PM, Blogger David Aldridge said...

Amen on the nested mv's ... and that reinforces the point of having different refresh code to the mv statement. It seems a lot safer to have all mv's based on the master table, but leverage other mv's as intermediate result sets for reducing resource consumption in the refresh process.

 
At 2:59 PM, Blogger Pete Scott said...

It seems a lot safer to have all mv's based on the master table, but leverage other mv's as intermediate result sets for reducing resource consumption in the refresh process.... and gues what we do!

The only problem we have is that customer insists that we can get fast refresh to work so that they don't have to re-summarise the DW when hierarchies change :(

 
At 3:10 PM, Blogger David Aldridge said...

When the higher attributes in the dimension change you mean, Pete?

 

Post a Comment

<< Home