Optimizing Materialized Views Part I: Principles For Analysis
Some Background
For a variety of reasons that are too painful to relate, query rewrite against some largish fact tables has been unsuccessful in the past, and manual builds of summary tables have been combined with the aggregate awareness of Business Objects to produce the same effect.
The major drawback to this was the extraordinary number of hoops that had to be jumped through to make a change. The creation of a new summary table required coordinated efforts by a database designer, a DBA, a Business Objects designer and an ETL designer. As we all know the difficulty of coordinating such effort is related to something like the 7th power* of the number of people involved, so in other words by the time you have four people involved the job will take forever to get done.
In other other-words, new summary tables were never added.
However now that we have an apparantly robust and automatic method for maintaining the integrity of the materialized view refresh process and the resultant query rewrite ability, we find ourselves with a different choke-point on the creation of new materialized view-based summaries: the duration of the daily fast refresh process that accompanies the arrival of new fact table data. So here is a thought experiment that I'm going through at the moment.
"What techniques do we have for optimizing the refresh of materialized views?"
For my own selfish reasons I'm thinking about conventional fast refresh of materialized views based on a single table. Any resemblance between this scenario and others is purely coincidental, but may exist all the same.
The Scenario
So let us examine a scenario where we have multiple fast-refreshable materialized views based on a single table. Personally when thinking about fast-refreshable MV's I prefer to consider that they are based on a materialized view log, not the table itself. It is fairly trivial to demonstrate that a fast refresh of a materialized view is based entirely on the MV log without the requirement of access to the master table -- I did it for Oracle Support once by creating a fast refresh-on demand MV, and taking the tablespace with the master table offline before successfully refreshing the MV.
The MV log table serves as a change-capture mechanism for its master table Some of them represent conventional summaries of the fact table. Some of them represent lists of distinct values of particular attribute columns in the table (such as Date_Of_Day, for example) in order to support fast retrieval of such values for the user interface. Some of these materialized views could logically be based on others, for example a list of Date_Of_Day values could be based on a summary table that stored metrics at a dimensionality of (Date_Of_Day, Location_Cd, Product_Cd).
I suppose that we had also best define what we mean by "minimizing work". Again for my own selfish reasons I'm talking here about minimizing disk reads and writes, and in making those that occur as efficient as possible.
The issue here is in the multiple scans of a materialized view log that are required in order to support the maintenance of materialized views. If you have seven materialized views based directly on a fact table then I don't see that Oracle can avoid seven consecutive scans of the log in order to maintain them.
Reducing The Workoad
In approaching this problem I'm dividing the approaches up into a number of categories.
- More efficient modifications to the materialized view log.
- More efficient reads of the materialized view log.
- Fewer reads of the materialized view log.
- More efficient update of the materialized views.
- Fewer materialized views.
- Erm ...
- That's it.
Methodology
I think that I can say without any fear of contradiction that if you want to make a process more efficient then you have to start by fully understanding it, and to my knowledge there are few tools more powerful for rummaging in the internals of an obscure Oracle process than a SQL trace.
So here's the plan.
- Set up a representative scenario involving conventional fast refresh of multiple materialized views against a single master table, in the form of a SQL script.
- Execute the script with SQL tracing turned on.
- Extract from the trace file any statement that touches the materialized view log or the materialized views.
- Look for methodologies for optimizing each statement.
- Look for opportunities for reducing the number of statements executed.
- Measure.
- Change.
- Re-execute
- Re-measure.
- Compare benefits
- Et cetera
So in the next blog I'll set up the representative scenario, run a SQL trace on it, and carve out the most tender parts of the internals for our later consumption.
continued here ...
* This rule is based on a mixed bag of Democrat, Republican, and Anarcho-syndicalist technicians with varying number of children, wives, previous work commitments and deadlines. Your mileage may vary.
5 Comments:
Nobody talking to about this, David?
Which version of Oracle will you be using for your tests?
I know that fast refresh of partitioned MVs worked differently in Oracle 9i to 10g. A case of delete in 9 against truncate partition in 10 ! - guess which was slow?
I am looking at how fast refresh works with dimensional changes with a 'report as now' DW model - I'll blog something when I get a round tuit (as my engineer friends say)
I think we're paddling a quiet backwater of the databasing world here Pete.
That's an interesting thing about the fast refresh ... are you refering to PCT fast refresh there?
Forgot the other question ... I think it'll be 10.1 -- my 10.1 database has the Sales History "SH" sample schema on it, and I think I'll dabble with using that.
Because I'm concentrating on conventional fast refresh at the moment there ought to be fewer grounds for inter-version variation, and I'm anticipating that most of the techniques will be transferrable to other versions.
Oh how easy it is to become an expert on things nobody else knows about!
Yes, I was talking about PCT on fact summary MV.
I guess you should look at two types of refresh - the ones where a load of new fact is added (daily batch) and the ones where a dimension key changes for some rows (hierarchy moves)
Well I could look at those, but at the moment I'm just looking at the more simple case of an MV based on a single master (fact) table. That's not to say I can't extend the case later of course or that MV's on multiple masters are a bad thing, but again I'm invoking the "selfish blogger" clause that allows me to just write about what I need to know right know :)
Post a Comment
<< Home