Some Background
For a variety of reasons that are too painful to relate, query rewrite against some largish fact tables has been unsuccessful in the past, and manual builds of summary tables have been combined with the aggregate awareness of Business Objects to produce the same effect.
The major drawback to this was the extraordinary number of hoops that had to be jumped through to make a change. The creation of a new summary table required coordinated efforts by a database designer, a DBA, a Business Objects designer and an ETL designer. As we all know, the difficulty of coordinating such effort is related to something like the 7th power* of the number of people involved, so in other words by the time you have four people involved the job will take forever to get done.
In other other words, new summary tables were never added.
However, now that we have an apparently robust and automatic method for maintaining the integrity of the materialized view refresh process and the resultant query rewrite ability, we find ourselves with a different choke-point on the creation of new materialized view-based summaries: the duration of the daily fast refresh process that accompanies the arrival of new fact table data. So here is a thought experiment that I'm going through at the moment.
"
What techniques do we have for optimizing the refresh of materialized views?"
For my own selfish reasons I'm thinking about conventional fast refresh of materialized views based on a single table. Any resemblance between this scenario and others is purely coincidental, but may exist all the same.
The Scenario
So let us examine a scenario where we have multiple fast-refreshable materialized views based on a single table. Personally, when thinking about fast-refreshable MVs I prefer to consider that they are based on a materialized view log, not on the table itself. It is fairly trivial to demonstrate that a fast refresh of a materialized view is based entirely on the MV log, without requiring access to the master table -- I did it for Oracle Support once by creating a fast-refresh on-demand MV and taking the tablespace containing the master table offline before successfully refreshing the MV.
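For the record, a minimal sketch of that demonstration might look like the following. All names here are invented for illustration, and it assumes the MV log lives in a different tablespace from the master table (otherwise the offline would take the log with it):

```sql
-- Sketch only: assumes SALES_FACT lives in tablespace FACT_DATA, and that
-- MV_SALES_SUMMARY is a fast-refresh on-demand MV with a populated MV log
-- stored elsewhere.
alter tablespace fact_data offline;

-- The fast refresh still succeeds, because it reads only the MV log.
begin
  dbms_mview.refresh(list => 'MV_SALES_SUMMARY', method => 'F');
end;
/

alter tablespace fact_data online;
```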
The MV log table serves as a change-capture mechanism for its master table. In this scenario, some of the materialized views represent conventional summaries of the fact table. Some of them represent lists of distinct values of particular attribute columns in the table (such as Date_Of_Day, for example) in order to support fast retrieval of such values for the user interface. Some of these materialized views could logically be based on others; for example, a list of Date_Of_Day values could be based on a summary table that stored metrics at a dimensionality of (Date_Of_Day, Location_Cd, Product_Cd).
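To make the scenario concrete, here is a hedged sketch of the sort of objects I have in mind; the table and column names are invented, not taken from any real system:

```sql
-- Change-capture log on the fact table: ROWID, SEQUENCE and INCLUDING NEW
-- VALUES are needed for fast refresh of aggregate MVs under general DML.
create materialized view log on sales_fact
  with rowid, sequence (date_of_day, location_cd, product_cd, sales_amt)
  including new values;

-- A conventional summary at (Date_Of_Day, Location_Cd, Product_Cd).
-- COUNT(*) and COUNT(sales_amt) accompany SUM(sales_amt) so that the MV
-- remains fast-refreshable in the presence of deletes and updates.
create materialized view mv_sales_summary
  refresh fast on demand
  as
  select date_of_day,
         location_cd,
         product_cd,
         sum(sales_amt)   sum_sales_amt,
         count(sales_amt) cnt_sales_amt,
         count(*)         cnt_rows
  from   sales_fact
  group by date_of_day, location_cd, product_cd;

-- A distinct-values MV to feed the user interface's list of dates.
create materialized view mv_date_of_day
  refresh fast on demand
  as
  select date_of_day, count(*) cnt_rows
  from   sales_fact
  group by date_of_day;
```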
I suppose that we had also best define what we mean by "minimizing work". Again for my own selfish reasons, I'm talking here about minimizing disk reads and writes, and making those that do occur as efficient as possible.
The issue here is the multiple scans of a materialized view log that are required in order to support the maintenance of materialized views. If you have seven materialized views based directly on a fact table then I don't see that Oracle can avoid seven consecutive scans of the log in order to maintain them.
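To illustrate, a refresh call along these lines (MV names invented again) drives one fast refresh per MV in the list, and with it, as far as I can see, one pass over the same MLOG$ table per MV:

```sql
-- Seven fast refreshes, one 'F' per MV in the list: seven consecutive
-- scans of MLOG$_SALES_FACT (a sketch; names are illustrative).
begin
  dbms_mview.refresh(
    list   => 'MV_SUM1,MV_SUM2,MV_SUM3,MV_SUM4,MV_SUM5,MV_SUM6,MV_SUM7',
    method => 'FFFFFFF');
end;
/
```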
Reducing The Workload
In approaching this problem I'm dividing the possible approaches into a number of categories:
- More efficient modifications to the materialized view log.
- More efficient reads of the materialized view log.
- Fewer reads of the materialized view log.
- More efficient update of the materialized views.
- Fewer materialized views.
- Erm ...
- That's it.
Now it seems to me that any one of these categories, with the possible exception of the last two, holds promise for more efficient MV maintenance -- and when they are combined we ought to be able to do something worthwhile.
Methodology
I think that I can say without any fear of contradiction that if you want to make a process more efficient then you have to start by fully understanding it, and to my knowledge there are few tools more powerful for rummaging in the internals of an obscure Oracle process than a SQL trace.
So here's the plan.
- Set up a representative scenario involving conventional fast refresh of multiple materialized views against a single master table, in the form of a SQL script.
- Execute the script with SQL tracing turned on (a sketch of the tracing step follows this list).
- Extract from the trace file any statement that touches the materialized view log or the materialized views.
- Look for methodologies for optimizing each statement.
- Look for opportunities for reducing the number of statements executed.
- Measure.
- Change.
- Re-execute.
- Re-measure.
- Compare benefits.
- Et cetera.
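Here is what I have in mind for the tracing step; the 10046 event at level 8 captures wait events along with the recursive SQL, and the MV name is again just illustrative:

```sql
-- Make the trace file easy to find, then trace the refresh itself.
alter session set tracefile_identifier = 'mv_refresh_test';
alter session set events '10046 trace name context forever, level 8';

begin
  dbms_mview.refresh(list => 'MV_SALES_SUMMARY', method => 'F');
end;
/

alter session set events '10046 trace name context off';
```

Something along the lines of "tkprof <tracefile>.trc refresh.txt sys=no" will then boil the result down to a readable summary, although for this exercise the raw trace file itself is where the statements against the MV log will be found.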
Well, as I mentioned previously, these blogs have a habit of growing out of control, and personally I have the attention span of a goldfish, so I'm going to split this up into a number of blogs. Hopefully this will also give anyone who spots an error or bad assumption the chance to leap in before I plow on down the Path of Wrongness.
So in the next blog I'll set up the representative scenario, run a SQL trace on it, and carve out the most tender parts of the internals for our later consumption.
continued here ...

* This rule is based on a mixed bag of Democrat, Republican, and Anarcho-syndicalist technicians with varying numbers of children, wives, previous work commitments and deadlines. Your mileage may vary.