(Updated below)Here's a follow up to the
previous post on this subject, which ought to be read first. This is just to give more background on why I think this technique works.
We have a scenario where we want to read all of the rows where fiscal_month is Jan-2005 (around one million rows) from a fact table of sixty partitions (based on file_load_month). 99.5% of these rows (995,000 rows) are contained in a single partition (file_load_month=Jan-2005) which itself holds around one million rows. The other 5,000 rows are scattered across other partitions of the table. In fact they will probably be found in around ten-twenty of the remaining fifty-nine partitions.
If you just query for rows where "fiscal_month = Jan-2005" then no partition pruning is possible (remember that the table is partitioned on file_load_month) and therefore the optimizer has two choices on how to execute this
- Access by index -- find several million rows by reading a bitmap index on fiscal_month and then reading the relevant blocks through single block io.
- Access by full table scan -- read all sixty partitions of the table in multiblock io.
Neither of these are satisfactory, so a different query was employed where two queries were UNION ALL'd together.
- Selects all of the required rows from the file_load_month Jan-2005 partition
- Select all of the required rows from all of the other partitions.
The first query uses partition pruning to read 995,000 rows of the one million rows in the file_load_month Jan-2005 partition. Multiblock io is used.
The second query accesses the table through a BITMAP MINUS operation on two bitmap indexes (on fiscal_month minus file_load_month) to read around 5,000 rows that are probably scattered throughout the remaining fifty-nine partitions using single block io.
Now if you consider how many blocks get read for the old technique and the new technique, then the two methods are practically the same. In fact the new method probably reads slightly more blocks than the old method. The difference between them comes from the ability of the new methodology to read nearly all of the required table blocks in multiblock mode.
It is interesting to note that the optimizer actually has all of the information that it needs in order to perform this optimization internally. Partition-level histograms on the fiscal_month show that nearly all of the required rows are contained in a single partition, and almost none of them are contained in any other.
I'm going to get some explain plans extracted that are more instructive than those I currently have, and I'll get back to you with them.
UpdateJonathan Lewis has written to point out the similarity between this approach and the practically-defunct feature of "Partition Views" (PV), in which similar tables are UNION ALL'ed together in a view that can be queried to produce an effect similar to pruning in a partitioned table (PT).
He also points out one of the strengths of PV -- that the optimizer could not only decide not to touch a partition (a form of optimization we know as "pruning" of course) but could also generate a different form of execution plan for each partition that the query would touch. This is a benefit of PV that was lost with PT (except for the "do not touch this partition" pruning optimization) and which is resurrected in the technique I'm describing here. Exactly the same principle applies, of breaking up a result set into multiple subsets with UNION ALL to merge them together.
I'm inclined to wonder whether in this particular example an approach of defining a partition view over a partitioned table, in the form of ...
Select fiscal_month, file_load_month, ...
from my_table
where fiscal_month = file_load_month
Union All
Select fiscal_month, file_load_month, ...
from my_table
where fiscal_month != file_load_month
... would be productive. I wonder whether the optimizer transitivity would be robust enough to handle this situation.
Jonathan's article comparing PV and PT is in Microsoft Word format
here -- for the MS-averse Google has a passable HTML conversion
here.