.comment-link {margin-left:.6em;}

Oracle Sponge -- Now Moved To Wordpress

Please use http://oraclesponge.wordpress.com

Monday, August 22, 2005

Execution Plans for "Partition Not-Quite-Pruning"

This is a followup to this blog, which itself was a followup to this one.

Table Structure

The table here is called FCT_FSA_MSD_GSD and is composite partitioned on file_load_month and fund_cd. There are about 60 partitions on file_load_month and three subpartitions per partition, of which one holds only fund_cd values of '6C'. A fiscal month is not directly represented in the table, and is defined with a range of values for the date_of_day column.

Indexes
xie06fct_fsa_msd_gsd: single column bitmap index on FILE_LOAD_MONTH
xie15fct_fsa_msd_gsd: single column bitmap index on DATE_OF_DAY
xie28fct_fsa_msd_gsd: single column bitmap index on FUND_CD

Explain Plans
I've edited the plans just to take out the parallelism references that were mucking up the format, and joined lines together where they broke.

Here's the old form of the query.

SQL> select count(*), sum(fiscal_amt) from fct_fsa_msd_gsd
2 where date_of_day between '01-jan-2004' and '31-jan-2004' and
3 fund_cd = '6C'
4 /

COUNT(*) SUM(FISCAL_AMT)
---------- ---------------
6469727 3521311527

Elapsed: 00:00:41.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=22964 Card=1 Bytes=16)
1 0 SORT (AGGREGATE)
2 1 SORT* (AGGREGATE)
3 2 PARTITION RANGE* (ALL)
4 3 TABLE ACCESS* (BY LOCAL INDEX ROWID) OF 'FCT_FSA_MSD_GSD' (Cost=22964 Card=940684 Bytes=15050944)
5 4 BITMAP CONVERSION* (TO ROWIDS)
6 5 BITMAP AND*
7 6 BITMAP MERGE*
8 7 BITMAP INDEX* (RANGE SCAN) OF 'XIE15FCT_FSA_MSD_GSD'
9 6 BITMAP INDEX* (SINGLE VALUE) OF 'XIE28FCT_FSA_MSD_GSD'


Statistics
----------------------------------------------------------
20 recursive calls
3 db block gets
36489 consistent gets
36360 physical reads
1016 redo size
258 bytes sent via SQL*Net to client
274 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
70 sorts (memory)
0 sorts (disk)

1 rows processed

Commentary on above: A bitmap merge combines
the fund code and date of day predicates,
then every partition of the table is accessed by
rowid.

Here's the first part of the improved query

SQL> select count(*), sum(fiscal_amt) from fct_fsa_msd_gsd
2 where date_of_day between '01-jan-2004' and '31-jan-2004' and
3 fund_cd = '6C' and
4 file_load_month = '01-jan-2004'
5 /

COUNT(*) SUM(FISCAL_AMT)
---------- ---------------
6469717 3521307863

Elapsed: 00:00:12.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=197 Card=1 Bytes=25)
1 0 SORT (AGGREGATE)
2 1 SORT* (AGGREGATE)
3 2 TABLE ACCESS* (FULL) OF 'FCT_FSA_MSD_GSD' (Cost=197 Card=3375101 Bytes=84377525)

Statistics
----------------------------------------------------------
149 recursive calls
3 db block gets
37043 consistent gets
36651 physical reads
936 redo size
244 bytes sent via SQL*Net to client
275 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
1 rows processed

Commentary on above: This is 9.2.0.6 and the explain plan is not
showing that partition pruning has taken place.
However from the consistent gets and physical reads is evidently has.
Looks like the subpartition stats are off as well. Tut.


Here's the second part of the improved query

SQL> select count(*),sum(fiscal_amt) from fct_fsa_msd_gsd
2 where date_of_day between '01-jan-2004' and '31-jan-2004' and
3 fund_cd = '6C' and
4 file_load_month != '01-jan-2004'
5 /

COUNT(*) SUM(FISCAL_AMT)
---------- ---------------
10 3663.2

Elapsed: 00:00:04.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=22432 Card=1 Bytes=24)
1 0 SORT (AGGREGATE)
2 1 SORT* (AGGREGATE)
3 2 PARTITION RANGE* (ALL)
4 3 TABLE ACCESS* (BY LOCAL INDEX ROWID) OF 'FCT_FSA_MSD_GSD' (Cost=22432 Card=919682 Bytes=22072368)
5 4 BITMAP CONVERSION* (TO ROWIDS)
6 5 BITMAP AND*
7 6 BITMAP MINUS*
8 7 BITMAP MERGE*
9 8 BITMAP INDEX* (RANGE SCAN) OF 'XIE15FCT_FSA_MSD_GSD'
10 7 BITMAP INDEX* (SINGLE VALUE) OF 'XIE06FCT_FSA_MSD_GSD'
11 6 BITMAP INDEX* (SINGLE VALUE) OF 'XIE28FCT_FSA_MSD_GSD'

Statistics
----------------------------------------------------------
20 recursive calls
3 db block gets
725 consistent gets
591 physical reads
1008 redo size
253 bytes sent via SQL*Net to client
275 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
70 sorts (memory)
0 sorts (disk)
1 rows processed

Commentary on above. Bitmap indexes on file_load_month is minus'd
from that on date_of_day, then the result is merged with that on
fund cd, and the entire table is accessed by rowid.


Here's the combined parts of the improved query

SQL> select sum(rc),sum(fiscal_amt) from
2 (
3 select count(*) rc, sum(fiscal_amt) fiscal_amt from fct_fsa_msd_gsd
4 where date_of_day between '01-jan-2004' and '31-jan-2004' and
5 fund_cd = '6C' and
6 file_load_month = '01-jan-2004'
7 union all
8 select count(*),sum(fiscal_amt) from fct_fsa_msd_gsd
9 where date_of_day between '01-jan-2004' and '31-jan-2004' and
10 fund_cd = '6C' and
11 file_load_month != '01-jan-2004'
12 )
13 /

SUM(RC) SUM(FISCAL_AMT)
---------- ---------------
6469727 3521311527

Elapsed: 00:00:37.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=22629 Card=1 Bytes=26)
1 0 SORT (AGGREGATE)
2 1 VIEW (Cost=22629 Card=2 Bytes=52)
3 2 UNION-ALL
4 3 SORT (AGGREGATE)
5 4 SORT* (AGGREGATE)
6 5 TABLE ACCESS* (FULL) OF 'FCT_FSA_MSD_GSD' (Cost=197 Card=3375101 Bytes=84377525)
7 3 SORT (AGGREGATE)
8 7 SORT* (AGGREGATE)
9 8 PARTITION RANGE* (ALL)
10 9 TABLE ACCESS* (BY LOCAL INDEX ROWID) OF 'FCT_FSA_MSD_GSD' (Cost=22432 Card=919682 Bytes=22072368)
11 10 BITMAP CONVERSION* (TO ROWIDS)
12 11 BITMAP AND*
13 12 BITMAP MINUS*
14 13 BITMAP MERGE*
15 14 BITMAP INDEX* (RANGE SCAN) OF 'XIE15FCT_FSA_MSD_GSD'
16 13 BITMAP INDEX* (SINGLE VALUE) OF 'XIE06FCT_FSA_MSD_GSD'
17 12 BITMAP INDEX* (SINGLE VALUE) OF 'XIE28FCT_FSA_MSD_GSD'

Commentary on above: Here we go, all working nicely together. the timing
for this combined query is high, but this is a working production
machine with a bunch of other stuff going on.

Other Comments
My guesstimate on the row proportions was off. In fact there were 6.5 million rows involved in the result set, and only 10 of them were not in the Jan 2005 partition (so that's 0.000154%).

19 Comments:

At 2:23 PM, Anonymous Anonymous said...

I don't see that big difference in performance as you mentioned ("The query time dropped from around 56 seconds to 17 seconds.")

 
At 2:45 PM, Blogger David Aldridge said...

As I say, this is a production box with production things happening on it. On a quiet Friday afternoon I got the 56 -> 17 second drop. Now if you look at the individual timings of the two components of the improved query then you see 12sec and 4sec, which sum to be pretty close to the 17 I got last week. Last week I got 56sec for the original query and now I get 41sec. When the two components of the new query were combined the query ran in 30+ seconds, but is not inconsistent.

Don't forget that this isn't like the conventional OLTP system with a gentle hum of continual activity as 10's/100's/1000's of users submit many small queries. This isa data warehouse/decision support machine that alternates between very low activity levels and steam-coming-out-of-ears on a pretty fast cycle.

The important thing here is that you understand the difference between the old and the new queries, and why the new query is more efficient than the old one.

 
At 6:26 PM, Anonymous Anonymous said...

The difference between the queries as per this example is 3+ seconds for a very unrealistic data distribution. Please compare these queries with the initial queries you posted and also compare the data distribution. To me it seems to be totally different. Now, why you need to create monthly range partitions when 6,469,717 records belongs to one parition and 10 records falls into the other 59 partitions?. Just removing the partition on date should give much better performance.

I am curious about what type of business this is. A business where eighty systems created 6,469,717 transactions in one month and 10 transaction in all the 59 remaining months!.

If the queries where as it were before in the first two postings I wouldn't have asked this question.
The month in which most of transactions happend was Jan 2005. I can guess the business!. It has to be some voodoo corporation in an Indian ocean island. Sunami waves hit many islands during January 2005 and this company must be employing people (or "systems") to count the waves that are more than 10 feet high. It makes sense if the count for Jan 2005 was so high when all the remaining 59 months in a five year period created a count of 10. But that cannot be the case anymore because your new queries are showing Jan 2004.

 
At 8:18 PM, Blogger David Aldridge said...

Go and read the business case again -- I obviously wouldn't be bothering with any of this if there were only 6.5 million records in one partition and only 10 in all the other 59.

* sigh *

Let me state it again for those at the back of the class.

The source systems do not send all of the January records in January. Some of them get sent in other months. In this case we have 6.5 million rows representing January transactions that were received in January. The other ten January records were not received in January. Because the table is partitioned according to when records are received, these ten records are part of a different partition.

Really, I do wish you'd pay more attention.

 
At 3:42 AM, Blogger Pete_S said...

Nice method, Dave!

Tell me, was the original dataload date partitioning scheme a "bad" original choice or a business need at the time? And are you going to bitmap index the load dates (or don't people ask that sort of question?)

 
At 6:51 AM, Blogger David Aldridge said...

Thanks Pete,

Back in the dawn of time when I first started working on this, the fact tables were already partitioned according to an organizational level. This divided the data into roughly five equal data sets and had the odd virtue that it would almost never allow partition pruning or easy data loads. Furthermore the data load was by conventional path insert and there were of course no bitmap indexes.

The data was aggregated into summary tables by rolling up days to months, quarters, and years. Sparsity failure gave a row ratio between the fact table and the highest level summary of about 3:1.

Back in those times the load of about half a million records would take about two days.

In other words, just about everything either sucked or blew, in about equal proportions.

The load process was the big problem though. For some reason when the data volume doubled the load time quadrupled, so I changed the structure to partition by load month and started using partition exchanges and bitmap indexes. The load was from a Windows box to an HP/UX box so we couldn't use direct path (it was a restriction of 8i, I think). Anyway, the new partitioning scheme took the load time down to a few hours and was scalable even as load volumes increased.

What prompted this change was increasing data volumes and high server load, but mostly it's a change to the loading schedule from weekly to daily. The partition-by-load-month scheme had to be changed, and partition-by-load-day didn't appeal at all as we weren't going to get partition pruning on it. The ability to perform direct path inserts directly from Informatica on Windows to Oracle on HP/UX also helps.

So, the old method was just aged out of usefulness by changes in the load cycle and the data volume and the server load, I guess.

 
At 7:47 AM, Blogger Pete_S said...

Partition by day is not something to be afraid of now - I have one system with 2.25 years of daily partitions (partition key is transaction date) we load by direct path - late arriving data is appended to older partitions.

We also get the benefit that we can roll dates up to calendar months as well as fiscal periods (4 weeks aligned on Saturdays)

 
At 8:43 AM, Blogger David Aldridge said...

Yes, I agree. There's nothing wrong with partition-by-day per se, but we wouldn't be getting partition pruning if we partitioned by day-of-load so that'd give us hundreds of partitions to touch. We could partition by date of transaction but it's pretty rare for our users to want reports by day -- they mostly report at the monthly level.

That last point is a key issue, I think. Users want to say "Fiscal Month = 2005 M10", and with a clause like that I don't believe that the optimizer will partition prune on a day level ... well maybe I'm wrong about that, maybe a combination of constraints and Dimensions will allow it but I don't think so.

Oh, since we're doing composite multicolumn range/list partitioning (fund_cd,fiscal_month_cd)/(general ledger account) the number of subpartitions would also be truly awe-inspiring with daily partitioning. Something like 3,000 per year, and that would make even me nervous!

 
At 8:56 AM, Blogger Pete_S said...

Because our day partitions are around 1/30 the size we got rid of sub-partitions - ours used to be hash subpartition on product key; they offered no real benefit to user queries as most users asked for multiple products and hit most of the sub partitions!

When I around to it I'll have a look at some query plans to see if we prune on month based queries, I feel that we do, but we have dimensions, constraints etc comming out of our ears!

 
At 9:07 AM, Blogger David Aldridge said...

I never managed to measure any benefit in hash subpartitions myself, even on joining to a similarly hash partitioned large dimension table. Maybe the data volume is just not large enough for the benefits to emerge.

The list subpartitioning looks like a winner for us because we have user groups who are only interested in particular lists of general ledger accounts, and in some cases the data volumes that they're interested turn out to be just a few percent of the total. Subpartition pruning then makes even a select of all of the historical data for those GL's perform very well even when there is no predicate on the time dimension.

I'd be interested to hear if you get that pruning with month-level predictaes. I might have missed something in my testing.

 
At 1:47 PM, Anonymous Janu P said...

David,

Thanks for posting all these details. Yes, this is a cool idea. I can see a lot of practical applications of this method. I always have issues with meeting the requirements of users who want access data by transaction date and those who want to query the same data by post date!. (I am working for a credit card company)
This method seems to be good in dealing with those issues.

 
At 6:39 PM, Blogger David Aldridge said...

Hmmm, i wonder whether it would be worth exploring composte range-list partitioning for that. If there were only relatively few transaction dates per post date then if you had range partitioning on the post date (for example) with list subpartitioning on the transaction date.

Depends how your tableis partitioned, of course. If you had daily partitions on the post date (which intuitively makes m=some sense to me because that's probably how you load the data) then I wonder whether the partition for post date '15-jan-2005' might have list subpartitions for '15-jan-2005','14-jan-2005','13-jan-2005','12-jan-2005','11-jan-2005', and default?

It'd mean a different set of subpartitions per partition, but maybe it would be feasible. it'd certainly give the optimizer more choices.

 
At 1:51 AM, Blogger Pete_S said...

I had a go with a month selected from a day level partitioned table - and yes we do get pruning!

See here
for explain plans etc

 
At 12:53 PM, Anonymous Janu P said...

David,

Using range list partition is a good suggestion. We are using range partition on post date(monthly) and list sub-partition on transaction date in a few cases and it works fine. The user queries are not allways based on the post date or transaction date. In many cases the quereis would be based on currency codes or merchant codes. So I thought of doing a monthly range partition on post date (which will be also be the load date) and a sub-partition on currency code or merchant code (or some other field) if I could get the transaction date post date issue fixed without creating two partitions on date. I agree partitioning by date is not a big issue. But we don't create daily paritions for etl purpose. It is generally monthly partitions and we copy the current partition to a temp table and load the current days data and do a partition exchange.

 
At 1:52 AM, Blogger Pete_S said...

Janu
In my opinion, there is an on-going maintenance problem with list sub-partitions for a dynamic partition key such as merchant number: as new merchants come on board the DBA team need to maintain the partitioning scheme to ensure the new merchant transactions are stored in the appropriate sub-partitions. This is probably less of a problem with currency as this list is not likely to be so open ended. But with currency you may need to look out for skew in sub-partition size, I guess that most transactions will be in the local currency, a significant number in one or two major international currencies, then decreasing numbers of transactions in other less popular currencies, This skew could affect query optimisation

 
At 7:21 AM, Blogger David Aldridge said...

I would imagine that something could be done with the ETL process to handle new values for these subpartitioning codes. One reasonably efficient (though a little "high maintenance" way of doing this would be to provide a default value subpartition in both the composite range-list subpartitioned table and the list partitioned table with which you will swap it. Then after loading to the exchange table you check for the existence of rows in the default list partition. If there are none then the ETL goes on as before. If there are any you copy the rows out, scan them and create the required partitions and subpartitions based on (eg.) the merchant code, then reinsert them. A solid naming convention for the partitions and subpartitions would be practically essential.

For the query optimization I wonder whether there is a problem there also. Just working through the combinations of whether partition and subpartition pruning occur or not, it seems that you don't lose anything much by having the subpartitions, unless there is specifically a problem with having different lists in different partitions. If there is, then I wonder whether it would then be helpful for the ETL process to create list subpartitions for new merchant codes back in every historical partition regardless of that values existence at that time.

On the other hand, I've only been awake for about 30 mins this morning, so this isn't what you'd call a fully thought-out theory.

What do you think, Pete?

 
At 10:54 AM, Anonymous Janu P said...

Pete,

You are correct. The currency sub-partition will not help much.
Thanks.
What I meant by merchant code was MCC. http://www.usda.gov/procurement/card/card_x/mcc.pdf

 
At 1:56 AM, Blogger Pete_S said...

Janu - Ah, I see: your list of merchants could work, the data may skew a bit so watch out when global stats get used

David - interesting. I have not done much with range/list composite partitioning (and never with variable numbers of sub partitions) Perhaps an experiment is called for

 
At 7:15 AM, Blogger David Aldridge said...

Ah, experiments are always good. Range/List seems to be pretty promising. We've been using single-column-range/list for a while now, and are starting to dabble in multicolumn range/list.

Where you have ...

Partition by range
(col1,col2)
Subpartition By List
(col3)

... you get partition pruning on where you have predicates on (col1) or on (col1 and col2), and subpartition pruning with a predicate on (col3) independently of the partition pruning. So if you have no predicates oncol1 or col2, you still get pruning at the subpartition level with a predicate on col3.

Having said that, I have no time to follow my natural inclination to put together a demo script, which won't please the other scriptologists ... so maybe I'm all wrong there :D

 

Post a Comment

Links to this post:

Create a Link

<< Home