The Oracle Sponge Blog -- An Oracle Data Warehousing Resource

Friday, April 22, 2005

A DBMS_STATS Oddity

As part of our load procedure we have various packaged Oracle procedures called by Informatica, in order to perform such tasks as:

Analyzing objects
Moving tables
Rebuilding indexes

When a patch from 9.2.0.5 to 9.2.0.6 was installed, something deep within the bowels of the package went *twang* and the jobs that run the analyze statements started to fail. Or in other words, stopped to work. Or ... never mind. It broke.

The problem was a Method_Opt clause of "FOR TABLE". What that? Well, it's not in the documentation, but in 9.2.0.5 and 10.1.0.2 it appears to be equivalent to "FOR ALL COLUMNS SIZE AUTO". I'm not sure what we're doing using it, but I'm sure that it seemed like a good idea at the time. Anyway, it's ouda here, and while I'm sniffing around the code I'm going to be running some 10046 traces to see what's really going on with DBMS_STATS and some of it's options.

For one thing I want to get rid of collecting histograms for SKEWONLY columns, and just go with all of them. Small-scale tests suggest that the optimizer gets much better statistics to work with when all indexed columns on the fact table are analyzed, and I suspect that the amount of work involved in detecting column value skew is not much different that that involved in collecting the statistics themselves.

Another issue is parallelism. Since a parallelised full table scan (for reasonably large tables anyway) uses direct i/o, there is no caching of the table's blocks. Fine for tables large enough to flood the block buffers, because physical reads would just keep occuring anyway, but where a table is around three-quarters the size of the block buffers I'm going to experiment with non-parallel statistics gathering, and hopefully this will reduce physical i/o during "the gathering".

Actually I've been doing a smiliar thing already, by setting parallelism on tables based on what proportion of the block buffers their total size, or in some cases their average partition size, represents. The tweaking of the DBMS_STATS parallelism just represents a modification of that principle to reflect the multiple back-to-back reads of the analyzed segment. Maybe that's another posting though.

Anyhoo, more later.

4 Comments:

At 8:27 AM, Niall said...: Hi David

If you want to collect histograms on all columns that are indexed then the method_opt you want is

'FOR ALL INDEXED COLUMNS SIZE n'

If you are going to do this I see no point in setting n to anything other than 254 (the max), as if Oracle needs less than 254 buckets for all the distinct values, it will only collect the number of buckets it needs. i.e. think of the size as specifying the maximum number of buckets to use.

There is a pretty strong argument against this though, that I think Wolfgang Breitling has best articulated in the past, which goes something like this.

* Not all indexed columns need histograms
* Some unindexed columns need histograms - so that cardinality estimates are right

Cheers

Niall
At 8:54 AM, David Aldridge said...: Niall,

Yup, I'm switching over to that 'FOR ALL INDEXED COLUMNS SIZE 254'. I disagree with the argument against it, though.

Firstly, it's a pretty easy demonstration that estimates of cardinality improve even with completely evenly distributed data values.

Secondly, and I'll grant that this applies only to a subset of applications including my own, we have indexed every column on which a predicate can be placed -- just about the only unindexed columns are metrics (which in our case are rarely predicated on) and metadata columns (such as those that tell us what version of the ETL procedures loaded that row of data).

I'll post a demo in another ... um ... post.

Thanks Niall.
At 9:05 AM, Pete Scott said...: Finding the right stats for DWH performance balanced against time to collect is hard. And testing what you have is right is tricky. I am extremely unconvinced that index stats on bitmap indexes are worth a great deal compared with the cost of collecting. Likewise, how useful are stats on column skew if you have bitmap indexes? Ideas please?

If you play with alternative stat gathering you can always save the original stats away with the DBMS_STATS package try your new approach and swap back quickly when the users complain ;)
At 9:20 AM, David Aldridge said...: Yes, nice observations Pete.

My instinct is that stats on bitmap indexes are worthwhile, but given that they have a very low cost of access (I'm guessing that that is what you were basing this on) you may well be right there -- that they are much less significant than on b-tree indexes.

The phrase "clustering factor!" just jumped into my head then. Do I recall correctly that clustering factor on bitmap indexes is, or has been in some version of Oracle, a problem?

Oracle Sponge -- Now Moved To Wordpress

Friday, April 22, 2005

A DBMS_STATS Oddity

4 Comments:

About Me

Previous Posts