
Oracle Sponge -- Now Moved To Wordpress

Please use http://oraclesponge.wordpress.com

Saturday, April 30, 2005

EXCEPTIONS INTO

Mike Ault ran foul of a misunderstanding over the functionality of the EXCEPTIONS INTO clause of the constraint validation SQL command, and writes about it on his blog (skip the stuff about the generation of the test data -- it's not relevant to this issue).

The documented functionality of this clause is that it inserts into an exceptions table a record of all the rows in a table which prevent a constraint from being validated. The example here is of a unique constraint, but most of the same principles apply to other constraint types.

The problem that Mike has is that he is attempting to use this clause to identify the rows in a table that must be deleted in order to allow a unique constraint to be validated. Fortunately or unfortunately -- and I shall later explain why I believe this to be fortunate -- the clause will log all occurrences of values which are duplicated in the source table -- it will not exclude rows which represent the original occurrence of the value that is duplicated.

So why do I believe this to be fortunate? Because I believe that it is the only rational way that the functionality could possibly work.

I don't think that I'm misrepresenting Mike when I say that his desire that the clause should not log all of the violations is based on the notion that there is an "ordinal value" involved -- that there is a single row of which the others are duplicates. Now this is true in the case of a constraint violation caused by the addition of a duplicate value in a table wherein the constraint is already enabled -- the ordinal row would be the one that has already passed the validation.

This is not the case here though. To the database all of the existing, non-validated rows are equivalent and the code has no insight into which of the multiple rows that contain a duplicated value represents the ordinal value in the eyes of the user. And nor should it do so, because for the code to make an assumption concerning which of the rows was the first to be inserted (in the absence of any application knowledge concerning sequence numbers or timestamps or whatever) would represent a violation of one of Codd's rules -- I forget which one. That would be a pretty serious issue, IMHO.

Now suppose that the EXCEPTIONS INTO clause did work as Mike desired. Then I would submit that it would be of almost no use whatsoever, because even if the rationale by which some dupes were not tagged were fully documented, it would still have to agree with your own deduplicating intentions to be of any value to you -- otherwise you'd still have to go and hand-code your own methodology.

I also believe that the faults that Mike believes this functionality has are exaggerated by the case at hand. If you took a one million row table out of which EXCEPTIONS INTO logged 100 as ... erm ... exceptions then that would probably appear to be reasonable -- after all to identify the 99, or 48, or 86 rows out of one hundred that must be deleted from a one million row table to allow the constraint to be validated does not sound unreasonable at all, and some value has been added by use of the clause. In this case though, having run this functionality Mike is literally no closer to resolving the issue than he was beforehand (other than having learned something about the number of rows that are duplicates of another, of course). Even in the 100-out-of-1,000,000 example you still have to make your own choice and write your own code to deduplicate the original data set, so there is no difference in the amount of work to be done, just in the perceived value of the EXCEPTIONS INTO clause in the two cases.

There is a point on which I do agree, however, where Mike quotes the documentation ... “All rows that violate a constraint must be either updated or deleted from the table containing the constraint.” and adds his own comment "Which is incorrect". Well, it is actually only partially incorrect. It is correct for what I would term "intra-row" constraints such as NOT NULL, check, or foreign keys which are based on the values of a single row in the table. For "inter-row" constraints such as unique and primary keys it is not correct, and following as it does an example of an intra-row violation it is poorly phrased. It would probably behoove Mike to submit this as a documentation bug.

Other than that issue, which I'd suggest is a minor one, the documentation of the clause seems to accurately reflect its functionality, and its functionality seems to be in line with the only possible way in which it could make sense.

Comments welcome, as always.

Edit: TK has his own thoughts in the comments on Mike's blog and also on his own blog here.
Edit: Dumb spelling errors corrected -- thanks PN

Friday, April 29, 2005

Gloves Off at AskTom: KO Delivered

I thought that I'd drop a challenge on TK's doorstep to see if I could distract him further from his authoring responsibilities, and a question leapt unbidden into my mind: How can parallel query ensure consistency when it can't detect dirty blocks in the buffer? Good 'un, eh? See, parallel query reads the data files directly, so how does it know whether there are any dirty blocks in the buffer or not?

Not only did this strike me as a very deep and weighty point directly relevant to some issues I've been grappling with recently, but I couldn't think of a way of working out what the mechanism might be either.

Well these technical issues are often like good cryptic crossword questions: you can't work out the answer for the life of you but once you hear the answer it all sounds very obvious.

Read the thread starting somewhere around here.

I'll be back, Mr Kyte ... oh yeah, I'll be back ...

Oak Tables

I just noticed that on the back of my copy of "Oracle Insights: Tales of the Oak Table" is the instruction "Shelve in Oracle". To avoid it being shelved along with "Woodworking For Dummies" I suppose.

Some Thoughts on Oracle's Data Dictionary

... prompted by a slightly unrelated question, but leading in a nice direction I thought.

Edit: Oh, if anyone was confused then the link was reached by clicking on the post title. But here it is anyway http://dba.ipbhost.com/index.php?showtopic=1604

Thursday, April 28, 2005

Writing Good SQL

This is prompted by a question asked by eastmael at the Oracle DBA Forum -- "what I'd like to accomplish are optimized SQL statements ... do you guys have any tips, links, or resources that could help me with my study of efficient/optimized SQL statements?" The topic deserves a little more than the brief forum answer, so here goes for a more considered approach.

This is a fundamental question that ought to be asked more often, and it's to eastmael's credit that he (just guessin' there) is asking now when just starting out on the topic. We see an enormous number of forum postings that ask, "Why is this SQL performing badly?", "How do I get this to ... ?", "Is there a faster way of ...?" and very few that address the basic educational issues. So instead of handing out fish I'll see if I can teach/suggest how to catch your own.

I think that there are at least five elements to consider. In no particular order:
  • Understanding what you are trying to do
  • Knowledge of SQL syntax
  • Appreciation of the data and the structure
  • Optimization
  • Measurement of performance
Which is to say that if you are going to write top-notch efficient SQL then you must do the following:
  1. Understand what question you are asking of the database
  2. Know how to write multiple SQL statements to ask that question
  3. Know about the data and the structures being queried
  4. Understand how the optimizer works with different statements in the context of the data and structures present
  5. Know how to read and measure how the statement is optimized and executed.
Piece of cake, once it's broken down like that.


1: Understand what question you are asking of the database

Most SQL statements in an application serve some aim that can be expressed in business terms. Or in other words, what question does the SQL purport to answer? "Get a list of all customers who were not billed last month" ... "Work out what the total amount is to be billed"... "Does this customer have multiple shipping addresses?" ... "Fire all the Capricorns".

Phrases like "Work out ...", "Get ...", "Change ..." sound to some people distressingly informal and unscientific, but I like 'em. They help to keep our minds in the real world, in which we are solving real business problems. It is unlikely that your code is going to be submitted to the Nobel Prize committee, distressing and unfair as that sounds, so imagine that someone has asked you "So, what does this bit of code do then?" and just write down the answer. And when I say "write down the answer" I mean "write down the answer in a comment block immediately above where you are going to write the code".

So aside from reminding us what the code is going to do, what else does it help with?

After you've gained some experience working with the data you'll find that the statement will give you some instinctive performance clues. "Let me see now, 'Get the employee's length of service'. Two seconds? What the... ?!? Two seconds is way too long for that!" Do everyone a favour and keep that monologue internal, though, thanks. I might be sitting at the next cube to you one day, and I wouldn't want to have to smack you or nuttin'.

Also at some point in the future the application or the data may (actually I mean "will") change. Maybe the performance will suddenly go pear-shaped and people will be crawling over the code trying to find out what the problem is. If you followed the philosophy on commenting the code that I expressed here then this statement is going to help future maintainers of the system to keep your code efficient and correct.


2: Know how to write multiple SQL statements to ask that question

Of all of the suggestions that I'm making here, the one that probably requires the least amount of hard work is the knowledge of the SQL syntax, because we can start by just reading the Oracle SQL Reference (see link in side-panel). There are a number of sections to this document but since nearly all of them are of interest you might as well read the whole lot.

You cannot write good SQL unless you understand the data types you are working with. The most misunderstood and abused type appears from my own observations to be the DATE. Sometimes people seem to have an almost instinctive fear of using the DATE type, possibly assuming that there is an enormous overhead associated with it (either in terms of storage or in terms of processing time) and they start storing dates and times in character or number columns, or splitting out times from the date and using multiple columns. The DATE datatype is your friend -- if you think that you can work out the number of days between two arbitrary dates faster than the built-in Oracle functions then you're wrong, and in any case don't forget that the built-in functions are tested on your behalf billions of times a day, by users on multiple platforms, versions, and configurations.
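
To illustrate (the ORDERS table and its columns here are invented for the purpose), date arithmetic with the native type is just subtraction:

select order_date,
       ship_date,
       ship_date - order_date                 as days_to_ship,   -- days, including fractions
       trunc(ship_date) - trunc(order_date)   as whole_days,
       months_between(ship_date, order_date)  as months_apart
from   orders;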

So read all about data types, and pay attention to how you convert between them. Understand what precision and scale mean on a numeric type, and how they affect storage. Understand the difference between VarChar2 and Char. Never rely on implicit datatype conversion, especially between date types and character types -- always code the conversion yourself.

Practice both forms of the CASE expression.
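
Here's a rough sketch of the two forms, again against an invented ORDERS table:

select order_id,
       -- simple CASE: one expression compared against a list of values
       case status_code
          when 'O' then 'Open'
          when 'C' then 'Closed'
          else 'Unknown'
       end as status_desc,
       -- searched CASE: arbitrary conditions, evaluated in order
       case
          when order_amt >= 10000 then 'Large'
          when order_amt >=  1000 then 'Medium'
          else 'Small'
       end as order_size
from   orders;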

Play around with all the forms of condition. I've met people with some years of experience who didn't know that just as you can use a BETWEEN condition, you can also use a NOT BETWEEN condition. WTF, dude? It's there in black and white.

Some conditions do pretty much the same thing in slightly different ways. There are no conditions that are inherently bad to use, just bad applications of them in particular circumstances, and you'll learn those by reading and measuring (later on I'll talk about measuring).

There are around 170 built-in SQL functions in Oracle 9i. You don't have to learn them all, but you do need to know which ones exist. If you think that a particular built-in function ought to exist then it probably already does. If it doesn't already exist then you can probably combine and nest a few functions to do what you want. String manipulations like InStr() and SubStr() seem to be overlooked by a lot of beginners who want to extract the next contiguous block of characters after the second occurrence of a space, for example. Translate() is great for pattern checking to see if a string is in SSN format 999-99-9999, for example. You do not understand any of these functions until you have solved a real problem with them.
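
For example (APPLICANTS, SSN, and FULL_NAME are made-up names, but the functions are real enough):

-- rows whose SSN is not in the 999-99-9999 format: translate every digit
-- to a literal '9' and compare the result against the pattern
select ssn
from   applicants
where  translate(ssn, '0123456789', '9999999999') <> '999-99-9999';

-- everything after the second space in a name
select substr(full_name, instr(full_name, ' ', 1, 2) + 1) as after_second_space
from   applicants;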

Work to understand the aggregate and analytic functions -- aggregates are extremely common and analytics are extraordinarily powerful. Look at Tom Kyte's website for many, many examples of problems solved with the application of analytic functions -- hint: search for the phrase "analytics rock" ;)
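
By way of a small hedged example (that invented ORDERS table again): a running total and a ratio-to-report, with no self-join or correlated subquery in sight:

select customer_id,
       order_date,
       order_amt,
       sum(order_amt) over (partition by customer_id
                            order by order_date)        as running_total,
       ratio_to_report(order_amt)
                      over (partition by customer_id)   as share_of_customer
from   orders;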

The whole of the section on SQL Queries and Subqueries is essential reading.

It is a fundamental property of relational databases that the result of a query can be used as the input for another query -- not only fundamental in terms of syntax but fundamental in terms of performance. Read about queries and subqueries, and in-line views -- the Oracle query optimizer is very good at using them. Subquery factoring is a powerful technique in the right circumstances, but until you've played around with it a while you won't get any instinctive grasp of when it can be used.
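
A minimal sketch of subquery factoring, using the familiar SCOTT demo tables (substitute your own):

with dept_totals as (
   select deptno,
          sum(sal) as dept_sal
   from   emp
   group  by deptno
)
select e.ename,
       e.sal,
       d.dept_sal,
       e.sal / d.dept_sal as share_of_dept
from   emp         e,
       dept_totals d
where  e.deptno = d.deptno;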

Many people don't seem to have heard of hierarchical queries, which use the CONNECT BY clause. This is used for querying an hierarchical data structure such as the decomposition of manufactured items into their component parts, or for manager-employee relationships. If you've read this section in the documentation then you already have an advantage over them.
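
A quick manager-employee sketch (SCOTT's EMP again, assuming the top manager has a null MGR):

select lpad(' ', 2 * (level - 1)) || ename as org_chart,
       level
from   emp
start  with mgr is null
connect by prior empno = mgr;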

The set operators of UNION [ALL], MINUS and INTERSECT are often strangely neglected. I like them -- a query that uses them is often much easier to read than other techniques for the same result. Understand the difference between UNION and UNION ALL -- by default use UNION ALL unless you need to use UNION. Many people seem to get this the wrong way round.
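
For instance (two invented tables):

-- UNION ALL simply concatenates the two result sets
select customer_id from current_customers
union all
select customer_id from archived_customers;

-- UNION also sorts and removes duplicates -- extra work that you should
-- only pay for when you genuinely need distinct rows
select customer_id from current_customers
union
select customer_id from archived_customers;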

When people talk about performance they are often talking about SELECT statements, which tend to be more complex and more common than DELETEs, INSERTs and UPDATEs. Focus on SELECT first to get a grip on the fundamentals. Performance problems in SQL statements usually come from two factors:
  • Finding rows
  • Changing rows
Selects need to find rows, inserts need to change rows (well, sort of), and deletes and updates need to do both. There is more "change overhead" in updates than in deletes or inserts. These are fundamental differences that you must bear in mind in order to anticipate potential sources of performance problems.

Now that you understand the syntax you ought to understand more about what Oracle does with it. There is no section of the Concepts Guide that you can afford to completely ignore, but you can start with the section on Schema Objects, then brush up on your knowledge of Datatypes, then read through the section on SQL Overview (in particular how SQL is executed).


3: Know about the data and the structures being queried

If you're going to write SQL then obviously you're going to need to know at least the bare minimum about the tables and columns which you are addressing, but it's important to note that the names of the tables and the names and datatypes of the columns are absolutely the bare minimum, and like most bare minima they are only barely sufficient to make your work barely functional.

There is so much more information available to you that at first pass it threatens to boggle the mind. In the Concepts Guide you can read about some significant differences between different table types, for example -- the default heap table, the index-organized table, the cluster (both index and hash based) to name a few. You may not be responsible within your organization for deciding which of these is appropriate, but if you appreciate the strengths and weaknesses of each then you can not only leverage them where they are found in your schema but you can also advise others on any benefits and detriments in their use and non-use in your environment. You will be a hero.

Two quick things to add in there. Firstly, wherever there is choice between two options, whether in table type or in SQL structure or in where you buy your groceries there are always strengths and weaknesses to each option. If one of the options was 100% strengths and 0% weaknesses then the other options would not exist (OK, except maybe for backwards compatibility purposes). Secondly, whenever you accept a default you are making a decision, conscious or unconscious, to reject the alternatives. If your schema is 100% vanilla heap tables then you have made a decision to reject hash clusters and partitioning and index-organized tables etc. -- whether you realise it or not, and whether you act through ignorance or through informed choice, is determined by how much effort you have put into your own search for knowledge. So you'd better try to make that choice a conscious one, right?

Moving along ...

So to write efficient SQL you have to be very aware of the structures that you are using. Not only the table type, but what indexes are present, and what type of index they are. You have to be aware of the nature of the data. Not just the datatype but how many distinct values are there likely to be? Are common values going to be clustered together or randomly distributed throughout the table? Learn about index clustering factors -- it is not the percentage of rows to be retrieved that determines whether an index is likely to be used but the percentage of blocks, and the clustering factor is used by the optimizer to help estimate this. Howard Rogers' Dizwell web site has an excellent explanation.

Learn how to query the data dictionary -- the Database Reference lists all of the system views that you need in order to find out what tables, columns, indexes, constraints, source code, materialized views ... in short what objects are stored in the database, and a skill in querying the data dictionary views is priceless. If someone suggests that you use a GUI tool then see if it will tell you all the tables that have a column with a particular name, or will list all the tables that have no primary key, or which have no foreign keys against them from other tables. It probably won't do it. But you will be able to if you know how to query these views. Unfortunately you'll then have to be the go-to guy/girl for such oddball requests -- it's the price of competence in any field of endeavor so you'll just have to suffer with that. Also, ask for a pay-rise.
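
Two of those oddball requests, sketched against the USER_* views (the column name is invented):

-- every table in my schema without a primary key constraint
select table_name
from   user_tables
where  table_name not in
       ( select table_name
         from   user_constraints
         where  constraint_type = 'P' );

-- every table owning a column with a particular name
select table_name, column_name
from   user_tab_columns
where  column_name = 'CUSTOMER_ID';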

Work to understand the business and its data. Understand how existing applications use the data. Draw sketches of how the tables relate to each other.

4. Understand how the optimizer works with different statements in the context of the data and structures present

A long title for a challenging subject.

Oracle provides (for free!) a very long document on the subject of Performance Tuning. You cannot write efficient SQL unless you understand the optimizer, and unless you are forced to use the Rule Based Optimizer (RBO) then you should be using the Cost Based Optimizer (CBO).

Understanding the functioning of the CBO is no trivial matter, but that doesn't excuse a merely superficial knowledge of it. Its internal workings are very well documented both in the Oracle documentation and extensively on the internet, and there is no-one out there writing efficient SQL who does not know how it works. The websites of Jonathan Lewis and Tom Kyte are also chock-a-block with information.

Learn about the choices that the CBO can make -- what order to join the tables in, how to access the tables, what kind of join to use, how the joins work and what their advantages and disadvantages are. Learn why full table scans are Not Always A Bad Thing, and why index-based access is Not Always A Good Thing.

Why is a hash join beneficial in joining large data sets? Why does it only work with equality joins? Does indexing help with it?

Learn how the optimizer uses indexes, and how different types of indexes relate to NULL values.
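
One well-known example, sketched here with a throwaway table: a single-column b-tree index holds no entry for rows where the key is entirely null, so an IS NULL predicate cannot be answered from that index alone.

create table t ( id   number not null,
                 col1 number );
create index t_col1 on t (col1);

-- rows with COL1 null have no entry in T_COL1, so this cannot be
-- satisfied from that index by itself
select count(*) from t where col1 is null;

-- a composite index whose trailing column is NOT NULL does include
-- those rows, so the same predicate becomes indexable
create index t_col1_id on t (col1, id);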

Read all about how the optimizer uses statistics. Learn how it is affected by initialization parameters such as cpu_count and optimizer_index_cost_adj.

The CBO is based on statistics -- statistics about tables, statistics about indexes, statistics about the distribution of values within a column. Learn where the statistics are stored, how they are calculated, and how they are used.
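
Most of them are sitting in the data dictionary waiting to be looked at -- for example (MY_TABLE is just a placeholder, use any table you like):

select table_name, num_rows, blocks, avg_row_len, last_analyzed
from   user_tables
where  table_name = 'MY_TABLE';

select column_name, num_distinct, num_nulls, density, num_buckets
from   user_tab_col_statistics
where  table_name = 'MY_TABLE';

select index_name, blevel, leaf_blocks, distinct_keys, clustering_factor
from   user_indexes
where  table_name = 'MY_TABLE';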

Learn how to manipulate the decisions made by the CBO through the use of optimizer hints, and then try not to use them in your code anyway. If you think you need them then look for other explanations for the problem you are trying to solve -- inadequate statistics or missing constraints, for example.

Perhaps more than any other element of the Oracle RDBMS the cost based optimizer is constantly changing. It gets new access methods, new initialization parameters, new ways of accessing data that have even been patented. Stay alert to changes between versions and between patch levels.

You will never know everything about the optimizer,
because there is so much to learn and because it keeps evolving.

But try anyway.



5: Know how to read and measure how the statement is optimized and executed

Question: "My query runs too slowly. How can I improve it's performance?"
Answer: "By understanding and measuring it's performance"

Oracle now has very extensive internal mechanisms for documenting and measuring just about everything to do with SQL execution, and it is all at your fingertips.

Probably the simplest method is to type into SQL*Plus the command "set timing on", and you'll get wall clock timings of how long your statement took to execute.

In the Database Performance Tuning Guide and Reference there is a section which will tell you how to invoke and read an "Explain Plan" through SQL*Plus -- this will show you how the optimizer expects to get the result you have asked for. Remember that SQL is a Fourth Generation Language (4GL) in which you describe the result set, not how to get it. The explain plan tells you how the optimizer is going to tell the database instance to get the result, and the documentation tells you exactly how to read the cryptic-at-first-glance output.
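
A rough sketch of the mechanics (DBMS_XPLAN is 9iR2 onwards, and the PLAN_TABLE must exist, e.g. from ?/rdbms/admin/utlxplan.sql; the ORDERS table is invented):

explain plan for
select count(*)
from   orders
where  order_date >= to_date('01-APR-2005','DD-MON-YYYY');

select * from table(dbms_xplan.display);

-- or let SQL*Plus do the formatting for you
set autotrace traceonly explain
select count(*)
from   orders
where  order_date >= to_date('01-APR-2005','DD-MON-YYYY');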

When you understand the explain plan then learn about SQL Trace, in which details of the actual execution of the SQL are written to a server trace file. The execution plan here may be different to the one you saw through SQL*Plus, so understand why that is.

I don't recall ever using the methods documented in the Database Performance Tuning Guide and Reference -- the command line interface, the initialization parameters, and the PL/SQL methods. I jump straight to using "alter session set sql_trace = true;" or the 10046 event invoked with syntax such as "alter session set events '10046 trace name context forever, level 4';". Search for information about these on Tom Kyte's website, on Metalink, on Oracle forums, and wherever you can find it. They give you access to the Oracle "wait interface" that will tell you exactly what is making your SQL take as long to execute as it does, and with experience you will be able to translate that into optimization problems or structural problems -- wrong/missing indexes for example, or an over-enthusiasm for them.
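
A typical tracing session might look something like this (level 8 adds wait events, level 12 adds waits and binds; the tkprof step runs at the operating system prompt against whatever trace file your session produced):

alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 8';

-- ... run the statement(s) of interest ...

alter session set events '10046 trace name context off';

-- then format the raw trace file, e.g.:
--    tkprof <your_trace_file>.trc report.txt sys=no sort=exeela,fchela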


Final Thoughts

  1. You are going to be reading a lot of opinions in your pursuit of SQL excellence, so here is my opinion on finding good information.
  2. Add comments to your code.
  3. When you screw up, admit it.
  4. Test every assumption.
  5. Benchmark every feasible alternative.
  6. Don't be afraid to ask questions.
  7. Never stop learning.
  8. Erm ...
  9. That's it. Thanks for reading.


Acknowledgements

Thanks to eastmael for asking the question that prompted the original article.

Very many thanks to rbaraer and pete_s for suggestions on material, corrections, and encouragement along the way. I probably would never have finished without their feedback.

Further comments and contributions of any kind are always very welcome.

Wednesday, April 27, 2005

Blog "Feed" "Syndication"

Thomas from COS (is that Colorado Springs?) asks if I can publish the blog in a feed.

By using a combination of dictionary and documentation I understand that there is some type of feed going on from http://oraclesponge.blogspot.com/atom.xml but that it might not be as compatible as it might be. I've signed up for a hopefully-money-and-spam-free service that "syndicates" the "feed" through a "website" apparently ... http://feeds.feedburner.com/OracleSponge

Let me know how that works out for you feed-reading technoliterati.

Comments (the type you get in code, that is)

"Supposed" to get, that is. What is it with SQL and comments? Why do we so rarely see explanatory comments in SQL code? Or even a decent structure?

One of the features of front-end tools like Business Objects or Microstrategy is their ability to allow the designer to create extraordinarily long and complex looking SQL with ease. (If you don't use such tools then don't go away -- I'll deal with you in a moment). It starts by having an object or two that are pretty simple, then you add them together ... then you divide by another two objects added together, but then you need a decode on the divisor to change zeros to nulls so that the SQL doesn't error out, and pretty soon you've got an expression with a length of 1000 characters that is unreadable to the human eye.

Of course it's not just extremely long expressions that are tricky ... sometimes it's hard to look at a piece of SQL and even find out which simple expression relates to which object in the universe/project/whatever. So do me a favour and put a comment block in the definitions ... something like ...

Sum(order_amt)/Decode(Sum(order_qty),0,Null,Sum(order_qty)) /* "Average Unit Cost" */

... and it'll appear in the SQL itself. Not too tricky, eh?

You know, I do have a method for doing this automatically by hacking a Business Objects repository with an update statement, but if I told you how to do that and you fkd it up then you'd probably sue me or something.

By the way, the same thing applies to join definitions.

Moving along ... if you're hand-coding SQL statements then you have even less excuse for not commenting your SQL. If you follow the principle that it is better to do things in SQL than PL/SQL then you're probably packing a fair amount of functionality into each statement, and that means ...
  • Write a header to say what the hell the SQL is supposed to do ... doesn't have to be in the SQL itself, just above it will do nicely. In fact, work out this statement even before you write the SQL itself and you'll keep a more clear mind when you are writing the code. Include in that header how it is going to do it -- remember that complex SQL packs a huge amount of functionality into a small amount of text -- it's much more "logically dense" than regular 3GL code. TK has an example of this here, and thanks for the suggestion to include this, Tom.
  • Maybe you might even like to put in a comment about performance, eh? "Runs in about 0.2 seconds" ... "Uses a lot of hash joins" ... etc
  • Put a statement inside the SQL to say where in the code it is located (package/procedure). You wouldn't inflict a twenty-page procedure on anyone so it'll be easy to find.
  • You might even like to put in a unique identifier for the SQL as well as its location (for example, so that it's easy to reference the copy of a "good" explain plan that you've kept for that query, in case of problems down the road).
  • Format the code nicely
  • NEW ITEM: if you are referencing any view more complex than "SELECT blah from whatever", or if any of the tables are accessed remotely through a db link that is obscured by a synonym, or is actually the query name for a subquery factoring clause, then a comment would be appropriate. And when I say "appropriate" I mean just do it.
  • Clearly separate the joins from the filters
  • Any tricky looking expressions, state what they do. Especially analytic functions and subquery factors ("what they?")
  • Document anything that looks funky (if you are using "... and rownum > 1" in an in-line view to stop the optimizer from pushing predicates then say so ... not everyone is as smart as you are).
  • (Except Tom Kyte ("who he?"), who tells me that he's perfect. My opinion: man, has he aged since his photo in "Expert One On One". Am I right, ladies?).
  • Erm ...
  • For the love of God, use table aliases. Friendly ones, not a, b, and c.
  • Just use your common sense. People have to support that code (see the sketch after this list)
  • That's it
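
Pulling a few of those points together, here's a minimal sketch -- every name, number, and runtime in it is invented, it's the shape that matters:

/* Report: average unit cost by product line for the last complete month.
   Location: pkg_sales_rpt.monthly_unit_cost   (SQL id: SR-014)
   How: aggregates ORDER_LINES and joins to PRODUCTS for the grouping.
   Performance: ~0.5s against one monthly partition; full scan is expected. */
select   p.product_line,
         sum(ol.order_amt) /
            decode(sum(ol.order_qty), 0, null, sum(ol.order_qty))
            as avg_unit_cost                      -- guard against divide-by-zero
from     order_lines ol,
         products    p
-- joins
where    ol.product_id = p.product_id
-- filters
and      ol.order_date >= trunc(add_months(sysdate, -1), 'MM')
and      ol.order_date <  trunc(sysdate, 'MM')
group by p.product_line;
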
Back to work, fellas.

Tuesday, April 26, 2005

Hooray for the Oracle Documentation

A prime example of the value of the docs, right here.

Monday, April 25, 2005

Thread about block sizes

An interesting question from sejasp on block sizes ... http://dba.ipbhost.com/index.php?showtopic=1512

A five-times improvement in i/o performance, just by changing a block size? It's the sort of thing that ought to be making the hairs on the back of your neck rise.

UPDATE: April 23rd

Well the thread has morphed into the issue of whether it is appropriate/rude/presumptuous of a person to ask questions of the original poster in a forum thread.

My opinion, which I think I state pretty clearly there, is based on the following:
  • A person has asked a question in a public forum, requesting advice.
  • As their question stands, it offers the possibility of a number of different answers, and some of the answers may be inappropriate.
  • Erm ... that's it.
The most appropriate answer for the situation may be, "I do not think that any action is required here ... what you perceive as a problem does not in my opinion justify any corrective action." You know that's advice that I've given quite frequently. Here are some examples of questions to which the answer is "Do Nothing":
  1. My data warehouse application is not using bind variables ... what should I do?
  2. I have a number of indexes with more than 120 extents ... how should I reduce this?
  3. My buffer cache hit ratio has dropped below 99% to 97% ... help!
So under these circumstances, can anyone tell me what is wrong with probing for more information? Especially when you are giving free advice.

UPDATE: 24th April

Thanks for the nice thoughts there, fellas. Much appreciated.

UPDATE: 25th April

Happy ending -- see, I knew sejasp was a good bloke on account of being an Aussie. I have about nine great relatives over there (mostly poaching, but a little light larceny also).

Hey, just jokes fellas. Happy ANZAC day. (Is that appropriate? Is ANZAC day potentially a happy thing? comments welcome, as always)

If ...

... the weather forecast says "50 degrees with chance of afternoon thunderstorms", and you find it snowing at 3pm, then you probably live in Colorado Springs

Saturday, April 23, 2005

BS ALERT: "Temporary Tablespaces Like Large Blocks"

The myth is still alive ...

http://dba.ipbhost.com/index.php?showtopic=1512&st=15

"To answer your questions, in general, DSS and OLAP databases (those characterized with lot's of full scans) might see a reduction in consistent gets with larger blocksizes. I have some client's use a db_cache_size=32k so that TEMP gets a large blocksize, and then define smaller buffers to hold tables that experience random small-row fetches."

If the clients are using a default block size of 32kb to do that, Don, I hope it wasn't based on your advice, because the block size of a temporary tablespace is irrelevant to the size of i/o it uses, because that is governed by a hidden parameter. There is no logical i/o involved, it is 100% physical.

Better get your excuses lined up for when they find out that this is a Big Oracle Myth, Don. Not only are you giving bad advice to people for free on your forum, you're actually charging your clients for advice that will make their database more complex to create, more difficult to manage, probably more prone to bugs, and with no performance advantages whatsoever.

Mike Ault has excuses ready, by the way ... he's just standing by a friend. http://www.blogger.com/comment.g?blogID=11462313&postID=111369118788103463

Comments welcome. As always.

Friday, April 22, 2005

Update on Histograms

OK, that was a bad example. Here's a new approach.

When the Oracle documentation talks about skewed values they are referring to particular values in a column occurring with significantly different frequency than other values. A classical example of this is 95% "M" and 5% "F".

However, let me propose that what this misses is the consideration of situations like the following: 95% A, 5% B, 0% C, 0% D ... ie. there are values that do not occur in the column but which the optimizer might believe do occur -- or might occur, at any rate.

Let's take a slightly more "real world" example. Let's say that we have a wholesale transactions table that is range partitioned on "Date Of Transaction" at a monthly level, so that there are approximately 30 days to the month. Now the key to this example is that no sales occur on Saturday or Sunday, so in fact there are around twenty unique values to each partition. Each value in our example has the same number of rows, and the distinct values have a pretty even distribution.

Now even if the data does not possess the classical type of skew, there is still a type in there that is hidden from the optimizer, because Saturday and Sunday values of data have 0% representation in the table. The key is to let the optimizer know about these gaps that it is not aware of from High-Low-N column statistics.

Example script:

SQL> drop table demo_hist;

Table dropped.

SQL>
SQL> create table demo_hist
  2  as
  3  select my_date,
  4         num
  5  From (
  6         Select To_Date('01-aug-2005')+
  7                Mod(rownum,
  8                    Add_months('01-Aug-2005',1)-To_Date('01-Aug-2005'))
  9                my_date,
 10                rownum num
 11         From dual
 12         Connect By 1=1 and level <= 100000
 13       )
 14  Where to_char(my_date,'D') not in ('1','7');

Table created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> set autotrace on
SQL>
SQL> select count(*) from demo_hist
2 where my_date = to_date('01-aug-2005');

COUNT(*)
----------
3225


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=44 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DEMO_HIST' (TABLE) (Cost=44 Card
=2943 Bytes=26487)





Statistics
----------------------------------------------------------
48 recursive calls
0 db block gets
251 consistent gets
184 physical reads
0 redo size
394 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> select count(*) from demo_hist
2 where my_date between to_date('06-aug-2005') and to_date('09-aug-2005');

COUNT(*)
----------
6452


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=44 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DEMO_HIST' (TABLE) (Cost=44 Card
=5830 Bytes=52470)





Statistics
----------------------------------------------------------
6 recursive calls
0 db block gets
247 consistent gets
0 physical reads
0 redo size
394 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL>
SQL> begin
2 dbms_stats.gather_table_stats
3 (
4 ownname => user,
5 tabname => 'DEMO_HIST',
6 estimate_percent => 100,
7 method_opt => 'for columns NUM size 254'
8 );
9 end;
10 /

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from demo_hist
2 where my_date = to_date('01-aug-2005');

COUNT(*)
----------
3225


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=45 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DEMO_HIST' (TABLE) (Cost=45 Card
=742 Bytes=6678)





Statistics
----------------------------------------------------------
42 recursive calls
0 db block gets
192 consistent gets
0 physical reads
0 redo size
394 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> select count(*) from demo_hist
2 where my_date between to_date('06-aug-2005') and to_date('09-aug-2005');

COUNT(*)
----------
6452


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=45 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DEMO_HIST' (TABLE) (Cost=45 Card
=185 Bytes=1665)





Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
188 consistent gets
0 physical reads
0 redo size
394 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> begin
2 dbms_stats.gather_table_stats
3 (
4 ownname => user,
5 tabname => 'DEMO_HIST',
6 estimate_percent => 100,
7 method_opt => 'for columns MY_DATE size 254'
8 );
9 end;
10 /

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from demo_hist
2 where my_date = to_date('01-aug-2005');

COUNT(*)
----------
3225


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=45 Card=1 Bytes=8)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DEMO_HIST' (TABLE) (Cost=45 Card
=3225 Bytes=25800)





Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
188 consistent gets
0 physical reads
0 redo size
394 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL>
SQL> select count(*) from demo_hist
2 where my_date between to_date('06-aug-2005') and to_date('09-aug-2005');

COUNT(*)
----------
6452


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=45 Card=1 Bytes=8)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DEMO_HIST' (TABLE) (Cost=45 Card
=6452 Bytes=51616)





Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
188 consistent gets
0 physical reads
0 redo size
394 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Summary ...

Predicate                        =    Between
------------------------   ------   --------
Correct Answer               3,225      6,452
No Stats Estimate            2,943      5,830
Col Stats Estimate             742        185
Histogram Stats Estimate     3,225      6,452


Obviously there's more to this ... the histogram-based estimate was so good because the number of distinct values was low, and you'd expect the quality of the estimate to degenerate as the number of distinct values increases.

Thoughts?

Using Histograms on Evenly Distributed Values

A quick demo ... here's a script ...

drop table demo_hist
/

create table demo_hist
as
select floor(rownum/5) rn,do.* from dba_objects do
/

begin
dbms_stats.gather_table_stats
(
ownname => user,
tabname => 'DEMO_HIST',
estimate_percent => 100,
method_opt => 'for columns OWNER size 254'
);
end;
/

select distinct column_name from user_tab_histograms
where table_name = 'DEMO_HIST'
/

Select min(rn),
max(rn),
count(distinct rn),
count(*)
from demo_hist
/

set autotrace on

select count(*) from demo_hist
where rn = 5
/

select count(*) from demo_hist
where rn between 1 and 100
/

begin
dbms_stats.gather_table_stats
(
ownname => user,
tabname => 'DEMO_HIST',
estimate_percent => 100,
method_opt => 'for columns RN size 254'
);
end;
/

select count(*) from demo_hist
where rn = 5
/

select count(*) from demo_hist
where rn between 1 and 100
/

When I run this I get a demonstration that with a histogram on the RN column the optimizer gets a much better cardinality estimate for the two Count(*)'s on RN = 5 and RN between 1 and 100.

But I'm not going to show you by how much because you need to go and run this script yourself ;)

Whether the results are significant in terms of getting a good execution plan, well this test doesn't demonstrate that. But I believe that it does. More of that damned instinct.

OK, if someone begs me then I'll post my results. But it'll have to be a real grovelling.

Thoughts on Cost Based Optimization

Just a brief note.

In general terms there are three elements to getting a good execution plan out of Oracle's Cost Based Optimizer (CBO).

The first of these is to give Oracle sufficient information about the data - "metadata". This means, for example:
  • Using constraints where possible to indicate uniqueness, nullability, foreign key relations
  • Using statistics to indicate data volumes and distributions
The second of these is to give Oracle sufficient information about the system on which the instance is running. This means, for example:
  • Correct values for initialization parameters such as db_file_multiblock_read_count and cpu_count.
  • Before 9i, appropriate values for optimizer_index_cost_adj
  • From 9i upwards, use of DBMS_Stats.Gather_System_Stats() to benchmark the hardware.
The third of these is to write SQL that the optimizer can work with. I have no examples of this, but I have an instinctive feeling that it is possible to write suboptimal SQL that is "difficult" to optimize. If I think of anything more substantial then I'll update.
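
By way of a hedged sketch of the first two elements (the SALES table and its columns are invented; the DBMS_STATS calls are the real ones):

-- metadata: tell the optimizer what you already know about the data
alter table sales add constraint sales_pk primary key (sale_id);
alter table sales modify (customer_id not null);

begin
   dbms_stats.gather_table_stats(ownname => user,
                                 tabname => 'SALES',
                                 cascade => true);
end;
/

-- configuration: from 9i, capture workload system statistics
begin
   dbms_stats.gather_system_stats(gathering_mode => 'START');
end;
/
-- ... let a representative workload run, then ...
begin
   dbms_stats.gather_system_stats(gathering_mode => 'STOP');
end;
/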

So given that you believe that you have a CBO-related problem, I would propose that there are three categories of investigation: Metadata, Configuration, and SQL.

I can't decide whether this is blindingly obvious to everyone or a sudden flash of meaningful insight. Is there a way of conducting a poll in this blog? Just post a comment. "Tautology" or "Genius".

As always, thoughts are welcome, particularly on that last point but on anything herein of course.

A DBMS_STATS Oddity

As part of our load procedure we have various packaged Oracle procedures called by Informatica, in order to perform such tasks as:
  • Analyzing objects
  • Moving tables
  • Rebuilding indexes
When a patch from 9.2.0.5 to 9.2.0.6 was installed, something deep within the bowels of the package went *twang* and the jobs that run the analyze statements started to fail. Or in other words, stopped to work. Or ... never mind. It broke.

The problem was a Method_Opt clause of "FOR TABLE". What that? Well, it's not in the documentation, but in 9.2.0.5 and 10.1.0.2 it appears to be equivalent to "FOR ALL COLUMNS SIZE AUTO". I'm not sure what we're doing using it, but I'm sure that it seemed like a good idea at the time. Anyway, it's outta here, and while I'm sniffing around the code I'm going to be running some 10046 traces to see what's really going on with DBMS_STATS and some of its options.

For one thing I want to get rid of collecting histograms for SKEWONLY columns, and just go with all of them. Small-scale tests suggest that the optimizer gets much better statistics to work with when all indexed columns on the fact table are analyzed, and I suspect that the amount of work involved in detecting column value skew is not much different from that involved in collecting the statistics themselves.
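
In other words, swapping something like the first of these for the second (the table name is a stand-in for our fact table):

-- current approach: let DBMS_STATS decide which columns look skewed
begin
   dbms_stats.gather_table_stats(
      ownname          => user,
      tabname          => 'FACT_SALES',
      estimate_percent => 100,
      method_opt       => 'for all indexed columns size skewonly',
      cascade          => true);
end;
/

-- candidate replacement: histograms on every indexed column, skewed or not
begin
   dbms_stats.gather_table_stats(
      ownname          => user,
      tabname          => 'FACT_SALES',
      estimate_percent => 100,
      method_opt       => 'for all indexed columns size 254',
      cascade          => true);
end;
/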

Another issue is parallelism. Since a parallelised full table scan (for reasonably large tables anyway) uses direct i/o, there is no caching of the table's blocks. Fine for tables large enough to flood the block buffers, because physical reads would just keep occurring anyway, but where a table is around three-quarters the size of the block buffers I'm going to experiment with non-parallel statistics gathering, and hopefully this will reduce physical i/o during "the gathering".

Actually I've been doing a similar thing already, by setting parallelism on tables based on what proportion of the block buffers their total size, or in some cases their average partition size, represents. The tweaking of the DBMS_STATS parallelism just represents a modification of that principle to reflect the multiple back-to-back reads of the analyzed segment. Maybe that's another posting though.

Anyhoo, more later.

Thursday, April 21, 2005

Well, I'm all pissed off ...

... because a lengthy and very interesting :) post I was working on got lost. I'll try again another time. Lesson learned.

Wednesday, April 20, 2005

More on Response Times and User Perception

Niall Litchfield has some interesting comments on my previous posting about user response time in his blog. I think he's spot on about users expecting batch processes and large reports to take a long time, and also that in many cases it doesn't matter too much for them.

I'd add to that the concept that there are probably two different type of reports being run in data warehouses -- or that there ought to be two types, at any rate.

The first would be standard status reports -- show me all of the items that have been on backorder for more than a month, for example. This is usually a straightforward click or two, and a report pops up in a familiar format. Now for these reports I think that Niall's comments directly apply. They are standard "Monday morning" issues that the user is comfortable in dealing with, and that don't demand a great continuity of attention span that could be broken by a two minute execution time.

However the second class of report is investigative in nature. The user has run his "items on backorder" report and had a sudden "WTF!" moment on realising that there are now over two hundred items valued at more than $5,000 a piece that have been on backorder for 32 days, and he has a meeting with his boss in two hours. Now this is when the attention span issue may become more critical. He starts running ad-hoc reports to discover why this is happening, probing into sales, inventory, supplier business areas, and working on his own thesis to explain the source of the problem. While the reports are running he has a little internal dialogue running in which he thinks of different causes and how to extract the data from the system. This is where instant answers become of value, because in the time between report submission and the return of the result the user has now forgotten what the original question was. Or worse still, someone has popped his head over the cube, seen the user staring blankly at the screen (eyes slightly unfocused) and asked him if he needs to put in an order for paperclips this week. * POP * There goes the train of thought.

There's a big sort-of-ironic issue here. The standard reports that the designer has built into the system, and which are therefore most amenable to optimization, are those for which the user can afford to wait. It's those stinkin' ad-hoc reports that cause the problem -- fast response required and only the designer's imagination to predict that they'd ever be submitted.

And I've just thought of a third class of report -- the one the user regularly runs for a meeting or status report, and which he knows is meaningless nonsense, and that he therefore never even bothers to read in any more than the most cursory fashion. But that's another issue, I guess.

Instant Messaging for Developers ...

I used to like it -- IM is to email as telephone conversation is to sending a telegram, but this morning I was having simultaneous conversations with five people about bugs and issues in three different software products, plus what I'm currently working on, and my brain is starting to go soft around the edges. Softer, I mean.

It's great to be able to just send code snippets, especially when your client's email system sometimes doesn't spit out outgoing messages until two hours after the sender hit the "go" button, but there's definitely a dark side to the technology. Tut.

Upgrade News

Informatica

The world's smallest hooray for our upgrade from Informatica 5 to the latest version 7. Apparently in version 5 when some numeric fields were contaminated with non-numerics, they were just being set to zero. Version 7 correctly rejects these rows, thus revealing a problem with one of our source data feeds. Looks like some of the columns sometimes get shifted over.

Informatica is not-so-good at dealing with such an oddity, although to be fair I suppose that if a file is supposed to be "fixed width" then the very least that you could expect of it is that its widths would be in some way "fixed".

I guess that we'll probably apply some kind of "preprocessing" script to the raw data files to detect, correct, and log the problem.

Oracle

I'm a little startled to find that an upgrade to 10g is suddenly a possibility/probability. This brings to mind the saying about being careful what you wish for, 'cos my mental (at the moment) list of things to be done is stretching out towards the distant horizon.

Tuesday, April 19, 2005

New Favourite

For a long time I've been enjoying the regular doses of user and management madness that ComputerWorld's "Shark Tank" sends me, so I thought I'd share. See the links.

Monday, April 18, 2005

What I did on my holidays

Edit: Taken out the pictures links ... just go here to see what I was up to.

Why TEMP Tablespace != Large Block Size (the second in an occasional series)

Because for temporary tablespaces Oracle uses direct i/o, so block size is not relevant.

Friday, April 15, 2005

"Open Source" Oracle Books

I'm thinking (well, actually planning) of getting a project started to electronically publish a series of manuals/books on Orlacle* subjects along an "open source" framework. So we'd start off with some small-ish basic books on specific subjects (partitioning, installation, SQL tuning etc) and allow further development on them in an open fashion, with full credit given for enhancements.

Any ideas on feasibility/licensing/anything else gratefully received, offers of interest for authoring even more so.

* Tom Kyte (who he?) questions the spelling of this word. I'm going to credit him in this footnote with being a nit-picker and a busy-body, and also with demonstrating the exact principle which I think would make this idea work :)

Back On Burleson!

Don has restored my access at his forum -- see you there, men.

Why Large Segment != Large Block Sizes (necessarily)

Because if a session is going to read a high proportion of the table/index/partition thereof then it ought to be using parallel query, and that bypasses both the buffer cache and the regular read mechanism.

With index-based access this doesn't apply of course, but with index-based access the number of LIO's on the table is not reduced anyway.

How does that sound for succinctness?

Tuesday, April 12, 2005

Response Times: What The User Thinks And Why It Matters

A while ago I started working full time on a system that I hadn't been associated with for a year or two -- a decision support system with a Microstrategy front end. One of the pleasing features about Microstrategy was the logging of report executions, and we extracted historical report performance numbers from the metadata. We found that over the prior months the average report duration had increased from around 25 seconds up to around 100 seconds.

Quelle horreur! (sp? Where's DaPi when you need him?)

There were two main causes for this. Most obviously, at some point in the past few months the automatic collection of table and index statistics had ceased, so there were partitions of fact tables with hundreds of thousands of rows in them that were marked as having zero rows, and other such anomalies.

Less obviously there were some suboptimal design elements that were causing problems for the optimizer, such as not allowing partition pruning.

Update Apr 14: I just remembered that there was a third reason, and this was a big one. Star transformations were turned off quite a while before due to an optimizer bug that caused reports to fail. Despite upgrading out of the problem, they had not been turned back on.

Sleeves rolled up, we worked through the issues in the odd gaps between "real" work and after three months the average report time was down to 16 seconds. A nice improvement that the client project staff were very pleased with. But while project staff opinions count for a lot, what do the users think?

We informally polled a few high-profile users, and were able to show that for their particular frequently run reports (which were on average larger/more work intensive than the average user's) the improvement was even more dramatic. A factor of 5-10x faster was not uncommon. So what did the users think? Here's the point of this article. They had not noticed a thing! Even when we were able to say, "You know, that monthly sales report that you like to run used to take 250 seconds to execute just before Christmas, and now it's all done in 25 seconds. How about that?", the response was muted. "Hmmm, yeah maybe you're right. Hadn't really noticed myself, but thanks".

Dispiriting? Maybe. But in retrospect I'm barely surprised, and here is my theory to explain why.

First of all, try staring at a blank spot on your screen, and see how long it is before your attention naturally starts to wander. Remember, this isn't a staring competition with the hardware, just be natural about it.

I may be some kind of easily distracted gadfly but I reckon that I'm thinking about wallpaper paste, how ball bearings are made, or how much more I'd be enjoying myself out on my bike this afternoon within something like ten seconds. Maybe a shade less if it's just before lunch. So I would be shouting over the cubicle wall or brushing specks of dust off the screen or checking my voice mail pretty damned quickly, and you know then it doesn't matter much how long that report takes to run unless the boss is looking over my shoulder waiting for the result to come back.

So here's my theory.

In general users do not distinguish between a report that takes a short time and one that takes a long time, unless the result comes back within their casual attention span of a few seconds

As a corollary of that let me also propose the following.

"If you improve report performance from a minute or so to anything more than about fifteen seconds, then the users will not notice"

Pretty cynical stuff, you may think, but I'd disagree there. I think that it's a general observation that may well hold true for most systems and users. Other opinions are always welcome and will always be credited, even if I disagree.

I recall reading quite a while ago of some scheduling systems for buses and elevators that used research on how long people were happy to wait before they got irritated by a delay. I recall something like one minute for an elevator and seven minutes for a bus. Don't quote me on that though. The important issue is that people do have these fairly sudden cut-off points at which their attention span or patience will take a turn for the worse. I don't know what drives this behaviour but I bet that it varies by population and by prior experience. Maybe Russians are better at waiting than Americans, or something. I'm sure that stock brokers are less patient than librarians when it comes to report run times, so these numbers may be variable. ie. your mileage will vary.

Here's another corollary, before anyone thinks that by all this I mean "do not bother tuning your system".

"If you do reduce reporting time as above then it is of value only for the purposes of system resource conservation and of making you look good".


I have some more thoughts on what this actually means for the tuning process itself, and I'll publish that separately because your attention span is now probably exhausted.

Monday, April 11, 2005

Banned By Burleson!

Boy, I thought it would never happen. I thought that by being reasonable and polite (yet irritatingly insistent) I would never get banned by Don Burleson from his Oracle DBA Forum.

I guess the post where I satirically responded to Mike Ault's blog entry with my own version, in which I satirically suggested that DKB used a crystal ball to diagnose Oracle database problems, gave him the excuse that he needed to remove a thorn in his side. Hey, if Mike wasn't blocking comments from his posting then I wouldn't have felt the urge now, would I?

How did all this happen? Well, it's a long and sad story.

Some weeks ago DKB posted an article (or did it start off as a forum posting? I don't recall) in which he suggested that Oracle users were getting excellent performance improvements by moving their indexes to tablespaces with large block sizes. "Sounds interesting!" I thought. "Let's see what sort of improvement we're talking about here!". So I ran some simple SQL*Plus scripts in which I compared CPU usage and wall clock time for both fast full index scans (FFIS) and index range scans (IRS) for indexes of the same size, on the same data, with one index on 8kb block size and one on 16kb block size.
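For the curious, the comparison was along these lines -- a minimal sketch rather than the actual script, and it assumes a reasonably large test table T1 with an indexed column N1, tablespaces TS_8K (8KB) and TS_16K (16KB) already in place, and the 16KB buffer cache already configured:

create index t1_i1 on t1(n1) tablespace ts_8k;

set timing on
set autotrace traceonly statistics

-- fast full scan with the index in the 8KB block size tablespace
select /*+ index_ffs(t1 t1_i1) */ count(n1) from t1;

-- move the same index into the 16KB tablespace and repeat
alter index t1_i1 rebuild tablespace ts_16k;

select /*+ index_ffs(t1 t1_i1) */ count(n1) from t1;

(And yes, the rebuild step itself muddies the comparison, which as it turned out was pretty much the point that came up later in the thread.)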

Result: well, nothing to write home about to be honest. Nothing to rave about, certainly. But Don had a user quote attributing "a 20% reduction in I/O" to this technique (what kind of I/O, on what range of objects, over what time period, for what load, we don't know).

So either there was something wrong with the script (could easily be the case) or something was wrong with the advice. Don went for "something wrong with the script" -- in fact he went a stage further and said that such simple SQL*Plus scripts didn't mean anything because they weren't predictive of performance on a "real world system". Now my interpretation of Don's position is that the only things that mean anything in terms of real-world performance are either articles giving generic advice without any qualification on system type or load (eg. his original article) or a full-blown system benchmarking process on the target system. Rather non-intuitively, intermediate steps between "trust-me-I'm-a-professional" and "full-blown-system-test" are worth less than either of these extremes. Extraordinary -- but also convenient.

So that thread went on and on and on and on. I would ask for any evidence, speculation, or wild guess as to why the script was not valid, and Don (and Mike too, if memory serves) would just ignore it. At one point a forum member (I forget who, but I'll be happy to give credit if someone will remind me -- update: diogenes tells me it was the pseudonymous ora_dba_guy, and that sounds right to me -- thanks D and ODG) suggested that the 20% performance improvement might be due to a number of other factors, including just the fact that the index was rebuilt as part of the move process. Now, that suddenly made absolute sense to me. Don is a proponent of index rebuilds in some situations, so maybe he'd acknowledge that the rebuild process was responsible for what improvement there was.

Nope. Apparently not. Or maybe he does agree, but I don't recall a reply. I can't check now because I've been banned, right? And to be honest, Don has established a pattern of editing posts critical of him, so I wouldn't trust the results anyway.

So the interested reader might like to pop over to the forum and check things out, and read what others think over at http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:38264759390157 - maybe let me know why I've been banned as well. Don hasn't let me know, although he has my email address of course. He's getting pretty adept at banning dissenters, editing threads, closing them, deleting them. Does he still have that post in there about how no-one is banned, and there are no filters on the forum? Maybe not.

Maybe he thinks that he is now free to post scripts such as ...

select /*+ index(emp_bitmap_idx) */
count(*)
from
emp, dept
where
emp.deptno = dept.deptno;

... http://www.dba-oracle.com/oracle_tips_bitmapped_indexes.htm in which he believes that the index hint takes as its argument the name of the index alone. Clang! The hint should name either the table (or its alias) on its own, or the table/alias together with the index name. Hands up all those who think this script was tested? That error just jumped right out at me.
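For the record, and assuming the intent was to force the use of an index called emp_bitmap_idx on the EMP table, the corrected hint would reference the table as well, something like ...

select /*+ index(emp emp_bitmap_idx) */
count(*)
from
emp, dept
where
emp.deptno = dept.deptno;

... which at least stands a chance of doing what the article claims it does.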

Also, why is it a "common problem when implementing bitmap indexes" that the CBO may choose to use a FTS if your table is small? Anyone? That's what it's supposed to do! It is not a problem!

And I appreciate that 100 distinct values is a "rough" guide but again it's plain wrong. If you have a table with 100,000,000 rows in it then you can make use of a bitmap index on a column with 10,000 distinct values or more.
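To illustrate the point (table and column names purely hypothetical): on a fact table of that size, ten thousand distinct values is still a tiny fraction of the row count, and bitmap indexes on such columns combine nicely.

create bitmap index sales_bix_prod  on sales (product_id);   -- ~10,000 distinct values
create bitmap index sales_bix_state on sales (cust_state);   -- ~50 distinct values

-- the CBO can AND the two bitmaps together for a query like this,
-- touching only a small slice of a 100,000,000 row table
select sum(amount)
from   sales
where  product_id = 1234
and    cust_state = 'CO';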

Why do I know these things? Because I do them every working day, hands on with Oracle. And to think that just a month ago Don was asking me if I wanted to be a tech reviewer for his publishing company, and to spot article errors for him. How the mighty are fallen ;)

Oh, and here's what I posted in response to Mike's satirical blog ...

Client: Mr Burleson! Thank goodness you've come! We had an expert in here looking at our performance problem, but it would take too long to get a result from him, so we've called you in. You need database access?

DKB: Pah, not a bit of it. Didn't you know that you can't "prove" anything with a script?

Client: Uh, OK. So what do you need then?

DKB: Just let me get my crystal ball plugged in a sec ... um ...

Client: Yes?

DKB: OK, got it. Here's my recommendation. Rebuild all your indexes, moving them to 32kb tablespaces. Then rebuild them every two weeks, get yourself some SSD hardware, more memory, more processors. Got it?

Client: That's it?

DKB: Sure. If you have any more problems, call me in and I'll give you another customized solution.

Client: Should we do any measurements of performance before and after the changes?

DKB: Well, as long as you don't use a SQL*Plus script to do so, you ought to be OK. Oh, don't forget to increase the number of LGWR processes.

Client: ???

Now the funny thing about Mike's blog is that he implied that the "Oracle Scientist" was being unreasonable in demanding database access to perform the following ...
  • Looking at data and relationships
  • How data is used.
  • the physical relationships of your entire database
  • how all the indexes are built
  • how all the tables are configured
  • root cause analysis of all waits and events.
Funny list. Like, funny-ha-ha, because most of those are not what is required, right?

Here are some highlights of his own list (which is actually a pretty good one):
  • Wait events
  • IO spread and timing
  • Log events
  • Memory statistics and parameters
  • SQL usage
  • SQL statements which require too many resources
  • Access statistics if in 9i and 10g, otherwise major problem SQL is analyzed.
  • Parameter settings
  • Disk IO
  • CPU usage statistics
  • Memory usage statistics
Wait events at the top of the list. Look, he's measuring stuff. And suppose he did get access to the code -- would he look for poorly performing SQL with a view to changing it? I bet he would.

You see, Mike's not just writing articles that say "move your indexes to tablespaces with large block sizes" at all! You know what? I think Mike's in denial, because I think that he may secretly be an Oracle scientist! He doesn't realise it because from his list above he doesn't know what an Oracle scientist is. He's just been criticised so often in the past by people associated with that methodology that he doesn't want to be associated with it.

But I don't believe that of Don. I've seen too many basic errors in scripts that he's published, and too much generic advice that he cannot defend on its technical merits. I've seen him advise people to create tablespaces with names that start with a digit (an elementary and obvious error that makes an experienced Oracle professional's hair stand on end) and to create tablespaces with a non-default block size before creating the appropriate memory structure (anyone who has actually done this themselves knows it has to be done in the reverse order to that originally stated in the article). I pointed out those errors and he corrected them, by the way.
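For anyone who hasn't tried it, the order matters because the buffer cache for the non-default block size has to exist before the tablespace can be created. A minimal sketch, with sizes, paths and names purely illustrative:

-- set aside a buffer cache for 16KB blocks first ...
alter system set db_16k_cache_size = 64m;

-- ... and only then create the tablespace that uses that block size;
-- doing these two steps the other way round fails, because no buffer
-- cache exists yet for 16KB blocks
create tablespace ts_16k
  datafile '/u01/oradata/dw/ts_16k_01.dbf' size 1g
  blocksize 16k;

-- (and note that the name begins with a letter -- an unquoted
-- tablespace name can't start with a digit)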

So, when is Oracle going to institute a "Banned By Burleson!" award? The list of inductees is getting longer by the day, it seems, and eventually the only members of Don's forum will be gullible noobies and Burleson Consulting employees -- eventually the noobies will grow out of it though. Eventually we could have a "Banned By Burleson!" banner on our web pages, and a stamp on our business cards to advertise our rational approach to our profession. If anyone would like to take that thought and run with it then they'll have my full support, for one.

By the way, I'm allowing comments on this. No bans, no filters. Flame away!

Saturday, April 09, 2005

Update: BETWEEN ... a rock and a hard place

Ahem, well no sooner do you start looking at something than it changes.

While investigating the various fact tables that are used for this reporting exercise, we learn that some of the original requirements were changed post-implementation, and so while we're joining on about six dimensions, only two of them are actually being used. The rest are effectively ...
Where A between lowest_possible_value and highest_possible_value
So, we'll take out these surplus joins from the universe definition, and see what happens to performance.
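To make that concrete, here's roughly what the generated SQL boils down to, with invented table and column names and fewer dimensions than the real thing:

-- what the universe is generating, in effect
select f.measure, d1.attrib
from   fact f, dim1 d1, dim2 d2, dim3 d3
where  f.d1_key = d1.d1_key
and    f.d2_key = d2.d2_key
and    f.d3_key = d3.d3_key
and    d2.attrib between :lowest_possible_value and :highest_possible_value
and    d3.attrib between :lowest_possible_value and :highest_possible_value;

-- what it ought to be once the surplus joins come out
select f.measure, d1.attrib
from   fact f, dim1 d1
where  f.d1_key = d1.d1_key;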

Friday, April 08, 2005

What I look for in good information

There sure is a lot of crap on the internet, and Oracle has its fair share. So how do you assess the quality of a piece of advice or a technical article? Here are some of my own thoughts.
  • Good Sign: Lots of "ifs, buts, and caveats". Very few features of a technology are straightforward good news, and there's nearly always a downside to any optional feature or technique -- otherwise it wouldn't be optional, I suppose. For example, an article that states outright "index-organized tables give better performance" is leading you astray, because while key-based retrieval may be faster, there is an overhead on data modification that you ought to be aware of. If an article doesn't tell you about the bad news, then look for one that does (there's a small sketch of that particular trade-off at the end of this post).
  • Good Sign: an explanation that is rooted in documented behaviour. There may be features and techniques in Oracle on which you can find no documentation, either here or in Metalink, but in my experience they are few and far between. The Oracle documentation is excellent and the best articles are those that supplement and expand on information contained therein.
  • Good Sign: the ability to question the author. Can you send an email, or a forum PM to the author asking for clarification? If so what sort of response do you get?
  • Good Sign: edits. If the article contains something like "Shortly after publication I received an email from ... pointing out that ... and in my opinion this is/is not a valid point". An author who is willing to acknowledge a mistake and credit the person who corrected them, or who is willing to acknowledge that their information was challenged, is probably writing better articles.
  • Bad Sign: $$$'s required. If you get a brief introduction or rave reviews of a technique, and are then referred to a book or service you have to pay for in order to get more (useful) information, then I'd be steering clear. You absolutely would not believe how much nonsense has been published about Oracle in the last decade, nor how much of it is just a re-phrasing of documentation.
  • Good Sign: practical demonstrations. There are some features that are difficult or impractical to demonstrate, but you ought to be able to recognise those situations. Speaking as a guy who develops and tests on a small system for later test and deployment on much larger hardware (on a completely different operating system as well), it is my opinion that if you are careful then lessons learned in development can be successfully integrated into the larger scale system. Even if I have a single CPU at my disposal I can infer a great many useful lessons about how parallel query works on an eight-CPU system.
  • Ambivalent: the author. There are obviously some authors who I am going to trust more than others, but I try to be equally cynical about everyone. I've made enough mistakes in my professional life, and seen enough by others, not to expect perfection from anyone. This leads us back to my fourth point of course ... if I have never seen an author admit to a mistake and acknowledge it publicly, then that's a big red flag. So maybe it would be fairer to say that I am cynical about everyone, but more cynical about some people in particular.
I expect that I'll think of more, so I'll repost if anything comes up ... or if I get other suggestions (full credit promised!)
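As promised above, here's a small sketch of the index-organized table trade-off (table name invented). The whole row lives in the primary key index, so key-based lookups avoid the extra hop to a table segment, but every insert and update has to maintain that structure:

create table orders_iot (
  order_id   number primary key,
  cust_id    number,
  order_date date,
  amount     number
)
organization index;

-- cheap: a primary key lookup reads only the index structure
select amount from orders_iot where order_id = 12345;

-- the price: heavy insert/update activity does more work keeping the
-- index-organized structure in shape than it would on a plain heap table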

Thursday, April 07, 2005

Dumb things I've done recently: Part I

Assumed that the code that I had locally was the same as the one giving problems in production. From the symptom I almost immediately suspected what the problem was (a column not specified in a joiner transformation in Informatica), but my own copy of the code said that the suspect column was included. So, after many hours of picking the bones out of a 500Mb session log file, I checked what the file had to say about the join specification.

Ping! It was wrong. Solution applied in production and the problem went away.

Interestingly enough though, I also fixed a different problem that would have caused the same symptom had we hit particular data combinations. That must have been the silver lining on the cloud of a mostly-wasted day.

Prime locations for sponging

AskTom is a great site that I dip into pretty regularly. Say what you like, but you'll never find Tom Kyte without an opinion or an illustrative example on how to do things the right way. Except when he doesn't have an opinion or experience, but he's refreshingly open about that also. His examples are clear and concise (and your own had better be also), and it's a rare day that you can't learn something new.

Jonathan Lewis' site is not so frequently updated, but hosts the excellent Oracle Users Cooperative FAQ, a great source of information.

Howard Rogers recently started a forum at his Dizwell Informatics site that is picking up pace nicely, and blogs his own thoughts and experiences also.

You'll sometimes find me hanging around the forums at DBAsupport.com, answering more than asking, I guess. The Obfuscation Unlimited forum is usually lively, though less so now that the election is over ;)

The Oracle Technology Network has its own forums, which I have mixed feelings about. They're busy, but the interface is not my favourite, nor is the general style and content of the questions. However, if you're an Oracle professional and you don't have an account at OTN then you're missing out on the best source of information available -- the documentation site. Get that on your bookmarks, because 90% of the questions that you see on almost any forum can be answered with a one-minute search of the relevant documents. The Concepts Guide, which is essential reading, runs to 732 pages in PDF format, so that gives you a feel for the level of detail available. At the very least it should be browsed so that you know that the information exists, even if you can't absorb it all.

The start ...

Is there a single sentient being in the western hemisphere who is not currently spewing their random thoughts at passers-by on the internet? Apart from my kids, that is? Maybe so, but it feels like I'm just about the last one to start.

And that's an oddity for someone who spends so much time on the internet. "And why, pray tell, do you find yourself with so much time on your hands?". Well, time-on-my-hands comes and goes. I currently work from home for a large contracting corporation as an independent subcontractor, and that means I have ...
  • no-one to talk about golf with for the first two hours of every day -- hey, not that I'm interested thanks very much.
  • nobody popping into my cube to distract me from whatever it is that I do.
  • long periods with a single task to accomplish.
And that task is to design, deliver and support elements of a financial and logistics data warehouse/decision support system for um ... shall we say an internationally recognised ... um ... entity.

So I'm running various enterprise-level software on what many would regard as a rather undersized platform, and that leads me to two rather important problems.

Firstly, how can I be sure that what I do on a home-built machine running Windows XP Professional is going to scale appropriately to a large unix server? Let me tell you, it ain't easy, and it takes a lot of research and understanding of some very fundamental workings of the Oracle database.

Secondly, what do I do in the "down-time" while waiting for a couple of gigs of synthetic test data to be squirted into the appropriate tables?

The solutions to these two problems, dear reader, happily intersect. I spend a lot of time reading documentation, performing experiments (of the practical and the "thought" type), and trawling through various forums and websites in search of tidbits to solve my immediate problems and to store away for later.

Long story short, I sponge up information from wherever I can get it. And instead of treating you to details of my sons' periodic episodes of misbehaviour at daycare, or the contents of my cat's stomach as left on a carpet, or what I cooked for dinner last night, this is what I'm going to serve up in this blog -- the regurgitated remains of whatever drops of knowledge and scurrilous Oracle-related gossip I've been absorbing recently. Yes, that's right -- sort of like a cat might do.

Except that the cat is not going to be writing the occasional article on Oracle Partitioning, or Parallel Query, or pointing out interesting or amusing Oracle-related forum postings to you, of course.

Well, ho hum back to work.

Enjoy! Or not. Please yourself.