The Oracle Sponge Blog -- An Oracle Data Warehousing Resource

Wednesday, April 27, 2005

Comments (the type you get in code, that is)

"Supposed" to get, that is. What is it with SQL and comments? Why do we so rarely see explanatory comments in SQL code? Or even a decent structure?

One of the features of front-end tools like Business Objects or Microstrategy is their ability to allow the designer to create extraordinarily long and complex looking SQL with ease. (If you don't use such tools then don't go away -- I'll deal with you in a moment). It starts by having an object or two that are pretty simple, then you add them together ... then you divide by another two objects added together, but then you need a decode on the divisor to change zero's to null's so that the SQL doesn't error out, and pretty soon you've got an expression with a length of 1000 characters that is unreadable to the human eye.

Of course it's not just extremely long expressions that are tricky ... sometimes it's to look at a piece of SQL even to find out which simple expression relates to which object in the universe/project/whatever. So do me a favour and put a comment block in the definitions ... something like ...

Sum(order_amt)/Decode(Sum(order_qty),0,Null,Sum(order_qty)) /* "Average Unit Cost" */

... and it'll appear in the SQL itself. Not too tricky, eh?

You know, I do have a method for doing this automatically by hacking a Business Objects repository with an update statement, but if I told you how to do that and you fkd it up then you'd probably sue me or something.

By the way, the same thing applies to join definitions.

Moving along ... if you're hand-coding SQL statement then you have even less excuse for not commenting your SQL. If you follow the principle that it is better to do things in SQL than PL/SQL then you're probably packing a fair amount of functionality into each statement, and that means ...

Write a header to say what the hell the SQL is supposed to do ... doesn't have to be in the SQL itself, just above it will do nicely. In fact, work out this statement even before you write the SQL itself and you'll keep a more clear mind when you are writing the code. Include in that header how it is going to do it -- remember that complex SQL packs a huge amount of functionality into a small amount of text -- it's much more "logically dense" than regular 3GL code. TK has an example of this here, and thanks for the suggestion to include this, Tom.
Maybe you might even like to put in a comment about performance, eh? "Runs in about 0.2 seconds" ... "Uses a lot of hash joins" ... etc
Put a statement inside the SQL to say where in the code it is located (package/procedure). You wouldn't inflict a twenty-page procedure on anyone so it'll be easy to find.
You might even like to put in a unique identifier for the SQL as well as it's location (for example, so that it's easy to reference the copy of a "good" explain plan that you've kept for that query, in case of problems down the road).
Format the code nicely
NEW ITEM: if you are referencing any view more complex than "SELECT blah from whatever", or if any of the tables are accessed remotely through a db link that is obscured by a synonym, or is actually the query name for a subquery factoring clause, then a comment would be appropriate. And when I say "appropriate" I mean just do it.
Clearly seperate the joins from the filters
Any tricky looking expressions, state what they do. Especially analytic functions and subquery factors ("what they?")
Document anything that looks funky (if you are using "... and rownum > 1" in an in-line view to stop the optimizer from pushing predicates then say so ... not everyone is as smart as you are).
(Except Tom Kyte ("who he?"), who tells me that he's perfect. My opinion: man, has he aged since his photo in "Expert One On One". Am I right, ladies?).
Erm ...
For the love of God, use table aliases. Friendly ones, not a, b, and c.
Just use your common sense. People have to support that code
That's it

Back to work, fellas.

6 Comments:

At 10:41 AM, Anonymous said...: Hey Dave,

Sorry, something off topic: Do you think you could publish your blog for "feed" users? I use feedreader for a few oracle blogs (e.g.Mark Rittman, Niall Litchfield). I think you can use feedburner with the blogger site. I'd love to have your blog every morning with a cup of coffee:)

Thank you,
Thomas from COS
At 7:02 PM, Thomas Kyte said...: who tells me that he's perfect.

quoting me out of context I tell you!

Do i really look that much older than in my Expert one on one Picture?

I'll add to your list -- comment HOW the sql query does it's thing. consider a SQL query to be a subroutine (for younger folk, that is "method" not subroutine). Just like we used to put a block header on top of a subroutine, do the same for your sql that is "complex". Else you won't know what it is doing next week.
At 8:27 PM, David Aldridge said...: I've mislaid my copy of "Expert ..." at the moment ... probably forced it onto someone in the hope of banging some sense into their head, so that's the last I'll hear of that. That's my recollection though ... a black and white photo wasn't it, like on Technet ... maybe it's the b&w/colour thing.

The "how it does it" thing with the SQL sounds tempting, but I have a niggling doubt that some purist might slap me for suggesting that SQL has a "how" ... it being 4GL it's supposed to be "what I want" rather than "how to get it". Not that I imagine many RDBMS purists are reading my blog, mind. They're not exactly my target audience. Maybe I'll get a quote on Fabian's "Database Debunkings" site if they are though.

* pause to deal with 5 year old tantrum *

I wouldn't like to give the impression that I have a very clear idea of who my target audience is, mind you.

However it's probably also true that the more SQL-the-language develops, the more ways there are of writing the "what I want" statement (analytical functions etc), and the 4GLness gets less relevant.

Are you thinking of such statements as "This query gets the most recently paid invoice for a customer by using an analytic function to return the invoice number attached to the latest payment" or something more detailed, stating what order you're expecting tables to be joined in etc.?
At 5:20 AM, Thomas Kyte said...: I'm thinking -- tell me what this SQL is to do -- just like you used to tell me what a subroutine was to do. My favorite example, I'm given some PLSQL to tune. No comments, nothing. One of the longest running bits goes like this:

insert into t ( c1, c2, .... )
select c1, c2, ....
from t1, t2, t3, t4, ....
where ....;

loop
delete from t
where (c1,c2) in
( select c1, min(c2)
from t
group by c1
having count(1) > 1 );

exit when sql%rowcount = 0;
end loop;

Ok, big "insert as select". It took long.

Then a big delete. If the delete removed at least one row -- do it again.

What does that delete do? Sat and stared at it for a while. Got it, the delete:

remove rows with the oldest C2 (c2 was a date) value by C1 if C1 has multiple rows.

Additionally, if all of the rows for the oldest C1 hasve the same C2, these rows are to be removed as well. That is, given:

C1 C2
---- -----
1 01-jan-2005
1 02-jan-2005
2 01-jan-2005
2 01-jan-2005
2 01-jan-2005

it would leave 1 row

C1 C2
---- -----
1 02-jan-2005

Well, now I know what they want -- "take this table with history records and get me the current one. also, if the most current record is duplicated -- ignore it"

Interesting thing -- after I explained to them what the code did they said "no it doesn't, it gets the most current". I said Oh yes it does.... anyway -- a small single user test case later with 2 rows and they were convinced.

So once I had the question "in english":

insert into t ( c1, c2, .... )
select c1, c2, ......
from
( select c1, c2, .... ,
max(c2)
OVER ( partition by c1 )
max_c2,
count(c2)
OVER ( partition by c1,c2 )
cnt
from t1, t2, t3, t4, ....
where .... )
where c2 = max_c2
and cnt = 1;

All we needed to do was remove the cnt column and the predicate on cnt and we got the most current record from the history table by C1 -- in a single pass, in as much time as it took to do the initial select -- the deletes -- gone (and they ran for a long time).

But we are left with a query I know, but someone coming along later might not. Need to set up a comment block that says "the goal of this insert is to ......... it does it by assigning each row in the inline view the max date by C1, then we filter out and keep the rows that have that max date...."
At 1:39 PM, David Aldridge said...: Excellent example -- I spent a little time trawling for such examples round the forums, but not a lot -- if anyone came across good illustrations of publicly posted non-commented SQL I'd be interested in getting a link to them.

Thanks -- article updated.
At 6:28 PM, Thomas Kyte said...: In fact, work out this statement even before you write the SQL itself and you'll keep a more clear mind when you are writing the code

whoa... big flashback.

I'm 22 again, just fresh out of college. "Ok, you are not going to write code today (big disappointment), today you are going to say what the code you write tomorrow will do"

Say it in english (or whatever language you speak -- that matters not, you say it in your language, not computer speak).

Write it down.

We called it "specs" in the past.

Beats "java doc" to death. Say in plain words what the next bit of code is going to actually do...

Oracle Sponge -- Now Moved To Wordpress

Wednesday, April 27, 2005

Comments (the type you get in code, that is)

6 Comments:

About Me

Previous Posts