
Thursday, May 05, 2005

Mutating Tables and Hash Clusters

Here's an example of a problem that is probably caused by too much denormalization. In many cases designers are trying to get fast access to an aggregated number by storing it in a separate table -- from my own observation it's usually a SUM(), but here we have a MAX(). Same principle.

Often an attempt is made to use a trigger to maintain a higher-level table by reading the table on which the trigger acts. Here are some issues with this.
  • Obviously there is a mutating table condition here, which requires complex code to work around (there's a sketch of the doomed pattern just after this list). In this case the requirement could actually be fulfilled by just querying the aggregate table, of course, to see whether the new version number is higher than the one already stored there.
  • It is not very obvious to a reader of the application code how the higher level aggregate is maintained, because the code that performs the maintenance is nowhere near the code that modifies the triggered table. So once you have established the practice of using triggers as part of an application you have instantly made the maintenance job more difficult.
  • To get an accurate value stored in the aggregate table you must serialise the application on the update operation.
  • Oh, there's just so many more. I don't like triggers for application code.
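
To make that first point concrete, here's a minimal sketch of the doomed pattern -- a row-level trigger that reads its own table to maintain a MAX() stored in a parent table. The document tables are hypothetical, invented for illustration:

CREATE OR REPLACE TRIGGER trg_latest_version
AFTER INSERT OR UPDATE ON document_versions
FOR EACH ROW
DECLARE
  v_max document_versions.version_no%TYPE;
BEGIN
  -- Reading the triggering table from a row-level trigger raises
  -- ORA-04091 "table ... is mutating" for updates, deletes and
  -- multi-row inserts (single-row INSERT ... VALUES is the one
  -- exception Oracle makes).
  SELECT MAX(version_no)
    INTO v_max
    FROM document_versions
   WHERE doc_id = :NEW.doc_id;

  UPDATE documents
     SET latest_version = v_max
   WHERE doc_id = :NEW.doc_id;
END;
/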
If the aim is to be able to quickly work out what the aggregate value is for a particular key or set of key values, then scanning the detail records may actually be a fast option. An index range scan is likely to be used (providing that the appropriate indexing is in place, of course) and the "trick" to good performance is to ensure that the rows for the detail records are co-located in the table. And how do we ensure such a thing? Well, typically with a hash cluster. In fact, if the detail table is often subjected to range scans on the same key value (as might be the case with an invoice detail table, or an employee's pay records, to take two examples), then a hash cluster may be beneficial in many other queries.
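
If you want something to start from, here's a minimal sketch -- the invoice tables are hypothetical, and SIZE and HASHKEYS would have to be chosen from your own data volumes:

CREATE CLUSTER invoice_detail_cluster (invoice_id NUMBER(10))
  SIZE 1024        -- space reserved per cluster key value, in bytes
  HASHKEYS 100000; -- approximate number of distinct key values

CREATE TABLE invoice_details
  (invoice_id  NUMBER(10),
   line_no     NUMBER(4),
   amount      NUMBER(12,2))
  CLUSTER invoice_detail_cluster (invoice_id);

-- All the rows for one invoice hash to the same block(s), so an
-- aggregate over a single key touches very few blocks, and the
-- cluster key access needs no index at all:
SELECT MAX(line_no) FROM invoice_details WHERE invoice_id = :id;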

You'd be a fool to go taking my word* for this of course, so if this is all jibba-jabba to you then go and take half-an-hour to read the relevant sections of the Concepts Guide (link in side panel), then go play around with the structure. Pay attention to the possible negative impacts of such a beast also -- remember, if hash clusters were 100% advantageous and 0% disadvantageous then every table would have one. Evidently, that is not the case.


* To be fair to me, I should say "don't take anyone's word for it". :)

7 Comments:

At 6:35 PM, Blogger Thomas Kyte said...

would anyone care to point out to the poor souls that with multi-versioning and read consistency -- the proposed approaches are doomed?

take the trigger edward stoever pointed to and run it in a multi-user situation.

That is the classic example of *why the constraint* exists in the first place :)

get the number of "employees on project" in his example to 3...

then in two sessions -- assign a separate employee in each to a project.

Whoops.
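
picture it like this -- a hypothetical replay, assuming the trigger counts rows to enforce a limit of four per project:

-- session A:
INSERT INTO employee_on_project (ep_emp_id, ep_proj_id, ep_hourly_rate, ep_mgr_emp_id)
VALUES (200, 1, 100, 50); -- trigger counts 3 committed rows + this one = 4: passes

-- session B, before A commits:
INSERT INTO employee_on_project (ep_emp_id, ep_proj_id, ep_hourly_rate, ep_mgr_emp_id)
VALUES (201, 1, 100, 50); -- also counts 3 + 1 = 4, because A's
                          -- uncommitted row is invisible to B

-- session A: COMMIT;   session B: COMMIT;
-- project 1 now has 5 rows, and both checks "passed".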


can someone please post a big warning to that thread "careful with that axe eugene, watch this:

ops$tkyte@ORA9IR2> delete from employee_on_project where ep_proj_id = 114;

1 row deleted.

ops$tkyte@ORA9IR2> commit;

Commit complete.

ops$tkyte@ORA9IR2> INSERT INTO employee_on_project (ep_emp_id, ep_proj_id, ep_hourly_rate, ep_mgr_emp_id ) VALUES (100, 666, 100, 50);

1 row created.

ops$tkyte@ORA9IR2> declare
2 pragma autonomous_transaction;
3 begin
4 INSERT INTO employee_on_project (ep_emp_id, ep_proj_id, ep_hourly_rate, ep_mgr_emp_id ) VALUES (100, 999, 100, 50);
5 commit;
6 end;
7 /

PL/SQL procedure successfully completed.

ops$tkyte@ORA9IR2> commit;

Commit complete.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select ep_emp_id, count(*) from employee_on_project group by ep_emp_id;

 EP_EMP_ID   COUNT(*)
---------- ----------
       100          5
       110          1


You have to understand it before you can use it safely (hey I know, I have a chainsaw and a very torn pair of pants and a neat very straight scar :)

 
At 6:40 PM, Blogger Thomas Kyte said...

Let me just add a "rule not of thumb but of reality" of mine:

It is extremely hard, if not impossible, to enforce integrity constraints that

a) cross ROWS in a table
b) cross OBJECTS in a database

without using the lock table command.


The rule Edward Stoever was trying to enforce above could have been done with an ON COMMIT materialized view and a check constraint -- or lock table.
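
something along these lines (a sketch only, assuming a rule of "at most three employees per project"):

CREATE MATERIALIZED VIEW LOG ON employee_on_project
  WITH ROWID, SEQUENCE (ep_proj_id)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW emp_per_proj_mv
  REFRESH FAST ON COMMIT
  AS SELECT ep_proj_id, COUNT(*) cnt
       FROM employee_on_project
      GROUP BY ep_proj_id;

ALTER TABLE emp_per_proj_mv
  ADD CONSTRAINT max_three_emps_per_proj CHECK (cnt <= 3);

-- the refresh happens at COMMIT, so a transaction that pushes a
-- project past three rows fails at commit time, no matter which
-- sessions contributed the rows.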

 
At 9:20 PM, Anonymous Anonymous said...

I am glad to have my name mentioned by people I respect such as David and Tom. Thanks guys.

I had a few extra minutes one afternoon and typed up the code that I posted there. I did not have time to test it in a multi-user scenario, and quite honestly I didn't even think about it, but I should have done so!

My favorite thing about Oracle is that there is always something new to learn and always some new angle on an old idea!

Edward
;-)

 
At 10:42 PM, Blogger David Aldridge said...

Ed,

You know, the important thing is that you took part and you showed your method, so even though there is a fault with it in a multi-user environment, that fault was exposed through peer review in an open and public arena (I won't say "forum" there for obvious reasons), and people learn from it. Thus, the Oracle community-at-large is strengthened by one more notch!

So if you feel free to post a correction, or a reference to one, then it serves as an example to others also. ;)

Cheers mate, and keep up the good work.

 
At 10:52 PM, Blogger David Aldridge said...

Ajay,

Yeah, sorry mate, no check constraints on a view.

Amusingly enough the SQL Reference does say "However, you can define the view using the WITH CHECK OPTION clause, which is equivalent to specifying a check constraint for the view," but that's a little deceptive because it only (OK, as far as I know it can only) applies to DML issued against the view, not against the underlying table.
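
A quick sketch of what I mean, with a throwaway table:

CREATE TABLE t (x NUMBER);
CREATE VIEW v AS SELECT x FROM t WHERE x < 10 WITH CHECK OPTION;

INSERT INTO v VALUES (42); -- fails: ORA-01402 view WITH CHECK
                           -- OPTION where-clause violation
INSERT INTO t VALUES (42); -- succeeds: the base table knows
                           -- nothing of the view's predicate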

Unless anyone else knows different ...

 
At 2:11 AM, Blogger Pete Scott said...

You guys always post things when I am asleep! I would have posted this before Tom ;-)

I hate triggers (well, maybe not Roy Rogers' horse -- I don't want to be called an "evil horse hater") and do what I can to avoid them. If I were doing this I would create an on-commit refreshed MV (perhaps using the analytic max() over... function if my summary table needs to hold multiple maxima). This might not be appropriate if the driving table is constantly being updated.

 
At 4:54 AM, Blogger Thomas Kyte said...

Ed --

Just wanted to let you know that this particular case happens to be a pet peeve of mine. No offense intended (one of my favorite Oracle features is multi-versioning and read consistency).

I am of the mind that triggers are evil nowadays. Too many accidental side effects. SQLServer (Sybase originally) corrupted everyone with the concept: when you haven't many features but you do have triggers, everything becomes a trigger.

I prefer the more straightforward "linear" approach -- but it is true that they too have to be considered in depth when the integrity constraint crosses ROWS or crosses OBJECTS. They work well "on a row" but beyond that, you have to be careful (in sqlserver, in oracle, in whatever).

Virtually every integrity constraint I've seen in the last 10 years that tries to do this (in an application, in a stored procedure, in a trigger) has been done wrong -- going across rows or across objects without some level of manual locking is hard.


hope I didn't come off sounding too strong above... didn't mean to.

 
