Natural and Synthetic Keys
Update 27 June 2005
Reversing the order of the updates here to put the most recent at the top. Sorry about any confusion, but it seems to work better this way.
Anyway this is just an update to point out that while the ungrammatical language in the fifth point of "Synthetic Keys: Cons" does indeed show that I backed off on characterising it as always a con, this is because I was considering a quote that HJR posted by Jonathon Lewis in which he said that he would probably not consider the issue of reverse-key indexes at design time, but would monitor for wait events that indicate a possible problem later on.
For anyone inclined to interpret it as evidence of a fervent devotee of synthetic keys, [who] cannot therefore quite bring himself to admit this drawback freely and unreservedly into his list, I'd point out that I also labelled as "tentative" one-out-of-two cons of natural keys, and two-out-of-four of the pros for synthetic keys ( I sort-of labelled halk of one point as tentative, I think -- maybe that should have been two points, but they're very closely related). Not the actions of a fervent devotee, I'd suggest, or at least not one intending to deceive.
I did screw up on the buffer wait thing though. :(
Update 26 June 2005
The 3:01pm comment deleted by the blog administrator at Dizwell was mine, and said "Thanks for sharing". Seems to me that for someone who starts a forum, insults people who disagree with him, then censors their postings, Howard is remarkably prone to telling other people that they are like Don Burleson.
Witness a spirited and lively debate here.
Update: And note that the link to the Oracle Sponge is no longer on the Dizwell Blog template -- I guess that's what happens when you use someone's example against them ;) The Dizwell Informatics link will stay here though. I don't take these things personally.
Here for the record is my list of pros and cons for the natural and the synthetic key approaches ...
Natural Keys: Pros
- One less column and one less unique constraint with supporting index: benefit is in proportion to table growth rate
- Potentially fewer joins, particularly for "first-generation" child tables, leading to more simple and potentially better performing queries
- Natural key changes require more complex code, and execution of code requires more extensive processing and locking: detriment of these are in proportion to the number of tables referencing the key and the rate of key changes, and the latter is also in proportion to the number of child table rows per key value.
- (Tentative point): More temptation for composite primary keys, particularly in parent-child hierarchies, leading to larger index keys and more block splits -- alternatively keys may be non-composite by concatanation of parent keys into child keys, leading to more complexity on point 1 above. Detriment is in proportion to use and size of composites.
- Changes to natural key values are coded more simply, executed more quickly, and have less system impact.
- Primary key columns are always of the same type and are (tentatively) generally single column, potentially leading to more simple join syntax and developer education.
- (Tentatively) With smaller non-composite key values either the index size or the number of block splits is reduced.
- Potentially more joins required in queries, leading to more complex and less performant code.
- Requires an extra column with supporting index, and an extra unique constraint (all tables having a PK, the extra UK is on the natural key that is no longer the PK)
- (Tentatively) With smaller non-composite key values either the index size or the number of block splits is reduced.
- May requires extra decision-step to identify high concurrency inserts fort adoption of reverse key index, or later monitoring for excessive buffer free waits (update: Doh! as HJR points out, that ought to be "buffer busy waits". My bad.)
Think hard about the design and what natural keys there are, and allow them to change, and make them user/third-party friendly, and print them on reports and show them on the screen -- then make them unique keys and make the primary keys synthetic.
Comments as always are welcome.
Howard's blog comments are here.
25 Comments:
Howard, you're free to describe the subtleties of the application, but you chose *ahem* "characterisation" (rolls-eyes) instead -- your choice, Howard.
Tell me, am I free to comment in a neutral and constructive manner on your latest blog posting, to give the explanation for my editing of the comment? (Hint: you quote Jonathon Lewis as saying that he wouldn't bother thinking about this issue until post deployment) Perceptive job on picking that up, all the same -- I did edit it, but maybe not for the reasons you state -- sometimes a cigar is just a cigar ;)
I wonder whether you made the point in a comment on the blog instead of the forum because deleting forum postings would underline the parallels with you-know-who, but you can get away with deleting comments on a personal blog more easily -- I'm willing to consider other interpretations, of course.
Would you rather that the link pointed to the main page? Done. I don't see how it's my business that your forum contains no link back the your main page. An alternative interpretation of the Great Link Controversy, other than that I am a disingenuous person trying to appear reasonable, is that I am actually a reasonable person who does not hold the exact details of every link from their blog in their head. That you choose to dismiss this possibility is probably revealing of something in your own current state of mind.
The sad thing for me is before I went on vacation I had half of an article written on how an overly-developed professional ego is an impediment to admitting error and to learning from it. I now have to wait a respectable amount of time before publishing it, to avoid further bruising of your sensitive nature because it touches so close to the nature of your complaint -- that someone referenced a problem that you described with some work you did 15 years ago. The result: an embarrassing hissy fit.
Your comments are always welcome here, Howard. The Sponge is proud to be a no censorship zone!
Is this commentary a little over the top for a discussion of database design? So what if David tried to have the "last word". You know what 99% of the people reading that comment going to do? Move on to the next one. Sure, it doesn't add anything, but its also so small that it doesn't take anything away.
I've noticed that Howard, as creative and diligent as his database knowledge is, has quite the temper. I would suggest learning how to take things in stride and with a grain of salt.
This is the first time I visited your blog after your blog entry "So long, Wage slaves". I did not like that particular post.
http://oraclesponge.blogspot.com/2005/06/so-long-wage-slaves.html
Apart from few generic scripts, I did not find anything useful here. Hopefully, this would be my last visit to your blog.
< / reading_your_blog >
[Quote Rajesh:]
"Hopefully, this would be my last visit to your blog."
[End Quote]
What a bizarre statement. Are there strange forces at work that compel you to read this blog then?
Hope they don't attack the rest of us, I personally value being able to choose whether to read blogs or not.
Rajesh,
Did you read other articles? There are very few here that are not about Oracle.
Incidentally, for those who came here from elsewhere there's a budding discussion here
anon1,
Yes, much over the top on database design, although I think that there are many other things that could be said about natural and synthetic keys. Unfortunately the forum thread stopped being about keys and started being about "characterisation" a while ago, and never looked back.
Not my choice, and in retrospect I'd as soon not have posted anything on the forum at all about it if I'd known that it was going to cause all this, but you can't turn the clock back etc. I guess.
Can't we all just get along?
Wow I really opened up the pickle jar with what I thought was a simple question! :)
Here is the link to my blog. Comments already link to Howard's and David's.
http://thinkoracle.blogspot.com/2005/06/natural-vs-synthetic-keys.html
Ah, the simple questions are so frequently the more interesting ones though.
David Aldridge wrote:
"Think hard about the design and what natural keys there are, and allow them to change, and make them user/third-party friendly, and print them on reports and show them on the screen -- then make them unique keys and make the primary keys synthetic."
But there you are locking yourself into a design implementation with no wiggle room. My mantra (if I had one - sadly, I don't) would be "Do what the data and its use necessitate". Or simply put, whatever makes the data relate better and the application more efficient - and that won't always be the same thing all the time. :D
bill,
all designs are about compromise, when you have to chose between different options I guess. While I'd agree with "Do what the data and its use necessitate", I'd want to consider whether those doings apply to future data, which is often an unknown or uncertain "thing".
I think everyone should have at least one mantra! Maybe up to a dozen. One feature of that one I posted is that it is in one sense a compromise -- it acknowledges both the validity and utility of natural keys as well as the possibility that they'd change in future.
I think that there's an invisible paragraph at the end of every requirements document that states "The above is all subject to change now or in the future, and the design should accomodate this". In fact I've seen written requirements with a similar effect. "We don't expect that ...", "It is unlikely that...", and such like. It makes the hair on your neck stand up, that does.
Robert Vollman said...
"Wow I really opened up the pickle jar with what I thought was a simple question! :)"
I hope you're proud of yourself young man! :)
David Aldridge wrote:
"I think that there's an invisible paragraph at the end of every requirements document that states "The above is all subject to change now or in the future, and the design should accomodate this"."
I would agree, but you can't prepare for every eventuality or you suffer paralysis by analysis. I have never seen a design that was so well done that no change to requirements in the future would disturb it. I do know where you're coming from as we tend to live it every data cycle where I work, but the nature of our business dictates that we set up the data as the current regs require, and expect those regs to change every few years. Does not necessarily affect whether we use natural or synthetic keys - that is chosen based on the data requirements NOW and the expected uses of it NOW and LATER. And if we need to change it in two years, so be it. But so far we have had to add structure, not change keys. I still believe if your keys are chosen well at the beginning, even if the data requirements change the keys in all probability will not need to be redone.
David,
I meant, Freedom of Speech does not mean "insulting others"
http://oraclesponge.blogspot.com/2005/06/so-long-wage-slaves.html
Its your site, you have the choice to write whatever you want. I have my choice to read whatever i like.
Afterall, its all about freedom. And, dont complain when somebody removes a link to your site!
+++++
PEOPLE -- some make things happen, some watch things happen, and the majority has no idea what's happened
+++++
Rajesh said...
David,
Rajesh, I've read that posting a couple of dozen times. I'm afraid you'll have to enlighten us as to the patently insulting part?
What was insulting about it? I don't get it.
Rajesh,
Tom's (excellent) point aside, I'm inclined to think that insults do fall under the umbrella of "Freedom of Speech". I'm no constitutional expert, so consult professional legal help before acting on this advice, but I think that if FoS meant the freedom only to be be nice to everyone, it'd be a pretty meaningless freedom, wouldn't it?
Naturally, I don't get the "insult" bit, but if it's any comfort to you then remember that I didn't get paid while on vacation, nor when I am sick, or a family member is sick and I have to take care of them. It doesn't bother me, but like I say it may make you feel better.
Bill,
I don't find much in your posting to disagree with ... as you say at the end it is all a game of probabilities, and I think that's a theme that I have followed throughout this *ahem* debate. I'm not proposing a rule for every individual that says, "Failure to use synthetic keys will result in later disaster" -- it may result in no problems, or in some minor inconvenience, or ... well yes it may result in a disaster that could have been averted.
Now although my personal position is at one end of a spectrum and by definition is "extreme" in that sense, I'm not a fanatic on this subject -- if I worked with people who really were uncomfortable and twitchy about synthetic keys then I'd go along with that. That's partly because it would be "my fault" if ever there were any problem identified as being possibly related to synthetic keys, and partly for the satisfaction if shrieking "Told you so!" should natural key values need to be changed, but mostly because this is not as important an issue as the use of bind variables, or the use of a date format picture, or instrumentation of code, or any number of other issues.
In fact, it seems that I don't find anything in your post to disagree with - it is just our individual assessments of the risk that vary, and like so many other things that's just a matter of personal history.
David,
Thanks, and I don't necessarily disagree with you either.;-D
Like you, I'll use whatever makes sense (and hopefully, I'm not off floating in the ozone layer on that day) or whatever I'm forced to use (but not without a fight). My concern was that last part of your mantra, which seems to say "use synthetic keys all the time". Anytime someone says "always do..." it makes me think of a certain saying this smart fella I know has:
"Never say never, never say always. I always say." If I have misinterpreted that part of your post, please accept my apologies for wasting blog comment space! :p
Regards,
Bill
Oh, I don't think that there's such a thing as wasted blog space ... it's all free!
Regarding the mantra, it's a personal thing -- in fact that was part of the reason why I chose the "m" word to describe it, as it has a nice personal and religious feel to it (ie. it's not necessarily 100% rooted in rationalism and practicality, but is there for spiritual guidance!)
Oh boy, that ought to offend at least one reader. Freedom of Speech!
Who'd have thunk it - David Aldridge professing to practice shamanism! Gee, didn't someone else make a comment in some other forum regarding a certain company becoming a faith-based organization? You're not gonna start down that slippery slope are you?
:D
Hey, free speech means you get to air your point of view so long as you aren't slandering or libelling someone else. I used to HATE it when people were screaming about getting Howard Stern off the airwaves. I never much liked him, but I exercised my freedom of speech by changing the channel. Wish others would learn to do that as well (you self-indulgent non-channel-changers know who you are).
Heh heh heh. That "faith-based" thing was funny, but not my joke, alas.
David/Tom,
Guess its got something to do with my cultural background. (When its day for you guys, its already night for me!)
So, you can dis-regard my previous comments. Not a big issue anyways
Jonathon Lewis: http://www.jlcomp.demon.co.uk/
Another consideration in the choice of synthetic/natural keys is the possibility that, in the future, data from different, but possibly identically structured, databases will need to be merged. This comes up frequently in the design of packaged applications but is by no means limited to them; any entity that must merge databases has this problem.
An example: corporations A and B are competitors who use the same CRM package. The two corporations merge and you are the DBA who must merge their CRM databases.
Much of the data in the databases of A and B intersects (e.g., they have customers and contacts in common) and so may be merged. I say "may" because, depending on the organization of the new combined firm the sale staffs (which may or may not remain separate in the new firm) may wish to keep data separate or may wish to merge it.
Merging is (usually) easier if the application uses natural keys. If you are using synthetic keys, then you must carefully seek out and define candidate keys to use in merging the data, since a single customer would (probably) be represented by different synthetic keys in the two databases.
Anonymous,
I think that you're probably right -- that there are certainly circumstances where the merging of databases may favour natural over synthetic. I'm inclined to think that the point is two theoretical to carry much weight at the time when you are designing a package implementation, though. The alignment of stars required in order to get the benefit is too remote, requiring as it does the same package, configured in a compatible manner, and with the same choice being made for the key in both systems of course.
Furthermore as has been remarked elsewhere the identification of duplicates and other challenges is so high that I feel that the synthetic/natural issuer is but a drop in the oceon. Consider two companies supplying the same third party for which the choice of natural key is not a simple choice -- where the two systems create their natural keys in slightly different ways, there is essentially no difference between the keys being synthetic or natural, if they cannot be compared in a sys1.key = sys2.key fashion.
Thanks for the thoughts,
DA
Post a Comment
<< Home