.comment-link {margin-left:.6em;}

Oracle Sponge -- Now Moved To Wordpress

Please use http://oraclesponge.wordpress.com

Tuesday, June 28, 2005

Dates as Key Values: Natural or Synthetic?

Here's a question: when deciding whether to use a date datatype as a key value, why does there seem to be some consensus that it is OK to do so, even in situations where the same person would not consider using a country code, for example? Now I'm not talking about whether people should use a date datatype or should instead represent a date as a character string or number, because for me that is not an issue. If you want to store a date then you use a date datatype.

But how about a situation where you are using a date value as a key, for example in a data warehouse fact table? Is it "OK" to use the date, or should you be using a synthetic key? This has always been a bit of a philisophical conundrum for me, because I always think, "Yeah, dates are OK as key values -- no need for a synthetic there" but have not put my finger on the reason why.

But recent discussions have firmed up some thoughts on the "true nature" of a synthetic key, and I had a little epiphany on the subject in response to a question by Pando at DBASupport.com.

I stareted with two points:
  1. The difference between a natural key and a synthetic key is that a natural key has some meaning to the naked eye in isolation, and therefore can be exposed directly to the user and "generated" by users. However a synthetic key's meaning is only revealed through translation to a natural value.
  2. If you change the value of a natural key, you are generally not changing what that key value represents, eg. "Finland", you are just changing it's means of representation -- ie. some visible code value.
Now what is a date value in Oracle? It is merely some internal representation that correlates one-to-one with that particular date. Let me float the idea that a date column is actually a synthetic value, but appears to be natural because Oracle uses one of a great many possible conversions to represent it as such. So if your boss came in one morning and said, "Bad news lads, the business is changing the standard date format to be Julian", or you wanted to suddenly start representing month names in ... um ... Urdu, then just as in the case where a country changes its code we're not saying that the date itself has changed, only that its "human-friendly" representation has. You can do this pretty simply through changing NLS settings -- if you have quality code that always uses a date format to say such things as:

Where date_of_transaction <> To_Date('01-02-2005','DD-MM-YYYY') *

... then you generally don't even have to change that.

So here's my proposition: date-based keys are really synthetic, and that is why representing them with a regular sequence-generated synthetic value makes no sense -- it's the representation of them, with year-month-day or whatever, that is natural, and Oracle is effectively maintaining an internal and very flexible two-way translation table for you, accessed through the to_date() and to_char() functions.

Now one difference between dates as synthetic values and other synthetics is that every Oracle database is using the same internal representation for the same date -- in a sense the translation table is global withing the Oracle world. So when moving these raw internal values between different Oracle databases (through tablespace transport, for example) there is not the usual trouble with synthetics of needing a translation to the local system of representation. On the other hand, I'd be willing to lay a couple of $'s on the line that converting from Oracle to other RDBMS's does indeed require a translation of the internal value, possibly by converting to and from a natural representation.

* There was a formatting error in the original post that dropped the RHS of this predicate.

29 Comments:

At 11:23 AM, Blogger DaPi said...

Hi slim,

Do you really ever have a date as a key value? i.e. an identifier?

Dates are attributes of entities such as events (a sale, a delivery). But what kind of entity has a date as its identifier?


(War story: application processing a data stream uses the time as a generated key; the machine is upgraded and the application fails beacuse it can now process two data chunks in the same second.)

 
At 11:29 AM, Blogger David Aldridge said...

I think that that is an observation that applies very much to normalized OLTP systems, where a date would be an attribute of a retail transaction for example, but less so to data warehousing.

If you were building a DWh based on retail sales transactions you would almost certainly push the date of the transaction into the fact table as it would be one of the most commonly applied predicates. In a 3NF schema the cost of identifying transactions based on a date range and then joining to a table of transaction details would be far too high.

Nice story!

 
At 12:43 PM, Blogger Jeff Hunter said...

How about partitioning over time with a synthetic pk?

 
At 12:53 PM, Blogger David Aldridge said...

Um ... you mean, "Is it a good thing or a bad thing?"? And do you mean a PK?

 
At 1:56 PM, Blogger Pete Scott said...

I had a surrogate post on my blog that predated the dizwell forum. One of the points that came up was around surrogate date keys.
The date type as a natural key has a lot going for it: the optimiser knows that 31-dec-2004 and 1-jan-2005 are only a day apart (but would it know the same gap existed between 123454 and 124567 if they were the surrogate key values for the same two dates? Partitioning by date is easy (range on date) but to do this by surrogate key may require us to use list partitioning (maybe range if we know the the keys only increase) and the code to automate partitioning becomes a little more tricky

On the whole I think dates are something special (no don't talk about figs)

 
At 5:50 PM, Anonymous Anonymous said...

"31-dec-2004 and 1-jan-2005 are only a day apart" but are lots of seconds apart, and despite the datatype being called DATE, it is really a DATETIME.

"the same gap existed between 123454 and 124567 if they were the surrogate key values for the same two dates"
Yup...assuming in both cases that the stats tells the optimizer the high/low values and number of distinct keys.

Basically a date is really a number with a format mask. As such, there is no point in substituting it with another number.

My reading of whether a key is natural or synthetic is simply, was the 'value' known prior to the start of the process of creating the entity in the database. If you make up (or pick up) the value as part of the creation process, it is synthetic.
[Note : This means that it may be natural as far as your application is concerned, but synthetic from the point of view of another application]

Mostly a date would be natural, though I accept there may be the odd exception.

 
At 1:56 AM, Blogger DaPi said...

I'm still trying to get my head round this one.

slim said: So if your boss came in one morning and said, "Bad news lads, the business is changing the standard date format to be Julian"

Suppose instead he had said " . . we're going metric, no more feet and inches" or " . . . no more solar time, we're now using the siderial clock". I think those examples are closer to the problem of changing the value of a natural key.

There does seem to be something special about measuring date attributes. But why are they different from length, colour, flavour, weight?

 
At 4:27 AM, Anonymous Anonymous said...

You're confusing the representation of the thing with the thing itself. That Monday is called Monday in English, and Lundi in French (or whatever) and squiggle-dash-dot in Urdu does not mean that that particular period of 24 hours didn't exist, or didn't happen. If we establish some common reference point, we could agree which set of 24 hours elapse, regardless of what it was called.

That Oracle uses one such reference point (its epoch of midnight on Dec. 31 4173BC) and SQL Server (or whatever) uses another (possibly) doesn't negate the fact that a specific day happened. It just means we don't have a common reference point, but that is relatively easily sorted with a bit of mathematics.

So time has an existence independent of your database, and you are merely tossing up different representations of that independently-existing entity.

All of which must mean that dates (and clock times) are definitely a natural key, and it is semantically meaningless to try and redefine them as surrogate keys. Are you trying to do it so you don't have to say there are times when you use natural keys?

 
At 8:37 AM, Blogger David Aldridge said...

DaPi,

I think that anon hit the issue on the head there -- the underlying nature of a day is unchanged by whatever name you call it.

Regarding why dates are different to length, colour, flavour or weight, it I don't think that they neccessarily are -- if Oracle made up it's own completely independent unit of weight, let's call it a "kyte" and defined a new datatype of WEIGHT, then it could internally represent weights in terms of kytes and supply conversion functions like to_number(my_weight_col,'kg'), or to_weight(27,'ton'). Same principle, I think. Of course that would all be pointless really ... you can store weights in a number column in units of kilograms and easily convert to and from them because the conversion is universal.

Colour would be a bit trickier I suppose, because the range is theoretically infinite, but you could have a COLOR datatype that internally represents the shade as an RGB combo ... then have to_char(my_colour_col,'FR')='NOIR'. This is fun stuff, no?

So I suppose there is only a difference when the concept of the entity is not universal (and unchanging), as days and years are. Theoretically, Oracle could have a datatype of COUNTRY with it's own internal representation of each one and maintain conversions to natural values such as the countries name -- thinking about the practical difficulties of such a feature seems to be a revealing exercise in why it is not realistic.

Anon, the concept of a date is obviously natural, Oracle's internal representation of it probably only has meaning within Oracle. Whether that makes it more appropriate to label it as "natural" or "synthetic" doesn't matter much -- my point was to illustrate why a date stored in a date datatype in Oracle might be considered to have some of the characteristics of a synthetic, when it is conventionally held to be a natural value. As I said at the beginning of the post, it has just been somthing of a conundrum to me.

"Are you trying to do it so you don't have to say there are times when you use natural keys?"

Short answer, "no". And in fact one of my next posts will be about why natural keys might be more appropriate than synthetics in a data warehousing environment -- I think that there's a "con" to synthetics that hasn't been identified so far that is much more significant to DWh's than it is to OLTP systems.

OK, that was supposed to be the short answer, so I don't know what this makes my other answer: It doesn't matter whether I have an agenda to promote synthetics or not, because I have laid out the logical basis for my position in full, and the position stands or falls on it's own merits. It is a serious error to discount arguments for or against a position just because you think that the author has a bias or agenda on the subject. For more details see TK's review of the book "Crimes Against Logic". It is just as irrational as discounting the logic of an anonymous person because you do not not have a way of assessing their bias, no? :D

 
At 3:25 PM, Anonymous Anonymous said...

"I have laid out the logical basis for my position in full, and the position stands or falls on it's own merits."

True. You did lay out the logic underpinning of your suggestion to label dates 'synthetic', and it does indeed fall on its merits. The merits I mentioned. I didn't argue with any agenda you might have. It was just a question at the end.

 
At 7:45 AM, Blogger David Aldridge said...

"So time has an existence independent of your database, and you are merely tossing up different representations of that independently-existing entity.

All of which must mean that dates (and clock times) are definitely a natural key, and it is semantically meaningless to try and redefine them as surrogate keys.
"

I am not talking about whether dates themselves are some synthetic or natural entity here, I'm talking about the manner in which they are represented in an Oracle database. I don't see how time's existence outside of the database is relevant -- surely that just affects whether we can or can't represent it in the database, not the means by which we choose to identify a unique occurance of a time?

To draw a parallel, a person exists as an entity outside of the database, but it doesn't automatically follow that we have to identify a person with a natural key inside the database, or that however they are represented (even with a GUID) that the key is ipso facto natural.

Now the different representations of the date that you speak of, through different names and calenders say, are "natural" representations of the date, but Oracle's internal storage of the date only has that natural meaning through Oracle's mechanism of translation. Therefore the internal representation is akin to a synthetic value while the way that we view it post-translation is "natural"

 
At 2:09 PM, Anonymous Anonymous said...

PMFJI:

DATE is internally represented as a number of seconds since Midnight Dec 31 4713BC. And seconds are a natural entity.

So, the paralell with people/employees is a false one. There, you have a natural thing called "a person", to whom you assign a synthetic ID, "employee number". With dates, you have a natural entity internally assigned or described by another natural entity, seconds. There's no "mechanism of translation". Seconds aren't being converted into anything, just counted off from a known starting point.

Is time a natural entity with an ever-incrementing quality like a sequence would be? Yes. So say so. But don't start calling natural things synthetic, because that's just word play.

My $0.02

 
At 7:19 PM, Blogger Thomas Kyte said...

DATE is internally represented as a number of seconds since Midnight Dec 31 4713BC. And seconds are a natural entity.

actually, no. Excerpt from a way too long new chapter in my forthcoming book:

..
The DATE type is a fixed width 7 byte date/time datatype. It will always contain the seven attributes of the century, the year within the century, the month, the day of the month, the hour, the minute and the second. Oracle uses an internal format to represent that information - so it is not really storing 20, 05, 06, 25, 12, 01, 00 for June 25th, 2005 at 12:01:00. Using the built-in DUMP function we can see what Oracle really stores:

ops$tkyte@ORA10G> create table t ( x date );
Table created.

ops$tkyte@ORA10G> insert into t (x) values
2 ( to_date( '25-jun-2005 12:01:00',
3 'dd-mon-yyyy hh24:mi:ss' ) );
1 row created.

ops$tkyte@ORA10G> select x, dump(x,10) d from t;

X D
--------- -----------------------------------
25-JUN-05 Typ=12 Len=7: 120,105,6,25,13,2,1

The century and year bytes (the 120,105 in the dump output) are stored in an excess-100 notation. You would have to subtract 100 from them in order to determine the correct century and year. The reason for the excess-100 notation is in order to support "BC" and "AD" dates. If you subtract 100 from the century byte and get a negative number, it is a "BC" date. For example:

ops$tkyte@ORA10G> insert into t (x) values
2 ( to_date( '01-jan-4712bc',
3 'dd-mon-yyyybc hh24:mi:ss' ) );
1 row created.

ops$tkyte@ORA10G> select x, dump(x,10) d from t;

X D
--------- -----------------------------------
25-JUN-05 Typ=12 Len=7: 120,105,6,25,13,2,1
01-JAN-12 Typ=12 Len=7: 53,88,1,1,1,1,1

So, when we insert 01-JAN-4712BC, the century byte is 53 and 53-100 = -47, the century we inserted and since it is negative we know that it is a "BC" date. It also allows the dates to be naturally sortable in a binary sense. Since 4712 BC is "less than" 4710 BC, we'd like a binary representation that support that. Dumping those two dates:

ops$tkyte@ORA10G> insert into t (x) values
2 ( to_date( '01-jan-4710bc',
3 'dd-mon-yyyybc hh24:mi:ss' ) );
1 row created.

ops$tkyte@ORA10G> select x, dump(x,10) d from t;

X D
--------- -----------------------------------
25-JUN-05 Typ=12 Len=7: 120,105,6,25,13,2,1
01-JAN-12 Typ=12 Len=7: 53,88,1,1,1,1,1
01-JAN-10 Typ=12 Len=7: 53,90,1,1,1,1,1

We can see that 01-JAN-4710 BC is "larger" than the same day in 4712 BC, so they will sort and compare nicely.
.............

 
At 7:21 PM, Blogger Thomas Kyte said...

ps: not sure if the date format as it actually exists adds to or takes away from either side of this messy long discussion that I'm refusing to stick my toe into.

Just wanted to show what a date really is.

 
At 10:15 PM, Anonymous Anonymous said...


ps: not sure if the date format as it actually exists adds to or takes away from either side of this messy long discussion that I'm refusing to stick my toe into.


Insert chicken clucking sound here

 
At 8:18 AM, Blogger David Aldridge said...

I agree that a representation of a date in terms of seconds since a-very-long-time-ago would not be truly synthetic since it is deterministic, but it really might as well be, since no-one I know reads or writes date and time in that format. The only thing that saves the format from being worthless is that Oracle can internally translate it to and from natural representations. I'd suggest that the translation is no trivial matter either, and is not simply a question of adding on days, months and years, and taking into account leap years.

As Tom points out the internal representation is more complex/less readable than a simple number, and that helps to emphasise that it requires a translation to and from one of multiple readable formats to make the datatype usable. That is the parallel that I'm drawing with synthetics -- that they mean nothing by themselves and that they allow a mapping to multiple forms of natural representation.

But I'm not starting a worldwide crusade to get people to say "date datatypes are synthetic", I'm just writing my own thoughts to explain something that has been nagging at me for a while. No-one's advocating a rewrite of the text books here.

 
At 9:26 PM, Anonymous Anonymous said...

I stand corrected by Mr. Kyte regarding the mechanics of how dates are stored in Oracle, but it is really irrelevant, since my point was really 'what does a DATE *measure*'? And that is indeed a number of seconds since whenever, and the precise storage mechanism doesn't alter that.

The sequence numbers 100, 110, 120 are probably not stored internally in that format either, but as binary ones and zeros. But the base of a number doesn't stop it being a number, just as the mechanics of arriving at a date doesn't stop it being a measure of elapsed time.

The real test is: What does a DATE measure? Answer: Number of seconds since "Year Zero". What does a sequence measure? Answer:Nothing.

Two very different things, one with an independent meaning, and one without, so I am glad you do not propose re-designating DATE as a surrogate in the text books. One person confusing two different things is enough.

 
At 7:05 AM, Blogger David Aldridge said...

Well, speaking of confusing two different things, you seem to be confusing sequences and synthetic keys -- I don't think that there's any rule that says that synthetic keys have to be sequences, right? They could be GUID's of course, or even a hash of a natural key value.

But the point is that they are not readable or meaningful until they have undergone some translation process to a natural value -- conventionally via a lookup, but in the case of dates through an Oracle function to some user or system specified format. Just as a well-designed application hides synthetic key values from the user, Oracle hides the internal representation.

"What does a sequence measure? Answer:Nothing.". Just plain wrong. In the context of an application a sequence/synthetic key value does mean something -- you just have to translate it to a meaningful natural value to find out what. Just as the seven bytes that make up the internal representation mean nothing except in the context of a date column in an Oracle database. You see, they are both the same -- out of their proper contexts they mean nothing, and inside their proper conext they have natural meaning by some translation process.

 
At 7:50 AM, Blogger Thomas Kyte said...

fleagate said "what does a date measure"

Oh, but a date does not measure that, the number of seconds since anything.

A date in fact doesn't 'measure' anything. A duration or interval might, but a date is a thing we humans have convienently assigned using rules that have been adjusted and readjusted over time.

was there a year zero?

whats up with leap seconds?

what happened in 1582?

ops$tkyte@ORA9IR2> select to_date('31-oct-1582')-to_date('1-oct-1582') x from dual;

X
---------
20

Or wait, was it really 1752?

[tkyte@desktop tkyte]$ cal 9 1752
September 1752
Su Mo Tu We Th Fr Sa
1 2 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30


A date is just a pretty thing that pleases us for display purposes.

Actually, I'll stick my toe in there.

A date is a synthetic key we humans have associated with particular points in time. We need a dimension table to perform look ups to tell us things like day of week, week of year and so on.

signing out at stardate -318497

 
At 5:33 PM, Anonymous Anonymous said...

A date is not "a synthetic key we humans have associated with particular points in time". A particular date (meaning, day, month, year, hours, minutes, seconds etc as in the DATE data type) is certainly a frozen moment in time's passing. But a series of dates represents time elapsing, a natural thing.

And if you are going to use a DATE field as a key, you will have a series of dates. That series represents natural time passing, and must therefore be a natural key.

My representation of a DATE is different from a Muslim's, because our 'year zero' is different. But we are both measuring the passage of time since our respective 'year zeros'.

Sure, the labels we use "TUE, WED" and "JAN FEB" etc are artifical things, constructed to enable people to talk time meaningfully. But as someone said earlier in the thread: don't confuse the representation of the thing with the thing itself.

And Oracle is doing nothing different, either, except that it too has its own year zero. And as you pointed out the internal mechanism needed to record dates is again an artificual thing, designed for convenience. But that's the internal mechanism, not the thing itself.

A clock is a mechanism, artificial, constructed. But what it measures is natural.

Leap years and seconds are added because the passage of the Earth round the sun, and its rotation on its axis, is not perfectly regular and is affected by gravitational interaction with the moon etc. IE. Leap seconds are a consequence of natural events.

As for David: "What does a sequence measure? Answer:Nothing.". Just plain wrong. In the context of an application a sequence/synthetic key value does mean something -- you just have to translate it to a meaningful natural value to find out what.

Well, that wraps it up for my point! The sequence is meaningless, it's what it points to or leads to that is meaningful. But a date is meaningful in and of itself, and needs no translation, though it might need re-representing.

Sure, 879349872 in Oracle gets "turned into" JUL042005 or whatever. But that is no translation, merely a conversion of the storage mechanism into the particular form of representation you happen to like. A muslim would probably have their database represent that same number different. But the number is the same for us both, because it has intrinsic meaning as an offset from Oracle's 'year zero'. And both of us are interested in 879349872 in and of itself. We need help in reading it in our different ways, but it's that piece of data in and of itself that we're interested in.

Altering a representation isn't a 'translation', and certainly doesn't alter the underlying nature of the thing being represented.

But with a sequence, you use the sequence to find something else you're actually interested in. You aren't interested in the sequence itself, for it has no meaning in itself. It is only good for being a 'hook' that lets you fish something out that yu actually want to know about.

So one is based on nature. One is entirely artificial. One is interesting in its own right. One is interesting only because it links to something else. One has a reality outside a database. One doesn't.

But I can see you have a big investment in wanting to re-label dates as synthetic. It can't be comfortably for you to be using a natural key, can it?

So go on re-designating the natural world for your own purposes, since you it is obvious you have a mind to anyway. I won't bother you again.

 
At 12:44 AM, Blogger David Aldridge said...

You have missed my point by an extraordinarily wide mark ... I am not debating whether the date of your date of birth is real or synthetic, nor making any statements about the real world passage or description of dates whatsoever.

I am talking solely about the Oracle DATE datatype, which in it's internal form is so completely meaningless to a human being as to share that same property of a synthetic value. Even if it were "seconds since year zero" it would still be practically useless to the human eye. It is so meaningless that you have to take special steps to view it, and even if you query it directly then Oracle will convert it to a human-readable form according to an implicit or explicit date format.

This is an exact parallel to using an application with synthetic keys -- the designer does not allow you to view the synthetic key value, and instead it is converted to a readable form. See, I'm talking about parallels and analogies with synthetic values.

It almost seems as if you are have a big investment in characterising me as some kind of synthetic key fanatic ... fyi I use natural keys all the time in data warehouses, and if I wasn't so busy dealing with your silly attempts to mis-"characterise" my motives then I'd have more time to write the blog explaining why.

 
At 4:16 PM, Anonymous Anonymous said...

Like Fleagate, I must have misunderstood your blog title then. It reads "Dates: Natural or Synthetic". Not "Oracle DATE datatype: Natural or Synthetic".

For someone who is not a synthetic key fanatic, you are certainly playing fast and loose with words to promote something as a synthetic!

Fleagate got it right: with a DATE, you are interested in the column/data itself, and Oracle merely re-formats that data into something you choose to display. With a sequence, you aren't interested in the sequence number itself, but in the things it hooks into.

It is just wrong to say the designer "converts the sequence into readable form". He does nothing of the sort: he uses the sequence to retrieve other data, which is already in readable form. There's a lookup, sure. But no data gets converted anywhere.

But I see that anyone who disagrees with you on this topic is just "silly", according to you. And I guess anyone who agrees with you is not silly?

Nice way to hold a discussion, Mr. Sponge!

 
At 12:26 AM, Blogger David Aldridge said...

Tom,

Dammit, you're right about the title -- it was very poorly chosen, and I understand now that it contributed to some (or most) of the debate misunderstandings. I don't know if it would be proper to change it now though -- perhaps if I put a note to say why ...

My intention was that this was to do with the Oracle DATE datatype, not with dates themselves, and hopefully in that context my original posting and comments make more sense to you.

Sorry guys. Can we start again?

 
At 2:54 AM, Anonymous Anonymous said...

I don't get why it bothers you that Oracle stores the date in some illegible format.

That would make anything stored in a computer the same. I can't (well I can, but my daughter can't) read ascii but she can read the text on a computer screen. What's the difference?

 
At 3:26 AM, Anonymous Anonymous said...

If you change the value of a natural key, you are generally not changing what that key value represents, eg. "Finland", you are just changing it's means of representation -- ie. some visible code value.

I don't agree with this at all. If you had chosen "COUNTRY" as a natural key, and then chosen to represent "COUNTRY" as a name you would not be chaning the representation when:

(1) When the "Kingdom of Serbs, Croats and Slovenes" was created you'd have the choice of deleting "croatia", "serbia" and "slovenia" and creating a new record, or updating one of them to be "Yugoslavia" ( possibly "serbia" if you were observing the will of the political climate ) and deleting the others.

(2) you would then have "updated" the natural key to "Kingdom of Yugoslavia" which would've meant updating the key, but let me tell you, this would not have been "changing the representation"

(3) you would then have needed to update the key again to "Socialist Federal Republic of Yugoslavia", again not just "changing the representation"

(4) once again for the "Federal Republic of Yugoslavia" when you would've had the option of deleting the country you had under the key "Montenegro" and recreating several of the countires you deleted earlier - and a couple of others, "Slovenia" "Croatia" "Macedonia" and "Bosnia Herzegovina"

(5) Quite what you'd do when you needed to account for "Serbia and Montenegro" I don't know

(6) Again you'd create a "new" country "The Former Yugoslav Republic of Macedonia" which you'd have to consider doing something spectacular with since in parts of the world this is known as the distinct entity "Macedonia" with different borders.

Yes, you change the representation, but have also changed the meaning.

Secondly:

internal representation that correlates one-to-one with that particular date

is not true either since a date is only meaningful with a timezone.

Sales made on "05-JUL-2005" may be made on one of three different "business days" according to where you consider the sale to be made which is not an intransitive fact.

 
At 7:55 AM, Blogger David Aldridge said...

cspierings,

I wouldn't say that I'm bothered by it's internal illegibility, I just think that when you combine the need for some conversion process with the flexibility of that conversion then it is to some extent analagous to the way that synthetic keys function in an application.

 
At 8:12 AM, Blogger David Aldridge said...

anonymous,

In the examples that you list for the way in which countries change, I think that you are showing how the entity itself is changing -- countries splitting into multiple new countries, or multiple countries merging into a single country. So in those cases you're right -- the issue of how you represent the new entity(s) is really a secondary problem because the fundamental nature of the entities has changed.

However when I talked about the representation for a country changing I was not referring to such cases, it was more akin to "Burma" changing it's name to "Myanmar", or to changes in the international country code for "Finland" - same entity, just a different name or code ("representation") for it.

I note your point on timezones, but I feel that the phrase that you quote holds up because neither the internal representation of a DATE datatype nor the string representation of it contain timezone information, unless it is inferred from business rules and other data of course (eg. "We always use the local date, with the location being identified by the country_code", or "We always store dates as GMT and convert in the application logic to local timezones"). This is an obvious weakness of DATE that has been addressed with the TIMESTAMP WITH TIME ZONE datatype. So while DATE has a weakness in this respect it is one that is shared with its string representation (in the absence of external data or metadata) and the string representation is just as meaningful as the internal representation is.

 
At 4:39 PM, Anonymous Anonymous said...

But Burma, under British colonial rule was administered as part of India, it is the same entity as Burma, Independent Commonwealth State after 1947ish. The changes represented by a country changing name, such as in the case you quote invariably denote some meaningful change in the entity. Myanmar has changed its name largely as a marketing stunt, but it's borders have shifted and it's currency is altered. Your point seemed to be that in cases where you have a candidate key that requires an update, the update is purely a matter of changing how you represent an entity which has remaining static. I challenge that this is true.

 
At 4:59 PM, Blogger David Aldridge said...

"Your point seemed to be that in cases where you have a candidate key that requires an update, the update is purely a matter of changing how you represent an entity which has remaining static. I challenge that this is true."

Are you challenging whether it is ever true, or just that in some cases it is not true? Because it obviously is sometimes true. People change their names, companies change their names, telephone area codes change, our representation of city names changes ("Peking" becomes "Beijing", for example), need I go on?

 

Post a Comment

<< Home