The Oracle Sponge Blog -- An Oracle Data Warehousing Resource

Wednesday, August 31, 2005

Partitions With And Without Rows

Just a quick note on a SQL technique I stumbled on that I thought I'd share.

Having a partitioned fact table, with some partitions populated and others not, I have a series of materialized views that have not necessarily been entirely maintained. In this situation the materialized views are all partitioned with exactly the same scheme as the fact table but some of the partitions that have rows in the fact table do not have rows in the materialized views.

I needed a query to rapidly identify those cases so that the offending MV partitions could be maintained, and I came up with a requirement as follows: Identify all the partitions of the fact table that have at least one row, and for which there are no rows in the equivalent partition of a particular materialized view.

Because I had a usable naming convention for my partitions I was able to translate a partition name into a predicate very easily. for example given a partition name "Y2005M01" I can translate this into '01-jan-2005' using the simple method to_date(partition_name,'"Y"YYYY"M"MM'). You can try this yourself with:
select to_char(sysdate,'"Y"YYYY"M"MM') from dual;
...and then ...
select to_date(to_char(sysdate,'"Y"YYYY"M"MM'),'"Y"YYYY"M"MM') from dual;

So this allowed a query of the form:

Select partition_name
From user_tab_partitions utp
Where table_name = 'MY_FCT_TABLE' And
Exists
(Select /*+ norewrite */ 1
From MY_FCT_TABLE
Where date_of_month =
to_date(utp.partition_name,
'"Y"YYYY"M"MM')) And
Not Exists
(Select /*+ norewrite */ 1
From MY_FCT_TABLE_MV
Where date_of_month =
to_date(utp.partition_name,
'"Y"YYYY"M"MM'));

For a table having 175 partitions this identified the half-dozen or so problem partitions in around 3 seconds. Not too shabby, I thought.

Tuesday, August 30, 2005

An Easier Explanation

The DBMS_XPLAN supplied package gives results that are so much more usable than the old "set autotrace traceonly explain", or any manual method of reading the plan_table, that the only thing holding me back from using it more often was the lengthy syntax.

No more: I have created views to save my aching fingers. Feel free to use them, modify them, whatever for your own purposes -- just don't blame me if they break something ;)

create or replace view xplan_basic
as
select * from table(dbms_xplan.display('plan_table',null,'BASIC'));*

create or replace view xplan_typical
as
select * from table(dbms_xplan.display('plan_table',null,'TYPICAL'));

create or replace view xplan_parallel
as
select * from table(dbms_xplan.display('plan_table',null,'ALL'));

create or replace view xplan_serial
as
select * from table(dbms_xplan.display('plan_table',null,'SERIAL'));

I am now a mere "select * from xplan_parallel" away from my execution plans.

* see comment from Pete ... I missed the "BASIC" parameter value in the original post.

Partition Pruning and Dimension Tables

As a side note to a previous blog, Peter Scott kindly humoured me by checking on a partition pruning scenario that I had been unable to simulate. Specifically, the scenario is of partition pruning in a fact table based on a predicate that references a higher dimensional level through a joined dimension table.

Well, I found the metalink note that describes this method of optimization, and it's number 209070.1, with the generously proportioned title: "Partition Pruning based on Joins to Partitioning Criteria Stored in Dimension Tables". In contrast to the Most Useless Metalink Article Ever, this one is actually extremely useful, and aside from skipping details of a couple of hidden parameters it is very detailed in its description of the internal mechanism by which the optimizer implements "subquery pruning". Maybe I'll name it "Most Useful Metalink Article Ever". Nah, maybe not.

What also aroused my interest was the article's mention of the TBL$OR$IDX$PART$NUM() function, which I'd previously only encountered in trace files for partition exchanges and an article by Jonathan Lewis. I guess that this is about as documented as this function gets.

Another feature, interesting by it's omission, is that foreign keys do not get a mention either in this article or the associated 179518.1: "Partition Pruning and Joins". I'll have to see if my tests require a foreign key or not -- Peter's own test used foreign key constraints in the disabled/rely state, and I'm guessing that they're not considered.

Also worthy of note are the conditions under which subquery pruning will be invoked: "The cost of the recursive subquery must not exceed 5% of the cost of accessing all data in the partitioned fact table and the predicates on the dimension table must select less than 50% of the data in the dimension table." The former appears to be controlled by the hidden parameter _subquery_pruning_cost_factor and the latter by _subquery_pruning_reduction, which have default values of 20 (1/0.05 of course) and 50 respectively on my 10.1.0.3.0 database. Rather circumstantial evidence, but supported by the note's advice:
"As long as there is a valid join on the partition key then pruning can be forced by setting :
_subquery_pruning_cost_factor =1
and
_subquery_pruning_reduction =100.
This effectively overrides the default choice made by the optimizer."

The recursive subquery referred to above appears to be the one by which the required partitions are identified: that involving the TBL$OR$IDX$PART$NUM() function. Since this is a query against a dimension table it is difficult toimagine that it could reach 5% of the total query cost under real world conditions, but it's another issue to bear in mind. I speculatively wonder whether it would be possible to create a materialized view to help return this result faster, or whether the nature of the function or the use of recursive SQL would prohibit it's creation or use? Another thing to think about later.

Anyway, a little more work could nail these issues down ... alas, tempus fugit etc.

Finally there is another hidden parameter (revealed through the use of JL's script here) to consider: _subquery_pruning_mv_enabled, with a default value of "false" and a description of "enable the use of subquery predicates with MVs to perform pruning". This seems to me to be a critical parameter. Why it is set to false by default is not immediately clear to me but I'll be keeping this parameter in mind as it appears that it could relate to query rewrite which is an important part of the database design I am working on. Worse case scenario: raising an iTar to get the thumbs up from Oracle Support to make a change to a hidden parameter. Not a bad thing in itself, as they generally like you to jump through a few hoops in justifying such a course of action.

Well, it's been an interesting and educational week so far, that's for sure.

For another resource on this issue see this thread at AskTom.

Friday, August 26, 2005

Doings With Dimensions

An interesting blog over at Peter Scott's place -- BYOB.

Is there anyone out there not on vacation or travel?

So, Howard Rogers is in New Zealand, Tom Kyte is in Iceland (and heading to the rest of Europe), Jonathan Lewis was in Norway yesterday and is in Lithuania next week, and everyone from Burleson Consulting is in an undisclosed location -- I don't know where it is but it sure ain't North Carolina.

I get two nights with the family in a travel trailer up in Buena Vista (pronounced "Byoona Vista" by the locals, it seems). Oh, I'll be in Dayton, Ohio for the Air Force Marathon next month (no, not me -- the wife is running the half-marathon).

What exactly am I doing wrong?

The Linux Experience: What's Missing

It's been a week or two since I dedicated my laptop entirely to Whitebox Linux, and so far I have few complaints.

There are some applications that I rely on that are either not available on Linux, or for which I don't have a spare $100,000 for the license. In the former category is UltraEdit. It's one of those applications that seem pretty simple to start with, and then you find one key feature that you can't do without. For me that feature is column-mode editing.

Long-story-short, column-mode editing allows you to select, copy and paste a rectangular block of text. It also allows you to select a single column, start typing, and have whatever you type echoed on every line selected.

So for example if you want to create a view over a table with fifty columns and you want to wrap every column in an expression such as:
Decode(col1,' ','N/A',col1) col1,
...then you;

DESC the table
paste the result into UltraEdit
use column mode to select and copy the rectangle that holds the column names
paste the result over to the right, twice

... and this gives you your list of column names three times, looking something like;

col1 col1 col1
col2 col2 col2
col3 col2 col3

You then use column mode to select the gap to the left of the leftmost column of column names (nothing confusing about this example!) and type "Decode(". This gives you:

Decode(col1 col1 col1
Decode(col2 col2 col2
Decode(col3 col3 col3

Then you select the gap to the left of the second column of column names, and type ",' ','N/A'," giving you:

Decode(col1,' ','N/A',col1 col1
Decode(col2,' ','N/A',col2 col2
Decode(col3,' ','N/A',col3 col3

Well, you see how this works anyway. Once you get into the swing of things this is very fast and pretty soon you have:

Decode(col1,' ','N/A',col1) col1,
Decode(col2,' ','N/A',col2) col2,
Decode(col3,' ','N/A',col3) col3,

Now you could do the same thing by querying user_tab_columns, but with a complex or lengthy expression I much prefer this way. For example in:

Decode(LTrim(col1,'0'),'}',0,'A',1,'B',2,zoned_to_number(col1)) col1,

... I just can't keep track of all of those single quotes that have to be doubled-up.

It also will search and replace on special characters such as carriage returns very easily, so you can remove occurances of blank lines by replacing "^p^p" with "^p" until they're all gone for example, or get a line break after each comma by replacing commas with ",^p".

Not ... available ... on ... Linux.

Ouch.

Still, if you're on Windows then I recommend it. For $40 I'd give a copy to every DBA and developer (only if they're on Windows, of course. It'd be a bit of a waste otherwise.)

Tuesday, August 23, 2005

Word Verification for Comments

Following a tip by Gabe over at TK's blog, I've turned on word verification for comments to stop the spambots.

I don't know, I'll kind of miss them in a way ...

Monday, August 22, 2005

Execution Plans for "Partition Not-Quite-Pruning"

This is a followup to this blog, which itself was a followup to this one.

Table Structure
The table here is called FCT_FSA_MSD_GSD and is composite partitioned on file_load_month and fund_cd. There are about 60 partitions on file_load_month and three subpartitions per partition, of which one holds only fund_cd values of '6C'. A fiscal month is not directly represented in the table, and is defined with a range of values for the date_of_day column.

Indexes
xie06fct_fsa_msd_gsd: single column bitmap index on FILE_LOAD_MONTH
xie15fct_fsa_msd_gsd: single column bitmap index on DATE_OF_DAY
xie28fct_fsa_msd_gsd: single column bitmap index on FUND_CD

Explain Plans
I've edited the plans just to take out the parallelism references that were mucking up the format, and joined lines together where they broke.

Here's the old form of the query.

SQL> select count(*), sum(fiscal_amt) from fct_fsa_msd_gsd
2  where date_of_day between '01-jan-2004' and '31-jan-2004' and
3  fund_cd = '6C'
4  /

COUNT(*) SUM(FISCAL_AMT)
---------- ---------------
6469727      3521311527

Elapsed: 00:00:41.01

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=22964 Card=1 Bytes=16)
1    0   SORT (AGGREGATE)
2    1     SORT* (AGGREGATE)
3    2       PARTITION RANGE* (ALL)
4    3         TABLE ACCESS* (BY LOCAL INDEX ROWID) OF 'FCT_FSA_MSD_GSD' (Cost=22964 Card=940684 Bytes=15050944)
5    4           BITMAP CONVERSION* (TO ROWIDS)
6    5             BITMAP AND*
7    6               BITMAP MERGE*
8    7                 BITMAP INDEX* (RANGE SCAN) OF 'XIE15FCT_FSA_MSD_GSD'
9    6               BITMAP INDEX* (SINGLE VALUE) OF 'XIE28FCT_FSA_MSD_GSD'


Statistics
----------------------------------------------------------
     20  recursive calls
      3  db block gets
  36489  consistent gets
  36360  physical reads
   1016  redo size
    258  bytes sent via SQL*Net to client
    274  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
     70  sorts (memory)
      0  sorts (disk)

      1  rows processed

Commentary on above: A bitmap merge combines
the fund code and date of day predicates,
then every partition of the table is accessed by
rowid.

Here's the first part of the improved query

SQL> select count(*), sum(fiscal_amt) from fct_fsa_msd_gsd
2  where date_of_day between '01-jan-2004' and '31-jan-2004' and
3  fund_cd = '6C' and
4  file_load_month = '01-jan-2004'
5  /

COUNT(*) SUM(FISCAL_AMT)
---------- ---------------
6469717      3521307863

Elapsed: 00:00:12.02

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=197 Card=1 Bytes=25)
1    0   SORT (AGGREGATE)
2    1     SORT* (AGGREGATE)
3    2       TABLE ACCESS* (FULL) OF 'FCT_FSA_MSD_GSD' (Cost=197 Card=3375101 Bytes=84377525)

Statistics
----------------------------------------------------------
    149  recursive calls
      3  db block gets
  37043  consistent gets
  36651  physical reads
    936  redo size
    244  bytes sent via SQL*Net to client
    275  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      3  sorts (memory)
      0  sorts (disk)
      1  rows processed

Commentary on above: This is 9.2.0.6 and the explain plan is not
showing that partition pruning has taken place.
However from the consistent gets and physical reads is evidently has.
Looks like the subpartition stats are off as well. Tut.


Here's the second part of the improved query

SQL> select count(*),sum(fiscal_amt) from fct_fsa_msd_gsd
2  where date_of_day between '01-jan-2004' and '31-jan-2004' and
3  fund_cd = '6C' and
4  file_load_month != '01-jan-2004'
5  /

COUNT(*) SUM(FISCAL_AMT)
---------- ---------------
    10          3663.2

Elapsed: 00:00:04.00

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=22432 Card=1 Bytes=24)
1    0   SORT (AGGREGATE)
2    1     SORT* (AGGREGATE)
3    2       PARTITION RANGE* (ALL)
4    3         TABLE ACCESS* (BY LOCAL INDEX ROWID) OF 'FCT_FSA_MSD_GSD' (Cost=22432 Card=919682 Bytes=22072368)
5    4           BITMAP CONVERSION* (TO ROWIDS)
6    5             BITMAP AND*
7    6               BITMAP MINUS*
8    7                 BITMAP MERGE*
9    8                   BITMAP INDEX* (RANGE SCAN) OF 'XIE15FCT_FSA_MSD_GSD'
10    7                 BITMAP INDEX* (SINGLE VALUE) OF 'XIE06FCT_FSA_MSD_GSD'
11    6               BITMAP INDEX* (SINGLE VALUE) OF 'XIE28FCT_FSA_MSD_GSD'

Statistics
----------------------------------------------------------
     20  recursive calls
      3  db block gets
    725  consistent gets
    591  physical reads
   1008  redo size
    253  bytes sent via SQL*Net to client
    275  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
     70  sorts (memory)
      0  sorts (disk)
      1  rows processed

Commentary on above. Bitmap indexes on file_load_month is minus'd
from that on date_of_day, then the result is merged with that on
fund cd, and the entire table is accessed by rowid.


Here's the combined parts of the improved query

SQL> select sum(rc),sum(fiscal_amt) from
2  (
3  select count(*) rc, sum(fiscal_amt) fiscal_amt from fct_fsa_msd_gsd
4  where date_of_day between '01-jan-2004' and '31-jan-2004' and
5  fund_cd = '6C' and
6  file_load_month = '01-jan-2004'
7  union all
8  select count(*),sum(fiscal_amt) from fct_fsa_msd_gsd
9  where date_of_day between '01-jan-2004' and '31-jan-2004' and
10  fund_cd = '6C' and
11  file_load_month != '01-jan-2004'
12  )
13  /

SUM(RC) SUM(FISCAL_AMT)
---------- ---------------
6469727      3521311527

Elapsed: 00:00:37.03

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=22629 Card=1 Bytes=26)
1    0   SORT (AGGREGATE)
2    1     VIEW (Cost=22629 Card=2 Bytes=52)
3    2       UNION-ALL
4    3         SORT (AGGREGATE)
5    4           SORT* (AGGREGATE)
6    5             TABLE ACCESS* (FULL) OF 'FCT_FSA_MSD_GSD' (Cost=197 Card=3375101 Bytes=84377525)
7    3         SORT (AGGREGATE)
8    7           SORT* (AGGREGATE)
9    8             PARTITION RANGE* (ALL)
10    9               TABLE ACCESS* (BY LOCAL INDEX ROWID) OF 'FCT_FSA_MSD_GSD' (Cost=22432 Card=919682 Bytes=22072368)
11   10                 BITMAP CONVERSION* (TO ROWIDS)
12   11                   BITMAP AND*
13   12                     BITMAP MINUS*
14   13                       BITMAP MERGE*
15   14                         BITMAP INDEX* (RANGE SCAN) OF 'XIE15FCT_FSA_MSD_GSD'
16   13                       BITMAP INDEX* (SINGLE VALUE) OF 'XIE06FCT_FSA_MSD_GSD'
17   12                     BITMAP INDEX* (SINGLE VALUE) OF 'XIE28FCT_FSA_MSD_GSD'

Commentary on above: Here we go, all working nicely together. the timing
for this combined query is high, but this is a working production
machine with a bunch of other stuff going on.

Other Comments
My guesstimate on the row proportions was off. In fact there were 6.5 million rows involved in the result set, and only 10 of them were not in the Jan 2005 partition (so that's 0.000154%).

Sunday, August 21, 2005

More on Partition Not-Quite-Pruning

(Updated below)

Here's a follow up to the previous post on this subject, which ought to be read first. This is just to give more background on why I think this technique works.

We have a scenario where we want to read all of the rows where fiscal_month is Jan-2005 (around one million rows) from a fact table of sixty partitions (based on file_load_month). 99.5% of these rows (995,000 rows) are contained in a single partition (file_load_month=Jan-2005) which itself holds around one million rows. The other 5,000 rows are scattered across other partitions of the table. In fact they will probably be found in around ten-twenty of the remaining fifty-nine partitions.

If you just query for rows where "fiscal_month = Jan-2005" then no partition pruning is possible (remember that the table is partitioned on file_load_month) and therefore the optimizer has two choices on how to execute this

Access by index -- find several million rows by reading a bitmap index on fiscal_month and then reading the relevant blocks through single block io.
Access by full table scan -- read all sixty partitions of the table in multiblock io.

Neither of these are satisfactory, so a different query was employed where two queries were UNION ALL'd together.

Selects all of the required rows from the file_load_month Jan-2005 partition
Select all of the required rows from all of the other partitions.

The first query uses partition pruning to read 995,000 rows of the one million rows in the file_load_month Jan-2005 partition. Multiblock io is used.

The second query accesses the table through a BITMAP MINUS operation on two bitmap indexes (on fiscal_month minus file_load_month) to read around 5,000 rows that are probably scattered throughout the remaining fifty-nine partitions using single block io.

Now if you consider how many blocks get read for the old technique and the new technique, then the two methods are practically the same. In fact the new method probably reads slightly more blocks than the old method. The difference between them comes from the ability of the new methodology to read nearly all of the required table blocks in multiblock mode.

It is interesting to note that the optimizer actually has all of the information that it needs in order to perform this optimization internally. Partition-level histograms on the fiscal_month show that nearly all of the required rows are contained in a single partition, and almost none of them are contained in any other.

I'm going to get some explain plans extracted that are more instructive than those I currently have, and I'll get back to you with them.

Update

Jonathan Lewis has written to point out the similarity between this approach and the practically-defunct feature of "Partition Views" (PV), in which similar tables are UNION ALL'ed together in a view that can be queried to produce an effect similar to pruning in a partitioned table (PT).

He also points out one of the strengths of PV -- that the optimizer could not only decide not to touch a partition (a form of optimization we know as "pruning" of course) but could also generate a different form of execution plan for each partition that the query would touch. This is a benefit of PV that was lost with PT (except for the "do not touch this partition" pruning optimization) and which is resurrected in the technique I'm describing here. Exactly the same principle applies, of breaking up a result set into multiple subsets with UNION ALL to merge them together.

I'm inclined to wonder whether in this particular example an approach of defining a partition view over a partitioned table, in the form of ...

Select fiscal_month, file_load_month, ...
from my_table
where fiscal_month = file_load_month
Union All
Select fiscal_month, file_load_month, ...
from my_table
where fiscal_month != file_load_month

... would be productive. I wonder whether the optimizer transitivity would be robust enough to handle this situation.

Jonathan's article comparing PV and PT is in Microsoft Word format here -- for the MS-averse Google has a passable HTML conversion here.

Saturday, August 20, 2005

Partition Not-Quite-Pruning

I'm in the middle of a migration of a fact table from one structure to another. The key difference is in the partitioning key -- the old structure was partitioned according to the month in which the data arrived, wheras the new structure is partitioned according to the fiscal month of the transaction. It's a very slight but very important difference, because it is the fiscal month that the users will query by, hence partition pruning is very much more likely to occur in the new structure.

The data is being migrated one new-table partition at a time, to allow MV maintenance-without-tears. Now a partition of the new table sources data almost entirely from one partition of the old table, but not quite. Probably around 99.5% of the new partition's data will come from a single partition of the old table (and will represent about 99.5% of that old partition's data). The other 0.5% can come from pretty much any other partition of the old table.

So, here's the problem -- does the optimizer scan the entire old table in lovely efficient multiblock reads (direct, reads, because of the query parallelism), or does it use single block reads and the bitmap index on fiscal_month? Either one of these is unsatisfactory, so here's the solution I came up with.

Instead of the straightforward statement:

Select *
From old_table
Where fiscal_month = '01-jan-2005'

... we instead can use:

Select *
From old_table
Where fiscal_month = '01-jan-2005' and
file_load_month = '01-jan-2005'
Union All
Select *
From old_table
Where fiscal_month = '01-jan-2005' and
file_load_month != '01-jan-2005'

The first result set selects the 99.5% of the required rows that represent 99.5% of the old tables file_load_month-based partition for '01-jan-2004', and it does this with an efficient (multiblock) full scan of a single partition.

The second result set selects the 0.5% of the required rows that are in any partition other than that for file_load_month '01-jan-2004', and it does this with single block reads through the combination of the bitmap indexes on fiscal_month and file_load_month.

So in this case we got the best of both worlds. The query time dropped from around 56 seconds to 17 seconds.

Now, what should we call this? "Partition Not-Quite-Pruning"? "Partition Semipruning?" ... ?

Continued here ...

Thursday, August 18, 2005

A List: Moving Data Between Databases

A simple one, this. But there's a lot of ways out there.

Spool to Flat File, Load with SQL*Loader or External Tables: SQL*Plus does a reasonable job of unloading. Tom Kyte has a C utility for unloading data also.
SQL*Plus COPY command: Sadly deprecated, and without support for all data types, but a handy tool for basic data transfers. Allows arbitrary SQL on the data selection.
Database Links: Direct movement from one database to another. Aside from the network latency this is basically the same as reading data from your local database. If you combine database links and materialized views, then you can replicate remote data to your local database through the databae link.
DBMS_PIPE: I don't recall reading of anyone using this, but I expect that it's a theoretical possibility to send data from one instance to another through a pipe. scratch this one: see TK's comment
Export/Import: Sometimes mistaken for part of a backup and recovery strategy!
Transportable Tablespaces: Closely related to Export/Import, allowing movement of entire tablespaces without the need for unloading and reloading data.
Data Pump: Export/Import for the new millenium :) and with many new features.
Oracle Streams: Powerful functionality here. Non-trivial, but probably the modern tool of choice for the regular propagation of data around your corporate network.

Finally, here is an introduction to the sharing of information between database systems.

Wednesday, August 17, 2005

Physics Review

Not an Oracle topic, but here is a news story of interest to the scientifically inclined reader. Science is truly on the defense.

http://www.theonion.com/news/index.php?issue=4133&n=2

Monday, August 15, 2005

A List: Things That Materialized Views Can Be (Philisophically) Similar To

When is a materialized view like a ... ?:

Constraint: When you use it to enforce a multirow constraint, such as "SUM(allocation_pct) per Site = 100", which is not supported through regular constraints. Better then triggers because MV's refresh on commit, not on DDLoperations themselves.
Index: When you use it to speed a full scan of a subset of a table's columns .
Partition: When a materialized view selects a subset of a table's rows, thus making the subset accessible through multiblock reads instead of single block index-based access, and without requiring the whole of the original table to be scanned.
New Set Of Unbelievably Fast Disks: When a materialized view with query rewrite allows a large data set to be pre-aggregated and subsequently queried in almost no time at all.
Trigger: When you stop trying to enforce multirow constraints with triggers and start using materialized views instead.
Synonym: When it allows you to give an alternative name to a table.
Time Machine: When a stale materialized view shows you the previous state of a table's data. A materliazed view log then gives you an audit trail of sorts for changes since the last refresh*

1, 3 and 4 seem to be the most interesting ones there. I don't think I'd use 2, unless it was on a single column with nulls, and a bitmap index was not appropriate, and the query had to consider nulls. 6 seems like a real stretch.

* Thanks Gary

Training and Demonstrations

A few years ago some collegues of mine introduced themselves to a Vice-President of Sales for a Well Known Business Intelligence corporation at a local trade fare, and explained that they were familiar with his product thorugh a "cook-off" that our group had held between his product and that of a major competitior. "Ah ha", cries he, "is that British bastard here as well then?"

Heh heh heh. It had been an interesting exercise all right. It had culminated in a demonstration to users of the two product suites, with my own demonstration of Product A facing off against the corporation's pre-sales consultant's demonstration of Product B, followed by a round of discussion and voting by the users and project managers. Product A swept the board and was duly annointed the successor to our then-current tool. Apparantly there then followed much gnashing and grinding of teeth at the WKBI corporation, and a belief that the competition had been unfairly skewed in some way.

So why the overwhelming vote for Product A? The users believed that the product was more intuitive, easier to use, with a more simple interface. They believed that it would just be easier to get their jobs done with that choice. Whether they were right or not, I'm not sure. The difference between theusability of the two products is probably pretty slight, but the difference between the demonstations was very clear and is very well illustrated by the article at ComputerWorld that prompted today's stream of consciousness. I'll just pause while you absorb it.

...

The two demonstrations were almost exactly like those example methodologies. While the pre-sales consultant for Product B described every feature available to the users, your humble correspondant showed them how to login to Product A, how to run one of the standard reports, how to create and modify a personal report (once at demonstrate-and-explain-every-mouse-click speed, then at normal user speed), and how to send it to other users. It was a very *ahem* dynamic presentation with a great deal of audience interaction. I suppose that the essence of the matter was that a sales demonstration was turned into a training session of the type described in the article.

So the end result was that the audience saw that Product A did what they want, but that Product B did a lot of stuff that they didn't fully comprehend. From there the voting was practically a formality.

I wonder whether the Well Known Business Intelligence corporation's consultant ever realised why the bid was lost to them? The VP of Sales certainly had a strong opinions on the matter. The big baby.

Thursday, August 11, 2005

How Not To Ask A Question

Do you think it's possible that this questioner is being just a little broad in the advice he/she is requesting? Maybe just a shade optimistic?

Lesson from the Shark Tank

Here are two learning experiences in one -- how databases get corrupted, and how consultants get ... um ... corrupted as well.

Waiting For A Delivery

If there's anything more exciting than waiting for the delivery of a shiny new piece of technology, then I'd like to know what it is.

Today I'm waiting for a replacement hard drive for my two year old laptop, which I'm going to store in the basement as a little database and HTML DB server, on my "shelf of technology" (cable modem, wireless router, NSLU2 and attached hard drive). The new drive is a speedy little Hitachi Travelstar 7K60, which seems to be about the fastest I can get. A bit tricky to come by, but I found one here.

So I'm really changing the drive because if it all goes pear-shaped and I want to get my old WinXP/Suse9.1 dual-boot laptop back then I can just swap the disks over again. That, and it's a shiny new thing that I get delivered, and sometimes you just have to find excitement where you can.

Wednesday, August 10, 2005

A List: Ways to Scan a Table Faster

So you need to read an entire table? Here's some features and techniques that may help.

Employ Parallelism: Ah, the sweetest technique of all. Brute force. If you have the resources, then let Oracle divide the table into chunks and use all your system resources to scan it. Unfriendly in an OLTP environment, mind you.
Compress Data Segments: Make the table smaller! Block level compression can shrink that data like a plunge into cold water. Very OLTP unfriendly.
Reduce Free Space Percent: PCTFREE = very small => more rows per block => smaller table. And potentially a higher chance of row migration, of course.
Increase Percent Used: PCTUSED = large => less likely that blocks retain free space following deletes => more rows per block => smaller table.
Use a Larger Block Size: For significantly long rows you may get reduced empty space, thus a smaller table to scan.
Reorder The Columns: If you are commonly interested in just a subset of columns (for example metrics in a fact table) then consider making them the first columns of the table definition - for tables with lots of columns there is measurable overhead in finding the end columns (I'm not talking about row chaining here). Hmmm, maybe I'll post something else about this.
Index Columns of Interest: An index can be treated as a skinny table, and your query might be satisfied by a fast ful or full index scan. The usual comments about NULL values in indexes apply here. Don't neglect consideraton of index size either - index key compression and use of bitmap indexes provide smaller structures to scan.
Materialized Views: Is there anything they're not good for? This could be a genuine skinny table, or an aggregation of selected columns.
Ensure Table Is Spread Over Available Devices: With consequent reduced likelihood of encountering an i/o choke point.

There is not a single feature listed here that carries with it no disadvantages, and depending on your circumstances the severity of the disadvantages may range from the insignificant to the devastating.

Do not implement any of these without understanding how the feature works, the mechanism by which it provides the advantage, how it brings disadvantages, and how these all interact in your particular situation.

Tuesday, August 09, 2005

Handy Hardware, Or Not ...

Here's an AnandTech review that caught my eye -- 4Gb of memory with a battery backup. Could this be an economical way of getting performance for non-sequential storage access?

I wasn't particularly impressed with the way that the performance was reported, particularly for the random access test, because no indication was given on whether the device was the bottleneck on the test or not -- hey, maybe the CPU's were maxed out? It's a mystery. There's an indication here that random reads and writes would really benefit.

So what do you think? I'm no shill for solid-state drives, but for a small-but-busy server, would you put your redo logs on this? Temp space? How about the system tablespace? ;)

Monday, August 08, 2005

A List: SQL Features You've Probably Never Used In Production

By popular demand (one anonymous person -- hey, it's all relative), another list.

OK, you may have played with them, found a hypothetical circumstance for them, but you've probably never found a real situation in which to use them, or if you have you've probably backed out at the last minute. Or the DBA won't let you use them. Or you found there was a bug that spoiled it.

More likely you will not even recognize them -- unless you're one of those losers who enjoys browsing documentation of course. * ahem *

The Subquery Factoring Clause: OK, a bit of a softball to start with. I bet that someone has used this. Not first time, but as part of a tuning exercise. And to show off.
The Partitioning Clause of a Query Table Expression: Actually hardcoding a partition name in a query? Wellll, OK maybe.
The Model Clause: This is great! What's it for?
Multiset Operators: Or nested tables in general
Submultiset: OK, that's cheating. Let's just say "nested tables"
The CORR_K() Function: Oh wait, there was that time when ... nah not really. Never used it.
PowerMultiset_By_Cardinality: "...takes as input a nested table and a cardinality and returns a nested table of nested tables containing all nonempty subsets (called submultisets) of the nested table of the specified cardinality". * wipes-eyes * OK seriously, that's the last time I'm going to mention nested tables. Or nested tables of nested tables.
Alter Sequence: Admit it ... you thought you were going to be able to change the current value with it, but then found you just had to drop the sequence and recreate it, so that's what you did.
Associate/Dissociate Statistics: What?

Well, I pride myself on the high education level of my readership, and also it's high regard for risk-taking. If you have actually used one of these features in a production system, and it has actually made it past the DBA/watchdog, then please reply under the heading "I have an unusual affinity for Shiny New Things, and have successfully used the following features in a production system:"

A List: Accessing Part Of A Table Quickly

I like lists. In particular, I like to make little lists of Oracle features that have some commonality in the way they work, or in what they do.

Today's list is Oracle Features For Accessing Part Of A Table Quickly. Quicker than scanning the whole table anyway, and assuming that the table is not of trivial size. Here's what I have so far:

Creating an Index: obviously.
Making the Table Index-Organized: Is that too similar to Option 1 to justify a new heading, do you think?
Placing The Table In A Cluster: Hash or indexed.
Partitioning The Table: Partition pruning being the key feature here, or through using the PARTITION or SUBPARTITION clauses of the query table expression.
Creating a Materialized View: By precomputing the result set for a query that requires only part of the full table. Query rewrite optional, I suppose.

Off the top of my head, that's all I can think of right now. Other suggestions welcome.

Trouble Down Under

I see that DKB and HJR are having at it again.

DKB posted some inflammatory comments here, and HJR posted responses here, here, and here.

So apparantly DKB's secret location from which he is broadcasting is Australia. I'm sure he knows his own business, but it seems to me that his blog entry is very derogatory towards the people paying the bill for his consultancy ... "Offshoring (the use of grossly under-trained and unskilled IT professionals)" ... "if you are an IT manager stupid enough to entrust a mission-critical computer system to a foreign “offshore” developer firm".

I think that it's nice that Burleson Consulting now has a legal presence in Australia .. DKB and HJR can get on with their legal business without bothering the rest of us. On a side note, while OraBlogs is getting notices that they are rendering copyrighted images, Don's own forum makes extensive use of copyrighted images. Like I say though, doubtless he knows his own business ...

Daylight Savings Time

Update

Looks like the journos have figured out the downside ... http://www.cnn.com/2005/TECH/08/08/daylight.saving.ap/index.html

The good news though: "Missiles won't be launching". I don't know how they expect the unscrupulous to make a profit out of something if reputable news media aren't going to scream hysterically about it for months in advance, in the style of The Great Y2K Fiasco. Ah, how fondly I recall the sweet satisfaction of waking up on January 1st 2000, having taken not a single precaution against the promised apocalypse, and finding everything working completely normally.

Anyway, this is a serious business in its own way: I recall being asked a few years ago to opine on a bid for a state system that was being converted to Oracle. The system defined funding for various state entities, and virtually the entire business logic was in effect defined by the state legislature. As such the funding model was vulnerable to change at any time, even retrospectively, and the only limit to the scope of change was the imagination of the state representatives and their advisors. For a fixed price contract and with existing staff already reeling from a series of recent changes it would have been a risk management nightmare. "Thumbs down" for that one.

Original Article

Heads-up people!

CNN are reporting on imminent changes to daylight savings time.

"House and Senate negotiators on an energy bill agreed to begin daylight-saving time three weeks earlier, on the second Sunday in March, and extend it by one week to the first Sunday in November.
...
Lawmakers said they hoped to complete the energy legislation next week."

I wonder how much bespoke code this is going to affect? I'm supposing that there will be some Oracle patches to deal with the required changes to TIMESTAMP WITH TIMEZONE, etc.

Here's a search on the 10g documentation for "daylight savings time", by the way.

Thursday, August 04, 2005

Search Oracle Docs from Firefox

Eddie Awad has developed a Firefox search plugin for the Oracle documentation ... see his blog entry for details.

Nice job Eddie, thanks for that.

A search plugin is also available for AskTom ... does anyone have a link to it?

Wednesday, August 03, 2005

Marketing of Materialized Views

I was having a casual browse of the Oracle website today, like you do, and noticed in a side panel here the following phrase: "The materialized view can be thought of as a special kind of view that physically exists inside the database."

A less reasonable person would have instantly vomited at such a tortured description. Fortunately I'm made of sterner stuff. Here are some descriptions that I thought up to do an even worse job of describing the feature.

"The materialized view can be thought of as a special kind of view that holds material."
"The materialized view can be thought of as a special kind of table that is also a view."
"The materialized view can be thought of as a special combination of a view and its data."

I don't know, are they worse or not?

In attempting to be positive I tried to think of a better one sentence description.

"A materialized view is a SELECT query definition stored in the data dictionary for which the complete result set is also stored, in a table logically associated with the query."

What do you reckon? Any improvements you can suggest?

Monday, August 01, 2005

Blogshares

What a busy evening for posts.

Can anyone explain what this is all about, for someone who is interested but has had far too much rum to understand it? The Sponge seems to be doing pretty well -- or pretty badly -- but at least it's doing something! A p/e of $27.83, by cracky. Now that's worth investing in ... or not. Or is it?

Anyway there's some good blogs linked into there it seems. I'm going to go ahead and claim my blog to see what happens.

It's Hangs Eight Inches and Weighs One Pound

You know you want to click on this to find out ... and yes, it is "work friendly".

Welcome to America!

Motivation

I don't know about you, but sometimes I feel that getting motivated to learn a new task or topic can be one of the hardest stages to tackle. It often goes something like this:

Thinks: I really like the idea of knowing more about X
A little online research to get the general feel for the topic
Software downloads
Possible investment is (tax-deductable) book
Realisation that neither my client nor I have any problems to be solved by this
Collapse into reclining chair, exhausted by ordeal

What is needed is some actual problem that can be solved, and that's how I ended up learning shell scripting, SQL, Perl, PL/SQL ... wel just about everything really.

To take an example a third-party tool was generating EDIFACT files for transmission to a client's major customers. Sad to relate, the customers' superior acceptance software would often reject the files due to some error in a totals line. Hence I researched, bought a book on Perl, and learned enough that in a couple of days we could scan the files for internal consistency before transmission. I bet that now I would not even recognise the code that I wrote back then, some nine or ten years ago, for I have had no use for Perl ever since.

I like to think of it as a beneficial character trait -- it keeps me focussed on what I need to know right now in order to make Bad Things go away and not come back, yet allows me the flexibility of getting familiar enough with topics to hopefully be able to recognise situations where they can help. On the other hand, it is really irritating that I can't follow through with these potentially educational experiences, and am condemned to live the life of a practical person. Yuk.

So speaking of which, what would be the worst possible job for me? It would be this: working on manned space flight at NASA.

Those with long and detailed memories may recall that the the space shuttle was needed to support the International Space Station, which was needed to support manned missions to Mars, which were needed because ... erm ... well let us not inquire too closely of that. The ISS has been occupied since November 2002, with a net contribution to science of precisely zero -- if we can exclude the science of how to stay alive in Earth orbit, or the urine-based sciences, that is. Can anyone explain why it it helps to study astronaut urine in order to understand the formation of kidney stones? Or why you need to have an astronaut to take photos of the Earth in order to "... provide researchers on Earth with vital, continuous images needed to better understand the planet"?

And the purpose of the space shuttle now seems to be to research how to mend space shuttles so as to better avoid killing people with them.

Sending robots to Mars, or better yet to Europa, now there's some interesting work. But not this manned-mission nonsense. How depressing it would all be.

Bak to Skool

My son starts first grade next week. Finally, an end to crippling daycare fees after nearly six years of them ...

... but wait, what's this? School Supplies! Ka-Ching!

Now the school supply list includes the following curious entries:

GIRLS ONLY: 1 box Quart Size Ziploc Bags
BOYS ONLY: 1 Box Gallon size Ziploc Bags and Antibacterial Wet Wipes

I've been staring at these entries, hoping that the letters would reform themselves to something that I could understand ... nothing yet though. I'll give it another hour then give up.

Update: Doh, it is because they need both quart and gallon sized bags of course, and half the kids will bring one and half will bring the other.

Also I was reading the wrong section, and spent fifteen minutes collecting kindergarten supplies by mistake. Number One Son was triumphant (for over an hour) when he heard that he really did need a nice new pair of scissors, and that Daddy was wrong about that. But he was still wrong about needing rainbow-coloured pencils, and he's definitely not getting glue with glitter in it.

On a completely different subject, at what age of child is it appropriate to start worrying about your chances of getting grandchildren?

How's Your Style?

I invite readers to drop in on the Economist Style Quiz, which is based upon the excellent Economist Style Guide for a lesson in how bad your English really is.

For the record, I achieved a score of eight out of twelve.

Oracle Sponge -- Now Moved To Wordpress