The Oracle Sponge Blog -- An Oracle Data Warehousing Resource

Thursday, October 27, 2005

Dynamic Sampling at AskTom

I saw an interesting question on dynamic sampling over at Ask Tom and ended up pretty much dumping a half-finished blog posting in reply. I'll save myself the elctrons of publishing it here by just linking to it. Enjoy.

Thursday, October 06, 2005

Testing the Trace Loader

As I said yesterday, I've been fiddling around with loading trace files into the database for analysis, and today I was looking for a good test case to see how the numbers are coming out. The obvious example to choose was the direct path enhancement to materialized view refresh that I also wrote about recently.

Here is my test script:


drop table master;
drop table master_test_data;
drop materialized view master_mv1;

create table master
   (
   location_cd  number not null,
   tx_timestamp date   not null,
   product_cd   number not null,
   tx_qty       number not null,
   tx_cst       number not null
   )
pctfree 0
nologging
/

create materialized view log on master
with rowid
   (
   location_cd,
   tx_timestamp,
   product_cd,
   tx_qty,
   tx_cst
   )
including new values
/

create materialized view master_mv1
Using No Index
Refresh Fast On Commit
Enable Query Rewrite
As
Select
   location_cd,
   trunc(tx_timestamp,'MM') tx_month,
   sum(tx_qty) s_tx_qty,
   sum(tx_cst) s_tx_cst,
   count(*)    c_star
from
   master
group by
   location_cd,
   trunc(tx_timestamp,'MM')
/

create materialized view master_mv2
Using No Index
Refresh Fast On Commit
Enable Query Rewrite
As
Select
   product_cd,
   trunc(tx_timestamp,'MM') tx_month,
   sum(tx_qty) sum_tx_qty,
   sum(tx_cst) sum_tx_cst,
   count(*)    c_star
from
   master
group by
   product_cd,
   trunc(tx_timestamp,'MM')
/

create table master_test_data
   (
   location_cd  not null,
   tx_timestamp not null,
   product_cd   not null,
   tx_qty       not null,
   tx_cst       not null
   )
pctfree 0
nologging
as
select
   trunc(dbms_random.value(1,10)),
   to_date('01-jan-2005','DD-Mon-YYYY')
      + dbms_random.value(1,31),
   trunc(dbms_random.value(1,1000)),
   trunc(dbms_random.value(1,10)),
   trunc(dbms_random.value(1,100),2)
from
   dual
connect by
   1=1 and
   level < 1000001
/
commit
/
select count(*) from master_test_data
/
alter session set timed_statistics = true;
alter session set max_dump_file_size=1000000000;
alter session set events '10046 trace name context forever, level 8';

insert into master
select *
from   master_test_data
where  rownum < 100001
/
commit
/
exit

Rather straightforward stuff. The "select count(*)" of rows in the master test data table is just to eliminate the possiblity of the test results getting skewed by delayed block cleanout, by the way. The example above is for the conventional path insert. The only change required for the direct path enhancement to kick-in is to use an APPEND hint in the last insert statement.

So, having identified the trace files I issue the following command:

begin
 trace.purge; -- purges all data from the trace file repository

   trace.load_file
   (directory   =>'UDUMP',
    file_name   => 'slimdb_ora_4412.trc',
    identify_as => 'direct path',
    debug       => 0,
    log_extents => true,
    log_files   => true);

   trace.load_file
   (directory   =>'UDUMP',
    file_name   => 'slimdb_ora_4600.trc',
    identify_as => 'conv path',
    debug       => 0,
    log_extents => true,
    log_files   => true);
end;
/

This allows me to query the trace file information directly, giving me such output as:

SQL> select
2     substr(identify_as,1,12),
3     op_name,
4     sum(op_c),
5     sum(op_e)/1000000 sec,
6     count(*)
7  from
8  (
9  select
10     identify_as,
11     cur_dep,
12     f.file#,
13     f.file_name,
14     c.cursor#,
15     o.op_name,
16     OP_C  ,
17     OP_E  ,
18     OP_P  ,
19     OP_CR ,
20     OP_CU ,
21     OP_TIM
22  from
23     trace_ops o,
24     trace_cursors c,
25     trace_files f
26  where
27     f.file#   = c.file# and
28     c.cursor# = o.cursor#
29     )
30  group by
31     substr(identify_as,1,12),
32     op_name
33  order by
34     1,2,4 desc
35  /

SUBSTR(IDENT OP_NAME     SUM(OP_C)        SEC   COUNT(*)
------------ ---------- ---------- ---------- ----------
conv path    EXEC         20875000  33.358519     100342
conv path    FETCH         2296875   3.026768        304
conv path    PARSE          156250    .661319        333
direct path  EXEC          2890625   3.476836        350
direct path  FETCH          140625    .406403        402
direct path  PARSE          859375   1.302354        224

6 rows selected.

Of course all the interesting work is going on behind the scenes in the form of recursive SQL:

SQL> select
2     substr(identify_as,1,12),
3     cur_dep,
4     op_name,
5     sum(op_c),
6     sum(op_e)/1000000 sec,
7     count(*)
8  from
9  (
10  select
11     identify_as,
12     cur_dep,
13     f.file#,
14     f.file_name,
15     c.cursor#,
16     o.op_name,
17     OP_C  ,
18     OP_E  ,
19     OP_P  ,
20     OP_CR ,
21     OP_CU ,
22     OP_TIM
23  from
24     trace_ops o,
25     trace_cursors c,
26     trace_files f
27  where
28     f.file#   = c.file# and
29     c.cursor# = o.cursor#
30     )
31  group by
32     substr(identify_as,1,12),
33     cur_dep,
34     op_name
35  order by
36     1,2,3,5 desc
37  /

SUBSTR(IDENT    CUR_DEP OP_NAME     SUM(OP_C)        SEC   COUNT(*)
------------ ---------- ---------- ---------- ---------- ----------
conv path             0 EXEC         14484375  22.637736         18
conv path             0 FETCH         2125000   2.342986         15
conv path             0 PARSE           93750     .56999         16
conv path             1 EXEC          6343750  10.694057     100188
conv path             1 FETCH          109375    .619292        147
conv path             1 PARSE           62500    .079386        187
conv path             2 EXEC            46875    .026135        128
conv path             2 FETCH           62500    .062152        134
conv path             2 PARSE               0    .009992        128
conv path             3 EXEC                0     .00006          1
conv path             3 FETCH               0    .002111          1
conv path             3 PARSE               0    .001875          1
conv path             4 EXEC                0    .000531          7
conv path             4 FETCH               0    .000227          7
conv path             4 PARSE               0    .000076          1
direct path           0 EXEC          1718750   2.218695          3
direct path           0 PARSE           31250     .17893          2
direct path           1 EXEC           750000    .841392        138
direct path           1 FETCH           15625    .128181         91
direct path           1 PARSE          343750     .47748        133
direct path           2 EXEC           140625     .14998        124
direct path           2 FETCH          109375    .168264        124
direct path           2 PARSE          421875     .52747         27
direct path           3 EXEC           281250    .260751         78
direct path           3 FETCH           15625    .068693        180
direct path           3 PARSE           46875    .109109         61
direct path           4 EXEC                0    .006018          7
direct path           4 FETCH               0    .041265          7
direct path           4 PARSE           15625    .009365          1

29 rows selected.

So what leaps out at us here (or it would do if Blogger didn't trim leading and multiple white spaces) is an enormous number of executions at a recursive depth (cur_dep) of 1 in the conventional path refresh -- coincidentally (well, actually not at all coincidentally) slightly more than the number of rows being inserted into the master table.

This of course, is the maintenance of the materialized view log being performed in an inefficient row-by-row manner:

SQL> select CUR_SQL_TEXT
2  from   trace_cursors
3  where  cursor# =
4     (
5     select cursor#
6     from
7     (
8     Select cursor#,count(*)
9     from   trace_ops
10     where  op_name = 'EXEC'
11     group by cursor#
12     order by 2 desc
13     )
14     where rownum = 1
15     )
16  /

CUR_SQL_TEXT
--------------------------------------------------------------------------------

INSERT /*+ IDX(0) */ INTO "DAVE"."MLOG$_MASTER" (dmltype$$,old_new$$,snaptime$$,

change_vector$,m_row$,"LOCATION_CD","TX_TIMESTAMP","PRODUCT_CD","TX_QTY","TX_C

ST") VALUES (:d,:o,to_date('4000-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:m,

:1,:2,:3,:4,:5)

Hey, at least it's using bind variables!

Wednesday, October 05, 2005

Reporting Wait Events For Data Warehouse Queries

Funny how easy it is to get side-tracked. Well not really side-tracked, but just more and more involved in one particular thing, so that you end up burrowing so deeply into it that it loses all relevance to the real world. Just as a simple SQL statement generates deeper and deeper levels of more and more obscure recursive statements, so it can be with the most simple of tasks.

Those of you still awake may recall that I was looking at materialized view refresh, and in the course of getting to grips with the optimum refresh methodology for fast refreshing multiple MV's I've been picking the bones out of some lovely warm trace files. My guidance has come from two books that are probably familiar to many readers, and which have become indispensible to me: Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning; Shee, Deshpande and Gopalakrishnan; Oracle Press, and Optimizing Oracle Performance; Millsap and Holt; O'Reilly. And let me first say that they are a beautiful compliment to each other. While the former goes into just the right level of detail on how to understand and mitigate the wait events themselves, the latter presents a methodology for data capture and event interpretation backed up with a strong background in queueing theory.

Now the usual challenge is to translate an approach aimed at OLTP databases into something that we can use in data warehousing, and fortunately this is not too tricky in the case of wait event capture and analysis. One of the major challenges associated with wait event analysis on OLTP databases is simply in capturing the correct data in the first place (a topic addressed in depth by the Millsap/Holt book) but this is relatively easy in "the other world": a problematic query that we are interested in tuning is generally well-known and easily isolated, whether it is part of the ETL routine or part of a report.

Issues relating to use of shared servers, which can scatter a single session's wait events among multiple trace files practically at random, are also probably rare. Shared server configurations are inherently problematic in an environment where system load is dominated by fewer, longer running queries than an OLTP system would experience, and so are rarely used.

On the other hand we do have the challenge of parallel query, in which the most meaningful wait events get pushed to different trace files, one for each PQ process. Without identifying these files, capturing their data, and consolidating them into a single entity for analysis there is little that can usefully be done.

And now a word about recursive SQL. Recursive SQL statements are issued by Oracle in support of the execution of the SQL statements submitted to the database. They are used to parse the query, to maintain the data dictionary (for example to add extents to table and index segments), to perform dynamic sampling, to retrieve information on segments for the optimizer to use, etc.. So we might be able to deduce immediately that recursive SQL plays a very large role in the execution of data warehouse queries for a number of reasons, the foremost of which are the high parse ratio and the complexity of the objects that the optimizer has to consider.

The high parse ratio is caused by our desire for the optimizer to be sensitive to literal values in the predicates (histogram usage being particularly important in this environment), and where an OLTP system could be unscalable due to the workload imposed by a high hard parse rate, the longer duration of the data warehouse queries renders this overhead less significant. Not wholly insignificant, but not so significant that it is worth eliminating entirely.

Now there is also the issue of object complexity. I venture to suggest that in data warehousing we have a much higher probability that partitioning is being used, and also that we will be leveraging partition pruning. So despite the apprant simplicity of many queries against star schemas there are actualy a disproportionate number of data and index segments for the optimizer to be aware of. There is also a high chance that we will be using query rewrite, and hence that there are a great many more objects to be considered even if they are never going to be touched by the optimized query.

Now I rather glossed over the issue of how easy it is in a data warehouse to identify the SQL for which we want to the wait events. However if you read much of the literature on wait event analysis you will find that this is actually a very large part of the formal methodologies generally presented. Do you want to identify the queries with the longest duration, or the most logical reads, or the most physical I/O, or what other considerations are there? When data warehousing relieves us of that issue we are left with two other major issues:

What makes our query run for so long?
What can we do about it?

And these are by no means trivial tasks. However I would suggest that the first of these issues really ought to be more trivial than it currently is. Consider the steps we might follow in a wait event analysis for a hypothetical data warehouse query.

Extract the query, along with representative predicate literals
Execute the query with the 10046 event set at level 8 ("8" rhymes with "wait", which is how I remember that level 8 gives us wait events and level 4 gives us binds :) ... and level 12 gives us both of course).
Find the trace files (parallel query giving us multiple files, of course)
Run tkprof on the trace files to turn them into something readable
Coalesce the results
Analyze what events are causing us pain (generally I/O-related ones and sometimes high CPU usage, I find).
For I/O related events examine what object they are occuring on, how many there are, their average duration, whether there are I/O hot spots on our devices etc..

For me the tricky part starts at step 4: I can extract queries, set events, find trace files without much problem, but the rest leaves me a little cold.

Let's start with tkprof. A fine program in general, but remembering the options for sorting the queries is beyond me, and having listed my major wait events I then have to go probing through the raw file again plus querying the data dictionary to find out what segment and file the events occured on. And remember that we may have upwards of a dozen file to contend with. Unless I'm missing out on something, that means a dozen executions of tkprof then a manual process to merge and examine all those raw files anyway.

Merging, sorting and reporting: these are the bread and butter tasks of a database, not a command-line programs. So while staring at the ceiling in the dark one night I had the following thoughts:

"Load the raw files into the database then report on them from there."
"That means I can link the I/O-related wait events directly to the data dictionary to get object and file names automatically".
"That means we'd have a history of the the exact wait events and the real execution plan associated with a query within the database itself"
"That's so obvious that it must have been done before."
"Even if it has, I bet I could do it easily and customize it for my own purposes."

So that's what I've been up to for the past couple of weeks, in the little time I have between my real work and my family, instead of going through the materialized view refresh analysis that I actually promised. But it's a step in the right direction anyway -- should make that process more easy.

How do you go about desinging and implementing such a methodology then? Here are the aims that I started out with.

Load into database tables ther significant data from multiple trace files

For me the data of interest are the wait events, and in order to put them into a usable context I also need to associate them with SQL statements (cursors, in the trace file), the file name, operations such as PARSE, EXEC and FETCH. They all have to be stored in a set of dedicated tables.

Exactly which data would we load for each of these types of trace information? As much as possible, and for the most part that means all of it. We can get timings, recursive depth (very useful!), user id's, just about anything we need.

Associate significant system data with the trace file data

In our world of data warehousing we regularly add, truncate, split, merge and drop partitions, and when we do so we change the object id to which the wait event parameters refers. It seems prudent for this reason and for performance purposes (data dictionary views being notoriously cranky when it comes to joining to other data) to provide the option of storing the appropriate data at the time that the trace file is loaded.

Associate multiple trace files with a single report

This is of particular use in coalescing the results of multiple files generated by parallel query for a single submitted SQL statement. So we'd want to load multiple trace files and say "these belong together".

Provide predefined mechanisms for reporting on the events

A set of views on the database tables was going to be required, to allow more simple custom queries to be defined on the trace data. Analagous to the data dictionary views provided in the database in fact. In just the same way as "select * from table(dbms_xplan.display())" reports on the most recently generated explain plan, we can provide a mechanism for reporting on the most recently loaded trace file(s) for example.

Provide some administrative tools for purging old data

Purge all the data, or all the data prior to some date, and some options like that.

The progress so far has been encouraging. Loading in a trace file with a known name is now a trivial task, and basic reporting on the events is a doddle. For example I'm currently using the following command to load a single file:


begin
   trace.load_file
     (directory   =>'UDUMP',                -- 1
      file_name   => 'slimdb_ora_4680.trc', -- 2
      debug       => 0,                     -- 3
      log_extents => true,                 -- 4
      log_files   => true);                 -- 5
end;
/

The parameters are pretty intuitive:

The name of the directory object that points to the location of the file to be loaded.
The name of the file to be loaded (duh).
The level of debugging message required (spat out through DBMS_OUTPUT).
Whether to take a snapshot of the relevant entities in the dba_extents system view, to enhance later reporting (default "false" due to relatively high overhead).
Whether to take a snapshot of the relevant entities in the dba_data files system view, to enhance later reporting (default "true").

With the data loaded I can then report in the following manner:


SQL> select * from all_trace_files;

     FILE# FILE_NAME                      READ_DATE
---------- ------------------------------ ---------
    132440 slimdb_ora_4680.trc            05-OCT-05

SQL> column SUM_WT_ELA format 999,999,990
SQL> column AVG_WT_ELA format 999,999,990
SQL> column waits format 999,990
SQL> break on report
SQL> compute sum of SUM_WT_ELA on report
SQL> select substr(WT_NAME,1,25) wt_name,
  2  sum(WT_ELA) sum_wt_ela,
  3  avg(wt_ela) avg_wt_ela,
  4  count(*) waits
  5  from last_trace_waits
  6  group by substr(WT_NAME,1,25)
  7  /

WT_NAME                     SUM_WT_ELA   AVG_WT_ELA    WAITS
------------------------- ------------ ------------ --------
SQL*Net message from clie   27,372,317    9,124,106        3
SQL*Net message to client           14            5        3
db file scattered read       4,771,858       18,640      256
db file sequential read      1,490,384        3,644      409
                          ------------
sum                         33,634,573

SQL> select   CUR_DEP,
  2           count(*)
  3  from     last_trace_cursors
  4  group by cur_dep
  5  order by 1
  6  /

   CUR_DEP   COUNT(*)
---------- ----------
         0          2
         1        391
         2       2189
         3         92
         4          2
         5          1

6 rows selected.

SQL> set pagesize 100
SQL> select substr(WT_NAME,1,23) wt_name,
  2  substr(segment_name,1,20) segment_name,
  3  sum(WT_ELA) sum_wt_ela,
  4  avg(wt_ela) avg_wt_ela,
  5  count(*) waits
  6  from last_trace_obj_waits
  7  group by substr(WT_NAME,1,23),substr(segment_name,1,20)
  8  order by 3 desc
  9  /

WT_NAME                 SEGMENT_NAME           SUM_WT_ELA   AVG_WT_ELA    WAITS
----------------------- -------------------- ------------ ------------ --------
db file scattered read   FCT_FSA_1               4,698,204       19,908      236
db file sequential read INDSUBPART$             3,963,720        1,149    3,450
db file sequential read C_OBJ#_INTCOL#          2,355,677        4,403      535
db file sequential read C_FILE#_BLOCK#          2,129,218        1,342    1,587
db file sequential read I_OBJ2                    676,269        1,375      492
db file sequential read I_FILE#_BLOCK#            470,275        4,237      111
db file sequential read FCT_FSA_1                 417,323        5,962       70
db file sequential read TABSUBPART$               170,888        4,069       42
db file scattered read  INDCOMPART$               130,647        6,221       21
db file sequential read HIST_HEAD$                 62,906        7,863        8
db file sequential read I_INDSUBPART$              61,507        1,922       32
db file scattered read  TABCOMPART$                57,753        6,417        9
db file sequential read I_CDEF2                    34,272        4,284        8
db file sequential read SNAP$                      30,162        7,541        4
db file sequential read PARTOBJ$                   29,690        7,423        4
db file sequential read SUMAGG$                    24,404        6,101        4
db file sequential read INDCOMPART$                19,534        9,767        2
db file sequential read SUMKEY$                    19,116        4,779        4
db file sequential read DIMJOINKEY$                17,296        4,324        4
db file sequential read PARTCOL$                   15,068        3,767        4
db file sequential read TABCOMPART$                14,988        7,494        2
db file sequential read SUMDELTA$                  12,143       12,143        1
db file sequential read I_TABSUBPART$              11,146        5,573        2
db file sequential read I_CDEF3                    10,627       10,627        1
db file sequential read I_SUBCOLTYPE1              10,131       10,131        1
db file sequential read I_NTAB2                     9,696        9,696        1
db file sequential read I_CCOL1                     9,366        4,683        2
db file sequential read C_COBJ#                     9,322        9,322        1
db file sequential read SUM$                        8,722        4,361        2
db file sequential read I_OBJ#_INTCOL#              7,496        7,496        1
db file sequential read I_SUBPARTCOL$               7,059        7,059        1
db file sequential read I_DIMATTR$_2                6,601        6,601        1
db file sequential read DEPENDENCY$                 6,432        3,216        2
db file sequential read I_HH_OBJ#_INTCOL#           6,267        6,267        1
db file sequential read I_DEFSUBPART$               5,595        5,595        1
db file sequential read VIEW$                       5,231        5,231        1
db file sequential read DEFSUBPART$                 4,795        4,795        1
db file sequential read I_DEPENDENCY1               4,649        4,649        1
db file sequential read I_REFCON2                   3,544        3,544        1
db file sequential read I_PARTCOL$                  2,705        2,705        1
db file scattered read  SUM$                          742          371        2
db file sequential read I_OPQTYPE1                    195          195        1
db file sequential read SUBPARTCOL$                   138          138        1
db file sequential read I_DIMLEVELKEY$_2              136          136        1
                                             ------------
sum                                            15,541,655

The last two are pretty interesting, for a trace file based on a single star schema query, huh? Sorry about the crappy formating though.

There's still work to do, and at the top of my list are the following:

Allow grouping of trace files as a single logical entity
Add more reporting views, focusing on filtering out recursive statements and on a set of "MY_%" views
Associating recursive queries with their parents to enable reporting views based on CONNECT BY queries
Better handling of files
Creating a load script for the TRACELDR schema under which all this runs
Creating a Trace Loader role to allow other users to access the package and views.

Based on that you can probably expect to hear from me again around Christmas.

Nah, not really. Soon.

Oracle Sponge -- Now Moved To Wordpress

Thursday, October 27, 2005

Dynamic Sampling at AskTom

Thursday, October 06, 2005

Testing the Trace Loader

Wednesday, October 05, 2005

Reporting Wait Events For Data Warehouse Queries

About Me

Spongeworthy Links

Some Articles

Practical Guide To Oracle Data Warehousing

Best Of The Sponge (IMHO)

Materialized View Articles

Sponge Lists

Spongeworthy Documentation

Version 9i

Version 10G

Previous Posts

Archives