Granularity 'APPROX_GLOBAL AND PARTITION'

The patchset 11.1.0.7 introduces in the package DBMS_STATS a new value for the parameter GRANULARITY.

The description provided by the development team in $ORACLE_HOME/rdbms/admin/dbmsstat.sql is the following:

'APPROX_GLOBAL AND PARTITION' - This option is similar to
   'GLOBAL AND PARTITION'. But the global statistics are aggregated
   from partition level statistics. It will aggregate all statistics except number of
   distinct values for columns and number of distinct keys of indexes.
   The existing histograms of the columns at the table level
   are also aggregated.The global statistics are  gathered
   (i.e., going back to GLOBAL AND PARTITION behaviour)
   if partname argument is null or if the aggregation cannot be done
   e.g., statistics of one of the partitions is missing.

   This option is useful when you collect statistics for a new partition added
   into a range partitioned table (for example, a table  partitioned by month).
   The new data in the partition makes the global statistics stale (especially
   the min/max values of the partitioning column). This stale global statistics
   may cause suboptimal plans.  In this scenario, users can collect statistics
   for the newly added partition with 'APPROX_GLOBAL AND PARTITION'
   option so that the global statistics will reflect the newly added range.
   This option will take less time than 'GLOBAL AND PARTITION' option since the
   global statistics are aggregated from underlying partition level statistics.
   Note that, if you are using APPROX_GLOBAL AND PARTITION,
   you still  need to collect global statistics (with granularity = 'GLOBAL' option)
   when there is substantial amount of change at the table level.
   For example you added 10% more data to the table.  This is needed to get the
   correct number of distinct values/keys statistic at table level.

To illustrate how it works, let’s have a look to an example…

Create a range-partitioned table with a local index, load some data and gather object statistics.

SQL> CREATE TABLE t (
  2    id NUMBER,
  3    n NUMBER,
  4    d DATE,
  5    pad VARCHAR2(4000)
  6  )
  7  PARTITION BY RANGE (d) (
  8    PARTITION t_jan_2008 VALUES LESS THAN (to_date('2008-02-01','yyyy-mm-dd'))
  9  )
 10  NOLOGGING;

SQL> CREATE INDEX i ON t(n) LOCAL;

SQL> INSERT /*+ append */ INTO t
  2  WITH
  3    t1000 AS (SELECT rownum AS dummy
  4              FROM dual
  5              CONNECT BY level <=1000)
  6  SELECT 1000000+rownum,
  7         CASE WHEN rownum <= 500 THEN mod(rownum,11) ELSE nullif(mod(rownum,2),1) END,
  8         to_date('2008-01-01','yyyy-mm-dd')+mod(rownum,31),
  9         rpad('*',100,'*')
 10  FROM t1000, t1000
 11  WHERE rownum <= 500000;

500000 rows created.

SQL> COMMIT;

SQL> BEGIN
  2    dbms_stats.gather_table_stats(
  3      ownname => user,
  4      tabname => 't',
  5      estimate_percent => 100,
  6      granularity => 'global and partition',
  7      method_opt => 'for olumns size 1 id, d, pad, n size 254',
  8      cascade => TRUE
  9    );
 10  END;
 11  /

Add a new partition and load data into it.

SQL> ALTER TABLE t
  2  ADD PARTITION t_feb_2008 VALUES LESS THAN (to_date('2008-03-01','yyyy-mm-dd'));

SQL> INSERT /*+ append */ INTO t
  2  WITH
  3    t1000 AS (SELECT rownum AS dummy
  4              FROM dual
  5              CONNECT BY level <=1000)
  6  SELECT 2000000+rownum,
  7         CASE WHEN rownum <= 500 THEN mod(rownum,12) ELSE nullif(mod(rownum,2),1) END,
  8         to_date('2008-02-01','yyyy-mm-dd')+mod(rownum,29),
  9         rpad('*',200,'*')
 10  FROM t1000, t1000;

1000000 rows created.

SQL> COMMIT;

Gather object statistics (notice that the new value for the parameter GRANULARITY is used).

SQL> BEGIN
  2    dbms_stats.gather_table_stats(
  3      ownname => user,
  4      tabname => 't',
  5      partname => 't_feb_2008',
  6      estimate_percent => 100,
  7      granularity => 'approx_global and partition',
  8      method_opt => 'for columns size 1 id, d, pad, n size 254',
  9      cascade => TRUE
 10    );
 11  END;
 12  /

Compare the current object statistics with the previous ones.

SQL> SELECT *
  2  FROM table(dbms_stats.diff_table_stats_in_history(
  3               ownname => user,
  4               tabname => 't',
  5               time1 => localtimestamp,
  6               time2 => localtimestamp-to_dsinterval('0 00:00:15'),
  7               pctthreshold => 0
  8             ));

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : T
OWNER         : OPS$CHA
SOURCE A      : Statistics as of 21-SEP-08 06.39.37.597595 PM +02:00
SOURCE B      : Statistics as of 21-SEP-08 06.39.22.597595 PM +02:00
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

T                           T   A   1500000    40091      183        500000
                                B   500000     8614       116        500000
T_FEB_2008                  P   A   1000000    31477      216        1000000
                                B   NO_STATS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

D               A   31      .032258064 NO   0       8    786C0 786C0 1500000
                B   31      .032258064 NO   0       8    786C0 786C0 500000
ID              A   500000  .000002    NO   0       6    C4020 C404  1500000
                B   500000  .000002    NO   0       6    C4020 C4023 500000
N               A   11      .000001998 YES  749500  2    80    C10C  750500
                B   11      .000001998 YES  249750  2    80    C10B  250250
PAD             A   1       1          NO   0       168  2A2A2 2A2A2 1500000
                B   1       1          NO   0       101  2A2A2 2A2A2 500000

                              PARTITION: T_FEB_2008
                              .....................

D               A   29      .034482758 NO   0       8    786C0 786C0 1000000
                B   NO_STATS
ID              A   1000000 .000001    NO   0       6    C4030 C404  1000000
                B   NO_STATS
N               A   12      .000000999 YES  499750  2    80    C10C  500250
                B   NO_STATS
PAD             A   1       1          NO   0       201  2A2A2 2A2A2 1000000
                B   NO_STATS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

INDEX / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME      TYP SRC ROWS    LEAFBLK DISTKEY LF/KY DB/KY CLF     LVL SAMPSIZ
...............................................................................


                                    INDEX: I
                                    ........

I               I   A   750500  1221    12      101   3332  39991   2   750500
                    B   250250  407     11      37    778   8565    1   250250
T_FEB_2008      P   A   500250  814     12      67    2618  31426   2   500250
                    B   NO_STATS

A few remarks about the number of distinct keys…

As documented, the number of distinct values at table level has not been updated. I do understand that aggregating them is not possible. However, IMHO, it should be quite easy to do a kind of sanity check and to set the number of distinct values at the table level according to the highest number found at partition level. For example, in the previous case, the new partition contains 1M distinct values for the column ID. Hence, the number of distinct values for that column at the table level should be at least 1M.

Interestingly, the number of distinct values at the index level is correct. The reason is quite simple… they do gather them as usual (I checked that with SQL trace). In other words, the new value for the parameter GRANULARITY is only valid for tables. This is also the case when the procedure GATHER_INDEX_STATS is used. In other words, the new value for the parameter granularity is accepted but ignored!

What really puzzle me is why such a feature has been implemented in first place. In fact, as of Oracle Database 11g, with an incremental gathering is possible to have better statistics than with this feature.

Any suggestion/thought/opinion is welcome…

Granularity 'APPROX_GLOBAL AND PARTITION'

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112