The patchset 11.1.0.7 introduces in the package DBMS_STATS a new value for the parameter GRANULARITY.
The description provided by the development team in $ORACLE_HOME/rdbms/admin/dbmsstat.sql is the following:
'APPROX_GLOBAL AND PARTITION' - This option is similar to 'GLOBAL AND PARTITION'. But the global statistics are aggregated from partition level statistics. It will aggregate all statistics except number of distinct values for columns and number of distinct keys of indexes. The existing histograms of the columns at the table level are also aggregated.The global statistics are gathered (i.e., going back to GLOBAL AND PARTITION behaviour) if partname argument is null or if the aggregation cannot be done e.g., statistics of one of the partitions is missing.
This option is useful when you collect statistics for a new partition added into a range partitioned table (for example, a table partitioned by month). The new data in the partition makes the global statistics stale (especially the min/max values of the partitioning column). This stale global statistics may cause suboptimal plans. In this scenario, users can collect statistics for the newly added partition with 'APPROX_GLOBAL AND PARTITION' option so that the global statistics will reflect the newly added range. This option will take less time than 'GLOBAL AND PARTITION' option since the global statistics are aggregated from underlying partition level statistics. Note that, if you are using APPROX_GLOBAL AND PARTITION, you still need to collect global statistics (with granularity = 'GLOBAL' option) when there is substantial amount of change at the table level. For example you added 10% more data to the table. This is needed to get the correct number of distinct values/keys statistic at table level.
To illustrate how it works, let’s have a look to an example…
- Create a range-partitioned table with a local index, load some data and gather object statistics.
SQL> CREATE TABLE t ( 2 id NUMBER, 3 n NUMBER, 4 d DATE, 5 pad VARCHAR2(4000) 6 ) 7 PARTITION BY RANGE (d) ( 8 PARTITION t_jan_2008 VALUES LESS THAN (to_date('2008-02-01','yyyy-mm-dd')) 9 ) 10 NOLOGGING; SQL> CREATE INDEX i ON t(n) LOCAL; SQL> INSERT /*+ append */ INTO t 2 WITH 3 t1000 AS (SELECT rownum AS dummy 4 FROM dual 5 CONNECT BY level <=1000) 6 SELECT 1000000+rownum, 7 CASE WHEN rownum <= 500 THEN mod(rownum,11) ELSE nullif(mod(rownum,2),1) END, 8 to_date('2008-01-01','yyyy-mm-dd')+mod(rownum,31), 9 rpad('*',100,'*') 10 FROM t1000, t1000 11 WHERE rownum <= 500000; 500000 rows created. SQL> COMMIT; SQL> BEGIN 2 dbms_stats.gather_table_stats( 3 ownname => user, 4 tabname => 't', 5 estimate_percent => 100, 6 granularity => 'global and partition', 7 method_opt => 'for olumns size 1 id, d, pad, n size 254', 8 cascade => TRUE 9 ); 10 END; 11 /
- Add a new partition and load data into it.
SQL> ALTER TABLE t 2 ADD PARTITION t_feb_2008 VALUES LESS THAN (to_date('2008-03-01','yyyy-mm-dd')); SQL> INSERT /*+ append */ INTO t 2 WITH 3 t1000 AS (SELECT rownum AS dummy 4 FROM dual 5 CONNECT BY level <=1000) 6 SELECT 2000000+rownum, 7 CASE WHEN rownum <= 500 THEN mod(rownum,12) ELSE nullif(mod(rownum,2),1) END, 8 to_date('2008-02-01','yyyy-mm-dd')+mod(rownum,29), 9 rpad('*',200,'*') 10 FROM t1000, t1000; 1000000 rows created. SQL> COMMIT;
- Gather object statistics (notice that the new value for the parameter GRANULARITY is used).
SQL> BEGIN 2 dbms_stats.gather_table_stats( 3 ownname => user, 4 tabname => 't', 5 partname => 't_feb_2008', 6 estimate_percent => 100, 7 granularity => 'approx_global and partition', 8 method_opt => 'for columns size 1 id, d, pad, n size 254', 9 cascade => TRUE 10 ); 11 END; 12 /
- Compare the current object statistics with the previous ones.
SQL> SELECT * 2 FROM table(dbms_stats.diff_table_stats_in_history( 3 ownname => user, 4 tabname => 't', 5 time1 => localtimestamp, 6 time2 => localtimestamp-to_dsinterval('0 00:00:15'), 7 pctthreshold => 0 8 )); STATISTICS DIFFERENCE REPORT FOR: ................................. TABLE : T OWNER : OPS$CHA SOURCE A : Statistics as of 21-SEP-08 06.39.37.597595 PM +02:00 SOURCE B : Statistics as of 21-SEP-08 06.39.22.597595 PM +02:00 PCTTHRESHOLD : 0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ TABLE / (SUB)PARTITION STATISTICS DIFFERENCE: ............................................. OBJECTNAME TYP SRC ROWS BLOCKS ROWLEN SAMPSIZE ............................................................................... T T A 1500000 40091 183 500000 B 500000 8614 116 500000 T_FEB_2008 P A 1000000 31477 216 1000000 B NO_STATS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ COLUMN STATISTICS DIFFERENCE: ............................. COLUMN_NAME SRC NDV DENSITY HIST NULLS LEN MIN MAX SAMPSIZ ............................................................................... D A 31 .032258064 NO 0 8 786C0 786C0 1500000 B 31 .032258064 NO 0 8 786C0 786C0 500000 ID A 500000 .000002 NO 0 6 C4020 C404 1500000 B 500000 .000002 NO 0 6 C4020 C4023 500000 N A 11 .000001998 YES 749500 2 80 C10C 750500 B 11 .000001998 YES 249750 2 80 C10B 250250 PAD A 1 1 NO 0 168 2A2A2 2A2A2 1500000 B 1 1 NO 0 101 2A2A2 2A2A2 500000 PARTITION: T_FEB_2008 ..................... D A 29 .034482758 NO 0 8 786C0 786C0 1000000 B NO_STATS ID A 1000000 .000001 NO 0 6 C4030 C404 1000000 B NO_STATS N A 12 .000000999 YES 499750 2 80 C10C 500250 B NO_STATS PAD A 1 1 NO 0 201 2A2A2 2A2A2 1000000 B NO_STATS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INDEX / (SUB)PARTITION STATISTICS DIFFERENCE: ............................................. OBJECTNAME TYP SRC ROWS LEAFBLK DISTKEY LF/KY DB/KY CLF LVL SAMPSIZ ............................................................................... INDEX: I ........ I I A 750500 1221 12 101 3332 39991 2 750500 B 250250 407 11 37 778 8565 1 250250 T_FEB_2008 P A 500250 814 12 67 2618 31426 2 500250 B NO_STATS
A few remarks about the number of distinct keys…
As documented, the number of distinct values at table level has not been updated. I do understand that aggregating them is not possible. However, IMHO, it should be quite easy to do a kind of sanity check and to set the number of distinct values at the table level according to the highest number found at partition level. For example, in the previous case, the new partition contains 1M distinct values for the column ID. Hence, the number of distinct values for that column at the table level should be at least 1M.
Interestingly, the number of distinct values at the index level is correct. The reason is quite simple… they do gather them as usual (I checked that with SQL trace). In other words, the new value for the parameter GRANULARITY is only valid for tables. This is also the case when the procedure GATHER_INDEX_STATS is used. In other words, the new value for the parameter granularity is accepted but ignored!
What really puzzle me is why such a feature has been implemented in first place. In fact, as of Oracle Database 11g, with an incremental gathering is possible to have better statistics than with this feature.
Any suggestion/thought/opinion is welcome…