Manage extended statistics

Go back to OCM 12c Preparation Project 

Hands On Lab – Index

[1] Column group statistics
1.1 Creating and Gathering Statistics on Column Groups Automatic
1.2 Creating and Gathering Statistics on Column Groups Manually
[2] Expression statistics
[3] Automatic Column Group Detection
3.1 Automatic Column Group Detection and Dynamic Statistics
3.2 Automatic Column Group Detection and Statistics Feedback
3.3 Automatic Column Group Detection and SQL Plan Directives

Extended statistics were introduced in Oracle 11g to allow statistics to 
be gathered on groups of columns, to highlight the relationship between 
them, or on expressions. In Oracle 11gR1, extended statistics had to
be created manually for column groups. In Oracle 11gR2, the 
DBMS_STATS.SEED_COL_USAGE procedure was added, allowing Oracle to determine 
which column group statistics would be useful, based on a specified SQL Tuning 
Set or a specific monitoring period.

Oracle 12c introduced automatic column group detection, but as you will 
see when working through the sections below, the circumstances where it 
is triggered can appear less than obvious.

-> DBMS_STATS enables you to collect extended statistics, which are statistics 
that can improve cardinality -> estimates when multiple predicates exist 
on different columns of a table, or when predicates use expressions. 
-> An extension is either a column group or an expression.

Oracle Database supports the following types of extended statistics:

[1] Column group statistics

-> This type of extended statistics can improve cardinality estimates when 
multiple columns from the same table occur together in a SQL statement. 

-> An example might be a car make and model, or a city and state.

Why Column Group Statistics Are Needed: Example

The following query of the DBA_TAB_COL_STATISTICS table shows information 
about statistics that have been gathered on the columns cust_state_province 
and country_id from the sh.customers table:

COL COLUMN_NAME FORMAT a20
COL NDV FORMAT 999

SELECT COLUMN_NAME, NUM_DISTINCT AS "NDV", HISTOGRAM
FROM DBA_TAB_COL_STATISTICS
WHERE OWNER = 'SH'
AND TABLE_NAME = 'CUSTOMERS'
AND COLUMN_NAME IN ('CUST_STATE_PROVINCE', 'COUNTRY_ID');

Sample output is as follows:

COLUMN_NAME NDV HISTOGRAM
-------------------- ---------- ---------------
CUST_STATE_PROVINCE 145 FREQUENCY
COUNTRY_ID 19 FREQUENCY

As shown in the following query, 3341 customers reside in California:

SELECT COUNT(*)
FROM sh.customers 
WHERE cust_state_province = 'CA';

COUNT(*)
----------
3341

Consider an explain plan for a query of customers in the state CA and in 
the country with ID 52790 (USA):

EXPLAIN PLAN FOR
SELECT *
FROM sh.customers
WHERE cust_state_province = 'CA'
AND country_id=52790;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 3780 | 423 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CUSTOMERS | 20 | 3780 | 423 (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

1 - filter("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"=52790)


Based on the single-column statistics for the country_id and 
cust_state_province columns, the optimizer estimates that the query of 
California customers in the USA will return 20 rows.

In fact, 3341 customers reside in California, but the optimizer does not 
know that California is in the USA, and so greatly underestimates cardinality
by assuming that both predicates reduce the number of returned rows.

You can make the optimizer aware of the real-world relationship between 
values in country_id and cust_state_province by gathering column group 
statistics. These statistics enable the optimizer to give a more accurate 
cardinality estimate.

Table 13-4 DBMS_STATS Column Group Program Units

SEED_COL_USAGE
-> Iterates over the SQL statements in the specified workload, compiles 
them, and then seeds column usage information for the columns that appear 
in these statements.
-> To determine the appropriate column groups, the database must observe 
a representative workload. 
-> You do not need to run the queries themselves during the monitoring 
period. Instead, you can run EXPLAIN PLAN for some longer-running queries 
in your workload to ensure that the database is recording column group 
information for these queries.


REPORT_COL_USAGE
-> Generates a report that lists the columns that were seen in filter 
predicates, join predicates, and GROUP BY clauses in the workload.
-> You can use this function to review column usage information recorded 
for a specific table.


CREATE_EXTENDED_STATS
-> Creates extensions, which are either column groups or expressions. 
-> The database gathers statistics for the extension when either a 
user-generated or automatic statistics gathering job gathers statistics 
for the table.

-> Detecting Useful Column Groups for a Specific Workload

You can use DBMS_STATS.SEED_COL_USAGE and REPORT_COL_USAGE to determine 
which column groups are required for a table based on a specified workload. 
This technique is useful when you do not know which extended statistics 
to create. This technique does not work for expression statistics.

Note:
You can seed column usage from a SQL tuning set (see Managing SQL Tuning Sets ).

1.1 Creating and Gathering Statistics on Column Groups Automatic

====
HOL
====
Assumptions
This tutorial assumes the following:

Cardinality estimates have been incorrect for queries of the sh.customers_test 
table (created from the customers table) that use predicates referencing 
the columns country_id and cust_state_province.
You want the database to monitor your workload for 5 minutes (300 seconds).
You want the database to determine which column groups are needed automatically.

To detect column groups:

1. Connect SQL*Plus to the database as user sh, and then create the 
customers_test table and gather statistics for it:

CONNECT SH/SH
DROP TABLE customers_test;
CREATE TABLE customers_test AS SELECT * FROM customers;
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'customers_test');

2. Enable workload monitoring.

In a different SQL*Plus session, connect as SYS and run the following 
PL/SQL program to enable monitoring for 300 seconds:

BEGIN
DBMS_STATS.SEED_COL_USAGE(null,null,300);
END;
/

3. As user sh, run explain plans for two queries in the workload.

The following examples show the explain plans for two queries on the 
customers_test table:

SQL> SELECT count(*)
FROM customers_test
WHERE cust_city = 'Los Angeles'
AND cust_state_province = 'CA'
AND country_id = 52790;

COUNT(*)
----------
932

EXPLAIN PLAN FOR
SELECT *
FROM customers_test
WHERE cust_city = 'Los Angeles'
AND cust_state_province = 'CA'
AND country_id = 52790;

SELECT PLAN_TABLE_OUTPUT 
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 4115398853

----------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | TABLE ACCESS FULL| CUSTOMERS_TEST | 1 |
----------------------------------------------------

8 rows selected.

EXPLAIN PLAN FOR
SELECT country_id, cust_state_province, count(cust_city)
FROM customers_test
GROUP BY country_id, cust_state_province;

SELECT PLAN_TABLE_OUTPUT 
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));


PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 3050654408

-----------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------
| 0 | SELECT STATEMENT | | 1949 |
| 1 | HASH GROUP BY | | 1949 |
| 2 | TABLE ACCESS FULL| CUSTOMERS_TEST | 55500 |
-----------------------------------------------------

9 rows selected.

The first plan shows a cardinality of 1 row for a query that returns 932 
rows.The second plan shows a cardinality of 1949 rows for a query that 
returns 145 rows.

4. Optionally, review the column usage information recorded for the table.


SET LONG 100000
SET LINES 120
SET PAGES 0
SELECT DBMS_STATS.REPORT_COL_USAGE(user, 'customers_test')
FROM DUAL;

LEGEND:
.......

EQ : Used in single table EQuality
predicate
RANGE : Used in single table RANGE predicate
LIKE : Used in single table LIKE p
redicate
NULL : Used in single table is (not) NULL pre
dicate
EQ_JOIN : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate

FILTER : Used in single table FILTER predicate
JOIN : Used in JOIN predicate
GROUP_BY : Used in GROUP BY expression
...................................................
............................

##################################################
#############################

COLUMN USAGE REPORT FOR SH.CUSTOMERS_TEST
.........................................

1. COUNTRY_ID : EQ

2. CUST_CITY : EQ
3. CUST_STATE_PROVINCE : EQ
4. (CUST_CITY, CUST_STATE_PROVINCE,
COUNTRY_ID) : FILTER
5. (CUST_STATE_PROVINCE, COUNTRY_ID) : GR
OUP_BY
####################################################
###########################


The sets of columns in the FILTER and GROUP_BY report are candidates for 
column groups.

#->Creating Column Groups Detected During Workload Monitoring
Assumptions
This tutorial assumes that you have performed the steps in 
"Detecting Useful Column Groups for a Specific Workload".

To create column groups:

1. Create column groups for the customers_test table based on the usage 
information captured during the monitoring window.

SELECT DBMS_STATS.CREATE_EXTENDED_STATS(user, 'customers_test') FROM DUAL;

Sample output appears below:


############################################################
###################

EXTENSIONS FOR SH.CUSTOMERS_TEST
................................

1. (CUST_CITY, CUST_STATE_PROVINCE,
COUNTRY_ID) : SYS_STUMZ$C3AI
HLPBROI#SKA58H_N created
2. (CUST_STATE_PROVINCE, COUNTRY_ID) : SYS_STU#S#WF25
Z#QAHIHE#MOFFMM_ created
#######################################################
########################

The database created two column groups for customers_test: 
one column group for the filter predicate and 
one group for the GROUP BY operation.

2. Regather table statistics.

Run GATHER_TABLE_STATS to regather the statistics for customers_test:

EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'customers_test');

3. As user sh, run explain plans for two queries in the workload.

Check the USER_TAB_COL_STATISTICS view to determine which additional 
statistics were created by the database:

SELECT COLUMN_NAME, NUM_DISTINCT, HISTOGRAM
FROM USER_TAB_COL_STATISTICS
WHERE TABLE_NAME = 'CUSTOMERS_TEST'
ORDER BY 1;

............
............
SYS_STU#S#WF25Z#QAHI 145 NONE
HE#MOFFMM_

SYS_STUMZ$C3AIHLPBRO 620 HYBRID
I#SKA58H_N

This example shows the two column group names returned from the 
DBMS_STATS.CREATE_EXTENDED_STATS function. 
The column group created on CUST_CITY, CUST_STATE_PROVINCE, and COUNTRY_ID 
has a height-balanced histogram.

4. Explain the plans again.

EXPLAIN PLAN FOR
SELECT *
FROM customers_test
WHERE cust_city = 'Los Angeles'
AND cust_state_province = 'CA'
AND country_id = 52790;

SELECT PLAN_TABLE_OUTPUT 
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

----------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 1093 |
| 1 | TABLE ACCESS FULL| CUSTOMERS_TEST | 1093 |
----------------------------------------------------


EXPLAIN PLAN FOR
SELECT country_id, cust_state_province, count(cust_city)
FROM customers_test
GROUP BY country_id, cust_state_province;

SELECT PLAN_TABLE_OUTPUT 
FROM TABLE(DBMS_XPLAN.DISPLAY('plan_table', null,'basic rows'));

Plan hash value: 3050654408

-----------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------
| 0 | SELECT STATEMENT | | 145 |
| 1 | HASH GROUP BY | | 145 |
| 2 | TABLE ACCESS FULL| CUSTOMERS_TEST | 55500 |
-----------------------------------------------------

1.2 Creating and Gathering Statistics on Column Groups Manually

In some cases, you may know the column group that you want to create. 
The METHOD_OPT argument of the DBMS_STATS.GATHER_TABLE_STATS function 
can create and gather statistics on a column group automatically. 
You can create a new column group by specifying the group of columns 
using FOR COLUMNS.

Assumptions

This tutorial assumes the following:

You want to create a column group for the cust_state_province and country_id 
columns in the customers table in sh schema.
You want to gather statistics (including histograms) on the entire table 
and the new column group.

To create a column group and gather statistics for this group:

1. Start SQL*Plus and connect to the database as the sh user.

2. Create the column group and gather statistics.

BEGIN
DBMS_STATS.GATHER_TABLE_STATS( 'sh','customers',
METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY ' ||
'FOR COLUMNS SIZE SKEWONLY (cust_state_province,country_id)' );
END;
/

#-> Displaying Column Group Information

-> To obtain the name of a column group, 
use the DBMS_STATS.SHOW_EXTENDED_STATS_NAME function or a database view. 
-> You can also use views to obtain information such as the number of 
distinct values, and whether the column group has a histogram.

Assumptions

This tutorial assumes the following:

You created a column group for the cust_state_province and country_id columns 
in the customers table in sh schema.
You want to determine the column group name, the number of distinct values, 
and whether a histogram has been created for a column group.

To monitor a column group:

1. Start SQL*Plus and connect to the database as the sh user.
2. To determine the column group name, do one of the following.

SELECT SYS.DBMS_STATS.SHOW_EXTENDED_STATS_NAME( 'sh','customers',
'(cust_state_province,country_id)' ) col_group_name 
FROM DUAL;

COL_GROUP_NAME
----------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_

SELECT EXTENSION_NAME, EXTENSION 
FROM USER_STAT_EXTENSIONS 
WHERE TABLE_NAME='CUSTOMERS';

EXTENSION_NAME EXTENSION
-----------------------------------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ ("CUST_STATE_PROVINCE","COUNTRY_ID")

3. Query the number of distinct values and find whether a histogram has 
been created for a column group.

SELECT e.EXTENSION col_group, t.NUM_DISTINCT, t.HISTOGRAM
FROM USER_STAT_EXTENSIONS e, USER_TAB_COL_STATISTICS t
WHERE e.EXTENSION_NAME=t.COLUMN_NAME
AND e.TABLE_NAME=t.TABLE_NAME
AND t.TABLE_NAME='CUSTOMERS';

COL_GROUP NUM_DISTINCT HISTOGRAM
-------------------------------------------------------------------
("COUNTRY_ID","CUST_STATE_PROVINCE") 145 FREQUENCY


#-> Dropping a Column Group

Assumptions

This tutorial assumes the following:

You created a column group for the cust_state_province and country_id columns 
in the customers table in sh schema. You want to drop the column group.

To drop a column group:

1. Start SQL*Plus and connect to the database as the sh user.

2. Drop the column group.

BEGIN
DBMS_STATS.DROP_EXTENDED_STATS( 'sh', 'customers', 
'(cust_state_province, country_id)' );
END;
/

[2] Expression statistics

When Expression Statistics Are Useful: Example

The following query of the sh.customers table shows that 3341 customers 
are in the state of California:

sys@PROD> SELECT COUNT(*) FROM sh.customers WHERE cust_state_province='CA';

COUNT(*)
----------
3341

Consider the plan for the same query with the LOWER() function applied:

EXPLAIN PLAN FOR
SELECT * FROM sh.customers WHERE LOWER(cust_state_province)='ca';


select * from table(dbms_xplan.display);

Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 555 | 108K| 423 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CUSTOMERS | 555 | 108K| 423 (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(LOWER("CUST_STATE_PROVINCE")='ca')

13 rows selected.


Because no expression statistics exist for LOWER(cust_state_province)='ca', 
the optimizer estimate is significantly off. You can use DBMS_STATS procedures 
to correct these estimates.

Creating Expression Statistics
You can use DBMS_STATS to create statistics for a user-specified expression.

You can use either of the following program units:

GATHER_TABLE_STATS procedure
CREATE_EXTENDED_STATISTICS function followed by the GATHER_TABLE_STATS procedure

Assumptions
This tutorial assumes the following:

Selectivity estimates are inaccurate for queries of sh.customers that use 
the UPPER(cust_state_province) function.
You want to gather statistics on the UPPER(cust_state_province) expression.

To create expression statistics:

1. Start SQL*Plus and connect to the database as the sh user.
2. Gather table statistics.

BEGIN
DBMS_STATS.GATHER_TABLE_STATS( 
'sh', 'customers', method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY ' || 
'FOR COLUMNS (LOWER(cust_state_province)) SIZE SKEWONLY' 
);
END;
/

Displaying Expression Statistics

You can use the database view DBA_STAT_EXTENSIONS and the 
DBMS_STATS.SHOW_EXTENDED_STATS_NAME function to obtain information about 
expression statistics. You can also use views to obtain information such 
as the number of distinct values, and whether the column group has a histogram.

DBMS_STATS.
FUNCTION SHOW_EXTENDED_STATS_NAME RETURNS VARCHAR2
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
OWNNAME VARCHAR2 IN
TABNAME VARCHAR2 IN
EXTENSION VARCHAR2 IN


SELECT SYS.DBMS_STATS.SHOW_EXTENDED_STATS_NAME( 'sh','customers',EXTENSION) 
from USER_STAT_EXTENSIONS 
WHERE TABLE_NAME='CUSTOMERS';

SYS_STUBPHJSBRKOIK9O2YV3W8HOUE

Assumptions
This tutorial assumes the following:

You created extended statistics for the LOWER(cust_state_province) expression.
You want to determine the column group name, the number of distinct values, 
and whether a histogram has been created for a column group.

To monitor expression statistics:

1. Start SQL*Plus and connect to the database as the sh user.
2. Query the name and definition of the statistics extension.

COL EXTENSION_NAME FORMAT a30
COL EXTENSION FORMAT a35

SELECT EXTENSION_NAME, EXTENSION
FROM USER_STAT_EXTENSIONS
WHERE TABLE_NAME='CUSTOMERS';

EXTENSION_NAME EXTENSION
------------------------------ ------------------------------
SYS_STUBPHJSBRKOIK9O2YV3W8HOUE (LOWER("CUST_STATE_PROVINCE"))

3. Query the number of distinct values and find whether a histogram has 
been created for the expression.

SELECT e.EXTENSION expression, t.NUM_DISTINCT, t.HISTOGRAM
FROM USER_STAT_EXTENSIONS e, USER_TAB_COL_STATISTICS t
WHERE e.EXTENSION_NAME=t.COLUMN_NAME
AND e.TABLE_NAME=t.TABLE_NAME
AND t.TABLE_NAME='CUSTOMERS';

EXPRESSION NUM_DISTINCT HISTOGRAM
-------------------------------------------------------------------
(LOWER("CUST_STATE_PROVINCE")) 145 FREQUENCY


#-> Dropping Expression Statistics

Use the DBMS_STATS.DROP_EXTENDED_STATS function to delete a column group from a table.

Assumptions
This tutorial assumes the following:

You created extended statistics for the LOWER(cust_state_province) expression.
You want to drop the expression statistics.

To drop expression statistics:

1. Start SQL*Plus and connect to the database as the sh user.

2. Drop the column group.

BEGIN
DBMS_STATS.DROP_EXTENDED_STATS(
'sh'
, 'customers'
, '(LOWER(cust_state_province))'
);
END;
/

Note:
You cannot create extended statistics on virtual columns.


[3] Automatic Column Group Detection

Oracle 12c introduced automatic column group detection, but as you will 
see when working through the sections below, the circumstances where it is 
triggered can appear less than obvious.

#Setup

$ sqlplus sys/oracle_4U@pdb2 as sysdba

SQL> ALTER SYSTEM FLUSH SHARED_POOL;

System altered.

SQL> grant dba to test identified by test ;

Grant succeeded.

SQL> alter user test default tablespace test quota unlimited on test;

User altered.

SQL> connect test/test@pdb2
Connected.
SQL>
SQL> show user
USER is "TEST"

drop table tab1 purge;

test table contains columns to indicate gender and the presence of a Y 
chromosome and the presence of a beard. These columns have check constraints 
to tell the optimizer about the allowable values, as well as indexes on the 
columns.

CREATE TABLE tab1 (
id NUMBER,
name VARCHAR2(20),
gender VARCHAR2(1),
y_chromosome VARCHAR2(1),
beard VARCHAR2(1),
CONSTRAINT tab1_pk PRIMARY KEY (id),
CONSTRAINT tab1_gender_chk CHECK (gender IN ('M', 'F')),
CONSTRAINT tab1_has_y_chromosome_chk CHECK (y_chromosome IN ('Y', 'N')),
CONSTRAINT tab1_has_beard_chk CHECK (beard IN ('Y', 'N'))
);

INSERT INTO tab1
SELECT level,
DBMS_RANDOM.string('L',TRUNC(DBMS_RANDOM.value(10,16))) AS name,
CASE
WHEN MOD(rownum, 2) = 0 THEN 'M'
ELSE 'F'
END AS gender,
CASE
WHEN MOD(rownum, 2) = 0 THEN 'Y'
ELSE 'N'
END AS y_chromosome,
CASE
WHEN MOD(rownum, 2) = 0 THEN DECODE(TRUNC(DBMS_RANDOM.value(1,3)), 1, 'Y', 'N')
ELSE 'N'
END AS beard
FROM dual
CONNECT BY level <= 10000;
COMMIT;

SELECT gender, y_chromosome, beard, COUNT(*) AS total
FROM tab1
GROUP BY gender, y_chromosome, beard
ORDER BY 1,2,3; 2 3 4

G Y B TOTAL
- - - ----------
F N N 5000
M Y N 2535
M Y Y 2465


We know that genetically speaking, all males have a Y chromosome and all 
females do not. Our data shows approximately 50% of the men have beards 
and none of the women do. The optimizer is unaware of the relationships 
between these columns, so it has to estimate based on each column having 
two possible allowable values.

Assuming the data is spread evenly amongst the allowable values it would 
expect half of the rows to be marked as male, half to be marked as having 
a Y chromosome and half to me marked as having a beard. If there were no 
relationship between the columns, it would expect 10000/2/2/2=1250 
rows to be marked as female, not having a Y chromosome and not having a 
beard. The following sections use a that query pulls out all the rows 
matching this criteria and displays the execution plan used to return the 
data, including the expected and actual cardinalities.

3.1 Automatic Column Group Detection and Dynamic Statistics

Note: Make sure you have repeated the setup before starting this test.


CONN test/test@pdb1

SELECT count (*)
FROM tab1
WHERE gender = 'F'
AND y_chromosome = 'N'
AND beard = 'N';


SQL> SET LINESIZE 200 PAGESIZE 100
SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
SQL_ID 577pbb7hm1j8x, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM tab1 WHERE gender
= 'F' AND y_chromosome = 'N' AND beard = 'N'

Plan hash value: 2211052296

------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5000 |00:00:00.01 | 401 |
|* 1 | TABLE ACCESS FULL| TAB1 | 1 | 5834 | 5000 |00:00:00.01 | 401 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("GENDER"='F' AND "Y_CHROMOSOME"='N' AND "BEARD"='N'))

Note
-----
- dynamic statistics used: dynamic sampling (level=2)


23 rows selected.

A comparison of the expected and actual rows show the cardinality estimate 
was accurate. That lack of adequate statistics meant dynamic statistics 
were necessary to generate the optimum execution plan.


Notice the query is not reoptimizable.

COLUMN sql_text FORMAT A40
COLUMN is_reoptimizable FORMAT A16

SELECT sql_text, is_reoptimizable
FROM v$sql
WHERE sql_id = '577pbb7hm1j8x';

SQL_TEXT IS_REOPTIMIZABLE
---------------------------------------- ----------------
SELECT /*+ GATHER_PLAN_STATISTICS */ N
* FROM tab1 WHERE gender = 'F' AN
D y_chromosome = 'N' AND beard = '
N'

Using the DBMS_STATS.REPORT_COL_USAGE function, we can see the basic column 
usage has been detected, but there are no FILTER predicates in the reports, 
which we would expect if a column group had been detected.

SET LONG 100000
SET LINES 120
SET PAGES 50

SELECT DBMS_STATS.report_col_usage('TEST', 'TAB1')
FROM dual;

DBMS_STATS.REPORT_COL_USAGE('TEST','TAB1')
--------------------------------------------------------------------------------
LEGEND:
.......

EQ : Used in single table EQuality
predicate
RANGE : Used in single table RANGE predicate
LIKE : Used in single table LIKE p
redicate
NULL : Used in single table is (not) NULL pre
dicate
EQ_JOIN : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate

FILTER : Used in single table FILTER predicate
JOIN : Used in JOIN predicate
GROUP_BY : Used in GROUP BY expression
...................................................
............................

##################################################
#############################

COLUMN USAGE REPORT FOR TEST.TAB1
.................................

1. BEARD :
EQ
2. GENDER : EQ
3. Y_CHROMOSOME : EQ
#############################################
##################################


There was no cardinality misestimate, so no SQL plan directives were created.

SQL> EXEC DBMS_SPD.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SET LINESIZE 200

COLUMN dir_id FORMAT A20
COLUMN owner FORMAT A10
COLUMN object_name FORMAT A10
COLUMN col_name FORMAT A12

SELECT TO_CHAR(d.directive_id) dir_id, o.owner, o.object_name, 
o.subobject_name col_name, o.object_type, d.type, d.state, d.reason
FROM dba_sql_plan_directives d, dba_sql_plan_dir_objects o
WHERE d.directive_id=o.directive_id
AND o.owner = 'TEST'
ORDER BY 1,2,3,4,5;

no rows selected


As expected, running the test query a second time results in no change in 
the execution plan.

SELECT /*+ GATHER_PLAN_STATISTICS */
*
FROM tab1
WHERE gender = 'F'
AND y_chromosome = 'N'
AND beard = 'N';


SET LINESIZE 200 PAGESIZE 100
SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last'));

More importantly, it doesn't alter the column group detection.

SET LONG 100000
SET LINES 120
SET PAGES 50

SELECT DBMS_STATS.report_col_usage('TEST', 'TAB1')
FROM dual;

3.2 Automatic Column Group Detection and Statistics Feedback

CONN test/test@pdb2

EXEC DBMS_STATS.gather_table_stats(USER, 'TAB1');

SELECT /*+ GATHER_PLAN_STATISTICS */
*
FROM tab1
WHERE gender = 'F'
AND y_chromosome = 'N'
AND beard = 'N';

SET LINESIZE 200 PAGESIZE 100
SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID 577pbb7hm1j8x, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM tab1 WHERE gender
= 'F' AND y_chromosome = 'N' AND beard = 'N'

Plan hash value: 2211052296

------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5000 |00:00:00.01 | 404 |
|* 1 | TABLE ACCESS FULL| TAB1 | 1 | 1250 | 5000 |00:00:00.01 | 404 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("GENDER"='F' AND "Y_CHROMOSOME"='N' AND "BEARD"='N'))

The presence of statistics means dynamic statistics weren't used and as such, 
the optimizer was unaware of the relationship between the columns and estimated 
the cardinality incorrectly.


Notice the query is now marked as reoptimizable.

SQL> COLUMN sql_text FORMAT A40
COLUMN is_reoptimizable FORMAT A16

SELECT sql_text, is_reoptimizable
FROM v$sql
WHERE sql_id = '577pbb7hm1j8x';

SQL_TEXT IS_REOPTIMIZABLE
---------------------------------------- ----------------
SELECT /*+ GATHER_PLAN_STATISTICS */ Y
* FROM tab1 WHERE gender = 'F' AN
D y_chromosome = 'N' AND beard = '
N'

Using the DBMS_STATS.REPORT_COL_USAGE function, we can see the basic column 
usage has been detected, but there are no FILTER predicates in the reports,
which we would expect if a column group had been detected.

SET LONG 100000
SET LINES 120
SET PAGES 50

SELECT DBMS_STATS.report_col_usage('TEST', 'TAB1')
FROM dual;

If we run the test query again, we can see a more accurate cardinality has 
been estimated because statistics feedback has been used to adjust the estimate.

SELECT /*+ GATHER_PLAN_STATISTICS */
*
FROM tab1
WHERE gender = 'F'
AND y_chromosome = 'N'
AND beard = 'N';

SET LINESIZE 200 PAGESIZE 100
SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID 577pbb7hm1j8x, child number 1
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM tab1 WHERE gender
= 'F' AND y_chromosome = 'N' AND beard = 'N'

Plan hash value: 2211052296

------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5000 |00:00:00.01 | 404 |
|* 1 | TABLE ACCESS FULL| TAB1 | 1 | 5000 | 5000 |00:00:00.01 | 404 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("GENDER"='F' AND "Y_CHROMOSOME"='N' AND "BEARD"='N'))

Note
-----
- statistics feedback used for this statement


23 rows selected.


Using the DBMS_STATS.REPORT_COL_USAGE function again, there is still no 
sign of the FILTER predicate in the report.

SET LONG 100000
SET LINES 120
SET PAGES 50

SELECT DBMS_STATS.report_col_usage('TEST', 'TAB1')
FROM dual;


We can see the query is no longer reoptimizable.


COLUMN sql_text FORMAT A40
COLUMN is_reoptimizable FORMAT A16

SELECT sql_text, is_reoptimizable
FROM v$sql
WHERE sql_id = '577pbb7hm1j8x';SQL> SQL> SQL> 2 3

SQL_TEXT IS_REOPTIMIZABLE
---------------------------------------- ----------------
SELECT /*+ GATHER_PLAN_STATISTICS */ Y
* FROM tab1 WHERE gender = 'F' AN
D y_chromosome = 'N' AND beard = '
N'

SELECT /*+ GATHER_PLAN_STATISTICS */ N
* FROM tab1 WHERE gender = 'F' AN
D y_chromosome = 'N' AND beard = '
N'


1.3 Automatic Column Group Detection and SQL Plan Directives

We are effectively repeated the previous test here, but we will check the 
SQL plan directives along the way. Gather statistics and run the following 
test query.

CONN test/test@pdb2

EXEC DBMS_STATS.gather_table_stats(USER, 'TAB1');

SELECT /*+ GATHER_PLAN_STATISTICS */
*
FROM tab1
WHERE gender = 'F'
AND y_chromosome = 'N'
AND beard = 'N';


SQL> set linesize 200 pagesize 100
SQL> select * from table(dbms_xplan.display_cursor(format => 'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID 577pbb7hm1j8x, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM tab1 WHERE gender
= 'F' AND y_chromosome = 'N' AND beard = 'N'

Plan hash value: 2211052296

------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5000 |00:00:00.01 | 404 |
|* 1 | TABLE ACCESS FULL| TAB1 | 1 | 1250 | 5000 |00:00:00.01 | 404 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("GENDER"='F' AND "Y_CHROMOSOME"='N' AND "BEARD"='N'))


19 rows selected.

The presence of statistics means dynamic statistics weren't used and as such, 
the optimizer was unaware of the relationship between the columns and estimated 
the cardinality incorrectly.


Using the DBMS_STATS.REPORT_COL_USAGE function, we can see the basic column 
usage has been detected, but there are no FILTER predicates in the reports, 
which we would expect if a column group had been detected.


SET LONG 100000
SET LINES 120
SET PAGES 50

SELECT DBMS_STATS.report_col_usage('TEST', 'TAB1')
FROM dual;


Notice the query is now marked as reoptimizable.


SQL> COLUMN sql_text FORMAT A40
COLUMN is_reoptimizable FORMAT A16

SELECT sql_text, is_reoptimizable
FROM v$sql
WHERE sql_id = '577pbb7hm1j8x';

SQL_TEXT IS_REOPTIMIZABLE
---------------------------------------- ----------------
SELECT /*+ GATHER_PLAN_STATISTICS */ Y
* FROM tab1 WHERE gender = 'F' AN
D y_chromosome = 'N' AND beard = '
N'

Let's check to see if any SQL plan directives were created as a result 
of the previous cardinality misestimates. Remember to flush them first, 
or they might not be visible.


EXEC DBMS_SPD.flush_sql_plan_directive;


SQL> SET LINESIZE 200

COLUMN dir_id FORMAT A20
COLUMN owner FORMAT A10
COLUMN object_name FORMAT A10
COLUMN col_name FORMAT A12

SELECT TO_CHAR(d.directive_id) dir_id, o.owner, o.object_name,
o.subobject_name col_name, o.object_type, d.type, d.state, d.reason
FROM dba_sql_plan_directives d, dba_sql_plan_dir_objects o
WHERE d.directive_id=o.directive_id
AND o.owner = 'TEST'
ORDER BY 1,2,3,4,5;

DIR_ID OWNER OBJECT_NAM COL_NAME OBJECT TYPE STATE REASON
-------------------- ---------- ---------- ------------ ------ ---------------- ---------- ------------------------------------
10532985814814528861 TEST TAB1 BEARD COLUMN DYNAMIC_SAMPLING USABLE SINGLE TABLE CARDINALITY MISESTIMATE
10532985814814528861 TEST TAB1 GENDER COLUMN DYNAMIC_SAMPLING USABLE SINGLE TABLE CARDINALITY MISESTIMATE
10532985814814528861 TEST TAB1 Y_CHROMOSOME COLUMN DYNAMIC_SAMPLING USABLE SINGLE TABLE CARDINALITY MISESTIMATE
10532985814814528861 TEST TAB1 TABLE DYNAMIC_SAMPLING USABLE SINGLE TABLE CARDINALITY MISESTIMATE


So we now have SQL plan directives. Run the test query again.

SELECT /*+ GATHER_PLAN_STATISTICS */
*
FROM tab1
WHERE gender = 'F'
AND y_chromosome = 'N'
AND beard = 'N';


SQL> SET LINESIZE 200 PAGESIZE 100
SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format => 'allstats last'));SQL>

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID 577pbb7hm1j8x, child number 1
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM tab1 WHERE gender
= 'F' AND y_chromosome = 'N' AND beard = 'N'

Plan hash value: 2211052296

------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5000 |00:00:00.01 | 404 |
|* 1 | TABLE ACCESS FULL| TAB1 | 1 | 5000 | 5000 |00:00:00.01 | 404 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("GENDER"='F' AND "Y_CHROMOSOME"='N' AND "BEARD"='N'))

Note
-----
- statistics feedback used for this statement


23 rows selected.

The statistics feedback was still used in preference to the SQL plan directives 
and once again, the query is no longer reoptimizable.


COLUMN sql_text FORMAT A40
COLUMN is_reoptimizable FORMAT A16

SELECT sql_text, is_reoptimizable
FROM v$sql
WHERE sql_id = '577pbb7hm1j8x';

SQL_TEXT IS_REOPTIMIZABLE
---------------------------------------- ----------------
SELECT /*+ GATHER_PLAN_STATISTICS */ Y
* FROM tab1 WHERE gender = 'F' AN
D y_chromosome = 'N' AND beard = '
N'

SELECT /*+ GATHER_PLAN_STATISTICS */ N
* FROM tab1 WHERE gender = 'F' AN
D y_chromosome = 'N' AND beard = '
N'


Using the DBMS_STATS.REPORT_COL_USAGE function again, finally we can see 
the FILTER predicate in the report.

SET LONG 100000
SET LINES 120
SET PAGES 50

SELECT DBMS_STATS.report_col_usage('TEST', 'TAB1')
FROM dual;


DBMS_STATS.REPORT_COL_USAGE('TEST','TAB1')
--------------------------------------------------------------------------------
LEGEND:
.......

EQ : Used in single table EQuality
predicate
RANGE : Used in single table RANGE predicate
LIKE : Used in single table LIKE p
redicate
NULL : Used in single table is (not) NULL pre
dicate
EQ_JOIN : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate

FILTER : Used in single table FILTER predicate
JOIN : Used in JOIN predicate
GROUP_BY : Used in GROUP BY expression
...................................................
............................

##################################################
#############################

COLUMN USAGE REPORT FOR TEST.TAB1
.................................

1. BEARD :
EQ
2. GENDER : EQ
3. Y_CHROMOSOME : EQ
4. (GENDER, Y_CHROMOSOME, BEARD) : FILTER
########################################################
#######################


At this point, there are still no column group statistics available.

COLUMN extension_name FORMAT A30
COLUMN extension FORMAT A45

SELECT extension_name, extension
FROM dba_stat_extensions
WHERE table_name = 'TAB1';


Once we gather default statistics, the column group statistics are created.

EXEC DBMS_STATS.gather_table_stats('TEST', 'TAB1');

COLUMN extension_name FORMAT A30
COLUMN extension FORMAT A45

SELECT extension_name, extension
FROM dba_stat_extensions
WHERE table_name = 'TAB1';

EXTENSION_NAME EXTENSION
------------------------------ ---------------------------------------------
SYS_STSJ_NI$1E_EDU917#V63HO2OA ("GENDER","Y_CHROMOSOME","BEARD")

We can see the resulting statistics in the USER_TAB_COL_STATISTICS view.


COLUMN column_name FORMAT A30

SELECT column_name, num_distinct, num_nulls, histogram
FROM user_tab_col_statistics
WHERE table_name = 'TAB1'
ORDER BY column_name;

COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ---------------
BEARD 2 0 FREQUENCY
GENDER 2 0 FREQUENCY
ID 10000 0 NONE
NAME 10000 0 NONE
SYS_STSJ_NI$1E_EDU917#V63HO2OA 3 0 FREQUENCY
Y_CHROMOSOME 2 0 FREQUENCY

#Conclusion
The column group is only automatically detected during a hard parse or a 
reoptimization of a statement in the presence of persisted SQL plan directives. 
The SQL plan directive doesn't have to be used in the generation 
of the final execution plan, for this to happen.

For a query that we would expect to trigger automatic column group detection, 
we seem to have at least three possible scenarios that result in different 
behaviour.

1. If the lack of good statistics force the use of dynamic statistics directly, 
no automatic column group detection will be performed. At some point, the 
statistics will be refreshed and one of the other scenarios will then be available.

2. If a cardinality misestimate results in both statistics feedback and 
SQL plan directives being produced, the statistics feedback takes priority 
during the reoptimization the next time the query is run. 
If the query is run a second time before the SQL plan directives are persisted, 
no automatic column group detection will be triggered during the reoptimization 
using statistics feedback. The next hard parse after the SQL plan directives 
are persisted will result in automatic column group detection, but depending on how 
long the statement stays in the shared pool, this could be quite a delay.

3. If SQL plan directives are persisted before the query is run a second time, 
automatic column groups detection is triggered during the reoptimization, 
whether the SQL plan directives or statistics feedback are used for 
the reoptimization of the execution plan.

It is only after the column group is detected that default statistics 
gathering will be able to create the column group statistics.

======================
Path to Documentation:
======================
Documentation : Performance
SQL Tuning Guide
13 Managing Optimizer Statistics: Advanced Topics
->Managing Extended Statistics

13 Managing Optimizer Statistics: Advanced Topics
-> Controlling Dynamic Statistics
-> Managing SQL Plan Directives

Thank you for visiting this blog 🙂