Nice article explained about Histograms-
HISTOGRAMS
HISTOGRAMS
1.What Are Histograms?
Histograms are
additional, “special” column statistics.
These statistics have information about the data distribution in the column.
The values of the column that has histograms are sorted into buckets. These buckets are statistic buckets, as the data itself on the disk is not sorted.
These statistics have information about the data distribution in the column.
The values of the column that has histograms are sorted into buckets. These buckets are statistic buckets, as the data itself on the disk is not sorted.
Based on the number of
distinct values in the column, the database will pick one of the four types of
histograms (as of 12c and up):
-Frequency Histograms
-Top Frequency Histograms
-Height-Balanced Histograms (legacy)
-Hybrid Histograms
-Top Frequency Histograms
-Height-Balanced Histograms (legacy)
-Hybrid Histograms
These histogram types,
will be discussed in a future blog post.
2.Why Do We Need Histograms?
Without histograms,
the optimizer assumes the data is distributed uniformly in the column. How
would that look like?
Lets take for example
table A that has 100 rows, with 4 distinct values for the column color:
red, blue, green, yellow.
red, blue, green, yellow.
Without histograms,
the optimizer assumes that there are 25 rows with color red, 25 rows with color
blue, 25 rows with color green, 25 rows with color yellow.
Why is this a bad
assumption?
What if there is only
one row with color red, 1 row with color blue, and 1 row with color green, and
then there are 97 rows of color yellow.
In this case the data
distribution is not uniform. We can say that the data is skewed. If the data is
skewed, then the optimizer might (and most likely will) generate inaccurate
cardinality estimates, which can lead to sub-optimal execution plans.
Histograms could make
the difference in the optimizer picking an execution plan with a full table
scan or an execution plan with an index scan.
3.When Are Histograms Created?
If you gather stats on
a table using dbms_stats, with the METHOD_OPT set to SIZE AUTO (which is the
default), then the database will create histograms automatically if needed.
How does the database know that histograms are needed?
How does the database know that histograms are needed?
After you gathered stats on
the table, and ran some select statements, the following dictionary
table: SYS.COL_USAGE$ is updated, with information about
previous
predicates used in
queries.
Then you gather stats again. Now the database will check SYS.COL_USAGE$, to see which columns will need histograms. If needed, it will gather histograms.
Then you gather stats again. Now the database will check SYS.COL_USAGE$, to see which columns will need histograms. If needed, it will gather histograms.
If you enjoyed this
article, and would like to learn more about databases, please sign up below,
and you will receive