Sunday, June 17, 2018

HISTOGRAMS

Nice article explained about Histograms-

HISTOGRAMS


1.What Are Histograms?
Histograms are additional, “special” column statistics.
These statistics have information about the data distribution in the column.
The values of the column that has histograms are sorted into buckets. These buckets are statistic buckets, as the data itself on the disk is not sorted.
Based on the number of distinct values in the column, the database will pick one of the four types of histograms (as of 12c and up):
-Frequency Histograms
-Top Frequency Histograms
-Height-Balanced Histograms (legacy)
-Hybrid Histograms
These histogram types, will be discussed in a future blog post.
2.Why Do We Need Histograms?

Without histograms, the optimizer assumes the data is distributed uniformly in the column. How would that look like?
Lets take for example table A that has 100 rows, with 4 distinct values for the column color:
red, blue, green, yellow.
Without histograms, the optimizer assumes that there are 25 rows with color red, 25 rows with color blue, 25 rows with color green, 25 rows with color yellow.
Why is this a bad assumption?
What if there is only one row with color red, 1 row with color blue, and 1 row with color green, and then there are 97 rows of color yellow.
In this case the data distribution is not uniform. We can say that the data is skewed. If the data is skewed, then the optimizer might (and most likely will) generate inaccurate cardinality estimates, which can lead to sub-optimal execution plans.
Histograms could make the difference in the optimizer picking an execution plan with a full table scan or an execution plan with an index scan.
3.When Are Histograms Created?
If you gather stats on a table using dbms_stats, with the METHOD_OPT set to SIZE AUTO (which is the default), then the database will create histograms automatically if needed.
How does the database know that histograms are needed?
After you gathered stats on the table, and ran some select statements, the following dictionary table: SYS.COL_USAGE$ is updated, with information about previous 

predicates used in queries.
Then you gather stats again. Now the database will check SYS.COL_USAGE$, to see which columns will need histograms. If needed, it will gather histograms.
If you enjoyed this article, and would like to learn more about databases, please sign up below, and you will receive



No comments: