Architecture The three Exadata-specific features that deliver the biggest performance gains. Smart Scan — how it bypasses the buffer cache and why that matters. Storage Indexes — how Exadata avoids I/O entirely for certain queries. Hybrid Columnar Compression (HCC) — how Exadata achieves 10x to 50x compression and when to use each compression tier. Includes the SQL to check if Smart Scan is happening. Exadata Smart Scan, Storage Indexes, and HCC — The Three Features That Change Everything | punitoracledba

Oracle Exadata Series — Basics to Pro · Article 4 of 10 · punitoracledba.blogspot.com

Exadata Smart Scan, Storage Indexes, and HCC — The Three Features That Change Everything

Exadata — Basics to Pro Series 1. What Is Exadata · 2. Hardware Components · 3. Architecture Deep Dive · 4. Smart Scan, Storage Indexes, HCC · 5. Monitoring · 6. Performance Tuning · 7. Administration · 8. Patching · 9. EBS on Exadata · 10. OCI Exadata

If you ask an experienced Exadata DBA to name the three features that deliver the most real-world performance benefit, the answer is always the same: Smart Scan, Storage Indexes, and Hybrid Columnar Compression. Together, these three features can reduce the I/O a query performs by over 99% compared to traditional storage.

This article explains how each one works internally, what conditions must be met for them to activate, how they interact with each other, and how to verify using SQL that they are actually doing their job. This is the article to bookmark when you need to explain Exadata performance to a developer or architect.

These three features work together as a stack. Storage Indexes eliminate I/O before disk is read. Smart Scan filters what comes off disk before it crosses the network. HCC reduces what is stored on disk in the first place. They compound — the benefit of all three together is greater than the sum of each individually.

Feature 1 — Smart Scan

Smart Scan is Exadata's SQL offload capability. Instead of reading raw database blocks from storage and filtering them at the database server, Smart Scan pushes the filtering work down to the storage cell — so only matching rows and requested columns ever travel across the RoCE network to the database server.

Article 3 covered the mechanics in full. This article focuses on the activation conditions, what disables Smart Scan, and how to verify it is working.

The four conditions required for Smart Scan

Smart Scan does not activate automatically for every query. Four conditions must all be true simultaneously:

Direct Path Read must be used. Smart Scan is invoked from within Oracle's direct path read code path — not from the buffer cache read path. If Oracle reads from the buffer cache, it never calls the Smart Scan function (kcfis_read). Direct path reads bypass the buffer cache and read data into PGA memory instead.
The object must be stored on Exadata storage. The ASM disk group containing the object must have cell.smart_scan_capable = TRUE. Objects on non-Exadata storage cannot use Smart Scan — the storage layer has no cellsrv to process the iDB request.
The operation must be a full scan. Smart Scan activates for full table scans, fast full index scans, and fast full bitmap index scans. Single-block lookups via index range scans do not qualify — they use conventional I/O via the buffer cache and are not candidates for offloading.
CELL_OFFLOAD_PROCESSING must be TRUE. This database initialisation parameter controls whether Smart Scan processing is enabled. It defaults to TRUE on Exadata. It must not have been explicitly disabled at the instance, session, or statement level.

The most common reason Smart Scan does not activate in practice is that the query is serial and the segment is small enough that Oracle decides to use buffer cache reads instead of direct path reads. Parallel queries automatically use direct path reads and therefore automatically qualify for Smart Scan. For serial queries, Oracle makes a dynamic decision at runtime based on segment size relative to buffer cache size.

When Direct Path Read activates for serial queries

For parallel queries — always. For serial queries — Oracle makes a dynamic decision at runtime for each segment separately, based on:

The size of the segment being scanned — larger segments are more likely to trigger direct path reads
The current buffer cache size and how much of the segment is already cached
The _small_table_threshold hidden parameter — segments below approximately 5x this value may use buffer cache reads
Whether the table has been recently accessed and is warm in the cache

This dynamic decision can cause inconsistent Smart Scan behaviour for the same query run at different times. We cover the diagnosis and resolution of this in detail in Article 6.

What Smart Scan does once activated

Smart Scan Operation	What Is Eliminated	Where It Happens
Predicate filtering	Rows that do not match the WHERE clause — never sent to the DB node	Storage cell CPUs via CELLOFLSRVn
Column projection	Columns not in the SELECT list or JOIN conditions — never transmitted	Storage cell CPUs via CELLOFLSRVn
Bloom filter join offload	Rows that cannot survive a join with a dimension table — eliminated at the cell	Storage cell CPUs via CELLOFLSRVn
HCC decompression	HCC data is decompressed at the cell before filtering — only matching rows sent	Storage cell CPUs via CELLOFLSRVn

Smart Scan and the buffer cache

Smart Scan results are not stored in the buffer cache. Because cellsrv returns row sets rather than raw block images, the result does not conform to Oracle's block format and cannot be cached. This is intentional — Smart Scan is designed for large analytic scans where caching the entire dataset would overwhelm the buffer cache. The data flows directly into PGA memory for assembly and is returned to the client without polluting the buffer cache with cold data.

Verify Smart Scan is active for your session

-- Check Smart Scan statistics for the current session
SELECT name, value
FROM   v$mystat ms
       JOIN v$statname sn ON ms.statistic# = sn.statistic#
WHERE  sn.name LIKE 'cell%'
AND    value > 0
ORDER  BY sn.name;

-- Key statistics to look for:
-- cell physical IO interconnect bytes             : total bytes sent between cell and DB node
-- cell physical IO bytes eligible for predicate offload : bytes that went through Smart Scan
-- cell physical IO interconnect bytes returned by smart scan : bytes actually returned after filtering
-- cell scans                                      : number of Smart Scan operations performed

-- The efficiency ratio:
-- bytes returned by smart scan / bytes eligible for predicate offload
-- A ratio of 0.01 (1%) means 99% I/O reduction via Smart Scan

Check Smart Scan activity for a specific SQL_ID

-- Check Smart Scan offload for a specific SQL statement
SELECT sql_id,
       io_cell_offload_eligible_bytes   / 1024 / 1024 / 1024 AS eligible_gb,
       io_interconnect_bytes            / 1024 / 1024 / 1024 AS interconnect_gb,
       io_cell_offload_returned_bytes   / 1024 / 1024 / 1024 AS returned_gb,
       ROUND(
         (1 - io_cell_offload_returned_bytes /
              NULLIF(io_cell_offload_eligible_bytes,0)) * 100, 2
       ) AS pct_saved
FROM   v$sql
WHERE  sql_id = '&your_sql_id'
ORDER  BY last_active_time DESC
FETCH  FIRST 5 ROWS ONLY;

-- pct_saved approaching 99% = excellent Smart Scan efficiency
-- pct_saved = 0 = Smart Scan not activating for this query

Check execution plan for Smart Scan indicator -- After running a query, check its plan from cursor cache SELECT * FROM TABLE( DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'ALLSTATS LAST') ); -- In the plan output, Smart Scan shows as: -- TABLE ACCESS STORAGE FULL -- "STORAGE" indicates Smart Scan offload -- Without Smart Scan it would show: -- TABLE ACCESS FULL -- no "STORAGE" keyword

Disable and enable Smart Scan at session level for testing -- Disable Smart Scan for this session only (testing/diagnosis) ALTER SESSION SET cell_offload_processing = FALSE; -- Re-enable Smart Scan ALTER SESSION SET cell_offload_processing = TRUE; -- Force direct path read for serial queries (testing only) -- Not recommended for production without Oracle guidance ALTER SESSION SET "_serial_direct_read" = ALWAYS;

Feature 2 — Storage Indexes

Storage Indexes are an in-memory optimisation that happens before Smart Scan. While Smart Scan reduces what crosses the network after reading disk, Storage Indexes eliminate disk reads entirely. They represent the fastest possible I/O — no I/O at all.

What a Storage Index is

A Storage Index is an in-memory data structure maintained automatically on each storage cell. It tracks the minimum and maximum values of up to eight columns for each storage region — a 1 MB contiguous area of disk on the storage cell called a storage region.

When cellsrv receives a Smart Scan request with a predicate, it first consults the Storage Index for each storage region before reading any data from disk. If the Storage Index shows that the min/max range of the predicate column in a given region cannot possibly satisfy the WHERE clause, that entire region is skipped — no disk I/O, no flash read, nothing.

A concrete example

Consider a table with a column ORDER_DATE and a query with WHERE ORDER_DATE = DATE '2026-04-21'.

Storage region 1 contains ORDER_DATE values from 01-Jan-2023 to 31-Dec-2023. The Storage Index records min=01-Jan-2023, max=31-Dec-2023. Since 21-Apr-2026 is outside this range, region 1 is skipped entirely — zero disk I/O.
Storage region 2 contains ORDER_DATE values from 01-Jan-2026 to 30-Jun-2026. The Storage Index records min=01-Jan-2026, max=30-Jun-2026. Since 21-Apr-2026 falls within this range, region 2 must be read. Smart Scan then filters it further.
Storage region 3 contains ORDER_DATE values from 01-Jul-2026 to 31-Dec-2026. Min=01-Jul-2026 is already past 21-Apr-2026, so region 3 is skipped — zero disk I/O.

Storage Indexes are most effective when data has natural clustering — when similar values tend to be stored in adjacent storage regions. Data loaded in ORDER_DATE sequence, or partitioned by date, will have excellent Storage Index effectiveness. Randomly distributed data provides less benefit because min/max ranges overlap across regions and fewer regions can be eliminated.

How Storage Indexes are built and maintained

Built automatically by Exadata System Software — no DBA action required, no DDL, no maintenance
Maintained in the storage cell's DRAM — separate from XRMEM, part of cellsrv's own memory
Updated continuously as data is written to each storage region — always current
Tracks up to 8 columns per storage region — selected automatically based on observed query predicates
Rebuilt automatically if the cell restarts — they are regenerated from the data patterns observed after restart
Cannot be created, dropped, or explicitly managed by the DBA — they are entirely automatic

Verifying Storage Index savings

Check Storage Index savings from V$SYSSTAT -- Check storage index bytes saved for current session SELECT name, value / 1024 / 1024 / 1024 AS gb_saved FROM v$mystat ms JOIN v$statname sn ON ms.statistic# = sn.statistic# WHERE sn.name = 'cell physical IO bytes saved by storage index' AND value > 0; -- System-wide storage index savings SELECT name, value / 1024 / 1024 / 1024 AS total_gb_saved FROM v$sysstat WHERE name = 'cell physical IO bytes saved by storage index'; -- From AWR: compare eligible bytes vs storage index savings -- High ratio = Storage Indexes are very effective for this workload

Check Storage Index effectiveness from cellcli on a storage cell # SSH to any storage cell ssh root@cell01 cellcli # Check storage index statistics CellCLI> LIST METRICCURRENT WHERE name LIKE 'SI_%' # SI_NUM_COLUMN_COLUMNS : number of columns tracked by storage indexes # SI_STORAGE_SAVED : bytes saved by storage index eliminations

Storage Indexes vs traditional database indexes

Factor	Traditional Index	Exadata Storage Index
Creation	Manual — DBA runs CREATE INDEX	Automatic — built by Exadata System Software
Maintenance	Manual — rebuild, analyse, monitor fragmentation	Fully automatic — no DBA involvement
Storage cost	Significant — index segments use database storage	Zero — stored in cell DRAM, no disk storage cost
DML overhead	Every INSERT, UPDATE, DELETE updates the index	No DML overhead — maintained asynchronously
Granularity	Row level — points to specific rows	Storage region level (1 MB) — eliminates entire regions
Best for	OLTP lookups — returns specific rows via index range scan	Analytics — eliminates large I/O regions during full scans
Works with Smart Scan	Index range scans bypass Smart Scan entirely	Works before Smart Scan — eliminates regions before Smart Scan reads them

Feature 3 — Hybrid Columnar Compression (HCC)

Hybrid Columnar Compression is an Exadata-exclusive compression technology that enables dramatic reductions in storage for large datasets. Storage savings typically range from 5x to 20x, with an industry-leading average of 10x. HCC is not available on standard Oracle storage — it requires Exadata storage cells.

How HCC works internally — Compression Units

Standard Oracle compression stores data in row format — each row is stored together, one row after another. HCC uses a fundamentally different approach based on Compression Units (CUs).

A Compression Unit is a logical storage structure larger than a database block — CU size ranges from 32 KB to 256 KB depending on the compression tier. Within a CU, HCC groups data by column rather than by row. All values from column 1 for a set of rows are stored together, then all values from column 2, and so on.

Because similar data types and values are grouped together within each column, compression algorithms achieve dramatically higher ratios than row-based storage. A column of account status codes containing mostly 'ACTIVE' compresses to almost nothing when stored together. In row format, those same values are interspersed with dozens of other column values and compress much less efficiently.

The four HCC compression tiers

HCC provides four compression tiers — two Query tiers and two Archive tiers. Each tier trades compression ratio against query performance and DML support.

Query Low

~6x

Optimised for faster load times. Lower compression than Query High but faster data ingestion. Use when load time SLA is more critical than storage savings.

Query High (default)

~10x

Default tier. Up to 10x storage reduction. Best balance of compression and query performance for data warehouse workloads. Recommended starting point.

Archive Low

~15x

Higher compression than Query tiers. Query performance lower than Query tiers. Use for data accessed infrequently — historical partitions, compliance archives.

Archive High

~50x

Maximum compression. Highest decompression overhead. Use only for rarely accessed cold data — regulatory archives, long-term storage. Not for query workloads.

Tier	Syntax	Compression	Query Speed	Best Use
Query Low	`COMPRESS FOR QUERY LOW`	~6x	Fastest	Active data warehouse, frequent reads, fast load requirement
Query High	`COMPRESS FOR QUERY HIGH`	~10x	Fast	Default choice for data warehouse tables — best compression/performance balance
Archive Low	`COMPRESS FOR ARCHIVE LOW`	~15x	Moderate	Historical data — queried occasionally, older partitions
Archive High	`COMPRESS FOR ARCHIVE HIGH`	~50x	Slow	Cold archives — regulatory compliance, accessed very rarely

HCC and DML — the important limitation

HCC is best suited for data with no or very limited DML operations. This is the most important constraint to communicate to application teams.

The reason is the Compression Unit locking model. HCC uses one lock per CU — not one lock per row. When an UPDATE or DELETE targets a single row inside a CU, the entire CU is migrated to row format (decompressed) to allow the DML. This can cause significant space and performance problems for tables with frequent DML.

Bulk loads using direct path INSERT — fully supported, excellent performance
INSERT ... SELECT — supported with direct path from Oracle Database 12.2 onwards
SELECT queries including Smart Scan — fully supported, HCC decompression happens at the cell
Frequent UPDATE or DELETE — not recommended. Use Advanced Row Compression instead.
Infrequent single-row updates — acceptable but each update decompresses the affected CU

HCC and Smart Scan — how they work together

HCC and Smart Scan are designed to work together on Exadata. When a Smart Scan reads HCC-compressed data, the decompression happens on the storage cell's CPUs — not on the database server. The storage cell decompresses the CU, applies predicate filtering, projects the requested columns, and returns only the matching uncompressed rows.

The result is a compounding effect: HCC reduces the number of blocks on disk (fewer blocks to read), Smart Scan filters what comes off those blocks (less data across the network), and Storage Indexes eliminate entire regions before either feature operates (some blocks never read at all).

For data analytics workloads, Exadata Smart Flash Cache implements an additional optimisation called Exadata Columnar Flash Cache (CELLMEMORY). As HCC-compressed data is loaded into Smart Flash Cache, Exadata automatically transforms it into a pure columnar format. Smart Scans on this columnar flash data run faster because they read only the selected columns — further reducing flash I/Os and storage server CPU consumption.

How to create HCC-compressed tables and partitions

Create a table with HCC compression -- Create a new table with HCC Query High (default — recommended) CREATE TABLE sales_history ( sale_id NUMBER, customer_id NUMBER, sale_date DATE, amount NUMBER, status VARCHAR2(20) ) COMPRESS FOR QUERY HIGH; -- Create with Archive High for cold data CREATE TABLE audit_log_archive ( log_id NUMBER, log_date DATE, action VARCHAR2(100) ) COMPRESS FOR ARCHIVE HIGH; -- Compress an existing table (rebuilds the table) ALTER TABLE sales_history MOVE COMPRESS FOR QUERY HIGH; -- Add HCC to a new partition on a partitioned table ALTER TABLE sales ADD PARTITION p_2025 VALUES LESS THAN (DATE '2026-01-01') COMPRESS FOR QUERY HIGH; -- Change compression tier for an existing partition ALTER TABLE sales MOVE PARTITION p_2020 COMPRESS FOR ARCHIVE HIGH;

Check current compression on tables and partitions -- Check table-level compression SELECT table_name, compression, compress_for FROM dba_tables WHERE owner = 'APPS' AND compression = 'ENABLED' ORDER BY table_name; -- Check partition-level compression SELECT table_name, partition_name, compression, compress_for FROM dba_tab_partitions WHERE table_owner = 'APPS' ORDER BY table_name, partition_position; -- Estimate compression savings before applying EXEC DBMS_COMPRESSION.GET_COMPRESSION_RATIO( scratchtbsname => 'TEMP', ownname => 'APPS', tabname => 'SALES_HISTORY', partname => NULL, comptype => DBMS_COMPRESSION.COMP_FOR_QUERY_HIGH, blkcnt_cmp => :blk_cmp, blkcnt_uncmp => :blk_uncmp, row_cmp => :row_cmp, row_uncmp => :row_uncmp, cmp_ratio => :ratio, comptype_str => :type_str );

How All Three Features Work Together

The real power of Exadata is not any single feature — it is the combination. Consider the same 1 TB query from Article 3, now with all three features active.

Stage	Feature	What Happens	I/O Remaining
Before disk	HCC	Data compressed 10x — 1 TB of logical data stored in 100 GB on disk. Fewer blocks to read.	100 GB on disk (was 1 TB)
Before disk read	Storage Index	70% of storage regions eliminated because ORDER_DATE min/max cannot satisfy WHERE clause. Zero disk reads for those regions.	30 GB must be read from disk
After disk read	Smart Scan	Cell decompresses the 30 GB, applies predicate filtering (only 2% match), projects only requested columns. Only matching rows cross the network.	~600 MB crosses network
At DB node	—	DB node assembles result from all cells. Final aggregation if needed. Returns result to user.	Result set returned

Without any Exadata features: 1 TB crosses the storage network. With all three features combined: approximately 600 MB crosses the network — a reduction of over 99.9%. The query that took hours on traditional storage can complete in minutes or seconds.

Complete Smart Scan Verification Script

Use this script before and after a query to measure exactly how much work Exadata saved.

Full Smart Scan measurement — run before and after your query -- Step 1: Capture baseline statistics before your query CREATE TABLE ss_before AS SELECT name, value FROM v$mystat ms JOIN v$statname sn ON ms.statistic# = sn.statistic# WHERE sn.name IN ( 'cell scans', 'cell physical IO interconnect bytes', 'cell physical IO bytes eligible for predicate offload', 'cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index', 'physical read total bytes' ); -- Step 2: Run your query here -- Step 3: Compare after vs before SELECT after.name, after.value - before.value AS delta FROM (SELECT name, value FROM v$mystat ms JOIN v$statname sn ON ms.statistic# = sn.statistic# WHERE sn.name IN ( 'cell scans', 'cell physical IO interconnect bytes', 'cell physical IO bytes eligible for predicate offload', 'cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index', 'physical read total bytes' )) after JOIN ss_before before ON after.name = before.name WHERE after.value - before.value > 0 ORDER BY after.name; -- Key ratios to calculate from the output: -- -- Smart Scan efficiency: -- returned_by_smart_scan / eligible_for_predicate_offload -- Lower = better. 0.01 means 99% reduction. -- -- Storage Index efficiency: -- bytes_saved_by_storage_index / eligible_for_predicate_offload -- Higher = better. Shows what Storage Index eliminated before disk read. -- -- Overall I/O reduction: -- (physical_read_total - interconnect_bytes) / physical_read_total -- Shows total I/O saving across both features.

Summary — The Three Features in One View

Feature	When It Operates	What It Eliminates	Managed By
Storage Index	Before disk read — during Smart Scan I/O planning	Entire 1 MB storage regions where predicates cannot match	Automatic — no DBA action
Smart Scan	During disk read — when Direct Path Read + Full Scan conditions are met	Non-matching rows and unrequested columns before network transmission	Automatic — requires CELL_OFFLOAD_PROCESSING=TRUE
HCC	Always — data stored in compressed format, decompressed at cell during Smart Scan	Disk space and I/O — 6x to 50x fewer blocks to read in the first place	DBA configures at table/partition level — automatic after that

Smart Scan requires Direct Path Read, a full scan operation, Exadata storage, and CELL_OFFLOAD_PROCESSING=TRUE
Storage Indexes are fully automatic — track min/max per storage region for up to 8 columns — no creation or maintenance by the DBA
HCC has four tiers — Query Low (~6x), Query High (~10x, default), Archive Low (~15x), Archive High (~50x)
HCC is for data with minimal DML — frequent updates should use Advanced Row Compression instead
All three features compound — HCC reduces disk I/O, Storage Indexes eliminate regions, Smart Scan filters network I/O
Verify with cell physical IO bytes eligible for predicate offload vs cell physical IO interconnect bytes returned by smart scan in V$SYSSTAT or V$MYSTAT

Database Consultant (Passionate about Database & Cloud Technologies)

Friday, May 22, 2026