Oracle Exadata Smart Scan & Storage Index — What Every DBA Must Know
A practical DBA guide to Smart Scan, Storage Indexes, verification queries, and real-world Exadata tuning.
Oracle Exadata Smart Scan & Storage Index — What Every DBA Must Know
Introduction
You bought Exadata. Management expects magic. Queries are still slow.
Sound familiar?
Exadata is not a "plug it in and queries fly" machine. The hardware is extraordinary, but the performance gains — the ones that justify the price tag — come from two specific features: Smart Scan and Storage Indexes. If these aren't firing, you're running a very expensive conventional Oracle database.
I've worked as an Exadata DMA (Database Machine Administrator) across multiple clients. This article covers what Smart Scan and Storage Index actually are, exactly when they fire (and when they don't), how to verify them in production, and the tuning moves that make the difference.
What is Smart Scan?
Smart Scan is Exadata's most important feature. Instead of shipping raw data blocks from storage cells to the database server (the traditional I/O model), Smart Scan pushes SQL processing down to the storage layer.
Traditional Oracle I/O vs Smart Scan
TRADITIONAL MODEL:
┌─────────────────────┐ ┌──────────────────────┐
│ Database Server │ │ Storage (SAN/NAS) │
│ │ │ │
│ SQL Engine │◀──blocks──│ Raw data blocks │
│ (does all work) │ │ (no intelligence) │
└─────────────────────┘ └──────────────────────┘
Network: moves EVERYTHING — even rows that will be filtered out
EXADATA SMART SCAN:
┌─────────────────────┐ ┌──────────────────────┐
│ Database Server │ │ Exadata Cell │
│ │ │ │
│ SQL Engine │◀──results─│ Cell Offload Engine │
│ (receives results) │ │ - Applies WHERE │
│ │ │ - Projects columns │
│ │ │ - Evaluates joins │
└─────────────────────┘ └──────────────────────┘
Network: moves ONLY qualifying rows and projected columns
The Exadata cell (storage node) has its own CPU running the Cell Offload Engine. It applies your WHERE clause predicates, column projection, and even join filtering before any data leaves storage. The database server receives results, not raw blocks.
What Smart Scan Offloads
- Row filtering —
WHERE salary > 100000applied at storage, only matching rows returned - Column projection —
SELECT name, deptonly returns those two columns, not the full row - JOIN filtering — Bloom filters pushed to storage for hash joins
- Aggregation — Some
GROUP BYandCOUNToperations - Storage Index evaluation — See next section
What is a Storage Index?
A Storage Index is an in-memory structure that lives on each Exadata storage cell. It is not a traditional B-tree or bitmap index — it doesn't live in your tablespace, has no segment, and consumes no disk space.
Each storage cell maintains a Storage Index for every 1 MB of data on disk (called a Storage Region). For each region, the Storage Index records:
Storage Region (1 MB of data)
┌────────────────────────────────────────────────┐
│ Column: SALARY │
│ MIN value in this region: 45,000 │
│ MAX value in this region: 92,000 │
│ Column: DEPARTMENT_ID │
│ MIN value in this region: 10 │
│ MAX value in this region: 80 │
│ Column: HIRE_DATE │
│ MIN value in this region: 01-JAN-2010 │
│ MAX value in this region: 31-DEC-2015 │
└────────────────────────────────────────────────┘
When your query has WHERE SALARY > 150000, the cell checks its Storage Index:
- If the region's MAX SALARY is 92,000, the entire 1 MB region is skipped — no I/O at all
- If the region's MAX is 180,000 (could contain matching rows), the region is read
This is pure I/O elimination — the most powerful type of performance gain.
Storage Index Properties
- Automatically maintained — Oracle builds and updates it as data is written and read
- In-memory only — Lives in cell RAM, not persisted to disk (rebuilt after cell restart)
- Per-column, per-region — Each 1 MB storage region has MIN/MAX for up to 8 columns
- Tracks NULLs — Knows if a region has no NULLs (enables further optimization)
- No DBA action required to create — But DBA actions can prevent it from working
When Does Smart Scan Fire? (The Conditions)
This is where most DBAs get confused. Smart Scan does not fire on every query. It has strict prerequisites.
Condition 1: Full Table Scan or Full Index Scan
-- Smart Scan CAN fire (full scan)
SELECT * FROM sales WHERE region = 'WEST';
-- Smart Scan CANNOT fire (index lookup, single block I/O)
SELECT * FROM sales WHERE sale_id = 12345;
Smart Scan requires multi-block I/O (db file scattered read or direct path read). Single-block I/O from index lookups bypasses Smart Scan entirely.
Condition 2: Direct Path Read
The scan must use direct path reads (bypassing the buffer cache). This happens automatically when:
- The table is larger than
_small_table_threshold(typically ~2% of buffer cache) - Parallel query is used
- Serial full scans on large tables (11g R2+ auto-decides)
-- Force direct path read for testing
ALTER SESSION SET "_serial_direct_read" = TRUE;
Condition 3: Exadata Storage (Cells Must Serve the Data)
Smart Scan only works for data stored on Exadata cells. Data in:
- ASM diskgroups on non-Exadata storage → No Smart Scan
- Locally-attached disks on non-cell nodes → No Smart Scan
- Exadata Smart Flash Cache → Smart Scan still applies
Condition 4: No Incompatible Operations in the Query
Certain operations prevent Smart Scan offload even on Exadata:
| Operation | Smart Scan? | Why |
|---|---|---|
| Full table scan | ✅ YES | Multi-block direct path |
| Parallel full scan | ✅ YES | Direct path |
| Full index scan | ✅ YES (in some cases) | Multi-block |
| Index range scan | ❌ NO | Single-block I/O |
ROWID access |
❌ NO | Single-block I/O |
| Encrypted columns (TDE) | ❌ NO | Cell cannot decrypt |
| Object types / XMLType | ❌ NO | Complex types |
SELECT FOR UPDATE |
❌ NO | Locks prevent offload |
| Dictionary tables | ❌ NO | Small, always cached |
When Does Storage Index Fire?
Storage Index is even more selective. It fires when:
- Smart Scan is already active (prerequisite)
- The
WHEREclause predicate is on a column tracked by the Storage Index - The predicate allows MIN/MAX elimination (range, equality, IN list)
Predicates That Work with Storage Index
-- These predicates enable Storage Index elimination
WHERE salary > 100000 -- range scan, MIN/MAX comparison
WHERE hire_date < DATE '2015-01-01' -- date range
WHERE department_id = 50 -- equality (if not in range)
WHERE region IN ('EAST','WEST') -- IN list (checks each value)
WHERE status IS NULL -- NULL check (if region has no NULLs)
Predicates That DON'T Work with Storage Index
-- Functions defeat Storage Index
WHERE UPPER(last_name) = 'SMITH' -- function wrapping
WHERE TRUNC(hire_date) = SYSDATE-30 -- function on column
WHERE salary * 1.1 > 100000 -- arithmetic on column
-- LIKE with leading wildcard
WHERE product_name LIKE '%WIDGET%' -- can't use MIN/MAX
-- Columns not in the Storage Index (tracked up to 8 per region)
WHERE rarely_queried_col = 'X' -- may not be tracked
Key rule: Never wrap a filterable column in a function if you want Storage Index (or any index) to work.
How to Verify Smart Scan is Working
Method 1: V$SQL Statistics
-- Check Smart Scan stats for a specific SQL
SELECT
s.sql_id,
s.sql_text,
s.executions,
s.io_cell_offload_eligible_bytes / 1024 / 1024 / 1024 AS offload_eligible_gb,
s.io_cell_offload_returned_bytes / 1024 / 1024 / 1024 AS returned_gb,
ROUND(
(1 - s.io_cell_offload_returned_bytes /
NULLIF(s.io_cell_offload_eligible_bytes, 0)) * 100, 2
) AS offload_efficiency_pct,
s.io_interconnect_bytes / 1024 / 1024 AS interconnect_mb
FROM v$sql s
WHERE sql_id = '&your_sql_id'
AND s.io_cell_offload_eligible_bytes > 0;
What to look for:
offload_eligible_gb— how much data was eligible for Smart Scanreturned_gb— how much came back to the DB serveroffload_efficiency_pct— aim for > 90% for analytical queries- If
offload_eligible_bytes = 0→ Smart Scan is NOT firing
Method 2: Session-Level Statistics (For Testing)
-- Enable extended stats collection
ALTER SESSION SET STATISTICS_LEVEL = ALL;
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
-- Run your query
SELECT COUNT(*) FROM sales WHERE region = 'WEST' AND amount > 10000;
-- Check session stats
SELECT n.name, s.value
FROM v$mystat s
JOIN v$statname n ON n.statistic# = s.statistic#
WHERE n.name IN (
'cell physical IO bytes eligible for predicate offload',
'cell physical IO bytes saved by storage index',
'cell physical IO bytes sent directly to DB node to balance CPU',
'cell IO uncompressed bytes',
'cell physical IO interconnect bytes'
)
AND s.value > 0
ORDER BY n.name;
The most important stat: cell physical IO bytes saved by storage index
- If this is > 0, Storage Index is eliminating I/O
- Compare to
cell physical IO bytes eligible for predicate offloadto see the ratio
Method 3: Execution Plan — Look for These Keywords
EXPLAIN PLAN FOR
SELECT /*+ FULL(s) */ COUNT(*) FROM sales s WHERE region = 'WEST';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(FORMAT => 'ALL'));
Look for in the Notes section:
NOTE
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of target master table
- cell offloading enabled for this statement ← Smart Scan IS firing
- storage index used for this statement ← Storage Index IS firing
Method 4: AWR — Top Exadata Cell Stats
-- Find top SQL by offload efficiency from AWR
SELECT
sql_id,
ROUND(SUM(io_cell_offload_eligible_bytes)/1073741824, 2) AS eligible_gb,
ROUND(SUM(io_cell_offload_returned_bytes)/1073741824, 2) AS returned_gb,
ROUND((1 - SUM(io_cell_offload_returned_bytes)/
NULLIF(SUM(io_cell_offload_eligible_bytes),0))*100, 1) AS efficiency_pct
FROM dba_hist_sqlstat
WHERE snap_id BETWEEN &begin_snap AND &end_snap
AND io_cell_offload_eligible_bytes > 0
GROUP BY sql_id
ORDER BY eligible_gb DESC
FETCH FIRST 20 ROWS ONLY;
Method 5: CellCLI on Storage Cell (DMA Access Required)
If you have access to the storage cells via cellcli:
# SSH to a cell node
ssh root@cel01
# Check Smart Scan statistics
cellcli -e "list metriccurrent where name like 'CD_IO%' or name like 'CS_IO%'"
# Key metrics to check:
# CD_IO_BY_R_SM — bytes read by Smart Scan
# CD_IO_BY_R_SM_INCR — increment (last interval)
# CS_IO_BY_SS — Smart Scan bytes per second
# Check Storage Index effectiveness
cellcli -e "list metriccurrent where name like 'CS_SINDEX%'"
# CS_SINDEX_SCANS — number of Storage Index scans
# CS_SINDEX_PASSES — regions passed (read)
# CS_SINDEX_SKIPPPED — regions skipped (eliminated!) ← this is the money metric
CS_SINDEX_SKIPPED is your proof of Storage Index working. A high ratio of skipped to passed = Storage Index is highly effective.
Real-World Tuning Scenarios
Scenario 1: Smart Scan Not Firing — Small Table Issue
Symptom: io_cell_offload_eligible_bytes = 0 for a query on a 50 GB table.
Cause: Table was recently queried and blocks are in the buffer cache. Serial queries on tables whose blocks are cached use buffer cache reads (not direct path), which bypasses Smart Scan.
Fix:
-- Force direct path for this session
ALTER SESSION SET "_serial_direct_read" = TRUE;
-- Or use parallel query (always uses direct path)
SELECT /*+ PARALLEL(t, 4) */ COUNT(*) FROM big_table t WHERE ...;
-- Or flush buffer cache (non-prod only!)
ALTER SYSTEM FLUSH BUFFER_CACHE;
Production approach: For ETL/reporting queries that should always use Smart Scan, use parallel hints or set at session level in the reporting connection.
Scenario 2: Low Offload Efficiency — Functions Killing Storage Index
Symptom: offload_efficiency_pct is only 15% on a query that filters heavily.
Problem Query:
SELECT * FROM orders
WHERE TRUNC(order_date) = TRUNC(SYSDATE) - 30
AND UPPER(status) = 'COMPLETED';
Why it fails: TRUNC(order_date) and UPPER(status) wrap columns in functions. Cell Offload Engine cannot use Storage Index for these. It reads all blocks and filters at the cell CPU level (still offloaded but no storage index elimination).
Fix:
-- Rewrite to use range predicate on the column directly
SELECT * FROM orders
WHERE order_date >= TRUNC(SYSDATE) - 30
AND order_date < TRUNC(SYSDATE) - 29
AND status = 'COMPLETED'; -- assuming data is stored uppercase
Now Storage Index can eliminate entire 1 MB regions where order_date range doesn't include the target date.
Scenario 3: Storage Index Not Effective — High Cardinality Data That's Not Clustered
Symptom: CS_SINDEX_SKIPPED is near zero even though you have a range predicate.
Cause: Data is randomly distributed across storage. If salary values are randomly scattered across all blocks, every 1 MB region has a MIN near 30,000 and MAX near 500,000. No region can be skipped.
Fix: Reorganize data to be clustered by the filter column.
-- Option 1: Use DBMS_REDEFINITION to reorganize by the most-filtered column
BEGIN
DBMS_REDEFINITION.START_REDEF_TABLE(
uname => 'HR',
orig_table => 'EMPLOYEES',
int_table => 'EMPLOYEES_INT',
col_mapping => NULL,
options_flag => DBMS_REDEFINITION.CONS_USE_ROWID,
orderby_cols => 'SALARY' -- ← reorganize by filter column
);
END;
/
-- Option 2: Create interim table with ORDER BY and exchange partition
CREATE TABLE employees_sorted AS
SELECT * FROM employees ORDER BY salary;
-- Option 3: Use Exadata Hybrid Columnar Compression (HCC) with QUERY LOW
-- HCC inherently sorts during compression, improving Storage Index effectiveness
ALTER TABLE fact_sales MOVE COLUMN STORE COMPRESS FOR QUERY LOW;
Scenario 4: Smart Scan Disabled by TDE
Symptom: Smart Scan stats show eligible bytes but near-zero offload efficiency. Encryption is enabled.
Cause: Transparent Data Encryption (TDE) at the column or tablespace level prevents the cell from reading the data (cells don't have the encryption key). Blocks must be sent to the DB server for decryption first.
Options:
- Accept the limitation if TDE is a compliance requirement
- Evaluate Oracle Key Vault + Cell-side decryption (available in newer Exadata software)
- Use HCC (which compresses before encryption) to reduce data volume even without Smart Scan
Scenario 5: Parallel Query Degrading Performance
Symptom: Parallel query with 16 DOP is slower than serial query. Smart Scan shows high interconnect bytes.
Cause: Parallel slaves are fighting for interconnect bandwidth. Smart Scan sends results back via the interconnect — if you have 16 slaves all streaming data simultaneously, interconnect becomes the bottleneck.
Fix:
-- Check interconnect utilization
SELECT n.name, s.value / 1024 / 1024 AS value_mb
FROM v$mystat s JOIN v$statname n ON n.statistic# = s.statistic#
WHERE n.name = 'cell physical IO interconnect bytes';
-- Tune DOP — more is not always better
-- For a 4-cell Exadata, DOP = 4 * num_cores_per_cell is a starting point
-- Use Resource Manager to cap DOP per consumer group
-- Or use STATEMENT_QUEUING to serialize heavy queries
ALTER SESSION SET PARALLEL_DEGREE_POLICY = AUTO;
Monitoring Smart Scan in Production — Daily Checks
Quick Health Check Query
-- Run this daily to spot Smart Scan degradation
SELECT
TO_CHAR(s.begin_interval_time, 'YYYY-MM-DD HH24') AS hour,
SUM(st.io_cell_offload_eligible_bytes) / 1073741824 AS eligible_gb,
SUM(st.io_cell_offload_returned_bytes) / 1073741824 AS returned_gb,
ROUND(
(1 - SUM(st.io_cell_offload_returned_bytes) /
NULLIF(SUM(st.io_cell_offload_eligible_bytes), 0)) * 100, 1
) AS offload_pct,
SUM(st.io_interconnect_bytes) / 1073741824 AS interconnect_gb
FROM
dba_hist_sqlstat st,
dba_hist_snapshot s
WHERE st.snap_id = s.snap_id
AND st.dbid = s.dbid
AND s.begin_interval_time >= SYSDATE - 1
AND st.io_cell_offload_eligible_bytes > 0
GROUP BY TO_CHAR(s.begin_interval_time, 'YYYY-MM-DD HH24')
ORDER BY 1 DESC;
A healthy Exadata system should show offload_pct consistently above 85-90% for analytical/reporting workloads.
Alert: When Offload Drops
If offload efficiency drops significantly:
- Check if new queries hit the system without Smart Scan (index lookups on large tables)
- Check if TDE was enabled on a tablespace that previously had Smart Scan
- Check if data distribution changed (new partitions with poorly clustered data)
- Check cell health — a cell in degraded mode won't offload
-- Find queries with LOW offload efficiency (candidates for tuning)
SELECT
sql_id,
SUBSTR(sql_text, 1, 80) AS sql_preview,
ROUND(io_cell_offload_eligible_bytes / 1073741824, 2) AS eligible_gb,
ROUND(
(1 - io_cell_offload_returned_bytes /
NULLIF(io_cell_offload_eligible_bytes, 0)) * 100, 1
) AS offload_pct
FROM v$sql
WHERE io_cell_offload_eligible_bytes > 1073741824 -- > 1 GB eligible
AND (1 - io_cell_offload_returned_bytes /
NULLIF(io_cell_offload_eligible_bytes, 0)) < 0.5 -- < 50% efficient
ORDER BY io_cell_offload_eligible_bytes DESC
FETCH FIRST 10 ROWS ONLY;
Hybrid Columnar Compression (HCC) — Smart Scan's Best Friend
HCC is the Exadata storage compression format that works hand-in-hand with Smart Scan. Compressed data is still readable by the Cell Offload Engine — in fact, HCC actually makes Smart Scan faster because:
- Less data to read from disk (compression ratio of 10x-50x for QUERY LOW/HIGH)
- Storage Index becomes more effective (similar values cluster together in compression units)
- Interconnect traffic drops proportionally
-- Compress a fact table for reporting (read-mostly)
ALTER TABLE fact_sales MOVE COLUMN STORE COMPRESS FOR QUERY LOW;
-- Compress historical/archive data (rarely read)
ALTER TABLE fact_sales_archive MOVE COLUMN STORE COMPRESS FOR ARCHIVE HIGH;
-- Check compression ratio
SELECT
table_name,
compress_for,
blocks,
num_rows,
ROUND(blocks * 8192 / 1024 / 1024, 1) AS actual_mb,
ROUND(num_rows * avg_row_len / 1024 / 1024, 1) AS raw_mb,
ROUND(num_rows * avg_row_len / NULLIF(blocks * 8192, 0), 2) AS compression_ratio
FROM dba_tables
WHERE table_name IN ('FACT_SALES', 'FACT_SALES_ARCHIVE')
ORDER BY table_name;
HCC + Smart Scan is the combination that delivers the 10x-100x performance improvements you see in Exadata benchmarks.
Common Misconceptions
"More indexes = better Exadata performance"
Wrong. More indexes mean more index lookups, which means single-block I/O, which bypasses Smart Scan. On Exadata, carefully consider whether a full scan with Smart Scan is faster than an index scan. Often it is, especially for queries returning > 5% of a table.
"Smart Scan replaces the need for partitioning"
Partially true. Storage Index provides "soft partitioning" by eliminating storage regions. But formal partitioning is still valuable because it eliminates I/O at a higher level (entire partition pruning) before Smart Scan even runs. Use both.
"Storage Index is the same as an index"
No. A storage index only stores MIN/MAX per region. It can't do precise lookups. It's an elimination mechanism, not a navigation mechanism. Think of it as a bloom filter for I/O regions.
"Exadata Smart Scan works for OLTP"
Generally no. OLTP workloads use index lookups (single-block I/O). Smart Scan targets full scans — OLAP, reporting, analytics, data warehouse, batch ETL. Exadata benefits OLTP through other features like Smart Flash Cache and Infiniband, not Smart Scan.
Quick Reference — Smart Scan Troubleshooting Checklist
Is Smart Scan firing at all?
├── Check v$sql.io_cell_offload_eligible_bytes > 0
│ = 0 → Smart Scan NOT firing
│ ↓
├── Is the query doing a full table/index scan?
│ NO → Use FULL hint or remove the index
│ ↓
├── Is direct path read happening?
│ Check v$session_event for 'direct path read'
│ NO → Table may be in buffer cache; try ALTER SESSION SET "_serial_direct_read"=TRUE
│ ↓
└── Is data on Exadata cells?
Check dba_data_files for ASM diskgroup on Exadata cells
Is Smart Scan firing but inefficient?
├── offload_efficiency_pct < 50%?
│ ├── TDE enabled? → Cells can't offload encrypted data
│ ├── Functions on WHERE columns? → Rewrite predicates
│ └── Complex data types? → XMLType, objects prevent offload
│
└── Storage Index not eliminating regions?
├── Check v$mystat for 'cell physical IO bytes saved by storage index'
├── Data randomly distributed? → Reorganize table ordered by filter column
├── High cardinality random data? → Consider HCC (improves clustering)
└── Columns not tracked? → Storage Index tracks up to 8 cols per region
Summary
Smart Scan and Storage Index are the two pillars of Exadata performance for analytical workloads. Here's the condensed DBA checklist:
To maximize Smart Scan:
- Design queries to use full scans on large tables (avoid unnecessary indexes for analytics)
- Use parallel query for consistently large scans
- Avoid TDE on performance-critical reporting tablespaces where possible
- Avoid functions on WHERE clause columns
To maximize Storage Index:
- Write range predicates directly on columns (no function wrapping)
- Cluster data physically by the most frequently filtered columns
- Use HCC compression — it naturally improves Storage Index effectiveness
- Monitor
cell physical IO bytes saved by storage indexregularly
To verify everything is working:
- Query
v$sqlforio_cell_offload_eligible_bytesand efficiency ratios - Use
CellCLIon cells to checkCS_SINDEX_SKIPPEDratio - Run the AWR hourly offload trend query in production monitoring
The DBAs who get the most out of Exadata are the ones who understand why Smart Scan fires — and design their schemas, queries, and data layouts to exploit it. The machine will do the rest.
References
- Oracle Exadata Database Machine Administration Guide (MOS)
- Oracle White Paper: "Exadata Smart Scan and Storage Indexes" (Doc ID 1464468.1)
- MOS Note 1348116.1 — Troubleshooting Smart Scan Issues
- MOS Note 1361766.1 — Storage Index Internals
- CellCLI Reference Guide (Oracle Exadata documentation)
Part of the DBA Playbook Series — Real-world Exadata experience, not vendor marketing.
Have questions from your Exadata environment? Comment below — I answer every one.
