Friday, May 22, 2026

Exadata on Oracle Cloud Infrastructure — what OCI Exadata means for on-premises DBAs

The forward-looking post. Covers Exadata Cloud Service (ExaCS) and Exadata Cloud@Customer (ExaCC): what they are, how they differ from on-premises Exadata, what changes for the DBA (patching, monitoring, administration), what stays the same, and whether your on-premises Exadata skills transfer directly. Gives the Oracle EBS DBA a complete picture of where Exadata is headed and how to position their skills.

Exadata — Basics to Pro Series 1. What Is Exadata · 2. Hardware Components · 3. Architecture Deep Dive · 4. Smart Scan, Storage Indexes, HCC · 5. Monitoring · 6. Performance Tuning · 7. Administration · 8. Patching · 9. EBS on Exadata · 10. OCI Exadata

If you have been managing an on-premises Exadata system, at some point your organisation will ask about moving to the cloud — or your manager will forward an Oracle sales deck about Exadata Cloud Service. Your first question is a reasonable one: is this the same Exadata I already know, and does my experience transfer?

The answer to both is yes — with important qualifications. Oracle Exadata on OCI is the same engineered system, the same Smart Scan, the same storage cells, the same HCC, the same cellcli. But the operating model is fundamentally different. Some of what you do today on-premises, Oracle does for you in the cloud. Some of what you do on-premises, you still do in the cloud — but through different tools. And some things are genuinely new.

This article explains the three Exadata deployment options, the practical difference between ExaCS and ExaCC for a working DBA, what changes in your day-to-day role, and where your on-premises skills apply directly.

This is the final article in the Exadata Basics to Pro series. If you have read Articles 1 through 9, you already understand the technology that underpins all three Exadata deployment options. This article focuses on the operational differences — not the technology, which is the same across all three.

The Three Exadata Deployment Options

Oracle now offers Exadata in three commercially distinct delivery models. The underlying technology — the same X10M or X11M hardware, the same Exadata System Software, the same Smart Scan — is identical across all three. What differs is who owns the hardware, where it runs, who manages what, and how you pay.

| Deployment | Where It Runs | Who Owns Hardware | Payment Model |
| --- | --- | --- | --- |
| On-Premises Exadata | Your data centre | You (capital purchase) | Capex: buy hardware upfront, annual support |
| ExaCC (Exadata Cloud@Customer) | Your data centre | Oracle (subscription lease) | Opex: monthly subscription per OCPU |
| ExaCS (Exadata Cloud Service) | Oracle's public cloud (OCI regions) | Oracle (fully managed) | Opex: hourly or monthly per OCPU |

The most important distinction for a DBA is not where the hardware runs — it is who manages what. That operating model split determines your day-to-day responsibilities entirely.

ExaCC — Exadata Cloud@Customer

ExaCC is Oracle's Exadata delivered as a cloud service inside your own data centre. Oracle installs the Exadata rack in your facility, but you consume it as a subscription service. Oracle retains ownership of the hardware and manages it remotely — just as they would in their public cloud — while you run your Oracle databases behind your own firewall.

Why organisations choose ExaCC

  • Data sovereignty — the data never leaves your data centre or country, which satisfies regulatory and compliance requirements that prohibit public cloud
  • Low latency connectivity — application servers and the database are in the same data centre, no internet round-trip for every database call
  • Same OCI cloud services — consumed through the same OCI control plane, same APIs, same tooling as public OCI
  • Predictable performance — dedicated hardware, no shared-tenancy performance variability
  • Oracle manages the infrastructure — hardware maintenance, cell patching, firmware, switch management all handled by Oracle

What Oracle manages for you on ExaCC

  • Physical hardware — servers, storage cells, switches, power, cooling — all Oracle responsibility
  • Exadata System Software — cellsrv updates, cell OS patching, EXABP applied on Oracle's schedule
  • Database server infrastructure — DBBP applied by Oracle including OS, firmware, ILOM
  • Network switch firmware — RoCE switch patching handled by Oracle
  • Hardware replacement — predictive failure responses and disk replacement done by Oracle
  • OCI control plane — the cloud management layer connecting your ExaCC to OCI services

What you manage as DBA on ExaCC

  • Oracle Grid Infrastructure and Oracle Database patching — GI and DB CPU/RU patches are still your responsibility, applied via the OCI console or OPatch
  • Database creation, configuration, and lifecycle — create, drop, resize databases via OCI console or API
  • User management, tablespace management, backup orchestration
  • Application access, performance tuning, query optimisation
  • Data Guard configuration and management
  • Recovery operations in response to database failures

The key DBA implication of ExaCC: You no longer run patchmgr for cell patches, respond to disk alerts, SSH into storage cells for cellcli commands, or manage hardware events. All of that moves to Oracle. Your DBA scope narrows to the database layer — which is the work that directly impacts your applications.

ExaCS — Exadata Cloud Service on Public OCI

ExaCS is fully managed Exadata deployed in Oracle's public cloud regions. You provision Exadata VM clusters through the OCI console, and Oracle manages all the infrastructure beneath your databases — you never see the physical hardware, the storage cells, or the network switches.

Why organisations choose ExaCS

  • Zero hardware responsibility — Oracle manages everything below the database VM
  • Elastic scaling — add or remove OCPUs and storage without hardware procurement
  • Global availability — deploy in any of Oracle's OCI regions worldwide
  • Pay as you use — scale down during off-peak periods to reduce costs
  • Access to full OCI service ecosystem — Object Storage, Autonomous Database, Analytics Cloud, AI services
  • Multicloud — Exadata X11M is available via Oracle Database@Azure, Oracle Database@Google Cloud, and Oracle Database@AWS

What Oracle manages for you on ExaCS

  • Everything Oracle manages for ExaCC, plus:
  • The physical data centre — power, cooling, physical security
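  • Physical database server patching: Oracle patches the host OS and hypervisor underneath your VMs; updates to the guest OS inside your VM cluster are ones you initiate through the OCI console or API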
  • Network infrastructure between your VM cluster and the internet
  • Automatic backups to OCI Object Storage if you enable the managed backup service

What you manage as DBA on ExaCS

  • GI and Oracle Database patching — GI and DB CPU patches are still your responsibility, applied via OCI console patch scheduling or manually via OPatch
  • Database provisioning and configuration via OCI console or API
  • VM cluster configuration — number of nodes, OCPU allocation, storage scaling
  • User management, schemas, tablespaces, application access
  • Performance tuning — the same SQL and AWR skills apply
  • Backup strategy — choose between Oracle-managed backups or self-managed RMAN

ExaCS vs ExaCC vs On-Premises — DBA Responsibility Comparison

| Responsibility | On-Premises | ExaCC | ExaCS |
| --- | --- | --- | --- |
| Physical hardware | You | Oracle | Oracle |
| Storage cell patching (EXABP) | You (patchmgr) | Oracle | Oracle |
| DB node infrastructure (DBBP) | You (patchmgr) | Oracle | Oracle |
| Network switch patching | You (patchmgr) | Oracle | Oracle |
| Disk replacement / hardware alerts | You (cellcli, SR raise) | Oracle | Oracle |
| Grid Infrastructure patching | You (OPatch) | You (OCI console or OPatch) | You (OCI console or OPatch) |
| Oracle Database CPU/RU patching | You (OPatch + datapatch) | You (OCI console or OPatch) | You (OCI console or OPatch) |
| Database creation and lifecycle | You | You (OCI console or API) | You (OCI console or API) |
| Performance tuning and SQL | You | You | You |
| Backup and recovery | You (RMAN) | You (RMAN or OCI managed) | You (RMAN or OCI managed) |
| Data sovereignty | Your data centre | Your data centre | OCI region of your choice |
| cellcli access for DBA | Yes: full SSH access to cells | Limited: Oracle manages cells | No: not exposed to DBA |

The operating model split in plain language: On-premises Exadata is the most operationally demanding model — you patch the OS, the Exadata storage software, Grid Infrastructure, and the database. ExaCC removes the infrastructure operations burden but preserves your database control. ExaCS removes even more, leaving only database-layer work.
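The split described above can be written down as a small lookup, which is handy when scoping a runbook for a given deployment model. This is a minimal sketch: the task keys and model names are illustrative labels chosen here, not Oracle terminology or an OCI API, and the ownership values simply restate the comparison table.

```python
from typing import Dict, List

# Owner of each task per deployment model, transcribed from the comparison
# table. Task keys and model names are illustrative labels only.
_ORACLE_MANAGED = {"on_prem": "customer", "exacc": "oracle", "exacs": "oracle"}
_CUSTOMER_MANAGED = {"on_prem": "customer", "exacc": "customer", "exacs": "customer"}

RESPONSIBILITY: Dict[str, Dict[str, str]] = {
    "physical_hardware": _ORACLE_MANAGED,
    "storage_cell_patching": _ORACLE_MANAGED,
    "db_node_infrastructure": _ORACLE_MANAGED,
    "switch_patching": _ORACLE_MANAGED,
    "hardware_alerts": _ORACLE_MANAGED,
    "gi_patching": _CUSTOMER_MANAGED,
    "db_patching": _CUSTOMER_MANAGED,
    "db_lifecycle": _CUSTOMER_MANAGED,
    "performance_tuning": _CUSTOMER_MANAGED,
    "backup_recovery": _CUSTOMER_MANAGED,
}


def customer_tasks(model: str) -> List[str]:
    """Return the tasks the customer DBA still owns under the given model."""
    return sorted(
        task for task, owners in RESPONSIBILITY.items()
        if owners[model] == "customer"
    )


if __name__ == "__main__":
    for model in ("on_prem", "exacc", "exacs"):
        print(model, customer_tasks(model))
```

Read this way, ExaCC and ExaCS leave the DBA with the same five database-layer tasks; the two models differ in where the rack sits and how it is paid for, not in what the DBA patches.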

What Changes for the DBA on Cloud Exadata

1. Patching is split — infrastructure vs database

On-premises, you patch everything with patchmgr and OPatch in a specific sequence. On ExaCC and ExaCS, Oracle patches the infrastructure (EXABP, DBBP, switches) on their own schedule — you cannot defer Oracle's infrastructure patch windows beyond a fixed grace period. You remain responsible for GI and Oracle Database patches, which you apply via the OCI console's database patching workflow or manually via OPatch.

This split removes a significant operational burden, but it also limits your control over the infrastructure patch schedule. You can move an Oracle-managed infrastructure patch within the window Oracle allows, but you cannot postpone it indefinitely when it conflicts with your change management calendar.

2. Cell administration moves to Oracle — cellcli is not your tool anymore

On-premises, cellcli and dcli are essential daily tools. On ExaCC, Oracle manages the storage cells — you do not SSH into cells, run cellcli commands, respond to disk alerts, or manage griddisks. On ExaCS, the cells are not even visible to you — they are abstracted behind the VM cluster layer.

The cell metrics and health information that you previously got from cellcli are surfaced through the OCI console and OCI Monitoring service instead. You read the same information, but through a web interface and cloud APIs rather than a command line on the cell.

3. Database provisioning via OCI console or API

Creating a new database on-premises involves srvctl, DBCA, and ASM configuration. On OCI, you provision databases through the OCI console, the OCI CLI, or Terraform, HashiCorp's infrastructure-as-code tool, for which Oracle maintains an OCI provider. The underlying database is identical, but the provisioning workflow is cloud-native.

OCI CLI — common database operations on ExaCS or ExaCC
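# List VM clusters in your compartment
# (vm-cluster is the ExaCC resource; on ExaCS use: oci db cloud-vm-cluster list)
oci db vm-cluster list \
  --compartment-id <compartment_ocid>

# List database homes in a VM cluster
oci db db-home list \
  --compartment-id <compartment_ocid> \
  --vm-cluster-id <vm_cluster_ocid>

# List databases in one of those database homes
oci db database list \
  --compartment-id <compartment_ocid> \
  --db-home-id <db_home_ocid>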

# Create a new database in an existing VM cluster
oci db database create \
  --vm-cluster-id <vm_cluster_ocid> \
  --db-name MYDB \
  --db-unique-name MYDB_UNIQUE \
  --db-version 19.0.0.0 \
  --admin-password <password> \
  --db-workload OLTP

# Get the lifecycle state of a VM cluster (for example AVAILABLE or UPDATING)
oci db vm-cluster get \
  --vm-cluster-id <vm_cluster_ocid> \
  --query 'data.{lifecycleState:"lifecycle-state"}'
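When these commands move from an interactive shell into a scheduled job, a thin wrapper keeps the flag handling in one place. The sketch below is one possible shape, not an Oracle-provided helper: `oci_cmd` and `run_oci` are hypothetical names, and `run_oci` assumes the OCI CLI is installed and configured and that it prints JSON, which is its default output format.

```python
import json
import subprocess


def oci_cmd(*parts, **options):
    """Build an argv list for the OCI CLI.

    Keyword options become --kebab-case flags, so compartment_id="x"
    turns into ["--compartment-id", "x"].
    """
    argv = ["oci", *parts]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


def run_oci(argv):
    """Run the command and parse the JSON it prints.

    Needs a working OCI CLI configuration; raises CalledProcessError
    if the CLI exits with a non-zero status.
    """
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout) if completed.stdout.strip() else {}


if __name__ == "__main__":
    # Placeholder OCID: print the command instead of running it.
    argv = oci_cmd("db", "vm-cluster", "list", compartment_id="<compartment_ocid>")
    print(" ".join(argv))
```

Passing an argv list to `subprocess.run` instead of a shell string avoids quoting problems with OCIDs and passwords.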

4. Backup to OCI Object Storage — not just local RMAN

On OCI, the natural backup destination is OCI Object Storage — Oracle's durable, scalable cloud object store. You can configure Oracle-managed automated backups (Oracle schedules and manages RMAN backups to Object Storage) or self-managed RMAN backups using the RMAN SBT (System Backup to Tape) channel pointing to Object Storage. Your RMAN knowledge applies — the syntax and concepts are the same, only the backup destination changes.

RMAN backup to OCI Object Storage
# RMAN backup to OCI Object Storage using an SBT channel
# The Oracle cloud backup library (libopc.so) handles the Object Storage connection;
# the library and OPC_PFILE paths below vary by installation
RMAN> RUN {
  ALLOCATE CHANNEL ch1 DEVICE TYPE SBT
    PARMS='SBT_LIBRARY=/opt/oracle/oak/lib/libopc.so,
           ENV=(OPC_PFILE=/home/oracle/opc.ora)';
  BACKUP DATABASE PLUS ARCHIVELOG;
  RELEASE CHANNEL ch1;
}

# Alternatively use the OCI-integrated backup service,
# which handles scheduling automatically

5. Scaling is elastic — not a hardware procurement

On-premises, scaling means hardware procurement, rack space, power planning, and weeks of lead time. On ExaCS, you can add OCPUs to a VM cluster in minutes through the OCI console or API. Storage expansion is similarly online. On ExaCC, scaling within your provisioned rack is straightforward — adding capacity beyond the rack requires a hardware change request to Oracle.

What Stays the Same on Cloud Exadata

This is the section that matters most for positioning your skills. The database technology is identical across all three deployment options. Everything you know about Oracle Database on Exadata transfers directly.

| Skill or Task | Transfers Directly? | Notes |
| --- | --- | --- |
| Smart Scan, Storage Indexes, HCC | Yes, identical | Same technology, same V$SYSSTAT statistics, same execution plan keywords |
| SQL tuning and AWR analysis | Yes, identical | AWR reports include the same Exadata-specific sections |
| RMAN backup and recovery | Yes, same commands | Destination changes to Object Storage but syntax is standard RMAN |
| Oracle RAC administration | Yes, same srvctl and crsctl | RAC runs on the VM cluster the same way as on-premises |
| Data Guard configuration | Yes, same concepts | OCI provides a Data Guard console wizard but manual configuration also works |
| Performance monitoring with V$ views | Yes, identical | Same V$CELL_STATE, V$SYSSTAT cell statistics, same Smart Scan metrics |
| OPatch and datapatch for DB software | Yes, same commands | GI and DB patching uses the same OPatch process as on-premises |
| ASM disk group management | Yes, same commands | ASM manages disk groups the same way: V$ASM_DISKGROUP, V$ASM_DISK |
| ADOP for EBS R12.2 (if applicable) | Yes, identical | EBS on ExaCS or ExaCC uses the same ADOP patching cycle |
| cellcli commands on ExaCC and ExaCS | Partial / No | ExaCC limits DBA cellcli access. ExaCS does not expose cells to the DBA at all. |
| patchmgr for infrastructure | No on ExaCC/ExaCS | Infrastructure patching is Oracle-managed. Your patchmgr skill applies on-premises only. |

New Skills the Cloud Adds to Your Toolkit

Moving to OCI Exadata does not replace your existing skills — it adds new ones on top. The DBA who understands both the on-premises Exadata internals and the OCI operational model is genuinely hard to find and well compensated.

OCI Console and CLI

The OCI console is where you provision databases, manage VM clusters, schedule patches, configure backups, and monitor cloud-level infrastructure. Becoming fluent in the OCI console and the OCI CLI is the most immediately practical new skill for an on-premises DBA moving to OCI.

Terraform for Oracle databases

Oracle provides a Terraform provider for OCI. Many organisations manage their OCI Exadata infrastructure as code — VM clusters, databases, Data Guard associations, and network configuration are all defined in Terraform files and applied through CI/CD pipelines. Basic Terraform literacy is increasingly expected for cloud DBA roles.

OCI Monitoring and Notifications

The cell alerts and cellcli health checks you ran on-premises are replaced by OCI Monitoring alarms and Notifications. You define metric-based alarms in the OCI console — for example, alert when database CPU exceeds 90% for more than 10 minutes — and route notifications to email, PagerDuty, or Slack. The same information is available, accessed through a different paradigm.
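To make the CPU example concrete, here is a minimal sketch of the alarm definition as a plain dictionary, shaped like the body of an OCI Monitoring create-alarm request. Several things are assumptions to verify in your tenancy: the `oci_database` namespace and the `CpuUtilization` metric name, the display name, and the placeholder OCIDs. The function only builds the definition; it does not call OCI.

```python
def build_cpu_alarm(compartment_ocid, topic_ocid, threshold_pct=90, pending_minutes=10):
    """Build an alarm definition that fires when mean CPU stays above a threshold.

    namespace and metric name are assumptions; check the metrics the
    Monitoring service actually lists for your databases.
    """
    return {
        "displayName": "db-cpu-high",            # hypothetical name
        "compartmentId": compartment_ocid,
        "metricCompartmentId": compartment_ocid,
        "namespace": "oci_database",             # assumed namespace
        # Monitoring Query Language: mean over 1-minute windows, compared to the threshold
        "query": f"CpuUtilization[1m].mean() > {threshold_pct}",
        "severity": "CRITICAL",
        # ISO 8601 duration: the condition must hold this long before the alarm fires
        "pendingDuration": f"PT{pending_minutes}M",
        "destinations": [topic_ocid],            # Notifications topic OCID
        "isEnabled": True,
    }


if __name__ == "__main__":
    import json
    alarm = build_cpu_alarm("<compartment_ocid>", "<topic_ocid>")
    print(json.dumps(alarm, indent=2))
```

The "for more than 10 minutes" part of the requirement lives in the pending duration, not in the query itself, which is why the two are kept as separate fields.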

OCI Object Storage for backups and data movement

Object Storage is the natural landing zone for backups, data exports, and data imports on OCI. Understanding how RMAN integrates with Object Storage, how Data Pump uses pre-authenticated requests (PARs), and how to manage Object Storage buckets are practical skills that come up in every OCI database engagement.

OCI CLI — common monitoring tasks for cloud DBAs
# Check Exadata infrastructure maintenance history
oci db maintenance-run list \
  --compartment-id <compartment_ocid> \
  --target-resource-id <vm_cluster_ocid>

# List available patches for a specific database home
oci db patch list by-db-home \
  --db-home-id <db_home_ocid> \
  --all

# Check backup status for a database
oci db backup list \
  --compartment-id <compartment_ocid> \
  --database-id <database_ocid>

# Create an alarm on an Exadata metric
# (Done via OCI console Monitoring service or API)
# Metric: oracle_oci_database_exadata_smart_scan_efficiency_percent
#   (treat this name as a placeholder; choose the metric from the list
#    the Monitoring service shows for your VM cluster)
# Alarm condition: value < 70
# Notification: email to DBA team

How to Think About Which Option Is Right for Your Organisation

| If Your Requirement Is | Consider |
| --- | --- |
| Data must stay in your own data centre (regulatory requirement) | ExaCC or On-Premises Exadata |
| Want cloud economics but data sovereignty | ExaCC: subscription in your data centre |
| Maximum operational simplicity, letting Oracle manage infrastructure | ExaCS on public OCI |
| Need to scale quickly without hardware lead time | ExaCS: elastic OCPU and storage scaling |
| Application servers must be co-located with database | ExaCC or On-Premises Exadata |
| Already have Oracle Database licences | Any option: evaluate BYOL pricing for ExaCC and ExaCS carefully |
| Want access to Autonomous Database on the same platform | ExaCS: Autonomous Database runs on ExaCS infrastructure |
| Multicloud requirement: run Oracle databases closer to AWS or Azure workloads | Oracle Database@Azure, Oracle Database@Google Cloud, or Oracle Database@AWS, all of which use Exadata X11M |

What This Means for Your Career as an Exadata DBA

The shift to cloud Exadata does not make on-premises Exadata skills obsolete — it makes them more valuable when combined with cloud skills. Here is the reality of the market in 2026.

  • On-premises Exadata is not going away. Most large enterprises with Exadata investments will run on-premises systems for years. Organisations with regulatory constraints will run ExaCC in their own data centres indefinitely. The on-premises DBA who understands cellcli, patchmgr, and the storage object model is still in demand.
  • Cloud Exadata expertise is additive, not replacement. The DBA who understands both patchmgr rolling upgrades and OCI Terraform provisioning is rare. That combination commands a premium over a DBA who only knows one side.
  • Database skills transfer completely. Everything you know about SQL tuning, AWR, Smart Scan, HCC, RAC, Data Guard, RMAN, and ADOP applies directly to cloud Exadata. None of it becomes obsolete. The cloud adds an operational layer on top — it does not replace the database layer underneath.
  • The next step is OCI fundamentals. If you want to position yourself for cloud Exadata roles, learn the OCI console, OCI CLI, and basic Terraform. These are the gaps between your current skill set and a cloud DBA role. The Oracle Cloud Database Services certification (1Z0-1093) is a practical starting point.

Practical advice: Set up a free Oracle Cloud account at cloud.oracle.com. Oracle provides Always Free tier resources. Provision an Always Free Autonomous Database (Exadata VM clusters are not part of the free tier) and explore the OCI console, OCI CLI, and monitoring. You will recognise concepts from your on-premises experience, such as AWR reports and SQL tuning, through a new interface.

Summary — Three Deployment Options, One Technology

  • On-premises Exadata — you own and operate everything. Maximum control, maximum operational responsibility. patchmgr, cellcli, and dcli are daily tools.
  • ExaCC (Cloud@Customer) — Oracle manages the infrastructure in your data centre. You manage databases. Data stays on-premises. Oracle handles EXABP, DBBP, hardware alerts, and disk replacement.
  • ExaCS (Cloud Service) — fully managed in Oracle's public cloud. You manage databases via OCI console and CLI. Oracle manages everything below the database VM. Elastic scaling, multicloud availability.
  • All three use the same Exadata hardware and software — Smart Scan, Storage Indexes, HCC, cellsrv, ASM, RAC. The technology does not change between deployment options.
  • On cloud Exadata, your database skills transfer completely. SQL tuning, AWR, RMAN, RAC, Data Guard, OPatch, ADOP — all identical. Infrastructure skills (patchmgr, cellcli) are replaced by OCI operational tools.
  • The new skills to add are OCI console, OCI CLI, Terraform, OCI Monitoring, and Object Storage for backups. None of these replace what you know — they extend it.

Exadata patching — how to patch database nodes, storage cells, and InfiniBand switches

Patching an Exadata system is significantly more complex than patching a standard Oracle database server. An Exadata environment has multiple distinct components — database servers, storage cells, and network switches — each running their own software stack, each patched with different tools, and each requiring a specific sequence relative to the others. Patching one component out of order can leave the system in an inconsistent state.

This article covers the complete Exadata patching process from end to end — what the bundle patches contain, the correct patching order, how rolling and non-rolling modes work, the patchmgr commands for each component, how OPatch fits into the process for DB software, and how to verify patch levels across all components when you are done.

Always read the MOS patch readme before starting. Exadata bundle patches frequently include specific pre-patch steps, known issues, and order-of-operations requirements that are specific to that release. The MOS readme supersedes general guidance including the guidance in this article. Reference MOS Doc ID 888828.1 — Exadata Database Machine and Exadata Storage Server Supported Versions — for the current patch list.

What Gets Patched in an Exadata System

An Exadata system has four distinct software layers, each patched separately with different tools. Understanding this is the foundation of everything else in this article.

| Component | Software Patched | Tool Used | Downtime Impact |
| --- | --- | --- | --- |
| Database Servers: Infrastructure | Oracle Linux OS, firmware, ILOM, drivers, Exadata System Software on DB nodes | patchmgr (DBBP) | Rolling: one node at a time, RAC stays up |
| Database Servers: DB Software | Oracle Grid Infrastructure, Oracle Database (CPU, PSU, RUs) | OPatch + datapatch | Rolling: one node at a time via GI rolling patches |
| Storage Cells | Oracle Linux OS, Exadata System Software, firmware, cellsrv, flash drivers | patchmgr (EXABP) | Rolling: one cell at a time, ASM handles I/O |
| RoCE / Network Switches | Switch firmware and software | patchmgr | Rolling: one switch at a time |

The Two Exadata Bundle Patches — DBBP and EXABP

Oracle delivers Exadata infrastructure patches as two bundle patches. Understanding the difference between them is essential before you download anything.

DBBP — Database Server Bundle Patch

The Database Server Bundle Patch (DBBP) is the infrastructure patch for the database server nodes. It updates everything on the DB nodes except the Oracle Database and Grid Infrastructure software itself.

DBBP includes:

  • Oracle Linux OS updates — kernel, system libraries, security patches
  • System firmware — BIOS, BMC/ILOM, HBA, NIC firmware
  • Storage and network driver updates
  • Exadata-specific utilities — imageinfo, patchmgr itself, support tools
  • Hardware compatibility updates for new component generations

DBBP does not include Oracle Grid Infrastructure or Oracle Database software patches. Those come from the standard Oracle Database CPU/RU patches applied via OPatch — the same process used on any Oracle database server.

EXABP — Exadata Storage Server Bundle Patch

The Exadata Storage Server Bundle Patch (EXABP) is the patch for storage cells. It updates everything that runs on the storage servers.

EXABP includes:

  • Oracle Linux OS updates on the storage cell
  • Exadata System Software — cellsrv, MS, RS daemon updates
  • Storage cell firmware — disk controller, flash device firmware, ILOM
  • Flash driver and NVMe driver updates
  • Cell offload server (CELLOFLSRVn) updates for newer Oracle DB versions
  • Security patches for the cell OS and software stack

EXABP and DBBP are released on the same quarterly schedule as Oracle Database CPUs — January, April, July, and October. They are aligned with the quarterly CPU cycle so that all Exadata components can be brought to a consistent patch level in a single quarterly patching activity.

The Correct Patching Order

This is the most critical section of this article. Applying Exadata patches in the wrong order can break component compatibility, cause cellsrv to fail, or leave the system partially updated. Oracle mandates a specific sequence. Never deviate from it without explicit Oracle Support instruction.

| Step | Component | Tool | Why This Order |
| --- | --- | --- | --- |
| 1 | RoCE / Network Switches | patchmgr | Network infrastructure must be updated first to support newer protocols used by DB nodes and cells |
| 2 | Storage Cells (EXABP) | patchmgr -cells | Storage software must be updated before DB node infrastructure. DB nodes connect to cells; newer cell software is backward compatible with older DB software but not always forward compatible |
| 3 | Database Server Infrastructure (DBBP) | patchmgr -dbnodes | DB node OS and firmware updated after cells are at the new version. Node reboots happen here. |
| 4 | Oracle Grid Infrastructure (GI) | OPatch (GI rolling) | GI patch applied after the OS is at the new level. Rolling: one node at a time via the GI rolling patch mechanism |
| 5 | Oracle Database CPU/RU | OPatch + datapatch | DB patch applied last. datapatch applies SQL changes after all nodes are patched. |

Always apply the network switch patch first, then cells, then DB nodes, then GI, then DB software. If the MOS readme for your specific patch bundle states a different order, follow the readme. Some patches modify this sequence for specific dependency reasons.
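A change plan can be checked against this sequence mechanically before it goes to review. The sketch below is only an illustration of the ordering rule stated above: the component labels are names chosen for this example, and a readme that mandates a different order for a specific bundle would mean editing `PATCH_ORDER` first.

```python
# Default sequence from this article: switches, cells, DB node infrastructure, GI, database.
PATCH_ORDER = [
    "switches",
    "cells",
    "db_node_infrastructure",
    "grid_infrastructure",
    "database",
]


def order_violations(plan):
    """Return (should_be_first, was_first) pairs for steps planned out of order.

    The plan may omit components (for example a cells-only change), but
    any components it does include must follow PATCH_ORDER.
    """
    rank = {component: i for i, component in enumerate(PATCH_ORDER)}
    unknown = [step for step in plan if step not in rank]
    if unknown:
        raise ValueError(f"unknown components in plan: {unknown}")

    violations = []
    for i, earlier in enumerate(plan):
        for later in plan[i + 1:]:
            if rank[earlier] > rank[later]:
                violations.append((later, earlier))
    return violations


if __name__ == "__main__":
    plan = ["cells", "switches", "database"]
    for should_be_first, was_first in order_violations(plan):
        print(f"{should_be_first} must be patched before {was_first}")
```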

Rolling vs Non-Rolling Patching Mode

patchmgr supports two patching modes — rolling and non-rolling. The choice between them determines whether databases stay online during patching and how long the total patching window takes.

Rolling mode — zero downtime for RAC databases

In rolling mode, patchmgr patches one component at a time while the others continue serving workloads. For storage cells, it patches one cell at a time — the other cells continue serving I/O to all databases. ASM handles the temporary unavailability of the cell being patched through its normal redundancy mechanisms without data movement. For database nodes, rolling means patching one RAC node at a time while the other nodes continue running the database.

  • Databases remain online throughout patching
  • Takes longer overall — each cell or node is patched sequentially
  • Requires ASM redundancy — data must be mirrored so I/O can continue while one cell is down
  • Recommended for production environments where availability is critical

Non-rolling mode — faster but requires downtime

In non-rolling mode, patchmgr patches all components in parallel simultaneously. For storage cells, all cells are patched at the same time — databases cannot access any storage during this period. For database nodes, all nodes are patched and rebooted simultaneously.

  • Databases must be shut down before starting
  • Total patching time is significantly shorter — all cells patch in parallel
  • Appropriate for development, test environments, or planned maintenance windows
  • Non-rolling is the default mode for storage cells when no -rolling flag is specified

| Factor | Rolling Mode | Non-Rolling Mode |
| --- | --- | --- |
| Database availability | Online throughout, no outage | Must be shut down, full outage |
| Total patch time | Longer: sequential, component by component | Shorter: all cells patch simultaneously |
| ASM requirement | Requires normal or high redundancy disk groups | No redundancy requirement during patching |
| Default for cells | No: specify the -rolling flag explicitly | Yes: default when no flag is specified |
| Recommended for | Production, maximum availability | Development, test, short maintenance windows |
| Per-cell patch time | Approximately 1 to 2 hours per cell | Same per cell, but all cells run in parallel |
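The time trade-off in the table reduces to simple arithmetic, sketched here for planning a change window. This is a rough estimate under stated assumptions: it counts only the per-cell patch time from the table and ignores pre-checks, ASM resynchronisation between cells, and any retries, so treat the result as a lower bound.

```python
def cell_patch_window_hours(cell_count, hours_per_cell, rolling):
    """Estimate the storage cell patching window in hours.

    Rolling: cells are patched one after another, so time scales with
    the number of cells. Non-rolling: all cells are patched in parallel,
    so the window is roughly one cell's patch time.
    """
    if cell_count < 1 or hours_per_cell <= 0:
        raise ValueError("cell_count must be >= 1 and hours_per_cell > 0")
    return cell_count * hours_per_cell if rolling else hours_per_cell


if __name__ == "__main__":
    # Example: 7 cells at 1.5 hours each (inside the 1 to 2 hour range above).
    print("rolling:    ", cell_patch_window_hours(7, 1.5, rolling=True), "hours")
    print("non-rolling:", cell_patch_window_hours(7, 1.5, rolling=False), "hours")
```

For a seven-cell rack at 1.5 hours per cell, that is about 10.5 hours online in rolling mode against about 1.5 hours of outage in non-rolling mode.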

Manual rolling update — avoid when possible. There is a third option called manual rolling update where each cell is taken completely offline (griddisks dropped from ASM), patched, then added back — forcing two full ASM rebalances per cell. This method requires enough free space in the disk group to hold all data from the most full cell. It is extremely time-consuming and should be used only when patchmgr's built-in rolling mode is not available for a specific patch.

Pre-Patch Checks — Exadata Specific

Exadata has its own set of pre-patch checks that go beyond the standard Oracle database pre-patch checklist. Run all of these before starting any patchmgr operation.

1. Run EXAchk

EXAchk is Oracle's health check tool specifically for Exadata; current releases ship it as part of Oracle Autonomous Health Framework (AHF). It validates hundreds of configuration items across DB nodes, storage cells, and network infrastructure. Run it and resolve all critical and warning findings before patching.

Run EXAchk before patching
# EXAchk is usually located here on the DB node
/opt/oracle.SupportTools/exachk/exachk

# Or via the support tools bundle
/opt/oracle.SupportTools/onecommand/exachk

# Run against all components
./exachk -a

# Review the HTML report generated in the output directory
# Fix all FAIL and WARNING items before starting patching

2. Verify current version on all components

Check current versions across all components
# Check DB node Exadata software version (run on each DB node)
imageinfo

# Check cell software version on all cells via dcli
dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
  "imageinfo -ver"

# Or using cellcli
dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
  "cellcli -e LIST CELL ATTRIBUTES name, releaseVersion"

# Check OPatch version (for GI and DB patching)
$ORACLE_HOME/OPatch/opatch version
$GI_HOME/OPatch/opatch version

# Check current GI version
$GI_HOME/bin/crsctl query crs activeversion

# Check current DB home version
$ORACLE_HOME/bin/sqlplus -version
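With more than a handful of cells, it is easy to miss one that is on a different image. A small parser over the dcli output makes the mismatch obvious. This sketch assumes the usual dcli layout of one `hostname: output` line per cell; the function names, hostnames, and version strings in the example are made up for illustration.

```python
from collections import defaultdict


def versions_by_host(dcli_output):
    """Parse 'hostname: version' lines from dcli into a dict."""
    versions = {}
    for line in dcli_output.splitlines():
        host, sep, value = line.partition(":")
        if sep and value.strip():
            versions[host.strip()] = value.strip()
    return versions


def hosts_by_version(versions):
    """Group hostnames by the version they report."""
    grouped = defaultdict(list)
    for host, version in sorted(versions.items()):
        grouped[version].append(host)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical output of: dcli -g cell_group "imageinfo -ver"
    sample = (
        "exacel01: 24.1.0.0.0\n"
        "exacel02: 24.1.0.0.0\n"
        "exacel03: 23.1.9.0.0\n"
    )
    groups = hosts_by_version(versions_by_host(sample))
    if len(groups) > 1:
        print("Cells are NOT at a consistent version:")
    for version, hosts in groups.items():
        print(f"  {version}: {', '.join(hosts)}")
```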

3. Verify ASM disk group health and redundancy

Check ASM disk groups before rolling cell patching
-- Connect to ASM instance
sqlplus / as sysasm

-- Check disk group status and redundancy
SELECT name, type, state, total_mb, free_mb,
       ROUND(free_mb / total_mb * 100, 1) AS pct_free
FROM   v$asm_diskgroup
ORDER BY name;

-- Check no disks are already in error state
SELECT group_number, disk_number, name, path, state, mode_status
FROM   v$asm_disk
WHERE  state != 'NORMAL' OR mode_status != 'ONLINE'
ORDER BY group_number, disk_number;

-- Check no rebalance in progress
SELECT * FROM v$asm_operation WHERE state = 'RUN';
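The three queries answer one question: is it safe to take a cell offline? The sketch below turns their results into a go/no-go list. It is a hypothetical helper, not part of any Oracle tool: it expects each result set as a list of dictionaries keyed by the column names used in the queries, and it treats only NORMAL and HIGH redundancy as acceptable, following the rolling-mode requirement stated earlier.

```python
def rolling_patch_blockers(diskgroups, problem_disks, running_operations):
    """Return reasons why rolling cell patching should not start yet.

    diskgroups:         rows from v$asm_diskgroup (needs 'name' and 'type')
    problem_disks:      rows from the v$asm_disk query (disks not NORMAL/ONLINE)
    running_operations: rows from v$asm_operation with state = 'RUN'
    An empty list means no blocker was found by these three checks.
    """
    blockers = []
    for dg in diskgroups:
        if dg["type"] not in ("NORMAL", "HIGH"):
            blockers.append(f"{dg['name']}: {dg['type']} redundancy cannot lose a cell")
    for disk in problem_disks:
        blockers.append(
            f"disk {disk['name']}: state={disk['state']}, mode_status={disk['mode_status']}"
        )
    if running_operations:
        blockers.append(f"{len(running_operations)} ASM operation(s) still running")
    return blockers


if __name__ == "__main__":
    # Hypothetical query results for illustration.
    diskgroups = [{"name": "DATAC1", "type": "HIGH"}, {"name": "RECOC1", "type": "EXTERN"}]
    for reason in rolling_patch_blockers(diskgroups, [], []):
        print("BLOCKER:", reason)
```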

4. Check for active database sessions and pending operations

Verify database readiness before patching
# Check cluster nodes and their status
$GI_HOME/bin/crsctl status resource -t

# Check RAC database instances are all running
$GI_HOME/bin/srvctl status database -db <db_unique_name>

-- Check for any long-running transactions (run in SQL*Plus)
-- undo_mb assumes an 8 KB database block size
SELECT s.sid, s.username, s.status,
       ROUND(t.used_ublk * 8192 / 1048576, 1) AS undo_mb,
       s.last_call_et AS seconds_active
FROM   v$transaction t
JOIN   v$session s ON t.ses_addr = s.saddr
ORDER BY t.used_ublk DESC
FETCH FIRST 10 ROWS ONLY;

5. Verify SSH key-based authentication from DB node to all cells

patchmgr requires password-less SSH access from the primary database node to all storage cells. If SSH authentication fails for any cell, patchmgr cannot patch that cell.

Verify SSH connectivity to all cells
# Test SSH to each cell without password prompt
dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
  "hostname"

# Expected output: each cell hostname printed
# If SSH fails or prompts for password: fix key-based auth before patching

# Also verify from the primary node as root
dcli -l root -g /opt/oracle.SupportTools/onecommand/cell_group \
  "hostname"

Step 1 — Patching Network Switches

Network switch patching is the first step. On X10M and X11M systems using RoCE switches, patchmgr handles the switch patching. The process patches one switch at a time — during the patching of one switch, all traffic routes through the other, so database I/O continues without interruption.

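Network switch patching with patchmgr
# Unzip the bundle patch and work from the patch directory
cd /u01/patches
unzip p<patch_number>_<version>_Linux-x86-64.zip
cd patch_<version>

# switch_group lists one switch hostname per line
# The option names below are the RoCE switch forms of patchmgr;
# confirm the exact syntax against the README for your patch release

# Pre-check for network switches
./patchmgr --roceswitches switch_group \
  --upgrade --roceswitch-precheck

# Apply the switch patch (rolling: patches one switch at a time)
./patchmgr --roceswitches switch_group \
  --upgrade

# Verify switch configuration after patching
./patchmgr --roceswitches switch_group \
  --verify-config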

Step 2 — Patching Storage Cells with patchmgr (EXABP)

Storage cell patching uses patchmgr with the -cells flag. patchmgr is run from the primary database node — it connects to all cells via SSH and orchestrates the entire update process including pre-checks, OS updates, firmware updates, cellsrv updates, and post-patch verification.

Prepare patchmgr for cell patching

Stage and prepare patchmgr for cell patching
# Unzip the EXABP patch bundle on the primary DB node
cd /u01/patches
unzip p<exabp_patch_number>_<version>_Linux-x86-64.zip
cd patch_<version>

# Confirm patchmgr is present and executable
ls -la patchmgr
./patchmgr --version

# Check the cell group file
cat /opt/oracle.SupportTools/onecommand/cell_group

# Clean up any previous patchmgr state before starting
./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group \
  -cleanup

Pre-check — validate cells are ready for patching

Run patchmgr pre-check before patching cells
# Run the prerequisite check to validate all cells
# (for cells the check option is -patch_check_prereq; -precheck is the DB node form)
./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group \
  -patch_check_prereq -rolling

# Expected SUCCESS output per cell:
# cell01: SUCCESS: DONE: Check prerequisites on all cells.
# cell02: SUCCESS: DONE: Check prerequisites on all cells.

# If any FAIL messages appear, resolve before proceeding
# Common pre-check failures:
# - SSH authentication not set up
# - Insufficient disk space on cell OS partition
# - cellsrv not running on a cell
# - ASM disk group in error state

Apply the cell patch — rolling mode

Apply EXABP to storage cells in rolling mode
# Rolling mode: patches one cell at a time
# Databases remain online -- recommended for production
# For cells the apply option is -patch (-upgrade is the DB node form)
LOG=/u01/patches/patchmgr_cells_$(date +%Y%m%d).log
nohup ./patchmgr \
  -cells /opt/oracle.SupportTools/onecommand/cell_group \
  -patch \
  -rolling > "$LOG" 2>&1 &

# Monitor progress
tail -f "$LOG"

# From another terminal, check which cells already report the new version
dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
  "imageinfo -ver"

Apply EXABP to storage cells in non-rolling mode
# Non-rolling mode: patches all cells simultaneously
# Databases MUST be shut down before running this
# Default mode when -rolling is not specified

# Step 1: Shut down all databases, then the cluster stack on all DB nodes
$GI_HOME/bin/srvctl stop database -db <db_unique_name>
$GI_HOME/bin/crsctl stop cluster -all      # run as root

# Step 2: Apply to all cells simultaneously
nohup ./patchmgr \
  -cells /opt/oracle.SupportTools/onecommand/cell_group \
  -patch > /u01/patches/patchmgr_cells_nonrolling.log 2>&1 &

# Step 3: Monitor until complete
tail -f /u01/patches/patchmgr_cells_nonrolling.log

# Step 4: Start the cluster stack and databases after all cells are patched
$GI_HOME/bin/crsctl start cluster -all     # run as root
$GI_HOME/bin/srvctl start database -db <db_unique_name>

What patchmgr does to each cell during patching

  1. Copies the patch bundle to the cell via SSH/SCP
  2. Runs the cell pre-check to confirm readiness
  3. For rolling mode: waits until ASM confirms redundancy is maintained
  4. Stops cellsrv on the cell being patched
  5. Applies Oracle Linux OS updates, firmware updates, and Exadata System Software
  6. Reboots the storage cell — typically 15 to 30 minutes per cell
  7. Validates the cell post-reboot and confirms cellsrv is running at the new version
  8. For rolling mode: confirms ASM I/O is restored before moving to the next cell
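
The redundancy checks in steps 3 and 8 can be illustrated with the griddisk attribute asmDeactivationOutcome, which reports Yes when ASM can tolerate that griddisk going offline. The following is a minimal sketch, not patchmgr's own logic: the function names are hypothetical, and it only parses the three-column output of LIST GRIDDISK ATTRIBUTES name, asmModeStatus, asmDeactivationOutcome.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of patchmgr or cellcli.
# Input on stdin: rows of "name asmModeStatus asmDeactivationOutcome", as printed by
#   cellcli -e "LIST GRIDDISK ATTRIBUTES name, asmModeStatus, asmDeactivationOutcome"

# Print the name of every griddisk whose outcome is anything other than "Yes".
unsafe_griddisks() {
  awk 'NF >= 3 && $3 != "Yes" { print $1 }'
}

# Print SAFE and return 0 when every griddisk can go offline;
# otherwise list the blocking griddisks and return 1.
cell_safe_to_offline() {
  local unsafe
  unsafe="$(unsafe_griddisks)"
  if [ -n "$unsafe" ]; then
    echo "NOT SAFE:"
    echo "$unsafe"
    return 1
  fi
  echo "SAFE"
}
```

On a storage cell, the check could be run by piping the cellcli command shown in the comment into cell_safe_to_offline before taking that cell down manually.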

Step 3 — Patching Database Server Infrastructure (DBBP)

After storage cells are patched, the database server infrastructure is updated using patchmgr with the -dbnodes flag. This updates Oracle Linux, firmware, and Exadata tools on the DB nodes. It requires a node reboot — RAC ensures the database stays available during rolling node reboots.

Apply DBBP to database nodes in rolling mode
# Unzip the DBBP patch bundle
cd /u01/patches
unzip p<dbbp_patch_number>_<version>_Linux-x86-64.zip
cd patch_<version>

# patchmgr cannot patch the node it is launched from:
# run it from a node that is not listed in the group file you pass to -dbnodes

# Run pre-check for database nodes
./patchmgr \
  -dbnodes /opt/oracle.SupportTools/onecommand/dbs_group \
  -precheck \
  -iso_repo /u01/patches/p<patch_number>_Linux-x86-64.zip \
  -target_version <target_version_string>

# Apply DBBP in rolling mode to DB nodes
# nohup keeps the run alive if your SSH session drops
nohup ./patchmgr \
  -dbnodes /opt/oracle.SupportTools/onecommand/dbs_group \
  -upgrade \
  -iso_repo /u01/patches/p<patch_number>_Linux-x86-64.zip \
  -target_version <target_version_string> \
  -rolling \
  > /u01/patches/patchmgr_dbnodes.log 2>&1 &

# Monitor from a separate session
tail -f /u01/patches/patchmgr_dbnodes.log

# After each node reboots, verify it rejoined the cluster
$GI_HOME/bin/crsctl status resource -t

Always run patchmgr with nohup and redirect output to a log file. If your SSH session to the launching node drops during a long patching run, the patchmgr process continues in the background, and you can reconnect and check the log file to confirm progress. Launch patchmgr from a node that is not itself being patched: nohup protects against a lost session, but not against a reboot of the node the process runs on.

Step 4 — Patching Oracle Grid Infrastructure and Database (OPatch)

After the infrastructure is at the new level, Oracle Grid Infrastructure and Oracle Database software is patched using the standard OPatch process — the same process used on any Oracle database platform. The key difference on Exadata is that GI rolling patches are applied one node at a time using the GI rolling patch mechanism.

Apply GI and DB CPU patch using OPatch
# Step 1: Apply GI patch using opatchauto (preferred for Exadata)
# Run as root, one node at a time; opatchauto handles the rolling restart of the local stack
export GI_HOME=/u01/app/19.0.0/grid
$GI_HOME/OPatch/opatchauto apply \
  /u01/patches/<gi_patch_directory> \
  -oh $GI_HOME

# Step 2: Apply DB CPU patch to each DB home (as the Oracle software owner)
# Stop the instances running from this home on the node being patched first
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
$ORACLE_HOME/OPatch/opatch apply \
  /u01/patches/<db_cpu_patch_number>

# Step 3: Run datapatch after OPatch completes on all nodes
$ORACLE_HOME/OPatch/datapatch -verbose

# Step 4: Verify GI and DB patch levels
$ORACLE_HOME/OPatch/opatch lsinventory
$GI_HOME/OPatch/opatch lsinventory

Step 5 — Verifying Patch Levels Across All Components

After patching is complete, verify every component is at the expected version. Do not close the change ticket until all components are confirmed.

Verify all component patch levels
# 1. Check DB node Exadata infrastructure version (run on each DB node)
imageinfo

# 2. Check all storage cell versions via dcli
dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
  "imageinfo -ver"

# Or via cellcli
dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
  "cellcli -e LIST CELL ATTRIBUTES name, releaseVersion"

# 3. Verify GI patch level
$GI_HOME/OPatch/opatch lsinventory | grep -i "patch"

# 4. Verify DB home patch level
$ORACLE_HOME/OPatch/opatch lsinventory | grep -i "patch"

# 5. Verify datapatch completed successfully
sqlplus / as sysdba <<'EOF'
SELECT action_time, action, status, description
FROM   dba_registry_sqlpatch
ORDER  BY action_time DESC
FETCH  FIRST 10 ROWS ONLY;
EOF

# 6. Check all cluster resources are running
$GI_HOME/bin/crsctl status resource -t

# 7. Verify all cells are running new software
dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
  "cellcli -e LIST CELL ATTRIBUTES name, status, releaseVersion"

# 8. Verify all griddisks are active after cell patching
dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
  "cellcli -e LIST GRIDDISK WHERE status != 'active'"
Verify ASM is healthy after patching
# Connect to ASM and verify disk group health
# The heredoc delimiter is quoted so the shell does not expand $asm_... inside v$ view names
sqlplus / as sysasm <<'EOF'
SELECT name, state, type,
       ROUND(total_mb / 1024, 1) AS total_gb,
       ROUND(free_mb / 1024, 1) AS free_gb
FROM   v$asm_diskgroup;

-- Confirm no disks are in error state
SELECT path, state, mode_status, total_mb
FROM   v$asm_disk
WHERE  state != 'NORMAL' OR mode_status != 'ONLINE';

-- Confirm no rebalance in progress
SELECT * FROM v$asm_operation;
EOF
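
Comparing versions by eye across many cells is error-prone, so the dcli output can be filtered mechanically. The sketch below assumes the "host: value" line shape that dcli prints for imageinfo -ver; the function name is hypothetical and the version strings in the usage example are placeholders, not real release numbers for your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads dcli-style "host: value" lines on stdin and
# prints every host whose value differs from the expected version string.
hosts_not_at_version() {
  local expected="$1"
  awk -v want="$expected" '
    index($0, ":") > 0 {
      host = $0; sub(/:.*/, "", host)            # text before the first colon
      ver = $0;  sub(/^[^:]*:[ \t]*/, "", ver)   # text after the first colon
      sub(/[ \t\r]+$/, "", ver)                  # trim trailing whitespace
      if (ver != want) print host
    }'
}

# Example usage (assumed target version string):
#   dcli -g /opt/oracle.SupportTools/onecommand/cell_group "imageinfo -ver" \
#     | hosts_not_at_version "<target_version_string>"
```

Empty output means every cell reports the expected version; any hostname printed needs a closer look before the change ticket is closed.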

Rolling Back a Cell Patch

If a storage cell patch causes issues, patchmgr supports rollback to the previous image. Rollback is only available when the cell was patched successfully — a cell that failed mid-patch cannot be rolled back and requires Oracle Support assistance.

Roll back storage cell patch
# Step 1: Run rollback pre-check
./patchmgr \
  -cells /opt/oracle.SupportTools/onecommand/cell_group \
  -rollback_check_prereq

# Step 2: Roll back in rolling mode (one cell at a time)
./patchmgr \
  -cells /opt/oracle.SupportTools/onecommand/cell_group \
  -rollback \
  -rolling

# Step 3: Clean up patchmgr state after rollback
./patchmgr \
  -cells /opt/oracle.SupportTools/onecommand/cell_group \
  -cleanup

# Step 4: Verify cells are back at previous version
dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
  "imageinfo -ver"

Firmware updates applied as part of the cell patch are not rolled back when rolling back to a previous software image. After rollback, if you need to revert firmware as well, Oracle Support will guide the firmware downgrade procedure. This is another reason to always test patches in a non-production environment before applying to production.

Exadata Patching Checklist

# | Task | Tool | Done
1 | Read the MOS readme for DBBP and EXABP; note any special steps | MOS | [ ]
2 | Download DBBP, EXABP, GI patch, DB CPU from MOS | MOS | [ ]
3 | Run ExaCheck and resolve all FAIL and WARNING findings | exachk | [ ]
4 | Verify current version on all DB nodes (imageinfo) | imageinfo | [ ]
5 | Verify current version on all cells (imageinfo -ver via dcli) | dcli | [ ]
6 | Verify ASM disk groups healthy: no errors, no rebalance in progress | SQL | [ ]
7 | Verify SSH key-based auth from primary DB node to all cells | dcli | [ ]
8 | Take RMAN full backup of all databases | RMAN | [ ]
9 | Patch network switches with patchmgr | patchmgr | [ ]
10 | Run patchmgr pre-check for cells; resolve any FAIL findings | patchmgr pre-check | [ ]
11 | Patch storage cells with patchmgr -cells, rolling (or non-rolling) | patchmgr | [ ]
12 | Verify all cells at new version and all griddisks active | dcli / cellcli | [ ]
13 | Patch DB node infrastructure with patchmgr -dbnodes -rolling | patchmgr | [ ]
14 | Verify all DB nodes rejoined cluster and databases running | crsctl | [ ]
15 | Apply GI patch with opatchauto | OPatch | [ ]
16 | Apply DB CPU patch: opatch apply on each DB home | OPatch | [ ]
17 | Run datapatch -verbose and verify completion | datapatch | [ ]
18 | Verify all component versions: imageinfo, opatch lsinventory, cellcli | Multiple | [ ]
19 | Run Smart Scan health check; confirm offload statistics healthy | V$SYSSTAT | [ ]
20 | Run ExaCheck again post-patch and confirm no new failures | exachk | [ ]

Summary

  • DBBP patches the database server infrastructure — OS, firmware, Exadata tools. Applied with patchmgr -dbnodes.
  • EXABP patches storage cells — Oracle Linux, Exadata System Software, cellsrv, firmware. Applied with patchmgr -cells.
  • Correct patching order is always: switches → storage cells → DB node infrastructure → Grid Infrastructure → Oracle Database.
  • Rolling mode patches one component at a time while RAC databases stay online. Non-rolling patches all in parallel but requires database downtime. Non-rolling is the default for cells — specify -rolling explicitly for production.
  • patchmgr is run from the primary database node — it orchestrates SSH connections to all cells and handles the entire update process including reboot and post-patch verification.
  • OPatch and datapatch handle GI and Oracle Database software — the same process as any Oracle database, applied after the infrastructure layer is at the new version.
  • Always run ExaCheck before and after patching. Always take a full RMAN backup before starting. Always read the MOS readme for the specific patch bundle — it supersedes general guidance.

Exadata administration essentials — cellcli, dcli, and managing storage cells day to day

The day-to-day administration guide. It covers the cellcli command-line interface (how to connect, list objects, check disk status, and manage griddisks and celldisks); dcli for running commands across all storage cells simultaneously; and how to check cell alerts, manage the Exadata alert log, handle predictive failure warnings, and perform routine cell health checks.

  • Storage object model table — PHYSICALDISK → CELLDISK → GRIDDISK — the three-level chain and what breaks when a disk fails
  • cellcli connection guide — celladmin vs cellmonitor, interactive vs single command, syntax rules
  • Full cellcli command vocabulary — all major object types and actions available in the tool
  • PHYSICALDISK commands — list all, list by type, filter by status, full detail
  • Physical disk status values explained — normal, warning-predictive failure, failed, not present — with real-world output example showing slot numbers and flash device names
  • CELLDISK commands — list, detail, status filter, free space check, I/O metrics
  • GRIDDISK commands — list, detail, ASM mapping, status filter, disk group filter
  • Taking griddisks INACTIVE/ACTIVE for planned maintenance — with the correct ASM rebalance wait step and redundancy rules
  • Alert severity levels table — critical, warning, informational, clear — action required for each
  • ALERTHISTORY commands — list, filter by severity, filter unacknowledged, acknowledge, drop
  • Full predictive failure response workflow — identify disk, record serial number, take griddisks inactive, wait for ASM rebalance, create celldisk and griddisks after replacement
  • Exadata alert log location and grep commands for deep troubleshooting
  • dcli reference — syntax, username flag, subset group files, 10 essential daily dcli commands
  • Cell service management — service celld start/stop/status, ALTER CELL RESTART, shutdown sequence
  • Flash cache administration — list, flush, drop and recreate
  • Ready-to-run bash health check script — 8 checks via dcli, runs in under 5 minutes
  • Complete cellcli command reference table — 22 commands with access level required

    Managing an Exadata system as a DBA involves two distinct workspaces — the database tier you already know, and the storage cell tier that is unique to Exadata. The storage cell tier has its own command-line interface, its own objects, its own alert system, and its own administration tasks. A DBA who only manages the database layer is managing half of Exadata.

    This article covers everything you need for day-to-day storage cell administration. It explains the storage object model — the relationship between physical disks, celldisks, and griddisks — walks through the essential cellcli and dcli commands, shows you how to work with cell alerts and predictive failure warnings, and provides a complete routine health check reference you can run every day.

    Two OS users for cellcli: celladmin can run all cellcli commands including those that modify configuration. cellmonitor can run read-only LIST commands only. For routine monitoring, always use cellmonitor. Reserve celladmin for configuration changes. Commands in this article that modify the cell are clearly marked.

    The Exadata Storage Object Model

    Before running any cellcli commands, you need to understand the three-layer object hierarchy that Exadata uses to represent storage. Every disk in an Exadata cell exists at three levels simultaneously, and each level has a different name, a different purpose, and different administration commands.

    Object | What It Represents | Who Manages It | Typical Naming
    PHYSICALDISK | The actual physical hard disk or flash drive inside the storage cell. Identified by its slot position (e.g. 35:3) or flash device name (e.g. FLASH_4_0). | Exadata System Software (auto-detected) | 35:3 (enclosure:slot) or FLASH_4_0
    CELLDISK | The logical representation of a physical disk within cellsrv. One celldisk per physical disk. This is the layer where Exadata System Software manages the disk. | Exadata System Software (created automatically) | CD_00_cell01
    GRIDDISK | Logical partitions carved from a celldisk and presented to Oracle ASM. One celldisk can have multiple griddisks (e.g. one for DATA, one for RECO). ASM sees griddisks as disk devices. | DBA (can be created, dropped, resized) | DATA_CD_00_cell01, RECO_CD_00_cell01

    The relationship is: one PHYSICALDISK → one CELLDISK → one or more GRIDDISKs. When a physical disk fails, the celldisk becomes unavailable, which takes its griddisks offline, which causes ASM to drop those disks from the disk group and begin rebalancing. Understanding this chain is essential for diagnosing hardware events.
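
To make the chain concrete, the sketch below answers the question "if this celldisk goes away, which griddisks and ASM disk groups are affected?". It is an illustration only: the function name is hypothetical, and it assumes three-column input in the shape produced by LIST GRIDDISK ATTRIBUTES name, cellDisk, asmDiskGroupName.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not an Oracle-supplied tool.
# Input on stdin: rows of "griddiskName cellDiskName asmDiskGroupName", as printed by
#   cellcli -e "LIST GRIDDISK ATTRIBUTES name, cellDisk, asmDiskGroupName"
# Argument: the celldisk that sits on the failed physical disk.
griddisks_on_celldisk() {
  local celldisk="$1"
  awk -v cd="$celldisk" 'NF >= 3 && $2 == cd { print $1 " -> ASM disk group " $3 }'
}

# Example usage on a storage cell:
#   cellcli -e "LIST GRIDDISK ATTRIBUTES name, cellDisk, asmDiskGroupName" \
#     | griddisks_on_celldisk CD_00_cell01
```

Each line of output is one griddisk that ASM will lose, together with the disk group that will start rebalancing.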

    Connecting to cellcli

    cellcli runs only on the storage cell — you cannot run it from a database node. You must SSH to each cell individually. The management network (not the RoCE storage network) is used for cellcli SSH access.

    Connecting to a storage cell and entering cellcli
    # Connect to a storage cell via SSH using the management hostname
    ssh celladmin@cell01-adm       # Full admin access — can modify configuration
    ssh cellmonitor@cell01-adm     # Read-only access — monitoring only
    
    # Open the interactive cellcli prompt
    cellcli
    
    # You will see the prompt:
    # CellCLI: Release 23.x.x.x — Production on [date]
    # CellCLI>
    
    # Run a single command without entering interactive mode
    cellcli -e "LIST CELL DETAIL"
    
    # Exit the interactive session
    CellCLI> EXIT

    cellcli syntax rules: Commands are case-insensitive. Object type keywords (CELL, CELLDISK, GRIDDISK, etc.) and attribute names are case-insensitive. String values in WHERE filters are case-sensitive and must match exactly. At the CellCLI> prompt, a hyphen (-) at the end of a line continues a long command on the next line; a backslash (\) is the continuation character of the OS shell, so it applies only when you call cellcli -e or dcli from a shell prompt.

    Cell-Level Commands

    The CELL object represents the entire storage server. Start here for any health check — it gives you the overall status of the cell before drilling into disks or metrics.

    Essential CELL commands
    # Quick cell status: name, status, and software version
    CellCLI> LIST CELL

    # Full cell detail: hardware model, software version, all network interfaces
    CellCLI> LIST CELL DETAIL

    # Check specific attributes only
    CellCLI> LIST CELL ATTRIBUTES name, status, releaseVersion

    # Check network interconnect interfaces
    CellCLI> LIST CELL ATTRIBUTES name, interconnect0, interconnect1

    # Check cell uptime and last restart time
    CellCLI> LIST CELL ATTRIBUTES name, upTime, restartCount

    # Check cell memory utilisation
    CellCLI> LIST METRICCURRENT CL_MEMUT

    # Check CPU utilisation
    CellCLI> LIST METRICCURRENT CL_CPUT

    # Check cell temperature (thermal status)
    CellCLI> LIST METRICCURRENT CL_TEMP

    Managing Physical Disks — PHYSICALDISK

    Physical disks are the raw hardware inside the storage cell. The most important thing to monitor at this level is the disk status. A status of normal is expected. Any deviation — particularly warning - predictive failure — requires immediate attention.

    PHYSICALDISK monitoring commands
    # List all physical disks with status (quick overview)
    CellCLI> LIST PHYSICALDISK

    # Full detail for all physical disks
    CellCLI> LIST PHYSICALDISK DETAIL

    # List only specific attributes for all disks
    CellCLI> LIST PHYSICALDISK ATTRIBUTES name, diskType, status, serialNumber

    # List only hard disks
    CellCLI> LIST PHYSICALDISK WHERE diskType = 'HardDisk'

    # List only flash drives
    CellCLI> LIST PHYSICALDISK WHERE diskType = 'FlashDisk'

    # CRITICAL: Check for any disk not in normal status
    CellCLI> LIST PHYSICALDISK WHERE status != 'normal'

    # Check for predictive failure specifically (pre-failure warning)
    CellCLI> LIST PHYSICALDISK WHERE diskType = 'HardDisk' AND status = 'warning - predictive failure' DETAIL

    Status values for PHYSICALDISK:

      • normal: the disk is healthy.
      • warning - predictive failure: S.M.A.R.T. diagnostics predict this disk will fail. Replace it proactively before it fails completely.
      • failed: the disk has failed. ASM will drop the associated griddisks and rebalance.
      • not present: no disk in this slot. Expected for empty slots.
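
For scripted monitoring, the four status values can be mapped to the response this article recommends. This is a minimal sketch: the function name and the wording of the actions are this example's own, and any status string outside the four listed here is simply reported for investigation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a PHYSICALDISK status string to a suggested action.
physicaldisk_action() {
  case "$1" in
    "normal")
      echo "no action" ;;
    "warning - predictive failure")
      echo "raise an SR and replace the disk proactively" ;;
    "failed")
      echo "replace the disk; ASM drops the griddisks and rebalances" ;;
    "not present")
      echo "no action if the slot is meant to be empty" ;;
    *)
      echo "unrecognised status: investigate" ;;
  esac
}

# Example usage:
#   physicaldisk_action "warning - predictive failure"
```

The status strings are matched exactly, including case and the spaces around the hyphen, because that is how cellcli prints them.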

    Managing Celldisks — CELLDISK

    Celldisks are the logical representation of physical disks within Exadata System Software. They are created automatically when the cell is initialised and correspond one-to-one with physical disks. The DBA rarely needs to create or drop celldisks manually — but checking their status and free space is part of routine administration.

    CELLDISK monitoring and management commands
    # List all celldisks with status
    CellCLI> LIST CELLDISK

    # Full detail for all celldisks
    CellCLI> LIST CELLDISK DETAIL

    # Check status, size, and free space for all celldisks
    CellCLI> LIST CELLDISK ATTRIBUTES name, status, size, freeSpace

    # Check for any celldisk not in normal status
    CellCLI> LIST CELLDISK WHERE status != 'normal'

    # Check a specific celldisk in detail
    CellCLI> LIST CELLDISK CD_00_cell01 DETAIL

    # All current metrics for celldisks (includes read I/O throughput)
    CellCLI> LIST METRICCURRENT WHERE objectType = 'CELLDISK'

    # Celldisk read requests: large (Smart Scan) and small (OLTP)
    CellCLI> LIST METRICCURRENT CD_IO_RQ_R_LG
    CellCLI> LIST METRICCURRENT CD_IO_RQ_R_SM

    Celldisk status values mirror physicaldisk status. If a physicaldisk enters a failed state, the corresponding celldisk will also show as failed or in an error state. When a celldisk fails, all griddisks on that celldisk become unavailable and ASM rebalance begins automatically.

    Managing Griddisks — GRIDDISK

    Griddisks are what Oracle ASM sees as individual disk devices. Each celldisk is partitioned into one or more griddisks — typically one for the DATA disk group and one for the RECO disk group. The griddisk is the unit that ASM adds to or drops from a disk group.

    Griddisks are the most frequently administered storage object. You will work with griddisks when taking a cell offline for maintenance, restoring a replaced disk, and verifying disk group membership after a hardware event.

    GRIDDISK monitoring and management commands
    # List all griddisks with status
    CellCLI> LIST GRIDDISK

    # Full detail for all griddisks
    CellCLI> LIST GRIDDISK DETAIL

    # List griddisks with key attributes
    CellCLI> LIST GRIDDISK ATTRIBUTES name, asmDiskName, asmDiskGroupName, status, size

    # Check for any griddisk not in active status
    CellCLI> LIST GRIDDISK WHERE status != 'active'

    # List griddisks for a specific ASM disk group
    CellCLI> LIST GRIDDISK WHERE asmDiskGroupName = 'DATA'

    # Check a specific griddisk in detail
    CellCLI> LIST GRIDDISK DATA_CD_00_cell01 DETAIL

    Taking a griddisk offline and online — for planned maintenance

    When you need to take a storage cell offline for maintenance (patching, hardware work), you must first quiesce the griddisks so ASM can handle the absence gracefully. This is a celladmin-only operation.

    Take griddisks inactive for planned cell maintenance
    # Before you start: confirm ASM can tolerate losing this cell's griddisks
    # Every griddisk should report asmDeactivationOutcome = Yes
    CellCLI> LIST GRIDDISK ATTRIBUTES name, asmModeStatus, asmDeactivationOutcome

    # Step 1: Take all griddisks on this cell INACTIVE before maintenance
    # ASM takes the matching disks offline; redundancy is reduced until they return
    CellCLI> ALTER GRIDDISK ALL INACTIVE

    # Step 2: Verify all griddisks show inactive status
    CellCLI> LIST GRIDDISK ATTRIBUTES name, status

    # Step 3: From the DB node, verify no ASM rebalance is in progress
    # Wait until no rebalance operations are running
    -- sqlplus / as sysasm
    -- SELECT * FROM v$asm_operation WHERE state = 'RUN';

    # Step 4: Perform maintenance on the cell

    # Step 5: After maintenance, bring griddisks ACTIVE again
    CellCLI> ALTER GRIDDISK ALL ACTIVE

    # Step 6: Verify all griddisks are active and back ONLINE in ASM
    CellCLI> LIST GRIDDISK ATTRIBUTES name, status, asmModeStatus

    Do not take all cells offline simultaneously. ASM needs enough cells to satisfy the disk group redundancy requirements. For a normal redundancy disk group (2-way mirroring), you can take one cell offline at a time. For high redundancy (3-way mirroring), you can take up to two cells offline simultaneously — but always confirm ASM rebalance completes between each cell.
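
The "wait for rebalance between cells" rule lends itself to a small polling loop. The following is a sketch under stated assumptions: wait_for_zero and count_asm_operations are hypothetical names, the counter assumes OS authentication to the ASM instance on a database node, and the retry count and delay are arbitrary examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: polls a command until it prints 0.
# Usage: wait_for_zero <max_tries> <sleep_seconds> <command> [args...]
# Returns 0 as soon as the command prints 0, or 1 after max_tries attempts.
wait_for_zero() {
  local tries="$1" delay="$2" attempt=1 count
  shift 2
  while [ "$attempt" -le "$tries" ]; do
    count="$("$@" | tr -d '[:space:]')"
    if [ "$count" = "0" ]; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}

# Assumed counter: number of rows in v$asm_operation.
# The quoted heredoc delimiter stops the shell from expanding "$asm_operation".
count_asm_operations() {
  sqlplus -s / as sysasm <<'SQL'
set heading off feedback off pagesize 0
select count(*) from v$asm_operation;
exit
SQL
}

# Example usage: check every 60 seconds for up to 2 hours
#   wait_for_zero 120 60 count_asm_operations && echo "ASM is idle"
```

Passing the counter as a command keeps the loop independent of SQL*Plus, so the same function can wait on any check that prints a number.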

    The Exadata Alert System

    Exadata System Software monitors hundreds of metrics across every cell component — disks, CPUs, temperature sensors, fans, network interfaces, flash devices. When a metric crosses a threshold, an alert is generated and recorded in the alert history. Understanding how to read and manage alerts is one of the most important Exadata administration skills.

    Alert severity levels

    Severity | Meaning | Action Required
    critical | Component has failed or is at immediate risk. Data availability may be impacted. | Immediate: escalate to Oracle Support and hardware team
    warning | Component is degraded or approaching a failure threshold. Not yet causing data loss. | Investigate within the same business day and plan remediation
    informational | A notable event occurred but no failure. System state changes, rebalance completions, etc. | Review; no immediate action usually required
    clear | A previously raised alert has been resolved automatically. | Verify the root cause is genuinely resolved

    Working with ALERTHISTORY
    # List all alerts
    CellCLI> LIST ALERTHISTORY

    # Show full detail for all alerts
    CellCLI> LIST ALERTHISTORY DETAIL

    # Show only critical and warning alerts
    CellCLI> LIST ALERTHISTORY WHERE severity LIKE '[warning|critical]'

    # Show only alerts not yet acknowledged (examinedBy is null)
    CellCLI> LIST ALERTHISTORY WHERE severity LIKE '[warning|critical]' AND examinedBy IS NULL

    # Show alerts from a specific time range
    CellCLI> LIST ALERTHISTORY WHERE beginTime > '2026-05-22T00:00:00'

    # Mark an alert as examined (acknowledge it)
    # Replace <alert_id> with the numeric ID from LIST ALERTHISTORY
    CellCLI> ALTER ALERTHISTORY <alert_id> examinedBy = 'your_name'

    # Mark all alerts as examined
    CellCLI> ALTER ALERTHISTORY ALL examinedBy = 'your_name'

    # Drop old alert history entries
    CellCLI> DROP ALERTHISTORY <alert_id>

    Handling Predictive Failure Warnings

    A predictive failure warning is one of the most important alerts an Exadata DBA receives. It means S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) diagnostics inside the disk have detected early signs of failure — the disk has not yet failed, but its internal health indicators predict it will fail soon. The disk must be replaced before it fails completely to avoid data loss and an emergency rebalance operation.

    Step-by-step response to a predictive failure warning

    Step 1: Identify the failing disk
    # Connect to the affected cell
    ssh celladmin@cell01-adm
    cellcli

    # Find the predictive failure disk and get its slot and serial number
    CellCLI> LIST PHYSICALDISK WHERE diskType = 'HardDisk' AND status = 'warning - predictive failure' DETAIL

    # Note the output. Key fields to record:
    # name:          28:3                          <-- disk name (enclosure:slot)
    # deviceId:      19                            <-- device ID
    # serialNumber:  ABC123DEF456                  <-- serial for Oracle Support SR
    # status:        warning - predictive failure
    # slotNumber:    3                             <-- physical slot in the cell

    # Also check the associated celldisk and griddisks
    CellCLI> LIST CELLDISK ATTRIBUTES name, status, diskId
    CellCLI> LIST GRIDDISK WHERE celldisk = 'CD_03_cell01' DETAIL

    Step 2: Check the alert history for this disk
    # Review all alerts associated with this cell
    CellCLI> LIST ALERTHISTORY WHERE severity LIKE '[warning|critical]' DETAIL

    # Check from the database side as well
    -- sqlplus / as sysasm
    -- SELECT path, mode_status, state, total_mb, free_mb
    -- FROM v$asm_disk
    -- ORDER BY path;

    Step 3: Raise Oracle Support SR and prepare for disk replacement
    # Take all griddisks on the affected celldisk INACTIVE
    # This gives ASM time to rebalance before disk replacement
    CellCLI> ALTER GRIDDISK DATA_CD_03_cell01, RECO_CD_03_cell01 INACTIVE

    # Verify griddisks are inactive
    CellCLI> LIST GRIDDISK ATTRIBUTES name, status

    # Wait for ASM rebalance to complete from the database node
    -- SELECT group_number, operation, state, est_minutes
    -- FROM v$asm_operation;

    # After disk is physically replaced by Oracle hardware support,
    # and Exadata System Software detects the new disk,
    # create the celldisk and griddisks on the replacement disk
    CellCLI> CREATE CELLDISK CD_03_cell01 physicalDisk = '28:3'

    # Create griddisks on the new celldisk (matching the original sizes)
    CellCLI> CREATE GRIDDISK DATA_CD_03_cell01 celldisk = 'CD_03_cell01', size = 18T, offset = 0
    CellCLI> CREATE GRIDDISK RECO_CD_03_cell01 celldisk = 'CD_03_cell01', size = 4T, offset = 18T

    # ASM will detect the new griddisks and begin rebalancing automatically
    # Monitor from the database:
    -- SELECT group_number, operation, state, est_minutes FROM v$asm_operation;

    Always raise an Oracle Support SR before disk replacement, even for predictive failures. Oracle Support has tools to remotely diagnose the disk and will guide the replacement procedure. For systems under Oracle-managed Platinum support, Oracle may initiate the replacement proactively before you even see the alert.

    The Exadata Alert Log

    In addition to the cellcli alert history, Exadata System Software maintains an alert log on each storage cell — similar in concept to the Oracle Database alert log. This log records all significant events including daemon starts and stops, disk events, configuration changes, and hardware alerts.

    Finding and reading the Exadata alert log on a storage cell
    # The alert log is on the storage cell OS -- SSH to the cell first
    ssh celladmin@cell01-adm

    # Main cellsrv alert log location
    cat /opt/oracle/cell/log/diag/asm/cell/cell01/alert/log.xml

    # For more readable plain text version
    ls -la /opt/oracle/cell/log/diag/asm/cell/cell01/trace/

    # Tail the live alert log to watch for new events
    tail -f /opt/oracle/cell/log/diag/asm/cell/cell01/alert/log.xml

    # Search for specific error types
    grep -i "error\|warning\|critical\|ORA-" \
      /opt/oracle/cell/log/diag/asm/cell/cell01/alert/log.xml | tail -50

    # Check cellsrv trace files for detailed diagnostics
    ls -lt /opt/oracle/cell/log/diag/asm/cell/cell01/trace/ | head -20

    Check cell system logs for hardware events
    # Check the OS system log for hardware events (disk I/O errors, etc.)
    tail -100 /var/log/messages | grep -i "error\|fail\|disk"

    # Check IPMI/BMC hardware event log via ipmitool
    ipmitool sel list | tail -20

    # Check for disk I/O errors in the kernel ring buffer
    dmesg | grep -i "error\|i/o error\|disk" | tail -20

    dcli — Running Commands Across All Cells

    dcli (Distributed CLI) executes the same command on multiple cells simultaneously and collects the combined output. It is the most time-efficient way to check the health of all cells in one command rather than SSH-ing into each one individually.

    dcli setup and syntax

        # The cell group file lists one cell hostname per line
        cat /opt/oracle.SupportTools/onecommand/cell_group

        # Basic syntax
        # dcli -g <group_file> [-l <username>] "<command>"

        # Default runs as celladmin
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e LIST CELL"

        # Run as cellmonitor (read-only) -- specify username with -l
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            -l cellmonitor \
            "cellcli -e LIST CELL ATTRIBUTES name, status"

        # Run on specific cells only -- create a subset group file
        echo -e "cell01\ncell02\ncell03" > /tmp/three_cells.txt
        dcli -g /tmp/three_cells.txt "cellcli -e LIST CELL"
    Essential dcli daily health checks

        # 1. Check all cell daemon status
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "service celld status"

        # 2. Check Exadata System Software version on all cells
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e LIST CELL ATTRIBUTES name, releaseVersion"

        # 3. Check for any alerts across all cells
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e LIST ALERTHISTORY WHERE severity LIKE '[warning|critical]'"

        # 4. Check for any disk NOT in normal status across all cells
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e LIST PHYSICALDISK WHERE status != 'normal'"

        # 5. Check predictive failure disks specifically
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e \"LIST PHYSICALDISK WHERE status = 'warning - predictive failure'\""

        # 6. Check all griddisk status
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e LIST GRIDDISK WHERE status != 'active'"

        # 7. Check cell CPU across all cells
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e LIST METRICCURRENT CL_CPUT"

        # 8. Check flash cache status on all cells
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e LIST FLASHCACHE DETAIL"

        # 9. Check Storage Index savings across all cells
        #    (CellCLI LIKE takes a regular expression, so use .* rather than the SQL % wildcard)
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e LIST METRICCURRENT WHERE name LIKE 'SI_.*'"

        # 10. Check for any celldisk not in normal state
        dcli -g /opt/oracle.SupportTools/onecommand/cell_group \
            "cellcli -e LIST CELLDISK WHERE status != 'normal'"
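
    Because dcli prefixes every output line with the hostname of the cell that produced it, the combined output of an "exceptions only" check can be tallied per cell. The following is a minimal sketch of that idea, not part of any Oracle tooling: the function name count_by_cell and the sample rows (disk names, serial numbers) are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count how many exception rows each cell returned in dcli output.
# Relies on dcli prefixing every output line with "<cell hostname>: ".

count_by_cell() {
    # Split each line on ": " and tally lines per hostname; sort for a stable order.
    awk -F': ' 'NF > 1 { rows[$1]++ } END { for (cell in rows) print cell, rows[cell] }' | sort
}

# Hypothetical sample of what a "disks not normal" check might return.
# In real use, pipe the dcli command itself into count_by_cell.
sample_output='cell01: 20:5    K1ABCD    warning - predictive failure
cell03: 20:2    K9WXYZ    failed
cell03: 20:7    K7LMNO    warning - predictive failure'

printf '%s\n' "$sample_output" | count_by_cell
```

    A cell that does not appear in the tally returned no rows for that check, which for a status != 'normal' query means it is healthy.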

    Managing Cell Services — Starting and Stopping

    The three cell daemons — cellsrv, MS (Management Server), and RS (Restart Server) — run as a single service called celld. RS is the watchdog process that monitors and automatically restarts cellsrv and MS if they fail. In normal operations, you should rarely need to manually start or stop cell services.

    Cell service management commands (run as root on the cell OS)

        # Check status of all cell daemons
        service celld status

        # Start all cell services (RS starts first, then MS, then cellsrv)
        service celld start

        # Stop all cell services gracefully
        service celld stop

        # Restart a specific daemon without stopping others
        # Use cellcli ALTER commands instead of OS-level restart when possible
        CellCLI> ALTER CELL RESTART SERVICES cellsrv
        CellCLI> ALTER CELL RESTART SERVICES ms
        CellCLI> ALTER CELL RESTART SERVICES rs

        # Restart all services (equivalent to stop then start)
        CellCLI> ALTER CELL RESTART SERVICES ALL

        # Shut down all cell services before hardware maintenance
        # (this stops the daemons; powering off the server is a separate OS-level step)
        CellCLI> ALTER CELL SHUTDOWN SERVICES ALL

    Never restart cellsrv during active database I/O without first taking the cell's griddisks INACTIVE and confirming that ASM has taken them offline cleanly. An abrupt cellsrv restart while databases are actively writing forces ASM to offline the affected griddisks without warning; if they stay offline beyond the disk group's disk_repair_time, ASM drops them and starts an unplanned rebalance. Always take griddisks INACTIVE first for planned maintenance.
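
    A common pre-check before inactivating griddisks is the asmDeactivationOutcome attribute, which reports whether ASM can take each griddisk offline without losing redundancy. The sketch below shows one way to gate a maintenance step on it. The function name safe_to_deactivate and the sample rows are illustrative assumptions; the input is expected to be the output of LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome run on the cell in question.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every griddisk reports asmdeactivationoutcome = Yes.
# Reads the output of:
#   cellcli -e "LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome"

safe_to_deactivate() {
    awk '
        NF >= 3 {
            seen++
            # Any value other than "Yes" in the last column blocks the maintenance.
            if ($NF != "Yes") { bad = 1; print "NOT SAFE: " $1 }
        }
        # Fail if a griddisk was unsafe, or if no griddisk rows were read at all.
        END { exit (bad || seen == 0) }
    '
}

# Hypothetical sample rows (griddisk names are made up for illustration).
all_safe='DATA_CD_00_cell01    ONLINE    Yes
DATA_CD_01_cell01    ONLINE    Yes
RECO_CD_00_cell01    ONLINE    Yes'

one_unsafe='DATA_CD_00_cell01    ONLINE    Yes
DATA_CD_01_cell01    ONLINE    No'

if printf '%s\n' "$all_safe" | safe_to_deactivate; then
    echo "All griddisks can be deactivated"
fi

if ! printf '%s\n' "$one_unsafe" | safe_to_deactivate; then
    echo "Do not proceed with ALTER GRIDDISK ALL INACTIVE"
fi
```

    Treating empty input as a failure is a deliberate choice here: a check that silently passes when cellcli returned nothing would defeat its purpose.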

    Flash Cache Administration

    Smart Flash Cache is managed by Exadata System Software automatically — it populates, evicts, and manages cache contents without DBA intervention. However, there are specific administration tasks a DBA needs to perform, particularly during troubleshooting or after hardware replacement.

    Flash cache administration commands

        # Check flash cache status and configuration
        CellCLI> LIST FLASHCACHE DETAIL

        # Check flash cache utilisation metrics
        CellCLI> LIST METRICCURRENT FC_BY_USED
        CellCLI> LIST METRICCURRENT FC_IO_BY_R_SEC

        # List what is cached in the flash cache (be careful -- large output)
        CellCLI> LIST FLASHCACHECONTENT

        # Flush (clear) the flash cache -- use only when directed by Oracle Support
        # This causes temporary performance degradation as cache repopulates
        CellCLI> ALTER FLASHCACHE ALL FLUSH

        # Drop and recreate the flash cache (after flash hardware replacement)
        CellCLI> DROP FLASHCACHE
        CellCLI> CREATE FLASHCACHE ALL
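
    FC_BY_USED reports the space used in the flash cache in megabytes, so turning it into a utilisation percentage needs the total cache size as well. The following sketch does that arithmetic under stated assumptions: the function name fc_used_pct is hypothetical, the total size in MB is supplied by the caller, and the sample metric line only approximates the layout of single-cell cellcli output (run without the dcli hostname prefix).

```shell
#!/usr/bin/env bash
# Sketch: convert an FC_BY_USED reading (MB) into a percentage of total flash cache.
# Usage: <metric output> | fc_used_pct <total_flash_cache_size_in_MB>

fc_used_pct() {
    local size_mb=$1
    awk -v size="$size_mb" '
        $1 == "FC_BY_USED" {
            gsub(/,/, "", $3)                    # "250,000" -> "250000"
            printf "%.1f\n", ($3 / size) * 100   # percentage with one decimal place
        }
    '
}

# Illustrative metric line; the values are made up.
sample_metric='         FC_BY_USED      FLASHCACHE      250,000 MB'

# Assume a 1,000,000 MB flash cache for the example.
printf '%s\n' "$sample_metric" | fc_used_pct 1000000
```

    A consistently high percentage is normal on a busy cell, because the cache is designed to stay full. The figure is more useful for spotting a cache that is unexpectedly empty after a flush or hardware replacement.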

    Routine Health Check — Full Command Reference

    This is the complete daily administration runbook for Exadata storage cell health. Run all sections at the start of each working day. Total time: under 5 minutes when run via dcli.

    Daily Exadata health check (run from the primary DB node)

        #!/bin/bash
        # DAILY EXADATA STORAGE CELL HEALTH CHECK
        # Run from primary DB node as oracle or applmgr
        # Requires SSH key-based authentication to all cells

        CELL_GROUP=/opt/oracle.SupportTools/onecommand/cell_group

        echo "=============================================="
        echo " EXADATA DAILY HEALTH CHECK - $(date)"
        echo "=============================================="
        echo ""

        echo "--- 1. CELL DAEMON STATUS ---"
        # celld reports rsStatus, msStatus and cellsrvStatus; match them case-insensitively
        dcli -g "$CELL_GROUP" "service celld status | grep -i 'status'"
        echo ""

        echo "--- 2. CELL SOFTWARE VERSION ---"
        dcli -g "$CELL_GROUP" \
            "cellcli -e LIST CELL ATTRIBUTES name, releaseVersion"
        echo ""

        echo "--- 3. CRITICAL AND WARNING ALERTS ---"
        dcli -g "$CELL_GROUP" \
            "cellcli -e LIST ALERTHISTORY WHERE severity LIKE '[warning|critical]'"
        echo ""

        echo "--- 4. PHYSICAL DISKS NOT IN NORMAL STATUS ---"
        dcli -g "$CELL_GROUP" \
            "cellcli -e LIST PHYSICALDISK WHERE status != 'normal'"
        echo ""

        echo "--- 5. GRIDDISKS NOT IN ACTIVE STATUS ---"
        dcli -g "$CELL_GROUP" \
            "cellcli -e LIST GRIDDISK WHERE status != 'active'"
        echo ""

        echo "--- 6. CELLDISKS NOT IN NORMAL STATUS ---"
        dcli -g "$CELL_GROUP" \
            "cellcli -e LIST CELLDISK WHERE status != 'normal'"
        echo ""

        echo "--- 7. CELL CPU UTILISATION ---"
        dcli -g "$CELL_GROUP" \
            "cellcli -e LIST METRICCURRENT CL_CPUT"
        echo ""

        echo "--- 8. FLASH CACHE SPACE USED (MB) ---"
        dcli -g "$CELL_GROUP" \
            "cellcli -e LIST METRICCURRENT FC_BY_USED"
        echo ""

        echo "=============================================="
        echo " HEALTH CHECK COMPLETE"
        echo "=============================================="
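
    Sections 4 to 6 of the script are "exceptions only" checks: empty output means healthy. If the script is run unattended, that property can be turned into an OK/ALERT summary and an issue count. The sketch below illustrates the pattern with simulated output; the helper name check_empty and the sample row are assumptions, and in real use the second argument would be the captured output of the corresponding dcli command.

```shell
#!/usr/bin/env bash
# Sketch: report OK/ALERT for checks where empty output means "healthy".

issues=0

check_empty() {
    # $1 = section title, $2 = captured command output
    local title=$1 output=$2
    if [ -z "$output" ]; then
        echo "OK    $title"
    else
        echo "ALERT $title"
        printf '%s\n' "$output" | sed 's/^/      /'   # indent the offending rows
        issues=$((issues + 1))
    fi
}

# Simulated outputs. In real use, capture the dcli command instead, for example:
#   check_empty "Physical disks" "$(dcli -g "$CELL_GROUP" "cellcli -e LIST PHYSICALDISK WHERE status != 'normal'")"
check_empty "Physical disks not normal" "cell03: 20:2    K9WXYZ    failed"
check_empty "Griddisks not active" ""
check_empty "Celldisks not normal" ""

echo "Sections with issues: $issues"
```

    A cron wrapper could then exit non-zero or send mail whenever the issue count is above zero, so a quiet morning produces no noise.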

    Complete cellcli Command Reference

    Command | What It Does | Access Required
    LIST CELL | Cell status overview | cellmonitor
    LIST CELL DETAIL | Full cell info including hardware model, software version, network | cellmonitor
    LIST PHYSICALDISK | All physical disk status | cellmonitor
    LIST PHYSICALDISK WHERE status != 'normal' | Disks with problems only | cellmonitor
    LIST CELLDISK | All celldisk status and size | cellmonitor
    LIST CELLDISK WHERE status != 'normal' | Celldisks with problems | cellmonitor
    LIST GRIDDISK | All griddisk status and ASM mapping | cellmonitor
    LIST GRIDDISK WHERE status != 'active' | Griddisks not in active state | cellmonitor
    LIST ALERTHISTORY | All cell alerts | cellmonitor
    LIST ALERTHISTORY WHERE severity LIKE '[warning|critical]' | Warning and critical alerts only | cellmonitor
    LIST FLASHCACHE DETAIL | Flash cache status and configuration | cellmonitor
    LIST METRICCURRENT CL_CPUT | Cell CPU utilisation (%) | cellmonitor
    LIST METRICCURRENT FC_BY_USED | Flash cache space used (MB) | cellmonitor
    LIST METRICCURRENT CD_IO_RQ_R_LG | Cumulative large read requests per celldisk (the I/O pattern typical of Smart Scan) | cellmonitor
    LIST IORMPLAN | I/O Resource Management plan | cellmonitor
    ALTER GRIDDISK ALL INACTIVE | Take all griddisks offline for maintenance | celladmin
    ALTER GRIDDISK ALL ACTIVE | Bring all griddisks back online | celladmin
    ALTER ALERTHISTORY <id> examinedBy = 'name' | Acknowledge an alert | celladmin
    ALTER CELL RESTART SERVICES cellsrv | Restart the cellsrv daemon | celladmin
    ALTER FLASHCACHE ALL FLUSH | Clear the flash cache | celladmin
    CREATE GRIDDISK | Create a new griddisk on a celldisk | celladmin
    DROP GRIDDISK | Remove a griddisk | celladmin

    Summary

    • Exadata storage has three object levels: PHYSICALDISK (hardware) → CELLDISK (cell software layer) → GRIDDISK (presented to ASM). Understanding this chain is essential for hardware event diagnosis.
    • cellcli runs on storage cells only — SSH to each cell individually. Use cellmonitor for read-only monitoring, celladmin for configuration changes.
    • The most critical daily check is LIST PHYSICALDISK WHERE status != 'normal' — a warning - predictive failure status means the disk must be replaced proactively before it fails completely.
    • dcli runs any cellcli command across all cells simultaneously using a group file — it is the only practical way to check all cells every day.
    • The cell alert system records all significant events in ALERTHISTORY. Check for unacknowledged warning and critical alerts daily using the dcli command in the health check script.
    • The Exadata alert log on each cell (/opt/oracle/cell/log/diag/asm/cell/) provides detailed event history for deep troubleshooting.
    • Before any planned cell maintenance, always take griddisks INACTIVE first and confirm that ASM has taken them offline cleanly. Never restart cellsrv with griddisks in active state during database I/O.