Sunday, December 29, 2019

Shut down the Exadata database nodes and storage cells in a rolling fashion


How to shut down the Exadata database nodes and storage cells in a rolling fashion so that certain hardware tasks can be performed


The goal of this document is to provide a way to shut down the Exadata database nodes and storage cells in a rolling fashion so that certain hardware tasks can be performed on each server. For example, replacing the RAID controller battery requires downtime for each server. This document describes how to perform such tasks one server at a time, without taking down the database.

FIX
This note separates the tasks into two different sections.
  1. Instructions on shutting down and restarting each database node in a rolling fashion.
  2. Instructions on shutting down and restarting each storage cell in a rolling fashion.

SECTION 1: Shutting down and restarting each Exadata database node in a rolling fashion

As root, perform the following on each database server, one at a time:

1) Check to see if AUTOSTART is enabled:

# $GRID_HOME/bin/crsctl config crs
2) Disable autostart of the Grid Infrastructure on the database server if the previous step indicated that it is currently enabled:
# $GRID_HOME/bin/crsctl disable crs
  •  Note: This step is optional, but it is needed during maintenance such as firmware patching, which reboots the compute node several times.
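
Steps 1 and 2 can be combined so that autostart is disabled only when it is currently enabled. The following is a minimal sketch, not part of the original note: the function name autostart_state is hypothetical, and it assumes that the message printed by "crsctl config crs" contains the phrase "autostart is enabled" or "autostart is disabled".

```shell
#!/bin/bash
# Sketch only: classify the text printed by "crsctl config crs".
# Assumption: the message contains "autostart is enabled" or
# "autostart is disabled"; anything else is reported as "unknown".
autostart_state() {
    case "$1" in
        *"autostart is enabled"*)  echo enabled ;;
        *"autostart is disabled"*) echo disabled ;;
        *)                         echo unknown ;;
    esac
}

# Possible use as root on the node being serviced (not executed here):
#   state=$(autostart_state "$("$GRID_HOME/bin/crsctl" config crs 2>&1)")
#   [ "$state" = enabled ] && "$GRID_HOME/bin/crsctl" disable crs
```

Recording the original state this way also tells you later whether autostart needs to be re-enabled at all.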

3) Stop the Grid Infrastructure stack on the first database server locally:
# $GRID_HOME/bin/crsctl stop crs
4) Verify that the Grid Infrastructure stack has shut down successfully on the database server.

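The following command should show no output if the GI stack has shut down (the trailing "grep -v grep" filters out the grep process itself, which would otherwise always match):
# ps -ef | grep diskmon | grep -v grep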
5) Confirm that the clusterware resources are still up and running on the other nodes:
# cd /opt/oracle.SupportTools/onecommand
# dcli -g dbs_group -l root $GRID_HOME/bin/crsctl check crs
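
The dcli output in step 5 can be summarised instead of read by eye. The sketch below is illustrative only: the function name nodes_not_fully_online is hypothetical, and it assumes that dcli prefixes every line with "<host>: " and that a healthy node prints four "is online" messages from "crsctl check crs". Adjust the expected count if your version prints a different number.

```shell
#!/bin/bash
# Sketch only: read "dcli ... crsctl check crs" output on stdin and print
# every host that reports fewer than the expected number of "is online" lines.
# Assumptions: each line starts with "<host>: ", and a healthy node prints
# four "is online" messages (override the count with the first argument).
nodes_not_fully_online() {
    awk -F': ' -v want="${1:-4}" '
        NF == 0      { next }              # skip blank lines
                     { seen[$1] += 0 }     # remember every host that answered
        /is online/  { seen[$1]++ }        # count healthy services per host
        END { for (h in seen) if (seen[h] < want) print h }
    ' | sort
}

# Possible use from /opt/oracle.SupportTools/onecommand (not executed here):
#   dcli -g dbs_group -l root $GRID_HOME/bin/crsctl check crs | nodes_not_fully_online
```

At this point in the procedure, the only host listed should be the node whose stack was just stopped.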
7) Before shutting down the DB node, check /etc/fstab for any NFS mounts that should be unmounted.
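
One way to list the NFS entries is to filter /etc/fstab on its filesystem-type column. This is a minimal sketch with a hypothetical function name, nfs_mounts. It only lists mount points, and deciding which of them to unmount remains up to the operator.

```shell
#!/bin/bash
# Sketch only: print the mount point of every NFS entry in an fstab-style file.
# Columns in fstab: device, mount point, filesystem type, options, dump, pass.
# Matches types beginning with "nfs" (for example nfs and nfs4) and skips comments.
nfs_mounts() {
    awk '$1 !~ /^#/ && $3 ~ /^nfs/ { print $2 }' "${1:-/etc/fstab}"
}

# Possible use as root (not executed here):
#   nfs_mounts                 # review the list first
#   umount /some/mount/point   # then unmount the ones that apply
```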
8) Shut the DB node down so you can perform maintenance.
# poweroff
9) Perform the scheduled maintenance on the DB node.
You can proceed now with hardware replacement or maintenance.

10) After the hardware has been replaced or the maintenance performed and you're ready to bring the server back up, power on the DB node by using the power button on the front panel of the database server.

11) Start the Grid Infrastructure stack on the database server once it comes up:
# $GRID_HOME/bin/crsctl start crs
12) Wait until the Grid Infrastructure stack has successfully started. To check the status of the Grid Infrastructure stack, run the following command and verify that the "ora.asm" instance is started. Note that the command below will continue to report that it is unable to communicate with the Grid Infrastructure software for several minutes after issuing the "crsctl start crs" command above:
# $GRID_HOME/bin/crsctl status resource -t
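
Because the status command keeps failing for several minutes, the wait in step 12 can be turned into a polling loop. The sketch below is an illustration under stated assumptions, not the note's own method: the function name asm_online_on is hypothetical, and it assumes the tabular output puts each resource name on its own unindented line, followed by indented rows of the form "TARGET STATE SERVER ...".

```shell
#!/bin/bash
# Sketch only: read "crsctl status resource -t" output on stdin and succeed
# only if ora.asm is ONLINE/ONLINE on the given server.
# Assumption: resource names are unindented lines, and the rows beneath them
# are indented "TARGET STATE SERVER STATE_DETAILS" entries.
asm_online_on() {
    awk -v host="$1" '
        /^[^ \t-]/ { current = $1; next }   # unindented line: header or resource name
        current == "ora.asm" && $1 == "ONLINE" && $2 == "ONLINE" && $3 == host { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Possible use as root on the restarted node (not executed here); assumes the
# cluster knows the server by its short host name:
#   until "$GRID_HOME/bin/crsctl" status resource -t 2>/dev/null \
#         | asm_online_on "$(hostname -s)"; do
#       sleep 30
#   done
```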
13) Re-enable autostart for the Grid Infrastructure if you disabled it in step 2 of Section 1:
# $GRID_HOME/bin/crsctl enable crs
14) Confirm that the clusterware resources are up and running on all nodes:

# cd /opt/oracle.SupportTools/onecommand
# dcli -g dbs_group -l root $GRID_HOME/bin/crsctl check crs
After completing the steps above, repeat the steps in Section 1 on the remaining nodes (one at a time) until the hardware has been replaced on each DB node.

SECTION 2: Shutting down and restarting the storage cells in a rolling fashion
1) Refer to the instructions listed in the note below:
Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)

Reference: Doc ID 1539451.1
