Cluster integrity and cluster membership are governed by ocssd (the Oracle Cluster Synchronization Services daemon), which monitors the nodes over two communication channels:
– Private interconnect, also known as the network heartbeat
– Voting-disk-based communication, also known as the disk heartbeat
The dependency chain of processes and agents is: init ==> ohasd ==> cssdagent ==> ora.ocssd.bin
If cssdagent finds that ocssd is down, it reboots the node to protect data integrity.
But why would ocssd go down? The chart below tries to explain how the ocssd process depends on its two communication channels:
– The network heartbeat, which is carried over the private interconnect
– The disk heartbeat, which is carried over the voting disk
Why should nodes be evicted?
Evicting (fencing) nodes is a preventive measure (it’s a good thing)!
Nodes are evicted to prevent consequences of a split brain:
– Shared data must not be written by independently operating nodes
– The easiest way to prevent this is to forcibly remove a node from the cluster
Each node in the cluster is “pinged” every second
Nodes must respond in css_misscount time (defaults to 30 secs.)
bash-3.2$ ./crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
– Reducing the css_misscount time is generally not supported
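Since lowering misscount is generally not supported, it can be handy to check the configured value against the 30-second default. The following is only a minimal sketch: it parses the sample crsctl output quoted above rather than calling a live cluster, and the helper name parse_misscount is a hypothetical one made up for this illustration.

```shell
# Sample text copied from the crsctl call shown above. On a live cluster this
# would instead be: output=$(./crsctl get css misscount)
output='CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.'

# Hypothetical helper: print the number that follows the word "misscount".
# Prints nothing if the text does not contain such a number.
parse_misscount() {
  printf '%s\n' "$1" | sed -n 's/.*misscount \([0-9][0-9]*\).*/\1/p'
}

misscount=$(parse_misscount "$output")
default_misscount=30   # default stated in the text

if [ -z "$misscount" ]; then
  status="could not parse misscount"
elif [ "$misscount" -lt "$default_misscount" ]; then
  status="misscount is ${misscount}s, below the default of ${default_misscount}s"
else
  status="misscount is ${misscount}s"
fi
echo "$status"
```

With the sample output this reports a misscount of 30 seconds, i.e. the default.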
Network heartbeat failures will lead to node evictions
[date / time] [CSSD]
clssnmPollingThread: node mynodename (5) at 75% heartbeat fatal, removal in 6.7 sec
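To relate the warning to misscount: 75% of the 30-second default is 22.5 seconds, and the sample message reports 6.7 seconds left, so it was presumably written shortly after the 75% mark was crossed. The sketch below does that arithmetic on the sample line. It assumes the "removal in" figure counts down from misscount since the last heartbeat was received; that is an interpretation of the message, not something taken from Oracle documentation.

```shell
# Log line copied from the sample above; misscount is the 30-second default.
line='clssnmPollingThread: node mynodename (5) at 75% heartbeat fatal, removal in 6.7 sec'
misscount=30

# Pull the "removal in N sec" figure out of the message.
removal_in=$(printf '%s\n' "$line" | sed -n 's/.*removal in \([0-9.][0-9.]*\) sec.*/\1/p')

# Assumption: the countdown started at misscount when the last heartbeat arrived,
# so the time the node has been silent is misscount minus the remaining time.
summary=$(LC_ALL=C awk -v m="$misscount" -v r="$removal_in" 'BEGIN {
  elapsed = m - r
  printf "silent for %.1f s (%.0f%% of misscount), eviction in %.1f s", elapsed, 100 * elapsed / m, r
}')
echo "$summary"
```

For the sample line this gives about 23.3 seconds of silence, roughly 78% of misscount.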