Hi there,
I'm having an issue with one of our clusters: when I tried to fail over a service group, the failover failed with the errors below:
2013/10/30 22:22:19 VCS ERROR V-16-2-13006 (XXXXX) Resource(vdaappdg_dg): clean procedure did not complete within the expected time.
2013/10/30 22:22:30 VCS ERROR V-16-2-13006 (XXXXX) Resource(vdrapp_vol): clean procedure did not complete within the expected time.
2013/10/30 22:22:32 VCS ERROR V-16-2-13027 (XXXXX) Resource(mobius_dg) - monitor procedure did not complete within the expected time.
2013/10/30 22:23:30 VCS ERROR V-16-2-13077 (XXXXX) Agent is unable to offline resource(vdrapp_vol). Administrative intervention may be required.
2013/10/30 22:26:48 VCS ERROR V-16-2-13077 (XXXXX) Agent is unable to offline resource(vdaappdg_dg). Administrative intervention may be required.
2013/10/30 22:32:35 VCS ERROR V-16-2-13063 (XXXXX) Agent is calling clean for resource(mobius_dg) because offline did not complete within the expected time.
2013/10/30 22:33:22 VCS INFO V-16-2-13068 (XXXXX) Resource(mobius_dg) - clean completed successfully.
2013/10/30 22:37:24 VCS ERROR V-16-2-13027 (XXXXX) Resource(mobius_dg) - monitor procedure did not complete within the expected time.
2013/10/30 22:37:24 VCS ERROR V-16-2-13077 (XXXXX) Agent is unable to offline resource(mobius_dg). Administrative intervention may be required.
Background -
This cluster has a large number of disks holding about 54 TB of data:
DG-NAME    #VD   TOTAL (GB)   USED (GB)   FREE (GB)
mobius     176   14031.25     14000.00      31.25
vdaappdg   534   42596.69     40628.76    1967.93
When the VCS failover runs, it simply hangs and reports "Agent is unable to offline resource(vdrapp_vol). Administrative intervention may be required."
As far as I understand, this message is displayed when an offline procedure does not complete in time. An offline procedure can time out when the system is overloaded, is busy processing a system call, or is handling a large number of resources.
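To see how long each resource was stuck before VCS gave up on it, the engine log entries above can be grouped per resource. The following is a minimal sketch, assuming every log line follows exactly the format quoted above (date, time, "VCS", severity, message ID, host in parentheses, message text); the function name `summarize` is hypothetical and not part of any VCS tool.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line format (taken from the log excerpt above):
# "YYYY/MM/DD HH:MM:SS VCS <SEVERITY> <V-x-x-x> (<host>) <message text>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS (?P<sev>\w+) "
    r"(?P<msgid>V-\d+-\d+-\d+) \((?P<host>[^)]*)\) (?P<text>.*)$"
)
# Resource names appear as "Resource(name)" or "resource(name)" in the text.
RES_RE = re.compile(r"resource\((?P<res>[^)]+)\)", re.IGNORECASE)


def summarize(lines):
    """Hypothetical helper: map each resource to (seconds between its first
    and last log entry, list of message IDs in time order)."""
    events = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed format
        r = RES_RE.search(m.group("text"))
        if not r:
            continue  # skip entries that do not name a resource
        ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
        events[r.group("res")].append((ts, m.group("msgid")))

    summary = {}
    for res, evs in events.items():
        evs.sort()  # order by timestamp (message ID breaks ties)
        span = (evs[-1][0] - evs[0][0]).total_seconds()
        summary[res] = (span, [msgid for _, msgid in evs])
    return summary
```

Calling `summarize(open(path))` on a copy of the engine log returns one entry per resource; for the excerpt above it would show mobius_dg going through monitor, offline, and clean timeouts over roughly 15 minutes before the final V-16-2-13077 message.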
I'd like to know whether anyone has been through a similar situation, and whether anyone can advise if this is happening because of the large amount of storage.
I'm looking forward to expert advice, and ideally a solution, so that failover works seamlessly.
Thank you,
Nilesh