
VCS ERROR V-16-1-13027

I need a solution

Hello Forum,

Recently this error message showed up in my system log:

Sep 11 08:17:37 dbsp1 Had[2008]: [ID 702911 daemon.notice] VCS ERROR V-16-1-13027 (dbsp1) Resource(MNIC) - monitor procedure did not complete

I am guessing that this is because the monitor procedure failed to complete within the time window,
probably because the CPU was busy. Someone mentioned it could be due to swap space,
but I checked and swap is only at about 5% utilization.

Can anyone guide me as to what I need to look for?
The cluster shows no faults or errors.
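
For reference, how long the monitor is allowed to run, and how many timeouts are tolerated before a fault, are controlled by the type-level MonitorTimeout and FaultOnMonitorTimeouts attributes. A sketch of checking and, if needed, raising them (assuming the MNIC resource is of type MultiNICB; 120 is only an example value):

hatype -value MultiNICB MonitorTimeout
hatype -value MultiNICB FaultOnMonitorTimeouts
haconf -makerw
hatype -modify MultiNICB MonitorTimeout 120
haconf -dump -makero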

Thank you very much in advance.

 


vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit

I need a solution

Hi,

Every day at around the same time, we are getting the error below in the syslog.

Kindly, can anyone suggest how to resolve this?

 

Aug 21 21:50:09 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 21 21:50:09 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 22 02:10:05 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 22 02:10:05 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 22 02:35:09 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 22 02:35:09 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 23 14:14:11 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 23 14:14:11 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 23 16:28:11 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 23 16:28:11 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 23 17:16:11 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 23 17:16:12 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 24 09:03:12 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 24 09:03:12 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 24 19:21:13 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 24 20:28:13 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 24 22:31:22 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 25 00:25:13 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 25 00:25:13 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 25 00:58:13 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 25 00:58:13 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 25 17:21:14 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 25 17:21:15 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 26 03:06:15 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 26 03:06:15 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 26 22:31:16 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 01:40:18 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 01:40:18 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 07:30:15 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 07:30:15 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 08:35:19 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 08:35:19 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 15:08:19 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 15:08:19 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 22:40:19 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 28 22:40:19 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 29 15:52:20 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 29 15:52:20 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 29 17:45:20 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 30 01:30:21 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
Aug 30 01:30:21 l165ux12 vmunix: LLT INFO V-14-1-10063 llt_send_port: no memory to xmit
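
When these messages appear, it can help to confirm LLT link state and per-port traffic around the times in question. A minimal check using the standard LLT/GAB tools (paths may differ by platform):

/sbin/lltstat -nvv | more
/sbin/lltstat -p
/sbin/gabconfig -a

If a link shows DOWN, or the messages coincide with a scheduled job that floods the interconnect, that points at where to dig next.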
 

half-duplex LLT

I need a solution

Hey,

 

Should a private LLT interconnect interface that is set to run at half-duplex be any sort of concern?
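
For what it's worth, half-duplex on an LLT link usually indicates a speed/duplex negotiation mismatch with the switch port, which can cause collisions and late heartbeats, so it is worth checking. For example, on Solaris 10 the negotiated speed and duplex can be checked with dladm, and on Linux with ethtool (interface names are placeholders):

dladm show-dev
ethtool eth2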

 

Thanks,

Wojtek

Mount resource was not able to unmount

I need a solution

Environment

SFHA/DR

iSCSI SAN

Primary Site = two nodes

DR Site = one node

SFHA version = 5.0 MP4 RP1

OS = RHEL 5

 

DiskGroup Agent logs

2013/09/10 11:56:48 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4155296656)
2013/09/10 11:56:49 VCS ERROR V-16-2-13027 Thread(4152359824) Resource(DG) - monitor procedure did not complete within the expected time.
2013/09/10 11:58:48 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4152359824)
2013/09/10 12:00:48 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4155296656)
2013/09/10 12:02:48 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4152359824)
2013/09/10 12:02:49 VCS ERROR V-16-2-13210 Thread(4155296656) Agent is calling clean for resource(DG) because 4 successive invocations of the monitor procedure did not complete within the expected time.
2013/09/10 12:02:50 VCS ERROR V-16-2-13068 Thread(4155296656) Resource(DG) - clean completed successfully.
2013/09/10 12:03:50 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4155296656)
2013/09/10 12:03:51 VCS ERROR V-16-2-13077 Thread(4152359824) Agent is unable to offline resource(DG). Administrative intervention may be required.
2013/09/10 12:05:50 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4152359824)
2013/09/10 12:07:50 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4155296656)
2013/09/10 12:09:50 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4152359824)
2013/09/10 12:11:50 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4155296656)
2013/09/10 12:13:50 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4152359824)
2013/09/10 12:15:50 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4155296656)
2013/09/10 12:17:51 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4152359824)
2013/09/10 12:19:51 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4155296656)
2013/09/10 12:21:51 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4152359824)
2013/09/10 12:23:51 VCS WARNING V-16-2-13139 Thread(4156349328) Canceling thread (4155296656)

ERROR V-16-2-13210

https://sort.symantec.com/ecls/umi/V-16-2-13210

===========================================================================

Mount Agent logs

2013/09/10 12:12:53 VCS ERROR V-16-2-13064 Thread(4154968976) Agent is calling clean for resource(MOUNT1) because the resource is up even after offline completed.
2013/09/10 12:12:54 VCS ERROR V-16-2-13069 Thread(4154968976) Resource(MOUNT1) - clean failed.
2013/09/10 12:13:54 VCS ERROR V-16-2-13077 Thread(4154968976) Agent is unable to offline resource(MOUNT1). Administrative intervention may be required.
2013/09/10 12:14:00 VCS WARNING V-16-2-13102 Thread(4153916304) Resource (MOUNT1) received unexpected event info in state GoingOfflineWaiting
2013/09/10 12:15:28 VCS WARNING V-16-2-13102 Thread(4153916304) Resource (MOUNT1) received unexpected event info in state GoingOfflineWaiting

WARNING V-16-2-13102

https://sort.symantec.com/ecls/umi/V-16-2-13102

===========================================================================

My investigation

Extract from the above TN: "Links 0 and 1 are both connected to the same switch, resulting in a cross-link scenario". I googled that extract and found the next TN, http://www.symantec.com/business/support/index?page=content&id=HOWTO79920, whose description says: "It seems that you're using only one network switch between the cluster nodes. Symantec recommends configuring two independent networks between the cluster nodes, with one network switch for each network, for failure protection."

So the problem seems to be this: links 0 and 1 are both connected to the same switch, resulting in a cross-link scenario.

===========================================================================

===========================================================================

The above is one case. I also see a problem with the disks themselves, which may leave the disks unresponsive and in turn cause the Mount resource problems. See the attached RHEL messages log:
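
When a Mount resource cannot be offlined like this, a common manual intervention is to find what is still holding the filesystem and clear the fault once the unmount really completes. A hedged sketch for RHEL (mount point, resource and node names are placeholders):

fuser -mv /mount_point           # list processes holding the filesystem
umount /mount_point              # or umount -l as a last resort
hares -clear MOUNT1 -sys node1   # clear the fault once the unmount succeeds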

 

Are VCS commands executed on all nodes

I need a solution

Moved from post "vcs upgrade" as a separate issue:

When a resource is added from one VCS node with:

haconf -makerw
hagrp -add groupw
hares -add diskgroup
haconf -dump -makero

then all the above commands are executed on all VCS nodes, right? had on every VCS node, via GAB, knows that it should now execute those commands. And when a new node joins a VCS cluster, all the information regarding nodes, resource groups and so on is loaded into memory from the remote VCS nodes, right?

Thanks so much, Marius
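
For reference, that is essentially how it works: each ha* command talks to the local had, which replicates the configuration change to the had on every running node over GAB, and a joining node receives a snapshot of the in-memory configuration from a running node (the REMOTE_BUILD state). A quick way to see the replication, as a sketch with placeholder names:

# on node 1
haconf -makerw
hagrp -add testgrp
hagrp -modify testgrp SystemList node1 0 node2 1
haconf -dump -makero

# on node 2, the new group is already visible
hagrp -list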


Errors "V-2-96 vx_setfsflags file system full fsck flag set" and "V-2-17 vx_iread_1 file system inode marked bad incore"

I need a solution

Hi All ,

I am getting the above messages from one of the nodes in VCS. We are running RHEL 5.8.

The messages refer to a particular filesystem that is mounted in rw mode, and no issues have been reported apart from these messages.

I found an article, http://www.symantec.com/docs/TECH135425, but it states a solution for Solaris 10 x86.

Is there any tech doc that refers to RHEL as well? Please suggest. Also, is this error critical?
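
For reference, the usual remedy for the "full fsck flag set" condition is the same on Linux as on Solaris: stop the applications using the filesystem, unmount it, and run a full fsck. A sketch with placeholder disk group, volume and mount point names:

umount /mount_point
fsck -t vxfs -o full /dev/vx/rdsk/mydg/myvol
mount -t vxfs /dev/vx/dsk/mydg/myvol /mount_point

The messages do indicate metadata damage (an inode marked bad), so while the filesystem may keep working, the flag should not be ignored indefinitely.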

 

Regards ,

Sourav

SCL for VCS 5 on AIX

I need a solution

Hello

I am preparing an OS upgrade from AIX 5.3 TL7 to 5.3 TL11 or TL12 while running VCS 5.0.30.3.

I am trying to find the official Symantec SCL (software compatibility list), without success. Can anybody help me find this document, so I can confirm which minimum version is supported on AIX 5.3 TL12?

Thanks in advance

Best regards

VCS 6.0.3 on Solaris 11 with Sybase ASE 15.X

I need a solution

I am planning to set up a 2-node cluster (Oracle T5-2 servers running Solaris 11) using VCS 6.0.3, with the database being Sybase ASE 15.7.

I am trying to get hold of the VCS Sybase Agent but could not find it.
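
For reference, the Sybase agent is shipped separately from the base VCS packages, as part of the high availability agents available from SORT. Once it is installed, the resource definition in main.cf looks roughly like this sketch; all values are placeholders based on the agent's documented attributes:

include "SybaseTypes.cf"

Sybase ase1 (
    Server = SYB_SERVER
    Owner = sybase
    Home = "/opt/sybase"
    Version = "15.7"
    SA = sa
    )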

Can you help please?

Thank you

Danilo

 


Nic resource going down frequently

I need a solution

We are experiencing the faults mentioned below every now and then, and they affect our backup.

The setup is as follows: we have a BkupLan service group containing a NIC resource (bkup_nic) and a Phantom resource.

Two service groups, Ossfs and Sybase1, have Proxy resources monitoring this NIC resource,
and both are configured with IP resources, ossbak_ip and syb1bak_ip.

The issue is that the NIC resource faults with the message below, and in turn the Proxy and IP resources fault.
Then the NIC comes back, and the Proxy and IP resources come back too.
The configuration is included below as well.
Please advise why it goes offline/online so frequently.

2013/10/24 22:32:03 VCS WARNING V-16-10001-7506 (ossadm2) NIC:bkup_nic:monitor:Resource is offline. No Network Host could be reached

2013/10/24 22:32:03 VCS INFO V-16-2-13716 (ossadm2) Resource(bkup_nic): Output of the completed operation (monitor)
==============================================
Broken Pipe
==============================================

2013/10/24 22:32:03 VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:03 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=bkup_nic, arg2=ONLINE
2013/10/24 22:32:03 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=bkup_nic2013/10/24 22:32:03 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 bkup_nic ONLINE  successfully
2013/10/24 22:32:20 VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:20 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ossbak_ip (Owner: Unspecified, Group: Ossfs) on System ossadm2
2013/10/24 22:32:20 VCS ERROR V-16-1-10303 Resource syb1bak_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:20 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=ossbak_p1, arg2=ONLINE
2013/10/24 22:32:20 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=ossbak_p1
2013/10/24 22:32:20 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=syb1bak_p1, arg2=ONLINE
2013/10/24 22:32:20 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=syb1bak_p1
2013/10/24 22:32:20 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 ossbak_p1 ONLINE  successfully
2013/10/24 22:32:20 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 syb1bak_p1 ONLINE  successfully
2013/10/24 22:32:22 VCS INFO V-16-1-10305 Resource ossbak_ip (Owner: Unspecified, Group: Ossfs) is offline on ossadm2 (VCS initiated)
2013/10/24 22:33:04 VCS INFO V-16-1-10299 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is online on ossadm2 (Not initiated by VCS)
2013/10/24 22:33:04 VCS NOTICE V-16-1-10447 Group BkupLan is online on system ossadm2
2013/10/24 22:33:20 VCS INFO V-16-1-10299 Resource syb1bak_p1 (Owner: Unspecified, Group: Sybase1) is online on ossadm2 (Not initiated by VCS)
2013/10/24 22:33:20 VCS INFO V-16-1-10299 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is online on ossadm2 (Not initiated by VCS)

====================================================================================

ossadm2{root} # hastatus -summ

-- SYSTEM STATE
-- System               State                Frozen

A  ossadm1              RUNNING              0
A  ossadm2              RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  BkupLan         ossadm1              Y          N               ONLINE
B  BkupLan         ossadm2              Y          N               ONLINE
B  DDCMon          ossadm1              Y          N               ONLINE
B  DDCMon          ossadm2              Y          N               ONLINE
B  Oss             ossadm1              Y          N               OFFLINE
B  Oss             ossadm2              Y          N               ONLINE
B  Ossfs           ossadm1              Y          N               OFFLINE
B  Ossfs           ossadm2              Y          N               ONLINE
B  PrivLan         ossadm1              Y          N               ONLINE
B  PrivLan         ossadm2              Y          N               ONLINE
B  PubLan          ossadm1              Y          N               ONLINE
B  PubLan          ossadm2              Y          N               ONLINE
B  StorLan         ossadm1              Y          N               ONLINE
B  StorLan         ossadm2              Y          N               ONLINE
B  Sybase1         ossadm1              Y          N               ONLINE
B  Sybase1         ossadm2              Y          N               OFFLINE
ossadm2{root} # hagrp -resources BkupLan
bkup_nic
bkup_p
ossadm2{root} # hagrp -resources bkup_nic
VCS WARNING V-16-1-12130 Group bkup_nic does not exist
ossadm2{root} # hagrp -resources BkupLan
bkup_nic
bkup_p
ossadm2{root} # hares -display bkup_nic
#Resource    Attribute             System     Value
bkup_nic     Group                 global     BkupLan
bkup_nic     Type                  global     NIC
bkup_nic     AutoStart             global     1
bkup_nic     Critical              global     0
bkup_nic     Enabled               global     1
bkup_nic     LastOnline            global     ossadm2
bkup_nic     MonitorOnly           global     0
bkup_nic     ResourceOwner         global
bkup_nic     TriggerEvent          global     0
bkup_nic     ArgListValues         ossadm1    Device    1       oce12   PingOptimize    1       1       NetworkHosts    1       127.0.0.1       Protocol   1       IPv4    NetworkType     1       ether   ExclusiveIPZone 1       0
bkup_nic     ArgListValues         ossadm2    Device    1       oce12   PingOptimize    1       1       NetworkHosts    1       127.0.0.1       Protocol   1       IPv4    NetworkType     1       ether   ExclusiveIPZone 1       0
bkup_nic     ConfidenceLevel       ossadm1    100
bkup_nic     ConfidenceLevel       ossadm2    100
bkup_nic     ConfidenceMsg         ossadm1
bkup_nic     ConfidenceMsg         ossadm2
bkup_nic     Flags                 ossadm1
bkup_nic     Flags                 ossadm2
bkup_nic     IState                ossadm1    not waiting
bkup_nic     IState                ossadm2    not waiting
bkup_nic     MonitorMethod         ossadm1    Traditional
bkup_nic     MonitorMethod         ossadm2    Traditional
bkup_nic     Probed                ossadm1    1
bkup_nic     Probed                ossadm2    1
bkup_nic     Start                 ossadm1    0
bkup_nic     Start                 ossadm2    0
bkup_nic     State                 ossadm1    ONLINE
bkup_nic     State                 ossadm2    ONLINE
bkup_nic     ComputeStats          global     0
bkup_nic     ExclusiveIPZone       global     0
bkup_nic     NetworkHosts          global     127.0.0.1
bkup_nic     NetworkType           global     ether
bkup_nic     PingOptimize          global     1
bkup_nic     Protocol              global     IPv4
bkup_nic     TriggerResStateChange global     0
bkup_nic     ContainerInfo         ossadm1    Type              Name            Enabled
bkup_nic     ContainerInfo         ossadm2    Type              Name            Enabled
bkup_nic     Device                ossadm1    oce12
bkup_nic     Device                ossadm2    oce12
bkup_nic     MonitorTimeStats      ossadm1    Avg       0       TS
bkup_nic     MonitorTimeStats      ossadm2    Avg       0       TS
bkup_nic     ResourceInfo          ossadm1    State     Stale   Msg             TS
bkup_nic     ResourceInfo          ossadm2    State     Stale   Msg             TS
ossadm2{root} # hares -display bkup_p
#Resource    Attribute             System     Value
bkup_p       Group                 global     BkupLan
bkup_p       Type                  global     Phantom
bkup_p       AutoStart             global     1
bkup_p       Critical              global     1
bkup_p       Enabled               global     1
bkup_p       LastOnline            global     ossadm1
bkup_p       MonitorOnly           global     0
bkup_p       ResourceOwner         global
bkup_p       TriggerEvent          global     0
bkup_p       ArgListValues         ossadm1    ""
bkup_p       ArgListValues         ossadm2    ""
bkup_p       ConfidenceLevel       ossadm1    100
bkup_p       ConfidenceLevel       ossadm2    100
bkup_p       ConfidenceMsg         ossadm1
bkup_p       ConfidenceMsg         ossadm2
bkup_p       Flags                 ossadm1
bkup_p       Flags                 ossadm2
bkup_p       IState                ossadm1    not waiting
bkup_p       IState                ossadm2    not waiting
bkup_p       MonitorMethod         ossadm1    Traditional
bkup_p       MonitorMethod         ossadm2    Traditional
bkup_p       Probed                ossadm1    1
bkup_p       Probed                ossadm2    1
bkup_p       Start                 ossadm1    1
bkup_p       Start                 ossadm2    1
bkup_p       State                 ossadm1    ONLINE
bkup_p       State                 ossadm2    ONLINE
bkup_p       ComputeStats          global     0
bkup_p       TriggerResStateChange global     0
bkup_p       ContainerInfo         ossadm1    Type              Name            Enabled
bkup_p       ContainerInfo         ossadm2    Type              Name            Enabled
bkup_p       MonitorTimeStats      ossadm1    Avg       0       TS
bkup_p       MonitorTimeStats      ossadm2    Avg       0       TS
bkup_p       ResourceInfo          ossadm1    State     Stale   Msg             TS
bkup_p       ResourceInfo          ossadm2    State     Valid   Msg             TS
ossadm2{root} # hagrp -resources Sybase1
sybasedg
sybmaster_mount
syblog_mount
sybdata_mount
pmsyblog_mount
pmsybdata_mount
fmsyblog_mount
fmsybdata_mount
dbdumps_mount
syb1_ip
syb1_p1
syb1bak_ip
syb1bak_p1
masterdataservice
masterdataservice_BACKUP
stop_sybase
ossadm2{root} # hares -display syb1bak_p1
#Resource    Attribute             System     Value
syb1bak_p1   Group                 global     Sybase1
syb1bak_p1   Type                  global     Proxy
syb1bak_p1   AutoStart             global     1
syb1bak_p1   Critical              global     0
syb1bak_p1   Enabled               global     1
syb1bak_p1   LastOnline            global     ossadm2
syb1bak_p1   MonitorOnly           global     0
syb1bak_p1   ResourceOwner         global
syb1bak_p1   TriggerEvent          global     0
syb1bak_p1   ArgListValues         ossadm1    TargetResName     1       bkup_nic        TargetSysName   1       ""      TargetResName:Probed    1       1   TargetResName:State     1       2
syb1bak_p1   ArgListValues         ossadm2    TargetResName     1       bkup_nic        TargetSysName   1       ""      TargetResName:Probed    1       1   TargetResName:State     1       2
syb1bak_p1   ConfidenceLevel       ossadm1    0
syb1bak_p1   ConfidenceLevel       ossadm2    0
syb1bak_p1   ConfidenceMsg         ossadm1
syb1bak_p1   ConfidenceMsg         ossadm2
syb1bak_p1   Flags                 ossadm1
syb1bak_p1   Flags                 ossadm2
syb1bak_p1   IState                ossadm1    not waiting
syb1bak_p1   IState                ossadm2    not waiting
syb1bak_p1   MonitorMethod         ossadm1    Traditional
syb1bak_p1   MonitorMethod         ossadm2    Traditional
syb1bak_p1   Probed                ossadm1    1
syb1bak_p1   Probed                ossadm2    1
syb1bak_p1   Start                 ossadm1    0
syb1bak_p1   Start                 ossadm2    0
syb1bak_p1   State                 ossadm1    ONLINE
syb1bak_p1   State                 ossadm2    ONLINE
syb1bak_p1   ComputeStats          global     0
syb1bak_p1   ResourceInfo          global     State     Stale   Msg             TS
syb1bak_p1   TargetResName         global     bkup_nic
syb1bak_p1   TargetSysName         global
syb1bak_p1   TriggerResStateChange global     0
syb1bak_p1   ContainerInfo         ossadm1    Type              Name            Enabled
syb1bak_p1   ContainerInfo         ossadm2    Type              Name            Enabled
syb1bak_p1   MonitorTimeStats      ossadm1    Avg       0       TS
syb1bak_p1   MonitorTimeStats      ossadm2    Avg       0       TS
ossadm2{root} # hares -display syb1bak_ip
#Resource    Attribute             System     Value
syb1bak_ip   Group                 global     Sybase1
syb1bak_ip   Type                  global     IP
syb1bak_ip   AutoStart             global     1
syb1bak_ip   Critical              global     0
syb1bak_ip   Enabled               global     1
syb1bak_ip   LastOnline            global     ossadm1
syb1bak_ip   MonitorOnly           global     0
syb1bak_ip   ResourceOwner         global
syb1bak_ip   TriggerEvent          global     0
syb1bak_ip   ArgListValues         ossadm1    Device    1       oce12   Address 1       10.41.78.138    NetMask 1       255.255.255.224 Options 1       ""   ArpDelay        1       1       IfconfigTwice   1       1       RouteOptions    1       ""      PrefixLen       1       0       ExclusiveIPZone 1       0
syb1bak_ip   ArgListValues         ossadm2    Device    1       oce12   Address 1       10.41.78.138    NetMask 1       255.255.255.224 Options 1       ""   ArpDelay        1       1       IfconfigTwice   1       1       RouteOptions    1       ""      PrefixLen       1       0       ExclusiveIPZone 1       0
syb1bak_ip   ConfidenceLevel       ossadm1    100
syb1bak_ip   ConfidenceLevel       ossadm2    0
syb1bak_ip   ConfidenceMsg         ossadm1
syb1bak_ip   ConfidenceMsg         ossadm2
syb1bak_ip   Flags                 ossadm1
syb1bak_ip   Flags                 ossadm2
syb1bak_ip   IState                ossadm1    not waiting
syb1bak_ip   IState                ossadm2    not waiting
syb1bak_ip   MonitorMethod         ossadm1    Traditional
syb1bak_ip   MonitorMethod         ossadm2    Traditional
syb1bak_ip   Probed                ossadm1    1
syb1bak_ip   Probed                ossadm2    1
syb1bak_ip   Start                 ossadm1    1
syb1bak_ip   Start                 ossadm2    0
syb1bak_ip   State                 ossadm1    ONLINE
syb1bak_ip   State                 ossadm2    OFFLINE
syb1bak_ip   Address               global     10.41.78.138
syb1bak_ip   ArpDelay              global     1
syb1bak_ip   ComputeStats          global     0
syb1bak_ip   ExclusiveIPZone       global     0
syb1bak_ip   IfconfigTwice         global     1
syb1bak_ip   NetMask               global     255.255.255.224
syb1bak_ip   Options               global
syb1bak_ip   PrefixLen             global     0
syb1bak_ip   ResourceInfo          global     State     Stale   Msg             TS
syb1bak_ip   RouteOptions          global
syb1bak_ip   TriggerResStateChange global     0
syb1bak_ip   ContainerInfo         ossadm1    Type              Name            Enabled
syb1bak_ip   ContainerInfo         ossadm2    Type              Name            Enabled
syb1bak_ip   Device                ossadm1    oce12
syb1bak_ip   Device                ossadm2    oce12
syb1bak_ip   MonitorTimeStats      ossadm1    Avg       0       TS
syb1bak_ip   MonitorTimeStats      ossadm2    Avg       0       TS

ossadm2{root} # hares -display ossbak_p1
#Resource    Attribute             System     Value
ossbak_p1    Group                 global     Ossfs
ossbak_p1    Type                  global     Proxy
ossbak_p1    AutoStart             global     1
ossbak_p1    Critical              global     0
ossbak_p1    Enabled               global     1
ossbak_p1    LastOnline            global     ossadm2
ossbak_p1    MonitorOnly           global     0
ossbak_p1    ResourceOwner         global
ossbak_p1    TriggerEvent          global     0
ossbak_p1    ArgListValues         ossadm1    TargetResName     1       bkup_nic        TargetSysName   1       ""      TargetResName:Probed    1       1   TargetResName:State     1       2
ossbak_p1    ArgListValues         ossadm2    TargetResName     1       bkup_nic        TargetSysName   1       ""      TargetResName:Probed    1       1   TargetResName:State     1       2
ossbak_p1    ConfidenceLevel       ossadm1    0
ossbak_p1    ConfidenceLevel       ossadm2    0
ossbak_p1    ConfidenceMsg         ossadm1
ossbak_p1    ConfidenceMsg         ossadm2
ossbak_p1    Flags                 ossadm1
ossbak_p1    Flags                 ossadm2
ossbak_p1    IState                ossadm1    not waiting
ossbak_p1    IState                ossadm2    not waiting
ossbak_p1    MonitorMethod         ossadm1    Traditional
ossbak_p1    MonitorMethod         ossadm2    Traditional
ossbak_p1    Probed                ossadm1    1
ossbak_p1    Probed                ossadm2    1
ossbak_p1    Start                 ossadm1    0
ossbak_p1    Start                 ossadm2    0
ossbak_p1    State                 ossadm1    ONLINE
ossbak_p1    State                 ossadm2    ONLINE
ossbak_p1    ComputeStats          global     0
ossbak_p1    ResourceInfo          global     State     Stale   Msg             TS
ossbak_p1    TargetResName         global     bkup_nic
ossbak_p1    TargetSysName         global
ossbak_p1    TriggerResStateChange global     0
ossbak_p1    ContainerInfo         ossadm1    Type              Name            Enabled
ossbak_p1    ContainerInfo         ossadm2    Type              Name            Enabled
ossbak_p1    MonitorTimeStats      ossadm1    Avg       0       TS
ossbak_p1    MonitorTimeStats      ossadm2    Avg       0       TS
ossadm2{root} # hares -display ossbak_ip
#Resource    Attribute             System     Value
ossbak_ip    Group                 global     Ossfs
ossbak_ip    Type                  global     IP
ossbak_ip    AutoStart             global     1
ossbak_ip    Critical              global     0
ossbak_ip    Enabled               global     1
ossbak_ip    LastOnline            global     ossadm2
ossbak_ip    MonitorOnly           global     0
ossbak_ip    ResourceOwner         global
ossbak_ip    TriggerEvent          global     0
ossbak_ip    ArgListValues         ossadm1    Device    1       oce12   Address 1       10.41.78.137    NetMask 1       255.255.255.224 Options 1       ""   ArpDelay        1       1       IfconfigTwice   1       1       RouteOptions    1       ""      PrefixLen       1       0       ExclusiveIPZone 1       0
ossbak_ip    ArgListValues         ossadm2    Device    1       oce12   Address 1       10.41.78.137    NetMask 1       255.255.255.224 Options 1       ""   ArpDelay        1       1       IfconfigTwice   1       1       RouteOptions    1       ""      PrefixLen       1       0       ExclusiveIPZone 1       0
ossbak_ip    ConfidenceLevel       ossadm1    0
ossbak_ip    ConfidenceLevel       ossadm2    100
ossbak_ip    ConfidenceMsg         ossadm1
ossbak_ip    ConfidenceMsg         ossadm2
ossbak_ip    Flags                 ossadm1
ossbak_ip    Flags                 ossadm2
ossbak_ip    IState                ossadm1    not waiting
ossbak_ip    IState                ossadm2    not waiting
ossbak_ip    MonitorMethod         ossadm1    Traditional
ossbak_ip    MonitorMethod         ossadm2    Traditional
ossbak_ip    Probed                ossadm1    1
ossbak_ip    Probed                ossadm2    1
ossbak_ip    Start                 ossadm1    0
ossbak_ip    Start                 ossadm2    1
ossbak_ip    State                 ossadm1    OFFLINE
ossbak_ip    State                 ossadm2    ONLINE
ossbak_ip    Address               global     10.41.78.137
ossbak_ip    ArpDelay              global     1
ossbak_ip    ComputeStats          global     0
ossbak_ip    ExclusiveIPZone       global     0
ossbak_ip    IfconfigTwice         global     1
ossbak_ip    NetMask               global     255.255.255.224
ossbak_ip    Options               global
ossbak_ip    PrefixLen             global     0
ossbak_ip    ResourceInfo          global     State     Stale   Msg             TS
ossbak_ip    RouteOptions          global
ossbak_ip    TriggerResStateChange global     0
ossbak_ip    ContainerInfo         ossadm1    Type              Name            Enabled
ossbak_ip    ContainerInfo         ossadm2    Type              Name            Enabled
ossbak_ip    Device                ossadm1    oce12
ossbak_ip    Device                ossadm2    oce12
ossbak_ip    MonitorTimeStats      ossadm1    Avg       0       TS
ossbak_ip    MonitorTimeStats      ossadm2    Avg       0       TS

ossadm2{root} # hatype -display NIC
#Type        Attribute              Value
NIC          AEPTimeout             0
NIC          ActionTimeout          30
NIC          AgentClass             TS
NIC          AgentDirectory
NIC          AgentFailedOn
NIC          AgentFile
NIC          AgentPriority          0
NIC          AgentReplyTimeout      130
NIC          AgentStartTimeout      60
NIC          AlertOnMonitorTimeouts 0
NIC          ArgList                Device      PingOptimize    NetworkHosts    Protocol        NetworkType     ExclusiveIPZone
NIC          AttrChangedTimeout     60
NIC          CleanRetryLimit        0
NIC          CleanTimeout           60
NIC          CloseTimeout           60
NIC          ConfInterval           600
NIC          ContainerOpts          RunInContainer      0       PassCInfo       1
NIC          EPClass                -1
NIC          EPPriority             -1
NIC          ExternalStateChange
NIC          FaultOnMonitorTimeouts 4
NIC          FaultPropagation       1
NIC          FireDrill              0
NIC          IMF                    Mode        0       MonitorFreq     1       RegisterRetryLimit      3
NIC          IMFRegList
NIC          InfoInterval           0
NIC          InfoTimeout            30
NIC          LevelTwoMonitorFreq    0
NIC          LogDbg
NIC          LogFileSize            33554432
NIC          MonitorInterval        60
NIC          MonitorStatsParam      Frequency   0       ExpectedValue   100     ValueThreshold  100     AvgThreshold    40
NIC          MonitorTimeout         120
NIC          NumThreads             10
NIC          OfflineMonitorInterval 60
NIC          OfflineTimeout         300
NIC          OfflineWaitLimit       0
NIC          OnlineClass            -1
NIC          OnlinePriority         -1
NIC          OnlineRetryLimit       0
NIC          OnlineTimeout          300
NIC          OnlineWaitLimit        2
NIC          OpenTimeout            60
NIC          Operations             None
NIC          RestartLimit           0
NIC          ScriptClass            TS
NIC          ScriptPriority         0
NIC          SourceFile             /etc/VRTSvcs/conf/config/types.cf
NIC          SupportedActions       device.vfd  clearNICFaultInZone
NIC          ToleranceLimit         0
NIC          TypeOwner
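
One thing that stands out in the output above is NetworkHosts = 127.0.0.1: the NIC agent is testing the loopback address rather than a neighbour on the backup LAN, so the monitor result may not reflect the real health of oce12. A sketch of pointing it at a real host instead (the address is a placeholder; use the backup subnet's gateway or another always-up host):

haconf -makerw
hares -modify bkup_nic NetworkHosts 10.41.78.129
haconf -dump -makero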
 


test

I do not need a solution (just sharing information)

test

Microsoft HyperV & SCSI3 PR Support ( Veritas Cluster Server 6.0.3 )

I need a solution

Dear All

I am trying to configure a 2-node cluster with I/O fencing (SCSI-3 PR), but /opt/VRTS/bin/vxfentsthdw gives me an error saying it is not supported.

Below are the environment details.

  • Microsoft Hyper-V
  • 2 x RHEL 6.4 ( x86_64 )

Is SCSI-3 PR supported on Microsoft Hyper-V? I would like to use disk-based I/O fencing if possible. What are the best practices in this regard?
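
Independently of VCS, whether the virtual disk honours SCSI-3 persistent reservations can be tested on RHEL with sg_persist from the sg3_utils package. A sketch; the device and key are placeholders:

sg_persist --in --read-keys --device=/dev/sdb
sg_persist --out --register --param-sark=0x1234 --device=/dev/sdb
sg_persist --in --read-keys --device=/dev/sdb

If the key registered in the second command does not show up in the final read-keys, the disk does not provide usable SCSI-3 PR and disk-based fencing cannot work on it.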

Thanks & Best Regards

M Aziz

 

 

Confirmation that IMF is used for Offline Monitor

I need a solution

 

The VCS Admin guide says:
The Intelligent Monitoring Framework (IMF) notification module hooks into
system calls and other kernel interfaces of the operating system to get notifications
on various events such as when a process starts or dies, or when a block device
gets mounted or unmounted from a mount point. 
 
So I assume that if IMF knows when resources come online as well as when they fault, then IMF is also used for offline monitoring. Can someone confirm this?
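
For reference, which method a given resource is actually using can be checked per system, along with the type's IMF mode; a sketch with placeholder resource and system names:

hares -value res_mount MonitorMethod sys1
hatype -value Mount IMF

MonitorMethod reports Traditional or IMF for that resource on that system.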
 
Thanks
 
Mike

 


Clustered NFS

I need a solution

Can someone confirm my understanding of CNFS, namely that clients can access the same share from different nodes in the CNFS cluster?

So, for example, suppose you have 3 nodes, A, B and C, and a filesystem "/myshare" which is CFS-mounted on all nodes and NFS-shared on all nodes; then you could have 3 failover service groups, each containing a VIP: VIP1, VIP2, VIP3.

Then:

Client 1 can access /myshare via VIP1 running on Node A

Client 2 can access /myshare via VIP2 running on Node B

Client 3 can access /myshare via VIP3 running on Node C

If node C fails, then VIP3 may fail over to node A, and then clients 1 and 3 would both be accessing the share on node A, but via different VIPs.

This is my understanding, but the example in the 5.1 CFS admin guide only shows one VIP service group, which means that in the example there is no concurrent access to the share, which I thought was the point of CNFS. Or is the point of CNFS that you don't have to unshare and re-share when you fail over, because the shares are always present, but there can only be one VIP per share?
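
For reference, CNFS in recent SFCFSHA releases is administered with the cfsshare command, and multiple virtual IPs can be added (typically one per node), which is what provides concurrent access to the same share from several nodes. A rough sketch only, from memory, with placeholder interface and addresses; check cfsshare(1M) for the exact syntax on your version:

cfsshare addvip e1000g0 10.0.0.51 255.255.255.0
cfsshare addvip e1000g0 10.0.0.52 255.255.255.0
cfsshare addvip e1000g0 10.0.0.53 255.255.255.0
cfsshare display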

Mike

Moving CFS Primary when using SFCFS for fast failover

I need a solution

When using SFCFS for fast failover, so that only one node reads and writes to a filesystem at any one time and SFCFS is just used to speed up failover by avoiding having to import disk groups and mount filesystems, there is a requirement to move the CFS primary along with the failover application service group, so that the application doing the writes is on the same system as the CFS primary.

For me this is standard functionality, as SFCFS is specifically sold as a fast-failover solution, for example as an alternative to Oracle RAC (in addition, of course, to being able to write to filesystems simultaneously), but I can't find any reference to standard tools for moving the CFS primary in VCS. Does anyone know if any exist?

For me the most logical approach would be a trigger script or resource which looks at the CFS service group it depends on and promotes the CFSMount resources in that child service group. I can write that myself, but I would have thought that a standard script or agent already exists.
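
For reference, the CFS primary can be queried and moved with fsclustadm, so a small postonline trigger can do what is described above. A minimal sketch, assuming a single fixed mount point (the path and mount point are placeholders, and error handling is omitted):

#!/bin/sh
# Hypothetical postonline trigger: make this node the CFS primary for /myfs.
MOUNT=/myfs
CURRENT=`/opt/VRTS/bin/fsclustadm -v showprimary $MOUNT`
THIS=`uname -n`
if [ "$CURRENT" != "$THIS" ]; then
    /opt/VRTS/bin/fsclustadm -v setprimary $MOUNT
fi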

Thanks

Mike


Need help in Nic resource fault

I need a solution

 

We are experiencing the faults mentioned below every now and then, and they affect our backup.

The setup is as follows: we have a BkupLan service group containing a NIC resource (bkup_nic) and a Phantom resource.

Two service groups, Ossfs and Sybase1, have Proxy resources monitoring this NIC resource,
and both are configured with IP resources, ossbak_ip and syb1bak_ip.

The issue is that the NIC resource faults with the message below, and in turn the Proxy and IP resources fault.
Then the NIC comes back, and the Proxy and IP resources come back too.
A log extract is below.
Please advise why it goes offline/online so frequently.

2013/10/24 22:32:03 VCS WARNING V-16-10001-7506 (ossadm2) NIC:bkup_nic:monitor:Resource is offline. No Network Host could be reached

2013/10/24 22:32:03 VCS INFO V-16-2-13716 (ossadm2) Resource(bkup_nic): Output of the completed operation (monitor)
==============================================
Broken Pipe
==============================================

2013/10/24 22:32:03 VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:03 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=bkup_nic, arg2=ONLINE
2013/10/24 22:32:03 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=bkup_nic2013/10/24 22:32:03 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 bkup_nic ONLINE  successfully
2013/10/24 22:32:20 VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2

 

 

 

I changed the ToleranceLimit of the NIC and Proxy agents to 2:

 

root@ossadm1> hatype -display NIC

NIC          ToleranceLimit         2

root@ossadm1> hatype -display Proxy

Proxy        ToleranceLimit         2

 

But it still faulted, as below:

 

:57:28 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
Nov  6 00:58:01 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource syb1bak_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys ossadm2
Nov  6 00:58:01 ossadm1 Had[18697]: [ID 702911 daemon.notice] VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2

 

It did not say something like "syb1bak_p1 faulted, ToleranceLimit of 2 not yet reached" and keep the resource up for another monitor interval.

Do we need to set anything else so that the Proxy resources don't fault immediately and instead wait for some time?
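
One thing worth noting: the log entries say FAULTED (timed out), and monitor timeouts are governed by MonitorTimeout and FaultOnMonitorTimeouts, not by ToleranceLimit, which only counts monitor cycles that actually return offline. A sketch of the attributes to look at (values are examples only):

hatype -value NIC FaultOnMonitorTimeouts
haconf -makerw
hatype -modify NIC MonitorTimeout 120
hatype -modify Proxy MonitorTimeout 120
haconf -dump -makero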

 


Service Group Failover fails !!!

I need a solution

Hi there,

I'm having an issue with one of our clusters: when I tried to fail over the service group, it failed with the errors below.

2013/10/30 22:22:19 VCS ERROR V-16-2-13006 (XXXXX) Resource(vdaappdg_dg): clean procedure did not complete within the expected time.
2013/10/30 22:22:30 VCS ERROR V-16-2-13006 (XXXXX) Resource(vdrapp_vol): clean procedure did not complete within the expected time.
2013/10/30 22:22:32 VCS ERROR V-16-2-13027 (XXXXX) Resource(mobius_dg) - monitor procedure did not complete within the expected time.
2013/10/30 22:23:30 VCS ERROR V-16-2-13077 (XXXXX) Agent is unable to offline resource(vdrapp_vol). Administrative intervention may be required.
2013/10/30 22:26:48 VCS ERROR V-16-2-13077 (XXXXX) Agent is unable to offline resource(vdaappdg_dg). Administrative intervention may be required.
2013/10/30 22:32:35 VCS ERROR V-16-2-13063 (XXXXX) Agent is calling clean for resource(mobius_dg) because offline did not complete within the expected time.
2013/10/30 22:33:22 VCS INFO V-16-2-13068 (XXXXX) Resource(mobius_dg) - clean completed successfully.
2013/10/30 22:37:24 VCS ERROR V-16-2-13027 (XXXXX) Resource(mobius_dg) - monitor procedure did not complete within the expected time.
2013/10/30 22:37:24 VCS ERROR V-16-2-13077 (XXXXX) Agent is unable to offline resource(mobius_dg). Administrative intervention may be required.

Background -

On this cluster we have a large number of disks, holding about 54 TB of data.

DG-NAME               #VD   TOTAL (GB)    USED (GB)    FREE (GB)
mobius                176    14031.25    14000.00       31.25
vdaappdg              534    42596.69    40628.76     1967.93

When a VCS fail-over happens, it simply hangs and says "Agent is unable to offline resource(vdrapp_vol). Administrative intervention may be required."

As far as I understand, this message is displayed when an offline procedure does not complete in time. An offline procedure timeout can occur when the system is overloaded, is busy processing a system call, or is handling a large number of resources.
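
If the slow offline is simply a function of the number of volumes and disks, the per-type timeouts can be raised while the root cause is investigated. A sketch (the values are examples only):

haconf -makerw
hatype -modify DiskGroup OfflineTimeout 600
hatype -modify DiskGroup MonitorTimeout 300
hatype -modify Volume OfflineTimeout 600
haconf -dump -makero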

I wanted to know if someone has ever gone through such a situation, and whether this could be happening due to the sheer amount of storage.

Waiting for expert advice, and possibly a solution, to make sure the fail-over functionality works seamlessly.

Thank you/Nilesh

VCS 6.0.1 on Solaris x64 apache resource not coming up

I need a solution
 
2013/11/12 15:31:44 VCS INFO V-16-1-10298 Resource res_mysql (Owner: Unspecified, Group: testgrp) is online on solaris10-1 (VCS initiated)
2013/11/12 15:31:44 VCS NOTICE V-16-1-10301 Initiating Online of Resource res_apache (Owner: Unspecified, Group: testgrp) on System solaris10-1
2013/11/12 15:32:36 VCS NOTICE V-16-10061-20284 (solaris10-1) Apache:res_apache:online:KillPIDandAllChildren:Proc:[SIGTERM] delivered to:
PID          CMD
----------------
[12992]      [/opt/VRTSvcs/bin/hatype -value Apache MonitorTimeout]
----------------
Total: [1]
2013/11/12 15:32:59 VCS NOTICE V-16-10061-20284 (solaris10-1) Apache:res_apache:online:KillPIDandAllChildren:Proc:[SIGKILL] delivered to:
PID          CMD
----------------
[12992]      [/opt/VRTSvcs/bin/hatype -value Apache MonitorTimeout]
----------------
Total: [1]
2013/11/12 15:33:20 VCS ERROR V-16-10061-20460 (solaris10-1) Apache:res_apache:online:Sys:RunWithEnvCmdWithOutputWithTimeOut:The command [/opt/VRTSvcs/bin/hatype -value Apache MonitorTimeout] did not complete within [10] seconds
2013/11/12 15:33:41 VCS ERROR V-16-10061-20376 (solaris10-1) Apache:res_apache:online:<Apache::GetMonitorTimeout> The command line [/opt/VRTSvcs/bin/hatype-value Apache MonitorTimeout] did not complete within the allotted
       amount of time ( [10] seconds )
2013/11/12 15:34:02 VCS ERROR V-16-10061-20312 (solaris10-1) Apache:res_apache:online:<Apache::ArgsValid> SecondLevelMonitorTimeOut must be less than MonitorTimeOut.
2013/11/12 15:34:09 VCS ERROR V-16-2-13027 (solaris10-1) Resource(res_apache) - monitor procedure did not complete within the expected time.
2013/11/12 15:34:35 VCS NOTICE V-16-10061-20284 (solaris10-1) Apache:res_apache:monitor:KillPIDandAllChildren:Proc:[SIGTERM] delivered to:
PID          CMD
----------------
[13080]      [/opt/VRTSvcs/bin/hatype -value Apache MonitorTimeout]
----------------
Total: [1]
2013/11/12 15:34:58 VCS NOTICE V-16-10061-20284 (solaris10-1) Apache:res_apache:monitor:KillPIDandAllChildren:Proc:[SIGKILL] delivered to:
PID          CMD
----------------
[13080]      [/opt/VRTSvcs/bin/hatype -value Apache MonitorTimeout]
----------------
Total: [1]
2013/11/12 15:35:29 VCS ERROR V-16-10061-20460 (solaris10-1) Apache:res_apache:monitor:Sys:RunWithEnvCmdWithOutputWithTimeOut:The command [/opt/VRTSvcs/bin/hatype -value Apache MonitorTimeout] did not complete within [10] seconds
2013/11/12 15:36:01 VCS NOTICE V-16-10061-20284 (solaris10-1) Apache:res_apache:monitor:KillPIDandAllChildren:Proc:[SIGTERM] delivered to:
PID          CMD
----------------
[13140]      [/opt/VRTSvcs/bin/hatype -value Apache MonitorTimeout]
----------------
Total: [1]
2013/11/12 15:36:07 VCS ERROR V-16-2-13066 (solaris10-1) Agent is calling clean for resource(res_apache) because the resource is not up even after online completed.
2013/11/12 15:36:11 VCS ERROR V-16-10061-20376 (solaris10-1) Apache:res_apache:monitor:<Apache::GetMonitorTimeout> The command line [/opt/VRTSvcs/bin/hatype-value Apache MonitorTimeout] did not complete within the allotted
       amount of time ( [10] seconds )
2013/11/12 15:36:53 VCS NOTICE V-16-10061-20284 (solaris10-1) Apache:res_apache:monitor:KillPIDandAllChildren:Proc:[SIGKILL] delivered to:
PID          CMD
----------------
[13140]      [/opt/VRTSvcs/bin/hatype -value Apache MonitorTimeout]
----------------
Total: [1]
 
 
 
 
bash-3.2# cat /etc/VRTSvcs/conf/config/main.cf
include "OracleASMTypes.cf"
include "types.cf"
include "Db2udbTypes.cf"
include "MySQLTypes51.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"
 
cluster ourcluster (
        UserNames = { admin = aPQiPKpMQlQQoYQkPN }
        ClusterAddress = "10.0.0.43"
        Administrators = { admin }
        )
 
system solaris10-1 (
        )
 
system solaris10-2 (
        )
 
group ClusterService (
        SystemList = { solaris10-1 = 0, solaris10-2 = 1 }
        AutoStartList = { solaris10-1, solaris10-2 }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 120
        )
 
        IP webip (
                Device = e1000g0
                Address = "10.0.0.43"
                NetMask = "255.255.255.0"
                )
 
        NIC csgnic (
                Device = e1000g0
                )
 
        webip requires csgnic
 
 
        // resource dependency tree
        //
        //      group ClusterService
        //      {
        //      IP webip
        //          {
        //          NIC csgnic
        //          }
        //      }
 
 
group testgrp (
        SystemList = { solaris10-1 = 0, solaris10-2 = 1 }
        ContainerInfo = { Name = testzone, Type = Zone, Enabled = 1 }
        AutoStartList = { solaris10-1, solaris10-2 }
        Administrators = { admin }
        )
 
        Apache res_apache (
                ResLogLevel = TRACE
                httpdDir = "/apps2/apache2/bin"
                PidFile = "/apps2/apache2/logs/httpd.pid"
                HostName = testzone
                User = root
                ConfigFile = "/apps2/apache2/conf/httpd.conf"
                )
 
        MySQL res_mysql (
                Critical = 0
                ResLogLevel = TRACE
                MySQLAdminPasswd = mypwd
                BaseDir = "/opt/mysql/mysql"
                DataDir = "/opt/mysql/mysql/data"
                MyCnf = "/opt/mysql/mysql/my.cnf"
                HostName = testzone
                )
 
        Zone vcszone (
                )
 
        Zpool vcspool (
                PoolName = spool
                AltRootPath = "/"
                ZoneResName = vcszone
                )
 
        res_apache requires res_mysql
        res_mysql requires vcszone
        vcszone requires vcspool
 
 
        // resource dependency tree
        //
        //      group testgrp
        //      {
        //      Apache res_apache
        //          {
        //          MySQL res_mysql
        //              {
        //              Zone vcszone
        //                  {
        //                  Zpool vcspool
        //                  }
        //              }
        //          }
        //      }
 
 
bash-3.2#
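
What stands out in the log above is that the agent's own calls to /opt/VRTSvcs/bin/hatype are timing out after 10 seconds, and the agent then complains that SecondLevelMonitorTimeOut must be less than MonitorTimeOut. It may be worth checking how responsive had is on this node, and what the Apache type's timeout is set to; a sketch (the value is an example only):

/opt/VRTSvcs/bin/hatype -value Apache MonitorTimeout
haconf -makerw
hatype -modify Apache MonitorTimeout 120
haconf -dump -makero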
 

 

Bug in VCS's Oracle agent Health check Monitoring scripts?

I need a solution

Has anyone hit a bug when using VCS's Oracle agent in "Health Check Monitoring" mode (MonitorOption=1)? I've recently tested this with VCS 6.0.1 on Solaris 10 x64, and discovered that the agent attempts to run scripts called "oraapi_32", "oraapi_3211g", "oraapi_64" or "oraapi_6411g" in the agent's directory, /opt/VRTSagents/ha/bin/Oracle. However, this fails because the standard installation doesn't give these scripts execute permission for non-root users, yet it's the (non-root) Oracle user that needs to run them.

I added global execute permission to these scripts in my test environment, and this allowed Health Check Monitoring to work OK. Has anyone else hit this problem? Is there an official workaround or solution?
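
For anyone else hitting this, the workaround described above amounts to the following (paths as given in the post; consider the security implications before using it in production):

ls -l /opt/VRTSagents/ha/bin/Oracle/oraapi_*
chmod o+x /opt/VRTSagents/ha/bin/Oracle/oraapi_*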

Best Regards, Alistair.

 

Veritas Cluster Filesystem was down

I need a solution

Hi Everyone,

 

I had an issue last Sunday: both cluster nodes showed the status "state: out of cluster", and node 1 was rebooted. Below are the most important logs.

 

2013/11/16 05:36:16 VCS ERROR V-16-1-10205 Group cvm is faulted on system rsgisap1
2013/11/16 05:37:47 VCS ERROR V-16-1-10205 Group cvm is faulted on system rsgisap2

2013/11/16 05:34:00 VCS NOTICE V-16-1-10080 System (rsgisap2) - Membership: 0x2, Jeopardy: 0x1
2013/11/16 05:34:00 VCS ERROR V-16-1-10091 System rsgisap1 (Node '0') is in Jeopardy Membership - Membership: 0x2, Visible: 0x0
2013/11/16 05:36:16 VCS NOTICE V-16-1-10080 System (rsgisap2) - Membership: 0x2, Jeopardy: 0x0
2013/11/16 05:50:55 VCS NOTICE V-16-1-10080 System (rsgisap2) - Membership: 0x2, Jeopardy: 0x1
2013/11/16 05:50:55 VCS ERROR V-16-1-10091 System rsgisap1 (Node '0') is in Jeopardy Membership - Membership: 0x2, Visible: 0x0
2013/11/16 05:51:12 VCS NOTICE V-16-1-10080 System (rsgisap2) - Membership: 0x3, Jeopardy: 0x0

2013/02/23 03:14:44 VCS ERROR V-16-1-10303 Resource cvm_vxconfigd (Owner: unknown, Group: cvm) is FAULTED (timed out) on sys rsgisap2
2013/11/16 05:34:00 VCS ERROR V-16-1-10322 System rsgisap1 (Node '0') changed state from RUNNING to FAULTED
2013/11/16 05:36:16 VCS ERROR V-16-1-10205 Group vrts_vea_cfs_int_cfsmount1 is faulted on system rsgisap1
2013/11/16 05:36:16 VCS ERROR V-16-1-10205 Group vrts_vea_cfs_int_cfsmount2 is faulted on system rsgisap1
2013/11/16 05:36:16 VCS ERROR V-16-1-10205 Group vrts_vea_cfs_int_cfsmount3 is faulted on system rsgisap1
2013/11/16 05:36:16 VCS ERROR V-16-1-10205 Group vrts_vea_cfs_int_cfsmount4 is faulted on system rsgisap1
2013/11/16 05:36:16 VCS ERROR V-16-1-10205 Group cvm is faulted on system rsgisap1
2013/11/16 05:37:47 VCS ERROR V-16-1-10205 Group cvm is faulted on system rsgisap2
2013/11/16 05:51:12 VCS NOTICE V-16-1-10322 System rsgisap1 (Node '0') changed state from FAULTED to INITING

2013/11/16 05:34:01 VCS NOTICE V-16-1-10446 Group vrts_vea_cfs_int_cfsmount1 is offline on system rsgisap1
2013/11/16 05:34:01 VCS NOTICE V-16-1-10446 Group vrts_vea_cfs_int_cfsmount2 is offline on system rsgisap1
2013/11/16 05:34:01 VCS NOTICE V-16-1-10446 Group vrts_vea_cfs_int_cfsmount3 is offline on system rsgisap1
2013/11/16 05:34:01 VCS NOTICE V-16-1-10446 Group vrts_vea_cfs_int_cfsmount4 is offline on system rsgisap1

2013/11/16 05:37:47 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvmvoldg4 (Owner: unknown, Group: vrts_vea_cfs_int_cfsmount4) on System rsgisap2
2013/11/16 05:37:47 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvmvoldg3 (Owner: unknown, Group: vrts_vea_cfs_int_cfsmount3) on System rsgisap2
2013/11/16 05:37:47 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvmvoldg2 (Owner: unknown, Group: vrts_vea_cfs_int_cfsmount2) on System rsgisap2
2013/11/16 05:37:47 VCS NOTICE V-16-1-10300 Initiating Offline of Resource cvmvoldg1 (Owner: unknown, Group: vrts_vea_cfs_int_cfsmount1) on System rsgisap2

2013/11/16 05:37:49 VCS INFO V-16-6-15004 (rsgisap2) hatrigger:Failed to send trigger for postoffline; script doesn't exist
2013/11/16 05:51:43 VCS INFO V-16-6-15004 (rsgisap1) hatrigger:Failed to send trigger for postonline; script doesn't exist

2013/11/16 05:51:45 VCS INFO V-16-10031-1046 (rsgisap1) CVMVolDg:cvmvoldg1:online:resource cvmvoldg1 is online
2013/11/16 05:51:45 VCS INFO V-16-10031-1046 (rsgisap1) CVMVolDg:cvmvoldg3:online:resource cvmvoldg3 is online
2013/11/16 05:51:45 VCS INFO V-16-10031-1046 (rsgisap1) CVMVolDg:cvmvoldg2:online:resource cvmvoldg2 is online
2013/11/16 05:51:45 VCS INFO V-16-10031-1046 (rsgisap1) CVMVolDg:cvmvoldg4:online:resource cvmvoldg4 is online

2013/11/16 05:51:12 VCS NOTICE V-16-1-10453 Node: 0 changed name from: 'rsgisap1' to: 'rsgisap1'
2013/11/16 05:51:12 VCS NOTICE V-16-1-10322 System rsgisap1 (Node '0') changed state from FAULTED to INITING
2013/11/16 05:51:12 VCS NOTICE V-16-1-10322 System rsgisap1 (Node '0') changed state from INITING to CURRENT_DISCOVER_WAIT
2013/11/16 05:51:12 VCS NOTICE V-16-1-10322 System rsgisap1 (Node '0') changed state from CURRENT_DISCOVER_WAIT to REMOTE_BUILD
2013/11/16 05:51:12 VCS INFO V-16-1-10463 Sending snapshot to node: 0
2013/11/16 05:51:13 VCS NOTICE V-16-1-10322 System rsgisap1 (Node '0') changed state from REMOTE_BUILD to RUNNING

2013/11/16 05:51:14 VCS ERROR V-16-10031-1005 (rsgisap1) CVMCluster:???:monitor:node - state: out of cluster
2013/11/16 05:52:47 VCS ERROR V-16-10031-1005 (rsgisap2) CVMCluster:???:monitor:node - state: out of cluster
 

The OS logs show nothing to indicate whether there was a network problem or a storage problem. The cluster nodes run Red Hat Linux 4 and the Veritas version is 4.1.

An upgrade of everything is probably needed, but those servers are in production. I need to know what the root cause is.
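
For root-cause purposes, the Jeopardy messages above typically mean a node was down to its last heartbeat link before it faulted, so the LLT interconnects are the first suspect. Current link and membership state can be checked with the standard tools, and the history is in the engine log; a sketch:

/sbin/lltstat -nvv | more
/sbin/gabconfig -a
grep -i jeopardy /var/VRTSvcs/log/engine_A.log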

 

I really appreciate your help.

 

Claudio

 

Could not set or unset FAILED flag on interface

I need a solution

Hello All,
Recently I brought up the backup server, which had been down for a long time due to a drive failure; the disks have now been swapped and are working as they should.

The os version is Solaris 9 Sun 240 Server
VERITAS cluster server 4.1

In the two-node cluster I am receiving the following error messages, and they keep repeating every few seconds:

2013/11/19 15:25:27 VCS NOTICE V-16-1-10438 Group pa_dev has been probed on system padevdb2
2013/11/19 15:25:31 VCS ERROR V-16-10001-6524 (db2) MultiNICB:MNIC:monitor:Could not set or unset FAILED flag on interface bge3
2013/11/19 15:25:41 VCS ERROR V-16-10001-6524 (db2) MultiNICB:MNIC:monitor:Could not set or unset FAILED flag on interface bge3
2013/11/19 15:25:51 VCS ERROR V-16-10001-6524 (db2) MultiNICB:MNIC:monitor:Could not set or unset FAILED flag on interface bge3
2013/11/19 15:26:00 VCS ERROR V-16-10001-6524 (db2) MultiNICB:MNIC:monitor:Could not set or unset FAILED flag on interface bge3

bge0: flags=9040843 mtu 1500 index 2
inet 10.100.0.11 netmask ffffc000 broadcast 10.100.63.255
groupname MNIC

I can't find any documentation on this, and I am not sure how to proceed.
Any advice is appreciated!
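
For reference, the MultiNICB agent manipulates the FAILED flag on interfaces in the IPMP group, so this error often means the interface is not plumbed, or is not in the group, after the hardware work. A sketch of things to check on Solaris 9 (addresses omitted):

ifconfig bge3                    # is the interface plumbed at all?
ifconfig -a | grep groupname     # which interfaces are in group MNIC?
ifconfig bge3 plumb
ifconfig bge3 group MNIC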
