Eight Everyday AIX Operations Lessons and Case Studies from a Veteran Sysadmin (2)
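One incident left a deep impression on me. At the time I had never configured HACMP, but a problem with a reporting system in the environment I maintained landed on my desk. I phoned the technical support team that had built the system and went through the HACMP configuration step by step. When the check was finished I should have simply exited, but my hand slipped and I pressed Enter out of habit. I had not changed a single setting, yet HACMP reported an error telling me that a restart was required for the change to take effect. That is how I learned that HACMP treats pressing Enter as a configuration change whether or not anything was actually modified, and then demands a restart.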
[Case study] At one company, a network switch change brought down an HACMP cluster

The HA cluster log shows that both nodes halted when the IP switch went down. The AIX level was 7100-03-03 and the HA version was 6.1 SP13.

Error description

In HACMP 6 with rsct.core.utils 3.1.4.9 or higher, if all IP networks are lost and at least one non-IP network is functioning, the Group Services subsystem will core dump when trying to send packets to be routed through Topology Services (across the non-IP connection). This will cause a node halt. Customers with PowerHA 7, or HACMP 6 customers with no non-IP networks (such as rs232 or disk), are not in danger. Also, this will not happen if only one node is still running, since there will be no other cluster members to send messages to.

The log reads as follows:

Nov 21 01:35:46 masterserv1 daemon:notice topsvcs[8192030]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6zV5DL.mbpHK/ONs/o.Ama/……………….:::Reference ID: :::Template ID: 173c787f:::Details File: :::Location: rsct,nim_control.C,1.39.1.41,6717 :::TS_LOC_DOWN_ST Possible malfunction on local adapter Adapter interface name en2 Adapter offset 1 Adapter IP address 192.200.192.52
Nov 21 01:35:49 masterserv1 user:notice HACMP for AIX: EVENT START: fail_standby masterserv1 192.200.192.52
Nov 21 01:35:49 masterserv1 user:notice HACMP for AIX: EVENT COMPLETED: fail_standby masterserv1 192.200.192.52 0
Nov 21 01:35:51 masterserv1 user:notice HACMP for AIX: EVENT START: fail_standby masterserv2 192.200.192.53
Nov 21 01:35:51 masterserv1 user:notice HACMP for AIX: EVENT COMPLETED: fail_standby masterserv2 192.200.192.53 0
Nov 21 01:40:34 masterserv1 daemon:notice topsvcs[8192030]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6zV5DL.GgpHK/DLG.o.Ama/……………….:::Reference ID: :::Template ID: 173c787f:::Details File: :::Location: rsct,6717 :::TS_LOC_DOWN_ST Possible malfunction on local adapter Adapter interface name en0 Adapter offset 0 Adapter IP address 102.200.192.52
Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 announcementCb: Called,state=ST_UNSTABLE,provider token 1
Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 announcementCb: GsToken 2,AdapterToken 3,rm_GsToken 1
Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 announcementCb: GRPSVCS announcment code=512; exiting
Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 CHECK FOR FAILURE OF RSCT SUBSYSTEMS (topsvcs or grpsvcs)
Nov 21 01:40:36 masterserv1 daemon:err|error haemd[15204586]: LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.37,L#=1395,haemd: 2521-032 Cannot dispatch group services (1).
Nov 21 01:40:36 masterserv1 user:notice HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES.
Nov 21 01:40:36 masterserv1 user:notice HACMP for AIX: clexit.rc : Halting system immediately!!!

The cause is the defect described in APAR IV55293: HAGSD CORE DUMP WHEN IP NETWORKS LOST. The fix is to upgrade the rsct filesets.

IBM's explanation: http://www-01.ibm.com/support/docview.wss?uid=isg1IV55293
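When reviewing a long syslog after an incident like this one, it can help to pull out only the entries that form the failure signature: adapters reported down by topsvcs, the Group Services exit seen by clstrmgrES, and the final halt from clexit.rc. The following is a minimal sketch in Python, assuming the log has been copied off the node as plain text with one entry per line. The function name `summarize` and the pattern names are hypothetical and are not part of HACMP or RSCT; the match strings are taken only from the excerpt in this case.

```python
import re

# Patterns copied from the log excerpt in this case study.
# All names below are illustrative, not HACMP/RSCT tooling.
ADAPTER_DOWN = re.compile(
    r"TS_LOC_DOWN_ST.*?Adapter IP address\s+(\d{1,3}(?:\.\d{1,3}){3})"
)
# The daemon itself spells the word "announcment", so match it as logged.
GRPSVCS_EXIT = re.compile(r"GRPSVCS announce?ment code=\d+; exiting")
HALT = re.compile(r"clexit\.rc : Halting system immediately")


def summarize(lines):
    """Collect the failure signature from an iterable of syslog lines."""
    result = {"down_adapters": [], "grpsvcs_exit": False, "halted": False}
    for line in lines:
        match = ADAPTER_DOWN.search(line)
        if match:
            result["down_adapters"].append(match.group(1))
        if GRPSVCS_EXIT.search(line):
            result["grpsvcs_exit"] = True
        if HALT.search(line):
            result["halted"] = True
    return result


if __name__ == "__main__":
    sample = [
        "Nov 21 01:35:46 masterserv1 daemon:notice topsvcs[8192030]: "
        ":::TS_LOC_DOWN_ST Possible malfunction on local adapter "
        "Adapter offset 1 Adapter IP address 192.200.192.52",
        "Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: "
        "Sat Nov 21 01:40:36 announcementCb: GRPSVCS announcment code=512; exiting",
        "Nov 21 01:40:36 masterserv1 user:notice HACMP for AIX: "
        "clexit.rc : Halting system immediately!!!",
    ]
    print(summarize(sample))
```

If every IP adapter on the node appears in `down_adapters` and both flags are set, the sequence matches the pattern in this case and the APAR is worth checking. The script only classifies log lines; it does not confirm the installed rsct level.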