Alarm Knowledge Base


Windows Failover Cluster Error Log Alarm

<h1>Alarm Description</h1>
<p>Business [XX business], virtual machine [XX IP], purpose [XX purpose]: error log | Microsoft-Windows-FailoverClustering error count in the last hour triggered an alarm; the error count in the last hour is XX.</p>
<p>The detailed alarm log entries can be viewed; an example of the interface is shown below: <img src="https://www.showdoc.com.cn/server/api/attachment/visitfile/sign/bd8df0711a3a6525b500e2537fb7cc8b" alt="" /></p>
<h1>Explanation</h1>
<p>An alarm is raised whenever Windows Failover Clustering writes an error to the event log.</p>
<h1>Monitored Object</h1>
<p>Windows operating system</p>
<h1>Monitoring Method</h1>
<p>The Windows event logs are monitored, and an alarm is raised for error entries whose source is "Microsoft-Windows-FailoverClustering". The monitoring agent reads the Windows logs by running the batch file winlog.bat, which contains:</p>
<pre><code>@echo off
wevtutil qe System /c:100 /rd:true /f:text /q:"Event [System[(Level&lt;4)]]"</code></pre>
<p>wevtutil is a command-line event log utility built into Microsoft Windows. The parameters of the command are explained below:</p>
<pre><code>qe System                        query events (qe = query-events) from the System log
/c:100                           return at most 100 events; with /rd:true these are the 100 most recent matches
/rd:true                         sort events from newest to oldest
/f:text                          output events as plain text
/q:"Event [System[(Level&lt;4)]]"   XPath query: select events whose level is below 4, i.e. Critical (1), Error (2) and Warning (3)</code></pre>
<p>Example output:</p>
<pre><code>Event[69]:
  Log Name: System
  Source: Microsoft-Windows-FailoverClustering
  Date: 2021-11-26T03:42:46.218
  Event ID: 1254
  Task: Resource Control Manager
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: XXX
  Description:
Clustered role 'Cluster Group' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.
Event[70]:
  Log Name: System
  Source: Microsoft-Windows-FailoverClustering
  Date: 2021-11-26T03:42:46.218
  Event ID: 1205
  Task: Resource Control Manager
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: XXX
  Description:
The Cluster service failed to bring clustered role 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
Event[71]:
  Log Name: System
  Source: Microsoft-Windows-FailoverClustering
  Date: 2021-11-26T03:42:46.217
  Event ID: 1069
  Task: Resource Control Manager
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: XXX
  Description:
Cluster resource 'Cluster IP Address' of type 'IP Address' in clustered role 'Cluster Group' failed. The error code was '0x13c1' ('The cluster IP address is already in use.'). Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
Event[72]:
  Log Name: System
  Source: Microsoft-Windows-FailoverClustering
  Date: 2021-11-26T03:42:46.217
  Event ID: 1049
  Task: IP Address Resource
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: XXX
  Description:
Cluster IP address resource 'Cluster IP Address' cannot be brought online because a duplicate IP address 'X.X.X.X' was detected on the network. Please ensure all IP addresses are unique.
Event[73]:
  Log Name: System
  Source: Microsoft-Windows-FailoverClustering
  Date: 2021-11-26T03:40:45.518
  Event ID: 1069
  Task: Resource Control Manager
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: XXX
  Description:
Cluster resource 'Cluster IP Address' of type 'IP Address' in clustered role 'Cluster Group' failed. The error code was '0x13c1' ('The cluster IP address is already in use.'). Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
Event[74]:
  Log Name: System
  Source: Microsoft-Windows-FailoverClustering
  Date: 2021-11-26T03:40:45.518
  Event ID: 1049
  Task: IP Address Resource
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: XXX
  Description:
Cluster IP address resource 'Cluster IP Address' cannot be brought online because a duplicate IP address 'X.X.X.X' was detected on the network. Please ensure all IP addresses are unique.</code></pre>
<h1>Rule</h1>
<p>Check the date of each log entry. Entries from the last hour whose source is "Microsoft-Windows-FailoverClustering" are treated as alarm entries, and their number is reported as the error count in the alarm.</p>
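<p>The rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the monitoring agent's actual implementation.</p>
<p>Assumptions:</p>
<p>1. The names parse_wevtutil_text, count_alarm_events and ALARM_SOURCE are hypothetical.</p>
<p>2. The parser assumes the /f:text layout shown in the example output: an Event[n]: header, indented Key: value fields, then the message text after Description:.</p>
<p>3. Date values are assumed to be local timestamps without a time-zone suffix, so now must also be a naive local datetime.</p>
<p>4. The second record in sample is invented so that its date falls outside the one-hour window.</p>

```python
import re
from datetime import datetime, timedelta

# Hypothetical constant: the event source the alarm rule matches on.
ALARM_SOURCE = "Microsoft-Windows-FailoverClustering"

# Each record in the text output starts with a header such as "Event[69]:".
EVENT_HEADER = re.compile(r"^Event\[\d+\]:\s*$")
# Indented "Key: value" field lines, e.g. "  Event ID: 1254".
FIELD_LINE = re.compile(r"^\s+([A-Za-z ]+?):\s?(.*)$")


def parse_wevtutil_text(text):
    """Split wevtutil /f:text output into one dict per event (hypothetical helper)."""
    events = []
    current = None
    in_description = False
    for line in text.splitlines():
        if EVENT_HEADER.match(line):
            current = {}
            events.append(current)
            in_description = False
        elif current is None:
            continue  # ignore anything before the first event header
        elif in_description:
            # Everything after "Description:" up to the next header is message text.
            if line.strip():
                previous = current.get("Description", "")
                current["Description"] = (previous + " " + line.strip()).strip()
        else:
            match = FIELD_LINE.match(line)
            if match:
                key, value = match.group(1), match.group(2).strip()
                current[key] = value
                in_description = key == "Description"
    return events


def count_alarm_events(events, now, window=timedelta(hours=1)):
    """Count events from ALARM_SOURCE whose Date lies within `window` before `now`."""
    count = 0
    for event in events:
        if event.get("Source") != ALARM_SOURCE:
            continue
        try:
            # Assumes naive local timestamps such as "2021-11-26T03:42:46.218".
            when = datetime.fromisoformat(event.get("Date", ""))
        except ValueError:
            continue  # skip records whose date is missing or not ISO formatted
        if now - window <= when <= now:
            count += 1
    return count


sample = """Event[72]:
  Log Name: System
  Source: Microsoft-Windows-FailoverClustering
  Date: 2021-11-26T03:42:46.217
  Event ID: 1049
  Level: Error
  Description:
Cluster IP address resource 'Cluster IP Address' cannot be brought online because a duplicate IP address 'X.X.X.X' was detected on the network.
Event[73]:
  Log Name: System
  Source: Microsoft-Windows-FailoverClustering
  Date: 2021-11-26T02:10:00.000
  Event ID: 1069
  Level: Error
  Description:
Cluster resource 'Cluster IP Address' of type 'IP Address' in clustered role 'Cluster Group' failed.
"""

events = parse_wevtutil_text(sample)
print(count_alarm_events(events, now=datetime(2021, 11, 26, 3, 45)))  # → 1
```

<p>In practice the text would come from the output of winlog.bat, for example captured with Python's subprocess.run, rather than from a string literal. Because winlog.bat reads only the 100 most recent matching events, a burst of other warnings or errors in the System log could push FailoverClustering entries from the last hour out of that sample.</p>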
