Dubbo2.7.8升级导致ZKServer宕机

微信扫一扫,分享到朋友圈

Dubbo2.7.8升级导致ZKServer宕机

升级过程及现象

1、Dubbo版本从2.7.4 升级到 2.7.8 2、在开发、测试、预发环境运行的时间超过3周 3、发布线上时,应用启动失败,并伴有大量的ZK客户端重连日志, 如下:

2020-09-24 15:03:40.664, [TID:N/A], INFO, org.apache.zookeeper.ClientCnxn, ClientCnxn.java, 1299, onConnected Session establishment complete on server xx/x:2181, sessionid = 0xe72dc060e3f1d06, negotiated timeout = 40000
2020-09-24 15:03:40.666, [TID:N/A], INFO, org.apache.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient, Log4jLogger.java, 66, info  [DUBBO] Curator zookeeper client instance initiated successfully, session id is e72dc060e3f1d06, dubbo version: 2.7.8, current host: 10.3.107.40
2020-09-24 15:03:53.734, [TID:N/A], WARN, org.apache.zookeeper.ClientCnxn, ClientCnxn.java, 1164, run Session 0xe72dc060e3f1d06 for server xx/xx:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len18594568 is out of range!
at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2020-09-24 15:03:55.540, [TID:N/A], WARN, org.apache.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient, Log4jLogger.java, 81, warn  [DUBBO] Curator zookeeper connection of session e72dc060e3f1d06 timed out. connection timeout value is 3000, session expire timeout value is 60000, dubbo version: 2.7.8, current host: 10.3.107.40
2020-09-24 15:03:56.165, [TID:N/A], INFO, org.apache.zookeeper.ClientCnxn, ClientCnxn.java, 879, primeConnection Socket connection established to xx/xx:2181, initiating session
2020-09-24 15:03:56.346, [TID:N/A], INFO, org.apache.zookeeper.ClientCnxn, ClientCnxn.java, 1299, onConnected Session establishment complete on server xx/xx:2181, sessionid = 0xe72dc060e3f1d06, negotiated timeout = 40000
2020-09-24 15:04:08.135, [TID:N/A], WARN, org.apache.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient, Log4jLogger.java, 81, warn  [DUBBO] Curator zookeeper connection recovered from connection lose, reuse the old session e72dc060e3f1d06, dubbo version: 2.7.8, current host: 10.3.107.40
2020-09-24 15:04:53.046, [TID:N/A], WARN, org.apache.zookeeper.ClientCnxn, ClientCnxn.java, 1164, run Session 0xe72dc060e3f1d06 for server xx/xx:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len18594616 is out of range!
at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2020-09-24 15:04:54.626, [TID:N/A], WARN, org.apache.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient, Log4jLogger.java, 81, warn  [DUBBO] Curator zookeeper connection of session e72dc060e3f1d06 timed out. connection timeout value is 3000, session expire timeout value is 60000, dubbo version: 2.7.8, current host: 10.3.107.40
复制代码

4、然后还有上百万条的如下日志

2020-09-24 15:37:50.309, [TID:N/A], ERROR, org.apache.curator.framework.imps.CuratorFrameworkImpl, CuratorFrameworkImpl.java, 694, logError Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:647)
at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:187)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:604)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
复制代码

5、运维收到告警,ZK挂了





6、回滚应用

问题原因

主要有以下疑惑:

org.apache.dubbo.metadata.report.MetadataReport

排查过程:

  1. Packet lenxx is out of range!
2020-09-24 15:03:53.734, [TID:N/A], WARN, org.apache.zookeeper.ClientCnxn, ClientCnxn.java, 1164, run Session 0xe72dc060e3f1d06 for server xx/xx:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len18594568 is out of range!
at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
复制代码

之前在操作ZK的时候遇到过这个问题,当时是一个节点里面写着很长很长一串乱七八糟的数字,在获取这个节点的时候,就报了个这样的错。在网上查了一些资料,

public static final int packetLen = Integer.getInteger("jute.maxbuffer",4096 * 1024);
protected void readLength() throws IOException {
int len = incomingBuffer.getInt();
if (len < 0 || len >= ClientCnxn.packetLen) {
throw new IOException("Packet len" + len + " is out of range!");
}
incomingBuffer = ByteBuffer.allocate(len);
}
复制代码

大概就是超过了某个限制,可以粗暴的通过设置 jute.maxbuffer 大小解决这个问题。但是为什么会报这个错呢?因为某个节点太大吗?可是在 2.7.4 版本中并没有这个错,会不会是在新版中对zk做了什么监听导致的?

2、在日志中心看到一条关键日志,发现里面涉及到 MetadataReport

Injection of @DubboReference dependencies is failed; nested exception is java.lang.IllegalStateException: Failed to receive INITIALIZED event from zookeeper, pls. check if url zookeeper://xx:2181/org.apache.dubbo.metadata.report.MetadataReport?client=&dubbo.config-center.root-path=/ is correct
复制代码

在目前的dubbo版本中,元数据中心只是一个用于存放元数据的地方,并没有涉及到其它复杂的逻辑。并且我们已经通过配置中心,为所有应用开启了元数据中心,并将元数据中心改成了Redis,这里为什么会生成这么个东西呢?

3、在 MetadataReportInstance 的init方法中,涉及到获取 MetadataReport 的逻辑,最终会导致 ZookeeperDynamicConfiguration 初始化


4、为什么在其它环境没出现,而仅在日常出现? 因为其它环境都通过dubbo全局配置中心为所有的客户端配置了元数据中心为redis,而线上没有配置。所以导致这里使用了默认元数据中心,及ZK。客户端在没有配置元数据中心的时候,Dubbo会默认使用注册中心作为元数据中心

微信扫一扫,分享到朋友圈

Dubbo2.7.8升级导致ZKServer宕机

使用卷积神经网络和openCV预测年龄和性别

上一篇

使用Python调整图像大小

下一篇

你也可能喜欢

Dubbo2.7.8升级导致ZKServer宕机

长按储存图像,分享给朋友