
The Mysterious Fiber Bomb Problem: A Debugging Story

Posted by 难拥友 on 2016-10-1 16:18:26

By Kenton Varda - 30 Sep 2016
  A month or two ago, we started seeing a mysterious problem in production: every now and then, one of our Node.js web server processes supporting Sandstorm Oasis would suddenly jump to 100% CPU usage (of one core) and stay there until it was killed. The problem wasn’t an infinite loop, though: the process continued to respond to requests, just slowly. Because it kept responding, it kept passing health checks and was never restarted automatically. But for users assigned to that shard, the service was essentially unusable, as every action took seconds to complete. The problem left nothing suspicious in the logs, other than a gap during which far fewer requests than normal were handled. At first, it struck only about once a week, seemingly at random.
  This kind of bug is a web developer’s worst nightmare. How do you debug something which you can only reproduce once a week, at random, with real users on the line? What could even cause a process to slow down but not stop in this way?
  What’s eating our CPU?

  Obviously, we needed to take a CPU profile while the bug was in progress. Of course, the bug only reproduced in production, so we’d have to take our profile in production. This ruled out any profiling technology that would harm performance the rest of the time: no instrumented binaries. We’d need a sampling profiler that could attach to an existing process on demand, and it would have to understand both C++ and V8 Javascript. (This last requirement ruled out my personal favorite profiler, pprof from google-perftools.)
   Luckily, it turns out there is a correct modern answer: Linux’s “perf” tool. This is a sampling profiler that relies on Linux kernel APIs, so it doesn’t require loading any code into the target binary at all, at least for C/C++. For Javascript, V8 has built-in support for generating a “perf map”, which tells the tool how to map JITed code locations back to Javascript source: just pass the --perf_basic_prof_only_functions flag on the Node command line. This flag is safe in production: it writes some data to disk over time, but we rebuild all our VMs weekly, so the files never grow large enough to be a problem.
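The perf map format itself is simple. Per the Linux perf JIT interface documentation, the file lives at /tmp/perf-<pid>.map and each line reads `START SIZE symbolname`, with START and SIZE in hex (no `0x` prefix) and the symbol name taking the rest of the line. The TypeScript sketch below shows how a sampled instruction address can be resolved to the name of a JIT-compiled function. The function and type names are made up for illustration, and it assumes regions don't overlap and addresses fit exactly in a JS number (below 2^53, which holds for x86-64 user-space addresses).

```typescript
// One region of JIT-compiled code, as listed in a perf map file.
interface PerfMapEntry {
  start: number; // start address (assumed < 2^53 so a JS number is exact)
  size: number;  // length of the region in bytes
  name: string;  // symbol name: the rest of the line, may contain spaces
}

// Parse perf map text: one "START SIZE symbolname" entry per line,
// START and SIZE in hex without a 0x prefix. Malformed lines are skipped.
function parsePerfMap(text: string): PerfMapEntry[] {
  const entries: PerfMapEntry[] = [];
  for (const line of text.split(/\r?\n/)) {
    const m = /^([0-9a-fA-F]+) ([0-9a-fA-F]+) (.+)$/.exec(line);
    if (m === null) continue;
    entries.push({ start: parseInt(m[1], 16), size: parseInt(m[2], 16), name: m[3] });
  }
  // Sort by start address so lookups can use binary search.
  entries.sort((a, b) => a.start - b.start);
  return entries;
}

// Return the symbol whose region contains addr, or undefined if none does.
// Assumes regions do not overlap.
function resolveAddress(entries: PerfMapEntry[], addr: number): string | undefined {
  // Binary search for the last entry whose start is <= addr.
  let lo = 0;
  let hi = entries.length - 1;
  let best = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].start <= addr) {
      best = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (best < 0) return undefined;
  const e = entries[best];
  // The address must fall inside [start, start + size).
  return addr < e.start + e.size ? e.name : undefined;
}

// Usage with a made-up two-entry map.
const demoMap = parsePerfMap("1000 20 foo a.js:1\n2000 10 bar b.js:5\n");
console.log(resolveAddress(demoMap, 0x1010)); // prints: foo a.js:1
```

An address that lands between regions resolves to nothing, which is roughly how a profiler ends up showing raw hex addresses for code it has no map entry for.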
   Armed with this new knowledge, we waited. Finally, after a few days, my pager went off. I shelled into the broken server, recorded a ten-second profile, restarted Node, and then downloaded the data for analysis. Upon running perf, I was presented with this:
   
[Figure 1: CPU profile as displayed by perf]
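The post doesn't give the exact command used to take the ten-second profile. A common way to attach perf to an already-running process for a fixed window is `perf record -p <pid> -- sleep 10`, which samples the target until `sleep` exits. The hypothetical helper below only assembles that argument list; the `-F 99` sampling frequency and the `-g` call-graph flag are assumed choices, not the settings used in the post.

```typescript
// Hypothetical helper: builds the argv for
//   perf record -F <freqHz> -g -p <pid> -- sleep <seconds>
// The -F and -g choices are assumptions, not the settings used in the post.
function perfRecordArgs(pid: number, seconds: number, freqHz: number = 99): string[] {
  // Reject values that cannot be a PID or a positive duration.
  if (Math.floor(pid) !== pid || pid <= 0) {
    throw new Error(`invalid pid: ${pid}`);
  }
  if (!(seconds > 0)) {
    throw new Error(`invalid duration: ${seconds}`);
  }
  return [
    "record",
    "-F", String(freqHz),           // sampling frequency in Hz
    "-g",                           // record call chains, not just the leaf frame
    "-p", String(pid),              // attach to an existing process; no restart
    "--", "sleep", String(seconds), // stop recording when `sleep` exits
  ];
}

// Print the full command line for a hypothetical PID.
console.log(["perf", ...perfRecordArgs(1234, 10)].join(" "));
// prints: perf record -F 99 -g -p 1234 -- sleep 10
```

The array could be passed to Node's `child_process.spawn("perf", args)`. Afterwards, `perf report` reads the resulting `perf.data` and looks up JIT frames in the `/tmp/perf-<pid>.map` file, which is why the map file needs to survive until the analysis is done.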

  Well, this looks promising! Almost all the time is being spent in two C++ functions! The perf viewer makes it easy to jump directly into the disassembly:

[Figure 2: disassembly view in the perf viewer]
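The percentages in a profile view like this come from counting samples: each sample records the call stack at one instant, and a function's "self" share is the fraction of samples in which it was the innermost frame. Here is a minimal TypeScript sketch of that aggregation. The function names and stacks are made up, and this illustrates the idea rather than perf's actual implementation.

```typescript
// A sample is a call stack captured at one instant, innermost frame first.
type Sample = string[];

// Compute each function's share (in percent) of samples in which it was the
// innermost, currently executing frame, sorted from hottest to coldest.
function selfTimeShares(samples: Sample[]): Array<[string, number]> {
  const counts: Record<string, number> = Object.create(null);
  let total = 0;
  for (const stack of samples) {
    if (stack.length === 0) continue; // an empty stack attributes time to no one
    const leaf = stack[0];
    counts[leaf] = (counts[leaf] ?? 0) + 1;
    total++;
  }
  const shares: Array<[string, number]> = Object.keys(counts).map(
    (name): [string, number] => [name, (100 * counts[name]) / total],
  );
  shares.sort((a, b) => b[1] - a[1]);
  return shares;
}

// Usage with four made-up samples: one function is the leaf in half of them.
const shares = selfTimeShares([
  ["cppHot", "main"],
  ["cppHot", "main"],
  ["cppWarm", "main"],
  ["jsFn", "main"],
]);
console.log(shares);
```

Counting only the innermost frame is what makes a hot leaf function stand out. Attributing time to callers as well would require walking every frame of each stack.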
