How to Release a Buggy App (And Live to Tell the Tale)

General Technology, 2018-03-14

Bugs! No matter how many times I decree that my coworkers and I must stop writing bugs, we keep on doing it anyways. Even worse, sometimes those bugs make it into production, where users run into them!

The fact of the matter is, you are going to someday release a buggy app. Even with layers of defenses (like QA, automated tests, and CI), you’ll eventually put out code that will bring shame to your name.

Therefore, you should have contingency plans in place for when you do release bugs. Here are some hard-earned lessons we’ve learned over the years about safely releasing apps.

Remote Monitoring

First things first: you need to know when you’ve released a buggy app.

As comforting as it would be to push releases out to the wild and never think about them again, you absolutely need some way to monitor how your app is doing in the field.

Back when I was a wee baby app developer, I included zero remote logging. Imagine my surprise when we got a support email from a customer telling me the app they’d paid for was crashing - constantly! Not only was that news to me, I also had nothing to go on for how to reproduce the issue.

There are plenty of great services these days for remote monitoring that not only tell you when the app crashes but also give you stack traces and logs so you can debug the issue. For Trello Android, we use Crashlytics. At an absolute minimum, you’ll want remote crash reporting. Remote logging is also useful for issues that users hit that don’t crash the app.

Alphas, Betas, and Staged Rollouts

One easy way to limit the damage of your buggy app is to release it to fewer people.

Staged rollouts are a great tool for this. If you only push your crashy app to 1% of users, then you’re screwing over fewer people. Not ideal, but it could have been all of them!

In addition, you should have an alpha and/or beta tester program. Your testers willingly opt into less-than-stable releases and make for a great early warning system. I wouldn’t unleash knowingly crashy apps on them (lest you convert them from “testers” to “former testers”), but I would toss them new features that haven’t been fully QA’d yet.

At Trello, we have a beta program (which you can sign up for on the Play Store) where we regularly release beta versions of the app. We’re only a handful of developers, and users run into all sorts of situations we hadn’t anticipated. We are very thankful to have such great beta users, who surface all kinds of interesting bugs and crashes before they reach everyone else!

Remote Feature Flags

I am a huge fan of feature flags that allow you to enable or disable features in your application.

They’re great for developing new features. If you’ve got a project that’s going to take a few months, it’s much better to keep merging code, but keep it flagged off from users. It allows your devs and testers to dig into the new feature while keeping it away from production.

You can boost their utility even further by making them remotely configurable. That way, when you first release that big new feature and something goes wrong, now you have a way to disable that feature without having to scramble to put out another release.

For Trello Android, we use Firebase remote config to control our feature flags. Your solution need not be anything too complex; before Firebase, we just used a simple JSON file on Trello’s servers.
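To show how little the "simple JSON file" approach needs, here is a minimal sketch. It is illustrative Python, not Trello's actual Android code: the flag names, the `DEFAULT_FLAGS` table, and the `parse_remote_flags` helper are all hypothetical. The one property worth copying is that a missing or malformed payload falls back to the built-in defaults, so a bad config file can't itself become the bug.

```python
import json

# Hypothetical built-in defaults: unfinished features ship flagged off.
DEFAULT_FLAGS = {"new_board_view": False, "offline_sync": True}


def parse_remote_flags(payload, defaults=DEFAULT_FLAGS):
    """Merge a remote JSON payload over the built-in defaults.

    Unknown flag names and non-boolean values are ignored, and a
    malformed or missing payload leaves the defaults untouched.
    """
    flags = dict(defaults)
    try:
        remote = json.loads(payload)
    except (TypeError, ValueError):
        # Covers None, non-string input, and invalid JSON.
        return flags
    if not isinstance(remote, dict):
        return flags
    for name, value in remote.items():
        if name in flags and isinstance(value, bool):
            flags[name] = value
    return flags


# The server flips a feature on without a new app release.
flags = parse_remote_flags('{"new_board_view": true}')
```

In a real app the payload would come from a network fetch with caching; that part is left out here on purpose.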

Release Timing

Avoid releasing your app right before any sort of work break. Weekends, vacations, conferences, jury duty… don’t release before any of them!

Why? If something goes horribly wrong and you need to roll out a fix, that means your break is now over. Worst-case scenario, you can’t get a hold of a key player and now you’ve got an awful bug floating around for a few days.

I have personally made this mistake countless times. “What’s the worst that could happen,” I ask myself. “It’s so much more convenient to release this Friday than next week,” I say.

The end result: I have spent one Google IO keynote banging out a hotfix for a release, hoping the network wouldn’t go down (again) as I uploaded a new APK. I spent another Google IO releasing multiple hotfixes for a buggy app - unfortunately, my sleep-deprived state also deprived me of my brains. Learn from my mistakes: do not release right before Google IO!

Traffic Identifiers

Let he who has never released an app that’s accidentally DDoS’d his own servers cast the first stone…

Okay, so, I did that once. Whoops.

Luckily, we had a user agent that identifies the Trello Android app and the current version. The server team put up an emergency measure to block requests from that particular user agent (to avoid taking down all of Trello) while we worked on a hotfix.

I recommend putting something into your requests that identifies the source app + the version, just in case the server needs to insert some logic for that particular release to cover up for your mistakes.

It’s not a measure the server team should take lightly - writing code paths for one specific UA is a last-ditch effort. But it can really save your hiney in select circumstances.
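As a rough sketch of both halves of this idea, the Python below builds a user agent on the client side and checks it against an emergency blocklist on the server side. Everything here is an assumption made for illustration: the `<app>/<version> (<platform>)` layout, the app name `ExampleApp-Android`, the version numbers, and the helper names. It is not Trello's actual format or server logic.

```python
import re

# Hypothetical User-Agent layout: "<app>/<version> (<platform>)".
UA_PATTERN = re.compile(r"^(?P<app>[\w.-]+)/(?P<version>\d+(?:\.\d+)*)\b")

# Hypothetical emergency blocklist of (app, version) pairs.
BLOCKED_RELEASES = {("ExampleApp-Android", "4.2.0")}


def build_user_agent(app, version, platform="Android"):
    """Client side: identify the app and its version on every request."""
    return f"{app}/{version} ({platform})"


def parse_user_agent(header):
    """Server side: return (app, version), or None if unrecognized."""
    match = UA_PATTERN.match(header or "")
    if match is None:
        return None
    return match.group("app"), match.group("version")


def should_block(header, blocked=BLOCKED_RELEASES):
    """Server side: True only for releases on the emergency blocklist."""
    return parse_user_agent(header) in blocked


bad_release = build_user_agent("ExampleApp-Android", "4.2.0")
hotfix = build_user_agent("ExampleApp-Android", "4.2.1")
```

Because the check matches an exact app and version pair, the hotfix release passes through as soon as users upgrade, and the blocklist entry can be removed later without touching the client.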

Killswitch

In extreme cases, you may want to remotely disable the entire app and force users to upgrade. At Trello, we have a remote flag which can be used to render an old version of the app inoperable.

Do NOT take killswitches lightly - you’re also killing a lot of goodwill with your users when you activate them. However, they’re good to implant into your code just in case, for peace of mind. It’s better to have it and not use it, than not have it and need it.
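One way such a flag could work is a remotely configured minimum supported version that the app compares against its own version at startup. The Python below is a sketch under that assumption; the `min_supported_version` key and the helper names are hypothetical, not Trello's implementation. Given how drastic a killswitch is, the sketch fails open: if the remote value is missing or unparsable, the app keeps working.

```python
from itertools import zip_longest


def parse_version(text):
    """Turn '4.2.0' into (4, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_older(installed, minimum):
    """Compare component by component, treating '4.2' as '4.2.0'."""
    pairs = zip_longest(parse_version(installed), parse_version(minimum),
                        fillvalue=0)
    for have, need in pairs:
        if have != need:
            return have < need
    return False


def is_killed(installed_version, remote_config):
    """True if the installed version is below the remote minimum.

    `min_supported_version` is a hypothetical config key. If it is
    absent or unparsable, the killswitch stays off.
    """
    minimum = remote_config.get("min_supported_version")
    if minimum is None:
        return False
    try:
        return is_older(installed_version, minimum)
    except (AttributeError, ValueError):
        return False
```

Comparing integer components rather than raw strings matters: as strings, "4.9.0" sorts after "4.10.0", which would leave the broken release running.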

For the record, we have never had to use our killswitch, so I have no fun stories about it. I’m sure that whenever we do use it, the story will make for a good post-mortem.

Ignore the Little Things

Finally: chill out.

Chances are, your app is not controlling Hawaii’s missile alert system. If the app crashes once, or something goes minorly wrong, it is not the end of the world.

You would be surprised how forgiving users are for one-off crashes or issues with easy workarounds. As long as they like your app as a whole, they’ll keep using it.

Your first instinct may be to rush another release out the door to fix the bug. Resist the urge! Rushed releases are much more error prone. Unless the issue is critical, take your time to fix it correctly.

We all release buggy apps, but if you go about it in a reasonable manner, you can live to tell the tale. Follow the steps above and you’ll sleep better at night.
