Two Days, or How Long Until The Data Is In

Tech News 2017-09-12

Two days.

It doesn’t seem like long, but that is how long you need to wait before looking at a day’s Firefox data and being sure that 95% of it has been received.

There are some caveats, of course. This only applies to current versions of Firefox (55 and later). This will very occasionally be wrong (like, say, immediately after Labour Day when people finally get around to waking up their computers that have been sleeping for quite some time). And if you have a special case (like trying to count nearly everything instead of just 95% of it) you might want to wait a bit longer.

But for most cases: Two Days.

As part of my 2017 Q3 Deliverables I looked into how long it takes clients to send their anonymous usage statistics to us using Telemetry. This was a culmination of earlier ponderings on client delay, previous work in establishing Telemetry client health, and an eighteen-month (or more!) push to actually look at our data from a data perspective (meta-data).

This led to a meeting in San Francisco where :mreid, :kparlante, :frank, :gfritzsche, and I settled upon a list of metrics that we ought to measure to determine how healthy our Telemetry system is.

Number one on that list: latency.

It turns out there’s a delay between a user doing something (opening a tab, for instance) and them sending that information to us. This is client delay and is broken into two smaller pieces: recording delay (how long from when the user does something until when we’ve put it in a ping for transport), and submission delay (how long it takes that ready-for-transport ping to get to Mozilla).
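To make the two pieces concrete, here is a minimal sketch in Python. The field names (`activity_time`, `ping_created`, `server_received`) are hypothetical, invented for this illustration, and are not actual Telemetry ping fields. The timestamps are made up too.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PingTimes:
    # All three field names are hypothetical, chosen for illustration only.
    activity_time: datetime    # when the user did something (e.g. opened a tab)
    ping_created: datetime     # when the client put it in a ping for transport
    server_received: datetime  # when that ping arrived at the server


def recording_delay(p: PingTimes) -> timedelta:
    """Time from the user's action until it is in a ready-for-transport ping."""
    return p.ping_created - p.activity_time


def submission_delay(p: PingTimes) -> timedelta:
    """Time from the ping being ready until the server has it."""
    return p.server_received - p.ping_created


def client_delay(p: PingTimes) -> timedelta:
    """Client delay is the sum of the two smaller pieces."""
    return recording_delay(p) + submission_delay(p)


# Made-up example: a tab opened late on the 5th, sent after the
# computer wakes up on the morning of the 6th.
example = PingTimes(
    activity_time=datetime(2017, 9, 5, 23, 30),
    ping_created=datetime(2017, 9, 5, 23, 55),
    server_received=datetime(2017, 9, 6, 8, 10),
)

print(recording_delay(example))   # 0:25:00
print(submission_delay(example))  # 8:15:00
print(client_delay(example))      # 8:40:00
```

In this made-up case almost all of the client delay is submission delay: the ping was ready before midnight but sat on a sleeping computer overnight.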

If you want to know how many tabs were opened on Tuesday, September the 5th, 2017, you couldn’t tell on the day itself. All the tabs people open late at night won’t even be in pings, and anyone who puts their computer to sleep won’t send their pings until they wake their computer in the morning of the 6th.

This is where “Two Days” comes in: On Thursday the 7th you can be reasonably sure that we have received 95% of all pings containing data from the 5th. In fact, by the 7th, you should even have that data in some scheduled datasets like main_summary.

How do we know this? We measured it:

(Remember what I said about Labour Day? That’s the exceptional case on beta 56)

Most data, most days, comes in within a single day. Add a day to get it into your favourite dataset, and there you have it: Two Days.
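The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration of the idea, not the analysis that was actually run (that was done in SQL against real pings). The function names and the twenty sample pings are invented for the example.

```python
from datetime import date


def fraction_received_by(pings, activity_day, cutoff_day):
    """Share of pings about activity_day that arrived on or before cutoff_day.

    `pings` is a list of (activity_day, received_day) pairs.
    """
    received_days = [rcv for day, rcv in pings if day == activity_day]
    if not received_days:
        return 0.0
    on_time = sum(1 for rcv in received_days if rcv <= cutoff_day)
    return on_time / len(received_days)


def days_until_fraction(pings, activity_day, target=0.95):
    """Smallest whole-day delay by which at least `target` of pings arrived."""
    delays = sorted((rcv - day).days for day, rcv in pings if day == activity_day)
    for count, delay in enumerate(delays, start=1):
        if count / len(delays) >= target:
            return delay
    return None  # no pings for that day


# Made-up data: 20 pings about September 5th.
sept5 = date(2017, 9, 5)
pings = (
    [(sept5, date(2017, 9, 5))] * 14    # arrived the same day
    + [(sept5, date(2017, 9, 6))] * 5   # arrived the next day
    + [(sept5, date(2017, 9, 12))] * 1  # a straggler, a week late
)

print(fraction_received_by(pings, sept5, date(2017, 9, 5)))  # 0.7
print(fraction_received_by(pings, sept5, date(2017, 9, 6)))  # 0.95
print(days_until_fraction(pings, sept5))                     # 1
```

With this sample, 95% of the pings are in one day after the activity day. Adding a day for the scheduled dataset to pick them up gives the two-day rule of thumb.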

Why is this such a big deal? Currently the only information circulating in Mozilla about how long you need to wait for data is received wisdom from a pre-Firefox-55 (pre-pingsender) world. Some teams wait up to ten full days (!!) before trusting that the data they see is complete enough to make decisions about.

This slows Mozilla down. If we are making decisions on data, our data needs to be fast and reliably so.

It just so happens that, since Firefox 55, it has been.

Now comes the hard part: communicating that it has changed and changing those long-held rules of thumb and idées fixes to adhere to our new, speedy reality.

Which brings us to this blog post. Consider this your notice that we have looked into the latency of Telemetry data and it looks pretty darn quick these days. If you want to know about what happened on a particular day, you don’t need to wait for ten days any more.

Just Two Days. Then you can have your answers.

:chutten

(Much thanks to :gsvelto and :Dexter’s work on pingsender and using it for shutdown pings, :Dexter’s analyses on ping delay that first showed these amazing improvements, and everyone in the data teams for keeping the data flowing while I poked at SQL and rearranged words in documents.)
