Professional network LinkedIn used the revamp of its mobile app as the impetus to standardize its data foundation.
When careers website and professional network LinkedIn prepared to redesign its mobile app last year, several questions regarding how to handle massive amounts of data cropped up before the actual process could even launch.
The mobile app not only displays data to users, it also emits data based on user behavior that is then ingested to become part of LinkedIn's data sets. Then that data is served to users and viewed by internal users such as analysts.
Any changes to the mobile app would affect more than the user experience. They would affect the data collected and all the reports and analysis based on that data, including profile views, jobs searched, and more. So even a small change in the mobile app had the potential to break other upstream and downstream data and applications.
"Any changes to the data production will have massive implications to the whole data stream," said Yael Garten, director of data science at LinkedIn, during a breakout session at September's Strata + Hadoop event in New York. "… Changes upstream by the producers are clearly going to break things downstream by the consumers, and through no fault of anybody. Producers are not really aware of the consumers. The consumers are not really aware of the changes coming down to them through the data pipeline."
LinkedIn's data code base faced challenges common to large organizations that rely heavily on data. For example, different application development organizations within the company referred to the same data by different names, and built their applications around those different names.
It may be called "Profile View" by the group creating the "People You May Know" app. But it may be called "Person View" by the group creating the "Jobs You May Be Interested In" app. This siloed process led to unneeded complexity.
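The naming problem can be illustrated with a minimal sketch. It assumes a hypothetical alias table that maps each team's name for an event to one shared name. The shared name `ProfileView` and the function `canonical_name` are illustrative assumptions, not LinkedIn's actual schema or code.

```python
from collections import Counter

# Hypothetical alias table: each team-specific name maps to one shared name.
# "ProfileView" as the shared name is an assumption made for illustration.
CANONICAL_NAMES = {
    "Profile View": "ProfileView",
    "Person View": "ProfileView",
}


def canonical_name(team_name):
    """Return the shared name for a team-specific event name."""
    if team_name not in CANONICAL_NAMES:
        raise ValueError(f"no canonical name registered for {team_name!r}")
    return CANONICAL_NAMES[team_name]


# Events emitted by two different teams for the same user action.
raw_events = ["Profile View", "Person View", "Profile View"]

# Once the names are normalized, one count covers both teams' events.
counts = Counter(canonical_name(name) for name in raw_events)
```

Without the shared name, an analyst counting profile views would have to know about both spellings and add the totals by hand.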
"We wanted to move to better data models," said Garten . "We wanted to structure the data in a way that was actually maintainable. We wanted to enable via good schemas, a maintainable data pipeline."
But getting from the existing data models to better ones was a substantial undertaking.
"The problem was that doing this for all the data was a pretty massive change," Garten said. "If we are going to change the data, should we standardize everything?" Garten said that LinkedIn evaluated the two paths -- keeping the old data models versus standardizing the data.
Holding onto the old approach had some benefits. For instance, it would save consumers from having to migrate to the new data approach. But it would add to overall development costs going forward, because developers would have to replicate the old, flawed code from scratch every time. The evaluation estimated that this approach would take 5,000 worker days.
The other choice was to evolve by standardizing the data. This choice called for a bigger upfront development investment and required consumers to migrate to the new model, but it would let LinkedIn reduce the data modeling effort each time it had to update the technology. The evaluation estimated that this approach would take 3,000 worker days.
The difference in worker days to complete the project made the choice clear, Garten said. LinkedIn would evolve its code by standardizing the data. But that didn't make the project itself any less daunting.
As it planned this project, LinkedIn wanted to make sure it handled the evolution in a principled way so that the cost was not so high the next time around.
LinkedIn set out to create "a data ecosystem that can handle change," by standardizing core data entities, creating maintainable contracts between data producers and consumers, and ensuring a dialog between data producers and consumers.
"If you are small, maybe you don't need this, but as you get larger, this will let you scale," Garten said.
"This might sound like, oh, big company, a lot of process, do we really need this? Maybe you don't in a small company. But as you start to evolve these things don't slow you down. These things let you innovate and accelerate and make sure you have a really good foundation to enable good data products," she said.
As part of that work, the company created a set of tools to help users monitor and maintain the code base and the contracts between data producers and consumers.
"Let's make it really hard to screw up. Anything that can be standardized should be standardized. We created a library that was basically framework-level tracking that all the different product teams that make up this one app all use very simply. You specify a name, you specify the name of the button that the user clicks and for free, you get a lot of specs like page flow analysis," Garten said.
"For anything else, for anything custom, we give guidance to teams on how to create a new event, what to call it, when to break out a new event and when not to."
To ensure compliance, LinkedIn created an internal visual tool that specifies what employees need to know when they are working with tracking applications. In addition, a monitoring app on top of that tool reports back when emitted data does not match the tracking guidance, so that the problem can be fixed.
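A monitoring check of this kind might look something like the sketch below. The required fields, the naming convention, and the `check_event` function are assumptions made for illustration; the article does not describe the actual rules LinkedIn's tool enforces.

```python
import re
from typing import Dict, List

# Hypothetical guidance: the fields every event must carry, with their types.
REQUIRED_FIELDS = {"event_name": str, "page": str, "member_id": int}

# Hypothetical naming convention: CamelCase ending in "Event".
NAME_PATTERN = re.compile(r"^[A-Z][A-Za-z]*Event$")


def check_event(event: Dict[str, object]) -> List[str]:
    """Return a list of ways the event departs from the guidance."""
    problems = []
    for field_name, expected_type in REQUIRED_FIELDS.items():
        if field_name not in event:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(event[field_name], expected_type):
            problems.append(
                f"wrong type for {field_name}: expected {expected_type.__name__}"
            )
    event_name = event.get("event_name")
    if isinstance(event_name, str) and not NAME_PATTERN.match(event_name):
        problems.append(f"event_name does not follow convention: {event_name}")
    return problems


good = {"event_name": "ProfileViewEvent", "page": "profile", "member_id": 42}
bad = {"event_name": "person_view", "page": "profile"}

good_report = check_event(good)   # no problems expected
bad_report = check_event(bad)     # missing member_id, nonconforming name
```

Reporting a list of problems rather than rejecting the event outright matches the article's description of a tool that reports back so the producing team can fix the issue.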
LinkedIn also realized how important it was for teams in such a large company to know whom to talk to in order to ensure the success of this project. Garten said the goal was to create a standardized process so that teams would know who the consumers of the data are, who the product managers are, and who the engineers are. They needed to know whom to talk to so they could learn about data dependencies and not end up breaking something that someone was using. So, they created another new process.
"We have a step-by-step process that feels like it's more process, but in practice it's been amazing," she said. "Teams are iterating faster, and things aren't breaking. Hashtag Analytics Happiness ."
Following the successful retooling of the back-end data foundation and the rollout of the new mobile app, LinkedIn is embarking on a redesign of its web-based desktop experience, Garten said. But this process will likely be much easier because the work performed on the mobile app helped ensure a solid data foundation.
"It's been amazingly smooth and easier than last year because we laid the groundwork," she said. "It's nothing like the investment we had to undertake last year."
Jessica Davis has spent a career covering the intersection of business and technology at titles including IDG's Infoworld, Ziff Davis Enterprise's eWeek and Channel Insider, and Penton Technology's MSPmentor. She's passionate about the practical use of business intelligence, ...