Coping with Over Four Hundred Devices: How Netflix Uses HTML5 to Deliver Amazing User Interfaces
Note: Every so often, we’ll be publishing a “feature”, an in-depth posting on a topic we care about that involves much more effort than we’re able to invest on a regular basis. Time will tell how often we produce these; it will probably be monthly or bi-monthly. Please let us know what you think of this one and what other areas you’d like to see us explore in the future. Thanks!
Today’s fragmented browser and device landscape makes yesterday’s cross-browser incompatibilities look like a walk in the park. It’s one thing to adapt a site that was originally tailored for Firefox to look decent on IE6, but creating a high-quality software experience for today’s array of mobile browsers, desktop browsers, and native platforms is an even bigger challenge. Solving for the current platform fragmentation is clearly one of the great problems–and opportunities–of our era.
But for a company like Netflix, dealing with this issue is the least of their worries. That’s because in addition to the same soup of platforms most of the rest of us are supporting, Netflix is providing ever-evolving video streaming and content discovery experiences for devices across classes ranging from set-top boxes to televisions to video game consoles: over 400 different such machines at last count. While nearly all of these share Linux as the operating system, the differences across the hardware and software stacks are otherwise staggering.
We assumed that in order to support this kind of breadth, Netflix must have set up an API and relied on third parties to create these hundreds of software experiences. But while there are a handful of Netflix experiences implemented this way (e.g., the Apple TV), it turns out Netflix itself is responsible for creating the vast majority of these.
Surely Netflix must have partnered with some kind of software consulting shop or platform vendor to accomplish this, right? They’re an internet subscription TV and movie service for crying out loud. But would you believe they did it all in-house? It turns out Netflix is more a software-company-that-happens-to-be-in-the-internet-video-business than the other way around.
We recently had the opportunity to sit down with Netflix and discuss their approach to “extreme multi-platform development” (our words, not theirs). Because we’re covering it here, you might guess that the Web plays a big role in their strategy; you’d be right. We hope you find their story as captivating as we did.
We’ve seen some swanky offices in the Valley, but Netflix is truly in a league of its own. Located off the beaten path in sleepy Los Gatos, the Netflix campus has more in common with a classic California Adobe mansion than with, say, Adobe’s own traditional corporate offices in San Jose or any of the other Silicon Valley campuses you’re likely to run across. It’s a beautiful, inspiring setting.
It’s early in the new year and the two of us are in the lobby waiting for our meeting. And of course, there’s an authentic movie theatre popcorn machine here filled for visitors to snack on–complete with a sign telling employees “hands off” the popcorn. While the rest of the inside of the campus looks like other Class A office space in the valley, it still manages to be distinctive thanks to movie-themed conference rooms complete with translucent movie imagery imposed on their interior glass windows, seemingly authentic movie props strewn throughout, and a full-size movie screening room.
We’re here to meet with our good friend Bill Scott, Netflix Director of UI Engineering. We’ve known Bill since the early days of the Ajax movement when we met at the Adaptive Path / O’Reilly Ajax Summit in San Francisco. Back then, Bill was pioneering with the Rico framework (remember the iconic accordion component?) and we became fast friends; we’ve tracked each other’s careers ever since. Joining Bill for today’s meeting are Matt Marenghi (Director, Engineering), Matt McCarthy (Web UI Engineer), and Andy Atkins (Director, Engineering).
After a short tour, the four lead us to one of the many conference rooms throughout the Netflix offices where for the next couple of hours we’ll geek out with them on the Netflix technology stack.
The PlayStation 3
Right after we sit down, Matt Marenghi pulls out a PlayStation 3 from under a desk and projects its display onto one of the room’s walls. He’s showing us the PS3 Netflix interface. At least, one of them. It turns out they’ve made quite a few different interfaces for the PS3; more on that later.
As Marenghi demos the UI, it’s clear that Netflix has invested a good deal of energy on polish. Animations are smooth and additive to the interaction design, providing transitions and context that truly aid in the user experience. The frame rates are high and the render quality of the images and fonts is great. This is a native-quality app, no doubt.
And it’s all done with HTML.
“Really?” we ask. “How did you get the PlayStation 3 browser to do all of this? Doesn’t it suck?”
And here’s where it gets really interesting. The experience we’re looking at is a modern HTML5 app running on a custom Netflix port of WebKit. Didn’t see that one coming. At this point, we’ve got a lot of questions on our minds.
The Netflix Approach to Software Development
Before we dive deep into the technical details, it may be helpful to understand a bit more about the company’s approach to software development in general. Marenghi explains, “We believe strongly in a consumer science approach to software development.” That is, Netflix doesn’t just observe small focus groups of consumers to get ideas about how to design their software; they observe huge swaths of their user base every single day and constantly improve their software as a result.
These experiments take the form of “A/B testing”, the practice of developing multiple versions of a feature or an entire user interface and assigning these varied versions to users at random. In the case of Netflix, they measure which of these groups (called “cells”) watch more movies and stay subscribed the longest, and based on these results, roll the winning variations out to all users–or incorporate the winning features into their current experiences.
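To make the idea of cells a little more concrete, here is a minimal sketch of how users might be assigned to them. Netflix didn’t describe their mechanism to us, so everything here is our own assumption: the `Cell` type, the `hashString` and `assignCell` functions, and the choice to hash a stable user ID (as a stand-in for random assignment) so that a given user lands in the same cell on every visit.

```typescript
// Hypothetical sketch of assigning users to A/B test cells.
// None of these names or choices come from Netflix; they are illustrative only.

interface Cell {
  name: string;
  weight: number; // relative share of users as a positive integer, e.g. 50
}

// Simple djb2-style string hash, kept within the unsigned 32-bit range.
function hashString(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash * 33) ^ input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Picks a cell in proportion to the weights. Mixing the test name into the
// key keeps a user's assignment in one test independent of the others.
function assignCell(userId: string, testName: string, cells: Cell[]): Cell {
  let totalWeight = 0;
  for (const cell of cells) {
    totalWeight += cell.weight;
  }
  const bucket = hashString(testName + ":" + userId) % totalWeight;
  let upperBound = 0;
  for (const cell of cells) {
    upperBound += cell.weight;
    if (bucket < upperBound) {
      return cell;
    }
  }
  throw new Error("cells must have positive integer weights");
}
```

Because the assignment is a pure function of the user ID and test name, no per-user lookup table is needed to keep the experience consistent between sessions; that is a design choice of this sketch, not something Netflix told us they do.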
Hearing about this focus on data reminded us of Douglas Bowman’s famous blog post in which he explains that he had to leave Google because of their “design philosophy that lives or dies strictly by the sword of data.” More from Douglas:
When a company is filled with engineers, it turns to engineering to solve problems. Reduce each decision to a simple logic problem. Remove all subjectivity and just look at the data. Data in your favor? Ok, launch it. Data shows negative effects? Back to the drawing board. And that data eventually becomes a crutch for every decision, paralyzing the company and preventing it from making any daring design decisions.
Yes, it’s true that a team at Google couldn’t decide between two blues, so they’re testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can’t operate in an environment like that. I’ve grown tired of debating such minuscule design decisions. There are more exciting design problems in this world to tackle.
Does the Netflix obsession with “consumer science” lead to similar frustrations?
The team offered a few reasons why for them this focus on data hasn’t made their jobs any less enjoyable:
- Design, Engineering, and Product Management are equally balanced. It didn’t use to be the case, but they spent a good deal of effort recruiting top talent across these three disciplines. Seeing this partnership in action without a super-dominant player is rare, in our experience.
- Focus around a clear, unambiguous goal: “Maximize TV show and movie viewing by Netflix members.” Having this as their charter–rather than a more amorphous directive to “make great products” or “do cool stuff”–creates a strong drive in all the team members to measure their progress towards the goal.
- Democratized participation in decision-making. Says Marenghi, “The traditional approach to software design is to hire one or two key people who drive the entire process and get to make the key decisions. [At Netflix] everybody really gets to contribute hypotheses and contribute into the user testing process.” McCarthy added that they are often surprised at which hypothesis wins and that “it’s hugely gratifying to be surprised when an experience you didn’t predict winds up as the winner because it validates our approach.”
To get a sense for just how seriously Netflix takes their methodology, check out a fairly recent Quora posting from their Chief Product Officer, Neil Hunt:
Regardless of the size of each test cell, you can compare any cell to any other on some metric M, measured as Mi, Mj for the different cells (recipes), where the difference Mi-Mj has variance Vij equal to the sum of the variances Vi + Vj of the separate metrics Mi, Mj. Thus the standard deviation SDij of the difference Mi-Mj is the square root of Vij. A difference Mi-Mj greater than 1.64 x SDij is meaningful with less than 1 chance in 20 of being by chance alone, and a difference greater than 1.96 x SDij has less than 1 chance in 40 of being by chance alone…
Another handy trick, if recipe T, U, V, all share many aspects in common, and are all quite different from control C, it may make sense to create a virtual result X that is the combination of T, U, and V, for determining some metric that will not differ across T, U, V. In such cases, it may maximize experimental sensitivity to allocate 50% of trials to C and 16.7% of trials to each of T, U, and V. As an example, you want to test response on a website to disclosing an additional product benefit. Control C is without the additional detail. Test T, U, V are with the additional detail, but in different positions on the page. 30k visitors get C, and 10k each get T, U, or V. You get maximum sensitivity to whether exposing the detail is worthwhile at all, and if it’s big, you may be able to determine which placement is best. If small, run a followup test without C, with 20k each in T, U, and V to figure out which was best.
While that sounds quite complicated, it’s important to note that they do take steps to keep the process as simple as possible. For example, Netflix doesn’t create custom experiences for targeted demographics; they view their user pool as homogeneous and therefore extrapolate the results from random swaths of their user base to all users. Incidentally, while we appreciate why they take that approach, we were hoping they had at least two different user segments (e.g., normal people and geeks) to accommodate those of us who like a bit of complexity and choice. And in fact they confirmed that in their tests, the simpler user interfaces were almost always the winners. Well, for the geeks among us, there’s always the possibility of creating a custom interface on top of their API–the same API that all of their experiences use, too.
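For readers who prefer code to prose, the arithmetic in the Quora excerpt can be sketched in a few lines. The type and function names below (`CellMetric`, `compareCells`, `allocateTrials`) are ours, and we assume each cell’s variance has already been computed; only the formulas and the 1.64 and 1.96 thresholds come from the excerpt.

```typescript
// Sketch of the cell comparison described in the Quora excerpt.
// Names are hypothetical; only the arithmetic follows the quote.

interface CellMetric {
  mean: number;     // metric M for the cell, e.g. average hours viewed
  variance: number; // variance V of that metric (assumed precomputed)
}

interface Comparison {
  difference: number;        // Mi - Mj
  standardDeviation: number; // SDij = sqrt(Vi + Vj)
  ratio: number;             // the difference in multiples of SDij
  clears164: boolean;        // quoted as under 1 chance in 20 of being chance
  clears196: boolean;        // quoted as under 1 chance in 40 of being chance
}

function compareCells(i: CellMetric, j: CellMetric): Comparison {
  const difference = i.mean - j.mean;
  const standardDeviation = Math.sqrt(i.variance + j.variance);
  return {
    difference: difference,
    standardDeviation: standardDeviation,
    ratio: difference / standardDeviation,
    clears164: difference > 1.64 * standardDeviation,
    clears196: difference > 1.96 * standardDeviation,
  };
}

// Gives the control half of the trials and splits the rest evenly
// across the test recipes, as in the 50% / 16.7% example.
function allocateTrials(
  totalTrials: number,
  recipeCount: number
): { control: number; perRecipe: number } {
  const control = Math.floor(totalTrials / 2);
  const perRecipe = Math.floor((totalTrials - control) / recipeCount);
  return { control: control, perRecipe: perRecipe };
}
```

With 60,000 visitors and three test recipes, `allocateTrials(60000, 3)` gives 30,000 to the control and 10,000 to each recipe, which matches the numbers in the excerpt.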
Consumer Science UI Design and HTML: A Perfect Fit
Put yourself in the shoes of the Netflix engineering team. Can you imagine what it must be like maintaining all of these seemingly endless feature variations? Most of us struggle to get one experience relatively stable and bug-free! “We have a lot of coping strategies for supporting our A/B testing approach,” McCarthy says with a grin while the others laugh.
Coping is the right word. If they have a silver bullet for easing the maintenance of all of these, they didn’t share it with us. Rather, they characterized the maintenance costs and associated complexity as an on-going strategic investment. It directly contributes to their success in the marketplace, so they bear the cost without complaint.
Having said that, dealing with the complexity across a single technology platform–the Web–is one thing; doing it across the Web and a sea of native platforms is quite another. McCarthy admitted that they knew from the beginning that they simply couldn’t bring their consumer science approach to a half-dozen or more different development platforms. HTML5 had to be their vehicle for applying the Netflix Way to the apps world.
But choosing HTML5 as their platform was about much more than just bounding the engineering effort. It turns out that the Web stack is ideal for another reason: its dynamic, interpreted nature. Because of it, Netflix can change their user interface easily anytime without redistributing a new client binary image or dealing with a review process. This is a critical factor in the success of their consumer science approach.
A Cute, Custom Version of WebKit
While HTML is a great platform fit for Netflix, it’s not the platform they originally started out with for their device UIs. The initial platform was a mixture of Flash Lite and C++, which Andy characterized as a real integration challenge. They wanted to try a different approach. He explains, “About a year and a half ago, the team sat down and we realized that devices were getting faster quickly. At the same time, WebKit was starting to pick up steam as a great embeddable Web runtime. You combine that with the huge breadth of HTML talent at Netflix, and we saw an opportunity to leverage that talent across all our devices. We decided the time was right to take a dive into the HTML pool.”
The Netflix team realized almost immediately that if they chose HTML for their mobile applications, they’d have to provide their own WebKit runtime to power it. This became apparent as they evaluated using the browser engines third-party device manufacturers provided. “We found that verifying a third-party browser engine against our current and future needs was nearly impossible,” Andy said. And when gaps in such an engine are identified, there’s the issue of cadence to consider. “We’re Web guys. We release new versions nearly every two weeks. Our entire business is based on this flexibility. But device manufacturers work on much longer timelines as they are dealing with hardware. We didn’t want to lose our ability to work rapidly.”
Hearing that Netflix chose QT/WebKit wasn’t a big surprise; it has a great reputation in the industry. We dig in a bit to understand how well it worked out of the box for them. To start with, we ask Andy how closely QT/WebKit was tracking the mainline WebKit project. Andy explains that while they did have to pull a few patches into QT, by and large it met their needs as-is. However, there was one notable exception: accelerated compositing. Let’s explain that a bit.
The Netflix user interfaces make frequent use of animation. Of course, with HTML5 interfaces, such animation usually involves making changes to the properties of the elements on the screen, such as their position, size, etc. Netflix found that every change to these UI element attributes caused QT/WebKit to recalculate the layout for the entire page, an operation that is simply too expensive on embedded and mobile devices to achieve acceptable frame rates. Accelerated compositing solves this problem by enabling a subset of these DOM state changes to be performed by the computer’s dedicated graphics hardware.
The WebKit project provides this support through CSS animations, transitions, and transforms, but QT/WebKit didn’t expose this functionality. “So we added a new layer in QT that supported WebKit’s hooks for hardware accelerating these features,” Andy tells us. “That led to a 30x improvement in page rendering performance and in turn allowed us to support certain embedded hardware platforms that we otherwise wouldn’t have been able to.”
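Here is a small sketch of the difference from the application’s side. It is our own illustration rather than Netflix code: `StyleTarget` is a minimal stand-in for a DOM element so the example runs without a browser, the function names are hypothetical, and whether a given engine actually hands the transform to the GPU depends on that engine. We use the unprefixed `transform` and `transition` properties; WebKit builds of this era would typically have needed the `-webkit-` prefixed forms.

```typescript
// Minimal structural stand-in for an HTMLElement (an assumption of this sketch).
interface StyleTarget {
  style: {
    position: string;
    left: string;
    transform: string;
    transition: string;
  };
}

// Layout-driven move: changing `left` alters the element's geometry,
// which can force the engine to recalculate layout.
function moveWithLayout(target: StyleTarget, x: number): void {
  target.style.position = "absolute";
  target.style.left = x + "px";
}

// Compositor-friendly move: a CSS transform plus a transition leaves layout
// untouched, so an engine with accelerated compositing can animate it on
// the graphics hardware.
function moveWithTransform(target: StyleTarget, x: number, durationMs: number): void {
  target.style.transition = "transform " + durationMs + "ms ease-out";
  target.style.transform = "translate3d(" + x + "px, 0, 0)";
}
```

In a browser, an `HTMLElement` could be passed directly to either function, since its `style` object has all four properties.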
Netflix makes their tweaked version of QT/WebKit available to hardware partners via a special SDK. This makes integration a breeze from their perspective; they give their partners the SDK, and once the partners bring it online and pass Netflix certification tests, both parties can rest assured that the user experience will be acceptable. Andy tells us that most partners have found integrating QT/WebKit with their custom operating systems to be quite straightforward, even easy.
iOS and the Future
Rather than attempt to distribute their custom runtime on iOS devices, Netflix has opted to make use of iOS’ own embeddable WebKit instance, exposed to developers through Apple’s Cocoa Touch framework. This approach was taken for a variety of reasons. Of course, there’s the obvious show-stopper: Apple’s developer agreements at the time clearly blocked such an approach (though arguably they no longer do). But there are other issues at play. For one, Netflix’s QT/WebKit runtime weighs in at a hefty 33 MB. While not hugely significant on devices with many gigabytes of storage, it is firmly over the App Store’s size limit for downloads over Edge/3G. For another, Apple certainly isn’t going to spend any time integrating the Netflix SDK with iOS, so Netflix would have to bear that integration cost–a cost most of their other hardware partners pay for them.
iOS foreshadows a future Netflix is already anticipating, one where Netflix will increasingly have to integrate with third-party Web runtimes. “We are anticipating a world where the device people will want us to use their WebKit,” Andy says, but he also pointed out that “it’s a nightmare for us if a bunch of different WebKits proliferate with different quirks and such.” They’ve already taken architectural steps to facilitate this, such as splitting the application UI from the video player and other lower-level components. The two tiers communicate via an HTTP server embedded in the component layer, which allows the HTML5 UI to use Ajax-style remoting paradigms to communicate with the lower-level components.
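To illustrate what that split might look like from the UI tier, here is a rough sketch. Netflix didn’t share their actual protocol with us, so the base URL, the `/player/play` path, the JSON payload, and the `PlayerClient` class are all invented for illustration. The transport is passed in as a function so that the same UI code could sit on top of XMLHttpRequest in a real runtime or a fake in tests.

```typescript
// Hypothetical sketch of an HTML5 UI talking to lower-level components
// over a local HTTP server. All URLs, paths, and names are assumptions.

interface ComponentRequest {
  method: "GET" | "POST";
  url: string;
  body: string | null;
}

// Callback-style transport, in the spirit of Ajax-style remoting.
type Transport = (
  request: ComponentRequest,
  onDone: (responseText: string) => void
) => void;

const COMPONENT_BASE_URL = "http://127.0.0.1:8080"; // assumed local server

function buildPlayRequest(titleId: string, positionSeconds: number): ComponentRequest {
  return {
    method: "POST",
    url: COMPONENT_BASE_URL + "/player/play",
    body: JSON.stringify({ titleId: titleId, positionSeconds: positionSeconds }),
  };
}

class PlayerClient {
  private transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  // The UI only ever sees this method; how the video actually plays
  // stays hidden behind the HTTP boundary.
  play(titleId: string, positionSeconds: number, onDone: (responseText: string) => void): void {
    this.transport(buildPlayRequest(titleId, positionSeconds), onDone);
  }
}
```

The appeal of a boundary like this is that the UI depends only on HTTP, which any Web runtime can speak, rather than on bindings specific to one WebKit build.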
To mitigate the pain of Web runtime fragmentation, Netflix anticipates easing into this territory by first supporting one third-party WebKit runtime (not counting iOS) and then gradually increasing support for others as Netflix learns how to certify these runtimes for their application.
Developing for Devices
A specific issue here had to do with the same accelerated compositing we mentioned earlier. McCarthy continues: “I used to hide text by moving it to -8000 or something, but that’s a really bad idea [with respect to video memory] because it creates a huge texture. Since every animated surface is converted into a bitmap and passed to the GPU, you have to be aware of memory usage.”
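Some back-of-the-envelope arithmetic shows why this matters. The sketch below rests on our own simplifying assumptions: surfaces are stored as uncompressed 32-bit RGBA, the UI is 1280x720, and pushing content 8,000 pixels off-screen stretches the surface by that amount. Real engines may tile or clip, so treat the numbers as an order-of-magnitude guide only.

```typescript
// Rough texture-size estimate, assuming uncompressed RGBA at 4 bytes per pixel.
function estimateTextureBytes(
  widthPx: number,
  heightPx: number,
  bytesPerPixel: number = 4
): number {
  return widthPx * heightPx * bytesPerPixel;
}

const SCREEN_WIDTH_PX = 1280; // assumed 720p interface
const SCREEN_HEIGHT_PX = 720;
const HIDE_OFFSET_PX = 8000;  // the "-8000" offset from the quote

// A surface that covers exactly one screen.
const onScreenBytes = estimateTextureBytes(SCREEN_WIDTH_PX, SCREEN_HEIGHT_PX);

// The same surface if it must also span the off-screen offset.
const stretchedBytes = estimateTextureBytes(
  SCREEN_WIDTH_PX + HIDE_OFFSET_PX,
  SCREEN_HEIGHT_PX
);
```

Under those assumptions the on-screen surface costs 3,686,400 bytes (about 3.5 MB), while the stretched one costs 26,726,400 bytes (about 25 MB), which is a lot of video memory to spend on text nobody can see.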
The team shared some additional lessons they learned along the way:
- Write very short, very specific CSS selectors
- Pool and reuse DOM elements
- Avoid closures. The Netflix website makes liberal use of closures to hide private variables (i.e., the module pattern). However, every time you use a closure, it adds one more environment record on the scope chain. McCarthy explains, “To avoid this cost, we moved to making members private by convention.” Sorry, Doug!
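Two of these lessons, pooling elements and keeping members private by convention, fit naturally into one small sketch. This is our own illustration, not Netflix’s code: `PooledElement` is a stand-in for a DOM element so the example runs without a browser, and the underscore-prefixed members are “private” only because callers agree not to touch them.

```typescript
// Stand-in for a DOM element (an assumption so the sketch runs anywhere).
interface PooledElement {
  textContent: string | null;
}

class ElementPool<T extends PooledElement> {
  // Private by convention only: the underscore signals "hands off",
  // and no closure is created to hide these members.
  _free: T[] = [];
  _create: () => T;

  constructor(create: () => T) {
    this._create = create;
  }

  // Hands out a recycled element when one is available,
  // and only creates a new one when the pool is empty.
  acquire(): T {
    const reused = this._free.pop();
    if (reused !== undefined) {
      return reused;
    }
    return this._create();
  }

  // Clears the element's text and returns it to the pool for reuse.
  release(element: T): void {
    element.textContent = "";
    this._free.push(element);
  }
}
```

In a browser, the factory passed to the constructor would typically call `document.createElement`, and a scrolling list could release rows as they leave the screen and acquire them again for the rows coming in.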
In short, develop for a very slow processor and small amounts of memory. To this end, McCarthy shares, “If I could take away my developers’ Macs, I would.” It’s too easy to write code that only works well on the developer’s high-end, usually Apple, hardware. But to compensate, Netflix gives their developers lots of embedded devices on which to measure performance and check their assumptions.
So Of Course, We’ll Need a Framework
With so many different UIs across different types of devices and for different user cells in their A/B testing, it’s a safe guess that Netflix has some kind of app framework. And in fact, they do. McCarthy claims that up to 70% of their code across all their cross-platform experiences is composed of shared code at every level of the stack in the form of infrastructure libraries and a UI framework.
These shared libraries provide the foundation that development teams use to build a particular user interface. The UIs themselves are often custom-tailored for an individual platform. “We’re testing a lot of bleeding edge concepts that can only be pulled off on a single device,” McCarthy says.
They have a custom build system that lets developers conditionally include the shared components they want for the experience that they are building. The system leverages tools like Ant, YUI Compressor, and Juicer and incorporates popular HTML5 frameworks like LESS, in addition to Netflix-specific code.
One of the more interesting Netflix-specific bits is their UI framework. It’s based on the core abstractions of components and a state management primitive called a “card.” These cards essentially manage the state of one or more views, which in turn contain components. The framework allows for cards to be stacked, and for the stack to be traversed. In some of their UIs, users visually see these stacks in one way or another.
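We didn’t see the framework’s source, but the description suggests something along these lines. Every name in this sketch (`View`, `Card`, `CardStack`, and their members) is our own guess at a shape that fits the description, not the actual Netflix API.

```typescript
// Hypothetical sketch of cards, views, and components.
// All names and shapes here are assumptions based on the description.

interface View {
  name: string;
  components: string[]; // component identifiers, simplified to strings
}

interface Card {
  id: string;
  views: View[];
  state: { [key: string]: string }; // state the card manages for its views
}

class CardStack {
  private cards: Card[] = [];

  // Places a new card on top of the stack.
  push(card: Card): void {
    this.cards.push(card);
  }

  // Removes and returns the top card, e.g. when the user presses "back".
  pop(): Card | undefined {
    return this.cards.pop();
  }

  // Returns the card currently on top without removing it.
  current(): Card | undefined {
    return this.cards.length > 0 ? this.cards[this.cards.length - 1] : undefined;
  }

  // Visits every card from the top of the stack down to the bottom.
  traverse(visit: (card: Card, depth: number) => void): void {
    for (let depth = 0; depth < this.cards.length; depth++) {
      visit(this.cards[this.cards.length - 1 - depth], depth);
    }
  }
}
```

A stack like this maps neatly onto drill-down navigation: opening a title pushes a card, and backing out pops it while the cards underneath keep their state.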
McCarthy tells us that dealing with TVs presented some interesting challenges given how different that modality is compared to touch or mouse-driven interfaces. For example, in a TV interface, the user typically interacts by moving focus from component to component with a remote control, whereas in other modalities the notion of focus is far less important unless you are actually typing. As a result, their UI framework builds on top of the DOM API’s notion of focus tracking to create a stronger model that provides explicit–and easy–component state tracking. They’ve also added additional event types, such as for when video buffering has finished.
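As a rough illustration of explicit focus tracking for a remote control, consider the sketch below. It is not the Netflix framework: `Focusable`, `FocusManager`, and the simple left/right model over a single row of components are all assumptions we made to keep the example short.

```typescript
// Hypothetical sketch of remote-control focus tracking over one row of components.

type Direction = "left" | "right";

interface Focusable {
  id: string;
  onFocus?: () => void;
  onBlur?: () => void;
}

class FocusManager {
  private items: Focusable[];
  private index = 0;

  constructor(items: Focusable[]) {
    this.items = items;
    const first = this.focused();
    if (first && first.onFocus) {
      first.onFocus(); // the first component starts out focused
    }
  }

  // The component that currently holds focus, if there is one.
  focused(): Focusable | undefined {
    return this.items[this.index];
  }

  // Moves focus one step; at either end of the row focus stays where it is.
  move(direction: Direction): Focusable | undefined {
    const next = direction === "right" ? this.index + 1 : this.index - 1;
    if (next < 0 || next >= this.items.length) {
      return this.focused();
    }
    const previous = this.items[this.index];
    const target = this.items[next];
    if (previous && previous.onBlur) {
      previous.onBlur();
    }
    this.index = next;
    if (target && target.onFocus) {
      target.onFocus();
    }
    return target;
  }
}
```

Because the manager always knows exactly which component is focused, a key handler for the remote’s arrow buttons only needs to call `move` and let the components react through their callbacks.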
The Application Team
So how many people does Netflix have building and supporting all of this? The core team working on the SDK (which contains their version of QT/WebKit) consists of only a handful of engineers, but they in turn are aided by lots of equally small supporting teams, such as a partner engagement team, a partner support team, a device testing team, etc.
The user interfaces are created by small, agile teams. Each major device is supported by only a few people, but the environment is quite fluid. Rather than pigeonholing their developers, they often float from team to team, and their senior development experts move around to support the teams as needed. McCarthy explains, “We mobilize people to where the needs are and try to move fast to meet that need. We don’t have a large corporate structure that has all kinds of counter-productive barriers to prevent that.”
If this sounds like the Netflix engineers are generalists, that’s because they often are. “Because we’re a small company and moving very fast, the need for different skill sets is huge. We look for people who can learn new things and go to where the needs are,” McCarthy says. Marenghi adds, “We operate lean and efficient because we actually accomplish more that way and find we can be more efficient by having a team of very experienced, top-notch engineers who are experts in their areas. We’re very focused on talent density. People are surprised at how few engineers Netflix employs for the level of work they do.”
The question on so many developers’ minds right now is, “Do we develop native apps or cross-platform apps?” The Netflix story provides a clear example of a company achieving success with cross-platform HTML5. But that doesn’t mean it is the answer for everyone. Says Marenghi, “If you’re doing an app for one device and don’t have a need to frequently update it or to do A/B testing, of course you’d do native. We’re interested in bringing our service to as many devices as possible, and want those experiences to bring delight to our customers, but we also want the flexibility to rapidly innovate on them. We’re willing to sacrifice some polish that comes with a native implementation in order to innovate with minimal constraints.”
But there’s no debating that for many other platforms, such as the PS3, the cross-platform HTML5 approach delivers a fantastic experience. Further, Netflix has come up with a compelling way to side-step the cross-browser fragmentation that plagues so many of us. It has caused us to wonder if a project like PhoneGap ought to optionally bundle a WebKit instance and give developers the option to use a unified web runtime platform.
One thing’s for sure: we’re going to keep a close eye on Netflix moving forward. They’re a fantastic crew doing amazing things.