Last Friday (February 8th), I spoke at the Intelligent Content Conference 2013. When Scott Abel (aka The Content Wrangler) first contacted me to speak at the event, he asked me to speak about my content management and distribution experiences from both NPR and Netflix. The two experiences seemed to him to be an interesting blend for the conference. These are the slides from that presentation.
I have added comments to every slide in this presentation to capture the context that I otherwise provided verbally during the talk.
41. There has to be a better tool for the job than the One-Size-Fits-All REST API
42.
43. [Diagram: devices outside the network border making many calls through a One-Size-Fits-All REST API to backend services: Start-up, A/B Tests, Member Data, Recommendations, Movie Data, Similar Movies, Auth, Ratings]
44. [Diagram: the same backend services, now reached through an Optimized API at the network border]
45. Recipe for Optimized APIs
API providers that have:
• a small number of targeted API consumers
• very close relationships with those API consumers
• increasing divergence of needs across these API consumers
• a strong desire for optimization by the API consumers
• a high value proposition for optimized APIs
46. Recipe for Optimized APIs
API providers that have:
• a small number of targeted API consumers
• very close relationships with those API consumers
• increasing divergence of needs across these API consumers
• a strong desire for optimization by the API consumers
• a high value proposition for optimized APIs
• a generous helping of chocolate (to keep engineers happy)
50. [Diagram: NPR's early architecture. Static, non-reusable data; redundant data in redundant applications across the two sites; separate SQL Server and Informix databases; disparate technologies and disparate skill sets; a shared editorial staff]
80. Daniel Jacobson
Director of Engineering, Netflix API
djacobson@netflix.com
@daniel_jacobson
http://www.linkedin.com/in/danieljacobson
http://www.slideshare.net/danieljacobson
Editor's Notes
Content Management tends to be a bucket in which everything related to content gets thrown. It ranges from blob-based applications like blogging tools to comprehensive systems that maintain strong structure and separation of content. I prefer to think about it in terms of three discrete operations that these systems should perform: Capture, Storage and Distribution. All three operations require equal attention and are equally important for the success of content strategies. This presentation, however, will mainly focus on Distribution and how to set content free.
There are many forms of caged content, including content embedded in static HTML files, in flash or other binary objects, in legacy databases or otherwise trapped in systems without mechanisms for sharing it. They are like a bird in a cage, desperate for its release. For an effective content strategy, all of that content needs to be liberated.
When talking about liberating content, most people gravitate towards “open” approaches, such as Open/Public APIs, RSS, etc. They think the bird has wings, so it should be able to fly anywhere in the world. Those approaches are viable in some scenarios and should not be discounted. But that isn’t always practical or necessary. And in fact, the value of opening content to the entire world is rarely transformational.
Rather, a more focused liberation strategy can be effective and often transformational. The transformation is in letting the bird out of its cage, but keeping it in the house. This enables the content to be extremely portable so it can satisfy any business/use case while allowing you to leverage it for all of your internal needs. This doesn’t preclude opportunities to let the bird into the world, but I want to be clear that the biggest value proposition is often just getting it outside of the cage.
The following slides will go through two use cases where content liberation has transformed the business: Netflix (my current employer) and NPR (my previous employer). They have different paths towards liberation, but the conclusions are similar.
Netflix is focused on being the best, global Internet streaming video provider.
We now have more than 33 million global subscribers in more than 50 countries and territories.
Those subscribers consume more than a billion hours of streaming video a month which, according to Sandvine, accounts for 33% of the peak Internet traffic in the US.
And we are now producing a fleet of original series, getting released throughout 2013, starting with House of Cards (released on February 1st).
All 33 million of Netflix’s subscribers are watching shows (like House of Cards) and movies on virtually any device that has a streaming video screen. We are now on more than 800 different device types.
All of this started, however, with the launch of streaming in 2007. At the time, we were only streaming to computer-based players (i.e., no devices, mobile phones, etc.).
At that stage, the bird was still in the cage. We had lots of systems that knew how to share with each other, but the content wasn’t particularly portable.
Shortly after streaming launched, in 2008, we launched our REST API. I describe it as a One-Size-Fits-All (OSFA) type of implementation because the API itself sets the rules and requires anyone who interfaces with it to adhere to those rules. Everyone is treated the same.
The OSFA REST API launched to support the 1,000 flowers model. That is, we would plant the seeds in the ground (by providing access to our content) and see what flowers sprout up in the myriad fields throughout the US. The 1,000 flowers are public API developers.
At this point, the bird has been set free to the world!
And at launch, the API was exclusively targeted towards and consumed by the 1,000 flowers (i.e., external developers).
Some examples of the flowers…
But as streaming gained more steam…
The API evolved to support more of the devices that were getting built. The 1,000 flowers were still supported as well, but as the devices ramped up, they became a bigger focus.
And organizational and system changes were taking place as well. The many devices were now calling into the API, and the API was drawing its content from an ever-growing distributed system of dependencies.
Meanwhile, the balance of requests by audience had completely flipped. Overwhelmingly, the majority of traffic was coming from Netflix-ready devices, and a shrinking percentage was from the 1,000 flowers. Today, the 1,000 flowers account for less than 0.1% of the API traffic.
With this transformation, the bird is still free, but it is mostly flying in the house (with occasional adventures into the world).
As the audience of the API changed, so did its use cases. We started to realize that the original design for the API was not as effective as it could be in satisfying the newer, more complicated and more business-critical users (the device UI teams). We began inspecting the various ways in which the system was creating problems for us so we could create a more effective design. The following are some of the areas in which emerging problems were surfacing.
The first thing we looked at was the scale of the system and how chatty the API was.
With the adoption of the devices, API traffic took off! We went from about 600 million requests per month to about 42 BILLION requests in just two years.
That kind of growth and those kinds of numbers seem great. And it was great as it indicated that our subscriber counts were increasing, the number of devices supported by the API was growing, our subscribers were getting richer experiences on those devices and they were also spending more time interfacing with them. Who wouldn’t want those numbers, right?
Especially if you are an organization like NPR serving web pages that have ads on them. Each one of those requests creates impressions for the ads, which translates into revenue (and potentially increased CPMs at those levels).
But the API traffic is not serving ads. Rather, we are delivering documents like this, in the form of XML…
Or like this, in the form of JSON.
Growth in traffic, especially if it were to continue at this rate, does not directly translate into revenue. Instead, it is more likely to translate into costs. Supporting massive traffic requires major infrastructure to handle the load, expenses in delivering the bits, engineering costs to build and support more complex systems, etc.
So our first realization was that we could potentially significantly reduce the chattiness between the devices and the API while maintaining the same or better user experience. Rather than handling 2 billion requests per day, could we have the same UI at 300 million instead? Or less? Could having more optimized delivery of the metadata improve the performance and experience for our customers as well?
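The chattiness reduction described above can be sketched in a few lines. This is a hypothetical illustration, not Netflix's actual code: the endpoint paths and the `FakeClient` stand-in are invented for the example.

```python
class FakeClient:
    """Stand-in for an HTTP client so the sketch is runnable."""
    def __init__(self):
        self.calls = 0

    def get(self, path):
        self.calls += 1
        return {"path": path}  # a real client would return parsed JSON


def fetch_home_chatty(client, user_id):
    """OSFA style: the device assembles one screen from many small calls."""
    return [
        client.get("/users/%s" % user_id),
        client.get("/users/%s/queue" % user_id),
        client.get("/users/%s/recommendations" % user_id),
        client.get("/users/%s/ratings" % user_id),
    ]


def fetch_home_batched(client, user_id):
    """Optimized style: one request returns everything the screen needs."""
    return client.get("/users/%s/home" % user_id)
```

Collapsing four round trips into one per screen is how a billion-request-per-day workload could, in principle, shrink by an order of magnitude without changing what the user sees.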
With more than 800 different device types that we were supporting, we learned that the variability across them can also play a role in some of that chattiness. Different devices have different characteristics and capabilities that could influence the interaction model with the API.
For example, screen size could significantly affect what the API should deliver to the UI. TVs with bigger screens can potentially fit more titles, and more metadata per title, than a mobile phone. Do we need to send all of the extra bits for fields or items that are not needed, requiring the device itself to drop them on the floor? Or can we optimize the delivery of those bits on a per-device basis?
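Per-device payload trimming might look like the sketch below. The device profiles and field names are invented for illustration; the point is only that each device receives the fields it can actually display.

```python
# Hypothetical per-device field profiles: bigger screens get more
# metadata per title, smaller ones only what they can show.
DEVICE_FIELDS = {
    "tv": {"title", "synopsis", "box_art_large", "cast"},
    "phone": {"title", "box_art_small"},
}


def trim_for_device(titles, device):
    """Drop the fields this device would otherwise discard on the floor."""
    wanted = DEVICE_FIELDS[device]
    return [{k: v for k, v in t.items() if k in wanted} for t in titles]
```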
Different devices have different controlling functions as well. For devices with swipe technologies, such as the iPad, do we need to pre-load a lot of extra titles in case a user swipes the row quickly to see the last of 500 titles in their queue? Or for up-down-left-right controllers, would devices be more optimized by fetching a few items at a time when they are needed? Other devices support voice or hand gestures or pointer technologies. How might those impact the user experience and therefore the metadata needed to support them?
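The input-model question above could be answered by tuning fetch size per device. A minimal sketch, with invented page-size values, assuming swipe devices pre-load a large window while up-down-left-right controllers fetch a few items at a time:

```python
# Hypothetical tuning values, not real Netflix settings.
PAGE_SIZE = {"swipe": 50, "dpad": 5}


def next_page(queue, device_input, offset=0):
    """Return the slice of the queue this device should fetch next."""
    size = PAGE_SIZE[device_input]
    return queue[offset:offset + size]
```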
The technical specs on these devices differ greatly. Some have significant memory space while others do not, impacting how much data can be handled at a given time. Processing power and hard-drive space could also play a role in how the UI performs, in turn potentially influencing the optimal way for fetching content from the API.
Finally, the OSFA model also seemed to slow the innovation rate of our various UI teams (as well as the API team itself). This became one of the most important considerations in our research.
Many UI teams needing metadata means many requests to the API team. In the OSFA world, we essentially needed to funnel these requests and then prioritize them. That meant that some teams would need to wait for API work to be done. It also meant that, because they all shared the same endpoints, we were often adding variations to the endpoints, resulting in a more complex system as well as a lot of spaghetti code. Making teams wait due to prioritization was exacerbated by the fact that tasks took longer as the technical debt increased, causing build and test times to grow. Moreover, many of the incoming requests were asking us to do more of the same kinds of customizations. This created a spiral that would be very difficult to break out of…
So, after months of investigation, discussions, diagramming, etc., we set a path for our solution.
Ultimately, we determined that there definitely is a better way than adhering to the industry standard of an OSFA REST API. We don’t care about adhering to industry-standard practices, such as building a REST-based API…
What we care about is our audience! That is, we care about how to best serve the UI teams who in turn care about how to best serve Netflix subscribers. We will build systems that best serve them regardless of what that means for various standards or industry best practices.
So, if our OSFA REST API interaction model looked roughly like this (where a device would have a set of rules it needed to adhere to and would make a series of calls to the API based on those rules to create the UI)…
We would alter the interaction model substantially by allowing the device to make a single call back to an optimized API for that device (we are calling it a client adapter). That adapter will then fetch the data from the generic API and then format and deliver the document back to the device itself. This decouples the act of data gathering from data formatting/delivery. The gathering would be done by the API system that we support, while the formatting/delivery would be done in the adapter by the UI teams, all before delivering the payload back to the device itself for rendering. For more details on the new system, you can learn more at http://techblog.netflix.com/search/label/api.
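The adapter model can be sketched as follows: a shared gathering layer collects raw data once, and each UI team owns a thin adapter that reshapes it for its device. All names and data shapes here are hypothetical, invented for illustration.

```python
def gather(user_id):
    """Generic API layer: fetches raw data from backend dependencies.
    (In a real system, user_id would drive per-member lookups.)"""
    return {
        "queue": [
            {"title": "House of Cards", "synopsis": "A political drama."},
            {"title": "Lilyhammer", "synopsis": "A mobster in Norway."},
        ],
    }


def tv_adapter(user_id):
    """TV adapter: big screen, so keep titles and synopses."""
    data = gather(user_id)
    return {"rows": data["queue"]}


def phone_adapter(user_id):
    """Phone adapter: small screen, so deliver titles only."""
    data = gather(user_id)
    return {"items": [t["title"] for t in data["queue"]]}
```

The design choice is that `gather` stays generic and centrally supported, while the adapters are cheap for UI teams to change, so one team's formatting needs no longer block another's.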
This design is not for everyone. Here is a recipe for those to whom something like this could apply…
And don’t forget heaps of chocolate for your engineers!
The NPR story has a few differences in how it evolved, but the importance of the bird in the house is about the same.
In the first three years of my employment at NPR (1999-2002), there were four significant redesigns. Some of them were significant visual redesigns, but some of the changes were also in the underpinnings. For example, early on the content was basically just titles and links to audio for a story. Throughout those three years, we slowly introduced new data elements to enrich those stories. The final one here shows more images and links out to pages, where the earlier ones were mainly Listen links.
Meanwhile, the “systems” that supported these sites were very immature. There were static HTML files plus two distinct applications/databases that supported different areas of the system. A non-trivial amount of the content in the two database applications was the same…
At this stage, the bird was very much in the cage! We reviewed these problems and determined that a site and system redesign was necessary. First, we needed to collapse the databases into one clean and modular database while eliminating as many of the HTML files as possible (in favor of a database-driven system). Meanwhile, the evolution of the site in earlier years told us that presentation layers would constantly be changing, so we needed to make sure the capturing and managing of content was separated from any presentation layer that would emerge.
That is when we came up with COPE: Create Once, Publish Everywhere.
The key philosophies that support COPE drove our redesign. Web publishing tools focus on capturing content for a display. That is why many of them have big blob fields where image links and other elements are inline with the text. Our goal was not to focus on publishing. Rather, we needed to manage content independent of where it will live, which takes us to ensuring that the managing of content is separated completely from its display. Next was content modularity, which ensures that we break up all blobs into discrete fields, each with a very specific and focused purpose. Portability, which emerged over time, ensures that the content can live effectively wherever it needs to go. Portability started with multiple title lengths to allow different copy to appear in different-sized locations (the first example was having the short title appear in a narrow column on PBS.org). But it also included getting rid of dirty markup, such as <i> tags that could not be rendered by iPods.
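A COPE-style story record might look like the sketch below: content lives in discrete, presentation-neutral fields (note the two title lengths), and each destination picks only the fields it needs. The field names are invented for illustration, not NPR's actual schema.

```python
# A story stored as modular fields, free of display markup.
story = {
    "title": "A Space Voyage To Genesis",
    "short_title": "Space Voyage",
    "teaser": "Tracing the origins of the universe.",
    "body": "Full story text, stored without display markup.",
    "audio": {"filename": "genesis.mp3", "title": "Genesis Voyage"},
}


def narrow_column_headline(story):
    """A narrow partner-site column (e.g. on PBS.org) uses the short title."""
    return story["short_title"]


def audio_player_title(story):
    """The audio player pulls its title from the audio record, not the story."""
    return story["audio"]["title"]
```

Because every destination is just another reader of the same fields, a redesign only touches the rendering functions, never the stored content.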
This is an architecture diagram that we created years later, after the API was built in to create a further level of abstraction between content and display. This architecture, no doubt, has changed since then.
But the goal remains the same… Capture content from many different points (human editors, feeds from internal and external sources, etc), store that content in a clean, modular and portable way. Then publish to many destinations.
In terms of the NPR API, the first use for it was an internal one to support areas of NPR.org. In that sense, the bird was clearly out of the cage, but that particular use case had the bird exclusively in the house.
As the API took flight, however, its reach became quite impressive. The following are just a few examples of where a single story had been published.
This is what the story, “A Space Voyage To Genesis,” looks like in our content capture application. It is important to note that in terms of our system, this capture application is just another presentation layer. I have highlighted the title in this and other slides where it may be tougher to find.
Once in the system, it becomes available through the NPR API.
On NPR.org’s story page.
On the NPR.org tablet site.
In the NPR.org audio player. Notice, the title here is different because it draws from a different field in this location. In the audio player, the title gets pulled from the audio file name rather than the story title (a demonstration of the modularity of the data and our ability to have it be context-driven).
On the NPR News iPhone App.
On the mobile site.
On the NPR Android App, which was originally built (without our involvement) by a Google employee with his 20% time, getting the data from the NPR public API. We eventually caught up with him and adopted his work, which became the official NPR Android app.
NPR Addict is another app built entirely by a public developer using the NPR public API.
The same story on KQED.org, one of the San Francisco NPR member stations.
On WBUR.org, one of the Boston NPR member stations.
And of course, this and other stories can make their way to virtually any other location, including IP radios, news aggregators, connected cars, other devices, etc.
At this point, the bird is completely free, although it does spend most of its time in the house.
And as the frequent redesigns noted earlier suggest, this major site redesign happened in 2009. The changes were visually very different, but they also required substantial changes to the capture, storage and distribution channels in our content management efforts.
Prior to then, in 2008, similar efforts were made to support the NPR Music site.
And then much more recently, a new story page was created. All of these design efforts, while substantial, would be much more difficult without COPE.
So, looking at the API request distribution for NPR, it is now obviously very skewed towards internal NPR consumption.
Which looks very similar, although not as extreme, as the Netflix distribution.
To further emphasize the point that birds in the house are more valuable than in the open world, other companies like Evernote see a very similar distribution.
As does The Guardian in the UK.
My final point will be another analogy: APIs are like icebergs. The tip, which is highly visible and gets a lot of attention, equates to public APIs. They are the APIs that you hear about in the news, check out on developer portals, experiment with at hackathons, etc. But they are very much the smallest part of the iceberg in virtually every way. The big mass under the water that you don’t see represents internal or private APIs. They are substantially larger in terms of impact for most companies, amount of traffic, and in the pure number of such APIs that companies rely on. As a result, it is important to understand your audience and to target your most impactful segments. In most cases, that will likely be internal developers…