2. The Open Source…
Behind the Tweets
October 22, 2014 #twitterflight
3. Open source is everywhere!
On your phone, in your car… and within Twitter!
http://www4.mercedes-benz.com/manual-cars/ba/foss/content/en/assets/FOSS_licences.pdf
iOS: General->About->Legal->Legal Notices
Vine: General->About->Legal
8. Life of a Tweet
What open source technology do we use behind the scenes when we tweet?
tweet write fanout search batch fin
https://dev.twitter.com/rest/reference/post/statuses/update
Your first stop as a tweet: Twitter Front End (TFE)
A fancy reverse proxy for HTTP traffic built on the JVM
Handles authentication, rate limits and more!
Powered by the open source project Netty: http://netty.io
9. Netty at Twitter
Netty is an open source Java NIO framework
Used heavily at Twitter
Healthy adopter community:
http://netty.io/wiki/adopters.html
Cloudhopper sends billions of SMS messages
per month using Netty
https://github.com/twitter/cloudhopper-smpp
We contributed SPDY support to Netty:
http://netty.io/news/2012/02/04/3-3-1-spdy.html
*https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead
10. Life of a Tweet
What open source technology do we use behind the scenes when we tweet?
Twitter backend architecture is *service-oriented (on the JVM)
Core services are built on top of Finagle (using an API framework)
Finagle is written in Scala and built on top of Netty
https://github.com/twitter/finagle
*http://www.slideshare.net/InfoQ/decomposing-twitter-adventures-in-serviceoriented-architecture
11. Finagle at Twitter
Why Scala?
Scala enables succinct expression (vs Java)
Less typing is less reading; brevity enhances clarity
Two open source Scala/Finagle guides from Twitter:
https://twitter.github.io/effectivescala/
https://twitter.github.io/scala_school/
Finagle is our fault-tolerant, protocol-agnostic
RPC framework built on Netty
Emphasizes service modularity via async futures
Handles failover semantics, metrics, logging etc…
*https://blog.twitter.com/2014/netty-at-twitter-with-finagle
12. Finagle Service Example
import com.twitter.finagle.{Filter, Http, Service, Thrift}
import com.twitter.util.Future

// #1 Create a client for each service
val timelineSvc = Thrift.newIface[TimelineService](...)
val tweetSvc = Thrift.newIface[TweetService](...)
val authSvc = Thrift.newIface[AuthService](...)

// #2 Create a new Filter to authenticate incoming requests
// (Filter type parameters are ordered [ReqIn, RepOut, ReqOut, RepIn])
val authFilter = Filter.mk[Req, Res, AuthReq, Res] { (req, svc) =>
  authSvc.authenticate(req) flatMap svc(_)
}

// #3 Create a service to convert an authenticated timeline request to a JSON response
val apiService = Service.mk[AuthReq, Res] { req =>
  timelineSvc(req.userId) flatMap { tl =>
    val tweets = tl map tweetSvc.getById(_)
    Future.collect(tweets) map tweetsToJson(_)
  }
}

// #4 Start a new HTTP server on port 80 using the authenticating filter and our service
Http.serve(":80", authFilter andThen apiService)
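To make the "filter andThen service" idea concrete without a Finagle dependency, here is a minimal sketch that uses only the Scala standard library. The type aliases, the `andThen` helper, the `Req`/`AuthReq` case classes and the "user-<id>" token rule are all hypothetical stand-ins for illustration; they are not Finagle's actual types.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object FilterSketch {
  // A service is an asynchronous function from a request to a response.
  type Service[Req, Rep] = Req => Future[Rep]

  // A filter receives a request plus the next service, and may
  // transform the request before handing it on.
  type Filter[ReqIn, Rep, ReqOut] = (ReqIn, Service[ReqOut, Rep]) => Future[Rep]

  // Hypothetical request types (not from the slides).
  final case class Req(token: String)
  final case class AuthReq(userId: Long)

  // Composing a filter with a service yields a new service.
  def andThen[ReqIn, Rep, ReqOut](
      filter: Filter[ReqIn, Rep, ReqOut],
      service: Service[ReqOut, Rep]
  ): Service[ReqIn, Rep] = req => filter(req, service)

  // Hypothetical authentication rule: only tokens shaped like "user-<id>" pass.
  def authenticate(req: Req): Future[AuthReq] = {
    val parsed =
      if (req.token.startsWith("user-")) Try(req.token.stripPrefix("user-").toLong).toOption
      else None
    parsed match {
      case Some(id) => Future.successful(AuthReq(id))
      case None     => Future.failed(new SecurityException("invalid token"))
    }
  }

  // Authenticate first, then pass the authenticated request to the next service.
  val authFilter: Filter[Req, String, AuthReq] =
    (req, svc) => authenticate(req).flatMap(svc)

  // Stand-in for the timeline-to-JSON service.
  val apiService: Service[AuthReq, String] =
    authReq => Future.successful(s"""{"userId":${authReq.userId}}""")

  val server: Service[Req, String] =
    andThen[Req, String, AuthReq](authFilter, apiService)

  // Blocking helper for demonstration only.
  def call(token: String): String = Await.result(server(Req(token)), 5.seconds)
}
```

The point of the design is that authentication lives in one reusable filter: the service behind it only ever sees requests that already carry a user id, and a failed authentication short-circuits the call as a failed future.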
14. Life of a Tweet
What open source technology do we use behind the scenes when we tweet?
Tweets need to be stored somewhere (via a Finagle-based core service)
TBird: persistent storage for tweets
Built originally on Gizzard: https://github.com/twitter/gizzard
Tweets stored in sharded and replicated MySQL
TFlock: track relations between users and tweets
Built originally on FlockDB: https://github.com/twitter/flockdb
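As a rough illustration of "sharded and replicated" storage, the sketch below routes a tweet id to a shard through a range-based forwarding table and returns the replicas that should receive the write. The table contents, shard names and host names are made up for the example, and this is not Gizzard's actual API.

```scala
import scala.collection.immutable.TreeMap

object ShardingSketch {
  // A logical shard and the (hypothetical) MySQL hosts that replicate it.
  final case class Shard(name: String, replicas: List[String])

  // Forwarding table: lower bound of an id range -> shard responsible for it.
  // All values here are invented for illustration.
  val forwarding: TreeMap[Long, Shard] = TreeMap(
    0L    -> Shard("shard_a", List("mysql-a1", "mysql-a2")),
    1000L -> Shard("shard_b", List("mysql-b1", "mysql-b2")),
    2000L -> Shard("shard_c", List("mysql-c1", "mysql-c2"))
  )

  // Pick the shard whose range start is the greatest one not above the id.
  // A linear scan keeps the sketch simple; a real table would use a range lookup.
  def shardFor(tweetId: Long): Shard = {
    require(tweetId >= 0, "tweetId must be non-negative")
    forwarding.takeWhile { case (lowerBound, _) => lowerBound <= tweetId }.last._2
  }

  // A write goes to every replica of the owning shard.
  def writeTargets(tweetId: Long): List[String] = shardFor(tweetId).replicas
}
```

For example, `ShardingSketch.writeTargets(1500L)` selects `shard_b` and returns both of its replicas.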
15. MySQL at Twitter
We maintain a public fork of MySQL 5.5/5.6
Goal is to “work” with upstream
https://github.com/twitter/mysql
Co-founded the WebScaleSQL.org effort
17. Life of a Tweet
What open source technology do we use behind the scenes when we tweet?
When a tweet is generated it needs to be written to all relevant timelines
Timelines are essentially a list of tweet ids (heavily cached)
Fanout is the process where tweets are delivered to timelines
For caching we rely on the open source project Redis
https://github.com/antirez/redis
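Fanout as described above can be sketched in a few lines: push the new tweet id onto the head of every follower's timeline and trim the list to a fixed length. The in-memory `Map` stands in for the cache, and the follower map and the cap of 3 are assumptions made for the example; the production system is not shown on the slides.

```scala
object FanoutSketch {
  // Hypothetical cap on how many tweet ids a cached timeline keeps.
  val MaxTimelineLength = 3

  // timelines: user id -> newest-first list of tweet ids.
  // followers: author id -> ids of the users following that author.
  // Returns the timelines after delivering tweetId to every follower.
  def fanout(
      timelines: Map[Long, List[Long]],
      followers: Map[Long, Set[Long]],
      authorId: Long,
      tweetId: Long
  ): Map[Long, List[Long]] =
    followers.getOrElse(authorId, Set.empty[Long]).foldLeft(timelines) { (acc, followerId) =>
      // Prepend the new id and drop the oldest entries beyond the cap.
      val updated = (tweetId :: acc.getOrElse(followerId, Nil)).take(MaxTimelineLength)
      acc.updated(followerId, updated)
    }
}
```

Because timelines hold only ids, a delivery is a cheap list prepend per follower; the tweet bodies are fetched separately when a timeline is read.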
18. Redis at Twitter
Redis is used for caching timelines and more!
Added custom logging, data structures
We are working to upstream some changes…
@thinkingfish gave a fantastic talk on this:
https://www.youtube.com/watch?v=rP9EKvWt0zo
Open Source Proxy for Redis: Twemproxy
https://github.com/twitter/twemproxy
Used by Vine, Pinterest, Wikimedia, Snapchat etc…
20. Life of a Tweet
What open source technology do we use behind the scenes when we tweet?
Everyone searches for tweets: https://dev.twitter.com/rest/public/search
In fact, one of the most heavily trafficked search engines in the world
Back in the day, Twitter search was built on MySQL
Today, Twitter search is an optimized real-time search/indexing technology
Powered by Apache Lucene: http://lucene.apache.org
21. Lucene (earlybird) at Twitter
Earlybird* is Twitter’s real-time search engine
built on top of Apache Lucene
We optimized Lucene (cut corners) to handle
tweets only, since that’s all we do
e.g., less space: a term position within a 140-character tweet fits in 8 bits
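To show why an 8-bit position saves space, here is a sketch that packs a document id and a term position into a single 32-bit integer. The 24-bit/8-bit split is an assumption based on the Earlybird paper cited on this slide; the function names are hypothetical and this is not Lucene or Earlybird code.

```scala
object PostingSketch {
  // Assumed layout: upper 24 bits = document id, lower 8 bits = term position.
  val PositionBits = 8
  val PositionMask = (1 << PositionBits) - 1 // 0xFF
  val MaxDocId = (1 << 24) - 1

  // Pack both values into one Int.
  def pack(docId: Int, position: Int): Int = {
    require(docId >= 0 && docId <= MaxDocId, "docId must fit in 24 bits")
    require(position >= 0 && position <= PositionMask, "position must fit in 8 bits")
    (docId << PositionBits) | position
  }

  // Unsigned shift so that large doc ids are recovered correctly.
  def docId(posting: Int): Int = posting >>> PositionBits

  def position(posting: Int): Int = posting & PositionMask
}
```

A position in a 140-character tweet is at most 139, which is below 256, so one byte is enough and a whole posting fits in a single `Int`.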
Read about Blender, our search front-end
https://blog.twitter.com/2011/twitter-search-now-3x-faster
*http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
23. Life of a Tweet
What open source technology do we use behind the scenes when we tweet?
Hadoop is used for many things at Twitter, like counting words :)
Scribe logs, batch processing, recommendations, trends, user modeling and more!
10,000+ Hadoop servers, 100,000+ daily Hadoop jobs, 10M+ daily Hadoop tasks
Parquet is a columnar storage format for Hadoop
https://parquet.incubator.apache.org
Scalding is our Scala DSL for writing Hadoop jobs
https://github.com/twitter/scalding
24. Parquet/Scalding at Twitter
Parquet* is a columnar storage format
Initially a collaboration between Twitter/Cloudera
Inspired by Google Dremel paper**
Now at Apache: http://parquet.incubator.apache.org/
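The idea behind a columnar format can be sketched with plain Scala collections: instead of storing each record together, store every field in its own array, so a query that needs one field reads only that column. The `TweetRow` record and its fields are invented for the example, and this shows the layout idea only, not Parquet's file format or API.

```scala
object ColumnarSketch {
  // Row-oriented view: one record per tweet (hypothetical fields).
  final case class TweetRow(id: Long, userId: Long, lang: String)

  // Column-oriented view: one vector per field, aligned by index.
  final case class TweetColumns(ids: Vector[Long], userIds: Vector[Long], langs: Vector[String])

  // Pivot rows into columns.
  def toColumns(rows: Seq[TweetRow]): TweetColumns =
    TweetColumns(
      rows.map(_.id).toVector,
      rows.map(_.userId).toVector,
      rows.map(_.lang).toVector
    )

  // An aggregation over one field touches only that column.
  def countByLang(columns: TweetColumns): Map[String, Int] =
    columns.langs.groupBy(identity).map { case (lang, values) => lang -> values.size }
}
```

Values in a single column also tend to be similar to each other, which is one reason columnar layouts usually compress well.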
Scalding built on top of Scala and Cascading
https://github.com/Cascading/cascading
Makes it easier* to write Hadoop jobs (using Scala)
*https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop
25. Scalding Example
import com.twitter.scalding._

// can’t have a Hadoop example without word count!
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    // split each line on whitespace and emit one 'word per token
    .flatMap('line -> 'word) { line: String => line.split("""\s+""") }
    // count the occurrences of each word
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}
https://github.com/twitter/scalding/wiki/Rosetta-Code
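For readers without a Hadoop cluster at hand, the same computation can be sketched on in-memory collections using only the Scala standard library. This is an illustration of what the job computes, not Scalding code; unlike the job above, it also drops the empty token that a leading space would produce.

```scala
object WordCountSketch {
  // Split every line on whitespace, then count how often each word occurs.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\s+"""))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

The shape mirrors the Scalding pipeline: `flatMap` produces words, `groupBy` gathers equal words, and the final `map` plays the role of `_.size`.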
27. Sharing is caring, contribute!
Let’s all make Twitter better!
opensource.twitter.com https://github.com/twitter
28. New Open Source API Samples
Hack on the samples and improve them!
https://github.com/twitterdev (t.co/code)
Also, later today, check out
the lightning talk by Andrew
Noonan on
“Twitter’s developer toolbox”
30. Q&A
The Open Source Behind the Tweets
http://opensource.twitter.com
Hope you learned something new!
Come see us at the @TwitterOSS Booth!
Chris Aniszczyk (@cra)