SlideShare uma empresa Scribd logo
1 de 22
Baixar para ler offline
TWITTER
SEARCH
By: Ramez Al-Fayez
TWITTER
User-generated content
­  140 characters called Tweet
­  Informal language, free-form
­  Diverse topics
­  Images, videos and links
­  SPAM L
Very high volume
Ø Information overload
2
“When you've got 5 minutes to fill,
Twitter is a great way to fill 35
minutes”
@mattcutts 
TWITTER STATS
3
2 BILLION
QUERIES PER DAY
230 MILLION
TWEETS PER DAY
< 10 S
INDEXING LATENCY
50 MS
AVG. QUERY
RESPONSE TIME
1 BILLION
REGISTERED USER
143,199
TWEETS PER SECONDS
4
WHAT TO SEARCH IN TWITTER?
­  Tweets
­  Images (Tweets that have images)
­  Users
­  News(Tweets that have links)
5
SEARCHING FOR “IPAD” ON TWITTER
6
More than 50 tweets
mentioning “iPad”
posted within
1-minute
CUSTOMIZED IR FOR TWITTER
Feature of Twitter’s IR
§ Modularity
§ Scalability
§ Cost effectiveness
§ Simple interface
§ Incremental development
7
CUSTOMIZED IR FOR TWITTER
The system consists four main parts
§ Batched data aggregation and preprocess pipeline
§ An inverted index builder;
§ Earlybird shards
§ Earlybird roots
8
CRAWLING TWITTER
 HoseBird API Client
	
  	
  Client	
  hosebirdClient	
  =	
  builder.build();	
  
StatusesFilterEndpoint	
  endpoint	
  =	
  new	
  StatusesFilterEndpoint();	
  
//	
  Optional:	
  set	
  up	
  some	
  followings	
  and	
  track	
  terms	
  
List<Long>	
  followings	
  =	
  Lists.newArrayList(1234L,	
  566788L);	
  
List<String>	
  terms	
  =	
  Lists.newArrayList("twitter",	
  "api");	
  
endpoint.followings(followings);	
  
endpoint.trackTerms(terms);	
  
INDEXING TWITTER
	
  	
  
In November 18, 2014 Twitter inc. announce that Twitter now
indexes every public Tweet since 2006
§ Temporal sharding: The Tweet corpus was first divided into multiple time tiers.
§ Hash partitioning: Within each time tier, data was divided into partitions based on a
hash function.
§ Earlybird: Within each hash partition, data was further divided into chunks called
Segments. Segments were grouped together based on how many could fit on each
Earlybird machine.
§ Replicas: Each Earlybird machine is replicated to increase serving capacity and
resilience
DATA AGGREGATION
11
§ Engagement aggregator: Counts the number of engagements for each Tweet in a
given day. These engagement counts are used later as an input in scoring each Tweet.
§ Aggregation: Joins multiple data sources together based on Tweet ID.
§ Ingestion: Performs different types of preprocessing — language identification,
tokenization, text feature extraction, URL resolution and more.
§ Scorer: Computes a score based on features extracted during Ingestion. For the
smaller historical indices, this score determined which Tweets were selected into the
index.
§ Partitioner: Divides the data into smaller chunks through our hashing algorithm. The
final output is stored into HDFS.
DATA AGGREGATION
12
INVERT INDEX
13
§ Segment partitioner: Groups multiple batches of preprocessed daily Tweet data from
the same partition into bundles. We call these bundles “segments.”
§ Segment indexer: Inverts each Tweet in a segment, builds an inverted index and
stores the inverted index into HDFS.
INVERT INDEX
14
SEARCH PROCESS
15
 Earlybirds shards:
­  The inverted index builders produced hundreds of inverted index segments. These segments
were then distributed to machines called Earlybirds. Since each Earlybird machine could
only serve a small portion of the full Tweet corpus, we had to introduce sharding
­  two-dimensional sharding scheme to distribute index segments onto serving Earlybirds
­  Multiple time tiers
­  Hash partitioning
­  Each Earlybird machine is replicated to increase serving capacity and resilience
 Earlybird roots:
­  The roots perform a two level scatter-gather as shown in the below diagram, merging
search results and term statistics histograms
SEARCH PROCESS
16
SEARCH PROCESS
17
RANKING
18
§ Different types of content are searched separately
§ Uniscores: used as a means to blend different content types into the search result
§ Score unification: Individual content is assigned a “raw” score, then converted into
uniscores
§ Burst: is used to filter out content types with low or no bursts. It’s also used to boost the
score of corresponding content types, as a feature for a multi-class classifier that
predicts the most likely content type for a query, and in additional components of the
ranking system.
RANKING
19
Search ranker chose News1 followed by Tweet1 so far and is presented with three candidatesTweet2,
User Group, and News2 to pick the content after Tweet1.
News2 has the highest uniscore but search ranker picks Tweet2, instead of News2 as we penalize change
in type between consecutive content by decreasing the score of News2 from 0.65 to 0.55, for instance
RANKING
20
Normalized image and news counts are matched to one of n=5 states : 1
average, 2 above, and 2 below. Matched states curves show a more
stable quantization of original sequence which has the effect of removal
of small noisy peaks
Query of “Photo” shows three sequences of number of
Tweets over eight 15 minute buckets from bucket 1 (2
hours ago) to 8 (most recent).
REFERENCES
§ Anirudh Todi, TSAR, a TimeSeries AggregatoR ,
https://blog.twitter.com/2014/tsar-a-timeseries-aggregator
§ Youngin Shin, New Twitter search results,
https://blog.twitter.com/2013/new-twitter-search-results
§ Yi Zhuang, Building a complete Tweet index,
https://blog.twitter.com/2014/building-a-complete-tweet-index
§ J. Kleinberg, Bursty and Hierarchical Structure in Streams, Proc. 8th ACM SIGKDD
Intl. Conf. on Knowledge Discovery and Data Mining, 2002
§ Brendan O'Connor, Michel Krieger, and David Ahn. 2010b. TweetMotif:
Exploratory search and topic summarization for Twitter. In Proc. of ICWSM
21
THANK YOU!

Mais conteúdo relacionado

Mais procurados

Multi cluster, multitenant and hierarchical kafka messaging service slideshare
Multi cluster, multitenant and hierarchical kafka messaging service   slideshareMulti cluster, multitenant and hierarchical kafka messaging service   slideshare
Multi cluster, multitenant and hierarchical kafka messaging service slideshareAllen (Xiaozhong) Wang
 
Scaling Slack - The Good, the Unexpected, and the Road Ahead
Scaling Slack - The Good, the Unexpected, and the Road AheadScaling Slack - The Good, the Unexpected, and the Road Ahead
Scaling Slack - The Good, the Unexpected, and the Road AheadC4Media
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberXiang Fu
 
Building ZingMe News Feed System
Building ZingMe News Feed SystemBuilding ZingMe News Feed System
Building ZingMe News Feed SystemChau Thanh
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsNitish Upreti
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Mayank Shrivastava
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)NAVER D2
 
Building real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyBuilding real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyKishore Gopalakrishna
 
From Postgres to ScyllaDB: Migration Strategies and Performance Gains
From Postgres to ScyllaDB: Migration Strategies and Performance GainsFrom Postgres to ScyllaDB: Migration Strategies and Performance Gains
From Postgres to ScyllaDB: Migration Strategies and Performance GainsScyllaDB
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
Data Warehousing and Analytics on Redshift and EMR
Data Warehousing and Analytics on Redshift and EMRData Warehousing and Analytics on Redshift and EMR
Data Warehousing and Analytics on Redshift and EMRAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services
 

Mais procurados (20)

Multi cluster, multitenant and hierarchical kafka messaging service slideshare
Multi cluster, multitenant and hierarchical kafka messaging service   slideshareMulti cluster, multitenant and hierarchical kafka messaging service   slideshare
Multi cluster, multitenant and hierarchical kafka messaging service slideshare
 
Scaling Slack - The Good, the Unexpected, and the Road Ahead
Scaling Slack - The Good, the Unexpected, and the Road AheadScaling Slack - The Good, the Unexpected, and the Road Ahead
Scaling Slack - The Good, the Unexpected, and the Road Ahead
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Building ZingMe News Feed System
Building ZingMe News Feed SystemBuilding ZingMe News Feed System
Building ZingMe News Feed System
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platforms
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Real time data quality on Flink
Real time data quality on FlinkReal time data quality on Flink
Real time data quality on Flink
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
BERT
BERTBERT
BERT
 
Edge architecture ieee international conference on cloud engineering
Edge architecture   ieee international conference on cloud engineeringEdge architecture   ieee international conference on cloud engineering
Edge architecture ieee international conference on cloud engineering
 
Building real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case studyBuilding real time analytics applications using pinot : A LinkedIn case study
Building real time analytics applications using pinot : A LinkedIn case study
 
From Postgres to ScyllaDB: Migration Strategies and Performance Gains
From Postgres to ScyllaDB: Migration Strategies and Performance GainsFrom Postgres to ScyllaDB: Migration Strategies and Performance Gains
From Postgres to ScyllaDB: Migration Strategies and Performance Gains
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Data Warehousing and Analytics on Redshift and EMR
Data Warehousing and Analytics on Redshift and EMRData Warehousing and Analytics on Redshift and EMR
Data Warehousing and Analytics on Redshift and EMR
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 

Destaque

Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Lucidworks
 
Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks EnterpriseImplementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks EnterpriseLucidworks (Archived)
 
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
 Click-through relevance ranking in solr &  lucid works enterprise - By Andrz... Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...lucenerevolution
 
PageRank - per capirlo un po' meglio
PageRank - per capirlo un po' meglioPageRank - per capirlo un po' meglio
PageRank - per capirlo un po' meglioMarco Dal Pozzo
 
Smart society in Smart cities: lezioni per la PA e per i professionisti dell'...
Smart society in Smart cities: lezioni per la PA e per i professionisti dell'...Smart society in Smart cities: lezioni per la PA e per i professionisti dell'...
Smart society in Smart cities: lezioni per la PA e per i professionisti dell'...Marco Dal Pozzo
 
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...Lucidworks
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...lucenerevolution
 
Hadoop at Bloomberg:Medium data for the financial industry
Hadoop at Bloomberg:Medium data for the financial industryHadoop at Bloomberg:Medium data for the financial industry
Hadoop at Bloomberg:Medium data for the financial industryMatthew Hunt
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Lucidworks
 
Evolution of The Twitter Stack
Evolution of The Twitter StackEvolution of The Twitter Stack
Evolution of The Twitter StackChris Aniszczyk
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrTrey Grainger
 
Lessons from Highly Scalable Architectures at Social Networking Sites
Lessons from Highly Scalable Architectures at Social Networking SitesLessons from Highly Scalable Architectures at Social Networking Sites
Lessons from Highly Scalable Architectures at Social Networking SitesPatrick Senti
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Lucidworks
 
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbAirbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbLucidworks
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling TwitterBlaine
 
Office 365 勉強会「いまさらきけない? SharePoint の基礎のキソ」
 Office 365 勉強会「いまさらきけない? SharePoint の基礎のキソ」 Office 365 勉強会「いまさらきけない? SharePoint の基礎のキソ」
Office 365 勉強会「いまさらきけない? SharePoint の基礎のキソ」Kazuhiko Nakamura
 
Twitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessonsTwitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessonsAditya Rao
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesLucidworks (Archived)
 
DirtyTooth: It´s only Rock'n Roll but I like it
DirtyTooth: It´s only Rock'n Roll but I like itDirtyTooth: It´s only Rock'n Roll but I like it
DirtyTooth: It´s only Rock'n Roll but I like itTelefónica
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M usersJongyoon Choi
 

Destaque (20)

Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks EnterpriseImplementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
 
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
 Click-through relevance ranking in solr &  lucid works enterprise - By Andrz... Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
 
PageRank - per capirlo un po' meglio
PageRank - per capirlo un po' meglioPageRank - per capirlo un po' meglio
PageRank - per capirlo un po' meglio
 
Smart society in Smart cities: lezioni per la PA e per i professionisti dell'...
Smart society in Smart cities: lezioni per la PA e per i professionisti dell'...Smart society in Smart cities: lezioni per la PA e per i professionisti dell'...
Smart society in Smart cities: lezioni per la PA e per i professionisti dell'...
 
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
 
Hadoop at Bloomberg:Medium data for the financial industry
Hadoop at Bloomberg:Medium data for the financial industryHadoop at Bloomberg:Medium data for the financial industry
Hadoop at Bloomberg:Medium data for the financial industry
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
 
Evolution of The Twitter Stack
Evolution of The Twitter StackEvolution of The Twitter Stack
Evolution of The Twitter Stack
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/Solr
 
Lessons from Highly Scalable Architectures at Social Networking Sites
Lessons from Highly Scalable Architectures at Social Networking SitesLessons from Highly Scalable Architectures at Social Networking Sites
Lessons from Highly Scalable Architectures at Social Networking Sites
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
 
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbAirbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling Twitter
 
Office 365 勉強会「いまさらきけない? SharePoint の基礎のキソ」
 Office 365 勉強会「いまさらきけない? SharePoint の基礎のキソ」 Office 365 勉強会「いまさらきけない? SharePoint の基礎のキソ」
Office 365 勉強会「いまさらきけない? SharePoint の基礎のキソ」
 
Twitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessonsTwitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessons
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User Preferences
 
DirtyTooth: It´s only Rock'n Roll but I like it
DirtyTooth: It´s only Rock'n Roll but I like itDirtyTooth: It´s only Rock'n Roll but I like it
DirtyTooth: It´s only Rock'n Roll but I like it
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
 

Semelhante a Twitter Search Architecture

[System design] Design a tweeter-like system
[System design] Design a tweeter-like system[System design] Design a tweeter-like system
[System design] Design a tweeter-like systemAree Oh
 
Page 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docxPage 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docxsmile790243
 
Twitter Timeline and Search Distributed System.pptx
Twitter Timeline and Search Distributed System.pptxTwitter Timeline and Search Distributed System.pptx
Twitter Timeline and Search Distributed System.pptxMd. Rakib Trofder
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
 
Twitter Sub-event Detection Project Presentation
Twitter Sub-event Detection Project PresentationTwitter Sub-event Detection Project Presentation
Twitter Sub-event Detection Project PresentationPallav Shah
 
Rob Procter
Rob ProcterRob Procter
Rob ProcterNSMNSS
 
Twitter analysis by Kaify Rais
Twitter analysis by Kaify RaisTwitter analysis by Kaify Rais
Twitter analysis by Kaify RaisAjay Ohri
 
Tweet segmentation and its application to named entity recognition
Tweet segmentation and its application to named entity recognitionTweet segmentation and its application to named entity recognition
Tweet segmentation and its application to named entity recognitionieeepondy
 
Twinder: A Search Engine for Twitter Streams
Twinder: A Search Engine for Twitter Streams Twinder: A Search Engine for Twitter Streams
Twinder: A Search Engine for Twitter Streams Ke Tao
 
IRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine OptimizationIRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine OptimizationIRJET Journal
 
Leverage Social Media Data with SAP Data Services
Leverage Social Media Data with SAP Data ServicesLeverage Social Media Data with SAP Data Services
Leverage Social Media Data with SAP Data ServicesMethod360
 
Working Of Search Engine
Working Of Search EngineWorking Of Search Engine
Working Of Search EngineNIKHIL NAIR
 

Semelhante a Twitter Search Architecture (20)

Twitter System Design
Twitter System DesignTwitter System Design
Twitter System Design
 
Jinchao demo v3
Jinchao demo v3Jinchao demo v3
Jinchao demo v3
 
[System design] Design a tweeter-like system
[System design] Design a tweeter-like system[System design] Design a tweeter-like system
[System design] Design a tweeter-like system
 
STACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSISSTACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSIS
 
Page 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docxPage 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docx
 
Twitter Timeline and Search Distributed System.pptx
Twitter Timeline and Search Distributed System.pptxTwitter Timeline and Search Distributed System.pptx
Twitter Timeline and Search Distributed System.pptx
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment Analysis
 
Jinchao demo v7
Jinchao demo v7Jinchao demo v7
Jinchao demo v7
 
Twitter Sub-event Detection Project Presentation
Twitter Sub-event Detection Project PresentationTwitter Sub-event Detection Project Presentation
Twitter Sub-event Detection Project Presentation
 
Rob Procter
Rob ProcterRob Procter
Rob Procter
 
README
READMEREADME
README
 
Twitter analysis by Kaify Rais
Twitter analysis by Kaify RaisTwitter analysis by Kaify Rais
Twitter analysis by Kaify Rais
 
EasyHashtag
EasyHashtagEasyHashtag
EasyHashtag
 
Tweet segmentation and its application to named entity recognition
Tweet segmentation and its application to named entity recognitionTweet segmentation and its application to named entity recognition
Tweet segmentation and its application to named entity recognition
 
Twinder: A Search Engine for Twitter Streams
Twinder: A Search Engine for Twitter Streams Twinder: A Search Engine for Twitter Streams
Twinder: A Search Engine for Twitter Streams
 
tweet segmentation
tweet segmentation tweet segmentation
tweet segmentation
 
IRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine OptimizationIRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine Optimization
 
Leverage Social Media Data with SAP Data Services
Leverage Social Media Data with SAP Data ServicesLeverage Social Media Data with SAP Data Services
Leverage Social Media Data with SAP Data Services
 
Working Of Search Engine
Working Of Search EngineWorking Of Search Engine
Working Of Search Engine
 
Scaling / optimizing search on netlog
Scaling / optimizing search on netlogScaling / optimizing search on netlog
Scaling / optimizing search on netlog
 

Mais de Ramez Al-Fayez

Process mining in business process management
Process mining in business process managementProcess mining in business process management
Process mining in business process managementRamez Al-Fayez
 
SECURITY REQUIREMENTS ENGINEERING: APPLYING SQUARE FRAMEWORK
SECURITY REQUIREMENTS ENGINEERING: APPLYING SQUARE FRAMEWORKSECURITY REQUIREMENTS ENGINEERING: APPLYING SQUARE FRAMEWORK
SECURITY REQUIREMENTS ENGINEERING: APPLYING SQUARE FRAMEWORKRamez Al-Fayez
 
Social networks and social media analysis in the context of the enterprise
Social networks and social media analysis in the context of the enterpriseSocial networks and social media analysis in the context of the enterprise
Social networks and social media analysis in the context of the enterpriseRamez Al-Fayez
 
IT strategic planning session
IT strategic planning sessionIT strategic planning session
IT strategic planning sessionRamez Al-Fayez
 

Mais de Ramez Al-Fayez (7)

Process mining in business process management
Process mining in business process managementProcess mining in business process management
Process mining in business process management
 
Solr Architecture
Solr ArchitectureSolr Architecture
Solr Architecture
 
Wcc elise features
Wcc elise featuresWcc elise features
Wcc elise features
 
SECURITY REQUIREMENTS ENGINEERING: APPLYING SQUARE FRAMEWORK
SECURITY REQUIREMENTS ENGINEERING: APPLYING SQUARE FRAMEWORKSECURITY REQUIREMENTS ENGINEERING: APPLYING SQUARE FRAMEWORK
SECURITY REQUIREMENTS ENGINEERING: APPLYING SQUARE FRAMEWORK
 
Maria DBMS
Maria DBMSMaria DBMS
Maria DBMS
 
Social networks and social media analysis in the context of the enterprise
Social networks and social media analysis in the context of the enterpriseSocial networks and social media analysis in the context of the enterprise
Social networks and social media analysis in the context of the enterprise
 
IT strategic planning session
IT strategic planning sessionIT strategic planning session
IT strategic planning session
 

Último

一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样ayvbos
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfJOHNBEBONYAP1
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasDigicorns Technologies
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查ydyuyu
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样ayvbos
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...kajalverma014
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsMonica Sydney
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查ydyuyu
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Roommeghakumariji156
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoilmeghakumariji156
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrHenryBriggs2
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolinonuriaiuzzolino1
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsMonica Sydney
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftAanSulistiyo
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdfMatthew Sinclair
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdfMatthew Sinclair
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.krishnachandrapal52
 

Último (20)

一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency Dallas
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolino
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 

Twitter Search Architecture

  • 2. TWITTER User-generated content ­  140 characters called Tweet ­  Informal language, free-form ­  Diverse topics ­  Images, videos and links ­  SPAM L Very high volume Ø Information overload 2 “When you've got 5 minutes to fill, Twitter is a great way to fill 35 minutes” @mattcutts 
  • 3. TWITTER STATS 3 2 BILLION QUERIES PER DAY 230 MILLION TWEETS PER DAY < 10 S INDEXING LATENCY 50 MS AVG. QUERY RESPONSE TIME 1 BILLION REGISTERED USER 143,199 TWEETS PER SECONDS
  • 4. 4
  • 5. WHAT TO SEARCH IN TWITTER? ­  Tweets ­  Images (Tweets that have images) ­  Users ­  News(Tweets that have links) 5
  • 6. SEARCHING FOR “IPAD” ON TWITTER 6 More than 50 tweets mentioning “iPad” posted within 1-minute
  • 7. CUSTOMIZED IR FOR TWITTER Feature of Twitter’s IR § Modularity § Scalability § Cost effectiveness § Simple interface § Incremental development 7
  • 8. CUSTOMIZED IR FOR TWITTER The system consists four main parts § Batched data aggregation and preprocess pipeline § An inverted index builder; § Earlybird shards § Earlybird roots 8
  • 9. CRAWLING TWITTER  HoseBird API Client    Client  hosebirdClient  =  builder.build();   StatusesFilterEndpoint  endpoint  =  new  StatusesFilterEndpoint();   //  Optional:  set  up  some  followings  and  track  terms   List<Long>  followings  =  Lists.newArrayList(1234L,  566788L);   List<String>  terms  =  Lists.newArrayList("twitter",  "api");   endpoint.followings(followings);   endpoint.trackTerms(terms);  
  • 10. INDEXING TWITTER     In November 18, 2014 Twitter inc. announce that Twitter now indexes every public Tweet since 2006 § Temporal sharding: The Tweet corpus was first divided into multiple time tiers. § Hash partitioning: Within each time tier, data was divided into partitions based on a hash function. § Earlybird: Within each hash partition, data was further divided into chunks called Segments. Segments were grouped together based on how many could fit on each Earlybird machine. § Replicas: Each Earlybird machine is replicated to increase serving capacity and resilience
  • 11. DATA AGGREGATION 11 § Engagement aggregator: Counts the number of engagements for each Tweet in a given day. These engagement counts are used later as an input in scoring each Tweet. § Aggregation: Joins multiple data sources together based on Tweet ID. § Ingestion: Performs different types of preprocessing — language identification, tokenization, text feature extraction, URL resolution and more. § Scorer: Computes a score based on features extracted during Ingestion. For the smaller historical indices, this score determined which Tweets were selected into the index. § Partitioner: Divides the data into smaller chunks through our hashing algorithm. The final output is stored into HDFS.
  • 13. INVERT INDEX 13 § Segment partitioner: Groups multiple batches of preprocessed daily Tweet data from the same partition into bundles. We call these bundles “segments.” § Segment indexer: Inverts each Tweet in a segment, builds an inverted index and stores the inverted index into HDFS.
  • 15. SEARCH PROCESS 15  Earlybirds shards: ­  The inverted index builders produced hundreds of inverted index segments. These segments were then distributed to machines called Earlybirds. Since each Earlybird machine could only serve a small portion of the full Tweet corpus, we had to introduce sharding ­  two-dimensional sharding scheme to distribute index segments onto serving Earlybirds ­  Multiple time tiers ­  Hash partitioning ­  Each Earlybird machine is replicated to increase serving capacity and resilience  Earlybird roots: ­  The roots perform a two level scatter-gather as shown in the below diagram, merging search results and term statistics histograms
  • 18. RANKING 18 § Different types of content are searched separately § Uniscores: used as a means to blend different content types into the search result § Score unification: Individual content is assigned a “raw” score, then converted into uniscores § Burst: is used to filter out content types with low or no bursts. It’s also used to boost the score of corresponding content types, as a feature for a multi-class classifier that predicts the most likely content type for a query, and in additional components of the ranking system.
  • 19. RANKING 19 Search ranker chose News1 followed by Tweet1 so far and is presented with three candidatesTweet2, User Group, and News2 to pick the content after Tweet1. News2 has the highest uniscore but search ranker picks Tweet2, instead of News2 as we penalize change in type between consecutive content by decreasing the score of News2 from 0.65 to 0.55, for instance
  • 20. RANKING 20 Normalized image and news counts are matched to one of n=5 states : 1 average, 2 above, and 2 below. Matched states curves show a more stable quantization of original sequence which has the effect of removal of small noisy peaks Query of “Photo” shows three sequences of number of Tweets over eight 15 minute buckets from bucket 1 (2 hours ago) to 8 (most recent).
  • 21. REFERENCES § Anirudh Todi, TSAR, a TimeSeries AggregatoR , https://blog.twitter.com/2014/tsar-a-timeseries-aggregator § Youngin Shin, New Twitter search results, https://blog.twitter.com/2013/new-twitter-search-results § Yi Zhuang, Building a complete Tweet index, https://blog.twitter.com/2014/building-a-complete-tweet-index § J. Kleinberg, Bursty and Hierarchical Structure in Streams, Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002 § Brendan O'Connor, Michel Krieger, and David Ahn. 2010b. TweetMotif: Exploratory search and topic summarization for Twitter. In Proc. of ICWSM 21