5. Pseudo-Code for Crawler Manager
• Begin infinite loop
– For each messageBoard in List
• crawlAll
– End For loop
• End infinite loop
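The manager loop above can be sketched in Python as follows. This is a minimal sketch, not the original implementation. The names `run_crawler_manager` and `crawl_all` are hypothetical, and `crawl_all` stands in for crawlAll. The `max_cycles` and `pause_seconds` parameters are also hypothetical. `max_cycles` exists only so the otherwise infinite loop can be bounded, for example in tests.

```python
import time
from typing import Callable, Iterable, Optional


def run_crawler_manager(
    message_boards: Iterable[str],
    crawl_all: Callable[[str], None],   # hypothetical stand-in for crawlAll
    max_cycles: Optional[int] = None,   # hypothetical: None means loop forever
    pause_seconds: float = 0.0,         # hypothetical delay between full passes
) -> int:
    """Repeatedly call crawl_all on every message board; return the passes completed."""
    boards = list(message_boards)
    cycles = 0
    # "Begin infinite loop" (bounded only when max_cycles is given)
    while max_cycles is None or cycles < max_cycles:
        # "For each messageBoard in List"
        for board in boards:
            crawl_all(board)
        cycles += 1
        if pause_seconds:
            time.sleep(pause_seconds)
    return cycles
```

Error handling is omitted. As written, an exception raised for one board would end the whole loop, so wrapping each `crawl_all` call in `try`/`except` is a likely refinement.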
6. High-Level Crawler Strategy
• Failed messages are persisted
• Message markers (right-hand side labels) are persisted
• Algorithm prevents crawling duplicate messages
[Figure: message-ID range from Highest Message Id to Lowest Message Id. Labels: Old Message Threshold, Oldest Message Crawled, Last Successful crawl, Last Successful message extracted, Newest Message, Newly Crawled Messages, Old successful Crawled Messages, Old Messages Yet to be Crawled, Messages from Crash.]
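One way to persist the failed messages and the message markers, and to use them to skip duplicates, is sketched below. It assumes that message IDs are integers that increase over time and that the crawled region forms a single contiguous ID range. The `CrawlState` class, its field names, and the JSON file format are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional, Set


@dataclass
class CrawlState:
    """Hypothetical persisted state for one message board."""
    failed_ids: Set[int] = field(default_factory=set)  # failed messages, retried later
    newest_crawled_id: Optional[int] = None            # upper marker of the crawled range
    oldest_crawled_id: Optional[int] = None            # lower marker of the crawled range

    def already_crawled(self, msg_id: int) -> bool:
        """True if msg_id lies inside the crawled range and did not fail (a duplicate)."""
        if self.newest_crawled_id is None or self.oldest_crawled_id is None:
            return False
        return (self.oldest_crawled_id <= msg_id <= self.newest_crawled_id
                and msg_id not in self.failed_ids)

    def save(self, path: Path) -> None:
        """Write the state as JSON; sets are stored as sorted lists."""
        data = asdict(self)
        data["failed_ids"] = sorted(self.failed_ids)
        path.write_text(json.dumps(data))

    @classmethod
    def load(cls, path: Path) -> "CrawlState":
        """Read the state back, or start empty if nothing was saved yet."""
        if not path.exists():
            return cls()
        data = json.loads(path.read_text())
        data["failed_ids"] = set(data["failed_ids"])
        return cls(**data)
```

Keeping only two range markers plus a set of failures keeps the saved state small. However, it depends on the contiguous-range assumption. Boards whose IDs are not monotonic would need an explicit set of crawled IDs.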
7. Crawler Strategy Algorithm
• Crawl all previously failed messages
• Crawl ‘crashed messages’
• Crawl new messages
• Crawl new failed messages
• Crawl old messages
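The five steps above can be sketched as an ordered pipeline. This sketch rests on several assumptions. The caller supplies the ID list for each phase. `fetch` is a hypothetical callback that returns True on success. "New failed messages" is read as one retry of the messages that failed during the new-message pass. The names `crawl_ids` and `run_strategy` are also hypothetical.

```python
from typing import Callable, Iterable, Set

# Hypothetical callback: returns True if the message with this ID was crawled successfully.
Fetch = Callable[[int], bool]


def crawl_ids(ids: Iterable[int], fetch: Fetch) -> Set[int]:
    """Attempt each ID once, in order, and return the IDs that failed."""
    return {msg_id for msg_id in ids if not fetch(msg_id)}


def run_strategy(
    previous_failed: Iterable[int],
    crashed: Iterable[int],
    new: Iterable[int],
    old: Iterable[int],
    fetch: Fetch,
) -> Set[int]:
    """Run the five phases in order; return IDs still failing (to persist for the next run)."""
    still_failed: Set[int] = set()
    # 1. Crawl all previously failed messages.
    still_failed |= crawl_ids(sorted(previous_failed), fetch)
    # 2. Crawl 'crashed messages' (assumed: IDs left unfinished by an earlier crash).
    still_failed |= crawl_ids(crashed, fetch)
    # 3. Crawl new messages.
    new_failed = crawl_ids(new, fetch)
    # 4. Crawl new failed messages (assumed: one retry of the phase-3 failures).
    still_failed |= crawl_ids(sorted(new_failed), fetch)
    # 5. Crawl old messages.
    still_failed |= crawl_ids(old, fetch)
    return still_failed
```

The returned set is what would be persisted as the "failed messages" for the next run. Step 1 then picks it up again on the next pass of the manager loop.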