Each log collection machine runs Nginx. The Nginx access logs are processed by Flume, bucketed by beacon type, partitioned by hour, and stored on HDFS.
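As a rough illustration of "bucketed by beacon type, partitioned by hour", the sketch below maps a beacon type and timestamp to a directory. The `/beacons` root, the `type/YYYY/MM/DD/HH` layout, and the `playback` beacon type are assumptions made for the example; the actual HDFS layout is not given in the talk.

```python
from datetime import datetime

def bucket_path(beacon_type: str, ts: datetime, root: str = "/beacons") -> str:
    """Return a hypothetical directory for one beacon type and the hour containing ts."""
    # Minutes and seconds are dropped, so every beacon in the same hour shares a directory.
    return f"{root}/{beacon_type}/{ts.strftime('%Y/%m/%d/%H')}"

print(bucket_path("playback", datetime(2013, 4, 1, 0, 0, 0)))
```

With a layout like this, an hourly job only needs to read the one directory for its beacon type and hour.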
The majority of our MapReduce jobs follow the same pattern:
• Select a set of dimensions that we are concerned about
• Clean up any incomplete/malformed beacons
• Perform some lookups against metadata tables (for example, mapping a video id to a show name)
• Group by the selected dimensions and aggregate on some attribute (for example, the number of minutes watched)
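The job pattern described above can be sketched in plain Python, without any MapReduce framework. The field names (`video_id`, `device`, `minutes`), the `VIDEO_TO_SHOW` table, and the `run_job` function are all hypothetical stand-ins chosen for illustration, not Hulu's actual schema or code.

```python
from collections import defaultdict

# Hypothetical metadata table: video id -> show name.
VIDEO_TO_SHOW = {101: "Show A", 102: "Show B"}

def run_job(beacons, dimensions=("show", "device")):
    """Clean, look up, group by the selected dimensions, and sum minutes watched."""
    totals = defaultdict(float)
    for beacon in beacons:
        minutes = beacon.get("minutes")
        show = VIDEO_TO_SHOW.get(beacon.get("video_id"))
        # Clean up: drop beacons with a missing/non-numeric measure or an unknown video id.
        if not isinstance(minutes, (int, float)) or show is None:
            continue
        # Lookup: enrich the beacon with the show name from the metadata table.
        row = {**beacon, "show": show}
        # Drop beacons that lack any of the selected dimensions.
        if any(d not in row for d in dimensions):
            continue
        # Group by the selected dimensions and aggregate the measure.
        totals[tuple(row[d] for d in dimensions)] += minutes
    return dict(totals)

sample = [
    {"video_id": 101, "device": "tv", "minutes": 10},
    {"video_id": 101, "device": "tv", "minutes": 5.5},
    {"video_id": 102, "device": "web", "minutes": 3},
    {"video_id": 999, "device": "web", "minutes": 3},  # unknown video: dropped
    {"video_id": 102, "device": "web"},                # missing minutes: dropped
]
print(run_job(sample))
```

Because only the dimensions, lookups, and aggregated attribute change from job to job, this shape lends itself to being generated rather than handwritten.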
We have about 100 different MapReduce jobs that run every hour; handwriting each one would be painful.
The BeaconSpec tool parses a beacon specification file and provides an object model of beacons and base fact. The tool also supports useful tasks, like generating base fact scrubber code, harpy data definitions, and validation tests. The MetStat dashboard uses BeaconSpec to automate the creation of processing jobs.
Three basic components of any modern compiler:
• Lexer
• Parser
• Code generator
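To make the three stages concrete, here is a minimal sketch of a lexer, parser, and code generator for a toy specification language. The `beacon name { field: type; }` syntax is invented for this example and is not the real BeaconSpec format. The real tool uses JFlex and CUP to generate Java, whereas this sketch is handwritten Python.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(?P<name>[A-Za-z_]\w*)|(?P<punct>[{}:;]))")
PY_TYPES = {"int": "int", "float": "float", "string": "str"}

def lex(text):
    """Lexer: turn source text into a list of (kind, value) tokens."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens

def parse(tokens):
    """Parser: build {'name': ..., 'fields': [(field, type), ...]} from the tokens."""
    pos = 0

    def take(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        pos += 1
        return tok_value

    take("name", "beacon")
    spec = {"name": take("name"), "fields": []}
    take("punct", "{")
    while pos < len(tokens) and tokens[pos] != ("punct", "}"):
        field = take("name")
        take("punct", ":")
        spec["fields"].append((field, take("name")))
        take("punct", ";")
    take("punct", "}")
    return spec

def generate(spec):
    """Code generator: emit Python source for a validator of the parsed beacon."""
    lines = [f"def is_valid_{spec['name']}(beacon):"]
    for field, type_name in spec["fields"]:
        lines.append(f"    if not isinstance(beacon.get({field!r}), {PY_TYPES[type_name]}):")
        lines.append("        return False")
    lines.append("    return True")
    return "\n".join(lines)

source = generate(parse(lex("beacon playback { video_id: int; minutes: float; }")))
print(source)
```

The generated source is ordinary Python text, so the same parsed object model could feed other generators, such as data definitions or validation tests.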
JFlex and CUP are modeled on Flex and Bison, which are in turn modeled on lex and yacc.
“There will always be problems…make it easy to troubleshoot”
Next steps: data quality
Lessons Learned - Monitoring the Data Pipeline at Hulu
MONITORING THE DATA PIPELINE
• Who am I?
• What’s a Hulu?
• Beacons & the Data Pipeline
• Monitoring – Take One
• Monitoring – Take Two
METRICS & REPORTING TOOLS TEAM LEAD
Help people find and enjoy
the world’s premium content
when, where and how they want
PREMIUM CONTENT QUALITY AD EXPERIENCE
• Premium Content
• 485+ Content Partners
• 6 of 6 Broadcast Networks
• Ads can’t be skipped
• Less ad load than TV
• 100% video completion
• On Demand
• Across Devices
• Choice Based Ad Formats
WHY IS HULU EFFECTIVE?
• Service Oriented
• Small teams, specialized scopes
• Build tools for other developers
• Right tool for the job
Fire & Forget
External View of Beacons
80 2013-04-01 00:00:00
Which show is the user watching?
Which pages did they visit?
How long did they stay?
Where did they come from?
Did they become Plus members?
Log Collector / Flume
Continuous Aggregation /
Data never stops
and we can’t lose
Files bucketed by beacon
type and partitioned by hour
Hulu MapReduce Metrics Jobs
Scala / Akka
JFlex & CUP Java (Generated)
Aggregation & Publishing
Date range checks
BIG DATA PIPELINE?
I’LL BET THAT’S GOING
GREAT FOR YOU
Lots of Monitoring Tools Available
Cluster
OpenTSDB & Graphite
WHAT’S GOING ON??!??
HOW IS OUR CLUSTER? WILL WE MEET OUR SLAs?
HOW FAST DID A JOB RUN?
HOW DID RUNTIME COMPARE TO
HOW IS THIS COMPONENT? HOW IS OUR SYSTEM?
Access all your tools in one
…but avoid multitasking
Comprehensive Web UI
Does this solve our problems?
• Single Point of Access?
• Maintain services separately?
Our Users’ Perspective?
• We detect platform issues
• We quickly troubleshoot errors
• We track relative performance
• We know where we are re: SLAs
…but is detection of a problem
We need to think of things from
the report users’ perspectives
The User Perspective
Contextual Troubleshooting Model
• Connect issues to business units
• Better impact assessment
• Tune performance per user needs
We need a graph data structure,
populated with the stuff we care
Something like this
Why a Graph?
…instead of RDBMS
Indeterminate # of Joins
Query for graph connectedness is trivial and short
Query for connectedness w/ SQL relies on knowing the
…instead of a tree?
Data is sometimes recombinant (e.g. a metric in
multiple reports to same user)
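A minimal sketch of why connectedness is short to express on a graph: one breadth-first traversal answers "what is affected downstream of this node?" without knowing the path length in advance. The node names and the `EDGES` adjacency map are hypothetical, and this is plain Python rather than the query language of whichever graph store the talk used.

```python
from collections import deque

# Hypothetical dependency graph: job -> table -> metric -> reports -> user.
EDGES = {
    "job:playback_hourly": ["table:playback_facts"],
    "table:playback_facts": ["metric:minutes_watched"],
    "metric:minutes_watched": ["report:content_partner", "report:ad_sales"],
    "report:content_partner": ["user:alice"],
    "report:ad_sales": ["user:alice"],
}

def downstream(start, edges=EDGES):
    """Return every node reachable from start, in breadth-first order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:  # a node reached by several paths is listed once
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

print(downstream("job:playback_hourly"))
```

The metric here feeds two reports that reach the same user, which is the recombinant case a tree cannot represent; the `seen` set handles it by visiting that user only once.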
Let’s investigate… These failed before getting to a data store
Most of the hive failures were the same
table, but it’s a common table
As we filter, the matched reports show up
on the bottom of the page. The log link
shows us the details
Each service implements a log-fetching interface,
specific to the resources used for a particular report
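One way such a log-fetching interface could look is sketched below. The `LogFetcher` class, its `fetch_logs` method, the in-memory implementation, and the report id are assumptions made for illustration; the talk does not show the real interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

class LogFetcher(ABC):
    """Hypothetical interface that each service implements for its own resources."""

    service: str

    @abstractmethod
    def fetch_logs(self, report_id: str) -> List[str]:
        """Return the log lines this service holds for the given report."""

class InMemoryLogFetcher(LogFetcher):
    """Stand-in implementation; a real one would query Hive, MapReduce, and so on."""

    def __init__(self, service: str, logs_by_report: Dict[str, List[str]]):
        self.service = service
        self._logs = logs_by_report

    def fetch_logs(self, report_id: str) -> List[str]:
        return list(self._logs.get(report_id, []))

def collect_logs(report_id: str, fetchers: List[LogFetcher]) -> Dict[str, List[str]]:
    """Gather logs for one report from every service, keyed by service name."""
    return {f.service: f.fetch_logs(report_id) for f in fetchers}

fetchers = [
    InMemoryLogFetcher("hive", {"report-1": ["table lookup failed"]}),
    InMemoryLogFetcher("mapreduce", {"report-1": ["job finished"], "report-2": ["job failed"]}),
]
print(collect_logs("report-1", fetchers))
```

Keeping the interface this narrow lets the troubleshooting UI link to logs for any report without knowing how each service stores them.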
Find the Important Questions => Measure the Right Data
Make troubleshooting easy
Small distinct services are easy to create, maintain, and
• Muthu…the Platform GrandMaster
• All of Metrics Platform, Tools, Reporting for making this stuff
• Mohamed, Chris, Charlie, Robert, Phong, AJ, Ratheesh, Adi, Matt, Shashank, Joanne,
Siddhartha, Tamir, Jun, James, Dr. Kevin, Hang
• All of the Hulu DEV team for general awesomeness
• Prasan…thanks for the impetus to do this. I’ll look u up
• Kevin…thanks for Hulu. I’ll send u a snap