2011 06-30-hadoop-summit v5

Samuel Rash

Slides from a presentation at Hadoop Summit 2011 on Facebook's Data Freeway system.

Data Freeway: Scaling Out to Realtime

Agenda

Big Data, Big Applications / Data at Facebook

Realtime Requirements

Scribe

Calligraphus

HDFS: a different use case

HDFS: add Sync

HDFS: Concurrent Reads Overview

HDFS: Concurrent Reads Implementation

HDFS: Checksum Problem

Calligraphus: Log Writer
[Diagram labels: Scribe categories (Category 1, Category 2, Category 3); Calligraphus Servers (Server, Server, Server); HDFS]

Calligraphus (Simple)
[Diagram labels: Scribe categories (Category 1, Category 2, Category 3); Calligraphus Servers (Server, Server, Server); HDFS]
Number of categories x Number of servers = Total number of directories

Calligraphus (Stream Consolidation)
[Diagram labels: Scribe categories (Category 1, Category 2, Category 3); Calligraphus Servers (Router, Router, Router; Writer, Writer, Writer); ZooKeeper; HDFS]
Number of categories = Total number of directories
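
The two directory-count relationships on the Calligraphus slides (categories times servers in the simple layout, one directory per category with stream consolidation) can be sketched as plain arithmetic. This is only an illustration: the function names are hypothetical, and the sizes simply mirror the three categories and three servers drawn in the diagrams.

```python
def directories_simple(num_categories: int, num_servers: int) -> int:
    """Simple layout, per the slide: categories x servers = directories."""
    return num_categories * num_servers


def directories_consolidated(num_categories: int) -> int:
    """Stream consolidation, per the slide: categories = directories."""
    return num_categories


# Assumed sizes, taken from the diagram labels (3 categories, 3 servers).
categories, servers = 3, 3
print(directories_simple(categories, servers))  # 9
print(directories_consolidated(categories))     # 3
```

Under the first formula the directory count grows with the number of servers; under the second it depends only on the number of categories.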

ZooKeeper: Distributed Map
[Diagram labels: Root; A, B, C, D, each with entries 1, 2, 3, 4, 5]

ZooKeeper: Distributed Map

Distributed Map: Performance Summary

Canonical Realtime Application

Parallel Tailer

Puma Overview

Summary - Data Freeway

Future Work
