Flume Office Hours
Community planning
Jonathan Hsieh
Cloudera HQ, 2/28/2011

Outline
  State of the world
  What’s new?
  Stories (Chime in!)
  What needs work? Prioritizing what is next.
  Q+A
Flume Office Hours, 2/28/2011

State of the world

Growing user and developer community
  GitHub stats: currently 295 watchers, 51 forks
  New committers:
    9/10: Eric Sammer (Cloudera)
    1/11: Bruce Mitchener (Independent)
  User characteristics:
    Most potential users seem to use ad hoc scripts
    Most users are early adopters / startup devops

A short feature history
  6/10: v0.9.0
    Initial open source release
  8/10: v0.9.1
    Fixes for hangs
    Initial compression features
  10/10: v0.9.1+29 (CDH3b3, packages)
    Added kerberized HDFS support
    Flume cookbook
    Elastic Search / Cassandra plugins
    Initial Voldemort plugins
  11/10: v0.9.2
    Support for other compression codecs
    Avro RPC
    Improvements to tail and exec
    Robustness improvements
    Initial HBase / MongoDB plugin
  2/11: v0.9.3 (CDH3b4, packages)
    Flume Node Windows support
    Initial JSON metrics support
    Multi-master functional
    Robustness improvements
    JRuby / AMQP plugins
    S3/EC2 blog stories
  4/11: v0.9.3+xxx (CDH3 Stable, packages)
    Excessive duplication fixes
    Compression fixes
  ?/11: v0.9.4

What’s new?

New features
  Flume node JSON metrics
    http://node:35862/node/reports
  Terser syntax
    Before: { deco1 => { deco2 => sink } }
    After:  deco1 deco2 sink
  Multiple collector sink support
    collector(30000) { [
      escapedCustomDfs("hdfs://nn1/path","prefix","format"),
      escapedCustomDfs("hdfs://nn2/path","prefix","format")
    ] }
  Limited multi-master support
  Windows support

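The two syntax changes on this slide can be illustrated with a small sketch. It only builds spec strings in the shapes shown above and does not talk to Flume. The function names are hypothetical, and the slide does not say what the 30000 argument to collector means, so it is passed through as an opaque value.

```python
def nested_form(stages):
    """Build the older nested spec, e.g. '{ deco1 => { deco2 => sink } }'.

    `stages` lists the decorators in order, followed by the final sink.
    """
    if not stages:
        raise ValueError("need at least a sink")
    *decorators, sink = stages
    spec = sink
    # Wrap from the innermost decorator outwards.
    for deco in reversed(decorators):
        spec = "{ " + deco + " => " + spec + " }"
    return spec


def terse_form(stages):
    """Build the terser spec from the slide, e.g. 'deco1 deco2 sink'."""
    if not stages:
        raise ValueError("need at least a sink")
    return " ".join(stages)


def multi_sink_collector(arg, sinks):
    """Build a collector spec that wraps a list of sinks.

    `arg` is copied verbatim into collector(...); its meaning is not
    stated on the slide, so nothing is assumed about it here.
    """
    if not sinks:
        raise ValueError("need at least one sink")
    return f"collector({arg}) {{ [ {', '.join(sinks)} ] }}"


if __name__ == "__main__":
    stages = ["deco1", "deco2", "sink"]
    print(nested_form(stages))
    print(terse_form(stages))
    print(multi_sink_collector(30000, [
        'escapedCustomDfs("hdfs://nn1/path","prefix","format")',
        'escapedCustomDfs("hdfs://nn2/path","prefix","format")',
    ]))
```

Generating the spec from a list keeps the two HDFS targets in one place, which may help when the same collector config is pushed to several nodes.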
Stories

The Standard Use Case
  [Diagram: agents on servers (agent tier) send to three collectors (collector tier), which write to HDFS; a Flume Master is also shown.]

Multi Datacenter
  [Diagram: agents on API servers and processor servers, a collector tier, and HDFS.]

Multi Datacenter
  [Diagram: agents on API servers and processor servers, a relay, a collector tier, and HDFS.]

Near Realtime Aggregator
  [Diagram: agents on ad servers, a collector, a tracker, a DB (quick reports), and HDFS (Hive job to verify reports).]

An enterprise story
  [Diagram: agents on Windows and Linux API servers, a collector tier, Kerberos-secured HDFS, and Active Directory / LDAP.]

An emerging community story
  [Diagram: agents on servers send to a collector that fans out to an incremental search index, HBase, and HDFS. Queries shown: Hive query, Pig query, key lookup, range query, search query, faceted query.]

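The fanout step in this story, where one collector delivers each event to an index, HBase, and HDFS, can be sketched roughly as follows. This is an illustration of the pattern only, not Flume code: the Fanout class is hypothetical, and plain Python lists stand in for the three stores.

```python
from typing import Callable, Dict

Event = Dict[str, str]


class Fanout:
    """Deliver every event to each named sink (hypothetical helper)."""

    def __init__(self, sinks: Dict[str, Callable[[Event], None]]):
        self.sinks = sinks

    def append(self, event: Event) -> None:
        for sink in self.sinks.values():
            # Each sink gets its own copy so one sink cannot
            # mutate what the others receive.
            sink(dict(event))


if __name__ == "__main__":
    # In-memory stand-ins for the search index, HBase, and HDFS.
    index, hbase, hdfs = [], [], []
    fanout = Fanout({
        "index": index.append,
        "hbase": hbase.append,
        "hdfs": hdfs.append,
    })
    fanout.append({"host": "svr1", "body": "GET /home"})
    print(len(index), len(hbase), len(hdfs))
```

Each store then serves the query type it suits: the index for search and faceted queries, HBase for key lookups and range queries, and HDFS for Hive and Pig jobs.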
What needs work? What comes next?

Known issues
  Excessive event duplication (due to tail or e2e agent)
  Configuration translation problem in some cases
  Multi-master limited: doesn’t work with translations

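Until the duplication issue is fixed upstream, one possible workaround is to drop repeats downstream. The sketch below is an assumption, not how Flume addresses the problem: it treats two events as duplicates when host, timestamp, and body all match, and remembers only a bounded number of recent keys. The class and parameter names are hypothetical.

```python
import hashlib
from collections import OrderedDict


class Deduplicator:
    """Remember recently seen events and flag repeats (hypothetical helper)."""

    def __init__(self, capacity: int = 10000):
        self.capacity = capacity
        self._seen = OrderedDict()

    @staticmethod
    def _key(host: str, nanos: int, body: bytes) -> str:
        # Assumption: (host, timestamp, body) identifies an event.
        digest = hashlib.sha1()
        digest.update(host.encode("utf-8"))
        digest.update(b"\0")
        digest.update(str(nanos).encode("ascii"))
        digest.update(b"\0")
        digest.update(body)
        return digest.hexdigest()

    def is_new(self, host: str, nanos: int, body: bytes) -> bool:
        """Return True the first time an event is seen, False for a repeat."""
        key = self._key(host, nanos, body)
        if key in self._seen:
            self._seen.move_to_end(key)  # refresh recency
            return False
        self._seen[key] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest key
        return True


if __name__ == "__main__":
    dedup = Deduplicator(capacity=2)
    print(dedup.is_new("svr1", 1, b"line"))  # first sighting
    print(dedup.is_new("svr1", 1, b"line"))  # repeat of the same event
```

Because the memory is bounded, a duplicate that arrives after its key has been evicted is passed through again, so the capacity should cover the longest expected retransmission window.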
What’s next? (proposals)
  Fix excessive duplication issues
  Apache Incubator (?)
  Log4j/Log4net/logback/etc…
  Fix multi-master limitations
  Security upgrades for node-to-node comms (TLS/SSL)
  Improved metrics / GUI / usability
  Integration with open source alerting/monitoring tools
  Integration with proprietary systems
  Version proofing RPCs / state storage
  Packaging-friendly plug-in install
  Multi datacenter story
  Performance increases
  Inline near-realtime analytics
  Puppet/Chef style config for nodes
  Lightweight agent
  Masterless agent
  Better S3 / AWS support

Q+A