Flume office-hours-110228
2. Flume Office Hours: Community planning. Jonathan Hsieh, Cloudera HQ, 2/28/2011
3. Outline: State of the world; What’s new?; Stories (chime in!); What needs work?; Prioritizing what is next; Q+A. Flume Office Hours, 2/28/2011
4. State of the world
5. Growing user and developer community
   GitHub stats: currently 295 watchers, 51 forks
   New committers:
      9/10: Eric Sammer (Cloudera)
      1/11: Bruce Mitchener (Independent)
   User characteristics:
      Most potential users seem to use ad hoc scripts
      Most users are early adopters / startup devops
6. A short feature history
   6/10: v0.9.0 - Initial open source release
   8/10: v0.9.1 - Fixes for hangs; initial compression features
   10/10: v0.9.1+29 (CDH3b3, packages) - Added Kerberized HDFS support; Flume cookbook; Elastic Search / Cassandra plugins; initial Voldemort plugins
   11/10: v0.9.2 - Support for other compression codecs; Avro RPC; improvements to tail and exec; robustness improvements; initial HBase / MongoDB plugins
   2/11: v0.9.3 (CDH3b4, packages) - Flume Node Windows support; initial JSON metrics support; multi-master functional; robustness improvements; JRuby / AMQP plugins; S3/EC2 blog stories
   4/11: v0.9.3+xxx (CDH3 Stable, packages) - Excessive duplication fixes; compression fixes
   ?/11: v0.9.4
10. The Standard Use Case
   [Diagram. Labels: Agent tier (an agent on each server), Collector tier (collectors), Flume Master, HDFS]
11. Multi Datacenter
   [Diagram. Labels: API server (api) agents, Processor server (proc) agents, Collector tier (collectors), HDFS]
12. Multi Datacenter
   [Diagram. Labels: API server (api) agents, Processor server (proc) agents, Relay, Collector tier (collectors), HDFS]
13. Near Realtime Aggregator
   [Diagram. Labels: Ad svr agents, Collector, Tracker, DB, HDFS, quick reports, Hive job, verify reports]
14. An enterprise story
   [Diagram. Labels: API server (api) agents on Win and Linux hosts, Flume Collector tier (collectors), Kerberos HDFS, Active Directory / LDAP, six nodes marked D]
15. An emerging community story
   [Diagram. Labels: Agents (svr), Collector, Fanout (index, hbase, hdfs), Incremental Search Idx, HBase, HDFS; queries: Hive query, Pig query, Key lookup, Range query, Search query, Faceted query]
17. Known issues: Excessive event duplication (due to tail or e2e agent); configuration translation problem in some cases; multi-master limited (doesn’t work with translations).
18. What’s next? (proposals)
   Fix excessive duplication issues
   Apache Incubator (?)
   Log4j/Log4net/logback/etc…
   Fix multi-master limitations
   Security upgrades for node-to-node comms (TLS/SSL)
   Improved metrics / GUI / usability
   Integration with open source alerting/monitoring tools
   Integration with proprietary systems
   Version proofing RPCs / state storage
   Packaging-friendly plug-in install
   Multi Datacenter story
   Performance increases
   Inline near-realtime analytics
   Puppet/Chef style config for nodes
   Lightweight agent
   Masterless agent
   Better S3 / AWS support