Apache Flume
2013.5.29 黄浩松
Flume Architecture
Flume Node Roles
•Node: further divided by role into Agent and Collector
•Master
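Node
• A node here means a logical node. A single physical node (a JVM instance) can host several logical nodes, and a logical node can be bound to or unbound from a particular physical node.
• Each logical node reports its execution status and configuration to the master and sends it heartbeats.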
[Figure: a JVM containing an Agent Node and a Collector Node]
Agent
•An agent receives a data stream through its configured source and forwards it to a collector through its configured sink.
[Figure: Agent (source, sink) connected to Collector (source, sink)]
Collector
•A collector aggregates the data streams of multiple agents and delivers them to the configured output.
[Figure: Agent (source, sink) connected to Collector (source, sink)]
Mapping Collectors to Agents
•The mapping can be specified manually or matched automatically
•With automatic matching, the master balances the load across collectors
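The automatic matching above could look roughly like the sketch below. It assumes a least-loaded assignment policy; the slides do not say which balancing strategy the master actually uses, and `assign_collectors` and the node names are hypothetical.

```python
from collections import Counter


def assign_collectors(agents, collectors):
    """Map each agent to the collector currently serving the fewest agents.

    Hypothetical helper: least-loaded assignment is an assumed policy,
    not necessarily the one the Flume master implements.
    """
    if not collectors:
        raise ValueError("at least one collector is required")
    load = Counter({c: 0 for c in collectors})
    mapping = {}
    for agent in agents:
        # min() returns the first collector on ties, so the result is deterministic.
        target = min(collectors, key=lambda c: load[c])
        mapping[agent] = target
        load[target] += 1
    return mapping


mapping = assign_collectors(
    ["agentA", "agentB", "agentC", "agentD"],
    ["collector1", "collector2"],
)
print(mapping)
```

With four agents and two collectors, each collector ends up serving two agents.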
Event
•An event is the smallest unit in which Flume represents data.
[Figure: an Event emitted by an Agent, consisting of a body and metadata (Unix timestamp, Nanosecond timestamp, Priority, Source host)]
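As an illustration of the event layout named on this slide, here is a minimal sketch of a body plus the four metadata fields. The class, the field names, the units, and `make_event` are assumptions made for the example, not Flume's own Java API.

```python
import socket
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Illustrative event: a body plus the metadata fields listed on the slide."""
    body: bytes
    unix_timestamp: int   # seconds since the epoch (unit is an assumption)
    nanos: int            # high-resolution counter to order events within one second
    priority: str
    source_host: str


def make_event(body, priority="INFO", host=None):
    """Hypothetical factory that fills in the metadata at creation time."""
    return Event(
        body=body,
        unix_timestamp=int(time.time()),
        nanos=time.perf_counter_ns(),
        priority=priority,
        source_host=host if host is not None else socket.gethostname(),
    )


event = make_event(b"GET /index.html 200", host="web01")
print(event.source_host, event.priority, event.body)
```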
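Isolating Data Flows
•One approach tags every source and carries all flows over the same pipeline at the same time; the flows are separated at the end and delivered to their different destinations.
•The other approach is blocking: only one data flow can be in transit at any given moment.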
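The tagging approach can be sketched as below: records from two flows share one pipe and are split again by tag at the far end. The flow names and the helpers `tag` and `demultiplex` are hypothetical.

```python
from collections import defaultdict, deque


def tag(flow, records):
    """Attach a flow tag to every record from one source."""
    return [(flow, record) for record in records]


def demultiplex(pipe):
    """Drain the shared pipe and group records by their flow tag."""
    routed = defaultdict(list)
    while pipe:
        flow, record = pipe.popleft()
        routed[flow].append(record)
    return dict(routed)


# Two flows interleaved over a single shared pipe.
pipe = deque()
for pair in zip(tag("web", ["w1", "w2"]), tag("db", ["d1", "d2"])):
    pipe.extend(pair)

routed = demultiplex(pipe)
print(routed)
```

Each destination sees only its own flow, in the original order, even though the records travelled interleaved.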
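Types of Source
•Tiered Event Source
• Parameters are supplied by the master or a configuration file; used for transfer between nodes
•Logical Source
• An abstract source whose information is incomplete; it is managed automatically by the master, which converts it into an rpcSource once it has gathered enough information
•Basic Source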
Source Mechanisms
•Push data into Flume
•Have Flume poll for data continuously
•Embedding: embed Flume components in the application
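The difference between the push and polling mechanisms can be sketched as follows. `PushSource`, `PollSource`, and `ListSink` are hypothetical names for this example, not Flume classes.

```python
class ListSink:
    """Stand-in sink that just remembers what it was given."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)


class PushSource:
    """The producer calls on_data() whenever it has something to hand over."""
    def __init__(self, sink):
        self.sink = sink

    def on_data(self, record):
        self.sink.append(record)


class PollSource:
    """The source itself asks for data by calling fetch() on each poll."""
    def __init__(self, fetch, sink):
        self.fetch = fetch
        self.sink = sink

    def poll_once(self):
        records = self.fetch()
        for record in records:
            self.sink.append(record)
        return len(records)


sink = ListSink()
PushSource(sink).on_data("pushed line")

backlog = ["polled line 1", "polled line 2"]

def fetch():
    """Return everything waiting in the backlog and empty it."""
    batch = list(backlog)
    backlog.clear()
    return batch

poller = PollSource(fetch, sink)
poller.poll_once()
print(sink.events)
```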
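Types of Sink
•Tiered Event Sink
• Parameters are supplied by the master or a configuration file; used for transfer between nodes
•Logical Sink
• An abstract sink whose information is incomplete; it is managed automatically by the master, which converts it into an rpcSink once it has gathered enough information
•Basic Sink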
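Types of Sink
•Special Sink
• Fan out
• [sinkA, sinkB] writes to sinkA and sinkB at the same time
• Fail over
• [sinkA ? sinkB] tries sinkB when a write to sinkA fails
• Roll
• roll(A) text("file-%{rolltag}")
• Every A seconds, closes the current sink instance and opens a new one that continues writing to a different file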
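The fan-out and fail-over semantics can be sketched in a few lines. This is an illustration of the behavior described above, not Flume's implementation; `ListSink`, `FanOut`, and `FailOver` are hypothetical classes.

```python
class ListSink:
    """Stand-in sink; set fail=True to simulate an unavailable destination."""
    def __init__(self, fail=False):
        self.events = []
        self.fail = fail

    def append(self, event):
        if self.fail:
            raise IOError("sink unavailable")
        self.events.append(event)


class FanOut:
    """[sinkA, sinkB]: every event is written to all wrapped sinks."""
    def __init__(self, *sinks):
        self.sinks = sinks

    def append(self, event):
        for sink in self.sinks:
            sink.append(event)


class FailOver:
    """[sinkA ? sinkB]: write to the primary, fall back to the backup on error."""
    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup

    def append(self, event):
        try:
            self.primary.append(event)
        except IOError:
            self.backup.append(event)


sink_a, sink_b = ListSink(), ListSink()
FanOut(sink_a, sink_b).append("event-1")

broken, backup = ListSink(fail=True), ListSink()
FailOver(broken, backup).append("event-2")
print(sink_a.events, sink_b.events, backup.events)
```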
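Sink Output Formats
•Available formats include avro, json, log4j, and others
•Compression can be configured first, before the data is written to HDFS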
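Sink Decorators
•Sink decorators can be used to control the events passing through a sink.
•For example, they can augment, filter, or compress those events.
•Custom decorators can be written.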
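A decorator wraps a sink and acts on each event before passing it on. The sketch below chains a filter, an augmenter, and a compressor; the class names and the dict-based event shape are assumptions made for the example.

```python
import zlib


class ListSink:
    """Stand-in for the final sink."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)


class Filter:
    """Pass an event on only if keep(event) is true."""
    def __init__(self, sink, keep):
        self.sink = sink
        self.keep = keep

    def append(self, event):
        if self.keep(event):
            self.sink.append(event)


class Augment:
    """Add one attribute to every event."""
    def __init__(self, sink, key, value):
        self.sink = sink
        self.key = key
        self.value = value

    def append(self, event):
        attrs = dict(event["attrs"])
        attrs[self.key] = self.value
        self.sink.append({"body": event["body"], "attrs": attrs})


class Compress:
    """Compress the event body with zlib."""
    def __init__(self, sink):
        self.sink = sink

    def append(self, event):
        body = zlib.compress(event["body"])
        self.sink.append({"body": body, "attrs": event["attrs"]})


store = ListSink()
chain = Filter(
    Augment(Compress(store), "dc", "east"),
    keep=lambda e: b"ERROR" in e["body"],
)
chain.append({"body": b"INFO all good", "attrs": {}})
chain.append({"body": b"ERROR disk full", "attrs": {}})
print(len(store.events), store.events[0]["attrs"])
```

Because every decorator exposes the same `append` method as a sink, decorators can be stacked in any order.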
Master
•Manages the configuration of all nodes
•Tracks the final acknowledgements of data flows and notifies the agents
Multiple Masters
•With multiple masters, when one master fails the others take over its work and keep all in-flight data flows running normally.
•Coordinated through ZooKeeper
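Master Configuration Storage
•Held temporarily in memory
• Works with a single master only
• The configuration is lost whenever the master restarts or exits on an error
•Stored persistently in ZooKeeper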
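Flume Reliability
Agent Reliability
•Two reliable sinks
•agentE2ESink
• Uses a write-ahead log and acknowledgements to ensure data is delivered correctly
•agentDFOSink
• When it detects that the collector has failed, it first writes the data to disk
• When the collector recovers, or another collector becomes available, it resumes transmission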
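The two reliability modes can be contrasted with a small simulation. This is a simplified sketch: the acknowledgement is returned synchronously by the collector, and in-memory structures stand in for the on-disk write-ahead log and spool. All class names are hypothetical.

```python
class Collector:
    """Simulated collector; set up=False to make it unreachable."""
    def __init__(self):
        self.up = True
        self.received = []

    def receive(self, event):
        if not self.up:
            raise ConnectionError("collector down")
        self.received.append(event)
        return True  # acknowledgement


class E2EAgent:
    """Logs every event before sending and drops it only once acknowledged."""
    def __init__(self, collector):
        self.collector = collector
        self.wal = {}      # stands in for an on-disk write-ahead log
        self.next_id = 0

    def append(self, event):
        event_id = self.next_id
        self.next_id += 1
        self.wal[event_id] = event  # log first, then try to send
        self._send(event_id)

    def _send(self, event_id):
        try:
            if self.collector.receive(self.wal[event_id]):
                del self.wal[event_id]
        except ConnectionError:
            pass  # stays in the log until a later retry succeeds

    def retry(self):
        for event_id in sorted(self.wal):
            self._send(event_id)


class DFOAgent:
    """Sends directly and spools to disk only when the collector is down."""
    def __init__(self, collector):
        self.collector = collector
        self.spool = []    # stands in for an on-disk spool

    def append(self, event):
        try:
            self.collector.receive(event)
        except ConnectionError:
            self.spool.append(event)

    def drain(self):
        while self.spool:
            try:
                self.collector.receive(self.spool[0])
            except ConnectionError:
                return
            self.spool.pop(0)


collector = Collector()
agent = E2EAgent(collector)
agent.append("e1")
collector.up = False
agent.append("e2")       # unacknowledged, so it stays in the log
collector.up = True
agent.retry()
print(collector.received, agent.wal)
```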
Agent Reliability
•Failover, controlled by the master
•autoE2EChain
•autoDFOChain
•autoBEChain
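HDFS Redundancy
•After a set interval (30s by default), the collector closes the current HDFS file and creates a new one to continue writing data
•HDFS's redundancy policy can only take proper effect once the HDFS file has been closed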
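The time-based file rolling can be sketched as follows. Lists stand in for HDFS files and the clock is injected so the behavior can be checked without waiting; `RollingWriter` is a hypothetical class, and only the 30-second default comes from the slide.

```python
import time


class RollingWriter:
    """Closes the current file and opens a new one once interval_s has elapsed."""
    def __init__(self, interval_s=30, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.closed_files = []   # closed files, the ones redundancy can now cover
        self._open()

    def _open(self):
        self.opened_at = self.clock()
        self.current = []        # stands in for the open HDFS file

    def write(self, record):
        if self.clock() - self.opened_at >= self.interval_s:
            self.closed_files.append(self.current)
            self._open()
        self.current.append(record)


# Drive the writer with a fake clock instead of real time.
now = [0.0]
writer = RollingWriter(interval_s=30, clock=lambda: now[0])
writer.write("a")
now[0] = 10.0
writer.write("b")
now[0] = 31.0
writer.write("c")   # more than 30s since the file was opened, so it rolls first
print(writer.closed_files, writer.current)
```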
Flume Manageability
Web interface
Console
Flume Extensibility
Extensibility
•Sources, sinks, and decorators can all be custom-built
Flume Security
Security Controls
•Kerberized HDFS can be used to enforce access control on HDFS reads and writes
