Flume in 10 Minutes
1. Flume NG Basics
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
2. Oracle’s Big Data Approach
4 Steps to Greater Value
• Acquire and organize all data
• Enable greater access to wide data
• Analyze and refine important data
• Decide and publish insights
3. How do I get data to my Hadoop Cluster?
Using Flume NG to collect distributed data
4. My log data is not near my Hadoop cluster
[Figure: Application Servers holding Customer Logs on one side, the Oracle Big Data Appliance on the other, joined only by a "?"]
5. Moving Data with Flume NG
[Figure: Application Servers (left) and Oracle Big Data Appliance (right). Three identical rows, each: Logs → Flume NG Agent → Avro → Flume NG Agent → HDFS Write]
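The two tiers in the diagram are connected only through configuration: the agent on each application server ends in an Avro sink, and the agent on the Big Data Appliance starts with an Avro source listening on the same port. The Python sketch below renders both properties files to make that pairing explicit. Every agent, component, host, port, and path name in it (app-agent, collector-agent, tail-log, bda-node1, 4141, /var/log/app/app.log) is a hypothetical placeholder; the property keys themselves (type = exec / avro / memory / hdfs, command, bind, hostname, port, channels, channel, hdfs.path, hdfs.fileType) follow the Flume NG user guide. The exec source is used only for brevity; it cannot guarantee delivery if the channel rejects an event.

```python
# Hypothetical two-tier Flume NG layout (slide 5): every agent, component,
# host, port, and path name below is an illustrative placeholder.


def render(agent: str, props: dict) -> str:
    """Prefix each key with the agent name; one 'key = value' line per entry."""
    return "\n".join(f"{agent}.{key} = {value}" for key, value in props.items())


def app_server_config(collector_host: str, collector_port: int) -> str:
    """Tier 1: tail a local log and forward each line over Avro."""
    return render("app-agent", {
        "sources": "tail-log",
        "channels": "mem",
        "sinks": "to-collector",
        "sources.tail-log.type": "exec",
        "sources.tail-log.command": "tail -F /var/log/app/app.log",
        "sources.tail-log.channels": "mem",
        "channels.mem.type": "memory",
        "sinks.to-collector.type": "avro",
        # hostname/port must point at the tier-2 Avro source.
        "sinks.to-collector.hostname": collector_host,
        "sinks.to-collector.port": str(collector_port),
        "sinks.to-collector.channel": "mem",
    })


def collector_config(listen_port: int, hdfs_path: str) -> str:
    """Tier 2: accept Avro events from the app servers and write them to HDFS."""
    return render("collector-agent", {
        "sources": "from-apps",
        "channels": "mem",
        "sinks": "to-hdfs",
        "sources.from-apps.type": "avro",
        "sources.from-apps.bind": "0.0.0.0",
        "sources.from-apps.port": str(listen_port),
        "sources.from-apps.channels": "mem",
        "channels.mem.type": "memory",
        "sinks.to-hdfs.type": "hdfs",
        "sinks.to-hdfs.hdfs.path": hdfs_path,
        "sinks.to-hdfs.hdfs.fileType": "DataStream",
        "sinks.to-hdfs.channel": "mem",
    })
```

For example, app_server_config("bda-node1", 4141) and collector_config(4141, "hdfs://bda-node1:8020/user/oracle/logs") produce two files that would be started with flume-ng agent -f <file> -n app-agent and -n collector-agent respectively.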
6. Building a Basic Flume Agent
One configuration file
• Flume is flexible
– Durable Transactions
– In-Flight Data Modification
– Data Compression
• Flume NG is simpler than the original Flume
– No ZooKeeper requirement
– No master/slave architecture
• 3 basic pieces
– Source, Channel, Sink
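The three pieces and the transactional hand-off between them can be modelled in a few lines. The sketch below is a toy Python model, not Flume's Java API; the names MemoryChannel, ChannelFull, and drain are hypothetical. It shows two behaviours behind "durable transactions": a full channel rejects new events, which pushes back on the source, and a sink that fails to deliver rolls its batch back so nothing is dropped. In real Flume only a persistent channel such as the file channel survives an agent restart; events held in a memory channel are lost if the agent process dies.

```python
from collections import deque


class ChannelFull(Exception):
    """Raised when a put would exceed the channel's capacity."""


class MemoryChannel:
    """Bounded FIFO buffer between a source and a sink (toy model)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._events = deque()

    def __len__(self) -> int:
        return len(self._events)

    def put(self, event: str) -> None:
        # A full channel refuses the event; the source must retry or fail.
        if len(self._events) >= self.capacity:
            raise ChannelFull(event)
        self._events.append(event)

    def take(self, max_events: int) -> list:
        """Start a 'transaction' by removing up to max_events events."""
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch

    def rollback(self, batch: list) -> None:
        """Undo a take: put the batch back at the head, in its original order."""
        self._events.extendleft(reversed(batch))


def drain(channel: MemoryChannel, deliver, batch_size: int = 100) -> int:
    """Sink loop: deliver batches until empty; on OSError roll back and stop.

    Returns the number of events delivered (i.e. committed).
    """
    delivered = 0
    while len(channel):
        batch = channel.take(batch_size)
        try:
            deliver(batch)
        except OSError:
            channel.rollback(batch)  # events stay queued for a later retry
            break
        delivered += len(batch)  # commit: the events leave the channel for good
    return delivered
```

A real sink would back off and retry rather than return; the loop stops here so the caller can see that the failed batch is still in the channel.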
7. Flume Configuration
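flume-ng agent -f this_file -n hdfs-agent
hdfs-agent.sources = …ollect
hdfs-agent.sinks = …e
hdfs-agent.channels = memoryChannel
hdfs-agent.sources.…ollect.type = netcat
hdfs-agent.sources.…ollect.bind = 127.0.0.1
hdfs-agent.sources.…ollect.port = 11111
hdfs-agent.sinks.…e.type = hdfs
hdfs-agent.sinks.…e.hdfs.path = hdfs://localhost:8020/user/oracle/sabre_example
hdfs-agent.sinks.…e.hdfs.rollInterval = 30
hdfs-agent.sinks.…e.hdfs.writeFormat = Text
hdfs-agent.sinks.…e.hdfs.fileType = DataStream
hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 10000
hdfs-agent.sources.…ollect.channels = memoryChannel
hdfs-agent.sinks.…e.channel = memoryChannel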
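Every component in this file is wired by name: the agent lists its sources, sinks, and channels; each source names its channels (plural key); each sink names exactly one channel (singular key). A misspelled channel name, or a sink given channels instead of channel, is an easy mistake. The Python helper below is a hypothetical sketch (check_wiring is not part of Flume) that parses simple key = value lines for one agent and reports components with no type or with missing or undeclared channel references. It assumes plain one-line properties and ignores the rest of the Java .properties syntax, such as line continuations and ':' separators.

```python
def parse_properties(text: str) -> dict:
    """Parse 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_wiring(text: str, agent: str) -> list:
    """Return wiring problems for one agent; an empty list means none found."""
    props = parse_properties(text)

    def names(kind: str) -> list:
        # e.g. "hdfs-agent.sources = a b" -> ["a", "b"]
        return props.get(f"{agent}.{kind}", "").split()

    sources, channels, sinks = names("sources"), names("channels"), names("sinks")
    problems = []
    # Every declared component needs a type.
    for kind, group in (("source", sources), ("channel", channels), ("sink", sinks)):
        for name in group:
            if f"{agent}.{kind}s.{name}.type" not in props:
                problems.append(f"{kind} {name}: no type set")
    # Sources use the plural 'channels' key and may list several channels.
    for src in sources:
        wired = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not wired:
            problems.append(f"source {src}: no channels set")
        problems += [f"source {src}: undeclared channel {c}"
                     for c in wired if c not in channels]
    # Sinks use the singular 'channel' key and read from exactly one channel.
    for sink in sinks:
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink}: no channel set")
        elif ch not in channels:
            problems.append(f"sink {sink}: undeclared channel {ch}")
    return problems
```

Run with agent="hdfs-agent" against the file above, a sink line written as channels = memoryChannel instead of channel = memoryChannel would be reported as "no channel set".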
8. Sending Data to the Agent
• Connect netcat (nc) to the agent's host and port
• Pipe the input into it
• Each newline-terminated line becomes one event
• head example.xml | nc localhost 11111
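The same hand-off can be done from a program instead of nc. The Python sketch below (send_events is a hypothetical helper, not a Flume client API) writes one newline-terminated line per event over an already connected TCP socket. It assumes the netcat source keeps its default ack-every-event = true, under which the source replies "OK" for each event it receives, so the sketch reads that reply before sending the next line; plain nc just prints the replies. The source also caps event size through max-line-length (512 bytes by default), so longer XML lines will not arrive intact.

```python
import socket
from typing import Iterable


def send_events(sock: socket.socket, lines: Iterable[str]) -> int:
    """Send each line as one event; stop at the first reply that is not OK.

    Returns the number of events the source acknowledged.
    """
    acked = 0
    with sock.makefile("rb") as replies:
        for line in lines:
            # The source ends an event at '\n': drop any line ending the input
            # already carries and terminate the record with exactly one.
            sock.sendall(line.rstrip("\r\n").encode("utf-8") + b"\n")
            if replies.readline().strip() != b"OK":
                break  # a failure reply, or the connection was closed
            acked += 1
    return acked
```

To mimic the slide's command, open the connection with socket.create_connection(("localhost", 11111)) and pass the first ten lines of example.xml, e.g. itertools.islice(f, 10) for an open file f. Waiting for each acknowledgement is slower than nc's fire-and-forget piping, but it tells the sender exactly which records reached the channel.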
9. Alternatives to Flume
And Their Trade-Offs
• Scribe
– Thrift-based
– Lightweight, but no support
– Not designed around Hadoop
• Kafka
– Designed to resemble a publish-subscribe system
– Explicitly distributed
– Apache Incubator Project