Extracting Twitter Data
using Apache Flume
By Bharat Khanna
Talend ETL Developer
What you need
• Hortonworks Hadoop cluster: HDP 1.3
• Oracle VirtualBox
• PuTTY
• WinSCP
• Maven (for building flume-sources-1.0-SNAPSHOT.jar)
What is Flume?
• Flume is a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of log data. It has a simple and
flexible architecture based on streaming data flows.
Network Settings in Oracle VirtualBox (two slides of screenshots in the original deck)
Getting Started
• Start your Hadoop cluster in VirtualBox. Once it is running, make sure you
can reach HDFS from your Windows host machine by opening an address such as
http://192.168.56.101:8000 in a browser.
• You can find this IP address by running the ifconfig command on the
Hadoop VM once it has started.
File Browser using HUE
• The original slide shows a screenshot of how the HDFS file browser in HUE
may look from your host machine.
Setting your bash_profile in PuTTY
• Set the required environment variables by editing .bash_profile in your home
directory with the command “vi .bash_profile”. The leading dot is required
because the file is hidden by default. Leave out the Maven_Home entry shown on
the original slide for now; it is added once Maven is installed.
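The slide's screenshot is not reproduced in this transcript, so the following is only a sketch of what such a .bash_profile might contain. The JAVA_HOME path is a placeholder assumption, FLUME_HOME reuses the Flume location given later in the deck, and the helper name write_profile is hypothetical.

```shell
# write_profile FILE: append example environment settings to FILE.
# All values are illustrative assumptions; adjust them to your VM.
write_profile() {
  cat >> "$1" <<'EOF'
# Java installation (placeholder path; check where the JDK lives on your VM)
export JAVA_HOME=/usr/lib/jvm/java-1.6.0
# Flume installation directory used later in this deck
export FLUME_HOME=/etc/flume/apache-flume-1.6.0-bin
# Make the java and flume-ng commands available on the PATH
export PATH=$PATH:$JAVA_HOME/bin:$FLUME_HOME/bin
EOF
}

# Usage on the VM:
#   write_profile ~/.bash_profile
#   source ~/.bash_profile
```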
Creating Flume Snapshot.jar
• This jar (flume-sources-1.0-SNAPSHOT.jar) contains the Twitter source and
the libraries Flume needs for this setup. You can download a prebuilt copy,
but it is better to build it yourself.
• You need Maven for this. If your Java version is 1.6, as in Hortonworks
HDP 1.3, download the archived Maven release 3.0.5 from
http://archive.apache.org/dist/maven/maven-3/ (with a newer Java version,
use any recent Maven release).
Creating Flume Snapshot.jar Contd..
• Once downloaded, unzip the archive in Windows and transfer the extracted
folder to your Hortonworks cluster using WinSCP.
• In your home directory, create a link to the folder with the command
“ln -s apache-maven-3.0.5 maven”.
• Add the path of this link to your bash_profile, as shown in slide 8.
• After saving bash_profile, log off and log in to the Unix session again so
that the changes take effect. Run “mvn -version” to check that Maven works.
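The link and PATH steps above can be sketched as follows. The helper name link_maven is hypothetical, and MAVEN_HOME is assumed to be the variable behind the Maven_Home entry mentioned on the bash_profile slide.

```shell
# link_maven BASE_DIR: create a version-independent "maven" link inside
# BASE_DIR that points at the unpacked apache-maven-3.0.5 folder.
# -f replaces an older link, -n avoids descending into an existing link.
link_maven() {
  ( cd "$1" && ln -sfn apache-maven-3.0.5 maven )
}

# Usage on the VM (home directory, as in the slide):
#   link_maven "$HOME"
#   export MAVEN_HOME=$HOME/maven          # add these two lines to .bash_profile
#   export PATH=$PATH:$MAVEN_HOME/bin
#   mvn -version                           # confirms that Maven is found
```

Linking to a fixed name means bash_profile does not have to change if you later unpack a different Maven version and repoint the link.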
Creating Flume Snapshot.jar Contd..
• Download the zip file of Cloudera’s Twitter example code from
https://github.com/cloudera/cdh-twitter-example
• Unzip it and transfer it to your home directory on the Hortonworks cluster
using WinSCP.
• Go to the flume-sources folder under cdh-twitter-example-master and run
“mvn package” to build flume-sources-1.0-SNAPSHOT.jar. The jar is written to
the target folder in the same directory.
Configuring Flume
• Copy flume-sources-1.0-SNAPSHOT.jar to Flume’s lib directory, which is
/etc/flume/apache-flume-1.6.0-bin/lib on the Hortonworks 1.3 VM.
• Flume’s configuration directory is /etc/flume/apache-flume-1.6.0-bin/conf.
• Open flume-env.sh.template in the vi editor. Set JAVA_HOME to the path
defined in bash_profile, and set FLUME_CLASSPATH to the path of
flume-sources-1.0-SNAPSHOT.jar, in double quotes.
• Rename flume-env.sh.template to flume-env.sh using the mv command.
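After these edits, the relevant lines of flume-env.sh might look like the ones written by the sketch below. The helper name write_flume_env is hypothetical and the JAVA_HOME value is a placeholder assumption; the jar path follows the lib directory named above.

```shell
# write_flume_env CONF_DIR: create CONF_DIR/flume-env.sh with the two
# settings described above. This overwrites an existing flume-env.sh.
write_flume_env() {
  cat > "$1/flume-env.sh" <<'EOF'
# Same Java path as in .bash_profile (placeholder; use your own)
export JAVA_HOME=/usr/lib/jvm/java-1.6.0
# Path of the jar built with "mvn package", in double quotes
FLUME_CLASSPATH="/etc/flume/apache-flume-1.6.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar"
EOF
}

# Usage on the VM:
#   write_flume_env /etc/flume/apache-flume-1.6.0-bin/conf
```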
Configuring Flume contd..
• You also need to copy the following jar files to the Flume lib folder.
Jar                                   From directory
hadoop-core.jar                       HADOOP_HOME, i.e. /usr/lib/hadoop
hadoop-client-1.2.0.1.3.0.0-107.jar   HADOOP_HOME, i.e. /usr/lib/hadoop
jets3t-0.6.1.jar                      /usr/lib/hadoop/lib
commons-httpclient-3.0.1.jar          /usr/lib/hadoop/lib
commons-configuration-1.6.jar         /usr/lib/hadoop/lib
commons-codec-1.4.jar                 /usr/lib/hadoop/lib
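Copying the six jars from the table can be scripted. This is a minimal sketch; the helper name copy_flume_jars is hypothetical, and the jar names and directories are the ones listed in the table.

```shell
# copy_flume_jars HADOOP_HOME FLUME_LIB: copy the jars listed in the table
# from HADOOP_HOME and HADOOP_HOME/lib into FLUME_LIB.
# Stops with a non-zero status at the first jar that cannot be copied.
copy_flume_jars() {
  hadoop_home=$1
  flume_lib=$2
  for jar in hadoop-core.jar hadoop-client-1.2.0.1.3.0.0-107.jar; do
    cp "$hadoop_home/$jar" "$flume_lib/" || return 1
  done
  for jar in jets3t-0.6.1.jar commons-httpclient-3.0.1.jar \
             commons-configuration-1.6.jar commons-codec-1.4.jar; do
    cp "$hadoop_home/lib/$jar" "$flume_lib/" || return 1
  done
}

# Usage on the VM:
#   copy_flume_jars /usr/lib/hadoop /etc/flume/apache-flume-1.6.0-bin/lib
```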
Creating Twitter App
• Go to dev.twitter.com and choose to create a new app.
• Enter a name, a description, and a website; the website can be a placeholder
such as http://yourdomain.com.
• After creating the app, go to Keys and Access Tokens and generate your
consumer key, consumer secret, access token, and access token secret.
• Note these four values down; you need them in the following steps.
Creating conf file
• Go to the folder /etc/flume/apache-flume-1.6.0-bin/conf and create a new
file named twitter.conf.
• A sample of the file is shown on the next slide of the original deck. Insert
the consumer key, consumer secret, access token, and access token secret
that you obtained in the previous step.
• Then enter the keywords whose tweets you want to analyze.
• Finally, set your HDFS path. You can get its prefix from the fs.default.name
property in core-site.xml under Hadoop_Home/conf, i.e. /usr/lib/hadoop/conf.
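The sample image is not part of this transcript, so here is a sketch of such a file, modelled on the example configuration that ships with cdh-twitter-example. The helper name write_twitter_conf is hypothetical, and the keywords, the HDFS directory /user/flume/tweets/, and the channel sizes are assumptions; the uppercase values are placeholders you must replace.

```shell
# write_twitter_conf FILE: write a sample Flume agent configuration to FILE.
# Uppercase values (YOUR_..., NAMENODE_HOST, PORT) are placeholders.
write_twitter_conf() {
  cat > "$1" <<'EOF'
# One source, one channel and one sink for the agent named TwitterAgent
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Twitter source class provided by flume-sources-1.0-SNAPSHOT.jar
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
TwitterAgent.sources.Twitter.keywords = hadoop, big data, flume

# HDFS sink; take the hdfs:// prefix from fs.default.name in core-site.xml
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://NAMENODE_HOST:PORT/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

# In-memory channel connecting the source to the sink
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
EOF
}

# Usage on the VM:
#   write_twitter_conf /etc/flume/apache-flume-1.6.0-bin/conf/twitter.conf
```

The agent name TwitterAgent has to match the name passed to flume-ng with the -n option when the agent is started.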
Checks before running Flume: Setting Timezone
• Make sure that the time shown in your VM matches the time on your local
machine. If it does not, reset the time zone as shown below. You can check the
time in your VM with the “date” command.
• If the time zone already matches, you can skip the next two steps.
• The time zone is controlled by the /etc/localtime file. The available time
zones are listed under the /usr/share/zoneinfo/ directory. Link your zone to
/etc/localtime; -f replaces the existing file, and US/Eastern is an example.
• cd /etc
• ln -sf /usr/share/zoneinfo/US/Eastern localtime
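The two commands above can be wrapped in a small helper that first checks that the zone exists. This is a sketch only; the function name set_timezone is hypothetical.

```shell
# set_timezone ZONEINFO_DIR ZONE TARGET: point TARGET at ZONEINFO_DIR/ZONE.
# -f replaces an existing TARGET, which plain "ln -s" would refuse to do.
set_timezone() {
  [ -e "$1/$2" ] || { echo "unknown time zone: $2" >&2; return 1; }
  ln -sf "$1/$2" "$3"
}

# Usage on the VM (as root), followed by a check of the result:
#   set_timezone /usr/share/zoneinfo US/Eastern /etc/localtime
#   date
```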
Checks before running Flume: Setting Oracle VirtualBox Properties
• To be able to reset the time in the VM as you did in the previous step, set
the following properties in VirtualBox.
• In Windows, open a command prompt in the VirtualBox folder: browse to
C:\Program Files\Oracle, click the VirtualBox folder to select it, hold the
left Shift key, right-click, and choose "Open command window here". A command
prompt window should now be open.
Checks before running Flume: Setting Oracle VirtualBox Properties Contd..
• Run the following commands in the command prompt, replacing ${VMNAME} with
the name of your VM.
VBoxManage setextradata ${VMNAME} "VBoxInternal/Devices/VMMDev/0/Config/GetHostTimeDisabled" 1
VBoxManage guestproperty set ${VMNAME} "/VirtualBox/GuestAdd/VBoxService/--timesync-interval" 10000
VBoxManage guestproperty set ${VMNAME} "/VirtualBox/GuestAdd/VBoxService/--timesync-min-adjust" 100
VBoxManage guestproperty set ${VMNAME} "/VirtualBox/GuestAdd/VBoxService/--timesync-set-on-restore" 1
VBoxManage guestproperty set ${VMNAME} "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold" 1000
Running Flume
• Go to the Flume bin directory and start the Flume agent with the following
command:
• flume-ng agent -n TwitterAgent -c /etc/flume/apache-flume-1.6.0-bin/conf -f /etc/flume/apache-flume-1.6.0-bin/conf/twitter.conf
• After some time, files should start to appear under the HDFS directory
specified in the conf file.
Error Catalog
• You may run into the following common errors while running Flume.
Apache Flume error: java.lang.NoSuchMethodError:
twitter4j.FilterQuery.setIncludeEntities(Z)Ltwitter4j/FilterQuery;
Fix: This happens when FilterQuery.class is present in two different jars, one
of which is flume-sources-1.0-SNAPSHOT.jar.
To find the clashing jars, run the following command in Flume’s lib directory:
find . -name "*.jar" | xargs grep FilterQuery.class
Rename the other jar by appending .org to its file name.
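The search for clashing jars can also be written as a helper that prints only the names of the matching jars. This is a sketch; the function name find_clashing_jars is hypothetical. It relies on the same idea as the command above: entry names are stored as plain text inside a jar, so grep can find them.

```shell
# find_clashing_jars LIB_DIR: print every jar under LIB_DIR whose contents
# mention twitter4j/FilterQuery.class (-l lists file names, -F matches the
# pattern as a fixed string rather than a regular expression).
find_clashing_jars() {
  find "$1" -name '*.jar' -type f -exec grep -lF 'twitter4j/FilterQuery.class' {} +
}

# Usage on the VM:
#   find_clashing_jars /etc/flume/apache-flume-1.6.0-bin/lib
#   mv OTHER.jar OTHER.jar.org     # OTHER.jar = the jar that is not
#                                  # flume-sources-1.0-SNAPSHOT.jar
```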
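Error Catalog Contd..
• Apache Flume error: java.io.IOException: Callable timed out after 10000
ms on file:
Fix: This can happen when there are too many connections to Twitter from your
account. Wait for some time and try again. The message itself is raised by
Flume’s HDFS sink when an HDFS operation exceeds hdfs.callTimeout (10000 ms
by default in Flume 1.6), so also check that HDFS is up and responsive.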

Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 

Extracting Twitter Data Using Apache Flume

  • 1. Extracting Twitter Data using Apache Flume, by Bharat Khanna, Talend ETL Developer
  • 2. What you need
      • Hortonworks Hadoop cluster: HDP 1.3
      • Oracle VirtualBox
      • PuTTY
      • WinSCP
      • Maven (for creating flume-snapshot.jar)
  • 3. What is Flume?
      • Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.
  • 4. Network Settings in Oracle VirtualBox
  • 5. Network Settings in Oracle VirtualBox (contd.)
  • 6. Getting Started
      • Run your Hadoop cluster in VirtualBox. Once it has started, make sure you can connect to HDFS from your Windows host machine by opening an address such as http://192.168.56.101:8000.
      • You can find this IP address by running the ifconfig command in the Hadoop cluster once it has started.
  • 7. File Browser using HUE
      • Your HDFS interface from the host machine may look like the screenshot on this slide.
  • 8. Setting your bash_profile in PuTTY
      • It is important to set the environment variables by editing .bash_profile in your home directory with the command “vi .bash_profile” (the leading dot is required because the file is hidden by default). Leave out the Maven_Home entry for now.
  • 9. Creating Flume Snapshot.jar
      • This jar contains the libraries Flume needs to function properly. You can either download it or build it yourself; building it yourself is the better option.
      • You need Maven for this. If your Java version is 1.6, as it is in Hortonworks HDP 1.3, download the archived Maven 3.0.5 from http://archive.apache.org/dist/maven/maven-3/; otherwise use any recent version.
  • 10. Creating Flume Snapshot.jar (contd.)
      • Once downloaded, unzip the folder in Windows and transfer it to your Hortonworks cluster using WinSCP.
      • Create a link to the folder with the command “ln -s apache-maven-3.0.5 maven” in your home directory.
      • Set the path of this link in your bash_profile as shown in slide 8.
      • After saving your bash_profile, log off and log in again to the Unix session so the changes take effect. Run “mvn -version” to check that it is working.
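The screenshot from slide 8 is not part of this transcript, so the following is only a sketch of the kind of entries that slides 8 to 10 add to .bash_profile. The JAVA_HOME path is a placeholder and the exact variable names on the slide may differ; only the maven link in the home directory comes from the slides.

```shell
# ~/.bash_profile (sketch; adjust every path to your own VM)

# Assumption: placeholder JDK location, used only if JAVA_HOME is not set yet.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-1.6.0}"

# The "maven" symlink created with: ln -s apache-maven-3.0.5 maven
export MAVEN_HOME="$HOME/maven"

# Put the mvn launcher on the PATH.
export PATH="$MAVEN_HOME/bin:$PATH"
```

After logging in again, “mvn -version” should print the Maven version if the link and the PATH entry are correct.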
  • 11. Creating Flume Snapshot.jar (contd.)
      • Download Cloudera’s Twitter code zip file from https://github.com/cloudera/cdh-twitter-example.
      • Unzip it and transfer it to your home directory on the Hortonworks cluster using WinSCP.
      • Go to the flume-sources folder under cdh-twitter-example-master and run “mvn package” to build the Flume snapshot jar. The file can be found in the target folder in the same directory.
  • 12. Configuring Flume
      • Transfer flume-sources-1.0-SNAPSHOT.jar to the Flume lib directory, located at /etc/flume/apache-flume-1.6.0-bin/lib on the Hortonworks 1.3 VM.
      • Flume’s configuration directory is /etc/flume/apache-flume-1.6.0-bin/conf.
      • Open flume-env.sh.template in the vi editor, set JAVA_HOME to the path defined in your bash_profile, and set FLUME_CLASSPATH to the path of flume-snapshot.jar in double quotes.
      • Rename flume-env.sh.template to flume-env.sh using the mv command.
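The flume-env.sh step on slide 12 can be sketched as a small shell function. The function name make_flume_env and its arguments are hypothetical. It copies the template instead of renaming it with mv, so the original template is kept, and it appends the two settings rather than editing them in vi.

```shell
# Create flume-env.sh from the shipped template and append the two settings.
# Arguments: <flume conf dir> <JAVA_HOME value> <path to the snapshot jar>
# (make_flume_env is a hypothetical helper, not part of Flume.)
make_flume_env() {
    conf_dir=$1
    java_home=$2
    snapshot_jar=$3

    # Copy rather than mv, so the template stays available as a reference.
    cp "$conf_dir/flume-env.sh.template" "$conf_dir/flume-env.sh" || return 1

    {
        printf 'export JAVA_HOME="%s"\n' "$java_home"
        printf 'FLUME_CLASSPATH="%s"\n' "$snapshot_jar"
    } >> "$conf_dir/flume-env.sh"
}

# Example call with the paths from slide 12 (run on the VM):
#   make_flume_env /etc/flume/apache-flume-1.6.0-bin/conf "$JAVA_HOME" \
#       /etc/flume/apache-flume-1.6.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar
```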
  • 13. Configuring Flume (contd.)
      • You also need to transfer the following jar files to the Flume lib folder:

        Jar                                    From directory
        hadoop-core.jar                        HADOOP_HOME, i.e. /usr/lib/hadoop
        hadoop-client-1.2.0.1.3.0.0-107.jar    HADOOP_HOME, i.e. /usr/lib/hadoop
        jets3t-0.6.1.jar                       /usr/lib/hadoop/lib
        commons-httpclient-3.0.1.jar           /usr/lib/hadoop/lib
        commons-configuration-1.6.jar          /usr/lib/hadoop/lib
        commons-codec-1.4.jar                  /usr/lib/hadoop/lib
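Copying the six jars from slide 13 by hand is easy to get wrong, so here is a minimal sketch of a helper that does it in one pass and reports any jar it cannot find. The function name copy_hadoop_jars is hypothetical; the jar names and directories are the ones listed on the slide.

```shell
# Copy the jars listed on slide 13 into Flume's lib directory.
# Arguments: <HADOOP_HOME> <flume lib dir>
# Returns non-zero if any jar is missing (the others are still copied).
copy_hadoop_jars() {
    hadoop_home=$1
    flume_lib=$2
    status=0

    for jar in \
        hadoop-core.jar \
        hadoop-client-1.2.0.1.3.0.0-107.jar \
        lib/jets3t-0.6.1.jar \
        lib/commons-httpclient-3.0.1.jar \
        lib/commons-configuration-1.6.jar \
        lib/commons-codec-1.4.jar
    do
        if [ -f "$hadoop_home/$jar" ]; then
            cp "$hadoop_home/$jar" "$flume_lib/"
        else
            echo "missing: $hadoop_home/$jar" >&2
            status=1
        fi
    done

    return $status
}

# Example call with the paths from the slides (run on the VM):
#   copy_hadoop_jars /usr/lib/hadoop /etc/flume/apache-flume-1.6.0-bin/lib
```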
  • 14. Creating Twitter App
      • Go to dev.twitter.com and click on create a new app.
      • Give it a name, a description, and a website, for example http://yourdomain.com.
      • After creating the app, go to Keys and Access Tokens and create your consumer key, consumer secret, access token, and access token secret.
      • Make a note of them, as you will need them in subsequent steps.
  • 15. Creating conf file
      • Go to the folder /etc/flume/apache-flume-1.6.0-bin/conf and create a new file named Twitter.conf.
      • A sample image of it is shown on the next slide. Insert the consumer key, consumer secret, access token, and access token secret that you obtained in the previous step.
      • Then enter the keywords for which you want to analyze the data.
      • Finally, give your HDFS path, which you can get from fs.default.name in the core-site.xml file under HADOOP_HOME/conf, i.e. /usr/lib/hadoop/conf.
  • 16. (Image: sample Twitter.conf)
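The sample image from slide 16 is not part of this transcript. As a stand-in, the sketch below writes an agent definition modelled on the flume.conf shipped with Cloudera's cdh-twitter-example, which may differ in detail from the file on the slide. The function name write_twitter_conf and the four TWITTER_* environment variable names are hypothetical, and the batch, roll, and capacity values are illustrative only.

```shell
# Write a Flume agent definition for the Twitter source to the given file.
# Arguments: <output file> <HDFS path for tweets> <comma-separated keywords>
# Credentials are read from four environment variables (hypothetical names).
write_twitter_conf() {
    out_file=$1
    hdfs_path=$2
    keywords=$3

    cat > "$out_file" <<EOF
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = ${TWITTER_CONSUMER_KEY:-}
TwitterAgent.sources.Twitter.consumerSecret = ${TWITTER_CONSUMER_SECRET:-}
TwitterAgent.sources.Twitter.accessToken = ${TWITTER_ACCESS_TOKEN:-}
TwitterAgent.sources.Twitter.accessTokenSecret = ${TWITTER_ACCESS_TOKEN_SECRET:-}
TwitterAgent.sources.Twitter.keywords = ${keywords}

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = ${hdfs_path}
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
EOF
}

# Example call (host and port are placeholders; take the real value from
# fs.default.name in core-site.xml):
#   write_twitter_conf /etc/flume/apache-flume-1.6.0-bin/conf/Twitter.conf \
#       "hdfs://<namenode-host>:<port>/user/flume/tweets" "hadoop, big data"
```

The agent name TwitterAgent in this file is the same name passed to flume-ng with -n on slide 20.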
  • 17. Checks before running Flume: Setting the Time Zone
      • Make sure that the time shown in your VM matches the time on your local machine. If it does not, reset it as shown below. You can check the time in your VM with the “date” command.
      • If your time zone already matches, you can skip the next two steps.
      • The time zone is controlled by the /etc/localtime file. The available time zones are listed under the /usr/share/zoneinfo/ directory.
      • cd /etc
      • ln -s /usr/share/zoneinfo/US/Eastern localtime
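One practical detail for the two commands on slide 17: ln -s fails if /etc/localtime already exists. A minimal sketch that moves the existing file aside first is shown below. The function name set_timezone_link is hypothetical, and on the VM it would need to run as root.

```shell
# Point a localtime file at a zoneinfo entry, keeping a backup of the old one.
# Arguments: <zoneinfo file> <localtime path>
# (set_timezone_link is a hypothetical helper.)
set_timezone_link() {
    zone_file=$1
    target=$2

    if [ ! -e "$zone_file" ]; then
        echo "no such zone file: $zone_file" >&2
        return 1
    fi

    # ln -s refuses to overwrite, so move any existing file or link aside.
    if [ -e "$target" ] || [ -L "$target" ]; then
        mv "$target" "$target.bak" || return 1
    fi

    ln -s "$zone_file" "$target"
}

# Example call with the zone from slide 17 (run as root on the VM):
#   set_timezone_link /usr/share/zoneinfo/US/Eastern /etc/localtime
```

Running “date” afterwards shows whether the VM now reports the expected local time.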
  • 18. Checks before running Flume: Setting Oracle VirtualBox Properties
      • You need to make sure that the time in the VM can always be reset as in the previous step. To do that, set the following properties in VirtualBox.
      • In Windows, open a command prompt in the VirtualBox folder: go to C:\Program Files\Oracle, click the VirtualBox folder to select it, then hold the left Shift key, right-click, and select "Open command window here". The command prompt should now be running.
  • 19. Checks before running Flume: Setting Oracle VirtualBox Properties (contd.)
      • Run the following commands in the command prompt, replacing ${VMNAME} with the name of your VM:
      • VBoxManage setextradata ${VMNAME} "VBoxInternal/Devices/VMMDev/0/Config/GetHostTimeDisabled" 1
      • VBoxManage guestproperty set ${VMNAME} "/VirtualBox/GuestAdd/VBoxService/--timesync-interval" 10000
      • VBoxManage guestproperty set ${VMNAME} "/VirtualBox/GuestAdd/VBoxService/--timesync-min-adjust" 100
      • VBoxManage guestproperty set ${VMNAME} "/VirtualBox/GuestAdd/VBoxService/--timesync-set-on-restore" 1
      • VBoxManage guestproperty set ${VMNAME} "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold" 1000
  • 20. Running Flume
      • Go to the Flume bin directory and run the Flume agent using the following command:
      • flume-ng agent -n TwitterAgent -c conf -f /etc/flume/apache-flume-1.6.0-bin/conf/Twitter.conf
      • After some time, you should start seeing files under the directory specified in the conf file.
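Before starting the agent on slide 20, it can help to confirm that the file passed with -f exists (file names on Linux are case-sensitive) and that the name passed with -n is actually defined in it. The sketch below does both checks; check_agent_conf is a hypothetical helper, not part of Flume.

```shell
# Succeed only if the config file exists and declares sources for the agent.
# Arguments: <conf file> <agent name>
check_agent_conf() {
    conf_file=$1
    agent=$2

    if [ ! -f "$conf_file" ]; then
        echo "config not found: $conf_file" >&2
        return 1
    fi

    # Look for a line such as "TwitterAgent.sources = Twitter".
    if ! grep -q "^[[:space:]]*$agent\.sources[[:space:]]*=" "$conf_file"; then
        echo "agent '$agent' is not defined in $conf_file" >&2
        return 1
    fi
}

# Example (run on the VM). The -Dflume.root.logger option is optional and
# sends the agent's log output to the console:
#   conf=/etc/flume/apache-flume-1.6.0-bin/conf/Twitter.conf
#   check_agent_conf "$conf" TwitterAgent &&
#       flume-ng agent -n TwitterAgent -c conf -f "$conf" \
#           -Dflume.root.logger=INFO,console
```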
  • 21. Error Catalog
      • You may run into the following frequently occurring errors while running Flume.
      • Error: java.lang.NoSuchMethodError: twitter4j.FilterQuery.setIncludeEntities(Z)Ltwitter4j/FilterQuery;
      • Fix: This happens because FilterQuery.class occurs in two different jars (one of which is flume-snapshot.jar). You can find the clashing jars by running “find . -name "*.jar" | xargs grep FilterQuery.class” in the Flume lib directory. Rename the other jar by appending .org to its name.
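The search from slide 21 can be wrapped in a function that prints only the names of the matching jars and copes with spaces in file names. This is a sketch; find_jars_with_class is a hypothetical name. It relies on the same idea as the command on the slide: entry names are stored as plain text inside a jar, so a fixed-string grep can find them.

```shell
# List every jar under a directory that contains the given class entry.
# Arguments: <directory> <entry name, e.g. twitter4j/FilterQuery.class>
find_jars_with_class() {
    dir=$1
    entry=$2

    # -l prints only file names; -F treats the entry as a fixed string.
    find "$dir" -type f -name '*.jar' -exec grep -l -F -- "$entry" {} +
}

# Example (run on the VM):
#   find_jars_with_class /etc/flume/apache-flume-1.6.0-bin/lib \
#       twitter4j/FilterQuery.class
```

If more than one jar is listed, the one that is not the snapshot jar is the candidate for renaming as described on the slide.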
  • 22. Error Catalog (contd.)
      • Error: java.io.IOException: Callable timed out after 10000 ms on file:
      • Fix: This happens because of too many connections to Twitter from your account. Wait for some time and try again.