A part of the Nordic IT group EVRY
Infopulse
Oleksiy Krotov (Expert Oracle DBA)
19.01.2016
BIG DATA: Apache Hadoop
Apache Hadoop
HADOOP ARCHITECTURE
HADOOP INTERFACE
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
HADOOP MAPREDUCE
ORACLE BIG DATA
RESOURCES
Hadoop Architecture
Apache Hadoop is an open-source framework for distributed storage and
distributed processing of very large data sets. It has two core parts:
the storage part, known as the Hadoop Distributed File System (HDFS), and
the processing part, called MapReduce.
Hadoop splits files into large blocks and distributes them across nodes in
a cluster. To process data, Hadoop transfers packaged code for nodes to
process in parallel based on the data that needs to be processed.
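The block splitting described above can be illustrated with a minimal sketch. It is not Hadoop code: the 128 MB block size, the node names, and the round-robin assignment are assumptions chosen for illustration, and real HDFS placement also takes replication and rack layout into account.

```python
from itertools import cycle

# Assumed block size for illustration; the real value is a cluster setting.
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def assign_blocks(blocks, nodes):
    """Toy policy: hand blocks to nodes in round-robin order."""
    placement = {node: [] for node in nodes}
    for block, node in zip(blocks, cycle(nodes)):
        placement[node].append(block)
    return placement


# A hypothetical 300 MB file spread over two hypothetical nodes.
blocks = split_into_blocks(300 * 1024 * 1024)
placement = assign_blocks(blocks, ["node1", "node2"])
```

Here the file yields two full blocks and one shorter final block, and each node ends up holding the blocks a local task would process.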
Biggest Hadoop deployment: Yahoo! has more than 100,000 CPUs in over
40,000 servers running Hadoop, with its biggest Hadoop cluster
running 4,500 nodes. In total, Yahoo! stores 455 petabytes of data in
Hadoop (2014).
More than half of the Fortune 50 companies run open-source Apache
Hadoop based on Cloudera's distribution (2012).
The HDFS file system is not restricted to MapReduce jobs. It can be
used for other applications, many of which are under development at
Apache. The list includes the HBase database, the Apache Mahout
machine learning system, and the Apache Hive Data Warehouse
system. Hadoop can in theory be used for any sort of work that is
batch-oriented rather than real-time, is very data-intensive, and
benefits from parallel processing of data.
NameNode hosts metadata (file system index of files and blocks)
DataNode hosts the data (blocks)
JobTracker is a master which creates and runs the job
Hadoop Interface
[training@localhost ~]$ hdfs dfsadmin -report
Configured Capacity: 15118729216 (14.08 GB)
Present Capacity: 10163642368 (9.47 GB)
DFS Remaining: 9228095488 (8.59 GB)
DFS Used: 935546880 (892.21 MB)
DFS Used%: 9.2%
Under replicated blocks: 3
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: 127.0.0.1:50010 (localhost.localdomain)
Hostname: localhost.localdomain
Decommission Status : Normal
Configured Capacity: 15118729216 (14.08 GB)
DFS Used: 935546880 (892.21 MB)
Non DFS Used: 4955086848 (4.61 GB)
DFS Remaining: 9228095488 (8.59 GB)
DFS Used%: 6.19%
DFS Remaining%: 61.04%
Last contact: Mon Jan 18 14:05:48 EST 2016
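The two different "DFS Used%" values in the report can be reconciled with a little arithmetic. The sketch below only reuses the byte counts printed above. The observation that the cluster-level figure is relative to present capacity while the per-node figure is relative to configured capacity is inferred from these numbers, not taken from the Hadoop documentation.

```python
# Byte counts copied from the report above.
configured = 15118729216
non_dfs_used = 4955086848
dfs_used = 935546880
dfs_remaining = 9228095488

# Present capacity is what is left for HDFS after non-DFS usage.
present = configured - non_dfs_used


def pct(part, whole):
    """Percentage of whole taken by part, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


cluster_used_pct = pct(dfs_used, present)            # summary "DFS Used%"
node_used_pct = pct(dfs_used, configured)            # per-node "DFS Used%"
node_remaining_pct = pct(dfs_remaining, configured)  # per-node "DFS Remaining%"

print(present, cluster_used_pct, node_used_pct, node_remaining_pct)
```

With these inputs, present capacity also equals DFS Used plus DFS Remaining, which matches the "Present Capacity" line of the report.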
[training@localhost ~]$ hadoop fs -help get
-get [-ignoreCrc] [-crc] <src> ... <localdst>: Copy files that match the file pattern <src>
to the local name. <src> is kept. When copying multiple,
files, the destination must be a directory.
hadoop fs -ls
hadoop fs -put purchases.txt
hadoop fs -put access_log
hadoop fs -ls
hadoop fs -tail purchases.txt
hadoop fs -get filename
hs {mapper script} {reducer script} {input_file} {output directory}
hs mapper.py reducer.py myinput joboutput
Hadoop Distributed File System (HDFS)
HDFS is a Java-based file system that provides scalable and
reliable data storage, and it was designed to span large
clusters of commodity servers.
HDFS is a scalable, fault-tolerant, distributed storage system
that works closely with a wide variety of concurrent data
access applications
The default replication factor is 3, so each block is stored on three nodes:
two on the same rack, and one on a different rack.
Data nodes can talk to each other to rebalance data, to
move copies around, and to keep the replication of data
high.
Apache Hadoop can work with additional file systems:
FTP, Amazon S3, Windows Azure Storage Blobs (WASB)
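The replica placement rule mentioned above (two copies on one rack, one copy on another) can be sketched as a toy function. This is an illustration under stated assumptions, not the HDFS placement code: the rack and node names are hypothetical, and the choice of racks is random rather than based on where the writer runs.

```python
import random


def place_replicas(racks, rng=None):
    """Pick three nodes: two on one rack and one on a different rack.

    racks maps a rack name to the list of node names on that rack.
    """
    rng = rng or random.Random()
    # Racks that can hold the two co-located replicas.
    shared_candidates = sorted(r for r, nodes in racks.items() if len(nodes) >= 2)
    if not shared_candidates:
        raise ValueError("need a rack with at least two nodes")
    shared = rng.choice(shared_candidates)
    # Any other non-empty rack can hold the third replica.
    other_candidates = sorted(r for r, nodes in racks.items() if r != shared and nodes)
    if not other_candidates:
        raise ValueError("need a second rack with at least one node")
    other = rng.choice(other_candidates)
    first, second = rng.sample(racks[shared], 2)
    third = rng.choice(racks[other])
    return [(shared, first), (shared, second), (other, third)]


# Hypothetical cluster layout.
racks = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5"]}
replicas = place_replicas(racks, random.Random(0))
```

Losing a whole rack in this layout still leaves at least one copy of the block reachable on the other rack.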
Hadoop MapReduce
Hadoop MapReduce is a software framework for easily
writing applications which process vast amounts of
data (multi-terabyte data-sets) in-parallel on large
clusters (thousands of nodes) of commodity hardware
in a reliable, fault-tolerant manner.
A MapReduce job usually splits the input data-set into
independent chunks which are processed by the map
tasks in a completely parallel manner. The framework
sorts the outputs of the maps, which are then input to
the reduce tasks.
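The map, sort, and reduce steps described above can be imitated in a single process. The sketch below is a word count over two hypothetical input chunks. It runs the phases one after another, whereas Hadoop would run the map tasks in parallel on different nodes.

```python
from itertools import groupby
from operator import itemgetter


def map_words(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    for word in chunk.split():
        yield (word.lower(), 1)


def reduce_counts(key, values):
    """Reduce step: sum all counts that share the same key."""
    return (key, sum(values))


def run_job(chunks):
    # Each chunk is mapped independently of the others.
    mapped = [pair for chunk in chunks for pair in map_words(chunk)]
    # Stand-in for the framework's sort of the map outputs.
    mapped.sort(key=itemgetter(0))
    # Each run of equal keys becomes one reduce call.
    return [
        reduce_counts(key, (count for _, count in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    ]


result = run_job(["big data big", "data hadoop"])
```

The sort is what guarantees that the reduce step sees all values for a key together.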
Usage: $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar [options]
Options:
-input <path> DFS input file(s) for the Map step
-output <path> DFS output directory for the Reduce step
-mapper <cmd|JavaClassName> The streaming command to run
-combiner <cmd|JavaClassName> The streaming command to run
-reducer <cmd|JavaClassName> The streaming command to run
-file <file> File/dir to be shipped in the Job jar file
-inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
-outputformat TextOutputFormat(default)|JavaClassName Optional.
-partitioner JavaClassName Optional.
-numReduceTasks <num> Optional.
-inputreader <spec> Optional.
-cmdenv <n>=<v> Optional. Pass env.var to streaming commands
-mapdebug <path> Optional. To run this script when a map task fails
-reducedebug <path> Optional. To run this script when a reduce task fails
-io <identifier> Optional.
-verbose
hs {mapper script} {reducer script} {input_file} {output directory}
hs mapper.py reducer.py myinput joboutput
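What mapper.py and reducer.py might contain can be sketched as two functions that exchange tab-separated "key, value" text lines, the usual convention for streaming scripts. Everything specific here is an assumption: the six-field input layout (date, time, store, item, cost, payment), the sample records, and the choice to total sales per store. In a real streaming job each function would read from standard input and print its output lines.

```python
def mapper(lines):
    """Emit 'store<TAB>cost' for each well-formed input line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip malformed records
        _date, _time, store, _item, cost, _payment = fields
        yield f"{store}\t{cost}"


def reducer(sorted_lines):
    """Sum the values of consecutive lines that share a key."""
    current, total = None, 0.0
    for line in sorted_lines:
        key, value = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                yield f"{current}\t{total:.2f}"
            current, total = key, 0.0
        total += float(value)
    if current is not None:
        yield f"{current}\t{total:.2f}"


# Made-up sample records in the assumed layout.
sample = [
    "2012-01-01\t09:00\tKyiv\tBooks\t10.50\tVisa",
    "2012-01-01\t09:05\tOslo\tToys\t5.00\tCash",
    "2012-01-01\t09:10\tKyiv\tMusic\t2.25\tVisa",
    "malformed line",
]

# sorted() stands in for the sort that happens between map and reduce.
output = list(reducer(sorted(mapper(sample))))
```

The reducer relies on its input being sorted by key, which is why it only needs to remember the current key and running total.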
Oracle Big Data Connectors
Load Data into the Database
Oracle Loader for Hadoop
– MapReduce job transforms data on Hadoop
into Oracle-ready data types
– Use more Hadoop compute resources
Oracle SQL Connector for HDFS
– Oracle SQL access to data on Hadoop via
external tables
– Use more database compute resources
– Includes option to query in-place
Oracle Big Data Appliance X5-2
Enterprise-class security for Hadoop through Oracle Big Data SQL,
which also provides the ability to use a simple SQL query to quickly
explore data across Hadoop, SQL, and relational databases.
Resources
https://hadoop.apache.org/docs/stable/
https://en.wikipedia.org/wiki/Apache_Hadoop
https://developer.yahoo.com/hadoop/tutorial/
http://go.cloudera.com/udacity-lesson-1
http://content.udacity-data.com/courses/ud617/access_log.gz
http://content.udacity-data.com/courses/ud617/purchases.txt.gz
https://www.youtube.com/watch?v=acWtid-OOWM
http://www.oracle.com/technetwork/database/bigdata-appliance/overview/index.html
https://www.udacity.com/courses/ud617
Thank you for your attention!
Contact us!
Address:
03056,
24, Polyova Str.,
Kyiv, Ukraine
Phone:
+38 044 457-88-56
Email:
info@infopulse.com.ua
