TCloud Computing, Inc.
Hadoop Product Family
and Ecosystem
Agenda
• What is Big Data?
• Big Data Opportunities
• Hadoop
– Introduction to Hadoop
– Hadoop 2.0
– What’s next for Hadoop?
• Hadoop ecosystem
• Conclusion
What is Big Data?
A set of files A database A single file
4 V’s of Big Data
http://www.datasciencecentral.com/profiles/blogs/data-veracity
Big Data Expands on 4 Fronts
• Volume: MB, GB, TB, PB
• Velocity: batch, periodic, near real-time, real-time
• Variety
• Veracity
http://whatis.techtarget.com/definition/3Vs
Big Data Opportunities
http://www.sap.com/corporate-en/news.epx?PressID=21316
Big Data Revenue by Market Segment 2012
http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2012-2017
Big Data Market Forecast 2012-2017
http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2012-2017
Hadoop Solutions
The most common problems Hadoop can solve
Threat Analysis/Trade Surveillance
• Challenge:
– Detecting threats in the form of fraudulent activity or attacks
• Large data volumes involved
• Like looking for a needle in a haystack
• Solution with Hadoop:
– Parallel processing over huge datasets
– Pattern recognition to identify anomalies, i.e., threats
• Typical Industry:
– Security, Financial Services
Recommendation Engine
• Challenge:
– Using user data to predict which products to recommend
• Solution with Hadoop:
– Batch processing framework
• Allows execution in parallel over large datasets
– Collaborative filtering
• Collecting ‘taste’ information from many users
• Utilizing information to predict what similar users like
• Typical Industry
– ISP, Advertising
Walmart Case
Revenue ?
Friday
Beer
Diapers
http://tech.naver.jp/blog/?p=2412
Hadoop!
• Apache Hadoop project
– inspired by Google's MapReduce and Google File System
papers.
• Open source, flexible, and available architecture for
large-scale computation and data processing on a
network of commodity hardware
• Open Source Software + Commodity Hardware
– IT cost reduction
Hadoop Concepts
• Distribute the data as it is initially stored in the system
• Moving Computation is Cheaper than Moving Data
• Individual nodes can work on data local to those nodes
• Users can focus on developing applications.
Hadoop 2.0
• Hadoop 2.2.0 is expected to reach GA (general availability) in fall 2013
• HDFS Federation
• HDFS High Availability (HA)
• Hadoop YARN (MapReduce 2.0)
HDFS Federation - Limitation of Hadoop 1.0
• Scalability
– Storage scales horizontally - namespace doesn’t
• Performance
– File system operations throughput limited by a single node
• Poor isolation
– All the tenants share a single namespace
HDFS Federation
• Multiple independent NameNodes and Namespace
Volumes in a cluster
– Namespace Volume = Namespace + Block Pool
• Block Storage as generic storage service
– Set of blocks for a Namespace Volume is called a Block Pool
– DNs store blocks for all the Namespace Volumes – no
partitioning
HDFS Federation
[Figure: namespace layout in Hadoop and in Hadoop 2.0; namespace labels shown include /home/, /app/Hive and /app/HBase]
http://hortonworks.com/blog/an-introduction-to-hdfs-federation/
HDFS High Availability (HA)
• The Secondary NameNode is not a backup NameNode
• http://www.youtube.com/watch?v=hEqQMLSXQlY
HDFS High Availability (HA)
https://issues.apache.org/jira/browse/HDFS-1623
Why do we need YARN
• Scalability
– Maximum Cluster size – 4,000 nodes
– Maximum concurrent tasks – 40,000
• Single point of failure
– Failure kills all queued and running jobs
• Lacks support for alternate paradigms
– Iterative applications implemented using MapReduce are 10x
slower
– Example: K-Means, PageRank
Hadoop YARN
http://hortonworks.com/hadoop/yarn/
Role of YARN
• Resource Manager
– Per-cluster
– Global resource scheduler
– Hierarchical queues
• Node Manager
– Per-machine agent
– Manages the life-cycle of container
– Container resource monitoring
• Application Master
– Per-application
– Manages application scheduling and task execution
– E.g. MapReduce Application Master
[Figure: the Job Tracker role maps to the Resource Manager and the Application Master]
Hadoop YARN Architecture
http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/
• Container
– Basic unit of allocation
– Ex. Container A = 2GB, 1CPU
– Fine-grained resource allocation
– Replaces the fixed map/reduce slots
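The container idea above can be sketched in plain Java. This is only an illustration under stated assumptions, not the YARN scheduler API: the class `NodeCapacity` and its `tryAllocate` method are hypothetical names, and any node size passed in is an assumed figure. It shows how a request such as "2GB, 1CPU" is granted only while both resources remain, instead of consuming a fixed map or reduce slot.

```java
// Hypothetical sketch of container-based allocation; not the YARN API.
class NodeCapacity {
    private int freeMemoryMb;
    private int freeCores;

    NodeCapacity(int memoryMb, int cores) {
        this.freeMemoryMb = memoryMb;
        this.freeCores = cores;
    }

    /** Grants a container only if both memory and CPU are still available. */
    boolean tryAllocate(int memoryMb, int cores) {
        if (memoryMb > freeMemoryMb || cores > freeCores) {
            return false;
        }
        freeMemoryMb -= memoryMb;
        freeCores -= cores;
        return true;
    }

    int freeMemoryMb() { return freeMemoryMb; }

    int freeCores() { return freeCores; }
}
```

Because each request names its own memory and CPU, a node can host a mix of small and large containers, which fixed slots cannot express.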
What’s next for Hadoop?
• Real-time
– Apache Tez
• Part of Stinger
– Spark
• SQL in Hadoop
– Stinger
• An immediate aim of 100x performance increase for Hive is more
ambitious than any other effort.
• Based on industry standard SQL, the Stinger Initiative improves
HiveQL to deliver SQL compatibility.
– Shark
What’s next for Hadoop?
• Security: Data encryption
– HADOOP-9331: Hadoop crypto codec framework and crypto codec implementations
• HADOOP-9332: Crypto codec implementations for AES
• HADOOP-9333: Hadoop crypto codec framework based on compression codec
• MAPREDUCE-5025: Key Distribution and Management for supporting crypto codec in MapReduce
• 2013/09/28 Hadoop in Taiwan 2013
– Hadoop Security: Now and future
– Session B, 16:00~16:40
The Hadoop Ecosystems
Growing Hadoop Ecosystem
• The term ‘Hadoop’ is taken to be the combination of
HDFS and MapReduce
• There are numerous other projects surrounding Hadoop
– Typically referred to as the ‘Hadoop Ecosystem’
• Zookeeper
• Hive and Pig
• HBase
• Flume
• Other Ecosystem Projects
– Sqoop
– Oozie
– Mahout
The Ecosystem is the System
• Hadoop has become the kernel of the distributed
operating system for Big Data
• No one uses the kernel alone
• A collection of projects at Apache
Relation Map
• MapReduce Runtime (Dist. Programming Framework)
• Hadoop Distributed File System (HDFS)
• HBase (Column NoSQL DB)
• Sqoop/Flume (Data integration)
• Oozie (Job Workflow & Scheduling)
• Pig/Hive (Analytical Language)
• Mahout (Data Mining)
• YARN
• ZooKeeper (Coordination)
• Tez (near real-time processing)
• Spark (in-memory)
• Shark
ZooKeeper – Coordination Framework
What is ZooKeeper
• A centralized service for
– Maintaining configuration information
– Providing distributed synchronization
• A set of tools to build distributed applications that can
safely handle partial failures
• ZooKeeper was designed to store coordination data
– Status information
– Configuration
– Location information
Why use ZooKeeper?
• Manage configuration across nodes
• Implement reliable messaging
• Implement redundant services
• Synchronize process execution
ZooKeeper Architecture
– All servers store a copy of the data (in memory)
– A leader is elected at startup
– 2 roles – leader and follower
• Followers service clients, all updates go through leader
• Update responses are sent when a majority of servers have persisted the
change
– HA support
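The majority rule for updates can be illustrated with a small Java sketch. It is not ZooKeeper's implementation: `QuorumSketch`, `hasMajority` and `tolerableFailures` are hypothetical names chosen for this example. It only shows the arithmetic behind "a majority of servers have persisted the change".

```java
// Hypothetical sketch of majority-quorum arithmetic; not ZooKeeper code.
class QuorumSketch {
    /** True when strictly more than half of the ensemble has persisted the change. */
    static boolean hasMajority(int acks, int ensembleSize) {
        return acks > ensembleSize / 2;
    }

    /** Number of server failures an ensemble can absorb while a majority can still form. */
    static int tolerableFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }
}
```

Under this rule an ensemble of 4 tolerates no more failures than an ensemble of 3, which is why odd ensemble sizes are the usual choice.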
HBase – Column NoSQL DB
Structured Data vs. Raw Data
HBase – Inspired by Google Bigtable
• Apache open source project
• Inspired by Google Bigtable
• Non-relational, distributed database written in Java
• Coordinated by ZooKeeper
Row & Column Oriented
HBase – Data Model
• Cells are “versioned”
• Table rows are sorted by row key
• Region – a row range [start-key:end-key]
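The three properties of the data model can be mimicked with nested sorted maps. This is a hypothetical in-memory model, not the HBase client API: `VersionedTableSketch`, `put`, `getLatest` and `scan` are names invented for the sketch, and treating the row range as half-open (start key included, end key excluded) is an assumption of this example.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of sorted rows and versioned cells; not the HBase API.
class VersionedTableSketch {
    // row key -> column -> (timestamp -> value); rows are kept sorted by row key.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell; older versions are kept alongside it. */
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            // Reverse ordering keeps the newest timestamp first.
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell is absent. */
    String getLatest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null || !columns.containsKey(column)) {
            return null;
        }
        return columns.get(column).firstEntry().getValue();
    }

    /** Rows whose keys fall in [startKey, endKey), similar to one region's row range. */
    NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> scan(
            String startKey, String endKey) {
        return rows.subMap(startKey, true, endKey, false);
    }
}
```

Keeping rows sorted by key is what makes a contiguous key range cheap to scan and easy to split into regions.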
When to use HBase
• Need random, low latency access to the data
• Application has a flexible schema where each row is
slightly different
– Add columns on the fly
• Most columns are NULL in each row
Flume / Sqoop – Data Integration Framework
What’s the problem with data collection?
• Data collection is currently a priori and ad hoc
• A priori – decide what you want to collect ahead of time
• Ad hoc – each kind of data source goes through its own
collection path
Flume (and how can it help?)
• A distributed data collection service
• Efficiently collects, aggregates, and moves large
amounts of data
• Fault tolerant, with many failover and recovery mechanisms
• One-stop solution for data collection of all formats
An example flow
Sqoop
• Easy, parallel database import/export
• What do you want to do?
– Import data from an RDBMS into HDFS
– Export data from HDFS back into an RDBMS
What is Sqoop
• A suite of tools that connect Hadoop and database
systems
• Import tables from databases into HDFS for deep
analysis
• Export MapReduce results back to a database for
presentation to end-users
• Provides the ability to import from SQL databases
straight into your Hive data warehouse
How Sqoop helps
• The Problem
– Structured data in traditional databases cannot be easily
combined with complex data stored in HDFS
• Sqoop (SQL-to-Hadoop)
– Easy import of data from many databases to HDFS
– Generate code for use in MapReduce applications
Why Sqoop
• JDBC-based implementation
– Works with many popular database vendors
• Auto-generation of tedious user-side code
– Write MapReduce applications to work with your data, faster
• Integration with Hive
– Allows you to stay in a SQL-based environment
Pig / Hive – Analytical Language
Why Hive and Pig?
• Although MapReduce is very powerful, it can also be
complex to master
• Many organizations have business or data analysts who
are skilled at writing SQL queries, but not at writing Java
code
• Many organizations have programmers who are skilled
at writing code in scripting languages
• Hive and Pig are two projects which evolved separately
to help such people analyze huge amounts of data via
MapReduce
– Hive was initially developed at Facebook, Pig at Yahoo!
Hive – Developed by Facebook
• What is Hive?
– An SQL-like interface to Hadoop
• Data warehouse infrastructure that provides data
summarization and ad hoc querying on top of Hadoop
– MapReduce for execution
– HDFS for storage
• Hive Query Language
– Basic SQL: Select, From, Join, Group-By
– Equi-Join, Multi-Table Insert, Multi-Group-By
– Batch query
SELECT storeid, COUNT(*) FROM purchases WHERE price > 100 GROUP BY storeid
Hive/MR vs. Hive/Tez
http://www.slideshare.net/adammuise/2013-jul-23thughivetuningdeepdive
Pig – Initiated by Yahoo!
• A high-level scripting language (Pig Latin)
• Processes data one step at a time
• A simple way to write MapReduce programs
• Easy to understand
• Easy to debug
A = LOAD 'a.txt' AS (id, name, age, ...);
B = LOAD 'b.txt' AS (id, address, ...);
C = JOIN A BY id, B BY id;
STORE C INTO 'c.txt';
Hive vs. Pig
                     Hive                            Pig
Language             HiveQL (SQL-like)               Pig Latin, a scripting language
Schema               Table definitions that are      A schema is optionally defined
                     stored in a metastore           at runtime
Programmatic Access  JDBC, ODBC                      PigServer
WordCount Example
• Input
Hello World Bye World
Hello Hadoop Goodbye Hadoop
• For the given sample input the map emits
< Hello, 1>
< World, 1>
< Bye, 1>
< World, 1>
< Hello, 1>
< Hadoop, 1>
< Goodbye, 1>
< Hadoop, 1>
• The reduce just sums up the values
< Bye, 1>
< Goodbye, 1>
< Hadoop, 2>
< Hello, 2>
< World, 2>
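WordCount Example In MapReduce
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
    // Mapper: emits <word, 1> for every token in each input line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver: args[0] is the input path, args[1] is the output path.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}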
WordCount Example By Pig
A = LOAD 'wordcount/input' USING PigStorage() AS (token:chararray);
B = GROUP A BY token;
C = FOREACH B GENERATE group, COUNT(A) as count;
DUMP C;
WordCount Example By Hive
CREATE TABLE wordcount (token STRING);
LOAD DATA LOCAL INPATH 'wordcount/input'
OVERWRITE INTO TABLE wordcount;
SELECT token, count(*) FROM wordcount GROUP BY token;
Spark / Shark - Analytical Language
Why Spark
• MapReduce is too slow
• Aims to make data analytics fast: both fast to run and
fast to write.
• Useful when your workload calls for iterative algorithms
What is Spark
• In-memory distributed computing framework
• Created by the UC Berkeley AMP Lab in 2010
• Targets problems that Hadoop MR is bad at
– Iterative algorithms (machine learning)
– Interactive data mining
• More general purpose than Hadoop MR
• Active contributions from ~15 companies
BDAS, the Berkeley Data Analytics Stack
https://amplab.cs.berkeley.edu/software/
What Is Different between Hadoop and Spark
[Figure: Spark chains Data Source, Map(), Data Source 2, Join(), Cache() and Transform steps; Hadoop chains HDFS, Map, Reduce, Map, Reduce]
http://spark.incubator.apache.org
What is Shark
• A data analytics (warehouse) system that
– Is a port of Apache Hive to run on Spark
– Is compatible with existing Hive data, metastores, and queries (HiveQL,
UDFs, etc.)
– Delivers speedups of up to 40x over Hive
– Scales out and is fault-tolerant
– Supports low-latency, interactive queries through in-memory
computing
Shark Architecture
[Figure: CLI and Thrift/JDBC clients; a Driver containing the SQL Parser, Query Optimizer, Physical Plan Execution and Cache Mgr.; the Hive Meta Store; Spark; HDFS/HBase]
Oozie – Job Workflow & Scheduling
What is Oozie?
• A Java web application
• Oozie is a workflow scheduler for Hadoop
• Like crond for Hadoop
[Figure: workflow graph of Job 1 through Job 5]
Why Oozie
• Why use Oozie instead of just cascading jobs one
after another?
• Major flexibility
– Start, Stop, Suspend, and re-run jobs
• Oozie allows you to restart from a failure
– You can tell Oozie to restart a job from a specific node in the
graph or to skip specific failed nodes
How it is triggered
• Time
– Execute your workflow every 15 minutes
• Time and Data
– Materialize your workflow every hour, but only run it when
the input data is ready.
[Figure: two timelines, 00:15 00:30 00:45 01:00 for the time trigger and 01:00 02:00 03:00 04:00 for the time and data trigger, the latter checking Hadoop with "Input Data Exists?"]
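The two trigger styles can be sketched in plain Java. This is not the Oozie coordinator API: `TriggerSketch`, `materialize` and `runnable` are hypothetical names, and representing time as minutes from the start of the schedule is an assumption made to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of time and time-plus-data triggers; not the Oozie API.
class TriggerSketch {
    /** Time trigger: the minutes at which a workflow is materialized, every intervalMinutes. */
    static List<Integer> materialize(int intervalMinutes, int horizonMinutes) {
        List<Integer> times = new ArrayList<>();
        for (int t = intervalMinutes; t <= horizonMinutes; t += intervalMinutes) {
            times.add(t);
        }
        return times;
    }

    /** Time-and-data trigger: keeps only the materialized times whose input data exists. */
    static List<Integer> runnable(List<Integer> materialized, Set<Integer> inputReadyAt) {
        List<Integer> ready = new ArrayList<>();
        for (int t : materialized) {
            if (inputReadyAt.contains(t)) {
                ready.add(t);
            }
        }
        return ready;
    }
}
```

Separating materialization from the data check mirrors the slide: the schedule decides when a run is created, and the data check decides whether it may actually start.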
Oozie use criteria
• Need to launch, control, and monitor jobs from your Java
apps
– Java Client API/Command Line Interface
• Need to control jobs from anywhere
– Web Service API
• Have jobs that you need to run every hour, day, or week
• Need to receive notification when a job is done
– Email when a job is complete
Mahout – Data Mining
What is Mahout
• Machine-learning tool
• Distributed and scalable machine learning algorithms on
the Hadoop platform
• Makes building intelligent applications easier and faster
Why Mahout
• Current state of ML libraries
– Lack community
– Lack documentation and examples
– Lack scalability
– Are research oriented
Mahout – scale
• Scale to large datasets
– Hadoop MapReduce implementations that scale linearly with
data
• Scalable to support your business case
– Mahout is distributed under a commercially friendly Apache
Software license
• Scalable community
– Vibrant, responsive and diverse
Mahout – four use cases
• Mahout machine learning algorithms
– Recommendation mining: takes users’ behavior and finds items
a specified user might like
– Clustering: takes e.g. text documents and groups them based
on related document topics
– Classification: learns from existing categorized documents what
documents of a specific category look like and is able to assign
unlabeled documents to the appropriate category
– Frequent item set mining: takes a set of item groups (e.g. terms
in a query session, shopping cart content) and identifies which
individual items typically appear together
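Frequent item set mining can be illustrated with a naive pair count in Java. This is a sketch for intuition only, not the algorithm Mahout ships and not its API: `PairCountSketch` and `countPairs` are hypothetical names, and joining the two item names with "+" to form a key is an arbitrary choice for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: a naive co-occurrence count, not Mahout's implementation.
class PairCountSketch {
    /** Counts how many baskets contain each unordered pair of distinct items. */
    static Map<String, Integer> countPairs(List<List<String>> baskets) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> basket : baskets) {
            // Sort and de-duplicate so each pair has one canonical key per basket.
            List<String> items = new ArrayList<>(new TreeSet<>(basket));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "+" + items.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Pairs with the highest counts are the candidates for "items that typically appear together"; this quadratic loop per basket is exactly the part a distributed implementation has to scale.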
Use case Example
• Predict what the user likes based on
– His/her historical behavior
– Aggregate behavior of people similar to him/her
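A minimal user-based sketch of this idea in Java follows. It is not Mahout's recommender API: `RecommenderSketch`, `similarity` and `recommend` are hypothetical names, Jaccard similarity over liked-item sets is a swapped-in choice for illustration, and the code assumes the target user is present in the map.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of user-based collaborative filtering; not Mahout's API.
class RecommenderSketch {
    /** Jaccard similarity between two users' liked-item sets. */
    static double similarity(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return (double) common.size() / union.size();
    }

    /** Items liked by the most similar other user that the target user has not liked yet. */
    static Set<String> recommend(String user, Map<String, Set<String>> likes) {
        Set<String> own = likes.get(user); // assumes the user exists in the map
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Set<String>> e : likes.entrySet()) {
            if (e.getKey().equals(user)) {
                continue;
            }
            double score = similarity(own, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        Set<String> result = new TreeSet<>();
        if (best != null) {
            result.addAll(likes.get(best));
            result.removeAll(own);
        }
        return result;
    }
}
```

The user's own history drives the similarity score, and the neighbor's history supplies the candidates, which matches the two inputs listed above.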
Conclusion
• Big Data Opportunities
– The market is still growing
• Hadoop 2.0
– Federation
– HA
– YARN
• What’s next for Hadoop
– Real-time query
– Data encryption
• What other projects are included in the Hadoop
ecosystem
– Different projects for different purposes
– Choose the right tools for your needs
Recap – Hadoop Ecosystem
• MapReduce Runtime (Dist. Programming Framework)
• Hadoop Distributed File System (HDFS)
• HBase (Column NoSQL DB)
• Sqoop/Flume (Data integration)
• Oozie (Job Workflow & Scheduling)
• Pig/Hive (Analytical Language)
• Mahout (Data Mining)
• YARN
• ZooKeeper (Coordination)
• Tez (near real-time processing)
• Spark (in-memory)
• Shark
Questions?
Thank you!

Mais conteúdo relacionado

Mais procurados

Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop IntroductionDzung Nguyen
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Uwe Printz
 
Hadoop tools with Examples
Hadoop tools with ExamplesHadoop tools with Examples
Hadoop tools with ExamplesJoe McTee
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv larsgeorge
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Uwe Printz
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaEdureka!
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview EMC
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introductionXuan-Chao Huang
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 

Mais procurados (20)

Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop Introduction
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop tools with Examples
Hadoop tools with ExamplesHadoop tools with Examples
Hadoop tools with Examples
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
 
Pptx present
Pptx presentPptx present
Pptx present
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 

Destaque

Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsDatabricks
 
Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.Konstantinos Kloudas
 
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingApache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingGerger
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSAmazon Web Services
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsDr. Mirko Kämpf
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
 
What the Spark!? Intro and Use Cases
What the Spark!? Intro and Use CasesWhat the Spark!? Intro and Use Cases
What the Spark!? Intro and Use CasesAerospike, Inc.
 

Destaque (8)

Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on TutorialsSparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
 
Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.
 
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingApache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster Computing
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Apache Spark Briefing
Apache Spark BriefingApache Spark Briefing
Apache Spark Briefing
 
What the Spark!? Intro and Use Cases
What the Spark!? Intro and Use CasesWhat the Spark!? Intro and Use Cases
What the Spark!? Intro and Use Cases
 

Semelhante a Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3

Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache HadoopKMS Technology
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platformnvvrajesh
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDYVenneladonthireddy1
 
9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdf9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdfManoel Ribeiro
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxDr.Florence Dayana
 

Semelhante a Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3 (20)

List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3

  • 1. TCloud Computing, Inc. Hadoop Product Family and Ecosystem
  • 2. Agenda • What is Big Data? • Big Data Opportunities • Hadoop – Introduction to Hadoop – Hadoop 2.0 – What’s next for Hadoop? • Hadoop ecosystem • Conclusion
  • 3. What is Big Data? • A set of files • A database • A single file
  • 4. 4 V’s of Big Data http://www.datasciencecentral.com/profiles/blogs/data-veracity
  • 5. Big data Expands on 4 fronts Velocity Volume Variety Veracity MB GB TB PB batch periodic near Real-Time Real-Time http://whatis.techtarget.com/definition/3Vs
  • 7. Big Data Revenue by Market Segment 2012 http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2012-2017
  • 8. Big Data Market Forecast 2012-2017 http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2012-2017
  • 9. Hadoop Solutions The most common problems Hadoop can solve
  • 10. Threat Analysis/Trade Surveillance • Challenge: – Detecting threats in the form of fraudulent activity or attacks • Large data volumes involved • Like looking for a needle in a haystack • Solution with Hadoop: – Parallel processing over huge datasets – Pattern recognition to identify anomalies, i.e., threats • Typical Industry: – Security, Financial Services
  • 11. Recommendation Engine • Challenge: – Using user data to predict which products to recommend • Solution with Hadoop: – Batch processing framework • Allows execution in parallel over large datasets – Collaborative filtering • Collecting ‘taste’ information from many users • Utilizing that information to predict what similar users like • Typical Industry: – ISP, Advertising
  • 15. • Apache Hadoop project – inspired by Google's MapReduce and Google File System papers. • Open source, flexible and available architecture for large-scale computation and data processing on a network of commodity hardware • Open source software + commodity hardware – IT cost reduction
  • 16. Hadoop Concepts • Distribute the data as it is initially stored in the system • Moving Computation is Cheaper than Moving Data • Individual nodes can work on data local to those nodes • Users can focus on developing applications.
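The locality idea on this slide can be sketched as a tiny scheduler that prefers a node already holding a replica of the block. This is a minimal illustration, not Hadoop's actual scheduler; the function name `schedule`, the node names, and the slot counts are all hypothetical.

```python
def schedule(block_replicas, slots):
    """Assign each block to a node, preferring nodes that store the block.

    block_replicas: block -> list of nodes holding a replica (hypothetical layout)
    slots: node -> number of free task slots (mutated as tasks are placed)
    Returns block -> (node, is_data_local).
    """
    plan = {}
    for block, holders in block_replicas.items():
        # First choice: a node that already has the data and a free slot.
        local = [n for n in holders if slots.get(n, 0) > 0]
        # Fallback: any node with a free slot (the data must then be moved).
        candidates = local or [n for n, free in slots.items() if free > 0]
        if not candidates:
            break
        node = candidates[0]
        slots[node] -= 1
        plan[block] = (node, node in holders)
    return plan


replicas = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
free_slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
plan = schedule(replicas, free_slots)
print(plan)
```

In this greedy sketch only `b1` runs data-local; a real scheduler would weigh more factors (rack locality, waiting time) before giving up on locality.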
  • 17. Hadoop 2.0 • Hadoop 2.2.0 is expected to reach GA (general availability) in fall 2013 • HDFS Federation • HDFS High Availability (HA) • Hadoop YARN (MapReduce 2.0)
  • 18. HDFS Federation - Limitation of Hadoop 1.0 • Scalability – Storage scales horizontally - namespace doesn’t • Performance – File system operations throughput limited by a single node • Poor isolation – All the tenants share a single namespace
  • 19. HDFS Federation • Multiple independent NameNodes and Namespace Volumes in a cluster – Namespace Volume = Namespace + Block Pool • Block Storage as generic storage service – Set of blocks for a Namespace Volume is called a Block Pool – DNs store blocks for all the Namespace Volumes – no partitioning
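A toy model may help picture the relationship between Namespace Volumes, Block Pools, and DataNodes. The sketch below is an assumption-laden illustration, not HDFS code: the class names, the `write_file` helper, the pool IDs, and the round-robin placement are all hypothetical.

```python
class NameNode:
    """One independent namespace plus the ID of its block pool."""

    def __init__(self, name, block_pool_id):
        self.name = name
        self.block_pool_id = block_pool_id
        self.files = {}  # path -> list of block ids


class DataNode:
    """Generic block storage shared by every namespace volume."""

    def __init__(self):
        self.blocks = {}  # (block_pool_id, block_id) -> bytes

    def store(self, block_pool_id, block_id, data):
        self.blocks[(block_pool_id, block_id)] = data


def write_file(namenode, datanodes, path, chunks):
    block_ids = []
    for i, chunk in enumerate(chunks):
        block_id = f"{path}#{i}"
        # Any DataNode may hold blocks of any pool: there is no partitioning.
        datanodes[i % len(datanodes)].store(namenode.block_pool_id, block_id, chunk)
        block_ids.append(block_id)
    namenode.files[path] = block_ids


nn_hive = NameNode("nn-hive", "BP-hive")
nn_hbase = NameNode("nn-hbase", "BP-hbase")
dns = [DataNode(), DataNode()]
write_file(nn_hive, dns, "/app/Hive/t1", [b"a", b"b"])
write_file(nn_hbase, dns, "/app/HBase/t2", [b"c", b"d"])
pools_on_dn0 = {pool for pool, _ in dns[0].blocks}
print(pools_on_dn0)
```

Each NameNode only knows its own files, while every DataNode ends up holding blocks from both pools.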
  • 20. HDFS Federation (figure: Hadoop vs. Hadoop 2.0, with namespaces /home, /app/Hive and /app/HBase) http://hortonworks.com/blog/an-introduction-to-hdfs-federation/
  • 21. HDFS High Availability (HA) • The Secondary NameNode is not a standby NameNode: it only checkpoints the namespace and does not provide failover • http://www.youtube.com/watch?v=hEqQMLSXQlY
  • 22. HDFS High Availability (HA) https://issues.apache.org/jira/browse/HDFS-1623
  • 23. Why do we need YARN • Scalability – Maximum Cluster size – 4,000 nodes – Maximum concurrent tasks – 40,000 • Single point of failure – Failure kills all queued and running jobs • Lacks support for alternate paradigms – Iterative applications implemented using MapReduce are 10x slower – Example: K-Means, PageRank
  • 25. Role of YARN • Resource Manager – Per-cluster – Global resource scheduler – Hierarchical queues • Node Manager – Per-machine agent – Manages the life-cycle of containers – Container resource monitoring • Application Master – Per-application – Manages application scheduling and task execution – E.g. MapReduce Application Master (figure: the Job Tracker role is split into Resource Manager and Application Master)
  • 26. Hadoop YARN architecture http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/ • Container – Basic unit of allocation – Ex. Container A = 2GB, 1 CPU – Fine-grained resource allocation – Replaces the fixed map/reduce slots
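To illustrate what "fine-grained resource allocation" means compared with fixed slots, here is a minimal first-fit allocator. It is only a sketch under stated assumptions: the `Node` class, the `allocate` function, and the node sizes are hypothetical, and real YARN schedulers are considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mb: int
    free_vcores: int


def allocate(nodes, mem_mb, vcores):
    """Return the name of the first node that can host the container, or None."""
    for node in nodes:
        if node.free_mb >= mem_mb and node.free_vcores >= vcores:
            # Reserve exactly what was requested instead of a fixed-size slot.
            node.free_mb -= mem_mb
            node.free_vcores -= vcores
            return node.name
    return None


nodes = [Node("nm1", 3072, 2), Node("nm2", 8192, 4)]
# Request four containers like "Container A = 2GB, 1 CPU".
placements = [allocate(nodes, 2048, 1) for _ in range(4)]
print(placements)
```

A later request for 4 GB would be refused here because no single node has that much memory left, even though the cluster as a whole still has free capacity.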
  • 27. What’s next for Hadoop? • Real-time – Apache Tez • Part of Stinger – Spark • SQL in Hadoop – Stinger • An immediate aim of 100x performance increase for Hive is more ambitious than any other effort. • Based on industry standard SQL, the Stinger Initiative improves HiveQL to deliver SQL compatibility. – Shark
  • 28. What’s next for Hadoop? • Security: Data encryption – hadoop-9331: Hadoop crypto codec framework and crypto codec implementations • hadoop-9332: Crypto codec implementations for AES • hadoop-9333: Hadoop crypto codec framework based on compression codec • mapreduce-5025: Key Distribution and Management for supporting crypto codec in Map Reduce • 2013/09/28 Hadoop in Taiwan 2013 – Hadoop Security: Now and future – Session B, 16:00~16:40
  • 30. Growing Hadoop Ecosystem • The term ‘Hadoop’ is taken to be the combination of HDFS and MapReduce • There are numerous other projects surrounding Hadoop – Typically referred to as the ‘Hadoop Ecosystem’ • Zookeeper • Hive and Pig • HBase • Flume • Other Ecosystem Projects – Sqoop – Oozie – Mahout
  • 31. The Ecosystem is the System • Hadoop has become the kernel of the distributed operating system for Big Data • No one uses the kernel alone • A collection of projects at Apache
  • 32. Relation Map • MapReduce Runtime (Dist. Programming Framework) • Hadoop Distributed File System (HDFS) • HBase (Column NoSQL DB) • Sqoop/Flume (Data integration) • Oozie (Job Workflow & Scheduling) • Pig/Hive (Analytical Language) • Mahout (Data Mining) • YARN • ZooKeeper (Coordination) • Tez (near real-time processing) • Spark (in-memory) • Shark
  • 33. ZooKeeper – Coordination Framework
  • 34. What is ZooKeeper • A centralized service for – Maintaining configuration information – Providing distributed synchronization • A set of tools to build distributed applications that can safely handle partial failures • ZooKeeper was designed to store coordination data – Status information – Configuration – Location information
  • 35. Why use ZooKeeper? • Manage configuration across nodes • Implement reliable messaging • Implement redundant services • Synchronize process execution
  • 36. ZooKeeper Architecture – All servers store a copy of the data (in memory) – A leader is elected at startup – 2 roles – leader and follower • Followers service clients, all updates go through leader • Update responses are sent when a majority of servers have persisted the change – HA support
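The majority rule on this slide can be shown with a toy ensemble. This is a deliberately simplified sketch, not ZooKeeper's actual protocol: the `Ensemble` class and its fields are hypothetical, and leader election, ordering, and recovery are left out.

```python
class Ensemble:
    """Toy replicated store: an update succeeds only with a majority."""

    def __init__(self, size):
        self.replicas = [dict() for _ in range(size)]
        self.up = [True] * size

    def write(self, key, value):
        # The leader sends the update to every reachable server.
        reachable = [r for r, ok in zip(self.replicas, self.up) if ok]
        # Without a strict majority the update is rejected.
        if len(reachable) <= len(self.replicas) // 2:
            return False
        for replica in reachable:
            replica[key] = value
        return True


ensemble = Ensemble(5)
ensemble.up[0] = ensemble.up[1] = False       # two of five servers fail
first = ensemble.write("/config/mode", "on")   # 3 of 5 is a majority
ensemble.up[2] = False                         # a third server fails
second = ensemble.write("/config/mode", "off") # 2 of 5 is not
print(first, second)
```

This is why ensembles are usually sized with an odd number of servers: five servers tolerate two failures, and a sixth would not raise that number.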
  • 37. HBase – Column NoSQL DB
  • 39. HBase • Apache open source project • Inspired by Google Bigtable • Non-relational, distributed database written in Java • Coordinated by ZooKeeper
  • 40. Row & Column Oriented
  • 41. HBase – Data Model • Cells are “versioned” • Table rows are sorted by row key • Region – a row range [start-key:end-key]
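The three points of the data model (versioned cells, rows sorted by key, regions as row ranges) can be mimicked in a few lines. The sketch below is illustrative only: the `Table` class, its methods, and the column name `cf:name` are hypothetical and do not reflect the HBase client API.

```python
import bisect
from collections import defaultdict


class Table:
    def __init__(self, region_start_keys):
        # Sorted start keys; region i covers [start_keys[i], start_keys[i + 1]).
        self.start_keys = sorted(region_start_keys)
        self.cells = defaultdict(list)  # (row, column) -> [(timestamp, value)]

    def region_for(self, row_key):
        # Because rows are sorted by key, a binary search finds the region.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row, column, value, ts):
        # Every write adds a new version instead of overwriting the cell.
        self.cells[(row, column)].append((ts, value))

    def get(self, row, column, versions=1):
        history = sorted(self.cells[(row, column)], reverse=True)
        return [value for _, value in history[:versions]]


table = Table(["", "m"])
table.put("row1", "cf:name", "old", ts=1)
table.put("row1", "cf:name", "new", ts=2)
print(table.region_for("apple"), table.region_for("zebra"))
print(table.get("row1", "cf:name"), table.get("row1", "cf:name", versions=2))
```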
  • 42. When to use HBase • Need random, low-latency access to the data • Application has a flexible schema where each row is slightly different – Add columns on the fly • Most columns are NULL in each row
  • 43. Flume / Sqoop – Data Integration Framework
  • 44. What’s the problem with data collection? • Data collection is currently a priori and ad hoc • A priori – decide what you want to collect ahead of time • Ad hoc – each kind of data source goes through its own collection path
  • 45. What is Flume (and how can it help?) • A distributed data collection service • It efficiently collects, aggregates, and moves large amounts of data • Fault tolerant, with many failover and recovery mechanisms • One-stop solution for data collection of all formats
  • 47. Sqoop • Easy, parallel database import/export • What do you want to do? – Import data from an RDBMS into HDFS – Export data from HDFS back into an RDBMS
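One common way to make a table import parallel is to split the key range into contiguous chunks and let each worker read one chunk. The sketch below illustrates only that idea; it is not Sqoop's implementation, and the function name, the `purchases` table, and the `id` column are hypothetical.

```python
def split_ranges(lo, hi, num_splits):
    """Split the inclusive id range [lo, hi] into contiguous chunks."""
    size, extra = divmod(hi - lo + 1, num_splits)
    ranges, start = [], lo
    for i in range(num_splits):
        # The first `extra` chunks get one additional row each.
        end = start + size + (1 if i < extra else 0) - 1
        if end >= start:
            ranges.append((start, end))
        start = end + 1
    return ranges


ranges = split_ranges(1, 10, 4)
queries = [
    f"SELECT * FROM purchases WHERE id BETWEEN {a} AND {b}" for a, b in ranges
]
print(ranges)
```

Each generated query could then be run by a separate worker, so no two workers read the same row.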
  • 48. What is Sqoop • A suite of tools that connect Hadoop and database systems • Import tables from databases into HDFS for deep analysis • Export MapReduce results back to a database for presentation to end-users • Provides the ability to import from SQL databases straight into your Hive data warehouse
  • 49. How Sqoop helps • The Problem – Structured data in traditional databases cannot be easily combined with complex data stored in HDFS • Sqoop (SQL-to-Hadoop) – Easy import of data from many databases to HDFS – Generate code for use in MapReduce applications
  • 50. Why Sqoop • JDBC-based implementation – Works with many popular database vendors • Auto-generation of tedious user-side code – Write MapReduce applications to work with your data, faster • Integration with Hive – Allows you to stay in a SQL-based environment
  • 51. Pig / Hive – Analytical Language
  • 52. Why Hive and Pig? • Although MapReduce is very powerful, it can also be complex to master • Many organizations have business or data analysts who are skilled at writing SQL queries, but not at writing Java code • Many organizations have programmers who are skilled at writing code in scripting languages • Hive and Pig are two projects which evolved separately to help such people analyze huge amounts of data via MapReduce – Hive was initially developed at Facebook, Pig at Yahoo!
  • 53. Hive – Developed by Facebook • What is Hive? – An SQL-like interface to Hadoop • Data warehouse infrastructure that provides data summarization and ad hoc querying on top of Hadoop – MapReduce for execution – HDFS for storage • Hive Query Language – Basic SQL: Select, From, Join, Group-By – Equi-Join, Multi-Table Insert, Multi-Group-By – Batch query
    SELECT storeid, COUNT(*) FROM purchases WHERE price > 100 GROUP BY storeid
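To make the batch query concrete, here is a minimal Python sketch of what a filter-then-group query over purchases computes, assuming the intent is a per-store count of purchases priced above 100. The sample rows and the function name are hypothetical.

```python
from collections import Counter


def count_expensive_by_store(purchases, min_price=100):
    # WHERE price > min_price, then GROUP BY storeid with COUNT(*).
    return dict(Counter(p["storeid"] for p in purchases if p["price"] > min_price))


purchases = [
    {"storeid": "s1", "price": 150},
    {"storeid": "s1", "price": 80},
    {"storeid": "s2", "price": 300},
    {"storeid": "s1", "price": 120},
]
per_store = count_expensive_by_store(purchases)
print(per_store)
```

Hive's value is that the analyst writes only the one-line query while the engine turns it into distributed jobs over data in HDFS.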
  • 55. Pig – Initiated by Yahoo! • A high-level scripting language (Pig Latin) • Processes data one step at a time • A simple way to write MapReduce programs • Easy to understand • Easy to debug
    A = load 'a.txt' as (id, name, age, ...);
    B = load 'b.txt' as (id, address, ...);
    C = JOIN A BY id, B BY id;
    STORE C into 'c.txt';
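The join step in the script can be read as a hash join on `id`. The following Python sketch shows the same operation on in-memory tuples; the sample rows and the function name are hypothetical, and the fields elided in the slide are simply left out.

```python
def join_by_id(a_rows, b_rows):
    """Inner join of two lists of tuples on their first field."""
    # Build an index over B so each A row finds its matches directly.
    b_index = {}
    for row in b_rows:
        b_index.setdefault(row[0], []).append(row)
    # Like a Pig JOIN, the output keeps all fields of both inputs.
    return [a + b for a in a_rows for b in b_index.get(a[0], [])]


A = [(1, "Ann", 30), (2, "Bob", 25)]
B = [(1, "Taipei"), (3, "Tokyo")]
C = join_by_id(A, B)
print(C)
```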
  • 56. Hive vs. Pig
    Feature | Hive | Pig
    Language | HiveQL (SQL-like) | Pig Latin, a scripting language
    Schema | Table definitions that are stored in a metastore | A schema is optionally defined at runtime
    Programmatic Access | JDBC, ODBC | PigServer
  • 57. WordCount Example • Input: Hello World Bye World Hello Hadoop Goodbye Hadoop • For the given sample input the map emits: <Hello, 1> <World, 1> <Bye, 1> <World, 1> <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1> • The reduce just sums up the values: <Bye, 1> <Goodbye, 1> <Hadoop, 2> <Hello, 2> <World, 2>
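The map, group, and reduce steps of this example can be traced in plain Python without a cluster. This is a single-process sketch of the idea only; the function names are hypothetical, and splitting the sample input into two lines is an assumption.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every whitespace-separated token.
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Sum the counts collected for one word.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))


counts = word_count(["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"])
print(counts)
```

In Hadoop the map calls run in parallel on different nodes and the grouping happens over the network, but the data flow is the same.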
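  • 58. WordCount Example In MapReduce
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCount {

        // Mapper: emits <word, 1> for every token in the input line.
        public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer: sums the counts emitted for each word.
        public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        // Driver: args[0] is the input path, args[1] the output path.
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "wordcount");
            job.setJarByClass(WordCount.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.waitForCompletion(true);
        }
    }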
  • 59. WordCount Example By Pig
    A = LOAD 'wordcount/input' USING PigStorage as (token:chararray);
    B = GROUP A BY token;
    C = FOREACH B GENERATE group, COUNT(A) as count;
    DUMP C;
  • 60. WordCount Example By Hive
    CREATE TABLE wordcount (token STRING);
    LOAD DATA LOCAL INPATH 'wordcount/input' OVERWRITE INTO TABLE wordcount;
    SELECT token, count(*) FROM wordcount GROUP BY token;
  • 61. Spark / Shark - Analytical Language
  • 62. Why Spark? • MapReduce is too slow • Aims to make data analytics fast: both fast to run and fast to write • Useful when the workload involves iterative algorithms
  • 63. What is Spark? • In-memory distributed computing framework • Created by UC Berkeley AMP Lab in 2010 • Targets problems that Hadoop MR is bad at – Iterative algorithms (machine learning) – Interactive data mining • More general purpose than Hadoop MR • Active contributions from ~15 companies
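Why in-memory processing helps iterative algorithms can be seen by counting how often the input is read. The sketch below is a toy model, not Spark code: the `Storage` class stands in for a slow data source such as HDFS, and both function names are hypothetical.

```python
class Storage:
    """Stand-in for a slow data source; counts how often it is read."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self.data)


def iterate_without_cache(storage, steps):
    # Chained MapReduce-style jobs: every iteration reloads its input.
    total = 0
    for _ in range(steps):
        total += sum(storage.load())
    return total


def iterate_with_cache(storage, steps):
    # In-memory style: load once, then reuse the cached data.
    cached = storage.load()
    total = 0
    for _ in range(steps):
        total += sum(cached)
    return total


disk_a, disk_b = Storage([1, 2, 3]), Storage([1, 2, 3])
result_a = iterate_without_cache(disk_a, 5)
result_b = iterate_with_cache(disk_b, 5)
print(result_a, disk_a.reads, result_b, disk_b.reads)
```

Both variants compute the same result, but the cached one touches storage once instead of once per iteration.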
  • 64. BDAS, the Berkeley Data Analytics Stack https://amplab.cs.berkeley.edu/software/
  • 65. What is different between Hadoop and Spark (figure labels: Spark: Data Source, Map(), Data Source 2, Join(), Cache(), Transform; Hadoop: HDFS, Map, Reduce, Map, Reduce) http://spark.incubator.apache.org
  • 66. What is Shark • A data analytics (warehouse) system that – Is a port of Apache Hive to run on Spark – Is compatible with existing Hive data, metastores, and queries (Hive, UDFs, etc.) – Offers speedups of up to 40x over Hive – Scales out and is fault-tolerant – Supports low-latency, interactive queries through in-memory computing
  • 67. Shark Architecture Hive Meta Store HDFS/HBase Spark SQL Parser Query Optimizer Physical Plan Execution Cache Mgr. CLI Thrift/JDBC Driver
  • 68. Oozie – Job Workflow & Scheduling
  • 69. What is Oozie? • A Java Web Application • Oozie is a workflow scheduler for Hadoop • Crond for Hadoop (figure: a workflow of Job 1 to Job 5)
  • 70. Why Oozie? • Why use Oozie instead of just cascading jobs one after another? • Major flexibility – Start, stop, suspend, and re-run jobs • Oozie allows you to restart from a failure – You can tell Oozie to restart a job from a specific node in the graph or to skip specific failed nodes
  • 71. How it is triggered • Time – Execute your workflow every 15 minutes (figure: 00:15, 00:30, 00:45, 01:00) • Time and Data – Materialize your workflow every hour, but only run it when the input data is ready (figure: 01:00, 02:00, 03:00, 04:00, each checking whether the Hadoop input data exists)
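The difference between a time trigger and a time-and-data trigger can be sketched as two small steps: materialize the scheduled times, then keep only those whose input has arrived. This is an illustration of the concept, not Oozie's coordinator API; the function names and the sample date are hypothetical.

```python
from datetime import datetime, timedelta


def materialize(start, count, every):
    """Time trigger: list the nominal run times of a periodic workflow."""
    return [start + i * every for i in range(count)]


def ready_to_run(actions, input_exists):
    """Time and data trigger: keep only runs whose input data is present."""
    return [t for t in actions if input_exists(t)]


start = datetime(2013, 9, 28, 1, 0)
hourly = materialize(start, 4, timedelta(hours=1))
# Assume input data has arrived only for the first two hours.
arrived = {hourly[0], hourly[1]}
runnable = ready_to_run(hourly, lambda t: t in arrived)
print([t.strftime("%H:%M") for t in hourly])
print([t.strftime("%H:%M") for t in runnable])
```

The remaining runs stay materialized and would start once their input shows up.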
  • 72. Oozie use criteria • Need to launch, control, and monitor jobs from your Java apps – Java Client API/Command Line Interface • Need to control jobs from anywhere – Web Service API • Have jobs that you need to run every hour, day, or week • Need to receive a notification when a job is done – Email when a job is complete
  • 73. Mahout – Data Mining
  • 74. What is Mahout? • Machine-learning tool • Distributed and scalable machine learning algorithms on the Hadoop platform • Makes building intelligent applications easier and faster
  • 75. Why Mahout? • Current state of ML libraries – Lack community – Lack documentation and examples – Lack scalability – Are research oriented
  • 76. Mahout – scale • Scale to large datasets – Hadoop MapReduce implementations that scale linearly with data • Scalable to support your business case – Mahout is distributed under a commercially friendly Apache Software license • Scalable community – Vibrant, responsive and diverse
  • 77. Mahout – four use cases • Mahout machine learning algorithms – Recommendation mining: takes users’ behavior and finds items that a specified user might like – Clustering: takes e.g. text documents and groups them based on related document topics – Classification: learns from existing categorized documents what documents of a specific category look like and is able to assign unlabeled documents to the appropriate category – Frequent item set mining: takes a set of item groups (e.g. terms in a query session, shopping cart content) and identifies which individual items typically appear together
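Frequent item set mining, the last use case, can be illustrated with a brute-force pair counter. This is a single-machine sketch of the idea, not one of Mahout's algorithms; the function name, the baskets, and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Count item pairs and keep those seen in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort and deduplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "diapers", "milk"],
]
pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```

Counting every pair grows quadratically with basket size, which is one reason large datasets call for distributed implementations.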
  • 78. Use case Example • Predict what the user likes based on – His/her historical behavior – Aggregate behavior of people similar to him/her
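A tiny user-based collaborative filter shows how both signals combine: the user's own history defines who is similar, and the similar users' behavior supplies the candidates. This is a minimal sketch, not Mahout's recommender; the Jaccard similarity choice, the function names, and the sample data are assumptions.

```python
def jaccard(a, b):
    """Similarity of two item sets: shared items over all items."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(target, history):
    liked = history[target]
    scores = {}
    for user, items in history.items():
        if user == target:
            continue
        weight = jaccard(liked, items)
        if weight == 0:
            continue  # ignore users with nothing in common
        # Items the similar user has but the target does not are candidates.
        for item in items - liked:
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


history = {
    "alice": {"beer", "chips"},
    "bob": {"beer", "chips", "diapers"},
    "carol": {"milk", "bread"},
}
suggestions = recommend("alice", history)
print(suggestions)
```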
  • 79. Conclusion • Big Data Opportunities – The market is still growing • Hadoop 2.0 – Federation – HA – YARN • What’s next for Hadoop – Real-time query – Data encryption • What other projects are included in the Hadoop ecosystem – Different projects for different purposes – Choose the right tools for your needs
  • 80. Recap – Hadoop Ecosystem • MapReduce Runtime (Dist. Programming Framework) • Hadoop Distributed File System (HDFS) • HBase (Column NoSQL DB) • Sqoop/Flume (Data integration) • Oozie (Job Workflow & Scheduling) • Pig/Hive (Analytical Language) • Mahout (Data Mining) • YARN • ZooKeeper (Coordination) • Tez (near real-time processing) • Spark (in-memory) • Shark