Big Data Analysis using
Hadoop on a Eucalyptus
Cloud
How secure is our cloud?
PRESENTED BY: ABHISHEK DE
STUDENT, CSE 2ND YEAR, BPPIMT
Contents:
• The Big Data Crisis
• Let’s embrace Cloud Computing
• Benefits of cloud
• Establishing an IaaS using Eucalyptus
• A word on Virtualization
• Hadoop as a Platform
• MapReduce and HDFS
• Typical algorithms
• Benefits we achieve
• How secure is the system?
PREPARED BY: ABHISHEK DE
06-Apr-13
The drifting era: BIG DATA and crisis
• YouTube users upload 48 hours of new video every minute of the day.
• 100 terabytes of data are uploaded to Facebook daily.
• Twitter sees roughly 175 million tweets every day and has more than 465 million accounts.
• Walmart handles more than 1 million customer transactions every hour, feeding databases that hold more than 2.5 petabytes of data.
DATA is
precious, too
precious..
We need
Infrastructure, which
comes easily as a
Service
Solution: Cloud Computing
 Conventional Computing:
Your data gets processed on your own computer.
 Cloud computing:
You send your data to some other
computer. It gets processed there and it
comes back to you.
“Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet).”
-- Wikipedia
Benefits of Cloud Computing:
• High reliability.
• Highly scalable and fault tolerant.
• Reduced cost: pay only for what you need.
• Efficient management of resources.
• Improved security.
• Built from commodity hardware.
Why Eucalyptus?
“Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems”
Eucalyptus is the world's most widely deployed software platform for on-premise
(private) Infrastructure as a Service (IaaS) clouds.
It uses existing infrastructure to create a scalable, secure web services layer that
abstracts compute, network and storage to offer IaaS.
Eucalyptus can be dynamically scaled up or down depending on application
workloads.
Architecture of Eucalyptus:
FRONT END:
• Users log in to the cloud using their credentials.
• The user is redirected to the back end of the cloud, i.e., the storage and the resource pool.
BACK END:
• Runs the Node Controller.
• Mounts images as virtual machines (instances) using Xen or KVM.
• Hosts the resource pool.
[Diagram: FRONT END and BACK END; user1 logs in at the front end and reaches user1@nc1 at the back end]
XEN: Virtualize your resources
• Xen is an underlying virtualization technology used by Eucalyptus. The Xen hypervisor allows several guest operating systems to run concurrently on the same computer hardware.
• Xen partitions a single physical machine into multiple virtual machines to provide server consolidation and utility computing. Existing applications and binaries run unmodified.
• The hypervisor controls the MMU, CPU scheduling, and the interrupt controller, presenting a virtual machine to guests.
HADOOP: Solution to BIG DATA
• Roughly how long does it take to read 1 TB from a commodity hard disk?
• Roughly 4 hours.
• With Hadoop it takes around:
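The 4-hour estimate for a single disk can be checked with simple arithmetic. The sketch below is only illustrative: the sustained throughput of 70 MB/s and the count of 100 disks are assumptions, not figures from the slides. It shows why spreading the same data over many disks that are read in parallel shortens the read time.

```python
# Back-of-the-envelope read-time estimate.
# ASSUMPTIONS (not from the slides): 70 MB/s sustained throughput per disk,
# and a hypothetical cluster of 100 disks read in parallel.

TERABYTE = 10**12            # bytes
THROUGHPUT = 70 * 10**6      # bytes per second, assumed


def read_time_seconds(size_bytes, throughput_bytes_per_s, disks=1):
    """Time to read size_bytes when the data is spread evenly over `disks`."""
    return size_bytes / (throughput_bytes_per_s * disks)


single = read_time_seconds(TERABYTE, THROUGHPUT)
parallel = read_time_seconds(TERABYTE, THROUGHPUT, disks=100)

print(f"one disk:  {single / 3600:.1f} hours")
print(f"100 disks: {parallel / 60:.1f} minutes")
```

The parallel figure ignores network transfer, scheduling and merge costs, so it is a lower bound rather than a measured Hadoop result.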
Birth of HADOOP: Open-source alternative to GFS
• Pre-2004: Cutting and Cafarella develop open-source projects for web-scale indexing, crawling and search.
• 2004: Jeffrey Dean and Sanjay Ghemawat introduce the MapReduce model used internally at Google.
• 2006: Hadoop becomes an official Apache project; Cutting joins Yahoo!, and Yahoo! adopts Hadoop.
HDFS: Hadoop Distributed File System
• Files are split into 128 MB (or 64 MB) blocks.
• Blocks are replicated across several datanodes (usually 3).
• A single namenode stores metadata (file names, block locations, etc.).
• Optimized for large files and sequential reads.
• Clients read from the closest available replica (note: locality of reference).
• If the replication for a block drops below target, it is automatically re-replicated.
[Figure: a namenode and several datanodes, with replicas of blocks 1-4 spread across the datanodes]
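The block and replica bookkeeping described above can be sketched in a few lines. This is a toy model, not HDFS code: the round-robin placement is an assumption made for brevity (real HDFS placement is rack-aware), and the datanode names and function names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128   # block size named on the slide
REPLICATION = 3       # usual replication factor named on the slide


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy namenode metadata: block id -> datanodes holding a replica.

    ASSUMPTION: simple round-robin placement on distinct nodes.
    """
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the replication factor")
    placement = {}
    for block in range(num_blocks):
        start = block % len(datanodes)
        placement[block] = [
            datanodes[(start + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


def under_replicated(placement, live_nodes, replication=REPLICATION):
    """Blocks whose live replica count fell below target (need re-replication)."""
    return [
        block
        for block, nodes in placement.items()
        if sum(node in live_nodes for node in nodes) < replication
    ]


nodes = ["dn1", "dn2", "dn3", "dn4"]          # hypothetical datanode names
placement = place_replicas(split_into_blocks(500), nodes)
lost = under_replicated(placement, live_nodes={"dn1", "dn2", "dn3"})
```

Here a 500 MB file becomes four blocks, and losing `dn4` leaves every block that had a replica on it below target, which is the condition that triggers automatic re-replication.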
Data Flow
[Diagram components: Web Servers, Scribe Servers, Network Storage, Hadoop Cluster, Oracle RAC, MySQL]
HADOOP and MapReduce:
Input → Map → Shuffle/Sort → Reduce → Output
Word Count: A typical Example
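Word count can be sketched as the three MapReduce phases running in a single Python process. This is an in-memory illustration of the programming model, not Hadoop API code; the function names are hypothetical, and in a real cluster the map and reduce calls would run on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort: group all values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


result = word_count(["big data on hadoop", "hadoop on a cloud"])
print(result)
```

Because each map call looks at one record and each reduce call at one key, both phases can be spread across machines without the program itself changing.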
Implementation: Hardware
• Move code to data (local computation).
• Allow programs to scale transparently w.r.t. size of input.
• Abstract away fault tolerance, synchronization, etc.
HADOOP in action!
• SOCIAL NETWORKING ANALYSIS
• PAGE RANKING ANALYSIS
• ANALYTICS ENGINE WITH MAP/REDUCE
• IMAGE PROCESSING
Social Networking Analysis:
• Problem: recommend new friends (friend-of-a-friend, FOAF)
• Map task:
– U (the target user) is fixed and its friends list is copied to all cluster nodes (“copy join”); each cluster node stores part of the social graph
– In: (X, <friendsX>), i.e. the local data for the cluster node
– Out:
if (U, X) are friends => (U, <friendsX \ friendsU>), i.e. the users who are friends of X but not already friends of U
nil otherwise
• Reduce task:
– In: (U, <<friendsA \ friendsU>, <friendsB \ friendsU>, … >), i.e. the FOAF lists for all users A, B, etc. who are friends with U
– Out: (U, <(X1, N1), (X2, N2), …>), where each X is a FOAF for U and N is its total number of occurrences in all FOAF lists (sort/rank the result!)
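The map and reduce tasks above can be sketched as plain Python functions over an in-memory graph. The graph, the user names and the function names are hypothetical. One detail is an addition to the slide's formula: U is also removed from each candidate list, since U is always a friend of X.

```python
from collections import Counter


def map_task(u, friends_u, x, friends_x):
    """Map: for one local record (x, friends_x), emit U's FOAF candidates via x."""
    if x in friends_u:
        # friends of X who are not already friends of U (and are not U itself)
        return u, (friends_x - friends_u) - {u}
    return None  # "nil otherwise"


def reduce_task(u, foaf_lists):
    """Reduce: count how often each candidate occurs and rank the result."""
    counts = Counter(c for foaf in foaf_lists for c in foaf)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return u, ranked


def recommend(u, graph):
    friends_u = graph[u]  # the "copy join": U's friend list is known to every mapper
    mapped = (map_task(u, friends_u, x, fx) for x, fx in graph.items())
    foaf_lists = [out[1] for out in mapped if out is not None]
    return reduce_task(u, foaf_lists)


graph = {  # hypothetical social graph: user -> set of friends
    "U": {"A", "B"},
    "A": {"U", "C", "D"},
    "B": {"U", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
recommendation = recommend("U", graph)
```

In this graph C is reachable through both A and B, so it ranks ahead of D, which is reachable only through A.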
Pros and Cons
• Batch, offline jobs
• Write-once, read-many across the full data set
• Usually, though not always, simple computations
• I/O bound by disk/network bandwidth
What it’s not:
• High-performance parallel computing, e.g. MPI
• Low-latency random-access relational database
• Always the right solution
Cloud Security: Threats unveiled
XML SIGNATURE ATTACK:
• The original SOAP body element is moved to a newly added bogus wrapper element in the SOAP security header. The moved body is still referenced by the signature through its identifier attribute Id="body", and the signature is still cryptographically valid because the body element has not been modified, only relocated. Then, to keep the SOAP message XML schema compliant, the attacker gives the SOAP body that now sits in the original position a different identifier (in this example, Id="attack"). This otherwise empty SOAP body can now be filled with bogus content: any operation chosen by the attacker is executed because signature verification still succeeds.
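The wrapping flaw can be reproduced with a toy model. Everything in this sketch is a simplification and an assumption: an HMAC over the serialized element stands in for a real XML signature, namespaces are omitted, and the element names, the `Ref` attribute and the operation names are hypothetical. It shows that a verifier which looks up the signed element by Id, while the application processes the body at the usual position, accepts a forged message.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

KEY = b"demo-shared-secret"  # hypothetical; real WS-Security uses X.509 keys


def digest(elem):
    """Toy stand-in for an XML signature over one element."""
    return hmac.new(KEY, ET.tostring(elem), hashlib.sha256).hexdigest()


def build_signed_message(operation):
    env = ET.Element("Envelope")
    header = ET.SubElement(env, "Header")
    body = ET.SubElement(env, "Body", {"Id": "body"})
    ET.SubElement(body, "Operation").text = operation
    sig = ET.SubElement(header, "Signature", {"Ref": "body"})
    sig.text = digest(body)
    return env


def find_by_id(env, wanted):
    return next((e for e in env.iter() if e.get("Id") == wanted), None)


def naive_verify(env):
    """Checks the signature over whatever element carries the referenced Id."""
    sig = env.find("Header/Signature")
    signed = find_by_id(env, sig.get("Ref"))
    return signed is not None and hmac.compare_digest(sig.text, digest(signed))


def executed_operation(env):
    """Application logic: runs the operation in the top-level Body."""
    return env.find("Body/Operation").text


def wrap_attack(env, evil_operation):
    """Move the signed body into a header wrapper and add an unsigned body."""
    original = env.find("Body")
    env.remove(original)
    ET.SubElement(env.find("Header"), "Wrapper").append(original)
    fake = ET.SubElement(env, "Body", {"Id": "attack"})
    ET.SubElement(fake, "Operation").text = evil_operation
    return env


def strict_verify(env):
    """Also require that the signed element IS the body being processed."""
    sig = env.find("Header/Signature")
    return naive_verify(env) and find_by_id(env, sig.get("Ref")) is env.find("Body")


forged = wrap_attack(build_signed_message("DescribeInstances"), "TerminateInstances")
```

The last function sketches one common countermeasure: verification and application logic must agree on which element they are looking at.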
Script Injection Attack
• Targets only users of the AWS management console.
• Exploits the credentials shared between the Amazon shop interface and AWS.
• The first vulnerability exploits the GET parameters in the download link that users follow to download their X.509 certificates issued by Amazon. However, the preconditions for the attack are rather demanding: the injected script must be UTF-7 encoded to bypass the server logic that encodes standard HTML characters, and the attack relies on features of specific IE versions.
• The second script injection attack is a persistent cross-site scripting attack that exploits the login session initiated with AWS the first time a user logs into the Amazon shop interface.
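Why UTF-7 encoding gets past HTML escaping can be shown with a small sketch. The payload string and the helper names are hypothetical, and `html.escape` merely stands in for whatever escaping the server applied. The point is that a UTF-7 encoded script contains none of the characters an HTML escaper looks for, yet turns back into markup if a client interprets the page as UTF-7.

```python
import html

# Hypothetical payload, hand-encoded in UTF-7: "+ADw-" is "<" and "+AD4-" is ">".
UTF7_PAYLOAD = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"


def server_side_escape(value):
    """Stand-in for server logic that encodes standard HTML characters."""
    return html.escape(value)


def browser_view(page_bytes, charset):
    """What a client sees if it interprets the response in the given charset."""
    return page_bytes.decode(charset)


escaped = server_side_escape(UTF7_PAYLOAD)          # nothing to escape
as_ascii = browser_view(escaped.encode("ascii"), "ascii")
as_utf7 = browser_view(escaped.encode("ascii"), "utf-7")

print(as_ascii)
print(as_utf7)
```

Declaring an explicit charset on every response is the usual way to keep a client from guessing UTF-7.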
Who uses it? Applications and
Innovations
Projects under Hadoop:
• HBase
• ZooKeeper
• Pig
• Zombie
• Hive
• Sqoop
That’s the end..
But the beginning of a new
horizon..
Special thanks to the entire
team that helped me in this
endeavor.
ALL QUERIES, PLEASE CONTACT ME AT: abhishekde@hotmail.com
QUESTIONS?

DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Big Data Analysis on a Cloud Ecosystem-PATW 2013

  • 1. Big Data Analysis using Hadoop on a Eucalyptus Cloud: How secure is our cloud? Presented by: Abhishek De, Student, CSE 2nd Year, BPPIMT
  • 2. Contents:
      – The Big Data Crisis
      – Let’s embrace Cloud Computing
      – Benefits of cloud
      – Establishing an IaaS using Eucalyptus
      – A word on Virtualization
      – Hadoop as a Platform
      – MapReduce and HDFS
      – Typical algorithms
      – Benefits we achieve
      – How secure is the system?
      Prepared by: Abhishek De, 06-Apr-13
  • 3. The drifting era: Big Data and crisis
      – YouTube users upload 48 hours of new video every minute of the day.
      – 100 terabytes of data are uploaded daily to Facebook.
      – Twitter sees roughly 175 million tweets every day and has more than 465 million accounts.
      – Walmart handles more than 1 million customer transactions every hour, and its databases hold more than 2.5 petabytes of data.
  • 4. Data is precious, too precious. We need infrastructure, which comes easily as a service.
  • 5. Solution: Cloud Computing
      – Conventional computing: Your data gets processed on your own computer.
      – Cloud computing: You send your data to some other computer. It gets processed there and the result comes back to you.
      “Cloud Computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet)” (Wikipedia)
  • 6. Benefits of Cloud Computing:
      – High reliability.
      – Highly scalable and fault tolerant.
      – Reduced cost: only pay for what you need.
      – Efficient management of resources.
      – Improved security.
      – Achieved out of commodity hardware.
  • 7. Why Eucalyptus?
      “Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems”
      – Eucalyptus is the world's most widely deployed software platform for on-premise (private) Infrastructure as a Service (IaaS) clouds.
      – It uses existing infrastructure to create a scalable, secure web services layer that abstracts compute, network and storage to offer IaaS.
      – Eucalyptus can be dynamically scaled up or down depending on application workloads.
  • 8. Architecture of Eucalyptus:
      Front end:
      – Users log in to the cloud using their credentials.
      – The user is redirected to the back end of the cloud, i.e., the storage and the resource pool (diagram prompt: user1@nc1).
      Back end:
      – Runs the Node Controller.
      – Mounts images as virtual machines (instances) using Xen or KVM.
      – Hosts the resource pool.
  • 9. Xen: Virtualize your resources
      – Xen is the underlying virtualization technology used by Eucalyptus. The Xen hypervisor allows several guest operating systems to run concurrently on the same computer hardware.
      – Xen partitions a single physical machine into multiple virtual machines to provide server consolidation and utility computing. Existing applications and binaries run unmodified.
      – The hypervisor controls the MMU, CPU scheduling, and the interrupt controller, presenting a virtual machine to guests.
  • 10. Hadoop: Solution to Big Data
      – Roughly how long does it take to read 1 TB from a commodity hard disk? Around 4 hours.
      – With Hadoop it takes around:
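The 4-hour figure on slide 10 can be checked with simple arithmetic. The sketch below assumes a sustained disk throughput of about 70 MB/s (an assumed value chosen to match the slide, not one stated in the deck) and shows how an idealized scan shrinks when the same data is spread over many disks read in parallel. The 100-disk cluster is hypothetical, and real Hadoop jobs add scheduling and network overhead that this ignores.

```python
# Back-of-the-envelope scan times.
# ASSUMPTION: ~70 MB/s sustained throughput per commodity disk.

TB = 10**12  # bytes
MB = 10**6   # bytes


def read_time_hours(size_bytes: float, throughput_bytes_per_s: float, disks: int = 1) -> float:
    """Hours to scan size_bytes spread evenly over `disks` drives read in parallel."""
    if disks < 1:
        raise ValueError("disks must be at least 1")
    seconds = size_bytes / (throughput_bytes_per_s * disks)
    return seconds / 3600


single = read_time_hours(1 * TB, 70 * MB)
parallel = read_time_hours(1 * TB, 70 * MB, disks=100)  # hypothetical 100-disk cluster

print(f"1 disk:    {single:.2f} h")
print(f"100 disks: {parallel * 60:.1f} min")
```

The point is the scaling, not the exact numbers: scan time falls in proportion to the number of disks that can be read at once, which is what distributing blocks across a cluster makes possible.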
  • 11. Birth of Hadoop: Open-source alternative to GFS
      – Pre-2004: Cutting and Cafarella develop open-source projects for web-scale indexing, crawling and search.
      – 2004: Jeffrey Dean and Sanjay Ghemawat introduce the MapReduce model used internally at Google.
      – 2006: Hadoop becomes an official Apache project. Cutting joins Yahoo!, and Yahoo! adopts Hadoop.
  • 12. HDFS: Hadoop Distributed File System
      – Files are split into 128 MB (or 64 MB) blocks.
      – Blocks are replicated across several datanodes (usually 3).
      – A single namenode stores the metadata (file names, block locations, etc.).
      – Optimized for large files and sequential reads.
      – Clients read from the closest available replica (note: locality of reference).
      – If the replication for a block drops below the target, the block is automatically re-replicated.
      (Diagram: one namenode and several datanodes holding replicated copies of blocks 1 to 4.)
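The block and replication rules on slide 12 can be illustrated with a small model. This is a toy sketch, not the HDFS implementation: the round-robin placement and the datanode names (`dn1` to `dn4`) are assumptions made for illustration, whereas real HDFS uses a rack-aware placement policy managed by the namenode.

```python
# Toy model of HDFS block splitting, replica placement and
# under-replication detection. Placement policy is a simplification.

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, as on the slide
REPLICATION = 3                 # usual replication factor, as on the slide


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed for a file (integer ceiling division)."""
    return (file_size + block_size - 1) // block_size


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct datanodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the replication factor")
    n = len(datanodes)
    return {
        block: [datanodes[(block + r) % n] for r in range(replication)]
        for block in range(num_blocks)
    }


def under_replicated(placement, live_nodes, target=REPLICATION):
    """Blocks whose number of live replicas has dropped below the target."""
    return [
        block
        for block, nodes in placement.items()
        if sum(node in live_nodes for node in nodes) < target
    ]


datanodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical node names
blocks = block_count(1024**3)             # a 1 GiB file
placement = place_replicas(blocks, datanodes)

# Simulate losing dn2: every block that had a replica there needs a new copy.
needs_copy = under_replicated(placement, live_nodes={"dn1", "dn3", "dn4"})

print(blocks, "blocks")
print("block 0 lives on", placement[0])
print("re-replicate blocks", needs_copy)
```

In this model the namenode's role corresponds to holding `placement` and noticing the under-replicated blocks; scheduling the new copies is left out.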
  • 13. Data Flow (diagram components): Web Servers, Scribe Servers, Network Storage, Hadoop Cluster, Oracle RAC, MySQL
  • 14. Hadoop and MapReduce: Input → Map → Shuffle/Sort → Reduce → Output
  • 15. Word Count: A typical example
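A minimal, single-process sketch of word count, following the Input → Map → Shuffle/Sort → Reduce → Output stages from slide 14. It only mimics the data flow in plain Python; it is not Hadoop's Java API, and the function names are chosen here for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle/Sort: group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return (word, sum(counts))


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


result = word_count(["the quick brown fox", "the lazy dog", "the fox"])
print(result)
```

On a cluster, each map call would run on the node holding its input block, and the shuffle would move the grouped pairs over the network to the reducers.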
  • 16. Implementation: Hardware
      – Move code to data (local computation).
      – Allow programs to scale transparently with respect to the size of the input.
      – Abstract away fault tolerance, synchronization, etc.
  • 17. Hadoop in action!
      – Social networking analysis
      – Page ranking analysis
      – Analytics engine with Map/Reduce
      – Image processing
  • 18. Social Networking Analysis:
      – Problem: recommend new friends (friend-of-a-friend, FOAF).
      – Map task:
          – U (the target user) is fixed and its friends list is copied to all cluster nodes (“copy join”); each cluster node stores part of the social graph.
          – In: (X, <friendsX>), i.e. the local data for the cluster node.
          – Out: if U and X are friends, (U, <friendsX \ friendsU>), i.e. the users who are friends of X but not already friends of U; nil otherwise.
      – Reduce task:
          – In: (U, <<friendsA \ friendsU>, <friendsB \ friendsU>, …>), i.e. the FOAF lists for all users A, B, etc. who are friends with U.
          – Out: (U, <(X1, N1), (X2, N2), …>), where each X is a FOAF for U and N is its total number of occurrences in all FOAF lists (sort/rank the result).
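The map and reduce tasks above can be sketched as ordinary Python functions over an in-memory graph. The graph, the user names and the tie-breaking order are made up for illustration. One detail is an addition of this sketch rather than something stated on the slide: U itself is removed from the candidates, because in a symmetric friendship graph U always appears in the friends list of each of its friends.

```python
from collections import Counter


def map_task(target, friends_of_target, x, friends_of_x):
    """Emit (U, friendsX \\ friendsU) when X is a friend of U, else None.

    ASSUMPTION: U is also excluded, since U is in friendsX by symmetry.
    """
    if x not in friends_of_target:
        return None
    candidates = set(friends_of_x) - set(friends_of_target) - {target}
    return (target, candidates)


def reduce_task(target, foaf_lists):
    """Count how often each candidate occurs and rank by that count."""
    counts = Counter()
    for candidates in foaf_lists:
        counts.update(candidates)
    # Most common mutual-friend count first; ties broken by name.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return (target, ranked)


# Hypothetical symmetric friendship graph.
graph = {
    "u": {"a", "b"},
    "a": {"u", "c", "d"},
    "b": {"u", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

target = "u"
mapped = [map_task(target, graph[target], x, friends) for x, friends in graph.items()]
foaf_lists = [candidates for out in mapped if out is not None for candidates in [out[1]]]
recommendations = reduce_task(target, foaf_lists)
print(recommendations)
```

Here `c` ranks first because it is reachable through two of U's friends, which is the N value the reduce step counts.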
  • 19. Pros and Cons
      – Batch, offline jobs.
      – Write-once, read-many across the full data set.
      – Usually, though not always, simple computations.
      – I/O bound by disk/network bandwidth.
      What it’s not:
      – High-performance parallel computing, e.g. MPI.
      – A low-latency, random-access relational database.
      – Always the right solution.
  • 20. Cloud Security: Threats unveiled
      XML signature attack:
      – The original SOAP body element is moved into a newly added bogus wrapper element in the SOAP security header. The moved body is still referenced by the signature through its identifier attribute Id="body".
      – The signature remains cryptographically valid, because the body element has not been modified, only relocated.
      – To keep the SOAP message XML schema compliant, the attacker then gives the new, empty SOAP body in the regular position a different identifier (in this example, Id="attack").
      – The attacker can now fill the empty SOAP body with bogus content: any operation defined by the attacker is executed, because signature verification still succeeds.
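The reason the wrapped message passes verification can be shown with a toy model. The sketch below is heavily simplified and is not real XML-DSig: there are no namespaces, no canonicalization and no keys, and a SHA-256 digest stored in a hypothetical `Digest` attribute stands in for the signature. The operation names are placeholders. It contrasts a verifier that looks the signed element up by Id anywhere in the document with one that also requires the signed element to be the body the application will process.

```python
import copy
import hashlib
import xml.etree.ElementTree as ET


def digest(elem):
    """Stand-in for a signature: hash of the serialized element."""
    return hashlib.sha256(ET.tostring(elem)).hexdigest()


def find_by_id(root, ident):
    """Return the first element anywhere in the tree with the given Id."""
    for elem in root.iter():
        if elem.get("Id") == ident:
            return elem
    return None


def signed_element(root):
    ref = root.find("./Header/Signature/Reference")
    return ref, find_by_id(root, ref.get("URI").lstrip("#"))


def naive_verify(root):
    """Checks only that the referenced element matches its digest."""
    ref, signed = signed_element(root)
    return signed is not None and digest(signed) == ref.get("Digest")


def strict_verify(root):
    """Also requires the signed element to be the processed SOAP body."""
    ref, signed = signed_element(root)
    body = root.find("./Body")
    return signed is body and digest(signed) == ref.get("Digest")


def wrap(signed_root, operation):
    """Relocate the signed body into the header and add a forged body."""
    root = copy.deepcopy(signed_root)
    body = root.find("./Body")
    root.remove(body)
    ET.SubElement(root.find("./Header"), "Wrapper").append(body)
    forged = ET.SubElement(root, "Body", {"Id": "attack"})
    ET.SubElement(forged, operation)
    return root


original = ET.fromstring(
    '<Envelope><Header><Signature><Reference URI="#body"/></Signature></Header>'
    '<Body Id="body"><MonitorInstances/></Body></Envelope>'
)
original.find("./Header/Signature/Reference").set("Digest", digest(original.find("./Body")))

forged = wrap(original, "TerminateInstances")
print("naive:", naive_verify(forged), "strict:", strict_verify(forged))
print("operation executed:", forged.find("./Body")[0].tag)
```

The defensive idea is the one in `strict_verify`: the element whose signature was checked and the element that the application logic acts on must be the same node.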
  • 21. Script Injection Attack
      – Targets only users of the AWS management console.
      – Exploits the credentials shared between the Amazon shop interface and AWS.
      – The first vulnerability exploits the GET parameters in the download link that users follow to download their X.509 certificates issued by Amazon. The preconditions for this attack are demanding, however: the injected script must use UTF-7 encoding to bypass the server logic that encodes standard HTML characters, and the attack relies on features of specific IE versions.
      – The second script injection attack is a persistent cross-site scripting attack that exploits the login session initiated with AWS the first time a user logs into the Amazon shop interface.
  • 22. Who uses it? Applications and Innovations
      Projects under Hadoop:
      – HBase
      – ZooKeeper
      – Pig
      – Zombie
      – Hive
      – Sqoop
  • 23. References:
      – http://www.eucalyptus.com/what-is-cloud-computing
      – http://developer.yahoo.com/blogs/hadoop/posts/2009/05/hadoop_sorts_a_petabyte_in_162/
      – http://int3.de/res/GfsMapReduce/GfsAndMapReduce.pdf
      – http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/Federation.html
      – http://www.change-project.eu/fileadmin/publications/Presentations/CHANGE_-_The_role_of_virtualisation_in_future_network_infrastructures_-_Warsaw_cluster_workshop_contribution.pdf
      – http://wiki.apache.org/hadoop/NameNode
  • 24. That’s the end, but the beginning of a new horizon. Special thanks to the entire team that helped me in this endeavor. For all queries, please contact me at: abhishekde@hotmail.com. Questions?