SlideShare uma empresa Scribd logo
1 de 175
The




ecosystem
First of all...
Hadoop is not an acronym
Highly Available Data Object Oriented Program
Hadoop is an elephant
http://www.fickr.com/photos/18555810@N00/2145683109/in/photostream/
How it all began...
1997
2002
2003
Nutch Distributed FileSystem
2004
2005
Nutch Algorithms ported to M/R
2006
as subprojects of Lucene
Doug Cutting joins
One month later they adopt Hadoop

and set up a dedicated research team
2007
2008
Top Level Project
Yahoo! announces its Web index is built using Hadoop
                       (2008-02)
Hadoop sorts 1TB in 209 seconds using 910 nodes
                    (2008-04)
2009
Doug Cutting joins
Today
Big Data is here to stay
http://salsahpc.indiana.edu/CloudCom2010/slides/PDF/tutorials/Yahoo_business_seminar.pdf
More and more commercial solutions
Hadoop Distributed FileSystem
Fault tolerant, very large fle support
Write once, read many access model
Hierarchical namespace
File permissions
Files are divided into blocks




              Typical block size

                64 or 128 MB
Blocks are replicated




multiple machines, multiple racks
Blocks are managed by DataNodes
DataNodes report to a NameNode
Write Data Flow
Read Data Flow
HDFS Thrift Server
Map Reduce
Origins in FP and λ-calculus
Map




Map(k,v) ⇒ list(k',v')
Sort




{ list(k',v') } ⇒ { (k', list(v')) }
Reduce




Reduce(k',list(v')) ⇒ list(k”, v”)
The Hello, World of MR, word count
MR word count




Map:      <∅,line>    ⇒    <word0,1> <word1,1>    ...   <wordk,1>


Reduce: <wordi, 10, 11, ..., 1N-1>   ⇒   <wordi, N>
MapReduce in practice
Data moves slowly on the network
So we bring computation to the data
Data is read according to an InputFormat
An InputFormat splits the data
Each InputSplit is fed to a Mapper
HDFS blocks make ideal splits
Run Mappers on DataNodes
Each DataNode is also a TaskTracker
A JobTracker dispatches Mappers & Reducers
Mappers' output data are written locally
A Partitioner splits them per Reducer
The Reducers retrieve those partitions
The Shuffe
This step can be costly depending on volume
Compress Mappers' output
Run Combiners on Mappers' output
Combiners: Reducers executing at the Mappers'
Reducers sort data retrieved during Shuffe
MapReduce job ends with one fle per reducer
Hadoop Streaming
http://www.fickr.com/photos/19melissa68/2476168474/sizes/l/in/photostream/
Pig / Hive
Pig Latin is easy to learn
I bet you can understand the following


A = LOAD 'mydata' USING PigStorage() AS (url, time, size);

B = GROUP A BY url;

C = FOREACH B GENERATE group,
                       COUNT(A),
                       SUM(size),
                       SUM(time)/COUNT(A);

DUMP C;
Pig Latin is converted to MR jobs
Pig supports Streaming and UDFs
Offers HiveQL, close to SQL



CREATE TABLE page_view(viewTime INT, userid BIGINT,
                page_url STRING, referrer_url STRING,
                friends ARRAY<BIGINT>, properties MAP<STRING, STRING>
                ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '1'
        COLLECTION ITEMS TERMINATED BY '2'
        MAP KEYS TERMINATED BY '3'
STORED AS SEQUENCEFILE;

INSERT OVERWRITE TABLE xyz_com_page_views
SELECT page_views.*
FROM page_views
WHERE page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31'
      AND page_views.referrer_url like '%xyz.com';
HiveQL converted to MR jobs
Hive also supports UDFs
MapReduce is powerful
But MapReduce works in batch mode
Need something similar for real time streams
ZooKeeper
ZooKeeper provides a highly available flesystem abstraction
ZooKeeper has no fles or directories but znodes
Znodes form a hierarchical namespace
A znode can have data AND children




                    2f   85   1e   4a   73   47   c5   e4   39   ff   0f   b6   46   79   ac   c5
                    48   c1   99   85   48   16   df   04   6a   2c   cc   ce   9e   4f   ae   cb
                    20   a5   9d   62   57   96   35   c3   eb   3d   cb   c3   1c   cb   91   f8
                    a2   4d   90   57   0a   62   24   f9   5e   a4   50   00   6a   bd   3c   ea
                    68   61   3f   bf   7a   48   8f   26   63   24   e9   d4   3b   b4   55   c2
A znode has Access Control Lists




CREATE   (children)
READ     (list children and get data)
WRITE    (set data)
DELETE   (children)
ADMIN    (setACL)
Znodes can be persistent or ephemeral




Ephemeral znodes vanish with their creator's session


Persistent znodes outlive their creator's session




Ephemeral znodes cannot have children
Znodes can be sequential




A monotonically increasing counter is appended to the znode name



                          /zk/foo-1
                          /zk/foo-3
                          /zk/foo-4
                          ...


 This can be used to impose a global order among direct children
Znodes can have watches




Watches allow clients to be notifed when znodes change



               NodeCreated
               NodeDeleted
               NodeDataChanged
               NodeChildrenChanged
ZooKeeper has a simple API




 exists
 getData
 getChildren   }   set watches

 create
 delete
 setData
 setACL
 getACL

 sync
ZooKeeper consistency




Sequential Consistency
   Updates from a client are applied in the order sent
Atomicity
   Updates either succeed or fail
Single System Image
    Unique view, regardless of the server we connect to
Durability
   Updates once succeeded will not be undone
Timeliness
   Lag is bounded
ZooKeeper use cases




Confguration service
   Get latest confg and get notifed when it changes
Lock service
    Provide mutual exclusion
Leader election
   There can be only one...
Group membership
   Dynamically determine members of a group
Queue
   Producer/Consumer paradigm
http://www.fickr.com/photos/7-how-7/2308008668/sizes/o/in/photostream/
HBase
Data Model
“a sparse, distributed multi-dimensional sorted map”
Row keys, column qualifers, values




        byte[]
Regions
A region is a slice of row keys
A region is served by a Region Server
HBase Master
Coordinates the slaves (Region Servers)
Assigns Regions to Region Servers
Does so using ZooKeeper
.META.
Special table, stores region assignments
-ROOT-
Special single region table
Stores .META. region assignments
Region Server
MemStore fushed when full
New immutable fles written
When too many fles per region
Perform a minor compaction
Merge most recent N fles
Periodically merge all fles in a region
That's a major compaction
When aggregate size of a region is too big
Split region into two new regions
Parent region fles eventually discared
Region Server Crashes
Master will split WAL into Region oldlogs
Regions will be reassigned
HBase API
Put, Get, Delete
CheckAndPut, Lock
Counters
Scan
Filters, row/column, in Region Server
Coprocessors
REST / Thrift gateways for non Java access
Map Reduce
   NextGen
NodeManager
Resource Containers
Scheduler
Application Manager
More generic and more scalable
Monitor Everything
Automate as much as possible
Test Drive on
We're hiring




www.arkea.com
@herberts

Mais conteúdo relacionado

Mais procurados

Introduction to MapReduce and Hadoop
Introduction to MapReduce and HadoopIntroduction to MapReduce and Hadoop
Introduction to MapReduce and HadoopMohamed Elsaka
 
SORT & JOIN IN SPARK 2.0
SORT & JOIN IN SPARK 2.0SORT & JOIN IN SPARK 2.0
SORT & JOIN IN SPARK 2.0Sigmoid
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduceNewvewm
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overviewDataArt
 
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...CloudxLab
 
Dive into Catalyst
Dive into CatalystDive into Catalyst
Dive into CatalystCheng Lian
 
11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2Fabio Fumarola
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_sparkYiguang Hu
 
Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0Sigmoid
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkPatrick Wendell
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slidesDat Tran
 
Tuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsTuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsSamir Bessalah
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Spark Summit
 

Mais procurados (20)

Introduction to MapReduce and Hadoop
Introduction to MapReduce and HadoopIntroduction to MapReduce and Hadoop
Introduction to MapReduce and Hadoop
 
SORT & JOIN IN SPARK 2.0
SORT & JOIN IN SPARK 2.0SORT & JOIN IN SPARK 2.0
SORT & JOIN IN SPARK 2.0
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduce
 
Hadoop
HadoopHadoop
Hadoop
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
 
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
 
Dive into Catalyst
Dive into CatalystDive into Catalyst
Dive into Catalyst
 
11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
 
mesos-devoxx14
mesos-devoxx14mesos-devoxx14
mesos-devoxx14
 
Hadoop 2
Hadoop 2Hadoop 2
Hadoop 2
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
 
Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slides
 
Hadoop
HadoopHadoop
Hadoop
 
Tuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsTuning tips for Apache Spark Jobs
Tuning tips for Apache Spark Jobs
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
 
Hadoop2.2
Hadoop2.2Hadoop2.2
Hadoop2.2
 

Destaque

Big Data - Open Coffee Brest - 20121121
Big Data - Open Coffee Brest - 20121121Big Data - Open Coffee Brest - 20121121
Big Data - Open Coffee Brest - 20121121Mathias Herberts
 
No more (unsecure) secrets, Marty
No more (unsecure) secrets, MartyNo more (unsecure) secrets, Marty
No more (unsecure) secrets, MartyMathias Herberts
 
This is not your father's monitoring.
This is not your father's monitoring.This is not your father's monitoring.
This is not your father's monitoring.Mathias Herberts
 
Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108Mathias Herberts
 
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentationIoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentationMathias Herberts
 
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03Mathias Herberts
 
Programmation fonctionnelle
Programmation fonctionnelleProgrammation fonctionnelle
Programmation fonctionnelleJean Detoeuf
 
Scala : programmation fonctionnelle
Scala : programmation fonctionnelleScala : programmation fonctionnelle
Scala : programmation fonctionnelleMICHRAFY MUSTAFA
 
OpenNebula 4.14 Hands-on Tutorial
OpenNebula 4.14 Hands-on TutorialOpenNebula 4.14 Hands-on Tutorial
OpenNebula 4.14 Hands-on TutorialOpenNebula Project
 
The Lambda Calculus and The JavaScript
The Lambda Calculus and The JavaScriptThe Lambda Calculus and The JavaScript
The Lambda Calculus and The JavaScriptNorman Richards
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Suman Srinivasan
 
Programmation fonctionnelle en JavaScript
Programmation fonctionnelle en JavaScriptProgrammation fonctionnelle en JavaScript
Programmation fonctionnelle en JavaScriptLoïc Knuchel
 
Comprendre la programmation fonctionnelle, Blend Web Mix le 02/11/2016
Comprendre la programmation fonctionnelle, Blend Web Mix le 02/11/2016Comprendre la programmation fonctionnelle, Blend Web Mix le 02/11/2016
Comprendre la programmation fonctionnelle, Blend Web Mix le 02/11/2016Loïc Knuchel
 
Mathias Herberts fait le retour d'expérience Hadoop au Crédit Mutuel Arkéa
Mathias Herberts fait le retour d'expérience Hadoop au Crédit Mutuel ArkéaMathias Herberts fait le retour d'expérience Hadoop au Crédit Mutuel Arkéa
Mathias Herberts fait le retour d'expérience Hadoop au Crédit Mutuel ArkéaModern Data Stack France
 

Destaque (15)

Big Data - Open Coffee Brest - 20121121
Big Data - Open Coffee Brest - 20121121Big Data - Open Coffee Brest - 20121121
Big Data - Open Coffee Brest - 20121121
 
No more (unsecure) secrets, Marty
No more (unsecure) secrets, MartyNo more (unsecure) secrets, Marty
No more (unsecure) secrets, Marty
 
This is not your father's monitoring.
This is not your father's monitoring.This is not your father's monitoring.
This is not your father's monitoring.
 
Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108
 
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentationIoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
IoT Silicon Valley - Cityzen Sciences and Cityzen Data presentation
 
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
Warp 10 Platform Presentation - Criteo Beer & Tech 2016-02-03
 
Programmation fonctionnelle
Programmation fonctionnelleProgrammation fonctionnelle
Programmation fonctionnelle
 
Scala : programmation fonctionnelle
Scala : programmation fonctionnelleScala : programmation fonctionnelle
Scala : programmation fonctionnelle
 
OpenNebula 4.14 Hands-on Tutorial
OpenNebula 4.14 Hands-on TutorialOpenNebula 4.14 Hands-on Tutorial
OpenNebula 4.14 Hands-on Tutorial
 
The Lambda Calculus and The JavaScript
The Lambda Calculus and The JavaScriptThe Lambda Calculus and The JavaScript
The Lambda Calculus and The JavaScript
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
 
Video Analysis in Hadoop
Video Analysis in HadoopVideo Analysis in Hadoop
Video Analysis in Hadoop
 
Programmation fonctionnelle en JavaScript
Programmation fonctionnelle en JavaScriptProgrammation fonctionnelle en JavaScript
Programmation fonctionnelle en JavaScript
 
Comprendre la programmation fonctionnelle, Blend Web Mix le 02/11/2016
Comprendre la programmation fonctionnelle, Blend Web Mix le 02/11/2016Comprendre la programmation fonctionnelle, Blend Web Mix le 02/11/2016
Comprendre la programmation fonctionnelle, Blend Web Mix le 02/11/2016
 
Mathias Herberts fait le retour d'expérience Hadoop au Crédit Mutuel Arkéa
Mathias Herberts fait le retour d'expérience Hadoop au Crédit Mutuel ArkéaMathias Herberts fait le retour d'expérience Hadoop au Crédit Mutuel Arkéa
Mathias Herberts fait le retour d'expérience Hadoop au Crédit Mutuel Arkéa
 

Semelhante a The Hadoop Ecosystem

Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemKey Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemYahoo Developer Network
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BIDenny Lee
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentJim Mlodgenski
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle CoherenceBen Stopford
 
Hidden pearls for High-Performance-Persistence
Hidden pearls for High-Performance-PersistenceHidden pearls for High-Performance-Persistence
Hidden pearls for High-Performance-PersistenceSven Ruppert
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Andrey Vykhodtsev
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
 
Sector Sphere 2009
Sector Sphere 2009Sector Sphere 2009
Sector Sphere 2009lilyco
 
sector-sphere
sector-spheresector-sphere
sector-spherexlight
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationRob Emanuele
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiJoydeep Sen Sarma
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightGert Drapers
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkC4Media
 
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at ClouderaDataconomy Media
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big pictureJ S Jodha
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterDataWorks Summit
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And EffectsThomas Goddard
 

Semelhante a The Hadoop Ecosystem (20)

Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemKey Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BI
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
Hidden pearls for High-Performance-Persistence
Hidden pearls for High-Performance-PersistenceHidden pearls for High-Performance-Persistence
Hidden pearls for High-Performance-Persistence
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
Sector Sphere 2009
Sector Sphere 2009Sector Sphere 2009
Sector Sphere 2009
 
sector-sphere
sector-spheresector-sphere
sector-sphere
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis Presentation
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
 
hadoop-spark.ppt
hadoop-spark.ppthadoop-spark.ppt
hadoop-spark.ppt
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
 
Unit 1
Unit 1Unit 1
Unit 1
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 

Mais de Mathias Herberts

2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...Mathias Herberts
 
20170516 hug france-warp10-time-seriesanalysisontopofhadoop
20170516 hug france-warp10-time-seriesanalysisontopofhadoop20170516 hug france-warp10-time-seriesanalysisontopofhadoop
20170516 hug france-warp10-time-seriesanalysisontopofhadoopMathias Herberts
 
WebScale Computing and Big Data a Pragmatic Approach
WebScale Computing and Big Data a Pragmatic ApproachWebScale Computing and Big Data a Pragmatic Approach
WebScale Computing and Big Data a Pragmatic ApproachMathias Herberts
 
Leveraging Hadoop for Legacy Systems
Leveraging Hadoop for Legacy SystemsLeveraging Hadoop for Legacy Systems
Leveraging Hadoop for Legacy SystemsMathias Herberts
 

Mais de Mathias Herberts (7)

2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
2019-09-25 Paris Time Series Meetup - Warp 10 - Advanced Time Series Technolo...
 
20170516 hug france-warp10-time-seriesanalysisontopofhadoop
20170516 hug france-warp10-time-seriesanalysisontopofhadoop20170516 hug france-warp10-time-seriesanalysisontopofhadoop
20170516 hug france-warp10-time-seriesanalysisontopofhadoop
 
Big Data Tribute
Big Data TributeBig Data Tribute
Big Data Tribute
 
Hadoop Pig Syntax Card
Hadoop Pig Syntax CardHadoop Pig Syntax Card
Hadoop Pig Syntax Card
 
Hadoop Pig
Hadoop PigHadoop Pig
Hadoop Pig
 
WebScale Computing and Big Data a Pragmatic Approach
WebScale Computing and Big Data a Pragmatic ApproachWebScale Computing and Big Data a Pragmatic Approach
WebScale Computing and Big Data a Pragmatic Approach
 
Leveraging Hadoop for Legacy Systems
Leveraging Hadoop for Legacy SystemsLeveraging Hadoop for Legacy Systems
Leveraging Hadoop for Legacy Systems
 

Último

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Último (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

The Hadoop Ecosystem