A MapReduce-Based Programming Model for
Self-Maintainable Aggregate Views
2012-08-31

Johannes Schildgen
TU Kaiserslautern
schildgen@cs.uni-kl.de
Motivation
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…

Δ
Balisto: -1
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…

Δ
Balisto: -8
Snickers: +24
Ritter Sport: -7
Increment Installation

Kinderriegel: 26
Balisto: 31
Hanuta: 14
Snickers: 43
Ritter Sport: 34
Pickup: 12
…

Δ
Balisto: -8
Snickers: +24
Ritter Sport: -7
Overwrite Installation

Old result          Δ                   New result
Kinderriegel: 26                        Kinderriegel: 26
Balisto: 39         Balisto: -8         Balisto: 31
Hanuta: 14                              Hanuta: 14
Snickers: 19        Snickers: +24       Snickers: 43
Ritter Sport: 41    Ritter Sport: -7    Ritter Sport: 34
Pickup: 12                              Pickup: 12
…
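The two installation strategies shown on these slides can be sketched in plain Java. This is a minimal in-memory sketch and not Marimba or HBase code: the class `InstallationSketch` and all of its method names are hypothetical, and a `TreeMap` stands in for the result table. The numbers are the ones from the slides.

```java
import java.util.Map;
import java.util.TreeMap;

class InstallationSketch {

    // Increment installation: only the deltas are sent; each one is added
    // to the value already stored under the same key.
    static void installByIncrement(Map<String, Integer> view, Map<String, Integer> delta) {
        for (Map.Entry<String, Integer> d : delta.entrySet()) {
            view.merge(d.getKey(), d.getValue(), Integer::sum);
        }
    }

    // Overwrite installation: the old values are read, the new totals are
    // computed, and the affected entries are written as complete values.
    static Map<String, Integer> installByOverwrite(Map<String, Integer> oldView,
                                                   Map<String, Integer> delta) {
        Map<String, Integer> newView = new TreeMap<>(oldView);
        for (Map.Entry<String, Integer> d : delta.entrySet()) {
            newView.put(d.getKey(), oldView.getOrDefault(d.getKey(), 0) + d.getValue());
        }
        return newView;
    }

    // The aggregate view from the slides before the changes.
    static Map<String, Integer> exampleView() {
        Map<String, Integer> view = new TreeMap<>();
        view.put("Kinderriegel", 26);
        view.put("Balisto", 39);
        view.put("Hanuta", 14);
        view.put("Snickers", 19);
        view.put("Ritter Sport", 41);
        view.put("Pickup", 12);
        return view;
    }

    // The delta from the slides.
    static Map<String, Integer> exampleDelta() {
        Map<String, Integer> delta = new TreeMap<>();
        delta.put("Balisto", -8);
        delta.put("Snickers", 24);
        delta.put("Ritter Sport", -7);
        return delta;
    }

    public static void main(String[] args) {
        Map<String, Integer> incremented = exampleView();
        installByIncrement(incremented, exampleDelta());
        Map<String, Integer> overwritten = installByOverwrite(exampleView(), exampleDelta());
        System.out.println(incremented);
        System.out.println(overwritten);
    }
}
```

Both strategies end in the same view. They differ in who does the addition: with increments the store adds the delta to its current value, with overwrite the job reads the old result and writes the finished total.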
Foundations & Previous Work | The Marimba Framework | Evaluation
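import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.Tool;

public class WordCount extends Configured implements Tool {

    public static class WordCountMapper extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, LongWritable> {

        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                this.word.set(tokenizer.nextToken());
                …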
> create 'person', 'default'
> put 'person', 'p27', 'default:vorname', 'Anton'
> put 'person', 'p27', 'default:nachname', 'Schmidt'

> get 'person', 'p27'
COLUMN                CELL
 default:nachname     timestamp=1338991497408, value=Schmidt
 default:vorname      timestamp=1338991436688, value=Anton
2 row(s) in 0.0640 seconds
                    Δ
banane    3         kaufe     1
die       4         die       0
iss       1         banane    1
nimm      1         schmeiß   -1
schale    1         schale    -1
schäle    1         weg       -1
schmeiß   1
weg       1
increment()         Δ
banane    4         kaufe     1
die       4         die       0
iss       1         banane    1
kaufe     1         schmeiß   -1
nimm      1         schale    -1
schale    0         weg       -1
schäle    1
schmeiß   0
weg       0
overwrite()         Δ
banane    3         kaufe     1
die       4         die       0
iss       1         banane    1
nimm      1         schmeiß   -1
schale    1         schale    -1
schäle    1         weg       -1
schmeiß   1
weg       1
overwrite()         Δ
banane    4         kaufe     1
die       4         die       0
iss       1         banane    1
kaufe     1         schmeiß   -1
nimm      1         schale    -1
schale    0         weg       -1
schäle    1
schmeiß   0
weg       0
void map(key, value) {
 if(value is inserted) {
    for(word : value.split(" ")) {
       write(word, 1);
    }
 }
 else if(value is deleted) {
    for(word : value.split(" ")) {
       write(word, -1);
    }
 }
}
void map(key, value) {
 if(value is inserted) {
    for(word : value.split(" ")) {
       write(word, 1);
    }
 }
 else if(value is deleted) {
    for(word : value.split(" ")) {
       write(word, -1);
    }
 }
 else { // old result
    write(key, value);
 }
}
Overwrite Installation

void reduce(key, values) {
  sum = 0;
  for(value : values) {
    sum += value;
  }
  put = new Put(key);
  put.add("fam", "col", sum);
  context.write(key, put);
}
Increment Installation

void reduce(key, values) {
  sum = 0;
  for(value : values) {
    sum += value;
  }
  inc = new Increment(key);
  inc.add("fam", "col", sum);
  context.write(key, inc);
}
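The map and reduce pseudocode above can be simulated in memory to see how a delta comes about. The following is a minimal sketch without Hadoop: the class `DeltaWordCount`, its nested types and the two changed input lines are assumptions made for illustration. The lines were chosen so that the result matches the Δ column of the earlier word-count slides, and `schmei\u00df` is the escaped spelling of "schmeiß".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DeltaWordCount {

    enum Kind { INSERTED, DELETED }

    // One changed input line together with the kind of change.
    static final class Change {
        final Kind kind;
        final String line;

        Change(Kind kind, String line) {
            this.kind = kind;
            this.line = line;
        }
    }

    // Map phase: +1 per word of an inserted line, -1 per word of a deleted line.
    static List<Map.Entry<String, Integer>> map(Change change) {
        int sign = (change.kind == Kind.INSERTED) ? 1 : -1;
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : change.line.split(" ")) {
            out.add(Map.entry(word, sign));
        }
        return out;
    }

    // Shuffle and reduce phase: group by word and sum the signed counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Runs both phases over all changes and returns the delta per word.
    static Map<String, Integer> delta(List<Change> changes) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Change change : changes) {
            pairs.addAll(map(change));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(
                new Change(Kind.DELETED, "schmei\u00df die schale weg"),
                new Change(Kind.INSERTED, "kaufe die banane"));
        System.out.println(delta(changes));
    }
}
```

The word "die" occurs once in the deleted and once in the inserted line, so its delta is 0. With Increment Installation this map of deltas is what gets written; with Overwrite Installation the old results would be fed into the same reduce step as additional pairs, so the sums are the new totals.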
Formalization
General mapper
General reducer
Foundations & Previous Work | The Marimba Framework | Evaluation
Core functionality:
Distributed computing with MapReduce

"I take care of: IncDec, Overwrite, reading old results, generating increments, …"

"I'll explain to you how to read the input data, how to aggregate and invert it, and how to write the output."

Core functionality:
Incremental computing
public class WordTranslator extends
    Translator<LongWritable, Text> {
  public void translate(…) {
    …
  }
}


IncJob job = new IncJobOverwrite(conf);
job.setTranslatorClass(WordTranslator.class);
job.setAbelianClass(WordAbelian.class);
public class WordAbelian implements
    Abelian<WordAbelian> {
 WordAbelian invert() { … }
 WordAbelian aggregate(WordAbelian
                        other) { … }
 WordAbelian neutral() { … }
 boolean isNeutral() { … }
 Writable extractKey() { … }
 void write(…) { … }
 void readFields(…) { … }
}
public class WordSerializer
 implements Serializer<WordAbelian> {

 Writable serialize(Writable key,
                    WordAbelian v) {
    …
 }
 WordAbelian deserializeHBase(
    byte[] rowId, byte[] colFamily,
    byte[] qualifier, byte[] value) {
    …
 }
}
How To Write A Marimba Job

1. Abelian class
2. Translator class
3. Serializer class
4. Write a Hadoop job using the class IncJob
Implementation

IncJob
  setInputTable(…)
  setOutputTable(…)

Subclasses of IncJob:
  IncJobFull (recomputation)
  IncJobIncDec
  IncJobOverwrite
    setResultInputTable(…)
    NeutralOutputStrategy (for IncJobOverwrite)
public interface Abelian<T extends
 Abelian<?>> extends
 WritableComparable<BinaryComparable>{

 T invert();
 T aggregate(T other);
 T neutral();
 boolean isNeutral();
 Writable extractKey();
 void write(…);
 void readFields(…);
}
public interface Serializer<T extends
 Abelian<?>> {

 Writable serialize(T obj);
 T deserializeHBase(
    byte[] rowId, byte[] colFamily,
    byte[] qualifier, byte[] value);
}
public abstract class Translator
 <KEYIN, VALUEIN> {

 public abstract void translate
    (KEYIN key, VALUEIN value,
     Context context);
}

this.mapContext.write(
    abelianValue.extractKey(),
    this.invertValue ?
         abelianValue.invert() :
         abelianValue);
GenericMapper

The input format delivers a Value of one of four kinds:

OverwriteResult   → deserialize
InsertedValue     → pass on
DeletedValue      → set invertValue=true; pass on
PreservedValue    → ignore
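The dispatch on this slide can be sketched as a small switch. This is a simplified sketch, not the Marimba source: the class `GenericMapperSketch`, the enum and the method `handle` are hypothetical names, a plain `long` count stands in for an Abelian value, and passing an already deserialized old result on unchanged is an assumption (the slide only says "deserialize").

```java
import java.util.Optional;

class GenericMapperSketch {

    // The four kinds of values the slide lists as delivered by the input format.
    enum ValueKind { OVERWRITE_RESULT, INSERTED_VALUE, DELETED_VALUE, PRESERVED_VALUE }

    // Returns the count the mapper would pass on, or empty if the value is ignored.
    static Optional<Long> handle(ValueKind kind, long count) {
        switch (kind) {
            case OVERWRITE_RESULT: // old result, assumed to be deserialized already
            case INSERTED_VALUE:   // pass on unchanged
                return Optional.of(count);
            case DELETED_VALUE:    // corresponds to invertValue = true
                return Optional.of(-count);
            case PRESERVED_VALUE:  // unchanged input contributes nothing
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (ValueKind kind : ValueKind.values()) {
            System.out.println(kind + " -> " + handle(kind, 1));
        }
    }
}
```

Ignoring preserved values is what makes the computation incremental: only inserted and deleted input, plus old results in Overwrite mode, reach the reducer.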
GenericReducer

1. Aggregate
2. Serialize
3. Write

IncDec:
  putToIncrement(…)

Overwrite:
  PUT → write
  IGNORE → do not write
  DELETE → putToDelete(...)
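The Overwrite branch of the reducer can be sketched as follows. This sketch rests on an assumption drawn from the name NeutralOutputStrategy: that the three options PUT, IGNORE and DELETE decide what happens when the aggregated value is neutral (here: 0). The class `GenericReducerSketch` and its methods are hypothetical, and the returned strings merely describe the write that would be issued.

```java
import java.util.List;

class GenericReducerSketch {

    enum NeutralOutputStrategy { PUT, IGNORE, DELETE }

    // Step 1 of the reducer: aggregate all values of one key.
    static long aggregate(List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    // Describes the write for one key in Overwrite mode.
    static String overwriteAction(String key, List<Long> values,
                                  NeutralOutputStrategy strategy) {
        long sum = aggregate(values);
        if (sum != 0) {
            return "put " + key + "=" + sum;
        }
        // Assumption: the strategy only matters for neutral results.
        switch (strategy) {
            case PUT:
                return "put " + key + "=0";
            case DELETE:
                return "delete " + key;
            case IGNORE:
            default:
                return "nothing";
        }
    }

    public static void main(String[] args) {
        List<Long> oldAndDelta = List.of(1L, -1L); // old result 1, delta -1
        for (NeutralOutputStrategy strategy : NeutralOutputStrategy.values()) {
            System.out.println(strategy + ": " + overwriteAction("weg", oldAndDelta, strategy));
        }
    }
}
```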
GenericCombiner

"Write A Combiner"
   -- 7 Tips for Improving MapReduce Performance (tip 4)

1. Aggregate
TextWindowInputFormat
Example applications:
1. WordCount

Translator

void translate(key, value) {
 for(word : value.split(" ")) {
  write(new WordAbelian(word, 1));
 }
}

Serializer

Writable serialize(WordAbelian w) {
 Put p = new Put(w.getWord());
 p.add(…);
 return p;
}

WordAbelian

WordAbelian invert() {
 return new WordAbelian(this.word, -1 * this.count);
}

WordAbelian aggregate(WordAbelian other) {
 return new WordAbelian(this.word, this.count + other.count);
}

WordAbelian neutral() {
 return new WordAbelian(this.word, 0);
}

boolean isNeutral() {
 return (this.count == 0);
}
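The WordAbelian methods above behave like an Abelian group over the counts, which can be checked with a self-contained sketch. The interface below is a simplified stand-in that leaves out the Hadoop parts (`extractKey()`, `write(…)`, `readFields(…)`); the wrapper class `AbelianSketch` is a hypothetical name used only for this illustration.

```java
class AbelianSketch {

    // Simplified stand-in for the Abelian interface, without the Writable methods.
    interface Abelian<T extends Abelian<T>> {
        T invert();
        T aggregate(T other);
        T neutral();
        boolean isNeutral();
    }

    static final class WordAbelian implements Abelian<WordAbelian> {
        final String word;
        final long count;

        WordAbelian(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public WordAbelian invert() {
            return new WordAbelian(this.word, -1 * this.count);
        }

        @Override
        public WordAbelian aggregate(WordAbelian other) {
            return new WordAbelian(this.word, this.count + other.count);
        }

        @Override
        public WordAbelian neutral() {
            return new WordAbelian(this.word, 0);
        }

        @Override
        public boolean isNeutral() {
            return this.count == 0;
        }
    }

    public static void main(String[] args) {
        WordAbelian three = new WordAbelian("banane", 3);
        WordAbelian one = new WordAbelian("banane", 1);
        // Aggregating a value with its inverse yields the neutral element.
        System.out.println(three.aggregate(three.invert()).isNeutral());
        // Aggregation is commutative, so the order of the values does not matter.
        System.out.println(three.aggregate(one).count == one.aggregate(three).count);
    }
}
```

The inverse is what lets a deleted input value cancel an earlier contribution, and commutativity is what allows values to be aggregated in whatever order they reach a combiner or reducer.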
Example applications:
2. Friends Of Friends

[Figure: graph of the persons A to E showing FRIENDS and FRIENDS OF FRIENDS relationships]

translate(person, friends):

aggregate(…):
merge the sets of friends of friends
Example applications:
3. Reverse WebLink-Graph

aggregate(…):
merge the sets of links

REVERSE WEB LINK GRAPH
(Row-ID -> Columns)

Google -> {eBay, Wikipedia}
eBay -> {Google, Wikipedia}
Mensa-KL -> {Google}
Facebook -> {Google, Mensa-KL, Uni-KL}
Wikipedia -> {Google}
Uni-KL -> {Google, Wikipedia}
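Building the reverse web-link graph with a set union as the aggregate can be sketched in memory. The class `ReverseLinkGraphSketch` and its methods are hypothetical names, and the forward links in `exampleLinks()` are not on the slide: they were derived from the reverse graph shown there, so that reversing them reproduces it.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ReverseLinkGraphSketch {

    // For every link source -> target, record the source in the row of the target.
    static Map<String, Set<String>> reverse(Map<String, List<String>> links) {
        Map<String, Set<String>> reversed = new TreeMap<>();
        for (Map.Entry<String, List<String>> page : links.entrySet()) {
            for (String target : page.getValue()) {
                // aggregate(...): merge the sets of links pointing to the same target
                reversed.computeIfAbsent(target, t -> new TreeSet<>()).add(page.getKey());
            }
        }
        return reversed;
    }

    // Forward links (page -> pages it links to), derived from the slide's result.
    static Map<String, List<String>> exampleLinks() {
        Map<String, List<String>> links = new TreeMap<>();
        links.put("Google", List.of("eBay", "Mensa-KL", "Facebook", "Wikipedia", "Uni-KL"));
        links.put("eBay", List.of("Google"));
        links.put("Wikipedia", List.of("Google", "eBay", "Uni-KL"));
        links.put("Mensa-KL", List.of("Facebook"));
        links.put("Uni-KL", List.of("Facebook"));
        return links;
    }

    public static void main(String[] args) {
        reverse(exampleLinks()).forEach(
                (rowId, columns) -> System.out.println(rowId + " -> " + columns));
    }
}
```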
Example applications:
4. Bigrams

Hi, kannst du mich ___?___ am
Bahnhof abholen? So in etwa
10 ___?___. Viele liebe ___?__.
P.S. Ich habe viel ___?___.

Idea:
Analyze large amounts of text with MapReduce.
Example applications:
4. Bigrams

NGramAbelian
  Fields: a, b, count
  extractKey()
  write(…)
  invert(): count*=-1
  aggregate(… other): count+=other.count
  neutral(): count=0
  isNeutral(): count==0

NGramStep2Abelian
"Where do we get the data from?"

Hi, kannst du mich ___ am Bahnhof abholen?    Suggestions: bitte, nicht
So in etwa 10 ___.                            Suggestions: <num> Minuten, <num> Jahre
Viele liebe ___.                              Suggestions: Grüße, dich
P.S. Ich habe viel ___.                       Suggestions: zu, Spaß
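Counting bigrams and turning the counts into word suggestions can be sketched without MapReduce. This is a minimal in-memory sketch: the class `BigramSketch`, its methods and the three sample sentences are made up for illustration, and the two-step job structure hinted at by NGramStep2Abelian is not modelled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BigramSketch {

    // counts.get(a).get(b) = how often word b directly follows word a
    static Map<String, Map<String, Integer>> count(List<String> sentences) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String sentence : sentences) {
            String[] words = sentence.trim().toLowerCase().split("\\s+");
            for (int i = 0; i + 1 < words.length; i++) {
                counts.computeIfAbsent(words[i], w -> new HashMap<>())
                      .merge(words[i + 1], 1, Integer::sum);
            }
        }
        return counts;
    }

    // The k most frequent successors of a word, most frequent first;
    // ties are broken alphabetically.
    static List<String> suggest(Map<String, Map<String, Integer>> counts,
                                String word, int k) {
        Map<String, Integer> successors = counts.getOrDefault(word, Map.of());
        List<String> result = new ArrayList<>(successors.keySet());
        result.sort(Comparator.comparingInt((String w) -> -successors.get(w))
                              .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(result.subList(0, Math.min(k, result.size())));
    }

    public static void main(String[] args) {
        // Made-up training sentences, not taken from the slides.
        List<String> sentences = List.of(
                "ich habe viel spass",
                "ich habe viel zu tun",
                "ich habe viel zu lesen");
        System.out.println(suggest(count(sentences), "viel", 2));
    }
}
```

Because a bigram count is a plain sum, it fits the same invert and aggregate scheme as WordCount: text that is removed from the corpus contributes its bigrams with a negative count.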
Foundations & Previous Work | The Marimba Framework | Evaluation
WordCount
[Chart: runtime in hh:mm (y-axis, 00:00 to 01:10) against the share of changes (x-axis, 0% to 80%) for FULL, INCDEC and OVERWRITE]
Reverse Weblink-Graph
[Chart: runtime in hh:mm (y-axis, 00:00 to 02:51) against the share of changes (x-axis, 0% to 80%) for FULL, INCDEC and OVERWRITE]
Conclusion
Full recomputation
IncDec / Overwrite
Image credits

Slides 5-9:
Flames and smiley: Microsoft Office 2010

Slide 10:
Google: http://www.google.de

Slide 11:
Amazon: http://www.amazon.de

Slide 12:
Hadoop: http://hadoop.apache.org
Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252

Slide 16:
Hadoop: http://hadoop.apache.org

Slide 17:
Hadoop: http://hadoop.apache.org
Notebook: Microsoft Office 2010

Slide 18:
HBase: http://hbase.apache.org

Slide 31:
Hadoop: http://hadoop.apache.org
Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252

Slide 32:
Scaffold: http://www.flickr.com/photos/michale/94538528/
Hadoop: http://hadoop.apache.org
Boy: Microsoft Office 2010

Slides 37-44:
Puzzle: http://www.flickr.com/photos/dps/136565237/

Slides 46-48:
Boy: Microsoft Office 2010

Slide 49:
Google: http://www.google.de
eBay: http://www.ebay.de
Mensa-KL: http://www.mensa-kl.de
facebook: http://www.facebook.de
Wikipedia: http://de.wikipedia.org
TU Kaiserslautern: http://www.uni-kl.de

Slides 50-51:
Mobile phone: Microsoft Office 2010

Slide 56:
Wikipedia: http://de.wikipedia.org
Twitter: http://www.twitter.com

Slide 57:
Mobile phone: Microsoft Office 2010

Slide 58:
Hadoop: http://hadoop.apache.org
Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
References (1/2)
[0] Johannes Schildgen. Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten. Master's thesis, TU Kaiserslautern, August 2012.
[1] Apache Hadoop project. http://hadoop.apache.org/.
[2] Virga: Incremental Recomputations in MapReduce. http://wwwlgis.informatik.uni-kl.de/cms/?id=526.
[3] Philippe Adjiman. Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner, 2010. http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/.
[4] Kai Biermann. Big Data: Twitter wird zum Fieberthermometer der Gesellschaft, April 2012. http://www.zeit.de/digital/internet/2012-04/twitter-krankheiten-nowcast.
[5] Julie Bort. 8 Crazy Things IBM Scientists Have Learned Studying Twitter, January 2012. http://www.businessinsider.com/8-crazy-things-ibm-scientists-have-learned-studying-twitter-2012-1.
[6] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, pages 137–150, 2004.
[7] Lars George. HBase: The Definitive Guide. O'Reilly Media, 1st edition, 2011.
[8] Brown University Data Management Group. A Comparison of Approaches to Large-Scale Data Analysis. http://database.cs.brown.edu/projects/mapreduce-vs-dbms/.
[9] Ricky Ho. Map/Reduce to recommend people connection, August 2010. http://horicky.blogspot.de/2010/08/mapreduce-to-recommend-people.html.
[10] Yong Hu. Efficiently Extracting Change Data from HBase. April 2012.
[11] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Can MapReduce learn from materialized views? In LADIS 2011, pages 1–5, September 2011.
[12] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Incremental recomputations in MapReduce. In CloudDB 2011, October 2011.
[13] Steve Krenzel. MapReduce: Finding Friends, 2010. http://stevekrenzel.com/finding-friends-with-mapreduce.
[14] Todd Lipcon. 7 Tips for Improving MapReduce Performance, 2009. http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/.
References (2/2)
[15] TouchType Ltd. SwiftKey X - Android Apps auf Google Play, February 2012. http://play.google.com/store/apps/details?id=com.touchtype.swiftkey.
[16] Karl H. Marbaise. Hadoop - Think Large!, 2011. http://www.soebes.de/files/RuhrJUGEssenHadoop-20110217.pdf.
[17] Arnab Nandi, Cong Yu, Philip Bohannon, and Raghu Ramakrishnan. Distributed Cube Materialization on Holistic Measures. ICDE, pages 183–194, 2011.
[18] Alexander Neumann. Studie: Hadoop wird ähnlich erfolgreich wie Linux, May 2012. http://heise.de/-1569837.
[19] Owen O'Malley, Jack Hebert, Lohit Vijayarenu, and Amar Kamat. Partitioning your job into maps and reduces, September 2009. http://wiki.apache.org/hadoop/HowManyMapsAndReduces?action=recall&rev=7.
[20] Roya Parvizi. Inkrementelle Neuberechnungen mit MapReduce. Bachelor's thesis, TU Kaiserslautern, June 2011.
[21] Arnd Poetzsch-Heffter. Konzepte objektorientierter Programmierung. eXamen.press. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[22] Dave Rosenberg. Hadoop, the elephant in the enterprise, June 2012. http://news.cnet.com/8301-1001_3-57452061-92/hadoop-the-elephant-in-the-enterprise/.
[23] Marc Schäfer. Inkrementelle Wartung von Data Cubes. Bachelor's thesis, TU Kaiserslautern, January 2012.
[24] Sanjay Sharma. Advanced Hadoop Tuning and Optimizations, 2009. http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation.
[25] Jason Venner. Pro Hadoop. Apress, Berkeley, CA, 2009.
[26] Dick Weisinger. Big Data: Think of NoSQL As Complementary to Traditional RDBMS, June 2012. http://www.formtek.com/blog/?p=3032.
[27] Tom White. 10 MapReduce Tips, May 2009. http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/.

 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Marimba - A MapReduce-Based Programming Model for Self-Maintainable Aggregate Views

  • 1. A MapReduce-Based Programming Model for Self-Maintainable Aggregate Views. 2012-08-31. Johannes Schildgen, TU Kaiserslautern, schildgen@cs.uni-kl.de
  • 3. 11 26 14 23 37 39 41 26 19 8 25 19 22 15 18 10 16 27 8 9 12 14 15
  • 4. Kinderriegel: 26 Balisto: 39 Hanuta: 14 Snickers: 19 Ritter Sport: 41 Pickup: 12 …
  • 6. Kinderriegel: 26 Balisto: 39 Hanuta: 14 Snickers: 19 Ritter Sport: 41 Pickup: 12 … Δ: Balisto: -1
  • 7. Kinderriegel: 26 Balisto: 39 Hanuta: 14 Snickers: 19 Ritter Sport: 41 Pickup: 12 … Δ: Balisto: -8 Snickers: +24 Ritter Sport: -7
  • 8. Increment Installation. View after applying Δ: Kinderriegel: 26 Balisto: 31 Hanuta: 14 Snickers: 43 Ritter Sport: 34 Pickup: 12 … Δ: Balisto: -8 Snickers: +24 Ritter Sport: -7
  • 9. Overwrite Installation. Old view: Kinderriegel: 26 Balisto: 39 Hanuta: 14 Snickers: 19 Ritter Sport: 41 Pickup: 12 … Δ: Balisto: -8 Snickers: +24 Ritter Sport: -7. New view: Kinderriegel: 26 Balisto: 31 Hanuta: 14 Snickers: 43 Ritter Sport: 34 Pickup: 12
  • 12. Outline: Foundations & Previous Work | The Marimba Framework | Evaluation
  • 16. public class WordCount extends Configured implements Tool {
          public static class WordCountMapper extends
                  Mapper<LongWritable, Text, ImmutableBytesWritable, LongWritable> {
              private Text word = new Text();
              @Override
              public void map(LongWritable key, Text value, Context context)
                      throws IOException, InterruptedException {
                  String line = value.toString();
                  StringTokenizer tokenizer = new StringTokenizer(line);
                  while (tokenizer.hasMoreTokens()) {
                      this.word.set(tokenizer.nextToken());
                      …
  • 17. A A E B E F C F B D C D
  • 18. > create 'person', 'default'
        > put 'person', 'p27', 'default:vorname', 'Anton'
        > put 'person', 'p27', 'default:nachname', 'Schmidt'
        > get 'person', 'p27'
        COLUMN              CELL
        default:nachname    timestamp=1338991497408, value=Schmidt
        default:vorname     timestamp=1338991436688, value=Anton
        2 row(s) in 0.0640 seconds
  • 19. View: banane 3, die 4, iss 1, nimm 1. Δ: kaufe 1, schale 1, die 0, schäle 1, banane 1, schmeiß 1, schmeiß -1, weg 1, schale -1, weg -1
  • 20. increment(). View: banane 4, die 4, iss 1, kaufe 1, nimm 1. Δ: kaufe 1, schale 0, die 0, schäle 1, banane 1, schmeiß 0, schmeiß -1, weg 0, schale -1, weg -1
  • 21. overwrite(). View: banane 3, die 4, iss 1, nimm 1. Δ: kaufe 1, schale 1, die 0, schäle 1, banane 1, schmeiß 1, schmeiß -1, weg 1, schale -1, weg -1
  • 22. overwrite(). View: banane 4, die 4, iss 1, kaufe 1, nimm 1. Δ: kaufe 1, schale 0, die 0, schäle 1, banane 1, schmeiß 0, schmeiß -1, weg 0, schale -1, weg -1
  • 23. void map(key, value) {
          if (value is inserted) {
            for (word : value.split(" ")) {
              write(word, 1);
            }
          } else if (value is deleted) {
            for (word : value.split(" ")) {
              write(word, -1);
            }
          }
        }
  • 24. void map(key, value) {
          if (value is inserted) {
            for (word : value.split(" ")) {
              write(word, 1);
            }
          } else if (value is deleted) {
            for (word : value.split(" ")) {
              write(word, -1);
            }
          } else {
            // old result
            write(key, value);
          }
        }
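The map pseudocode on slides 23 and 24 can be tried out without a cluster. The following is a minimal sketch in plain Java, assuming a hypothetical `Kind` enum that tells the mapper whether a record was inserted, deleted, or is an old result; `DeltaWordCountMapper` and all of its members are illustrative names, not Hadoop or Marimba classes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mapper on the slides; no Hadoop classes are used.
class DeltaWordCountMapper {

    // How an input record relates to the previous run (names are assumptions).
    enum Kind { INSERTED, DELETED, OLD_RESULT }

    // Emits (word, +1) for inserted text, (word, -1) for deleted text,
    // and passes an old result (word, count) through unchanged.
    static List<Map.Entry<String, Long>> map(Kind kind, String key, String value) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        if (kind == Kind.OLD_RESULT) {
            out.add(new AbstractMap.SimpleEntry<String, Long>(key, Long.parseLong(value)));
            return out;
        }
        Long sign = (kind == Kind.INSERTED) ? 1L : -1L;
        for (String word : value.split(" ")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<String, Long>(word, sign));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map(Kind.INSERTED, "doc1", "kaufe die banane"));
        System.out.println(map(Kind.DELETED, "doc2", "schmeiß schale weg"));
        System.out.println(map(Kind.OLD_RESULT, "banane", "3"));
    }
}
```

The old-result branch is only needed when the reducer has to see the previous aggregate, which is the case for the overwrite strategy.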
  • 25. Overwrite Installation
        void reduce(key, values) {
          sum = 0;
          for (value : values) {
            sum += value;
          }
          put = new Put(key);
          put.add("fam", "col", sum);
          context.write(key, put);
        }
  • 26. Increment Installation
        void reduce(key, values) {
          sum = 0;
          for (value : values) {
            sum += value;
          }
          inc = new Increment(key);
          inc.add("fam", "col", sum);
          context.write(key, inc);
        }
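Both reduce variants compute the same sum and differ only in how the result reaches the output table. The sketch below illustrates that difference with a plain `Map` as an assumed stand-in for the table, so that it runs without HBase; `InstallStrategies` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the output table; no HBase classes are used.
class InstallStrategies {

    // Shared reduce logic from both slides: sum up all values for one key.
    static long sum(List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    // Overwrite installation: the values contain the old result plus all deltas,
    // so the sum is the complete new value and replaces the stored cell.
    static void overwrite(Map<String, Long> table, String key, List<Long> oldResultAndDeltas) {
        table.put(key, sum(oldResultAndDeltas));
    }

    // Increment installation: the values contain only deltas; the store adds
    // their sum to whatever is currently in the cell.
    static void increment(Map<String, Long> table, String key, List<Long> deltas) {
        table.merge(key, sum(deltas), Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> viaOverwrite = new HashMap<>();
        viaOverwrite.put("banane", 3L);
        overwrite(viaOverwrite, "banane", Arrays.asList(3L, 1L));

        Map<String, Long> viaIncrement = new HashMap<>();
        viaIncrement.put("banane", 3L);
        increment(viaIncrement, "banane", Arrays.asList(1L));

        System.out.println(viaOverwrite.get("banane") + " " + viaIncrement.get("banane"));
    }
}
```

With the stored count 3 for "banane" and a delta of +1, both paths end at 4, matching the word-count slides.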
  • 31. Outline: Foundations & Previous Work | The Marimba Framework | Evaluation
  • 32. Core functionality: distributed computing with MapReduce. Core functionality: incremental computing. "I take care of: IncDec, Overwrite, reading old results, generating increments, …" "I'll explain to you how to read the input data, aggregate it, invert it, and write the output."
  • 33. public class WordTranslator extends Translator<LongWritable, Text> {
          public void translate(…) { … }
        }
        IncJob job = new IncJobOverwrite(conf);
        job.setTranslatorClass(WordTranslator.class);
        job.setAbelianClass(WordAbelian.class);
  • 34. public class WordAbelian implements Abelian<WordAbelian> {
          WordAbelian invert() { … }
          WordAbelian aggregate(WordAbelian other) { … }
          WordAbelian neutral() { … }
          boolean isNeutral() { … }
          Writable extractKey() { … }
          void write(…) { … }
          void readFields(…) { … }
        }
  • 35. public class WordSerializer implements Serializer<WordAbelian> {
          Writable serialize(Writable key, WordAbelian v) { … }
          WordAbelian deserializeHBase(
              byte[] rowId, byte[] colFamily, byte[] qualifier, byte[] value) { … }
        }
  • 36. How To Write A Marimba Job: 1. Abelian class 2. Translator class 3. Serializer class 4. Write a Hadoop job using the class IncJob
  • 37. Implementation (class diagram). IncJob: setInputTable(…), setOutputTable(…). Variants: IncJobFull, IncJobIncDec, IncJobOverwrite. Further labels: Recomputation, setResultInputTable(…)
  • 39. public interface Abelian<T extends Abelian<?>>
            extends WritableComparable<BinaryComparable> {
          T invert();
          T aggregate(T other);
          T neutral();
          boolean isNeutral();
          Writable extractKey();
          void write(…);
          void readFields(…);
        }
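The Abelian interface asks for the operations of an abelian group: an aggregation, an inverse, and a neutral element. As a minimal sketch of what an implementation has to guarantee, the following uses a simplified interface named `SimpleAbelian` (an assumption that leaves out the Hadoop `Writable` parts so it runs standalone) and a hypothetical `CountAbelian` that counts occurrences of a word.

```java
// Simplified stand-in for the Abelian interface on the slide: the Hadoop
// Writable/WritableComparable parts are left out so the sketch runs without Hadoop.
interface SimpleAbelian<T extends SimpleAbelian<T>> {
    T invert();
    T aggregate(T other);
    T neutral();
    boolean isNeutral();
    String extractKey();
}

// Hypothetical value type: a word together with a signed occurrence count.
final class CountAbelian implements SimpleAbelian<CountAbelian> {
    final String word;
    final long count;

    CountAbelian(String word, long count) {
        this.word = word;
        this.count = count;
    }

    // Inverse element: the same word with the negated count.
    public CountAbelian invert() { return new CountAbelian(word, -count); }

    // Group operation: add the counts of two values for the same word.
    public CountAbelian aggregate(CountAbelian other) {
        return new CountAbelian(word, count + other.count);
    }

    // Neutral element: count zero.
    public CountAbelian neutral() { return new CountAbelian(word, 0); }

    public boolean isNeutral() { return count == 0; }

    public String extractKey() { return word; }

    public static void main(String[] args) {
        CountAbelian old = new CountAbelian("banane", 3);
        CountAbelian delta = new CountAbelian("banane", 1);
        System.out.println(old.aggregate(delta).count);
        System.out.println(old.aggregate(old.invert()).isNeutral());
    }
}
```

Aggregating a value with its own inverse yields the neutral element, which is the property that lets a deleted input cancel out its earlier contribution.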
  • 40. public interface Serializer<T extends Abelian<?>> {
          Writable serialize(T obj);
          T deserializeHBase(
              byte[] rowId, byte[] colFamily, byte[] qualifier, byte[] value);
        }
  • 41. public abstract class Translator<KEYIN, VALUEIN> {
          public abstract void translate(KEYIN key, VALUEIN value, Context context);
          …
          this.mapContext.write(
              abelianValue.extractKey(),
              this.invertValue ? abelianValue.invert() : abelianValue);
          …
        }
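The division of labour in the Translator can be sketched as follows: user code only describes how an inserted value is translated, and the base class negates the emitted value when `invertValue` is set. This is a simplified illustration under stated assumptions, not the framework's code: values are plain signed counts instead of Abelian objects, output is collected in a list, and `SketchTranslator` and `WordSketchTranslator` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the Translator idea on the slide; values are plain
// signed counts instead of Abelian objects, and no Hadoop classes are used.
abstract class SketchTranslator {
    // Assumed to be set by the framework when the input value was deleted.
    boolean invertValue = false;

    // Collected output in the form "word=count".
    final List<String> emitted = new ArrayList<>();

    // User code: describes only how an inserted value is translated.
    abstract void translate(String key, String value);

    // Framework code: negates the count when invertValue is set.
    protected void write(String word, long count) {
        emitted.add(word + "=" + (invertValue ? -count : count));
    }
}

class WordSketchTranslator extends SketchTranslator {
    @Override
    void translate(String key, String value) {
        for (String word : value.split(" ")) {
            write(word, 1);
        }
    }

    public static void main(String[] args) {
        WordSketchTranslator inserted = new WordSketchTranslator();
        inserted.translate("doc1", "kaufe banane");

        WordSketchTranslator deleted = new WordSketchTranslator();
        deleted.invertValue = true;
        deleted.translate("doc2", "schale weg");

        System.out.println(inserted.emitted + " " + deleted.emitted);
    }
}
```

Compared with a hand-written mapper that branches on inserted and deleted values, the user-supplied `translate` here contains no sign handling at all.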
  • 42. GenericMapper. The input format delivers a Value: OverwriteResult → deserialize; InsertedValue → pass on; DeletedValue → set invertValue=true; pass on; PreservedValue → ignore
  • 43. GenericReducer: 1. Aggregate 2. Serialize 3. Write. IncDec: putToIncrement(…). Overwrite: PUT → write; IGNORE → do not write; DELETE → putToDelete(...)
  • 44. GenericCombiner: "Write A Combiner" (7 Tips for Improving MapReduce Performance, tip 4). 1. Aggregate
  • 46. Example applications: 1. WordCount
        Translator:
          void translate(key, value) {
            for (word : value.split(" ")) {
              write(new WordAbelian(word, 1));
            }
          }
        Serializer:
          Writable serialize(WordAbelian w) {
            Put p = new Put(w.getWord());
            p.add(…);
            return p;
          }
        WordAbelian:
          WordAbelian invert() {
            return new WordAbelian(this.word, -1 * this.count);
          }
          WordAbelian aggregate(WordAbelian other) {
            return new WordAbelian(this.word, this.count + other.count);
          }
          WordAbelian neutral() {
            return new WordAbelian(this.word, 0);
          }
          boolean isNeutral() {
            return (this.count == 0);
          }
  • 47. Example applications: 2. Friends Of Friends. [Diagram: persons A, B, C, D, E connected by FRIENDS and FRIENDS OF FRIENDS edges]
  • 48. Example applications: 2. Friends Of Friends. translate(person, friends); aggregate(…): merge the sets of friends of friends
  • 49. Example applications: 3. Reverse Web-Link Graph. REVERSE WEB LINK GRAPH (Row-ID -> Columns): Google -> {eBay, Wikipedia}; eBay -> {Google, Wikipedia}; Mensa-KL -> {Google}; Facebook -> {Google, Mensa-KL, Uni-KL}; Wikipedia -> {Google}; Uni-KL -> {Google, Wikipedia}. aggregate(…): merge the sets of links
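For the reverse web-link graph, the aggregate step merges sets of links. A minimal sketch of the translate and aggregate steps in plain Java follows, assuming the input maps each page to the pages it links to; `ReverseLinkSketch` and the pages A, B, C are hypothetical, and how deleted links are inverted is not shown on the slide and is left out here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: builds the reverse link graph in memory, without Hadoop.
class ReverseLinkSketch {

    // translate: for a page and its outgoing links, contribute (target, {page}).
    // aggregate: take the union of all sets contributed for the same target.
    static Map<String, Set<String>> reverse(Map<String, List<String>> outLinks) {
        Map<String, Set<String>> reversed = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outLinks.entrySet()) {
            String source = entry.getKey();
            for (String target : entry.getValue()) {
                reversed.computeIfAbsent(target, k -> new TreeSet<>()).add(source);
            }
        }
        return reversed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outLinks = new HashMap<>();
        outLinks.put("A", Arrays.asList("B", "C"));
        outLinks.put("B", Arrays.asList("C"));
        System.out.println(reverse(outLinks));
    }
}
```

Sorted maps and sets are used only to make the printed result deterministic.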
  • 50. Example applications: 4. Bigrams. Text message with gaps (translated from German): "Hi, can you ___?___ pick me up at the station? In about 10 ___?___. Lots of love, ___?___. P.S. I have a lot of ___?___."
  • 51. Idea: analyze large amounts of text with MapReduce.
  • 54. Example applications: 4. Bigrams. NGramAbelian with fields a, b, count: extractKey(); write(…); invert(): count *= -1; aggregate(… other): count += other.count; neutral(): count = 0; isNeutral(): count == 0. NGramStep2Abelian
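  • 57. The same text message with the two suggested words per gap (German words from the slide, glosses in parentheses): "Hi, can you ___ pick me up at the station?": bitte (please), nicht (not). "In about 10 ___.": <num> Minuten (minutes), <num> Jahre (years). "Lots of love, ___.": Grüße (regards), dich (you). "P.S. I have a lot of ___.": zu (too), Spaß (fun).
  • 58. Outline: Foundations & Previous Work | The Marimba Framework | Evaluation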
  • 59. [Chart: WordCount. Runtime (time [hh:mm], axis from 00:00 to 01:10) over the share of changes (0% to 80%) for FULL, INCDEC, and OVERWRITE]
  • 60. [Chart: Reverse Weblink-Graph. Runtime (time [hh:mm], axis from 00:00 to 02:51) over the share of changes (0% to 80%) for FULL, INCDEC, and OVERWRITE]
  • 62. Image credits
        Slides 5-9: Flames and smiley: Microsoft Office 2010
        Slide 10: Google: http://www.google.de
        Slide 11: Amazon: http://www.amazon.de
        Slide 12: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
        Slide 16: Hadoop: http://hadoop.apache.org
        Slide 17: Hadoop: http://hadoop.apache.org; Notebook: Microsoft Office 2010
        Slide 18: HBase: http://hbase.apache.org
        Slide 31: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
        Slide 32: Scaffolding: http://www.flickr.com/photos/michale/94538528/; Hadoop: http://hadoop.apache.org; Boy: Microsoft Office 2010
        Slides 37-44: Puzzle: http://www.flickr.com/photos/dps/136565237/
        Slides 46-48: Boy: Microsoft Office 2010
        Slide 49: Google: http://www.google.de; eBay: http://www.ebay.de; Mensa-KL: http://www.mensa-kl.de; facebook: http://www.facebook.de; Wikipedia: http://de.wikipedia.org; TU Kaiserslautern: http://www.uni-kl.de
        Slides 50-51: Mobile phone: Microsoft Office 2010
        Slide 56: Wikipedia: http://de.wikipedia.org; Twitter: http://www.twitter.com
        Slide 57: Mobile phone: Microsoft Office 2010
        Slide 58: Hadoop: http://hadoop.apache.org; Casio Wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
  • 63. References (1/2)
        [0] Johannes Schildgen. Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten. Master's thesis, TU Kaiserslautern, August 2012.
        [1] Apache Hadoop project. http://hadoop.apache.org/.
        [2] Virga: Incremental Recomputations in MapReduce. http://wwwlgis.informatik.uni-kl.de/cms/?id=526.
        [3] Philippe Adjiman. Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner, 2010. http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/.
        [4] Kai Biermann. Big Data: Twitter wird zum Fieberthermometer der Gesellschaft, April 2012. http://www.zeit.de/digital/internet/2012-04/twitter-krankheiten-nowcast.
        [5] Julie Bort. 8 Crazy Things IBM Scientists Have Learned Studying Twitter, January 2012. http://www.businessinsider.com/8-crazy-things-ibm-scientists-have-learned-studying-twitter-2012-1.
        [6] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, pages 137–150, 2004.
        [7] Lars George. HBase: The Definitive Guide. O’Reilly Media, 1st edition, 2011.
        [8] Brown University Data Management Group. A Comparison of Approaches to Large-Scale Data Analysis. http://database.cs.brown.edu/projects/mapreduce-vs-dbms/.
        [9] Ricky Ho. Map/Reduce to recommend people connection, August 2010. http://horicky.blogspot.de/2010/08/mapreduce-to-recommend-people.html.
        [10] Yong Hu. Efficiently Extracting Change Data from HBase. April 2012.
        [11] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Can mapreduce learn from materialized views? In LADIS 2011, pages 1–5, September 2011.
        [12] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Incremental recomputations in mapreduce. In CloudDB 2011, October 2011.
        [13] Steve Krenzel. MapReduce: Finding Friends, 2010. http://stevekrenzel.com/finding-friends-with-mapreduce.
        [14] Todd Lipcon. 7 Tips for Improving MapReduce Performance, 2009. http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/.
  • 64. References (2/2)
        [15] TouchType Ltd. SwiftKey X - Android Apps auf Google Play, February 2012. http://play.google.com/store/apps/details?id=com.touchtype.swiftkey.
        [16] Karl H. Marbaise. Hadoop - Think Large!, 2011. http://www.soebes.de/files/RuhrJUGEssenHadoop-20110217.pdf.
        [17] Arnab Nandi, Cong Yu, Philip Bohannon, and Raghu Ramakrishnan. Distributed Cube Materialization on Holistic Measures. ICDE, pages 183–194, 2011.
        [18] Alexander Neumann. Studie: Hadoop wird ähnlich erfolgreich wie Linux, May 2012. http://heise.de/-1569837.
        [19] Owen O’Malley, Jack Hebert, Lohit Vijayarenu, and Amar Kamat. Partitioning your job into maps and reduces, September 2009. http://wiki.apache.org/hadoop/HowManyMapsAndReduces?action=recall&rev=7.
        [20] Roya Parvizi. Inkrementelle Neuberechnungen mit MapReduce. Bachelor's thesis, TU Kaiserslautern, June 2011.
        [21] Arnd Poetzsch-Heffter. Konzepte objektorientierter Programmierung. eXamen.press. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
        [22] Dave Rosenberg. Hadoop, the elephant in the enterprise, June 2012. http://news.cnet.com/8301-1001 3-57452061-92/hadoop-the-elephant-in-the-enterprise/.
        [23] Marc Schäfer. Inkrementelle Wartung von Data Cubes. Bachelor's thesis, TU Kaiserslautern, January 2012.
        [24] Sanjay Sharma. Advanced Hadoop Tuning and Optimizations, 2009. http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation.
        [25] Jason Venner. Pro Hadoop. Apress, Berkeley, CA, 2009.
        [26] Dick Weisinger. Big Data: Think of NoSQL As Complementary to Traditional RDBMS, June 2012. http://www.formtek.com/blog/?p=3032.
        [27] Tom White. 10 MapReduce Tips, May 2009. http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/.