Map Reduce: An Example (James Grant at Big Data Brighton)


Presentation by Brandwatch Developer James Grant at the second Big Data Brighton meetup, hosted by Brandwatch: www.brandwatch.com

Who am I?
My name is James Grant (james@queeg.org).

I'm a developer here at Brandwatch.

For the last three years I've been a Data
Engineer at Last.fm and the maintainer of their
Hadoop Cluster.

Coming up…
● What happens during MapReduce?
● Plays and Reach from music listening data
● The Mapper pseudocode
● The Reducer pseudocode
● The result
● What if…?

What happens during MapReduce?

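[Diagram: the input data is split into data fragments, and each fragment is passed to a Mapper, which produces map output. The map output is sorted into fragments of reduce input, and each fragment is passed to a Reducer, which produces the output.]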
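The flow in the diagram can be sketched in a few lines of single-process Python. This is only an illustration of the three phases, not how Hadoop implements them: the name `run_mapreduce`, the sample records, and the toy mapper and reducer are all assumptions made for this example.

```python
from itertools import groupby
from operator import itemgetter


def run_mapreduce(records, mapper, reducer):
    """Run map, sort and reduce in one process (hypothetical helper)."""
    # Map phase: each input record may yield any number of (key, value) pairs.
    mapped = [pair for record in records for pair in mapper(record)]
    # Sort phase: order by key so that equal keys end up next to each other.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: the reducer is called once per distinct key.
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = (value for _, value in group)
        output.extend(reducer(key, values))
    return output


def count_mapper(record):
    # Toy mapper: emit a 1 for every (user, song) record, keyed by song.
    user, song = record
    yield song, 1


def count_reducer(song, ones):
    # Toy reducer: add up the 1s emitted for this song.
    yield song, sum(ones)


# Made-up sample records in (user, song) form.
result = run_mapreduce([(1, 10), (2, 10), (3, 20)], count_mapper, count_reducer)
print(result)  # [(10, 2), (20, 1)]
```

In a real cluster the map and reduce calls run on different machines and the sort is distributed, but the contract seen by the mapper and reducer is the same as in this sketch.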

Plays and Reach from music
listening data
● Plays - The number of times that song has
been played
● Reach - The number of unique listeners to
that song
● Similar to hits and uniques for web
properties
● Input data has columns for user id and song
id (amongst others)
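As a small worked example of the two definitions, here is a direct computation on a handful of rows, without MapReduce. The sample rows and the variable names are made up for illustration.

```python
from collections import Counter

# Made-up listening data as (user id, song id) rows.
listens = [(1, 10), (2, 10), (1, 10), (3, 20), (3, 20)]

# Plays: how many rows mention each song.
plays = Counter(song for _, song in listens)

# Reach: how many distinct users appear in the rows for each song.
reach = {song: len({u for u, s in listens if s == song}) for song in plays}

print(dict(plays))  # {10: 3, 20: 2}
print(reach)        # {10: 2, 20: 1}
```

Song 10 was played three times by two different users, so its plays are 3 and its reach is 2.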

The Mapper
function map(Integer user, Integer song):
    emit(song, user);
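
A rough Python equivalent of the mapper might look like the sketch below. It assumes the input arrives as tab-separated lines with the user id in the first column and the song id in the second; the function name `map_line` and that column order are assumptions, not something the slides specify.

```python
def map_line(line):
    """Parse one tab-separated line and emit a (song, user) pair."""
    fields = line.rstrip("\n").split("\t")
    # Assumed layout: user id first, song id second; other columns are ignored.
    user, song = int(fields[0]), int(fields[1])
    # Keying by song means all listens of one song reach the same reducer call.
    yield song, user


print(list(map_line("7\t42\tsome-other-column\n")))  # [(42, 7)]
```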

The Reducer
function reduce(Integer song, Iterator users):
    Integer plays = 0;
    Set uniqueUsers = [];

    foreach user in users:
        increment plays;
        if user not within uniqueUsers:
            uniqueUsers.add(user);

    result.plays = plays;
    result.reach = uniqueUsers.cardinality();
    emit(song, result);
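
The same reducer can be written as a short Python generator. This is a sketch of the pseudocode above rather than production code; the name `reduce_song` and the dictionary used for the result are choices made for this example.

```python
def reduce_song(song, users):
    """Count plays and unique listeners for one song."""
    plays = 0
    unique_users = set()
    for user in users:
        plays += 1
        # Adding to a set is a no-op for users already seen,
        # so no explicit membership check is needed.
        unique_users.add(user)
    yield song, {"plays": plays, "reach": len(unique_users)}


print(list(reduce_song(10, iter([1, 2, 1]))))
# [(10, {'plays': 3, 'reach': 2})]
```

Note that the set holds every distinct listener of one song at once, so the reducer's memory use grows with that song's reach.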

What if…?
You often hear that in nearly all cases you
should use a higher-level tool like Pig or Hive to
solve problems.

So what does the Pig script look like for this
problem?

Using Pig
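-- Load the listening data as (user, song) rows.
subs = LOAD 'submissions.tsv' USING PigStorage()
    AS (user:int, song:int);
songs = GROUP subs BY song;
-- DISTINCT is a relational operator in Pig, so it is applied inside a nested FOREACH.
playsreach = FOREACH songs {
    listeners = DISTINCT subs.user;
    GENERATE group AS song, COUNT(subs) AS plays, COUNT(listeners) AS reach;
};
STORE playsreach INTO 'playsreach.tsv';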

Questions?
