04 Algorithms


Omid Djoudi

Algorithms
SORT
map (key, values):
    for each val in values:
        emit (val)

No reduce needed.

Each value is emitted as an output key, so the shuffle/sort phase orders the values automatically.
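
The idea above can be sketched in plain Python. This is a minimal simulation, not Hadoop code: `shuffle_sort` is a hypothetical stand-in for the framework's shuffle/sort phase, and the input split names are made up for illustration.

```python
def map_sort(key, values):
    # Emit each value as the output key; the payload is unused.
    for val in values:
        yield (val, None)


def shuffle_sort(pairs):
    # Hypothetical stand-in for the framework's shuffle/sort: order pairs by key.
    return sorted(pairs, key=lambda kv: kv[0])


def run_sort(records):
    # Run the mapper over every input record, then let the "framework" sort.
    pairs = [kv for key, values in records for kv in map_sort(key, values)]
    return [k for k, _ in shuffle_sort(pairs)]


sorted_values = run_sort([("split-0", [42, 7, 19]), ("split-1", [3, 25])])
print(sorted_values)
```

Note that this sketch models a single reducer. With several reducers, each output partition is sorted on its own, so a total order additionally needs a partitioner that assigns key ranges to reducers.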

2011 IPM - HPC4 2

INVERTED INDEX
File1: aa bb cc
File2: bb cc
Result -> ("aa", "File1") ("bb", "File1,File2") ("cc", "File1,File2")

Here the map key is the file name and the values are the words it contains.

map (key, values):
    for each val in values:
        emit (val, key)

reduce (key, values):
    string str
    for each val in values:
        str += val + ","
    emit (key, str)
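
A minimal Python sketch of the inverted index, using the two example files above. The grouping step is simulated with a dictionary; the deduplication and sorting of file names in the reducer are assumptions added for a stable result and are not part of the slide's pseudocode.

```python
from collections import defaultdict


def map_index(filename, words):
    # Emit (word, filename) for every word in the file.
    for word in words:
        yield (word, filename)


def reduce_index(word, filenames):
    # Join the file names for one word (deduplicated and sorted: an assumption).
    return (word, ",".join(sorted(set(filenames))))


def build_index(files):
    # Simulate the shuffle: group mapper output by word.
    groups = defaultdict(list)
    for filename, text in files.items():
        for word, name in map_index(filename, text.split()):
            groups[word].append(name)
    return dict(reduce_index(w, names) for w, names in sorted(groups.items()))


index = build_index({"File1": "aa bb cc", "File2": "bb cc"})
print(index)
```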

INNER JOIN
Map()
    if (type == PK) emit ((a_id, 'A'), a_data)
    else emit ((a_id, 'B'), b_data)

-> Secondary sort: intermediate values are ordered by (key, keyType), where keyType marks a primary key (PK) or foreign key (FK) record.
=> The primary-key record always arrives before the foreign-key records.

Reduce()
    string a_data_val
    if (key.keyType == 'A') a_data_val = value.data
    if (key.keyType == 'B') emit (key.a_id, a_data_val, value.data)
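
The join can be simulated in Python as follows. This is a sketch under stated assumptions: records are hypothetical `(type, a_id, data)` tuples, sorting the composite key stands in for the secondary sort, and `groupby` on `a_id` stands in for the grouping comparator.

```python
from itertools import groupby


def map_join(record):
    # Tag primary-key rows 'A' and foreign-key rows 'B' in a composite key.
    kind, a_id, data = record
    tag = "A" if kind == "PK" else "B"
    yield ((a_id, tag), data)


def reduce_join(a_id, tagged_values):
    # Values arrive with the 'A' row first, so it can be remembered and joined.
    a_data = None
    for tag, data in tagged_values:
        if tag == "A":
            a_data = data
        elif a_data is not None:  # inner join: skip rows without a PK match
            yield (a_id, a_data, data)


def run_join(records):
    pairs = [kv for r in records for kv in map_join(r)]
    pairs.sort(key=lambda kv: kv[0])  # secondary sort: (a_id, tag), 'A' < 'B'
    out = []
    for a_id, group in groupby(pairs, key=lambda kv: kv[0][0]):
        tagged = [(key[1], data) for key, data in group]
        out.extend(reduce_join(a_id, tagged))
    return out


records = [
    ("FK", 1, "order-17"),
    ("PK", 1, "alice"),
    ("FK", 2, "order-18"),  # no matching primary key: dropped
    ("PK", 3, "carol"),     # no foreign-key rows: dropped
    ("FK", 1, "order-19"),
]
joined = run_join(records)
print(joined)
```

Because the primary-key row is seen first, the reducer only keeps one value in memory per key instead of buffering all rows.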

Standard Deviation

Weather datasets: classify abnormal climatic conditions.
StdDev is one of the measures of dispersion describing the spread of data.

Standard deviations     Abnormality               Probability of
away from mean                                    occurrence
beyond -3 sd            extremely subnormal        0.15%
-3 to -2 sd             greatly subnormal          2.35%
-2 to -1 sd             subnormal                 13.50%
-1 to +1 sd             normal                    68.00%
+1 to +2 sd             above normal              13.50%
+2 to +3 sd             greatly above normal       2.35%
beyond +3 sd            extremely above normal     0.15%
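
As an illustration of how the table could be applied, the sketch below maps a reading to its abnormality band from a given mean and standard deviation. The function name `classify` is hypothetical, and the handling of readings that fall exactly on a boundary (assigned to the upper band) is an assumption; the table does not specify it.

```python
def classify(value, mean, sd):
    # Number of standard deviations between the reading and the mean.
    z = (value - mean) / sd
    # Upper bound of each band, in standard deviations, with its label.
    bands = [
        (-3, "extremely subnormal"),
        (-2, "greatly subnormal"),
        (-1, "subnormal"),
        (1, "normal"),
        (2, "above normal"),
        (3, "greatly above normal"),
    ]
    for upper, label in bands:
        if z < upper:
            return label
    return "extremely above normal"


print(classify(25.0, 20.0, 2.0))
```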

Weather dataset : http://www.ncdc.noaa.gov/
0200010570999992011010106004...000010021019N0250001N1-01401-01591999999ADDAA112...70002;
0114010570999992011010112004...000010021019N0750001N1-00901-01081999999ADDAY1818...693/;
0114010570999992011012712004...005010300019N0750001N1+00131-00581999999ADDAY1310...3945;

Extract Date, Temperature and Quality.

The process should:
Filter by Quality
Calculate Mean for temperature on each date.
Calculate standard deviation for temperature on each date.

Standard deviation

Map() {
    if quality = … Emit(date, temp)
}

Reduce(date, temp) {
    n = size(temp)
    μ = ∑temp / n;
    σ = √(∑(temp_i – μ)² / n)
    Emit(date, σ)
}

Can we use a combiner?
All processing is done in the reducers, so there is no load balancing across nodes.
This becomes a bottleneck if there are many samples per date (the temperature array grows too big).
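
A minimal Python sketch of this naive version. The record layout `(date, temp, quality_ok)` and the boolean quality flag are hypothetical; the actual quality test is elided on the slide and is not reconstructed here.

```python
import math
from collections import defaultdict


def map_temp(record):
    # Keep only readings that pass the (hypothetical) quality flag.
    date, temp, quality_ok = record
    if quality_ok:
        yield (date, temp)


def reduce_stddev(date, temps):
    # The reducer needs every reading for the date in memory at once.
    n = len(temps)
    mu = sum(temps) / n
    sigma = math.sqrt(sum((t - mu) ** 2 for t in temps) / n)
    return (date, sigma)


readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
records = [("20110101", t, True) for t in readings]
records.append(("20110101", 99.9, False))  # rejected by the quality filter

groups = defaultdict(list)
for record in records:
    for date, temp in map_temp(record):
        groups[date].append(temp)

results = [reduce_stddev(date, temps) for date, temps in groups.items()]
print(results)
```

The list `temps` is the "temperature array" the slide warns about: its size grows with the number of samples per date, and no part of the work can be moved to the mappers.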

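Standard deviation can be expressed differently:

Map() {
    Emit(date, [1, temp, temp²])
}

Combine(date, [[n, sum, sum2]]) {
    Emit(date, [∑n, ∑sum, ∑sum2])
}

Reduce(date, [[n, sum, sum2]]) {
    μ = ∑sum / ∑n;
    σ = √((∑sum2 / ∑n) - μ²);
    Emit(date, σ)
}

The combiner contains the associative part of the calculation.
It is executed on the mapper nodes -> much better load balancing.
But is the combiner always executed? No: the framework may run it zero, one, or several times, which is why the mapper already emits [1, temp, temp²] in the same shape the combiner produces. The reducer gets the correct result either way.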
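
The (n, sum, sum2) formulation can be checked with a short Python sketch. It rests on the identity σ² = (∑temp² / n) − μ². The function names and the way the readings are split between "mappers" are assumptions made for illustration.

```python
import math


def map_stats(date, temp):
    # Emit a triple (count, sum, sum of squares) for a single reading.
    return (date, (1, temp, temp * temp))


def combine(triples):
    # Component-wise sum: associative, so it can run any number of times.
    n = sum(t[0] for t in triples)
    s = sum(t[1] for t in triples)
    s2 = sum(t[2] for t in triples)
    return (n, s, s2)


def reduce_stddev(date, triples):
    # Works on raw triples, combined triples, or a mix of both.
    n, s, s2 = combine(triples)
    mu = s / n
    return (date, math.sqrt(s2 / n - mu * mu))


temps = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
raw = [map_stats("20110101", t)[1] for t in temps]

# Pretend the combiner ran on the first mapper's output but not the second's.
partial = [combine(raw[:5])] + raw[5:]

with_combiner = reduce_stddev("20110101", partial)
without_combiner = reduce_stddev("20110101", raw)
print(with_combiner, without_combiner)
```

Both calls give the same standard deviation, and the reducer only ever holds three numbers per incoming triple. One caveat of this formulation is that subtracting two large, nearly equal quantities can lose floating-point precision when the mean is large relative to the spread.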


MAP/REDUCE ALGORITHMS
HPC4 Seminar, IPM, December 2011
Omid Djoudi, od90125@yahoo.com
