MAPREDUCE FOR THE
PARTIAL LEAST SQUARES
REGRESSION METHOD
(MRPLS)

MACHINE LEARNING I
LEANDRO ALVIM
PROF. RUY MILIDIÚ

MOTIVATION

BUILD MORE ROBUST
MODELS

USE OF PLS

PROBLEM

  PERFORMANCE

[Two plots: PLS time as a share of total time (PLS / TOTAL, axis 0 to 100) versus number of factors (1, 10, 20, 30) and versus number of examples (27k, 54k, 108k, 216k)]

MOTIVATION

PROBLEM

 PLS - TWO PHASES

   TRAINING (COSTLY)

   TEST

[Diagram labels: T, X, Y, Q, B]

OBJECTIVE

PLS MODEL

 LARGE VOLUME OF DATA

 TRAINING PHASE

   ALGORITHMS: PLS1 (USES NIPALS), PLS2

   MAPREDUCE PARADIGM

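The slides do not say how MRPLS splits the training work, so the following is only a sketch of one step that fits MapReduce naturally. In NIPALS-style PLS1 on mean-centered data, the first weight vector is proportional to X^T y, which is a sum of per-row terms y_i * x_i and can therefore be accumulated chunk by chunk. The function names (`map_chunk`, `reduce_partials`, `normalize`) and the toy rows are hypothetical; the toy rows are not centered and only illustrate the summation.

```python
import math

def map_chunk(chunk):
    """Mapper: partial sum of y_i * x_i over the rows of one data chunk."""
    n_features = len(chunk[0][0])
    partial = [0.0] * n_features
    for x, y in chunk:
        for j, x_j in enumerate(x):
            partial[j] += x_j * y
    return partial

def reduce_partials(partials):
    """Reducer: add the partial vectors component-wise to obtain X^T y."""
    return [sum(column) for column in zip(*partials)]

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

# Hypothetical toy rows (x, y); each chunk stands in for one mapper's input split.
rows = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 1.0), ([2.0, 0.0], 0.5)]
chunks = [rows[:2], rows[2:]]

xty = reduce_partials([map_chunk(chunk) for chunk in chunks])
w = normalize(xty)
print(xty)  # [4.0, 2.0]
```

Each mapper emits a vector whose length is the number of features, so the data sent to the reducer does not grow with the number of examples.
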
OBJECTIVE

INVESTIGATE (PLS, MRPLS)

    PERFORMANCE

    EFFICIENCY

    DATA VOLUME

    MODEL

MAPREDUCE

DEVELOPED BY GOOGLE

PROGRAMMING PARADIGM (CLOUD COMPUTING)

  GOAL

    SIMPLIFY PROGRAMMING FOR LARGE
    VOLUMES OF DATA

      HIDE THE MASTER/SLAVE PARADIGM

MAPREDUCE

PROBLEM

 WORD COUNT

   INPUT = [BANANA, MELON, APPLE, MELON, APPLE]

   DESIRED OUTPUT = {BANANA: 1, MELON: 2, APPLE: 2}

MAPREDUCE

MAP

 (BANANA,1); (MELON,1); (APPLE,1); (MELON,1); (APPLE,1)

REDUCE

 (BANANA,[1]); (MELON,[1,1]); (APPLE,[1,1])

 SUM THE VALUES PER KEY

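The word-count example can be sketched in a few lines of plain Python. This is an in-memory illustration of the map, group-by-key, and reduce steps, not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical, and the input words are the English equivalents of the fruit names in the example.

```python
from collections import defaultdict

def map_phase(words):
    """Map: emit one (word, 1) pair per input word."""
    return [(word, 1) for word in words]

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups):
    """Reduce: sum the list of values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

words = ["BANANA", "MELON", "APPLE", "MELON", "APPLE"]
pairs = map_phase(words)
groups = shuffle(pairs)
counts = reduce_phase(groups)
print(groups)  # {'BANANA': [1], 'MELON': [1, 1], 'APPLE': [1, 1]}
print(counts)  # {'BANANA': 1, 'MELON': 2, 'APPLE': 2}
```
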
HADOOP

DEVELOPED BY APACHE

  INSPIRED BY GFS/MAPREDUCE

PLATFORM

    GOALS

      RUN APPLICATIONS ON LARGE
      VOLUMES OF DATA

      LOW-COST MACHINES

      EFFICIENT (LOCAL PARALLELISM)

      RELIABLE (HDFS)

DATASET

TOY DATASET (MEAT)

  APPROX. 200 EXAMPLES, 100 FEATURES, AND 3 DEPENDENT
  VARIABLES

TOY DATASET

  REPLICATE THE SET OF EXAMPLES

  1M EXAMPLES X 100 FEATURES AND 3 DEPENDENT VARIABLES

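Replicating a small set of examples up to a target size can be sketched as follows. The helper name `replicate_rows` and the four stand-in rows are hypothetical; the actual MEAT data and the replication procedure used by the author are not shown in the slides.

```python
from itertools import cycle, islice

def replicate_rows(rows, target_count):
    """Yield the rows cyclically until target_count rows have been produced."""
    return islice(cycle(rows), target_count)

# Hypothetical stand-in for the small dataset: 4 rows of (feature, feature, response).
base = [(0.1, 0.2, 1.0), (0.3, 0.4, 2.0), (0.5, 0.6, 3.0), (0.7, 0.8, 4.0)]

big = list(replicate_rows(base, 10))
print(len(big))           # 10
print(big[4] == base[0])  # True
```

Because the generator is lazy, the same helper could stream a much larger target count to a file without holding every row in memory. Replicated rows add no new information, so a dataset built this way is suited to timing experiments rather than to judging model quality.
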
METHODOLOGY

DEVELOP THE MAPREDUCE VERSION OF PLS

VERIFY THE CORRECTNESS OF THE ALGORITHMS

PREPARE THE DATASET

SIMULATION

 PSEUDO-DISTRIBUTED ENVIRONMENT

METHODOLOGY

CHOOSE/PREPARE A REAL ENVIRONMENT (CLUSTER)

ANALYZE PROCESSING TIME - METRICS

  SPEEDUP (SP = TS/TP)

    LINEAR? (SP = P)

  EFFICIENCY (EP = SP/P)

REPORT

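The two metrics can be computed directly from measured run times, reading TS as the serial run time and TP as the run time on P processors (the usual definitions). The sketch below uses made-up timings purely for illustration; they are not results from this project.

```python
def speedup(t_serial, t_parallel):
    """SP = TS / TP."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """EP = SP / P; equals 1.0 when the speedup is linear (SP = P)."""
    return speedup(t_serial, t_parallel) / p

# Hypothetical timings in seconds, not measurements.
t_s, t_p, p = 120.0, 40.0, 4

print(speedup(t_s, t_p))        # 3.0
print(efficiency(t_s, t_p, p))  # 0.75
```

With these numbers the speedup falls short of linear (3.0 instead of 4), which shows up as an efficiency below 1.
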
TOOLS/EXPERIMENTS

HADOOP (HDFS)

  HADOOP STREAMING

LEARNTRADE FRAMEWORK

TECGRAF CLUSTER

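Hadoop Streaming runs any executable as mapper or reducer: each reads lines from standard input and writes tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The sketch below mimics that line protocol locally with a word count; it is not the author's MRPLS code, and `mapper` and `reducer` are hypothetical names. In a real job each function would be fed from `sys.stdin` in its own script.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"

# Local simulation: sorted() stands in for the framework's sort between map and reduce.
mapped = sorted(mapper(["banana melon apple", "melon apple"]))
result = list(reducer(mapped))
print(result)  # ['apple\t2', 'banana\t1', 'melon\t2']
```

The reducer relies on `groupby`, which only merges adjacent equal keys, so it is correct only because the input arrives sorted.
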
SCHEDULE

TASK                                             STATUS   DATES
DEVELOP THE MAPREDUCE VERSION OF PLS             ok       07/09/08 - 20/09/08
VERIFY THE CORRECTNESS OF THE ALGORITHMS         ok       20/09/08 - 22/09/08
PREPARE A DATASET FOR TESTING                    ok       01/10/08
SIMULATION IN A PSEUDO-DISTRIBUTED ENVIRONMENT   ok       01/10/08 - 03/10/08
CHOOSE/PREPARE THE TEST ENVIRONMENT              ok       20/09/08 - 07/09/08
ANALYZE PROCESSING TIME - METRICS                not ok   08/09/08 - ??/??/08
WRITE A REPORT                                   not ok   ??/??/08 - ??/??/08

REFERENCES

MILIDIU, R. L.; RENTERIA, Raul. DPLS and PPLS: Two PLS Algorithms for Large Data Sets. Computational Statistics and Data Analysis, v. 48, p. 125-138, 2005.

MapReduce: Simplified Data Processing on Large Clusters

Hadoop Distributed File System

Hadoop Map/Reduce
