Data Computing Division
Hadoop Hands-On Session
Milind Bhandarkar
Greenplum, a Division of EMC
Monday, February 18, 13
Prerequisites
• Make sure you have VMware Player installed
• (On Mac OS X, use VMware Fusion instead)
• Copy the GPHD (Greenplum Distribution of Hadoop v1.0) virtual machine to your laptop
• Also copy the exercise.zip file to your laptop and decompress it
Setting Up
• Start the GPHD virtual machine
• Make sure you can log in to it
• Copy exercise.zip from your laptop to the VM and unzip it into ~/exercise
Preparation
•Make sure HDFS is running
•Make sure MapReduce is running
•Check the configuration files (*-site.xml)
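The preparation checks can be partly scripted. A minimal sketch, assuming the Hadoop 1.0 daemons (NameNode, DataNode, JobTracker, TaskTracker) run as separate JVMs visible to the JDK's `jps` tool, which prints one `<pid> <MainClass>` line per JVM owned by the current user (run it as the user that owns the daemons). The helper names and the sample property values below are hypothetical; `fs.default.name` and `mapred.job.tracker` are the Hadoop 1.x property names for the NameNode and JobTracker addresses.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

# Daemons a single-node Hadoop 1.0 installation normally runs (HDFS + MapReduce).
REQUIRED_DAEMONS = {"NameNode", "DataNode", "JobTracker", "TaskTracker"}


def missing_daemons(jps_output: str) -> Set[str]:
    """Return the required daemons that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:          # "<pid> <MainClass>"
            running.add(parts[1])
    return REQUIRED_DAEMONS - running


def parse_site_xml(xml_text: str) -> Dict[str, str]:
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Hypothetical single-node values; on the VM, read the real files from Hadoop's conf directory.
CORE_SITE = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://localhost:8020</value></property>
</configuration>"""
MAPRED_SITE = """<configuration>
  <property><name>mapred.job.tracker</name><value>localhost:8021</value></property>
</configuration>"""

if __name__ == "__main__":
    print(parse_site_xml(CORE_SITE))
    print(parse_site_xml(MAPRED_SITE))
    print(missing_daemons("4101 NameNode\n4102 DataNode\n4107 Jps"))
```

An empty set from `missing_daemons` means both HDFS and MapReduce daemons are up; the parsed properties show which host and port the clients will contact.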
Hands-On
• Objective: implement linear regression using MapReduce, and use it to train a model
• Data set: from the Marine Resources Division, Department of Primary Industries and Fisheries, Tasmania
• 4177 observed samples
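The slides do not say what the training script computes internally. One common way to fit ordinary least-squares linear regression with MapReduce, assumed here purely for illustration, is the summation form of the normal equations: each map task sums $\mathbf{x}\mathbf{x}^\top$ and $\mathbf{x}\,y$ over its input split $S_k$, a reducer adds the $K$ partial sums, and the small $(p+1)\times(p+1)$ system is solved on one machine.

```latex
\hat{y}_i = \mathbf{w}^\top \mathbf{x}_i, \qquad
\mathbf{x}_i = (1, x_{i1}, \dots, x_{ip})^\top

\min_{\mathbf{w}} \sum_{i=1}^{n} \bigl(y_i - \mathbf{w}^\top \mathbf{x}_i\bigr)^2
\quad\Longrightarrow\quad
\underbrace{\Bigl(\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^\top\Bigr)}_{A}\,\mathbf{w}
= \underbrace{\sum_{i=1}^{n} \mathbf{x}_i\, y_i}_{\mathbf{b}}

A = \sum_{k=1}^{K} \sum_{i \in S_k} \mathbf{x}_i \mathbf{x}_i^\top, \qquad
\mathbf{b} = \sum_{k=1}^{K} \sum_{i \in S_k} \mathbf{x}_i\, y_i, \qquad
\mathbf{w} = A^{-1}\mathbf{b}
```

Because $A$ and $\mathbf{b}$ are plain sums, partial results can be added in any order, and the data shuffled to the reducer is $K$ small matrices no matter how many rows $n$ the input has.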
Data
• Attributes of a type of shellfish (abalone)
• Sex (M/F), Length, Diameter, Height, Weight, Rings on shell
• Problem: predict the number of rings as a function of the other attributes
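To treat the ring count as a linear function of the other attributes, each record must become a numeric feature vector. A minimal sketch, assuming a comma-separated file whose columns follow the slide's attribute order (the actual layout inside exercise.zip may differ): the M/F field becomes two 0/1 indicator columns, and a constant 1.0 bias term is prepended so the model can learn an intercept. `parse_record` and `N_FIELDS` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical column order, following the slide: sex, length, diameter, height, weight, rings.
N_FIELDS = 6


def parse_record(line: str) -> Tuple[List[float], float]:
    """Turn one CSV record into (feature vector with a leading bias term, number of rings)."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != N_FIELDS:
        raise ValueError(f"expected {N_FIELDS} fields, got {len(fields)}: {line!r}")
    sex = fields[0].upper()
    # Two 0/1 indicators; any value other than M or F maps to (0, 0).
    indicators = [float(sex == "M"), float(sex == "F")]
    measurements = [float(v) for v in fields[1:5]]
    rings = float(fields[5])
    return [1.0] + indicators + measurements, rings


if __name__ == "__main__":
    # Illustrative record in the assumed layout.
    print(parse_record("M,0.455,0.365,0.095,0.514,15"))
```

Encoding sex as indicators rather than a single number avoids implying an ordering between categories.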
Step 1
•Copy the small sample data set to HDFS
•See: Scripts/cp_to_grid.sh
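The contents of cp_to_grid.sh are not shown. A minimal sketch of what such a copy step might do, using the Hadoop 1.x file-system shell (`hadoop fs -mkdir`, `-put`, `-ls`) from Python; the local and HDFS paths are hypothetical, and a relative HDFS path such as `linreg/input` resolves under the current user's `/user/<name>` home directory. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def hdfs_copy_commands(local_path: str, hdfs_dir: str) -> List[List[str]]:
    """Hadoop 1.x shell commands: create the target directory, upload the file, list the result."""
    return [
        ["hadoop", "fs", "-mkdir", hdfs_dir],
        ["hadoop", "fs", "-put", local_path, hdfs_dir],
        ["hadoop", "fs", "-ls", hdfs_dir],
    ]


def copy_to_hdfs(local_path: str, hdfs_dir: str, dry_run: bool = True) -> None:
    for cmd in hdfs_copy_commands(local_path, hdfs_dir):
        print("+", " ".join(cmd))
        if not dry_run:
            # -mkdir reports an error if the directory already exists, so tolerate
            # that failure, but stop if the upload or the listing fails.
            subprocess.run(cmd, check=(cmd[2] != "-mkdir"))


if __name__ == "__main__":
    # Hypothetical paths; the real ones are whatever Scripts/cp_to_grid.sh uses.
    copy_to_hdfs("exercise/data/sample.csv", "linreg/input", dry_run=True)
```

Note that `-put` refuses to overwrite an existing HDFS file, so re-running the step requires removing the old copy first.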
Step 2
• Expand the data set 1000 times by replicating each sample and adding Gaussian noise to most fields
• Output: about 4M sample observations (4177 × 1000 = 4,177,000)
• Uses Hadoop Streaming
• See: Scripts/stream_replicate.sh
• Monitor this job in the JobTracker web UI
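A Hadoop Streaming mapper reads records on stdin and writes records on stdout, so the replication step can be sketched as a small script. This is an assumption-laden illustration, not stream_replicate.sh itself: it assumes the same six-column CSV layout as above, adds noise only to the four measurements (sex and rings are copied unchanged), and uses a hypothetical noise level `SIGMA`. The `--map` flag is also hypothetical; it keeps the script from reading stdin when imported.

```python
import random
import sys
from typing import List, Optional

COPIES = 1000   # replication factor from the slide
SIGMA = 0.01    # hypothetical noise standard deviation, in the data's own units


def replicate(line: str, copies: int = COPIES, sigma: float = SIGMA,
              rng: Optional[random.Random] = None) -> List[str]:
    """Return `copies` noisy versions of one record (sex, 4 measurements, rings)."""
    if rng is None:
        rng = random.Random()
    fields = [f.strip() for f in line.strip().split(",")]
    sex, rings = fields[0], fields[5]
    measurements = [float(v) for v in fields[1:5]]
    out = []
    for _ in range(copies):
        # Clamp at zero so a size or weight never becomes negative.
        noisy = [max(0.0, m + rng.gauss(0.0, sigma)) for m in measurements]
        out.append(",".join([sex] + [f"{v:.4f}" for v in noisy] + [rings]))
    return out


def main() -> None:
    """Streaming mapper: raw records on stdin, noisy copies on stdout."""
    rng = random.Random()
    for line in sys.stdin:
        if line.strip():
            for record in replicate(line, rng=rng):
                sys.stdout.write(record + "\n")


if __name__ == "__main__" and "--map" in sys.argv[1:]:
    main()
```

Since no aggregation is needed, this would run as a map-only Streaming job (for example with `-D mapred.reduce.tasks=0`, shipping the script with `-file` and naming it in `-mapper`). Each map task uses its own unseeded generator, so two runs produce different noisy copies.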
Step 3
• Train a linear regression model
• See: Scripts/stream_train_linreg.sh
• Monitor the job
• Copy the model to a local directory
• Check the resulting model
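The training step can be sketched as a Streaming mapper/reducer pair that implements the normal-equation (summation-form) approach to least squares. This is an illustration under the same assumed six-column CSV layout, not the contents of stream_train_linreg.sh; the function names and the `--map`/`--reduce` flags are hypothetical. Each mapper sums x xᵀ and x·y over its split and emits one record under a fixed key, so a single reducer adds the partial sums and solves the 7×7 system. `rmse` shows one way to "check" the copied model against the original small sample.

```python
import json
import sys
from typing import Iterable, List, Tuple


def parse_record(line: str) -> Tuple[List[float], float]:
    """Assumed CSV layout: sex, length, diameter, height, weight, rings."""
    f = line.strip().split(",")
    sex = f[0].strip().upper()
    x = [1.0, float(sex == "M"), float(sex == "F")] + [float(v) for v in f[1:5]]
    return x, float(f[5])


def map_partial_sums(lines: Iterable[str]) -> str:
    """Mapper: accumulate A_k = sum(x x^T) and b_k = sum(x y) over one input split."""
    A, b = None, None
    for line in lines:
        if not line.strip():
            continue
        x, y = parse_record(line)
        if A is None:
            A, b = [[0.0] * len(x) for _ in x], [0.0] * len(x)
        for i, xi in enumerate(x):
            b[i] += xi * y
            for j, xj in enumerate(x):
                A[i][j] += xi * xj
    # One record under a fixed key, so a single reducer receives every partial sum.
    return "stats\t" + json.dumps({"A": A, "b": b})


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gaussian elimination with partial pivoting (A is only d x d)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too little data")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def reduce_and_solve(records: Iterable[str]) -> List[float]:
    """Reducer: add the mappers' partial sums, then solve the normal equations."""
    A, b = None, None
    for rec in records:
        part = json.loads(rec.rstrip("\n").split("\t", 1)[1])
        if part["A"] is None:  # a map task whose split was empty
            continue
        if A is None:
            A, b = part["A"], part["b"]
        else:
            A = [[u + v for u, v in zip(ra, rp)] for ra, rp in zip(A, part["A"])]
            b = [u + v for u, v in zip(b, part["b"])]
    if A is None:
        raise ValueError("no training records received")
    return solve(A, b)


def rmse(w: List[float], lines: Iterable[str]) -> float:
    """Root-mean-square error of the model on a set of records."""
    sq = [(sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
          for x, y in map(parse_record, lines)]
    return (sum(sq) / len(sq)) ** 0.5


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "--map":
        print(map_partial_sums(sys.stdin))
    elif sys.argv[1] == "--reduce":
        print(json.dumps(reduce_and_solve(sys.stdin)))
```

Summing inside the mapper (in-mapper combining) keeps the shuffle to one small record per map task even for the 4M-row input. Note that the model is only identifiable if the data contain some records that are neither M nor F; otherwise the two indicators plus the bias column are collinear and `solve` reports a singular system.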