MapReduce@DirectI

     amkiray: ramki.g@directi.com
    uvdhray: dhruv.m@directi.com
Let's start with an example…

 access.log
 timestamp,url,response_code,response_time


 products.dat
 date, product_id, price
Requirement:

    Number of requests in the last 30 days.




    $> ls -rt *.log | tail -30 | xargs wc -l
Requirement:

    Busiest 30 minutes in last 30 days.





    $> ls -rt *.log | tail -30 | xargs ./count_30min.sh
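
The slides do not show what count_30min.sh does. A minimal sketch of one way such a counter might work, written in Python and assuming that the timestamp column holds epoch seconds and that "30 minutes" means fixed buckets aligned to the half hour (a sliding window would need a different approach). The function name and the sample URLs are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, Optional, Tuple

WINDOW_SECONDS = 30 * 60  # fixed 30-minute buckets (assumption)


def busiest_window(lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (window_start_epoch, request_count) for the busiest bucket.

    Assumes each line follows the access.log layout from the slide,
    timestamp,url,response_code,response_time, with the timestamp in
    epoch seconds (an assumption; the slides do not say).
    """
    counts: Counter = Counter()
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        ts = int(row[0])
        counts[ts - ts % WINDOW_SECONDS] += 1  # round down to bucket start
    if not counts:
        return None
    return counts.most_common(1)[0]


# Hypothetical sample rows, for illustration only.
sample = [
    "1800,/buy?pid=1,200,12",
    "1900,/view?pid=2,200,8",
    "3599,/buy?pid=3,500,40",
    "3600,/view?pid=1,200,9",
]
print(busiest_window(sample))  # (1800, 3)
```

This works while the logs fit on one machine; the rest of the deck is about what to do when they do not.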
Requirement:

    Number of failed buy requests for products worth more than $30 in the last 30 days.

    Import data to an RDBMS;

    SELECT COUNT(*) FROM logs, products
    WHERE GET_REQUEST_TYPE(logs.url) = 'BUY'
    AND GET_PRODUCT_ID(logs.url) = products.product_id
    AND products.price > 30
    AND DATE(logs.timestamp) = products.date;
Now gimme the number of failed buy requests for products
worth more than $30 in the last 1 year.

It might take a while!
2 days later….




               On its way!
             Inserting data
            into database…
5 days later…

                 $> mysqladmin processlist
           +-----------------------------------+
           |   Query   |   Copy to Temp Table |
           +-----------------------------------+

                       Maybe it's joining!!
                     Or maybe it's dead…
         Or maybe my replacement will see the result…
god@internal.directi.com:
    Go use bloody Hadoop!
But Why?!
    Distributed data processing cluster

    A distributed file system

    Data location sensitive Task scheduler

    MapReduce paradigm

    Handles Parallelizable and distributable tasks

    Failover capability

    Web based monitoring capabilities

    More value for your time!

Where to start??

    MapReduce:

    Q: Sum of all squares of a=[1,2,3,4,3,2,7]

    Simple….




    fold(map(a, square()), sum())



      You can do that in any functional programming language….
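
As a small illustration of the fold/map expression above, here is the same computation in Python, where `reduce` plays the role of fold. The helper name `square` simply mirrors the slide's pseudocode.

```python
from functools import reduce
from operator import add

a = [1, 2, 3, 4, 3, 2, 7]


def square(x: int) -> int:
    return x * x


squares = map(square, a)          # map: apply square to every element
total = reduce(add, squares, 0)   # fold: combine the results with +
print(total)  # 92
```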
Now do it for an array of 100 million elements…
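
One way to see why this problem parallelizes: the sum of squares can be computed per chunk and the partial results added afterwards. The sketch below does this in a single Python process; the chunk size and function names are illustrative assumptions, and on a cluster each chunk would be handled by a different machine.

```python
from functools import reduce
from itertools import islice
from typing import Iterable, Iterator, List


def chunks(values: Iterable[int], size: int) -> Iterator[List[int]]:
    """Split the input into consecutive blocks of at most `size` items."""
    it = iter(values)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def partial_sum_of_squares(block: List[int]) -> int:
    """Map step: depends only on its own chunk, so chunks are independent."""
    return sum(x * x for x in block)


def sum_of_squares(values: Iterable[int], chunk_size: int = 1_000_000) -> int:
    """Reduce step: add up the per-chunk partial sums."""
    partials = map(partial_sum_of_squares, chunks(values, chunk_size))
    return reduce(lambda x, y: x + y, partials, 0)


print(sum_of_squares([1, 2, 3, 4, 3, 2, 7], chunk_size=3))  # 92
```

Because addition is associative, the order in which the partial sums arrive does not matter, which is what lets the chunks run in parallel.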
This is where Hadoop comes in..
    Distributed File System : HDFS

    Distributed Computation/Task Processing: Hadoop

    Name Node + Data Node

    Task Tracker + Job Tracker

How MapReduce Works…
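
As a rough, single-process sketch of the three phases a MapReduce job goes through (map, shuffle, reduce), here is a per-word count in plain Python. Nothing here is Hadoop API; the function names are made up for illustration, and a real cluster runs the map and reduce calls on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (key, value) pair for every word in one input record."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values that share a key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["go use hadoop", "use hadoop"]))
# {'go': 1, 'use': 2, 'hadoop': 2}
```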
An example
     Word Count


hadoop jar contrib/streaming/hadoop-0.*-streaming.jar \
      -jobconf mapred.data.field.separator="," \
      -input 'wc.eg.in' \
      -output 'wc.eg.out' \
      -mapper 'wc -w' \
      -reducer "awk '{ sum += \$1 } END { print sum }'"




    Job Tracker: http://cae5.internal.directi.com:50030/jobtracker.jsp
    Name Node: http://cae2.internal.directi.com:50070/dfshealth.jsp
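
To make the division of labour in the streaming job above concrete, the following Python sketch imitates it locally: each mapper behaves like `wc -w` on its own input split and emits one number, and the reducer behaves like the awk script and adds those numbers up. The splits shown are hypothetical sample data, not the contents of wc.eg.in.

```python
from typing import Iterable, List


def mapper(split: Iterable[str]) -> int:
    """Mimics `wc -w` on one input split: count whitespace-separated words."""
    return sum(len(line.split()) for line in split)


def reducer(counts: Iterable[int]) -> int:
    """Mimics the awk reducer: add up the number emitted by each mapper."""
    return sum(counts)


# Two hypothetical input splits; Hadoop decides the real split boundaries.
splits: List[List[str]] = [
    ["the quick brown fox", "jumps over"],
    ["the lazy dog"],
]
print(reducer(mapper(s) for s in splits))  # 9
```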
Pig and Pig Latin
    A procedural language for MapReduce operations.


logs = LOAD 'access.log' USING PigStorage(',') AS
         (ts:int, URL:chararray, resp:chararray, resp_time:int);

products = LOAD 'products.dat' USING PigStorage(',') AS
         (date:int, pid:int, price:int);

l1 = FOREACH logs GENERATE GetDate(ts) as req_date,
         GetProductID(URL) as prod_id,
         GetRequestType(URL) as rtype, resp, resp_time;

j1 = JOIN l1 BY prod_id, products by pid;

j2 = FILTER j1 BY req_date==date AND price>30.0F;

j3 = GROUP j2 ALL;

j4 = FOREACH j3 GENERATE COUNT(j2);

DUMP j4;
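
For readers new to Pig Latin, the data flow of the script above (project, join, filter, count) can be mirrored in a few lines of Python. This is only a sketch under loud assumptions: the slides do not define GetDate or GetProductID, so the helpers below are hypothetical stand-ins that treat timestamps as epoch seconds, dates as yyyymmdd integers, and URLs as looking like /buy?pid=42. It also assumes one price per product per day. Like the script, it does not filter on request type or response code.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple
from urllib.parse import parse_qs, urlparse

Log = Tuple[int, str, str, int]   # ts, URL, resp, resp_time
Product = Tuple[int, int, int]    # date (yyyymmdd), pid, price


def get_date(ts: int) -> int:
    """Hypothetical GetDate: epoch seconds -> yyyymmdd integer (UTC)."""
    return int(datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d"))


def get_product_id(url: str) -> int:
    """Hypothetical GetProductID: assumes URLs look like /buy?pid=42."""
    return int(parse_qs(urlparse(url).query)["pid"][0])


def count_matches(logs: Iterable[Log], products: Iterable[Product]) -> int:
    # Index products by (pid, date) so that the JOIN on product id and the
    # FILTER on req_date == date collapse into one dictionary lookup.
    price_by_key: Dict[Tuple[int, int], int] = {
        (pid, date): price for date, pid, price in products
    }
    total = 0
    for ts, url, _resp, _resp_time in logs:
        price = price_by_key.get((get_product_id(url), get_date(ts)))
        if price is not None and price > 30:  # FILTER ... price > 30
            total += 1                        # GROUP ALL + COUNT
    return total


# Hypothetical sample data, for illustration only.
logs = [
    (0, "/buy?pid=1", "500", 120),      # 1970-01-01, price 45: counted
    (86400, "/buy?pid=1", "500", 95),   # 1970-01-02, no price row that day
    (10, "/buy?pid=2", "200", 30),      # price 20: filtered out
]
products = [(19700101, 1, 45), (19700101, 2, 20)]
print(count_matches(logs, products))  # 1
```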
Q&A

MapReduce@DirectI

  • 1. MapReduce@DirectI amkiray: ramki.g@directi.com uvdhray: dhruv.m@directi.com
  • 2. Let's start with an example… access.log: timestamp, url, response_code, response_time. products.dat: date, product_id, price
  • 3. Requirement: Number of requests in the last 30 days. $> ls -rt *.log | tail -30 | xargs wc -l
  • 4. Requirement: Busiest 30 minutes in the last 30 days. $> ls -rt *.log | tail -30 | xargs ./count_30min.sh
  • 5. Requirement: Number of failed buy requests for products worth more than $30 in the last 30 days. Import data to an RDBMS; SELECT COUNT(*) FROM logs, products WHERE GET_REQUEST_TYPE(logs.url)='BUY' AND GET_PRODUCT_ID(logs.url)=products.product_id AND products.price>30 AND DATE(logs.timestamp)=products.date;
  • 6. Now gimme the number of failed buy requests for products worth more than $30 in the last 1 year. It might take a while!
  • 7. 2 days later…. On its way! Inserting data into database…
  • 8. 5 days later… $> mysqladmin processlist +-----------------------------------+ | Query | Copy to Temp Table | +-----------------------------------+ Maybe it's joining!! Or maybe it's dead… Or maybe my replacement will see the result…
  • 9. god@internal.directi.com: Go use bloody Hadoop!
  • 10. But why?! Distributed data processing cluster; a distributed file system; data-location-sensitive task scheduler; MapReduce paradigm; handles parallelizable and distributable tasks; failover capability; web-based monitoring capabilities; more value for your time!
  • 11. Where to start?? MapReduce: Q: Sum of all squares of a=[1,2,3,4,3,2,7]. Simple… fold(map(a, square()), sum()) You can do that in any functional programming language…
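The fold-over-map idea on this slide can be sketched in Python. This is only an illustration of the pseudocode, not code from the talk; `square` and `add` are names chosen here, and `functools.reduce` stands in for `fold`.

```python
from functools import reduce


def square(x):
    """Map step: transform one element independently of all others."""
    return x * x


def add(acc, x):
    """Fold step: combine the running total with the next mapped value."""
    return acc + x


a = [1, 2, 3, 4, 3, 2, 7]

# map applies square to every element; reduce folds the results into one sum.
squares = map(square, a)
total = reduce(add, squares, 0)

print(total)  # 92
```

Because each call to `square` depends only on its own element, the map step can be split across machines, which is the property the later slides rely on.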
  • 12. Now do it for an array of 100 million elements…….
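One way to picture how the same computation scales is to split the array into chunks, compute a partial sum of squares per chunk, and then combine the partials. The sketch below runs the chunks sequentially in one process; the helper names and the chunk size are assumptions made for illustration, and in a real cluster each chunk would be handled by a different worker.

```python
from functools import reduce


def chunks(data, size):
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def map_chunk(chunk):
    """Work done independently per chunk: a partial sum of squares."""
    return sum(x * x for x in chunk)


def reduce_partials(partials):
    """Combine the per-chunk partial sums into the final answer."""
    return reduce(lambda acc, p: acc + p, partials, 0)


def sum_of_squares(data, chunk_size):
    # Each map_chunk call could run on a different machine, since no
    # chunk needs to see any other chunk's data.
    partials = [map_chunk(c) for c in chunks(data, chunk_size)]
    return reduce_partials(partials)


data = list(range(1, 1001))
result = sum_of_squares(data, chunk_size=100)
print(result)
```

The chunk size does not change the answer, only how the work is divided.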
  • 13. This is where Hadoop comes in… Distributed file system: HDFS (Name Node + Data Nodes). Distributed computation/task processing: Hadoop MapReduce (Job Tracker + Task Trackers).
  • 14.
  • 17. An example: Word Count. hadoop jar contrib/streaming/hadoop-0.*-streaming.jar -jobconf mapred.data.field.separator="," -input 'wc.eg.in' -output 'wc.eg.out' -mapper 'wc -w' -reducer "awk '{ sum+=\$1 } END { print sum }'" Job Tracker: http://cae5.internal.directi.com:50030/jobtracker.jsp Name Node: http://cae2.internal.directi.com:50070/dfshealth.jsp
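To see what this streaming job computes, here is a small local simulation in Python. It does not use Hadoop; the input splits are made-up sample text. The `mapper` behaves like `wc -w` on one input split (emitting a single word count), and the `reducer` behaves like the `awk` script (summing those counts). Note that the job yields the total number of words, not a count per distinct word.

```python
def mapper(split_text):
    """Like `wc -w`: count whitespace-separated words in one input split."""
    return len(split_text.split())


def reducer(counts):
    """Like the awk script: add up the per-split counts."""
    return sum(counts)


# Hypothetical input splits; on a cluster each would go to a separate mapper.
splits = [
    "the quick brown fox",
    "jumps over",
    "the lazy dog",
]

total_words = reducer(mapper(s) for s in splits)
print(total_words)  # 9
```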
  • 18. Pig and Pig Latin: a procedural language for MapReduce operations. logs = LOAD 'access.log' USING PigStorage(',') AS (ts:int, URL:chararray, resp:chararray, resp_time:int); products = LOAD 'products.dat' USING PigStorage(',') AS (date:int, pid:int, price:int); l1 = FOREACH logs GENERATE GetDate(ts) AS req_date, GetProductID(URL) AS prod_id, GetRequestType(URL) AS rtype, resp, resp_time; j1 = JOIN l1 BY prod_id, products BY pid; j2 = FILTER j1 BY req_date==date AND price>30.0F; j3 = GROUP j2 ALL; j4 = FOREACH j3 GENERATE COUNT(j2); DUMP j4;
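The data flow of the Pig script can be traced step by step on a few in-memory rows. The sketch below is plain Python, not Pig, and rests on several assumptions: `get_date`, `get_product_id`, and `get_request_type` are hypothetical stand-ins for the slide's `GetDate`, `GetProductID`, and `GetRequestType` UDFs (whose definitions are not shown); the URL layout `/buy?pid=1`, the epoch-seconds timestamps, and the day-number dates are invented sample formats. Like the script as written, it filters only on date and price.

```python
from urllib.parse import urlparse, parse_qs

SECONDS_PER_DAY = 86400


def get_date(ts):
    """Hypothetical GetDate: epoch seconds -> day number."""
    return ts // SECONDS_PER_DAY


def get_product_id(url):
    """Hypothetical GetProductID: read the `pid` query parameter."""
    return int(parse_qs(urlparse(url).query)["pid"][0])


def get_request_type(url):
    """Hypothetical GetRequestType: first path segment, upper-cased."""
    return urlparse(url).path.strip("/").upper()


# Sample rows: (ts, url, resp, resp_time) and (date, pid, price).
logs = [
    (SECONDS_PER_DAY * 10 + 5, "/buy?pid=1", "500", 120),
    (SECONDS_PER_DAY * 10 + 9, "/buy?pid=2", "200", 80),
    (SECONDS_PER_DAY * 11 + 1, "/view?pid=1", "200", 30),
]
products = [(10, 1, 45), (10, 2, 20), (11, 1, 50)]

# l1 = FOREACH logs GENERATE ...
l1 = [
    (get_date(ts), get_product_id(url), get_request_type(url), resp, rt)
    for ts, url, resp, rt in logs
]

# j1 = JOIN l1 BY prod_id, products BY pid
# Joined row: (req_date, prod_id, rtype, resp, resp_time, date, pid, price)
j1 = [l + p for l in l1 for p in products if l[1] == p[1]]

# j2 = FILTER j1 BY req_date == date AND price > 30.0
j2 = [row for row in j1 if row[0] == row[5] and row[7] > 30.0]

# j3 = GROUP j2 ALL; j4 = FOREACH j3 GENERATE COUNT(j2)
j4 = len(j2)
print(j4)  # 2
```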
  • 19. Q&A