Hadoop gets Groovy
Steve Loughran, Hortonworks
stevel at hortonworks.com
@steveloughran

Berlin, June 2012




© Hortonworks Inc. 2012
Where are you in this diagram?
                                    Hadoop Skills
                                                      Doug,Owen
                                                      Arun, Jakob
   Groovy Skills



                                         @steveloughran




                     James Strachan
                     Guillaume Laforge
Grumpy : Groovy Hadoop Library

• Something lightweight for testing

• Wanted to play in the M/R layer

• Already using Groovy

• Liked: JVM integration, tooling, libraries, IntelliJ IDEA, books…



        git@github.com:steveloughran/grumpy.git


What is Groovy?
A dynamic language within the JVM

• Java++
   –Maps, lists, tuples, Closures

• Flavours of Ruby and Python
    –'Duck' typing, Grails, (Scripting)



A way to do things in the JVM that Sun didn't imagine



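Can use & subclass Java classes:
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Mapper

// Emits ("lines", 1) for every input line; summing the ones gives the line count.
class LineCountMapper
  extends Mapper<LongWritable, Text, Text, IntWritable> {

  static final Text emitKey = new Text("lines")
  static final IntWritable one = new IntWritable(1)

  void map(LongWritable key,
           Text value,
           Mapper.Context context) {
    context.write(emitKey, one)
  }
}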




Closures & lists

class CountReducer2 extends Reducer {

    def reduce(Text k,
               Iterable values,
               Reducer.Context ctx) {

        def sum = values.collect() {it.get() }.sum()

        ctx.write(k, new IntWritable(sum));
    }

}




Closures & lists

values.collect() {
    it.get()
  }.sum()

List<values> -> List<int> -> int
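
The two steps in that pipeline can be tried outside Hadoop. The sketch below is only an illustration: Boxed is a hypothetical stand-in for IntWritable so that it runs with plain Groovy, and the sample values are made up.

```groovy
// Hypothetical stand-in for Hadoop's IntWritable, so the sketch needs no Hadoop jars.
class Boxed {
    int value
    int get() { value }
}

// An Iterable of boxed values, as a reducer would receive for one key.
Iterable<Boxed> values = [new Boxed(value: 1), new Boxed(value: 1), new Boxed(value: 3)]

// Step 1: collect unwraps every element into a new list of ints.
List<Integer> unwrapped = values.collect { it.get() }

// Step 2: sum folds that list into a single number.
int total = unwrapped.sum()

println "unwrapped=${unwrapped} total=${total}"
```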




Result: MR jobs in Groovy
In:
gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball
gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop
gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21,Franky Panky
gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23,
gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:24,Vas
gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:32,Fleurball
gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:36,vklaptop
gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:37,Franky Panky
gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:38,Vas
gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:43,
gate1,2afaf990ce75f0a7208f7f012c8d12ad,,2006-10-30,16:06:54,Smiley



Out: 163,198,223 device sightings!




Why not Pig? Sliding-window debounce
void map(LongWritable key, BlueEvent event,
           Mapper.Context context) {

    BlueEvent ev2 = window.insert(event)
    List<BlueEvent> expired = window.purgeExpired(event)
    expired.each { evt ->
        emit(context, evt)
    }
}

void cleanup(Mapper.Context context) {
  window.each { evt ->
    emit(context, evt)
  }
}
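
The window class itself is not shown on the slide. What follows is a minimal sketch of one way such a debounce window could behave, not the Grumpy implementation: Sighting, DebounceWindow, drain() and the field names are all hypothetical, and this sketch purges before inserting, which differs from the call order on the slide.

```groovy
// Hypothetical event type; the slide's BlueEvent is not shown, so these fields are assumptions.
class Sighting {
    String device
    long time   // milliseconds
}

// Sketch of a debouncing window: at most one held sighting per device.
class DebounceWindow {
    long windowMillis
    Map<String, Sighting> pending = [:]   // device -> first sighting of the current burst
    Map<String, Long> lastSeen = [:]      // device -> time of the most recent sighting

    // Record a sighting; a repeat inside the window only refreshes the last-seen time.
    Sighting insert(Sighting s) {
        lastSeen[s.device] = s.time
        if (!pending.containsKey(s.device)) {
            pending[s.device] = s
        }
        pending[s.device]
    }

    // Remove and return held sightings whose device has been silent for longer than the window.
    List<Sighting> purgeExpired(Sighting now) {
        List<Sighting> expired = pending.values().findAll {
            now.time - lastSeen[it.device] > windowMillis
        }.toList()
        expired.each { pending.remove(it.device); lastSeen.remove(it.device) }
        expired
    }

    // Hand back whatever is still held; the counterpart of cleanup().
    List<Sighting> drain() {
        List<Sighting> rest = pending.values().toList()
        pending.clear()
        lastSeen.clear()
        rest
    }
}

def window = new DebounceWindow(windowMillis: 60000)
def emitted = []
[
    new Sighting(device: 'a', time: 0),
    new Sighting(device: 'a', time: 10000),    // repeat inside the window: debounced
    new Sighting(device: 'b', time: 20000),
    new Sighting(device: 'a', time: 100000)    // 'a' was silent for 90 s: a new sighting
].each { s ->
    emitted.addAll(window.purgeExpired(s))     // emit bursts that have ended
    window.insert(s)
}
emitted.addAll(window.drain())                 // end of input: flush the remainder

println emitted.collect { "${it.device}@${it.time}" }
```

State that spans records like this is awkward to express as a per-record transformation, which may be why the slide treats it as a case for hand-written map code.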

Device sightings by day for 2007

[Chart: device sightings per day. Y axis runs from 0 to 1,600,000; X axis is the day number, 1 to 366.]




Improving Hadoop APIs
// Groovy maps conf[key] = val to putAt(key, val)
Configuration.metaClass.putAt = { key, val ->
  set(key.toString(), val.toString())
}

// ...and conf[key] to getAt(key)
Configuration.metaClass.getAt = { key ->
  get(key)
}

Configuration.metaClass.add = { map ->
  map.each { elt ->
    set((elt.key).toString(),
        (elt.value).toString())
  }
}


& Configuration gets better

conf['mapscript'] = new File(src).text

String scriptText = conf['mapscript']

conf.add([
  window:60000,
  'redscript':reduceScript
  ])



Extending the Job class this way is trickier; subclassing works better
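
The metaClass mechanism can be tried without Hadoop on the classpath. In the sketch below, FakeConf is a hypothetical stand-in for Configuration with the same set/get shape, and the values are made up. It shows only the bulk add method being attached at runtime.

```groovy
// Hypothetical stand-in for Hadoop's Configuration: string keys, string values.
class FakeConf {
    private final Map<String, String> entries = [:]
    void set(String key, String value) { entries[key] = value }
    String get(String key) { entries[key] }
}

// Attach a bulk-load method to the existing class at runtime, without subclassing it.
// Inside the closure, unqualified calls such as set(...) resolve against the instance.
FakeConf.metaClass.add = { Map settings ->
    settings.each { k, v -> set(k.toString(), v.toString()) }
}

def conf = new FakeConf()
conf.add([window: 60000, redscript: 'emit(key, values.size())'])

println conf.get('window')
```

Values are stored as strings, so the integer 60000 comes back as the string '60000', the same as with the toString() calls on the slide.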


New today! Script-driven MR jobs!
protected void setup(Mapper.Context ctx) {
  this.ctx = ctx
  this.conf = ctx.configuration
  ScriptCompiler comp = new ScriptCompiler(conf)
  String scriptText = conf['mapscript']
  map = comp.parse(scriptText, this, ctx)
}

protected void map(Writable key, Writable value,
                   Mapper.Context ctx) {
  map.setProperty('key', key)
  map.setProperty('value', value)
  map.run()
}
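
The same parse-once, run-per-record pattern can be sketched with Groovy's own GroovyShell. This is an assumption-laden illustration: the slide's ScriptCompiler is not shown, so GroovyShell stands in for it, and the script text, the results list and the sample records are invented.

```groovy
// Script text of the kind that would travel in the job configuration under 'mapscript'.
// 'key', 'value' and 'results' are variables the host sets before each run.
String scriptText = '''
    results << "${key}:${value.toUpperCase()}".toString()
'''

// Parse once, as setup() does on the slide.
Script mapScript = new GroovyShell().parse(scriptText)

List<String> results = []
mapScript.setProperty('results', results)   // stored in the script's binding

// Run once per record, as map() does on the slide.
[[1, 'alpha'], [2, 'beta']].each { k, v ->
    mapScript.setProperty('key', k)
    mapScript.setProperty('value', v)
    mapScript.run()
}

println results
```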



Things to consider
• Performance: Groovy 2 on Java 7
• 'False friends': types, if(), exceptions

• If you can use Pig, use it.
• Use Groovy for testing, extending Hadoop
  classes (output formatter, etc.)
• Play with YARN and Giraph with it
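
The 'false friends' bullet is terse. One reading of it, offered as a guess at what the slide means rather than as the speaker's own examples, is that Groovy code which looks like Java can behave differently in these three areas:

```groovy
// if(): Groovy truth accepts any value; empty and zero values count as false.
assert !''            // empty string
assert ![]            // empty list
assert !0             // zero
assert 'false'        // any non-empty string is true, even this one

// Types: == calls equals(), not reference comparison; is() checks identity.
def a = new String('x')
def b = new String('x')
assert a == b
assert !a.is(b)

// Types: a declared type is enforced when the assignment runs, not at compile time.
def text = 'abc'
boolean castFailed = false
try {
    int n = text
} catch (ClassCastException e) {
    castFailed = true
}

// Exceptions: checked exceptions are not enforced, so no throws clause is required.
def risky() { throw new IOException('disk gone') }

String caught = null
try {
    risky()
} catch (IOException e) {
    caught = e.message
}

println "castFailed=${castFailed} caught=${caught}"
```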




Questions?

hortonworks.com




Performance?
• Groovy 1 over-introspects
• HLL hides a lot of overhead



• If your work is I/O bound, less important
• Speed of development vs execution
• Need to benchmark on Java 7





Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Hadoop gets Groovy

  • 1. Hadoop gets Groovy Steve Loughran– Hortonworks stevel at hortonworks.com @steveloughran Berlin, June 2012 © Hortonworks Inc. 2012
  • 2. Where are you in this diagram? Hadoop Skills Doug, Owen, Arun, Jakob Groovy Skills @steveloughran James Strachan Guillaume Laforge
  • 3. Grumpy : Groovy Hadoop Library • Something lightweight for testing • Wanted to play in the M/R layer • Already using Groovy • Liked: JVM integration, tooling, libraries, IntelliJ IDEA, Books… git@github.com:steveloughran/grumpy.git
  • 4. What is Groovy? A dynamic language within the JVM • Java++ –Maps, lists, tuples, Closures • Flavours of Ruby and Python –'Duck' typing, Grails, (Scripting) A way to do things in the JVM that Sun didn't imagine
  • 5. Can use & subclass java classes: class LineCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> { static final def emitKey = new Text("lines") static final def one = new IntWritable(1) void map(LongWritable key, Text value, Mapper.Context context) { context.write(emitKey, one) } }
  • 6. Closures & lists class CountReducer2 extends Reducer { def reduce(Text k, Iterable values, Reducer.Context ctx) { def sum = values.collect() {it.get() }.sum() ctx.write(k, new IntWritable(sum)); } }
  • 7. Closures & lists values.collect() { it.get() }.sum() List<values> -> List<int> -> int
  • 8. Result: MR jobs in Groovy In: gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21,Franky Panky gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23, gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:24,Vas gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:32,Fleurball gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:36,vklaptop gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:37,Franky Panky gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:38,Vas gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:43, gate1,2afaf990ce75f0a7208f7f012c8d12ad,,2006-10-30,16:06:54,Smiley Out: 163,198,223 device sightings!
  • 9. Why no Pig? Sliding Window Debounce void map(LongWritable key, BlueEvent event, Mapper.Context context) { BlueEvent ev2 = window.insert(event) List<BlueEvent> expired = window.purgeExpired(event) expired.each { evt -> emit(context, evt) } } void cleanup(Mapper.Context context) { window.each { evt -> emit(context, evt) } }
  • 10. Device sightings by day for 2007 [chart: device sightings per day, y-axis 0 to 1,600,000, x-axis day of year 1 to 366]
  • 11. Improving Hadoop APIs Configuration.metaClass.setAt = { key, val -> set(key.toString(), val.toString()) } Configuration.metaClass.getAt = { key -> get(key) } Configuration.metaClass.add = { map -> map.each { elt -> set((elt.key).toString(), (elt.value).toString()) } }
  • 12. & Configuration gets better conf['mapscript'] = new File(src).text String scriptText = conf['mapscript'] conf.add([ window:60000, 'redscript':reduceScript ]) Extending to Job class trickier –subclassing better
  • 13. New today! script driven MR jobs! protected void setup(Mapper.Context ctx) { this.ctx = ctx this.conf = ctx.configuration ScriptCompiler comp = new ScriptCompiler(conf) String scriptText = conf['mapscript'] map = comp.parse(scriptText, this, ctx) } protected void map(Writable key, Writable value, Mapper.Context ctx) { map.setProperty('key',key) map.setProperty('value',value) map.run() }
  • 14. Things to consider • Performance: Groovy 2 on Java 7 • 'False friends' -Types, if(), exceptions • If you can use Pig, use it. • Use Groovy for testing, extending Hadoop classes (output formatter, etc) • Play with YARN and Giraph with it
  • 15. Questions? hortonworks.com
  • 16. hortonworks.com
  • 17. Performance? • Groovy 1 over-introspects • HLL hides a lot of overhead • If your work is I/O bound, less important • Speed of development vs execution • Need to benchmark on Java 7
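
The sliding-window debounce on slide 9 calls `window.insert` and `window.purgeExpired` without showing the window itself. The following is a minimal sketch of how such a window might behave, not the Grumpy implementation: `Sighting`, `DebounceWindow`, `drain` and the 60-second window length are all hypothetical names and values chosen for illustration, and the sketch purges before inserting, which differs from the order on the slide. It runs as a plain Groovy script with no Hadoop on the classpath.

```groovy
// Hypothetical event type: a device id and a timestamp in milliseconds.
class Sighting {
    String device
    long time
}

// Hypothetical sliding window: holds one pending sighting per device and
// swallows repeat sightings of a device that is already pending.
class DebounceWindow {
    long windowMillis
    private final Map<String, Sighting> pending = [:]

    // Returns true if the sighting was accepted, false if it was debounced.
    boolean insert(Sighting s) {
        if (pending.containsKey(s.device)) {
            return false
        }
        pending[s.device] = s
        true
    }

    // Removes and returns every pending sighting older than the window,
    // measured against the timestamp of the current event.
    List<Sighting> purgeExpired(Sighting current) {
        def expired = pending.values().findAll { current.time - it.time > windowMillis }.toList()
        expired.each { pending.remove(it.device) }
        expired
    }

    // Removes and returns whatever is left, as cleanup() would at end of input.
    List<Sighting> drain() {
        def rest = pending.values().toList()
        pending.clear()
        rest
    }
}

def window = new DebounceWindow(windowMillis: 60000)
def emitted = []
def input = [
    new Sighting(device: 'a', time: 0),
    new Sighting(device: 'a', time: 15000),   // repeat while 'a' is pending: swallowed
    new Sighting(device: 'b', time: 30000),
    new Sighting(device: 'a', time: 90000)    // the first 'a' has expired by now
]

// Stand-in for the mapper's map() calls: purge, emit, then insert.
input.each { s ->
    emitted.addAll(window.purgeExpired(s))
    window.insert(s)
}

// Stand-in for cleanup(): flush the sightings still held in the window.
emitted.addAll(window.drain())

println emitted.collect { "${it.device}@${it.time}" }
```

Keeping the window state inside the mapper is what makes the `cleanup` step necessary: anything still pending when the input ends has to be emitted there, or it is lost.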

Editor's Notes

  1. What is the knowledge/skill level of the audience? I'm talking about Groovy; there is no time to cover Hadoop as well. Disclaimer: I am a Groovy user, not an expert.
  2. How to describe Groovy? It depends on what you are doing with it. You can look at it and come to different conclusions based on your use. It's like Java with the datatypes they forgot. It's got Ruby concepts (closures), keywords from Python, and dynamic 'duck' typing. It lets you do things in the JVM that the original Java authors didn't expect.
  3. It's a map and reduce, in the reduce. Turtles all the way down. Or at least elephants.
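
The "map and reduce in the reduce" remark refers to the `values.collect() { it.get() }.sum()` expression on slides 6 and 7. A small sketch of the two steps, pulled apart so each can be inspected, might look like this. `IntBox` is a hypothetical stand-in for Hadoop's `IntWritable`, used only so the script runs without Hadoop on the classpath.

```groovy
// Hypothetical stand-in for IntWritable: wraps an int and exposes get().
class IntBox {
    final int value
    IntBox(int value) { this.value = value }
    int get() { value }
}

// The values a reducer might receive for a single key.
Iterable<IntBox> values = [new IntBox(1), new IntBox(1), new IntBox(1)]

// "Map" step: unwrap each box into a plain int, giving a list of ints.
def unwrapped = values.collect { it.get() }

// "Reduce" step: fold the list of ints into a single total.
def total = unwrapped.sum()

println "unwrapped=${unwrapped} total=${total}"
```

Note that `collect` materialises the whole list before `sum` runs, so for keys with very many values a running total accumulated in a loop would likely use less memory.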