SlideShare uma empresa Scribd logo
1 de 33
Scala and Hadoop @ eBay
What we will cover
• Polymorphic Function Values
• Higher Kinded/Recursive Types
• Cokleislis Star Operators
• Scala Macros
I have no clue what those things are
What we will ACTUALLY cover
• Why Scala
• Why Hadoop
• How we use Scala with Hadoop
• Lots of CODE!
Why Scala?
• JVM
• **Functional**
• Expressive
• How to convince your boss?
Someone on Hacker News said
Scala sucks
• Compile Times
• You changed List again?
• Complicated
• Leads to Madness
Madness?
trait Lazy[+T, P] {
var creationParameters: P = None.asInstanceOf[P];
lazy val lazyThing: Either[Throwable, T] = try {
Right(create(creationParameters)) }
catch { case e => Left(e) }
def get(createParams: P): Either[Throwable, T] = {
creationParameters = createParams
lazyThing
}
def create(params: P): T
}
Madness?
def getSingleInstance[T, P](params: P)(implicit
lazyCreator: Lazy[T, P]): T = {
lazyCreator.get(params) match {
case Right(successValue) => successValue
case Left(exception) => throw new
StackException(exception)
}
}
This is used by ONE client class
• Show some self-restraint
Hadoop
• void map(K1 key, V1 value,
OutputCollector<K2, V2> output, Reporter
reporter)
• void reduce(K2 key, Iterator<V2> values,
OutputCollector<K3, V3> output, Reporter
reporter)
BIG NUMBERS
• Petabytes of data
• 1k+ node Hadoop cluster
• Multi-billion dollar merchandising business
• Lots of users and items 
How should I use Map Reduce?
• Raw map reduce 
• Pig
• Hive
• Cascading
• Scoobi
• Scalding 
Decision Time
• “And every one that heareth these sayings of
mine (great software engineers of the past),
and doeth them not, shall be likened unto a
foolish man, which built his house upon the
sand.”
• “And the rain descended, and the floods
came, and the winds blew, and beat upon that
house; and it fell: and great was the fall of it.”
I believe!
• Scalding combines the best of PIG and
Cascading
Good Pig
A = LOAD 'input' AS (x, y, z);
B = FILTER A BY x > 5;
DUMP B;
C = FOREACH B GENERATE y, z;
STORE C INTO 'output';
// do joins and group by also
Bad Pig
DEFINE NV_terms `perl nv_terms2.pl`
ship('$scripts/nv_terms2.pl');
i5 = stream i4 through NV_terms as (leafcat:chararray,
name:chararray, name1:chararray);
i7 = foreach i5 generate leafcat,
com.ebay.pigudf.sic.RtlUDF(0,0,0,'$site_id',name) as
name,
com.ebay.pigudf.sic.RtlUDF(0,0,0,'$site_id',name1) as
name1;
Other Pig Issues
• Scheduling and DAG creation
Cascading Rocks!
• What is it?
• Supports large workflows and reusable
components
– DAG generation
– Parallel Executions
Cascading code in Scala
val masterPipe = new
FilterURLEncodedStrings(masterPipe, "sqr")
masterPipe = new
FilterInappropriateQueries(masterPipe, "sqr”)
masterPipe = new GroupBy(masterPipe,
CFields("user_id", "epoch_ts", "sqr"),
sortFields)
Someone should really code review this
Cascading Issues
This page intentionally left blank
Scalding Time
class WordCountJob(args : Args) extends Job(args) {
TextLine( args("input") )
.flatMap('line -> 'word) { line : String => tokenize(line) }
.groupBy('word) { _.size }
.write( Tsv( args("output") ) )
// Split a piece of text into individual words.
def tokenize(text : String) : Array[String] = {
// Lowercase each word and remove punctuation.
text.toLowerCase.replaceAll("[^a-zA-Z0-9s]", "").split("s+")
}
}
Scalding @ eBay
• Boilerplate reduction
• Extensibility
• New hires
Practical Scalding Use
• Pimp my pimp
• Code generated boilerplate
• Cascades
• Traps
• Testing!
class eBayJob(args: Args) extends Job(args) with PipeBoilerPlate {
implicit def pipe2eBayRichPipe(pipe: Pipe) = new eBayRichPipe(pipe)
class eBayRichPipe(pipe: Pipe) extends RichPipe(pipe) with
CommonFunctions
trait CommonFunctions {
import Dsl._
import RichPipe.assignName
def pipe: Pipe
def reallyComplexFunction(field: Fields, param: Long) = {
//mind blowing code here
}
}}
CheckoutTransactionsPipe(//default path logic)
.project(//fields I need)
.countUserInteractions(//params)
.doScoreCalculation(//params)
.doConfidenceCalculation(//params)
Seems a bit too readable for Scala
Collaborative Filtering
• Typically hard to run on large datasets
Structured Data Importance
• Do people shop by brand?
0
0.2
0.4
0.6
0.8
1
1.2
Supply
Handbags and Purses
Markov Chains
• Investigation of buying patterns in ~50 lines of
code
val purchases = "firsttime" :: x.take(500).toList
val pairs = purchases zip purchases.tail
val grouped = pairs.groupBy(x =>
x._1.toString+"-"+x._2.toString)
val sizes = grouped map { x => {
x._1 -> x._2.size
}} toList
Mining Search Queries
• 20+ billion user queries - give me the top ones
per user
De-Dupe Rank ValidateSample Data
Automation
Hadoop Proxy
Batch Database Load
Machines
Cassandra
Jenkins
MySql
Mongo
Questions?
www.ebaynyc.com

Mais conteúdo relacionado

Mais procurados

Tools for writing Haskell programs
Tools for writing Haskell programsTools for writing Haskell programs
Tools for writing Haskell programs
nkpart
 

Mais procurados (19)

Scala Refactoring for Fun and Profit
Scala Refactoring for Fun and ProfitScala Refactoring for Fun and Profit
Scala Refactoring for Fun and Profit
 
Parse, scale to millions
Parse, scale to millionsParse, scale to millions
Parse, scale to millions
 
今時なウェブ開発をSmalltalkでやってみる?
今時なウェブ開発をSmalltalkでやってみる?今時なウェブ開発をSmalltalkでやってみる?
今時なウェブ開発をSmalltalkでやってみる?
 
Jug Marche: Meeting June 2014. Java 8 hands on
Jug Marche: Meeting June 2014. Java 8 hands onJug Marche: Meeting June 2014. Java 8 hands on
Jug Marche: Meeting June 2014. Java 8 hands on
 
Java scriptcore brief introduction
Java scriptcore brief introductionJava scriptcore brief introduction
Java scriptcore brief introduction
 
Value protocols and codables
Value protocols and codablesValue protocols and codables
Value protocols and codables
 
Journey's End – Collection and Reduction in the Stream API
Journey's End – Collection and Reduction in the Stream APIJourney's End – Collection and Reduction in the Stream API
Journey's End – Collection and Reduction in the Stream API
 
Mist - Serverless proxy to Apache Spark
Mist - Serverless proxy to Apache SparkMist - Serverless proxy to Apache Spark
Mist - Serverless proxy to Apache Spark
 
Ruby is an Acceptable Lisp
Ruby is an Acceptable LispRuby is an Acceptable Lisp
Ruby is an Acceptable Lisp
 
Holden Karau - Spark ML for Custom Models
Holden Karau - Spark ML for Custom ModelsHolden Karau - Spark ML for Custom Models
Holden Karau - Spark ML for Custom Models
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
 
The Evolution of Scala / Scala進化論
The Evolution of Scala / Scala進化論The Evolution of Scala / Scala進化論
The Evolution of Scala / Scala進化論
 
Clojure & Scala
Clojure & ScalaClojure & Scala
Clojure & Scala
 
RubyMotion
RubyMotionRubyMotion
RubyMotion
 
Unleash your inner console cowboy
Unleash your inner console cowboyUnleash your inner console cowboy
Unleash your inner console cowboy
 
HOW TO SCALE FROM ZERO TO BILLIONS!
HOW TO SCALE FROM ZERO TO BILLIONS!HOW TO SCALE FROM ZERO TO BILLIONS!
HOW TO SCALE FROM ZERO TO BILLIONS!
 
Tools for writing Haskell programs
Tools for writing Haskell programsTools for writing Haskell programs
Tools for writing Haskell programs
 
Adding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of TricksAdding Riak to your NoSQL Bag of Tricks
Adding Riak to your NoSQL Bag of Tricks
 
Persistent Data Structures - partial::Conf
Persistent Data Structures - partial::ConfPersistent Data Structures - partial::Conf
Persistent Data Structures - partial::Conf
 

Semelhante a Scala and Hadoop @ eBay

Fast Web Applications Development with Ruby on Rails on Oracle
Fast Web Applications Development with Ruby on Rails on OracleFast Web Applications Development with Ruby on Rails on Oracle
Fast Web Applications Development with Ruby on Rails on Oracle
Raimonds Simanovskis
 
Why hadoop map reduce needs scala, an introduction to scoobi and scalding
Why hadoop map reduce needs scala, an introduction to scoobi and scaldingWhy hadoop map reduce needs scala, an introduction to scoobi and scalding
Why hadoop map reduce needs scala, an introduction to scoobi and scalding
Xebia Nederland BV
 
Leveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPLeveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHP
Jeremy Kendall
 

Semelhante a Scala and Hadoop @ eBay (20)

Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
PSGI and Plack from first principles
PSGI and Plack from first principlesPSGI and Plack from first principles
PSGI and Plack from first principles
 
Why Functional Programming Is Important in Big Data Era
Why Functional Programming Is Important in Big Data EraWhy Functional Programming Is Important in Big Data Era
Why Functional Programming Is Important in Big Data Era
 
Fast as C: How to Write Really Terrible Java
Fast as C: How to Write Really Terrible JavaFast as C: How to Write Really Terrible Java
Fast as C: How to Write Really Terrible Java
 
The Essential Perl Hacker's Toolkit
The Essential Perl Hacker's ToolkitThe Essential Perl Hacker's Toolkit
The Essential Perl Hacker's Toolkit
 
Perl6 meets JVM
Perl6 meets JVMPerl6 meets JVM
Perl6 meets JVM
 
Fast Web Applications Development with Ruby on Rails on Oracle
Fast Web Applications Development with Ruby on Rails on OracleFast Web Applications Development with Ruby on Rails on Oracle
Fast Web Applications Development with Ruby on Rails on Oracle
 
Migrating Legacy Data (Ruby Midwest)
Migrating Legacy Data (Ruby Midwest)Migrating Legacy Data (Ruby Midwest)
Migrating Legacy Data (Ruby Midwest)
 
Static or Dynamic Typing? Why not both?
Static or Dynamic Typing? Why not both?Static or Dynamic Typing? Why not both?
Static or Dynamic Typing? Why not both?
 
SproutCore and the Future of Web Apps
SproutCore and the Future of Web AppsSproutCore and the Future of Web Apps
SproutCore and the Future of Web Apps
 
Ruby on Rails survival guide of an aged Java developer
Ruby on Rails survival guide of an aged Java developerRuby on Rails survival guide of an aged Java developer
Ruby on Rails survival guide of an aged Java developer
 
[Start] Scala
[Start] Scala[Start] Scala
[Start] Scala
 
Scio
ScioScio
Scio
 
Scalding Presentation
Scalding PresentationScalding Presentation
Scalding Presentation
 
Why hadoop map reduce needs scala, an introduction to scoobi and scalding
Why hadoop map reduce needs scala, an introduction to scoobi and scaldingWhy hadoop map reduce needs scala, an introduction to scoobi and scalding
Why hadoop map reduce needs scala, an introduction to scoobi and scalding
 
Koalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache SparkKoalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache Spark
 
Leveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPLeveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHP
 
Down the Rabbit Hole: An Adventure in JVM Wonderland
Down the Rabbit Hole: An Adventure in JVM WonderlandDown the Rabbit Hole: An Adventure in JVM Wonderland
Down the Rabbit Hole: An Adventure in JVM Wonderland
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Scala and Hadoop @ eBay

  • 2. What we will cover • Polymorphic Function Values • Higher Kinded/Recursive Types • Cokleislis Star Operators • Scala Macros
  • 3. I have no clue what those things are
  • 4. What we will ACTUALLY cover • Why Scala • Why Hadoop • How we use Scala with Hadoop • Lots of CODE!
  • 5. Why Scala? • JVM • **Functional** • Expressive • How to convince your boss?
  • 6. Someone on Hacker News said Scala sucks • Compile Times • You changed List again? • Complicated • Leads to Madness
  • 7. Madness? trait Lazy[+T, P] { var creationParameters: P = None.asInstanceOf[P]; lazy val lazyThing: Either[Throwable, T] = try { Right(create(creationParameters)) } catch { case e => Left(e) } def get(createParams: P): Either[Throwable, T] = { creationParameters = createParams lazyThing } def create(params: P): T }
  • 8. Madness? def getSingleInstance[T, P](params: P)(implicit lazyCreator: Lazy[T, P]): T = { lazyCreator.get(params) match { case Right(successValue) => successValue case Left(exception) => throw new StackException(exception) } }
  • 9. This is used by ONE client class • Show some self-restraint
  • 10.
  • 11. Hadoop • void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter) • void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output, Reporter reporter)
  • 12. BIG NUMBERS • Petabytes of data • 1k+ node Hadoop cluster • Multi-billion dollar merchandising business • Lots of users and items 
  • 13. How should I use Map Reduce? • Raw map reduce  • Pig • Hive • Cascading • Scoobi • Scalding 
  • 14. Decision Time • “And every one that heareth these sayings of mine (great software engineers of the past), and doeth them not, shall be likened unto a foolish man, which built his house upon the sand.” • “And the rain descended, and the floods came, and the winds blew, and beat upon that house; and it fell: and great was the fall of it.”
  • 15. I believe! • Scalding combines the best of PIG and Cascading
  • 16. Good Pig A = LOAD 'input' AS (x, y, z); B = FILTER A BY x > 5; DUMP B; C = FOREACH B GENERATE y, z; STORE C INTO 'output'; // do joins and group by also
  • 17. Bad Pig DEFINE NV_terms `perl nv_terms2.pl` ship('$scripts/nv_terms2.pl'); i5 = stream i4 through NV_terms as (leafcat:chararray, name:chararray, name1:chararray); i7 = foreach i5 generate leafcat, com.ebay.pigudf.sic.RtlUDF(0,0,0,'$site_id',name) as name, com.ebay.pigudf.sic.RtlUDF(0,0,0,'$site_id',name1) as name1;
  • 18. Other Pig Issues • Scheduling and DAG creation
  • 19. Cascading Rocks! • What is it? • Supports large workflows and reusable components – DAG generation – Parallel Executions
  • 20. Cascading code in Scala val masterPipe = new FilterURLEncodedStrings(masterPipe, "sqr") masterPipe = new FilterInappropriateQueries(masterPipe, "sqr”) masterPipe = new GroupBy(masterPipe, CFields("user_id", "epoch_ts", "sqr"), sortFields)
  • 21. Someone should really code review this
  • 22. Cascading Issues This page intentionally left blank
  • 23. Scalding Time class WordCountJob(args : Args) extends Job(args) { TextLine( args("input") ) .flatMap('line -> 'word) { line : String => tokenize(line) } .groupBy('word) { _.size } .write( Tsv( args("output") ) ) // Split a piece of text into individual words. def tokenize(text : String) : Array[String] = { // Lowercase each word and remove punctuation. text.toLowerCase.replaceAll("[^a-zA-Z0-9s]", "").split("s+") } }
  • 24. Scalding @ eBay • Boilerplate reduction • Extensibility • New hires
  • 25. Practical Scalding Use • Pimp my pimp • Code generated boilerplate • Cascades • Traps • Testing!
  • 26. class eBayJob(args: Args) extends Job(args) with PipeBoilerPlate { implicit def pipe2eBayRichPipe(pipe: Pipe) = new eBayRichPipe(pipe) class eBayRichPipe(pipe: Pipe) extends RichPipe(pipe) with CommonFunctions trait CommonFunctions { import Dsl._ import RichPipe.assignName def pipe: Pipe def reallyComplexFunction(field: Fields, param: Long) = { //mind blowing code here } }}
  • 27. CheckoutTransactionsPipe(//default path logic) .project(//fields I need) .countUserInteractions(//params) .doScoreCalculation(//params) .doConfidenceCalculation(//params) Seems a bit too readable for Scala
  • 28. Collaborative Filtering • Typically hard to run on large datasets
  • 29. Structured Data Importance • Do people shop by brand? 0 0.2 0.4 0.6 0.8 1 1.2 Supply Handbags and Purses
  • 30. Markov Chains • Investigation of buying patterns in ~50 lines of code val purchases = "firsttime" :: x.take(500).toList val pairs = purchases zip purchases.tail val grouped = pairs.groupBy(x => x._1.toString+"-"+x._2.toString) val sizes = grouped map { x => { x._1 -> x._2.size }} toList
  • 31. Mining Search Queries • 20+ billion user queries - give me the top ones per user De-Dupe Rank ValidateSample Data
  • 32. Automation Hadoop Proxy Batch Database Load Machines Cassandra Jenkins MySql Mongo

Notas do Editor

  1. Introduce myself and ebay NYC
  2. Laugh
  3. We are starting to use scala for live site recs
  4. Mention the Option and EitherFirst class functionsMention how great traits areI feel like Haskell will never break into the corporation this is a great draft All my life I’ve wanted a type safe build system. And NOW I have it
  5. They break backward compatibilityWeak IDE support – debugging, refactoring, etcExplain the madness
  6. Tell them about the example
  7. The most complicated system for counting words insert meme hereExplain why we use hadoop. Data is huge. I can’t say when you want to make the jump to map reduce but I see growth in making it THE platform
  8. Say why raw map reduce stinks. Mention what hive is and scoobi is
  9. Explain why we didn’t go with scoobi even though it’s all scala
  10. Scheduling and DAG creationWhere is my SOURCE?
  11. Mentionazkaban
  12. Can do parallel executions of tasks that don’t depend on each otherSupports static dependencies via cascades
  13. Verbose. You still need to write a bunch of code.
  14. Mention about scoobi and how it’s not super stableRemindthen about how it combines the best of PIG and Cascading
  15. This is actual code to compute a user’s preferences. Explain a bit about user preferences
  16. Mahout has some functions for this but they are hard to setup and get goingLess precise than other state of the art methods but still accurateScala Days Talk with Chris Severs
  17. Linear ModelTalk about Concept ExtractionUse SQL Lite for ad hoc queries
  18. Talk about the use of cascadesTalk about traps and counters
  19. Scalding makes this 100% times easier because of cascades and flows