SlideShare uma empresa Scribd logo
1 de 39
Baixar para ler offline
DMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMD
 MMM8OOOOOOOOOOO8MMMM8OOOOOOOOOOOOOOODMMMMOOOOOOOOOOOOOOMMMN
DMMIIIIIIIIIIIII$MMMM$IIIIIIIIIIIIIIIOMMMM7III?IIIIIIIIII7MM
MMOIIIIIIIIIIIII7MMMMOIIIIIIIIIIIIIIIIMMMM8IIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIMMMMDIIIIIIIIIIIIIIIIDMMMM7IIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIMMMMM7IIIIIIIIIIIIIII?MMMMMIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIMMMMM8IIIIIIIIIIIIIIIIMMMMMOIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIOMMMMMIIIIIIIIIIIIIIIIIMMMMMMIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIMMMMMIIIIIIIIIIIIIIIIIZMMMMMMIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIZMMMM8IIIIIIIIIIIIIIIII7MMMMMMOIIIIIIIIIMM
MM8$$IIIIIIIIIIIIIIMMMMMIIIIIIIIIIIIIII?III8MMMMMMMZIIIIIIMM
MMMMMMMMMMN87IIIIII8MMMMDIIIIIIIIIIIIIIIIIIIZMMMMMMMMMNZIIMM
MMMMMMMMMMMMMMMMMOII$MMMMMIIIIIIIIIIIIIIIIIIIIIMMMMMMMMMMMMM
MMMMMMMMMMMMMMMMMMMM8MMMMMZIIIIIIIIIIIIIIIIIIIIII8MMMMMMMMMM
MMOIIIIIIII7NMMMMMMMMMMMMMMMIIIIIIIIIIIIIIIIIIIIII?IIII$ODMM
MMOIIIIIIIIII?I8MMMMMMMMMMMMDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIIZMMMMMMMMMM7II?IIIIIIIIIIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIIII7NMMMMMMMMIIIIIIIIIIIIIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIIIIIIIDMMMMMMMMIIIIIIIIIIIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIIIIIIIIIMMMMMMMMMIIIIIIIIIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIIIIIIIIII7MMMMMMMM8IIIIIIIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIIIIIIIIIIII7MMMMMMMMM$IIIIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIIIIIIIIIIIIIOMMMMMMMMMM8I?IIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIIIIIIIIIIIIIII7MMMMMMMMMMMMMN7IIIIIIIIIIIMM
MMMMMMD$IIIIIIIIIIIIIIIIIIIIIIIIIMMMMMMMMMMMMMMMMDZIIIIIIIMM
MMMMMMMMMMMNIIIIIIIIIIIIIIIIIIIIIINMMMMDZNMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMMMM?IIIIIIIIIIIIIIIIIIIIMMMMMIII7NMMMMMMMMMMMMMMM
MMOIII7DMMMMMMMM$IIIIIIIIIIIIIIIIIIZMMMMNIIIIIIII7$DNMMMMMMM
MMOIIIIII7MMMMMMM7IIIIIIIIIIIIIIIIIIOMMMM8IIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIMMMMMMNIIIIIIIIIIIIIIIIIIMMMMMIIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIMMMMMNIIIIIIIIIIIIIIIIIDMMMM7IIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIMMMMMMIIIIIIIIIIIIIIII7MMMMDIIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIMMMMMZIIIIIIIIIIIIIIIIOMMMM$IIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIMMMMMIIIIIIIIIIIIIIIIIMMMM8IIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIMMMMMNIIIIIIIIIIIIIIIIMMMMMIIIIIIIIIIIIIIMM
MMOIIIIIIIIIIIIIIOMMMMMIIIIIIIIIIIIIIIIMMMMMI??IIIIIIIIIIIMM
$MMIIIIIIIIIIIIIIIMMMMMIIIIIIIIIIIIIIIIMMMMMIIIIIIIIIIIII8MM
 MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
    777777777777777777777777777777777777777777777777I77

 MD$N?   MMM8MN   OMM8MZ     OMMMDM   MMDM+ MD~NO    M= MM     ZZMI
   +ZI   M7   M   O7    MO   OM   M   OMMMMM 8M7     M= MM    MN7
 MM?+I   M7   M   O7    MO   OM   M   O+      8M7    M~ MM      ZMI
 MDMMN   MMNMM7   OMMMM      OM   M   MMM8M  MD:N8   MMMMMM   MO8M
         M7       O7
         M7       O7
PYTHON IN
AN EVOLVING
ENTERPRISE SYSTEM
EVALUATING INTEGRATION
SOLUTIONS WITH HADOOP
DAVE HIMROD
STEVE KANNAN
ANGELICA PANDO
Building today’s most powerful,
open, and customizable advertising
technology platform.
Ad is served in
<100 milliseconds

                                                                       WINNING
                      AUCTION                                            BID
                      REQUEST
            300x250



                         AD      ADVERTISER 1   ADVERTISER 2   ADVERTISER 3
                      RESPONSE    BID: $2.50    BID: $3.25     BID: $4.10


                                       APPNEXUS OPTIMIZATION
Evolution of AppNexus

                20    350      430 PEOPLE
      FROM    100M    39B     45B AD REQUESTS


     5000+    MYSQL, HADOOP/HBASE, AEROSPIKE,
   SERVERS    NETEZZA, VERTICA


    38+ TB
   OF DATA EVERY DAY


    99.99%
 UPTIME
Evolution of AppNexus

    ENG OFFICES         ENGINEERING
    IN PORTLAND         HQ IN NYC
    & SF
Data-Driven Decisioning (D3)


                  Bidder
                   Bidder
                    Bidder
                    BIDDERS




         DATA                     D3
       PIPELINE               PROCESSING
Python at AppNexus
Python enables us to scale our team and rapidly
iterate and prototype technologies.
Hadoop at AppNexus

Hadoop enables us to   1PB
    CLUSTER

do aggregations for
reporting and other    862
    NODES ACROSS
                               SEVERAL CLUSTERS
data pipeline jobs
                       40B
    BILLION LOG
                               RECORDS DAILY

                               BILLION
                       5.6B
   LOG RECORDS/HOUR
                               AT PEAK
Data modeling today
 BIG DATA: TBS/HOUR                    MEDIUM DATA: GBS/HOUR


   Task
    Task
     Task
      Task
        logs
         logs                                              CACHE
          logs               VERTICA
           logs

      HADOOP
                      Σ
                    DATA                     DATA DRIVEN
                  SERVICES                   DECISIONING
To enable the next
generation of data modeling,
we need to leverage our
Hadoop cluster
What are we trying to do
Access the data on Hadoop
Continue to use Python to model
à No consensus on the best solution




So we conducted our own research
to evaluate integration options
The budget problem
We have thousands of bidders buying billions
of ads per hour in real-time auctions.
We need to create a model that can manipulate
how our bidders spend their budgets and
purchase ads.
Data modeling today
 BIG DATA: TBS/HOUR                MEDIUM DATA: GBS/HOUR



   Task
    Task
     Task
      Task
        logs
         logs                                           CACHE
          logs    DATA DRIVEN
                         VERTICA
           logs   DECISIONING

      HADOOP
                      Σ
                    DATA                  DATA DRIVEN
                  SERVICES                DECISIONING
Test problem:
Budget aggregation
SCENARIO:
Each auction creates a row in a log.

 timestamp, auction_id, object_type, object_id, method, value


We need to aggregate and model to update
bidders.
Method:
Budget aggregation
STEP 1: De-duplicate records where
KEY: object_type, object_id, method, auction_id

STEP 2: Aggregate value where
KEY: object_type, object_id, method
HARDWARE
•  300 GB of log data
•  5 nodes running Scientific Linux 6.3 (Carbon)
   •  Intel Xeon CPU @ 2.13 GHz, 4 cores
 •   2 TB Disk
•  CDH4
•  45 map, 35 reduce tasks at a time
Research: Potential solutions
1.   Native Java
2.   Streaming ‒ no framework
3.   mrjob
4.   Happy / Jython / PyCascading
5.  Pig + Jython UDF
6.   Pydoop   prohibitive installation
7.   Disco    evaluating Hadoop
8.  Hadoopy / dumbo similar to mrjob
9.   Hipy Effectively ORM for Hive
Research: Criteria
1. Usability
2. Performance
3. Versatility / Flexibility
Research: Native Java

Benchmark for comparison, using new Hadoop Java API

BudgetAgg.java Mapper class




BudgetAgg.java Reducer class
Research: Native Java
USABILITY:
 ›  Not   straightforward for analysts to implement, launch, or tweak



PERFORMANCE:
 ›  Fastest implementation.
 ›  Can further enhance by overriding   comparators for grouping and
   sorting
Research: Native Java


VERSATILITY / FLEXIBILITY:
 ›  Abilityto customize pretty
   much everything
 ›  CustomPartitioner,
   Comparator, Grouping
   Comparator in our
   implementation
 ›  Canuse complex objects as
   keys or values
Research: Streaming
Supplies an executable to Hadoop that reads from stdin
and writes to stdout
mapper.py   
                 reducer.py
Research: Streaming
USABILITY:
 ›  Key/value detection has to      be done by the user
 ›  Still, straightforward for      relatively simple jobs


  hadoop jar /usr/lib/hadoop-0.23.0-mr1-cdh4b1/contrib/streaming/hadoop-*streaming*.jar 
  -D stream.num.map.output.key.fields=4 
  -D num.key.fields.for.partition=3 
  -D mapred.reduce.tasks=35 
  -file mapper.py 
  -mapper mapper.py 
  -file reducer.py 
  -reducer reducer_nongroup.py 
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner 
   -input /logs/log_budget/v002/2013/03/06/19/
   -output bidder_logs/streaming_output
Research: Streaming
PERFORMANCE:
 ›  ~50%   slower than Java



VERSATILITY / FLEXIBILITY:
 ›  Inputs in reducer are iterated line-by-line
 ›  Straightforward to get de-duplication and agg   to work in a single
  step
Research: mrjob
Open-source Python framework that wraps Hadoop Streaming


USABILITY:
 ›  “Simplified   Java”
 ›  Great docs,   actively developed

python budget_agg.py -r hadoop --hadoop-bin /usr/bin/hadoop 
--jobconf stream.num.map.output.key.fields=4 
--jobconf num.key.fields.for.partition=3 
--jobconf mapred.reduce.tasks=35 
--partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner 
-o   hdfs:///user/apando/budget_logs/mrjob_output 
hdfs:///logs/log_budget/v002/2013/03/06/19/
Research: mrjob
PERFORMANCE:
 ›  Not   much slower than Streaming if only using RawValueProtocol
Research: mrjob
PERFORMANCE:
 ›  Involvingobjects or
   multiple steps slow it
   down a lot



VERSATILITY /
FLEXIBILITY:
 ›  Candefine Input /
   Internal / Output
   protocols
Research: Happy / Jython
HAPPY:
 ›  Full access to Java MapReduce   API
 ›  Happy project is deprecated
    ›  Depends on Hadoop 0.17


JYTHON:
 ›  Doesn’t work easily out of the box
    ›  Relies on deprecated Jython compiler in Jython   2.2
 ›  Limited to Jython implementation of Python
    ›  Numpy/SciPy and Pandas unavailable
Research: PyCascading
Python wrapper around Cascading framework for data
processing workflow.
Uses Jython as high level language for defining
workflows.
Research: PyCascading
USABILITY:
 ›  Relatively new project
 ›  Cascading API is simple and intuitive
 ›  Job Planner abstracts details of MapReduce


PERFORMANCE:
 ›  Abstraction makes performance tuning   challenging
 ›  Does not support Combiner operation
 ›  Dev time was fast, runtime was slow
Research: PyCascading
VERSATILITY / FLEXIBILITY:
 ›  Allows Jython UDFs
 ›  Rich set of built-in   functions: GroupBy, Join, Merge
Research: Pig
Provides a high-level language for data analysis
which is compiled into a sequence of MapReduce
operations.

USABILITY:
Research: Pig
USABILITY:
 ›  Powerful   debugging and optimization tools (e.g. explain, illustrate)




 ›  Automatically optimizes MapReduce operations:
    ›  Applies Combiner operations where applicable
    ›  Reorders and conflates data flow for efficiency
Research: Pig
PERFORMANCE:
 ›  Pig compiler produces performant code
 ›  Complex operations might require manual optimization
 ›  Budget Aggregation require the implementation of a User
                                                          Defined
  Function in Jython to eliminate unnecessary MapReduce step
Research: Pig
VERSATILITY / FLEXIBILITY:
USING PIG + JYTHON UDF
 ›  PigLatin
           is expressive and can
  capture most use cases
 ›  Define
         custom data operations
  in Jython called UDFs
 ›  UDFs
       can implement custom
  loaders, partitioners, and
  other advanced features
Research: Summary
             Running Time / Lines of Code for Implementations

           Pig




   PyCascading




        MRJob
                                                                                    Lines of Code
                                                                                    Running Time

     Streaming




          Java



                  0
   50
       100
         150
        200
        250
   300
                             Running Time (minutes), Lines of Code
Research: Recommendations

•  Pig and PyCascading enable complex
   pipelines to be expressed simply
•  Pig is more mature and the most viable
   option for ad-hoc analysis
??????? ??:::::::?? ??:::::::::::? ?:::::????:::::? ?::::? ?::::? ?::::? ?::::? ?????? ?::::? ?::::? ?::::? ?::::? ?::::? ?::::? ?::::? ??::?? ???? ??? ??:?? ???




                                                                                                               ???????
                                                                                                             ??:::::::??
                                                                                                           ??:::::::::::?
                                                                                                          ?:::::????:::::?
                                                                                                          ?::::?     ?::::?
                                                                                                          ?::::?       ?::::?
                                                                                                          ??????       ?::::?
      QUESTIONS                                                                                                      ?::::?
                                                                                                                    ?::::?
                                                                                                                   ?::::?
                                                                                                                  ?::::?
                                                                                                                 ?::::?
                                                                                                                 ?::::?
     pydata@appnexus.com                                                                                         ??::??
                                                                                                                  ????
                                                                                                                             ???
                                                                                                                            ??:??
                                                                                                                             ???

Mais conteúdo relacionado

Mais procurados

Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
ijcsit
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
Mahantesh Angadi
 
November 2013 HUG: Compute Capacity Calculator
November 2013 HUG: Compute Capacity CalculatorNovember 2013 HUG: Compute Capacity Calculator
November 2013 HUG: Compute Capacity Calculator
Yahoo Developer Network
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
rantav
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark framework
Supriya .
 
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
cscpconf
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 

Mais procurados (19)

Hadoop
HadoopHadoop
Hadoop
 
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...
 
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node CombinersHadoop Mapreduce Performance Enhancement Using In-Node Combiners
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Big Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory ComputationBig Data Processing: Performance Gain Through In-Memory Computation
Big Data Processing: Performance Gain Through In-Memory Computation
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Introduction to Hadoop part 2
Introduction to Hadoop part 2Introduction to Hadoop part 2
Introduction to Hadoop part 2
 
November 2013 HUG: Compute Capacity Calculator
November 2013 HUG: Compute Capacity CalculatorNovember 2013 HUG: Compute Capacity Calculator
November 2013 HUG: Compute Capacity Calculator
 
Enabling R on Hadoop
Enabling R on HadoopEnabling R on Hadoop
Enabling R on Hadoop
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduce
 
Map Reduce introduction
Map Reduce introductionMap Reduce introduction
Map Reduce introduction
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark framework
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
 
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 

Semelhante a Python in an Evolving Enterprise System (PyData SV 2013)

Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013
Nathan Bijnens
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
yhadoop
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Cloudera, Inc.
 

Semelhante a Python in an Evolving Enterprise System (PyData SV 2013) (20)

Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
 
Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013
 
Hadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsHadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both Worlds
 
Big data-denis-rothman
Big data-denis-rothmanBig data-denis-rothman
Big data-denis-rothman
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
 
Analyzing Big data in R and Scala using Apache Spark 17-7-19
Analyzing Big data in R and Scala using Apache Spark  17-7-19Analyzing Big data in R and Scala using Apache Spark  17-7-19
Analyzing Big data in R and Scala using Apache Spark 17-7-19
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BI
 
Intro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and MapreduceIntro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and Mapreduce
 
The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
The Rise of Digital Audio (AdsWizz, DevTalks Bucharest, 2015)
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Apache Big Data Europa- How to make money with your own data
Apache Big Data Europa- How to make money with your own dataApache Big Data Europa- How to make money with your own data
Apache Big Data Europa- How to make money with your own data
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
 
Hadoop and Mapreduce Certification
Hadoop and Mapreduce CertificationHadoop and Mapreduce Certification
Hadoop and Mapreduce Certification
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP
 

Mais de PyData

Mais de PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Último

Último (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Python in an Evolving Enterprise System (PyData SV 2013)