Company Profile
User segmentation
in online advertising
Spark vs Hadoop
Sergey Zhemzhitsky,
CTO, CleverDATA,
May 22, 2015
cleverdata.ru | info@cleverdata.ru
International market business development since 2012
One of three leading IT companies in Russia
43 branches in Russia and abroad
5,500+ employees
100K projects for 10K customers
Innovative data management platform (Data Exchange Service)
Cloud Service
In-house development
Internet advertising solutions
Data Management Platforms
Customer Base Management
Web Analytics
Marketing automation
Big Data
Data Mining
Digital Intelligence
Operational Intelligence
Low Latency and NoSQL
Cloud Computing
Agenda
• The problem;
• Hadoop vs. Spark;
• Specifics;
• What's next.
Real Time Bidding (RTB)
[Diagram labels: publishers, SSP, AD NETWORK (x6), DSP, advertisers, TRACKING DATA]
Data Management Platform (DMP)
[Diagram labels: publishers, SSP, DMP, DSP, advertisers, COOKIE SYNCs, ACCESS LOGS, PARTNER’S DATA, 3rd PARTY DATA, CLICK STREAMS]
Typical data flows
[Diagram labels: 3rd party data (x3), Relational Data Store, raw data, Raw Data Store & Processing, RealTime Data Store, user profiles, aggregates]
Typical data flows :: RTB
[Diagram labels: 3rd party data (x3), Relational Data Store, RTB SRV, Exchange, SSP, bid req., bid resp., pixels :: impressions :: clicks, bid requests, user profiles, raw data, Raw Data Store & Processing, RealTime Data Store, aggregates]
1st-party data
[Diagram: the RTB data-flow diagram from the previous slide, repeated]
1st-party data
• Why monetize?
• How to monetize?
• What to monetize with?
Why monetize?
Find all users who
took part in the “Star Wars” ad campaign [and]
saw one of the banners “Darth Vader” or “Luke Skywalker”
within the last 6 days [and]
clicked on that banner [and]
visited the purchase page for Darth Vader's lightsaber [and]
still did not buy anything
In order to
retarget them with a personalized banner offering
a 40% discount on the lightsaber
find all users who have
taken part in campaign[s] “Star Wars” [and]
viewed banner[s] “Darth Vader” or “Luke Skywalker”
during [last] 6 day[s] [and]
clicked banner[s] “Darth Vader's lightsaber” [and]
visited buying area of “Darth Vader's lightsaber” [and]
not visited order confirmed area of “Darth Vader's lightsaber”
How to monetize?
[impression]
[click]
[tr. pixel]
[tr. pixel]
id  cookie  event_id                    event_type  campaign_id  timestamp                …
1   c1      “Darth Vader”               impression  “Star Wars”  2015-04-20 14:25:11.462  …
2   c1      “Darth Vader's lightsaber”  click       “Star Wars”  2015-04-21 06:31:12.157  …
3   c1      “Darth Vader's lightsaber”  tr. pixel   “Star Wars”  2015-04-22 18:57:19.628  …
[cookies]
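The event rows above could be modelled roughly as follows. This is only a sketch in plain Python: the `Event` class, the `within_last_days` helper and the `NOW` reference time are hypothetical names introduced for illustration, and only the columns visible on the slide are represented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


@dataclass(frozen=True)
class Event:
    cookie: str
    event_id: str
    event_type: str      # "impression", "click" or "tr. pixel"
    campaign_id: str
    timestamp: datetime


def parse_ts(text):
    """Parse a timestamp in the format shown on the slide."""
    return datetime.strptime(text, TS_FORMAT)


# The three sample rows from the slide.
EVENTS = [
    Event("c1", "Darth Vader", "impression", "Star Wars",
          parse_ts("2015-04-20 14:25:11.462")),
    Event("c1", "Darth Vader's lightsaber", "click", "Star Wars",
          parse_ts("2015-04-21 06:31:12.157")),
    Event("c1", "Darth Vader's lightsaber", "tr. pixel", "Star Wars",
          parse_ts("2015-04-22 18:57:19.628")),
]


def within_last_days(event, now, days=6):
    """True if the event happened no more than `days` days before `now`."""
    age = now - event.timestamp
    return timedelta(0) <= age <= timedelta(days=days)


# Hypothetical "current time", chosen only for this illustration.
NOW = datetime(2015, 4, 23, 12, 0, 0)
recent = [e for e in EVENTS if within_last_days(e, NOW)]
```

With this reference time all three rows fall inside the 6-day window; moving `NOW` forward gradually drops the older events.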
How to monetize?
find all users who have
taken part in campaign[s] “Star Wars”
viewed banner[s] “Darth Vader” or “Luke Skywalker” during [last] 6 day[s]
clicked banner[s] “Darth Vader's lightsaber”
visited buying area of “Darth Vader's lightsaber”
not visited order confirmed area of “Darth Vader's lightsaber”
map: (c1, 0), (c1, 1), (c1, 2), (c1, 3), Ø
reduce: (c1, 0;1;2;3)
true(0) and true(1) and true(2) and true(3) and not false(4)
C1
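The map and reduce steps on this slide can be sketched in plain Python, without Hadoop or Spark. Several things here are assumptions made for the illustration: the event types "buy pixel" and "confirm pixel" (the slide only says "tr. pixel"), the tuple layout of an event, the second cookie `c2`, and the omission of the 6-day time window. The dictionary in `segment` stands in for the shuffle between map and reduce.

```python
from collections import defaultdict

# Condition indexes, numbered 0..4 as on the slide.
IN_CAMPAIGN, VIEWED, CLICKED, VISITED_BUY, VISITED_CONFIRM = range(5)
LIGHTSABER = "Darth Vader's lightsaber"


def map_event(event):
    """Map phase: emit (cookie, condition index) for each condition an event meets."""
    cookie, event_id, event_type, campaign_id = event
    if campaign_id != "Star Wars":
        return  # nothing emitted, the Ø case on the slide
    yield cookie, IN_CAMPAIGN
    if event_type == "impression" and event_id in ("Darth Vader", "Luke Skywalker"):
        yield cookie, VIEWED
    elif event_type == "click" and event_id == LIGHTSABER:
        yield cookie, CLICKED
    elif event_type == "buy pixel" and event_id == LIGHTSABER:
        yield cookie, VISITED_BUY
    elif event_type == "confirm pixel" and event_id == LIGHTSABER:
        yield cookie, VISITED_CONFIRM


def reduce_cookie(conditions):
    """Reduce phase: evaluate the boolean expression over one cookie's conditions."""
    seen = set(conditions)
    return (IN_CAMPAIGN in seen and VIEWED in seen and CLICKED in seen
            and VISITED_BUY in seen and VISITED_CONFIRM not in seen)


def segment(events):
    """Group map output by cookie, then keep the cookies that satisfy the query."""
    grouped = defaultdict(list)
    for event in events:
        for cookie, condition in map_event(event):
            grouped[cookie].append(condition)
    return sorted(c for c, conds in grouped.items() if reduce_cookie(conds))


EVENTS = [
    ("c1", "Darth Vader", "impression", "Star Wars"),
    ("c1", LIGHTSABER, "click", "Star Wars"),
    ("c1", LIGHTSABER, "buy pixel", "Star Wars"),
    ("c2", "Luke Skywalker", "impression", "Star Wars"),
    ("c2", LIGHTSABER, "click", "Star Wars"),
    ("c2", LIGHTSABER, "buy pixel", "Star Wars"),
    ("c2", LIGHTSABER, "confirm pixel", "Star Wars"),
]

print(segment(EVENTS))  # prints ['c1']
```

Cookie `c2` reached the order-confirmed area, so only `c1` ends up in the retargeting segment.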
VS.
MR vs Spark :: The honest truth
• Stylish;
• Fashionable;
• Youthful.
Spark :: Size
Before looking at Hadoop
Map-Reduce :: Size
Materials and tools
Hardware (3 Nodes)
• 12 Core AMD Opteron™ 6338P ~ 2.8 GHz
• 64 GB RAM
• 1 Gbps NICs
Software
• CDH 5.3.1 (Hadoop 2.5.0)
• Spark 1.2.0
Data
• 14.2 GB of raw data
• 61.1 M transactions
• 128 MB block size
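A quick back-of-the-envelope calculation from the data figures above. It assumes that "GB" and "MB" are binary units and that the input is a single splittable file with one input split per block; with many smaller files the split count would be higher.

```python
import math

RAW_BYTES = 14.2 * 1024**3     # 14.2 GB of raw data (binary units assumed)
BLOCK_BYTES = 128 * 1024**2    # 128 MB block size
TRANSACTIONS = 61.1e6          # 61.1 M transactions

# Number of blocks, and so roughly the number of map tasks / input partitions.
blocks = math.ceil(RAW_BYTES / BLOCK_BYTES)

# Average size of one transaction record in bytes.
avg_record_bytes = RAW_BYTES / TRANSACTIONS

print(blocks, round(avg_record_bytes))
```

Under these assumptions the input spans about 114 blocks, with an average record of roughly 250 bytes.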
MR vs Spark :: Execution time
Spark :: Exec-cores vs Num-execs
MR vs Spark :: Initialization
MR
protected void setup(Context ctx)
o.a.h.c.Configured
distributed cache
Spark
mapRegion
broadcast vars
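The idea behind one-time initialization can be shown with a plain-Python analogue that needs neither Hadoop nor Spark. In MapReduce the setup work lives in `setup(Context)`. In Spark a common equivalent is a function over a whole partition (for example one passed to `mapPartitions`), with shared read-only data shipped as a broadcast variable. Whether that is exactly what the slide intends is an assumption, and `expensive_setup` and `process_partition` are hypothetical names.

```python
def expensive_setup():
    """Hypothetical one-time initialization, e.g. loading a lookup table."""
    expensive_setup.calls += 1
    return {"Star Wars": 1}


expensive_setup.calls = 0  # counts how often the setup actually runs


def process_partition(records):
    """Run the setup once, then stream every record of the partition through it."""
    lookup = expensive_setup()
    for campaign in records:
        yield lookup.get(campaign, -1)


# Two partitions, so the setup is expected to run twice, not once per record.
partitions = [["Star Wars", "Star Trek"], ["Star Wars"]]
results = [list(process_partition(iter(p))) for p in partitions]
```

The point of the pattern is that the setup cost is paid per partition (or per task) rather than per record.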
MR vs Spark :: Parallelism
MR
mapred.reduce.tasks
mapreduce.job.reduces
splittable formats
Spark
spark.default.parallelism
num-executors, executor-cores on YARN
numTasks in groupByKey, reduceByKey, aggregateByKey…
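A rough illustration of what the reduce-side task count controls: each key is assigned to a partition by a hash of the key modulo the number of partitions. The sketch is plain Python, not Spark. `partition_for` and `keys_per_partition` are hypothetical helpers, and CRC32 merely stands in for whatever hash the real partitioner uses.

```python
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    """Stable hash of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def keys_per_partition(keys, num_partitions):
    """Count how many keys land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


cookies = [f"c{i}" for i in range(1000)]
for n in (1, 4, 16):
    sizes = keys_per_partition(cookies, n)
    print(n, "partitions, largest holds", max(sizes), "keys")
```

More partitions mean smaller units of work per task, at the price of more tasks to schedule.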
MR vs Spark :: Dependencies
MR
o.a.h.u.Tool
o.a.h.u.ToolRunner
-conf app.conf
-files
-libjars
setUserClassesTakesPrecedence
Spark
--jars
--files
--conf
--driver-java-options
spark.driver.extraJavaOptions
spark.executor.extraJavaOptions
spark.driver.userClassPathFirst
spark.executor.userClassPathFirst
MR vs Spark :: Secondary Sort
MR
setSortComparatorClass
setGroupingComparatorClass
setPartitionerClass
Spark
repartitionAndSortWithinPartitions
mapPartitions
Entire partition processing result
must be able to fit in memory
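The secondary-sort pattern named above can be sketched in plain Python: partition by the natural key only (the cookie), sort each partition by the composite key (cookie, timestamp), then stream through the partition so that each cookie's events arrive grouped and in time order. This is an analogue of the technique, not the Hadoop or Spark API. The function names are hypothetical and CRC32 stands in for the partitioner's hash.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def repartition_and_sort(records, num_partitions):
    """Partition by cookie only, then sort each partition by (cookie, timestamp)."""
    partitions = [[] for _ in range(num_partitions)]
    for cookie, timestamp, event_type in records:
        index = zlib.crc32(cookie.encode("utf-8")) % num_partitions
        partitions[index].append((cookie, timestamp, event_type))
    return [sorted(p, key=itemgetter(0, 1)) for p in partitions]


def events_in_order(partition):
    """Stream one sorted partition, yielding each cookie with its time-ordered events."""
    for cookie, group in groupby(partition, key=itemgetter(0)):
        yield cookie, [event_type for _, _, event_type in group]


RECORDS = [
    ("c1", "2015-04-22 18:57:19", "tr. pixel"),
    ("c2", "2015-04-21 09:00:00", "click"),
    ("c1", "2015-04-20 14:25:11", "impression"),
    ("c2", "2015-04-20 08:00:00", "impression"),
    ("c1", "2015-04-21 06:31:12", "click"),
]

result = {}
for partition in repartition_and_sort(RECORDS, num_partitions=2):
    result.update(events_in_order(partition))
```

Because all events of one cookie share a partition and are already sorted, the per-cookie logic can consume them as a stream instead of sorting them in memory itself.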
MR vs Spark :: Testing
MR
MRUnit
o.a.h.h.MiniDFSCluster
o.a.h.m.MiniMRCluster
o.a.h.y.s.MiniYARNCluster
o.a.h.m.v2.MiniMRYarnCluster
Spark
Local executor
What's next, and why Spark?
• Spark Streaming;
• Micro batches;
• λ-architecture.
without major surgery
Thank you for your questions!
info@cleverleaf.co.uk :: info@cleverdata.ru
cleverleaf.co.uk :: cleverdata.ru
1dmp.io :: crawler.1dmp.io
facebook.com/CleverData :: +7 (495) 967-66-50

CleverDATA for Hadoop_Meetup_22052015_Spark_vs_Hadoop

  • 1. Company Profile. User segmentation in online advertising: Spark vs Hadoop. Sergey Zhemzhitsky (Сергей Жемжицкий), CTO, CleverDATA, May 22, 2015
  • 2. cleverdata.ru | info@cleverdata.ru. International market business development since 2012; one of three leading IT companies in Russia; 43 branches in Russia and abroad; 5,500+ employees; 100K projects for 10K customers. Data management innovative platform (Data Exchange Service); Cloud Service; In-house development. Internet advertising solutions; Data Management Platforms; Customers Base Management; Web Analytics; Marketing automation. Big Data; Data Mining; Digital Intelligence; Operational Intelligence; Low Latency and NoSQL; Cloud Computing.
  • 3. Agenda: the problem; Hadoop vs. Spark; specifics; what's next.
  • 4. Real Time Bidding (RTB) [diagram labels: publishers; SSP; AD NETWORK (×6); DSP; advertisers]
  • 5. Data Management Platform (DMP) [diagram labels: publishers; SSP; DMP; DSP; advertisers; TRACKING DATA; COOKIE SYNCs; ACCESS LOGS; PARTNER’S DATA; 3rd PARTY DATA; CLICK STREAMS]
  • 6. Typical data flows [diagram labels: 3rd party data (×3); raw data; Raw Data Store & Processing; user profiles; aggregates; RealTime Data Store; Relational Data Store]
  • 7. Typical data flows :: RTB [diagram labels: Exchange; SSP; RTB SRV; bid req.; bid resp.; pixels :: impressions :: clicks; bid requests; user profiles; 3rd party data (×3); raw data; Raw Data Store & Processing; aggregates; RealTime Data Store; Relational Data Store]
  • 8. 1st-party data [diagram labels: Exchange; SSP; RTB SRV; bid req.; bid resp.; pixels :: impressions :: clicks; bid requests; user profiles; 3rd party data (×3); raw data; Raw Data Store & Processing; aggregates; RealTime Data Store; Relational Data Store]
  • 9. 1st-party data: Why monetize? How to monetize? What to monetize with?
  • 10. Why monetize? Find all users who took part in the “Star Wars” ad campaign [and] saw one of the “Darth Vader” or “Luke Skywalker” banners within the last 6 days [and] clicked that banner [and] visited the purchase page for Darth Vader's lightsaber [and] still bought nothing. The goal is to retarget them with a personalized banner offering a 40% discount on the lightsaber.
  • 11. How to monetize?
      find all users who have taken part in campaign[s] “Star Wars”
      [and] viewed banner[s] “Darth Vader” or “Luke Skywalker” during [last] 6 day[s]
      [and] clicked banner[s] “Darth Vader's lightsaber”
      [and] visited buying area of “Darth Vader's lightsaber”
      [and] not visited order confirmed area of “Darth Vader's lightsaber”
      Labels: [impression] [click] [tr. pixel] [tr. pixel] [cookies]
      id  cookie  event_id                    event_type  campaign_id  timestamp                …
      1   c1      “Darth Vader”               impression  “Star Wars”  2015-04-20 14:25:11.462  …
      2   c1      “Darth Vader's lightsaber”  click       “Star Wars”  2015-04-21 06:31:12.157  …
      3   c1      “Darth Vader's lightsaber”  tr. pixel   “Star Wars”  2015-04-22 18:57:19.628  …
  • 12. How to monetize?
      Conditions, indexed 0 to 4:
      (0) find all users who have taken part in campaign[s] “Star Wars”
      (1) viewed banner[s] “Darth Vader” or “Luke Skywalker” during [last] 6 day[s]
      (2) clicked banner[s] “Darth Vader's lightsaber”
      (3) visited buying area of “Darth Vader's lightsaber”
      (4) not visited order confirmed area of “Darth Vader's lightsaber”
      map: (c1, 0) (c1, 1) (c1, 2) (c1, 3) Ø
      reduce: (c1, 0;1;2;3) → true(0) and true(1) and true(2) and true(3) and not false(4) → c1
  • 14. MR vs Spark :: The truth of life: stylish; fashionable; youthful.
  • 16. Before looking at Hadoop
  • 18. Materials and tools. Hardware (3 nodes): 12-core AMD Opteron™ 6338P, ~2.8 GHz; 64 GB RAM; 1 Gbps NICs. Software: CDH 5.3.1 (Hadoop 2.5.0); Spark 1.2.0. Data: 14.2 GB of raw data; 61.1 M transactions; 128 MB block size.
  • 19. MR vs Spark :: Execution time
  • 20. Spark :: Exec-cores vs Num-execs
  • 21. MR vs Spark :: Initialization. MR: protected void setup(Context ctx); o.a.h.c.Configured; distributed cache. Spark: mapRegion; broadcast vars.
  • 22. MR vs Spark :: Parallelism. MR: mapred.reduce.tasks; mapreduce.job.reduces; splittable formats. Spark: spark.default.parallelism; num-executors and executor-cores on YARN; numTasks in groupByKey, reduceByKey, aggregateByKey…
  • 23. MR vs Spark :: Dependencies. MR: o.a.h.u.Tool; o.a.h.u.ToolRunner; -conf app.conf; -files; -libjars; setUserClassesTakesPrecedence. Spark: --jars; --files; --conf; --driver-java-options; spark.driver.extraJavaOptions; spark.executor.extraJavaOptions; spark.driver.userClassPathFirst; spark.executor.userClassPathFirst.
  • 24. MR vs Spark :: Secondary Sort. MR: setSortComparatorClass; setGroupingComparatorClass; setPartitionerClass. Spark: repartitionAndSortWithinPartitions; mapPartitions. Note: the result of processing an entire partition must be able to fit in memory.
  • 25. MR vs Spark :: Testing. MR: MRUnit; o.a.h.h.MiniDFSCluster; o.a.h.m.MiniMRCluster; o.a.h.y.s.MiniYARNCluster; o.a.h.m.v2.MiniMRYarnCluster. Spark: local executor.
  • 26. What's next, and why Spark? Spark Streaming; micro-batches; λ-architecture; all without serious surgical intervention.
  • 28. info@cleverleaf.co.uk :: info@cleverdata.ru cleverleaf.co.uk :: cleverdata.ru 1dmp.io :: crawler.1dmp.io facebook.com/CleverData :: +7 (495) 967-66-50
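
Note on slides 11 and 12: the map and reduce steps can be sketched in plain Python, with no Hadoop or Spark involved. This is only an illustration under assumptions: the `area` column, the evaluation time `NOW`, and the cookies `c2` and `c3` are invented here and do not appear in the slides; only the three `c1` rows come from the table on slide 11. The map step emits one (cookie, condition index) pair per matched condition, and the reduce step evaluates the boolean expression over the indices collected for each cookie.

```python
from collections import defaultdict
from datetime import datetime, timedelta

NOW = datetime(2015, 4, 23)  # assumed evaluation time (not in the slides)
SABER = "Darth Vader's lightsaber"


def event(cookie, event_id, event_type, ts, area=None):
    """Build one event row; `area` is a hypothetical extra column."""
    return {"cookie": cookie, "event_id": event_id, "event_type": event_type,
            "campaign_id": "Star Wars", "timestamp": ts, "area": area}


EVENTS = [
    # c1: the three rows shown on slide 11
    event("c1", "Darth Vader", "impression", datetime(2015, 4, 20, 14, 25, 11)),
    event("c1", SABER, "click", datetime(2015, 4, 21, 6, 31, 12)),
    event("c1", SABER, "tr. pixel", datetime(2015, 4, 22, 18, 57, 19), "buying"),
    # c2 (invented): only saw a banner
    event("c2", "Luke Skywalker", "impression", datetime(2015, 4, 21, 9, 0, 0)),
    # c3 (invented): completed the purchase, so must be excluded
    event("c3", "Darth Vader", "impression", datetime(2015, 4, 20, 10, 0, 0)),
    event("c3", SABER, "click", datetime(2015, 4, 20, 10, 1, 0)),
    event("c3", SABER, "tr. pixel", datetime(2015, 4, 20, 10, 2, 0), "buying"),
    event("c3", SABER, "tr. pixel", datetime(2015, 4, 20, 10, 5, 0), "order confirmed"),
]

# Conditions 0..4, in the order they are listed on slide 12.
CONDITIONS = [
    lambda e: e["campaign_id"] == "Star Wars",
    lambda e: (e["event_type"] == "impression"
               and e["event_id"] in ("Darth Vader", "Luke Skywalker")
               and timedelta(0) <= NOW - e["timestamp"] <= timedelta(days=6)),
    lambda e: e["event_type"] == "click" and e["event_id"] == SABER,
    lambda e: (e["event_type"] == "tr. pixel" and e["event_id"] == SABER
               and e["area"] == "buying"),
    lambda e: (e["event_type"] == "tr. pixel" and e["event_id"] == SABER
               and e["area"] == "order confirmed"),
]
REQUIRED, FORBIDDEN = {0, 1, 2, 3}, {4}


def map_phase(events):
    """Emit (cookie, condition index) for every condition an event satisfies."""
    for e in events:
        for idx, condition in enumerate(CONDITIONS):
            if condition(e):
                yield e["cookie"], idx


def reduce_phase(pairs):
    """Group indices by cookie, then apply: 0 and 1 and 2 and 3 and not 4."""
    matched = defaultdict(set)
    for cookie, idx in pairs:
        matched[cookie].add(idx)
    return sorted(cookie for cookie, idxs in matched.items()
                  if REQUIRED <= idxs and not (FORBIDDEN & idxs))


segment = reduce_phase(map_phase(EVENTS))
print(segment)
```

Keeping the conditions as an indexed list means the reduce step only needs the set of indices per cookie, which matches the (c1, 0;1;2;3) value shown on the slide.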
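
Note on slide 21: the initialization point can be illustrated without either framework. The sketch below is plain Python and assumes that the Spark-side entry refers to per-partition processing in the style of mapPartitions (named on slide 24), with a read-only dictionary standing in for a broadcast variable or the distributed cache. `build_campaign_index`, `process_partition` and the sample data are hypothetical names for illustration. The idea shown is that expensive setup runs once per partition, as `setup(Context ctx)` runs once per task, rather than once per record.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

INIT_CALLS = 0  # counts how many times the expensive setup runs


def build_campaign_index(shared: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical expensive setup step, standing in for Mapper.setup()."""
    global INIT_CALLS
    INIT_CALLS += 1
    return {banner.lower(): campaign for banner, campaign in shared.items()}


def process_partition(records: Iterable[str],
                      shared: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Handle one partition: initialize once, then stream over its records."""
    index = build_campaign_index(shared)  # once per partition, not per record
    for banner in records:
        yield banner, index.get(banner.lower(), "unknown")


# Read-only lookup shared by all partitions (broadcast-like).
BANNER_TO_CAMPAIGN = {"Darth Vader": "Star Wars", "Luke Skywalker": "Star Wars"}

# Two partitions of banner names (invented sample data).
partitions: List[List[str]] = [["Darth Vader", "Luke Skywalker"],
                               ["darth vader", "Yoda"]]

results = [list(process_partition(p, BANNER_TO_CAMPAIGN)) for p in partitions]
print(results, INIT_CALLS)
```

With four records in two partitions, the setup function runs twice instead of four times.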
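
Note on slide 24: the secondary-sort pattern can be sketched in plain Python as a stand-in for what repartitionAndSortWithinPartitions followed by mapPartitions achieves. Records are partitioned by cookie only, sorted inside each partition by (cookie, timestamp), and then consumed as contiguous per-cookie groups. The function names, the CRC32 partitioner and the sample records are assumptions made for illustration and are not taken from the slides.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition_of(cookie, num_partitions):
    """Partition on the cookie only, so all events of a cookie stay together."""
    return zlib.crc32(cookie.encode("utf-8")) % num_partitions


def repartition_and_sort(records, num_partitions):
    """Distribute records by cookie, then sort each partition by (cookie, ts)."""
    partitions = [[] for _ in range(num_partitions)]
    for cookie, ts, event_type in records:
        partitions[partition_of(cookie, num_partitions)].append(
            (cookie, ts, event_type))
    for part in partitions:
        part.sort(key=itemgetter(0, 1))  # composite sort key
    return partitions


def event_sequences(partition):
    """Walk one sorted partition and yield each cookie's time-ordered events."""
    for cookie, group in groupby(partition, key=itemgetter(0)):
        yield cookie, [event_type for _, _, event_type in group]


# (cookie, timestamp, event_type), deliberately out of order (invented data).
RECORDS = [
    ("c1", 3, "tr. pixel"),
    ("c2", 1, "impression"),
    ("c1", 1, "impression"),
    ("c1", 2, "click"),
    ("c2", 2, "click"),
]

sequences = {}
for part in repartition_and_sort(RECORDS, num_partitions=2):
    sequences.update(event_sequences(part))
print(sequences)
```

Because the sort key is composite while the partition key is the cookie alone, each cookie's events arrive already ordered by time, so no per-key sorting in memory is needed.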