Tweaking performance
on high-load projects
Dmitriy Dumanskiy
Cogniance, mGage project
Java Team Lead
Project evolution
Mgage
Mobclix
XXXX
Mgage delivery load
3 billion req/month.
~8 c3.xlarge Amazon instances.
Average load : 2400 req/sec
Peak : x10
Mobclix delivery load
14 billion req/month.
~16 c3.xlarge Amazon instances.
Average load : 6000 req/sec
Peak : x6
XXXX delivery Load
20 billion req/month.
~14 c3.xlarge Amazon instances.
Average load : 11000 req/sec
Peak : x6
Is it a lot?
Average load : 11000 req/sec
Twitter : new tweets
15 billion a month
Average load : 5700 req/sec
Peak : x30
Delivery load
Project   Requests per month   Max load per instance, req/sec   Requirements                        Servers (AWS c3.xlarge)
Mgage     3 billion            300                              HTTP, 95% of responses < 60 ms      8
Mobclix   14 billion           400                              HTTP, 95% of responses < 100 ms     16
XXXX      20 billion           800                              HTTPS, 99% of responses < 100 ms    14
Delivery load
c3.xlarge: 4 vCPU, 2.8 GHz Intel Xeon E5-2680
Load average (LA): ~2-3
1-2 cores reserved for sudden peaks
BE tech stacks
Mobclix :
Spring, iBatis, MySql, Solr, Vertica, Cascading, Tomcat
Mgage :
Spring, Hibernate, Postgres, Distributed ehCache, Hadoop, Voldemort, Jboss
XXXX:
Spring, Hibernate, MySQL, Solr, Cascading, Redis, Tomcat
Initial problem
● ~1000 req/sec
● Peaks 6x
● 99% HTTPS with response time < 100ms
Real problem
● ~85 mln active users, ~115 mln registered users
● 11.5 messages per user per day
● ~11000 req/sec
● Peaks 6x
● 99% HTTPS with response time < 100ms
● Reliable and scalable for future growth up to 80k
Architecture
AdServer Console (UI)
Reporting
Architecture
Console (UI)
MySql
SOLR Master
SOLR Slave   SOLR Slave   SOLR Slave
SOLR? Why?
● Pros:
○ Quick search on complex queries
○ Has a lot of built-in features (master-slave replication, RDBMS integration)
● Cons:
○ HTTP only; embedded mode performs worse
○ Not easy for beginners
○ Max load is ~100 req/sec per instance
“Simple” query
"-(-connectionTypes:" + "\"" + getConnectionType() + "\"" + " AND connectionTypes:[* TO *]) AND "
+ "-connectionTypeExcludes:" + "\"" + getConnectionType() + "\"" + " AND "
+ "-(-OSes:" + "(\"" + osQuery + "\" OR \"" + getOS() + "\")" + " AND OSes:[* TO *]) AND "
+ "-osExcludes:" + "(\"" + osQuery + "\" OR \"" + getOS() + "\")"
+ " AND (runOfNetwork:T OR appIncludes:" + getAppId() + " OR pubIncludes:" + getPubId()
+ " OR categories:(" + categoryList + "))"
+ " AND -appExcludes:" + getAppId() + " AND -pubExcludes:" + getPubId()
+ " AND -categoryExcludes:(" + categoryList + ") AND " + keywordQuery + " AND "
+ "-(-devices:" + "\"" + getHandsetNormalized() + "\"" + " AND devices:[* TO *]) AND "
+ "-deviceExcludes:" + "\"" + getHandsetNormalized() + "\"" + " AND "
+ "-(-carriers:" + "\"" + getCarrier() + "\"" + " AND carriers:[* TO *]) AND "
+ "-carrierExcludes:" + "\"" + getCarrier() + "\"" + " AND "
+ "-(-locales:" + "(\"" + locale + "\" OR \"" + langOnly + "\")" + " AND locales:[* TO *]) AND "
+ "-localeExcludes:" + "(\"" + locale + "\" OR \"" + langOnly + "\") AND "
+ "-(-segments:(" + segmentQuery + ") AND segments:[* TO *]) AND "
+ "-segmentExcludes:(" + segmentQuery + ")"
+ " AND -(-geos:" + geoQuery + " AND geos:[* TO *]) AND "
+ "-geosExcludes:" + geoQuery
Architecture
MySql
Solr Master
SOLR Slave
AdServer
SOLR Slave
AdServer
SOLR Slave
AdServer
No-SQL
AdServer - Solr Slave
Delivery:
volatile DeliveryData cache;
Cron Job:
DeliveryData tempCache = loadData();
cache = tempCache;
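The cache swap above can be sketched in plain Java. This is a minimal illustration, not the project's code: the class name DeliveryCache, the Map-based snapshot, the loader and the scheduler standing in for the cron job are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical holder for in-memory delivery data; names are illustrative.
final class DeliveryCache {
    // volatile makes the newly assigned snapshot visible to request threads.
    private volatile Map<String, String> cache = Map.of();
    // Hypothetical loader, e.g. a query against the local Solr slave.
    private final Supplier<Map<String, String>> loader;

    DeliveryCache(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    // Periodic job: build the complete snapshot first, then swap the reference.
    void reload() {
        Map<String, String> tempCache = Map.copyOf(loader.get());
        cache = tempCache;
    }

    // Request path: a plain read of the current snapshot, no locking.
    String get(String key) {
        return cache.get(key);
    }

    // Runs reload() at a fixed period; stands in for the cron job.
    ScheduledExecutorService startReloading(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::reload, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Because the snapshot is immutable and replaced as a whole, a request sees either the old data or the new data, never a half-loaded mix.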
Why no-sql?
● Realtime data
● Quick response time
● Simple queries by key
● 1-2 queries to no-sql on every request. Average load
10-20k req/sec and >120k req/sec in peaks.
● Cheap solution
Why Redis? Pros
● Easy and light-weight
● Low latency and response time.
99% is < 1ms. Average latency is ~0.2ms
● Up to 100k 'get' commands per second on
c1.X-Large
● Cool features (atomic increments, sets,
hashes)
● Ready AWS service — ElastiCache
Why Redis? Cons
● Single-threaded out of the box
● Using all cores requires sharding/clustering
● Scaling/failover not easy
● Limited by max instance memory (240 GB on the largest AWS instance)
● Persistence/swapping may delay response
● Cluster solution not production ready
DynamoDB vs Redis
Store         Price per month     Put, 95%   Get, 95%   Rec/sec
DynamoDB      $58                 300 ms     150 ms     50
DynamoDB      $580                60 ms      8 ms       780
DynamoDB      $5800               16 ms      8 ms       1250
Redis         $200 (c1.medium)    3 ms       <1 ms      4000
ElastiCache   $600 (c1.xlarge)    <1 ms      <1 ms      10000
What about others?
● Cassandra
● Voldemort
● Memcached
Redis RAM problem
● 1 user entry ~ from 80 bytes to 3kb
● ~85 mln users
● Required RAM ~ from 1 GB to 300 GB
Data compression speed
Data compression size
Data compression
JSON → Kryo binary (4x less data) → gzip (2x less data) = 8x less data overall
Now we need < 40 GB
+ Less load on network stack
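The gzip stage of the pipeline above can be sketched with the JDK alone. This is a hedged illustration: the Kryo serialization step is left out, the class name EntryCodec is hypothetical, and the actual ratio depends on the data (the 2x figure comes from the slides, not from this code).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper that gzips a serialized user entry before it is stored.
final class EntryCodec {
    // Compresses the serialized entry; the result is what would go to Redis.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restores the original bytes after reading the entry back.
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Very small entries can grow under gzip because of its fixed header, so it is worth measuring on real payloads before enabling it for every key.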
AdServer BE
Average response time — ~1.2 ms
Load — 800 req/sec with LA ~4
c3.XLarge == 4 vCPU
AdServer BE
● Logging — 12% of time (5% on SSD);
● Response generation — 15% of time;
● Redis request — 50% of time;
● All business logic — 23% of time;
Reporting
AdServer → S3 (delivery logs) → Hadoop ETL → S3 (aggregated logs) → MySQL → Console
Log structure
{ "uid":"test",
"platform":"android",
"app":"xxx",
"ts":1375952275223,
"pid":1,
"education":"Some-Highschool-or-less",
"type":"new",
"sh":1280,
"appver":"6.4.34",
"country":"AU",
"time":"Sat, 03 August 2013 10:30:39 +0200",
"deviceGroup":7,
"rid":"fc389d966438478e9554ed15d27713f51",
"responseCode":200,
"event":"ad",
"device":"N95",
"sw":768,
"ageGroup":"18-24",
"preferences":["beer","girls"] }
Log structure
● 1 mln. records == 0.6 GB.
● ~900 mln records a day == ~0.55 TB.
● 1 month up to 20 TB of data.
● Zipped data is 10 times smaller.
Reporting
Customer : “And we need fancy reporting”
But 20 TB of data per month is huge. So what can we do?
Reporting
Dimensions:
device, os, osVer, screenWidth, screenHeight,
country, region, city, carrier, advertisingId,
preferences, gender, age, income, sector,
company, language, etc...
Use case:
I want to know how many users saw my ad in San Francisco.
Reporting
Geo table:
Country, City, Region, CampaignId, Date, counters;
Device table:
Device, Carrier, Platform, CampaignId, Date, counters;
Uniques table:
CampaignId, UID
Predefined report types → aggregation by
predefined dimensions → 500-1000 times less
data
20 TB per month → 40 GB per month
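Aggregation by predefined dimensions can be illustrated with a small in-memory sketch for the Geo table. It is only an assumption of how such a rollup might look: the Event class and its four fields are hypothetical, Region is omitted for brevity, and in the project this work is done by Hadoop jobs rather than a single loop.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rollup of raw delivery events into Geo table counters.
final class GeoAggregator {
    // Minimal stand-in for a parsed log line; real logs carry many more fields.
    static final class Event {
        final String country;
        final String city;
        final String campaignId;
        final String date;

        Event(String country, String city, String campaignId, String date) {
            this.country = country;
            this.city = city;
            this.campaignId = campaignId;
            this.date = date;
        }
    }

    // Collapses events into one counter per (country, city, campaignId, date).
    static Map<List<String>, Long> aggregate(List<Event> events) {
        Map<List<String>, Long> counters = new HashMap<>();
        for (Event e : events) {
            List<String> key = List.of(e.country, e.city, e.campaignId, e.date);
            counters.merge(key, 1L, Long::sum);
        }
        return counters;
    }
}
```

The number of output rows is bounded by the number of distinct dimension combinations, not by the number of events, which is where the size reduction comes from.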
Of course - hadoop
● Pros:
○ Unlimited (depends) horizontal scaling
● Cons:
○ Not real-time
○ Processing time depends directly on code quality and on infrastructure cost.
○ Not all input can be scaled
○ Cluster startup is so... long
Alternatives?
● Storm
● Redshift
● Vertica
● Math models?
Elastic MapReduce
● Easy setup
● Easy to extend
● Easy to monitor
Timing
● Hadoop (cascading) :
○ 25 GB in peak hour takes ~40min (-10 min). CSV
output 300MB. With cluster of 4 c3.xLarge.
● MySQL:
○ Put 300MB in DB with insert statements ~40 min.
● MySQL:
○ Put 300MB in DB with optimizations ~5 min.
Optimizations
● No “insert into”. Only “load data” - ~10 times faster
● “ENGINE=MyISAM“ vs “INNODB” when possible - ~5
times faster
● For “upsert” - temp table with “ENGINE=MEMORY” - IO
savings
Cascading
Hadoop:
void map(K key, V val,
OutputCollector collector) {
...
}
void reduce(K key, Iterator<V> vals,
OutputCollector collector) {
...
}
Cascading:
Scheme sinkScheme = new TextLine(new Fields("word", "count"));
Pipe assembly = new Pipe("wordcount");
assembly = new Each(assembly, new Fields("line"),
    new RegexGenerator(new Fields("word"), ","));
assembly = new GroupBy(assembly, new Fields("word"));
Aggregator count = new Count(new Fields("count"));
assembly = new Every(assembly, count);
Why cascading?
Hadoop Job 1
Hadoop Job 2
Hadoop Job 3
Result of one job should be processed by another job
Lessons Learned
Redis sharding
Redis shard 0 Redis shard 1 Redis shard 2
AdServer
shardNumber = UID.hashCode() % 3
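A shard number has to come from a remainder, not a quotient, and String.hashCode() can be negative, so a safe version uses Math.floorMod. The sketch below assumes a fixed shard count; the class name ShardRouter is hypothetical.

```java
// Hypothetical router that maps a user id to one of N Redis shards.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // floorMod keeps the result in [0, shardCount) even for negative hash codes,
    // whereas the % operator could return a negative shard number.
    int shardFor(String uid) {
        return Math.floorMod(uid.hashCode(), shardCount);
    }
}
```

With this scheme the same UID always lands on the same shard, but changing the shard count remaps most keys.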
Resharding problem
All data already in shards, how to add new shards?
Resharding problem. Solution
AdServer reads from the new shard first and falls back to the old shard:
1. Get UID from the new shard. If not present, go to a).
   a) Get UID from the old shard.
2. Save UID to the new shard.
Removal script
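The read path of this migration can be sketched as follows. The two Map instances stand in for Redis clients of the old and new shards, and the class name MigratingStore is hypothetical; the removal script is not modeled.

```java
import java.util.Map;

// Hypothetical lazy migration: entries move to the new shard when first read.
final class MigratingStore {
    private final Map<String, String> oldShard;
    private final Map<String, String> newShard;

    MigratingStore(Map<String, String> oldShard, Map<String, String> newShard) {
        this.oldShard = oldShard;
        this.newShard = newShard;
    }

    String get(String uid) {
        // Step 1: try the new shard first.
        String value = newShard.get(uid);
        if (value != null) {
            return value;
        }
        // Step a): fall back to the old shard.
        value = oldShard.get(uid);
        if (value != null) {
            // Step 2: copy the entry so the next read hits the new shard.
            newShard.put(uid, value);
        }
        return value;
    }
}
```

Active users migrate on their own as they generate traffic, so no bulk copy is needed while the system is under load.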
Postgres partitioning
● Queries on small partitions
● Distributed index
● Less index size
● Small partitions may fit in RAM
● Easy to remove/move
Cost of IO
L1 cache 3 cycles
L2 cache 14 cycles
RAM 250 cycles
Disk 41 000 000 cycles
Network 240 000 000 cycles
Cost of IO
@Cacheable is everywhere
Hadoop
Map input : 300 MB
Map output : 80 GB
Hadoop
● mapreduce.map.output.compress = true
● codecs: GZip, BZ2 - CPU intensive
● codecs: LZO, Snappy
● codecs: JNI
~x10
Hadoop
Consider Combiner
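What a combiner buys can be shown without the Hadoop API. The following is a plain-Java simulation, not Hadoop code: it pre-sums (word, 1) pairs on the mapper side so fewer records have to cross the network. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the combiner idea for a word count.
final class CombinerSketch {
    // Raw map output: one (word, 1) pair per occurrence.
    static List<Map.Entry<String, Integer>> mapOutput(List<String> words) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : words) {
            out.add(Map.entry(w, 1));
        }
        return out;
    }

    // Combiner step: sum values per key locally, before the shuffle.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            combined.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return combined;
    }
}
```

This only works when the reduce function is associative and commutative, as summing is; the final reducer then adds up the partial sums.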
Hadoop
Text, IntWritable, BytesWritable, NullWritable,
etc
Simpler - better
Hadoop
Missing data:
map(T value, ...) {
Log log = parse(value);
Data data = dbWrapper.getSomeMissingData(log.getCampId());
}
Wrong
Hadoop
Unnecessary data:
map(T value, ...) {
Log log = parse(value);
Key resultKey = makeKey(log.getCampName(), ...);
output.collect(resultKey, resultValue);
}
Wrong
Hadoop
Unnecessary data:
RecordWriter.write(K key, V value) {
Entity entity = makeEntity(key, value);
dbWrapper.save(entity);
}
Wrong
Hadoop
public boolean equals(Object obj) {
EqualsBuilder equalsBuilder = new EqualsBuilder();
equalsBuilder.append(id, otherKey.getId());
...
}
public int hashCode() {
HashCodeBuilder hashCodeBuilder = new HashCodeBuilder();
hashCodeBuilder.append(id);
...
}
Wrong
Hadoop
public void map(...) {
…
for (String word : words) {
output.collect(new Text(word), new IntVal(1));
}
}
Wrong
Hadoop
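class MyMapper extends Mapper {
    // Reused across calls so that no new objects are allocated per record
    Text word = new Text();
    IntVal one = new IntVal(1);
    public void map(...) {
        // Loop variable renamed so it no longer shadows the reusable Text field
        for (String w : words) {
            word.set(w);
            output.collect(word, one);
        }
    }
}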
Network
Per AdServer instance:
Incoming traffic: ~100 Mb/sec
Outgoing traffic: ~50 Mb/sec
All traffic through the LB:
Almost 10 Gb/sec
Amazon
AWS ElastiCache
SLOWLOG GET
1) 1) (integer) 35
   2) (integer) 1391709950
   3) (integer) 34155
   4) 1) "GET"
      2) "2ads10percent_rmywqesssitmfksetzvj"
2) 1) (integer) 34
   2) (integer) 1391709830
   3) (integer) 34863
   4) 1) "GET"
      2) "2ads10percent_tteeoomiimcgdzcocuqs"
AWS ElastiCache
35ms for GET? WTF?
Even Java is faster
AWS ElastiCache
● Strange timeouts (with SO_TIMEOUT 50ms)
● No replication for another cluster
● «Cluster» is not a cluster
● Cluster uses usual instances, so pay for 4
cores while using 1
AWS Limits. You never know where
● Network limit
● PPS rate limit
● LB limit
● Cluster start time up to 20 mins
● Scalability limits
● S3 is slow for many files
Facts
● HTTP is 2x faster than HTTPS
● HTTPS keep-alive gives +80% performance
● Java 7 is 40% faster than Java 6 (in our case)
● All IO operations minimized

Mais conteúdo relacionado

Mais procurados

ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...Altinity Ltd
 
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisMongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisJason Terpko
 
MongoDB - Warehouse and Aggregator of Events
MongoDB - Warehouse and Aggregator of EventsMongoDB - Warehouse and Aggregator of Events
MongoDB - Warehouse and Aggregator of EventsMaxim Ligus
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0HBaseCon
 
Counters At Scale - A Cautionary Tale
Counters At Scale - A Cautionary TaleCounters At Scale - A Cautionary Tale
Counters At Scale - A Cautionary TaleEric Lubow
 
Managing your Black Friday Logs
Managing your Black Friday LogsManaging your Black Friday Logs
Managing your Black Friday LogsJ On The Beach
 
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & ConsistencyNoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & ConsistencyScyllaDB
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in productionPingCAP
 
MongoDB Chunks - Distribution, Splitting, and Merging
MongoDB Chunks - Distribution, Splitting, and MergingMongoDB Chunks - Distribution, Splitting, and Merging
MongoDB Chunks - Distribution, Splitting, and MergingJason Terpko
 
Measuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesMeasuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesScyllaDB
 
Triggers In MongoDB
Triggers In MongoDBTriggers In MongoDB
Triggers In MongoDBJason Terpko
 
Real-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidReal-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidJan Graßegger
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase HBaseCon
 
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...DataStax
 
Lightning Talk: MongoDB Sharding
Lightning Talk: MongoDB ShardingLightning Talk: MongoDB Sharding
Lightning Talk: MongoDB ShardingMongoDB
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentMongoDB
 
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevAltinity Ltd
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQLEDB
 
OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017HBaseCon
 

Mais procurados (20)

ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
 
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisMongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
 
MongoDB - Warehouse and Aggregator of Events
MongoDB - Warehouse and Aggregator of EventsMongoDB - Warehouse and Aggregator of Events
MongoDB - Warehouse and Aggregator of Events
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
Counters At Scale - A Cautionary Tale
Counters At Scale - A Cautionary TaleCounters At Scale - A Cautionary Tale
Counters At Scale - A Cautionary Tale
 
Managing your Black Friday Logs
Managing your Black Friday LogsManaging your Black Friday Logs
Managing your Black Friday Logs
 
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & ConsistencyNoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
NoSQL and NewSQL: Tradeoffs between Scalable Performance & Consistency
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
MongoDB Chunks - Distribution, Splitting, and Merging
MongoDB Chunks - Distribution, Splitting, and MergingMongoDB Chunks - Distribution, Splitting, and Merging
MongoDB Chunks - Distribution, Splitting, and Merging
 
Measuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesMeasuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS Instances
 
Triggers In MongoDB
Triggers In MongoDBTriggers In MongoDB
Triggers In MongoDB
 
Real-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidReal-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and Druid
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
Bucket Your Partitions Wisely (Markus Höfer, codecentric AG) | Cassandra Summ...
 
Lightning Talk: MongoDB Sharding
Lightning Talk: MongoDB ShardingLightning Talk: MongoDB Sharding
Lightning Talk: MongoDB Sharding
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production Deployment
 
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQL
 
OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017
 

Destaque

Стартапы в AI&BigData_Виталий Гончарук
Стартапы в AI&BigData_Виталий ГончарукСтартапы в AI&BigData_Виталий Гончарук
Стартапы в AI&BigData_Виталий ГончарукGeeksLab Odessa
 
Моделирование структурными уравнениями_Алексей Гаевский
Моделирование структурными уравнениями_Алексей ГаевскийМоделирование структурными уравнениями_Алексей Гаевский
Моделирование структурными уравнениями_Алексей ГаевскийGeeksLab Odessa
 
"Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий...
"Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий..."Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий...
"Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий...GeeksLab Odessa
 
"AI&Big Data для путешественников"_Кузнецов Юрий
"AI&Big Data для путешественников"_Кузнецов Юрий "AI&Big Data для путешественников"_Кузнецов Юрий
"AI&Big Data для путешественников"_Кузнецов Юрий GeeksLab Odessa
 
Всеволод Демкин "Natural language processing на практике"
Всеволод Демкин "Natural language processing на практике"Всеволод Демкин "Natural language processing на практике"
Всеволод Демкин "Natural language processing на практике"GeeksLab Odessa
 
Тимашев Дмитрий "Что такое визуализация данных, или почему специалисты, работ...
Тимашев Дмитрий "Что такое визуализация данных, или почему специалисты, работ...Тимашев Дмитрий "Что такое визуализация данных, или почему специалисты, работ...
Тимашев Дмитрий "Что такое визуализация данных, или почему специалисты, работ...GeeksLab Odessa
 
Deep learning: Cложный анализ данных простыми словами_Сергей Шелпук
Deep learning: Cложный анализ данных простыми словами_Сергей ШелпукDeep learning: Cложный анализ данных простыми словами_Сергей Шелпук
Deep learning: Cложный анализ данных простыми словами_Сергей ШелпукGeeksLab Odessa
 
AI&BigData Lab 2016. Максим Терещенко: #DataForGood - как изменить мир к лучш...
AI&BigData Lab 2016. Максим Терещенко: #DataForGood - как изменить мир к лучш...AI&BigData Lab 2016. Максим Терещенко: #DataForGood - как изменить мир к лучш...
AI&BigData Lab 2016. Максим Терещенко: #DataForGood - как изменить мир к лучш...GeeksLab Odessa
 
Презентация Ukraine Global Scholars
Презентация Ukraine Global Scholars Презентация Ukraine Global Scholars
Презентация Ukraine Global Scholars Misha Lemesh
 
освіта калуш New.pptx
освіта калуш New.pptxосвіта калуш New.pptx
освіта калуш New.pptxMiroslav Kussen
 

Destaque (10)

Стартапы в AI&BigData_Виталий Гончарук
Стартапы в AI&BigData_Виталий ГончарукСтартапы в AI&BigData_Виталий Гончарук
Стартапы в AI&BigData_Виталий Гончарук
 
Моделирование структурными уравнениями_Алексей Гаевский
Моделирование структурными уравнениями_Алексей ГаевскийМоделирование структурными уравнениями_Алексей Гаевский
Моделирование структурными уравнениями_Алексей Гаевский
 
"Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий...
"Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий..."Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий...
"Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий...
 
"AI&Big Data для путешественников"_Кузнецов Юрий
"AI&Big Data для путешественников"_Кузнецов Юрий "AI&Big Data для путешественников"_Кузнецов Юрий
"AI&Big Data для путешественников"_Кузнецов Юрий
 
Всеволод Демкин "Natural language processing на практике"
Всеволод Демкин "Natural language processing на практике"Всеволод Демкин "Natural language processing на практике"
Всеволод Демкин "Natural language processing на практике"
 
Тимашев Дмитрий "Что такое визуализация данных, или почему специалисты, работ...
Тимашев Дмитрий "Что такое визуализация данных, или почему специалисты, работ...Тимашев Дмитрий "Что такое визуализация данных, или почему специалисты, работ...
Тимашев Дмитрий "Что такое визуализация данных, или почему специалисты, работ...
 
Deep learning: Cложный анализ данных простыми словами_Сергей Шелпук
Deep learning: Cложный анализ данных простыми словами_Сергей ШелпукDeep learning: Cложный анализ данных простыми словами_Сергей Шелпук
Deep learning: Cложный анализ данных простыми словами_Сергей Шелпук
 
AI&BigData Lab 2016. Максим Терещенко: #DataForGood - как изменить мир к лучш...
AI&BigData Lab 2016. Максим Терещенко: #DataForGood - как изменить мир к лучш...AI&BigData Lab 2016. Максим Терещенко: #DataForGood - как изменить мир к лучш...
AI&BigData Lab 2016. Максим Терещенко: #DataForGood - как изменить мир к лучш...
 
Презентация Ukraine Global Scholars
Презентация Ukraine Global Scholars Презентация Ukraine Global Scholars
Презентация Ukraine Global Scholars
 
освіта калуш New.pptx
освіта калуш New.pptxосвіта калуш New.pptx
освіта калуш New.pptx
 

Semelhante a Tweaking perfomance on high-load projects_Думанский Дмитрий

Handling 20 billion requests a month
Handling 20 billion requests a monthHandling 20 billion requests a month
Handling 20 billion requests a monthDmitriy Dumanskiy
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projectsDmitriy Dumanskiy
 
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB plc
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase HBaseCon
 
R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big DataDhafer Malouche
 
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013Amazon Web Services
 
TiDB vs Aurora.pdf
TiDB vs Aurora.pdfTiDB vs Aurora.pdf
TiDB vs Aurora.pdfssuser3fb50b
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLucidworks
 
Leveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkQAware GmbH
 
Gnocchi v3 brownbag
Gnocchi v3 brownbagGnocchi v3 brownbag
Gnocchi v3 brownbagGordon Chung
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightScyllaDB
 
Solr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverSolr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverAlex Pinkin
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3 Omid Vahdaty
 
Launching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSLaunching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSAmazon Web Services
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifyNeville Li
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simpleDori Waldman
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedis Labs
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudMongoDB
 

Semelhante a Tweaking perfomance on high-load projects_Думанский Дмитрий (20)

Handling 20 billion requests a month
Handling 20 billion requests a monthHandling 20 billion requests a month
Handling 20 billion requests a month
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projects
 
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance Optimization
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big Data
 
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
 
Tweaking performance on high-load projects_Dmitriy Dumanskiy

  • 2. Dmitriy Dumanskiy Cogniance, mGage project Java Team Lead
  • 4. Mgage delivery load: 3 billion req/month. ~8 c3.xLarge Amazon instances. Average load: 2400 req/sec. Peak: x10
  • 5. Mobclix delivery load: 14 billion req/month. ~16 c3.xLarge Amazon instances. Average load: 6000 req/sec. Peak: x6
  • 6. XXXX delivery load: 20 billion req/month. ~14 c3.xLarge Amazon instances. Average load: 11000 req/sec. Peak: x6
  • 7. Is it a lot? Average load : 11000 req/sec
  • 8. Twitter: new tweets, 15 billion a month. Average load: 5700 req/sec. Peak: x30
  • 9. Delivery load (requests per month; max load per instance, req/sec; requirements; servers, AWS c3.xLarge): Mgage: 3 billion; 300; HTTP, time 95% < 60 ms; 8. Mobclix: 14 billion; 400; HTTP, time 95% < 100 ms; 16. XXXX: 20 billion; 800; HTTPS, time 99% < 100 ms; 14.
  • 10. Delivery load. c3.XLarge: 4 vCPU, 2.8 GHz Intel Xeon E5-2680. Load average (LA): ~2-3. 1-2 cores reserved for sudden peaks.
  • 11. BE tech stacks Mobclix : Spring, iBatis, MySql, Solr, Vertica, Cascading, Tomcat Mgage : Spring, Hibernate, Postgres, Distributed ehCache, Hadoop, Voldemort, Jboss XXXX: Spring, Hibernate, MySQL, Solr, Cascading, Redis, Tomcat
  • 12. Initial problem ● ~1000 req/sec ● Peaks 6x ● 99% HTTPS with response time < 100ms
  • 13. Real problem ● ~85 mln active users, ~115 mln registered users ● 11.5 messages per user per day ● ~11000 req/sec ● Peaks 6x ● 99% HTTPS with response time < 100ms ● Reliable and scalable for future growth up to 80k
  • 16. SOLR? Why? ● Pros: ○ Quick search on complex queries ○ Has a lot of built-in features (master-slave replication, RDBMS integration) ● Cons: ○ Only HTTP, embedded performs worse ○ Not easy for beginners ○ Max load is ~100 req/sec per instance
  • 17. “Simple” query "-(-connectionTypes:"+"\""+getConnectionType()+"\""+" AND connectionTypes:[* TO *]) AND "+"-connectionTypeExcludes:"+"\""+getConnectionType()+"\""+" AND " + "-(-OSes:"+"(\""+osQuery+"\" OR \""+getOS()+"\")"+" AND OSes:[* TO *]) AND " + "-osExcludes:"+"(\""+osQuery+"\" OR \""+getOS()+"\")" + " AND (runOfNetwork:T OR appIncludes:"+getAppId()+" OR pubIncludes:"+getPubId()+" OR categories: ("+categoryList+"))" +" AND -appExcludes:"+getAppId()+" AND -pubExcludes:" +getPubId()+" AND -categoryExcludes:("+categoryList+") AND " + keywordQuery+" AND " + "-(-devices:"+"\""+getHandsetNormalized()+"\""+" AND devices:[* TO *]) AND " + "-deviceExcludes:"+"\""+getHandsetNormalized()+"\""+" AND " + "-(-carriers:"+"\"" +getCarrier()+"\""+" AND carriers:[* TO *]) AND " + "-carrierExcludes:"+"\"" +getCarrier()+"\""+" AND " + "-(-locales:"+"(\""+locale+"\" OR \""+langOnly+"\")" +" AND locales:[* TO *]) AND " + "-localeExcludes:"+"(\""+locale+"\" OR \"" +langOnly+"\") AND " + "-(-segments:("+segmentQuery+") AND segments:[* TO *]) AND " + "-segmentExcludes:("+segmentQuery+")" + " AND -(-geos:"+geoQuery+" AND geos:[* TO *]) AND " + "-geosExcludes:"+geoQuery
  • 18. Architecture (diagram): MySQL; Solr master; three Solr slave + AdServer pairs; NoSQL
  • 19. AdServer - Solr Slave. Delivery: volatile DeliveryData cache; Cron job: DeliveryData tempCache = loadData(); cache = tempCache;
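The cache swap on this slide can be sketched in plain Java as follows. This is a minimal illustration rather than the project's code: the class name, the `Map<String, String>` payload, the `Supplier` loader and the scheduler standing in for the cron job are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Holds an immutable snapshot that a background job replaces wholesale. */
final class DeliveryDataCache {
    // volatile: request threads always see the most recently published snapshot.
    private volatile Map<String, String> cache = Map.of();
    // Hypothetical loader, standing in for loadData() on the slide.
    private final Supplier<Map<String, String>> loader;

    DeliveryDataCache(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    /** Builds the new snapshot off to the side, then publishes it with a single write. */
    void refresh() {
        Map<String, String> tempCache = Map.copyOf(loader.get());
        cache = tempCache;
    }

    /** Request threads read without taking any lock. */
    String get(String key) {
        return cache.get(key);
    }

    /** Hypothetical stand-in for the cron job: refresh at a fixed period. */
    ScheduledExecutorService startRefreshing(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Readers never block: they keep using the old snapshot until the reference is replaced, so a slow reload only delays freshness, not requests.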
  • 20. Why no-sql? ● Realtime data ● Quick response time ● Simple queries by key ● 1-2 queries to no-sql on every request. Average load 10-20k req/sec and >120k req/sec in peaks. ● Cheap solution
  • 21. Why Redis? Pros ● Easy and light-weight ● Low latency and response time. 99% is < 1ms. Average latency is ~0.2ms ● Up to 100k 'get' commands per second on c1.X-Large ● Cool features (atomic increments, sets, hashes) ● Ready AWS service — ElastiCache
  • 22. Why Redis? Cons ● Single-threaded out of the box ● Using all cores requires sharding/clustering ● Scaling/failover is not easy ● Limited by max instance memory (240 GB on the largest AWS instance) ● Persistence/swapping may delay responses ● Cluster solution is not production ready
  • 23. DynamoDB vs Redis (price per month; put, 95%; get, 95%; rec/sec): DynamoDB: $58; 300 ms; 150 ms; 50. DynamoDB: $580; 60 ms; 8 ms; 780. DynamoDB: $5800; 16 ms; 8 ms; 1250. Redis (c1.medium): $200; 3 ms; <1 ms; 4000. ElastiCache (c1.xlarge): $600; <1 ms; <1 ms; 10000.
  • 24. What about others? ● Cassandra ● Voldemort ● Memcached
  • 25. Redis RAM problem ● 1 user entry ~ from 80 bytes to 3kb ● ~85 mln users ● Required RAM ~ from 1 GB to 300 GB
  • 28. Data compression: JSON → Kryo binary (4x less data) → gzip (2x less data) == 8x less data. Now we need < 40 GB, plus less load on the network stack.
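As a rough illustration of the second stage of this pipeline, the sketch below gzips a byte array and restores it using only the JDK. The Kryo serialization step is left out because it needs a third-party library, so the input here is just raw bytes; the class name `GzipCodec` is an assumption for the example, and the achieved ratio depends entirely on the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Gzip stage only; the binary serialization stage would run before compress(). */
final class GzipCodec {
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream flushes the trailer before we read the buffer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] decompress(byte[] packed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compressing on the application side also shrinks what travels to and from Redis, which is the "less load on network stack" point of the slide.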
  • 29. AdServer BE: average response time ~1.2 ms; load 800 req/sec with LA ~4 (c3.XLarge == 4 vCPU)
  • 30. AdServer BE ● Logging — 12% of time (5% on SSD); ● Response generation — 15% of time; ● Redis request — 50% of time; ● All business logic — 23% of time;
  • 31. Reporting (diagram): AdServer; S3 (delivery logs); Hadoop ETL; S3 (aggregated logs); MySQL; Console
  • 32. Log structure { "uid":"test", "platform":"android", "app":"xxx", "ts":1375952275223, "pid":1, "education":"Some-Highschool-or-less", "type":"new", "sh":1280, "appver":"6.4.34", "country":"AU", "time":"Sat, 03 August 2013 10:30:39 +0200", "deviceGroup":7, "rid":"fc389d966438478e9554ed15d27713f51", "responseCode":200, "event":"ad", "device":"N95", "sw":768, "ageGroup":"18-24", "preferences":["beer","girls"] }
  • 33. Log structure ● 1 mln. records == 0.6 GB. ● ~900 mln records a day == ~0.55 TB. ● 1 month up to 20 TB of data. ● Zipped data is 10 times less.
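The figures on this slide can be checked with simple arithmetic, using only the numbers given there and assuming a 30-day month:

```latex
\begin{align*}
900 \times 10^{6}\ \tfrac{\text{records}}{\text{day}} \times \frac{0.6\ \text{GB}}{10^{6}\ \text{records}}
  &= 540\ \tfrac{\text{GB}}{\text{day}} \approx 0.55\ \tfrac{\text{TB}}{\text{day}} \\
0.54\ \tfrac{\text{TB}}{\text{day}} \times 30\ \text{days}
  &\approx 16\ \text{TB per month} \\
16\ \text{TB} \div 10
  &\approx 1.6\ \text{TB per month, zipped}
\end{align*}
```

The "up to 20 TB" on the slide is therefore an upper bound with some headroom over the ~16 TB average month.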
  • 34. Reporting Customer: “And we need fancy reporting.” But 20 TB of data per month is huge. So what can we do?
  • 35. Reporting Dimensions: device, os, osVer, screenWidth, screenHeight, country, region, city, carrier, advertisingId, preferences, gender, age, income, sector, company, language, etc... Use case: I want to know how many users saw my ad in San Francisco.
  • 36. Reporting Geo table: Country, City, Region, CampaignId, Date, counters; Device table: Device, Carrier, Platform, CampaignId, Date, counters; Uniques table: CampaignId, UID
  • 37. Predefined report types → aggregation by predefined dimensions → 500-1000 times less data 20 TB per month → 40 GB per month
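Aggregating by predefined dimensions amounts to a group-by-and-count over the raw logs. The sketch below shows the idea for the geo table in memory with Java streams; the real job ran on Hadoop, and the record types and field names here are assumptions chosen to mirror the slide.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GeoReport {
    /** One raw delivery log line, reduced to the fields used in this example. */
    record LogRecord(String uid, String country, String region, String city,
                     int campaignId, String date) {}

    /** Aggregation key: the predefined dimensions of the geo table. */
    record GeoKey(String country, String region, String city, int campaignId, String date) {}

    /** Collapses raw records into one counter per distinct key; per-user detail is dropped. */
    static Map<GeoKey, Long> aggregate(List<LogRecord> logs) {
        return logs.stream().collect(Collectors.groupingBy(
                r -> new GeoKey(r.country(), r.region(), r.city(), r.campaignId(), r.date()),
                Collectors.counting()));
    }
}
```

The size reduction comes from many raw records sharing the same key: the output grows with the number of distinct dimension combinations, not with traffic.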
  • 38. Of course - Hadoop ● Pros: ○ Unlimited (depends) horizontal scaling ● Cons: ○ Not real-time ○ Processing time directly depends on code quality and on infrastructure cost ○ Not all input can be scaled ○ Cluster startup is so... long
  • 39. Alternatives? ● Storm ● Redshift ● Vertica ● Math models?
  • 40. Elastic MapReduce ● Easy setup ● Easy extend ● Easy to monitor
  • 41. Timing ● Hadoop (cascading) : ○ 25 GB in peak hour takes ~40min (-10 min). CSV output 300MB. With cluster of 4 c3.xLarge. ● MySQL: ○ Put 300MB in DB with insert statements ~40 min.
  • 42. Timing ● Hadoop (cascading) : ○ 25 GB in peak hour takes ~40min (-10 min). CSV output 300MB. With cluster of 4 c3.xLarge. ● MySQL: ○ Put 300MB in DB with insert statements ~40 min. ● MySQL: ○ Put 300MB in DB with optimizations ~5 min.
  • 43. Optimizations ● No “insert into”, only “load data”: ~10 times faster ● “ENGINE=MyISAM” instead of “INNODB” when possible: ~5 times faster ● For “upsert”, a temp table with “ENGINE=MEMORY”: IO savings
  • 44. Cascading Hadoop: void map(K key, V val, OutputCollector collector) { ... } void reduce(K key, Iterator<V> vals, OutputCollector collector) { ... } Cascading: Scheme sinkScheme = new TextLine(new Fields( "word", "count")); Pipe assembly = new Pipe("wordcount"); assembly = new Each(assembly, new Fields( "line" ), new RegexGenerator(new Fields("word"), ",") ); assembly = new GroupBy(assembly, new Fields( "word")); Aggregator count = new Count(new Fields( "count")); assembly = new Every(assembly, count);
  • 45. Why cascading? Hadoop Job 1 Hadoop Job 2 Hadoop Job 3 Result of one job should be processed by another job
  • 47. Redis sharding: Redis shard 0, Redis shard 1, Redis shard 2; AdServer: shardNumber = UID.hashCode() % 3
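Picking a shard from the UID hash is normally done with a remainder, so that the result always falls in the range 0 to shardCount - 1. A minimal sketch, with `ShardRouter` and `shardFor` as hypothetical names:

```java
final class ShardRouter {
    /** Maps a user id to a shard index in [0, shardCount). */
    static int shardFor(String uid, int shardCount) {
        // String.hashCode() can be negative; floorMod keeps the index non-negative,
        // whereas the plain % operator would return a negative remainder.
        return Math.floorMod(uid.hashCode(), shardCount);
    }
}
```

The same UID always lands on the same shard as long as the shard count stays fixed, which is exactly why adding shards later needs a migration strategy.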
  • 48. Resharding problem All data already in shards, how to add new shards?
  • 49. Resharding problem. Solution (diagram: AdServer, old shard, new shard, removal script): 1. Get the UID from the new shard; if not present, a) get the UID from the old shard. 2. Save the UID to the new shard.
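The read path described on this slide is a lazy migration: look in the new shard first, fall back to the old one, and copy on the way out. The sketch below models both shards as in-memory maps, which is an assumption for illustration; the `removeMigrated` method is only a guess at what the slide's "removal script" does.

```java
import java.util.HashMap;
import java.util.Map;

final class MigratingStore {
    // Hypothetical in-memory stand-ins for the old and new Redis shards.
    private final Map<String, String> oldShard;
    private final Map<String, String> newShard = new HashMap<>();

    MigratingStore(Map<String, String> oldShard) {
        this.oldShard = oldShard;
    }

    String get(String uid) {
        String value = newShard.get(uid);   // 1. try the new shard
        if (value == null) {
            value = oldShard.get(uid);      //    a) fall back to the old shard
            if (value != null) {
                newShard.put(uid, value);   // 2. save to the new shard
            }
        }
        return value;
    }

    /** Assumed behaviour of the removal script: drop old entries that were already copied. */
    void removeMigrated() {
        oldShard.keySet().removeIf(newShard::containsKey);
    }
}
```

Active users migrate themselves through normal traffic, so no bulk copy or downtime is needed; only inactive entries remain in the old shard.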
  • 50. Postgres partitioning ● Queries on small partitions ● Distributed index ● Less index size ● Small partitions may fit RAM memory ● Easy to remove/move
  • 51. Cost of IO: L1 cache 3 cycles; L2 cache 14 cycles; RAM 250 cycles; disk 41 000 000 cycles; network 240 000 000 cycles
  • 52. Cost of IO @Cacheable is everywhere
  • 53. Hadoop Map input : 300 MB Map output : 80 GB
  • 54. Hadoop ● mapreduce.map.output.compress = true ● codecs: GZip, BZ2 - CPU intensive ● codecs: LZO, Snappy ● codecs: JNI ~x10
  • 56. Hadoop Text, IntWritable, BytesWritable, NullWritable, etc Simpler - better
  • 57. Hadoop Missing data: map(T value, ...) { Log log = parse(value); Data data = dbWrapper.getSomeMissingData(log.getCampId()); }
  • 58. Hadoop Missing data: map(T value, ...) { Log log = parse(value); Data data = dbWrapper.getSomeMissingData(log.getCampId()); } Wrong
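The slide marks the per-record database call as wrong without showing the alternative. One common approach, assumed here, is to load the lookup data once before any records are processed and then serve every lookup from memory. The sketch below uses plain Java rather than the Hadoop API; `PreloadedLookupMapper` and the loader callback are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical sketch: fetch the missing data once, not once per record.
final class PreloadedLookupMapper {
    private final Map<Long, String> campaignData;

    // The loader stands in for a single bulk query against the database.
    PreloadedLookupMapper(Callable<Map<Long, String>> loader) throws Exception {
        this.campaignData = new HashMap<Long, String>(loader.call());
    }

    // Called once per record: an in-memory lookup with no IO.
    String map(long campId) {
        return campaignData.get(campId);
    }
}
```

In a real Hadoop job the one-time load would typically happen in the mapper's setup phase; the point is only that the expensive call sits outside the per-record path.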
  • 59. Hadoop Unnecessary data: map(T value, ...) { Log log = parse(value); Key resultKey = makeKey(log.getCampName(), ...); output.collect(resultKey, resultValue); }
  • 60. Hadoop Unnecessary data: map(T value, ...) { Log log = parse(value); Key resultKey = makeKey(log.getCampName(), ...); output.collect(resultKey, resultValue); } Wrong
  • 61. Hadoop Unnecessary data: RecordWriter.write(K key, V value) { Entity entity = makeEntity(key, value); dbWrapper.save(entity); }
  • 62. Hadoop Unnecessary data: RecordWriter.write(K key, V value) { Entity entity = makeEntity(key, value); dbWrapper.save(entity); } Wrong
  • 63. Hadoop public boolean equals(Object obj) { EqualsBuilder equalsBuilder = new EqualsBuilder(); equalsBuilder.append(id, otherKey.getId()); ... } public int hashCode() { HashCodeBuilder hashCodeBuilder = new HashCodeBuilder(); hashCodeBuilder.append(id); ... }
  • 64. Hadoop public boolean equals(Object obj) { EqualsBuilder equalsBuilder = new EqualsBuilder(); equalsBuilder.append(id, otherKey.getId()); ... } public int hashCode() { HashCodeBuilder hashCodeBuilder = new HashCodeBuilder(); hashCodeBuilder.append(id); ... } Wrong
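The slide does not say why the builder-based version is wrong. A likely reading, assumed here, is that `EqualsBuilder` and `HashCodeBuilder` allocate a new object on every call, which adds up when keys are compared and hashed millions of times during sort and shuffle. A hand-written version avoids that allocation. `CampaignKey` and its `type` field are hypothetical; only `id` appears on the slide.

```java
// Hypothetical key class with allocation-free equals() and hashCode().
final class CampaignKey {
    private final long id;
    private final int type; // illustrative second field, not from the slide

    CampaignKey(long id, int type) {
        this.id = id;
        this.type = type;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof CampaignKey)) {
            return false;
        }
        CampaignKey other = (CampaignKey) obj;
        return id == other.id && type == other.type;
    }

    @Override
    public int hashCode() {
        // Same folding that Long.hashCode() uses, written out for Java 7.
        int result = (int) (id ^ (id >>> 32));
        return 31 * result + type;
    }
}
```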
  • 65. Hadoop public void map(...) { … for (String word : words) { output.collect(new Text(word), new IntVal(1)); } }
  • 66. Hadoop public void map(...) { … for (String word : words) { output.collect(new Text(word), new IntVal(1)); } } Wrong
  • 67. Hadoop class MyMapper extends Mapper { Text word = new Text(); IntVal one = new IntVal(1); public void map(...) { for (String w : words) { word.set(w); output.collect(word, one); } } }
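The object-reuse pattern on this slide can be tried without a Hadoop cluster. The sketch below uses stand-in types, all hypothetical: `MutableWord` plays the role of `Text`, and `SerializingCollector` copies each record at the moment it is collected. That copy-on-collect behavior is an assumption of the sketch, and it is what makes reusing a single key object safe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins showing one reusable key object per mapper
// instead of one new object per output record.
final class ReuseDemo {
    static final class MutableWord {
        private String value = "";
        void set(String value) { this.value = value; }
        String get() { return value; }
    }

    interface Collector {
        void collect(MutableWord key, int count);
    }

    // Copies the key's content immediately, so the caller may overwrite it afterwards.
    static final class SerializingCollector implements Collector {
        final List<String> records = new ArrayList<String>();
        public void collect(MutableWord key, int count) {
            records.add(key.get() + "\t" + count);
        }
    }

    static final class WordMapper {
        private final MutableWord word = new MutableWord(); // allocated once

        void map(String line, Collector output) {
            for (String token : line.split(",")) {
                word.set(token);        // overwrite instead of allocating
                output.collect(word, 1);
            }
        }
    }
}
```

If a collector kept a reference to the key instead of copying it, every stored record would end up showing the last word, so the pattern depends on that copy.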
  • 68. Network Per AdServer instance: Incoming traffic: ~100 Mb/sec Outgoing traffic: ~50 Mb/sec Total traffic through the LB: almost 10 Gb/sec
  • 70. AWS ElastiCache SLOWLOG GET 1) 1) (integer) 35 2) (integer) 1391709950 3) (integer) 34155 4) 1) "GET" 2) "2ads10percent_rmywqesssitmfksetzvj" 2) 1) (integer) 34 2) (integer) 1391709830 3) (integer) 34863 4) 1) "GET" 2) "2ads10percent_tteeoomiimcgdzcocuqs"
  • 71. AWS ElastiCache 35 ms for a GET? WTF? Even Java is faster
  • 72. AWS ElastiCache ● Strange timeouts (with SO_TIMEOUT 50ms) ● No replication to another cluster ● «Cluster» is not a cluster ● The cluster runs on regular instances, so you pay for 4 cores while using 1
  • 73. AWS Limits. You never know where ● Network limit ● PPS rate limit ● LB limit ● Cluster start time up to 20 mins ● Scalability limits ● S3 is slow for many files
  • 74. Facts ● HTTP is 2x faster than HTTPS ● HTTPS with keep-alive: +80% performance ● Java 7 is 40% faster than Java 6 (in our case) ● All IO operations minimized