SlideShare uma empresa Scribd logo
1 de 13
Baixar para ler offline
@ Rubyslava 2014
Michal Hariš : michal.haris@visualdna.com
- Technical Architect, joined VisualDNA in 2012
Where were we 3 years ago
●

10 people working around one mysql table holding 50M+ user profiles
Where were we 3 years ago
●

10 people working around one mysql table holding 50M+ user profiles

●

LAMP Architecture
SCALABILITY ISSUES
Where were we 3 years ago
●

10 people working around one mysql table holding 50M+ user profiles

●

LAMP Architecture
SCALABILITY ISSUES
DECISION TO GO BIG (DATA) !
Where were we 18 months ago
●

30 strong team, of that a single tech team of roughly 15 people

●

Basically a batch architecture
●
●
●
●
●
●

●

just not MySQL but CASSANDRA + HADOOP at the back
http+php trackers with piped custom log batch process
s3 upload every 5 min
daily hdfs distcp
POC = daily hadoop inference > 6 node cassandra -> batch integrations
POC was a daily batch job which on bad days took 30 hours

One of the first commercial Cassandra cluster in the world
● very unstable
Where are we today
● Stack
● Java
● Scala
● Hadoop
● Cassandra
● Kafka
● Redis
● R
● AngularJS for the front-end
Where are we today
●

Auto-scaling geo-located Tracker Clusters - well, almost auto-scaling

●

Robust Streaming Infrastructure - aggregation of all data streams in
central infrastructure
●

bringing in 8.5k events/ second at peak

●

●

Real-time end-user products, scoring services, integrations with third
parties where possible, pre-computation infrastructure that scales more
predictively
● These are primary events which get multiplied by various speed-layer
ETL Pipeline - offloading data streams and pre-computing materialised
views onto HDFS > 30TB of primary data

●

● some data we keep only last 60 or 90 days, others we keep for ever
Decision Analytics Pipeline (or RD Pipe) > 100TB+ of secondary data i
●

Using feature-extraction machine learning methods
Where are we today
●

Still one Cassandra ring, just bigger and more stable, 16 nodes, 250M+
active user profiles

●

Lambda Architecture for real-time products like WHY Analytics
●
●
●
●
●

RD Pipe is the "batch" layer (daily) that generates active profiles as a
cassandra ("view layer")
Primary Events are enriched for user profiles produced daily by the
Enrichment service ("speed layer")
Combination of probabilistic counters and Redis cubes calculates the
current audience profiles for subscribed websites ("speed layer")
API on top of the Redis cubes serves the current audience profiles for the
front end suite of real-time analytics products ("serving layer")
Audience Analytics product suite is the good looking bit - http://www.
visualdna.com/why/
Where are we today
● 120-strong team, of that tech is roughly 60:
●
●
●
●
●

Sysadmin Team
Architecture Tech Team
Decision Analytics Tech Team
Consumer Tech Team
WHY Analytics Team
What have we learned
●

Architecture:
●

Updating json blobs in Cassandra columns is a trap
● Logging is better http://engineering.linkedin.com/distributed-systems/log-what-everysoftware-engineer-should-know-about-real-time-datas-unifying

●

●

●

Metrics are crucial in large distributed systems
● yammer metrics + graphite + icinga works well for infrastructure
● but complex event/anomalies detection and pattern analysis gives the
edge
Real-Time processing of Data Streams is not only cool, but scales
well ... until you find a bottleneck in a single component which will limit the
entire system
Batch still matters
● but could be much faster than Hadoop which falls on too much
redundant I/O and requires a coordinated ETL pipeline
What have we learned
●

Engineering:
●

●

the unix philosophy of building short, simple, clear, modular, and
extendable code applies also to a design of distributed systems not
just an OS
bad tests are better than no tests but they are still bad and most tests
only test positive outcome
● the story of Math.abs() -> actually can return negative number ->
but none of the unit-tests anticipated this -> which is why metrics
and systems with feedback control are crucial

●

●
Process:
●

●

It is possible to co-operate remotely even on complex and not-well
defined systems - atm some of the architecture team is working remotely
on permanent basis
QA is intrinsic to Architecture and local to products
Interesting issues we’re facing
1. SLAs vs. Start-up dynamics - Separate process (and to some
degree architecture) for different levels of guarantee of service

2. Globally-distributed highly-available API for random
access to our profiles - enabling decisions based on VDNA profiles on-demand
3. Our Lambda has a bottleneck at the enrichment point

-

although if we solve (2.) we will be half-way through

4. Complex data pooling attribution model
5. Cassandra still gives us some pain - it's the drivers! - interesting
about consistency: http://aphyr.com/posts/294-call-me-maybe-cassandra/

6. Preserving start-up dynamics and culture in a company
of 200+ with offices in several cities
We’re hiring for Bratislava office!
● We’re looking for engineers and analysts and
more to be based in Bratislava

careers-cee@visualdna.com

Mais conteúdo relacionado

Mais procurados

Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public CloudScylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public CloudScyllaDB
 
Pavel Prischepa. Fffast Drupal backend.
Pavel Prischepa. Fffast Drupal backend.Pavel Prischepa. Fffast Drupal backend.
Pavel Prischepa. Fffast Drupal backend.DrupalSib
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysisMariaDB plc
 
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...ScyllaDB
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQLMariaDB plc
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB plc
 
How Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDBHow Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDBMariaDB plc
 
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...ScyllaDB
 
Cassandra Lunch #23: Lucene Based Indexes on Cassandra
Cassandra Lunch #23: Lucene Based Indexes on CassandraCassandra Lunch #23: Lucene Based Indexes on Cassandra
Cassandra Lunch #23: Lucene Based Indexes on CassandraAnant Corporation
 
Presto Summit 2018 - 02 - LinkedIn
Presto Summit 2018  - 02 - LinkedInPresto Summit 2018  - 02 - LinkedIn
Presto Summit 2018 - 02 - LinkedInkbajda
 
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDBScylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDBScyllaDB
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark applicationdatamantra
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@GrabShubham Tagra
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedHostedbyConfluent
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle managementdatamantra
 
Introducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQLIntroducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQLMariaDB plc
 
Orchestrating Cassandra with Kubernetes
Orchestrating Cassandra with KubernetesOrchestrating Cassandra with Kubernetes
Orchestrating Cassandra with KubernetesRaghavendra Prabhu
 
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate Ido Shilon
 
CCV: migrating our payment processing system to MariaDB
CCV: migrating our payment processing system to MariaDBCCV: migrating our payment processing system to MariaDB
CCV: migrating our payment processing system to MariaDBMariaDB plc
 

Mais procurados (20)

Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public CloudScylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
 
Pavel Prischepa. Fffast Drupal backend.
Pavel Prischepa. Fffast Drupal backend.Pavel Prischepa. Fffast Drupal backend.
Pavel Prischepa. Fffast Drupal backend.
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysis
 
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQL
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
 
How Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDBHow Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDB
 
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...
 
TiDB Introduction
TiDB IntroductionTiDB Introduction
TiDB Introduction
 
Cassandra Lunch #23: Lucene Based Indexes on Cassandra
Cassandra Lunch #23: Lucene Based Indexes on CassandraCassandra Lunch #23: Lucene Based Indexes on Cassandra
Cassandra Lunch #23: Lucene Based Indexes on Cassandra
 
Presto Summit 2018 - 02 - LinkedIn
Presto Summit 2018  - 02 - LinkedInPresto Summit 2018  - 02 - LinkedIn
Presto Summit 2018 - 02 - LinkedIn
 
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDBScylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@Grab
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
 
Introducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQLIntroducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQL
 
Orchestrating Cassandra with Kubernetes
Orchestrating Cassandra with KubernetesOrchestrating Cassandra with Kubernetes
Orchestrating Cassandra with Kubernetes
 
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate BDX 2016 - Kevin lyons & yakir buskilla  @ eXelate
BDX 2016 - Kevin lyons & yakir buskilla @ eXelate
 
CCV: migrating our payment processing system to MariaDB
CCV: migrating our payment processing system to MariaDBCCV: migrating our payment processing system to MariaDB
CCV: migrating our payment processing system to MariaDB
 

Destaque

Microdata, Authorship, Google+ and Joomla! - Ruth Cheesley - Joomla! World Co...
Microdata, Authorship, Google+ and Joomla! - Ruth Cheesley - Joomla! World Co...Microdata, Authorship, Google+ and Joomla! - Ruth Cheesley - Joomla! World Co...
Microdata, Authorship, Google+ and Joomla! - Ruth Cheesley - Joomla! World Co...Ruth Cheesley
 
SST 2014; The Reluctant SME
SST 2014; The Reluctant SMESST 2014; The Reluctant SME
SST 2014; The Reluctant SMEElisa Sawyer
 
Business of Front-end Web Development
Business of Front-end Web DevelopmentBusiness of Front-end Web Development
Business of Front-end Web DevelopmentRachel Andrew
 
Web accessibiilty and Drupal
Web accessibiilty and DrupalWeb accessibiilty and Drupal
Web accessibiilty and DrupalThéodore Biadala
 
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud ComputingOSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud ComputingMark Hinkle
 
Título de experto en programación con tecnologías web
Título de experto en programación con tecnologías webTítulo de experto en programación con tecnologías web
Título de experto en programación con tecnologías webAlicantePHP
 
Rapid Product Design in the Wild - Agile Iceland
Rapid Product Design in the Wild - Agile IcelandRapid Product Design in the Wild - Agile Iceland
Rapid Product Design in the Wild - Agile IcelandMichele Ide-Smith
 
Datatium - radiation free responsive experiences
Datatium - radiation free responsive experiencesDatatium - radiation free responsive experiences
Datatium - radiation free responsive experiencesAndrew Fisher
 
Something from Nothing: Simple Ways to Look Sharp When Time is Short
Something from Nothing: Simple Ways to Look Sharp When Time is ShortSomething from Nothing: Simple Ways to Look Sharp When Time is Short
Something from Nothing: Simple Ways to Look Sharp When Time is Shortkenwtw
 
OpenID and decentralised social networks
OpenID and decentralised social networksOpenID and decentralised social networks
OpenID and decentralised social networksSimon Willison
 
Groovy & Grails eXchange 2012 - Building an e-commerce business with gr8 tec...
Groovy & Grails eXchange 2012 - Building an  e-commerce business with gr8 tec...Groovy & Grails eXchange 2012 - Building an  e-commerce business with gr8 tec...
Groovy & Grails eXchange 2012 - Building an e-commerce business with gr8 tec...Domingo Suarez Torres
 
UXD v. Analytics - WIAD13 Ann Arbor
UXD v. Analytics - WIAD13 Ann ArborUXD v. Analytics - WIAD13 Ann Arbor
UXD v. Analytics - WIAD13 Ann ArborChris Farnum
 
Alternative Design Workflows in a "PostPSD" Era
Alternative Design Workflows in a "PostPSD" EraAlternative Design Workflows in a "PostPSD" Era
Alternative Design Workflows in a "PostPSD" EraJeremy Fuksa
 
FSharp for Trading - CodeMesh 2013
FSharp for Trading - CodeMesh 2013FSharp for Trading - CodeMesh 2013
FSharp for Trading - CodeMesh 2013Phillip Trelford
 
Using Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open DataUsing Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open DataOSCON Byrum
 
ReactJS maakt het web eenvoudig
ReactJS maakt het web eenvoudigReactJS maakt het web eenvoudig
ReactJS maakt het web eenvoudigRick Beerendonk
 
Big Data, Big Changes: Data-Driven Product Development at Etsy
Big Data, Big Changes: Data-Driven Product Development at EtsyBig Data, Big Changes: Data-Driven Product Development at Etsy
Big Data, Big Changes: Data-Driven Product Development at EtsyJason Davis
 
Taxonomy of Scala
Taxonomy of ScalaTaxonomy of Scala
Taxonomy of Scalashinolajla
 
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud InfrastructureSCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud InfrastructureMatt Ray
 

Destaque (20)

Microdata, Authorship, Google+ and Joomla! - Ruth Cheesley - Joomla! World Co...
Microdata, Authorship, Google+ and Joomla! - Ruth Cheesley - Joomla! World Co...Microdata, Authorship, Google+ and Joomla! - Ruth Cheesley - Joomla! World Co...
Microdata, Authorship, Google+ and Joomla! - Ruth Cheesley - Joomla! World Co...
 
SST 2014; The Reluctant SME
SST 2014; The Reluctant SMESST 2014; The Reluctant SME
SST 2014; The Reluctant SME
 
Business of Front-end Web Development
Business of Front-end Web DevelopmentBusiness of Front-end Web Development
Business of Front-end Web Development
 
Web accessibiilty and Drupal
Web accessibiilty and DrupalWeb accessibiilty and Drupal
Web accessibiilty and Drupal
 
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud ComputingOSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
 
Título de experto en programación con tecnologías web
Título de experto en programación con tecnologías webTítulo de experto en programación con tecnologías web
Título de experto en programación con tecnologías web
 
Rapid Product Design in the Wild - Agile Iceland
Rapid Product Design in the Wild - Agile IcelandRapid Product Design in the Wild - Agile Iceland
Rapid Product Design in the Wild - Agile Iceland
 
Datatium - radiation free responsive experiences
Datatium - radiation free responsive experiencesDatatium - radiation free responsive experiences
Datatium - radiation free responsive experiences
 
Something from Nothing: Simple Ways to Look Sharp When Time is Short
Something from Nothing: Simple Ways to Look Sharp When Time is ShortSomething from Nothing: Simple Ways to Look Sharp When Time is Short
Something from Nothing: Simple Ways to Look Sharp When Time is Short
 
OpenID and decentralised social networks
OpenID and decentralised social networksOpenID and decentralised social networks
OpenID and decentralised social networks
 
Groovy & Grails eXchange 2012 - Building an e-commerce business with gr8 tec...
Groovy & Grails eXchange 2012 - Building an  e-commerce business with gr8 tec...Groovy & Grails eXchange 2012 - Building an  e-commerce business with gr8 tec...
Groovy & Grails eXchange 2012 - Building an e-commerce business with gr8 tec...
 
Rails traps
Rails trapsRails traps
Rails traps
 
UXD v. Analytics - WIAD13 Ann Arbor
UXD v. Analytics - WIAD13 Ann ArborUXD v. Analytics - WIAD13 Ann Arbor
UXD v. Analytics - WIAD13 Ann Arbor
 
Alternative Design Workflows in a "PostPSD" Era
Alternative Design Workflows in a "PostPSD" EraAlternative Design Workflows in a "PostPSD" Era
Alternative Design Workflows in a "PostPSD" Era
 
FSharp for Trading - CodeMesh 2013
FSharp for Trading - CodeMesh 2013FSharp for Trading - CodeMesh 2013
FSharp for Trading - CodeMesh 2013
 
Using Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open DataUsing Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open Data
 
ReactJS maakt het web eenvoudig
ReactJS maakt het web eenvoudigReactJS maakt het web eenvoudig
ReactJS maakt het web eenvoudig
 
Big Data, Big Changes: Data-Driven Product Development at Etsy
Big Data, Big Changes: Data-Driven Product Development at EtsyBig Data, Big Changes: Data-Driven Product Development at Etsy
Big Data, Big Changes: Data-Driven Product Development at Etsy
 
Taxonomy of Scala
Taxonomy of ScalaTaxonomy of Scala
Taxonomy of Scala
 
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud InfrastructureSCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
SCALE12X Build a Cloud Day: Chef: The Swiss Army Knife of Cloud Infrastructure
 

Semelhante a About VisualDNA Architecture @ Rubyslava 2014

PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Storyvanphp
 
CDP.pl - tech case study by Divante
CDP.pl - tech case study by DivanteCDP.pl - tech case study by Divante
CDP.pl - tech case study by DivanteDivante
 
CDP.pl - tech case study by Divante
CDP.pl - tech case study by DivanteCDP.pl - tech case study by Divante
CDP.pl - tech case study by DivanteDivante
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish styleLars Albertsson
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchEdward Capriolo
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Codemotion
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Demi Ben-Ari
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari
 
DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...
DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...
DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...DevOps Enterprise Summmit
 
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the CloudsDOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the CloudsGene Kim
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyAlluxio, Inc.
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...Marcin Bielak
 
The Crown Jewels: Is Enterprise Data Ready for the Cloud?
The Crown Jewels: Is Enterprise Data Ready for the Cloud?The Crown Jewels: Is Enterprise Data Ready for the Cloud?
The Crown Jewels: Is Enterprise Data Ready for the Cloud?Inside Analysis
 
Accelerating Digital Transformation: It's About Digital Enablement
Accelerating Digital Transformation:  It's About Digital EnablementAccelerating Digital Transformation:  It's About Digital Enablement
Accelerating Digital Transformation: It's About Digital EnablementJoshua Gossett
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
 
From monolith to microservices
From monolith to microservicesFrom monolith to microservices
From monolith to microservicesTransferWiseSG
 
DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)
DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)
DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)John Schneider
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoopgluent.
 

Semelhante a About VisualDNA Architecture @ Rubyslava 2014 (20)

PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
 
CDP.pl - tech case study by Divante
CDP.pl - tech case study by DivanteCDP.pl - tech case study by Divante
CDP.pl - tech case study by Divante
 
CDP.pl - tech case study by Divante
CDP.pl - tech case study by DivanteCDP.pl - tech case study by Divante
CDP.pl - tech case study by Divante
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish style
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batch
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
 
DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...
DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...
DOES14 - David Ashman, Blackboard Learn - Keep Your Head in the Clouds Tuesda...
 
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the CloudsDOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds
DOES14 - David Ashman - Blackboard Learn - Keep Your Head in the Clouds
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
 
The Crown Jewels: Is Enterprise Data Ready for the Cloud?
The Crown Jewels: Is Enterprise Data Ready for the Cloud?The Crown Jewels: Is Enterprise Data Ready for the Cloud?
The Crown Jewels: Is Enterprise Data Ready for the Cloud?
 
Accelerating Digital Transformation: It's About Digital Enablement
Accelerating Digital Transformation:  It's About Digital EnablementAccelerating Digital Transformation:  It's About Digital Enablement
Accelerating Digital Transformation: It's About Digital Enablement
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
From monolith to microservices
From monolith to microservicesFrom monolith to microservices
From monolith to microservices
 
DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)
DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)
DevOps at Obama for America(2012) and the DNC (DevOps Days NYC Jan 2013)
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
 

Último

Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

About VisualDNA Architecture @ Rubyslava 2014

  • 1. @ Rubyslava 2014 Michal Hariš : michal.haris@visualdna.com - Technical Architect, joined VisualDNA in 2012
  • 2. Where were we 3 years ago ● 10 people working around one mysql table holding 50M+ user profiles
  • 3. Where were we 3 years ago ● 10 people working around one mysql table holding 50M+ user profiles ● LAMP Architecture SCALABILITY ISSUES
  • 4. Where were we 3 years ago ● 10 people working around one mysql table holding 50M+ user profiles ● LAMP Architecture SCALABILITY ISSUES DECISION TO GO BIG (DATA) !
  • 5. Where were we 18 months ago ● 30 strong team, of that a single tech team of roughly 15 people ● Basically a batch architecture ● ● ● ● ● ● ● just not MySQL but CASSANDRA + HADOOP at the back http+php trackers with piped custom log batch process s3 upload every 5 min daily hdfs distcp POC = daily hadoop inference > 6 node cassandra -> batch integrations POC was a daily batch job which on bad days took 30 hours One of the first commercial Cassandra cluster in the world ● very unstable
  • 6. Where are we today ● Stack ● Java ● Scala ● Hadoop ● Cassandra ● Kafka ● Redis ● R ● AngularJS for the front-end
  • 7. Where are we today ● Auto-scaling geo-located Tracker Clusters - well, almost auto-scaling ● Robust Streaming Infrastructure - aggregation of all data streams in central infrastructure ● bringing in 8.5k events/ second at peak ● ● Real-time end-user products, scoring services, integrations with third parties where possible, pre-computation infrastructure that scales more predictively ● These are primary events which get multiplied by various speed-layer ETL Pipeline - offloading data streams and pre-computing materialised views onto HDFS > 30TB of primary data ● ● some data we keep only last 60 or 90 days, others we keep for ever Decision Analytics Pipeline (or RD Pipe) > 100TB+ of secondary data i ● Using feature-extraction machine learning methods
  • 8. Where are we today ● Still one Cassandra ring, just bigger and more stable, 16 nodes, 250M+ active user profiles ● Lambda Architecture for real-time products like WHY Analytics ● ● ● ● ● RD Pipe is the "batch" layer (daily) that generates active profiles as a cassandra ("view layer") Primary Events are enriched for user profiles produced daily by the Enrichment service ("speed layer") Combination of probabilistic counters and Redis cubes calculates the current audience profiles for subscribed websites ("speed layer") API on top of the Redis cubes serves the current audience profiles for the front end suite of real-time analytics products ("serving layer") Audience Analytics product suite is the good looking bit - http://www. visualdna.com/why/
  • 9. Where are we today ● 120-strong team, of that tech is roughly 60: ● ● ● ● ● Sysadmin Team Architecture Tech Team Decision Analytics Tech Team Consumer Tech Team WHY Analytics Team
  • 10. What have we learned ● Architecture: ● Updating json blobs in Cassandra columns is a trap ● Logging is better http://engineering.linkedin.com/distributed-systems/log-what-everysoftware-engineer-should-know-about-real-time-datas-unifying ● ● ● Metrics are crucial in large distributed systems ● yammer metrics + graphite + icinga works well for infrastructure ● but complex event/anomalies detection and pattern analysis gives the edge Real-Time processing of Data Streams is not only cool, but scales well ... until you find a bottleneck in a single component which will limit the entire system Batch still matters ● but could be much faster than Hadoop which falls on too much redundant I/O and requires a coordinated ETL pipeline
  • 11. What have we learned ● Engineering: ● ● the unix philosophy of building short, simple, clear, modular, and extendable code applies also to a design of distributed systems not just an OS bad tests are better than no tests but they are still bad and most tests only test positive outcome ● the story of Math.abs() -> actually can return negative number -> but none of the unit-tests anticipated this -> which is why metrics and systems with feedback control are crucial ● ● Process: ● ● It is possible to co-operate remotely even on complex and not-well defined systems - atm some of the architecture team is working remotely on permanent basis QA is intrinsic to Architecture and local to products
  • 12. Interesting issues we’re facing 1. SLAs vs. Start-up dynamics - Separate process (and to some degree architecture) for different levels of guarantee of service 2. Globally-distributed highly-available API for random access to our profiles - enabling decisions based on VDNA profiles on-demand 3. Our Lambda has a bottleneck at the enrichment point - although if we solve (2.) we will be half-way through 4. Complex data pooling attribution model 5. Cassandra still gives us some pain - it's the drivers! - interesting about consistency: http://aphyr.com/posts/294-call-me-maybe-cassandra/ 6. Preserving start-up dynamics and culture in a company of 200+ with offices in several cities
  • 13. We’re hiring for Bratislava office! ● We’re looking for engineers and analysts and more to be based in Bratislava careers-cee@visualdna.com