SlideShare uma empresa Scribd logo
1 de 121
Baixar para ler offline
It Probably Works
Tyler McMullen
CTO of Fastly
@tbmcmullen
Fastly
We’re an awesome CDN.
What is a probabilistic
algorithm?
Why bother?
–Hal Abelson and Gerald J. Sussman, SICP
“In testing primality of very large numbers chosen at
random, the chance of stumbling upon a value that
fools the Fermat test is less than the chance that
cosmic radiation will cause the computer to make
an error in carrying out a 'correct' algorithm.
Considering an algorithm to be inadequate for the
first reason but not for the second illustrates the
difference between mathematics and engineering.”
Everything is
probabilistic
Probabilistic algorithms
are not “guessing”
Provably bounded
error rates
Don’t use them if
someone’s life depends
on it.
Don’t use them if
someone’s life depends
on it.
When should I use
these?
Ok, now what?
The Count-distinct
Problem
The Problem
How many unique words are in a large
corpus of text?
The Problem
How many different users visited a
popular website in a day?
The Problem
How many unique IPs have connected to a
server over the last hour?
The Problem
How many unique URLs have been requested
through an HTTP proxy?
The Problem
How many unique URLs have been requested
through an entire network of HTTP proxies?
166.208.249.236	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /product/982"	
  200	
  20246	
  
103.138.203.165	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /article/490"	
  200	
  29870	
  
191.141.247.227	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "HEAD	
  /page/1"	
  200	
  20409	
  
150.255.232.154	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /page/1"	
  200	
  42999	
  
191.141.247.227	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /article/490"	
  200	
  25080	
  
150.255.232.154	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /listing/567"	
  200	
  33617	
  
103.138.203.165	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "HEAD	
  /listing/567"	
  200	
  29618	
  
191.141.247.227	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "HEAD	
  /page/1"	
  200	
  30265	
  
166.208.249.236	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /page/1"	
  200	
  5683	
  
244.210.202.222	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "HEAD	
  /article/490"	
  200	
  47124	
  
103.138.203.165	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "HEAD	
  /listing/567"	
  200	
  48734	
  
103.138.203.165	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /listing/567"	
  200	
  27392	
  
191.141.247.227	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /listing/567"	
  200	
  15705	
  
150.255.232.154	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /page/1"	
  200	
  22587	
  
244.210.202.222	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "HEAD	
  /product/982"	
  200	
  30063	
  
244.210.202.222	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /page/1"	
  200	
  6041	
  
166.208.249.236	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /product/982"	
  200	
  25783	
  
191.141.247.227	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /article/490"	
  200	
  1099	
  
244.210.202.222	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /product/982"	
  200	
  31494	
  
191.141.247.227	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /listing/567"	
  200	
  30389	
  
150.255.232.154	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /article/490"	
  200	
  10251	
  
191.141.247.227	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /product/982"	
  200	
  19384	
  
150.255.232.154	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "HEAD	
  /product/982"	
  200	
  24062	
  
244.210.202.222	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /article/490"	
  200	
  19070	
  
191.141.247.227	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /page/648"	
  200	
  45159	
  
191.141.247.227	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "HEAD	
  /page/648"	
  200	
  5576	
  
166.208.249.236	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /page/648"	
  200	
  41869	
  
166.208.249.236	
  -­‐	
  -­‐	
  [25/Feb/2015:07:20:13	
  +0000]	
  "GET	
  /listing/567"	
  200	
  42414	
  
def	
  count_distinct(stream):	
  
	
  	
  	
  	
  seen	
  =	
  set()	
  
	
  	
  	
  	
  for	
  item	
  in	
  stream:	
  
	
  	
  	
  	
  	
  	
  	
  	
  seen.add(item)	
  
	
  	
  	
  	
  return	
  len(seen)
Scale.
Count-distinct across
thousands of servers.
def	
  combined_cardinality(seen_sets):	
  
	
  	
  	
  	
  combined	
  =	
  set()	
  
	
  	
  	
  	
  for	
  seen	
  in	
  seen_sets:	
  
	
  	
  	
  	
  	
  	
  	
  	
  combined	
  |=	
  seen	
  
	
  	
  	
  	
  return	
  len(combined)
The set grows linearly.
Precision comes at a cost.
“example.com/user/page/1”
1	
  0	
  1	
  1	
  0	
  1	
  0	
  1
“example.com/user/page/1”
1	
  0	
  1	
  1	
  0	
  1	
  0	
  1
1	
  0	
  1	
  1	
  0	
  1	
  0	
  1
P(bit	
  0	
  is	
  set)	
  =	
  0.5
1	
  0	
  1	
  1	
  0	
  1	
  0	
  1
P(bit	
  0	
  is	
  set	
  
&	
  bit	
  1	
  is	
  set)	
  =	
  0.25
1	
  0	
  1	
  1	
  0	
  1	
  0	
  1
P(bit	
  0	
  is	
  set	
  	
  
&	
  bit	
  1	
  is	
  set	
  
&	
  bit	
  2	
  is	
  set)	
  =	
  0.125
1	
  0	
  1	
  1	
  0	
  1	
  0	
  1
P(bit	
  0	
  is	
  set)	
  =	
  0.5	
  
Expected	
  trials	
  =	
  2
1	
  0	
  1	
  1	
  0	
  1	
  0	
  1
P(bit	
  0	
  is	
  set	
  
&	
  bit	
  1	
  is	
  set)	
  =	
  0.25	
  
Expected	
  trials	
  =	
  4
1	
  0	
  1	
  1	
  0	
  1	
  0	
  1
P(bit	
  0	
  is	
  set	
  	
  
&	
  bit	
  1	
  is	
  set	
  
&	
  bit	
  2	
  is	
  set)	
  =	
  0.125	
  
Expected	
  trials	
  =	
  8
We expect the maximum number of leading
zeros we have seen + 1 to approximate
log2(unique items).
Improve the accuracy of
the estimate by
partitioning the input data.
class	
  LogLog(object):	
  
	
  	
  	
  	
  def	
  __init__(self,	
  k):	
  
	
  	
  	
  	
  	
  	
  	
  	
  self.k	
  =	
  k	
  
	
  	
  	
  	
  	
  	
  	
  	
  self.m	
  =	
  2	
  **	
  k	
  
	
  	
  	
  	
  	
  	
  	
  	
  self.M	
  =	
  np.zeros(self.m,	
  dtype=np.int)	
  
	
  	
  	
  	
  	
  	
  	
  	
  self.alpha	
  =	
  Alpha[k]	
  
	
  	
  
	
  	
  	
  	
  def	
  insert(self,	
  token):	
  
	
  	
  	
  	
  	
  	
  	
  	
  y	
  =	
  hash_fn(token)	
  
	
  	
  	
  	
  	
  	
  	
  	
  j	
  =	
  y	
  >>	
  (hash_len	
  -­‐	
  self.k)	
  
	
  	
  	
  	
  	
  	
  	
  	
  remaining	
  =	
  y	
  &	
  ((1	
  <<	
  (hash_len	
  -­‐	
  self.k))	
  -­‐	
  1)	
  
	
  	
  	
  	
  	
  	
  	
  	
  first_set_bit	
  =	
  (64	
  -­‐	
  self.k)	
  -­‐	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  int(math.log(remaining,	
  2))	
  
	
  	
  	
  	
  	
  	
  	
  	
  self.M[j]	
  =	
  max(self.M[j],	
  first_set_bit)	
  
	
  	
  
	
  	
  	
  	
  def	
  cardinality(self):	
  
	
  	
  	
  	
  	
  	
  	
  	
  return	
  self.alpha	
  *	
  2	
  **	
  np.mean(self.M)
Unions of
HyperLogLogs
HyperLogLog
• Adding an item: O(1)
• Retrieving cardinality: O(1)
• Space: O(log log n)
• Error rate: 2%
HyperLogLog
For 100 million unique items,
and an error rate of 2%,
the size of the HyperLogLog is...
1,500 bytes
Reliable Broadcast
The Problem
Reliably broadcast “purge” messages across the world
as quickly as possible.
Single source of
truth
Single source of
failures
Atomic broadcast
Reliable broadcast
Gossip Protocols
“Designed for Scale”
Probabilistic Guarantees
Bimodal Multicast
• Quickly broadcast message to all servers
• Gossip to recover lost messages
One Problem
Computers have limited space
Throw away messages
“with high probability” is fine
Real World
End-to-End Latency
74ms
83ms
133ms
London
San Jose
Tokyo
0.00
0.00
0.05
0.10
0.00
0.05
0.10
0.00
0.05
0.10
0 50 100 150
Latency (ms)
Density
End-to-End Latency
42ms
74ms
83ms
133ms
New York
London
San Jose
Tokyo
0.00
0.05
0.10
0.00
0.05
0.10
0.00
0.05
0.10
0.00
0.05
0.10
0 50 100 150
Latency (ms)
Density
Density plot and 95th
percentile of purge latency by server location
Packet Loss
Good systems are boring
What was the point again?
We can build things that
are otherwise
unrealistic
We can build systems
that are more
reliable
You’re already using them.
We’re hiring!
Thanks
@tbmcmullen
What even is this?
Probabilistic Algorithms
Randomized Algorithms
Estimation Algorithms
Probabilistic Algorithms
1. An iota of theory
2. Where are they useful and where are they not?
3. HyperLogLog
4. Locality-sensitive Hashing
5. Bimodal Multicast
“An algorithm that uses randomness to improve
its efficiency”
Las Vegas
Monte Carlo
Las Vegas
def	
  find_las_vegas(haystack,	
  needle):	
  
	
  	
  	
  	
  length	
  =	
  len(haystack)	
  
	
  	
  	
  	
  while	
  True:	
  
	
  	
  	
  	
  	
  	
  	
  	
  index	
  =	
  randrange(length)	
  
	
  	
  	
  	
  	
  	
  	
  	
  if	
  haystack[index]	
  ==	
  needle:	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  return	
  index
Monte Carlo
def	
  find_monte_carlo(haystack,	
  needle,	
  k):	
  
	
  	
  	
  	
  length	
  =	
  len(haystack)	
  
	
  	
  	
  	
  for	
  i	
  in	
  range(k):	
  
	
  	
  	
  	
  	
  	
  	
  	
  index	
  =	
  randrange(length)	
  
	
  	
  	
  	
  	
  	
  	
  	
  if	
  haystack[index]	
  ==	
  needle:	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  return	
  index
– Prabhakar Raghavan (author of Randomized Algorithms)
“For many problems a randomized
algorithm is the simplest the
fastest or both.”
Naive Solution
For 100 million unique IPv4 addresses,
the size of the hash is...
>400mb
Slightly Less Naive
Add each IP to a bloom filter and keep a
counter of the IPs that don’t collide.
Slightly Less Naive
ips_seen	
  =	
  BloomFilter(capacity=expected_size,	
  error_rate=0.03)	
  
counter	
  =	
  0	
  
	
  	
  
for	
  line	
  in	
  log_file:	
  
	
  	
  	
  	
  ip	
  =	
  extract_ip(line)	
  
	
  	
  	
  	
  if	
  items_bloom.add(ip):	
  
	
  	
  	
  	
  	
  	
  	
  	
  counter	
  +=	
  1	
  
	
  	
  
print	
  "Unique	
  IPs:",	
  counter
Slightly Less Naive
• Adding an IP: O(1)
• Retrieving cardinality: O(1)
• Space: O(n)
• Error rate: 3%
kind of
Slightly Less Naive
For 100 million unique IPv4 addresses,
and an error rate of 3%,
the size of the bloom filter is...
87mb
def	
  insert(self,	
  token):	
  
	
  	
  	
  	
  #	
  Get	
  hash	
  of	
  token	
  
	
  	
  	
  	
  y	
  =	
  hash_fn(token)	
  
	
  	
  
	
  	
  	
  	
  #	
  Extract	
  `k`	
  most	
  significant	
  bits	
  of	
  `y`	
  
	
  	
  	
  	
  j	
  =	
  y	
  >>	
  (hash_len	
  -­‐	
  self.k)	
  
	
  	
  
	
  	
  	
  	
  #	
  Extract	
  remaining	
  bits	
  of	
  `y`	
  
	
  	
  	
  	
  remaining	
  =	
  y	
  &	
  ((1	
  <<	
  (hash_len	
  -­‐	
  self.k))	
  -­‐	
  1)	
  
	
  	
  
	
  	
  	
  	
  #	
  Find	
  "first"	
  set	
  bit	
  of	
  `remaining`	
  
	
  	
  	
  	
  first_set_bit	
  =	
  (64	
  -­‐	
  self.k)	
  -­‐	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  int(math.log(remaining,	
  2))	
  
	
  	
  
	
  	
  	
  	
  #	
  Update	
  `M[j]`	
  to	
  max	
  of	
  `first_set_bit`	
  
	
  	
  	
  	
  #	
  and	
  existing	
  value	
  of	
  `M[j]`	
  
	
  	
  	
  	
  self.M[j]	
  =	
  max(self.M[j],	
  first_set_bit)
def	
  cardinality(self):	
  
	
  	
  	
  	
  #	
  The	
  mean	
  of	
  `M`	
  estimates	
  `log2(n)`	
  with	
  
	
  	
  	
  	
  #	
  an	
  additive	
  bias	
  
	
  	
  	
  	
  return	
  self.alpha	
  *	
  2	
  **	
  np.mean(self.M)
The Problem
Find documents that are similar to one
specific document.
The Problem
Find images that are similar to one
specific image.
The Problem
Find graphs that are correlated to one
specific graph.
The Problem
Nearest neighbor search.
The Problem
“Find the n closest points in a d-dimensional space.”
The Problem
You have a bunch of things and you want to
figure out which ones are similar.
“There has been a lot of recent work on streaming algorithms,
i.e. algorithms that produce an output by making
one pass (or a few passes) over the data while using a
limited amount of storage space and time. To cite a few
examples, ...”
{	
  "there":	
  1,	
  "has":	
  1,	
  "been":	
  	
  1,	
  “a":4,	
  
	
  	
  "lot":	
  	
  	
  1,	
  "of":	
  	
  2,	
  "recent":1,	
  ...	
  }
• Cosine similarity
• Jaccard similarity
• Euclidian distance
• etc etc etc
Euclidian Distance
Metric space
kd-trees
Curse of Dimensionality
Locality-sensitive hashing
Locality-Sensitive Hashing for Finding Nearest Neighbors
- Slaney and Casey
Random Hyperplanes
{	
  "there":	
  1,	
  "has":	
  1,	
  "been":	
  	
  1,	
  “a":4,	
  
	
  	
  "lot":	
  	
  	
  1,	
  "of":	
  	
  2,	
  "recent":1,	
  ...	
  }
LSH
0	
  1	
  1	
  1	
  0	
  1	
  0	
  0	
  0	
  0	
  1	
  0	
  1	
  0	
  1	
  0	
  ...
Cosine Similarity
LSH
Firewall Partition
DDoS
• `

Mais conteúdo relacionado

Mais procurados

Network Analysis with networkX : Real-World Example-1
Network Analysis with networkX : Real-World Example-1Network Analysis with networkX : Real-World Example-1
Network Analysis with networkX : Real-World Example-1Kyunghoon Kim
 
Time Series Analysis for Network Secruity
Time Series Analysis for Network SecruityTime Series Analysis for Network Secruity
Time Series Analysis for Network Secruitymrphilroth
 
Machine Learning Model Bakeoff
Machine Learning Model BakeoffMachine Learning Model Bakeoff
Machine Learning Model Bakeoffmrphilroth
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.Albert Bifet
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupSri Ambati
 
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard World
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard WorldMonitoring Complex Systems: Keeping Your Head on Straight in a Hard World
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard WorldBrian Troutwine
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data ScienceAlbert Bifet
 
Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Brian Troutwine
 
Transfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning SystemsTransfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning SystemsPooyan Jamshidi
 
[243] turning data into value
[243] turning data into value[243] turning data into value
[243] turning data into valueNAVER D2
 
Weather of the Century: Visualization
Weather of the Century: VisualizationWeather of the Century: Visualization
Weather of the Century: VisualizationMongoDB
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Practical Ai Class 3
Practical Ai Class 3Practical Ai Class 3
Practical Ai Class 3Oliver Zhang
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOpsPooyan Jamshidi
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
The Weather of the Century
The Weather of the CenturyThe Weather of the Century
The Weather of the CenturyMongoDB
 
The Weather of the Century Part 3: Visualization
The Weather of the Century Part 3: VisualizationThe Weather of the Century Part 3: Visualization
The Weather of the Century Part 3: VisualizationMongoDB
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 

Mais procurados (20)

Network Analysis with networkX : Real-World Example-1
Network Analysis with networkX : Real-World Example-1Network Analysis with networkX : Real-World Example-1
Network Analysis with networkX : Real-World Example-1
 
Time Series Analysis for Network Secruity
Time Series Analysis for Network SecruityTime Series Analysis for Network Secruity
Time Series Analysis for Network Secruity
 
Machine Learning Model Bakeoff
Machine Learning Model BakeoffMachine Learning Model Bakeoff
Machine Learning Model Bakeoff
 
Ember
EmberEmber
Ember
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta Meetup
 
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard World
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard WorldMonitoring Complex Systems: Keeping Your Head on Straight in a Hard World
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard World
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data Science
 
Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014
 
Transfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning SystemsTransfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning Systems
 
Logs
LogsLogs
Logs
 
[243] turning data into value
[243] turning data into value[243] turning data into value
[243] turning data into value
 
Weather of the Century: Visualization
Weather of the Century: VisualizationWeather of the Century: Visualization
Weather of the Century: Visualization
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Practical Ai Class 3
Practical Ai Class 3Practical Ai Class 3
Practical Ai Class 3
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
The Weather of the Century
The Weather of the CenturyThe Weather of the Century
The Weather of the Century
 
The Weather of the Century Part 3: Visualization
The Weather of the Century Part 3: VisualizationThe Weather of the Century Part 3: Visualization
The Weather of the Century Part 3: Visualization
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 

Destaque

Performance Measuring & Monitoring - Catchpoint CEO Mehdi Daoudi at Fastly Al...
Performance Measuring & Monitoring - Catchpoint CEO Mehdi Daoudi at Fastly Al...Performance Measuring & Monitoring - Catchpoint CEO Mehdi Daoudi at Fastly Al...
Performance Measuring & Monitoring - Catchpoint CEO Mehdi Daoudi at Fastly Al...Fastly
 
Applying Varnish
Applying VarnishApplying Varnish
Applying VarnishFastly
 
Beyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic AnalysisBeyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic AnalysisFastly
 
Confident Refactoring - Ember SF Meetup
Confident Refactoring - Ember SF MeetupConfident Refactoring - Ember SF Meetup
Confident Refactoring - Ember SF MeetupFastly
 
Design & Performance - Steve Souders at Fastly Altitude 2015
Design & Performance - Steve Souders at Fastly Altitude 2015Design & Performance - Steve Souders at Fastly Altitude 2015
Design & Performance - Steve Souders at Fastly Altitude 2015Fastly
 
The Fallacy of Fast - Ines Sombra at Fastly Altitude 2015
The Fallacy of Fast - Ines Sombra at Fastly Altitude 2015The Fallacy of Fast - Ines Sombra at Fastly Altitude 2015
The Fallacy of Fast - Ines Sombra at Fastly Altitude 2015Fastly
 
Fallacy of Fast
Fallacy of FastFallacy of Fast
Fallacy of FastFastly
 
Top 5 Things I've Messed Up in Live Streaming
Top 5 Things I've Messed Up in Live StreamingTop 5 Things I've Messed Up in Live Streaming
Top 5 Things I've Messed Up in Live StreamingFastly
 
Advanced VCL Workshop - Rogier Mulhuijzen and Stephen Basile at Fastly Altitu...
Advanced VCL Workshop - Rogier Mulhuijzen and Stephen Basile at Fastly Altitu...Advanced VCL Workshop - Rogier Mulhuijzen and Stephen Basile at Fastly Altitu...
Advanced VCL Workshop - Rogier Mulhuijzen and Stephen Basile at Fastly Altitu...Fastly
 
It Probably Works
It Probably WorksIt Probably Works
It Probably WorksFastly
 
Developing a Globally Distributed Purging System
Developing a Globally Distributed Purging SystemDeveloping a Globally Distributed Purging System
Developing a Globally Distributed Purging SystemFastly
 
Interative Traffic Engineering in Changing Internet Economics - Tom Daly at L...
Interative Traffic Engineering in Changing Internet Economics - Tom Daly at L...Interative Traffic Engineering in Changing Internet Economics - Tom Daly at L...
Interative Traffic Engineering in Changing Internet Economics - Tom Daly at L...Fastly
 
What we can learn from CDNs about Web Development, Deployment, and Performance
What we can learn from CDNs about Web Development, Deployment, and PerformanceWhat we can learn from CDNs about Web Development, Deployment, and Performance
What we can learn from CDNs about Web Development, Deployment, and PerformanceFastly
 
Rails Caching: Secrets From the Edge
Rails Caching: Secrets From the EdgeRails Caching: Secrets From the Edge
Rails Caching: Secrets From the EdgeFastly
 
Building a better web
Building a better webBuilding a better web
Building a better webFastly
 
The future of the edge
The future of the edge The future of the edge
The future of the edge Fastly
 
Migrating Target to Fastly - Eddie Roger at Fastly Altitude 2015
Migrating Target to Fastly - Eddie Roger at Fastly Altitude 2015Migrating Target to Fastly - Eddie Roger at Fastly Altitude 2015
Migrating Target to Fastly - Eddie Roger at Fastly Altitude 2015Fastly
 
Making ops life easier
Making ops life easierMaking ops life easier
Making ops life easierFastly
 
From Zero to Capacity Planning
From Zero to Capacity PlanningFrom Zero to Capacity Planning
From Zero to Capacity PlanningFastly
 
Debugging Your CDN - Austin Spires at Fastly Altitude 2015
Debugging Your CDN - Austin Spires at Fastly Altitude 2015Debugging Your CDN - Austin Spires at Fastly Altitude 2015
Debugging Your CDN - Austin Spires at Fastly Altitude 2015Fastly
 

Destaque (20)

Performance Measuring & Monitoring - Catchpoint CEO Mehdi Daoudi at Fastly Al...
Performance Measuring & Monitoring - Catchpoint CEO Mehdi Daoudi at Fastly Al...Performance Measuring & Monitoring - Catchpoint CEO Mehdi Daoudi at Fastly Al...
Performance Measuring & Monitoring - Catchpoint CEO Mehdi Daoudi at Fastly Al...
 
Applying Varnish
Applying VarnishApplying Varnish
Applying Varnish
 
Beyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic AnalysisBeyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic Analysis
 
Confident Refactoring - Ember SF Meetup
Confident Refactoring - Ember SF MeetupConfident Refactoring - Ember SF Meetup
Confident Refactoring - Ember SF Meetup
 
Design & Performance - Steve Souders at Fastly Altitude 2015
Design & Performance - Steve Souders at Fastly Altitude 2015Design & Performance - Steve Souders at Fastly Altitude 2015
Design & Performance - Steve Souders at Fastly Altitude 2015
 
The Fallacy of Fast - Ines Sombra at Fastly Altitude 2015
The Fallacy of Fast - Ines Sombra at Fastly Altitude 2015The Fallacy of Fast - Ines Sombra at Fastly Altitude 2015
The Fallacy of Fast - Ines Sombra at Fastly Altitude 2015
 
Fallacy of Fast
Fallacy of FastFallacy of Fast
Fallacy of Fast
 
Top 5 Things I've Messed Up in Live Streaming
Top 5 Things I've Messed Up in Live StreamingTop 5 Things I've Messed Up in Live Streaming
Top 5 Things I've Messed Up in Live Streaming
 
Advanced VCL Workshop - Rogier Mulhuijzen and Stephen Basile at Fastly Altitu...
Advanced VCL Workshop - Rogier Mulhuijzen and Stephen Basile at Fastly Altitu...Advanced VCL Workshop - Rogier Mulhuijzen and Stephen Basile at Fastly Altitu...
Advanced VCL Workshop - Rogier Mulhuijzen and Stephen Basile at Fastly Altitu...
 
It Probably Works
It Probably WorksIt Probably Works
It Probably Works
 
Developing a Globally Distributed Purging System
Developing a Globally Distributed Purging SystemDeveloping a Globally Distributed Purging System
Developing a Globally Distributed Purging System
 
Interative Traffic Engineering in Changing Internet Economics - Tom Daly at L...
Interative Traffic Engineering in Changing Internet Economics - Tom Daly at L...Interative Traffic Engineering in Changing Internet Economics - Tom Daly at L...
Interative Traffic Engineering in Changing Internet Economics - Tom Daly at L...
 
What we can learn from CDNs about Web Development, Deployment, and Performance
What we can learn from CDNs about Web Development, Deployment, and PerformanceWhat we can learn from CDNs about Web Development, Deployment, and Performance
What we can learn from CDNs about Web Development, Deployment, and Performance
 
Rails Caching: Secrets From the Edge
Rails Caching: Secrets From the EdgeRails Caching: Secrets From the Edge
Rails Caching: Secrets From the Edge
 
Building a better web
Building a better webBuilding a better web
Building a better web
 
The future of the edge
The future of the edge The future of the edge
The future of the edge
 
Migrating Target to Fastly - Eddie Roger at Fastly Altitude 2015
Migrating Target to Fastly - Eddie Roger at Fastly Altitude 2015Migrating Target to Fastly - Eddie Roger at Fastly Altitude 2015
Migrating Target to Fastly - Eddie Roger at Fastly Altitude 2015
 
Making ops life easier
Making ops life easierMaking ops life easier
Making ops life easier
 
From Zero to Capacity Planning
From Zero to Capacity PlanningFrom Zero to Capacity Planning
From Zero to Capacity Planning
 
Debugging Your CDN - Austin Spires at Fastly Altitude 2015
Debugging Your CDN - Austin Spires at Fastly Altitude 2015Debugging Your CDN - Austin Spires at Fastly Altitude 2015
Debugging Your CDN - Austin Spires at Fastly Altitude 2015
 

Semelhante a Probabilistic algorithms estimate unique items

Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to AlgorithmsVenkatesh Iyer
 
Basics in algorithms and data structure
Basics in algorithms and data structure Basics in algorithms and data structure
Basics in algorithms and data structure Eman magdy
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Brian Brazil
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"NUS-ISS
 
And Then There Are Algorithms
And Then There Are AlgorithmsAnd Then There Are Algorithms
And Then There Are AlgorithmsInfluxData
 
MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The Cloud
MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The CloudMongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The Cloud
MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The CloudMongoDB
 
Lunch session: Quantum Computing
Lunch session: Quantum ComputingLunch session: Quantum Computing
Lunch session: Quantum ComputingRolf Huisman
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsthelabdude
 
Visual diagnostics at scale
Visual diagnostics at scaleVisual diagnostics at scale
Visual diagnostics at scaleRebecca Bilbro
 
Microsoft azure data fundamentals (dp 900) practice tests 2022
Microsoft azure data fundamentals (dp 900) practice tests 2022Microsoft azure data fundamentals (dp 900) practice tests 2022
Microsoft azure data fundamentals (dp 900) practice tests 2022SkillCertProExams
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
Lecture 2 - Bit vs Qubits.pptx
Lecture 2 - Bit vs Qubits.pptxLecture 2 - Bit vs Qubits.pptx
Lecture 2 - Bit vs Qubits.pptxNatKell
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsDebasish Ghosh
 
ScalaCheck
ScalaCheckScalaCheck
ScalaCheckBeScala
 
Akka with Scala
Akka with ScalaAkka with Scala
Akka with ScalaOto Brglez
 
Types and Immutability: why you should care
Types and Immutability: why you should careTypes and Immutability: why you should care
Types and Immutability: why you should careJean Carlo Emer
 
Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015lpgauth
 

Semelhante a Probabilistic algorithms estimate unique items (20)

Apex code benchmarking
Apex code benchmarkingApex code benchmarking
Apex code benchmarking
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to Algorithms
 
Basics in algorithms and data structure
Basics in algorithms and data structure Basics in algorithms and data structure
Basics in algorithms and data structure
 
Conf orm - explain
Conf orm - explainConf orm - explain
Conf orm - explain
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"
 
And Then There Are Algorithms
And Then There Are AlgorithmsAnd Then There Are Algorithms
And Then There Are Algorithms
 
MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The Cloud
MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The CloudMongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The Cloud
MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The Cloud
 
Lunch session: Quantum Computing
Lunch session: Quantum ComputingLunch session: Quantum Computing
Lunch session: Quantum Computing
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Visual diagnostics at scale
Visual diagnostics at scaleVisual diagnostics at scale
Visual diagnostics at scale
 
Microsoft azure data fundamentals (dp 900) practice tests 2022
Microsoft azure data fundamentals (dp 900) practice tests 2022Microsoft azure data fundamentals (dp 900) practice tests 2022
Microsoft azure data fundamentals (dp 900) practice tests 2022
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Lecture 2 - Bit vs Qubits.pptx
Lecture 2 - Bit vs Qubits.pptxLecture 2 - Bit vs Qubits.pptx
Lecture 2 - Bit vs Qubits.pptx
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
ScalaCheck
ScalaCheckScalaCheck
ScalaCheck
 
Akka with Scala
Akka with ScalaAkka with Scala
Akka with Scala
 
Types and Immutability: why you should care
Types and Immutability: why you should careTypes and Immutability: why you should care
Types and Immutability: why you should care
 
Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015
 

Mais de Fastly

Revisiting HTTP/2
Revisiting HTTP/2Revisiting HTTP/2
Revisiting HTTP/2Fastly
 
Altitude San Francisco 2018: Preparing for Video Streaming Events at Scale
Altitude San Francisco 2018: Preparing for Video Streaming Events at ScaleAltitude San Francisco 2018: Preparing for Video Streaming Events at Scale
Altitude San Francisco 2018: Preparing for Video Streaming Events at ScaleFastly
 
Altitude San Francisco 2018: Building the Souther Hemisphere of the Internet
Altitude San Francisco 2018: Building the Souther Hemisphere of the InternetAltitude San Francisco 2018: Building the Souther Hemisphere of the Internet
Altitude San Francisco 2018: Building the Souther Hemisphere of the InternetFastly
 
Altitude San Francisco 2018: The World Cup Stream
Altitude San Francisco 2018: The World Cup StreamAltitude San Francisco 2018: The World Cup Stream
Altitude San Francisco 2018: The World Cup StreamFastly
 
Altitude San Francisco 2018: We Own Our Destiny
Altitude San Francisco 2018: We Own Our DestinyAltitude San Francisco 2018: We Own Our Destiny
Altitude San Francisco 2018: We Own Our DestinyFastly
 
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...Fastly
 
Altitude San Francisco 2018: Moving Off the Monolith: A Seamless Migration
Altitude San Francisco 2018: Moving Off the Monolith: A Seamless MigrationAltitude San Francisco 2018: Moving Off the Monolith: A Seamless Migration
Altitude San Francisco 2018: Moving Off the Monolith: A Seamless MigrationFastly
 
Altitude San Francisco 2018: Bringing TLS to GitHub Pages
Altitude San Francisco 2018: Bringing TLS to GitHub PagesAltitude San Francisco 2018: Bringing TLS to GitHub Pages
Altitude San Francisco 2018: Bringing TLS to GitHub PagesFastly
 
Altitude San Francisco 2018: HTTP Invalidation Workshop
Altitude San Francisco 2018: HTTP Invalidation WorkshopAltitude San Francisco 2018: HTTP Invalidation Workshop
Altitude San Francisco 2018: HTTP Invalidation WorkshopFastly
 
Altitude San Francisco 2018: HTTP/2 Tales: Discovery and Woe
Altitude San Francisco 2018: HTTP/2 Tales: Discovery and WoeAltitude San Francisco 2018: HTTP/2 Tales: Discovery and Woe
Altitude San Francisco 2018: HTTP/2 Tales: Discovery and WoeFastly
 
Altitude San Francisco 2018: How Magento moved to the cloud while maintaining...
Altitude San Francisco 2018: How Magento moved to the cloud while maintaining...Altitude San Francisco 2018: How Magento moved to the cloud while maintaining...
Altitude San Francisco 2018: How Magento moved to the cloud while maintaining...Fastly
 
Altitude San Francisco 2018: Scaling Ethereum to 10B requests per day
Altitude San Francisco 2018: Scaling Ethereum to 10B requests per dayAltitude San Francisco 2018: Scaling Ethereum to 10B requests per day
Altitude San Francisco 2018: Scaling Ethereum to 10B requests per dayFastly
 
Altitude San Francisco 2018: Authentication at the Edge
Altitude San Francisco 2018: Authentication at the EdgeAltitude San Francisco 2018: Authentication at the Edge
Altitude San Francisco 2018: Authentication at the EdgeFastly
 
Altitude San Francisco 2018: WebAssembly Tools & Applications
Altitude San Francisco 2018: WebAssembly Tools & ApplicationsAltitude San Francisco 2018: WebAssembly Tools & Applications
Altitude San Francisco 2018: WebAssembly Tools & ApplicationsFastly
 
Altitude San Francisco 2018: Testing with Fastly Workshop
Altitude San Francisco 2018: Testing with Fastly WorkshopAltitude San Francisco 2018: Testing with Fastly Workshop
Altitude San Francisco 2018: Testing with Fastly WorkshopFastly
 
Altitude San Francisco 2018: Fastly Purge Control at the USA TODAY NETWORK
Altitude San Francisco 2018: Fastly Purge Control at the USA TODAY NETWORKAltitude San Francisco 2018: Fastly Purge Control at the USA TODAY NETWORK
Altitude San Francisco 2018: Fastly Purge Control at the USA TODAY NETWORKFastly
 
Altitude San Francisco 2018: WAF Workshop
Altitude San Francisco 2018: WAF WorkshopAltitude San Francisco 2018: WAF Workshop
Altitude San Francisco 2018: WAF WorkshopFastly
 
Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge Fastly
 
Altitude San Francisco 2018: Video Workshop Docs
Altitude San Francisco 2018: Video Workshop DocsAltitude San Francisco 2018: Video Workshop Docs
Altitude San Francisco 2018: Video Workshop DocsFastly
 
Altitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the EdgeAltitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the EdgeFastly
 

Mais de Fastly (20)

Revisiting HTTP/2
Revisiting HTTP/2Revisiting HTTP/2
Revisiting HTTP/2
 
Altitude San Francisco 2018: Preparing for Video Streaming Events at Scale
Altitude San Francisco 2018: Preparing for Video Streaming Events at ScaleAltitude San Francisco 2018: Preparing for Video Streaming Events at Scale
Altitude San Francisco 2018: Preparing for Video Streaming Events at Scale
 
Altitude San Francisco 2018: Building the Souther Hemisphere of the Internet
Altitude San Francisco 2018: Building the Souther Hemisphere of the InternetAltitude San Francisco 2018: Building the Souther Hemisphere of the Internet
Altitude San Francisco 2018: Building the Souther Hemisphere of the Internet
 
Altitude San Francisco 2018: The World Cup Stream
Altitude San Francisco 2018: The World Cup StreamAltitude San Francisco 2018: The World Cup Stream
Altitude San Francisco 2018: The World Cup Stream
 
Altitude San Francisco 2018: We Own Our Destiny
Altitude San Francisco 2018: We Own Our DestinyAltitude San Francisco 2018: We Own Our Destiny
Altitude San Francisco 2018: We Own Our Destiny
 
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
 
Altitude San Francisco 2018: Moving Off the Monolith: A Seamless Migration
Altitude San Francisco 2018: Moving Off the Monolith: A Seamless MigrationAltitude San Francisco 2018: Moving Off the Monolith: A Seamless Migration
Altitude San Francisco 2018: Moving Off the Monolith: A Seamless Migration
 
Altitude San Francisco 2018: Bringing TLS to GitHub Pages
Altitude San Francisco 2018: Bringing TLS to GitHub PagesAltitude San Francisco 2018: Bringing TLS to GitHub Pages
Altitude San Francisco 2018: Bringing TLS to GitHub Pages
 
Altitude San Francisco 2018: HTTP Invalidation Workshop
Altitude San Francisco 2018: HTTP Invalidation WorkshopAltitude San Francisco 2018: HTTP Invalidation Workshop
Altitude San Francisco 2018: HTTP Invalidation Workshop
 
Altitude San Francisco 2018: HTTP/2 Tales: Discovery and Woe
Altitude San Francisco 2018: HTTP/2 Tales: Discovery and WoeAltitude San Francisco 2018: HTTP/2 Tales: Discovery and Woe
Altitude San Francisco 2018: HTTP/2 Tales: Discovery and Woe
 
Altitude San Francisco 2018: How Magento moved to the cloud while maintaining...
Altitude San Francisco 2018: How Magento moved to the cloud while maintaining...Altitude San Francisco 2018: How Magento moved to the cloud while maintaining...
Altitude San Francisco 2018: How Magento moved to the cloud while maintaining...
 
Altitude San Francisco 2018: Scaling Ethereum to 10B requests per day
Altitude San Francisco 2018: Scaling Ethereum to 10B requests per dayAltitude San Francisco 2018: Scaling Ethereum to 10B requests per day
Altitude San Francisco 2018: Scaling Ethereum to 10B requests per day
 
Altitude San Francisco 2018: Authentication at the Edge
Altitude San Francisco 2018: Authentication at the EdgeAltitude San Francisco 2018: Authentication at the Edge
Altitude San Francisco 2018: Authentication at the Edge
 
Altitude San Francisco 2018: WebAssembly Tools & Applications
Altitude San Francisco 2018: WebAssembly Tools & ApplicationsAltitude San Francisco 2018: WebAssembly Tools & Applications
Altitude San Francisco 2018: WebAssembly Tools & Applications
 
Altitude San Francisco 2018: Testing with Fastly Workshop
Altitude San Francisco 2018: Testing with Fastly WorkshopAltitude San Francisco 2018: Testing with Fastly Workshop
Altitude San Francisco 2018: Testing with Fastly Workshop
 
Altitude San Francisco 2018: Fastly Purge Control at the USA TODAY NETWORK
Altitude San Francisco 2018: Fastly Purge Control at the USA TODAY NETWORKAltitude San Francisco 2018: Fastly Purge Control at the USA TODAY NETWORK
Altitude San Francisco 2018: Fastly Purge Control at the USA TODAY NETWORK
 
Altitude San Francisco 2018: WAF Workshop
Altitude San Francisco 2018: WAF WorkshopAltitude San Francisco 2018: WAF Workshop
Altitude San Francisco 2018: WAF Workshop
 
Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge
 
Altitude San Francisco 2018: Video Workshop Docs
Altitude San Francisco 2018: Video Workshop DocsAltitude San Francisco 2018: Video Workshop Docs
Altitude San Francisco 2018: Video Workshop Docs
 
Altitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the EdgeAltitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the Edge
 

Último

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Último (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Probabilistic algorithms estimate unique items

  • 2. Tyler McMullen CTO of Fastly @tbmcmullen
  • 4. What is a probabilistic algorithm?
  • 6. –Hal Abelson and Gerald J. Sussman, SICP “In testing primality of very large numbers chosen at random, the chance of stumbling upon a value that fools the Fermat test is less than the chance that cosmic radiation will cause the computer to make an error in carrying out a 'correct' algorithm. Considering an algorithm to be inadequate for the first reason but not for the second illustrates the difference between mathematics and engineering.”
  • 10. Don’t use them if someone’s life depends on it.
  • 11. Don’t use them if someone’s life depends on it.
  • 12. When should I use these?
  • 15. The Problem How many unique words are in a large corpus of text?
  • 16. The Problem How many different users visited a popular website in a day?
  • 17. The Problem How many unique IPs have connected to a server over the last hour?
  • 18. The Problem How many unique URLs have been requested through an HTTP proxy?
  • 19. The Problem How many unique URLs have been requested through an entire network of HTTP proxies?
  • 20. 166.208.249.236  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /product/982"  200  20246   103.138.203.165  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /article/490"  200  29870   191.141.247.227  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "HEAD  /page/1"  200  20409   150.255.232.154  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /page/1"  200  42999   191.141.247.227  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /article/490"  200  25080   150.255.232.154  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /listing/567"  200  33617   103.138.203.165  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "HEAD  /listing/567"  200  29618   191.141.247.227  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "HEAD  /page/1"  200  30265   166.208.249.236  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /page/1"  200  5683   244.210.202.222  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "HEAD  /article/490"  200  47124   103.138.203.165  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "HEAD  /listing/567"  200  48734   103.138.203.165  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /listing/567"  200  27392   191.141.247.227  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /listing/567"  200  15705   150.255.232.154  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /page/1"  200  22587   244.210.202.222  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "HEAD  /product/982"  200  30063   244.210.202.222  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /page/1"  200  6041   166.208.249.236  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /product/982"  200  25783   191.141.247.227  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /article/490"  200  1099   244.210.202.222  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /product/982"  200  31494   191.141.247.227  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /listing/567"  200  30389   150.255.232.154  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /article/490"  200  10251   191.141.247.227  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /product/982"  200  19384   150.255.232.154  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "HEAD  /product/982"  200  24062   244.210.202.222  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /article/490"  200  19070   191.141.247.227  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /page/648"  200  45159   191.141.247.227  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "HEAD  /page/648"  200  5576   166.208.249.236  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /page/648"  200  41869   166.208.249.236  -­‐  -­‐  [25/Feb/2015:07:20:13  +0000]  "GET  /listing/567"  200  42414  
  • 21. def  count_distinct(stream):          seen  =  set()          for  item  in  stream:                  seen.add(item)          return  len(seen)
  • 24. def  combined_cardinality(seen_sets):          combined  =  set()          for  seen  in  seen_sets:                  combined  |=  seen          return  len(combined)
  • 25. The set grows linearly.
  • 27.
  • 28.
  • 30. 1  0  1  1  0  1  0  1 “example.com/user/page/1”
  • 31. 1  0  1  1  0  1  0  1
  • 32. 1  0  1  1  0  1  0  1 P(bit  0  is  set)  =  0.5
  • 33. 1  0  1  1  0  1  0  1 P(bit  0  is  set   &  bit  1  is  set)  =  0.25
  • 34. 1  0  1  1  0  1  0  1 P(bit  0  is  set     &  bit  1  is  set   &  bit  2  is  set)  =  0.125
  • 35. 1  0  1  1  0  1  0  1 P(bit  0  is  set)  =  0.5   Expected  trials  =  2
  • 36. 1  0  1  1  0  1  0  1 P(bit  0  is  set   &  bit  1  is  set)  =  0.25   Expected  trials  =  4
  • 37. 1  0  1  1  0  1  0  1 P(bit  0  is  set     &  bit  1  is  set   &  bit  2  is  set)  =  0.125   Expected  trials  =  8
  • 38. We expect the maximum number of leading zeros we have seen + 1 to approximate log2(unique items).
  • 39. Improve the accuracy of the estimate by partitioning the input data.
  • 40. class  LogLog(object):          def  __init__(self,  k):                  self.k  =  k                  self.m  =  2  **  k                  self.M  =  np.zeros(self.m,  dtype=np.int)                  self.alpha  =  Alpha[k]              def  insert(self,  token):                  y  =  hash_fn(token)                  j  =  y  >>  (hash_len  -­‐  self.k)                  remaining  =  y  &  ((1  <<  (hash_len  -­‐  self.k))  -­‐  1)                  first_set_bit  =  (64  -­‐  self.k)  -­‐                                                  int(math.log(remaining,  2))                  self.M[j]  =  max(self.M[j],  first_set_bit)              def  cardinality(self):                  return  self.alpha  *  2  **  np.mean(self.M)
  • 42. HyperLogLog • Adding an item: O(1) • Retrieving cardinality: O(1) • Space: O(log log n) • Error rate: 2%
  • 43. HyperLogLog For 100 million unique items, and an error rate of 2%, the size of the HyperLogLog is... 1,500 bytes
  • 45. The Problem Reliably broadcast “purge” messages across the world as quickly as possible.
  • 50.
  • 51.
  • 53.
  • 54.
  • 57.
  • 58. Bimodal Multicast • Quickly broadcast message to all servers • Gossip to recover lost messages
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65. One Problem Computers have limited space
  • 67.
  • 68.
  • 69.
  • 73. End-to-End Latency 42ms 74ms 83ms 133ms New York London San Jose Tokyo 0.00 0.05 0.10 0.00 0.05 0.10 0.00 0.05 0.10 0.00 0.05 0.10 0 50 100 150 Latency (ms) Density Density plot and 95th percentile of purge latency by server location
  • 76. What was the point again?
  • 77. We can build things that are otherwise unrealistic
  • 78. We can build systems that are more reliable
  • 82. What even is this?
  • 86. Probabilistic Algorithms 1. An iota of theory 2. Where are they useful and where are they not? 3. HyperLogLog 4. Locality-sensitive Hashing 5. Bimodal Multicast
  • 87. “An algorithm that uses randomness to improve its efficiency”
  • 90. Las Vegas def  find_las_vegas(haystack,  needle):          length  =  len(haystack)          while  True:                  index  =  randrange(length)                  if  haystack[index]  ==  needle:                          return  index
  • 91. Monte Carlo def  find_monte_carlo(haystack,  needle,  k):          length  =  len(haystack)          for  i  in  range(k):                  index  =  randrange(length)                  if  haystack[index]  ==  needle:                          return  index
  • 92. – Prabhakar Raghavan (author of Randomized Algorithms) “For many problems a randomized algorithm is the simplest the fastest or both.”
  • 93. Naive Solution For 100 million unique IPv4 addresses, the size of the hash is... >400mb
  • 94. Slightly Less Naive Add each IP to a bloom filter and keep a counter of the IPs that don’t collide.
  • 95. Slightly Less Naive ips_seen  =  BloomFilter(capacity=expected_size,  error_rate=0.03)   counter  =  0       for  line  in  log_file:          ip  =  extract_ip(line)          if  items_bloom.add(ip):                  counter  +=  1       print  "Unique  IPs:",  counter
  • 96. Slightly Less Naive • Adding an IP: O(1) • Retrieving cardinality: O(1) • Space: O(n) • Error rate: 3% kind of
  • 97. Slightly Less Naive For 100 million unique IPv4 addresses, and an error rate of 3%, the size of the bloom filter is... 87mb
  • 98. def  insert(self,  token):          #  Get  hash  of  token          y  =  hash_fn(token)              #  Extract  `k`  most  significant  bits  of  `y`          j  =  y  >>  (hash_len  -­‐  self.k)              #  Extract  remaining  bits  of  `y`          remaining  =  y  &  ((1  <<  (hash_len  -­‐  self.k))  -­‐  1)              #  Find  "first"  set  bit  of  `remaining`          first_set_bit  =  (64  -­‐  self.k)  -­‐                                          int(math.log(remaining,  2))              #  Update  `M[j]`  to  max  of  `first_set_bit`          #  and  existing  value  of  `M[j]`          self.M[j]  =  max(self.M[j],  first_set_bit)
  • 99. def  cardinality(self):          #  The  mean  of  `M`  estimates  `log2(n)`  with          #  an  additive  bias          return  self.alpha  *  2  **  np.mean(self.M)
  • 100. The Problem Find documents that are similar to one specific document.
  • 101. The Problem Find images that are similar to one specific image.
  • 102. The Problem Find graphs that are correlated to one specific graph.
  • 104. The Problem “Find the n closest points in a d-dimensional space.”
  • 105. The Problem You have a bunch of things and you want to figure out which ones are similar.
  • 106.
  • 107. “There has been a lot of recent work on streaming algorithms, i.e. algorithms that produce an output by making one pass (or a few passes) over the data while using a limited amount of storage space and time. To cite a few examples, ...” {  "there":  1,  "has":  1,  "been":    1,  “a":4,      "lot":      1,  "of":    2,  "recent":1,  ...  }
  • 108. • Cosine similarity • Jaccard similarity • Euclidian distance • etc etc etc
  • 114.
  • 115.
  • 116. Locality-Sensitive Hashing for Finding Nearest Neighbors - Slaney and Casey
  • 118. {  "there":  1,  "has":  1,  "been":    1,  “a":4,      "lot":      1,  "of":    2,  "recent":1,  ...  } LSH 0  1  1  1  0  1  0  0  0  0  1  0  1  0  1  0  ...