Ruby on Redis

Pascal Weemaels
Koen Handekyn

Oct 2013
Target

Create a Zip file of PDFs based on a CSV data file

‣  Linear version
‣  Making it scale with Redis

[diagram: parse csv → create pdf, create pdf, ... create pdf → zip]
Step 1: linear

‣  Parse CSV
  •  std lib: require 'csv'
  •  docs = CSV.read("#{DATA}.csv")
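The parse step can be sketched self-contained; the invoice data below is hypothetical, standing in for the deck's CSV file of invoice lines:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical invoice lines: invoice_nr, name, street, zip, city
csv_text = "1001,Alice,Main St 1,9000,Gent\n1002,Bob,High St 2,9820,Merelbeke\n"

file = Tempfile.new(['invoices', '.csv'])
file.write(csv_text)
file.close

docs = CSV.read(file.path)   # array of arrays, one per CSV line
docs.each { |doc| puts doc.inspect }
```

Each element of `docs` is one invoice as an array of strings, ready to be splatted into a `create_pdf(*doc)` call.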
Simple Templating with String Interpolation

invoice.html:

<<Q
<div class="title">
  INVOICE #{invoice_nr}
</div>
<div class="address">
  #{name}</br>
  #{street}</br>
  #{zip} #{city}</br>
</div>
Q

‣  Merge data into HTML
  •  template = File.new('invoice.html').read
  •  html = eval("<<QQQ\n#{template}\nQQQ")
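The eval-heredoc trick above can be run standalone; the template string and the invoice values here are hypothetical:

```ruby
# The template is read as a plain string, so its #{...} placeholders
# are still literal text at this point.
template = '<div class="title">INVOICE #{invoice_nr} for #{name}</div>'

# Local variables the template refers to (hypothetical values).
invoice_nr = 1001
name = 'Alice'

# Wrapping the template in a heredoc and eval'ing it makes Ruby perform
# the interpolation against the current local variables.
html = eval("<<QQQ\n#{template}\nQQQ")
puts html
```

Note that eval on an untrusted template executes arbitrary code; the deck uses this trick for brevity, while ERB would be the usual safer choice.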
Step 1: linear

‣  Create PDF
  •  prince xml using princely gem
  •  http://www.princexml.com
  •  p = Princely.new
     p.add_style_sheets('invoice.css')
     p.pdf_from_string(html)
Step 1: linear

‣  Create ZIP
  •  Zip::ZipOutputStream.open(zipfile_name) do |zos|
       files.each do |file, content|
         zos.put_next_entry(file)
         zos.puts content
       end
     end
Full Code

require 'csv'
require 'princely'
require 'zip/zip'

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, ".csv")

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

# create a pdf for each line in the csv
# and put it in a hash
files_h = docs.inject({}) do |files_h, doc|
  files_h[doc[0]] = create_pdf(*doc)
  files_h
end

# zip all pdf's from the hash
create_zip files_h

DEMO
Step 2: from linear ...

[diagram: parse csv → create pdf, create pdf, ... create pdf → zip]

Step 2: ...to parallel

[diagram: parse csv → ? → create pdf | create pdf | create pdf, run as Threads → zip]
Multi Threaded

‣  Advantage
  •  Lightweight (minimal overhead)
‣  Challenges (or why it is hard)
  •  Hard to code: most data structures are not thread safe by default; they need synchronized access
  •  Hard to test: different execution paths and timings
  •  Hard to maintain
‣  Limitation
  •  Single machine: not a solution for horizontal scalability beyond the multi-core CPU
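The "synchronized access" point can be illustrated with a minimal pure-Ruby sketch (all names hypothetical): several threads write created documents into one shared Hash, and a Mutex guards every write:

```ruby
results = {}        # shared result hash: not thread safe by itself
lock = Mutex.new    # explicit synchronization around each write

threads = 4.times.map do |i|
  Thread.new do
    25.times do |j|
      pdf = "pdf-#{i}-#{j}"   # stand-in for a create_pdf result
      # guarded write into the shared hash
      lock.synchronize { results["doc-#{i}-#{j}"] = pdf }
    end
  end
end
threads.each(&:join)

puts results.size   # 100 entries, no lost updates
```

This is exactly the bookkeeping discipline that every shared structure needs in the threaded design, and that Redis provides for free in the distributed one.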
Step 2: ...to parallel

[diagram: parse csv → ? → create pdf | create pdf | create pdf, run as processes → zip]
Multi Process

•  Scale across machines
•  Advanced support for debugging and monitoring at the OS level
•  Simpler (code, testing, debugging, ...)
•  Slightly more overhead

BUT
But

all this assumes

“shared state across processes”

[diagram: parse csv → shared state → create pdf | create pdf | create pdf → shared state → zip]

Candidates for the shared state: MemCached? SQL? File System? Terra Cotta?

… OR …
Hello Redis

‣  Shared Memory Key Value Store with High Level Data Structure support
  •  String (String, Int, Float)
  •  Hash (Map, Dictionary)
  •  List (Queue)
  •  Set
  •  ZSet (ordered by member or score)
About Redis

•  Single threaded: 1 thread to serve them all
•  (fit) Everything in memory
•  “Transactions” (MULTI/EXEC)
•  Expiring keys
•  Lua scripting
•  Publisher-Subscriber
•  Auto create and destroy
•  Pipelining
•  But … full clustering (master-master) is not available (yet)
Hello Redis

‣  redis-cli
  •  set name "pascal"          = "pascal"
  •  incr counter               = 1
  •  incr counter               = 2
  •  hset pascal name "pascal"
  •  hset pascal address "merelbeke"
  •  sadd persons pascal
  •  smembers persons           = [pascal]
  •  keys *
  •  type pascal                = hash
  •  lpush todo "read"          = 1
  •  lpush todo "eat"           = 2
  •  lpop todo                  = "eat"
  •  rpoplpush todo done        = "read"
  •  lrange done 0 -1           = "read"
Let Redis Distribute

[diagram: a parse csv process, several create pdf processes, and a zip process, coordinated through Redis]
Spread the Work

[diagram, step 1: the parse csv process pushes each document onto the queue with data and increments the counter; the create pdf processes and the zip process wait downstream]
Ruby on Redis

‣  Put PDF create input data on a queue and do the counter bookkeeping

docs.each do |doc|
  data = YAML::dump(doc)
  r.lpush 'pdf:queue', data
  r.incr 'ctr' # bookkeeping
end
Create PDF’s

[diagram, step 2: each create pdf process pops from the queue with data (1), stores its result in the hash with pdfs (2), and decrements the counter]
Ruby on Redis

‣  Read PDF input data from the queue, do the counter bookkeeping, put each created PDF in a Redis hash, and signal when ready

while (true)
  _, msg = r.brpop 'pdf:queue'
  doc = YAML::load(msg)
  # name of hash, key=docname, value=pdf
  r.hset('pdf:pdfs', doc[0], create_pdf(*doc))
  ctr = r.decr 'ctr'
  r.rpush 'ready', 'done' if ctr == 0
end
Zip When Done

[diagram, step 3: the zip process blocks on the ready signal, then fetches the hash with pdfs]
Ruby on Redis

‣  Wait for the ready signal, fetch all pdf's, and zip them

r.brpop 'ready'              # wait for signal
pdfs = r.hgetall 'pdf:pdfs'  # fetch hash
create_zip pdfs              # zip it
More Parallelism

[diagram: a single queue with data feeds all create pdf processes, while each input file gets its own counter, ready queue, and hash with pdfs]
Ruby on Redis

‣  Put PDF create input data on a queue and do the counter bookkeeping

# unique id for this input file
UUID = SecureRandom.uuid
docs.each do |doc|
  data = YAML::dump([UUID, doc])
  r.lpush 'pdf:queue', data
  r.incr "ctr:#{UUID}" # bookkeeping
end
Ruby on Redis

‣  Read PDF input data from the queue, do the counter bookkeeping, and put each created PDF in a Redis hash

while (true)
  _, msg = r.brpop 'pdf:queue'
  uuid, doc = YAML::load(msg)
  r.hset(uuid, doc[0], create_pdf(*doc))
  ctr = r.decr "ctr:#{uuid}"
  r.rpush "ready:#{uuid}", 'done' if ctr == 0
end
Ruby on Redis

‣  Wait for the ready signal, fetch all pdf's, and zip them

r.brpop "ready:#{UUID}"  # wait for signal
pdfs = r.hgetall(UUID)   # fetch this job's hash
create_zip(pdfs)         # zip it
Full Code

LINEAR

require 'csv'
require 'princely'
require 'zip/zip'

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, ".csv")

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

# create a pdf for each line in the csv
# and put it in a hash
files_h = docs.inject({}) do |files_h, doc|
  files_h[doc[0]] = create_pdf(*doc)
  files_h
end

# zip all pdf's from the hash
create_zip files_h

MAIN

require 'csv'
require 'zip/zip'
require 'redis'
require 'yaml'
require 'securerandom'

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, ".csv")
UUID = SecureRandom.uuid

r = Redis.new
my_counter = "ctr:#{UUID}"

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

docs.each do |doc| # distribute!
  r.lpush 'pdf:queue', YAML::dump([UUID, doc])
  r.incr my_counter
end

r.brpop "ready:#{UUID}" # collect!
create_zip(r.hgetall(UUID))

# clean up
r.del my_counter
r.del UUID
puts "All done!"

WORKER

require 'redis'
require 'princely'
require 'yaml'

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

r = Redis.new
while (true)
  _, msg = r.brpop 'pdf:queue'
  uuid, doc = YAML::load(msg)
  r.hset(uuid, doc[0], create_pdf(*doc))
  ctr = r.decr "ctr:#{uuid}"
  r.rpush "ready:#{uuid}", 'done' if ctr == 0
end

Key functions (create pdf and create zip) remain unchanged.
Distribution code highlighted.

DEMO 2
Multi Language Participants

[diagram: the queue with data, per-job counters, ready queues, and hashes with pdfs are plain Redis structures, so the parse csv, create pdf, and zip participants can each be written in a different language]
Conclusions

From Linear To Multi Process Distributed
is easy with
Redis Shared Memory High Level Data Structures

Atomic Counter for bookkeeping
Queue for work distribution
Queue as Signal
Hash for result sets
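As a recap, the whole coordination pattern can be sketched in pure Ruby, with stdlib stand-ins for the Redis structures (no Redis server involved; all names here are hypothetical): a Queue for 'pdf:queue', a second Queue for the ready signal, a Hash for the result set, and a Mutex-guarded integer for the atomic counter.

```ruby
work_q  = Queue.new   # stands in for the Redis list 'pdf:queue'
ready_q = Queue.new   # stands in for the 'ready' signal list
results = {}          # stands in for the Redis hash of pdfs
counter = 10          # bookkeeping counter
lock    = Mutex.new   # makes the hash write + decrement atomic

# producer: enqueue the work and count it
10.times { |i| work_q << ["doc-#{i}", "data-#{i}"] }

# workers: pop, "create pdf", store, decrement, signal at zero
workers = 3.times.map do
  Thread.new do
    loop do
      name, data = (work_q.pop(true) rescue break)  # non-blocking pop
      pdf = "PDF(#{data})"                          # stand-in for create_pdf
      remaining = lock.synchronize do
        results[name] = pdf
        counter -= 1
      end
      ready_q << :done if remaining == 0
    end
  end
end
workers.each(&:join)

ready_q.pop          # wait for the ready signal
puts results.size    # all 10 results collected
```

Swapping these stand-ins for Redis structures is what turns the single-process sketch into the multi-machine design of the deck.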

