Ad-hoc Big-Data Analysis with Lua
And LuaJIT
Alexander Gladysh <ag@logiceditor.com>
@agladysh
Lua Workshop 2015
Stockholm
1 / 32
Outline
Introduction
The Problem
A Solution
Assumptions
Examples
The Tools
Notes
Questions?
2 / 32
Alexander Gladysh
CTO, co-owner at LogicEditor
In löve with Lua since 2005
3 / 32
The Problem
You have a dataset to analyze,
which is too large for "small-data" tools,
and you have no resources to set up and maintain (or pay for)
Hadoop, Google BigQuery, etc.,
but you have some processing power available.
4 / 32
Goal
Pre-process the data so it can be handled by R or Excel or
your favorite analytics tool (or Lua!).
If the data is dynamic, then learn to pre-process it and build a
data processing pipeline.
5 / 32
An Approach
Use Lua!
And (semi-)standard tools, available on Linux.
Go minimalistic while exploring, avoid frameworks,
Then move on to an industrial solution that fits your newly
understood requirements,
Or roll your own ecosystem! ;-)
6 / 32
Assumptions
7 / 32
Data Format
Plain text
Column-based (CSV-like), optionally with free-form data at the
end
Typical example: web-server log files
8 / 32
Data Format Example: Raw Data
2015/10/15 16:35:30 [info] 14171#0: *901195
[lua] index:14: 95c1c06e626b47dfc705f8ee6695091a
109.74.197.145 *.example.com
GET 123456.gif?q=0&step=0&ref= HTTP/1.1 example.com
NB: This is a single, tab-separated line from a time-sorted file.
9 / 32
Data Format Example: Intermediate Data
alpha.example.com 5
beta.example.com 7
gamma.example.com 1
NB: These are several tab-separated lines from a key-sorted file.
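A minimal sketch of working with this intermediate format, assuming only POSIX sh and coreutils. The three rows are the sample from the slide; the variable names are made up for illustration.

```shell
# Portable way to get a literal tab (avoids the bash-only $'\t' form).
TAB=$(printf '\t')

# Sample rows in the intermediate format: domain<TAB>count, sorted by key.
sample=$(printf 'alpha.example.com\t5\nbeta.example.com\t7\ngamma.example.com\t1\n')

# sort -c only checks the order; it exits non-zero if the input is not sorted.
printf '%s\n' "$sample" | LC_ALL=C sort -c -t"$TAB" -k1,1 && echo "key-sorted"

# Re-sort by the numeric second column, largest first, and keep the top row.
top=$(printf '%s\n' "$sample" | LC_ALL=C sort -t"$TAB" -k2,2nr | head -n 1)
echo "$top"
```

Checking the sort order before a reduce step is cheap and catches a wrong key specification early.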
10 / 32
Hardware
As usual, more is better: Cores, cache, memory speed and
size, HDD speeds, networking speeds...
But even a modest VM (or several) can be helpful.
Your fancy gaming laptop is good too ;-)
11 / 32
OS
Linux (Ubuntu) Server.
This approach will, of course, work for other setups.
12 / 32
Filesystem
Ideally, have data copies on each processing node, using
identical layouts.
Fast network should work too.
13 / 32
Examples
14 / 32
Bash Script Example
time pv /path/to/uid-time-url-post.gz \
| pigz -cdp 4 \
| cut -d$'\t' -f 1,3 \
| parallel --gnu --progress -P 10 --pipe --block=16M \
$(cat <<"EOF"
luajit ~me/url-to-normalized-domain.lua
EOF
) \
| LC_ALL=C sort -u -t$'\t' -k2 --parallel 6 -S20% \
| luajit ~me/reduce-value-counter.lua \
| LC_ALL=C sort -t$'\t' -nrk2 --parallel 6 -S20% \
| pigz -cp4 >/path/to/domain-uniqs_count-merged.gz
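One detail of the sort step is worth checking on a small sample: with -u, sort compares only the selected keys, so -u combined with -k2 keeps a single line per distinct value of the second column. If the intent is unique (uid, domain) pairs ordered by domain, a key specification such as -k2,2 -k1,1 may be closer to it. That intent is an assumption here, and the rows below are made up.

```shell
TAB=$(printf '\t')

# Hypothetical uid<TAB>domain pairs, with one exact duplicate (u1, a.example.com).
pairs=$(printf 'u1\ta.example.com\nu2\ta.example.com\nu1\ta.example.com\nu3\tb.example.com\n')

# Uniqueness judged on column 2 only: one line per domain survives.
by_key_only=$(printf '%s\n' "$pairs" | LC_ALL=C sort -u -t"$TAB" -k2 | wc -l | tr -d ' ')

# Uniqueness judged on (column 2, column 1): one line per distinct pair survives.
by_pair=$(printf '%s\n' "$pairs" | LC_ALL=C sort -u -t"$TAB" -k2,2 -k1,1 | wc -l | tr -d ' ')

echo "lines kept with -k2:          $by_key_only"
echo "lines kept with -k2,2 -k1,1:  $by_pair"
```

Running both variants on an excerpt of the real data shows quickly which one matches the counts you expect.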
15 / 32
Lua Script Example: url-to-normalized-domain.lua
-- url_to_normalized_domain() is defined elsewhere (not shown on the slide).
for l in io.lines() do
  local key, value = l:match("^([^\t]+)\t(.*)")
  if value then
    value = url_to_normalized_domain(value)
  end
  if key and value then
    io.write(key, "\t", value, "\n")
  end
end
16 / 32
Lua Script Example: reduce-value-counter.lua 1/3
-- Assumes input sorted by VALUE
--   a foo   -->   foo 3
--   a foo         bar 2
--   b foo         quo 1
--   a bar
--   c bar
--   d quo
17 / 32
Lua Script Example: reduce-value-counter.lua 2/3
local last_key, accum = nil, 0
local flush = function(key)
  if last_key then
    io.write(last_key, "\t", accum, "\n")
  end
  accum = 0
  last_key = key -- may be nil
end
18 / 32
Lua Script Example: reduce-value-counter.lua 3/3
for l in io.lines() do
  -- Note reverse order!
  local value, key = l:match("^(.-)\t(.*)$")
  assert(key and value)
  if key ~= last_key then
    flush(key)
    collectgarbage("step")
  end
  accum = accum + 1
end
flush()
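On a small excerpt, the reducer's output can be cross-checked against standard tools. The sketch below is not the author's method: it swaps in cut and uniq -c, which count adjacent equal lines just as the reducer counts adjacent equal values. The input is the sample from the comment in part 1/3.

```shell
TAB=$(printf '\t')

# key<TAB>value lines, already grouped by value.
input=$(printf 'a\tfoo\na\tfoo\nb\tfoo\na\tbar\nc\tbar\nd\tquo\n')

# Keep the value column, count adjacent duplicates, then let sed rewrite
# uniq's "<spaces><count> <value>" into "<value><TAB><count>".
expected=$(printf '%s\n' "$input" \
  | cut -d"$TAB" -f2 \
  | uniq -c \
  | sed -e 's/^ *\([0-9][0-9]*\) \(.*\)$/\2'"$TAB"'\1/')

printf '%s\n' "$expected"
```

If the Lua reducer and this check disagree on an excerpt, the input is probably not grouped the way the reducer assumes.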
19 / 32
Tying It All Together
Basically:
You work with sorted data,
mapping and reducing it line-by-line,
in parallel where at all possible,
while trying to use as much of the available hardware resources as
practical,
and without running out of memory.
20 / 32
The Tools
21 / 32
The Tools
parallel
sort, uniq, grep
cut, join, comm
pv
compression utilities
LuaJIT
22 / 32
LuaJIT?
Up to a point:
2.1 helps to speed things up,
FFI bogs down development speed.
Go plain Lua first (run it with LuaJIT),
then roll your own ecosystem as needed ;-)
23 / 32
Parallel
GNU parallel is like xargs, but for parallel computation:
it can run your jobs in parallel on a single machine
or on a "cluster".
24 / 32
Compression
gzip: default, bad
lz4: fast, but produces large files
pigz: fast, parallelizable
xz: good compression, slow
...and many more;
be on the lookout for new formats!
25 / 32
GNU sort Tricks
LC_ALL=C \
sort -t$'\t' --parallel 4 -S60% \
-k3,3nr -k2,2 -k1,1nr
Disable locale.
Specify delimiter.
Note that parallel x4 with 60% memory will consume 0.6 *
log2(4) = 1.2, i.e. 120% of memory.
When doing multi-key sort, specify parameters after key
number.
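A small sketch of the multi-key form above, assuming POSIX sh and coreutils sort. The four rows are made up. The n and r modifiers are attached to each key, so column 3 sorts numerically descending, column 2 ascending as plain bytes, and column 1 numerically descending.

```shell
TAB=$(printf '\t')

# Hypothetical rows: id<TAB>name<TAB>score
rows=$(printf '1\tb\t10\n2\ta\t10\n3\ta\t7\n9\ta\t10\n')

sorted=$(printf '%s\n' "$rows" \
  | LC_ALL=C sort -t"$TAB" -k3,3nr -k2,2 -k1,1nr)

printf '%s\n' "$sorted"
```

Writing each key as -kN,N limits it to a single column. A bare -kN extends to the end of the line, which is rarely what a multi-key sort wants.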
26 / 32
grep
http://stackoverflow.com/questions/9066609/fastest-possible-grep
27 / 32
Notes and Remarks
28 / 32
Why Lua?
Perl and AWK are the traditional alternatives to Lua, but unless you are
very disciplined and experienced, scripts written in them are much less maintainable.
29 / 32
Start Small!
Always run your scripts on small representative excerpts from
your datasets, not only while developing them locally, but on
actual data-processing nodes too.
This saves time and helps you learn where the bottlenecks are.
Sometimes a large run still blows up in your face, though:
monitor resource utilization at run-time.
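One simple way to cut an excerpt from a compressed file, sketched with gzip and head. The file names and the generated data are stand-ins. Note that the first N lines of a time-sorted log cover only the earliest period, so whether such an excerpt is representative depends on the data.

```shell
tmp=$(mktemp -d)

# Stand-in for a large compressed log: 1000 numbered lines.
seq 1 1000 | gzip -c > "$tmp/full.gz"

# Keep the first 100 lines as the excerpt, recompressed like the original.
gzip -cd "$tmp/full.gz" | head -n 100 | gzip -c > "$tmp/excerpt.gz"

excerpt_lines=$(gzip -cd "$tmp/excerpt.gz" | wc -l | tr -d ' ')
echo "excerpt lines: $excerpt_lines"

rm -rf "$tmp"
```

Keeping the excerpt in the same format as the full dataset lets the pipeline run on it unchanged.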
30 / 32
Discipline!
Many moving parts, long turn-around times, hard to keep tabs.
Keep a journal: write down what you ran and how long it took.
Store actual versions of your scripts in a source control system.
Don’t forget to sanity-check the results you get!
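The journal can be partly automated. Below is a minimal sketch of a wrapper that appends the command line, its exit status and its wall-clock time to a log file. The name run_logged and the journal location are hypothetical, and date +%s is assumed to be available, as it is on Linux.

```shell
# Hypothetical journal file; in practice, a fixed path kept next to the scripts.
journal=$(mktemp)

# Run a command, then append: timestamp, elapsed seconds, exit status, command.
run_logged() {
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  printf '%s\t%ss\texit=%s\t%s\n' \
    "$(date '+%Y-%m-%d %H:%M:%S')" "$((end - start))" "$status" "$*" >> "$journal"
  return "$status"
}

# Example: a trivial command that succeeds immediately.
run_logged sort -c /dev/null
cat "$journal"
```

A whole pipeline can be logged by passing it as a single string to a shell, for example run_logged sh -c '...'.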
31 / 32
Questions?
Alexander Gladysh, ag@logiceditor.com
32 / 32
