Cascalog is a Clojure-based query language for Hadoop, designed to make data analysis on Hadoop both powerful and easy. Queries are written as regular Clojure code and support joins, aggregators, custom functions, and sorting. What makes Cascalog unique is that queries are embedded directly in the language, so the full power of Clojure is available at all times. BackType uses Cascalog to identify influencers, determine how many people were exposed to URLs on Twitter, identify "interesting tweets", and study the social engagement of domains over time.
What sets Cascalog apart?
Custom operations
No special UDF interface to implement
Operations are just Clojure functions
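The "just functions" idea can be illustrated outside Cascalog itself with a minimal Python sketch (Python rather than Clojure). The names `split` and `word_count` are hypothetical, and the loop stands in for a query engine; this is not Cascalog's API. It shows only that the custom operation is an ordinary function handed to the query, with no UDF class to subclass or register.

```python
from collections import Counter
from typing import Callable, Iterable


# Hypothetical custom operation: an ordinary function, with no UDF
# base class or registration step.
def split(sentence: str) -> list[str]:
    """Emit one word per whitespace-separated token."""
    return sentence.split()


def word_count(sentences: Iterable[str],
               op: Callable[[str], Iterable[str]]) -> dict[str, int]:
    """Apply a plain-function operation to each input and count its outputs."""
    counts = Counter()
    for s in sentences:
        counts.update(op(s))
    return dict(counts)


sentences = ["the quick fox", "the lazy dog"]
print(word_count(sentences, split))
# {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```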
Dynamic queries
Write functions that return queries
Manipulate queries as first-class entities in the language
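Here is a rough Python analogue of queries as first-class values. It assumes a query can be modeled as an immutable sequence of row-transforming steps. `Query`, `older_than`, and `project` are hypothetical names used only for illustration, not part of Cascalog. `older_than` is a function that builds and returns a query from a runtime parameter, and `then` derives a new query from an existing one.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]
Step = Callable[[list[Row]], list[Row]]


@dataclass(frozen=True)
class Query:
    """Hypothetical first-class query: an immutable tuple of steps."""
    steps: tuple[Step, ...] = ()

    def then(self, step: Step) -> "Query":
        # Returns a new query; the original is left unchanged.
        return Query(self.steps + (step,))

    def run(self, rows: list[Row]) -> list[Row]:
        for step in self.steps:
            rows = step(rows)
        return rows


def older_than(age: int) -> Query:
    """A function that returns a query, parameterized at runtime."""
    return Query().then(lambda rows: [r for r in rows if r["age"] > age])


def project(*cols: str) -> Step:
    """A step that keeps only the named columns of each row."""
    return lambda rows: [{c: r[c] for c in cols} for r in rows]


people = [{"name": "alice", "age": 28}, {"name": "bob", "age": 33}]
q = older_than(30).then(project("name"))
print(q.run(people))  # [{'name': 'bob'}]
```

Because each `then` call returns a new `Query`, one base query can be reused to derive several variants without mutating it.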
Use Cascalog side by side with other code for:
Appends and distributed copies
Consolidation
Application logic
Easy experimentation
Ships with a test dataset that can be queried locally (the “playground”)
5 minutes to set up Hadoop, Clojure, and Cascalog locally (see the README)
Cascalog at BackType
Cascalog is used to:
Identify influencers
Determine the number of people exposed to URLs on Twitter
Identify “interesting tweets”
Study social engagement of domains over time
Etc.
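Counting how many people were exposed to a URL is essentially a join followed by a distinct-count aggregation. The Python sketch below assumes a definition the slides do not give: a user counts as exposed to a URL if they follow at least one user who tweeted it. The data and the `exposure_counts` name are made up for illustration, and BackType's actual query is not shown in the source.

```python
from collections import defaultdict

# Hypothetical sample data: (user, url) tweets and
# (follower, followed) edges of the follow graph.
tweets = [("alice", "http://a.com"), ("bob", "http://a.com"), ("carol", "http://b.com")]
follows = [("dave", "alice"), ("erin", "alice"), ("dave", "bob"), ("frank", "carol")]


def exposure_counts(tweets, follows):
    """Count distinct users exposed to each URL.

    Assumption: a user is "exposed" to a URL if they follow at least one
    user who tweeted it. Implemented as a join on the tweeting user
    followed by a distinct count per URL.
    """
    followers_of = defaultdict(set)
    for follower, followed in follows:
        followers_of[followed].add(follower)

    exposed = defaultdict(set)
    for user, url in tweets:  # join: tweet.user == follows.followed
        exposed[url] |= followers_of[user]  # set union deduplicates people
    return {url: len(people) for url, people in exposed.items()}


print(exposure_counts(tweets, follows))  # {'http://a.com': 2, 'http://b.com': 1}
```

Using sets makes the count distinct. In this data, "dave" follows both tweeters of `http://a.com` but is counted once.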
Input and output
Cascalog reads from MySQL databases and HDFS
Cascalog writes to Cassandra and HDFS
Rapid development
Local playground dataset for development
Develop queries in the REPL
Cascading and Cascalog
Provided by Cascading:
Tuple abstraction and tuple manipulation
Translation of workflows into MapReduce jobs
Read from and write to anywhere with Taps
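To make workflow-to-MapReduce translation more concrete, the toy below runs, in a single Python process, roughly the three phases a group-by-and-sum query maps onto. Map emits key/value pairs, shuffle groups values by key, and reduce aggregates each group. The data and function names are hypothetical. This is not Cascading's API, and it ignores partitioning, serialization, and Taps.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input tuples: (domain, engagement_count).
records = [("a.com", 3), ("b.com", 1), ("a.com", 2)]


def map_phase(record):
    """Map: emit (key, value) pairs keyed by the group-by field."""
    domain, n = record
    yield domain, n


def shuffle(pairs):
    """Shuffle/sort: gather all values for each key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: apply the aggregator (here, sum) to each group."""
    return key, sum(values)


def run_job(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return [reduce_phase(k, vs) for k, vs in shuffle(pairs)]


print(run_job(records))  # [('a.com', 5), ('b.com', 1)]
```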