Exploring Public Datasets and APIs with MongoDB and Analytica
1. Exploring Public APIs with MongoDB and Analytica
Nosh Petigara
nosh@analytica.com
@noshp
@analytica_inc
2. Today
• MongoDB and public APIs
• What is Analytica?
• Demo
– Analytica shell (Twitter data)
– Analytica for Excel (Stack Overflow data)
3. Some data sets to explore
• Twitter API (JSON)
– https://dev.twitter.com/
• Crunchbase API (JSON)
– http://developer.crunchbase.com/
• Stack Overflow (JSON and CSV)
– http://data.stackexchange.com/
• NYTimes (JSON, XML)
– http://developer.nytimes.com/docs
4. MongoDB and public APIs
• Most APIs talk JSON
– MongoDB’s native JSON import
• APIs vary wildly (internally and between one another)
– MongoDB is schema-free
• Data import is only half the battle
– MongoDB’s query language and aggregation framework
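As a rough illustration of the aggregation framework point above, the sketch below builds a pipeline that counts tweets per hashtag. The stage operators ($unwind, $group, $sort, $limit) are standard MongoDB aggregation stages and the field path entities.hashtags.text follows the Twitter API layout used later in this deck. The function names and the `collection` argument are hypothetical; it is assumed to be a pymongo-style collection of imported tweets.

```python
def hashtag_pipeline(limit=10):
    """Build an aggregation pipeline that counts tweets per hashtag.

    Assumes each tweet document stores its hashtags as a list under
    entities.hashtags, with the tag string in the "text" field.
    """
    return [
        # Emit one document per hashtag entry in each tweet.
        {"$unwind": "$entities.hashtags"},
        # Group by hashtag text and count the entries in each group.
        {"$group": {"_id": "$entities.hashtags.text",
                    "tweetcount": {"$sum": 1}}},
        # Most frequent hashtags first.
        {"$sort": {"tweetcount": -1}},
        {"$limit": limit},
    ]


def top_hashtags(collection, limit=10):
    """Run the pipeline on a pymongo-style collection (hypothetical handle)."""
    return list(collection.aggregate(hashtag_pipeline(limit)))
```

The pipeline is plain data, so it can be inspected or adjusted before it is sent to the server.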
6. Analytica
• Analytics & reporting platform for MongoDB
– Natively understands JSON/document hierarchy
– Tailored for analytics
– Works directly on MongoDB
• Discovery, analysis, visualization cycle
• In beta [http://analytica.com]
7. What can you do with Analytica?
• Inspect and extract data
• Augment your data model
• Calculate & aggregate
• Filter and transform data
• Join collections
8. Demos
• Today
– Twitter stats [using the Analytica Shell]
– Stack Overflow community analysis [using Analytica for Excel]
• Not shown
– REST API
– Analytica web (coming soon)
10. Some other examples
• Tweets vs. retweets vs. replies
– count(select(twitter.tweets.where(retweet_count <> 0)))
• Follower counts
– max(twitter.tweets.user.followers_count)
• Popular hashtags
– set twitter.byhashtag = group(tweets.by(entities.hashtags.text))
– set twitter.byhashtag.tweetcount = count(tweets)
– set twitter.populartags = orderdesc(byhashtag.by(tweetcount))
– get twitter.populartags.text
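For readers without access to Analytica, the Analytica expressions above can be approximated in plain Python. This is only a sketch of what the expressions appear to compute, not Analytica's actual semantics. The sample tweets are made-up data shaped like Twitter API documents, and the function names are hypothetical.

```python
from collections import Counter

# Made-up sample documents; only the field names follow the Twitter API.
SAMPLE_TWEETS = [
    {"retweet_count": 2, "user": {"followers_count": 120},
     "entities": {"hashtags": [{"text": "mongodb"}, {"text": "nosql"}]}},
    {"retweet_count": 0, "user": {"followers_count": 45},
     "entities": {"hashtags": [{"text": "mongodb"}]}},
    {"retweet_count": 5, "user": {"followers_count": 3000},
     "entities": {"hashtags": []}},
]


def retweeted_count(tweets):
    """Count tweets with a non-zero retweet_count."""
    return sum(1 for t in tweets if t["retweet_count"] != 0)


def max_followers(tweets):
    """Largest followers_count among the tweet authors."""
    return max(t["user"]["followers_count"] for t in tweets)


def popular_tags(tweets):
    """Hashtag texts ordered by number of occurrences, most frequent first."""
    counts = Counter(h["text"]
                     for t in tweets
                     for h in t["entities"]["hashtags"])
    return [tag for tag, _ in counts.most_common()]


if __name__ == "__main__":
    print(retweeted_count(SAMPLE_TWEETS))
    print(max_followers(SAMPLE_TWEETS))
    print(popular_tags(SAMPLE_TWEETS))
```

Note that the counter tallies hashtag occurrences, which equals the per-hashtag tweet count only when no tweet repeats the same hashtag.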