From sensors to location-tracking, machines generate an enormous amount of information. Despite the potential opportunities for monetization, organizations have struggled to realize the promise of M2M and the Internet of Things. In this webinar, we'll explore how MongoDB can ingest, store, manage and analyze vast amounts and types of data, enabling new M2M applications that were previously not possible. We'll discuss example applications based on real-world use cases, including schema design, example queries, and aggregation.
6. Why this is game-changing
• Resource tracking
• Process optimization
• Market analysis
• Real-time decision making
• etc.
– (… including things we haven’t thought of yet…)
7. Why this is technically challenging
• Massive volumes of data
• High data ingestion rates
• Complex data analysis requirements
• Evolving data modeling needs
9. MongoDB is a ___________ database
• Open source
• High performance
• Full featured
• Document-oriented
• Horizontally scalable
10. Full Featured
• Dynamic (ad-hoc) queries
• Built-in online aggregation
• Rich query capabilities
• Strongly consistent
• Many advanced features
• Support for many programming languages
11. Document-Oriented Database
• A document is a nestable associative array
• Document schemas are flexible
• Documents can contain various data types
(numbers, text, timestamps, blobs, etc.)
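A document of the kind described above can be sketched as a plain Python dict; the field names here (truck ID, location, cargo) are hypothetical, chosen to match the trucking example that follows:

```python
from datetime import datetime, timezone

# A hypothetical truck-telemetry document: a nestable associative
# array mixing numbers, text, timestamps, and lists.
reading = {
    "truck_id": "TRK-0042",                                  # text
    "ts": datetime(2013, 5, 1, 12, 0, tzinfo=timezone.utc),  # timestamp
    "location": {                                            # nested document
        "type": "Point",
        "coordinates": [-73.97, 40.77],                      # [longitude, latitude]
    },
    "cargo": {
        "fuel_liters": 412.5,                                # continuous sensor value
        "rfid_tags": ["a1b2", "c3d4"],                       # inventory list
    },
}
# Fields nest arbitrarily, and two documents in the same collection
# need not share the same shape -- that's the flexible schema.
```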
15. Suppose you’ve got a lot of trucks
• You probably care about where they are
• You probably care if they’re on schedule
• You might also care what they’re carrying
(cargo and/or fuel)
16. What are these data?
• Vehicle tracking data is positional info
available via GPS
• Cargo/fuel might be continuous sensor data
(e.g., volume or weight) or inventory
(e.g., RFID)
17. What MongoDB can do here
• For GPS type data, MongoDB has long had
powerful geospatial query facilities:
– “Find all trucks within a region (within a certain
time range, perhaps)”
– “Find all trucks within 100km from the
warehouse/customer”
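The two quoted queries might be phrased as follows. These are pymongo-style query documents built as plain dicts; the collection and field names ("trucks", "location", "ts") are illustrative, not from the slides:

```python
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6378.1  # used to convert km to radians for $centerSphere

# "Find all trucks within a region (within a certain time range)"
# -- a $box region plus a time bracket.
in_region = {
    "location": {"$geoWithin": {"$box": [[-74.1, 40.6], [-73.7, 40.9]]}},
    "ts": {
        "$gte": datetime(2013, 5, 1, tzinfo=timezone.utc),
        "$lt": datetime(2013, 5, 2, tzinfo=timezone.utc),
    },
}

# "Find all trucks within 100km from the warehouse/customer"
near_warehouse = {
    "location": {
        "$geoWithin": {
            "$centerSphere": [[-73.97, 40.77], 100 / EARTH_RADIUS_KM]
        }
    }
}
# With a live connection you would run e.g. db.trucks.find(near_warehouse).
```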
18. Recent additions to GeoSpatial
• MongoDB version 2.4 introduces support for
indexing and querying on various GeoJSON
data types (polygons, line strings)
– “Find trucking routes that intersect”
– “Find routes that pass nearby to a customer”
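The route-intersection query leans on the GeoJSON support mentioned above. A sketch, again as query-document dicts; "routes" and "path" are hypothetical names, and a 2dsphere index on "path" would back this query:

```python
# A GeoJSON LineString describing a segment of road.
route_segment = {
    "type": "LineString",
    "coordinates": [[-73.99, 40.73], [-73.95, 40.78]],
}

# "Find trucking routes that intersect" this segment.
intersecting_routes = {
    "path": {"$geoIntersects": {"$geometry": route_segment}}
}
# db.routes.find(intersecting_routes) with a live connection.
```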
19. What about the cargo/fuel?
• Sensor data is an easy fit for MongoDB
• Cf. MMS, the MongoDB Monitoring Service
– Cloud service hosted by 10gen since 2011
– Over 100K measurements written per
second (on the order of 10B writes/day)
– Continuous data rendering/alerting/analysis of
ingested data, 24x7
21. Suppose you’ve got a building
• You probably keep it climate controlled…
• … and lit …
• … and perhaps secured with entry cards.
22. So you measure things
• Temperature readings and/or HVAC
utilization reports
• Lights on/off
• Swipe-ins/swipe-outs through secured doors
23. This is “just” sensor data
• Straightforward to store in MongoDB
documents
• With strategic document design, a single
server can save hundreds of thousands of
sensor reads per second
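One common "strategic document design" for sensor data is time-based bucketing: instead of one document per reading, keep one document per sensor per hour and append readings to an array, so each write is a small in-place update. A sketch under that assumption (schema and names are illustrative):

```python
from datetime import datetime, timezone

def bucket_update(sensor_id, ts, value):
    """Build a pymongo-style upsert that appends a reading to an
    hourly bucket document for this sensor."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    filter_doc = {"sensor_id": sensor_id, "hour": hour}
    update_doc = {
        "$push": {"readings": {"ts": ts, "v": value}},  # append the reading
        "$inc": {"count": 1},                           # track bucket size
    }
    return filter_doc, update_doc

# With a real client: db.readings.update_one(f, u, upsert=True)
f, u = bucket_update(
    "hvac-17", datetime(2013, 5, 1, 12, 34, tzinfo=timezone.utc), 21.5
)
```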
24. But how do you use this data?
• MongoDB has a built-in Aggregation
Framework that supports ad-hoc analysis
tasks over data sets
– “What rooms had the highest average Air
Conditioning utilization bracketed daily?”
– “Which secured doors have the most
‘pass-back’ problems?”
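The first question above could be expressed as an aggregation pipeline, here just built as a list of stage documents; the field names (room, ts, ac_utilization) are hypothetical:

```python
# "What rooms had the highest average Air Conditioning utilization,
# bracketed daily?" -- group by room and day, average, then rank.
pipeline = [
    {"$group": {
        "_id": {
            "room": "$room",
            "day": {"$dayOfYear": "$ts"},   # daily bracketing
        },
        "avg_ac": {"$avg": "$ac_utilization"},
    }},
    {"$sort": {"avg_ac": -1}},              # highest averages first
    {"$limit": 10},
]
# db.hvac.aggregate(pipeline) with a live connection.
```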
25. Replication supports analytics
• Queries can be parceled out to different
replicas
– In different DCs, say
– Or to segregate competing workloads
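Parceling queries out to replicas is typically done with read preferences, which can be expressed directly in the connection URI. A sketch; the host names, replica-set name, and tag values are made up for illustration:

```python
# Direct eligible reads at secondaries -- e.g., a replica in a DC
# reserved for analytics -- keeping that workload off the primary.
uri = (
    "mongodb://db1.example.net,db2.example.net/?replicaSet=rs0"
    "&readPreference=secondary"
    "&readPreferenceTags=dc:analytics"
)
# MongoClient(uri) would then route reads to a matching secondary.
```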
26. Use Case #3: Who are the
clients in your neighborhood?
27. Suppose you’ve got mobile customers
• You might know where they are during their
day
• You might want to pitch them offers while
they’re there
– Or perhaps notify partners in real-time
• And you might evolve your model progressively
28. Tracking locations is straightforward
• We’ve already discussed GPS data
– Recording positional information
– Geospatial queries for proximity etc.
• Though mobile customers might have more
interesting locational data than just GPS
– e.g., user U is in retail store S now
– or, user U1 is near to U2 now
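The "U1 is near U2" case can again lean on geospatial queries. A sketch using a $near query document with a GeoJSON point; collection and field names ("users", "last_location") are hypothetical, and a 2dsphere index would back the query:

```python
# User U2's last known position as a GeoJSON Point.
u2_position = {"type": "Point", "coordinates": [-0.1276, 51.5072]}

# Find users within 500 meters of U2 right now.
nearby_users = {
    "last_location": {
        "$near": {
            "$geometry": u2_position,
            "$maxDistance": 500,   # meters, with a 2dsphere index
        }
    }
}
# db.users.find(nearby_users) with a live connection.
```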
29. Evolving your customer model
• One interesting thing about people is that
they differ
• And you might choose to pay attention to
changing things as your business evolves
• MongoDB makes this easy
30. Flexible Schema supports Agility
• As you become interested in new data, you
include that data in your documents
– No laborious data migration processes are
necessary – just adapt the application models to
record new information
– Queries and indexes work over polymorphic
data models, too
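The "just adapt the application models" point can be made concrete. Below, a customer document grows a field the original schema never mentioned; all names are hypothetical, and the update shown in the comment is the pymongo-style call you'd issue against a live collection:

```python
# Later the business starts caring about visit history; with MongoDB
# you'd just issue an update like
#   db.customers.update_one({"_id": 1},
#       {"$push": {"visits": {"store": "S-12"}}})
# No migration: documents simply start carrying the new field.

# Simulating the resulting polymorphic collection in plain Python:
old_style = {"_id": 1, "name": "Ada", "home_store": "S-12"}
new_style = {"_id": 2, "name": "Bob", "home_store": "S-07",
             "visits": [{"store": "S-07"}]}

# Queries and indexes on "visits.store" simply skip documents
# that lack the field.
collection = [old_style, new_style]
with_visits = [d for d in collection if "visits" in d]
```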
32. Going further
• Where do you build your next
warehouse/place your next office?
• How should you staff your retail
spaces/factory floors?
• What will next year’s smart home products
enable?
33. To recap
• MongoDB is widely used for
machine-generated sensor, GPS, inventory,
and other kinds of data
• Dynamic schema, rich query facilities,
built-in analytic features, replication and
horizontal scaling simplify M2M
architectures
Once upon a time, machine-readable data got created at the rate at which cards could be punched.
But nowadays, all kinds of systems can be generating data continuously. (Image from money.cnn.com)
This is, in some sense, a new space.
I presume you all know what open source means at this point, and I’m going to leave high performance unexamined for the moment.
In many respects, MongoDB’s feature set is relatively analogous to the kinds of features that traditional databases have offered: dynamic queries, built-in aggregation, specialized indexing types and search facilities (geospatial and text search, for instance). Only we change the data model we use, partly for programmer productivity and also because the changed data model permits a powerful scaling approach, which, indeed, is critical to the performance requirements of M2M systems.
So our data model is based on the idea of nestable documents, rather than flat rows. These model data structures in programs considerably more conveniently than flat relational rows do, leading to gains in programmer productivity, among other things.
But MongoDB also offers a horizontal scaling architecture, whereby the capacity of a MongoDB deployment to perform reads and/or writes grows linearly with the quantity of hardware deployed. In practice, this means that growing a MongoDB cluster should cost you linearly in the amount you want to grow your capacity (whereas traditional vertical scaling approaches tend to evince highly nonlinear marginal scaling costs). (In practice, however, one tends not to deploy clusters of Apple iMacs for a database. At least, I’ve never seen it.)
And within a cluster, it’s customary for each of the units of horizontal scaling, the shards, to be a set of nodes that replicate your data. This gives you data redundancy for high availability and disaster recovery purposes, but is also, increasingly, useful for working with large volumes of machine-generated data, as we will revisit later in this webinar.
A comment about use cases: all the use cases described here are real; only the names and details have been elided to protect confidentiality. (In many cases, users consider MongoDB to be a “secret weapon” that gives them competitive advantage, which prevents us from talking publicly about names and details. If you happen to be using MongoDB in any capacity and are interested in letting others know about it, we’d love to hear from you.)
Note that some of these data are potentially high-volume if you record them frequently enough, or if your buildings/campuses are very large
Can optimize for most-common use cases
So the nice thing is that you can do different work on different boxes, possibly in different DCs.
Now, individuals don’t generate data that fast (owing to their numbers and the laws of physics), but the same ideas apply here as for more numerous or frequent sensor readings.
But the interesting thing about customers is that they change, and business goals change, and so what you record about your customer will need to adapt as you go.
So, for example, maybe at the outset you record some basic info about your customers, and as you develop more information about them you have an increasingly refined idea of what each customer is about. Maybe you acquire ideas about their habits/purchases/etc.; or maybe next year’s smartphones will be able to measure people’s heart rates or body temperature. (And who knows what Google Glass will let us do?) That information is easy to merge into your existing data about your customer, just by leveraging MongoDB’s flexible schemas. And all this works with the deep features we’ve already discussed: geo, aggregation, dynamic query, etc.