This document discusses the concept of discardable, in-memory materialized queries (DIMMQs) for Hadoop. DIMMQs involve automatically creating optimized materialized views of queries in memory to improve query performance. They allow for smarter use of increasing memory in Hadoop servers by caching query results rather than entire tables. DIMMQs are compared to Spark RDDs and classic materialized views. Implementation considerations and potential applications like the Lambda architecture are covered.
This software does not yet exist. I hope and believe that it will. I don’t believe in promoting vaporware! I want to have a discussion about where Hadoop is going so as many of us can shape it. This seems like a great time and place to have it.
My background is in database. I think like a database guy. I’ve been thinking really hard about how to bring the “good stuff” from databases over to Hadoop
I have a deep knowledge of BI, and what BI tools need from their underlying data system, having written Mondrian
Hadoop today brings a lot of brute force.
It can be more effective if it were a bit smarter. We cannot make effective use of memory unless we are a bit smarter.
Lots of memory now. But still not enough
Bringing all kinds of data into the Hadoop tent - Batch, interactive, streaming, iterative
Declarative, algebraic, transparent - Applications use different copies of data without knowing it
Making Hadoop smarter – Adding brains to compliment Hadoop’s formidable brawn