In memory OLAP engine
1. In memory OLAP engine
Samuel Pelletier
Kaviju inc.
samuel@kaviju.com
2. OLAP?
• An acronym for OnLine Analytical Processing.
• In simple words, a system that queries a multidimensional data set
and returns answers fast enough for interactive reports.
• A well known implementation is an Excel Pivot Table.
3. Why build something new
• I wanted something fast and memory efficient for simple queries over
millions of facts.
• SQL queries do not work well for millions of facts with multiple
dimensions, especially with a large number of rows.
• There are specialized tools for OLAP from Microsoft, Oracle and
others but they are large and expensive, too much for my needs.
• Generic cheap toolkits are not memory efficient; this is the cost of
their simplicity.
• I wanted a simple solution to deploy with minimal dependency.
4. Memory usage and time to
retrieve 1 000 000 invoice lines
• Fetching EOs uses 1.2 GB of RAM in 13-19 s.
• Fetching raw rows uses 750 MB of RAM in 5-8 s.
• Fetching as POJOs with JDBC uses 130 MB in 4.0 s.
• Reading from file as POJOs uses 130 MB in 1.4 s.
• For 7 M rows, EOs would require 8.4 GB for gazillions of small
objects (bad for the GC).
5. Time to compute sum of sales for
1 000 000 invoice lines
• 2.1 s for "select sum(sales)..." in FrontBase with table in RAM.
• 0.5 s for @sum.sales on EOs.
• 0.12 s for @sum.sales on raw rows.
• 0.5 s for @sum.sales on POJOs.
• 0.009 s for a loop with direct attribute access on POJOs.
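The last figure corresponds to a plain loop that reads a field directly, with no key-value lookup per element. A minimal sketch of that kind of loop follows; the names `DirectSum`, `Line` and `sumSales` are hypothetical and are not part of the engine, and the timing it prints will differ from the numbers above.

```java
import java.util.ArrayList;
import java.util.List;

public class DirectSum {
    // Hypothetical POJO fact; the measure is a plain field read without reflection.
    static final class Line {
        final float sales;

        Line(float sales) {
            this.sales = sales;
        }
    }

    // Sum the measure with a plain loop and direct field access.
    static double sumSales(List<Line> lines) {
        double total = 0.0;
        for (Line line : lines) {
            total += line.sales;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Line> lines = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            lines.add(new Line(1.5f));
        }
        long start = System.nanoTime();
        double total = sumSales(lines);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum=" + total + " in " + elapsedMs + " ms");
    }
}
```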
6. Some concepts
• Facts are the elements being analyzed. An example is invoice
lines.
• Facts contain measures like quantities, prices or amounts.
• Facts are linked to dimensions used to filter and aggregate
them. For invoice lines, we have product, invoice, date, etc.
• Dimensions are often part of a hierarchy, for example, products
are in a product category, dates are in a month and in a week.
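These concepts can be sketched in a few lines of plain Java. This is only an illustration under assumed names (`ConceptSketch`, `Category`, `Product`, `Line`, `salesByCategory`); it does not use the engine's classes. A fact carries a measure, links to a dimension, and is aggregated one level up the hierarchy.

```java
import java.util.Map;
import java.util.TreeMap;

public class ConceptSketch {
    // Dimension entity one level up the hierarchy.
    static final class Category {
        final String name;

        Category(String name) {
            this.name = name;
        }
    }

    // Dimension entity: a product belongs to a category.
    static final class Product {
        final int code;
        final Category category;

        Product(int code, Category category) {
            this.code = code;
            this.category = category;
        }
    }

    // Fact: one invoice line, linked to a product and carrying a measure.
    static final class Line {
        final Product product;
        final double sales;

        Line(Product product, double sales) {
            this.product = product;
            this.sales = sales;
        }
    }

    // Aggregate the sales measure by the category reached through the product.
    static Map<String, Double> salesByCategory(Line[] lines) {
        Map<String, Double> totals = new TreeMap<>();
        for (Line line : lines) {
            String key = line.product.category.name;
            Double current = totals.get(key);
            totals.put(key, (current == null ? 0.0 : current) + line.sales);
        }
        return totals;
    }

    public static void main(String[] args) {
        Category tools = new Category("Tools");
        Category paint = new Category("Paint");
        Line[] lines = {
            new Line(new Product(1, tools), 10.0),
            new Line(new Product(2, tools), 20.0),
            new Line(new Product(3, paint), 5.0)
        };
        System.out.println(salesByCategory(lines));
    }
}
```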
7. Sample Invoice dimension hierarchy
[Diagram: the Invoice Line fact and its dimension hierarchy, with the nodes Invoice, Date, Month, Week, Ship to, Sold to, Client type, Product, Salesman and SalesManager.]
Measures:
Shipped Qty
Sales
Profits
8. Steps to implement an engine
• Create the Engine class.
• Create required classes to model the dimension hierarchy.
• Create the Value class for your facts.
• Create the Group class that will compute summarized results.
• Create the dimensions definition classes.
9. Engine class
• The Engine class extends OlapEngine with the Group and Value types.
public class SalesEngine extends OlapEngine<GroupEntry,Value>
• Create the objects required for the data model and the lookup tables
used to load the facts.
• Load the facts into Value objects.
• Create and register the dimensions.
10. Create required model objects
public class Product {
    public final int code;
    public final String name;
    public final ProductCategory category;

    public Product(int code, String name, ProductCategory category) {
        super();
        this.code = code;
        this.name = name;
        this.category = category;
    }
}

private void loadProducts() {
    productsByCode = new HashMap<Integer, Product>();

    WOResourceManager resourceManager = ERXApplication.application().resourceManager();
    String fileName = "olapData/products.txt";
    try ( InputStream fileData = resourceManager.inputStreamForResourceNamed(fileName, null, null);) {
        InputStreamReader fileReader = new InputStreamReader(fileData, "utf-8");
        BufferedReader reader = new BufferedReader(fileReader);
        String line;
        while ( (line = reader.readLine()) != null) {
            // Columns are tab-separated; -1 keeps trailing empty columns.
            String[] cols = line.split("\t", -1);
            Product product = new Product(Integer.parseInt(cols[0]), cols[0], categoryWithID(cols[1]));
            productsByCode.put(product.code, product);
        }
    }
    ...
}
12. Value and GroupEntry classes
• The Value class contains your basic facts (invoice lines, for example).
public class InvoiceLine extends OlapValue<Sales>
• GroupEntry is used to compute summarized results.
public class Sales extends GroupEntry<InvoiceLine>
• These are tightly coupled: a GroupEntry represents a computed
result for an array of Values; metrics are found in both classes.
13. Value Class
public class InvoiceLine extends OlapValue<Sales> {
    public Invoice invoice;
    public final short lineNumber;
    public Product product;

    public int shippedQty;
    public float sales;
    public float profits;

    public int salesmanNumber;
    public int salesManagerNumber;

    public InvoiceLine(int valueIndex, short lineNumber) {
        super(valueIndex);
        this.lineNumber = lineNumber;
    }
}
14. GroupEntry class
public class Sales extends GroupEntry<InvoiceLine> {
    private int shippedQty;
    private double sales = 0.0;
    private double profits = 0.0;

    public Sales(GroupEntryKey<Sales, InvoiceLine> key) {
        super(key);
    }

    // Accumulate the measures of one fact into this group.
    @Override
    public void addEntry(InvoiceLine entry) {
        shippedQty += entry.shippedQty;
        sales += entry.sales;
        profits += entry.profits;
    }

    @Override
    public void optimizeMemoryUsage() {
    }

    // The accessor signature was lost on the slide; getSales() is an assumed name.
    public double getSales() {
        return sales;
    }

    ...
}
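The fact class stores its measures as float while the group class accumulates them as double. The slides do not give a reason; one plausible motivation is that float keeps each fact small while a double accumulator limits rounding drift over millions of additions. The sketch below, with the hypothetical names `AccumulatorSketch`, `sumAsFloat` and `sumAsDouble`, compares the two accumulators on the same float input.

```java
public class AccumulatorSketch {
    // Add the same float value 'count' times into a float accumulator.
    static float sumAsFloat(float value, int count) {
        float total = 0f;
        for (int i = 0; i < count; i++) {
            total += value;
        }
        return total;
    }

    // Add the same float value 'count' times into a double accumulator.
    static double sumAsDouble(float value, int count) {
        double total = 0.0;
        for (int i = 0; i < count; i++) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        int count = 1_000_000;
        System.out.println("float accumulator:  " + sumAsFloat(0.1f, count));
        System.out.println("double accumulator: " + sumAsDouble(0.1f, count));
    }
}
```

Running it shows how far the float accumulator drifts from the expected total compared with the double one.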
15. Dimensions classes
• Dimensions implement the engine indexes and key extraction for
result aggregation.
• Dimensions are usually linked to another class representing an
entity like Invoice, Client, Product or ProductCategory.
• Entities are value-object POJOs for optimal speed and memory
usage. You may add a method to get the corresponding EO.
• Dimensions are either leaf (a group of facts) or group (a group of
leaf entries).
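One way to picture the leaf/group distinction is as two levels of index. This is a sketch under assumptions, not the engine's actual internals: a leaf index maps a key to fact positions, and a group index maps a parent key to leaf keys. The names `IndexSketch`, `addFact`, `addGroupEntry` and `factsForCategory` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexSketch {
    // Leaf index: product code -> positions of the facts with that product.
    final Map<Integer, List<Integer>> factsByProduct = new HashMap<>();
    // Group index: category id -> product codes (leaf keys) in that category.
    final Map<Integer, List<Integer>> productsByCategory = new HashMap<>();

    // Append a value to the list stored under 'key', creating the list if needed.
    private static void append(Map<Integer, List<Integer>> index, int key, int value) {
        List<Integer> list = index.get(key);
        if (list == null) {
            list = new ArrayList<>();
            index.put(key, list);
        }
        list.add(value);
    }

    void addFact(int factIndex, int productCode) {
        append(factsByProduct, productCode, factIndex);
    }

    void addGroupEntry(int categoryId, int productCode) {
        append(productsByCategory, categoryId, productCode);
    }

    // Resolve a category to fact positions: group index first, then leaf index.
    List<Integer> factsForCategory(int categoryId) {
        List<Integer> result = new ArrayList<>();
        List<Integer> products = productsByCategory.get(categoryId);
        if (products == null) {
            return result;
        }
        for (Integer code : products) {
            List<Integer> facts = factsByProduct.get(code);
            if (facts != null) {
                result.addAll(facts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch();
        index.addFact(0, 10);
        index.addFact(1, 11);
        index.addFact(2, 10);
        index.addGroupEntry(100, 10);
        index.addGroupEntry(100, 11);
        System.out.println(index.factsForCategory(100));
    }
}
```

With this layout the group level stores only leaf keys, so adding a hierarchy level does not require a second pass over the facts.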
16. Product dimension class
public class ProductDimension extends OlapLeafDimension<Sales,Integer,InvoiceLine> {

    public ProductDimension(OlapEngine<Sales, InvoiceLine> engine) {
        super(engine, "productCode");
    }

    // Key used to index a fact in this dimension.
    @Override
    public Integer getKeyForEntry(InvoiceLine entry) {
        return entry.product.code;
    }

    // Converts a key received as text back to the key type.
    @Override
    public Integer getKeyForString(String keyString) {
        return Integer.valueOf(keyString);
    }

    // Builds the parent dimension by mapping each category to its product codes.
    public ProductCategoryDimension createProductCategoryDimension() {
        long startTime = System.currentTimeMillis();
        ProductCategoryDimension dimension = new ProductCategoryDimension(engine, this);

        for (Product product : salesEngine().products()) {
            dimension.addIndexEntry(product.category.categoryID, product.code);
        }
        long fetchTime = System.currentTimeMillis() - startTime;
        engine.logMessage("createProductCategoryDimension completed in "+fetchTime+"ms.");
        return dimension;
    }

    private SalesEngine salesEngine() {
        return (SalesEngine) engine;
    }
}
17. Product category dimension class
public class ProductCategoryDimension extends OlapGroupDimension<Sales,Integer,InvoiceLine,ProductDimension,Integer> {

    public ProductCategoryDimension(OlapEngine<Sales, InvoiceLine> engine, ProductDimension childDimension) {
        super(engine, "productCategoryCode", childDimension);
    }

    @Override
    public Integer getKeyForEntry(InvoiceLine entry) {
        return entry.product.category.categoryID;
    }

    @Override
    public Integer getKeyForString(String keyString) {
        return Integer.valueOf(keyString);
    }
}
18. Initialize and use in an app
• Once loaded, the engine can be queried from multiple threads.
• I usually create a singleton for the engine; it can also be in your
app class.
19. Use in an application
public Application() {
    ...
    SalesEngine.createEngine();
}

In the component that uses the engine:

public OlapNavigator(WOContext context) {
    super(context);
    ...
    engine = SalesEngine.sharedEngine();
    if (engine == null) {
        // The engine may be null if it has not completed its loading...
    }
}

public void someFetchMethod() {
    OlapResult<Sales, InvoiceLine> result = engine.resultForRequest(query);

    rows = new NSArray<Sales>(result.getGroups());
    // Sort or put inside an ERXDisplayGroup...
}
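The null check above suggests that the shared engine is published only once loading has finished. A minimal sketch of that pattern follows, assuming the load runs on a background thread (the slides do not say how loading is scheduled). The class `EngineHolder`, its stand-in load and the `awaitLoaded` helper are hypothetical; only the method names `createEngine` and `sharedEngine` mirror the slide.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class EngineHolder {
    // Set only after loading completes; volatile makes the write visible to all threads.
    private static volatile EngineHolder shared;
    // Lets a caller wait for the load; used here for the demo only.
    private static final CountDownLatch LOADED = new CountDownLatch(1);

    private final int factCount;

    private EngineHolder(int factCount) {
        this.factCount = factCount;
    }

    // Start loading on a background thread so application startup is not blocked.
    // Call once at startup; a second call would start a second load.
    public static void createEngine() {
        Thread loader = new Thread(new Runnable() {
            @Override
            public void run() {
                EngineHolder engine = new EngineHolder(1000); // stand-in for the real load
                shared = engine;
                LOADED.countDown();
            }
        }, "engine-loader");
        loader.setDaemon(true);
        loader.start();
    }

    // Returns null until loading has completed.
    public static EngineHolder sharedEngine() {
        return shared;
    }

    // Wait up to timeoutMs for the load; returns false on timeout or interruption.
    public static boolean awaitLoaded(long timeoutMs) {
        try {
            return LOADED.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public int factCount() {
        return factCount;
    }

    public static void main(String[] args) {
        createEngine();
        if (awaitLoaded(5000)) {
            System.out.println("facts loaded: " + sharedEngine().factCount());
        }
    }
}
```

Because the engine is assigned to the shared field only after it is fully built, readers see either null or a complete instance.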
21. Java and memory
• To keep the garbage collector happy, it is better to have a
maximum heap at least 2-3 times the real usage.
• GC can kill your app performance if memory is starved. With
default settings, it may even kill your server by using multiple cores
for long periods, at least in Java 1.5 and 1.6.
• Java 1.7 contains a new collector, probably better.
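The 2-3 times rule can be checked from inside the application with the standard `Runtime` methods. This is a rough sketch: the class `HeapHeadroom` and the method `headroomRatio` are hypothetical names, and the used-heap figure includes garbage not yet collected, so the ratio is only an approximation.

```java
public class HeapHeadroom {
    // Ratio of the maximum heap to the heap currently in use.
    static double headroomRatio(long maxBytes, long usedBytes) {
        if (usedBytes <= 0) {
            throw new IllegalArgumentException("usedBytes must be positive");
        }
        return (double) maxBytes / (double) usedBytes;
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // Used heap = heap currently reserved by the JVM minus its free part.
        long used = runtime.totalMemory() - runtime.freeMemory();
        // maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
        long max = runtime.maxMemory();
        double ratio = headroomRatio(max, used);
        System.out.println("used=" + (used >> 20) + " MB, max=" + (max >> 20)
                + " MB, ratio=" + ratio);
        if (ratio < 2.0) {
            System.out.println("Less than 2x headroom; consider a larger -Xmx.");
        }
    }
}
```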