Scan and lookup are two core operations in main-memory column stores. A scan operation reads a column and returns a result bit vector that indicates which records satisfy a filter. Once a column scan completes, the result bit vector is converted into a list of record numbers, which is then used to look up values from the other columns of interest for the query. Several in-memory data layouts have recently been proposed to improve the performance of in-memory data processing. However, each of these solutions sits at one end of a trade-off: it offers either good lookup performance or good scan performance, but not both. In this paper we present ByteSlice, a new main-memory storage layout that supports both highly efficient scans and lookups. ByteSlice is a byte-level columnar layout that fully leverages SIMD data parallelism. Micro-benchmark experiments show that ByteSlice scans data at less than 0.5 processor cycles per column value, a new limit for main-memory data scans, without sacrificing lookup performance. Our experiments on TPC-H data and real data show that ByteSlice offers significant performance improvements over all state-of-the-art approaches.
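The scan-then-lookup pipeline described in the abstract can be sketched in a few lines of Python. This is only an illustration of the data flow (filter, bit vector, record numbers, lookup), not the ByteSlice implementation; the function names and the sample columns `price` and `name` are hypothetical.

```python
def scan_less_than(column, literal):
    """Scan a column; return a bit vector with 1 where value < literal."""
    return [1 if value < literal else 0 for value in column]


def bitvector_to_record_numbers(bits):
    """Convert a result bit vector into a list of matching record numbers."""
    return [i for i, bit in enumerate(bits) if bit]


def lookup(column, record_numbers):
    """Fetch values of another column for the given record numbers."""
    return [column[i] for i in record_numbers]


# Hypothetical two-column table, stored column by column.
price = [12, 7, 30, 5, 18]
name = ["a", "b", "c", "d", "e"]

bits = scan_less_than(price, 15)               # filter: price < 15
records = bitvector_to_record_numbers(bits)    # positions of matching rows
names = lookup(name, records)                  # values from another column
```

The scan touches every value in one column, while the lookup touches only the selected positions in another, which is why a layout tuned for one operation can hurt the other.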
Paper: http://dl.acm.org/citation.cfm?id=2747642
Source: https://github.com/fzqneo/ByteSlice
ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout
1. ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout
Ziqiang Feng†, Eric Lo†, Ben Kao‡, Wenjian Xu†
†The Hong Kong Polytechnic University
‡The University of Hong Kong
75. Padding / Space overhead?
• If a value’s width is not a multiple of 8 bits, pad 0’s at the end.
• We focus on memory bandwidth, because:
  • Adding RAM is easy
  • Increasing memory bandwidth is difficult
• Extra RAM space? Cheap.
• Extra bandwidth? Early stop helps!
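
The padding and early-stop points on this slide can be illustrated with a small scalar Python sketch. It assumes an 11-bit code as the example width and compares one value at a time; the real layout processes many values per SIMD instruction, which is not modeled here. The helper names `pad_to_bytes` and `less_than_early_stop` are hypothetical.

```python
def pad_to_bytes(value, bit_width):
    """Pad 0's at the end (low-order side) until the width is a multiple
    of 8 bits, then split into bytes, most significant byte first."""
    padded_width = -(-bit_width // 8) * 8          # round up to multiple of 8
    padded = value << (padded_width - bit_width)   # append zero bits
    return list(padded.to_bytes(padded_width // 8, "big"))


def less_than_early_stop(a_bytes, b_bytes):
    """Compare byte by byte from the most significant byte.
    Returns (a < b, number of bytes examined)."""
    for examined, (x, y) in enumerate(zip(a_bytes, b_bytes), start=1):
        if x != y:
            return x < y, examined   # outcome decided, stop early
    return False, len(a_bytes)       # all bytes equal, so not less than


# An 11-bit value and an 11-bit literal, each padded to 16 bits (5 extra bits).
value_bytes = pad_to_bytes(0b10110011101, 11)
literal_bytes = pad_to_bytes(0b01000000000, 11)
result, examined = less_than_early_stop(value_bytes, literal_bytes)
```

Because the zeros are appended on the low-order side, padded values keep the same order as the original codes. In this example the first bytes already differ, so the second byte is never read; that is the sense in which early stop offsets the bandwidth cost of the padding.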