Implementation of multidimensional databases with
document-oriented NoSQL
M. Chevalier, M. El Malki¹, A. Kopliku, O. Teste, R. Tournier
Université de Toulouse, IRIT 5505, 118 Route de Narbonne, 31062 Toulouse, France
{Firstname.Lastname}@irit.fr
Abstract. NoSQL (Not Only SQL) systems are becoming popular due to
known advantages such as horizontal scalability and elasticity. In this paper, we
study the implementation of data warehouses with document-oriented NoSQL
systems. We propose mapping rules that transform the multidimensional data
model to logical document-oriented models. We consider three different logical
translations and we use them to instantiate multidimensional data warehouses.
We focus on data loading, model-to-model conversion and cuboid computation.
1 Introduction
NoSQL solutions have proven clear advantages over relational database management
systems (RDBMS) [14]. Nowadays, research attention has shifted towards using these
systems to store and analyze "big" data. This work continues our previous work on
the use of NoSQL solutions for data warehousing [3] and joins substantial ongoing
work [6] [9] [15]. In this paper, we focus on one class of NoSQL stores, namely
document-oriented systems [7].
Document-oriented systems are one of the best-known families of NoSQL systems.
Data is stored in collections, which contain documents. Each document is composed
of key-value pairs, and a value can itself be a nested sub-document. Document-
oriented stores enable more flexibility in schema design: they allow the storage of
complex structured data and of heterogeneous data in one collection. Although
document-oriented databases are described as "schemaless" (no schema required),
most uses converge to an implicit data model.
When it comes to data warehouses, previous work has shown that they can be instantiated
with different logical models [10]. We recall that data warehousing relies mostly
on the multidimensional data model. The latter is a conceptual model², and we need to
map it to document-oriented logical models. Mapping the multidimensional model to
relational databases is quite straightforward, but until now no work (except our
previous work [3]) has considered the direct mapping from the multidimensional
conceptual model to NoSQL logical models (Fig. 1). NoSQL models support more
complex data structures than the relational model: we are not limited to describing
data and relations with atomic attributes, and the data structure is flexible (e.g.
nested elements). In this context, more than one logical model is a candidate for
mapping the multidimensional model. Moreover, evolving needs may demand switching
from one model to another. This is the scope of our work: NoSQL logical models
and their use for multidimensional data warehousing.
¹ This study is supported by ANRT funding under the CIFRE-Capgemini partnership.
Fig. 1. Translations of a conceptual multidimensional model into logical models.
In this paper, we focus on multidimensional data models for data warehousing. We
compare three translations of the conceptual model at the logical document-oriented
level. We provide a formalism that enables us to define the mapping from the
conceptual model to the logical model. Then, we show how we can instantiate data
warehouses in document-oriented systems. Our study covers data loading,
model-to-model conversion and the computation of pre-aggregated OLAP cuboids.
Our motivations are multiple. The implementation of OLAP systems with NoSQL
systems is a new alternative [14] [6], justified by advantages such as greater
scalability. The increasing scientific research in this direction calls for
formalization, commonly agreed models and the evaluation of different NoSQL systems.
The paper is organized as follows. Section 2 reviews the state of the art. In
section 3, we formalize the multidimensional data model and OLAP cuboids. Then,
² The conceptual level describes the data in a generic way, regardless of the
information technology, whereas the logical level uses a specific technique to
implement the conceptual level.
we focus on the formalisms and definitions of document-oriented models (Section 4).
Section 5 reports our experiments.
2 State of the art
Considerable research has focused on translating data warehousing concepts to the
relational R-OLAP logical level [2] [5]. Multidimensional databases are mostly
implemented using RDBMS technologies. Mapping rules are used to convert structures
of the conceptual level (facts, dimensions and hierarchies) into a logical model based
on relations. Moreover, many researchers [1] have focused on implementing
optimization methods based on pre-computed aggregates (also called materialized
views, or OLAP cuboids). However, R-OLAP implementations struggle to scale up
to very large data volumes (i.e. "Big Data"). Research is currently under way on
new solutions such as using NoSQL systems [14]. Our approach aims at revisiting
these processes to automatically implement multidimensional conceptual models
directly into NoSQL models.
Other studies investigate the process of transforming relational databases into a
NoSQL logical model (bottom part of Fig. 1). In [12], an algorithm is introduced
for mapping a relational schema to a NoSQL schema in MongoDB [7], a document-
oriented NoSQL database. However, these approaches do not consider the conceptual
model of data warehouses: they are limited to the logical level, i.e. transforming
a relational model into a document-oriented model. In [11], the author proposes
an approach to optimize schemas in NoSQL.
There is currently no approach that automatically and directly transforms a multidi-
mensional conceptual model into a NoSQL logical model. It is possible to transform
multidimensional conceptual models into a logical relational model, and then to
transform this relational model into a logical NoSQL model. However, this
transformation using the relational model as a pivot has not been formalized, as
both transformations were studied independently of each other. The work presented
here is a continuation of our previous work, where we study and formalize the
implementation of data warehouses with NoSQL systems [3]. Our previous work
considers two NoSQL models (one column-oriented and one document-oriented). This
article focuses only on document-oriented systems; we analyze three data models
(instead of one); we consider all cross-model mappings; we improve the formalization
and we provide new experiments.
3 Multidimensional conceptual model and OLAP cuboids
3.1 Conceptual Multidimensional Model
To ensure robust translation rules we first define the multidimensional model used at
the conceptual level [8] [12].
A multidimensional schema, namely E, is defined by (F^E, D^E, Star^E) where:
─ F^E = {F1,…, Fn} is a finite set of facts,
─ D^E = {D1,…, Dm} is a finite set of dimensions,
─ Star^E: F^E → 2^(D^E) is a function that associates each fact of F^E with the
set of dimensions along which it can be analyzed (2^(D^E) is the power set of D^E).
A dimension, denoted Di ∈ D^E (abusively noted D), is defined by (N^D, A^D, H^D) where:
─ N^D is the name of the dimension,
─ A^D = {a1^D,…, au^D} ∪ {id^D, All^D} is a set of dimension attributes,
─ H^D = {H1^D,…, Hv^D} is a set of hierarchies.
A hierarchy, denoted Hi ∈ H^D, is defined by (N^Hi, Param^Hi, Weak^Hi) where:
─ N^Hi is the name of the hierarchy,
─ Param^Hi = <id^D, p1^Hi,…, pvi^Hi, All^D> is an ordered set of vi+2 attributes,
called the parameters of the relevant graduation scale of the hierarchy, with
∀k ∈ [1..vi], pk^Hi ∈ A^D,
─ Weak^Hi: Param^Hi → 2^(A^D − Param^Hi) is a function associating each parameter
with possibly one or more weak attributes.
A fact, denoted F ∈ F^E, is defined by (N^F, M^F) where:
─ N^F is the name of the fact,
─ M^F = {f1(m1^F),…, fv(mv^F)} is a set of measures, each associated with an
aggregation function fi.
3.2 The OLAP cuboid
The pre-aggregated view, or OLAP cuboid, corresponds to a subset of aggregated
measures on a subset of analysis dimensions. OLAP cuboids are often pre-computed
to make frequent analyses of the data more efficient: typically, we pre-compute
aggregation functions on measures of interest, grouping on some analysis dimensions.
The OLAP cuboid O = (F^O, D^O) derived from E is formally composed of:
─ F^O = (N^Fo, M^Fo), a fact derived from F ∈ F^E, with N^Fo = N^F and M^Fo a
subset of the measures,
─ D^O = Star^E(F^O) ⊆ D^E, a subset of the dimensions.
If we generate OLAP cuboids on all combinations of dimension attributes, we obtain
an OLAP cube lattice.
Illustration: Consider an excerpt of the star schema benchmark [12]. It models the
monitoring of a sales system: orders are placed by customers and the lines of the
orders are analyzed. A line consists of a part (a product) bought from a supplier
and sold to a customer on a specific date. The conceptual schema of this case study
is presented in Fig. 2, where F^SSB = {F^LineOrder}, D^SSB = {D^Customer, D^Part,
D^Date, D^Supplier} and Star^SSB(F^LineOrder) = {D^Customer, D^Part, D^Date,
D^Supplier}.
Fig. 2. Graphical notations of the multidimensional conceptual model
From this schema, called E^SSB, we can define cuboids, for instance:
─ (F^LineOrder, {D^Customer, D^Date, D^Supplier}),
─ (F^LineOrder, {D^Customer, D^Date}).
4 Document-oriented modeling of multidimensional data
warehouses
4.1 Formalism for document-oriented data models
In the document-oriented model, data is stored in collections of documents. The struc-
ture of documents is defined by attributes. We distinguish simple attributes whose
values are atomic from compound attributes whose values are documents called
nested documents or sub-documents. The structure of a document is described as a
set of paths in the document tree. The following example illustrates this formalism:
consider the document tree in Fig. 3, describing a document di of the collection C
with identifier id:vi.
Fig. 3. Tree-like representation of documents
A document C[id:vi] is defined by:
- C: the collection the document belongs to. We use the notation C[id:vi] to
denote a document of collection C with identifier id:vi.
- P: the set of all paths in the document tree. A path p is written
p=C[id:vi]{k1:k2:…:kn:a}, where k1, k2, …, kn are keys along the same path, ending
at a leaf node holding the value of a simple attribute a.
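The set P of paths can be computed by walking the document tree. A minimal sketch (the document content is a small illustrative excerpt of Fig. 3):

```python
def paths(doc, prefix=()):
    """Enumerate all paths k1:k2:...:kn:a of a document tree as tuples,
    recursing into compound attributes and stopping at atomic leaves."""
    for key, value in doc.items():
        if isinstance(value, dict):          # compound attribute: recurse
            yield from paths(value, prefix + (key,))
        else:                                # simple attribute: emit full path
            yield prefix + (key, value)

di = {"Customer": {"Name": "M. Smith", "City": "Toulouse"},
      "LineOrder": {"Quantity": 10}}

# Each emitted tuple corresponds to a path C[id:vi]{k1:...:kn:a}.
all_paths = set(paths(di))
```

For instance, the tuple ("Customer", "Name", "M. Smith") corresponds to the path notation C[id:vi]{Customer:Name:M. Smith}.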
4.2 Document-oriented models for data warehousing
In document-oriented stores, the data model is determined not only by attributes
and values, but also by the paths to the data. In relational models, the mapping
from conceptual to logical is quite direct; in document-oriented stores, there are
multiple candidate models that differ in their collections and document structure.
No model has been proven better than the others, and no mapping rules are widely
accepted. In this section, we present three approaches to logical document-oriented
modeling. These models correspond to a broad classification of possible approaches
for modeling a multidimensional cuboid (one fact and its star). In the first
approach, we store all cuboid data in one collection in a flat format (without
sub-documents). In the second, we make use of nesting (embedded sub-documents)
within one collection (richer expressivity). In the third, we distribute the cuboid
data across more than one collection. These approaches are described below.
(Fig. 3 shows, in both tree and path notation, a document di with sub-documents
Customer {Customer: C02265, Name: M. Smith, City: Toulouse, Region: Midi-Pyrénées,
Nation: France}, Part {Partkey: P09878, Prod_Name: Copper valve c3, Size: 10x15x10,
Brand: B3, Type: Connector, Category: Plumbing}, Date {Date: 03-31-2015,
Month: 03-2015, Month_Name: March, Year: 2015}, Supplier {Supplier: SP015678,
Name: CPR Int., City: Madrid, Region: Center Spain, Nation: Spain} and LineOrder
{Quantity: 10, Discount: 0, Revenue: 3.5789, Tax: 2.352}; in path notation, e.g.
C[id:vi]{Customer:Name:M. Smith}.)
MLD0: For a given fact, all related dimension attributes and all measures are
combined in one document at depth 0 (no sub-documents). We call this the flat
model, noted MLD0.
MLD1: For a given fact, the attributes of each dimension are nested under the
respective dimension name, and all measures are nested in a sub-document with key
"measures". This model is inspired by [3]. Note that there are different ways to
nest data; this is just one of them.
MLD2: For a given fact and its dimensions, we store data in dedicated collections:
one per dimension and one for the fact. Each collection is kept simple, with no
sub-documents; the fact documents hold references to the dimension documents. We
call this model MLD2 (or shattered). It has known advantages, such as lower memory
usage and better data integrity, but it can slow down querying.
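The three models can be illustrated on one line order; the field names and values below are illustrative, not prescribed by the paper. The same information yields one flat document (MLD0), one nested document (MLD1), or one fact document referencing dimension documents (MLD2):

```python
# MLD0: flat, depth 0 -- dimension attributes and measures side by side.
mld0 = {"_id": 1, "c_name": "M. Smith", "c_city": "Toulouse",
        "d_date": "03-31-2015", "quantity": 10, "revenue": 3.5789}

# MLD1: one collection, attributes nested per dimension name, measures
# under a dedicated "measures" sub-document.
mld1 = {"_id": 1,
        "Customer": {"name": "M. Smith", "city": "Toulouse"},
        "Date": {"date": "03-31-2015"},
        "measures": {"quantity": 10, "revenue": 3.5789}}

# MLD2: shattered -- flat fact documents hold references (dimension ids)
# to separate dimension collections.
customers = {"C02265": {"name": "M. Smith", "city": "Toulouse"}}
dates = {"03-31-2015": {"month": "03-2015", "year": 2015}}
mld2_fact = {"_id": 1, "customer": "C02265", "date": "03-31-2015",
             "quantity": 10, "revenue": 3.5789}

# Resolving a reference in MLD2 requires an application-side join:
assert customers[mld2_fact["customer"]]["city"] == "Toulouse"
```

The last line hints at why MLD2 saves space (no repeated customer data per line order) but slows down querying: every dimension attribute access goes through a lookup.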
4.3 Mapping from the conceptual model
The formalism defined earlier allows us to define a mapping from the conceptual
multidimensional model to each of the logical models defined above. Let
O = (F^O, D^O) be a cuboid of a multidimensional model for the fact F^O (M^O is
its set of measures) with dimensions in D^O; N^F and N^D stand for fact and
dimension names. Let C be a generic collection, C^D a collection for the dimension
D and C^F a collection for the fact F^O. Table 1 summarizes how any measure m of
F^O and any dimension of D^O is mapped into each of the models MLD0, MLD1 and
MLD2; the mappings are detailed below.
Conceptual Model to MLD0: To instantiate this model from the conceptual model,
these rules are applied:
─ Each cuboid O (F^O and its dimensions D^O) is translated into a collection C.
─ Each measure m ∈ M^F is translated into a simple attribute (i.e. C[id]{m}).
─ For every dimension D ∈ D^O, each attribute a ∈ A^D of the dimension D is
converted into a simple attribute of C (i.e. C[id]{a}).
Conceptual Model to MLD1: To instantiate this model from the conceptual model,
these rules are applied:
─ Each cuboid O (F^O and its dimensions D^O) is translated into a collection C.
─ The attributes of the fact F^O are nested in a dedicated sub-document C[id]{N^F};
each measure m ∈ M^F is translated into a simple attribute C[id]{N^F:m}.
─ For any dimension D ∈ D^O, its attributes are nested in a dedicated sub-document
C[id]{N^D}; every attribute a ∈ A^D of the dimension D is mapped into a simple
attribute C[id]{N^D:a}.
Conceptual Model to MLD2: To instantiate this model from the conceptual model,
these rules are applied:
─ For each cuboid O (F^O and its dimensions D^O), the fact F^O is translated into
a collection C^F and each dimension D ∈ D^O into a collection C^D.
─ Each measure m ∈ M^F is translated within C^F as a simple attribute (i.e.
C^F[id']{m}).
─ For every dimension D ∈ D^O, each attribute a ∈ A^D of the dimension D is mapped
into C^D as a simple attribute (i.e. C^D[id]{a}); if a = id^D, the fact document of
C^F is completed with a simple attribute C^F[id']{a} (the value referencing the
linked dimension document).
Table 1. Mapping rules from the conceptual model to the logical models

Conceptual         | MLD0         | MLD1             | MLD2
∀D ∈ D^O, ∀a ∈ A^D | a → C[id]{a} | a → C[id]{N^D:a} | a → C^D[id]{a};
                   |              |                  | id^D → C^F[id']{a} (*)
∀m ∈ M^F           | m → C[id]{m} | m → C[id]{N^F:m} | m → C^F[id']{m}

(*) The two identifiers C^F[id'] and C^D[id] differ from each other because the
collections C^D and C^F are not the same.
5 Experiments
Our experimental goal is to validate the instantiation of data warehouses with the
three approaches described earlier. We then consider converting data from one model
to another. Finally, we generate OLAP cuboids and compare the effort required per
model. We rely on the SSB+ benchmark, which is popular for generating decision-
support data, and on MongoDB, one of the most popular document-oriented systems,
as the data store.
5.1 Protocol
Data: We generate data using the SSB+ [4] benchmark. The benchmark models a
simple product retail scenario. It contains one fact table "LineOrder" and 4
dimensions: "Customer", "Supplier", "Part" and "Date". This corresponds to a star
schema. The dimensions are hierarchical, e.g. "Date" has the hierarchy of attributes
[d_date, d_month, d_year]. We have extended the benchmark to generate raw data
specific to our models in JSON file format, which is convenient for our experimental
purposes since JSON is the preferred file format for MongoDB data loading. We use
different scale factors, namely sf=1, sf=10, sf=25 and sf=100, in our experiments.
The scale factor sf=1 generates approximately 10^7 lines for the LineOrder fact,
sf=10 approximately 10^8 lines, and so on. In the MLD2 model we thus have
sf × 10^7 lines for LineOrder and considerably fewer for the dimensions.
Data loading: Data is loaded into MongoDB using native instructions, which are
supposed to load data faster from files. The current version of MongoDB could not
load data matching our logical models from CSV files, so we had to use JSON files.
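The file format used for loading can be sketched as follows: documents are written one JSON object per line, the line-delimited form accepted by MongoDB's mongoimport tool. The file name and fields below are illustrative:

```python
import json
import os
import tempfile

# A few hypothetical line-order documents (flat, MLD0-style).
docs = [{"customer": "C%05d" % i, "quantity": i % 50, "revenue": 3.5 * i}
        for i in range(3)]

# One JSON document per line, the format consumed by e.g.:
#   mongoimport --db ssb --collection lineorder --file lineorder.json
path = os.path.join(tempfile.mkdtemp(), "lineorder.json")
with open(path, "w") as f:
    for d in docs:
        f.write(json.dumps(d) + "\n")

# Reading the file back confirms the round trip.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
assert loaded == docs
```

Line-delimited JSON keeps each document independent, so a loader can stream the file without parsing it as one large array.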
Lattice computation: To compute the pre-aggregate lattice, we use the aggregation
pipeline, suggested by MongoDB itself as the best-performing alternative. Four
levels of pre-aggregates are computed on top of the benchmark-generated data.
Precisely, we aggregate data respectively on: the combination of all 4 dimensions,
all combinations of 3 dimensions, all combinations of 2 dimensions, all combinations
of 1 dimension, and 0 dimensions (all data). At each aggregation level, we apply
the aggregation functions max, min, sum and count on all measures.
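The aggregation performed at each lattice level can be sketched in plain Python; the documents and field names are illustrative, and the function only mirrors the semantics of a MongoDB $group stage rather than invoking the database:

```python
from itertools import combinations

# A tiny hypothetical MLD0-style fact collection.
lineorders = [
    {"customer": "C1", "supplier": "S1", "quantity": 10, "revenue": 3.5},
    {"customer": "C1", "supplier": "S2", "quantity": 5,  "revenue": 1.0},
    {"customer": "C2", "supplier": "S1", "quantity": 2,  "revenue": 9.9},
]

def compute_cuboid(docs, dims, measure):
    """Group docs on `dims` and apply sum/min/max/count on `measure`,
    the aggregation functions used at each level of the lattice."""
    groups = {}
    for d in docs:
        key = tuple(d[k] for k in dims)
        groups.setdefault(key, []).append(d[measure])
    return {k: {"sum": sum(v), "min": min(v), "max": max(v), "count": len(v)}
            for k, v in groups.items()}

# One level of the lattice: all combinations of 1 dimension.
level1 = {dims: compute_cuboid(lineorders, dims, "quantity")
          for dims in combinations(("customer", "supplier"), 1)}
```

Grouping on the empty tuple of dimensions yields the apex cuboid (0 dimensions, all data), matching the bottom of the lattice described above.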
Hardware. The experiments are run on a cluster of 3 PCs (quad-core i5, 8GB RAM,
2TB disk, 1Gb/s network), each being a worker node, with one node also acting as
dispatcher.
5.2 Results
In Table 2, we summarize data loading times by model and scale factor. At scale
factor SF1, we have 10^7 lines in each line-order collection, for 4.2GB of disk
usage with MLD2 (15GB for MLD0 and MLD1). At scale factors SF10 and SF100 we have
respectively 10^8 and 10^9 lines, for 42GB (150GB for MLD0 and MLD1) and 420GB
(1.5TB for MLD0 and MLD1) of disk usage. We observe that memory usage is lower
with the MLD2 model, which is explained by the absence of redundancy in the
dimensions: the collections "Customer", "Supplier", "Part" and "Date" have
respectively 50000, 3333, 3333333 and 2556 records.
In Fig. 4, we show the time needed to convert data of one model into another at
SF1. Conversion times between MLD0 and MLD1 are comparable in both directions: to
transform data from MLD0 to MLD1 we just introduce a depth of 1 in the document,
and in the other direction (MLD1 to MLD0) we reduce the depth by one. The
conversion is more complicated between MLD0 and MLD2. To convert MLD0 data into
MLD2 we need to split the data into multiple collections: we apply 5 projections
on the original data and keep only distinct keys for the dimensions. Although we
produce less data (in memory usage), we need more processing time than when
converting to MLD1. Converting from MLD2 to MLD0 is by far the slowest process,
because most NoSQL systems (including MongoDB) do not natively support joins. We
had to test different hand-coded optimization techniques; the conversion times fall
between 5h and 125h for SF1. It might be possible to optimize this conversion
further, but the results are illustrative of the join issues in MongoDB.
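The MLD0-to-MLD2 conversion described above (projections plus distinct dimension keys) can be sketched as follows; the field names are illustrative and only one dimension is shown:

```python
# Hypothetical flat MLD0 documents.
flat = [
    {"c_id": "C1", "c_city": "Toulouse", "quantity": 10},
    {"c_id": "C1", "c_city": "Toulouse", "quantity": 5},
    {"c_id": "C2", "c_city": "Madrid",   "quantity": 2},
]

# One projection per dimension, keeping only distinct dimension keys...
customers = {}
for d in flat:
    customers.setdefault(d["c_id"], {"c_city": d["c_city"]})

# ...and one projection for the fact collection, holding references.
facts = [{"customer": d["c_id"], "quantity": d["quantity"]} for d in flat]

assert len(customers) == 2       # redundancy removed from the dimension
assert len(facts) == len(flat)   # one fact document per line order
```

The reverse direction (MLD2 to MLD0) would need a join of `facts` against each dimension, which is what makes it so costly in systems without native joins.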
Table 2. Loading times by model into MongoDB

                      | MLD0           | MLD1           | MLD2
SF=1 (10^7 lines)     | 1306s / 15GB   | 1235s / 15GB   | 1261s / 4.2GB
SF=10 (10^8 lines)    | 16680s / 150GB | 16080s / 150GB | 4320s / 42GB
SF=25 (25×10^7 lines) | 46704s / 375GB | 44220s / 375GB | 10980s / 105GB
Fig. 4. Inter-model conversion times
In Fig. 5, we summarize experimental observations concerning the computation of
the OLAP cuboids at different levels of the OLAP lattice, for SF1 and using data
in the model MLD0. We report the time needed to compute each cuboid and the number
of records it contains. We compute each cuboid from one of the "upper" cuboids in
the hierarchy with fewer records, which makes computation faster.
We observe, as expected, that the number of records decreases from one level to
the next, and the same holds for computation time. We need between 300 and 500
seconds to compute the cuboids at the first level (3 dimensions), between 30 and
250 seconds at the second level (2 dimensions), and less than one second at the
third and fourth levels (1 and 0 dimensions). OLAP computation using the model
MLD1 provides similar results. The performance is significantly lower with the
MLD2 model due to joins. These differences involve only layer 1 (depth one) of the
OLAP lattice, because the other layers can be computed from it. We do not report
these results for space reasons.
(Fig. 4 reports inter-model conversion times at SF1: 550s, 720s and 870s for the
conversions among MLD0, MLD1 and MLD2, and 5h-125h for MLD2 to MLD0.)
Fig. 5. Computation time and records by OLAP cuboid
Observations: Loading times are comparable to conversion times (except for MLD2 to
MLD0), and the times for computing OLAP cuboids are reasonable. These observations
are important: on the one hand, we show that we can instantiate data warehouses in
document-oriented systems; on the other, we can think of pivot models or
materialized views that can be computed in parallel with a chosen data model.
6 Conclusion
In this paper, we have studied the instantiation of data warehouses with document-
oriented systems. We propose three approaches at the document-oriented logical
level. Using a simple formalism, we describe the mapping from the multidimensional
conceptual data model to the logical level.
Our experimental work illustrates the instantiation of data warehouses with each
of the three approaches. Each model has its weaknesses and strengths. The shattered
model (MLD2) uses less disk space, but it is quite inefficient when it comes to
answering queries with joins. The simple models MLD0 and MLD1 do not show
significant performance differences. Passing from one model to another is shown to
be easy and comparable in time to loading the data from scratch. One conversion
performs significantly worse: the mapping from multiple collections (MLD2) to one
collection. Interesting results are also obtained in the computation of the OLAP
lattice with document-oriented models, where computation times remain reasonable.
(Fig. 5 reports, for each cuboid of the lattice from CSPD down to All, its record
count and computation time: counts range from 9392109 documents (computed in 475s)
at the top to 1 document (<1s) for All; the base collection holds 10000000
documents, for which only loading time, and no processing time, is reported.)
For future work, we will consider logical models for column-oriented and graph-
oriented systems. After exploring data warehouse instantiation across different
NoSQL systems, we need to generalize across logical models: we need a simple
formalism to express model differences, and we need to compare models within each
paradigm and across paradigms (document versus column).
References
1. Bosworth, A., Gray, J., Layman, A., Pirahesh, H.: Data cube: A relational aggregation
operator generalizing group-by, cross-tab, and sub-totals. Tech. Rep. MSR-TR-95-22,
Microsoft Research (February 1995).
2. Chaudhuri, S., Dayal, U.: An overview of data warehousing and OLAP technology.
SIGMOD Record 26, 65-74 (1997).
3. Chevalier, M., El Malki, M., Kopliku, A., Teste, O., Tournier, R.: Implementing
multidimensional data warehouses into NoSQL. 17th International Conference on
Enterprise Information Systems (April 2015).
4. Chevalier, M., El Malki, M., Kopliku, A., Teste, O., Tournier, R.: Benchmark for
OLAP on NoSQL technologies, comparing NoSQL multidimensional data warehousing
solutions. 9th Int. Conf. on Research Challenges in Information Science (RCIS),
IEEE (2015).
5. Colliat, G.: OLAP, relational, and multidimensional database systems. SIGMOD
Record 25(3), 64-69 (Sep 1996).
6. Cuzzocrea, A., Song, I.Y., Davis, K.C.: Analytics over large-scale multidimensional
data: The big data revolution! 14th International Workshop on Data Warehousing and
OLAP, pp. 101-104. DOLAP '11, ACM (2011).
7. Dede, E., Govindaraju, M., Gunter, D., Canon, R.S., Ramakrishnan, L.: Performance
evaluation of a MongoDB and Hadoop platform for scientific data analysis. 4th ACM
Workshop on Scientific Cloud Computing, pp. 13-20. Science Cloud '13, ACM (2013).
8. Dehdouh, K., Boussaid, O., Bentayeb, F.: Columnar NoSQL star schema benchmark.
In: Model and Data Engineering, vol. 8748, pp. 281-288. Springer International
Publishing (2014).
9. Golfarelli, M., Maio, D., Rizzi, S.: The dimensional fact model: A conceptual model
for data warehouses. International Journal of Cooperative Information Systems 7,
215-247 (1998).
10. Kimball, R., Ross, M.: The Data Warehouse Toolkit: The Complete Guide to
Dimensional Modeling. John Wiley & Sons, Inc., New York, NY, USA, 2nd ed. (2002).
11. Mior, M.J.: Automated schema design for NoSQL databases. SIGMOD'14 (2014).
12. O'Neil, P., O'Neil, E., Chen, X., Revilak, S.: The star schema benchmark and
augmented fact table indexing. In: Performance Evaluation and Benchmarking,
vol. 5895, pp. 237-252. Springer Berlin Heidelberg (2009).
13. Ravat, F., Teste, O., Tournier, R., Zurfluh, G.: Algebraic and graphic languages
for OLAP manipulations. IJDWM 4(1), 17-46 (2008).
14. Stonebraker, M.: New opportunities for New SQL. Commun. ACM 55(11), 10-11
(Nov 2012), http://doi.acm.org/10.1145/2366316.2366319
15. Zhao, H., Ye, X.: A practice of TPC-DS multidimensional implementation on NoSQL
database systems. In: Performance Characterization and Benchmarking, vol. 8391,
pp. 93-108 (2014).

 
A study on rough set theory based
A study on rough set theory basedA study on rough set theory based
A study on rough set theory based
 
10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil
 
A new link based approach for categorical data clustering
A new link based approach for categorical data clusteringA new link based approach for categorical data clustering
A new link based approach for categorical data clustering
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
 

Destaque

Curso mei 734 liquidos penetrantes
Curso mei 734   liquidos penetrantesCurso mei 734   liquidos penetrantes
Curso mei 734 liquidos penetrantesProcasecapacita
 
ELEMENTOS PRINCIPALES DE UN PROCESADOR
ELEMENTOS PRINCIPALES DE UN PROCESADORELEMENTOS PRINCIPALES DE UN PROCESADOR
ELEMENTOS PRINCIPALES DE UN PROCESADORAskari Brown Park
 
101 colegio de educación profesional técnica del estado de méxic1
101   colegio  de educación profesional técnica del estado de méxic1101   colegio  de educación profesional técnica del estado de méxic1
101 colegio de educación profesional técnica del estado de méxic1diiegoo234
 
ترجمة معتمدة لسفارة اوزباكستان بالقاهرة، ترجمة معتمدة اوزباكستان في مصر
ترجمة معتمدة لسفارة اوزباكستان بالقاهرة، ترجمة معتمدة اوزباكستان في مصرترجمة معتمدة لسفارة اوزباكستان بالقاهرة، ترجمة معتمدة اوزباكستان في مصر
ترجمة معتمدة لسفارة اوزباكستان بالقاهرة، ترجمة معتمدة اوزباكستان في مصرAlhayah Translation
 
PATNA PIRATES - PRO KABBADI LEAGUE ( INDIA) [ SWOT Analysis]
PATNA PIRATES - PRO KABBADI LEAGUE ( INDIA) [ SWOT Analysis]PATNA PIRATES - PRO KABBADI LEAGUE ( INDIA) [ SWOT Analysis]
PATNA PIRATES - PRO KABBADI LEAGUE ( INDIA) [ SWOT Analysis]Sneha Singh
 
Contract assignment
Contract assignmentContract assignment
Contract assignmentrainnie290
 

Destaque (9)

Curso mei 734 liquidos penetrantes
Curso mei 734   liquidos penetrantesCurso mei 734   liquidos penetrantes
Curso mei 734 liquidos penetrantes
 
ELEMENTOS PRINCIPALES DE UN PROCESADOR
ELEMENTOS PRINCIPALES DE UN PROCESADORELEMENTOS PRINCIPALES DE UN PROCESADOR
ELEMENTOS PRINCIPALES DE UN PROCESADOR
 
101 colegio de educación profesional técnica del estado de méxic1
101   colegio  de educación profesional técnica del estado de méxic1101   colegio  de educación profesional técnica del estado de méxic1
101 colegio de educación profesional técnica del estado de méxic1
 
1 i-121170 Engineering Manager
1 i-121170  Engineering Manager1 i-121170  Engineering Manager
1 i-121170 Engineering Manager
 
ترجمة معتمدة لسفارة اوزباكستان بالقاهرة، ترجمة معتمدة اوزباكستان في مصر
ترجمة معتمدة لسفارة اوزباكستان بالقاهرة، ترجمة معتمدة اوزباكستان في مصرترجمة معتمدة لسفارة اوزباكستان بالقاهرة، ترجمة معتمدة اوزباكستان في مصر
ترجمة معتمدة لسفارة اوزباكستان بالقاهرة، ترجمة معتمدة اوزباكستان في مصر
 
PATNA PIRATES - PRO KABBADI LEAGUE ( INDIA) [ SWOT Analysis]
PATNA PIRATES - PRO KABBADI LEAGUE ( INDIA) [ SWOT Analysis]PATNA PIRATES - PRO KABBADI LEAGUE ( INDIA) [ SWOT Analysis]
PATNA PIRATES - PRO KABBADI LEAGUE ( INDIA) [ SWOT Analysis]
 
Convoy
ConvoyConvoy
Convoy
 
Absorption costing
Absorption costingAbsorption costing
Absorption costing
 
Contract assignment
Contract assignmentContract assignment
Contract assignment
 

Semelhante a Dawak f v.6camera-1

Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase IJECEIAES
 
Data massage! databases scaled from one to one million nodes (ulf wendel)
Data massage! databases scaled from one to one million nodes (ulf wendel)Data massage! databases scaled from one to one million nodes (ulf wendel)
Data massage! databases scaled from one to one million nodes (ulf wendel)Zhang Bo
 
Towards a new hybrid approach for building documentoriented data wareh
Towards a new hybrid approach for building documentoriented data warehTowards a new hybrid approach for building documentoriented data wareh
Towards a new hybrid approach for building documentoriented data warehIJECEIAES
 
USING RELATIONAL MODEL TO STORE OWL ONTOLOGIES AND FACTS
USING RELATIONAL MODEL TO STORE OWL ONTOLOGIES AND FACTSUSING RELATIONAL MODEL TO STORE OWL ONTOLOGIES AND FACTS
USING RELATIONAL MODEL TO STORE OWL ONTOLOGIES AND FACTScsandit
 
Efficient Information Retrieval using Multidimensional OLAP Cube
Efficient Information Retrieval using Multidimensional OLAP CubeEfficient Information Retrieval using Multidimensional OLAP Cube
Efficient Information Retrieval using Multidimensional OLAP CubeIRJET Journal
 
Evaluation of graph databases
Evaluation of graph databasesEvaluation of graph databases
Evaluation of graph databasesijaia
 
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...ijseajournal
 
EVALUATION CRITERIA FOR SELECTING NOSQL DATABASES IN A SINGLE-BOX ENVIRONMENT
EVALUATION CRITERIA FOR SELECTING NOSQL DATABASES IN A SINGLE-BOX ENVIRONMENTEVALUATION CRITERIA FOR SELECTING NOSQL DATABASES IN A SINGLE-BOX ENVIRONMENT
EVALUATION CRITERIA FOR SELECTING NOSQL DATABASES IN A SINGLE-BOX ENVIRONMENTijdms
 
polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfRim Moussa
 
CS828 P5 Individual Project v101
CS828 P5 Individual Project v101CS828 P5 Individual Project v101
CS828 P5 Individual Project v101ThienSi Le
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbAlexander Decker
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbAlexander Decker
 
Knowledge Discovery Query Language (KDQL)
Knowledge Discovery Query Language (KDQL)Knowledge Discovery Query Language (KDQL)
Knowledge Discovery Query Language (KDQL)Zakaria Zubi
 
final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)Ankit Rathi
 
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASESTRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASEScscpconf
 
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASESTRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASEScsandit
 
A Data Warehouse Engineering Process
A Data Warehouse Engineering ProcessA Data Warehouse Engineering Process
A Data Warehouse Engineering ProcessDustin Pytko
 
SCIENTIFIC WORKFLOW CLUSTERING BASED ON MOTIF DISCOVERY
SCIENTIFIC WORKFLOW CLUSTERING BASED ON MOTIF DISCOVERYSCIENTIFIC WORKFLOW CLUSTERING BASED ON MOTIF DISCOVERY
SCIENTIFIC WORKFLOW CLUSTERING BASED ON MOTIF DISCOVERYijcseit
 

Semelhante a Dawak f v.6camera-1 (20)

Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase
 
Data massage! databases scaled from one to one million nodes (ulf wendel)
Data massage! databases scaled from one to one million nodes (ulf wendel)Data massage! databases scaled from one to one million nodes (ulf wendel)
Data massage! databases scaled from one to one million nodes (ulf wendel)
 
Towards a new hybrid approach for building documentoriented data wareh
Towards a new hybrid approach for building documentoriented data warehTowards a new hybrid approach for building documentoriented data wareh
Towards a new hybrid approach for building documentoriented data wareh
 
Dwh faqs
Dwh faqsDwh faqs
Dwh faqs
 
USING RELATIONAL MODEL TO STORE OWL ONTOLOGIES AND FACTS
USING RELATIONAL MODEL TO STORE OWL ONTOLOGIES AND FACTSUSING RELATIONAL MODEL TO STORE OWL ONTOLOGIES AND FACTS
USING RELATIONAL MODEL TO STORE OWL ONTOLOGIES AND FACTS
 
Efficient Information Retrieval using Multidimensional OLAP Cube
Efficient Information Retrieval using Multidimensional OLAP CubeEfficient Information Retrieval using Multidimensional OLAP Cube
Efficient Information Retrieval using Multidimensional OLAP Cube
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
 
Evaluation of graph databases
Evaluation of graph databasesEvaluation of graph databases
Evaluation of graph databases
 
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
 
EVALUATION CRITERIA FOR SELECTING NOSQL DATABASES IN A SINGLE-BOX ENVIRONMENT
EVALUATION CRITERIA FOR SELECTING NOSQL DATABASES IN A SINGLE-BOX ENVIRONMENTEVALUATION CRITERIA FOR SELECTING NOSQL DATABASES IN A SINGLE-BOX ENVIRONMENT
EVALUATION CRITERIA FOR SELECTING NOSQL DATABASES IN A SINGLE-BOX ENVIRONMENT
 
polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdf
 
CS828 P5 Individual Project v101
CS828 P5 Individual Project v101CS828 P5 Individual Project v101
CS828 P5 Individual Project v101
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
 
Knowledge Discovery Query Language (KDQL)
Knowledge Discovery Query Language (KDQL)Knowledge Discovery Query Language (KDQL)
Knowledge Discovery Query Language (KDQL)
 
final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)final_copy_camera_ready_paper (7)
final_copy_camera_ready_paper (7)
 
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASESTRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
 
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASESTRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
 
A Data Warehouse Engineering Process
A Data Warehouse Engineering ProcessA Data Warehouse Engineering Process
A Data Warehouse Engineering Process
 
SCIENTIFIC WORKFLOW CLUSTERING BASED ON MOTIF DISCOVERY
SCIENTIFIC WORKFLOW CLUSTERING BASED ON MOTIF DISCOVERYSCIENTIFIC WORKFLOW CLUSTERING BASED ON MOTIF DISCOVERY
SCIENTIFIC WORKFLOW CLUSTERING BASED ON MOTIF DISCOVERY
 

Último

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 

Último (20)

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 

Dawak f v.6camera-1

¹ This study is supported by ANRT funding under the CIFRE-Capgemini partnership.

We recall that data warehousing relies mostly on the multidimensional data model. The latter is a conceptual model², and we need to map it onto document-oriented logical models. Mapping the multidimensional model to relational databases is quite straightforward, but until now no work (apart from our previous work [3]) has considered a direct mapping from the multidimensional conceptual model to NoSQL logical models (Fig. 1). NoSQL models support more complex data structures than the relational model: we are not restricted to describing data and relations with atomic attributes, since the data structure is flexible (e.g. nested elements). In this context, more than one logical model is a candidate for mapping the multidimensional model, and evolving needs may require switching from one model to another. This is the scope of our work: NoSQL logical models and their use for multidimensional data warehousing.

Fig. 1. Translations of a conceptual multidimensional model into logical models (Multidimensional OLAP at the conceptual level; Relational OLAP and NoSQL OLAP at the logical level; existing transformations versus the new, direct transformation studied here).

In this paper, we focus on multidimensional data models for data warehousing. We compare three translations of the conceptual model at the logical document-oriented level. We provide a formalism that enables us to define the mapping from the conceptual model to the logical model. Then, we show how data warehouses can be instantiated in document-oriented systems. Our study covers data loading, model-to-model conversions and the computation of pre-aggregated OLAP cubes.

Our motivations are multiple. The implementation of OLAP systems with NoSQL systems is a new alternative [14] [6], justified by advantages such as greater scalability. The increasing scientific research in this direction calls for formalization, commonly agreed models and the evaluation of different NoSQL systems.

The paper is organized as follows. The next section studies the state of the art. In Section 3, we formalize the multidimensional data model and OLAP cuboids.
² The conceptual level describes the data in a generic way, regardless of the information technology, whereas the logical level uses a specific technique to implement the conceptual level.

Then,
we focus on formalisms and definitions of document-oriented models. In Section 4, we show experiments.

2 State of the art

Considerable research has focused on translating data warehousing concepts to the relational R-OLAP logical level [2] [5]. Multidimensional databases are mostly implemented using RDBMS technologies. Mapping rules are used to convert structures of the conceptual level (facts, dimensions and hierarchies) into a logical model based on relations. Moreover, many researchers [1] have focused on the implementation of optimization methods based on pre-computed aggregates (also called materialized views, or OLAP cuboids). However, R-OLAP implementations struggle to scale up to very large data volumes (i.e. "Big Data"). Research is currently under way on new solutions, such as the use of NoSQL systems [14]. Our approach aims at revisiting these processes to implement multidimensional conceptual models directly and automatically as NoSQL models.

Other studies investigate the process of transforming relational databases into a NoSQL logical model (bottom part of Fig. 1). In [12], an algorithm is introduced for mapping a relational schema to a NoSQL schema in MongoDB [7], a document-oriented NoSQL database. However, these approaches do not consider the conceptual model of data warehouses, because they are limited to the logical level, i.e. transforming a relational model into a document-oriented model. In [11], the author proposes an approach to optimize schemas in NoSQL.

There is currently no approach that automatically and directly transforms a multidimensional conceptual model into a NoSQL logical model. It is possible to transform multidimensional conceptual models into a logical relational model, and then to transform this relational model into a logical NoSQL model.
However, this transformation, which uses the relational model as a pivot, has not been formalized, as both transformations were studied independently of each other. The work presented here is a continuation of our previous work, in which we study and formalize the implementation of data warehouses with NoSQL systems [3]. Our previous work considered two NoSQL models (one column-oriented and one document-oriented). This article focuses only on document-oriented systems: we analyze three data models (with respect to Fig. 1); we consider all cross-model mappings; we improve the formalization; and we provide new experiments.
3 MULTIDIMENSIONAL CONCEPTUAL MODEL AND OLAP CUBE

3.1 Conceptual Multidimensional Model

To ensure robust translation rules, we first define the multidimensional model used at the conceptual level [8] [12].

A multidimensional schema, namely E, is defined by (F^E, D^E, Star^E) where:
─ F^E = {F1, …, Fn} is a finite set of facts,
─ D^E = {D1, …, Dm} is a finite set of dimensions,
─ Star^E : F^E → 2^(D^E) is a function that associates each fact of F^E with the set of dimensions along which it can be analyzed (2^(D^E) is the power set of D^E).

A dimension, denoted Di ∈ D^E (abusively noted D), is defined by (N^D, A^D, H^D) where:
─ N^D is the name of the dimension,
─ A^D = {a1^D, …, au^D} ∪ {id^D, All^D} is a set of dimension attributes,
─ H^D = {H1^D, …, Hv^D} is a set of hierarchies.

A hierarchy, denoted Hi ∈ H^D, is defined by (N^Hi, Param^Hi, Weak^Hi) where:
─ N^Hi is the name of the hierarchy,
─ Param^Hi = <id^D, p1^Hi, …, pvi^Hi, All^D> is an ordered set of vi+2 attributes, called the parameters of the relevant graduation scale of the hierarchy, with ∀k ∈ [1..vi], pk^Hi ∈ A^D,
─ Weak^Hi : Param^Hi → 2^(A^D − Param^Hi) is a function associating each parameter with possibly one or more weak attributes.

A fact, denoted F ∈ F^E, is defined by (N^F, M^F) where:
─ N^F is the name of the fact,
─ M^F = {f1(m1^F), …, fv(mv^F)} is a set of measures, each associated with an aggregation function fi.

3.2 The OLAP cuboid

A pre-aggregated view, or OLAP cuboid, corresponds to a subset of aggregated measures over a subset of analysis dimensions. OLAP cuboids are often pre-computed to make frequent analyses of the data more efficient: typically, we pre-compute aggregation functions on measures of interest, grouping by some analysis dimensions. The OLAP cuboid O = (F^O, D^O) derived from E is composed of:
─ F^O = (N^Fo, M^Fo), a fact derived from F ∈ F^E with N^Fo = N^F and a subset M^Fo of the measures,
─ D^O = Star(F^O) ⊆ D^E, a subset of the dimensions.
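The formalism above maps naturally onto simple data structures. The following Python sketch is purely illustrative (the class names and attribute choices are ours, not part of the paper's formalism): it encodes a fact with its measures, a dimension with a hierarchy and weak attributes, and the Star function, using an excerpt of the schema of Fig. 2.

```python
from dataclasses import dataclass, field

@dataclass
class Hierarchy:
    name: str
    params: list                              # ordered parameters <id, p1, ..., All>
    weak: dict = field(default_factory=dict)  # parameter -> list of weak attributes

@dataclass
class Dimension:
    name: str
    attributes: list      # A^D: dimension attributes, including id and All
    hierarchies: list     # H^D

@dataclass
class Fact:
    name: str
    measures: dict        # M^F: measure -> aggregation function

# Excerpt of the SSB star schema (Fig. 2)
date = Dimension(
    name="Date",
    attributes=["id", "Date", "Month", "Month_Name", "Year", "All"],
    hierarchies=[Hierarchy("HTime",
                           params=["id", "Date", "Month", "Year", "All"],
                           weak={"Month": ["Month_Name"]})])
customer = Dimension("Customer",
                     ["id", "Customer", "City", "All"],
                     [Hierarchy("HCust", ["id", "Customer", "City", "All"])])
line_order = Fact("LineOrder",
                  {"Quantity": "SUM", "Discount": "SUM",
                   "Revenue": "SUM", "Tax": "SUM"})

# Star^E: associates each fact with its analysis dimensions
star = {line_order.name: {date.name, customer.name}}
```

Here Month_Name is attached to the parameter Month as a weak attribute, matching the Weak^HTime function of the conceptual schema.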
If we generate OLAP cuboids on all combinations of dimension attributes, we obtain an OLAP cube lattice.

Illustration: Let us consider an excerpt of the star schema benchmark [12]. It models the monitoring of a sales system: orders are placed by customers, and the lines of the orders are analyzed. A line consists in a part (a product) bought from a supplier and sold to a customer at a specific date. The conceptual schema of this case study is presented in Fig. 2.

Fig. 2. Graphical notations of the multidimensional conceptual model
[Figure: the fact LineOrder (measures Quantity, Discount, Revenue, Tax) and its four dimensions Customer, Part, Date and Supplier, with their hierarchies (HCust, HBrand, HCateg, HTime, HSuppl), parameters and weak attributes.]

From this schema, called E_SSB, we have:
─ F_SSB = {F_LineOrder},
─ D_SSB = {D_Customer, D_Part, D_Date, D_Supplier},
─ Star_SSB(F_LineOrder) = {D_Customer, D_Part, D_Date, D_Supplier},

and we can define cuboids, for instance:
─ (F_LineOrder, {D_Customer, D_Date, D_Supplier}),
─ (F_LineOrder, {D_Customer, D_Date}).

4 Document-oriented modeling of multidimensional data warehouses

4.1 Formalism for document-oriented data models

In the document-oriented model, data is stored in collections of documents. The structure of documents is defined by attributes. We distinguish simple attributes, whose values are atomic, from compound attributes, whose values are documents, called
nested documents or sub-documents. The structure of a document is described as the set of paths of the document tree. The following example illustrates this formalism: consider the document tree in Fig. 3, describing a document di of a collection C with identifier id:vi.

Fig. 3. Tree-like representation of documents
[Figure: the document di shown both in tree notation and as its equivalent set of paths, e.g. C[id:vi]{Customer:Name:M. Smith}, C[id:vi]{Customer:City:Toulouse}, C[id:vi]{Part:Partkey:P09878}, C[id:vi]{LineOrder:Quantity:10}, C[id:vi]{Date:Year:2015}, C[id:vi]{Supplier:City:Madrid}.]

A document C[id:vi] is defined by:
─ C: the collection the document belongs to. We use the notation C[id:vi] to denote the document of collection C with identifier id:vi.
─ P: the set of all paths in the tree of the document. A path p is written p = C[id:vi]{k1:k2:…:kn:a}, where k1, k2, …, kn are keys within the same path, ending at a leaf node holding the value of a simple attribute a.

4.2 Document-oriented models for data warehousing

In document-oriented stores, the data model is determined not only by its attributes and values, but also by the paths to the data. In relational databases, the mapping from the conceptual to the logical model is more straightforward. In document-oriented stores, there are multiple candidate models that differ in their collections and structure; no model has been proven better than the others and no mapping rules are widely accepted. In this section, we present three approaches to logical document-oriented modeling. These models correspond to a broad classification of the possible approaches for modeling a multidimensional cuboid (one fact and its star). In the first approach, we store all cuboid data in one collection in a flat format (without sub-documents). In the second, we make use of nesting (embedded sub-documents) within one collection (richer expressivity). In the third, we distribute the cuboid data over more than one collection. These approaches are described below.
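Before detailing the three approaches, the path formalism of Section 4.1 can be illustrated with a short sketch. The `enumerate_paths` helper is our own illustration (plain Python dicts stand in for MongoDB documents); it computes the set P of paths of a document by walking its tree:

```python
# Enumerate the paths C[id:v]{k1:k2:...:a} of a document: every path goes
# from the root through compound attributes (sub-documents) to a leaf node
# holding the value of a simple attribute.

def enumerate_paths(doc, prefix=()):
    """Yield (key_path, value) pairs for every simple attribute of doc."""
    for key, value in doc.items():
        if isinstance(value, dict):          # compound attribute: recurse
            yield from enumerate_paths(value, prefix + (key,))
        else:                                # simple attribute: a leaf
            yield (prefix + (key,), value)

# A small excerpt of the document di of Fig. 3
d = {"Customer": {"Customerkey": "C02265", "Name": "M. Smith",
                  "City": "Toulouse"},
     "LineOrder": {"Quantity": 10, "Discount": 0}}

paths = dict(enumerate_paths(d))
# e.g. the path written C[id:vi]{Customer:City:Toulouse} becomes:
assert paths[("Customer", "City")] == "Toulouse"
assert paths[("LineOrder", "Quantity")] == 10
```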
MLD0: For a given fact, all related dimension attributes and all measures are combined in one document at depth 0 (no sub-documents). We call this the flat model, noted MLD0.

MLD1: For a given fact, all dimension attributes are nested under the respective attribute name and all measures are nested in a sub-document with the key "measures". This model is inspired from [3]. Note that there are different ways to nest data; this is just one of them.

MLD2: For a given fact and its dimensions, we store data in dedicated collections, one per dimension and one for the fact. Each collection is kept simple: no sub-documents. The fact documents hold references to the dimension documents. We call this model MLD2 (or shattered). This model has known advantages, such as lower memory usage and data integrity, but it can slow down querying.

4.3 Mapping from the conceptual model

The formalism defined earlier allows us to define a mapping from the conceptual multidimensional model to each of the logical models defined above. Let O = (F^O, D^O) be a cuboid of a multidimensional model for the fact F^O (with M^O its set of measures) and dimensions in D^O. N^F and N^D stand for the fact and dimension names. Let C be a generic collection, C_D a collection for a dimension D and C_F a collection for a fact F^O. Table 1 summarizes how any measure m of F^O and any dimension of D^O is mapped into each of the models MLD0, MLD1 and MLD2; the mappings are detailed below.

Conceptual Model to MLD0: To instantiate this model from the conceptual model, these rules are applied:
─ Each cuboid O (F^O and its dimensions D^O) is translated into a collection C.
─ Each measure m ∈ M^F is translated into a simple attribute (i.e. C[id]{m}).
─ For each dimension D ∈ D^O, each attribute a ∈ A^D of the dimension D is converted into a simple attribute of C (i.e. C[id]{a}).

Conceptual Model to MLD1: To instantiate this model from the conceptual model, these rules are applied:
─ Each cuboid O (F^O and its dimensions D^O) is translated into a collection C.
─ The attributes of the fact F^O are nested in a dedicated sub-document C[id]{N^F}. Each measure m ∈ M^F is translated into a simple attribute C[id]{N^F:m}.
─ For each dimension D ∈ D^O, its attributes are nested in a dedicated sub-document C[id]{N^D}. Each attribute a ∈ A^D of the dimension D is mapped into a simple attribute C[id]{N^D:a}.
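The MLD0 and MLD1 rules can be sketched as follows. The helper functions are our own illustration (plain Python dicts stand in for MongoDB documents), and we follow the Section 4.3 convention of nesting measures under the fact name N^F; attribute names come from the SSB example:

```python
# Map one fact instance and its dimension attributes to a flat MLD0
# document and to a nested MLD1 document.

def to_mld0(measures, dims):
    """MLD0: every measure and every dimension attribute at depth 0."""
    doc = dict(measures)
    for attrs in dims.values():
        doc.update(attrs)
    return doc

def to_mld1(fact_name, measures, dims):
    """MLD1: measures nested under the fact name N^F, one sub-document
    per dimension nested under the dimension name N^D."""
    doc = {fact_name: dict(measures)}
    for dim_name, attrs in dims.items():
        doc[dim_name] = dict(attrs)
    return doc

measures = {"Quantity": 10, "Revenue": 3.5789}
dims = {"Customer": {"Customerkey": "C02265", "City": "Toulouse"},
        "Date": {"Date": "03-31-2015", "Year": 2015}}

flat = to_mld0(measures, dims)
nested = to_mld1("LineOrder", measures, dims)

assert flat["City"] == "Toulouse" and flat["Quantity"] == 10  # depth 0
assert nested["Customer"]["City"] == "Toulouse"               # depth 1
assert nested["LineOrder"]["Quantity"] == 10
```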
Conceptual Model to MLD2: To instantiate this model from the conceptual model, these rules are applied:
─ For each cuboid O (F^O and its dimensions D^O), the fact F^O is translated into a collection C_F and each dimension D ∈ D^O into a collection C_D.
─ Each measure m ∈ M^F is translated within C_F as a simple attribute (i.e. C_F[id']{m}).
─ For each dimension D ∈ D^O, each attribute a ∈ A^D of the dimension D is mapped into C_D as a simple attribute (i.e. C_D[id]{a}); if a = id^D, the document of C_F is completed with a simple attribute C_F[id']{a} (the value referencing the linked dimension document).

Table 1. Mapping rules from the conceptual model to the logical models

  Conceptual         | MLD0           | MLD1              | MLD2
  -------------------|----------------|-------------------|------------------------
  ∀D∈D^O, ∀a∈A^D     | a → C[id]{a}   | a → C[id]{N^D:a}  | a → C_D[id]{a}
                     |                |                   | id^D → C_F[id']{a} (*)
  ∀m∈M^F             | m → C[id]{m}   | m → C[id]{N^F:m}  | m → C_F[id']{m} (*)

(*) The two identifiers C_F[id'] and C_D[id] are different from each other because the collections C_D and C_F are not the same.

5 Experiments

Our experimental goal is to validate the instantiation of data warehouses with the three approaches described above. Then, we consider converting data from one model to another. Finally, we generate OLAP cuboids and we compare the effort needed by model. We rely on the SSB+ benchmark, which is popular for generating data for decision support systems. As data store, we rely on MongoDB, one of the most popular document-oriented systems.

5.1 Protocol

Data: We generate data using the SSB+ benchmark [4]. The benchmark models a simple product retail reality. It contains one fact table "LineOrder" and 4 dimensions: "Customer", "Supplier", "Part" and "Date". This corresponds to a star schema. The dimensions are hierarchical, e.g. "Date" has the hierarchy of attributes [d_date,
d_month, d_year]. We have extended the benchmark to generate raw data specific to our models in the JSON file format, which is convenient for our experimental purposes: JSON is the file format best suited to MongoDB data loading. We use different scale factors, namely sf=1, sf=10, sf=25 and sf=100. The scale factor sf=1 generates approximately 10^7 lines for the LineOrder fact, sf=10 approximately 10^8 lines, and so on. In the MLD2 model we have sf x 10^7 lines for LineOrder and considerably fewer for the dimensions.

Data loading: Data is loaded into MongoDB using native instructions, which are supposed to load data faster from files. The current version of MongoDB would not load data matching our logical models from CSV files, so we had to use JSON files.

Lattice computation: To compute the pre-aggregate lattice, we use the aggregation pipeline, suggested by MongoDB itself as the most efficient alternative. Four levels of pre-aggregates are computed on top of the benchmark-generated data. Precisely, we aggregate data respectively on: the combination of all 4 dimensions; all combinations of 3 dimensions; all combinations of 2 dimensions; all combinations of 1 dimension; and 0 dimensions (all data). At each aggregation level, we apply the aggregation functions max, min, sum and count on all measures.

Hardware: The experiments are done on a cluster composed of 3 PCs (4-core i5, 8GB RAM, 2TB disk, 1Gb/s network), each being a worker node, with one node also acting as dispatcher.

5.2 Results

In Table 2, we summarize data loading times by model and scale factor. At scale factor sf=1, we have 10^7 lines in the line order collection, for 4.2GB of disk usage with MLD2 (15GB for MLD0 and MLD1). At scale factors sf=10 and sf=100 we have respectively 10^8 and 10^9 lines, and 42GB (150GB for MLD0 and MLD1) and 420GB (1.5TB for MLD0 and MLD1) of disk usage. We observe that memory usage is lower in the MLD2 model.
This is explained by the absence of redundancy in the dimensions. The collections "Customer", "Supplier", "Part" and "Date" have respectively 50000, 3333, 3333333 and 2556 records.

In Fig. 4, we show the time needed to convert data from one model into another at sf=1. Conversion times between MLD0 and MLD1 are comparable in both directions: to transform MLD0 data into MLD1 we just introduce a nesting depth of 1 in each document, and in the other direction (MLD1 to MLD0) we reduce the depth by one. The conversion is more complicated between MLD0 and MLD2. To convert MLD0 data into MLD2 we need to split the data across multiple collections:
we have to apply 5 projections on the original data and keep only distinct keys for the dimensions. Although we produce less data (in memory usage), we need more processing time than when converting to MLD1. Converting from MLD2 to MLD0 is by far the slowest process. This is due to the fact that most NoSQL systems (including MongoDB) do not natively support joins; we had to hand-code and test different optimization techniques. The conversion times fall between 5h and 125h at sf=1. It might be possible to optimize this conversion further, but the results are illustrative of the join issues in MongoDB.

Table 2. Loading times by model into MongoDB

            lines      | MLD0          | MLD1          | MLD2
  SF=1      10^7       | 1306s/15GB    | 1235s/15GB    | 1261s/4.2GB
  SF=10     10^8       | 16680s/150GB  | 16080s/150GB  | 4320s/42GB
  SF=25     25 x 10^7  | 46704s/375GB  | 44220s/375GB  | 10980s/105GB

Fig. 4. Inter-model conversion times

In Fig. 5, we summarize experimental observations concerning the computation of the OLAP cuboids at the different levels of the OLAP lattice at sf=1, using data in the MLD0 model. We report the time needed to compute each cuboid and the number of records it contains. We compute each cuboid from one of the "upper" cuboids in the hierarchy, i.e. one with fewer records, which makes the computation faster. We observe, as expected, that the number of records decreases from one level to the next, and the same holds for computation time. We need between 300 and 500 seconds to compute the cuboids of the first level (3 dimensions), between 30 and 250 seconds at the second level (2 dimensions), and less than one second for the cuboids of the third and fourth levels (1 and 0 dimensions). OLAP computation using the model MLD1 provides similar results. Performance is significantly lower with the MLD2 model due to joins; the difference involves only layer 1 (depth one) of the OLAP lattice, because the other layers can be computed from it. We do not report these results for space constraints.
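The cuboid computation described above (grouping on a subset of dimension attributes and applying max, min, sum and count) can be sketched in pure Python over MLD0-style flat documents. In MongoDB this is expressed with the aggregation pipeline ($group stage); the in-memory version below (a hypothetical `cuboid` helper, not the paper's implementation) only illustrates the logic:

```python
# Compute one cuboid: group flat documents on the given dimension
# attributes and aggregate one measure with max, min, sum and count.

from collections import defaultdict

def cuboid(docs, group_keys, measure):
    groups = defaultdict(list)
    for d in docs:
        groups[tuple(d[k] for k in group_keys)].append(d[measure])
    return {k: {"max": max(v), "min": min(v),
                "sum": sum(v), "count": len(v)}
            for k, v in groups.items()}

docs = [{"City": "Toulouse", "Year": 2015, "Quantity": 10},
        {"City": "Toulouse", "Year": 2015, "Quantity": 5},
        {"City": "Madrid",   "Year": 2015, "Quantity": 7}]

# Cuboid grouped on the single dimension attribute "City"
by_city = cuboid(docs, ["City"], "Quantity")
assert by_city[("Toulouse",)] == {"max": 10, "min": 5, "sum": 15, "count": 2}
assert by_city[("Madrid",)]["count"] == 1
```

Computing a lower cuboid from an upper one, as done in the experiments, amounts to re-grouping the already-aggregated documents on a subset of the upper cuboid's keys.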
[Figure: arrows between the models MLD0, MLD1 and MLD2, labeled with the conversion times 550s, 720s, 870s and 5h-125h.]
Fig. 5. Computation time and records by OLAP cuboid
[Figure: the lattice of cuboids from the 4-dimension cuboid CSPD (9,392,109 documents, 475s) down through the 3-, 2- and 1-dimension cuboids to All (1 document, <1s), each annotated with its record count and computation time; the base collection holds 10,000,000 documents (loading time only, no processing time).]

Observations: We observe that the time needed to load data into one model is comparable with the conversion times (except for MLD2 to MLD0). We also observe reasonable times for computing the OLAP cuboids. These observations are important: on the one hand, we show that we can instantiate data warehouses in document-oriented systems; on the other, we can think of pivot models or materialized views that can be computed in parallel with a chosen data model.

6 Conclusion

In this paper, we have studied the instantiation of data warehouses with document-oriented systems. We propose three approaches for the document-oriented logical model. Using a simple formalism, we describe the mapping from the multidimensional conceptual data model to the logical level. Our experimental work illustrates the instantiation of data warehouses with each of the three approaches. Each model has its weaknesses and strengths. The shattered model (MLD2) uses less disk space, but it is quite inefficient when it comes to answering queries with joins. The simple models MLD0 and MLD1 do not show significant performance differences. Passing from one model to another is shown to be easy and comparable in time to loading the data from scratch; one conversion, from multiple collections (MLD2) to one collection, performs significantly worse. Interesting results are also obtained in the computation of the OLAP lattice with document-oriented models: the computation times are reasonable.
For future work, we will consider logical models in column-oriented and graph-oriented systems. After exploring data warehouse instantiation across different NoSQL systems, we need to generalize across logical models: we need a simple formalism to express model differences, and we need to compare models within each paradigm and across paradigms (document versus column).

References

1. Bosworth, A., Gray, J., Layman, A., Pirahesh, H.: Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Tech. Rep. MSR-TR-95-22, Microsoft Research (February 1995).
2. Chaudhuri, S., Dayal, U.: An overview of data warehousing and OLAP technology. SIGMOD Record 26, 65-74 (1997).
3. Chevalier, M., El Malki, M., Kopliku, A., Teste, O., Tournier, R.: Implementing multidimensional data warehouses into NoSQL. 17th International Conference on Enterprise Information Systems (April 2015).
4. Chevalier, M., El Malki, M., Kopliku, A., Teste, O., Tournier, R.: Benchmark for OLAP on NoSQL technologies, comparing NoSQL multidimensional data warehousing solutions. 9th Int. Conf. on Research Challenges in Information Science (RCIS), IEEE (2015).
5. Colliat, G.: OLAP, relational, and multidimensional database systems. SIGMOD Record 25(3), 64-69 (Sep 1996).
6. Cuzzocrea, A., Song, I.Y., Davis, K.C.: Analytics over large-scale multidimensional data: The big data revolution! 14th International Workshop on Data Warehousing and OLAP (DOLAP '11), pp. 101-104, ACM (2011).
7. Dede, E., Govindaraju, M., Gunter, D., Canon, R.S., Ramakrishnan, L.: Performance evaluation of a MongoDB and Hadoop platform for scientific data analysis. 4th ACM Workshop on Scientific Cloud Computing (Science Cloud '13), pp. 13-20, ACM (2013).
8. Dehdouh, K., Boussaid, O., Bentayeb, F.: Columnar NoSQL star schema benchmark. Model and Data Engineering, vol. 8748, pp. 281-288. Springer International Publishing (2014).
9.
Golfarelli, M., Maio, D., Rizzi, S.: The dimensional fact model: A conceptual model for data warehouses. International Journal of Cooperative Information Systems 7, 215-247 (1998).
10. Kimball, R., Ross, M.: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. John Wiley & Sons, Inc., New York, NY, USA, 2nd ed. (2002).
11. Mior, M.J.: Automated schema design for NoSQL databases. SIGMOD '14 (2014).
12. O'Neil, P., O'Neil, E., Chen, X., Revilak, S.: The star schema benchmark and augmented fact table indexing. Performance Evaluation and Benchmarking, vol. 5895, pp. 237-252. Springer Berlin Heidelberg (2009).
13. Ravat, F., Teste, O., Tournier, R., Zurfluh, G.: Algebraic and graphic languages for OLAP manipulations. IJDWM 4(1), 17-46 (2008).
14. Stonebraker, M.: New opportunities for New SQL. Commun. ACM 55(11), 10-11 (Nov 2012), http://doi.acm.org/10.1145/2366316.2366319.
15. Zhao, H., Ye, X.: A practice of TPC-DS multidimensional implementation on NoSQL database systems. Performance Characterization and Benchmarking, vol. 8391, pp. 93-108 (2014).