Specifying Aggregation Functions in Multidimensional Models with OCL
Jordi Cabot, Jose-Norberto Mazón, Jesús Pardillo, Juan Trujillo
École des Mines de Nantes & Universidad de Alicante
ER 2010
Introduction
  • Conceptual modeling has proved very useful in the development of data warehouse systems.
  • Main benefits of conceptual modeling:
      • Implementation-independent view of the system
      • Possibility of (semi-)automatic code generation
      • Better maintainability and evolution
      • …
  • Several proposals in this direction:
      • UML Profile for multidimensional modeling of data warehouses [Luján et al., DKE 2007]
      • Model-driven approach for development of data warehouses [Mazón & Trujillo, DSS 2008]
Conceptual Modeling of DWH (1/2)
  • Modeling multidimensional concepts at the conceptual level
  • Data is structured in a multidimensional space
  • Dimensions specify the different ways the data can be viewed, aggregated, and sorted (e.g., according to time, store, customer, product)
  • Events of interest for an analyst are represented as facts, which are associated with cells or points in the multidimensional space and described in terms of a set of measures
  • Logical details are abstracted away:
      • technology: relational, multidimensional, ...
      • logical variations: star schema, snowflake schema, ...
  • A logical representation is obtained automatically through a model-driven approach
Conceptual Modeling of DWH
  An airline’s marketing department wants to analyze the flight activity of each member of its frequent flyer program.
Conceptual Modeling of DWH (1/2)
  … once annotated with the Profile becomes …
Conceptual Modeling of DWH
… BUT (there’s always a ‘but’)
  • Right now, only the structural aspects of the DWH are modeled, but decision makers require a set of multidimensional queries
  • These multidimensional queries are not specified as part of the Conceptual Schema (CS) of the DWH
  • They are only added once the DWH is implemented
  • As a result:
      • The MDE approach is broken
      • The completeness of the DWH cannot be validated until it is implemented (i.e., does the DWH contain enough information?)
      • Defining queries requires expertise in the target platform
      • No reusability
      • …
Very time consuming and error-prone
Don’t you prefer to have the “*” operator even if “+” is enough?
Making sure these functions can be integrated into current MDD methods
We will apply these new OCL functions in combination with our UML profile for DWH modeling
The functions themselves are independent of the profile and can be used to complement any CS
OCL: Basic Concepts (2/2)
  Template for queries:
    context Class::Q(p1: T1, ..., pn: Tn): Tresult
    body: Query-ocl-expression
  Example query (total miles earned by a frequent flyer on his/her trips from Denver in a given fare class):
    context Customer::sumMiles(fc: FareClass)
    body: self.frequentFlyerLegs->select(f | f.fareClass = fc and f.origin.city.name = 'Denver').Miles->sum()
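To make the query template concrete, the following Python sketch evaluates the example query over a few in-memory objects. The class and attribute names (`Leg`, `origin_city`, `fare_class`, `miles`) and the sample data are assumptions made for illustration; the slides do not show the underlying model in this text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Leg:
    # Hypothetical stand-in for FrequentFlyerLegs; attribute names are assumed.
    origin_city: str
    fare_class: str
    miles: int


@dataclass
class Customer:
    frequent_flyer_legs: List[Leg] = field(default_factory=list)

    def sum_miles(self, fare_class: str) -> int:
        # select(f | f.fareClass = fc and f.origin.city.name = 'Denver')
        selected = [
            leg for leg in self.frequent_flyer_legs
            if leg.fare_class == fare_class and leg.origin_city == "Denver"
        ]
        # Sum the miles of the selected legs.
        return sum(leg.miles for leg in selected)


customer = Customer([
    Leg("Denver", "economy", 500),
    Leg("Denver", "business", 800),
    Leg("Boston", "economy", 300),
    Leg("Denver", "economy", 200),
])
print(customer.sum_miles("economy"))
```

Only the two economy legs that start in Denver contribute to the total; the Boston leg and the business-class leg are filtered out by the selection.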
Some examples (1/3)
  MAX: Returns the element with the highest value in a non-empty collection of objects of type T.
    context Collection::max(): T
    pre: self->notEmpty()
    post: result = self->any(e | self->forAll(e2 | e >= e2))
  COUNT DISTINCT: Returns the number of different elements in a collection.
    context Collection::countDistinct(): Integer
    post: result = self->asSet()->size()
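As a rough illustration of what these two contracts say, the Python sketch below mirrors the pre- and postconditions literally. The function names `ocl_max` and `count_distinct` are hypothetical, and the code is a reading of the OCL, not the authors' implementation.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def ocl_max(collection: Iterable[T]) -> T:
    items: List[T] = list(collection)
    # pre: self->notEmpty()
    if not items:
        raise ValueError("max() requires a non-empty collection")
    # post: result is some element e such that e >= e2 for every e2
    return next(e for e in items if all(e >= e2 for e2 in items))


def count_distinct(collection: Iterable[T]) -> int:
    # post: result = self->asSet()->size()
    return len(set(collection))


print(ocl_max([3, 9, 4]), count_distinct([1, 1, 2, 3, 3]))
```

The quadratic search in `ocl_max` follows the shape of the postcondition on purpose; in practice Python's built-in `max` returns the same value in linear time.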
Some examples (2/3)
  AVG: Returns the arithmetic average of the elements in a non-empty collection.
    context Collection::avg(): Real
    pre: self->notEmpty()
    post: result = self->sum() / self->size()
  COVARIANCE: Returns the covariance between two ordered sets.
    context OrderedSet::covariance(Y: OrderedSet): Real
    pre: self->size() = Y->size() and self->notEmpty()
    post:
      let avgY: Real = Y->avg() in
      let avgSelf: Real = self->avg() in
      result = (1 / self->size()) *
        self->iterate(e; acc: Real = 0 | acc +
          ((e - avgSelf) * (Y->at(self->indexOf(e)) - avgY)))
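The covariance postcondition can be read as the population covariance, (1/n) · Σ (xᵢ − avgX)(yᵢ − avgY). The Python sketch below follows that reading; the function names are hypothetical, and pairing elements with `zip` is an assumption that matches `Y->at(self->indexOf(e))` because an ordered set contains no duplicates.

```python
from typing import Sequence


def avg(values: Sequence[float]) -> float:
    # pre: self->notEmpty()
    if not values:
        raise ValueError("avg() requires a non-empty collection")
    # post: result = self->sum() / self->size()
    return sum(values) / len(values)


def covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    # pre: self->size() = Y->size() and self->notEmpty()
    if len(xs) != len(ys) or not xs:
        raise ValueError("covariance() requires two non-empty collections of equal size")
    avg_x, avg_y = avg(xs), avg(ys)
    acc = 0.0
    # iterate(e; acc = 0 | acc + (e - avgSelf) * (Y->at(indexOf(e)) - avgY))
    for x, y in zip(xs, ys):
        acc += (x - avg_x) * (y - avg_y)
    return acc / len(xs)


print(covariance([1, 2, 3], [2, 4, 6]))
```

For the sample input the averages are 2 and 4, the products of deviations are 2, 0 and 2, and the result is therefore 4/3.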
Some examples (3/3)
  MODE: Returns the most frequent value in a collection.
    context Collection::mode(): T
    pre: self->notEmpty()
    post: result = self->any(e | self->forAll(e2 | self->count(e) >= self->count(e2)))
  DESCENDING RANK: Returns the position (i.e., ranking) of an element within a collection.
    context Collection::rankDescending(e: T): Integer
    pre: self->includes(e)
    post: result = self->size() - self->select(e2 | e >= e2)->size() + 1
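A small Python sketch of these two contracts may help when checking the rank formula by hand. The function names are hypothetical. Note one assumption: when several values are equally frequent, OCL's `any` may return any of them, whereas this sketch returns the first one encountered.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mode(collection: Sequence[T]) -> T:
    items: List[T] = list(collection)
    # pre: self->notEmpty()
    if not items:
        raise ValueError("mode() requires a non-empty collection")
    # post: count(result) >= count(e2) for every e2 in the collection
    return max(items, key=items.count)


def rank_descending(collection: Sequence[T], e: T) -> int:
    items: List[T] = list(collection)
    # pre: self->includes(e)
    if e not in items:
        raise ValueError("rank_descending() requires e to be in the collection")
    # post: size - |{e2 | e >= e2}| + 1
    return len(items) - sum(1 for e2 in items if e >= e2) + 1


print(mode(["a", "b", "b", "c"]), rank_descending([10, 20, 30], 30))
```

With ties, the formula gives all tied elements the best rank of the group: in `[10, 20, 20]` both occurrences of 20 rank first and 10 ranks third.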
Using our new aggregate functions
  • Our functions can be used wherever a standard OCL function can be used
  • They are called in exactly the same way
  • Example: using the avg function to compute the average number of miles earned by a customer per flight leg:
    context Customer::avgMilesPerFlightLeg(): Real
    body: self.frequentFlyerLegs.Miles->avg()
MDD of our “enriched” DWH CSs
  • To be useful, CSs that use our new aggregate functions must be usable as input to MDD processes and tools
  • Current MDD methods do NOT need to be extended to cope with enriched CSs:
      • Our library is written in OCL itself (platform-independent)
      • Complex functions can be reduced to standard OCL functions
  • Two scenarios, depending on whether the target implementation platform directly supports our function
  • If it does not, our functions must be preprocessed to re-express them in terms of standard OCL operations
  • Existing OCLtoX (X = Java, SQL, …) tools can help in the process
MDD Scenario 1: Direct implementation
    context Customer::avgMilesPerFlightLeg(): Real
    body: self.frequentFlyerLegs.Miles->avg()
  (a) DBMS code:
    create view AvgMilesFlight as
      select avg(l.miles)
      from customer c, frequentflyerlegs l
      where c.id = l.customer
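The direct-implementation scenario can be tried out with an in-memory SQLite database from Python, as sketched below. The table layout and sample rows are assumptions, and the view differs from the slide in one labeled respect: a `GROUP BY` clause is added so that the view returns one average per customer, which is how the per-customer OCL operation is read here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal schema; the slides do not show the real tables.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE frequentflyerlegs (
        id INTEGER PRIMARY KEY,
        customer INTEGER REFERENCES customer(id),
        miles INTEGER
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO frequentflyerlegs (customer, miles) VALUES
        (1, 100), (1, 300), (2, 500);
    -- Assumption: GROUP BY added to get one row per customer.
    CREATE VIEW AvgMilesFlight AS
        SELECT c.id AS customer_id, AVG(l.miles) AS avg_miles
        FROM customer c JOIN frequentflyerlegs l ON c.id = l.customer
        GROUP BY c.id;
""")

# The built-in AVG aggregate plays the role of the OCL avg() function.
rows = conn.execute(
    "SELECT customer_id, avg_miles FROM AvgMilesFlight ORDER BY customer_id"
).fetchall()
print(rows)
```

Because the DBMS already provides `AVG`, no unfolding of the OCL function is needed in this scenario.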
MDD Scenario 2: Normalization/unfolding
    context Customer::avgMilesPerFlightLeg(): Real
    body: self.frequentFlyerLegs.Miles->avg()
  is unfolded into standard OCL operations:
    context Customer::avgMilesPerFlightLeg(): Real
    post: result = self.frequentFlyerLegs.Miles->sum() / self.frequentFlyerLegs.Miles->size()
  (b) Java code:
    class Customer {
      int id;
      String name;
      Vector<FrequentFlyerLegs> f;
      ...
      public float avgMiles() {
        // Cast to float so an integer sum is not truncated by integer division
        return (float) sumMiles(f) / f.size();
      }
    }
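The unfolding step can be pictured as a rewrite of the OCL expression before it is handed to an existing OCL-to-code tool. The Python sketch below does this with a regular expression over the expression text. This is a deliberately naive assumption for illustration: the function name `unfold_avg` is hypothetical, only plain navigation paths such as `self.a.b` are handled, and a real tool would more plausibly rewrite the parsed expression tree.

```python
import re

# Matches a plain navigation path (e.g. self.a.b) followed by ->avg().
_AVG_CALL = re.compile(r"([\w.]+)->avg\(\)")


def unfold_avg(expression: str) -> str:
    """Rewrite every `X->avg()` into `X->sum() / X->size()`."""
    return _AVG_CALL.sub(r"\1->sum() / \1->size()", expression)


print(unfold_avg("self.frequentFlyerLegs.Miles->avg()"))
```

Expressions that do not call `avg()` pass through unchanged, so the rewrite can be applied to a whole set of queries before code generation.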
Validation
  • Our OCL extension has been validated using the UML-based Specification Environment (USE) tool
  • Our functions have been added to USE as new user-defined functions
  • Two-phase analysis:
      • Syntactic analysis: USE parses the OCL operations and checks their syntactic correctness
      • Semantic analysis: USE executes the operations on sample scenarios; by analyzing the results, we can check whether the operations behave as expected
Validation
Conclusions
  • Complex aggregation functions should be part of the predefined constructs provided by modeling languages
  • We made this possible by extending OCL
  • Queries written with this “extended OCL” can be animated and validated at design time and automatically implemented along with the rest of the DWH CS
Further Work
  • Providing mechanisms for defining/validating multidimensional queries at the conceptual level in a more intuitive manner
      • Natural language OCL <-> Semantics of Business Vocabulary and Business Rules (SBVR) [Cabot et al., Inf. Syst. 2010]
  • Verifying the proper use of the aggregation function chosen by the designer
      • The kind of aggregation function to be applied depends on the kind of measure and the kind of dimension
      • E.g., temperatures cannot be aggregated along the time or location dimension
Continuing the discussion
  jtrujillo@dlsi.ua.es
  jordi.cabot@inria.fr
  http://modeling-languages.com
  @softmodeling
