SlideShare uma empresa Scribd logo
1 de 1
Baixar para ler offline
SAMSTAR:of Poster
A Semi-Automated Lexical Method to
Titl
generate Star Schemas from an ERD
Authors
Ritu Khare, Il Yeol Song, Yuan An
PROBLEM
To develop a star schema, existing approaches analyze the
attributes of interesting business entities.
•Entities with numerical measure attributes are assumed to be the
candidates of facts
•Entities with non-numerical and descriptive attributes are assumed
to be the candidates of dimensions.

2. Annotated Dimension Design Patterns (A_DDP)
We have referred to four sources and have instantiated the six classes of
DDP to produce a list of 131 commonly used dimension entities. We refer
to these entities as Annotated DDP (A_DDPs). These entities are
frequently used entities in the business processes. Examples include
account, activity, agent, aircraft, airport, etc.

RESULTS
Invoice

We focus on the structure of the ERD. The novel features of
SAMSTAR are:
(1) the use of the notion of Connection Topology Value (CTV) in
identifying the candidates of facts and dimensions and
(2) the use of Annotated Dimensional Design Patterns (A_DDP) as
well as WordNet to extend the list of dimensions.

Customer

ALGORITHM

Quantitative
Method
(FIND FACTS
AND DIRECT
DIMENSIONS)

DDP

Fig.2 SAMSTAR Overview

Quantitative
Method
(FIND INDIRECT
DIMENSIONS

WordNet

A_DDP

1. Connection Topology Value (CTV)

CTV (e) 1* n 0.8 *

CTV(Node(i ))
i 1

* If you are scanning charts,
cartoons, illustrations or
plain text non-photo
represents scan entity having
images), an at 600 dpi,
then ‘Save As’ at 225 dpi.

where i
a direct M:1 relationship with e.
For Fig. 1, CTV is calculated in the following manner:

A

B

E

H

weight_d=100%; weight_i=80%
D
F
C
The CTV for each entity is:
CTV (H) = 1* 0 + 0.8 * 0 = 0
CTV(F) = 0
G
CTV(G) = 0
CTV(E) = 1*1 + 0.8 * CTV(H) = 1
Fig.1: Calculation of CTV
CTV(B)= 1*1 + 0.8 * (CTV(E))= 1.8
CTV(C)=1*2 + 0.8 * (CTV(G) + CTV( F)) = 2
CTV(D)=1*1 + 0.8 * (CTV(C)) = 1 + 0.8 * 2 = 2.6
CTV(A)= 1*2 + 0.8 * (CTV(B) + CTV( C)) = 2 + 0.8 * (1.8+2) = 5.04

www.ischool.drexel.edu

Order
Time

Logistics

Shipment

Order
Customer

Area

Shipment

Supplier

Product-Supplier

Order-Product

Promotion Store

Store

SAMSTAR

Order

Invoice
Customer

Store

Promotion
Store

Customer

Promotion Type

Product

Warehouse

Time

Star Schema

n

Return

Store

ER Diagram

OUR SOLUTION

Store

Invoice

Return

Warehouse

Hence, these approaches are qualitative in nature, and focus only
on the semantics of ERD.

Time

1. Pre-process the input ERD.
2. Store Entities and Relationships.
3. Let user choose weighting factors for direct and indirect relationships.
4. Calculate the CTV for all entities.
5. Calculate the threshold value, Th, for CTV.
6. Identify entities having CTV higher than the threshold Th.
These are the candidates for fact tables.
7. Decide and shortlist the fact entities.
8. For each fact entity, perform the following steps:
(i) Identify the entities having direct M:1 link with a fact entity.
(ii) Identify entities having indirect M:1 link with the fact entity.
Out of these entities, identify synonyms of entity names from WordNet.
Extract the terms which match the DDP entity list or the A_DDP List.
(iii) Combine the results to Steps 8(i) and 8(ii) to prepare a list of candidate
dimensions for a given fact.
(iv) Add time dimension to this list.
9. Decide the dimension entities.
10. Let the user post-process the Star Schemas
11. Generate the star schema(s).

Order
Product

Order

Product

Invoice

Fig.3: Result of SAMSTAR
Schemas generated by SAMSTAR are similar to those generated
by the manual steps in case study papers.
SAMSTAR generated star schemas are the superset of the ones
generated manually in the paper using the same ER diagrams,
user needs and business goals.
This shows our schemas are inclusive of all possible facts and
dimensions. This gives the designer a helpful aid to prune the
schema as per the business and user requirements.

CONTRIBUTION
•A universal method to generate star schema(s) in that we have
used generalized DDPs and WordNet to identify dimensions of a
fact table.
•Quantitative in nature in that we analyze the structure of an input
ERD.
•Identifies a set of fact candidates from a large and complex ERD.
•Automatable up to a large extent; simplifies the work of
experienced designers; and gives a smooth head-start to novices.

The paper SAMSTAR: A Semi-Automated Lexical Method for
generating Star Schemas using an Entity Relationship
Diagram was presented at Data warehousing and On-line
Analytical Processing (DOLAP 2007) workshop held in Lisbon,
Portugal, November 2007.

Mais conteúdo relacionado

Mais procurados

Chapter 8 ror analysis for multiple alternatives
Chapter 8   ror analysis for multiple alternativesChapter 8   ror analysis for multiple alternatives
Chapter 8 ror analysis for multiple alternativesBich Lien Pham
 
Lecture # 12 measures of profitability ii
Lecture # 12 measures of profitability iiLecture # 12 measures of profitability ii
Lecture # 12 measures of profitability iiBich Lien Pham
 
Chapter 18 sensitivity analysis
Chapter 18   sensitivity analysisChapter 18   sensitivity analysis
Chapter 18 sensitivity analysisBich Lien Pham
 
ELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICSELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICSMohammedMedani4
 
ELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICSELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICSMohammedMedani4
 
Lecture # 6 cost estimation ii
Lecture # 6 cost estimation iiLecture # 6 cost estimation ii
Lecture # 6 cost estimation iiBich Lien Pham
 
Chapter 12 independent projects & budget limitation
Chapter 12   independent projects & budget limitationChapter 12   independent projects & budget limitation
Chapter 12 independent projects & budget limitationBich Lien Pham
 
5 creating a_histogram
5 creating a_histogram5 creating a_histogram
5 creating a_histogramMedia4math
 
BIometrics- plotting DET and EER curve using Matlab
BIometrics- plotting DET and EER curve using MatlabBIometrics- plotting DET and EER curve using Matlab
BIometrics- plotting DET and EER curve using MatlabShiv Koppad
 
Chapter 6 annual worth analysis
Chapter 6   annual worth analysisChapter 6   annual worth analysis
Chapter 6 annual worth analysisBich Lien Pham
 
Chapter 13 breakeven analysis
Chapter 13   breakeven analysisChapter 13   breakeven analysis
Chapter 13 breakeven analysisBich Lien Pham
 
Lecture # 4 gradients factors and nominal and effective interest rates
Lecture # 4 gradients factors and nominal and effective interest ratesLecture # 4 gradients factors and nominal and effective interest rates
Lecture # 4 gradients factors and nominal and effective interest ratesBich Lien Pham
 

Mais procurados (16)

Chapter 8 ror analysis for multiple alternatives
Chapter 8   ror analysis for multiple alternativesChapter 8   ror analysis for multiple alternatives
Chapter 8 ror analysis for multiple alternatives
 
Lecture # 12 measures of profitability ii
Lecture # 12 measures of profitability iiLecture # 12 measures of profitability ii
Lecture # 12 measures of profitability ii
 
Chapter 18 sensitivity analysis
Chapter 18   sensitivity analysisChapter 18   sensitivity analysis
Chapter 18 sensitivity analysis
 
ELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICSELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICS
 
slides-josepcoves
slides-josepcovesslides-josepcoves
slides-josepcoves
 
ELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICSELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICS
 
Lecture # 6 cost estimation ii
Lecture # 6 cost estimation iiLecture # 6 cost estimation ii
Lecture # 6 cost estimation ii
 
Chapter 12 independent projects & budget limitation
Chapter 12   independent projects & budget limitationChapter 12   independent projects & budget limitation
Chapter 12 independent projects & budget limitation
 
5 creating a_histogram
5 creating a_histogram5 creating a_histogram
5 creating a_histogram
 
Assignment in java
Assignment in javaAssignment in java
Assignment in java
 
Thesis PPT
Thesis PPTThesis PPT
Thesis PPT
 
test
testtest
test
 
BIometrics- plotting DET and EER curve using Matlab
BIometrics- plotting DET and EER curve using MatlabBIometrics- plotting DET and EER curve using Matlab
BIometrics- plotting DET and EER curve using Matlab
 
Chapter 6 annual worth analysis
Chapter 6   annual worth analysisChapter 6   annual worth analysis
Chapter 6 annual worth analysis
 
Chapter 13 breakeven analysis
Chapter 13   breakeven analysisChapter 13   breakeven analysis
Chapter 13 breakeven analysis
 
Lecture # 4 gradients factors and nominal and effective interest rates
Lecture # 4 gradients factors and nominal and effective interest ratesLecture # 4 gradients factors and nominal and effective interest rates
Lecture # 4 gradients factors and nominal and effective interest rates
 

Destaque

Para los muchos que tienen diversas opiniones
Para los muchos que tienen diversas opinionesPara los muchos que tienen diversas opiniones
Para los muchos que tienen diversas opinionescorronuevo
 
Fiscal access system using rfid
Fiscal access system using rfidFiscal access system using rfid
Fiscal access system using rfidEcwaytech
 
Presentacion seguridad del paciente
Presentacion seguridad del pacientePresentacion seguridad del paciente
Presentacion seguridad del pacienteFormando Enfermeria
 
Fast activity detection indexing for temporal stochastic automaton based acti...
Fast activity detection indexing for temporal stochastic automaton based acti...Fast activity detection indexing for temporal stochastic automaton based acti...
Fast activity detection indexing for temporal stochastic automaton based acti...Ecwaytech
 
Paz ivan
Paz ivanPaz ivan
Paz ivanarinani
 
Organizational behavior paper
Organizational behavior paperOrganizational behavior paper
Organizational behavior papercornel doyle
 
1 competencias digitales profesores
1 competencias digitales profesores1 competencias digitales profesores
1 competencias digitales profesoresTICS & Partners
 
Promotion and Pricing Strategies
Promotion and Pricing StrategiesPromotion and Pricing Strategies
Promotion and Pricing StrategiesKawser Ahmad Sohan
 
বাংলাদেশের মুক্তির সংগ্রাম
বাংলাদেশের  মুক্তির সংগ্রামবাংলাদেশের  মুক্তির সংগ্রাম
বাংলাদেশের মুক্তির সংগ্রামKawser Ahmad Sohan
 
Oracle University Certificate - Communications ASAP, OSS
Oracle University Certificate - Communications ASAP, OSSOracle University Certificate - Communications ASAP, OSS
Oracle University Certificate - Communications ASAP, OSSVijayananda Mohire
 
Event tracking for real time unaware sensitivity analysis
Event tracking for real time unaware sensitivity analysisEvent tracking for real time unaware sensitivity analysis
Event tracking for real time unaware sensitivity analysisEcwaytech
 

Destaque (20)

Para los muchos que tienen diversas opiniones
Para los muchos que tienen diversas opinionesPara los muchos que tienen diversas opiniones
Para los muchos que tienen diversas opiniones
 
Tecnologías utilizadas en la cotidianidad
Tecnologías utilizadas en la cotidianidadTecnologías utilizadas en la cotidianidad
Tecnologías utilizadas en la cotidianidad
 
Edit anaylist
Edit anaylistEdit anaylist
Edit anaylist
 
Fiscal access system using rfid
Fiscal access system using rfidFiscal access system using rfid
Fiscal access system using rfid
 
Motos
MotosMotos
Motos
 
Presentacion seguridad del paciente
Presentacion seguridad del pacientePresentacion seguridad del paciente
Presentacion seguridad del paciente
 
Gacetilla de prensa 12 06-2015
Gacetilla de prensa 12 06-2015Gacetilla de prensa 12 06-2015
Gacetilla de prensa 12 06-2015
 
Brad Pitt & Angelina Jolie
Brad Pitt & Angelina JolieBrad Pitt & Angelina Jolie
Brad Pitt & Angelina Jolie
 
Fast activity detection indexing for temporal stochastic automaton based acti...
Fast activity detection indexing for temporal stochastic automaton based acti...Fast activity detection indexing for temporal stochastic automaton based acti...
Fast activity detection indexing for temporal stochastic automaton based acti...
 
Paz ivan
Paz ivanPaz ivan
Paz ivan
 
Resume Carrie
Resume CarrieResume Carrie
Resume Carrie
 
Organizational behavior paper
Organizational behavior paperOrganizational behavior paper
Organizational behavior paper
 
1 competencias digitales profesores
1 competencias digitales profesores1 competencias digitales profesores
1 competencias digitales profesores
 
시디론Ppt
시디론Ppt시디론Ppt
시디론Ppt
 
ASNT_Level_III_UT
ASNT_Level_III_UTASNT_Level_III_UT
ASNT_Level_III_UT
 
USAFBannual_2014
USAFBannual_2014USAFBannual_2014
USAFBannual_2014
 
Promotion and Pricing Strategies
Promotion and Pricing StrategiesPromotion and Pricing Strategies
Promotion and Pricing Strategies
 
বাংলাদেশের মুক্তির সংগ্রাম
বাংলাদেশের  মুক্তির সংগ্রামবাংলাদেশের  মুক্তির সংগ্রাম
বাংলাদেশের মুক্তির সংগ্রাম
 
Oracle University Certificate - Communications ASAP, OSS
Oracle University Certificate - Communications ASAP, OSSOracle University Certificate - Communications ASAP, OSS
Oracle University Certificate - Communications ASAP, OSS
 
Event tracking for real time unaware sensitivity analysis
Event tracking for real time unaware sensitivity analysisEvent tracking for real time unaware sensitivity analysis
Event tracking for real time unaware sensitivity analysis
 

Semelhante a SAMSTAR: A Semi-automated Lexical Method to generate Star Schemas from an ERD

Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence PortfolioChris Seebacher
 
Patterns of Enterprise Application Architecture (by example)
Patterns of Enterprise Application Architecture (by example)Patterns of Enterprise Application Architecture (by example)
Patterns of Enterprise Application Architecture (by example)Paulo Gandra de Sousa
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson PortfolioKbengt521
 
Grill at bigdata-cloud conf
Grill at bigdata-cloud confGrill at bigdata-cloud conf
Grill at bigdata-cloud confamarsri
 
Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisUniversity of Illinois,Chicago
 
Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisUniversity of Illinois,Chicago
 
It's Time to Reassess Your FDM Mappings
It's Time to Reassess Your FDM MappingsIt's Time to Reassess Your FDM Mappings
It's Time to Reassess Your FDM MappingsAlithya
 
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project ADN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project ADataconomy Media
 
Discover Data That Matters- Deep dive into WSO2 Analytics
Discover Data That Matters- Deep dive into WSO2 AnalyticsDiscover Data That Matters- Deep dive into WSO2 Analytics
Discover Data That Matters- Deep dive into WSO2 AnalyticsSriskandarajah Suhothayan
 
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloudIBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloudTorsten Steinbach
 
Rethinking metrics: metrics 2.0 @ Lisa 2014
Rethinking metrics: metrics 2.0 @ Lisa 2014Rethinking metrics: metrics 2.0 @ Lisa 2014
Rethinking metrics: metrics 2.0 @ Lisa 2014Dieter Plaetinck
 
Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015eddiebaggott
 
Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...MapR Technologies
 
Chapter 1 Basic Concepts
Chapter 1 Basic ConceptsChapter 1 Basic Concepts
Chapter 1 Basic ConceptsHareem Aslam
 

Semelhante a SAMSTAR: A Semi-automated Lexical Method to generate Star Schemas from an ERD (20)

Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
C a1 fin_10
C a1 fin_10C a1 fin_10
C a1 fin_10
 
Bw14
Bw14Bw14
Bw14
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
 
Patterns of Enterprise Application Architecture (by example)
Patterns of Enterprise Application Architecture (by example)Patterns of Enterprise Application Architecture (by example)
Patterns of Enterprise Application Architecture (by example)
 
PoEAA by Example
PoEAA by ExamplePoEAA by Example
PoEAA by Example
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson Portfolio
 
Grill at bigdata-cloud conf
Grill at bigdata-cloud confGrill at bigdata-cloud conf
Grill at bigdata-cloud conf
 
Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency Analysis
 
Pumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency AnalysisPumps, Compressors and Turbine Fault Frequency Analysis
Pumps, Compressors and Turbine Fault Frequency Analysis
 
It ready dw_day4_rev00
It ready dw_day4_rev00It ready dw_day4_rev00
It ready dw_day4_rev00
 
It's Time to Reassess Your FDM Mappings
It's Time to Reassess Your FDM MappingsIt's Time to Reassess Your FDM Mappings
It's Time to Reassess Your FDM Mappings
 
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project ADN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
DN 2017 | Reducing pain in data engineering | Martin Loetzsch | Project A
 
C++
C++C++
C++
 
Discover Data That Matters- Deep dive into WSO2 Analytics
Discover Data That Matters- Deep dive into WSO2 AnalyticsDiscover Data That Matters- Deep dive into WSO2 Analytics
Discover Data That Matters- Deep dive into WSO2 Analytics
 
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloudIBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud
 
Rethinking metrics: metrics 2.0 @ Lisa 2014
Rethinking metrics: metrics 2.0 @ Lisa 2014Rethinking metrics: metrics 2.0 @ Lisa 2014
Rethinking metrics: metrics 2.0 @ Lisa 2014
 
Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015Dublin Ireland Spark Meetup October 15, 2015
Dublin Ireland Spark Meetup October 15, 2015
 
Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...
 
Chapter 1 Basic Concepts
Chapter 1 Basic ConceptsChapter 1 Basic Concepts
Chapter 1 Basic Concepts
 

Último

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Último (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

SAMSTAR: A Semi-automated Lexical Method to generate Star Schemas from an ERD

  • 1. SAMSTAR:of Poster A Semi-Automated Lexical Method to Titl generate Star Schemas from an ERD Authors Ritu Khare, Il Yeol Song, Yuan An PROBLEM To develop a star schema, existing approaches analyze the attributes of interesting business entities. •Entities with numerical measure attributes are assumed to be the candidates of facts •Entities with non-numerical and descriptive attributes are assumed to be the candidates of dimensions. 2. Annotated Dimension Design Patterns (A_DDP) We have referred to four sources and have instantiated the six classes of DDP to produce a list of 131 commonly used dimension entities. We refer to these entities as Annotated DDP (A_DDPs). These entities are frequently used entities in the business processes. Examples include account, activity, agent, aircraft, airport, etc. RESULTS Invoice We focus on the structure of the ERD. The novel features of SAMSTAR are: (1) the use of the notion of Connection Topology Value (CTV) in identifying the candidates of facts and dimensions and (2) the use of Annotated Dimensional Design Patterns (A_DDP) as well as WordNet to extend the list of dimensions. Customer ALGORITHM Quantitative Method (FIND FACTS AND DIRECT DIMENSIONS) DDP Fig.2 SAMSTAR Overview Quantitative Method (FIND INDIRECT DIMENSIONS WordNet A_DDP 1. Connection Topology Value (CTV) CTV (e) 1* n 0.8 * CTV(Node(i )) i 1 * If you are scanning charts, cartoons, illustrations or plain text non-photo represents scan entity having images), an at 600 dpi, then ‘Save As’ at 225 dpi. where i a direct M:1 relationship with e. For Fig. 1, CTV is calculated in the following manner: A B E H weight_d=100%; weight_i=80% D F C The CTV for each entity is: CTV (H) = 1* 0 + 0.8 * 0 = 0 CTV(F) = 0 G CTV(G) = 0 CTV(E) = 1*1 + 0.8 * CTV(H) = 1 Fig.1: Calculation of CTV CTV(B)= 1*1 + 0.8 * (CTV(E))= 1.8 CTV(C)=1*2 + 0.8 * (CTV(G) + CTV( F)) = 2 CTV(D)=1*1 + 0.8 * (CTV(C)) = 1 + 0.8 * 2 = 2.6 CTV(A)= 1*2 + 0.8 * (CTV(B) + CTV( C)) = 2 + 0.8 * (1.8+2) = 5.04 www.ischool.drexel.edu Order Time Logistics Shipment Order Customer Area Shipment Supplier Product-Supplier Order-Product Promotion Store Store SAMSTAR Order Invoice Customer Store Promotion Store Customer Promotion Type Product Warehouse Time Star Schema n Return Store ER Diagram OUR SOLUTION Store Invoice Return Warehouse Hence, these approaches are qualitative in nature, and focus only on the semantics of ERD. Time 1. Pre-process the input ERD. 2. Store Entities and Relationships. 3. Let user choose weighting factors for direct and indirect relationships. 4. Calculate the CTV for all entities. 5. Calculate the threshold value, Th, for CTV. 6. Identify entities having CTV higher than the threshold Th. These are the candidates for fact tables. 7. Decide and shortlist the fact entities. 8. For each fact entity, perform the following steps: (i) Identify the entities having direct M:1 link with a fact entity. (ii) Identify entities having indirect M:1 link with the fact entity. Out of these entities, identify synonyms of entity names from WordNet. Extract the terms which match the DDP entity list or the A_DDP List. (iii) Combine the results to Steps 8(i) and 8(ii) to prepare a list of candidate dimensions for a given fact. (iv) Add time dimension to this list. 9. Decide the dimension entities. 10. Let the user post-process the Star Schemas 11. Generate the star schema(s). Order Product Order Product Invoice Fig.3: Result of SAMSTAR Schemas generated by SAMSTAR are similar to those generated by the manual steps in case study papers. SAMSTAR generated star schemas are the superset of the ones generated manually in the paper using the same ER diagrams, user needs and business goals. This shows our schemas are inclusive of all possible facts and dimensions. This gives the designer a helpful aid to prune the schema as per the business and user requirements. CONTRIBUTION •A universal method to generate star schema(s) in that we have used generalized DDPs and WordNet to identify dimensions of a fact table. •Quantitative in nature in that we analyze the structure of an input ERD. •Identifies a set of fact candidates from a large and complex ERD. •Automatable up to a large extent; simplifies the work of experienced designers; and gives a smooth head-start to novices. The paper SAMSTAR: A Semi-Automated Lexical Method for generating Star Schemas using an Entity Relationship Diagram was presented at Data warehousing and On-line Analytical Processing (DOLAP 2007) workshop held in Lisbon, Portugal, November 2007.