Ten Things to Avoid in a Data Model

           Dr. Michael Blaha
       Modelsoft Consulting Corp
        www.modelsoftcorp.com
      E-mail: blaha@computer.org
Introduction

• A model is an abstraction of some aspect of a problem.
• A data model is a model that describes how data is represented
  and accessed, usually for a database.
    – Data modeling can be a difficult task and is often pivotal to the success or failure of
      a project.
• There are many pitfalls in data modeling, as we will explain:
    – Strategic pitfalls.
    – Detailed pitfalls.
• We do not discuss detailed modeling constructs such as keys, data
  types, nullability, and referential integrity.



Strategic Pitfalls…




Strategic Pitfall: Vague Purpose

• Don’t build a model without understanding the business rationale.
• The purpose for a model dictates the level of detail.
     – Just entities and relationships.
     – Fully attributed.
     – With data types and constraints.
• The purpose also dictates the level of polish, the degree of completeness, and
  the amount of time justified.
• Different kinds of data models.
     – Detailed application model for development.
     – Rough application model for a purchase spec.
     – Enterprise model for integration.
• This pitfall might seem obvious, but I’ve seen modeling efforts with little
  business purpose and no clear definition of deliverables.

Strategic Pitfall: Literal Modeling

• Your job is not to do what the customer says. Your job is to solve
  the problem that the customer is imperfectly describing.
• You must pay attention to the hidden true requirements.
• You must interpret and abstract what the customer tells you.
    – You must recognize arbitrary business decisions that could easily change.
• You can raise abstraction by thinking in terms of patterns.
• The use case mentality really misses this point.




Strategic Pitfall: Literal Modeling Example

            [Figure: original literal model (top) and improved abstract model (bottom)]



• The original model is correct, but has problems. What happens if a person gets
  promoted to a supervisor and then to a manager? Are there multiple records?
  Movement of a record? Or???
• The improved model is more abstract and softcodes the management
  hierarchy.
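
The diagrams for this example are not included in this text, so the following is only a sketch of what "softcoding the management hierarchy" could look like. It assumes a single Person table with a self-reference; the table name, the column names (title, bossID), and the sample people are all hypothetical and are not taken from the original slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed abstract model: one Person table, the hierarchy is data.
    CREATE TABLE Person (
        personID INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        title    TEXT NOT NULL,
        bossID   INTEGER REFERENCES Person(personID)  -- NULL at the top
    );
    INSERT INTO Person VALUES (1, 'Ann', 'manager',    NULL);
    INSERT INTO Person VALUES (2, 'Bob', 'supervisor', 1);
    INSERT INTO Person VALUES (3, 'Cy',  'worker',     2);
""")

# A promotion updates one row; no record moves between tables.
conn.execute(
    "UPDATE Person SET title = 'supervisor', bossID = 1 WHERE personID = 3"
)


def chain_of_command(person_id):
    """Return names from the given person up to the top of the hierarchy."""
    names = []
    while person_id is not None:
        name, person_id = conn.execute(
            "SELECT name, bossID FROM Person WHERE personID = ?", (person_id,)
        ).fetchone()
        names.append(name)
    return names


person_count = conn.execute("SELECT COUNT(*) FROM Person").fetchone()[0]
```

With separate tables per management level, the same promotion would raise exactly the questions listed above. Here the number of levels can also change without a schema change.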


Strategic Pitfall: Large Size

• Avoid large models. Limit a model to no more than 200 tables.
• Large models involve more work.
• Is the large size really justified or can you simplify the model with
  abstraction?
• I rarely encounter a large model with a compelling justification.
• I don’t see this step in software development methodologies, but
  it is certainly needed.




Strategic Pitfall: Speculative Content

• Do not include content that is not needed now and “might be
  helpful” in the future.
• All this does is make the model larger, increase development
  time, and raise cost.
• A model must fully address the requirements, but not greatly
  exceed them.
• A quality model should be readily extended, so there is no need to
  add content in advance of need.
• Speculative content runs counter to the philosophy of agile
  development.


Strategic Pitfall: Lack of Clarity

• A relational database is declarative. Declare data in your
  models.
• A domain is the set of possible values for an attribute.
      – ERwin lets you define domains and then assign them to the pertinent attributes.
• An enumeration is a domain that has a finite set of values.
      – Declare enumerations in your databases.
•   Don’t store data structures with a binary encoding.
•   Don’t use cryptic names.
•   Don’t use anonymous fields that application code must interpret.
•   Obfuscation can happen through sloppy development practices.


Strategic Pitfall: Lack of Clarity Example

            Enumeration stored in place (Car table)

                carID  year  color  weight
                1      2001  red    2000
                2      1989  red    1500
                3      2000  blue   2500

            Enumeration stored separately (Car table and Color table)

                carID  year  colorID  weight        colorID  color
                1      2001  1        2000          1        red
                2      1989  1        1500          2        green
                3      2000  3        2500          3        blue

            Enumeration encoded (Car table)

                carID  year  color  weight
                1      2001  1      2000
                2      1989  1      1500
                3      2000  3      2500
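
As a rough illustration of declaring an enumeration in the database, the sketch below builds the "stored separately" variant with a foreign key from Car to Color. The table and column names follow the example; the extra cars (IDs 4 and 5) and the helper try_insert are made up for the demonstration, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Color (
        colorID INTEGER PRIMARY KEY,
        color   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Car (
        carID   INTEGER PRIMARY KEY,
        year    INTEGER NOT NULL,
        colorID INTEGER NOT NULL REFERENCES Color(colorID),
        weight  INTEGER NOT NULL
    );
    INSERT INTO Color VALUES (1, 'red'), (2, 'green'), (3, 'blue');
    INSERT INTO Car VALUES
        (1, 2001, 1, 2000), (2, 1989, 1, 1500), (3, 2000, 3, 2500);
""")


def try_insert(car_id, year, color_id, weight):
    """Return True if the row is accepted, False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Car VALUES (?, ?, ?, ?)",
                (car_id, year, color_id, weight),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(4, 2010, 2, 1800)  # colorID 2 is declared (green)
rejected = try_insert(5, 2011, 9, 1800)  # colorID 9 is not declared

colors = conn.execute(
    "SELECT Car.carID, Color.color FROM Car JOIN Color USING (colorID) "
    "ORDER BY Car.carID"
).fetchall()
```

In the "encoded" variant the same integers appear in Car, but nothing in the database says what 1 or 3 means or prevents a 9 from being stored; that knowledge lives only in application code.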


Detailed Pitfalls…




Detailed Pitfall: Reckless Violation of Normal Forms

• Do not accidentally violate normal forms.
• A normal form is a guideline that increases data consistency.
• As tables satisfy higher levels of normal forms, they are less likely
  to store redundant or contradictory data.
• Denormalization is only justified when there is a major
  performance bottleneck, such as for data warehouses.
• Be suspicious of large tables (30 attributes or more).
• Be suspicious of any entity type that is difficult to define.
• It is acceptable to violate normal forms deliberately, when there
  is a good reason to do so.


Detailed Pitfall: Normal Forms Example


               [Figure: model that violates normal form (top) and model that satisfies normal form (bottom)]


• The contact position and contact phone depend on the contact
  name.
• The contact name depends on customerPK.

Detailed Pitfall: Needless Redundancy

• Be careful with redundancy.
     – Redundancy across applications.
     – Redundancy within an application.
• Normal forms are one aspect of redundancy.
• Ideally there should be a single recording of each data item. (Rarely is this
  completely feasible.)
• Organizations are rife with applications that overlap in awkward and loosely
  controlled ways.
     – This is a major justification for data warehouses.
• Don’t include redundant data to compensate for a poorly conceived
  application.
• Redundant data is acceptable if you use built-in database features to keep
  redundant data consistent (such as materialized views).

Detailed Pitfall: Parallel Attributes

• Avoid parallel attributes for non-data-warehouse applications.
• Parallel attributes often codify arbitrary business decisions, reducing
  information system flexibility.




      [Figure: parallel attributes (left) and parameterized model (right)]

• Widespread use of parallel attributes often indicates a poor model.
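
The figure is not included in this text, so the sketch below uses a made-up example: quarterly sales stored as four parallel columns versus one row per period. All table and column names (SalesParallel, Sales, q1Amount, period, amount) are hypothetical and only meant to show the general shape of the two designs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Parallel attributes (hypothetical): one column per quarter.
    CREATE TABLE SalesParallel (
        productID INTEGER PRIMARY KEY,
        q1Amount  REAL,
        q2Amount  REAL,
        q3Amount  REAL,
        q4Amount  REAL
    );
    INSERT INTO SalesParallel VALUES (1, 10.0, 20.0, 30.0, 40.0);

    -- Parameterized model (hypothetical): the period is data, not a column.
    CREATE TABLE Sales (
        productID INTEGER NOT NULL,
        period    TEXT    NOT NULL,
        amount    REAL    NOT NULL,
        PRIMARY KEY (productID, period)
    );
    INSERT INTO Sales VALUES
        (1, 'Q1', 10.0), (1, 'Q2', 20.0), (1, 'Q3', 30.0), (1, 'Q4', 40.0);
""")

# A total over parallel attributes has to name every column.
parallel_total = conn.execute(
    "SELECT q1Amount + q2Amount + q3Amount + q4Amount "
    "FROM SalesParallel WHERE productID = 1"
).fetchone()[0]

# A new kind of period is a new row; the schema and the query stay the same.
conn.execute("INSERT INTO Sales VALUES (1, '2011-01', 5.0)")
param_total = conn.execute(
    "SELECT SUM(amount) FROM Sales WHERE productID = 1"
).fetchone()[0]
```

The choice of four quarters is the kind of arbitrary business decision that the parallel columns hardcode; in the parameterized table it is just data.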

Detailed Pitfall: Symmetric Relationships

• Avoid symmetric relationships for relational databases.
• Promote a symmetric relationship to an entity type.




      [Figure: symmetric relationship (left) and promotion to an entity type (right)]


• Otherwise each link requires double entry (store it in both
  directions) or double search (query both columns).
• Symmetric relationships can be acceptable for programming.
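
The figure is not included in this text, so the following sketch uses a hypothetical spouse relationship between persons to show the double-search problem and one way of promoting the relationship to an entity type. The table names (Spouse, Marriage, MarriagePartner), the column names, and the sample data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (personID INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Di');

    -- Symmetric relationship stored directly: who goes in which column
    -- is arbitrary.
    CREATE TABLE Spouse (
        person1ID INTEGER NOT NULL REFERENCES Person(personID),
        person2ID INTEGER NOT NULL REFERENCES Person(personID),
        PRIMARY KEY (person1ID, person2ID)
    );
    INSERT INTO Spouse VALUES (1, 2), (4, 3);

    -- Promotion to an entity type: each partner is one row.
    CREATE TABLE Marriage (marriageID INTEGER PRIMARY KEY);
    CREATE TABLE MarriagePartner (
        marriageID INTEGER NOT NULL REFERENCES Marriage(marriageID),
        personID   INTEGER NOT NULL REFERENCES Person(personID),
        PRIMARY KEY (marriageID, personID)
    );
    INSERT INTO Marriage VALUES (1), (2);
    INSERT INTO MarriagePartner VALUES (1, 1), (1, 2), (2, 4), (2, 3);
""")


def spouse_symmetric(person_id):
    """Double search: the person may sit in either column."""
    row = conn.execute(
        "SELECT CASE WHEN person1ID = :p THEN person2ID ELSE person1ID END "
        "FROM Spouse WHERE person1ID = :p OR person2ID = :p",
        {"p": person_id},
    ).fetchone()
    return row[0] if row else None


def spouse_promoted(person_id):
    """Single search column: join the partner rows of the same marriage."""
    row = conn.execute(
        "SELECT other.personID "
        "FROM MarriagePartner me "
        "JOIN MarriagePartner other "
        "  ON other.marriageID = me.marriageID "
        " AND other.personID <> me.personID "
        "WHERE me.personID = ?",
        (person_id,),
    ).fetchone()
    return row[0] if row else None
```

Both functions return the same answers, for example spouse_symmetric(3) and spouse_promoted(3) both give person 4, but only the first has to know about two interchangeable columns.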


Detailed Pitfall: Anonymous Fields

• As much as possible, clearly describe the data being stored and
  avoid anonymous fields.

                            fragment of Location table
                locationAddress1     locationAddress2    locationAddress3
                456 Chicago Street   Decatur, IL xxxxx
                198 Broadway Dr.     Suite 201           Chicago, IL xxxxx
                123 Main Street      Cairo, IL xxxxx
                Chicago, IL xxxxx



• How to distinguish the city of Chicago from Chicago street?
• May need to parse a field to separate city, state, and postal code.
• A few incidental user-defined fields are OK.
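
To make the Chicago question concrete, the sketch below loads the first three rows of the Location fragment into an anonymous-field table and into a table with named fields. The named columns (street, unit, city, state, postalCode) and the table names are hypothetical; the slide does not prescribe a particular replacement design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Anonymous fields, as in the Location fragment.
    CREATE TABLE LocationAnon (
        locationAddress1 TEXT,
        locationAddress2 TEXT,
        locationAddress3 TEXT
    );
    INSERT INTO LocationAnon VALUES
        ('456 Chicago Street', 'Decatur, IL xxxxx', NULL),
        ('198 Broadway Dr.',   'Suite 201',         'Chicago, IL xxxxx'),
        ('123 Main Street',    'Cairo, IL xxxxx',   NULL);

    -- Named fields (hypothetical column names).
    CREATE TABLE Location (
        street     TEXT,
        unit       TEXT,
        city       TEXT,
        state      TEXT,
        postalCode TEXT
    );
    INSERT INTO Location VALUES
        ('456 Chicago Street', NULL,        'Decatur', 'IL', 'xxxxx'),
        ('198 Broadway Dr.',   'Suite 201', 'Chicago', 'IL', 'xxxxx'),
        ('123 Main Street',    NULL,        'Cairo',   'IL', 'xxxxx');
""")

# A text search over anonymous fields cannot tell the city from the street.
anon_hits = conn.execute(
    "SELECT COUNT(*) FROM LocationAnon "
    "WHERE locationAddress1 LIKE '%Chicago%' "
    "   OR locationAddress2 LIKE '%Chicago%' "
    "   OR locationAddress3 LIKE '%Chicago%'"
).fetchone()[0]

# With a declared city column the question has a direct answer.
city_hits = conn.execute(
    "SELECT COUNT(*) FROM Location WHERE city = 'Chicago'"
).fetchone()[0]
```

The anonymous-field search matches two rows, one of them only because of the street name, while the named-field query finds the single location in the city of Chicago.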

Summary

• Data modeling is often a pivotal task in building a database
  application.
• A data model determines an application’s data quality,
  extensibility, and performance — and influences whether the
  application has a chance at business success.
• You can improve your data models if you pay attention to the
  pitfalls we have covered.




Speaker Bio

• Since 1994 Dr. Michael Blaha has been a consultant and trainer in
  conceiving, architecting, modeling, designing, and tuning
  databases for dozens of organizations throughout the world.
• He has authored six U.S. patents, five widely used books, and
  many papers.
• His most recent book, Patterns of Data Modeling, was published
  in June 2010.
• Blaha received his doctorate from Washington University in St.
  Louis and is an alumnus of GE Global Research in Schenectady, NY.
• You can contact him at blaha@computer.org and
  www.modelsoftcorp.com.


Questions?





10 Things to Avoid in a Data Model

  • 1. Ten Things to Avoid in a Data Model Dr. Michael Blaha Modelsoft Consulting Corp www.modelsoftcorp.com E-mail: blaha@computer.org
  • 2. Introduction
      • A model is an abstraction of some aspect of a problem.
      • A data model is a model that describes how data is represented and accessed, usually for a database.
          – Data modeling can be a difficult task and is often pivotal to the success or failure of a project.
      • There are many pitfalls to data modeling, as we will explain:
          – Strategic pitfalls.
          – Detailed pitfalls.
      • We do not discuss detailed modeling constructs such as keys, data types, nullability, and referential integrity.
  • 4. Strategic Pitfall: Vague Purpose
      • Don’t build a model without understanding the business rationale.
      • The purpose for a model dictates the level of detail.
          – Just entities and relationships.
          – Fully attributed.
          – With data types and constraints.
      • The purpose also dictates the level of polish, the degree of completeness, and the amount of time justified.
      • Different kinds of data models.
          – Detailed application model for development.
          – Rough application for a purchase spec.
          – Enterprise model for integration.
      • This pitfall might seem obvious, but I’ve seen modeling efforts with little business purpose and no clear definition of deliverables.
  • 5. Strategic Pitfall: Literal Modeling
      • Your job is not to do what the customer says. Your job is to solve the problem that the customer is imperfectly describing.
      • You must pay attention to the hidden true requirements.
      • You must interpret and abstract what the customer tells you.
          – You must recognize arbitrary business decisions that could easily change.
      • You can raise abstraction by thinking in terms of patterns.
      • The use case mentality really misses this point.
  • 6. Strategic Pitfall: Literal Modeling Example
      [Figure: original literal model; improved abstract model]
      • The original model is correct, but has problems. What happens if a person gets promoted to a supervisor and then to a manager? Are there multiple records? Is a record moved? Or something else?
      • The improved model is more abstract and softcodes the management hierarchy.
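One way to softcode a management hierarchy is a single table with a self-reference. The slide's diagrams are not reproduced here, so the sketch below is only an assumption about what such a model could look like: the `person` table, its columns, and the `management_chain` and `promote` helpers are hypothetical names, not the author's model.

```python
import sqlite3

# Hypothetical schema: one table for everyone, with the reporting line stored as data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        title      TEXT NOT NULL,
        manager_id INTEGER REFERENCES person(person_id)
    );
    INSERT INTO person VALUES (1, 'Ann', 'Manager', NULL);
    INSERT INTO person VALUES (2, 'Bob', 'Supervisor', 1);
    INSERT INTO person VALUES (3, 'Cy', 'Worker', 2);
""")

def management_chain(conn, person_id):
    """Return the names of everyone above person_id, nearest manager first."""
    rows = conn.execute("""
        WITH RECURSIVE chain(person_id, manager_id, depth) AS (
            SELECT person_id, manager_id, 0 FROM person WHERE person_id = ?
            UNION ALL
            SELECT p.person_id, p.manager_id, c.depth + 1
            FROM person p JOIN chain c ON p.person_id = c.manager_id
        )
        SELECT p.name
        FROM chain c JOIN person p ON p.person_id = c.person_id
        WHERE c.depth > 0
        ORDER BY c.depth
    """, (person_id,)).fetchall()
    return [name for (name,) in rows]

def promote(conn, person_id, new_title):
    """A promotion updates one row; no record is copied or moved between tables."""
    with conn:
        conn.execute("UPDATE person SET title = ? WHERE person_id = ?",
                     (new_title, person_id))
```

With this shape, the promotion questions raised on the slide do not arise: `promote(conn, 3, 'Supervisor')` changes a value in place, and the number of levels in the hierarchy is data rather than schema.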
  • 7. Strategic Pitfall: Large Size
      • Avoid large models. Limit a model to no more than 200 tables.
      • Large models involve more work.
      • Is the large size really justified or can you simplify the model with abstraction?
      • I rarely encounter a large model with a compelling justification.
      • I don’t see this step in software development methodologies, but it is certainly needed.
  • 8. Strategic Pitfall: Speculative Content
      • Do not include content that is not needed now and “might be helpful” in the future.
      • All this does is make a model larger, increase development time, and raise cost.
      • A model must fully address the requirements, but not greatly exceed them.
      • A quality model should be readily extended, so there is no need to add content in advance of need.
      • Speculative content runs counter to the philosophy of agile development.
  • 9. Strategic Pitfall: Lack of Clarity
      • A relational database is declarative. Declare data in your models.
      • A domain is the set of possible values for an attribute.
          – ERwin lets you define domains and then assign them to the pertinent attributes.
      • An enumeration is a domain that has a finite set of values.
          – Declare enumerations in your databases.
      • Don’t store data structures with a binary encoding.
      • Don’t use cryptic names.
      • Don’t use anonymous fields that application code must interpret.
      • Obfuscation can happen through sloppy development practices.
  • 10. Strategic Pitfall: Lack of Clarity Example

      Enumeration stored in place

      Car table
      carID  year  color  weight
      1      2001  red    2000
      2      1989  red    1500
      3      2000  blue   2500

      Enumeration stored separately

      Car table
      carID  year  colorID  weight
      1      2001  1        2000
      2      1989  1        1500
      3      2000  3        2500

      Color table
      colorID  color
      1        red
      2        green
      3        blue

      Enumeration encoded

      Car table
      carID  year  color  weight
      1      2001  1      2000
      2      1989  1      1500
      3      2000  3      2500
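The "stored separately" variant can be declared so that the database itself rejects an undeclared color. The sketch below uses Python's `sqlite3` with the rows from the slide. The snake_case names and the `add_car` and `car_colors` helpers are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is turned on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE color (
        color_id INTEGER PRIMARY KEY,
        color    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE car (
        car_id   INTEGER PRIMARY KEY,
        year     INTEGER NOT NULL,
        color_id INTEGER NOT NULL REFERENCES color(color_id),
        weight   INTEGER NOT NULL
    );
    INSERT INTO color VALUES (1, 'red'), (2, 'green'), (3, 'blue');
    INSERT INTO car VALUES (1, 2001, 1, 2000), (2, 1989, 1, 1500), (3, 2000, 3, 2500);
""")

def add_car(conn, car_id, year, color_id, weight):
    """Insert a car; return False if the database rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO car VALUES (?, ?, ?, ?)",
                         (car_id, year, color_id, weight))
        return True
    except sqlite3.IntegrityError:
        return False

def car_colors(conn):
    """Resolve each car's color name through the declared enumeration."""
    return conn.execute("""
        SELECT car.car_id, color.color
        FROM car JOIN color ON color.color_id = car.color_id
        ORDER BY car.car_id
    """).fetchall()
```

In the "encoded" variant the same integers appear in the car table, but nothing in the database says what they mean, so a value such as 9 would be accepted silently.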
  • 12. Detailed Pitfall: Reckless Violation of Normal Forms
      • Do not accidentally violate normal forms.
      • A normal form is a guideline that increases data consistency.
      • As tables satisfy higher levels of normal forms, they are less likely to store redundant or contradictory data.
      • Denormalization is only justified when there is a major performance bottleneck, such as for data warehouses.
      • Be suspicious of large tables (30 attributes or more).
      • Be suspicious of any entity type that is difficult to define.
      • It is acceptable to violate normal forms deliberately, when there is a good reason to do so.
  • 13. Detailed Pitfall: Normal Forms Example
      [Figure: model that violates normal form; model that satisfies normal form]
      • The contact position and contact phone depend on the contact name.
      • The contact name depends on customerPK.
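The dependency chain described on this slide (customerPK determines contact name, which in turn determines position and phone) is usually removed by giving the contact its own table. The slide's diagrams are not available, so the following is a minimal sketch under assumptions: the table and column names are hypothetical, and letting two customers share one contact is an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Contact facts live in one place, keyed by the contact.
    CREATE TABLE contact (
        contact_pk INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        position   TEXT,
        phone      TEXT
    );
    -- The customer only records which contact it uses.
    CREATE TABLE customer (
        customer_pk   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        contact_fk    INTEGER REFERENCES contact(contact_pk)
    );
    INSERT INTO contact VALUES (1, 'Pat Lee', 'Buyer', '555-0100');
    INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Acme West', 1);
""")

def customer_phones(conn):
    """List each customer with the phone number of its contact."""
    return conn.execute("""
        SELECT cu.customer_name, co.phone
        FROM customer cu JOIN contact co ON co.contact_pk = cu.contact_fk
        ORDER BY cu.customer_pk
    """).fetchall()

# One update changes the phone for every customer that uses this contact.
conn.execute("UPDATE contact SET phone = '555-0199' WHERE contact_pk = 1")
```

Had the phone been stored on each customer row, the same change would need one update per customer, and a missed row would leave contradictory data.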
  • 14. Detailed Pitfall: Needless Redundancy
      • Be careful with redundancy.
          – Redundancy across applications.
          – Redundancy within an application.
      • Normal forms are one aspect of redundancy.
      • Ideally there should be a single recording of each data item. (Rarely is this completely feasible.)
      • Organizations are rife with applications that overlap in awkward and loosely controlled ways.
          – This is a major justification for data warehouses.
      • Don’t include redundant data to compensate for a poorly conceived application.
      • Redundant data is acceptable if you use built-in database features to keep redundant data consistent (such as materialized views).
  • 15. Detailed Pitfall: Parallel Attributes
      • Avoid parallel attributes for non-data-warehouse applications.
      • Parallel attributes often codify arbitrary business decisions, reducing information system flexibility.
      [Figure: parallel attributes; parameterized model]
      • Widespread use of parallel attributes often indicates a poor model.
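As an illustration of the difference, consider a hypothetical person record with one column per kind of phone. The slide's own example is not available, so everything below (the phone scenario, table names, and the `phones` helper) is an assumption chosen to show the parameterized shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Parallel attributes would look like:
    --   person(person_id, name, home_phone, work_phone, mobile_phone)
    -- The parameterized form turns the repeated attribute into rows.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE phone (
        person_id  INTEGER NOT NULL REFERENCES person(person_id),
        phone_type TEXT NOT NULL,
        number     TEXT NOT NULL,
        PRIMARY KEY (person_id, phone_type)
    );
    INSERT INTO person VALUES (1, 'Ann');
    INSERT INTO phone VALUES (1, 'home', '555-0101'), (1, 'work', '555-0102');
""")

def phones(conn, person_id):
    """Return a {phone_type: number} mapping for one person."""
    rows = conn.execute(
        "SELECT phone_type, number FROM phone WHERE person_id = ?",
        (person_id,))
    return dict(rows.fetchall())

# A new kind of phone is a new row, not a schema change.
conn.execute("INSERT INTO phone VALUES (1, 'pager', '555-0103')")
```

The decision that a person has exactly three kinds of phone is the sort of arbitrary business rule the parallel form hardcodes into the schema.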
  • 16. Detailed Pitfall: Symmetric Relationships
      • Avoid symmetric relationships for relational databases.
      • Promote a symmetric relationship to an entity type.
      [Figure: symmetric relationship; promotion to an entity type]
      • Otherwise the relationship requires double entry or double search.
      • Symmetric relationships can be acceptable for programming.
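To make the double-entry problem concrete, take a hypothetical "is partner of" relationship between two people. Stored as a pair of columns (person_a, person_b), each pair must either be entered twice or searched in both columns. The sketch below promotes the relationship to an entity with members. The scenario and all names are assumptions, since the slide's diagram is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- The relationship itself becomes an entity type...
    CREATE TABLE partnership (
        partnership_id INTEGER PRIMARY KEY
    );
    -- ...and each participant is one row, so neither side is "first".
    CREATE TABLE partnership_member (
        partnership_id INTEGER NOT NULL REFERENCES partnership(partnership_id),
        person_id      INTEGER NOT NULL REFERENCES person(person_id),
        PRIMARY KEY (partnership_id, person_id)
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO partnership VALUES (100);
    INSERT INTO partnership_member VALUES (100, 1), (100, 2);
""")

def partners(conn, person_id):
    """Find a person's partners by searching a single column."""
    rows = conn.execute("""
        SELECT p.name
        FROM partnership_member me
        JOIN partnership_member other
          ON other.partnership_id = me.partnership_id
         AND other.person_id <> me.person_id
        JOIN person p ON p.person_id = other.person_id
        WHERE me.person_id = ?
        ORDER BY p.name
    """, (person_id,)).fetchall()
    return [name for (name,) in rows]
```

The lookup gives the same answer from either side: `partners(conn, 1)` and `partners(conn, 2)` both work from the single stored partnership.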
  • 17. Detailed Pitfall: Anonymous Fields
      • As much as possible, clearly describe the data being stored and avoid anonymous fields.
      [Table: fragment of Location table with columns locationAddress1, locationAddress2, locationAddress3. Cell values as extracted: 456 Chicago Street; Decatur, IL xxxxx; 198 Broadway Dr.; Suite 201; Chicago, IL xxxxx; 123 Main Street; Cairo, IL xxxxx; Chicago, IL xxxxx]
      • How to distinguish the city of Chicago from Chicago street?
      • May need to parse a field to separate city, state, and postal code.
      • A few incidental user-defined fields are OK.
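A sketch of the alternative, with each address part in a named column. The column names and the `locations_in_city` helper are assumptions. The sample rows are adapted from the slide's fragment, keeping its "xxxxx" postal code placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,
        street      TEXT NOT NULL,
        unit        TEXT,
        city        TEXT NOT NULL,
        state       TEXT NOT NULL,
        postal_code TEXT NOT NULL
    );
    INSERT INTO location VALUES
        (1, '456 Chicago Street', NULL,        'Decatur', 'IL', 'xxxxx'),
        (2, '198 Broadway Dr.',   'Suite 201', 'Chicago', 'IL', 'xxxxx'),
        (3, '123 Main Street',    NULL,        'Cairo',   'IL', 'xxxxx');
""")

def locations_in_city(conn, city):
    """Return the ids of locations whose city column matches exactly."""
    rows = conn.execute(
        "SELECT location_id FROM location WHERE city = ? ORDER BY location_id",
        (city,))
    return [location_id for (location_id,) in rows]
```

With named columns, a search for the city of Chicago no longer matches Chicago Street in Decatur, and no application code has to parse city, state, and postal code out of a free-form line.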
  • 18. Summary
      • Data modeling is often a pivotal task in building a database application.
      • A data model determines an application’s data quality, extensibility, and performance, and influences whether the application has a chance at business success.
      • You can improve your data models if you pay attention to the pitfalls we have covered.
  • 19. Speaker Bio
      • Since 1994 Dr. Michael Blaha has been a consultant and trainer in conceiving, architecting, modeling, designing, and tuning databases for dozens of organizations throughout the world.
      • He has authored six U.S. patents, five widely used books, and many papers.
      • His most recent book, Patterns of Data Modeling, was published in June 2010.
      • Blaha received his doctorate from Washington University in St. Louis and is an alumnus of GE Global Research in Schenectady, NY.
      • You can contact him at blaha@computer.org and www.modelsoftcorp.com.