I’ve Always Wanted To
Data Model
Ian Varley, Salesforce.com
Data Week, 2013-10-02
Lightning Talk (10 minutes)
Who am I?
Ian Varley
Austin, TX
Salesforce.com
Big Data Team
@thefutureian
What’s Data Modeling?
The act of taking the intelligible
structure of the world around us, and
making it concrete enough for
computers to act on it.
(More specifically, data modeling usually
has to do with storing it in a database.)
Traditionally, data modeling has meant
Entity Attribute Relationship
modeling techniques.

There are variants that are more “OO” (like UML) but they
share most of the same core assumptions.
Many a project was sunk
due to shitty data modeling.
It’s a difficult occupation.
You have to be part engineer, part
psychologist, and part philosopher.
If you’re doing it, you’re not alone.
Lots of smart folks think about this stuff.
(David Hay, Steve Hoberman, Joe Celko, many more.)
But.
The expressive power of our
conceptual modeling techniques hasn’t
improved much since the 1970s.

We mostly look at the world in the
same static way we did 40 years ago.
Partly, this is because our discipline is
wedded to relational (SQL) DBs.

When the only tool you have
is a hammer ...
A book that opened my eyes ...

(He said a lot of the stuff I’m about to say back in 1978!)
I don’t have a lot of answers.
But I want to raise some questions.
And hopefully, start a conversation.
Here are 5 observations about the
tools of traditional data modeling.
#1: nobody actually knows
what an “entity” really is.
“Entity” is another word for Category,
in linguistics terms.
And an important property of linguistic
categories is that they are slippery.
See:
● Steven Pinker: The Stuff Of Thought
● Douglas Hofstadter: Surfaces & Essences
● George Lakoff: Women, Fire, and Dangerous Things
part: an abstract definition of
a connected set of physical
materials that serve some
purpose, and that people are
willing to buy

part: one instance of a part
type, which arrives on the QA
line at a specific time and
either does or doesn't meet
quality standards
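One way to make the two senses of "part" explicit is to give each its own type. The sketch below is only an illustration: the names `PartType` and `PartInstance`, the `sku` field, and the sample values are assumptions, not something from the talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PartType:
    """Catalog sense of "part": an abstract definition people can buy."""
    sku: str          # hypothetical identifier
    description: str


@dataclass(frozen=True)
class PartInstance:
    """QA-line sense of "part": one physical unit of a PartType."""
    part_type: PartType
    arrived_at: datetime
    passed_qa: bool


# Hypothetical sample data.
widget = PartType(sku="W-100", description="hypothetical widget")
unit = PartInstance(
    part_type=widget,
    arrived_at=datetime(2013, 10, 2, 9, 30),
    passed_qa=True,
)
```

Both are legitimately called "part" by the people who use them; the split only records which sense a given record carries.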
And if you think you can “solve” the
problem, I’ve got some World Trade
Center insurance policies to sell you.
That said, there are a couple tools we
could adopt that would help:
● First-class Sub- / Super-Typing
● First-class Scoping and Aliasing
(Not that there aren’t ways to do this in ERD models, but
they’re unobvious and not widely used.)
#2: entities, attributes, and
relationships are really the
same thing, maaaan ...

http://the-hippie-portfolio.tumblr.com/
Say I’ve got a “parent” in my model.
Is it:
● A “parent” entity?
● A “person” entity with
an “isParent” attribute?
● Two “person” entities in
a “parent” relationship?
It’s all of them; the distinction is
arbitrary.
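To see how arbitrary the choice is, here is the same fact ("Alice is a parent") written all three ways. This is a rough sketch with made-up names and plain dictionaries, not a recommendation for any of the three.

```python
# 1. A "parent" entity: parents get their own record type.
parent_entity = {"type": "parent", "name": "Alice"}

# 2. A "person" entity with an "isParent" attribute.
person_with_flag = {"type": "person", "name": "Alice", "isParent": True}

# 3. Two "person" entities in a "parent" relationship.
people = [
    {"type": "person", "name": "Alice"},
    {"type": "person", "name": "Bob"},
]
relationships = [("Alice", "parent_of", "Bob")]


def is_parent_as_entity(record):
    return record["type"] == "parent"


def is_parent_as_attribute(record):
    return record.get("isParent", False)


def is_parent_as_relationship(name, rels):
    # A person is a parent if they are the subject of any parent_of edge.
    return any(subj == name and pred == "parent_of" for subj, pred, _ in rels)
```

All three answer the question "is Alice a parent?" the same way; they differ only in where the fact is stored.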
The real structure is just a graph … but
none of our modeling tools are that
flexible, nor is it helpful to think that
abstractly about most software.
Normally, we make the choice based
on our experience and gut feeling, and
pretend there’s a science to it.
But the whole way of thinking is a
convenience based on “records”.
I have no idea what to do about this.
Tools that allow you to view any part of
your model in any of those ways?
This isn’t realistic with today’s tools, so
this is just idle speculation.
#3: prescriptive models
encourage black & white
thinking in a gray world
You have to make decisions (about
entities, attributes, relationships, types)
up front. But sometimes that’s not right.
This is a strength of (some) NoSQL
databases: you can do data first, and
surface structure later.
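A minimal sketch of "data first, structure later": store loosely shaped documents, then ask afterwards what fields and types actually showed up. The function name `infer_fields` and the sample documents are hypothetical, and no particular NoSQL database is assumed.

```python
from collections import defaultdict


def infer_fields(documents):
    """Return {field name: set of Python type names observed for it}."""
    seen = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            seen[key].add(type(value).__name__)
    return dict(seen)


# Hypothetical documents written without an up-front schema.
docs = [
    {"name": "Alice", "dept": "Big Data"},
    {"name": "Bob", "dept": ["Big Data", "Search"], "remote": True},
]

fields = infer_fields(docs)
```

Here `dept` turns out to be sometimes a string and sometimes a list, which is exactly the kind of structure that surfaces only after the data exists.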
Sometimes the deep structure is
actually ambiguous.
This can apply broadly.
(What if an employee isn’t really “in” a department, but has
flexible membership based on where she spends her time?)
You can represent that in a traditional
data model, sure.
But you’re not encouraged to.
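For instance, the flexible-membership idea could be sketched as a share of time per department rather than a single department field. The function `membership_shares` and the hours below are assumptions made up for illustration.

```python
def membership_shares(hours_by_department):
    """Turn hours spent per department into fractional membership."""
    total = sum(hours_by_department.values())
    if total == 0:
        return {}  # no recorded time, so no membership to report
    return {dept: hours / total for dept, hours in hours_by_department.items()}


# Hypothetical week: 30 hours with one team, 10 with another.
shares = membership_shares({"Big Data": 30, "Search": 10})
```

Nothing here is beyond a traditional model (it is just a many-to-many link with a weight), but a single "department" column is the path of least resistance.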
#4: static models make the
time dimension unwieldy
Entity models are generally silent on
the ways data changes.
Many modern databases can keep
older versions of objects.
But should they? For which entities?
How many versions? Etc.
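Those questions become concrete as soon as you write the versioning down. In this sketch, "how many versions?" is an explicit per-record parameter; the class `VersionedRecord` and its `max_versions` argument are hypothetical, not the API of any real database.

```python
from collections import deque


class VersionedRecord:
    """Keeps the current value plus a bounded number of older versions."""

    def __init__(self, initial, max_versions=3):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._versions = deque([initial], maxlen=max_versions)

    def update(self, new_value):
        self._versions.append(new_value)

    @property
    def current(self):
        return self._versions[-1]

    def history(self):
        """Oldest retained version first, current version last."""
        return list(self._versions)


record = VersionedRecord({"title": "v1"}, max_versions=2)
record.update({"title": "v2"})
record.update({"title": "v3"})
```

With `max_versions=2`, the first version is gone after the second update. Whether that is acceptable is a modeling decision that an entity diagram never asks you to make.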
Worse, what about when the model
changes at runtime, and you need to
also retain knowledge of what the old
model was?
As in #3, there are ways to model this
in entity models, but it’s not easy, so
most people just don’t think about it.
#5: boxes & lines aren’t
how we actually think
Our spatial processing of diagrams
doesn’t map well to our temporal,
spatial, and causal comprehension of
data structure.
What do people really do?
Skip making models when their
models look too complicated.
F*** THAT NOISE.
Is there an alternative? Not yet.
What could move the needle?
● Prototype-based modeling
● Proper scoping
● Semantic zooming
The map is not the territory.
In conclusion …
if you dig this stuff, let’s talk!
@thefutureian
