One of the tenets of Big Data is that it allows developers to work with "unstructured" data. But unless you're piping /dev/random, there's no such thing as *truly* unstructured data, only data whose structure you don't understand yet. In this lightning talk, we'll take a tour of the fundamentals of deep data structure modeling, and see how the rigid tools and techniques of the past have failed us in the modern world of agile software and big data. We'll look at what hope there is for understanding the semantics and structure of data that doesn't play by the rules of an RDBMS.
4. The act of taking the intelligible
structure of the world around us, and
making it concrete enough for
computers to act on it.
(More specifically, data modeling usually
has to do with storing it in a database.)
5. Traditionally, data modeling has meant
Entity Attribute Relationship
modeling techniques.
There are variants that are more “OO” (like UML) but they
share most of the same core assumptions.
10. The expressive power of our
conceptual modeling techniques hasn’t
improved much since the 1970s.
We mostly look at the world in the
same static way we did 40 years ago.
11. Partly, this is because our discipline is
wedded to relational (SQL) DBs.
When the only tool you have
is a hammer ...
12. A book that opened my eyes ...
(He said a lot of the stuff I’m about to say back in 1978!)
13. I don’t have a lot of answers.
But I want to raise some questions.
And hopefully, start a conversation.
14. Here are 5 observations about the
tools of traditional data modeling.
16. “Entity” is another word for Category,
in linguistics terms.
And an important property of linguistic
categories is that they are slippery.
See:
● Steven Pinker: The Stuff Of Thought
● Douglas Hofstadter: Surfaces & Essences
● George Lakoff: Women, Fire, and Dangerous Things
17. part: an abstract definition of
a connected set of physical
materials that serve some
purpose, and that people are
willing to buy
part: one instance of a part
type, which arrives on the QA
line at a specific time and
either does or doesn't meet
quality standards
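To make the two senses of "part" concrete, here is a minimal Python sketch. The class names (`PartType`, `PartUnit`), the fields, and the sample values are all made up for illustration; they are one possible way to split the word, not a recommended schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PartType:
    """Sense 1: the abstract definition people are willing to buy."""
    sku: str            # hypothetical catalog identifier
    description: str


@dataclass(frozen=True)
class PartUnit:
    """Sense 2: one physical instance arriving on the QA line."""
    part_type: PartType
    arrived_at: datetime
    passed_qa: bool


# Illustrative data only.
bracket = PartType(sku="BRK-100", description="steel mounting bracket")
unit = PartUnit(
    part_type=bracket,
    arrived_at=datetime(2024, 1, 15, 9, 30),
    passed_qa=False,
)

# Both objects get called "part" in conversation,
# but they answer different questions.
print(bracket.description)  # what is it?
print(unit.passed_qa)       # did this particular one pass?
```

Splitting the word into two classes only works once you have noticed that it has two senses, and a third sense can still turn up later.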
18. And if you think you can “solve” the
problem, I’ve got some World Trade
Center insurance policies to sell you.
19. That said, there are a couple of tools we
could adopt that would help:
● First-class Sub- / Super-Typing
● First-class Scoping and Aliasing
(Not that there aren’t ways to do this in ERD models, but
they’re unobvious and not widely used.)
20. #2: entities, attributes, and
relationships are really the
same thing, maaaan ...
http://the-hippie-portfolio.tumblr.com/
21. Say I’ve got a “parent” in my model.
Is it:
● A “parent” entity?
● A “person” entity with
an “isParent” attribute?
● Two “person” entities in
a “parent” relationship?
It’s all of them; the distinction is
arbitrary.
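As a rough sketch, here are the three options written out in Python. Every class, field, and helper name is hypothetical and chosen only to mirror the bullets; the point is that each version can answer "is Alice a parent?", just from a different shape.

```python
from dataclasses import dataclass


# Option 1: "parent" is its own entity.
@dataclass
class Parent:
    name: str


# Option 2: "person" entity with an isParent attribute.
@dataclass
class Person:
    name: str
    is_parent: bool = False


# Option 3: two "person" entities in a "parent" relationship.
@dataclass
class ParentOf:
    parent: Person
    child: Person


def is_parent_entity(x) -> bool:
    """Option 1: being a parent means being a Parent record."""
    return isinstance(x, Parent)


def is_parent_attribute(p: Person) -> bool:
    """Option 2: being a parent means a flag on the record."""
    return p.is_parent


def is_parent_relationship(p: Person, links: list) -> bool:
    """Option 3: being a parent means appearing in some link."""
    return any(link.parent == p for link in links)


as_entity = Parent(name="Alice")
as_attribute = Person(name="Alice", is_parent=True)
as_relationship = ParentOf(parent=Person("Alice"), child=Person("Bob"))

print(is_parent_entity(as_entity))
print(is_parent_attribute(as_attribute))
print(is_parent_relationship(Person("Alice"), [as_relationship]))
```

All three encode the same fact. Which one you pick mostly changes which queries are cheap and which updates are awkward.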
22. The real structure is just a graph … but
none of our modeling tools are that
flexible, nor is it helpful to think that
abstractly about most software.
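For illustration only, here is what "just a graph" might look like as a set of (subject, predicate, object) triples in Python, with the entity, attribute, and relationship views computed from it. The triple layout, the predicate names, and the helper functions are assumptions made for this sketch, not a real tool.

```python
# The graph: a set of (subject, predicate, object) triples.
triples = {
    ("alice", "is_a", "person"),
    ("bob", "is_a", "person"),
    ("alice", "parent_of", "bob"),
}


def parents(graph):
    """Entity view: the set of things that count as parents."""
    return {s for (s, p, o) in graph if p == "parent_of"}


def is_parent(graph, node):
    """Attribute view: a yes/no property of one node."""
    return node in parents(graph)


def parent_pairs(graph):
    """Relationship view: (parent, child) pairs, sorted for stable output."""
    return sorted((s, o) for (s, p, o) in graph if p == "parent_of")


print(parents(triples))
print(is_parent(triples, "bob"))
print(parent_pairs(triples))
```

The flexibility is real, but so is the cost: nothing here tells you which view the rest of the software should be built around.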
23. Normally, we make the choice based
on our experience and gut feeling, and
pretend there’s a science to it.
24. But the whole way of thinking is a
convenience based on “records”.
25. I have no idea what to do about this.
Tools that allow you to view any part of
your model in any of those ways?