
Solve User Problems: Data Architecture for Humans


We are bombarded with stories of the latest products to hit the market – products that will change everything we do. This causes us to focus on the latest technology, building IT for the sake of building IT. Meanwhile, the world still seems to run on Excel.
The “big innovators” who have and use unimaginably large amounts of data are not the norm. Aspiring to use the same complex technologies and patterns they do leads to poor investments and tradeoffs. This is an age-old problem rooted in the over-emphasis of technology as the agent of change. Technology isn’t the answer – it’s the platform on which people build answers.
To emphasize technology is to ignore the way tools change people and practices. The design focus in our market was on storing and making data accessible. If we want to make progress then we need to step back from the details and look at data from the perspective of the organization. Our design focus shifts to people learning and applying new insights, asking questions about how an organization can be more resilient, more efficient, or faster to sense and respond to changing conditions.
In this talk you will learn how to put your data architecture into a human frame of reference. Drawing inspiration from the history of technology and urban planning, we will see that the services provided by the things we build are what drive success, not the latest shiny distraction.



Solve User Problems: Data Architecture for Humans

  1. Let’s solve user problems (data architecture for humans). March 2021. Mark Madsen - @markmadsen - https://www.linkedin.com/in/markmadsen/
  2. Where I am in my career (and number of mistakes I make)
  3. © Third Nature Inc. This talk will not be about best practices. Best practice in the early market is usually a euphemism for “workaround”. What the innovator did may not be right; it may just be not wrong. What the analyst firms call best practice is often better described as survival bias.
  4. There’s a difference between having no past and actively ignoring it.
  5. A HISTORY OF REINVENTION
  6. “Those who cannot remember the past are condemned to repeat it.” ~ George Santayana. If there’s one lesson we can take from history, it’s that nobody learns any lessons from history.
  7. Online. Realtime. For decision making. Today we’ll call it streaming.
  8. Technology patterns: new “Data Bases” ™, storage virtualization, separation of storage and compute.
  9. Technology patterns: new “Data Bases” ™, storage virtualization, separation of storage and compute. What year was it?
  10. Technology patterns: new “Data Bases” ™, storage virtualization, separation of storage and compute. Welcome to 1975.
  11. New is BETTER. Our core beliefs in software are based on this. Progress is not a promise.
  12. New is BETTER? This is fundamentally a belief in leading with technology to solve problems…
  13. Technology adoption: Some people can’t resist getting the next new thing because it’s new. Many IT organizations are like this, promoting a solution and hunting for the problem that matches it. Better to ask “What is the problem for which this technology is the answer?”
  14. The solution to a puppy problem is not to add more puppies.
  15. Marketing and case studies: what people say vs reality. Beware the case study, unless they talk about it from first-hand experience, in production, and say what did not work. Which should be most of it. Most cases and vendor testimonials are: ▪ Aspirational ▪ Immature ▪ Applicable to 10 companies worldwide
  16. Design tip: Be skeptical about anything you hear regarding new data platform technology: • Optimism • Ignorance • Lacking info on “what it does poorly”, which you know very well about your existing vendors
  17. Be skeptical because technology has a tendency to solve a problem with a problem. Solve scalability with brute force parallelism. Now you have an availability problem. Solve that with redundant copies. Now you have a consistency problem…
  18. THE EVOLUTION OF ORGANIZATIONAL DATA USE
  19. History: This is how BI was done through the 80s. First there were files and reporting programs. Application files feed through a data processing pipeline to generate an output file. The file is used by a report formatter for print/screen. Files are largely single-purpose. Every report is a program written by a developer. [Diagram label: Data pipeline code]
  20. History: This is how BI ended the 80s. The inevitable situation was... [Diagram label: Data pipeline code]
  21. History: This is how we started the 90s. Collect data in a database. Queries replaced a LOT of application code because much was just joins. We learned about “dead code”. [Diagram labels: SQL, repeated five times]
  22. Pragmatism and data. Lessons learned during the ad-hoc SQL era of the DW market: When the technology is awkward for the users, the users will stop trying to use it. Even “simple” schemas weren’t enough for anyone other than analysts and their Brio… This led to the evolution of metadata-driven, SQL-generating BI tools and ETL tools.
  23. BI evolved to hiding query generation for end users. With more regular schema models, in particular dimensional models that didn’t contain cyclic join paths, it was possible to automate SQL generation via semantic mapping layers created by analysts. We developed data pipeline building tools (aka ETL). Query via business terms made BI usable by non-technical people. Life got much easier…for a while. [Diagram labels: ETL, SQL]
  24. Today’s model: Lake + data engineers, looks familiar… The Lake with data pipelines to files or Hive tables is exactly the same pattern as the COBOL batch. We already know that people don’t scale. Don’t do this. [Diagram label: Pipeline code]
  25. DESIGN AND COMPLEXITY TODAY
  26. "Always design a thing by considering it in its next larger context - a chair in a room, a room in a house, a house in an environment, an environment in a city plan." – Eliel Saarinen
  27. This is the simplistic view people have of IT, if they see even this level of detail. [Diagram labels: Order Entry, Order Database, Customer Service, Integration Program, Inventory Database, Distribution, Integration Program, Receivables Database, Accounts Receivable, Data Warehouse, Analysts & users]
  28. Real complexity is based on communication, which is data flows. Internal 3rd party & custom applications, event streams, logs, external & SaaS applications, 3rd party datasets… – this is the reality.
  29. [Diagram: information and product flows across a cheese supply chain (farms, cheese processor, retailer). Key: shaded boxes = product flow system; un-shaded boxes = information flow system. Gray = companies in value chain; red = information flows and systems. Source: IGD Food Chain Centre, February 2008] All companies operate in the context of an industry. The external data interchanges and market signals are today as important as the internal data, for both strategic and operational decision making.
  30. [Diagram: the same supply chain flow map, with a “Data Warehouse” label] The real data context of the organization that is assembled by the data platforms is subsets of all of these systems. The complexity of a DW is a function of the complexity of the organization and all the integration points. There’s more to it than just the systems and technologies…
  31. Data flows – the dark matter of your architecture diagrams. Data is transformed, cleaned, integrated, and new data is derived. This adds a level of temporal and semantic complexity to data management, and it’s always hidden. Machine learning won’t “solve” data integration. It will help in some areas, mostly with augmenting simpler tasks.
  32. Data complexity, one application. Meanwhile, people complain when a data model looks like this.
  33. Different views of data complexity. This is a map of one organization’s analytic data, showing the dataset complexity inherent in a mid-sized organization.
  34. Different views of data complexity. Data complexity is not just based on the number of datasets, or the number of tables. It is based on the number of connections. This is an order of magnitude higher than the number of objects. Organizational complexity drives communication complexity, which drives data complexity.
  35. This view is only showing connections between objects in data sets based on data relationships. All these connections are joins you must take care with in a well-managed platform.
  36. Different views of data complexity. A reverse gravity view, showing the mass of reused / replicated information at the center and the nodes where large interchanges occur. These different views show how complex an organization’s data really is, rather than the abstract list of sources and terabytes stored. This is why managing data is a difficult job.
  37. Different views – data and use. The value of data is tied to its use. This shows relationships between people and data used. This and the prior diagram show an important point: 70% of the data is used and reused constantly. 30% of the data is used by one or a few people, often new data with undetermined value. This information can be used to determine where and how you should spend your limited resources and money.
  38. Connections and uses of data are scale-free networks. The connections in the data have an exponential distribution. Each new copy of data (or derived set, subset, aggregate) adds N-1 possible connections. You can’t manage all the data. Which data do you spend time on? Used often: 70% of the connections tie to a small set of data, the core of reuse. Centrally manage this. Used seldom: 10% of the connections go to a large number of objects (new, low value, narrow). Locally curate this. [Chart axes: nodes (tables) vs number of connections]
  39. The reality of data availability is that it can only be a subset. The rest of the data is still here… There will always be more data available than ability to analyze it. Some judgement must be applied to sort the more from the less important.
  40. In an expanded ecosystem of data, curation processes are needed to address quality, definition and structure. Curation is directly attached to your data architecture... [Diagram labels: Closely managed data, Loosely managed data, User managed data; High quality, well-known; Directional quality; Unknown / low quality]
  41. Data curation is an undeveloped practice. The problem with so many sources, types, formats and latencies of data is that it is now impossible to create in advance one model for all of it. Data modeling is about the inside of a dataset. Curation is about the entire dataset. It isn’t development. It’s about: creating, labeling, organizing, finding, navigating, archiving. Data curation, rather than data modeling, is becoming the most important data management practice.
  42. The real purpose of this work is not to help IT be more productive. IT exists to help users be more productive. Starting with technology is like getting excited by a new chisel.
  43. Today’s market “solution”: Replace the data warehouse with the data lake and self-service* *Picture for illustrative purposes only, no warranty express or implied, actual system you receive may vary. By a lot.
  44. Today’s market solution: the Data Lake to replace the Data Warehouse. Data hoarding is not a data management strategy.
  45. The solution to one technology problem is another technology. Buy a catalog! Just add more technology to solve your non-technical problem. Now that you know what data is there - how do you find it? How do you get it?
  46. Practices need to catch up to technologies. A catalog is a useful, necessary component. It is useless without organizing principles and practices, AKA data curation and data architecture.
  47. So who maintains the catalog now? IT is already viewed as a bottleneck. Many organizations do not have full-time data administrators, and the DW team is already overtaxed.
  48. Why not let the user drive?
  49. Have you ever looked at user-generated taxonomies? Users also have a job to do and won’t welcome more administrative work.
  50. Developers think of self-service as data access – the user must be self-reliant.
  51. Users think of self-service in terms of a finished data product.
  52. <Problem> creates <Opportunity for new technology solution> creates <Different problem>, repeat. Seems familiar…
  53. WHAT CAN WE DO ABOUT THESE PROBLEMS?
  54. Value is not in the product, it’s in the practice. The poor carpenter blames his tools.
  55. You are a designer. You need to think like one. “Everyone designs who devises courses of action aimed at changing existing situations into preferred ones.” ~ Herbert Simon
  56. We seldom think systemically. It’s time to start (again).
  57. Often we got here because of bad policy, not technology – people would rather work around their data teams than work with them on data initiatives. http://akvkbi.blogspot.com/2017/06/dwh-development-related-survey-results.html
  58. Design tip: any time you deny a behavior or a request, ask yourself “How will they do this on their own? What do they do instead?” Bad policy causes more problems than bad technology.
  59. Shape the architecture for the people, don’t try to shape people.
  60. Data should be governed by policy, e.g. zoning. http://welcometocup.org/file_columns/0000/0530/cup-whatiszoning-guidebook.pdf
  61. We need to do today what we were doing 30 years ago. We spent time then to understand the users, what they wanted, the needs, and found ways to justify the work to meet those needs. [Diagram callouts: “We don’t do enough of this”, “We over-emphasize this”]
  62. Technology is a tool. Tools enable you to build things. You build things for people. So start here.
  63. The primary focus should be on goals, specifically of the users. “The engineer, and more generally the designer, is concerned with how things ought to be - how they ought to be in order to attain goals, and to function.” ~ Herbert Simon
  64. What organizations say they want: Time to value. Ability to do new things more easily, aka innovate • with data • with technology. Efficiency, aka reduce costs.
  65. What organizations say sounds like “do more with less”: Time to value. Ability to do new things more easily, aka innovate • with data • with technology. Efficiency, aka reduce costs. I end up with more questions: • TTV for whom? • TTV for what? • New thing for whom? • One time or recurring (cost or TTV)? • TTV as latency or throughput? • Local cost or global cost? • More efficient vs less flexible? This is manager-speak – we need to talk about users.
  66. The questions for the data ecosystem from an architect’s perspective: What people? What goals? What uses? What time frames?
  67. Mapping user need: don’t work on assumptions. Work-as-Prescribed, Work-as-Imagined, Work-as-Done, Work-as-Disclosed. We are oriented here. But people encounter obstacles to their work. They create solutions. The solutions become part of the work, how the work is done. Work-as-done diverges from the official definition of the work as-prescribed or as-disclosed. Most tech startups have no real idea about the work – they believe they are solving technical problems and the user is IT. We need to focus here. https://safetydifferently.com/the-varieties-of-human-work/
  68. Starting with technology is starting in the solution space, not the problem space. https://indiyoung.com/about-problem-space/
  69. Analysis and data science workflows are generally poorly understood. An analyst trying to answer a question has highs and lows along their workflow. The environment is defined by independent, often mismatched tools, some fit for purpose and others not, with no single product capable of meeting their needs. Each usage model has several of these maps tied to different roles. Questions along the way: Where is data? Can I access new data? Why does IT have to be involved? Why can’t I store data I’m working on? How do I link new data to existing data? How do I share information with others? Legend: green = solved; yellow = gap, possible opportunity; red = obstacle, opportunity.
  70. User goals: more than accessing the data. Explore and Understand Inform and Explain Convince and Decide Deliver Process
  71. The real design criteria: context and point of use. Information use is diverse and varies based on context: ▪ Get a quick answer ▪ Solve a one-off problem ▪ Analyze causes ▪ Do experiments ▪ Make repetitive decisions ▪ Use data in routine processes ▪ Make complex decisions ▪ Choose a course of action ▪ Convince others to take action. One size doesn’t fit all.
  72. Data architecture requires understanding data use so we can build the right infrastructure. Understanding the details of uses, workflows, tasks, and activities allows us to look at the higher organizational level again. [Diagram labels: Monitor, Analyze Exceptions, Analyze Causes, Decide, Act; No problem, No idea, Do nothing]
  73. This is part of a larger system. Feedback loops exist and operate at different frequencies. [Diagram labels: Collect new data, Monitor, Analyze Exceptions, Analyze Causes, Decide, Act, Act on the process, Act within the process]
  74. Data platforms are the most complex systems in the organization, far more complex than any web application or ERP system.
  75. Manage your data (or it will manage you). Data management is where developers are weakest. Modern engineering practices are where data management is weakest. Users care about their tasks. You need to bridge these groups and practices in the organization if you want to do meaningful work with data.
  76. About the Presenter. Mark spent most of the past 25 years working in the analytics field, starting in AI at the University of Pittsburgh and autonomous robotics at Carnegie Mellon University before moving into technology management. Today he is a Fellow in the Technology & Innovation Office at Teradata. Previously, he was president of Third Nature, an advisory firm focused on services for analytics and technology strategy, and product design. Mark is an award-winning author, architect and CTO who has received awards for his work from the American Productivity & Quality Center, Smithsonian Institute, and industry associations. He is an international speaker, and chairs several conferences and program committees. You can find him on LinkedIn at https://www.linkedin.com/in/markmadsen
  77. Further Reading: Thinking in Systems, Donella Meadows; An Introduction to Systems Thinking, Gerald Weinberg; Contextual Design, Beyer & Holtzblatt; Badass: Making Users Awesome, Kathy Sierra; Information Design, Jacobsen; Data: A Guide to Humans, Phil Harvey; In Search of Certainty, Burgess; https://indiyoung.com/about-problem-space/; http://welcometocup.org/file_columns/0000/0530/cup-whatiszoning-guidebook.pdf
  78. CC Image Attributions. Thanks to the people who supplied the creative commons licensed images used in this presentation: well town hall.jpg - http://flickr.com/photos/tuinkabouter/1135560976/; seattle library 1 - http://www.flickr.com/photos/thomashawk/2671536366/; chicken_head2.jpg - http://www.flickr.com/photos/coycholla/4901760905; egg_face1.jpg - http://www.flickr.com/photos/sally_monster/3228248457; indonesian angry mask phone - Erik De Castro Reuters.jpg
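
Slide 23 describes semantic mapping layers that let users query by business terms while the tool generates the SQL over a dimensional model. The following is a minimal Python sketch of that idea, not the method of any particular BI product. The star schema (`sales_fact`, `date_dim`, `product_dim`), the business terms, and the `build_query` helper are all hypothetical names invented for illustration.

```python
# Hypothetical star schema: one fact table joined to two dimensions.
# Because there are no cyclic join paths, each dimension has exactly
# one join condition back to the fact table.
FACT = "sales_fact"
DIMENSION_JOINS = {
    "date_dim": "sales_fact.date_key = date_dim.date_key",
    "product_dim": "sales_fact.product_key = product_dim.product_key",
}

# The semantic layer: business term -> (table, SQL expression).
SEMANTIC_LAYER = {
    "Month": ("date_dim", "date_dim.month_name"),
    "Product Category": ("product_dim", "product_dim.category"),
    "Revenue": (FACT, "SUM(sales_fact.amount)"),
}
MEASURES = {"Revenue"}  # aggregated terms; everything else is grouped


def build_query(terms):
    """Generate a SQL string from a list of business terms."""
    select, group_by, joins = [], [], []
    for term in terms:
        table, expr = SEMANTIC_LAYER[term]
        select.append(f'{expr} AS "{term}"')
        if term not in MEASURES:
            group_by.append(expr)
        if table != FACT and table not in joins:
            joins.append(table)
    sql = f"SELECT {', '.join(select)}\nFROM {FACT}"
    for table in joins:
        sql += f"\nJOIN {table} ON {DIMENSION_JOINS[table]}"
    if group_by:
        sql += f"\nGROUP BY {', '.join(group_by)}"
    return sql


print(build_query(["Month", "Revenue"]))
```

The user only names "Month" and "Revenue"; the join and the grouping come from the mapping an analyst maintains, which is the point the slide makes about hiding query generation.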
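
Slides 34 and 38 argue that data complexity follows the number of connections rather than the number of objects, and that each new copy of data adds N-1 possible connections. A rough Python sketch of that arithmetic follows, assuming "possible connections" means distinct pairs of datasets, which gives N(N-1)/2. The second function illustrates one way to separate a heavily reused core from a long tail by cumulative share of connections. The function names, the table names, and the connection counts are made up for illustration; only the 70% figure comes from the slide.

```python
def possible_connections(n: int) -> int:
    """Distinct pairs among n datasets: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def added_by_nth_dataset(n: int) -> int:
    """Pairs added when the n-th dataset joins the n - 1 existing ones."""
    return possible_connections(n) - possible_connections(n - 1)


def split_core_and_tail(connections: dict[str, int], core_percent: int = 70):
    """Rank tables by connection count; the core is the smallest set of
    top-ranked tables whose connections reach core_percent of the total."""
    ranked = sorted(connections.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(connections.values())
    core, tail, running = [], [], 0
    for name, count in ranked:
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 < core_percent * total:
            core.append(name)
            running += count
        else:
            tail.append(name)
    return core, tail


for n in (10, 100, 1000):
    print(n, possible_connections(n), added_by_nth_dataset(n))

# Made-up connection counts per table, for illustration only.
example = {
    "orders": 40, "customers": 30, "products": 15, "web_logs": 5,
    "survey_2021": 4, "scratch_a": 3, "scratch_b": 3,
}
core, tail = split_core_and_tail(example)
print("centrally manage:", core)
print("locally curate:", tail)
```

With these example counts, two of the seven tables carry 70% of the connections, which mirrors the slide's suggestion to centrally manage the small core and leave the long tail to local curation.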
