Path-Dependent Development (PyCon India)

Path Dependent Development
Nick Coghlan
@ncoghlan_dev

Red Hat Toolsmith
CPython Core Developer

Usefully Wrong
“All models are wrong. Some models are useful.”

“... the practical question is: How wrong do they
have to be to not be useful?”

George E. P. Box (statistician) “Empirical Model-Building”

Path Dependence
● “good enough to be useful” -> ship it
● The decisions we make leave their mark on
the software we ship
● These marks remain long after the scope of
the software expands to other use cases

What is “Good Enough”?
● Depends on your priorities and resources
– What are you building?
– Why are you building it?
– Who are you building it for?
– Who is building it?
– What are you building it with?
– How much risk can you tolerate?

Context Matters
● Building an intranet web service
– Trusted network
– Enforced user base
● Building a web startup
– Hostile network
– Business lives or dies by user choice
● Building hardware control and management systems
– Usage driven by hardware
– Software as a necessary evil

Trade-Offs Needed: Inquire Within

Functionality
● Doing one (or a few) things well is often better
than doing a lot of things badly
● Adding functionality later is usually easier to
sell than taking it away (no matter how broken
it turns out to be)

Flexibility
● Don't make things configurable
● Configurability = testing and maintenance pain
● Do separate concerns (if you make it configurable
later, only one place needs to change)
● Do use flexible support tools
– SQLAlchemy makes it easy to change databases
– Django locks in some major decisions (like ORM and
templating language) but provides a rich ecosystem of
prebuilt components that work well together
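A minimal Python sketch of "separate concerns", assuming the standard library `sqlite3` module as the storage layer. The function names are hypothetical. The point is that only `database_location()` knows where the data lives, so making that location configurable later means changing a single function.

```python
import sqlite3


# Hypothetical single source of truth for the storage location. Nothing else
# hard-codes it, so making it configurable later touches only this function.
def database_location() -> str:
    return ":memory:"


def open_database() -> sqlite3.Connection:
    """The only place in the program that opens a database connection."""
    return sqlite3.connect(database_location())


def count_jobs(conn: sqlite3.Connection) -> int:
    # Business logic receives a connection and never knows where it came from.
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (name TEXT PRIMARY KEY)")
    return conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]


if __name__ == "__main__":
    print(count_jobs(open_database()))
```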

Security
● A lot of software is still insecure by default
– Unhashed (or poorly hashed) passwords
– Unencrypted communications channels
● Multiple layers of defence can hide this
● Try to make the “easy option” and the “secure
option” one and the same
● Can be very hard to fix poor security choices
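As an illustration of making the secure option the easy one for stored passwords, here is a sketch that uses only the standard library (`hashlib.pbkdf2_hmac`, `secrets`, `hmac.compare_digest`). The default iteration count and the `iterations$salt$hash` storage format are assumptions made for this example. A dedicated password-hashing library is an equally reasonable choice.

```python
import hashlib
import hmac
import secrets

# Assumed default work factor; check current guidance before relying on it.
DEFAULT_ITERATIONS = 600_000


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Return a salted PBKDF2-HMAC-SHA256 hash as 'iterations$salt_hex$hash_hex'."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Callers only ever see `hash_password` and `verify_password`. There is no "plain text" code path to choose by accident.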

Reinventing Wheels
● Reuse means dependency management
● Often simpler to roll your own to start
● With good modularity, easy to replace later
● Watch for increasing complexity

Documentation
● How sophisticated are users expected to be?
– Installed by developers? Admins? End users?
– Intended for domain experts only?
● Is it stable enough to document?
● Documentation can highlight design flaws

Test Quality
● Fine grained tests pinpoint failures easily
● Coarse grained tests are often easier to write
● Can easily start with coarse grained tests, then add more
fine grained tests to narrow down failures
● Slow tests are better than no tests
● Tests with external dependencies are better than no tests
● Regression tests are great, but don't let them block fixes
for problems that can't be reproduced reliably
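A small `unittest` sketch of the coarse-then-fine approach. `parse_line` and `summarise` are hypothetical functions invented for illustration. The coarse test exercises the whole pipeline in one go. The fine-grained tests, added later, pin down which piece broke.

```python
import unittest


def parse_line(line: str) -> tuple[str, int]:
    """Hypothetical unit under test: parse 'name=count' into (name, count)."""
    name, _, count = line.partition("=")
    return name.strip(), int(count)


def summarise(text: str) -> dict[str, int]:
    """Hypothetical higher-level function built on top of parse_line."""
    totals: dict[str, int] = {}
    for line in text.splitlines():
        if line.strip():
            name, count = parse_line(line)
            totals[name] = totals.get(name, 0) + count
    return totals


class CoarseTest(unittest.TestCase):
    # One end-to-end check: cheap to write, but a failure could be anywhere.
    def test_summary(self):
        self.assertEqual(summarise("a=1\nb=2\na=3\n"), {"a": 4, "b": 2})


class FineTest(unittest.TestCase):
    # Added later to narrow down where a coarse failure comes from.
    def test_parse_strips_whitespace(self):
        self.assertEqual(parse_line(" a = 1"), ("a", 1))

    def test_parse_rejects_non_numeric(self):
        with self.assertRaises(ValueError):
            parse_line("a=x")
```

Run the tests with `python -m unittest` against the module.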

Code Reviews
● Code is written to:
– Tell the computer what to do
– Tell future maintainers what it does
● Tests cover the first, reviews the second
● Debatable value for small teams
● Highly valuable for large teams
● Needs appropriate tools

Performance & Scalability
● Don't stress about it if you don't need to
● Start with measurement infrastructure
● If simple is fast enough, stick with simple
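"Start with measurement infrastructure" can be as small as a timing decorator. This sketch uses only the standard library (`time.perf_counter`, `functools.wraps`). `TIMINGS`, `timed` and `simple_sum` are hypothetical names. A real service would more likely export these numbers to a metrics system.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process store of call durations, keyed by function name.
TIMINGS: dict[str, list[float]] = defaultdict(list)


def timed(func):
    """Record the wall-clock duration of every call to func in TIMINGS."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the call raises, so failures are measured too.
            TIMINGS[func.__qualname__].append(time.perf_counter() - start)
    return wrapper


@timed
def simple_sum(n: int) -> int:
    # The "simple" implementation: keep it unless measurements say otherwise.
    return sum(range(n))


if __name__ == "__main__":
    for _ in range(5):
        simple_sum(1_000_000)
    samples = TIMINGS["simple_sum"]
    print(f"simple_sum: {len(samples)} calls, worst {max(samples):.4f}s")
```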

Reliability
● Not all software is mission critical
● Pay attention to failure modes
● Error quality matters

Usability
● Humans are still a lot smarter than computers
● If users have no choice, they'll usually cope
● Hence, awful UX in most “enterprise” software

Maintainability & Business Risks
● The Bus Factor
– Most startups = 1
– Large companies want it to be higher
● Developer docs (including comments)
● Legal risks (copyrights, patents, trademarks)

Automation
● Critical to speeding up release cycles
● Is a process stable enough to automate?

Exit Strategies
● Know what you're not doing
● Have a vague idea how to fix it when needed
● Actual fixes will depend on future needs
● Sometimes, the only right answer is “No”

Patterns and Processes
● Keep your options open
● Minimise current complexity
● This is not easy
– Software architecture and design patterns
– Software processes and methodologies
● “interim” solutions may last a long time
● If you don't have a test suite, start there

Prototyping vs Implementation
● Two very different modes of development
● Prototyping
– Exploration
– Trying to figure out what is feasible
● Implementation
– Already known to be feasible
– Making it happen to a known specification
● Big difference in priorities!

Social Implications
● Design decisions are context dependent
● Easy to criticise in hindsight
● Design trade-offs can influence community
● Actually getting better at building software
● Ambitions are (more than?) keeping pace

An Innocent Start
● PulpDist: Mirroring network based on rsync
● Simple job definitions
{
"remote_server": "localhost",
"remote_path": "/demo/simple/",
"local_path": "/var/www/pub/sync_demo_raw/",
...
}
● Simple custom validator for JSON data
– Checks on individual values
– Overall sanity checks on full jobs
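A sketch of what a simple custom validator with those two layers might look like. The field names come from the example job above. The specific rules (non-empty strings, absolute paths, no syncing a local directory onto itself) are illustrative assumptions, not PulpDist's actual validation logic.

```python
import json

# Field names from the example job; the rules below are illustrative only.
REQUIRED_FIELDS = ("remote_server", "remote_path", "local_path")


def check_value(field: str, value: object) -> list[str]:
    """Check a single value in isolation."""
    if not isinstance(value, str) or not value:
        return [f"{field}: expected a non-empty string"]
    if field.endswith("_path") and not value.startswith("/"):
        return [f"{field}: expected an absolute path"]
    return []


def validate_job(job: dict) -> list[str]:
    """Return a list of problems with one job; an empty list means it looks sane."""
    errors = [f"{field}: missing" for field in REQUIRED_FIELDS if field not in job]
    for field in REQUIRED_FIELDS:
        if field in job:
            errors.extend(check_value(field, job[field]))
    # Whole-job sanity check, only once every field is individually valid:
    # refuse to sync a local directory onto itself.
    if (not errors and job["remote_server"] == "localhost"
            and job["remote_path"] == job["local_path"]):
        errors.append("job would sync a local directory onto itself")
    return errors


if __name__ == "__main__":
    job = json.loads("""{
        "remote_server": "localhost",
        "remote_path": "/demo/simple/",
        "local_path": "/var/www/pub/sync_demo_raw/"
    }""")
    print(validate_job(job))  # prints []
```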

Don't Repeat Yourself
● Simple format turned out to be too simple
– Hard to modify given multiple jobs from same source
● Enhanced format with reusable elements
{
"mirror_id": "local_copy",
"tree_id": "simple_sync",
"site_id": "bne",
...
}
● Simple validator was no longer adequate
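A sketch of the checks the reusable format needs and the per-value validator never had to do. It assumes a hypothetical layout in which sites and trees are defined once and mirrors refer to them by ID. Only `mirror_id`, `tree_id` and `site_id` come from the slide; the other field names are made up. Every new kind of reference means another hand-written check like these, each needing its own tests.

```python
# Hypothetical enhanced config: sites and trees are defined once and mirrors
# refer to them by ID. Field names other than mirror_id, tree_id and site_id
# are illustrative assumptions.
CONFIG = {
    "sites": [{"site_id": "bne", "local_root": "/var/www/pub/"}],
    "trees": [{"tree_id": "simple_sync", "remote_server": "localhost",
               "remote_path": "/demo/simple/"}],
    "mirrors": [{"mirror_id": "local_copy", "tree_id": "simple_sync",
                 "site_id": "bne"}],
}


def index_by(items: list[dict], key: str, errors: list[str]) -> dict[str, dict]:
    """Build an ID lookup table, reporting duplicate identifiers."""
    table: dict[str, dict] = {}
    for item in items:
        ident = item[key]
        if ident in table:
            errors.append(f"duplicate {key}: {ident!r}")
        table[ident] = item
    return table


def check_references(config: dict) -> list[str]:
    """Uniqueness and cross-reference checks across the whole config."""
    errors: list[str] = []
    sites = index_by(config["sites"], "site_id", errors)
    trees = index_by(config["trees"], "tree_id", errors)
    index_by(config["mirrors"], "mirror_id", errors)
    for mirror in config["mirrors"]:
        name = mirror["mirror_id"]
        if mirror["tree_id"] not in trees:
            errors.append(f"mirror {name!r}: unknown tree_id {mirror['tree_id']!r}")
        if mirror["site_id"] not in sites:
            errors.append(f"mirror {name!r}: unknown site_id {mirror['site_id']!r}")
    return errors
```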

What To Do?
● Upgrade the existing validator
– Possible, but tedious to test properly
– Not a good wheel to reinvent
● JSON validation library
– Research would be starting from scratch
– Hard to assess quality quickly
● Relational database
– Enforces the constraints by its very nature
– Error quality would likely be poor

Two Birds...
● For validation, I needed to:
– Ensure identifiers were unique
– Ensure cross references were valid
● For UI purposes I also needed:
– To filter by component identifiers
– To sort by various fields
● Sound familiar?

...One Stone
● An in-memory SQLite database was perfect
● But writing SQL by hand is still horrible
● SQLAlchemy was available in the target environment
● Problem solved!
– Config loaded into DB after simple field validation
– If the DB accepts it, references are also valid
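The "one stone" can be sketched with Python's standard `sqlite3` module standing in for SQLAlchemy. The swap is made only to keep the example dependency-free. Table and column names beyond `mirror_id`, `tree_id` and `site_id` are assumptions. Primary keys enforce unique identifiers, and `REFERENCES` clauses enforce valid cross references. The same in-memory database then answers filter-and-sort queries for the UI.

```python
import sqlite3

# Hypothetical schema; the talk used SQLAlchemy, plain sqlite3 is used here.
SCHEMA = """
CREATE TABLE sites   (site_id TEXT PRIMARY KEY);
CREATE TABLE trees   (tree_id TEXT PRIMARY KEY, remote_path TEXT NOT NULL);
CREATE TABLE mirrors (
    mirror_id TEXT PRIMARY KEY,
    tree_id   TEXT NOT NULL REFERENCES trees(tree_id),
    site_id   TEXT NOT NULL REFERENCES sites(site_id)
);
"""


def load_config(config: dict) -> sqlite3.Connection:
    """Load config into an in-memory DB; raises sqlite3.IntegrityError if invalid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    # Named placeholders read only the keys they need from each dict.
    conn.executemany("INSERT INTO sites VALUES (:site_id)", config["sites"])
    conn.executemany("INSERT INTO trees VALUES (:tree_id, :remote_path)",
                     config["trees"])
    conn.executemany("INSERT INTO mirrors VALUES (:mirror_id, :tree_id, :site_id)",
                     config["mirrors"])
    conn.commit()
    return conn


def mirrors_for_site(conn: sqlite3.Connection, site_id: str) -> list[str]:
    """UI-style query: filter by a component identifier, sort by a field."""
    rows = conn.execute(
        "SELECT mirror_id FROM mirrors WHERE site_id = ? ORDER BY mirror_id",
        (site_id,),
    )
    return [row[0] for row in rows]
```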

How Does The Story End?
● Still some very rough edges
– SQLite error messages are quite user-hostile
– Schema changes are triple-keyed
● Future changes?
– Master in database, JSON only as export?
– Improved error messages?
– Switch to an actual schema engine?
● Other priorities!
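One possible direction for "improved error messages" is sketched below with stdlib `sqlite3` and a hypothetical `ConfigError`. Each entry is inserted separately, and SQLite's generic constraint failures are turned into messages that name the offending config entry. SQLite's foreign key error does not say which reference failed, so the sketch looks the references up itself. The prefix check assumes the `UNIQUE constraint failed` wording used by current SQLite releases. Schema and names are illustrative.

```python
import sqlite3

# Hypothetical minimal schema for the sketch.
SCHEMA = """
CREATE TABLE sites   (site_id TEXT PRIMARY KEY);
CREATE TABLE trees   (tree_id TEXT PRIMARY KEY);
CREATE TABLE mirrors (
    mirror_id TEXT PRIMARY KEY,
    tree_id   TEXT NOT NULL REFERENCES trees(tree_id),
    site_id   TEXT NOT NULL REFERENCES sites(site_id)
);
"""


class ConfigError(ValueError):
    """A config problem described in terms of the user's config file."""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sites VALUES ('bne')")
    conn.execute("INSERT INTO trees VALUES ('simple_sync')")
    conn.commit()
    return conn


def add_mirror(conn: sqlite3.Connection, mirror: dict) -> None:
    name = mirror["mirror_id"]
    try:
        with conn:  # commit on success, roll back on failure
            conn.execute(
                "INSERT INTO mirrors VALUES (:mirror_id, :tree_id, :site_id)", mirror
            )
    except sqlite3.IntegrityError as exc:
        if str(exc).startswith("UNIQUE"):
            raise ConfigError(f"mirror {name!r} is defined more than once") from exc
        # Table/column names come from this fixed tuple, never from user input.
        for column, table in (("tree_id", "trees"), ("site_id", "sites")):
            found = conn.execute(
                f"SELECT 1 FROM {table} WHERE {column} = ?", (mirror[column],)
            ).fetchone()
            if found is None:
                raise ConfigError(
                    f"mirror {name!r} refers to unknown {column} {mirror[column]!r}"
                ) from exc
        raise ConfigError(f"mirror {name!r} is invalid: {exc}") from exc
```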

Q&A
Pulp:
http://pulpproject.org/
PulpDist:
https://fedorahosted.org/pulpdist/

CPython Sprints
Monday & Tuesday
