Given at a local RSE group meeting. Covers code quality practices, focusing on Python but applicable to multiple languages, with useful tools highlighted throughout.
Digital RSE: automated code quality checks - RSE group meeting
1. Some ideas for today’s talk
Histograms
https://henryiii.github.io/histogram-tutorial
More student progress over the next few months
Gave similar talk ~1 year ago
uproot-browser -> exciting development
(But a few days old)
Packaging
Very active area - this year will be big
Waiting on scikit-build news
Hatch will be out around PyCon
A little might leak into this talk anyway
Could be a future “lunch seminar”?
CQ
Selected for today!
Generally applicable
Somewhat opinionated
= good discussions
Code Quality
2. DIGITAL RSE
Automated Code Quality Checks
Henry Schreiner
March 9, 2022
ISciNumPy
https://iscinumpy.dev
Scikit-HEP Developer
https://scikit-hep.org/developer
3. Packaging aside: pipx
$ pip install <application>
$ <application>
I’m sure you’ve seen this:
Examples of applications:
build: make SDists and wheels
twine: upload SDists and wheels
cibuildwheel: make redistributable wheels
nox/tox: Python task runners
jupyterlite: WebAssembly Python site builder
black: Python code formatter
pypi-command-line: query PyPI
uproot-browser: ROOT file browser (HEP)
tiptop: fancy top-style monitor
rich-cli: pretty print files
cookiecutter: template packages
clang-format: format C/C++/CUDA code
pre-commit: general CQA tool
cmake: build system generator
meson: another build system generator
ninja: build system
Packages can conflict
Updates get slower over time
Lose track of why things are installed
Manual updates are painful
Hates Python being replaced
$ pipx install <application>
$ <application>
Better!
Automatic venv for each package
No conflicts ever
Everything updatable / replaceable
Doesn’t like Python being replaced
$ pipx run <application>
Best!
Automatic venv caching
Never more than a week old
No pre-install or setup
No maintenance
Replace Python at will
pipx run --spec git+https://github.com/henryiii/rich-cli@patch-1 rich
pipx has first-class support on GHA & Azure!
4. Python aside: Nox
Makefiles
Custom language
Painful to write
Painful to maintain
Looks like garbage
OS dependent
No Python environments
Everywhere
Tox
Custom language
Concise to write
Tricky to read
Ties you to tox
OS independent
Python environments
Python package
Nox
Python, mimics pytest
Simple but verbose
Easy to read
Teaches commands
OS independent
Python environments
Python package
5. Writing a noxfile.py
import nox


@nox.session(python=["3.7", "3.8", "3.9", "3.10"])
def tests(session: nox.Session) -> None:
    """
    Run the unit and regular tests.
    """
    session.install(".[test]")
    session.run("pytest", *session.posargs)
7. Features of nox
Full control over environments
Easy fly-by contributions
Transparent, simple .nox directory
Conda support
Trade speed for reproducibility
Some ideas for sessions
lint
tests
docs
build
bump
pylint
regenerate
update_pins
check_manifest
make_changelog
update_python_dependencies
See
pypa/cibuildwheel
pypa/manylinux
scikit-hep/hist
scikit-hep/boost-histogram
pybind/pybind11
scikit-hep/cookie
scikit-hep/scikit-hep.github.io
Optional environment reuse
8. Python launcher for Unix
Rust implementation of “py” for UNIX
But it also automatically picks up a .venv folder!
Meant for lazy experts
Launcher
$ py -m pytest
Classic
$ . .venv/bin/activate
(.venv) $ python -m pytest
(.venv) $ deactivate
Classic, take 2
$ .venv/bin/python -m pytest
10. Code Quality
Why does code quality matter?
Improve readability
Find errors before they happen
Avoid historical baggage
Reduce merge conflicts
Warm fuzzy feelings
How to run
Discussion of checks
(Opinionated)
Mostly focusing on Python today
11. pre-commit
Poorly named?
Has a pre-commit hook mode
You don’t have to use it that way!
Generic check runner
conda
coursier
dart
docker
docker_image
dotnet
fail
golang
lua
node
perl
python
python_venv
r
ruby
rust
swift
pygrep
script
system
Written in Python
Available via pipx, nox, homebrew, etc.
Designed for speed & reproducibility
Ultra fast environment caching
Locked environments
Easy autoupdate command
pre-commit.ci
Automatic updates
Automatic fixes for PRs
Large library of hooks
https://pre-commit.com/hooks.html
Custom hooks are simple
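pre-commit runs a hook's `entry` command with the matching filenames appended as arguments, and treats a non-zero exit code as a failure, so a custom hook can be a very small program. A minimal sketch, assuming a hypothetical hook that rejects tab characters; the function and message are illustrative, not taken from any published hook.

```python
import sys
from pathlib import Path
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Hypothetical hook: fail if any given file contains a tab character."""
    # pre-commit appends the matching filenames to the hook's command line
    filenames = sys.argv[1:] if argv is None else argv
    status = 0
    for name in filenames:
        text = Path(name).read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "\t" in line:
                print(f"{name}:{lineno}: tab character found")
                status = 1
    # A non-zero return value becomes the exit code, which fails the hook
    return status
```

As a standalone script it would end with `raise SystemExit(main())` inside an `if __name__ == "__main__":` guard, and could be registered under a `repo: local` entry.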
12. Configuring pre-commit
Design
A hook is just a YAML dict
Fields can be overridden
Environments globally cached by git tag
Supports checks and fixers
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: "22.1.0"
    hooks:
      - id: black
# Black’s .pre-commit-hooks.yaml
- id: black
  name: black
  description: "Black: The uncompromising code formatter"
  entry: black
  language: python
  minimum_pre_commit_version: 2.9.2
  require_serial: true
  types_or: [python, pyi]
- id: black-jupyter
  name: black-jupyter
  description: "Black (with Jupyter Notebook support)"
  entry: black
  language: python
  minimum_pre_commit_version: 2.9.2
  require_serial: true
  types_or: [python, pyi, jupyter]
  additional_dependencies: [".[jupyter]"]
13. Options for pre-commit
Selected options
files: explicit include regex
exclude: explicit exclude regex
types_or/types/exclude_types: file types
args: control arguments
additional_dependencies: extra things to install
stages: select the git stage (like manual)
14. Running pre-commit
Run all checks
pre-commit run -a
Update all hooks
pre-commit autoupdate
Install as a pre-commit hook
pre-commit install
(Skip with git commit -n)
Skip checks
SKIP=… <run>
Run one check
pre-commit run -a <id>
Run manual stage
pre-commit run --hook-stage manual
15. Examples of pre-commit checks
Almost everything following in this talk
- repo: local
  hooks:
    - id: disallow-caps
      name: Disallow improper capitalization
      language: pygrep
      entry: PyBind|Numpy|Cmake|CCache|Github|PyTest
      exclude: .pre-commit-config.yaml
Don’t grep the file this is in!
“Entry” is the grep, in this case
Using pygrep “language”
Custom hook
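A pygrep hook matches its `entry` regex against each line of every selected file and fails if anything matches. The sketch below approximates that with Python's `re` module so you can see what the `disallow-caps` pattern catches; it is an illustration, not pre-commit's own implementation (which also supports options such as `--ignore-case`).

```python
import re
from typing import Iterable, Iterator, Tuple

# Same pattern as the disallow-caps hook's `entry`
DISALLOWED = re.compile(r"PyBind|Numpy|Cmake|CCache|Github|PyTest")


def find_bad_caps(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, first match) for each offending line."""
    for lineno, line in enumerate(lines, start=1):
        match = DISALLOWED.search(line)
        if match:
            yield lineno, match.group()


sample = [
    "We use NumPy and CMake.",  # correct capitalization: no match
    "Built with Cmake, hosted on Github.",  # flagged (first match reported)
    "Run the tests with pytest.",  # lowercase command name: no match
]
for lineno, word in find_bad_caps(sample):
    print(f"line {lineno}: {word}")
```

Because the regex is case-sensitive, the correct spellings pass untouched, which is why this works as a capitalization check.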
17. pre-commit/pygrep-hooks
Small common pygreps
- repo: https://github.com/pre-commit/pygrep-hooks
  rev: "v1.9.0"
  hooks:
    - id: python-check-blanket-noqa
    - id: python-check-blanket-type-ignore
    - id: python-no-eval
    - id: python-use-type-annotations
    - id: rst-backticks
    - id: rst-directive-colons
    - id: rst-inline-touching-normal
Opinion: blanket CQ ignores are bad. Ignores should be specific
Better readability/searchability - avoid hiding unrelated issues
Optional, some files might need to be excluded
18. CI (GitHub Actions)
on:
  pull_request:
  push:
    branches:
      - main

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
      - uses: pre-commit/action@v2.0.3
Great, fast caching, but the action is maintenance-only - replaced by pre-commit.ci
on:
  pull_request:
  push:
    branches:
      - main

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pipx run nox -s lint
@nox.session
def lint(session: nox.Session) -> None:
    session.install("pre-commit")
    session.run("pre-commit", "run", "--all-files", "--show-diff-on-failure", *session.posargs)
21. Using code formatters
Existing projects
Apply all-at-once, not spread out over time
Add the format commit to .git-blame-ignore-revs
22. Black
repos:
  - repo: https://github.com/psf/black
    rev: "22.1.0"
    hooks:
      - id: black-jupyter
Python code formatter
Close to the one true format for Python
Almost not configurable (this is a feature)
A good standard is better than perfection
Designed to reduce merge conflicts
Reading blacked code is fast
Write your code to produce nice formatting
You can disable line/lines if you have to
Workaround for single quotes (use double)
Magic trailing comma
Online version
https://black.vercel.app
23. Write for good format
Bad:
raise RuntimeError(
    "This was not a valid value for some_value: {}".format(repr(some_value))
)
Good:
msg = f"This was not a valid value for some_value: {some_value!r}"
raise RuntimeError(msg)
Better stacktrace
More readable
Two lines instead of three
Faster (f-string)
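A runnable version of the "good" pattern, using a hypothetical `check_value` function. It also checks that the `!r` conversion in the f-string produces the same text as the older `.format(repr(...))` call, so the rewrite changes only the form, not the message.

```python
def check_value(some_value: object) -> None:
    """Hypothetical validator: only integers are accepted."""
    if not isinstance(some_value, int):
        # Build the message first; the raise then fits on one short line
        msg = f"This was not a valid value for some_value: {some_value!r}"
        raise RuntimeError(msg)


some_value = "oops"
old_style = "This was not a valid value for some_value: {}".format(repr(some_value))
new_style = f"This was not a valid value for some_value: {some_value!r}"

try:
    check_value(some_value)
except RuntimeError as err:
    print(err)
```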
24. Notebook cleaner
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: "0.5.0"
    hooks:
      - id: nbstripout
Remove outputs from notebooks
Outputs are best kept out of VCS
You can render outputs in JupyterBook, etc.
Use Binder or JupyterLite
27. isort
repos:
  - repo: https://github.com/PyCQA/isort
    rev: "5.10.1"
    hooks:
      - id: isort
Sort your Python imports
Very configurable
Reduces merge conflicts
Grouping imports helps readers
Can inject future imports
# pyproject.toml
[tool.isort]
profile = "black"
# .pre-commit-config.yaml (isort hook, to inject a future import)
args: ["-a", "from __future__ import annotations"]
Default groupings
Future imports
Stdlib imports
Third party packages
Local imports
28. pyupgrade
repos:
  - repo: https://github.com/asottile/pyupgrade
    rev: "v2.31.0"
    hooks:
      - id: pyupgrade
        args: [--py37-plus]
Update Python syntax
Avoid deprecated or obsolete code
Fairly cautious
Can target 2.7, 3, 3.x
(Mostly) not configurable
Remove static if sys.version_info blocks
Python 2.7
Set literals
Dictionary comprehensions
Generators in functions
Format specifiers & .format ⚙
Comparison for const literals (3.8 warn)
Invalid escapes
Python 3
Unicode literals
Long literals, octal literals
Modern super()
New style classes
Future import removal
yield from
Remove six compatibility code
io.open -> open
Remove error aliases
Python 3.x
f-strings (partial) (3.6) ⚙
NamedTuple/TypedDict (3.6)
subprocess.run updates (3.7)
lru_cache parens (3.8)
lru_cache(None) -> cache (3.9)
Typing & annotation rewrites (various)
abspath(__file__) removal (3.9)
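To make a few of these rewrites concrete, the pairs below put an older spelling next to the modern form pyupgrade targets, so you can confirm each pair behaves identically. They are hand-written illustrations rather than captured pyupgrade output, and the last pair only applies when targeting 3.9+ (for example with `--py39-plus`).

```python
import functools

value = 42

# Set literals (2.7): set([1, 2, 3]) -> {1, 2, 3}
old_set = set([1, 2, 3])
new_set = {1, 2, 3}

# Dict comprehensions (2.7): dict((k, v) for ...) -> {k: v for ...}
old_dict = dict((k, k * 2) for k in range(3))
new_dict = {k: k * 2 for k in range(3)}

# f-strings (3.6, partial): "...{}".format(value) -> f"...{value}"
old_str = "value={}".format(value)
new_str = f"value={value}"


class Base:
    def greet(self) -> str:
        return "hello"


class Child(Base):
    def old_greet(self) -> str:
        return super(Child, self).greet()  # Python 2 compatible spelling

    def new_greet(self) -> str:
        return super().greet()  # modern super()


# lru_cache(maxsize=None) -> cache (functools.cache needs Python 3.9+)
@functools.lru_cache(maxsize=None)
def old_square(n: int) -> int:
    return n * n


@functools.cache
def new_square(n: int) -> int:
    return n * n
```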
29. pyupgrade limits
pyupgrade does not over-modernize
isinstance(x, (int, str)) -> isinstance(x, int | str) (3.10)
No match statement conversions (3.10)
Nothing converts to using walrus := (3.8) (probably a good thing!)
Except for a bit of typing
Optional[int] -> int | None (I like this one now, though)
❌
30. setup-cfg-fmt
repos:
  - repo: https://github.com/asottile/setup-cfg-fmt
    rev: "v1.20.0"
    hooks:
      - id: setup-cfg-fmt
Maintain setup.cfg file
Can add and fix trove classifiers
Sorts and cleans
A bit opinionated, will not fix some bugs
Use args: [--max-py-version=3.9] if needed
What about pyproject.toml?
Three projects (at least) popping up
Very young
Best process unclear
ini2toml useful for conversion
32. Using code linters
Existing projects
Feel free to build a long ignore list
Work on one or a few at a time
You don’t have to have every check
33. Flake8
repos:
  - repo: https://gitlab.com/pycqa/flake8
    rev: "4.0.1"
    hooks:
      - id: flake8
        additional_dependencies: [flake8-bugbear]
Fast simple extendable linter
Very configurable: setup.cfg or .flake8
Many plugins, local plugins easy
No auto-fixers like rubocop (Ruby)
Opinion: flake8-bugbear is great
Example: flake8-print, avoid all prints
# .flake8
[flake8]
max-complexity = 12
extend-ignore = E203, E501, E722, B950
extend-select = B,B9
34. Flake8 example checks
Bugbear
Do not use bare except
No mutable argument defaults
getattr(x, "const") should be x.const
No assert False, use raise AssertionError
Pointless comparison ❤ pytest
PyFlakes (default)
Unused modules & variables
String formatting mistakes
No placeholders in f-string
Dictionary key repetition
Assert a tuple (it’s always true)
Various syntax errors
Undefined names
Redefinition of unused var ❤ pytest
McCabe (default)
Complexity checks
PyCodeStyle (default)
Style checks
Flake8-print
Avoid leaking debugging print statements
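Two of these checks guard against code that runs but misbehaves. The sketch below demonstrates the underlying bugs directly (the function names are hypothetical): flake8-bugbear reports a mutable default like the one in `append_bad` as B006, and PyFlakes reports `assert (condition, "message")` as F631 because a non-empty tuple is always true.

```python
from typing import List, Optional


def append_bad(item: int, items: List[int] = []) -> List[int]:
    # B006: the default list is created once, at definition time, and is
    # shared by every call that omits `items`
    items.append(item)
    return items


def append_good(item: int, items: Optional[List[int]] = None) -> List[int]:
    # Usual fix: use None as a sentinel and build a fresh list per call
    if items is None:
        items = []
    items.append(item)
    return items


# F631: the parentheses turn condition and message into one non-empty tuple,
# so `assert (False, "...")` would always pass
always_true = bool((False, "this message was meant for the assert"))
```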
35. Custom local flake8 plugin
import ast
import sys
from typing import NamedTuple, Iterator


class Flake8ASTErrorInfo(NamedTuple):
    line_number: int
    offset: int
    msg: str
    cls: type  # unused
36. Custom local flake8 plugin
class Visitor(ast.NodeVisitor):
    msg = "AK101 exception must be wrapped in ak._v2._util.*error"

    def __init__(self) -> None:
        self.errors: list[Flake8ASTErrorInfo] = []

    def visit_Raise(self, node: ast.Raise) -> None:
        if isinstance(node.exc, ast.Call):
            func = node.exc.func
            # Allowed: raise <module>.error(...) / <module>.indexerror(...)
            if isinstance(func, ast.Attribute) and func.attr in {"error", "indexerror"}:
                return
            # Allowed: plain names such as raise ImportError(...)
            if isinstance(func, ast.Name) and func.id in {"ImportError"}:
                return

        self.errors.append(
            Flake8ASTErrorInfo(node.lineno, node.col_offset, self.msg, type(self))
        )
37. Custom local flake8 plugin
class AwkwardASTPlugin:
    name = "flake8_awkward"
    version = "0.0.0"

    def __init__(self, tree: ast.AST) -> None:
        self._tree = tree

    def run(self) -> Iterator[Flake8ASTErrorInfo]:
        visitor = Visitor()
        visitor.visit(self._tree)
        yield from visitor.errors
38. Custom local flake8 plugin
[flake8:local-plugins]
extension =
    AK1 = flake8_awkward:AwkwardASTPlugin
paths =
    ./dev/
def main(path: str) -> None:
    with open(path) as f:
        code = f.read()

    node = ast.parse(code)
    plugin = AwkwardASTPlugin(node)
    for err in plugin.run():
        print(f"{path}:{err.line_number}:{err.offset} {err.msg}")


if __name__ == "__main__":
    for item in sys.argv[1:]:
        main(item)
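The same `ast.NodeVisitor` pattern can be tried without flake8 installed. Below is a minimal, self-contained sketch of a flake8-print-like check; the `XX001` code and all names are hypothetical, and the real flake8-print plugin handles more cases than this.

```python
import ast
from typing import Iterator, List, NamedTuple


class PrintCall(NamedTuple):
    line_number: int
    offset: int
    msg: str


class PrintVisitor(ast.NodeVisitor):
    # Hypothetical error code, in the same style as AK101 above
    msg = "XX001 print call found"

    def __init__(self) -> None:
        self.errors: List[PrintCall] = []

    def visit_Call(self, node: ast.Call) -> None:
        # Flag direct calls to the name `print` (not obj.print or aliases)
        if isinstance(node.func, ast.Name) and node.func.id == "print":
            self.errors.append(PrintCall(node.lineno, node.col_offset, self.msg))
        # Keep walking so nested calls such as f(print(x)) are visited too
        self.generic_visit(node)


def find_prints(source: str) -> Iterator[PrintCall]:
    visitor = PrintVisitor()
    visitor.visit(ast.parse(source))
    yield from visitor.errors


code = """\
def f(x):
    print(x)
    return len(str(x))
"""
for err in find_prints(code):
    print(f"<string>:{err.line_number}:{err.offset} {err.msg}")
```

Unlike `visit_Raise` above, `visit_Call` calls `generic_visit`, because calls can nest inside other calls.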
40. PyLint
PyLint recommends having your project installed, so it is not a good pre-commit hook (though you can do it)
It’s also a bit slow, so a good candidate for nox
@nox.session
def pylint(session: nox.Session) -> None:
    session.install("-e", ".")
    session.install("pylint")
    session.run("pylint", "src", *session.posargs)
# pyproject.toml
[tool.pylint]
master.py-version = "3.7"
master.jobs = "0"
reports.output-format = "colorized"
similarities.ignore-imports = "yes"
messages_control.enable = ["useless-suppression"]
messages_control.disable = [
    "design",
    "fixme",
    "line-too-long",
    "wrong-import-position",
]
Code linter
Can be very opinionated
Signal to noise ratio poor
You will need to disable checks - that’s okay!
A bit more advanced / less static than flake8
But can catch hard-to-find bugs!
For an example of lots of suppressions:
https://github.com/scikit-hep/awkward-1.0/blob/1.8.0/pyproject.toml
41. Example PyLint rules
Duplicate code
Finds large repeated code patterns
Attribute defined outside init
Only __init__ should define attributes
No self use
Can be @classmethod or @staticmethod
Unnecessary code
Lambdas, comprehensions, etc.
Unreachable code
Finds things that can’t be reached
Consider using in
x in {stuff} vs chaining or’s
Arguments differ
Subclass should have matching arguments
Consider iterating dictionary
Better use of dictionary iteration
Consider merging isinstance
You can use a tuple in isinstance
Useless else on loop
They are bad enough when useful :)
Consider using enumerate
Avoid temp variables, idiomatic
Global variable not assigned
You should only declare global to assign
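Several of these rules suggest a more idiomatic spelling that behaves the same. The pairs below are hand-written examples; the message names in the comments (e.g. `consider-using-in`) are assumed to match current PyLint, and exact triggering conditions can vary between PyLint versions.

```python
from typing import Dict, List

colors: Dict[str, int] = {"red": 1, "green": 2}
names: List[str] = ["a", "b", "c"]
x = "green"
value = 3.5

# consider-using-in: chained == comparisons -> one membership test
old_in = x == "red" or x == "green" or x == "blue"
new_in = x in {"red", "green", "blue"}

# consider-merging-isinstance: several isinstance calls -> one with a tuple
old_isinstance = isinstance(value, int) or isinstance(value, float)
new_isinstance = isinstance(value, (int, float))

# consider-using-enumerate: range(len(...)) indexing -> enumerate
old_pairs = [(i, names[i]) for i in range(len(names))]
new_pairs = list(enumerate(names))

# consider-iterating-dictionary: iterate the dict itself, not .keys()
old_keys = [k for k in colors.keys()]
new_keys = [k for k in colors]
```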
42. Controversial PyLint rules
No else after control-flow
Guard-style only
Can simplify complex control flow
Removes useless indentation
if x:
    return x
else:
    return None

# Should be:
if x:
    return x
return None

# Or:
return x if x else None

# Or:
return x or None
Design
Too many arguments, branches, locals, etc.
Too few methods
Can just silence “design”
(I’m on the in-favor side)
45. Static type checking: MyPy
repos:
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: "v0.931"
    hooks:
      - id: mypy
        files: src
        args: [--show-error-codes]
Like a linter on steroids
Uses Python typing
Enforces correct type annotations
Designed to be iteratively enabled
Should be in a controlled environment (pre-commit or nox)
Always specify args (bad hook defaults)
Almost always need additional_dependencies
Configure in pyproject.toml
Pros
Can catch many things tests normally catch, without writing tests
Therefore it can catch things not covered by tests (yet, hopefully)
Code is more readable with types
Sort of works without types initially
Cons
Lots of work to add all types
Typing can be tricky in Python
Active development area for Python
46. Configuring MyPy
[tool.mypy]
files = "src"
python_version = "3.7"
warn_unused_configs = true
strict = true
[[tool.mypy.overrides]]
module = [ "numpy.*" ]
ignore_missing_imports = true
Start small
Start without strictness
Add a check at a time
Extra libraries
Try adding them to your environment
You can ignore untyped or slow libraries
You can provide stubs for untyped libraries if you want
Tests?
Adding pytest is rather slow
I prefer not to type-check tests, or to keep them mostly untyped
47. Typing tricks
Protocols
Better than ABCs, great for duck typing
import typing
from typing import Protocol


@typing.runtime_checkable
class Duck(Protocol):
    def quack(self) -> str:
        ...


def f(x: Duck) -> str:
    return x.quack()


class MyDuck:
    def quack(self) -> str:
        return "quack"


if typing.TYPE_CHECKING:
    _: Duck = typing.cast(MyDuck, None)
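`@typing.runtime_checkable` is what lets a Protocol also be used with `isinstance()` at runtime. A small sketch (the `Dog` class and `describe` function are hypothetical); note that the runtime check only confirms that the methods exist, not their signatures or return types.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Duck(Protocol):
    def quack(self) -> str:
        ...


class MyDuck:  # does not inherit from Duck; it just has the method
    def quack(self) -> str:
        return "quack"


class Dog:
    def bark(self) -> str:
        return "woof"


def describe(x: object) -> str:
    # isinstance() works because of @runtime_checkable; mypy also narrows
    # x to Duck inside this branch
    if isinstance(x, Duck):
        return x.quack()
    return "not a duck"
```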
Type Narrowing
Integral to how mypy works
x: Union[A, B]
if isinstance(x, A):
    reveal_type(x)  # A
else:
    reveal_type(x)  # B
Make a typed package
Must include py.typed marker file
Always use sys.version_info
Better for readers than try/except, and static
Also sys.platform instead of os.name
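A minimal sketch of a version-gated import that mypy can follow statically: it evaluates `sys.version_info` and `sys.platform` checks against its configured Python version and platform and skips the other branch, which it cannot do for a `try`/`except ImportError`. The `importlib_metadata` backport and the `DEFAULT_SHELL` constant are illustrative assumptions.

```python
import sys

if sys.version_info >= (3, 8):
    from importlib.metadata import version
else:
    # Backport package, assumed to be installed on Python < 3.8
    from importlib_metadata import version

# mypy documents support for sys.platform checks; os.name comparisons are
# not treated as platform checks in the same way
if sys.platform.startswith("win"):
    DEFAULT_SHELL = "cmd.exe"
else:
    DEFAULT_SHELL = "/bin/sh"
```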
48. Future annotations
Classic code (3.5+)
from typing import Union, List


def f(x: int) -> List[int]:
    return list(range(x))


def g(x: Union[str, int]) -> None:
    if isinstance(x, str):
        print("string", x.lower())
    else:
        print("int", x)
Modern code (3.7+)
from __future__ import annotations


def f(x: int) -> list[int]:
    return list(range(x))


def g(x: str | int) -> None:
    if isinstance(x, str):
        print("string", x.lower())
    else:
        print("int", x)
Ultramodern code (3.10+)
def f(x: int) -> list[int]:
    return list(range(x))


def g(x: str | int) -> None:
    if isinstance(x, str):
        print("string", x.lower())
    else:
        print("int", x)
With the future import, you get all the benefits of future code in 3.7+ annotations
Typing is already extra code, simpler is better
50. pytest tips
Spend time learning pytest
Full of amazing things that really make testing fun!
Tests are code too
Or for C++: Catch2 or doctest, etc.
Also maybe learn Hypothesis for pytest
[tool.pytest.ini_options]
minversion = "6.0"
addopts = [
    "-ra",
    "--showlocals",
    "--strict-markers",
    "--strict-config",
]
xfail_strict = true
filterwarnings = [
    "error",
]
log_cli_level = "info"
testpaths = [
    "tests",
]
Don’t let warnings slip by!
Makes logging more useful
Strictness is good
Useful summary
Print out locals on errors
Use pytest.approx
Even works on numpy arrays
Remember to test for failures
If you expect a failure, test it!
Test your installed package
That’s how users will get it, not from a directory
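Two of the points above can be tried with only the standard library. The sketch shows what turning warnings into errors does and why exact float comparison needs a tolerance; `old_api` is a hypothetical deprecated function, and `math.isclose` stands in for `pytest.approx` here (their default tolerances differ).

```python
import math
import warnings


def old_api() -> int:
    """Hypothetical function that has been deprecated."""
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return 1


# Roughly what filterwarnings = ["error"] does for each test: warnings are
# raised as exceptions, so a new deprecation fails loudly instead of
# scrolling past in the summary
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        old_api()
        warning_raised = False
    except DeprecationWarning:
        warning_raised = True

# Floating point results need a tolerance when compared
exact_equal = 0.1 + 0.2 == 0.3
close_enough = math.isclose(0.1 + 0.2, 0.3)
```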
51. pytest Tricks
Mock and Monkeypatch
This is how you make tricky tests “unit” tests
Fixtures
This keeps tests simple and scalable
import platform
import pytest

@pytest.fixture(params=["Linux", "Darwin", "Windows"], autouse=True)
def platform_system(request, monkeypatch):
    monkeypatch.setattr(platform, "system", lambda: request.param)  # system() takes no args
Parametrize
Directly or in a fixture for reuse
Use conftest.py
Fixtures available in same and nested directories
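The fixture above relies on pytest's `monkeypatch`. The same idea with only the standard library's `unittest.mock.patch` is sketched below; `config_dir` is a hypothetical function under test, and the loop stands in for the fixture's `params`.

```python
import platform
from typing import Dict
from unittest import mock


def config_dir() -> str:
    """Hypothetical function whose result depends on the OS."""
    system = platform.system()
    if system == "Windows":
        return "%APPDATA%"
    if system == "Darwin":
        return "~/Library/Application Support"
    return "~/.config"


def check_all_platforms() -> Dict[str, str]:
    results = {}
    for name in ["Linux", "Darwin", "Windows"]:
        # Like monkeypatch.setattr, mock.patch swaps the attribute and
        # restores the original when the with-block exits
        with mock.patch("platform.system", return_value=name):
            results[name] = config_dir()
    return results
```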
52. Running pytest
Show locals on failure
--showlocals/-l
Jump into a debugger on failure
--pdb
Start with last failing test
--lf
Jump into a debugger immediately
--trace or use breakpoint()
Run matching tests
-k <expression>
Run specific test
filename.py::testname
Run specific marker
-m <marker>
Control traceback style
--tb=<style>
53. In conclusion
Code quality tools can help a lot with
Readability
Reducing bugs
Boosting developer productivity
Consistency
Refactoring
Teaching others good practice too
Hopefully we have had some helpful discussions!
It’s okay to disable a check
Try to understand why it’s there
Remember there are multiple concerns involved in decisions