As a long-time C# developer, I picked up Python as a second language for ML work. Getting started in Python is easy; writing engineering-grade Python turned out to be a lot harder. These are 10 things I learned along the way to writing production code in Python.
9.
import matplotlib.pyplot as plt
import pandas as pd

# read the data
df = pd.read_csv('../../data/houses.csv')
# print the first five records
print(df.head())
# plot the price
df.price.plot(kind='hist', bins=100)
plt.show()
IMPERATIVE
23.
import random, sys
import os
def myfunc():
    rando = random.random()
    return random.randint(0,100)
def multiply (a, b):
    return a * b
print(multiply(myfunc(), myfunc()))
25.
import random
def myfunc():
    rando = random.random()
    return random.randint(0,100)
def multiply (a, b):
    return a * b
print(multiply(myfunc(), myfunc()))
UNUSED IMPORTS
26.
import random


def myfunc():
    rando = random.random()
    return random.randint(0,100)


def multiply (a, b):
    return a * b


print(multiply(myfunc(), myfunc()))
SEPARATING LINES
27.
import random


def myfunc():
    rando = random.random()
    return random.randint(0, 100)


def multiply(a, b):
    return a * b


print(multiply(myfunc(), myfunc()))
WHITE SPACES
28.
import random


def myfunc():
    return random.randint(0, 100)


def multiply(a, b):
    return a * b


print(multiply(myfunc(), myfunc()))
UNUSED VARIABLES
29.
import random


def random_number():
    return random.randint(0, 100)


def multiply(a, b):
    return a * b


print(multiply(random_number(), random_number()))
WEIRD FUNCTION NAMES
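A linter can find the smells removed in the UNUSED IMPORTS and UNUSED VARIABLES steps automatically. flake8, for example, reports them as F401 and F841. The sketch below is a rough illustration of how such a check can work, using only the standard library `ast` module. `unused_imports` and `unused_locals` are hypothetical helpers. They are far simpler than a real linter (no scope analysis, no `__all__`, nested functions are scanned twice) and do not reflect how flake8 is actually implemented.

```python
import ast
from typing import List


def unused_imports(source: str) -> List[str]:
    """Names bound by import statements that are never read (roughly flake8's F401)."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os"
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    # "random.random()" parses as Attribute(value=Name('random', Load))
    loaded = {
        n.id for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
    }
    return sorted(imported - loaded)


def unused_locals(source: str) -> List[str]:
    """Names assigned inside a function but never read there (roughly flake8's F841)."""
    tree = ast.parse(source)
    found = []
    for func in ast.walk(tree):
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names = [n for n in ast.walk(func) if isinstance(n, ast.Name)]
            stored = {n.id for n in names if isinstance(n.ctx, ast.Store)}
            loaded = {n.id for n in names if isinstance(n.ctx, ast.Load)}
            found.extend(f"{func.name}: {name}" for name in sorted(stored - loaded))
    return found


# The "before" code from slide 23
SLIDE_23 = '''\
import random, sys
import os
def myfunc():
    rando = random.random()
    return random.randint(0,100)
def multiply (a, b):
    return a * b
print(multiply(myfunc(), myfunc()))
'''

print(unused_imports(SLIDE_23))  # ['os', 'sys']
print(unused_locals(SLIDE_23))   # ['myfunc: rando']
```

Running the same helpers on the slide 29 version returns empty lists for both checks.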
32.
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/ambv/black
    rev: stable
    hooks:
      - id: black
        language_version: python3.7
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.0.0
    hooks:
      - id: flake8
33.
def add(a, b):
    return a + b

result = add('hello', 'world')  # 'helloworld'
result = add(2, 3)              # 5


def add(a: int, b: int) -> int:
    return a + b

result = add('hello', 'world')  # still runs; a type checker such as mypy flags it
result = add(2, 3)
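Type hints by themselves change nothing at runtime. Both calls to the annotated `add` still execute, and it is a static checker such as mypy that reports `add('hello', 'world')`. The sketch below shows this. It also adds a hypothetical `check_args` decorator, which is not part of the standard library or the talk. The decorator enforces plain-class annotations at call time, purely to make the difference visible.

```python
import inspect
from functools import wraps
from typing import get_type_hints


def add(a: int, b: int) -> int:
    return a + b


# The interpreter stores the hints but does not enforce them
print(add.__annotations__)    # {'a': <class 'int'>, 'b': <class 'int'>, 'return': <class 'int'>}
print(add('hello', 'world'))  # helloworld


def check_args(func):
    """Hypothetical decorator: raise TypeError when an argument does not match
    a plain-class annotation. Generic hints such as List[int] are skipped."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{name}={value!r} is not {expected.__name__}")
        return func(*args, **kwargs)

    return wrapper


checked_add = check_args(add)
print(checked_add(2, 3))  # 5
# checked_add('hello', 'world') would raise TypeError: a='hello' is not int
```

In production code, a checker like mypy running in CI (or as a pre-commit hook) catches these mistakes before the code runs, with no runtime cost.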
64.
fruits = ['apples', 'oranges', 'bananas', 'grapes']
found = False
size = len(fruits)
for i in range(size):
    if fruits[i] == 'cherries':
        found = True
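The loop above is C#-style index iteration. Assuming the goal is only to know whether 'cherries' is in the list, a membership test expresses the same check more idiomatically. When you do need to loop, iterate over the items directly, and use `enumerate` if you also need the index.

```python
fruits = ['apples', 'oranges', 'bananas', 'grapes']

# A membership test replaces the index loop and the found flag
found = 'cherries' in fruits
print(found)  # False

# any() with a generator covers conditions other than equality
has_g_fruit = any(fruit.startswith('g') for fruit in fruits)
print(has_g_fruit)  # True

# When you do need a loop, iterate over the items; enumerate adds the index
for index, fruit in enumerate(fruits):
    if fruit == 'bananas':
        print(index, fruit)  # 2 bananas
        break
```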
67.
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
91. FROM C# TO PYTHON
10 THINGS I LEARNED ALONG THE WAY
Tess Ferrandez
Speaker Notes
Software Engineer and Data Scientist – In that order.
My main goal is to create software that solves a business need. Sometimes that includes machine learning, but I’m equally happy if it doesn’t.
Too often, we walk into the room algorithm-first, as if adding AI/ML had its own value, and virtually every time we do that, we fail…
Intro
Notebooks are the first entry point for many
Many training courses are taught exclusively in notebooks, and the same goes for Kaggle
The good
Great for exploration – unprecedented
Great for telling an analysis story – Documentation with Markdown
The bad
Executing cells out of order – what did you even execute?
Testing
Debugging
CI/CD Pipeline
Reproducing
Adding to Source Control
Suggestions for good practices
Naming
Export reports as HTML
Export code as scripts
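For the "Export reports as HTML" and "Export code as scripts" suggestions, Jupyter's own `jupyter nbconvert --to html` and `jupyter nbconvert --to script` commands do the work. To show what exporting code as a script amounts to, here is a minimal standard-library sketch. It assumes the nbformat v4 JSON layout: a `cells` list whose entries have a `cell_type` and a `source`. `notebook_to_script` is a hypothetical helper, and it ignores magics, outputs and metadata.

```python
import json
from typing import Any, Dict


def notebook_to_script(notebook: Dict[str, Any]) -> str:
    """Concatenate the code cells of an nbformat v4 notebook into one script.
    Markdown cells are kept as comments; outputs are dropped."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            chunks.append(text)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in text.splitlines()))
    return "\n\n".join(chunks) + "\n"


# A tiny notebook inlined as JSON; normally this would be read from an .ipynb file
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Price analysis"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1, "outputs": [],
     "source": ["prices = [100, 250, 175]\\n", "print(max(prices))"]}
  ]
}
"""

script = notebook_to_script(json.loads(NOTEBOOK_JSON))
print(script)
```

The resulting `.py` file can be diffed, reviewed, linted and tested like any other source file, which addresses several of "the bad" points above.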
Some alternatives
Terminal
Jupyter in VS Code and PyCharm
Interactive cells in VS Code and PyCharm
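Interactive cells are plain `.py` files in which comment lines starting with `# %%` mark cell boundaries. VS Code's Python support can run each cell on demand, and PyCharm offers a similar code-cell feature. Because the file is still an ordinary script, it can also be run, linted and committed as-is. The sketch below is a small example with an inline list of hypothetical prices standing in for real data.

```python
# %% Load (hypothetical inline sample instead of reading a CSV file)
prices = [250_000, 310_000, None, 199_000, 1_250_000]

# %% Clean: drop missing values
clean = [p for p in prices if p is not None]

# %% Explore
print(len(clean), min(clean), max(clean))  # 4 199000 1250000
```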
More like a recipe – great for step-by-step tasks like exploring or cleaning data
Modularizing: putting common tasks into procedures (functions).
As we move to production, we need this… the imperative style leans heavily on globals and produces non-DRY code with many code smells that you don’t want in production
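As an illustration of that move, here is slide 9's read-and-summarize flow restated as small functions. Each function takes its inputs as parameters instead of relying on globals. To keep the sketch self-contained, it swaps pandas for the standard library `csv` and `statistics` modules and replaces `../../data/houses.csv` with a hypothetical in-memory sample. Only the `price` column name comes from the slide.

```python
import csv
import io
from statistics import median
from typing import Dict, List, TextIO


def read_houses(stream: TextIO) -> List[Dict[str, str]]:
    """Read CSV rows into dictionaries keyed by the header row."""
    return list(csv.DictReader(stream))


def prices(rows: List[Dict[str, str]]) -> List[float]:
    """Extract the 'price' column as numbers."""
    return [float(row["price"]) for row in rows]


def summarize(values: List[float]) -> Dict[str, float]:
    """Summary statistics for a list of prices."""
    return {"count": len(values), "median": median(values), "max": max(values)}


# Hypothetical stand-in for ../../data/houses.csv
SAMPLE = "id,price\n1,250000\n2,310000\n3,199000\n"
summary = summarize(prices(read_houses(io.StringIO(SAMPLE))))
print(summary)  # {'count': 3, 'median': 250000.0, 'max': 310000.0}
```

Each function can now be unit-tested on its own, and the stream argument lets tests pass in-memory data instead of a real file.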
Every statement can be seen as a mathematical function – state and mutability are avoided
Python is not a pure functional language – but this paradigm lends itself extremely well to manipulating large datasets, since we can iterate through the data lazily and only use what we need
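A minimal sketch of that lazy style uses generators. Each stage is a function that takes an iterable and yields results without mutating shared state, so records flow through one at a time. The function names and the `id,price` line format are assumptions for illustration. A file object could be passed in place of the list, since iterating over a file also yields one line at a time.

```python
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[float]:
    """Lazily turn 'id,price' lines into price values."""
    for line in lines:
        _, price = line.strip().split(",")
        yield float(price)


def above(threshold: float, values: Iterable[float]) -> Iterator[float]:
    """Lazily keep only values greater than the threshold."""
    return (v for v in values if v > threshold)


lines = ["1,250000", "2,310000", "3,199000", "4,1250000"]
# Nothing is parsed until sum() starts pulling values through the pipeline
total = sum(above(300_000, parse(lines)))
print(total)  # 1560000.0
```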
While Python does support object-oriented programming, it doesn’t enforce encapsulation, so nothing is truly private.
You can prefix a name with an underscore (_myprivate), but that is only a convention, not real hiding
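The example below illustrates both conventions. A single leading underscore is only a hint to other developers. A double leading underscore triggers name mangling (`__pin` becomes `_Account__pin`), which avoids accidental clashes in subclasses but still does not hide anything. The `Account` class is hypothetical.

```python
class Account:
    def __init__(self, balance: float) -> None:
        self._balance = balance  # single underscore: "internal" by convention only
        self.__pin = "1234"      # double underscore: name-mangled to _Account__pin


acct = Account(100.0)
print(acct._balance)           # 100.0 (still accessible)
print(acct._Account__pin)      # 1234 (mangling renames, it does not hide)
print(hasattr(acct, "__pin"))  # False
```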
KEEP YOUR CODE AND SOCKS DRY
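Here is a small, hypothetical example of what DRY means in data-cleaning code. It contrasts the same cleaning rule copy-pasted per field with one function applied to every field.

```python
raw = {"city": "  Stockholm ", "street": " main st  "}

# Repetitive: the same cleaning steps are copy-pasted per field
city = raw["city"].strip().title()
street = raw["street"].strip().title()


# DRY: the rule lives in one function, so a fix applies everywhere
def clean_text(value: str) -> str:
    return value.strip().title()


cleaned = {key: clean_text(value) for key, value in raw.items()}
print(cleaned)  # {'city': 'Stockholm', 'street': 'Main St'}

# Same result, but only one definition to maintain
assert (city, street) == (cleaned["city"], cleaned["street"])
```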
Pip installs packages from PyPI (Python only)
Conda installs from Anaconda Cloud channels (any language)
They differ in how they manage dependencies
Multiple overlapping
Small players
Have to train gestures
Weird angles