Python¶
We assume the readers have a good understanding of the Python programming language, as Python will be the primary programming language for demos and tutorials in this book. For engineering tips, we will cover a few topics here, including
- Environment management;
- Dependency management;
- `pre-commit`.
TL;DR
- Use pyenv to manage Python versions;
- Use poetry to manage dependencies;
- Always set up `pre-commit` for your git repository.
Python Environment Management¶
Environment management is never easy, and the Python ecosystem is no exception. People have developed a lot of tools to make environment management easier. As you can imagine, this also means we have a zoo of tools to choose from.
There are three things to manage in a Python project:
- Python version,
- Dependencies of the project, and
- An environment where we install our dependencies.
Some tools can manage all three, and some tools focus on one or two of them. We discuss two popular sets of tools: `conda` and `pyenv` + `poetry`.
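The three items listed above can also be inspected from inside a running interpreter. The following is a minimal sketch using only the standard library; the function name `describe_environment` is hypothetical, and the `in_virtualenv` check only recognizes venv-style environments (such as those `poetry` creates), not `conda` environments.

```python
import sys
from importlib import metadata


def describe_environment() -> dict:
    """Collect the Python version, environment location, and installed packages."""
    installed = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata lacks a name
            installed[name] = dist.metadata.get("Version", "unknown")
    return {
        # 1. The Python version of the running interpreter.
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # 3. The environment the interpreter (and its packages) lives in.
        "environment_path": sys.prefix,
        # True inside a venv-style virtual environment;
        # conda environments are not detected by this check.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # 2. The dependencies currently installed in that environment.
        "installed_packages": installed,
    }


if __name__ == "__main__":
    info = describe_environment()
    print("Python version:", info["python_version"])
    print("Environment:", info["environment_path"])
    print("Installed packages:", len(info["installed_packages"]))
```

Running this in two different environments is a quick way to confirm that each one really has its own interpreter and its own set of packages.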
`conda`¶
Many data scientists started with the simple and out-of-the-box choice called `conda`. `conda` is an all-in-one toolkit to manage Python versions, environments, and project dependencies.
`conda` cheatsheet
The most useful commands for conda are the following.
- Create an environment: `conda create -n my-env-name python=3.9 pip`, where `my-env-name` is the name of the environment, `python=3.9` specifies the version of Python, and the `pip` at the end tells `conda` to install `pip` in this new environment.
- Activate an environment: `conda activate my-env-name`
- Install a new dependency: `conda install pandas`
- List all available environments: `conda env list`
Anaconda provides a nice cheatsheet.
`pyenv` + `poetry`¶
`conda` is powerful, but it offers more than a simple Python project needs. As of 2024, if you ask around, many Python developers will recommend `poetry`.
`poetry` manages dependencies and environments. We just need a tool like `pyenv` to manage Python versions.
The `poetry` workflow
To work with `poetry` in an existing project `my_kuhl_project`:
- `poetry init` to initialize the project and follow the instructions;
- `poetry env use 3.10` to specify the Python version. In this example, we use `3.10`;
- `poetry add pandas` to add a package called `pandas`.
Everything we specified will be written into the `pyproject.toml` file.
`poetry` provides a nice tutorial on its website.
Dependency Specifications¶
We have a few choices for specifying dependencies. The most used method at the moment is `requirements.txt`. However, specifying dependencies in `pyproject.toml` is a much better choice.
Python introduced `pyproject.toml` in PEP 518, and it can be used together with `poetry` to manage dependencies.
While tutorials on how to use `poetry` are beyond the scope of this book, we highly recommend using `poetry` in a formal project.
`poetry` is sometimes slow
`poetry` can be very slow because, in some cases, it has to load many different versions of the packages and try them out when resolving dependencies [5][6].
`conda` with `pip`
For those who prefer to keep using `conda`, here are a few tips.
`conda` provides its own requirement specification using `environment.yaml`. However, many projects still prefer `requirements.txt` even though `conda`'s `environment.yaml` is quite powerful.
To use `requirements.txt` and `pip`, we always install `pip` when creating a new environment, e.g., `conda create -n my-env-name python=3.9 pip`.
Once the environment is activated (`conda activate my-env-name`), we can use `pip` to install dependencies, e.g., `pip install -r requirements.txt`.
Python Styles and `pre-commit`¶
In a Python project, it is important to have certain conventions or styles. To be consistent, one could follow some style guides for Python. There are official proposals, such as PEP 8, and "third party" style guides, such as the Google Python Style Guide [3][4].
We also recommend `pre-commit`. `pre-commit` helps us manage git hooks to be executed before each commit. Once installed, every time we run `git commit -m "my commit message here"`, a series of commands will be executed first based on the configurations.
`pre-commit` officially provides some hooks already, e.g., `trailing-whitespace` [2].
We also recommend the following hooks:
- `black`, which formats the code based on pre-defined styles,
- `isort`, which orders the Python imports [1],
- `mypy`, which is a static type checker for Python.
The following is an example `.pre-commit-config.yaml` file for a Python project.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.2.0
    hooks:
      - id: check-added-large-files
      - id: debug-statements
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: requirements-txt-fixer
      - id: trailing-whitespace
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v0.960
    hooks:
      - id: mypy
        args:
          - "--no-strict-optional"
          - "--ignore-missing-imports"
  - repo: https://github.com/ambv/black
    rev: 22.6.0
    hooks:
      - id: black
        language: python
        args:
          - "--line-length=120"
  - repo: https://github.com/pycqa/isort
    rev: 5.10.1
    hooks:
      - id: isort
        name: isort (python)
        args: ["--profile", "black"]
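As a rough illustration of what these hooks act on, the hypothetical module below is written the way it should look after `isort` and `black` have run: imports are sorted, the code is consistently formatted, and type annotations are in place for `mypy` to check. The function `count_words` is invented for this example.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter[str]:
    """Count whitespace-separated words across all lines."""
    counts: Counter[str] = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    print(count_words(["to be or not", "to be"]))
```

With these annotations, `mypy` should reject a call such as `count_words(42)`, because an `int` is not an iterable of strings, and the mistake is caught before the commit is created rather than at runtime.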
Write docstrings¶
Writing docstrings for functions and classes can help our future self understand them more easily. There are different styles for docstrings. Two of the popular ones are
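As a minimal sketch of what a docstring can look like, the hypothetical function below is documented in the Google docstring style. The choice of style here is only for illustration and is an assumption, not necessarily one of the styles this section refers to.

```python
def scale(values, factor=1.0):
    """Multiply every value in a sequence by a constant factor.

    Args:
        values: An iterable of numbers to scale.
        factor: The multiplier applied to each value. Defaults to 1.0.

    Returns:
        A new list in which each input value is multiplied by `factor`.
    """
    return [value * factor for value in values]


if __name__ == "__main__":
    # The docstring is available at runtime, e.g., through help(scale).
    print(scale.__doc__)
    print(scale([1, 2, 3], factor=2.0))
```

Whatever style is chosen, keeping it consistent across the project matters more than the specific convention, since documentation tools and IDEs can then render the docstrings uniformly.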
Tests Save Time¶
Adding tests to our code can save us time. We will not list all the benefits of having tests, but tests help us debug our code and ship results more confidently. For example, suppose we are developing a function and spot a bug. One of the best ways of debugging it is to write a test that reproduces the bug and put a debugger breakpoint at the suspicious line of the code. With the help of IDEs such as Visual Studio Code, this process can save us a lot of time in debugging.
Use pytest
Use pytest. RealPython provides a good short introduction. The Alan Turing Institute provides some lectures on testing and pytest.
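To give a flavor of what such tests look like, here is a minimal sketch with a hypothetical `moving_average` function and two tests for it. The function and the file name mentioned below are assumptions made for illustration; pytest collects plain functions whose names start with `test_` and reports any failing `assert`.

```python
def moving_average(values, window):
    """Return the simple moving averages of `values` over `window` points."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def test_moving_average_basic():
    # Each output is the mean of two neighboring points.
    assert moving_average([1, 2, 3, 4], window=2) == [1.5, 2.5, 3.5]


def test_moving_average_window_longer_than_series():
    # No full window fits, so the result is empty.
    assert moving_average([1, 2], window=3) == []
```

Saved as, say, `test_moving_average.py`, these tests can be run with the `pytest` command from the project directory. When a bug shows up, adding one more test that reproduces it gives us a convenient place to set a breakpoint.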
1. Pre Commit. In: isort [Internet]. [cited 22 Jul 2022]. Available: https://pycqa.github.io/isort/docs/configuration/pre-commit.html
2. pre-commit-config-pre-commit-hooks.yaml. In: Gist [Internet]. [cited 22 Jul 2022]. Available: https://gist.github.com/lynnkwong/f7591525cfc903ec592943e0f2a61ed9
3. Guido van Rossum, Barry Warsaw, Nick Coghlan. PEP 8 – Style Guide for Python Code. In: peps.python.org [Internet]. 5 Jul 2001 [cited 23 Jul 2022]. Available: https://peps.python.org/pep-0008/
4. Google Python Style Guide. In: Google Python Style Guide [Internet]. [cited 22 Jul 2022]. Available: https://google.github.io/styleguide/pyguide.html
5. Poetry is extremely slow when resolving the dependencies · Issue #2094 · python-poetry/poetry. In: GitHub [Internet]. [cited 23 Jul 2022]. Available: https://github.com/python-poetry/poetry/issues/2094
6. FAQ. In: Poetry - Python dependency management and packaging made easy [Internet]. [cited 29 Jan 2024]. Available: https://python-poetry.org/docs/faq/#why-is-the-dependency-resolution-process-slow