The Three of Wands: Algebraic Data Types in (typed) Python :
by:
blow post content copied from Planet Python
click here to view original post
By properly utilizing Algebraic Data Types (ADTs, not to be confused with abstract data types), you can transform certain types of invalid states from runtime errors into type-checking errors, making them an excellent method for representing data and managing state. Although ADTs may sound complex, they represent a fairly straightforward concept. They are composite types, meaning they reference or contain other types, and primarily consist of two basic categories: product and sum types. If you have experience with tuples, you&aposve worked with product types, and if you&aposve used booleans, you&aposve dealt with sum types. While there are additional types, we&aposll focus on these two fundamental categories for now.
We&aposll be using typed Python, since ADTs really shine with the help of type-checking.
Here&aposs the short version: both product and sum types are composite types that reference other types and let&aposs pretend those other types are str
and int
. Following this, a product type would contain str
AND int
, and a sum type contains str
OR int
.
So, what&aposs up with the naming? Product and sum types do sound kind of pretentious.
Sum types are called sum types because the number of possible different values in a sum type is simply the sum of all possible values of its containing types. The number of possible different values of a product type is... A Cartesian product of all the values of its containing types. This actually came as a surprise to me while researching this post, since I had a different notion of where the naming comes from[1].
Product types
Python has a ton of product types; tuples being the OG. A tuple[str, int]
is a product type over str
and int
- it contains a string AND an integer. The number of different values of this tuple is: the number of different values of str
, multiplied by the number of different values of int
. Admittedly that&aposs not very useful in this context since both strings and integers in Python are bounded by available memory, but the theory checks out.
Other product types include named tuples, dataclasses, attrs classes, msgspec classes, TypedDicts, you get the idea. You use a ton of product types every day without thinking about it so I won&apost dwell on them too much.
Sum types
bool
is the quintessential sum type in Python, being logically equivalent to a union of two singletons, True
and False
. Barring bool
, Python sum types come in the form of enums, literals or unions.
Enums have been here a while, being added back in 3.4. Here&aposs an example from the docs:
from enum import Enum
class Color(Enum):
RED = 1
GREEN = 2
BLUE = 3
This is a sum type of 3 singletons. Color.RED
, Color.GREEN
and Color.BLUE
are, effectively, values inhabiting the Color
type. Meaning if a function parameter is annotated with Color
, passing in Color.RED
is valid. Enums can only contain singletons, although the singletons themselves can be of any value. So this is also legal:
class Color(Enum):
RED = (1, "a")
GREEN = (2, "b")
BLUE = (3, "c")
(Given that these are singletons, you want to be using immutable values here.)
If you&aposve ever used the Rust programming language, you may have noticed it has enum types of its own. Even though they&aposre called the same, Rust enums are different; they are equivalent to Python unions which we&aposll get to later.
Literals were added to Python in 3.8 (although since they&aposre mostly a typing construct I think they are usable on earlier versions via typing_extensions
), and they are essentially a simple, anonymous enum with some restrictions.
Here&aposs Color
as a literal:
from typing import Literal
ColorLiteral = Literal["RED", "GREEN", "BLUE"]
If a function argument was annotated as a ColorLiteral
, Mypy would ensure only these three strings were allowed to be passed in.
I think literals are super fun since they&aposre so easy to define and use. But again, literals can only contain string and int values, and singletons; you can&apost have a Literal["RED", tuple[str, int]]
. I&aposve been using them more and more since they&aposre just more airy than enums.
Unions are the heavyweight of Python sum types. They&aposve been around for ages, and in Python 3.10 they&aposve received dedicated syntax to make them more readable (which I&aposll be using here).
If you want a sum type that can be either a string or an int, you need to use a union:
StringOrInt = str | int
Now if your function takes an instance of StringOrInt
, you&aposll know it&aposs either an instance of a string or an int.
You might think unions only work for classes, so you can&apost have a union of a product type and a sum type, and you&aposd be wrong - unions can contain literals and enums too.
Sum types in practice
Here&aposs an example. Your web app has a user model, and your users have a privilege assigned to them. The privilege can be one of:
- normal user
- admin
- banned
But for banned users, you want to know when they were banned. You might go about modeling your user class like so:
class User:
privilege: str
banned_at: datetime | None
This definition sucks. First of all, the privilege field is just a string, but not any string can be a valid privilege. So let&aposs fix that.
class User:
privilege: Literal["normal", "admin", "banned"]
banned_at: datetime | None
Ok, great, now we can leverage our type-checker to ensure privilege is always valid. But our state modeling is still lacking; it&aposs possible for a user to exist with the normal privilege and a banned_at
date, and a banned user with no banned_at
field. We&aposre using two independent fields of a product type (User
) where we should be using a sum type instead.
A product type in a trench coat pretending to be a sum type is something I see all the time, especially in contexts where sum types don&apost exist or are hard to use. The most famous example is maybe found in the Go programming language, where a function that can return a successful result T
or an error will return a tuple[T, error]
, and the callers are expected to check if the error is nil before proceeding. Following example taken from the offical docs:
f, err := os.Open("filename.ext")
if err != nil {
log.Fatal(err)
}
// do something with the open *File f
There&aposs no help from the type-checker if you return both (or neither) by accident. Good times.
We can, and will, do better:
class Banned:
banned_at: datetime
class User:
privilege: Literal["normal", "admin"] | Banned
So there, as long as we run Mypy on this (and the rest of our toolkit supports these constructs), it&aposs actually impossible for our Users to get into an invalid state. Not only that, but intended valid states are obvious from the types themselves, so we don&apost need endless comments, tests and Notion pages explaining how the banned_at
field is only there for banned users.
Handling Sum Types
Booleans and enum members are singletons, so you&aposll need a chain of if
statements using identity comparisons (the is
operator):
def process_color(c: Color):
if c is Color.RED:
print("Got Red!")
elif c is Color.BLUE:
print("Got Blue!")
# and so on
Literals are values so you&aposll need ordinary equality, unless the literal contains an enum member, in which case you&aposll want identity comparisons again.
def process_color_literal(c: ColorLiteral):
if c == "RED":
print("Got Red!")
# an so on
Unions are a little more complex since they can contain literals, enums, and a lot of different product types at once. For union members that happen to be classes, you should use an isinstance
check.
def process_user(user: User):
if isinstance(user.privilege, Banned):
print("The user is banned!")
elif user.privilege == "admin":
print("The user is an admin"
# an so on
Python 3.10 introduced the match
statement to the language. Matching is cool since it presents a unified way of handling both product and sum types, and I feel it expresses the intent of the code better. The cons are its complexity (I&aposve been using it for a while and I still need to go look up the guides sometimes) and the fact it requires an additional indentation level. Here&aposs the last example written using match
:
def process_user(user: User):
match user.privilege:
case Banned(banned_at=banned_at):
print(f"The user was banned at {banned_at}!")
case "admin":
print("The user is an admin!")
# an so on
Handling Sum Types Exhaustively
Here&aposs a useful trick for exhaustively handling sum types. It can often be the case that all possible variants of a sum type need to be handled, and neither the if
chain nor the match statement provide this by default. For example, in the last example, the case of the normal
user privilege wasn&apost handled. Exhaustiveness checking is great because it can add a layer of safety to refactoring sum types: new variants can be added without them causing havoc somewhere downcode.
The typing
module contains a special function called assert_never
. If a static type checker detects this function in a place that&aposs not unreachable, it will generate an error. We can make use of this to make our checks exhaustive.
from typing import assert_never
def process_user(user: User):
match user.privilege:
case Banned(banned_at=banned_at):
print(f"The user was banned at {banned_at}!")
case "admin":
print("The user is an admin!")
case _:
assert_never(user.privilege) # ERROR
(For the if
chain approach, instead of case _
just use a bare else
.)
If we actually handle the missing variant, the error goes away.
from typing import assert_never
def process_user(user: User):
match user.privilege:
case Banned(banned_at=banned_at):
print(f"The user was banned at {banned_at}!")
case "admin":
print("The user is an admin!")
case "normal":
print("The user is a normal user!")
case _:
assert_never(user.privilege) # No error any more.
More on Naming
Ever since some ancient, long-forgotten college course I&aposve had this notion that multiplication maps to conjunction (i.e. logical AND), and addition maps to disjunction (i.e. logical OR). In other words, *
means AND, and +
means OR. Probably because someone taught me Boolean algebra for a digital electronics class. So it was always logical to me that product types (product, *
) means those types reference type A AND type B, and sum types (+
) mean those types reference type A OR type B. Which is ultimately correct.
But it turns out that&aposs not exactly where product and sum types get their name.
April 06, 2023 at 03:48PM
Click here for more details...
=============================
The original post is available in Planet Python by
this post has been published as it is through automation. Automation script brings all the top bloggers post under a single umbrella.
The purpose of this blog, Follow the top Salesforce bloggers and collect all blogs in a single place through automation.
============================
Post a Comment