7 approaches to checking class attributes in Python

Short description

Python handles type checking and value checking in a flexible and implicit way, and the typing module provides runtime support for type hints. However, there is no single way to check values. Options for checking class attributes with built-in Python features or third-party libraries include creating a validator function, using @property, using descriptors, and combining decorators with descriptors. These options allow validation when attributes are initialized and when they are updated, so that the entered values are correct. Descriptors are suitable for reuse across multiple classes, while the combination of decorators and descriptors wraps descriptors, together with their validation conditions, in decorators.


Python handles type checking and value checking in a flexible and implicit way. The typing module, added in Python 3.5 (see note 1), provides runtime support for type hints (see note 2). But there is no single way to check values.

1 Starting with Python 3.9, you no longer need to import the generic collection types from typing to describe types. For example, dict[x, y] can be used instead of typing.Dict[x, y].

2 This module provides support for type hints at runtime, but runtime type checking requires developing or using a separate module, for example one based on decorators or metaclasses.

From the documentation

This module provides support for type hints at runtime. The most fundamental support consists of the types Any, Union, Callable, TypeVar, and Generic. For the full specification, see PEP 484.

However, PEP 484 (Type Hints) notes: while the proposed typing module will contain some building blocks for runtime type checking (in particular, the get_type_hints() function), runtime type checking functionality will require a separate module to be developed or used, for example one using decorators or metaclasses. It should also be emphasized that Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.
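
To make the point about get_type_hints() and decorators concrete, here is a minimal sketch of a runtime type check built from those two pieces. The names enforce_types and greet are made up for this example, and the check deliberately handles only plain classes such as str and int; it is an illustration, not a complete type checker.

```python
import functools
import inspect
from typing import get_type_hints


def enforce_types(func):
    """Hypothetical decorator: check simple annotations at call time."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; anything else (generics,
            # unions, classes with a custom metaclass) is skipped here.
            if type(expected) is type and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def greet(name: str, age: int) -> str:
    return f"{name} is {age}"


print(greet("xiaoxu", 27))  # xiaoxu is 27
try:
    greet("xiaoxu", "27")
except TypeError as error:
    print(error)  # age must be int, got str
```

Note that this only checks types. Value rules such as "age must not be negative" still need one of the approaches described in the rest of the article.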

One scenario where we need value validation is when we initialize an instance of a class. At this first stage, we want to make sure that the attributes entered are correct: for example, the email address must be in a valid format, the age must not be negative, the last name must not exceed 20 characters, and so on.

In this article, I want to demonstrate 7 options for checking class attributes using built-in Python modules or third-party libraries. I wonder which option you prefer. If you know other options, write about them in the comments. Let’s go.

Option 1: Create a validator function

We’ll start with the simplest solution: create a validation function for each argument. Here we have 3 methods that verify name, email, and age separately. Attributes are checked sequentially; if a check fails, a ValueError is raised immediately and the program stops.

Option 1
import re

class Citizen:
    def __init__(self, id, name, email, age):
        self.id = id
        self.name = self._is_valid_name(name)
        self.email = self._is_valid_email(email)
        self.age = self._is_valid_age(age)

    def _is_valid_name(self, name):
        if len(name) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        return name

    def _is_valid_email(self, email):
        # Raw string, so that \. and \w reach the regex engine unchanged.
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, email):
            raise ValueError("It's not an email address.")
        return email

    def _is_valid_age(self, age):
        if age < 0:
            raise ValueError("Age cannot be negative.")
        return age

xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
xiaoxu = Citizen("id1", "xiaoxu1234567890123456789", "[email protected]", 27)
# ValueError: Name cannot exceed 20 characters.
xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
# ValueError: It's not an email address.
xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", -27)
# ValueError: Age cannot be negative.

This option is simple, but it’s probably not the most “Pythonic” solution you’ve ever seen, and many people prefer to keep __init__ as clean as possible. Another problem is that once an attribute has been initialized, an invalid value can be assigned to it without raising an exception. For example, the following may happen:

xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
xiaoxu.email = "[email protected]" 
# This email is not valid, but still accepted by the code

Option 2: Using @property

The second option uses a built-in function: @property. It works as a decorator applied to a method named after the attribute. According to the Python documentation:

A property object has getter, setter, and deleter methods used as decorators that create a copy of the property with the appropriate accessor set to the function being decorated.

At first glance, it takes more code than the first option, but on the other hand, it removes responsibility from __init__. Each attribute (except id) has 2 methods: one decorated with @property and one decorated as its setter. When an attribute such as citizen.name is read, the @property method is called. When an attribute’s value is set during initialization or update, such as citizen.name = "xiaoxu", the setter method is called.

Option 2
import re

class Citizen:
    def __init__(self, id, name, email, age):
        self._id = id
        self.name = name
        self.email = email
        self.age = age

    @property
    def id(self):
        return self._id

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if len(value) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        self._name = value

    @property
    def email(self):
        return self._email

    @email.setter
    def email(self, value):
        # Raw string, so that \. and \w reach the regex engine unchanged.
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, value):
            raise ValueError("It's not an email address.")
        self._email = value

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        if value < 0:
            raise ValueError("Age cannot be negative.")
        self._age = value

xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
xiaoxu = Citizen("id1", "xiaoxu1234567890123456789", "[email protected]", 27)
# ValueError: Name cannot exceed 20 characters.
xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
# ValueError: It's not an email address.
xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", -27)
# ValueError: Age cannot be negative.

This option moves the validation logic into the setter method of each attribute and thus keeps __init__ clean. In addition, validation is applied to every update of each attribute after initialization. Thus, the following is no longer accepted:

xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
xiaoxu.email = "[email protected]" 
# ValueError: It's not an email address.

The id attribute is an exception because it has no setter method. This is because I want to tell the client that this attribute should not be updated after initialization. If you try to do this, you will get an AttributeError exception.
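
The read-only behaviour of id can be seen in isolation with a stripped-down class. This is a minimal sketch that keeps only the id property from Option 2; the exact wording of the AttributeError message differs between Python versions, so it is printed rather than quoted.

```python
class Citizen:
    def __init__(self, id):
        self._id = id

    @property
    def id(self):
        # Getter only: no @id.setter is defined, so assignment fails.
        return self._id


citizen = Citizen("id1")
print(citizen.id)  # id1
try:
    citizen.id = "id2"
except AttributeError as error:
    print("rejected:", error)
```

Keep in mind that this protection is by convention only: the underlying citizen._id attribute can still be reassigned directly.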

Option 3: Descriptors

A third option uses Python descriptors, which are a powerful but often overlooked capability. Perhaps the community has become aware of this problem: starting with Python 3.9, examples of using descriptors to validate attributes have been added to the documentation.

A descriptor is an object that defines methods __get__(), __set__() or __delete__(). It modifies the default behavior when getting, setting, or removing attributes.

Here is an example of code that uses descriptors. Each attribute becomes a descriptor, which is a class with the methods __get__ and __set__. When the attribute value is set, for example with self.name = name, __set__ is called. When the attribute is read, for example with print(self.name), __get__ is called.

Option 3
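import re

class Name:
    def __set_name__(self, owner, name):
        # Keep the value on each instance, not on the shared descriptor;
        # otherwise every Citizen would end up with the same name.
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if len(value) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        setattr(obj, self.private_name, value)

class Email:
    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, value):
            raise ValueError("It's not an email address.")
        setattr(obj, self.private_name, value)

class Age:
    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Age cannot be negative.")
        setattr(obj, self.private_name, value)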

class Citizen:

    name = Name()
    email = Email()
    age = Age()

    def __init__(self, id, name, email, age):
        self.id = id
        self.name = name
        self.email = email
        self.age = age

xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
xiaoxu = Citizen("id1", "xiaoxu1234567890123456789", "[email protected]", 27)
# ValueError: Name cannot exceed 20 characters.
xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
# ValueError: It's not an email address.
xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", -27)
# ValueError: Age cannot be negative.

This solution is comparable to @property. It works best when descriptors can be reused across multiple classes. For example, in the Employee class, we can simply reuse the previous descriptors without writing a lot of code:

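Option 3 (continued)
class Salary:
    def __set_name__(self, owner, name):
        # Store the value per instance rather than on the descriptor.
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value < 1000:
            raise ValueError("Salary cannot be lower than 1000.")
        setattr(obj, self.private_name, value)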
        
class Employee:
    name = Name()
    email = Email()
    age = Age()
    salary = Salary()

    def __init__(self, id, name, email, age, salary):
        self.id = id
        self.name = name
        self.email = email
        self.age = age
        self.salary = salary
        
xiaoxu = Employee("id1", "xiaoxu gao", "[email protected]", 27, 1000)
xiaoxu = Employee("id1", "xiaoxu gao", "[email protected]", 27, 999)
# ValueError: Salary cannot be lower than 1000.
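
To see exactly when __get__ and __set__ run, a small tracing descriptor can help. This is a sketch under my own naming: Traced, Person, and calls are hypothetical and not part of the examples above. It stores each value on the instance, so two objects do not share state.

```python
calls = []  # records every descriptor call, in order


class Traced:
    """Descriptor that logs each access and stores values per instance."""

    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class itself
            return self
        calls.append(("get", self.public_name))
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        calls.append(("set", self.public_name))
        setattr(obj, self.private_name, value)


class Person:
    name = Traced()

    def __init__(self, name):
        self.name = name  # triggers Traced.__set__


first = Person("xiaoxu")
second = Person("gao")
print(first.name, second.name)  # each read triggers Traced.__get__
print(calls)
```

The log shows two "set" entries from the two __init__ calls followed by two "get" entries from the print, and each object keeps its own name.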

Option 4: Combination of decorators and descriptors

A further development of option 3 is to combine decorators and descriptors. The final result looks like this, where the descriptors, with their validation conditions, are wrapped in decorators:

Option 4
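import re

# Descriptors from Option 3
class Name:
    def __set_name__(self, owner, name):
        # Keep the value on each instance, not on the shared descriptor.
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if len(value) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        setattr(obj, self.private_name, value)

class Email:
    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, value):
            raise ValueError("It's not an email address.")
        setattr(obj, self.private_name, value)

class Age:
    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Age cannot be negative.")
        setattr(obj, self.private_name, value)

# Descriptor decorators
def email(attr):
    def decorator(cls):
        descriptor = Email()
        setattr(cls, attr, descriptor)
        # __set_name__ is only called automatically for attributes defined
        # in the class body, so call it by hand here.
        descriptor.__set_name__(cls, attr)
        return cls
    return decorator

def age(attr):
    def decorator(cls):
        descriptor = Age()
        setattr(cls, attr, descriptor)
        descriptor.__set_name__(cls, attr)
        return cls
    return decorator

def name(attr):
    def decorator(cls):
        descriptor = Name()
        setattr(cls, attr, descriptor)
        descriptor.__set_name__(cls, attr)
        return cls
    return decorator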

@email("email")
@age("age")
@name("name")
class Citizen:
    def __init__(self, id, name, email, age):
        self.id = id
        self.name = name
        self.email = email
        self.age = age

These decorators can be easily extended. For example, you can have more general rules that apply to multiple attributes, e.g. @positive_number(attr1, attr2).
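
One possible shape for such a multi-attribute decorator is sketched below. PositiveNumber and this Employee class are hypothetical names for the illustration, and the rule follows the Age check above (negative values are rejected, zero is allowed).

```python
class PositiveNumber:
    """Hypothetical descriptor: rejects negative values, stores per instance."""

    def __init__(self, attr):
        self.attr = attr
        self.private_name = "_" + attr

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError(f"{self.attr} cannot be negative.")
        setattr(obj, self.private_name, value)


def positive_number(*attrs):
    """Class decorator that installs one PositiveNumber per attribute name."""
    def decorator(cls):
        for attr in attrs:
            setattr(cls, attr, PositiveNumber(attr))
        return cls
    return decorator


@positive_number("age", "salary")
class Employee:
    def __init__(self, age, salary):
        self.age = age
        self.salary = salary


employee = Employee(27, 1000)
print(employee.age, employee.salary)  # 27 1000
try:
    employee.salary = -1
except ValueError as error:
    print(error)  # salary cannot be negative.
```

Passing the attribute name to the descriptor’s constructor avoids relying on __set_name__, which Python does not call for descriptors attached after the class has been created.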

Option 5: Use __post_init__ in @dataclass

Another way to create a class in Python is to use @dataclass. The dataclasses module provides this decorator, which automatically generates methods such as __init__().

In addition, @dataclass introduces a special method, __post_init__(), which is called from the generated __init__(). __post_init__ is the place to initialize a field based on other fields or to apply validation rules.

Option 5
from dataclasses import dataclass
import re

@dataclass
class Citizen:

    id: str
    name: str
    email: str
    age: int

    def __post_init__(self):
        if self.age < 0:
            raise ValueError("Age cannot be negative.")
        # Raw string, so that \. and \w reach the regex engine unchanged.
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, self.email):
            raise ValueError("It's not an email address.")
        if len(self.name) > 20:
            raise ValueError("Name cannot exceed 20 characters.")

xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
xiaoxu = Citizen("id1", "xiaoxu1234567890123456789", "[email protected]", 27)
# ValueError: Name cannot exceed 20 characters.
xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", 27)
# ValueError: It's not an email address.
xiaoxu = Citizen("id1", "xiaoxu gao", "[email protected]", -27)
# ValueError: Age cannot be negative.

This option has the same effect as option 1, but in the @dataclass style.
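
The other use of __post_init__ mentioned earlier, initializing a field from other fields, can be sketched as follows. The first_name, last_name, and full_name fields are assumptions made for this example and do not appear in the Citizen class above.

```python
from dataclasses import dataclass, field


@dataclass
class Citizen:
    first_name: str
    last_name: str
    age: int
    # Not an __init__ parameter; it is filled in by __post_init__.
    full_name: str = field(init=False)

    def __post_init__(self):
        if self.age < 0:
            raise ValueError("Age cannot be negative.")
        self.full_name = f"{self.first_name} {self.last_name}"


citizen = Citizen("xiaoxu", "gao", 27)
print(citizen.full_name)  # xiaoxu gao

# As in option 1, later assignments are not validated:
citizen.age = -5  # accepted, because __post_init__ only runs at construction
```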

So far, we’ve looked at 5 options that use only built-in features. In my opinion, Python’s built-in features are already powerful enough to cover what we often need for data validation. But let’s also look at some third-party libraries.

Option 6: marshmallow

Marshmallow is a Python object serialization library that converts complex data types to native Python data types and back. So that the library knows how to serialize and validate an object, the user builds a schema that defines the validation rules for each attribute. Several things, in my opinion, make this library powerful:

  • Provides many out-of-the-box validators and field types such as Length, Range, Email, Date, etc., which saves developers the time of building them themselves. Of course, you can also create your own validator.

  • Supports nested schema.

  • Marshmallow’s ValidationError contains all failed validations, whereas previous approaches threw an exception as soon as the first error was encountered. This feature helps the user to fix all errors at once.
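
The difference described in the last point can be illustrated without Marshmallow itself. The following is a plain-Python sketch of collecting every failure before raising; it is not Marshmallow’s implementation, and ValidationErrors and validate_citizen are hypothetical names.

```python
class ValidationErrors(Exception):
    """Carries a dict that maps field names to lists of messages."""

    def __init__(self, messages):
        super().__init__(messages)
        self.messages = messages


def validate_citizen(data):
    errors = {}
    # Run every check and remember failures instead of raising at once.
    if len(data["name"]) > 20:
        errors.setdefault("name", []).append("Longer than maximum length 20.")
    if data["age"] < 0:
        errors.setdefault("age", []).append("Age cannot be negative.")
    if errors:
        raise ValidationErrors(errors)
    return data


try:
    validate_citizen({"name": "x" * 25, "age": -1})
except ValidationErrors as error:
    print(error.messages)  # both the name and the age problem are reported
```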

For the demo, I added an extra birthday attribute and a nested HomeAddressSchema to show different possibilities.

Option 6
from marshmallow import Schema, fields, validate, ValidationError

class HomeAddressSchema(Schema):
    # Raw string, so that \d, \s and \w reach the regex engine unchanged.
    postcode = fields.Str(validate=validate.Regexp(r"^\d{4}\s?\w{2}$"))
    city = fields.Str()
    country = fields.Str()

class CitizenSchema(Schema):
    id = fields.Str()
    name = fields.Str(validate=validate.Length(max=20))
    birthday = fields.Date()
    email = fields.Email()
    age = fields.Integer(validate=validate.Range(min=1))
    address = fields.Nested(HomeAddressSchema())

On top of the schema, we need to create the actual Citizen class. I use @dataclass to avoid writing __init__ by hand. Marshmallow’s load() expects a dictionary (JSON-like data) as input, so asdict() is used to convert the dataclass instance.

Option 6 (continued)
from dataclasses import dataclass, asdict

@dataclass
class Citizen:
    id: str
    name: str
    birthday: str
    email: str
    age: int
    address: object

citizen = Citizen(
    id="1234",
    name="xiaoxugao",
    birthday="1990-01-01",
    email="[email protected]",
    age=1,
    address={"postcode": "1095AB", "city": "Amsterdam", "country": "NL"},
)

CitizenSchema().load(asdict(citizen))

citizen.name = "xiaoxugao1231234567890-1234567890"
citizen.email = "[email protected]"
CitizenSchema().load(asdict(citizen))
# marshmallow.exceptions.ValidationError: {'email': ['Not a valid email address.'], 'name': ['Longer than maximum length 20.']}

However, this library “allows” attributes to be updated with invalid values after initialization. For example, the two assignments to citizen.name and citizen.email above update the citizen object with an invalid name and email; the errors are only reported when load() is called again.

For more information, refer to the Marshmallow documentation.

Option 7: Pydantic

Pydantic is a Marshmallow-like library. It also follows the idea of creating a schema or model for an object, while providing many ready-made validation types such as PositiveInt, EmailStr, etc. Unlike Marshmallow, Pydantic integrates validation rules into the object class rather than requiring a separate schema class.

Here’s how we can achieve the same goal with Pydantic. ValidationError stores all 3 errors found in the object.

Option 7
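import re
from datetime import datetime

# Written against the Pydantic 1.x API (validator, Config, schema_json).
# EmailStr additionally requires the email-validator package.
from pydantic import BaseModel, ValidationError, validator, PositiveInt, EmailStr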

class HomeAddress(BaseModel):
    postcode: str
    city: str
    country: str

    class Config:
        anystr_strip_whitespace = True

    @validator('postcode')
    def dutch_postcode(cls, v):
        if not re.match(r"^\d{4}\s?\w{2}$", v):
            raise ValueError(r"must follow regex ^\d{4}\s?\w{2}$")
        return v

class Citizen(BaseModel):
    id: str
    name: str
    birthday: str
    email: EmailStr
    age: PositiveInt
    address: HomeAddress

    @validator('birthday')
    def valid_date(cls, v):
        try:
            datetime.strptime(v, "%Y-%m-%d")
            return v
        except ValueError:
            raise ValueError("date must be in YYYY-MM-DD format.")

try:
    citizen = Citizen(
        id="1234",
        name="xiaoxugao1234567889901234567890",
        birthday="1990-01-32",
        email="xiaoxugao@gmail.",
        age=0,
        address=HomeAddress(
            postcode="1095AB", city=" Amsterdam", country="NL"
        ),
    )
    print(citizen)
except ValidationError as e:
    print(e)
    
# 3 validation errors for Citizen
# birthday
#   date must be in YYYY-MM-DD format. (type=value_error)
# email
#   value is not a valid email address (type=value_error.email)
# age
#   ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)

I personally prefer to have only one class with all the validation rules because of its clarity.

In fact, Pydantic can do much more than that. It can also export a schema file via the schema_json method.

print(Citizen.schema_json(indent=2))
{
  "title": "Citizen",
  "type": "object",
  "properties": {
    "id": {
      "title": "Id",
      "type": "string"
    },
    "name": {
      "title": "Name",
      "type": "string"
    },
    "birthday": {
      "title": "Birthday",
      "type": "string"
    },
    "email": {
      "title": "Email",
      "type": "string",
      "format": "email"
    },
    "age": {
      "title": "Age",
      "exclusiveMinimum": 0,
      "type": "integer"
    },
    "address": {
      "$ref": "#/definitions/HomeAddress"
    }
  },
  "required": [
    "id",
    "name",
    "birthday",
    "email",
    "age",
    "address"
  ],
  "definitions": {
    "HomeAddress": {
      "title": "HomeAddress",
      "type": "object",
      "properties": {
        "postcode": {
          "title": "Postcode",
          "type": "string"
        },
        "city": {
          "title": "City",
          "type": "string"
        },
        "country": {
          "title": "Country",
          "type": "string"
        }
      },
      "required": [
        "postcode",
        "city",
        "country"
      ]
    }
  }
}

The generated schema is compatible with JSON Schema Core, JSON Schema Validation, and OpenAPI. But as with Marshmallow, this library by default also “allows” attributes to be updated with invalid values after initialization.

Conclusion

In this article, I covered 7 approaches to checking class attributes. 5 of them are based on built-in Python features, and 2 are based on third-party libraries.

With built-in features, developers can fully control every detail of validation. But it may require more development and maintenance time.

With the help of third-party libraries, developers can save themselves the work of developing general rules and write less code. But at the same time, they need to know whether these out-of-the-box features really meet their expectations. For example, there may be differing opinions on how to check the email format. Also, both Marshmallow and Pydantic by default allow an invalid update after initialization, which can be dangerous in some cases.

I hope you enjoyed this article.
