Juejin · Backend • 2024-05-11 11:44

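Preface

In many scenarios we need to serialize and deserialize Python objects. This comes up often when building REST APIs, or when loading and saving object-oriented data.

If you are the kind of person who copy-pastes Python code a lot, the approach you have probably seen most is serializing or deserializing with the json or pickle module, which is common practice.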

import json

data = {'name': 'John', 'age': 30, 'city': 'New York'}
serialized_data = json.dumps(data)
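
The snippet above only covers one direction. As a minimal sketch (the variable names are illustrative, not from the original), the full round trip with json and the equivalent with pickle might look like this:

```python
import json
import pickle

data = {'name': 'John', 'age': 30, 'city': 'New York'}

# json: Python dict -> JSON text -> Python dict
json_text = json.dumps(data)
restored_from_json = json.loads(json_text)

# pickle: Python object -> bytes -> Python object
# (only unpickle data you trust; pickle can execute code while loading)
pickled_bytes = pickle.dumps(data)
restored_from_pickle = pickle.loads(pickled_bytes)

print(type(json_text).__name__, type(pickled_bytes).__name__)
print(restored_from_json == data, restored_from_pickle == data)
```

Neither module validates the data it loads, which is the gap the rest of this article is about.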

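Serializing and deserializing Python objects usually goes hand in hand with data processing and validation.

The star of today's article, Marshmallow, gives us more powerful serialization and deserialization, along with more elegant parameter validation and data processing.

GitHub: https://github.com/marshmallow-code/marshmallow

It helps you convert complex data types (such as objects or database records) into native Python data types (such as dictionaries), and vice versa.

It is widely used with web frameworks such as Flask and Django to handle data input and output.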

Setting the stage

To serialize or deserialize anything, we first need an object to work on. Let's start by defining a class:

class Novel:
    def __init__(self, title, author, genre, pages):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages

# Instantiate a novel object
interesting_novel = Novel(title="The Enchanting Adventure", author="Jane Doe", genre="Fantasy", pages=300)

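The requirement now is to convert this novel object into a dictionary. How would you do it?

The clumsy ways

  • Manually create a dictionary and map the novel object's attributes to its keys and values.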
novel_dict = {
    "title": interesting_novel.title,
    "author": interesting_novel.author,
    "genre": interesting_novel.genre,
    "pages": interesting_novel.pages
}

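  • Use the vars function:

Python's vars function returns an object's __dict__ attribute, which holds the object's instance attributes and their values. So you can use vars directly to convert the object into a dictionary: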

novel_dict = vars(interesting_novel)

  • Use the __dict__ attribute:

For objects that have a __dict__ attribute, you can access it directly to get the object's attributes and values.

novel_dict = interesting_novel.__dict__

  • Use json.loads(json.dumps(obj))

With the json library, first convert the object to a JSON string, then parse that string back into a dictionary.

import json

novel_json = json.dumps(interesting_novel, default=lambda o: o.__dict__)
novel_dict = json.loads(novel_json)
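
A caveat about the vars and __dict__ shortcuts above: both hand back the object's live attribute dictionary rather than a copy, so editing the result also edits the object. A small sketch of this, with the Novel class repeated so the block runs on its own (the names live_view and snapshot are illustrative):

```python
class Novel:
    def __init__(self, title, author, genre, pages):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages


novel = Novel(title="The Enchanting Adventure", author="Jane Doe", genre="Fantasy", pages=300)

# vars() returns the same dict object as __dict__, not a copy
live_view = vars(novel)
print(live_view is novel.__dict__)  # True

# Take a shallow copy first if you need an independent dictionary
snapshot = dict(vars(novel))

# Mutating the live view changes the object itself
live_view["pages"] = 999
print(novel.pages)        # 999
print(snapshot["pages"])  # 300
```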

Built-in methods of data classes: dataclass/attrs

  • dataclass version
from dataclasses import dataclass, asdict
from loguru import logger


@dataclass
class Novel:
    title: str
    author: str
    genre: str
    pages: int


# Instantiate a novel object
interesting_novel = Novel(title="The Enchanting Adventure", author="Jane Doe", genre="Fantasy", pages=300)

# Serialize the object to a dictionary
novel_dict = asdict(interesting_novel)
logger.info(novel_dict)

# Deserialize the dictionary back into an object
new_novel = Novel(**novel_dict)
logger.info(new_novel)

  • attrs version
import attr
import cattr
from loguru import logger


@attr.define
class Novel:
    title = attr.ib()
    author = attr.ib()
    genre = attr.ib()
    pages = attr.ib()


# Instantiate a novel object
interesting_novel = Novel(title="The Enchanting Adventure", author="Jane Doe", genre="Fantasy", pages=300)

# Serialize the object to a dictionary
novel_dict = cattr.unstructure(interesting_novel)
logger.info(novel_dict)

# Deserialize a dictionary back into an object
new_novel_dict = {'title': 'AI之旅', 'author': 'HaiGe', 'genre': 'Fantasy', 'pages': 668}
new_novel = cattr.structure(new_novel_dict, Novel)
logger.info(new_novel)
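
One limitation of the dataclass version above is worth seeing before moving on: asdict recurses into nested dataclasses, but Novel(**novel_dict) does not rebuild them. The sketch below uses a hypothetical Author dataclass that is not part of the original example:

```python
from dataclasses import dataclass, asdict


@dataclass
class Author:  # hypothetical nested type, for illustration only
    name: str


@dataclass
class Novel:
    title: str
    author: Author


novel = Novel(title="The Enchanting Adventure", author=Author(name="Jane Doe"))

# asdict converts the nested dataclass into a nested dict
novel_dict = asdict(novel)
print(novel_dict)

# Naive reconstruction leaves the nested value as a plain dict
naive = Novel(**novel_dict)
print(type(naive.author).__name__)  # dict

# The nested object has to be rebuilt by hand
rebuilt = Novel(title=novel_dict["title"], author=Author(**novel_dict["author"]))
print(rebuilt == novel)  # True
```

This kind of manual rebuilding is the sort of work a schema library can take over.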

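A more elegant solution: marshmallow

The serialization and deserialization approaches above already look pretty impressive, so why choose marshmallow?

Although these approaches handle basic serialization and deserialization, they can feel inflexible and inconvenient once you need more complex data structures, data validation, preprocessing logic, or integration with other libraries. marshmallow was created to solve exactly these problems.

marshmallow offers powerful and flexible schema definitions that give you precise control over how data is serialized and deserialized. It supports complex data structures, custom validators, preprocessing logic, and other advanced features, and it integrates smoothly with many common Python libraries and frameworks.

Whether you are building RESTful APIs, persisting or migrating data, or just doing simple data processing and validation, marshmallow does the job well. It is especially suited to scenarios that involve complex data structures and data exchange.

In addition, marshmallow has thorough documentation and community support, which makes it easier to learn and use. So even with all the alternatives available, marshmallow remains a sound choice, particularly for complex data structures and scenarios.

Basic usage of marshmallow

Installation

To use marshmallow, install it first: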

# 3.20.2
pip3 install marshmallow  

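A. Serialization and deserialization

marshmallow provides flexible and powerful serialization and deserialization: it can convert complex Python objects into simple data types that are easy to render as JSON, XML, and similar formats, and it can parse external data back into Python objects.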

from marshmallow import Schema, fields, post_load


class Novel:
    def __init__(self, title, author, genre, pages):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages

    def __repr__(self):
        return f"Novel(title={self.title!r}, author={self.author!r}, genre={self.genre!r}, pages={self.pages!r})"


class NovelSchema(Schema):
    title = fields.Str()
    author = fields.Str()
    genre = fields.Str()
    pages = fields.Int()

    @post_load
    def make(self, data, **kwargs):
        return Novel(**data)


# Create a Novel object
novel = Novel(title="The Enchanting Adventure", author="Jane Doe", genre="Fantasy", pages=300)

# Serialize the Novel object
novel_schema = NovelSchema()
serialized_data = novel_schema.dump(novel)
print(serialized_data)

# Deserialize
deserialized_data = novel_schema.load(serialized_data)
print(deserialized_data)

Note the difference between a schema's dump and dumps methods: dump() returns a dict, while dumps() returns a JSON string.
Similarly, load takes a dictionary, while loads takes a JSON string.


Now let's look at serializing and deserializing multiple objects. It is just as simple!

from marshmallow import Schema, fields, post_load
from loguru import logger


class Novel:
    def __init__(self, title, author, genre, pages):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages

    def __repr__(self):
        return f"Novel(title={self.title!r}, author={self.author!r}, genre={self.genre!r}, pages={self.pages!r})"


class NovelSchema(Schema):
    title = fields.Str()
    author = fields.Str()
    genre = fields.Str()
    pages = fields.Int()

    @post_load
    def make(self, data, **kwargs):
        return Novel(**data)


# Create several Novel objects
novel1 = Novel(title="海哥python1", author="暴走的海鸽", genre="Fantasy", pages=300)
novel2 = Novel(title="海哥python2", author="暴走的海鸽", genre="Fantasy", pages=300)
novel3 = Novel(title="海哥python3", author="暴走的海鸽", genre="Fantasy", pages=300)

novels = [novel1, novel2, novel3]

# Serialize the Novel objects
novel_schema = NovelSchema(many=True)
serialized_data = novel_schema.dump(novels)
logger.info(serialized_data)

# Deserialize
deserialized_data = novel_schema.load(serialized_data)
logger.info(deserialized_data)

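In addition, the Schema class takes two parameters that control the serialized output: only and exclude. With only, the output contains just the fields named in the given list; exclude does the opposite and drops the fields named in the list.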

from marshmallow import Schema, fields, post_load, validates, ValidationError, validate
from loguru import logger


class Novel:
    def __init__(self, title, author, genre="Fantasy2", pages=300):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages

    def __repr__(self):
        return f"Novel(title={self.title!r}, author={self.author!r}, genre={self.genre!r}, pages={self.pages!r})"


class NovelSchema(Schema):
    title = fields.Str(validate=validate.Length(min=1, max=10))
    author = fields.Str()
    genre = fields.Str()
    pages = fields.Int()

    @post_load
    def make(self, data, **kwargs):
        logger.info(data)
        return Novel(**data)

    @validates('pages')
    def validate_pages(self, value):
        if value <= 0:
            raise ValidationError('Pages must be a positive integer.')


# Create a Novel object
novel = Novel(title="The Enchanting Adventure", author="Jane Doe", genre="Fantasy", pages=-300)

# Serialize the Novel object
novel_schema = NovelSchema(only=("title", "author",))
serialized_data = novel_schema.dump(novel)
logger.info(serialized_data)

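B. Data validation

Data validation is another key feature of marshmallow. It lets you customize how data is validated, including type validation, length validation, and custom validation, to keep data complete and correct.

Common built-in validators are available in the marshmallow.validate module; this example uses validate.Length: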

from marshmallow import Schema, fields, post_load, validates, ValidationError, validate
from loguru import logger


class Novel:
    def __init__(self, title, author, genre, pages):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages

    def __repr__(self):
        return f"Novel(title={self.title!r}, author={self.author!r}, genre={self.genre!r}, pages={self.pages!r})"


class NovelSchema(Schema):
    title = fields.Str(validate=validate.Length(min=1, max=10))
    author = fields.Str()
    genre = fields.Str()
    pages = fields.Int()

    @post_load
    def make(self, data, **kwargs):
        return Novel(**data)

    @validates('pages')
    def validate_pages(self, value):
        if value <= 0:
            raise ValidationError('Pages must be a positive integer.')


# Create a Novel object
novel = Novel(title="The Enchanting Adventure", author="Jane Doe", genre="Fantasy", pages=-300)

# Serialize the Novel object
novel_schema = NovelSchema()
serialized_data = novel_schema.dump(novel)
logger.info(serialized_data)

# Deserialization
try:
    deserialized_data = novel_schema.load(serialized_data)
    logger.info(deserialized_data)
except ValidationError as e:
    logger.error(e.messages)
    logger.error(e.valid_data)

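In this example we use the validate argument on the title field and define a validate_pages method to validate the pages field. If pages is less than or equal to 0, a ValidationError is raised. When validation fails during deserialization, marshmallow collects the errors and raises a ValidationError; the error details are stored in its messages attribute, and the data that passed validation is available in valid_data.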

To validate that a field is present, set required=True on it in the schema, which marks the field as mandatory.

from marshmallow import Schema, fields, post_load, validates, ValidationError, validate
from loguru import logger


class Novel:
    def __init__(self, title, author, genre, pages):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages

    def __repr__(self):
        return f"Novel(title={self.title!r}, author={self.author!r}, genre={self.genre!r}, pages={self.pages!r})"


class NovelSchema(Schema):
    title = fields.Str(validate=validate.Length(min=1, max=10))
    author = fields.Str(required=True)
    genre = fields.Str()
    pages = fields.Int()

    @post_load
    def make(self, data, **kwargs):
        return Novel(**data)

    @validates('pages')
    def validate_pages(self, value):
        if value <= 0:
            raise ValidationError('Pages must be a positive integer.')


# Create a Novel object
novel = Novel(title="The Enchanting Adventure", author="Jane Doe", genre="Fantasy", pages=-300)

# Serialize the Novel object
novel_schema = NovelSchema()
serialized_data = novel_schema.dump(novel)
logger.info(serialized_data)

# Deserialization
serialized_data.pop("author")  # Remove author
try:
    deserialized_data = novel_schema.load(serialized_data)
    logger.info(deserialized_data)
except ValidationError as e:
    logger.error(e.messages)
    logger.error(e.valid_data)

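We marked the author field as required but did not pass it when deserializing, so load raises a ValidationError whose messages include "Missing data for required field." for author.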

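Marshmallow also supports default values for fields, and it clearly separates the two directions: load_default fills in a value during deserialization when the input lacks the field, while dump_default fills in a value during serialization.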

from marshmallow import Schema, fields, post_load, validates, ValidationError, validate
from loguru import logger


class Novel:
    def __init__(self, title, author, genre, pages):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages

    def __repr__(self):
        return f"Novel(title={self.title!r}, author={self.author!r}, genre={self.genre!r}, pages={self.pages!r})"


class NovelSchema(Schema):
    title = fields.Str(validate=validate.Length(min=1, max=1000))
    author = fields.Str(required=True)
    genre = fields.Str()
    pages = fields.Int(load_default=300, dump_default=500)  # load (deserialization) default is 300, dump (serialization) default is 500

    @post_load
    def make(self, data, **kwargs):
        return Novel(**data)

    @validates('pages')
    def validate_pages(self, value):
        if value <= 0:
            raise ValidationError('Pages must be a positive integer.')


# Input data as a plain dict (note: no "pages" key)
novel = {"title": "公众号:海哥python", "author": "暴走的海鸽", "genre": "Fantasy"}

# Serialize: "pages" is missing, so dump_default (500) is used
novel_schema = NovelSchema()
serialized_data = novel_schema.dump(novel)
logger.info(f"Serialized: {serialized_data}")

# Deserialize: "pages" is missing, so load_default (300) is used
novel2 = {"title": "公众号:海哥python", "author": "暴走的海鸽", "genre": "Fantasy"}
try:
    deserialized_data = novel_schema.load(novel2)
    logger.info(f"Deserialized: {deserialized_data}")
except ValidationError as e:
    logger.error(e.messages)
    logger.error(f"Valid data: {e.valid_data}")

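When serializing, a Schema uses its field attribute names as the output keys by default, but you can customize this.
If you consume or produce data whose keys do not match your schema, use the data_key parameter to specify the external key, much like giving the field an alias.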

from marshmallow import Schema, fields, post_load, validates, ValidationError, validate
from loguru import logger


class Novel:
    def __init__(self, title, author, genre, pages):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages

    def __repr__(self):
        return f"Novel(title={self.title!r}, author={self.author!r}, genre={self.genre!r}, pages={self.pages!r})"


class NovelSchema(Schema):
    title = fields.Str(validate=validate.Length(min=1, max=1000))
    author = fields.Str(data_key="author_name")
    genre = fields.Str()
    pages = fields.Int(load_default=300, dump_default=500)  # load default is 300, dump default is 500

    @post_load
    def make(self, data, **kwargs):
        return Novel(**data)

    @validates('pages')
    def validate_pages(self, value):
        if value <= 0:
            raise ValidationError('Pages must be a positive integer.')


# Internal data uses the attribute name "author"
novel = {"title": "公众号:海哥python", "author": "暴走的海鸽2", "genre": "Fantasy"}

# Serialize: the output key becomes "author_name"
novel_schema = NovelSchema()
serialized_data = novel_schema.dump(novel)
logger.info(f"Serialized: {serialized_data}")

# Deserialize: external input must use the data_key "author_name"
novel2 = {"title": "公众号:海哥python", "author_name": "暴走的海鸽", "genre": "Fantasy"}
try:
    deserialized_data = novel_schema.load(novel2)
    logger.info(f"Deserialized: {deserialized_data}")
except ValidationError as e:
    logger.error(e.messages)
    logger.error(f"Valid data: {e.valid_data}")

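C. Custom field types

With marshmallow it is easy to define custom fields that handle the serialization and deserialization of special data types, which makes data processing more flexible.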

from datetime import datetime

from marshmallow import Schema, fields, post_load, validates, ValidationError
from loguru import logger


class CustomField(fields.Int):
    def _deserialize(self, value, attr, obj, **kwargs):
        # Add 1 to the number
        return value + 1


class CustomDateField(fields.Field):
    def _deserialize(self, value, attr, obj, **kwargs):
        return value.strftime('%Y-%m-%d')


class Novel:
    def __init__(self, title, author, genre, pages, date):
        self.title = title
        self.author = author
        self.genre = genre
        self.pages = pages
        self.date = date

    def __repr__(self):
        return f"Novel(title={self.title!r}, author={self.author!r}, genre={self.genre!r}, pages={self.pages!r}, date={self.date!r})"


class NovelSchema(Schema):
    title = fields.Str()
    author = fields.Str()
    genre = fields.Str()
    pages = CustomField()
    date = CustomDateField()

    @post_load
    def make(self, data, **kwargs):
        return Novel(**data)

    @validates('pages')
    def validate_pages(self, value):
        if value <= 0:
            raise ValidationError('Pages must be a positive integer.')


# Create a Novel object
novel = Novel(title="The Enchanting Adventure", author="Jane Doe", genre="Fantasy", pages=300,
              date=datetime(2024, 3, 13))

# Serialize the Novel object
novel_schema = NovelSchema()
serialized_data = novel_schema.dump(novel)
logger.info(serialized_data)

# Deserialization
try:
    deserialized_data = novel_schema.load(serialized_data)
    logger.info(deserialized_data)
except ValidationError as e:
    logger.error(e.messages)
    logger.error(e.valid_data)

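Advanced techniques and scenarios

Partial loading

When you use the same schema in multiple places, you may want to skip required validation for specific fields only, by passing their names to partial.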

from marshmallow import Schema, fields


class UserSchema(Schema):
    name = fields.String(required=True)
    age = fields.Integer(required=True)


result = UserSchema().load({"age": 42}, partial=("name",))
# OR UserSchema(partial=("name",)).load({"age": 42})
print(result)  # => {'age': 42}

You can ignore missing fields entirely by setting partial=True.

class UserSchema(Schema):
    name = fields.String(required=True)
    age = fields.Integer(required=True)


result = UserSchema().load({"age": 42}, partial=True)
# OR UserSchema(partial=True).load({'age': 42})
print(result)  # => {'age': 42}

Handling unknown fields

By default, load raises marshmallow.exceptions.ValidationError if it encounters a key that has no matching Field in the schema.

from marshmallow import Schema, fields, INCLUDE, ValidationError


class UserSchema(Schema):
    name = fields.String(required=True)
    age = fields.Integer(required=True)

    # class Meta:
    #     unknown = INCLUDE


try:
    result = UserSchema().load({"age": 42, "name": "公众号: 海哥python", "email": "[email protected]"})
    print(result)
except ValidationError as err:
    print(err.messages)  # => {'email': ['Unknown field.']}

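We can control how unknown fields are handled:

  • In the schema's class Meta, by setting unknown
  • At instantiation: schema = UserSchema(unknown=INCLUDE)
  • When calling load: UserSchema().load(data, unknown=INCLUDE)

The unknown option accepts one of the following values:

  • RAISE (default): raise a ValidationError if there are any unknown fields
  • EXCLUDE: exclude unknown fields
  • INCLUDE: accept and include unknown fields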

dump_only ("read-only") and load_only ("write-only") fields

from datetime import datetime
from marshmallow import Schema, fields, INCLUDE


class UserSchema(Schema):
    name = fields.Str()
    # password is "write-only"
    password = fields.Str(load_only=True)
    # created_at is "read-only"
    created_at = fields.DateTime(dump_only=True)


# Serialize
user_data = {"name": "Alice", "password": "s3cr3t", "created_at": datetime.now()}
user_schema = UserSchema()
serialized_data = user_schema.dump(user_data)
print("Serialized:", serialized_data)

# Deserialize
user_input = {"name": "Bob", "password": "pass123"}
user_schema = UserSchema()
try:
    deserialized_data = user_schema.load(user_input)
    print("Deserialized:", deserialized_data)
except Exception as e:
    print("Deserialization error:", e)

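Ordering

For some use cases it is useful to preserve field order in the serialized output. To enable ordering, set the ordered option to True. This tells marshmallow to serialize data into a collections.OrderedDict.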

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# __author__:lianhaifeng
# __time__:2024/3/14 21:40
import datetime

from marshmallow import Schema, fields, INCLUDE

from collections import OrderedDict


class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_time = datetime.datetime.now()


class UserSchema(Schema):
    uppername = fields.Function(lambda obj: obj.name.upper())

    class Meta:
        fields = ("name", "email", "created_time", "uppername")
        ordered = True


u = User("Charlie", "[email protected]")
schema = UserSchema()
res = schema.dump(u)
print(isinstance(res, OrderedDict))
# True
print(res)
# OrderedDict([('name', 'Charlie'), ('email', '[email protected]'), ('created_time', '2019-08-05T20:22:05.788540+00:00'), ('uppername', 'CHARLIE')])

Nested schemas

marshmallow handles nested attributes just as well, and this is where I find it really powerful.

A Blog may have an author, represented by a User object.

import datetime as dt
from pprint import pprint
from marshmallow import Schema, fields


class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.created_at = dt.datetime.now()
        self.friends = []
        self.employer = None


class Blog:
    def __init__(self, title, author):
        self.title = title
        self.author = author  # A User object


class UserSchema(Schema):
    name = fields.String()
    email = fields.Email()
    created_at = fields.DateTime()


class BlogSchema(Schema):
    title = fields.String()
    author = fields.Nested(UserSchema)


user = User(name="Monty", email="[email protected]")
blog = Blog(title="Something Completely Different", author=user)
result = BlogSchema().dump(blog)
pprint(result)

# {'title': u'Something Completely Different',
#  'author': {'name': u'Monty',
#             'email': u'[email protected]',
#             'created_at': '2014-08-17T14:58:57.600623+00:00'}}

For more on nesting, see: https://marshmallow.readthedocs.io/en/stable/nesting.html

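Extending Schema

Pre-processing and post-processing methods

You can register data pre-processing and post-processing methods with the pre_load, post_load, pre_dump, and post_dump decorators.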

from marshmallow import Schema, fields, post_load, pre_load, pre_dump, post_dump
from loguru import logger


class User:
    def __init__(self, username, email):
        self.username = username
        self.email = email

    def __repr__(self):
        return f"User(username={self.username!r}, email={self.email!r})"


class UserSchema(Schema):
    username = fields.Str()
    email = fields.Email()

    @pre_load
    def preprocess_data(self, data, **kwargs):
        # Preprocess the data before deserialization
        if 'username' in data:
            data['username'] = data['username'].lower()  # Convert the username to lowercase
        logger.info("do pre_load...")
        return data

    @post_load
    def make_user(self, data, **kwargs):
        # Create the User object after deserialization
        logger.info("do post_load...")
        return User(**data)

    @pre_dump
    def prepare_data(self, data, **kwargs):
        # Preprocess the data before serialization
        logger.info(type(data))
        if isinstance(data, User):
            data.username = data.username.upper()

        elif 'username' in data:
            data['username'] = data['username'].upper()  # Convert the username to uppercase
        logger.info("do pre_dump...")
        return data

    @post_dump
    def clean_data(self, data, **kwargs):
        # Clean up the result after serialization
        logger.info(type(data))
        if 'email' in data:
            del data['email']  # Remove the email field
        logger.info("do post_dump...")
        return data


# Prepare the data to deserialize
input_data = [{
    "username": "公众号:海哥Python",
    "email": "[email protected]"
}]

# Create the Schema object and deserialize
user_schema = UserSchema()
result = user_schema.load(input_data, many=True)

logger.info(f"Post Load Result: {result}")  # Log the deserialized result

# Create a User object
user = User(username="公众号:海哥Python", email="[email protected]")

# Serialize the User object
serialized_data = user_schema.dump(user)

logger.info(f"Post Dump Result: {serialized_data}")  # Log the serialized result

Custom error handling

import logging
from marshmallow import Schema, fields


class AppError(Exception):
    pass


class UserSchema(Schema):
    email = fields.Email()

    def handle_error(self, exc, data, **kwargs):
        """Log and raise our custom exception when (de)serialization fails."""
        logging.error(exc.messages)
        raise AppError("An error occurred with input: {0}".format(data))


schema = UserSchema()
schema.load({"email": "invalid-email"})  # raises AppError

Example scenario

Using marshmallow for parameter validation in a REST API is a fairly elegant approach.

Installation:

# Flask                    3.0.2
# Flask-SQLAlchemy         3.1.1
pip install flask flask-sqlalchemy

Application code

# demo.py
import datetime

from flask import Flask, request
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy.exc import NoResultFound

from marshmallow import Schema, ValidationError, fields, pre_load

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///quotes.db"

db = SQLAlchemy(app)

##### MODELS #####


class Author(db.Model):  # type: ignore
    id = db.Column(db.Integer, primary_key=True)
    first = db.Column(db.String(80))
    last = db.Column(db.String(80))


class Quote(db.Model):  # type: ignore
    id = db.Column(db.Integer, primary_key=True)
    content = db.Column(db.String, nullable=False)
    author_id = db.Column(db.Integer, db.ForeignKey("author.id"))
    author = db.relationship("Author", backref=db.backref("quotes", lazy="dynamic"))
    posted_at = db.Column(db.DateTime)

##### SCHEMAS #####


class AuthorSchema(Schema):
    id = fields.Int(dump_only=True)
    first = fields.Str()
    last = fields.Str()
    formatted_name = fields.Method("format_name", dump_only=True)

    def format_name(self, author):
        return f"{author.last}, {author.first}"


# Custom validator
def must_not_be_blank(data):
    if not data:
        raise ValidationError("Data not provided.")


class QuoteSchema(Schema):
    id = fields.Int(dump_only=True)
    author = fields.Nested(AuthorSchema, validate=must_not_be_blank)
    content = fields.Str(required=True, validate=must_not_be_blank)
    posted_at = fields.DateTime(dump_only=True)

    # Allow client to pass author's full name in request body
    # e.g. {"author": "Tim Peters"} rather than {"first": "Tim", "last": "Peters"}
    @pre_load
    def process_author(self, data, **kwargs):
        author_name = data.get("author")
        if author_name:
            first, last = author_name.split(" ")
            author_dict = dict(first=first, last=last)
        else:
            author_dict = {}
        data["author"] = author_dict
        return data


author_schema = AuthorSchema()
authors_schema = AuthorSchema(many=True)
quote_schema = QuoteSchema()
quotes_schema = QuoteSchema(many=True, only=("id", "content"))

##### API #####


@app.route("/authors")
def get_authors():
    authors = Author.query.all()
    # Serialize the queryset
    result = authors_schema.dump(authors)
    return {"authors": result}


@app.route("/authors/<int:pk>")
def get_author(pk):
    try:
        author = Author.query.filter(Author.id == pk).one()
    except NoResultFound:
        return {"message": "Author could not be found."}, 400
    author_result = author_schema.dump(author)
    quotes_result = quotes_schema.dump(author.quotes.all())
    return {"author": author_result, "quotes": quotes_result}


@app.route("/quotes/", methods=["GET"])
def get_quotes():
    quotes = Quote.query.all()
    result = quotes_schema.dump(quotes, many=True)
    return {"quotes": result}


@app.route("/quotes/<int:pk>")
def get_quote(pk):
    try:
        quote = Quote.query.filter(Quote.id == pk).one()
    except NoResultFound:
        return {"message": "Quote could not be found."}, 400
    result = quote_schema.dump(quote)
    return {"quote": result}


@app.route("/quotes/", methods=["POST"])
def new_quote():
    json_data = request.get_json()
    if not json_data:
        return {"message": "No input data provided"}, 400
    # Validate and deserialize input
    try:
        data = quote_schema.load(json_data)
    except ValidationError as err:
        return err.messages, 422
    first, last = data["author"]["first"], data["author"]["last"]
    author = Author.query.filter_by(first=first, last=last).first()
    if author is None:
        # Create a new author
        author = Author(first=first, last=last)
        db.session.add(author)
    # Create new quote
    quote = Quote(
        content=data["content"], author=author, posted_at=datetime.datetime.utcnow()
    )
    db.session.add(quote)
    db.session.commit()
    result = quote_schema.dump(Quote.query.get(quote.id))
    return {"message": "Created new quote.", "quote": result}


if __name__ == "__main__":
    with app.app_context():
        db.create_all()
        app.run(debug=True, port=5000)

Start the service:

python .\demo.py

Install httpie for testing:

pip install httpie

Add a valid quote:

Add an invalid quote:

Query quotes:

Libraries often paired with marshmallow include flask-marshmallow and flask-smorest.

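Summary

Overall, marshmallow offers powerful serialization, deserialization, and data validation. It fits a wide range of complex data-processing scenarios and makes data handling more convenient and efficient.

  • Compared with the json module in the Python standard library, marshmallow provides more flexible and powerful serialization and validation, and suits more complex data structures.
  • Compared with the serializers in the Django framework, marshmallow is more lightweight and not tied to a specific framework, so it can be used in all kinds of Python projects.

It suits web development, RESTful API construction, data persistence, data validation, and more, and is especially good at handling complex data structures.

As data-processing needs grow more complex, marshmallow, as a capable serialization and validation library, is likely to see even wider use in Python data processing.

Whether you are building a web API, validating forms, talking to a database, or importing and exporting data, Marshmallow can handle the data efficiently. I hope this article helps you better understand and use Marshmallow.

There is much more to Marshmallow than this. For more tips, check the official documentation!

Finally

That's all for today. If you found it useful, feel free to like and follow.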

References

https://marshmallow.readthedocs.io/en/stable/
https://rest-apis-flask.teclado.com/docs/flask_smorest/validation_with_marshmallow/
https://marshmallow.readthedocs.io/en/stable/nesting.html
https://www.cnblogs.com/ChangAn223/p/11305376.html
https://blog.51cto.com/u_15703497/6858837
https://www.cnblogs.com/erhuoyuan/p/17450757.html
https://github.com/marshmallow-code/marshmallow-sqlalchemy