Guide

The dataclass_struct decorator

Use the dataclass_struct decorator to convert a class into a stdlib dataclass with struct packing/unpacking functionality:

def dataclass_struct(
    *,
    size: Literal["native", "std"] = "native",
    byteorder: Literal["native", "big", "little", "network"] = "native",
    validate_defaults: bool = True,
    **dataclass_kwargs,
):
    ...

The size argument can be either "native" (the default) or "std" and controls the size and alignment of fields:

size        byteorder   Notes
"native"    "native"    The default. Native alignment and padding.
"std"       "native"    Standard integer sizes and system endianness, no alignment/padding.
"std"       "little"    Standard integer sizes and little endian, no alignment/padding.
"std"       "big"       Standard integer sizes and big endian, no alignment/padding.
"std"       "network"   Equivalent to byteorder="big".

Decorated classes are transformed into standard Python dataclasses, with the boilerplate __init__, __repr__, __eq__ etc. auto-generated. Any additional dataclass_kwargs keyword arguments are passed through to the stdlib dataclass decorator: all standard keyword arguments are supported except slots and weakref_slot.
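
For instance, a standard dataclass option such as frozen can be forwarded through the decorator (a minimal sketch; the class and field names here are illustrative, not from the library):

import dataclasses_struct as dcs

# frozen=True is an ordinary dataclass keyword argument,
# forwarded unchanged to the stdlib dataclass decorator
@dcs.dataclass_struct(frozen=True)
class Header:
    version: dcs.UnsignedChar
    length: dcs.UnsignedShort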

In addition to the standard dataclass methods, two methods are added to the class: pack, which packs an instance into bytes, and from_packed, a classmethod that creates a new instance from packed bytes.

A class attribute named __dataclass_struct__ is also added (see Inspecting dataclass-structs).
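
A minimal round trip using the generated pack and from_packed methods (a sketch; Point is an illustrative class):

import dataclasses_struct as dcs

@dcs.dataclass_struct()
class Point:
    x: dcs.F64
    y: dcs.F64

packed = Point(1.5, -2.0).pack()      # bytes, ready to be written to a file or socket
restored = Point.from_packed(packed)  # equal to the original: Point(x=1.5, y=-2.0)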

Default value validation

Default attribute values will be validated against their expected type and allowable value range. For example,

import dataclasses_struct as dcs

@dcs.dataclass_struct()
class Test:
    x: dcs.UnsignedChar = -1

will raise a ValueError. This can be disabled by passing validate_defaults=False to the decorator.
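
For comparison, with validation disabled the same definition is accepted at class-creation time (a sketch; packing an instance that still holds the out-of-range default would presumably fail later inside struct):

@dcs.dataclass_struct(validate_defaults=False)
class Unchecked:
    x: dcs.UnsignedChar = -1  # no longer validated at class creation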

Inspecting dataclass-structs

A class or object can be checked to see if it is a dataclass-struct using the is_dataclass_struct function. (The Test class used in the following examples is a larger struct than the one defined above.)

>>> dcs.is_dataclass_struct(Test)
True
>>> t = Test(100)
>>> dcs.is_dataclass_struct(t)
True

The get_struct_size function will return the size in bytes of the packed representation of a dataclass-struct class or an instance of one.

>>> dcs.get_struct_size(Test)
234

An additional class attribute, __dataclass_struct__, is added to the decorated class that contains the packed size, struct format string, and struct mode.

>>> Test.__dataclass_struct__.size
234
>>> Test.__dataclass_struct__.format
'@cc??bBhHiIQqqNnPfdd100s4xqq2x3xq2x'
>>> Test.__dataclass_struct__.mode
'@'

Native size mode

In "native" mode (the default), the struct is packed based on the platform and compiler on which Python was built: padding bytes may be added to maintain proper alignment of the fields and byte ordering (endianness) follows that of the platform. (The byteorder argument must also be "native".)

In "native" size mode, integer type sizes follow those of the standard C integer types of the platform (int, unsigned short etc.).

@dcs.dataclass_struct()
class NativeStruct:
    signed_char: dcs.SignedChar
    signed_short: dcs.Short
    unsigned_long_long: dcs.UnsignedLongLong
    void_pointer: dcs.Pointer

Standard size mode

In "std" mode, the struct is packed without any additional padding for alignment.

The "std" size mode supports four different byteorder values: "native" (the default), "little", "big", and "network". The "native" setting uses the system byte order (similar to "native" size mode, but without alignment). The "network" setting is equivalent to "big".

The "std" size uses platform-independent integer sizes, similar to using the integer types from stdint.h in C. When used with byteorder set to "little", "big", or "network", it is appropriate for marshalling data across different platforms.

@dcs.dataclass_struct(size="std", byteorder="native")
class StdStruct:
    int8_t: dcs.I8
    uint64_t: dcs.U64
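
To illustrate the effect of byteorder, the same 16-bit value packs to different byte sequences depending on the setting (a sketch; the expected outputs follow the usual conventions for big- and little-endian packing):

import dataclasses_struct as dcs

@dcs.dataclass_struct(size="std", byteorder="big")
class BigEndian:
    value: dcs.U16

@dcs.dataclass_struct(size="std", byteorder="little")
class LittleEndian:
    value: dcs.U16

BigEndian(1).pack()     # expected: b'\x00\x01'
LittleEndian(1).pack()  # expected: b'\x01\x00'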

Supported type annotations

See the reference page for the complete list of type annotations.

Native integer types

These types are only supported in "native" size mode. Their native Python types are all int.

Type annotation                     Equivalent C type
SignedChar                          signed char
UnsignedChar                        unsigned char
Short                               short
UnsignedShort                       unsigned short
Int                                 int
int (builtin type, alias to Int)    int
UnsignedInt                         unsigned int
Long                                long
UnsignedLong                        unsigned long
LongLong                            long long
UnsignedLongLong                    unsigned long long
UnsignedSize                        size_t
SignedSize                          ssize_t (POSIX extension)
Pointer                             void *

Standard integer types

These types are only supported in "std" size mode. Their native Python types are all int.

Type annotation   Equivalent C type
I8                int8_t
U8                uint8_t
I16               int16_t
U16               uint16_t
I32               int32_t
U32               uint32_t
I64               int64_t
U64               uint64_t

Floating point types

Supported in both size modes. The native Python type is float.

Type annotation                 Equivalent C type
F16                             Extension type (see below)
F32                             float
F64                             double
float (builtin alias to F64)    double

F16 is a half-precision floating-point type. Some compilers provide support for it on certain platforms (e.g. GCC, Clang). It is also available as std::float16_t in C++23.

Note that floating point fields are always packed and unpacked using the IEEE 754 format, regardless of the underlying format used by the platform.
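
For example, all three widths can be mixed in one struct; the unpacked Python values are plain floats regardless of width (a sketch, with illustrative field names, using "std" size mode so the packed widths are fixed):

@dcs.dataclass_struct(size="std")
class Measurements:
    coarse: dcs.F16    # packed as 2 bytes
    normal: dcs.F32    # packed as 4 bytes
    precise: dcs.F64   # packed as 8 bytes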

Boolean

The builtin bool type or dataclasses_struct.Bool type can be used to represent a boolean, which uses a single byte in either native or standard size modes.
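 
For example (a minimal sketch):

@dcs.dataclass_struct()
class Flags:
    enabled: bool = True        # builtin bool
    verbose: dcs.Bool = False   # equivalent explicit annotation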

Nested structs

Classes decorated with dataclass_struct can be used as fields in other classes, as long as they have the same size and byteorder settings.

@dcs.dataclass_struct()
class Vector2d:
    x: float
    y: float

@dcs.dataclass_struct()
class Vectors:
    direction: Vector2d
    velocity: Vector2d

# Will raise TypeError:
@dcs.dataclass_struct(size="std")
class VectorsStd:
    direction: Vector2d
    velocity: Vector2d

Default values for nested class fields cannot be set directly, as Python doesn't allow using mutable default values in dataclasses. To get around this, pass frozen=True to the inner class' dataclass_struct decorator. Alternatively, pass a zero-argument callable that returns an instance of the class to the default_factory keyword argument of dataclasses.field. For example:

from dataclasses import field

@dcs.dataclass_struct()
class VectorsStd:
    direction: Vector2d
    velocity: Vector2d = field(default_factory=lambda: Vector2d(0, 0))

The return type of the default_factory will be validated unless validate_defaults=False is passed to the dataclass_struct decorator. Note that this means the callable passed to default_factory will be called once during class creation.
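
The frozen=True alternative mentioned above might look like this (a sketch, assuming a frozen dataclass-struct is accepted as an immutable default):

@dcs.dataclass_struct(frozen=True)
class FrozenVector2d:
    x: float
    y: float

@dcs.dataclass_struct()
class VectorsWithDefault:
    direction: FrozenVector2d
    velocity: FrozenVector2d = FrozenVector2d(0.0, 0.0)  # allowed because the inner class is frozen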

Characters

In both size modes, a single byte can be packed by annotating a field with the builtin bytes type or the dataclasses_struct.Char type. The field's unpacked Python representation will be a bytes of length 1.

@dcs.dataclass_struct()
class Chars:
    char: dcs.Char = b'x'
    builtin: bytes = b'\x04'

Bytes arrays

Fixed-length

Fixed-length byte arrays can be represented in both size modes by annotating a field with typing.Annotated and a positive length. The field's unpacked Python representation will be a bytes object zero-padded or truncated to the specified length.

from typing import Annotated

@dcs.dataclass_struct()
class FixedLength:
    fixed: Annotated[bytes, 10]
>>> FixedLength.from_packed(FixedLength(b'Hello, world!').pack())
FixedLength(fixed=b'Hello, wor')

Tip: null-terminated strings

Fixed-length bytes arrays are truncated to the exact length specified in the Annotated argument. If you require bytes arrays to always be null-terminated (e.g. for passing to a C API), add a PadAfter annotation to the field:

@dcs.dataclass_struct()
class FixedLengthNullTerminated:
    # Equivalent to `unsigned char[11]` in C
    fixed: Annotated[bytes, 10, dcs.PadAfter(1)]
>>> FixedLengthNullTerminated(b"0123456789A").pack()
b'0123456789\x00'

Length-prefixed

One issue with fixed-length bytes arrays is that data shorter than the length will be zero-padded when unpacking to the Python type:

>>> packed = FixedLength(b'Hello').pack()
>>> packed
b'Hello\x00\x00\x00\x00\x00'
>>> FixedLength.from_packed(packed)
FixedLength(fixed=b'Hello\x00\x00\x00\x00\x00')

An alternative is to use length-prefixed arrays, also known as Pascal strings. These store the length of the array in the first byte, meaning the maximum length that can be stored without truncation is 255 bytes. To use length-prefixed arrays, annotate a bytes field with LengthPrefixed:

from typing import Annotated

@dcs.dataclass_struct()
class PascalStrings:
    s: Annotated[bytes, dcs.LengthPrefixed(10)]

The length passed to LengthPrefixed must be between 2 and 256 inclusive.
>>> packed = PascalStrings(b"12345").pack()
>>> packed
b'\x05Hello\x00\x00\x00\x00'
>>> PascalStrings.from_packed(packed)
PascalStrings(s=b'Hello')

Note

The size passed to LengthPrefixed is the size of the packed representation of the field including the size byte, so the maximum length the array can be without truncation is one less than the size.
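
A sketch of that truncation behaviour, assuming input longer than size - 1 bytes is cut off when packing:

from typing import Annotated
import dataclasses_struct as dcs

@dcs.dataclass_struct()
class Prefixed:
    s: Annotated[bytes, dcs.LengthPrefixed(10)]

packed = Prefixed(b"0123456789").pack()  # 10 data bytes, but only size - 1 = 9 fit
# expected: one length byte (9) followed by the first 9 data bytes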

Fixed-length arrays

Fixed-length arrays can be represented by annotating a list field with typing.Annotated and a positive length.

from typing import Annotated

@dcs.dataclass_struct()
class FixedLength:
    fixed: Annotated[list[int], 5]
>>> FixedLength.from_packed(FixedLength([1, 2, 3, 4, 5]).pack())
FixedLength(fixed=[1, 2, 3, 4, 5])

The values stored in fixed-length arrays can also be classes decorated with dataclass_struct.

from typing import Annotated

@dcs.dataclass_struct()
class Vector2d:
    x: float
    y: float

@dcs.dataclass_struct()
class FixedLength:
    fixed: Annotated[list[Vector2d], 3]
>>> FixedLength.from_packed(FixedLength([Vector2d(1.0, 2.0), Vector2d(3.0, 4.0), Vector2d(5.0, 6.0)]).pack())
FixedLength(fixed=[Vector2d(x=1.0, y=2.0), Vector2d(x=3.0, y=4.0), Vector2d(x=5.0, y=6.0)])

Fixed-length arrays can also be multi-dimensional by nesting Annotated list types.

from typing import Annotated

@dcs.dataclass_struct()
class TwoDimArray:
    fixed: Annotated[list[Annotated[list[int], 2]], 3]
>>> TwoDimArray.from_packed(TwoDimArray([[1, 2], [3, 4], [5, 6]]).pack())
TwoDimArray(fixed=[[1, 2], [3, 4], [5, 6]])

As with nested structs, a default_factory must be used to set a default value. For example:

from dataclasses import field
from typing import Annotated

@dcs.dataclass_struct()
class DefaultArray:
    x: Annotated[list[int], 3] = field(default_factory=lambda: [1, 2, 3])

The length of the returned default value, along with the type and values of its items, will be validated unless validate_defaults=False is passed to the dataclass_struct decorator.
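
For instance, a factory returning a list of the wrong length would be expected to raise a ValueError during class creation, as with other invalid defaults (a sketch):

from dataclasses import field
from typing import Annotated

# Expected to raise ValueError: the factory returns 2 items where 3 are required
@dcs.dataclass_struct()
class BadDefault:
    x: Annotated[list[int], 3] = field(default_factory=lambda: [1, 2])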

Manual padding

Padding can be manually controlled by annotating a type with PadBefore or PadAfter:

@dcs.dataclass_struct()
class WithPadding:
    # 4 padding bytes will be added before this field
    pad_before: Annotated[int, dcs.PadBefore(4)]

    # 2 padding bytes will be added after this field
    pad_after: Annotated[int, dcs.PadAfter(2)]

    # 3 padding bytes will be added before this field and 2 after
    pad_before_and_after: Annotated[int, dcs.PadBefore(3), dcs.PadAfter(2)]

A b'\x00' will be inserted into the packed representation for each padding byte.

>>> padded = WithPadding(100, 200, 300)
>>> packed = padded.pack()
>>> packed
b'\x00\x00\x00\x00d\x00\x00\x00\xc8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,\x01\x00\x00\x00\x00'
>>> WithPadding.from_packed(packed)
WithPadding(pad_before=100, pad_after=200, pad_before_and_after=300)

Type checking

Mypy

To work correctly with mypy, a plugin is required; add the following to your mypy.ini:

[mypy]
plugins = dataclasses_struct.ext.mypy_plugin

Pyright/Pylance

Due to current limitations, Microsoft's Pylance extension for Visual Studio Code and its open-source core, Pyright, will report an attribute access error on the pack and from_packed methods:

import dataclasses_struct as dcs

@dcs.dataclass_struct()
class Test:
    x: int

t = Test(10)
t.pack()
# pyright error: Cannot access attribute "pack" for class "Test"

A fix for this is planned in the future. As a workaround in the meantime, you can add stubs for the generated functions and attribute to the class:

from typing import ClassVar, TYPE_CHECKING
from collections.abc import Buffer  # import from typing_extensions on Python <3.12
import dataclasses_struct as dcs

@dcs.dataclass_struct()
class Test:
    x: int

    if TYPE_CHECKING:

        __dataclass_struct__: ClassVar[dcs.DataclassStructInternal]

        def pack(self) -> bytes: ...

        @classmethod
        def from_packed(cls, data: Buffer) -> "Test": ...

The DataclassStructProtocol class can be used as a type hint wherever a value that supports packing/unpacking is required. E.g.

def pack_dataclass_struct_to_file(path: str, struct: dcs.DataclassStructProtocol):
    data = struct.pack()
    with open(path, "wb") as f:
        f.write(data)

pack_dataclass_struct_to_file("test.bin", Test(x=12))