Guide
The dataclass_struct decorator
Use the dataclass_struct decorator to convert a class into a stdlib dataclass with struct packing/unpacking functionality:
def dataclass_struct(
    *,
    size: Literal["native", "std"] = "native",
    byteorder: Literal["native", "big", "little", "network"] = "native",
    validate_defaults: bool = True,
    **dataclass_kwargs,
):
    ...
The size argument can be either "native" (the default) or "std" and controls the size and alignment of fields:
size | byteorder | Notes |
---|---|---|
"native" | "native" | The default. Native alignment and padding. |
"std" | "native" | Standard integer sizes and system endianness, no alignment/padding. |
"std" | "little" | Standard integer sizes and little endian, no alignment/padding. |
"std" | "big" | Standard integer sizes and big endian, no alignment/padding. |
"std" | "network" | Equivalent to byteorder="big". |
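The byte order settings correspond to the mode prefixes of the stdlib struct module. The sketch below uses struct directly rather than this library, so the prefix the decorator emits for each combination is an assumption; it only illustrates how the byte orders differ for a 4-byte unsigned integer.

```python
import struct

value = 0x01020304

# Standard sizes with an explicit byte order: a 4-byte unsigned int.
little = struct.pack("<I", value)
big = struct.pack(">I", value)
network = struct.pack("!I", value)

print(little)          # b'\x04\x03\x02\x01'
print(big)             # b'\x01\x02\x03\x04'
print(network == big)  # True
```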
Decorated classes are transformed to a standard Python dataclass with boilerplate __init__, __repr__, __eq__ etc. auto-generated. The additional dataclass_kwargs keyword arguments will be passed through to the stdlib dataclass decorator: all standard keyword arguments are supported except for slots and weakref_slot.
In addition to the standard dataclass methods, two methods are added to the class:
- pack, which packs an instance of the class to bytes.
- from_packed, which is a class method that returns a new instance of the class from its packed representation in an object that implements the buffer protocol (bytes, bytearray, memory-mapped file objects etc.).
A class attribute named __dataclass_struct__ is also added (see Inspecting dataclass-structs).
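To illustrate what "an object that implements the buffer protocol" means in practice, here is a minimal sketch using the stdlib struct module rather than the generated pack and from_packed methods. The layout (a 16-bit signed integer followed by a 32-bit unsigned integer) is a hypothetical example.

```python
import struct

# Hypothetical layout: int16 followed by uint32, little endian, no padding.
layout = struct.Struct("<hI")

packed = layout.pack(-2, 70000)

# Unpacking accepts any bytes-like object, not only bytes.
from_bytes = layout.unpack(packed)
from_bytearray = layout.unpack(bytearray(packed))
from_memoryview = layout.unpack(memoryview(packed))

print(from_bytes)  # (-2, 70000)
```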
Default value validation
Default attribute values will be validated against their expected type and allowable value range. For example, an out-of-range default will raise a ValueError. This can be disabled by passing validate_defaults=False to the decorator.
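The allowable range of an integer field follows from its packed size. The sketch below probes those limits with the stdlib struct module; it is not the library's validation code, and the helper name fits is hypothetical. Note that struct itself raises struct.error, whereas the decorator is documented to raise ValueError.

```python
import struct


def fits(fmt: str, value: int) -> bool:
    """Return True if `value` can be packed with the single-field format `fmt`."""
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True


print(fits("=B", 255))   # True: largest unsigned 8-bit value
print(fits("=B", -1))    # False: unsigned fields cannot hold negatives
print(fits("=b", -128))  # True: smallest signed 8-bit value
print(fits("=b", 128))   # False: one past the largest signed 8-bit value
```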
Inspecting dataclass-structs
A class or object can be checked to see if it is a dataclass-struct using the is_dataclass_struct function.
The get_struct_size function will return the size in bytes of the packed representation of a dataclass-struct class or an instance of one.
An additional class attribute, __dataclass_struct__, is added to the decorated class that contains the packed size, struct format string, and struct mode.
>>> Test.__dataclass_struct__.size
234
>>> Test.__dataclass_struct__.format
'@cc??bBhHiIQqqNnPfdd100s4xqq2x3xq2x'
>>> Test.__dataclass_struct__.mode
'@'
Native size mode
In "native" mode (the default), the struct is packed based on the platform and compiler on which Python was built: padding bytes may be added to maintain proper alignment of the fields and byte ordering (endianness) follows that of the platform. (The byteorder argument must also be "native".)
In "native" size mode, integer type sizes follow those of the standard C integer types of the platform (int, unsigned short etc.).
import dataclasses_struct as dcs

@dcs.dataclass_struct()
class NativeStruct:
    signed_char: dcs.SignedChar
    signed_short: dcs.Short
    unsigned_long_long: dcs.UnsignedLongLong
    void_pointer: dcs.Pointer
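The effect of native alignment can be seen with the stdlib struct module. This is a sketch of the underlying behaviour, not of the decorator itself, and the exact native size depends on the platform, so no specific value is claimed for it.

```python
import struct

# A signed char followed by a long long.
native = struct.calcsize("@bq")    # may include padding before the long long
standard = struct.calcsize("=bq")  # always 1 + 8 bytes, no padding

print(standard)            # 9
print(native >= standard)  # True
```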
Standard size mode
In "std" mode, the struct is packed without any additional padding for alignment.
The "std" size mode supports four different byteorder values: "native" (the default), "little", "big", and "network". The "native" setting uses the system byte order (similar to "native" size mode, but without alignment). The "network" setting is equivalent to "big".
The "std" size uses platform-independent integer sizes, similar to using the integer types from stdint.h in C. When used with byteorder set to "little", "big", or "network", it is appropriate for marshalling data across different platforms.
@dcs.dataclass_struct(size="std", byteorder="native")
class NativeStruct:
    int8_t: dcs.I8
    uint64_t: dcs.U64
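As a rough picture of why an explicit byte order suits marshalling, the following sketch packs an int8_t and a uint64_t with the stdlib struct module. It does not use the decorator, and the little-endian format string is chosen for illustration only.

```python
import struct

# int8_t followed by uint64_t, little endian, no padding.
packed = struct.pack("<bQ", -1, 1)

print(len(packed))  # 9
print(packed)       # b'\xff\x01\x00\x00\x00\x00\x00\x00\x00'

# The same bytes decode to the same values on any host platform.
print(struct.unpack("<bQ", packed))  # (-1, 1)
```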
Supported type annotations
See the reference page for the complete list of type annotations.
Native integer types
These types are only supported in "native" size mode. Their native Python types are all int.
Type annotation | Equivalent C type |
---|---|
SignedChar | signed char |
UnsignedChar | unsigned char |
Short | short |
UnsignedShort | unsigned short |
Int | int |
int (builtin type, alias to Int) | int |
UnsignedInt | unsigned int |
Long | long |
UnsignedLong | unsigned long |
LongLong | long long |
UnsignedLongLong | unsigned long long |
UnsignedSize | size_t |
SignedSize | ssize_t (POSIX extension) |
Pointer | void * |
Standard integer types
These types are only supported in "std" size mode. Their native Python types are all int.
Type annotation | Equivalent C type |
---|---|
I8 | int8_t |
U8 | uint8_t |
I16 | int16_t |
U16 | uint16_t |
I32 | int32_t |
U32 | uint32_t |
I64 | int64_t |
U64 | uint64_t |
Floating point types
Supported in both size modes. The native Python type is float.
Type annotation | Equivalent C type |
---|---|
F16 | Extension type (see below) |
F32 | float |
F64 | double |
float (builtin alias to F64) | double |
F16 is a half precision floating point. Some compilers provide support for it on certain platforms (e.g. GCC, Clang). It is also available as std::float16_t in C++23.
Note that floating point fields are always packed and unpacked using the IEEE 754 format, regardless of the underlying format used by the platform.
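The three precisions can be compared with the stdlib struct module, whose e, f and d format characters are IEEE 754 binary16, binary32 and binary64. That F16, F32 and F64 map onto these characters is an assumption; the sketch only shows the sizes and the precision loss of a half.

```python
import struct

half = struct.pack("<e", 1.0)    # IEEE 754 binary16
single = struct.pack("<f", 1.0)  # IEEE 754 binary32
double = struct.pack("<d", 1.0)  # IEEE 754 binary64

print(len(half), len(single), len(double))  # 2 4 8

# A half cannot represent 0.1 exactly, so a round trip changes the value.
(roundtrip,) = struct.unpack("<e", struct.pack("<e", 0.1))
print(roundtrip == 0.1)  # False
```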
Boolean
The builtin bool type or dataclasses_struct.Bool type can be used to represent a boolean, which uses a single byte in either native or standard size modes.
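A single-byte boolean behaves as follows in the stdlib struct module. This is a sketch of the underlying format character only; it assumes the library uses the ? format, which is consistent with the format string shown under Inspecting dataclass-structs.

```python
import struct

print(struct.calcsize("?"))         # 1
print(struct.pack("?", True))       # b'\x01'
print(struct.pack("?", False))      # b'\x00'

# When unpacking, any non-zero byte is treated as True.
print(struct.unpack("?", b"\x05"))  # (True,)
```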
Nested structs
Classes decorated with dataclass_struct can be used as fields in other classes, as long as they have the same size and byteorder settings.
@dcs.dataclass_struct()
class Vector2d:
    x: float
    y: float

@dcs.dataclass_struct()
class Vectors:
    direction: Vector2d
    velocity: Vector2d

# Will raise TypeError:
@dcs.dataclass_struct(size="std")
class VectorsStd:
    direction: Vector2d
    velocity: Vector2d
Default values for nested class fields cannot be set directly, as Python doesn't allow using mutable default values in dataclasses. To get around this, pass frozen=True to the inner class' dataclass_struct decorator. Alternatively, pass a zero-argument callable that returns an instance of the class to the default_factory keyword argument of dataclasses.field.
For example:
from dataclasses import field

@dcs.dataclass_struct()
class VectorsStd:
    direction: Vector2d
    velocity: Vector2d = field(default_factory=lambda: Vector2d(0, 0))
The return type of the default_factory will be validated unless validate_defaults=False is passed to the dataclass_struct decorator. Note that this means the callable passed to default_factory will be called once during class creation.
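Both approaches rest on ordinary stdlib dataclass behaviour, which can be sketched without this library. The class names below are hypothetical and the plain dataclass decorator stands in for dataclass_struct, so no packing or default validation takes place here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenVector:
    x: float
    y: float


@dataclass
class MutableVector:
    x: float
    y: float


@dataclass
class Vectors:
    # A frozen (hashable) instance can be used directly as a default.
    direction: FrozenVector = FrozenVector(0.0, 0.0)
    # A mutable instance is supplied through a factory instead.
    velocity: MutableVector = field(default_factory=lambda: MutableVector(0.0, 0.0))


a, b = Vectors(), Vectors()
print(a.direction is b.direction)  # True: the single default object is shared
print(a.velocity is b.velocity)    # False: the factory runs for each instance
```

Sharing one frozen instance is safe because it cannot be modified, while the factory gives every instance its own mutable object.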
Characters
In both size modes, a single byte can be packed by annotating a field with the builtin bytes type or the dataclasses_struct.Char type. The field's unpacked Python representation will be a bytes of length 1.
Bytes arrays
Fixed-length
Fixed-length byte arrays can be represented in both size modes by annotating a bytes field with typing.Annotated and a positive length. The field's unpacked Python representation will be a bytes object zero-padded or truncated to the specified length.
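Zero-padding and truncation are the behaviour of the s format character in the stdlib struct module, which the sketch below demonstrates directly. It does not use the decorator, and the lengths are chosen for illustration.

```python
import struct

# Shorter data is zero-padded up to the declared length.
padded = struct.pack("10s", b"Hello")
print(padded)  # b'Hello\x00\x00\x00\x00\x00'

# Longer data is truncated to the declared length.
truncated = struct.pack("5s", b"Hello, world")
print(truncated)  # b'Hello'

# Unpacking returns all bytes of the field, including the padding.
print(struct.unpack("10s", padded))  # (b'Hello\x00\x00\x00\x00\x00',)
```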
Tip: null-terminated strings
Fixed-length bytes arrays are truncated to the exact length specified in the Annotated argument. If you require bytes arrays to always be null-terminated (e.g. for passing to a C API), add a PadAfter annotation to the field (see Manual padding).
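The effect of one trailing padding byte can be sketched with the stdlib struct module, where x is a pad byte. That a PadAfter of one byte produces this layout is an assumption based on the Manual padding section; the sketch does not use the decorator.

```python
import struct

# A 10-byte field followed by one pad byte: 11 bytes in total.
layout = "=10sx"

# Even data that fills (or overflows) the field ends with a zero byte.
packed = struct.pack(layout, b"0123456789abc")
print(packed)       # b'0123456789\x00'
print(len(packed))  # 11
```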
Length-prefixed
One issue with fixed-length bytes arrays is that data shorter than the length will be zero-padded when unpacking to the Python type:
>>> packed = FixedLength(b'Hello').pack()
>>> packed
b'Hello\x00\x00\x00\x00\x00'
>>> FixedLength.from_packed(packed)
FixedLength(fixed=b'Hello\x00\x00\x00\x00\x00')
An alternative is to use length-prefixed arrays, also known as Pascal strings. These store the length of the array in the first byte, meaning that the available length without truncation is at most 255. To use length-prefixed arrays, annotate a bytes field with LengthPrefixed:
from typing import Annotated

@dcs.dataclass_struct()
class PascalStrings:
    s: Annotated[bytes, dcs.LengthPrefixed(10)]
The length passed to LengthPrefixed must be between 2 and 256 inclusive.
>>> packed = PascalStrings(b"Hello").pack()
>>> packed
b'\x05Hello\x00\x00\x00\x00'
>>> PascalStrings.from_packed(packed)
PascalStrings(s=b'Hello')
Note
The size passed to LengthPrefixed is the size of the packed representation of the field including the size byte, so the maximum length the array can be without truncation is one less than the size.
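The stdlib struct module provides Pascal strings through the p format character, which makes the size accounting easy to check. This sketch uses struct directly; that LengthPrefixed maps onto p is an assumption, although the packed output shown above matches it.

```python
import struct

# A 10-byte field: 1 length byte plus up to 9 bytes of data.
packed = struct.pack("10p", b"Hello")
print(packed)                       # b'\x05Hello\x00\x00\x00\x00'
print(struct.unpack("10p", packed)) # (b'Hello',)

# Data longer than size - 1 is truncated and the length byte reflects that.
short = struct.pack("4p", b"Hello")
print(short)                        # b'\x03Hel'
print(struct.unpack("4p", short))   # (b'Hel',)
```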
Fixed-length arrays
Fixed-length arrays can be represented by annotating a list field with typing.Annotated and a positive length.
from typing import Annotated

@dcs.dataclass_struct()
class FixedLength:
    fixed: Annotated[list[int], 5]
The values stored in fixed-length arrays can also be classes decorated with dataclass_struct.
from typing import Annotated

@dcs.dataclass_struct()
class Vector2d:
    x: float
    y: float

@dcs.dataclass_struct()
class FixedLength:
    fixed: Annotated[list[Vector2d], 3]
>>> FixedLength.from_packed(FixedLength([Vector2d(1.0, 2.0), Vector2d(3.0, 4.0), Vector2d(5.0, 6.0)]).pack())
FixedLength(fixed=[Vector2d(x=1.0, y=2.0), Vector2d(x=3.0, y=4.0), Vector2d(x=5.0, y=6.0)])
Fixed-length arrays can also be multi-dimensional by nesting Annotated list types.
from typing import Annotated

@dcs.dataclass_struct()
class TwoDimArray:
    fixed: Annotated[list[Annotated[list[int], 2]], 3]
>>> TwoDimArray.from_packed(TwoDimArray([[1, 2], [3, 4], [5, 6]]).pack())
TwoDimArray(fixed=[[1, 2], [3, 4], [5, 6]])
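One way to picture a multi-dimensional array in packed form is as a flat run of items that is regrouped on unpacking. The sketch below does this with the stdlib struct module; the row-by-row ordering and the 32-bit item size are assumptions made for illustration, not a statement of how the library lays out the data.

```python
import struct

rows, cols = 3, 2
array = [[1, 2], [3, 4], [5, 6]]
layout = f"<{rows * cols}i"  # six consecutive 32-bit ints, little endian

# Flatten row by row before packing.
flat = [item for row in array for item in row]
packed = struct.pack(layout, *flat)

# Unpack the flat tuple and regroup it into rows.
values = struct.unpack(layout, packed)
restored = [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]

print(len(packed))  # 24
print(restored)     # [[1, 2], [3, 4], [5, 6]]
```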
As with nested structs, a default_factory must be used to set a default value. For example:
from dataclasses import field
from typing import Annotated

@dcs.dataclass_struct()
class DefaultArray:
    x: Annotated[list[int], 3] = field(default_factory=lambda: [1, 2, 3])
The returned default value's length and type and values of its items will be validated unless validate_defaults=False is passed to the dataclass_struct decorator.
Manual padding
Padding can be manually controlled by annotating a type with PadBefore or PadAfter:
from typing import Annotated

@dcs.dataclass_struct()
class WithPadding:
    # 4 padding bytes will be added before this field
    pad_before: Annotated[int, dcs.PadBefore(4)]
    # 2 padding bytes will be added after this field
    pad_after: Annotated[int, dcs.PadAfter(2)]
    # 3 padding bytes will be added before this field and 2 after
    pad_before_and_after: Annotated[int, dcs.PadBefore(3), dcs.PadAfter(2)]
A b'\x00' will be inserted into the packed representation for each padding byte.
>>> padded = WithPadding(100, 200, 300)
>>> packed = padded.pack()
>>> packed
b'\x00\x00\x00\x00d\x00\x00\x00\xc8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,\x01\x00\x00\x00\x00'
>>> WithPadding.from_packed(packed)
WithPadding(pad_before=100, pad_after=200, pad_before_and_after=300)
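The same padding layout can be sketched with the stdlib struct module, where each x is one pad byte. This sketch uses standard sizes and little endian so that the result is the same on every platform; it is not the format string generated by the decorator, and in native mode the platform may add alignment bytes on top of the manual padding, which is consistent with the longer output shown above.

```python
import struct

# 4 pad bytes, int, int, 2 pad bytes, 3 pad bytes, int, 2 pad bytes.
layout = "<4xii2x3xi2x"

packed = struct.pack(layout, 100, 200, 300)
print(len(packed))  # 23

# Pad bytes are written as zeros and skipped when unpacking.
print(packed[:4])                     # b'\x00\x00\x00\x00'
print(struct.unpack(layout, packed))  # (100, 200, 300)
```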
Type checking
Mypy
To work correctly with mypy, an extension is required; add to your mypy.ini:
Pyright/Pylance
Due to current limitations, Microsoft's Pylance Visual Studio Code extension and its open-source core Pyright will report an attribute access error on the pack and from_packed methods:
import dataclasses_struct as dcs

@dcs.dataclass_struct()
class Test:
    x: int

t = Test(10)
t.pack()
# pyright error: Cannot access attribute "pack" for class "Test"
A fix for this is planned in the future. As a workaround in the meantime, you can add stubs for the generated functions and attribute to the class:
from typing import ClassVar, TYPE_CHECKING
from collections.abc import Buffer  # import from typing_extensions on Python <3.12

import dataclasses_struct as dcs

@dcs.dataclass_struct()
class Test:
    x: int

    if TYPE_CHECKING:
        __dataclass_struct__: ClassVar[dcs.DataclassStructInternal]

        def pack(self) -> bytes: ...

        @classmethod
        def from_packed(cls, data: Buffer) -> "Test": ...
The DataclassStructProtocol class can then be used as a type hint where packing/unpacking is required. E.g.