# Serialization Specification _NOTE_: This specification is primarily defined in the context of Rust, but aims to be implementable across different programming languages. ## Definitions - **Variant**: A specific constructor or case of an enum type. - **Variant Payload**: The associated data of a specific enum variant. - **Discriminant**: A unique identifier for an enum variant, typically represented as an integer. - **Basic Types**: Primitive types that have a direct, well-defined binary representation. ## Endianness By default, this serialization format uses little-endian byte order for basic numeric types. This means multi-byte values are encoded with their least significant byte first. Endianness can be configured with the following methods, allowing for big-endian serialization when required: - [`with_big_endian`](https://docs.rs/bincode/2.0.0/bincode/config/struct.Configuration.html#method.with_big_endian) - [`with_little_endian`](https://docs.rs/bincode/2.0.0/bincode/config/struct.Configuration.html#method.with_little_endian) ### Byte Order Considerations - Multi-byte values (integers, floats) are affected by endianness - Single-byte values (u8, i8) are not affected - Struct and collection serialization order is not changed by endianness ## Basic Types ### Boolean Encoding - Encoded as a single byte - `false` is represented by `0` - `true` is represented by `1` - During deserialization, values other than 0 and 1 will result in an error [`DecodeError::InvalidBooleanValue`](https://docs.rs/bincode/2.0.0/bincode/error/enum.DecodeError.html#variant.InvalidBooleanValue) ### Numeric Types - Encoded based on the configured [IntEncoding](#intencoding) - Signed integers use 2's complement representation - Floating point types use IEEE 754-2008 standard - `f32`: 4 bytes (binary32) - `f64`: 8 bytes (binary64) #### Floating Point Special Values - Subnormal numbers are preserved - Also known as denormalized numbers - Maintain their exact bit representation - `NaN` values are preserved - Both quiet and signaling `NaN` are kept as-is - Bit pattern of `NaN` is maintained exactly - No normalization or transformation of special values occurs - Serialization and deserialization do not alter the bit-level representation - Consistent with IEEE 754-2008 standard for floating-point arithmetic ### Character Encoding - `char` is encoded as a 32-bit unsigned integer representing its Unicode Scalar Value - Valid Unicode Scalar Value range: - 0x0000 to 0xD7FF (Basic Multilingual Plane) - 0xE000 to 0x10FFFF (Supplementary Planes) - Surrogate code points (0xD800 to 0xDFFF) are not valid - Invalid Unicode characters can be acquired via unsafe code, this is handled as: - during serialization: data is written as-is - during deserialization: an error is raised [`DecodeError::InvalidCharEncoding`](https://docs.rs/bincode/2.0.0/bincode/error/enum.DecodeError.html#variant.InvalidCharEncoding) - No additional metadata or encoding scheme beyond the raw code point value All tuples have no additional bytes, and are encoded in their specified order, e.g. ```rust let tuple = (u32::min_value(), i32::max_value()); // 8 bytes let encoded = bincode::encode_to_vec(tuple, bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 0, 0, 0, 0, // 4 bytes for first type: u32 255, 255, 255, 127 // 4 bytes for second type: i32 ]); ``` ## IntEncoding Bincode currently supports 2 different types of `IntEncoding`. With the default config, `VarintEncoding` is selected. ### VarintEncoding Encoding an unsigned integer v (of any type excepting u8/i8) works as follows: 1. If `u < 251`, encode it as a single byte with that value. 1. If `251 <= u < 2**16`, encode it as a literal byte 251, followed by a u16 with value `u`. 1. If `2**16 <= u < 2**32`, encode it as a literal byte 252, followed by a u32 with value `u`. 1. If `2**32 <= u < 2**64`, encode it as a literal byte 253, followed by a u64 with value `u`. 1. If `2**64 <= u < 2**128`, encode it as a literal byte 254, followed by a u128 with value `u`. `usize` is being encoded/decoded as a `u64` and `isize` is being encoded/decoded as a `i64`. See the documentation of [VarintEncoding](https://docs.rs/bincode/2.0.0/bincode/config/struct.Configuration.html#method.with_variable_int_encoding) for more information. ### FixintEncoding - Fixed size integers are encoded directly - Enum discriminants are encoded as u32 - Lengths and usize are encoded as u64 See the documentation of [FixintEncoding](https://docs.rs/bincode/2.0.0/bincode/config/struct.Configuration.html#method.with_fixed_int_encoding) for more information. ## Enums Enums are encoded with their variant first, followed by optionally the variant fields. The variant index is based on the `IntEncoding` during serialization. Both named and unnamed fields are serialized with their values only, and therefore encode to the same value. ```rust #[derive(bincode::Encode)] pub enum SomeEnum { A, B(u32), C { value: u32 }, } // SomeEnum::A let encoded = bincode::encode_to_vec(SomeEnum::A, bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 0, 0, 0, 0, // first variant, A // no extra bytes because A has no fields ]); // SomeEnum::B(0) let encoded = bincode::encode_to_vec(SomeEnum::B(0), bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 1, 0, 0, 0, // second variant, B 0, 0, 0, 0 // B has 1 unnamed field, which is an u32, so 4 bytes ]); // SomeEnum::C { value: 0u32 } let encoded = bincode::encode_to_vec(SomeEnum::C { value: 0u32 }, bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 2, 0, 0, 0, // third variant, C 0, 0, 0, 0 // C has 1 named field which is a u32, so 4 bytes ]); ``` ### Options `Option` is always serialized using a single byte for the discriminant, even in `Fixint` encoding (which normally uses a `u32` for discriminant). ```rust let data: Option = Some(123); let encoded = bincode::encode_to_vec(data, bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 1, 123, 0, 0, 0 // the Some(..) tag is the leading 1 ]); let data: Option = None; let encoded = bincode::encode_to_vec(data, bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 0 // the None tag is simply 0 ]); ``` # Collections ## General Collection Serialization Collections are encoded with their length value first, followed by each entry of the collection. The length value is based on the configured `IntEncoding`. ### Serialization Considerations - Length is always serialized first - Entries are serialized in the order they are returned from the iterator implementation. - Iteration order depends on the collection type - Ordered collections (e.g., `Vec`): Iteration from lowest to highest index - Unordered collections (e.g., `HashMap`): Implementation-defined iteration order - Duplicate keys are not checked in bincode, but may be resulting in an error when decoding a container from a list of pairs. ### Handling of Specific Collection Types #### Linear Collections (`Vec`, Arrays, etc.) - Serialized by iterating from lowest to highest index - Length prefixed - Each item serialized sequentially ```rust let list = vec![0u8, 1u8, 2u8]; let encoded = bincode::encode_to_vec(list, bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 3, 0, 0, 0, 0, 0, 0, 0, // length of 3u64 0, // entry 0 1, // entry 1 2, // entry 2 ]); ``` #### Key-Value Collections (`HashMap`, etc.) - Serialized as a sequence of key-value pairs - Iteration order is implementation-defined - Each entry is a tuple of (key, value) ### Special Collection Considerations - Bincode will serialize the entries based on the iterator order. - Deserialization is deterministic but the collection implementation might not guarantee the same order as serialization. **Note**: Fixed-length arrays do not have their length encoded. See [Arrays](#arrays) for details. # String and &str ## Encoding Principles - Strings are encoded as UTF-8 byte sequences - No null terminator is added - No Byte Order Mark (BOM) is written - Unicode non-characters are preserved ### Encoding Details - Length is encoded first using the configured `IntEncoding` - Raw UTF-8 bytes follow the length - Supports the full range of valid UTF-8 sequences - `U+0000` and other code points can appear freely within the string ### Unicode Handling - During serialization, the string is encoded as a sequence of the given bytes. - Rust strings are UTF-8 encoded by default, but this is not enforced by bincode - No normalization or transformation of text - If an invalid UTF-8 sequence is encountered during decoding, an [`DecodeError::Utf8`](https://docs.rs/bincode/2.0.0/bincode/error/enum.DecodeError.html#variant.Utf8) error is raised ```rust let str = "Hello 🌍"; // Mixed ASCII and Unicode let encoded = bincode::encode_to_vec(str, bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 10, 0, 0, 0, 0, 0, 0, 0, // length of the string, 10 bytes b'H', b'e', b'l', b'l', b'o', b' ', 0xF0, 0x9F, 0x8C, 0x8D // UTF-8 encoded string ]); ``` ### Comparison with Other Types - Treated similarly to `Vec` in serialization - See [Collections](#collections) for more information about length and entry encoding # Arrays Array length is never encoded. Note that `&[T]` is encoded as a [Collection](#collections). ```rust let arr: [u8; 5] = [10, 20, 30, 40, 50]; let encoded = bincode::encode_to_vec(arr, bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 10, 20, 30, 40, 50, // the bytes ]); ``` This applies to any type `T` that implements `Encode`/`Decode` ```rust #[derive(bincode::Encode)] struct Foo { first: u8, second: u8 }; let arr: [Foo; 2] = [ Foo { first: 10, second: 20, }, Foo { first: 30, second: 40, }, ]; let encoded = bincode::encode_to_vec(&arr, bincode::config::legacy()).unwrap(); assert_eq!(encoded.as_slice(), &[ 10, 20, // First Foo 30, 40, // Second Foo ]); ``` ## TupleEncoding Tuple fields are serialized in first-to-last declaration order, with no additional metadata. - No length prefix is added - Fields are encoded sequentially - No padding or alignment adjustments are made - Order of serialization is deterministic and matches the tuple's declaration order ## StructEncoding Struct fields are serialized in first-to-last declaration order, with no metadata representing field names. - No length prefix is added - Fields are encoded sequentially - No padding or alignment adjustments are made - Order of serialization is deterministic and matches the struct's field declaration order - Both named and unnamed fields are serialized identically ## EnumEncoding Enum variants are encoded with a discriminant followed by optional variant payload. ### Discriminant Allocation - Discriminants are automatically assigned by the derive macro in declaration order - First variant starts at 0 - Subsequent variants increment by 1 - Explicit discriminant indices are currently not supported - Discriminant is always represented as a `u32` during serialization. See [Discriminant Representation](#discriminant-representation) for more details. - Maintains the original enum variant semantics during encoding ### Variant Payload Encoding - Tuple variants: Fields serialized in declaration order - Struct variants: Fields serialized in declaration order - Unit variants: No additional data encoded ### Discriminant Representation - Always encoded as a `u32` - Encoding method depends on the configured `IntEncoding` - `VarintEncoding`: Variable-length encoding - `FixintEncoding`: Fixed 4-byte representation ### Handling of Variant Payloads - Payload is serialized immediately after the discriminant - No additional metadata about field names or types - Payload structure matches the variant's definition