APX VM 2.0 (draft)
This document describes the instruction set of the APX Virtual Machine (APX VM) v2.0, a special-purpose VM whose sole purpose is data serialization.
The VM can execute in two different modes:
- Pack — Serialize an in-memory data structure (variant type) into a byte-array.
- Unpack — Deserialize a byte-array back into an in-memory data structure.
The data structure is assumed to be compatible with how most scripting languages already represent variables in memory:
- Scalar (int, string, boolean)
- Array (each value can be a scalar, array, or record)
- Record/Struct (each value can be a scalar, array, or record)
In many programming languages, this way of storing data is called a variant or variant type. Its representation is compatible with JSON.
Different APX implementations use different methods of data representation.
Examples:
Language | Data Representation |
---|---|
C | dtl_type |
C++ | dtl (for C++) |
Python | Python native types (int, list, dict etc.) |
VBA | Variant, Scripting.Dictionary |
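As a rough illustration of the Python row above, a data element made of scalars, an array, and nested records could be held in native types as follows. The field names are invented for this example and do not come from any APX definition.

```python
import json

# Hypothetical data element: a record holding scalars and an array of records.
vehicle_status = {
    "speed": 87,            # scalar (int)
    "name": "demo",         # scalar (string)
    "engine_on": True,      # scalar (boolean)
    "wheels": [             # array whose values are records
        {"pressure": 220, "ok": True},
        {"pressure": 215, "ok": True},
    ],
}

# The structure is JSON-compatible: it survives a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(vehicle_status))
```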
APX Programs
An APX program is a simple byte array containing serialized (binary) instructions that describe how to pack or unpack a data element in a port. Programs are compiled once and then executed many times by the APX Virtual Machine (APX VM) when needed.
Program encoding
APX uses a variable-length program encoding where each instruction always starts with a 1-byte instruction header. Depending on the instruction header, additional data (usually array lengths) is encoded directly after the instruction header. The next instruction starts immediately after the end of the previous instruction (no padding, no alignment).
Before the list of instructions starts, there is a small program header which tells whether this is a pack or unpack program (with some additional information).
Program header
All programs start with a 10-byte program header. The instructions of the program directly follow the program header.
Byte | Description | Value |
---|---|---|
Byte 0 | Magic Number 0 (letter ‘A’) | 0x41 |
Byte 1 | Magic Number 1 (letter ‘P’) | 0x50 |
Byte 2 | Magic Number 2 (letter ‘X’) | 0x58 |
Byte 3 | APX VM major version number | 2 |
Byte 4 | APX VM minor version number | 0 |
Byte 5 (high nibble) | Program Flags | 0..15 |
Byte 5 (low nibble) | Program Type (PACK or UNPACK) | 0..1 |
Byte 6-9 (little endian) | Maximum expected data size | 0..(2^32)-1 |
Program Flags
Flag | Name | Description |
---|---|---|
0x10 | DYNAMIC_DATA | Active if there are dynamic arrays anywhere in the program |
0x20 | QUEUED_DATA | Active if this is a queued port |
0x40 | RESERVED | |
0x80 | RESERVED | |
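The header layout and flag masks above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function names are hypothetical, and because the table does not say which of the values 0 and 1 means PACK or UNPACK, the program type is treated as a raw integer.

```python
import struct

MAGIC = b"APX"            # bytes 0-2: 0x41 0x50 0x58
HEADER_SIZE = 10
FLAG_DYNAMIC_DATA = 0x10  # masks taken from the Program Flags table
FLAG_QUEUED_DATA = 0x20

def encode_header(program_type, flags, max_data_size, major=2, minor=0):
    """Build the 10-byte program header (hypothetical helper)."""
    if program_type not in (0, 1):
        raise ValueError("program type must be 0 or 1")
    if flags & ~0xF0:
        raise ValueError("flags must fit in the high nibble of byte 5")
    # '<' selects little endian without padding: 3 single bytes + one uint32.
    return MAGIC + struct.pack("<BBBI", major, minor, flags | program_type, max_data_size)

def decode_header(program):
    """Split the header into its fields (hypothetical helper)."""
    if len(program) < HEADER_SIZE or program[:3] != MAGIC:
        raise ValueError("not an APX program")
    major, minor, byte5, max_data_size = struct.unpack_from("<BBBI", program, 3)
    return {
        "version": (major, minor),
        "flags": byte5 & 0xF0,         # high nibble
        "program_type": byte5 & 0x0F,  # low nibble
        "max_data_size": max_data_size,
    }

header = encode_header(program_type=1, flags=FLAG_DYNAMIC_DATA, max_data_size=300)
fields = decode_header(header)
```

Instructions would start at offset `HEADER_SIZE` of the program.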
Instruction set
Each instruction starts with a one-byte header. For most instructions the header is the entire instruction; some carry additional data after it. There are currently 6 kinds of instructions (called opcodes), and each opcode comes with up to 16 different variants. Finally, there is a context-dependent flag bit which can be set. In total, this gives up to 6*16*2 = 192 different instruction choices.
Instruction Header Format:
bit 7 | bits 3-6 | bits 0-2 |
---|---|---|
1 flag bit | 4 variant bits | 3 opcode bits |
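The bit layout above maps directly onto shifts and masks. The following sketch uses hypothetical function names and shows one way to compose and split an instruction header byte.

```python
def encode_instruction_header(opcode, variant, flag=False):
    """Compose the header byte: bit 7 = flag, bits 3-6 = variant, bits 0-2 = opcode."""
    if not 0 <= opcode <= 7:
        raise ValueError("opcode must fit in 3 bits")
    if not 0 <= variant <= 15:
        raise ValueError("variant must fit in 4 bits")
    return (0x80 if flag else 0x00) | (variant << 3) | opcode

def decode_instruction_header(byte):
    """Return (opcode, variant, flag) for one header byte."""
    return byte & 0x07, (byte >> 3) & 0x0F, bool(byte & 0x80)

# Opcode 1 (PACK), variant 2 (UINT32), flag bit set.
header_byte = encode_instruction_header(opcode=1, variant=2, flag=True)
```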
Opcode 0 — UNPACK
Unpacks data from the input buffer. If the flag bit is set, the data is of array type and the next instruction must be an ARRAY instruction.
Variant ID | Data Type | Data Size (bytes) | Description |
---|---|---|---|
0 | UINT8 | 1 | 8-bit unsigned integer |
1 | UINT16 | 2 | 16-bit unsigned integer |
2 | UINT32 | 4 | 32-bit unsigned integer |
3 | UINT64 | 8 | 64-bit unsigned integer |
4 | SINT8 | 1 | 8-bit signed integer |
5 | SINT16 | 2 | 16-bit signed integer |
6 | SINT32 | 4 | 32-bit signed integer |
7 | SINT64 | 8 | 64-bit signed integer |
8 | ARRAY | N/A | Array value (usually array of records) |
9 | RECORD | N/A | Record value |
10 | BOOL | 1 | Boolean value |
11 | BYTES | 1 | A Bytes object (blob) |
12 | STR | 1 | String |
13 | RESERVED | 1 | Null-terminated string |
14 | RESERVED | | |
15 | RESERVED | | |
Opcode 1 — PACK
Packs data into the output buffer. The variants and the use of the flag bit are identical to those of the UNPACK instruction.
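For the integer variants (0 to 7), the data sizes in the variant table correspond to fixed-width formats of Python's `struct` module. The sketch below assumes little-endian byte order in the data buffer, which this section does not state, and uses hypothetical function names.

```python
import struct

# Variant ID -> struct format character for integer variants 0..7.
INT_FORMATS = {
    0: "B", 1: "H", 2: "I", 3: "Q",  # UINT8, UINT16, UINT32, UINT64
    4: "b", 5: "h", 6: "i", 7: "q",  # SINT8, SINT16, SINT32, SINT64
}

def pack_integer(variant, value):
    """Serialize one integer (assumption: little-endian data buffer)."""
    return struct.pack("<" + INT_FORMATS[variant], value)

def unpack_integer(variant, data, offset=0):
    """Deserialize one integer; returns (value, offset of the next item)."""
    fmt = "<" + INT_FORMATS[variant]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)
```

`struct.pack` raises `struct.error` when a value does not fit the chosen width, which gives a basic range check for free.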
Opcode 2 — DATA_SIZE
Describes a data size (array or element). Array sizes describe the length of arrays, while element sizes are used only together with queued ports. This instruction encodes an actual integer into the program. The type and size of the integer are given by the instruction header (see the variant table below). The VM uses this information to decode the integer, which is stored in the byte (or bytes) directly following the header.
ARRAY_SIZE: If the flag bit is 0, the array is of fixed length. If the flag bit is 1, the array is dynamic. In that case the current array length is read from the data buffer, while the maximum array length is read from the next program byte (or bytes).
ELEMENT_SIZE: Flag is not yet in use.
Variant ID | Data Type | Integer Size (bytes) | Description |
---|---|---|---|
0 | ARRAY_SIZE_U8 | 1 | Used when array length is 0-255 |
1 | ARRAY_SIZE_U16 | 2 | Used when array length is 256-65535 |
2 | ARRAY_SIZE_U32 | 4 | Used when array length is bigger than 65535 |
3 | ELEMENT_SIZE_U8 | 1 | Used when element size is 0-255 bytes |
4 | ELEMENT_SIZE_U16 | 2 | Used when element size is 256-65535 bytes |
5 | ELEMENT_SIZE_U32 | 4 | Used when element size is bigger than 65535 bytes |
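A sketch of how the ARRAY_SIZE variants might be selected and decoded is given below. It assumes the integer after the header is stored little endian (as the program header's size field is), and the function names are hypothetical.

```python
import struct

OPCODE_DATA_SIZE = 2
# ARRAY_SIZE variants 0..2 -> struct format (assumption: little endian).
ARRAY_SIZE_FORMATS = {0: "<B", 1: "<H", 2: "<I"}

def encode_array_size(length, dynamic=False):
    """Pick the smallest ARRAY_SIZE variant that can hold the length."""
    if length < 0 or length > 0xFFFFFFFF:
        raise ValueError("array length out of range")
    variant = 0 if length <= 0xFF else 1 if length <= 0xFFFF else 2
    header = (0x80 if dynamic else 0x00) | (variant << 3) | OPCODE_DATA_SIZE
    return bytes([header]) + struct.pack(ARRAY_SIZE_FORMATS[variant], length)

def decode_array_size(program, offset):
    """Return (length, is_dynamic, offset of the next instruction)."""
    header = program[offset]
    opcode, variant = header & 0x07, (header >> 3) & 0x0F
    if opcode != OPCODE_DATA_SIZE or variant not in ARRAY_SIZE_FORMATS:
        raise ValueError("not an ARRAY_SIZE instruction")
    fmt = ARRAY_SIZE_FORMATS[variant]
    (length,) = struct.unpack_from(fmt, program, offset + 1)
    return length, bool(header & 0x80), offset + 1 + struct.calcsize(fmt)
```

For a dynamic array, the decoded value is the maximum length; the current length would come from the data buffer instead.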
Opcode 3 — DATA_CTRL
Controls how data is read/written.
Variant ID | Data Type | Bytes after header |
---|---|---|
0 | RECORD_SELECT | Variable |
1 | LIMIT_CHECK_U8 | 2 |
2 | LIMIT_CHECK_U16 | 4 |
3 | LIMIT_CHECK_U32 | 8 |
4 | LIMIT_CHECK_U64 | 16 |
5 | LIMIT_CHECK_S8 | 2 |
6 | LIMIT_CHECK_S16 | 4 |
7 | LIMIT_CHECK_S32 | 8 |
8 | LIMIT_CHECK_S64 | 16 |
Variant 0 — RECORD_SELECT
Selects the name of the next record member. The name is a null-terminated string encoded into the program, starting at the next byte and ending at the null terminator.
When the flag is 1, this is the last field of the record.
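The RECORD_SELECT encoding could be handled roughly as follows. The character encoding of member names is not specified here, so ASCII is an assumption, and the function names are hypothetical.

```python
OPCODE_DATA_CTRL = 3
VARIANT_RECORD_SELECT = 0

def encode_record_select(name, is_last=False):
    """Header byte, then the member name, then a null terminator."""
    header = (0x80 if is_last else 0x00) | (VARIANT_RECORD_SELECT << 3) | OPCODE_DATA_CTRL
    return bytes([header]) + name.encode("ascii") + b"\x00"  # assumption: ASCII names

def decode_record_select(program, offset):
    """Return (name, is_last_field, offset of the next instruction)."""
    header = program[offset]
    # Masking off the flag bit must leave opcode 3 with variant 0.
    if header & 0x7F != OPCODE_DATA_CTRL:
        raise ValueError("not a RECORD_SELECT instruction")
    end = program.index(b"\x00", offset + 1)  # raises ValueError if unterminated
    name = program[offset + 1:end].decode("ascii")
    return name, bool(header & 0x80), end + 1

# Two selectors back to back; the second one marks the last field.
selectors = encode_record_select("id") + encode_record_select("name", is_last=True)
```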
Variants 1-8 — LIMIT_CHECK
A limit check encodes two integers directly after the instruction header. The first integer is the lower limit while the second integer is the upper limit. The instruction variant determines both signedness and size of the integers (same encoding applies to both).
When the flag is 1, the data to which the limit check applies is an array and the limit check shall be applied to all elements.
In pack programs the limit check is encoded just before the actual pack instruction (to prevent illegal data from being serialized), while in unpack programs the limit check is encoded after the unpack instruction (data needs to be unpacked before it can be checked).
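A limit check could be decoded and applied along these lines. Two points are assumptions rather than statements from this section: the limits are stored little endian, and both limits are inclusive. The function names are hypothetical.

```python
import struct

OPCODE_DATA_CTRL = 3
# LIMIT_CHECK variants 1..8 -> struct format character of each limit.
LIMIT_FORMATS = {
    1: "B", 2: "H", 3: "I", 4: "Q",  # U8, U16, U32, U64
    5: "b", 6: "h", 7: "i", 8: "q",  # S8, S16, S32, S64
}

def decode_limit_check(program, offset):
    """Return (lower, upper, applies_to_array, offset of the next instruction)."""
    header = program[offset]
    opcode, variant = header & 0x07, (header >> 3) & 0x0F
    if opcode != OPCODE_DATA_CTRL or variant not in LIMIT_FORMATS:
        raise ValueError("not a LIMIT_CHECK instruction")
    fmt = "<" + LIMIT_FORMATS[variant] * 2  # lower limit first, then upper limit
    lower, upper = struct.unpack_from(fmt, program, offset + 1)
    return lower, upper, bool(header & 0x80), offset + 1 + struct.calcsize(fmt)

def check_limits(value, lower, upper, is_array=False):
    """Assumption: limits are inclusive; arrays are checked element by element."""
    values = value if is_array else [value]
    return all(lower <= v <= upper for v in values)

# LIMIT_CHECK_S8 (variant 5) with limits -10..10, flag bit clear.
instruction = bytes([(5 << 3) | OPCODE_DATA_CTRL]) + struct.pack("<bb", -10, 10)
```

The total instruction length (1 header byte plus two limits) matches the "Bytes after header" column: 2 bytes for the 8-bit variants up to 16 bytes for the 64-bit ones.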
Opcode 4 — FLOW_CTRL
Controls program flow.
Variant ID | Data Type |
---|---|
0 | ARRAY_NEXT |
Variant 0 — ARRAY_NEXT
Moves to the next array element. This is primarily used when the data type is an array of records (array of structs).
Opcode 5 — UNPACK2
Reserved for 16 future data types.
Opcode 6 — PACK2
Reserved for 16 future data types.
Opcode 7 — RESERVED
Reserved for future use.