Jax Archive
Jax (or Jenny's Archive eXchange) is an awful backronym and a simple archive format intended to be easy and cheap (in terms of processing power) to parse. Structurally, it is similar to `tar`, while not being nearly as wasteful. Instead of fixed-length fields, it uses Pascal-style strings for archive names and descriptions, as well as an additional 'descriptor' field. The 'descriptor' is a simple key/value style field for additional metadata.
Magic Number
Jax-format archives all begin with the bytes `5E 6A 61 78` (`^jax` in ASCII).
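As a minimal sketch, checking for the magic number might look like the following. The names `JAX_MAGIC` and `has_jax_magic` are hypothetical and not part of the format.

```python
import io

# The four magic bytes 5E 6A 61 78, i.e. "^jax" in ASCII.
JAX_MAGIC = bytes([0x5E, 0x6A, 0x61, 0x78])


def has_jax_magic(stream):
    """Return True if the next four bytes of `stream` are the Jax magic number.

    Hypothetical helper: it consumes up to four bytes from the stream.
    """
    return stream.read(len(JAX_MAGIC)) == JAX_MAGIC


print(has_jax_magic(io.BytesIO(b"^jax...")))  # a stream that starts with the magic
```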
Records
Jax archives are structured as follows:
- Magic number
- Record for File 1
- Data for File 1
- Record for File 2
- Data for File 2
- ...
Records are not aligned to any particular byte boundary and neither are the files within the archive, in the interest of creating smaller archives without the need for compression.
Records are structured as follows. Note that a `char` is an 8-bit unsigned integer, a `long` is a 64-bit signed integer, a `ulong` is a 64-bit unsigned integer, and a `ushort` is a 16-bit unsigned integer.
Type | Name | Details |
---|---|---|
`char` | Record Type | `d` for directory, `s` for symbolic link, and `r` for a regular file |
`string` | Record Name | Pascal-style, see Strings |
`string` | Description | Also Pascal-style, see Descriptors for parsing information |
`ulong` | Timestamp | UNIX timestamp of last change: seconds since 00:00:00 UTC on January 1, 1970 |
`ushort` | File Mode | UNIX file mode, `644` (octal) by default |
`ushort` | File Owner | UNIX file owner ID, `0` by default |
`ushort` | File Group | UNIX file group ID, `0` by default |
`long` | Size | File size, in bytes |
After the record, the file's content begins. If you just want to read the next record, you can skip `Size` bytes ahead and begin reading from there. Note that the archive's magic number is only seen once at the very beginning and does not repeat per-record.
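The record layout and the skip-ahead behavior can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the text above does not specify a byte order, so little-endian is assumed here, and string lengths are assumed to count UTF-8 bytes. The names `Record`, `read_record`, and `list_records` are hypothetical.

```python
import io
import struct
from dataclasses import dataclass

MAGIC = b"^jax"
# ASSUMPTION: little-endian; the format description does not state a byte order.
# Fixed-size tail: ulong timestamp, ushort mode, ushort owner, ushort group, long size.
TAIL = struct.Struct("<QHHHq")


@dataclass
class Record:
    type: str
    name: str
    description: str
    timestamp: int
    mode: int
    owner: int
    group: int
    size: int


def _read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated archive")
    return data


def _read_string(stream):
    # ASSUMPTION: the 16-bit length prefix counts bytes of UTF-8 data.
    (length,) = struct.unpack("<H", _read_exact(stream, 2))
    return _read_exact(stream, length).decode("utf-8")


def read_record(stream):
    """Read one record header, or return None at a clean end of archive."""
    first = stream.read(1)
    if not first:
        return None
    rtype = first.decode("ascii")
    if rtype not in {"d", "s", "r"}:
        raise ValueError(f"unknown record type {rtype!r}")
    name = _read_string(stream)
    description = _read_string(stream)
    fields = TAIL.unpack(_read_exact(stream, TAIL.size))
    return Record(rtype, name, description, *fields)


def list_records(stream):
    """Check the magic number once, then read headers while skipping file data."""
    if _read_exact(stream, len(MAGIC)) != MAGIC:
        raise ValueError("not a Jax archive")
    records = []
    while (record := read_record(stream)) is not None:
        if record.size < 0:
            raise ValueError("negative size")
        records.append(record)
        stream.seek(record.size, io.SEEK_CUR)  # skip `Size` bytes of file data
    return records
```

Because the stream is only advanced with `seek`, listing an archive this way never loads file contents into memory.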
Strings
Strings are Pascal-style, meaning they are prefixed by their length. The length is encoded as an unsigned 16-bit integer (`ushort`). Strings must be encoded as UTF-8.
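A minimal sketch of encoding and decoding such a string is shown below. Two details are assumptions, since the text does not state them: the length prefix is taken to count UTF-8 bytes rather than characters, and it is written little-endian. The function names are hypothetical.

```python
import struct


def encode_string(text):
    """Encode `text` as a 16-bit length prefix followed by UTF-8 data."""
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("string does not fit in a 16-bit length prefix")
    # ASSUMPTION: little-endian length that counts bytes, not characters.
    return struct.pack("<H", len(data)) + data


def decode_string(buf, offset=0):
    """Decode a string at `offset`; return (text, offset just past the string)."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string")
    return buf[start:end].decode("utf-8"), end


encoded = encode_string("héllo")  # 'é' takes two bytes in UTF-8, so the length is 6
print(encoded, decode_string(encoded))
```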
Descriptors
Descriptors are stored as Pascal-style strings, but they have a strict format. Thankfully, the format is fairly easy to both assemble and parse.
Descriptors are an extensible key/value format for extra file metadata. They look like this:
key1=value1;key2=value2;bool1;bool2=false
All keys and values are strings. Key/value pairs are separated by the semicolon (`;`, U+003B). The end of a key and the beginning of a value is denoted by the equals sign (`=`, U+003D).
Key/value pairs may be missing the value (see `bool1` above). In this case, it must be assumed that the value is `true`. A pair must contain at least the key.
A trailing semicolon is not required. All of the following are valid:
key1=value1;key2=value2;bool1;bool2=false;
key1=value1;key2=value2;bool1=true;bool2=false;
key1=value1;key2=value2;bool1;bool2=false
key1=value1;key2=value2;bool1=true;bool2=false
To store a semicolon (`;`) or equals sign (`=`) in a key or value, encode it as a URL entity: `%3b` for semicolons and `%3e` for equals signs. These sequences are case sensitive. Note that no other URL entities are to be parsed or encoded.
The following is an example of an encoded equals sign in a value:
einstein-relativity=e%3emc^2;
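The parsing rules above can be sketched as a small function. This is one possible reading, not an official parser: it represents a missing value as the string `"true"` (since all values are strings), skips the empty segment left by a trailing semicolon, and decodes only the two lowercase escape sequences given above. The name `parse_descriptor` is hypothetical.

```python
def _unescape(text):
    # Only these two exact, lowercase sequences are decoded; nothing else is touched.
    return text.replace("%3b", ";").replace("%3e", "=")


def parse_descriptor(descriptor):
    """Parse a descriptor string into a dict of string keys to string values."""
    result = {}
    for pair in descriptor.split(";"):
        if not pair:
            continue  # empty segment, e.g. after a trailing semicolon
        key, sep, value = pair.partition("=")
        if not key:
            raise ValueError("a pair must contain at least the key")
        # A bare key with no '=' is treated as the value "true".
        result[_unescape(key)] = _unescape(value) if sep else "true"
    return result


print(parse_descriptor("key1=value1;key2=value2;bool1;bool2=false;"))
print(parse_descriptor("einstein-relativity=e%3emc^2;"))
```

Splitting on the raw separators before unescaping is what lets an encoded `;` or `=` survive inside a key or value.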
By storing information in a simple dictionary, the Jax archive format is extensible with up to 65,535 bytes of additional metadata (the largest length a 16-bit string prefix can express). Consider the following representation of a partial Jax entry (in pseudocode):
[ type = 'r',
name = 'Test File.001',
description = ['nextFile': 'Test File.002',
'ownerName': 'John Doe',
'groupName': 'wheel' ],
size = 512000,
uid = 6756,
group = 12 ]
This example is by no means part of the Jax standard, but it demonstrates a potential use case for the extensible key/value store.
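To make the example concrete, here is a sketch of how the `description` dictionary above could be serialized into a descriptor string. The helper name `build_descriptor` is hypothetical, and writing every value explicitly (rather than using the bare-key shorthand) is a choice made here for simplicity.

```python
def _escape(text):
    # Encode the two reserved characters using the sequences defined for descriptors.
    return text.replace(";", "%3b").replace("=", "%3e")


def build_descriptor(fields):
    """Serialize a dict of metadata into a descriptor string."""
    parts = []
    for key, value in fields.items():
        if not key:
            raise ValueError("a pair must contain at least the key")
        parts.append(f"{_escape(key)}={_escape(str(value))}")
    descriptor = ";".join(parts)
    # The descriptor is stored as a string with a 16-bit length prefix.
    if len(descriptor.encode("utf-8")) > 0xFFFF:
        raise ValueError("descriptor exceeds 65,535 bytes")
    return descriptor


description = {
    "nextFile": "Test File.002",
    "ownerName": "John Doe",
    "groupName": "wheel",
}
print(build_descriptor(description))
# prints: nextFile=Test File.002;ownerName=John Doe;groupName=wheel
```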
Utilities
There is no official implementation of Jax yet. The cenix project contains a Jax implementation and utility capable of reading and writing Jax archives. It is available on GitHub.