Jax Archive

From DisNCord Community Wiki

Jax (Jenny's Archive eXchange, an admittedly awful backronym) is a simple archive format intended to be easy and computationally cheap to parse. Structurally, it is similar to tar, but far less wasteful: instead of fixed-length fields, it uses Pascal-style strings for record names and descriptions. The description also serves as the 'descriptor', a simple key/value field for additional metadata.

Magic Number

All Jax-format archives begin with the four bytes 5E 6A 61 78 (hexadecimal), which spell ^jax in ASCII.
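As a minimal sketch, the magic-number check could look like the following in Python. The names `JAX_MAGIC` and `has_jax_magic` are made up for this illustration and are not part of any Jax implementation.

```python
import io

# The four magic bytes from the text: 5E 6A 61 78, i.e. b"^jax".
JAX_MAGIC = bytes([0x5E, 0x6A, 0x61, 0x78])


def has_jax_magic(stream):
    """Return True if the binary stream starts with the Jax magic number.

    Consumes up to four bytes from the stream.
    """
    return stream.read(len(JAX_MAGIC)) == JAX_MAGIC


# Usage: an in-memory buffer stands in for an opened archive file.
print(has_jax_magic(io.BytesIO(b"^jax...")))
```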

Records

Jax archives are structured as follows:

  • Magic number
  • Record for File 1
  • Data for File 1
  • Record for File 2
  • Data for File 2
  • ...

Records are not aligned to any particular byte boundary and neither are the files within the archive, in the interest of creating smaller archives without the need for compression.

Records are structured as follows. Note that a char is an 8-bit unsigned integer, a long is a 64-bit signed integer, a ulong is a 64-bit unsigned integer, and a ushort is a 16-bit unsigned integer.

Jax archive record structure

| Type   | Name        | Details |
|--------|-------------|---------|
| char   | Record Type | d for directory, s for symbolic link, and r for a regular file. |
| string | Record Name | Pascal-style, see Strings |
| string | Description | Also Pascal-style, see Descriptors for parsing information |
| ulong  | Timestamp   | UNIX timestamp of last change, in seconds since midnight (00:00) UTC on January 1st, 1970 |
| ushort | File Mode   | UNIX file mode, `644` by default |
| ushort | File Owner  | UNIX file owner ID, `0` by default |
| ushort | File Group  | UNIX file group ID, `0` by default |
| long   | Size        | File size, in bytes. |

After the record, the file's content begins. If you just want to read the next record, you can skip `Size` bytes ahead and begin reading from there. Note that the archive's magic number is only seen once at the very beginning and does not repeat per-record.
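The layout above can be sketched as a small Python reader that lists every record and skips over the file data. This is an illustration under stated assumptions, not a reference implementation: the text does not specify a byte order, so big-endian is assumed here (change `ORDER` to `"<"` for little-endian), and the names `Record`, `read_record` and `list_records` are hypothetical.

```python
import io
import struct
from dataclasses import dataclass

MAGIC = b"^jax"
ORDER = ">"  # ASSUMPTION: big-endian; the format text does not state a byte order.


@dataclass
class Record:
    type: str         # 'd', 's' or 'r'
    name: str
    description: str  # raw descriptor text, not yet parsed
    timestamp: int
    mode: int
    owner: int
    group: int
    size: int


def _read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated archive")
    return data


def _read_string(stream):
    # Pascal-style string: ushort length, then that many UTF-8 bytes.
    (length,) = struct.unpack(ORDER + "H", _read_exact(stream, 2))
    return _read_exact(stream, length).decode("utf-8")


def read_record(stream):
    """Read one record header; return None at a clean end of archive."""
    first = stream.read(1)
    if not first:
        return None
    name = _read_string(stream)
    description = _read_string(stream)
    # ulong timestamp, three ushorts, signed long size: 8 + 2 + 2 + 2 + 8 = 22 bytes.
    timestamp, mode, owner, group, size = struct.unpack(
        ORDER + "QHHHq", _read_exact(stream, 22))
    return Record(chr(first[0]), name, description,
                  timestamp, mode, owner, group, size)


def list_records(stream):
    """Return all records of a seekable binary stream, skipping file data."""
    if _read_exact(stream, len(MAGIC)) != MAGIC:
        raise ValueError("not a Jax archive")
    records = []
    while (rec := read_record(stream)) is not None:
        records.append(rec)
        stream.seek(rec.size, io.SEEK_CUR)  # skip `Size` bytes of content
    return records
```

Because nothing is aligned or padded, the reader only ever needs the two string lengths and `Size` to find the next record.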

Strings

Strings are Pascal-style, meaning they are prefixed with their length. The length is encoded as an unsigned 16-bit integer (`ushort`), and the string data itself must be encoded as UTF-8.
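A rough Python sketch of this encoding follows. It makes two assumptions the text does not confirm: the length prefix is big-endian, and it counts UTF-8 bytes rather than characters. The helper names `encode_string` and `decode_string` are invented for the example.

```python
import struct

ORDER = ">"  # ASSUMPTION: big-endian length prefix; byte order is not specified.


def encode_string(text):
    """Encode text as a ushort length followed by its UTF-8 bytes."""
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("string does not fit a 16-bit length prefix")
    return struct.pack(ORDER + "H", len(data)) + data


def decode_string(buf, offset=0):
    """Decode one string from buf at offset; return (text, next_offset)."""
    (length,) = struct.unpack_from(ORDER + "H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("string runs past the end of the buffer")
    return buf[start:end].decode("utf-8"), end


# Usage: round-trip a record name.
encoded = encode_string("Test File.001")
print(decode_string(encoded))
```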

Descriptors

Descriptors are stored as Pascal-style strings, but they have a strict format. Thankfully, the format is fairly easy to both assemble and parse.

Descriptors are an extensible key/value format for extra file metadata. They look like this:

key1=value1;key2=value2;bool1;bool2=false

All keys and values are strings. Key/value pairs are separated by the semicolon (;, U+003B). The end of a key and the beginning of a value is denoted by the equals sign (=, U+003D).

Key/value pairs may be missing the value (see `bool1` above). In this case, it must be assumed that the value is true. A pair must contain at least the key.

A trailing semicolon is not required. All of the following are valid:

key1=value1;key2=value2;bool1;bool2=false;
key1=value1;key2=value2;bool1=true;bool2=false;
key1=value1;key2=value2;bool1;bool2=false
key1=value1;key2=value2;bool1=true;bool2=false
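The parsing rules so far can be sketched in a few lines of Python. The function name `parse_descriptor` is hypothetical, the returned keys and values are left exactly as stored (escape sequences are not decoded here), and silently skipping empty pieces between semicolons is a leniency chosen for this sketch rather than something the text mandates.

```python
def parse_descriptor(text):
    """Split a descriptor string into a dict of raw keys and values."""
    pairs = {}
    for item in text.split(";"):
        if not item:
            # Empty piece, e.g. after a trailing semicolon or in an
            # empty descriptor. ASSUMPTION: these are simply ignored.
            continue
        key, sep, value = item.partition("=")
        if not key:
            raise ValueError("a pair must contain at least the key")
        # A pair without '=' is a boolean flag whose value is "true".
        pairs[key] = value if sep else "true"
    return pairs


# Usage: the four forms listed above all yield the same dictionary.
print(parse_descriptor("key1=value1;key2=value2;bool1;bool2=false;"))
```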

To store a semicolon (;) or an equals sign (=) in a key or value, encode it as a URL-style percent escape: %3b for a semicolon and %3e for an equals sign. The escapes are case-sensitive, and no other percent escapes are to be parsed or encoded. Note that %3e differs from standard URL encoding, where the equals sign is written %3D.

The following is an example of an encoded equals sign in a value:

einstein-relativity=e%3emc^2;
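A minimal Python sketch of the two escapes is shown below, using the sequences exactly as this page gives them. The function names `escape` and `unescape` are invented for the example.

```python
def escape(text):
    """Encode ';' and '=' so the text can be used as a key or value."""
    # Only these two characters are touched; '%' itself is left alone
    # because no other escapes are to be encoded.
    return text.replace(";", "%3b").replace("=", "%3e")


def unescape(text):
    """Decode the two escapes; the match is case-sensitive."""
    return text.replace("%3b", ";").replace("%3e", "=")


# Usage: decode the value from the example above.
print(unescape("e%3emc^2"))
```

One consequence of escaping only these two characters is that a value containing the literal text %3b or %3e cannot be told apart from an escaped semicolon or equals sign after decoding.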

By storing information in a simple dictionary, the Jax archive format is extensible with up to 65,535 bytes of additional metadata per record, the largest length a `ushort`-prefixed string can hold. Take the following representation of a partial Jax entry (in pseudocode):

[ type = 'r',
  name = 'Test File.001',
  description = ['nextFile':  'Test File.002',
                 'ownerName': 'John Doe',
                 'groupName': 'wheel' ],
  size = 512000,
  uid = 6756,
  group = 12 ]

This example is by no means part of the Jax standard, but it demonstrates a potential use case for the extensible key/value store.
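For illustration, the description dictionary from the entry above could be assembled into descriptor text roughly as follows. This is a sketch, not part of the Jax standard: `build_descriptor` is a hypothetical helper, writing Python's `True` as a bare key and `False` as `false` is a choice made for this example, and the size check assumes the 16-bit length counts UTF-8 bytes.

```python
def _escape(text):
    # Escapes as given on this page: %3b for ';' and %3e for '='.
    return text.replace(";", "%3b").replace("=", "%3e")


def build_descriptor(fields):
    """Join a dict of metadata into 'key=value;key2=value2' form."""
    parts = []
    for key, value in fields.items():
        if not key:
            raise ValueError("a pair must contain at least the key")
        if value is True:
            parts.append(_escape(key))  # bare key means true
        elif value is False:
            parts.append(_escape(key) + "=false")
        else:
            parts.append(_escape(key) + "=" + _escape(str(value)))
    text = ";".join(parts)
    # ASSUMPTION: the ushort length prefix counts UTF-8 bytes.
    if len(text.encode("utf-8")) > 0xFFFF:
        raise ValueError("descriptor exceeds 65,535 bytes")
    return text


description = {
    "nextFile": "Test File.002",
    "ownerName": "John Doe",
    "groupName": "wheel",
}
print(build_descriptor(description))
# prints: nextFile=Test File.002;ownerName=John Doe;groupName=wheel
```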

Utilities

There is no official implementation of Jax yet. The cenix project contains a Jax implementation and utility capable of reading and writing Jax archives. It is available on GitHub.