libisofs/doc/zisofs2_format.txt

240 lines
8.6 KiB
Plaintext

Description of the zisofs2 Format
Revision 2.0-dev
as of zisofs2-tools by
Valentín KIVACHUK BURDÁ and Thomas SCHMITT
1 Oct 2020
The zisofs2 format was invented by Valentín KIVACHUK BURDÁ and
Thomas SCHMITT (as extension of zisofs by H. Peter Anvin). It compresses
data file content, marks it by a header and provides a pointer array for
coarse random access. Within a RRIP enhanced ISO 9660 image the format
is additionally marked by a System Use entry with signature "ZF".
The uncompressed size of a single zisofs2 compressed file is restricted
to 2^64 - 1 bytes. Larger files shall not be compressed.
The format of version 1 of zisofs is supported by this specification.
Using it for files with uncompressed size smaller than 4 GiB is friendly
towards software which does not know about zisofs2.
See section **LEGACY** for a summary of version 1 of zisofs.
Data Types
ISO 9660:7.3.1 - little endian 4-byte words
ISO 9660:7.1.1 - unsigned single bytes
ISO 9660:7.3.3 - 8-bytes value, first in little endian, then big endian.
#uint64 - 8-bytes unsigned value in little endian
Supported compressors
The file header has this layout:
@alg_id @alg_char Description
1 'PZ' (50)(5A) Zlib
2 'XZ' (78)(7A) XZ
3 'L4' (6C)(34) LZ4
4 'ZD' (7A)(64) Zstandard
5 'B2' (62)(32) Bzip2
@alg_id is a 7.1.1 value. @alg_char is 2 ASCII characters stored as 2 bytes
Values of @alg_id = 0 and @alg_char = 'pz'(70)(7A) are reserved and
must not be used. Other compressors are allowed and may be added to this
list in the future
Compressor strategy
The default strategy for a compressor is to compress each input data block
independently. The zisofs2 spec may define in the future other strategies,
which will have a new @alg_id, @alg_char and a description in this section.
File Header
The file header has this layout:
Offset Type Identifier Contents
--------------------------------------------------------------------------
0 (8 bytes) @hdr_magic Magic num (EF 22 55 A1 BC 1B 95 A0)
8 7.1.1 @hdr_version File header version (0)
9 7.1.1 @hdr_size header_size >> 2 (6)
10 7.1.1 @alg_id Algorithm Type (>=1)
11 7.1.1 @hdr_bsize log2(block_size) (15, 16, or 17)
12 #uint64 @size Uncompressed file size
20 (4 bytes) - Padding. Ignored
So its size is 24.
Readers shall be able to handle log2(block_size) values 15, 16 and 17
i.e. block sizes 32 kB, 64 kB, and 128 kB. Writers must not use
other sizes.
Block Pointers
There are ceil(input_size / block_size) input resp. output blocks.
Each input block is of fixed size whereas the output blocks have varying
size (down to 0). For each output block there is an offset pointer giving
its byte address in the overall file content. The next block pointer in the
array tells the start of the next block which begins immediately after the
end of its predecessor. A final pointer (*eob*) gives the first invalid
byte address and thus marks the end of the last block.
So there are ceil(input_size / block_size) + 1 block pointers.
They are stored directly after the file header, i.e. beginning at byte 24,
as an array of values in #uint64 format (8 bytes).
Legacy format (zisofs) may be used, which is described in section *LEGACY*
Data Part
The data part begins immediately after the pointer array (*eob*). In
principle it consists of the variable length output blocks as delivered by
different compression algorithms when fed with the fixed size input blocks.
A special case of input and output block is defined:
Zero-length blocks represent a block full of 0-bytes.
Such input blocks do not get processed by compress2() but shall be mapped
to 0-sized output directly. Vice versa 0-sized blocks have to bypass
uncompress() when being read.
ZF System Use Entry Format
The ZF entry follows the general layout of SUSP resp. RRIP.
Its fields are:
[1] "BP 1 to BP 2 - Signature Word" shall be (5A)(46) ("ZF").
[2] "BP 3 - Length" shall specify as an 8-bit number the length in
bytes of the ZF entry recorded according to ISO 9660:7.1.1.
This length is 16 decimal.
Refer to **LEGACY**
[3] "BP 4 - System Use Entry Version" shall be 2 as in ISO 9660:7.1.1.
Refer to **LEGACY**
[4] "BP 5 to BP 6 - Algorithm" shall be two chars to indicate the
compression algorithm. For example, (50)(5A) ("PZ")
(This is a copy of @alg_char). Refer to **LEGACY**
[5] "BP 7 - Header Size Div 4" shall specify as an 8-bit number the
number of 4-byte words in the header part of the file data recorded
according to ISO 9660:7.1.1.
(This is a copy of @hdr_size).
[6] "BP 8 - Log2 of Block Size" shall specify as an 8-bit number the
binary logarithm of the compression block size recorded according to
ISO 9660:7.1.1.
(This is a copy of header byte 13 (@hdr_bsize), resp. header BP 14.
The value has to be 15, 16 or 17 i.e. 32 kiB, 64 kiB, or 128 kiB.)
[7] "BP 9 to BP 16 - Virtual Uncompressed File Size" shall contain
as a 64-bit unsigned little endian number the uncompressed
file size represented by the given extent. Refer to **LEGACY**
| 'Z' | 'F' | LENGTH | 2 | 'P' | 'Z' | HEADER SIZE DIV 4 |
| LOG2 BLOCK SIZE | UNCOMPRESSED SIZE |
Example (block size 128 kiB, uncompressed file size = 40 TB):
{ 'Z', 'F', 16, 2, 'P', 'Z', 8, 17,
0x00, 0x80, 0xCA, 0x39, 0x61, 0x24, 0x00, 0x00 }
**LEGACY**
zisofs2 supports old readers by respecting the zisofs format. This section
describes which definitions from zisofs2 must change to be compatible
with zisofs.
- General behaviour
The uncompressed size of a single zisofs compressed file is restricted
to 4 GiB - 1. Larger files shall not be compressed.
- Supported algorithms
Only algorithm Zlib with default strategy is supported.
- The file header must follow this structure:
Offset Type Identifier Contents
0 (8 bytes) @hdr_magic Magic number (37 E4 53 96 C9 DB D6 07)
8 7.3.1 @size Uncompressed file size
12 7.1.1 @hdr_size header_size >> 2 (4)
13 7.1.1 @hdr_bsize log2(block_size) (15, 16, or 17)
14 (2 bytes) - Reserved, must be zero
So its size is 16.
- Block pointers
The array must use ISO 9660:7.3.1 (4 bytes) values.
- ZF entry
Its fields are:
[1] "BP 1 to BP 2 - Signature Word" shall be (5A)(46) ("ZF").
[2] "BP 3 - Length" must be 16 decimal.
[3] "BP 4 - System Use Entry Version" must be 1.
[4] "BP 5 to BP 6 - Algorithm" must be (70)(7A) ("pz").
[5] "BP 7 - Header Size Div 4" - same as zisofs2.
[6] "BP 8 - Log2 of Block Size" - same as zisofs2.
[7] "BP 9 to BP 16 - Uncompressed Size" This field shall be recorded
according to ISO 9660:7.3.3.
(This number is the same as @size )
| 'Z' | 'F' | LENGTH | 1 | 'p' | 'z' | HEADER SIZE DIV 4 |
| LOG2 BLOCK SIZE | UNCOMPRESSED SIZE |
Example (block size 32 kiB, uncompressed file size = 1,234,567 bytes):
{ 'Z', 'F', 16, 1, 'p', 'z', 4, 15,
0x87, 0xD6, 0x12, 0x00, 0x00, 0x12, 0xD6, 0x87 }
References:
zisofs2-tools
https://github.com/vk496/zisofs2-tools
zisofs-tools
http://freshmeat.net/projects/zisofs-tools/
zlib:
/usr/include/zlib.h
cdrtools with mkisofs
ftp://ftp.berlios.de/pub/cdrecord/alpha
ECMA-119 aka ISO 9660
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-119.pdf
SUSP 1.12
ftp://ftp.ymi.com/pub/rockridge/susp112.ps
RRIP 1.12
ftp://ftp.ymi.com/pub/rockridge/rrip112.ps
---------------------------------------------------------------------------
This text is under
Copyright (c) 2009 - 2010, 2020 Thomas SCHMITT <scdbackup@gmx.net>
Copyright (c) 2020 - Valentín KIVACHUK BURDÁ <vk18496@gmail.com>
It shall reflect the effective technical specifications as implemented in
zisofs2-tools and the Linux kernel. So please contact mailing list
<bug-xorriso@gnu.org> or to the copyright holders in private, if you
want to make changes.
Only if you cannot reach the copyright holder for at least one month it is
permissible to modify and distribute this text under the license "GPLv3".