Description of libisofs MD5 checksumming

by Thomas Schmitt - mailto:scdbackup@gmx.net
Libburnia project - mailto:libburn-hackers@pykix.org
16 Aug 2009


MD5 is a 128 bit message digest with a very low probability to be the same for
any pair of differing data files. It is described in RFC 1321 and can be
computed e.g. by program md5sum.

libisofs can equip its images with MD5 checksums for superblock, directory
tree, the whole session, and for each single data file.
See libisofs.h, iso_write_opts_set_record_md5().

The data file checksums get loaded together with the directory tree if this
is enabled by iso_read_opts_set_no_md5(). Loaded checksums can be inquired by
iso_image_get_session_md5() and iso_file_get_md5().

Stream recognizable checksum tags occupy exactly one block each. They can
be detected by submitting a read-in block to iso_util_decode_md5_tag().

libisofs has its own MD5 computation functions:
  iso_md5_start(), iso_md5_compute(), iso_md5_clone(), iso_md5_end()


Representation in the Image

The checksums are stored as stream recognizable checksum tags and as a compact
array at the end of the session. The latter allows many file checksums to be
loaded quickly from media with slow random access.


The Checksum Array

Location and layout of the checksum array are recorded as AAIP attribute
"isofs.ca" of the root node.
See doc/susp_aaip_2_0.txt for a general description of AAIP and
doc/susp_aaip_isofs_names.txt for the layout of "isofs.ca".

The single data files hold an index to their MD5 checksum in individual AAIP
attributes "isofs.cx". Index I means: array base address + 16 * I.

If there are N checksummed data files then the array consists of N + 2 entries
with 16 bytes each.

Entry number 0 holds a session checksum which covers the range from the session
start block up to (but not including) the start block of the checksum area.
This range is described by attribute "isofs.ca" of the root node.

Entries 1 to N hold the checksums of individual data files.

Entry number N + 1 holds the MD5 checksum of entries 0 to N.
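
The index arithmetic of the array can be sketched in C as follows. The
function names are hypothetical and not part of libisofs. Offsets are counted
in bytes relative to the array base address.

```c
#include <stdint.h>

#define MD5_ENTRY_SIZE 16u   /* each array entry is one 16 byte MD5 */

/* Byte offset of entry `index`, relative to the array base address. */
uint64_t md5_array_entry_offset(uint32_t index)
{
    return (uint64_t) MD5_ENTRY_SIZE * index;
}

/* Total array size in bytes for n_files checksummed data files:
   entry 0 (session), entries 1 to N (files), entry N + 1 (MD5 of 0 to N). */
uint64_t md5_array_size(uint32_t n_files)
{
    return (uint64_t) MD5_ENTRY_SIZE * ((uint64_t) n_files + 2);
}

/* Index of the final entry, which holds the MD5 of entries 0 to N. */
uint32_t md5_array_last_index(uint32_t n_files)
{
    return n_files + 1;
}
```

With three checksummed files, the array has five entries and occupies 80
bytes. The file with "isofs.cx" index 3 finds its MD5 at byte offset 48.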


The Checksum Tags

Because the inquiry of AAIP attributes demands loading of the image tree,
there are also checksum tags which can be detected on the fly when reading
and checksumming the session from the start point as learned from a media
table-of-content.

The superblock checksum tag is written after the ECMA-119 volume descriptors.
The tree checksum tag is written after the ECMA-119 directory entries.
The session checksum tag is written after all payload including the checksum
array. (Padding may follow.)

The tags are single lines of printable text, padded by 0 bytes. They have
the following format:

 Tag_id pos=# range_start=# range_size=# [next=#] md5=# self=#\n

Tag_id distinguishes the three tag types
  "libisofs_sb_checksum_tag_v1"     Superblock tag
  "libisofs_tree_checksum_tag_v1"   Directory tree tag
  "libisofs_checksum_tag_v1"        Session tag


Example (session starts at Logical Block Address 32):

<... ECMA-119 System Area and Volume Descriptors ...>
libisofs_sb_checksum_tag_v1 pos=50 range_start=32 range_size=18 md5=17471035f1360a69eedbd1d0c67a6aa2 self=52d602210883eeababfc9cd287e28682
<... ECMA-119 Directory Entries ...>
libisofs_tree_checksum_tag_v1 pos=334 range_start=32 range_size=302 md5=41acd50285339be5318decce39834a45 self=fe100c338c8f9a494a5432b5bfe6bf3c
<... Data file payload and checksum array ...>
libisofs_checksum_tag_v1 pos=81554 range_start=32 range_size=81522 md5=8adb404bdf7f5c0a078873bb129ee5b9 self=57c2c2192822b658240d62cbc88270cb


There are up to six tag parameters. pos=, range_start=, range_size=, and the
optional next= are decimal numbers. md5= and self= are strings of 32 hex
digits:

pos=
  gives the block address where the tag supposes itself to be stored.
  If this does not match the block address where the tag is found then this
  either indicates that the tag is payload of the image or that the image has
  been relocated. (The latter makes the image unusable.)

range_start=
  The block address where the session is supposed to start. If this does not
  match the session start on media then the volume descriptors of the
  image have been relocated. (This can happen with overwriteable media. If
  checksumming started at LBA 0 and finds range_start=32, then one has to
  restart checksumming at LBA 32. See libburn/doc/cookbook.txt
  "ISO 9660 multi-session emulation on overwriteable media" for background
  information.)

range_size=
  The number of blocks beginning at range_start which are covered by the
  checksum of the tag.

Only with superblock tag and tree tag:
next=
  The block address where the next tag is supposed to be found. This is
  to avoid the small possibility that a checksum tag with matching position
  is part of a directory entry or data file. The superblock tag is quite
  uniquely placed directly after the ECMA-119 Volume Descriptor Set Terminator
  where no such cleartext is supposed to reside by accident.

md5=
  The checksum payload of the tag as lower case hex digits.

self=
  The MD5 checksum of the tag itself up to and including the last hex digit of
  parameter "md5=".

The newline character at the end is mandatory. For now all bytes of the
block after that newline shall be zero. Future extensions may arise.
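
A tag line can be taken apart with plain C string functions. The following
sketch is not the libisofs implementation (iso_util_decode_md5_tag() is the
function to use in real programs). struct md5_tag, parse_md5_tag(), the helper
functions, and the type numbers are assumptions made for illustration. The
sketch does not verify the self= checksum. It only reports how many leading
bytes of the line that checksum covers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the decoded tag parameters. */
struct md5_tag {
    int type;                 /* assumed: 1= session, 2= superblock, 3= tree */
    uint32_t pos, range_start, range_size, next;
    int has_next;             /* 1 if the optional next= was present */
    unsigned char md5[16];
    unsigned char self[16];
    size_t self_covered;      /* leading bytes of the line covered by self= */
};

static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;                /* also rejects the terminating 0 byte */
}

/* Convert 32 lower case hex digits to 16 bytes. Returns 1 on success. */
static int hex_to_md5(const char *hex, unsigned char out[16])
{
    for (int i = 0; i < 16; i++) {
        int hi = hex_digit(hex[2 * i]);
        if (hi < 0) return 0;
        int lo = hex_digit(hex[2 * i + 1]);
        if (lo < 0) return 0;
        out[i] = (unsigned char) (hi * 16 + lo);
    }
    return 1;
}

/* Read the decimal number after key, e.g. " pos=". Returns 1 on success. */
static int get_number(const char *line, const char *key, uint32_t *value)
{
    const char *p = strstr(line, key);
    char *end;
    unsigned long v;

    if (p == NULL) return 0;
    p += strlen(key);
    if (*p < '0' || *p > '9') return 0;
    v = strtoul(p, &end, 10);
    if (v > 0xffffffffUL) return 0;
    *value = (uint32_t) v;
    return 1;
}

/* Parse one 0-terminated tag line. Returns 1 on success, 0 otherwise. */
int parse_md5_tag(const char *line, struct md5_tag *tag)
{
    static const char *ids[3] = {
        "libisofs_checksum_tag_v1",       /* session tag */
        "libisofs_sb_checksum_tag_v1",    /* superblock tag */
        "libisofs_tree_checksum_tag_v1"   /* directory tree tag */
    };
    const char *md5_hex, *self_hex;

    tag->type = 0;
    tag->next = 0;
    for (int i = 0; i < 3; i++) {
        size_t l = strlen(ids[i]);
        if (strncmp(line, ids[i], l) == 0 && line[l] == ' ')
            tag->type = i + 1;
    }
    if (tag->type == 0) return 0;
    if (!get_number(line, " pos=", &tag->pos) ||
        !get_number(line, " range_start=", &tag->range_start) ||
        !get_number(line, " range_size=", &tag->range_size))
        return 0;
    tag->has_next = get_number(line, " next=", &tag->next);

    md5_hex = strstr(line, " md5=");
    self_hex = strstr(line, " self=");
    if (md5_hex == NULL || self_hex == NULL) return 0;
    md5_hex += 5;
    self_hex += 6;
    if (!hex_to_md5(md5_hex, tag->md5) || !hex_to_md5(self_hex, tag->self))
        return 0;
    if (self_hex[32] != '\n') return 0;   /* the newline is mandatory */
    tag->self_covered = (size_t) (md5_hex + 32 - line);
    return 1;
}
```

To verify a tag with this sketch, one would compute the MD5 of the first
self_covered bytes of the line and compare it with the bytes in self.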


-------------------------------------------------------------------------------

Usage at Read Time

Checking Before Image Tree Loading

In order to check for a trustworthy loadable image tree, read the first 32
blocks from the session start and look in blocks 16 to 32 for the superblock
checksum tag by
  iso_util_decode_md5_tag(block, &tag_type, &pos,
                          &range_start, &range_size, &next_tag, md5, 2);
If it appears and has plausible parameters, then check whether its MD5 matches
the MD5 of the data blocks which were read before.
(Keep the original MD5 context of the data blocks and clone one for obtaining
the MD5 bytes.)
Compute the block into the MD5 checksum after you are done with interpreting
it.

If those MD5s match, then compute the checksum block into the kept MD5 context
and go on with reading and computing for the tree checksum tag. This will be
found at block address next_tag, verified and parsed by:
  iso_util_decode_md5_tag(block, &tag_type, &pos,
                          &range_start, &range_size, &next_tag, md5, 3);
Again, if the parameters match the reading state, the MD5 must match the
MD5 computed from the data blocks which were read before.
If so, then the tree is ok and safe to be loaded by iso_image_import().
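
The text above does not spell out what "plausible parameters" means. One
possible reading, sketched here in C, applies the rules given for pos= and
range_start= in the tag description. The enum, the function name, and its
arguments are hypothetical and not part of the libisofs API.

```c
#include <stdint.h>

/* Hypothetical verdicts. libisofs defines no such enum. */
enum tag_location {
    TAG_IN_PLACE,          /* the tag is where it claims to be */
    TAG_RESTART_AT_RANGE,  /* restart checksumming at range_start */
    TAG_NOT_AT_POS         /* payload data or relocated image */
};

/* read_start       : LBA where reading and checksumming began
   found_at         : LBA of the block which contained the tag
   pos, range_start : values decoded from the tag */
enum tag_location classify_tag_location(uint32_t read_start, uint32_t found_at,
                                        uint32_t pos, uint32_t range_start)
{
    if (range_start != read_start)
        return TAG_RESTART_AT_RANGE;   /* volume descriptors were relocated */
    if (pos != found_at)
        return TAG_NOT_AT_POS;
    return TAG_IN_PLACE;
}
```

With the hypothetical input classify_tag_location(0, 18, 50, 32), reading
began at LBA 0 but the tag announces range_start=32, so the sketch asks for a
restart of checksumming at LBA 32.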


Checking a Whole Session

In order to check the trustworthiness of a whole session, continue reading
and checksumming after the tree was verified.

Read and checksum the blocks. When reaching block address next_tag (from the
tree tag) submit this block to

  iso_util_decode_md5_tag(block, &tag_type, &pos,
                          &range_start, &range_size, &next_tag, md5, 1);

If this returns 1, then check whether the returned parameters pos, range_start,
and range_size match the state of block reading, and whether the returned
bytes in parameter md5 match the MD5 computed from the data blocks which were
read before the tag block.
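
The comparison with the state of block reading can be sketched as follows. It
assumes that every block from the session start up to the tag block, including
the earlier tag blocks, was computed into the MD5 context, as the example tags
suggest (there pos equals range_start + range_size). The function name and its
arguments are hypothetical.

```c
#include <stdint.h>

/* session_start : LBA where reading and checksumming of the session began
   tag_block_lba : LBA of the block which was submitted for tag decoding
   pos, range_start, range_size : values decoded from the tag
   Returns 1 if the tag parameters match the reading state, else 0. */
int tag_matches_read_state(uint32_t session_start, uint32_t tag_block_lba,
                           uint32_t pos, uint32_t range_start,
                           uint32_t range_size)
{
    if (tag_block_lba < session_start)
        return 0;
    return pos == tag_block_lba &&
           range_start == session_start &&
           range_size == tag_block_lba - session_start;
}
```

Only if this test succeeds is it worthwhile to compare the md5 bytes of the tag
with the MD5 obtained from the cloned context.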


Checking Single Files in a Loaded Image

Once the image has been loaded, you can obtain MD5 sums from IsoNode objects
which fulfill
  iso_node_get_type(node) == LIBISO_FILE

The recorded checksum can be obtained by
  iso_file_get_md5(image, (IsoFile *) node, md5, 0);
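
The checksum is delivered as 16 raw bytes. For a comparison with the output of
program md5sum they can be converted to 32 lower case hex digits. A minimal
sketch, with a hypothetical helper name:

```c
#include <stdio.h>

/* Convert 16 MD5 bytes to 32 lower case hex digits.
   hex must provide room for 33 bytes: 32 digits plus the terminating 0. */
void md5_to_hex(const char md5[16], char hex[33])
{
    for (int i = 0; i < 16; i++)
        sprintf(hex + 2 * i, "%02x", (unsigned int) (unsigned char) md5[i]);
}
```

The cast to unsigned char avoids sign extension on platforms where char is
signed.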

For accessing the file data in the loaded image use
  iso_file_get_stream((IsoFile *) node);
to get the data stream of the object.
The checksums cover the data content as it was actually written into the ISO
image stream, not necessarily as it was on hard disk before or afterwards.
This implies that content filtered files bear the MD5 of the filtered data
and not of the original files on disk. When checkreading, one has to avoid
any reverse filtering. Dig out the stream which directly reads image data
by calling iso_stream_get_input_stream() until it returns NULL and use
iso_stream_get_size() rather than iso_file_get_size().

Now you may call iso_stream_open(), iso_stream_read(), iso_stream_close()
for reading file content from the loaded image.


Session Check in a Loaded Image

iso_image_get_session_md5() gives start LBA and session payload size as of
"isofs.ca" and the session checksum as of the checksum array.