You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
266 lines
11 KiB
266 lines
11 KiB
|
|
Description of libisofs MD5 checksumming |
|
|
|
by Thomas Schmitt - mailto:scdbackup@gmx.net |
|
Libburnia project - mailto:libburn-hackers@pykix.org |
|
16 Aug 2009 |
|
|
|
|
|
MD5 is a 128 bit message digest with a very low probability to be the same for |
|
any pair of differing data files. It is described in RFC 1321. and can be |
|
computed e.g. by program md5sum. |
|
|
|
libisofs can equip its images with MD5 checksums for superblock, directory |
|
tree, the whole session, and for each single data file. |
|
See libisofs.h, iso_write_opts_set_record_md5(). |
|
|
|
The data file checksums get loaded together with the directory tree if this |
|
is enabled by iso_read_opts_set_no_md5(). Loaded checksums can be inquired by |
|
iso_image_get_session_md5() and iso_file_get_md5(). |
|
|
|
Stream recognizable checksum tags occupy exactly one block each. They can |
|
be detected by submitting a block to iso_util_decode_md5_tag(). |
|
|
|
libisofs has own MD5 computation functions: |
|
iso_md5_start(), iso_md5_compute(), iso_md5_clone(), iso_md5_end(), |
|
iso_md5_match() |
|
|
|
|
|
Representation in the Image |
|
|
|
There may be several stream recognizable checksum tags and a compact array |
|
of MD5 items at the end of the session. The latter allows to quickly load many |
|
file checksums from media with slow random access. |
|
|
|
|
|
The Checksum Array |
|
|
|
Location and layout of the checksum array is recorded as AAIP attribute |
|
"isofs.ca" of the root node. |
|
See doc/susp_aaip_2_0.txt for a general description of AAIP and |
|
doc/susp_aaip_isofs_names.txt for the layout of "isofs.ca". |
|
|
|
The single data files hold an index to their MD5 checksum in individual AAIP |
|
attributes "isofs.cx". Index I means: array base address + 16 * I. |
|
|
|
If there are N checksummed data files then the array consists of N + 2 entries |
|
with 16 bytes each. |
|
|
|
Entry number 0 holds a session checksum which covers the range from the session |
|
start block up to (but not including) the start block of the checksum area. |
|
This range is described by attribute "isofs.ca" of the root node. |
|
|
|
Entries 1 to N hold the checksums of individual data files. |
|
|
|
Entry number N + 1 holds the MD5 checksum of entries 0 to N. |
|
|
|
|
|
The Checksum Tags |
|
|
|
Because the inquiry of AAIP attributes demands loading of the image tree, |
|
there are also checksum tags which can be detected on the fly when reading |
|
and checksumming the session from the start point as learned from a media |
|
table-of-content. |
|
|
|
The superblock checksum tag is written after the ECMA-119 volume descriptors. |
|
The tree checksum tag is written after the ECMA-119 directory entries. |
|
The session checksum tag is written after all payload including the checksum |
|
array. (Then follows eventual padding.) |
|
|
|
The tags are a single lines of printable text, padded by 0 bytes. They have |
|
the following format: |
|
|
|
Tag_id pos=# range_start=# range_size=# [session_start|next=#] md5=# self=#\n |
|
|
|
Tag_id distinguishes the following tag types |
|
"libisofs_rlsb32_checksum_tag_v1" Relocated 64 kB superblock tag |
|
"libisofs_sb_checksum_tag_v1" Superblock tag |
|
"libisofs_tree_checksum_tag_v1" Directory tree tag |
|
"libisofs_checksum_tag_v1" Session tag |
|
|
|
A relocated superblock may appear at LBA 0 of an image which was produced for |
|
being stored in a disk file or on overwriteable media (e.g. DVD+R, BD-RE). |
|
Typically there is a first session recorded with a superblock at LBA 32 and |
|
the next session may follow shortly after its session tag. (There may be a gap |
|
of padding, often 150 blocks, and aligning to the next address that is |
|
divisible by 32.) Normally no session starts after the address given by |
|
parameter session_start=. |
|
|
|
Session oriented media like CD-R[W], DVD+R, BD-R will have no relocated |
|
superblock but rather bear a table-of-content on media level (to be inquired |
|
by MMC commands(. |
|
|
|
|
|
Example: |
|
A relocated superblock which points to the last session. Then the first session |
|
which starts at Logical Block Address 32. The following sessions have the same |
|
structure as the first one. |
|
|
|
LBA 0: |
|
<... ECMA-119 System Area and Volume Descriptors ...> |
|
LBA 18: |
|
libisofs_rlsb32_checksum_tag_v1 pos=18 range_start=0 range_size=18 session_start=311936 md5=6fd252d5b1db52b3c5193447081820e4 self=526f7a3c7fefce09754275c6b924b6d9 |
|
<... padding up to LBA 32 ...> |
|
LBA 32: |
|
<... First Session: ECMA-119 System Area and Volume Descriptors ...> |
|
libisofs_sb_checksum_tag_v1 pos=50 range_start=32 range_size=18 md5=17471035f1360a69eedbd1d0c67a6aa2 self=52d602210883eeababfc9cd287e28682 |
|
<... ECMA-119 Directory Entries (the tree of file names) ...> |
|
LBA 334: |
|
libisofs_tree_checksum_tag_v1 pos=334 range_start=32 range_size=302 md5=41acd50285339be5318decce39834a45 self=fe100c338c8f9a494a5432b5bfe6bf3c |
|
<... Data file payload and checksum array ...> |
|
LBA 81554: |
|
libisofs_checksum_tag_v1 pos=81554 range_start=32 range_size=81522 md5=8adb404bdf7f5c0a078873bb129ee5b9 self=57c2c2192822b658240d62cbc88270cb |
|
|
|
<... more sessions ...> |
|
|
|
LBA 311936: |
|
<... Last Session: ECMA-119 System Area and Volume Descriptors ...> |
|
LBA 311954: |
|
libisofs_sb_checksum_tag_v1 pos=311954 range_start=311936 range_size=18 next=312286 md5=7f1586e02ac962432dc859a4ae166027 self=2c5fce263cd0ca6984699060f6253e62 |
|
<... Last Session: tree, tree checksum tag, data payload, session tag ...> |
|
|
|
|
|
There are several tag parameters. Addresses are given as decimal numbers, MD5 |
|
checksums as strings of 32 hex digits. |
|
|
|
pos= |
|
gives the block address where the tag supposes itself to be stored. |
|
If this does not match the block address where the tag is found then this |
|
either indicates that the tag is payload of the image or that the image has |
|
been relocated. (The latter makes the image unusable.) |
|
|
|
range_start= |
|
The block address where the session is supposed to start. If this does not |
|
match the session start on media then the volume descriptors of the |
|
image have been relocated. (This can happen with overwriteable media. If |
|
checksumming started at LBA 0 and finds range_start=32, then one has to |
|
restart checksumming at LBA 32. See libburn/doc/cookbook.txt |
|
"ISO 9660 multi-session emulation on overwriteable media" for background |
|
information.) |
|
|
|
range_size= |
|
The number of blocks beginning at range_start which are covered by the |
|
checksum of the tag. |
|
|
|
Only with superblock tag and tree tag: |
|
next= |
|
The block address where the next tag is supposed to be found. This is |
|
to avoid the small possibility that a checksum tag with matching position |
|
is part of a directory entry or data file. The superblock tag is quite |
|
uniquely placed directly after the ECMA-119 Volume Descriptor Set Terminator |
|
where no such cleartext is supposed to reside by accident. |
|
|
|
Only with relocated 64 kB superblock tag: |
|
session_start= |
|
The start block address (System Area) of the session to which the relocated |
|
superblock points. |
|
|
|
md5= |
|
The checksum payload of the tag as lower case hex digits. |
|
|
|
self= |
|
The MD5 checksum of the tag itself up to and including the last hex digit of |
|
parameter "md5=". |
|
|
|
The newline character at the end is mandatory. For now all bytes of the |
|
block after that newline shall be zero. There may arise future extensions. |
|
|
|
|
|
------------------------------------------------------------------------------- |
|
|
|
Usage at Read Time |
|
|
|
Checking Before Image Tree Loading |
|
|
|
In order to check for a trustworthy loadable image tree, read the first 32 |
|
blocks from to the session start and look in block 16 to 32 for a superblock |
|
checksum tag by |
|
iso_util_decode_md5_tag(block, &tag_type, &pos, |
|
&range_start, &range_size, &next_tag, md5, 0); |
|
|
|
If a tag of type 2 or 4 appears and has plausible parameters, then check |
|
whether its MD5 matches the MD5 of the data blocks which were read before. |
|
|
|
With tag type 2: |
|
|
|
Keep the original MD5 context of the data blocks and clone |
|
one for obtaining the MD5 bytes. |
|
If the MD5s match, then compute the checksum block into the kept MD5 context |
|
and go on with reading and computing for the tree checksum tag. This will be |
|
found at block address next_tag, verified and parsed by: |
|
iso_util_decode_md5_tag(block, &tag_type, &pos, |
|
&range_start, &range_size, &next_tag, md5, 3); |
|
|
|
Again, if the parameters match the reading state, the MD5 must match the |
|
MD5 computed from the data blocks which were before. |
|
If so, then the tree is ok and safe to be loaded by iso_image_import(). |
|
|
|
With tag type 4: |
|
|
|
End the MD5 context and start a new context for the session which you will |
|
read next. |
|
|
|
You may look for the first session by starting to read at LBA 32, or you may |
|
look for the last session by starting to read at the address given by parameter |
|
session_start=. The former is suitable for a check of the whole image, the |
|
latter is the shortest way to ensure that the tree of the last session is |
|
not corrupted. |
|
|
|
|
|
Checking a Whole Session |
|
|
|
In order to check the trustworthyness of a whole session, continue reading |
|
and checksumming after the tree was verified. |
|
|
|
Read and checksum the blocks. When reaching block address next_tag (from the |
|
tree tag) submit this block to |
|
|
|
iso_util_decode_md5_tag(block, &tag_type, &pos, |
|
&range_start, &range_size, &next_tag, md5, 1); |
|
|
|
If this returns 1, then check whether the returned parameters pos, range_start, |
|
and range_size match the state of block reading, and whether the returned |
|
bytes in parameter md5 match the MD5 computed from the data blocks which were |
|
read before the tag block. |
|
|
|
|
|
Checking Single Files in a Loaded Image |
|
|
|
An image may consist of many sessions wherein many data blocks may not belong |
|
to files in the directory tree of the most recent session. Checking this |
|
tree and all its data files can ensure that all actually valid data in the |
|
image are trustworthy. This will leave out the trees of the older sessions |
|
and the obsolete data blocks of overwritten or deleted files. |
|
|
|
Once the image has been loaded, you can obtain MD5 sums from IsoNode objects |
|
which fulfill |
|
iso_node_get_type(node) == LIBISO_FILE |
|
|
|
The recorded checksum can be obtained by |
|
iso_file_get_md5(image, (IsoFile *) node, md5, 0); |
|
|
|
For accessing the file data in the loaded image use |
|
iso_file_get_stream((IsoFile *) node); |
|
to get the data stream of the object. |
|
The checksums cover the data content as it was actually written into the ISO |
|
image stream, not necessarily as it was on hard disk before or afterwards. |
|
This implies that content filtered files bear the MD5 of the filtered data |
|
and not of the original files on disk. When checkreading, one has to avoid |
|
any reverse filtering. Dig out the stream which directly reads image data |
|
by calling iso_stream_get_input_stream() until it returns NULL and use |
|
iso_stream_get_size() rather than iso_file_get_size(). |
|
|
|
Now you may call iso_stream_open(), iso_stream_read(), iso_stream_close() |
|
for reading file content from the loaded image. |
|
|
|
|
|
Session Check in a Loaded Image |
|
|
|
iso_image_get_session_md5() gives start LBA and session payload size as of |
|
"isofs.ca" and the session checksum as of the checksum array. |
|
|
|
For reading you may use the IsoDataSource object which you submitted |
|
to iso_image_import() when reading the image. If this source is associated |
|
to a libburn drive, then libburn function burn_read_data() can read directly |
|
from it. |
|
|
|
|