Re-implemented the CRC functions under own copyright

This commit is contained in:
Thomas Schmitt 2012-02-19 10:10:11 +00:00
parent 9a668d1e77
commit f31037729a
3 changed files with 560 additions and 159 deletions

View File

@ -1 +1 @@
#define Cdrskin_timestamP "2012.02.13.102837"
#define Cdrskin_timestamP "2012.02.19.101022"

View File

@ -1,11 +1,194 @@
/* -*- indent-tabs-mode: t; tab-width: 8; c-basic-offset: 8; -*- */
/* Copyright (c) 2012 Thomas Schmitt <scdbackup@gmx.net>
Provided under GPL version 2 or later.
Containing disabled code pieces from other GPL programs.
They are just quotes for reference.
The activated code uses plain polynomial division and other primitve
algorithms to build tables of pre-computed CRC values. It then computes
the CRCs by algorithms which are derived from mathematical considerations
and from analysing the mathematical meaning of the disabled code pieces.
The comments here are quite detailed in order to prove my own understanding
of the topic.
*/
#ifdef HAVE_CONFIG_H
#include "../config.h"
#endif
#include "crc.h"
/* Exploration ts B00214 :
ECMA-130, 22.3.6 "CRC field"
"This field contains the inverted parity bits. The CRC code word must be
divisible by the check polynomial. [...]
The generating polynomial shall be
G(x) = x^16 + x^12 + x^5 + 1
"
Also known as CRC-16-CCITT, CRC-CCITT
Used in libburn for raw write modes in sector.c.
There is also disabled code in read.c which would use it.
ts B11222:
The same algorithm is prescribed for CD-TEXT in MMC-3 Annex J.
"CRC Field consists of 2 bytes. Initiator system may use these bytes
to check errors in the Pack. The polynomial is x^16 + x^12 + x^5 + 1.
All bits shall be inverted."
libburn/cdtext.c uses a simple bit shifting function : crc_11021()
ts B20211:
Discussion why both are equivalent in respect to their result:
Both map the bits of the given bytes to a polynomial over the finite field
of two elements "GF(2)". If bytes 0 .. M are given, then bit n of byte m
is mapped to the coefficient of x exponent (n + ((M - m) * 8) + 16).
I.e. they translate the bits into a polynomial with the highest bit
becomming the coefficient of the highest power of x. Then this polynomial
is multiplied by (x exp 16).
The set of all such polynomials forms a commutative ring. Its addition
corresponds to bitwise exclusive or. Addition and subtraction are identical.
Multiplication with polynomials of only one single non-zero coefficient
corresponds to leftward bit shifting by the exponent of that coefficient.
The same rules apply as with elementary school arithmetics on integer
numbers, but with surprising results due to the finite nature of the
coefficient number space.
Note that multiplication is _not_ an iteration of addition here.
Function crc_11021() performs a division with residue by the euclidian
algorithm. I.e. it splits polynomial d into quotient q(d) and residue r(d)
in respect to the polynomial p = x exp 16 + x exp 12 + x exp 5 + x exp 0
d = p * q(d) + r(d)
where r(d) is of a polynomial degree lower than p, i.e. only x exp 15
or lower have non-zero coefficients.
The checksum crc(D) is derived by reverse mapping (r(d) * (x exp 16)).
I.e. by mapping the coefficient of (x exp n) to bit n of the 16 bit word
crc(D).
The function result is the bit-wise complement of crc(D).
Function crc_ccitt uses a table ccitt_table of r(d) values for the
polynomials d which represent the single byte values 0x00 to 0xff.
It computes r(d) by computing the residues of an iteratively expanded
polynomial. The expansion of the processed byte string A by the next byte B
from the input byte string happens by shifting the string 8 bits to the
left, and by oring B onto bits 0 to 7.
In the space of polynomials, the already processed polynomial "a" (image of
byte string A) gets expanded by polynomial b (the image of byte B) like this
a * X + b
where X is (x exp 8), i.e. the single coefficient polynomial of degree 8.
The following argumentation uses algebra with commutative, associative
and distributive laws.
Valid especially with polynomials is this rule:
(1): r(a + b) = r(a) + r(b)
because r(a) and r(b) are of degree lower than degree(p) and
degree(a + b) <= max(degree(a), degree(b))
Further valid are:
(2): r(a) = r(r(a))
(3): r(p * a) = 0
The residue of this expanded polynomial can be expressed by means of the
residue r(a) which is known from the previous iteration step, and the
residue r(b) which may be looked up in ccitt_table.
r(a * X + b)
= r(p * q(a) * X + r(a) * X + p * q(b) + r(b))
Applying rule (1):
= r(p * q(a) * X) + r(r(a) * X) + r(p * q(b)) + r(r(b))
Rule (3) and rule (2):
= r(r(a) * X) + r(b)
Be h(a) and l(a) chosen so that: r(a) = h(a) * X + l(a),
and l(a) has zero coefficients above (x exp 7), and h(a) * X has zero
coefficients below (x exp 8). (They correspond to the high and low byte
of the 16 bit word crc(A).)
So the previous statement can be written as:
= r(h(a) * X * X) + r(l(a) * X) + r(b)
Since the degree of l(a) is lower than 8, the degree of l(a) * X is lower
than 16. Thus it cannot be divisible by p which has degree 16.
So: r(l(a) * X) = l(a) * X
This yields
= l(a) * X + r(h(a) * X * X + b)
h(a) * X * X is the polynomial representation of the high byte of 16 bit
word crc(A).
So in the world of bit patterns the iteration step is:
crc(byte string A expanded by byte B)
= (low_byte(crc(A)) << 8) ^ crc(high_byte(crc(A)) ^ B)
And this is what function crc_ccitt() does, modulo swapping the exor
operants and the final bit inversion which is prescribed by ECMA-130
and MMC-3 Annex J.
*/
/* Plain implementation of polynomial division on a Galois field, where
addition and subtraction both are binary exor. Euclidian algorithm.
Divisor is x^16 + x^12 + x^5 + 1 = 0x11021.
This is about ten times slower than the table driven algorithm.
*/
static int crc_11021(unsigned char *data, int count, int flag)
{
int acc = 0, i;
for (i = 0; i < count * 8 + 16; i++) {
acc = (acc << 1);
if (i < count * 8)
acc |= ((data[i / 8] >> (7 - (i % 8))) & 1);
if (acc & 0x10000)
acc ^= 0x11021;
}
return acc;
}
/* This is my own table driven implementation for which i claim copyright.
Copyright (c) 2012 Thomas Schmitt <scdbackup@gmx.net>
*/
unsigned short crc_ccitt(unsigned char *data, int count)
{
static unsigned short crc_tab[256], tab_initialized = 0;
unsigned short acc = 0;
unsigned char b[1];
int i;
if (!tab_initialized) {
/* Create table of byte residues */
for (i = 0; i < 256; i++) {
b[0] = i;
crc_tab[i] = crc_11021(b, 1, 0);
}
tab_initialized = 1;
}
/* There seems to be a speed advantage on amd64 if (acc << 8) is the
second operant of exor, and *(data++) seems faster than data[i].
*/
for (i = 0; i < count; i++)
acc = crc_tab[(acc >> 8) ^ *(data++)] ^ (acc << 8);
/* ECMA-130 22.3.6 and MMC-3 Annex J (CD-TEXT) want the result with
inverted bits
*/
return ~acc;
}
/*
This was the function inherited with libburn-0.2.
static unsigned short ccitt_table[256] = {
0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7,
0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF,
@ -41,6 +224,330 @@ static unsigned short ccitt_table[256] = {
0x6E17, 0x7E36, 0x4E55, 0x5E74, 0x2E93, 0x3EB2, 0x0ED1, 0x1EF0
};
unsigned short crc_ccitt(unsigned char *q, int len)
{
unsigned short crc = 0;
while (len-- > 0)
crc = ccitt_table[(crc >> 8 ^ *q++) & 0xff] ^ (crc << 8);
return ~crc;
}
*/
/* Exploration ts B00214 :
ECMA-130, 14.3 "EDC field"
"The EDC field shall consist of 4 bytes recorded in positions 2064 to 2067.
The error detection code shall be a 32-bit CRC applied on bytes 0 to 2063.
The least significant bit of a data byte is used first. The EDC codeword
must be divisible by the check polynomial:
P(x) = (x^16 + x^15 + x^2 + 1) . (x^16 + x^2 + x + 1)
The least significant parity bit (x^0) is stored in the most significant
bit position of byte 2067.
"
Used for raw writing in sector.c
ts B20211:
Discussion why function crc_32() implements above prescription of ECMA-130.
See end of this file for the ofunction inherited with libburn-0.2.
The mentioned polynomial product
(x^16 + x^15 + x^2 + 1) . (x^16 + x^2 + x + 1)
yields this sum of x exponents
32 31 18 16
18 17 4 2
17 16 3 1
16 15 2 0
======================================
32 31 16 15 4 3 1 0
(The number of x^18 and x^17 is divisible by two and thus 0 in GF(2).)
This yields as 33 bit number:
0x18001801b
If above prescription gets implemented straight forward by function
crc_18001801b(), then its results match the ones of crc_32() with all test
strings which i could invent.
The function consists of a conventional polynomial division with reverse
input order of bits per byte.
Further it swaps the bits in the resulting 32 bit word. That is because
sector.c:sector_headers writes the 4 bytes of crc_32() as little endian.
The ECMA-130 prescription rather demands big endianness and bit swapping
towards the normal bit order in bytes:
"The EDC field shall consist of 4 bytes recorded in positions 2064 to 2067.
[...]
The least significant parity bit (x^0) is stored in the most
significant bit position of byte 2067."
-----------------------------------------------------------------------
*/
/* Overall bit mirroring of a 32 bit word */
unsigned int rfl32(unsigned int acc)
{
unsigned int inv_acc;
int i;
inv_acc = 0;
for (i = 0; i < 32; i++)
if (acc & (1 << i))
inv_acc |= 1 << (31 - i);
return inv_acc;
}
/* Plain implementation of polynomial division on a Galois field, where
addition and subtraction both are binary exor. Euclidian algorithm.
Divisor is (x^16 + x^15 + x^2 + 1) * (x^16 + x^2 + x + 1).
This is about ten times slower than the table driven algorithm.
@param flag bit0= do not mirror bits in input bytes and result word
(Useful for building the byte indexed CRC table)
*/
static unsigned int crc_18001801b(unsigned char *data, int count, int flag)
{
long int acc = 0, i, top;
unsigned int inv_acc;
for (i = 0; i < count * 8 + 32; i++) {
top = acc & 0x80000000;
acc = (acc << 1);
if (i < count * 8) {
if (flag & 1)
/* Normal bit sequence of input bytes */
acc |= ((data[i / 8] >> (7 - (i % 8))) & 1);
else
/* Bit sequence of input bytes mirrored */
acc |= ((data[i / 8] >> (i % 8)) & 1);
}
if (top)
acc ^= 0x18001801b;
}
if (flag & 1)
return (unsigned int) acc;
/* The bits of the whole 32 bit result are mirrored for ECMA-130
output compliance and for sector.c habit to store CRC little endian
although ECMA-130 prescribes it big endian.
*/
inv_acc = rfl32((unsigned int) acc);
return inv_acc;
}
/*
-----------------------------------------------------------------------
Above discussion why crc_ccitt() and crc_11021() yield identical results
can be changed from 16 bit to 32 bit by chosing h(a) and l(a) so that:
r(a) = h(a) * X * X * X + l(a)
h(a) corresponds to the highest byte of crc(A), whereas l(a) corresponds
to the lower three bytes of crc(A).
This yields
r(a * X + b)
= l(a) * X + r(h(a) * X * X * X * X + b)
h(a) * X * X * X * X is the polynomial representation of the high byte of
32 bit word crc(A).
So in the world of bit patterns we have:
crc(byte string A expanded by byte B)
= (lowest_three_bytes(crc(A)) << 8) ^ crc(high_byte(crc(A)) ^ B)
Regrettably this does not yet account for the byte-internal mirroring of
bits during the conversion from bit pattern to polynomial, and during
conversion from polynomial residue to bit pattern.
Be rfl8(D) the result of byte-internal mirroring of bit pattern D,
and mirr8(d) its corresponding polynom.
Be now h(a) and l(a) chosen so that: r(mirr8(a)) = h(a) * X * X * X + l(a)
This corresponds to highest byte and lower three bytes of crc(A).
r(mirr8(a) * X + mirr8(b))
= r(h(a) * X * X * X * X) + r(l(a) * X) + r(mirr8(b))
= l(a)) * X + r(h(a) * X * X * X * X + mirr8(b))
The corresponding bit pattern operation is
crc(mirrored byte string A expanded by mirrored byte B)
= (lowest_three_bytes(crc(A)) << 8) ^ crc(high_byte(crc(A)) ^ rfl8(B))
This demands a final result mirroring to meet the ECMA-130 prescription.
rfl8() can be implemented as lookup table.
The following function crc32_by_tab() yields the same results as functions
crc_18001801b() and crc_32():
-----------------------------------------------------------------------
*/
/* Byte-internal bit mirroring function.
*/
unsigned int rfl8(unsigned int acc)
{
unsigned int inv_acc;
int i, j;
inv_acc = 0;
for (j = 0; j < 4; j++)
for (i = 0; i < 8; i++)
if (acc & (1 << (i + 8 * j)))
inv_acc |= 1 << ((7 - i) + 8 * j);
return inv_acc;
}
#ifdef Libburn_with_crc_illustratioN
/* Not needed for libburn. The new implementation of function crc_32() is the
one that is used.
*/
unsigned int crc32_by_tab(unsigned char *data, int count, int flag)
{
static unsigned int crc_tab[256], tab_initialized = 0;
static unsigned char mirr_tab[256];
unsigned int acc, inv_acc;
unsigned char b[1];
int i;
if (!tab_initialized) {
for (i = 0; i < 256; i++) {
b[0] = i;
/* Create table of non-mirrored 0x18001801b residues */
crc_tab[i] = crc_18001801b(b, 1, 1);
/* Create table of mirrored byte values */
mirr_tab[i] = rfl8(i);
}
tab_initialized = 1;
}
acc = 0;
for (i = 0; i < count; i++)
acc = (acc << 8) ^ crc_tab[(acc >> 24) ^ mirr_tab[data[i]]];
/* The bits of the whole 32 bit result are mirrored for ECMA-130
output compliance and for sector.c habit to store CRC little endian
although ECMA-130 prescribes it big endian.
*/
inv_acc = rfl32((unsigned int) acc);
return inv_acc;
}
#endif /* Libburn_with_crc_illustratioN */
/*
-----------------------------------------------------------------------
Above function yields sufficient performance, nevertheless the old function
crc_32() (see below) is faster by avoiding the additional mirror table
lookup.
A test with 10 times 650 MB on 3000 MHz amd64:
crc_18001801b : 187 s
crc32_by_tab : 27 s
crc_32 : 16 s
So how does crc_32() avoid the application of bit mirroring to B ?.
Inherited crc_32() performs
crc = crc32_table[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
Above function crc32_by_tab() would be
crc = crc_tab[(crc >> 24) ^ mirr_tab[*data++]] ^ (crc << 8);
The shortcut does not change the polynomial representation of the algorithm
or the mapping from and to bit patterns. It only mirrors the bit direction
in the bytes and in the 32-bit words which are involved in the bit pattern
computation. This affects input (which is desired), intermediate state
(which is as good as unmirrored), and final output (which would be slightly
undesirable if libburn could not use the mirrored result anyway).
Instead of the high byte (crc >> 24), the abbreviated algorithm uses
the low byte of the mirrored intermediate checksum (crc & 0xffL).
Instead of shifting the other three intermediate bytes to the left
(crc << 8), the abbreviated algorithm shifts them to the right (crc >> 8).
In both cases they overwrite the single byte that was used for computing
the table index.
The byte indexed table of CRC values needs to hold mirrored 32 bit values.
The byte index [(crc ^ *data++) & 0xffL] would need to be mirrored, which
would eat up the gain of not mirroring the input bytes. But this mirroring
can be pre-computed into the table by exchanging each value with the value
of its mirrored index.
So this relation exists between the CRC table crc_tab[] of crc32_by_tab()
and the table crc32_table[] of the abbreviated algorithm crc_32():
crc_tab[i] == rfl32(crc32_table[rfl8(i)])
for i={0..255}.
I compared the generated table in crc32_by_tab() by this test
for (i = 0; i < 256; i++) {
if (rfl32(crc_tab[rfl8(i)]) != crc32_table[i] ||
crc_tab[i] != rfl32(crc32_table[rfl8(i)])) {
printf("DEVIATION : i = %d\n", i);
exit(1);
}
}
No screaming abort happened.
-----------------------------------------------------------------------
*/
/* This is my own mirrored table implementation for which i claim copyright.
With gcc -O2 it shows the same efficiency as the inherited implementation
below. With -O3, -O1, or -O0 it is only slightly slower.
Copyright (c) 2012 Thomas Schmitt <scdbackup@gmx.net>
*/
unsigned int crc_32(unsigned char *data, int count)
{
static unsigned int crc_tab[256], tab_initialized = 0;
unsigned int acc = 0;
unsigned char b[1];
int i;
if (!tab_initialized) {
/* Create table of mirrored 0x18001801b residues in
bit-mirrored index positions.
*/
for (i = 0; i < 256; i++) {
b[0] = i;
crc_tab[rfl8(i)] = rfl32(crc_18001801b(b, 1, 1));
}
tab_initialized = 1;
}
for (i = 0; i < count; i++)
acc = (acc >> 8) ^ crc_tab[(acc & 0xff) ^ data[i]];
/* The bits of the whole 32 bit result stay mirrored for ECMA-130
output 8-bit mirroring and for sector.c habit to store the CRC
little endian although ECMA-130 prescribes it big endian.
*/
return acc;
}
/*
-----------------------------------------------------------------------
This was the function inherited with libburn-0.2 which implements the
abbreviated algorithm. Its obscure existence led me to above insights.
My compliments to the (unknown) people who invented this.
unsigned long crc32_table[256] = {
0x00000000L, 0x90910101L, 0x91210201L, 0x01B00300L,
0x92410401L, 0x02D00500L, 0x03600600L, 0x93F10701L,
@ -108,125 +615,6 @@ unsigned long crc32_table[256] = {
0x71C0FC00L, 0xE151FD01L, 0xE0E1FE01L, 0x7070FF00L
};
/* Exploration ts B00214 :
ECMA-130, 22.3.6 "CRC field"
Generating polynomial: x^16 + x^12 + x^5 + 1
Also known as CRC-16-CCITT, CRC-CCITT
Used in libburn for raw write modes in sector.c.
There is also disabled code in read.c which would use it.
ts B11222:
The same algorithm is prescribed for CD-TEXT.
libburn/cdtext.c uses a simple bit shifting function : crc_11021()
ts B20211:
Discussion why both are equivalent in respect to their result:
Both map the bits of the given bytes to a polynomial over the finite field
of two elements. If bytes 0 .. M are given, then bit n of byte m is mapped
to the coefficient of x exponent (n + ((M - m) * 8) + 16).
I.e. they translate the bits into a polynomial with the highest bit
becomming the coefficient of the highest power of x. Then this polynomial
is multiplied by (x exp 16).
The set of all such polynomials forms a commutative ring. Its addition
corresponds to bitwise exclusive or. Addition and subtraction are identical.
Multiplication with polynomials of only one single non-zero coefficient
corresponds to leftward bit shifting by the exponent of that coefficient.
The same rules apply as with elementary school arithmetics on integer
numbers, but with surprising results due to the finite nature of the
coefficient number space.
Note that multiplication is _not_ an iteration of addition here.
Function crc_11021() performs a division with residue by the euclidian
algorithm. I.e. it splits polynomial d into quotient q(d) and residue r(d)
in respect to the polynomial p = x exp 16 + x exp 12 + x exp 5 + x exp 0
d = p * q(d) + r(d)
where r(d) is of a polynomial rank lower than p, i.e. only x exp 15
or lower have non-zero coefficients.
The checksum crc(D) is derived by reverse mapping (r(d) * (x exp 16)).
I.e. by mapping the coefficient of (x exp n) to bit n of the 16 bit word
crc(D).
The function result is the bit-wise complement of crc(D).
Function crc_ccitt uses a table ccitt_table of r(d) values for the
polynomials d which represent the single byte values 0x00 to 0xff.
It computes r(d) by computing the residues of an iteratively expanded
polynomial. The expansion of the processed byte string A by the next byte B
from the input byte string happens by shifting the string 8 bits to the
left, and by oring B onto bits 0 to 7.
In the space of polynomials, the already processed polynomial "a" (image of
byte string A) gets expanded by polynomial b (the image of byte B) like this
a * X + b
where X is (x exp 8), i.e. the single coefficient polynomial of rank 8.
The following argumentation uses algebra with commutative, associative
and distributive laws.
Valid especially with polynomials is this rule:
(1): r(a + b) = r(a) + r(b)
because r(a) and r(b) are of rank lower than rank(p) and
rank(a + b) <= max(rank(a), rank(b))
Further valid are:
(2): r(a) = r(r(a))
(3): r(p * a) = 0
The residue of this expanded polynomials can be expressed by means of the
residue r(a) which is known from the previous iteration step, and the
residue r(b) which may be looked up in ccitt_table.
r(a * X + b)
= r(p * q(a) * X + r(a) * X + p * q(b) + r(b))
Applying rule (1):
= r(p * q(a) * X) + r(r(a) * X) + r(p * q(b)) + r(r(b))
Rule (3) and rule (2):
= r(r(a) * X) + r(b)
Be h(a) and l(a) chosen so that: r(a) = h(a) * X + l(a),
and l(a) has zero coefficients above (x exp 7), and h(a) * X has zero
coefficients below (x exp 8). (They correspond to the high and low byte
of the 16 bit word crc(A).)
Now we have:
= r(h(a) * X * X) + r(l(a) * X) + r(b)
Since the rank of l(a) is lower than 8, rank of l(a) * X is lower than 16.
Thus it cannot be divisible by p which has rank 16.
So: r(l(a) * X) = l(a) * X
This yields
= l(a) * X + r(h(a) * X * X + b)
h(a) * X * X is the polynomial representation of the high byte of 16 bit
word crc(A).
So in the world of bit patterns we have:
crc(byte string A expanded by byte B)
= (low_byte(crc(A)) << 8) ^ crc(high_byte(crc(A)) ^ B)
And this is what function crc_ccitt() does, modulo swapping the exor
operants and some C obfuscation.
*/
unsigned short crc_ccitt(unsigned char *q, int len)
{
unsigned short crc = 0;
while (len-- > 0)
crc = ccitt_table[(crc >> 8 ^ *q++) & 0xff] ^ (crc << 8);
return ~crc;
}
/* Exploration ts B00214 :
ECMA-130, 14.3 "EDC field"
"The EDC codeword must be divisible by the check polynomial:
P(x) = (x^16 + x^15 + x^2 + 1) . (x^16 + x^2 + x + 1)
"
Used for raw writing in sector.c
*/
unsigned int crc_32(unsigned char *data, int len)
{
unsigned int crc = 0;
@ -235,3 +623,6 @@ unsigned int crc_32(unsigned char *data, int len)
crc = crc32_table[(crc ^ *data++) & 0xffL] ^ (crc >> 8);
return crc;
}
*/

View File

@ -1,11 +1,21 @@
/* -*- indent-tabs-mode: t; tab-width: 8; c-basic-offset: 8; -*- */
/* Copyright (c) 2004 - 2006 Derek Foreman, Ben Jansens
Copyright (c) 2012 Thomas Schmitt <scdbackup@gmx.net>
Provided under GPL version 2 or later.
*/
#ifndef BURN__CRC_H
#define BURN__CRC_H
#ifdef Xorriso_standalonE
/* Source module crc.c of yet unclear ancestry is excluded from GNU xorriso */
/* ts B20219 : The functions have been re-implemented from scratch after
studying texts about CRC computation and understanding the
meaning of the underlying ECMA-130 specs.
Nevertheless, there is no need to include them into xorriso
because it does neither CD-TEXT nor raw CD writing.
*/
#ifndef Libburn_no_crc_C
#define Libburn_no_crc_C 1
#endif