Description of CD-TEXT
Guided by Leon Merten Lohse via libcdio-devel@gnu.org
by reading mmc3r10g.pdf from http://www.t10.org/ftp/t10/drafts/mmc3/
by docs and results of cdtext.zip from http://www.sonydadc.com/file/
by reading source of libcdio from http://www.gnu.org/s/libcdio
which quotes source of cdrecord from ftp://ftp.berlios.de/pub/cdrecord/alpha
Language codes were learned from http://tech.ebu.ch/docs/tech/tech3264.pdf
Genre codes were learned from libcdio and confirmed by
http://helpdesk.audiofile-engineering.com/index.php?pg=kb.page&id=123
For libburnia-project.org by Thomas Schmitt <scdbackup@gmx.net>
Content:
- CD-TEXT from the view of the user
- Content specifications of particular pack types
- Format of a CD-TEXT packs array
- Overview of libburn API calls for CD-TEXT
-------------------------------------------------------------------------------
CD-TEXT from the view of the user:
CD-TEXT records attributes of disc and tracks on audio CD.
The attributes are grouped into blocks which represent particular languages.
Up to 8 blocks are possible.
There are 13 defined attribute categories, which are called Pack Types and are
identified by a single-byte code:
0x80 = Title
0x81 = Names of performers
0x82 = Names of Songwriters
0x83 = Names of Composers
0x84 = Names of Arrangers
0x85 = Messages
0x86 = text-and-binary: Disc Identification
0x87 = text-and-binary: Genre Identification
0x88 = binary: Table of Content information
0x89 = binary: Second Table of Content information
(0x8a to 0x8c are reserved.)
0x8d = Closed Information
0x8e = UPC/EAN code of the album and ISRC code of each track
0x8f = binary: Size Information of the Block
Some of these categories apply to the whole disc only:
0x86, 0x87, 0x88, 0x89, 0x8d
Some have to be additionally attributed to each track, if they are present for
the whole disc:
0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x8e
One describes the overall content of a block and in part of all other blocks:
0x8f
The total size of a block's attribute set is restricted by the fact that it
has to be stored in at most 253 records with 12 bytes of payload. These records
are called Text Packs.
A shortcut for repeated identical track texts is provided, so that a text
that is identical to the one of the previous track occupies only 2 or 4 bytes.
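The classification above can be written down as a small C sketch. The enum
and function names are made up for this illustration and are not taken from
libburn.h.

```c
#include <stdbool.h>

/* Pack type codes as listed above. The identifiers are hypothetical. */
enum cdtext_pack_type {
    CDTEXT_TITLE      = 0x80,
    CDTEXT_PERFORMER  = 0x81,
    CDTEXT_SONGWRITER = 0x82,
    CDTEXT_COMPOSER   = 0x83,
    CDTEXT_ARRANGER   = 0x84,
    CDTEXT_MESSAGE    = 0x85,
    CDTEXT_DISC_ID    = 0x86,
    CDTEXT_GENRE      = 0x87,
    CDTEXT_TOC        = 0x88,
    CDTEXT_TOC2       = 0x89,
    CDTEXT_CLOSED     = 0x8d,
    CDTEXT_UPC_ISRC   = 0x8e,
    CDTEXT_BLOCKSIZE  = 0x8f
};

/* True for the types which carry one text per track in addition to the
   text for the whole disc: 0x80 to 0x85 and 0x8e. */
static bool cdtext_type_is_per_track(int pack_type)
{
    return (pack_type >= 0x80 && pack_type <= 0x85) || pack_type == 0x8e;
}

/* True for the 13 defined types, false for the reserved 0x8a to 0x8c
   and for anything outside 0x80 to 0x8f. */
static bool cdtext_type_is_defined(int pack_type)
{
    if (pack_type < 0x80 || pack_type > 0x8f)
        return false;
    return pack_type < 0x8a || pack_type > 0x8c;
}
```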
-------------------------------------------------------------------------------
Content specification of particular pack types:
Pack types 0x80 to 0x85 and 0x8e contain 0-terminated cleartext. If double byte
characters are used, then two 0-bytes terminate the cleartext.
The meaning of 0x80 to 0x85 should be clear from the list above. They are
encoded according to the Character Code of their block: either as ISO-8859-1
single byte characters, or as 7-bit ASCII single byte characters, or as MS-JIS
double byte characters.
More information about 0x8e is given below.
Pack type 0x86 (Disc Identification) is documented by Sony as "Catalog Number:
(use ASCII Code) Catalog Number of the album". So it is not really binary
but might be non-printable, and should contain only bytes with bit7 = 0.
Pack type 0x87 contains 2 binary bytes, followed by 0-terminated cleartext.
The two binary bytes form a big-endian index to the following list.
0x0000 = "Not Used" (Sony prescribes to use this if no genre applies)
0x0001 = "Not Defined"
0x0002 = "Adult Contemporary"
0x0003 = "Alternative Rock"
0x0004 = "Childrens Music"
0x0005 = "Classical"
0x0006 = "Contemporary Christian"
0x0007 = "Country"
0x0008 = "Dance"
0x0009 = "Easy Listening"
0x000a = "Erotic"
0x000b = "Folk"
0x000c = "Gospel"
0x000d = "Hip Hop"
0x000e = "Jazz"
0x000f = "Latin"
0x0010 = "Musical"
0x0011 = "New Age"
0x0012 = "Opera"
0x0013 = "Operetta"
0x0014 = "Pop Music"
0x0015 = "Rap"
0x0016 = "Reggae"
0x0017 = "Rock Music"
0x0018 = "Rhythm & Blues"
0x0019 = "Sound Effects"
0x001a = "Spoken Word"
0x001b = "World Music"
Sony documents the cleartext part as "Genre information that would supplement
the Genre Code, such as 'USA Rock music in the 60s'". Always ASCII encoded.
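A possible way to interpret the payload of pack type 0x87 is sketched below.
The function names are hypothetical. The sketch assumes that the payload has
already been collected from the packs and that it holds at least the two
binary bytes.

```c
#include <stddef.h>

/* Genre names in the order of their codes 0x0000 to 0x001b */
static const char *const cdtext_genres[] = {
    "Not Used", "Not Defined", "Adult Contemporary", "Alternative Rock",
    "Childrens Music", "Classical", "Contemporary Christian", "Country",
    "Dance", "Easy Listening", "Erotic", "Folk",
    "Gospel", "Hip Hop", "Jazz", "Latin",
    "Musical", "New Age", "Opera", "Operetta",
    "Pop Music", "Rap", "Reggae", "Rock Music",
    "Rhythm & Blues", "Sound Effects", "Spoken Word", "World Music"
};

/* The first two payload bytes form the big-endian genre code */
static int cdtext_genre_code(const unsigned char *payload)
{
    return (payload[0] << 8) | payload[1];
}

/* Returns the genre name, or NULL if the code is not in the list above */
static const char *cdtext_genre_name(int code)
{
    size_t count = sizeof(cdtext_genres) / sizeof(cdtext_genres[0]);

    if (code < 0 || (size_t) code >= count)
        return NULL;
    return cdtext_genres[code];
}
```

The supplementary cleartext begins at payload + 2 and is 0-terminated.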
Pack type 0x88 records information from the CD's Table of Content, as of
READ PMA/TOC/ATIP Format 0010b (mmc5r03c.pdf, table 490 TOC Track Descriptor
Format, Q Sub-channel).
See below, Format of CD-TEXT packs, for more details about the content of
pack type 0x88.
Pack type 0x89 is still quite unclear. See below, Format of CD-TEXT packs, for
an example of this pack type.
Pack type 0x8d is documented by Sony as "Closed Information: (use 8859-1 Code)
Any information can be recorded on disc as memorandum. Information in this
field will not be read by CD TEXT players available to the public."
Always ISO-8859-1 encoded.
Pack type 0x8e is documented by Sony as "UPC/EAN Code (POS Code) of the album.
This field typically consists of 13 characters." Always ASCII encoded.
Pack type 0x8f summarizes the whole list of text packs of a block.
See below, Format of CD-TEXT packs, for details.
-------------------------------------------------------------------------------
Format of a CD-TEXT packs array:
The attributes are represented on CD as Text Packs in the sub-channel of
the Lead-in of the disc.
The format is explained in part in MMC-3 (mmc3r10g.pdf, Annex J) and in part by
the documentation of Sony's cdtext.zip.
Each pack consists of a 4-byte header, 12 bytes of payload, and 2 bytes of CRC.
The first byte of each pack tells the pack type. See above for a list of types.
The second byte tells the track number to which the first text piece in
a pack is associated. Number 0 means the whole album. Higher numbers are
valid for types 0x80 to 0x85, and 0x8e. With these types, there should be
one text for the disc and one for each track.
With types 0x88 and 0x89, the second byte bears a track number, too.
With type 0x8f, the second byte counts the record parts from 0 to 2.
The third byte is a sequential counter.
The fourth byte is the Block Number and Character Position Indicator.
It consists of three bit fields:
bit7 = Double Bytes Character Code (0= single byte characters)
bit4-6 = Block Number (groups text packs in language blocks)
bit0-3 = Character position. Either the number of characters which
the current text inherited from the previous pack, or
15 if the current text started before the previous pack.
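The four header bytes could be taken apart as in the following sketch. The
struct and function names are hypothetical, not libburn API.

```c
/* Decoded form of the 4-byte pack header */
struct cdtext_pack_header {
    int pack_type;    /* byte 0: 0x80 to 0x8f */
    int track_no;     /* byte 1: track number, or part 0 to 2 with 0x8f */
    int seq_no;       /* byte 2: sequential counter */
    int double_byte;  /* byte 3, bit7: 1 = double byte characters */
    int block_no;     /* byte 3, bit4-6: block number 0 to 7 */
    int char_pos;     /* byte 3, bit0-3: character position, 15 = overflow */
};

static struct cdtext_pack_header cdtext_parse_header(const unsigned char *pack)
{
    struct cdtext_pack_header h;

    h.pack_type = pack[0];
    h.track_no = pack[1];
    h.seq_no = pack[2];
    h.double_byte = (pack[3] >> 7) & 1;
    h.block_no = (pack[3] >> 4) & 7;
    h.char_pos = pack[3] & 15;
    return h;
}
```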
The 12 payload bytes contain pieces of 0-terminated texts or binary data.
A text may span over several packs. Unused characters in a pack are used for
the next text of the same pack type. If no text of the same type follows,
then the remaining text bytes are set to 0.
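Because texts may cross pack borders, a reader has to concatenate the
payloads of all packs of one type and block before splitting at the 0-bytes.
The sketch below shows this for single byte texts only. The function names
are hypothetical, and the packs are assumed to be stored as consecutive
18-byte records.

```c
#include <stddef.h>
#include <string.h>

/* Appends the 12 payload bytes of each matching pack to out.
   Returns the number of bytes collected, or 0 if out is too small. */
static size_t cdtext_collect_payload(const unsigned char *packs, int num_packs,
                                     int pack_type, int block,
                                     unsigned char *out, size_t out_size)
{
    size_t used = 0;

    for (int i = 0; i < num_packs; i++) {
        const unsigned char *pack = packs + 18 * (size_t) i;

        if (pack[0] != pack_type || ((pack[3] >> 4) & 7) != block)
            continue;
        if (out_size - used < 12)
            return 0;
        memcpy(out + used, pack + 4, 12);
        used += 12;
    }
    return used;
}

/* Returns the n-th 0-terminated text (counted from 0) in the collected
   payload, or NULL if there are not that many terminated texts.
   Padding 0-bytes at the end show up as empty texts. */
static const char *cdtext_nth_text(const unsigned char *buf, size_t len, int n)
{
    size_t start = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            continue;
        if (n == 0)
            return (const char *) (buf + start);
        n--;
        start = i + 1;
    }
    return NULL;
}
```

With pack types 0x80 to 0x85 and 0x8e, text number 0 would belong to the
whole disc and text number N to track N, provided that the first track has
number 1.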
The CRC algorithm uses divisor 0x11021. The resulting 16-bit residue of the
polynomial division is inverted (XORed with 0xffff) and written as big-endian
number to bytes 16 and 17 of the pack.
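A bitwise sketch of this computation follows. It is not the libburn
implementation. Two details are assumptions which the text above does not
state: the remainder starts at 0, and the bits of each byte are processed
most significant bit first. With these assumptions the function reproduces
the CRC bytes 11 45 of the example pack
8f 02 2c 00 00 00 00 00 09 00 00 00 00 00 00 00 11 45 which is shown with
pack type 0x8f in this document.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes the inverted 16-bit residue over len bytes.
   Assumptions: initial remainder 0, most significant bit first. */
static uint16_t cdtext_crc(const unsigned char *data, size_t len)
{
    uint16_t rem = 0;

    for (size_t i = 0; i < len; i++) {
        rem ^= (uint16_t) (data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            /* 0x1021 is divisor 0x11021 without its topmost bit */
            if (rem & 0x8000)
                rem = (uint16_t) ((rem << 1) ^ 0x1021);
            else
                rem = (uint16_t) (rem << 1);
        }
    }
    return (uint16_t) (rem ^ 0xffff);
}

/* Computes the CRC over header and payload (bytes 0 to 15) and stores
   it as big-endian number in bytes 16 and 17 of the pack. */
static void cdtext_store_crc(unsigned char pack[18])
{
    uint16_t crc = cdtext_crc(pack, 16);

    pack[16] = (unsigned char) (crc >> 8);
    pack[17] = (unsigned char) (crc & 0xff);
}
```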
The text packs are grouped in up to 8 blocks of at most 256 packs. Each block
is in charge of one language. Sequence numbers of each block are counted
separately. All packs of block 0 come before the packs of block 1.
The limits of block number and sequence number imply that there are at
most 2048 text packs possible. (READ TOC/PMA/ATIP could retrieve 3640 packs,
as it is limited to 64 kB - 2.)
If a text of a track (pack types 0x80 to 0x85 and 0x8e) repeats identically
for the next track, then it may be represented by a TAB character (ASCII 9)
for single byte texts, or by two TAB characters for double byte texts.
(This should be used because 256 * 12 bytes is little space for 99 tracks.)
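The TAB shortcut could be applied while serializing the texts of one pack
type, as in the following sketch for single byte texts. The function name is
hypothetical. Two choices are assumptions of this sketch and not prescribed
by the text above: each entry is compared with the entry before it in the
list, and the shortcut is only used if it is really shorter than the text.

```c
#include <stddef.h>
#include <string.h>

/* Writes the 0-terminated texts one after another into out. A text which
   repeats its predecessor is replaced by a single TAB character.
   Returns the number of bytes used, or 0 if out is too small. */
static size_t cdtext_pack_texts(const char *const texts[], size_t count,
                                unsigned char *out, size_t out_size)
{
    size_t used = 0;

    for (size_t i = 0; i < count; i++) {
        const char *t = texts[i];
        size_t need;

        /* TAB plus 0-byte needs 2 bytes, so replace longer texts only */
        if (i > 0 && strlen(t) > 1 && strcmp(t, texts[i - 1]) == 0)
            t = "\t";
        need = strlen(t) + 1;            /* text plus 0-terminator */
        if (need > out_size - used)
            return 0;
        memcpy(out + used, t, need);
        used += need;
    }
    return used;
}
```

The resulting byte string would then be cut into pieces of 12 bytes, one for
each text pack.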
The two binary bytes of pack type 0x87 are written to the first 0x87 pack of
a block. They may or may not be repeated at the start of the follow-up packs
of type 0x87.
The first pack of type 0x88 in a block records in its payload bytes:
0 : PMIN of POINT A0 = First Track Number
1 : PMIN of POINT A1 = Last Track Number
2 : unknown, 0 in Sony example
3 : PMIN of POINT A2 = Start position of Lead-Out
4 : PSEC of POINT A2 = Start position of Lead-Out
5 : PFRAME of POINT A2 = Start position of Lead-Out
6 to 11 : unknown, 0 in Sony example
The following packs record PMIN, PSEC, PFRAME of the POINTs between the
lowest track number (min 01h) and the highest track number (max 63h).
The payload of the last pack is padded by 0s.
The Sony .TOC example:
A0 01
A1 14
A2 63:02:18
01 00:02:00
02 04:11:25
03 08:02:50
04 11:47:62
...
13 53:24:25
14 57:03:25
yields
88 00 23 00 01 0e 00 3f 02 12 00 00 00 00 00 00 12 00
88 01 24 00 00 02 00 04 0b 19 08 02 32 0b 2f 3e 67 2d
...
88 0d 27 00 35 18 19 39 03 19 00 00 00 00 00 00 ea af
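The Sony example can be decoded by a sketch like the following one. The
names are hypothetical. It assumes that the packs of type 0x88 of one block
are handed over as consecutive 18-byte records, with the first pack of that
type at the start, and that each follow-up pack holds up to four time points
of 3 bytes each, as seen in the example.

```c
#include <string.h>

struct cdtext_msf {
    int min, sec, frame;
};

struct cdtext_toc {
    int first_track, last_track;
    struct cdtext_msf leadout;
    struct cdtext_msf track_start[100];  /* index = track number 1 to 99 */
};

/* Returns 1 on success, 0 if a pack is not of type 0x88 */
static int cdtext_decode_toc(const unsigned char *packs, int num_packs,
                             struct cdtext_toc *toc)
{
    memset(toc, 0, sizeof(*toc));
    for (int i = 0; i < num_packs; i++) {
        const unsigned char *pack = packs + 18 * i;
        const unsigned char *payload = pack + 4;

        if (pack[0] != 0x88)
            return 0;
        if (i == 0) {
            /* First pack: track range and start of Lead-Out */
            toc->first_track = payload[0];
            toc->last_track = payload[1];
            toc->leadout.min = payload[3];
            toc->leadout.sec = payload[4];
            toc->leadout.frame = payload[5];
            continue;
        }
        /* Follow-up pack: byte 1 is the track number of the first point */
        for (int k = 0; k < 4; k++) {
            int track = pack[1] + k;

            if (track < 1 || track > 99 || track > toc->last_track)
                break;                   /* the rest is padding */
            toc->track_start[track].min = payload[3 * k];
            toc->track_start[track].sec = payload[3 * k + 1];
            toc->track_start[track].frame = payload[3 * k + 2];
        }
    }
    return 1;
}
```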
Pack type 0x89 is still quite unclear, especially what the information shall
mean to the user of the CD. The time points in the Sony example are in the
time range of the track numbers that are given before the time points:
01 02:41:48 01 02:52:58
06 23:14:25 06 23:29:60
07 28:30:39 07 28:42:30
13 55:13:26 13 55:31:50
yields
89 01 28 00 01 04 00 00 00 00 02 29 30 02 34 3a f3 0c
89 06 29 00 02 04 00 00 00 00 17 0e 19 17 1d 3c 73 92
89 07 2a 00 03 04 00 00 00 00 1c 1e 27 1c 2a 1e 72 20
89 0d 2b 00 04 04 00 00 00 00 37 0d 1a 37 1f 32 0b 62
The track numbers are stored in the track number byte of the packs. The two
time points are stored in bytes 6 to 11 of the payload. Byte 0 of the payload
seems to be a sequential counter. Byte 1 seems to be always 4, and bytes 2 to 5
seem to be always 0.
Pack type 0x8f summarizes the whole list of text packs of a block.
So there is one group of three 0x8f packs per block.
Nevertheless each 0x8f group tells the highest sequence number and the
language code of all blocks.
The payload bytes of three 0x8f packs form a 36 byte record. The track number
bytes of the three packs have the values 0, 1, 2.
Byte :
0 : Character code for pack types 0x80 to 0x85:
0x00 = ISO-8859-1
0x01 = 7 bit ASCII
0x80 = MS-JIS (Japanese Kanji, double byte characters)
1 : Number of first track
2 : Number of last track
3 : libcdio source states: "cd-text information copyright byte"
Probably 3 means "copyrighted", 0 means "not copyrighted".
4 - 19 : Pack count of the various types 0x80 to 0x8f.
Byte number N tells the count of packs of type 0x80 + (N - 4).
I.e. the first byte in this field of 16 counts packs of type 0x80.
20 - 27 : Highest sequence byte number of blocks 0 to 7.
28 - 35 : Language code for blocks 0 to 7 (tech3264.pdf appendix 3)
Not all of these Codes have ever been seen with CD-TEXT, though.
0x00 = Unknown
0x01 = Albanian
0x02 = Breton
0x03 = Catalan
0x04 = Croatian
0x05 = Welsh
0x06 = Czech
0x07 = Danish
0x08 = German
0x09 = English
0x0a = Spanish
0x0b = Esperanto
0x0c = Estonian
0x0d = Basque
0x0e = Faroese
0x0f = French
0x10 = Frisian
0x11 = Irish
0x12 = Gaelic
0x13 = Galician
0x14 = Icelandic
0x15 = Italian
0x16 = Lappish
0x17 = Latin
0x18 = Latvian
0x19 = Luxembourgian
0x1a = Lithuanian
0x1b = Hungarian
0x1c = Maltese
0x1d = Dutch
0x1e = Norwegian
0x1f = Occitan
0x20 = Polish
0x21 = Portuguese
0x22 = Romanian
0x23 = Romansh
0x24 = Serbian
0x25 = Slovak
0x26 = Slovenian
0x27 = Finnish
0x28 = Swedish
0x29 = Turkish
0x2a = Flemish
0x2b = Wallon
0x45 = Zulu
0x46 = Vietnamese
0x47 = Uzbek
0x48 = Urdu
0x49 = Ukrainian
0x4a = Thai
0x4b = Telugu
0x4c = Tatar
0x4d = Tamil
0x4e = Tadzhik
0x4f = Swahili
0x50 = Sranan Tongo
0x51 = Somali
0x52 = Sinhalese
0x53 = Shona
0x54 = Serbo-croat
0x55 = Ruthenian
0x56 = Russian
0x57 = Quechua
0x58 = Pushtu
0x59 = Punjabi
0x5a = Persian
0x5b = Papamiento
0x5c = Oriya
0x5d = Nepali
0x5e = Ndebele
0x5f = Marathi
0x60 = Moldavian
0x61 = Malaysian
0x62 = Malagasay
0x63 = Macedonian
0x64 = Laotian
0x65 = Korean
0x66 = Khmer
0x67 = Kazakh
0x68 = Kannada
0x69 = Japanese
0x6a = Indonesian
0x6b = Hindi
0x6c = Hebrew
0x6d = Hausa
0x6e = Gurani
0x6f = Gujurati
0x70 = Greek
0x71 = Georgian
0x72 = Fulani
0x73 = Dari
0x74 = Churash
0x75 = Chinese
0x76 = Burmese
0x77 = Bulgarian
0x78 = Bengali
0x79 = Bielorussian
0x7a = Bambora
0x7b = Azerbaijani
0x7c = Assamese
0x7d = Armenian
0x7e = Arabic
0x7f = Amharic
E.g. these three packs
42 : 8f 00 2a 00 01 01 03 00 06 05 04 05 07 06 01 02 48 65
43 : 8f 01 2b 00 00 00 00 00 00 00 06 03 2c 00 00 00 c0 20
44 : 8f 02 2c 00 00 00 00 00 09 00 00 00 00 00 00 00 11 45
decode to
Byte :Value Meaning
0 : 01 = ASCII 7-bit
1 : 01 = first track is 1
2 : 03 = last track is 3
3 : 00 = copyright (0 = public domain, 3 = copyrighted ?)
4 : 06 = 6 packs of type 0x80
5 : 05 = 5 packs of type 0x81
6 : 04 = 4 packs of type 0x82
7 : 05 = 5 packs of type 0x83
8 : 07 = 7 packs of type 0x84
9 : 06 = 6 packs of type 0x85
10 : 01 = 1 pack of type 0x86
11 : 02 = 2 packs of type 0x87
12 : 00 = 0 packs of type 0x88
13 : 00 = 0 packs of type 0x89
14 : 00 00 00 00 = 0 packs of types 0x8a to 0x8d
18 : 06 = 6 packs of type 0x8e
19 : 03 = 3 packs of type 0x8f
20 : 2c = last sequence for block 0
This matches the sequence number of the last text pack (0x2c = 44)
21 : 00 00 00 00 00 00 00 = last sequence numbers for block 1..7 (none)
28 : 09 = language code for block 0: English
29 : 00 00 00 00 00 00 00 = language codes for block 1..7 (none)
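The decoding of the three example packs can be retraced by the following
sketch. The struct and function names are hypothetical, not libburn API. The
three 0x8f packs of a block are expected as 3 * 18 consecutive bytes.

```c
#include <string.h>

struct cdtext_block_info {
    int char_code;       /* byte 0 */
    int first_track;     /* byte 1 */
    int last_track;      /* byte 2 */
    int copyright;       /* byte 3 */
    int pack_count[16];  /* bytes 4 to 19, index = pack type - 0x80 */
    int last_seq[8];     /* bytes 20 to 27, index = block number */
    int language[8];     /* bytes 28 to 35, index = block number */
};

/* Returns 1 on success, 0 if the packs are not 0x8f parts 0, 1, 2 */
static int cdtext_decode_block_info(const unsigned char *packs,
                                    struct cdtext_block_info *info)
{
    unsigned char rec[36];

    /* Join the three payloads to the 36 byte record */
    for (int i = 0; i < 3; i++) {
        const unsigned char *pack = packs + 18 * i;

        if (pack[0] != 0x8f || pack[1] != i)
            return 0;
        memcpy(rec + 12 * i, pack + 4, 12);
    }
    info->char_code = rec[0];
    info->first_track = rec[1];
    info->last_track = rec[2];
    info->copyright = rec[3];
    for (int i = 0; i < 16; i++)
        info->pack_count[i] = rec[4 + i];
    for (int i = 0; i < 8; i++) {
        info->last_seq[i] = rec[20 + i];
        info->language[i] = rec[28 + i];
    }
    return 1;
}
```

In the example, the 16 pack counts add up to 45. This is consistent with the
highest sequence number 0x2c = 44 of block 0, because counting begins at 0.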
-------------------------------------------------------------------------------
libburn API calls for CD-TEXT (see libburn/libburn.h for details):
libburn can retrieve the set of text packs from a CD:
int burn_disc_get_leadin_text(struct burn_drive *d,
unsigned char **text_packs, int *num_packs,
int flag);
It can write a text pack set with a CD SAO session.
This set may be attached as an array of pre-formatted text packs by:
int burn_write_opts_set_leadin_text(struct burn_write_opts *opts,
unsigned char *text_packs,
int num_packs, int flag);
Alternatively it may be defined by attaching CD-TEXT attributes to burn_session
and burn_track:
int burn_session_set_cdtext_par(struct burn_session *s,
int char_codes[8], int copyrights[8],
int languages[8], int flag);
int burn_session_set_cdtext(struct burn_session *s, int block,
int pack_type, char *pack_type_name,
unsigned char *payload, int length, int flag);
int burn_track_set_cdtext(struct burn_track *t, int block,
int pack_type, char *pack_type_name,
unsigned char *payload, int length, int flag);
These attributes can then be converted into an array of text packs by:
int burn_cdtext_from_session(struct burn_session *s,
unsigned char **text_packs, int *num_packs,
int flag);
or they can be written as an array of text packs to CD when burning begins and
no array of pre-formatted packs was attached to the write options by
burn_write_opts_set_leadin_text().
There are calls for inspecting the attached attributes:
int burn_session_get_cdtext_par(struct burn_session *s,
int char_codes[8], int copyrights[8],
int block_languages[8], int flag);
int burn_session_get_cdtext(struct burn_session *s, int block,
int pack_type, char *pack_type_name,
unsigned char **payload, int *length, int flag);
int burn_track_get_cdtext(struct burn_track *t, int block,
int pack_type, char *pack_type_name,
unsigned char **payload, int *length, int flag);
and for removing attached attributes:
int burn_session_dispose_cdtext(struct burn_session *s, int block);
int burn_track_dispose_cdtext(struct burn_track *t, int block);