From 565be458c74e0bb443e81761a0f950e7e04067b6 Mon Sep 17 00:00:00 2001 From: Thomas Schmitt Date: Thu, 2 Jan 2014 13:58:58 +0000 Subject: [PATCH] Clarified in man xorriso the roles of character sets --- xorriso/xorriso.1 | 44 ++++++++++------ xorriso/xorriso.info | 120 ++++++++++++++++++++++++------------------- xorriso/xorriso.texi | 47 +++++++++++------ 3 files changed, 126 insertions(+), 85 deletions(-) diff --git a/xorriso/xorriso.1 b/xorriso/xorriso.1 index 95284e3a..bec66fc1 100644 --- a/xorriso/xorriso.1 +++ b/xorriso/xorriso.1 @@ -9,7 +9,7 @@ .\" First parameter, NAME, should be all caps .\" Second parameter, SECTION, should be 1-8, maybe w/ subsection .\" other parameters are allowed: see man(7), man(1) -.TH XORRISO 1 "Version 1.3.5, Dec 28, 2013" +.TH XORRISO 1 "Version 1.3.5, Jan 02, 2014" .\" Please adjust this date whenever revising the manpage. .\" .\" Some roff macros, for reference: @@ -3308,30 +3308,42 @@ on differently nationalized terminals. The meanings of byte codes are defined in \fBcharacter sets\fR which have names. Shell command iconv \-l lists them. .br -Character sets should not matter as long as only english alphanumeric +The file names on hard disk are assumed to be encoded by the +\fBlocal character set\fR which is also used for the communication +with the user. +Byte codes 32 to 126 of the local character set must match the US\-ASCII +characters of the same code. ISO\-8859 and UTF\-8 fulfill this demand. +.br +By default, \fBxorriso\fR uses the character set as told by +shell command "locale" with argument "charmap". This may be influenced +by environment variables LC_ALL, LC_CTYPE, or LANG and should match the +expectations of the terminal. +In some situations it may be necessary to set it by command \-local_charset. +.br +Local character sets should not matter as long as only english alphanumeric characters are used for file names or as long as all writers and readers -of the media use the same character set. +of the media use the same local character set. Outside these constraints it may be necessary to let \fBxorriso\fR -convert byte codes. +convert byte codes from and to other character sets. .br -There is an input conversion from input character set to the local character -set which applies when an ISO image gets loaded. A conversion from local -character set to the output character set is performed when an -image tree gets written. The sets can be defined independently by commands +The Rock Ridge file names in ISO filesystems are assumed to be +encoded by the \fBinput character set\fR. +The Rock Ridge file names which get written with ISO filesystems will be +encoded by the \fBoutput character set\fR. +.br +The sets can be defined independently by commands \-in_charset and \-out_charset. Normally one will have both identical, if ever. +Other than the local character set, these two character sets may deviate +from US\-ASCII. .br -If conversions are desired then \fBxorriso\fR needs to know the name of the -local character set. \fBxorriso\fR can inquire the same info as -shell command -"locale" with argument "charmap". This may be influenced by environment -variables LC_ALL, LC_CTYPE, or LANG and should match the expectations of -the terminal. +The output character sets for Joliet and HFS+ are not influenced by these +commands. Joliet uses output character set UCS\-2 or UTF\-16. HFS+ uses UTF\-16. .br The default output charset is the local character set of the terminal where \fBxorriso\fR runs. So by default no conversion happens between local filesystem -names and emerging names in the image. The situation stays ambigous and the -reader has to riddle what character set was used. +names and emerging Rock Ridge names in the image. The situation stays +ambigous and the reader has to riddle what character set was used. .br By command \-auto_charset it is possible to attribute the output charset name to the image. This makes the situation unambigous. But if your terminal diff --git a/xorriso/xorriso.info b/xorriso/xorriso.info index 04f8eb97..ecbc82b5 100644 --- a/xorriso/xorriso.info +++ b/xorriso/xorriso.info @@ -2897,25 +2897,36 @@ the same byte string may appear as different peculiar national characters on differently nationalized terminals. The meanings of byte codes are defined in *character sets* which have names. Shell command iconv -l lists them. -Character sets should not matter as long as only english alphanumeric -characters are used for file names or as long as all writers and readers -of the media use the same character set. Outside these constraints it -may be necessary to let `xorriso' convert byte codes. -There is an input conversion from input character set to the local -character set which applies when an ISO image gets loaded. A conversion -from local character set to the output character set is performed when -an image tree gets written. The sets can be defined independently by -commands -in_charset and -out_charset. Normally one will have both -identical, if ever. -If conversions are desired then `xorriso' needs to know the name of the -local character set. `xorriso' can inquire the same info as shell -command "locale" with argument "charmap". This may be influenced by -environment variables LC_ALL, LC_CTYPE, or LANG and should match the -expectations of the terminal. +The file names on hard disk are assumed to be encoded by the *local +character set* which is also used for the communication with the user. +Byte codes 32 to 126 of the local character set must match the US-ASCII +characters of the same code. ISO-8859 and UTF-8 fulfill this demand. +By default, `xorriso' uses the character set as told by shell command +"locale" with argument "charmap". This may be influenced by environment +variables LC_ALL, LC_CTYPE, or LANG and should match the expectations +of the terminal. In some situations it may be necessary to set it by +command -local_charset. +Local character sets should not matter as long as only english +alphanumeric characters are used for file names or as long as all +writers and readers of the media use the same local character set. +Outside these constraints it may be necessary to let `xorriso' convert +byte codes from and to other character sets. +The Rock Ridge file names in ISO filesystems are assumed to be encoded +by the *input character set*. The Rock Ridge file names which get +written with ISO filesystems will be encoded by the *output character +set*. +The sets can be defined independently by commands -in_charset and +-out_charset. Normally one will have both identical, if ever. Other +than the local character set, these two character sets may deviate from +US-ASCII. +The output character sets for Joliet and HFS+ are not influenced by +these commands. Joliet uses output character set UCS-2 or UTF-16. HFS+ +uses UTF-16. The default output charset is the local character set of the terminal where `xorriso' runs. So by default no conversion happens between local -filesystem names and emerging names in the image. The situation stays -ambigous and the reader has to riddle what character set was used. +filesystem names and emerging Rock Ridge names in the image. The +situation stays ambigous and the reader has to riddle what character +set was used. By command -auto_charset it is possible to attribute the output charset name to the image. This makes the situation unambigous. But if your terminal character set does not match the character set of the local @@ -4902,7 +4913,7 @@ File: xorriso.info, Node: CommandIdx, Next: ConceptIdx, Prev: Legal, Up: Top * -cd sets working directory in ISO: Navigate. (line 7) * -cdx sets working directory on disk: Navigate. (line 16) * -changes_pending overrides change status: Writing. (line 13) -* -charset sets input/output character set: Charset. (line 43) +* -charset sets input/output character set: Charset. (line 54) * -check_md5 verifies file checksum: Verify. (line 154) * -check_md5_r verifies file tree checksums: Verify. (line 170) * -check_media reads media block by block: Verify. (line 21) @@ -4991,7 +5002,7 @@ File: xorriso.info, Node: CommandIdx, Next: ConceptIdx, Prev: Legal, Up: Top * -list_speeds lists available write speeds: Writing. (line 146) * -lns creates ISO symbolic link: Insert. (line 176) * -load addresses a particular session as input: Loading. (line 35) -* -local_charset sets terminal character set: Charset. (line 47) +* -local_charset sets terminal character set: Charset. (line 58) * -logfile logs output channels to file: Frontend. (line 20) * -ls lists files in ISO image: Navigate. (line 26) * -lsd lists files in ISO image: Navigate. (line 34) @@ -5138,10 +5149,10 @@ File: xorriso.info, Node: ConceptIdx, Prev: CommandIdx, Up: Top * cdrecord, Emulation: Emulation. (line 116) * Character Set, _definition: Charset. (line 6) * Character Set, for input, -in_charset: Loading. (line 116) -* Character Set, for input/output, -charset: Charset. (line 43) +* Character Set, for input/output, -charset: Charset. (line 54) * Character Set, for output, -out_charset: SetWrite. (line 276) * Character set, learn from image, -auto_charset: Loading. (line 122) -* Character Set, of terminal, -local_charset: Charset. (line 47) +* Character Set, of terminal, -local_charset: Charset. (line 58) * CHRP partition, _definition: Bootable. (line 158) * Closed media, _definition: Media. (line 43) * Comment, #: Scripting. (line 173) @@ -5227,6 +5238,7 @@ File: xorriso.info, Node: ConceptIdx, Prev: CommandIdx, Up: Top * Image, set volume set id, -volset_id: SetWrite. (line 185) * Image, set volume timestamp, -volume_date: SetWrite. (line 212) * Image, show id strings, -pvd_info: Inquiry. (line 115) +* Input Character Set, _definition: Charset. (line 25) * Insert, enable overwriting, -overwrite: SetInsert. (line 127) * Insert, file exclusion absolute, -not_paths: SetInsert. (line 55) * Insert, file exclusion from file, -not_list: SetInsert. (line 67) @@ -5255,6 +5267,7 @@ File: xorriso.info, Node: ConceptIdx, Prev: CommandIdx, Up: Top * Jigdo Template Extraction, _definition: Jigdo. (line 6) * LBA, _definition: Drives. (line 17) * List delimiter, _definition: Processing. (line 9) +* Local Character Set, _definition: Charset. (line 11) * MBR, _definition: Extras. (line 26) * MBR, set, -boot_image system_area=: Bootable. (line 126) * MD5, control handling, -md5: Loading. (line 183) @@ -5284,6 +5297,7 @@ File: xorriso.info, Node: ConceptIdx, Prev: CommandIdx, Up: Top * Navigate, tell disk working directory, -pwdx: Navigate. (line 23) * Navigate, tell ISO working directory, -pwd: Navigate. (line 20) * Next writeable address, -grow_blindly: AqDrive. (line 46) +* Output Character Set, _definition: Charset. (line 26) * Overwriteable media, _definition: Media. (line 14) * Ownership, global in ISO image, -uid: SetWrite. (line 282) * Ownership, in ISO image, -chown: Manip. (line 49) @@ -5426,37 +5440,37 @@ Node: SetWrite107824 Node: Bootable128409 Node: Jigdo144799 Node: Charset149046 -Node: Exception151808 -Node: DialogCtl157928 -Node: Inquiry160526 -Node: Navigate166843 -Node: Verify175141 -Node: Restore184173 -Node: Emulation191260 -Node: Scripting201562 -Node: Frontend209333 -Node: Examples218940 -Node: ExDevices220118 -Node: ExCreate220777 -Node: ExDialog222062 -Node: ExGrowing223327 -Node: ExModifying224132 -Node: ExBootable224636 -Node: ExCharset225188 -Node: ExPseudo226080 -Node: ExCdrecord226978 -Node: ExMkisofs227295 -Node: ExGrowisofs228635 -Node: ExException229770 -Node: ExTime230224 -Node: ExIncBackup230683 -Node: ExRestore234663 -Node: ExRecovery235596 -Node: Files236166 -Node: Seealso237465 -Node: Bugreport238188 -Node: Legal238769 -Node: CommandIdx239780 -Node: ConceptIdx256442 +Node: Exception152361 +Node: DialogCtl158481 +Node: Inquiry161079 +Node: Navigate167396 +Node: Verify175694 +Node: Restore184726 +Node: Emulation191813 +Node: Scripting202115 +Node: Frontend209886 +Node: Examples219493 +Node: ExDevices220671 +Node: ExCreate221330 +Node: ExDialog222615 +Node: ExGrowing223880 +Node: ExModifying224685 +Node: ExBootable225189 +Node: ExCharset225741 +Node: ExPseudo226633 +Node: ExCdrecord227531 +Node: ExMkisofs227848 +Node: ExGrowisofs229188 +Node: ExException230323 +Node: ExTime230777 +Node: ExIncBackup231236 +Node: ExRestore235216 +Node: ExRecovery236149 +Node: Files236719 +Node: Seealso238018 +Node: Bugreport238741 +Node: Legal239322 +Node: CommandIdx240333 +Node: ConceptIdx256995  End Tag Table diff --git a/xorriso/xorriso.texi b/xorriso/xorriso.texi index baed4f00..27fb406e 100644 --- a/xorriso/xorriso.texi +++ b/xorriso/xorriso.texi @@ -50,7 +50,7 @@ @c man .\" First parameter, NAME, should be all caps @c man .\" Second parameter, SECTION, should be 1-8, maybe w/ subsection @c man .\" other parameters are allowed: see man(7), man(1) -@c man .TH XORRISO 1 "Version 1.3.5, Dec 28, 2013" +@c man .TH XORRISO 1 "Version 1.3.5, Jan 02, 2014" @c man .\" Please adjust this date whenever revising the manpage. @c man .\" @c man .\" Some roff macros, for reference: @@ -3861,30 +3861,45 @@ on differently nationalized terminals. The meanings of byte codes are defined in @strong{character sets} which have names. Shell command iconv -l lists them. @* -Character sets should not matter as long as only english alphanumeric +@cindex Local Character Set, _definition +The file names on hard disk are assumed to be encoded by the +@strong{local character set} which is also used for the communication +with the user. +Byte codes 32 to 126 of the local character set must match the US-ASCII +characters of the same code. ISO-8859 and UTF-8 fulfill this demand. +@* +By default, @command{xorriso} uses the character set as told by +shell command "locale" with argument "charmap". This may be influenced +by environment variables LC_ALL, LC_CTYPE, or LANG and should match the +expectations of the terminal. +In some situations it may be necessary to set it by command -local_charset. +@* +Local character sets should not matter as long as only english alphanumeric characters are used for file names or as long as all writers and readers -of the media use the same character set. +of the media use the same local character set. Outside these constraints it may be necessary to let @command{xorriso} -convert byte codes. +convert byte codes from and to other character sets. @* -There is an input conversion from input character set to the local character -set which applies when an ISO image gets loaded. A conversion from local -character set to the output character set is performed when an -image tree gets written. The sets can be defined independently by commands +@cindex Input Character Set, _definition +The Rock Ridge file names in ISO filesystems are assumed to be +encoded by the @strong{input character set}. +@cindex Output Character Set, _definition +The Rock Ridge file names which get written with ISO filesystems will be +encoded by the @strong{output character set}. +@* +The sets can be defined independently by commands -in_charset and -out_charset. Normally one will have both identical, if ever. +Other than the local character set, these two character sets may deviate +from US-ASCII. @* -If conversions are desired then @command{xorriso} needs to know the name of the -local character set. @command{xorriso} can inquire the same info as -shell command -"locale" with argument "charmap". This may be influenced by environment -variables LC_ALL, LC_CTYPE, or LANG and should match the expectations of -the terminal. +The output character sets for Joliet and HFS+ are not influenced by these +commands. Joliet uses output character set UCS-2 or UTF-16. HFS+ uses UTF-16. @* The default output charset is the local character set of the terminal where @command{xorriso} runs. So by default no conversion happens between local filesystem -names and emerging names in the image. The situation stays ambigous and the -reader has to riddle what character set was used. +names and emerging Rock Ridge names in the image. The situation stays +ambigous and the reader has to riddle what character set was used. @* By command -auto_charset it is possible to attribute the output charset name to the image. This makes the situation unambigous. But if your terminal