iconv_ibmkanji(5) — Macro Packages and Conventions

NAME

iconv_ibmkanji − Specification for controlling conversion between IBM Kanji and Tru64 UNIX Japanese codesets

DESCRIPTION

The iconv utility can convert character encodings between IBM Kanji System Characters (IBM Kanji) and one of the following Tru64 UNIX codesets: DEC Kanji, Super DEC Kanji, Japanese EUC, or Shift JIS. You choose the type of conversion by specifying the appropriate values for the utility’s from-code and to-code parameters, as follows:

Type of Code Conversion	from-code	to-code
IBM Kanji to DEC Kanji	ibmkanji	deckanji
IBM Kanji to Super DEC Kanji	ibmkanji	sdeckanji
IBM Kanji to Japanese EUC	ibmkanji	eucJP
IBM Kanji to Shift JIS	ibmkanji	SJIS
DEC Kanji to IBM Kanji	deckanji	ibmkanji
Super DEC Kanji to IBM Kanji	sdeckanji	ibmkanji
Japanese EUC to IBM Kanji	eucJP	ibmkanji
Shift JIS to IBM Kanji	SJIS	ibmkanji

Conversion behavior for the following items is affected by the definition of environment variables or profile entries in the user’s environment. For more information, see the “Environment Variables” and “Profile File” sections.

•The UDC (User-Defined Character) mapping table that is used for UDC conversion

This table must be an ASCII text file that contains UDC mapping information. The table affects conversion of user-defined characters between the codesets.

•The EBCDIC to/from ISO code (ASCII, JIS Roman characters) mapping table that is used for conversion

This table must be an ASCII text file that contains information on how to map characters between EBCDIC and ISO code.

•The K-shift code

This is a one- or two-byte hexadecimal code that marks the beginning of Kanji mode.

•The A-shift code

This is a one- or two-byte hexadecimal code that marks the beginning of EBCDIC mode.

•The status of the initial mode (Kanji or EBCDIC) at the time the iconv command starts, or the first time the iconv() function is called after a program initializes the converter by calling the iconv_open() function

The status keywords are either kanji_mode or ebcdic_mode.

•How to treat undefined characters when these are detected in Kanji mode

Specify this action by using one of the following keywords:

abort
Stop codeset conversion.

pass
Output the undefined characters without any processing and continue codeset conversion.

replace
Output padding characters instead of the undefined characters and continue codeset conversion.

dismiss
Ignore the undefined characters and continue codeset conversion.

•The two-byte padding character used in Kanji mode

This value is meaningful when replace is chosen for the processing of undefined characters in Kanji mode. Specify the padding character by its hexadecimal value.

•How to treat undefined characters when these are detected in EBCDIC mode

Specify this action by using one of the following keywords:

abort
Stop codeset conversion.

pass
Output the undefined characters without any processing and continue codeset conversion.

replace
Output padding characters instead of the undefined characters and continue codeset conversion.

dismiss
Ignore the undefined characters and continue codeset conversion.

•The one-byte padding character used in EBCDIC mode

This value is meaningful when replace is chosen for the processing of undefined characters in EBCDIC mode. Specify the padding character by its hexadecimal value.
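
As a rough illustration of how the K-shift code, the A-shift code, and the initial mode interact, the following Python sketch splits an IBM Kanji byte stream into EBCDIC and Kanji runs. It assumes the default one-byte shift codes (0x0e and 0x0f) and that every Kanji-mode character occupies two bytes. The name split_by_mode is hypothetical; this is a model of the description above, not the converter's own code.

```python
K_SHIFT = 0x0E  # default K-shift code: marks the beginning of Kanji mode
A_SHIFT = 0x0F  # default A-shift code: marks the beginning of EBCDIC mode


def split_by_mode(data, initial_state="ebcdic_mode"):
    """Return a list of (mode, payload) runs; shift codes are consumed."""
    runs = []
    mode = initial_state
    current = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in (K_SHIFT, A_SHIFT):
            # A shift code closes the current run and selects the next mode.
            if current:
                runs.append((mode, bytes(current)))
                current = bytearray()
            mode = "kanji_mode" if byte == K_SHIFT else "ebcdic_mode"
            i += 1
        elif mode == "kanji_mode":
            # Kanji-mode characters are taken two bytes at a time.
            current += data[i:i + 2]
            i += 2
        else:
            current.append(byte)
            i += 1
    if current:
        runs.append((mode, bytes(current)))
    return runs
```

With the default initial mode, input that begins with EBCDIC bytes needs no leading shift code. Input that begins in Kanji mode either starts with a K-shift code or is read with initial_state set to kanji_mode.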

When the to-code parameter for the conversion is ibmkanji, you can also specify the following items for conversion behavior:

•Whether the initial shift code is output at the start of conversion if the status of the initial mode (Kanji or EBCDIC) is different from the mode of the first input character

The start of conversion is the time the iconv utility starts processing, or when the iconv() function is called just after opening the converter with iconv_open(). Keyword values for this item are yes or no.

•Whether the utility outputs the last shift code when iconv() is called with a zero-length input string and the current mode (Kanji or EBCDIC) is different from the mode specified by the last status

Keyword values for this item are yes or no.

•The last status (Kanji mode or EBCDIC mode)

Specify kanji_mode or ebcdic_mode for this value. It is meaningful only when yes is the setting for whether the utility outputs the last shift code.

If the items that control conversion behavior are specified by both environment variables and the profile file, values set by environment variables override values set by comparable entries in the profile. Note that values for all conversion control items are case-sensitive, whether they are set by environment variables or in the profile. The following table contains the default values for each conversion control item:

Conversion Control Item	Default Value
UDC mapping table	None
K shift code	0x0e
A shift code	0x0f
Initial state	ebcdic_mode
Processing for undefined characters in Kanji mode	abort
Processing for undefined characters in EBCDIC mode	pass
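
The precedence rule (environment variable first, then profile entry, then default) can be sketched in Python as follows. The control-item and entry names are the ones documented in the "Environment Variables" and "Profile File" sections; the function name resolve and its parameters are hypothetical, and the profile is assumed to have been read into a dictionary already.

```python
# Defaults from the table above; the UDC mapping table has no default.
DEFAULTS = {
    "udc_mapping_table": None,
    "k_shift_code": "0x0e",
    "a_shift_code": "0x0f",
    "initial_state": "ebcdic_mode",
    "kanji_except_proc": "abort",
    "ebcdic_except_proc": "pass",
}

# Profile entry name -> control-item segment of the environment variable name.
ENV_SEGMENT = {
    "udc_mapping_table": "UDC_TABLE",
    "k_shift_code": "K_SHIFT_CODE",
    "a_shift_code": "A_SHIFT_CODE",
    "initial_state": "INITIAL_STATE",
    "kanji_except_proc": "KANJI_EXCEPT_PROC",
    "ebcdic_except_proc": "EBCDIC_EXCEPT_PROC",
}


def resolve(item, env_prefix, environ, profile):
    """Return the effective value of one conversion control item.

    env_prefix is the fromcode_tocode part of the variable name,
    for example "EUCJP_IBMKANJI".
    """
    env_name = env_prefix + "_" + ENV_SEGMENT[item]
    if env_name in environ:      # environment variables override the profile
        return environ[env_name]
    if item in profile:          # profile entries override the defaults
        return profile[item]
    return DEFAULTS[item]
```

Values are compared and returned exactly as written because control item values are case-sensitive.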

The default padding characters are white spaces, whose code values for each destination codeset are noted in the following table. These padding characters are output when you specify replace for processing of undefined characters and do not explicitly specify the padding character.

Mode	Default Value	Destination Codeset
Kanji mode	0x44e9	ibmkanji
Kanji mode	0xa1a1	deckanji, sdeckanji, or eucJP
Kanji mode	0x8140	SJIS
EBCDIC mode	0x40	ibmkanji
EBCDIC mode	0x20	deckanji, sdeckanji, eucJP, or SJIS
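
The four actions for undefined characters, together with the default padding values in the table above, can be modeled by a short Python sketch. The names handle_undefined and UndefinedCharacter are hypothetical, and raising an exception is only one way to represent "stop codeset conversion".

```python
# Default padding characters from the table above, keyed by
# (mode, destination codeset).
DEFAULT_PADDING = {
    ("kanji_mode", "ibmkanji"): bytes.fromhex("44e9"),
    ("kanji_mode", "deckanji"): bytes.fromhex("a1a1"),
    ("kanji_mode", "sdeckanji"): bytes.fromhex("a1a1"),
    ("kanji_mode", "eucJP"): bytes.fromhex("a1a1"),
    ("kanji_mode", "SJIS"): bytes.fromhex("8140"),
    ("ebcdic_mode", "ibmkanji"): bytes.fromhex("40"),
    ("ebcdic_mode", "deckanji"): bytes.fromhex("20"),
    ("ebcdic_mode", "sdeckanji"): bytes.fromhex("20"),
    ("ebcdic_mode", "eucJP"): bytes.fromhex("20"),
    ("ebcdic_mode", "SJIS"): bytes.fromhex("20"),
}


class UndefinedCharacter(Exception):
    """Raised by this sketch when the action is abort."""


def handle_undefined(raw, action, mode, to_code, padding=None):
    """Return the bytes to output for one undefined character."""
    if action == "abort":
        raise UndefinedCharacter(raw.hex())
    if action == "pass":
        return raw                      # output without any processing
    if action == "replace":
        if padding is not None:         # explicitly specified padding wins
            return padding
        return DEFAULT_PADDING[(mode, to_code)]
    if action == "dismiss":
        return b""                      # ignore the character
    raise ValueError("unknown action: " + action)
```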

The default EBCDIC−ISO mapping table is as follows:

•For conversion from IBM Kanji to other codesets: /usr/lib/nls/loc/iconv/data/ebcdic_kana.tbl

•For conversion from other codesets to IBM Kanji: /usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl

These mapping tables map between EBCDIC and ISO code, which includes JIS Roman characters. The kana_ebcdic.tbl mapping table also maps ISO lowercase characters to EBCDIC uppercase characters.

The following default values for conversion control items are meaningful when the iconv utility’s to-code conversion parameter is ibmkanji:

Conversion Control Item	Default
Output the initial shift code?	yes
Output the last shift code?	yes
Last status	ebcdic_mode
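
One possible reading of these three items, when the to-code is ibmkanji, is sketched below in Python. The sketch takes already converted runs of (mode, payload), inserts shift codes between them, and treats the end of the run list as the zero-length iconv() call that flushes the last shift code. The function name emit_ibmkanji and its parameter names are hypothetical, the default one-byte shift codes are assumed, and the converter may differ in detail.

```python
# Shift code that enters each mode (default values).
SHIFT_INTO = {"kanji_mode": b"\x0e", "ebcdic_mode": b"\x0f"}


def emit_ibmkanji(runs, initial_state="ebcdic_mode", initial_shift="yes",
                  trailer_shift="yes", last_state="ebcdic_mode"):
    """Join (mode, payload) runs into an IBM Kanji byte string."""
    out = bytearray()
    mode = initial_state
    for index, (run_mode, payload) in enumerate(runs):
        if run_mode != mode:
            # Only the very first shift code is optional.
            if index > 0 or initial_shift == "yes":
                out += SHIFT_INTO[run_mode]
            mode = run_mode
        out += payload
    # The last shift code returns the stream to the requested last status.
    if trailer_shift == "yes" and mode != last_state:
        out += SHIFT_INTO[last_state]
    return bytes(out)
```

With the defaults, output that consists only of Kanji characters is bracketed by 0x0e and 0x0f, so the stream starts and ends in EBCDIC mode.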

Environment Variables

This section discusses the environment variables that you can set to control conversion behavior. The names for these variables adhere to the following format:

fromcode_tocode_controlitem

The name segments for fromcode or tocode can be one of the following keywords:

For Codeset:	Use:
IBM Kanji	IBMKANJI
DEC Kanji	DECKANJI
Super DEC Kanji	SDECKANJI
Japanese EUC	EUCJP
Shift JIS	SJIS

The name segments for controlitem can be one of the following keywords:

For Control Item:	Use:
UDC mapping table	UDC_TABLE
EBCDIC-ISO mapping table	EBCDIC_TABLE
K shift code	K_SHIFT_CODE
A shift code	A_SHIFT_CODE
Initial state	INITIAL_STATE
Processing of undefined characters in Kanji mode	KANJI_EXCEPT_PROC
Processing of undefined characters in EBCDIC mode	EBCDIC_EXCEPT_PROC
Padding characters in Kanji mode	PADDING_2BYTE_CHAR
Padding characters in EBCDIC mode	PADDING_1BYTE_CHAR
Output initial shift code	INITIAL_SHIFT_CODE
Output last shift code	TRAILER_SHIFT_CODE
Last status	LAST_STATE
File path of the profile	PROFILE

Following are examples of using the setenv C shell command to define environment variables to control conversion behavior. In these examples, the fromcode name segment indicates Japanese EUC and the tocode name segment indicates IBM Kanji:

setenv EUCJP_IBMKANJI_UDC_TABLE eucjp_ibmkanji_udc.tbl
setenv EUCJP_IBMKANJI_EBCDIC_TABLE kana_ebcdic.tbl
setenv EUCJP_IBMKANJI_K_SHIFT_CODE 0x0e
setenv EUCJP_IBMKANJI_A_SHIFT_CODE 0x0f
setenv EUCJP_IBMKANJI_INITIAL_STATE ebcdic_mode
setenv EUCJP_IBMKANJI_KANJI_EXCEPT_PROC replace
setenv EUCJP_IBMKANJI_EBCDIC_EXCEPT_PROC replace
setenv EUCJP_IBMKANJI_PADDING_2BYTE_CHAR 0x44e9
setenv EUCJP_IBMKANJI_PADDING_1BYTE_CHAR 0x40
setenv EUCJP_IBMKANJI_INITIAL_SHIFT_CODE yes
setenv EUCJP_IBMKANJI_TRAILER_SHIFT_CODE yes
setenv EUCJP_IBMKANJI_LAST_STATE ebcdic_mode
setenv EUCJP_IBMKANJI_PROFILE .eucjp_ibmkanji_profile

Directory Search Path

When you specify a file name without a directory, the iconv utility searches the following directories and uses the first file found:

1.Current directory

2.Home directory

3.The iconv/data subdirectory of the directory specified by the environment variable LOCPATH

4./usr/lib/nls/loc/iconv/data

5./usr/i18n/lib/nls/loc/iconv/data

If you specify a relative directory path for a file, the utility searches these same directories in the same order and uses the first file found.
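
The search order can be expressed as a small Python sketch. It assumes that the home directory is taken from the HOME environment variable and that LOCPATH names a single directory; both are assumptions, and the function names search_dirs and find_file are hypothetical.

```python
import os

# The two fixed system directories, in search order.
SYSTEM_DIRS = (
    "/usr/lib/nls/loc/iconv/data",
    "/usr/i18n/lib/nls/loc/iconv/data",
)


def search_dirs(environ, cwd):
    """Return the directories to search, in order."""
    dirs = [cwd]                                   # 1. current directory
    home = environ.get("HOME")
    if home:
        dirs.append(home)                          # 2. home directory
    locpath = environ.get("LOCPATH")
    if locpath:
        dirs.append(os.path.join(locpath, "iconv", "data"))  # 3. LOCPATH
    dirs.extend(SYSTEM_DIRS)                       # 4. and 5.
    return dirs


def find_file(name, environ, cwd):
    """Return the first existing match for a bare or relative file name."""
    for directory in search_dirs(environ, cwd):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Because the current directory is searched first, a table or profile placed there shadows a file of the same name in the home directory or in the system directories.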

Profile File

Entry lines in the profile file adhere to the following format:

entry_name string_value

The entry_name and string_value fields are separated by spaces or tabs. Do not append a colon (:) after entry_name. The file can also include blank lines and comment entries, which begin with the # character.

Following are the entry_name values for different conversion control items:

Conversion Control Item	entry_name
UDC mapping table	udc_mapping_table
EBCDIC−ISO mapping table	ebcdic_mapping_table
K shift code	k_shift_code
A shift code	a_shift_code
Initial state	initial_state
Processing undefined characters in Kanji mode	kanji_except_proc
Processing undefined characters in EBCDIC mode	ebcdic_except_proc
Padding character in Kanji mode	padding_2byte_char
Padding character in EBCDIC mode	padding_1byte_char
Output initial shift code	output_initial_shift_code
Output last shift code	output_trailer_shift_code
Last state	last_state

Following is a sample profile for converting from Japanese EUC to IBM Kanji.

#
# sample profile for eucJP_ibmkanji
#
udc_mapping_table          eucjp_ibmkanji_udc.tbl
ebcdic_mapping_table       kana_ebcdic.tbl
k_shift_code               0x0e         # ebcdic -> kanji
a_shift_code               0x0f         # kanji -> ebcdic
initial_state              ebcdic_mode
kanji_except_proc          replace
ebcdic_except_proc         replace
padding_2byte_char         0x44e9       # kanji mode
padding_1byte_char         0x40         # ebcdic mode
output_initial_shift_code yes
output_trailer_shift_code yes
last_state                 ebcdic_mode
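
A profile in this format is simple to read programmatically. The following Python sketch parses entry lines into a dictionary. It assumes, based on the sample above, that a # character also starts a comment when it follows a value on the same line. The function name parse_profile is hypothetical.

```python
def parse_profile(text):
    """Parse profile text into a dict of entry_name -> string_value."""
    entries = {}
    for line in text.splitlines():
        # Drop comments (assumed to run from '#' to end of line).
        line = line.split("#", 1)[0].strip()
        if not line:
            continue                      # blank line or comment-only line
        fields = line.split(None, 1)      # split on spaces or tabs
        if len(fields) != 2:
            raise ValueError("expected 'entry_name string_value': " + line)
        name, value = fields
        entries[name] = value.strip()
    return entries


SAMPLE_PROFILE = """\
#
# sample profile for eucJP_ibmkanji
#
udc_mapping_table          eucjp_ibmkanji_udc.tbl
k_shift_code               0x0e         # ebcdic -> kanji
kanji_except_proc          replace
output_initial_shift_code yes
"""

profile = parse_profile(SAMPLE_PROFILE)
```

Values are kept as strings exactly as written, since keyword values such as replace and ebcdic_mode are case-sensitive.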

The default file names for the profile are as follows:

Code Conversion	Default Profile Name

IBM Kanji to DEC Kanji	.ibmkanji_deckanji_profile
IBM Kanji to Super DEC Kanji	.ibmkanji_sdeckanji_profile
IBM Kanji to Shift JIS	.ibmkanji_sjis_profile
IBM Kanji to Japanese EUC	.ibmkanji_eucjp_profile

DEC Kanji to IBM Kanji	.deckanji_ibmkanji_profile
Super DEC Kanji to IBM Kanji	.sdeckanji_ibmkanji_profile
Shift JIS to IBM Kanji	.sjis_ibmkanji_profile
Japanese EUC to IBM Kanji	.eucjp_ibmkanji_profile

By default, the iconv utility checks the directory search path mentioned in the "Directory Search Path" section and uses the first profile it finds. However, you can also specify an arbitrary file path for your profile instead of the default names by defining the following environment variables:

Code Conversion	Profile Path Environment Variable
IBM Kanji to DEC Kanji	IBMKANJI_DECKANJI_PROFILE
IBM Kanji to Super DEC Kanji	IBMKANJI_SDECKANJI_PROFILE
IBM Kanji to Shift JIS	IBMKANJI_SJIS_PROFILE
IBM Kanji to Japanese EUC	IBMKANJI_EUCJP_PROFILE

DEC Kanji to IBM Kanji	DECKANJI_IBMKANJI_PROFILE
Super DEC Kanji to IBM Kanji	SDECKANJI_IBMKANJI_PROFILE
Shift JIS to IBM Kanji	SJIS_IBMKANJI_PROFILE
Japanese EUC to IBM Kanji	EUCJP_IBMKANJI_PROFILE
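
Both the default profile names and the environment variable names in the tables above follow one pattern, which the Python sketch below reproduces. The function names are hypothetical. The file name that the sketch returns would still be located through the directory search path when it has no directory part.

```python
# iconv code name -> name segment used in variable and profile names.
SEGMENT = {
    "ibmkanji": "IBMKANJI",
    "deckanji": "DECKANJI",
    "sdeckanji": "SDECKANJI",
    "eucJP": "EUCJP",
    "SJIS": "SJIS",
}


def default_profile_name(from_code, to_code):
    """Return the default profile name, e.g. .eucjp_ibmkanji_profile."""
    return ".{}_{}_profile".format(
        SEGMENT[from_code].lower(), SEGMENT[to_code].lower())


def profile_env_var(from_code, to_code):
    """Return the variable that overrides the profile path."""
    return "{}_{}_PROFILE".format(SEGMENT[from_code], SEGMENT[to_code])


def profile_file_name(from_code, to_code, environ):
    """Return the profile path from the environment, else the default name."""
    return environ.get(profile_env_var(from_code, to_code),
                       default_profile_name(from_code, to_code))
```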

UDC Mapping Table

Entries in a UDC mapping table adhere to the following format:

fromcode tocode

Each of these values is a two-byte hexadecimal number. In the case of Super DEC Kanji and Japanese EUC, three-byte hexadecimal values that begin with SS3 (0x8f), such as 0x8fxxxx, are also valid.

You can specify ranges of UDC from and to values in the same file entry by using a hyphen to separate the codes that start and end each range:

start_fromcode-end_fromcode start_tocode-end_tocode

When specifying entries that include ranges of values, the number of codes in the from range must always equal the number of codes in the to range. A UDC mapping table can also include blank lines and comment lines, which begin with the # character. Following is an example of a UDC mapping table:

# ibmkanji            eucJP
0x6941-0x72fe         0xf5a1-0xfefe           # udc
0x7341-0x7cfe         0x8ff5a1-0x8ffefe       # udc
0x7d41-0x7ffe         0x8feea1-0x8ff0fe       # udc

The first entry in this file specifies a range of IBM Kanji values from 0x6941 to 0x72fe that are mapped to Japanese EUC code values in the range 0xf5a1 to 0xfefe. You can find additional sample UDC mapping table files in the /usr/i18n/examples/iconv/data directory.
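
The entry format can be read with a few lines of Python, as sketched below. The sketch only records the start and end codes of each range. How the converter enumerates the codes inside a multibyte range is not described here, so the sketch deliberately does not expand ranges or check that the from and to ranges contain the same number of codes. The function names are hypothetical.

```python
def parse_range(field):
    """Parse 'start-end' or a single code into a (first, last) tuple."""
    start, _, end = field.partition("-")
    first = int(start, 16)
    last = int(end, 16) if end else first
    return first, last


def parse_udc_table(text):
    """Return a list of ((from_first, from_last), (to_first, to_last))."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        from_field, to_field = line.split()
        entries.append((parse_range(from_field), parse_range(to_field)))
    return entries


SAMPLE_UDC = """\
# ibmkanji            eucJP
0x6941-0x72fe         0xf5a1-0xfefe           # udc
0x7341-0x7cfe         0x8ff5a1-0x8ffefe       # udc
0x7d41-0x7ffe         0x8feea1-0x8ff0fe       # udc
"""

udc_entries = parse_udc_table(SAMPLE_UDC)
```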

EBCDIC−ISO Mapping Table

Entries in an EBCDIC−ISO mapping table adhere to the following format:

fromcode tocode

Each code is a one-byte hexadecimal number. You can specify a range of character codes as follows:

start_fromcode-end_fromcode start_tocode-end_tocode

When using the range format, the number of hex values in the from range must be the same as the number of hex values in the to range.

The EBCDIC−ISO mapping table can also include blank lines and comment entries, which begin with the # character.

Following is an example of an EBCDIC−ISO mapping table:

# EBCDIC                Kana
0x40                    0x20            # space
0x4f                    0x21            # ’!’
0x7f                    0x22            # ’"’
.                       .
.                       .
.                       .
0xc1-0xc9               0x41-0x49       # ’A’ - ’I’
0xd1-0xd9               0x4a-0x52       # ’J’ - ’R’
0xe2-0xe9               0x53-0x5a       # ’S’ - ’Z’
.                       .
.                       .
.                       .

In this example, the first column contains the from codes and the second column contains the to codes. The first three value entry lines specify mapping for single characters, whereas the last three value entry lines specify mapping for ranges of characters. You can find additional sample EBCDIC−ISO mapping tables in the /usr/i18n/lib/nls/loc/iconv/data directory.
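
Because every code in this table is a single byte, a range can be expanded one value at a time. The Python sketch below builds a byte-to-byte dictionary from a few of the entries shown above and enforces the rule that both ranges of an entry have the same length. The function names are hypothetical, and the sample omits the entries that the example elides.

```python
def parse_range(field):
    """Parse 'start-end' or a single code into a (first, last) tuple."""
    start, _, end = field.partition("-")
    first = int(start, 16)
    last = int(end, 16) if end else first
    return first, last


def parse_byte_table(text):
    """Return a dict mapping each from code to its to code."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        from_field, to_field = line.split()
        from_first, from_last = parse_range(from_field)
        to_first, to_last = parse_range(to_field)
        if from_last - from_first != to_last - to_first:
            raise ValueError("range lengths differ: " + line)
        for offset in range(from_last - from_first + 1):
            table[from_first + offset] = to_first + offset
    return table


SAMPLE_TABLE = """\
# EBCDIC                Kana
0x40                    0x20            # space
0x4f                    0x21            # exclamation mark
0xc1-0xc9               0x41-0x49       # A - I
0xd1-0xd9               0x4a-0x52       # J - R
0xe2-0xe9               0x53-0x5a       # S - Z
"""

ebcdic_to_iso = parse_byte_table(SAMPLE_TABLE)
```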

NOTES

This reference page contains code conversion specifications that apply only to conversion between IBM Kanji System characters and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to iconv_JEF(5) for code conversion specifications between Fujitsu JEF characters and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to iconv_KEIS(5) for code conversion specifications between Hitachi KEIS characters and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to iconv_intro(5) for information about conversion between DEC Kanji, Super DEC Kanji, Japanese EUC, Shift JIS, and other Tru64 UNIX codesets.
