code_page(5) — Macro Packages and Conventions

NAME

code_page, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861, cp862, cp863, cp865, cp866, cp869, cp874, cp949, dingbats, symbol − Coded character sets that are used on Microsoft Windows and NT systems

DESCRIPTION

Code pages are coded character sets that are used on Microsoft Windows, Windows 95, and NT systems. Just as there are different UNIX codesets, there are different PC code pages, each of which encodes a particular set of characters.

A DIGITAL UNIX system supplies one locale, en_US.cp850, that directly supports a PC code-page format (MS-DOS Latin 1). For all other locales, data in code-page format is supported only through codeset converters. These converters can be run directly by users or invoked by applications that exchange data between PC and DIGITAL UNIX systems. Fonts and other kinds of character support are available only for the native UNIX codeset to which a code page can be converted. See the i18n_intro(5) reference page for introductory information on locales and codesets. See the iconv_intro(5) reference page for an introduction to codeset conversion and for the name format and location of codeset converters.
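
As a rough illustration of what a codeset converter does, the following Python sketch converts cp850 data to ISO8859-1. It relies on Python's built-in codec tables, not on the DIGITAL UNIX converters, so individual mappings may differ in detail. The helper name convert_codeset is hypothetical and does not appear in any DIGITAL UNIX interface.

```python
def convert_codeset(data: bytes, from_code: str, to_code: str) -> bytes:
    """Convert bytes from one codeset to another.

    Hypothetical helper: decodes to Unicode with Python's own codec
    tables, then re-encodes in the target codeset.
    """
    text = data.decode(from_code)
    return text.encode(to_code)


# The letter e-acute is byte 0x82 in cp850 and byte 0xE9 in ISO8859-1.
pc_bytes = b"caf\x82"
unix_bytes = convert_codeset(pc_bytes, "cp850", "iso8859_1")
print(unix_bytes)
```

The same helper works for any pair of codesets that Python knows by name, provided that every character in the input exists in the target codeset.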

The following table lists and describes the code pages that have conversion support on a DIGITAL UNIX system:

Code Page	Description
cp437	MS-DOS United States
cp737	Greek
cp775	Baltic languages
cp850	MS-DOS Multilingual (Latin 1)
cp852	MS-DOS Slavic (Latin 2)
cp855	IBM Cyrillic
cp857	IBM Turkish
cp860	MS-DOS Portuguese
cp861	MS-DOS Icelandic
cp862	Hebrew
cp863	MS-DOS Canadian French
cp865	MS-DOS Nordic languages
cp866	MS-DOS Russian
cp869	IBM Modern Greek
cp874	Thai
cp949	Korean
dingbats	Microsoft dingbat characters
symbol	Microsoft miscellaneous symbol characters

Every code page listed above can be converted to and from the UCS-2, UCS-4, and UTF-8 codesets. In addition, some code pages can be converted directly to ISO codesets, as shown in the following table:

Code Page	Can Be Converted Directly to:
cp437	ISO8859-1
cp737	ISO8859-7
cp775	ISO8859-4
cp850	ISO8859-1
cp852	ISO8859-2
cp855	ISO8859-5
cp857	ISO8859-9
cp860	ISO8859-1
cp861	ISO8859-1
cp862	ISO8859-8
cp863	ISO8859-1
cp865	ISO8859-1
cp866	ISO8859-5
cp869	ISO8859-7
cp874	TACTIS

See Unicode(5) for information about UCS-2, UCS-4, and UTF-8. Reference pages for UNIX implementations of the ISO codesets have the name format iso8859-number(5).
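
The difference between converting to a Unicode-based codeset and converting directly to an ISO codeset can be sketched in Python. This example assumes Python's built-in cp437 and ISO8859-1 tables as stand-ins for the DIGITAL UNIX converters, and the helper names via_utf8 and unmappable are hypothetical. It shows that a round trip through UTF-8 preserves every byte, while some cp437 characters (here the box-drawing character U+2500) have no ISO8859-1 equivalent. How a DIGITAL UNIX converter treats such characters is not described on this page.

```python
def via_utf8(data: bytes, code_page: str) -> bytes:
    """Convert code-page bytes to UTF-8 (hypothetical helper)."""
    return data.decode(code_page).encode("utf-8")


def unmappable(data: bytes, code_page: str, target: str) -> list:
    """Return the characters in data that the target codeset cannot encode."""
    missing = []
    for ch in data.decode(code_page):
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# cp437 sample: 0xC4 is a box-drawing line (U+2500), 0x82 is e-acute.
sample = b"\xc4 r\x82sum\x82 \xc4"

utf8 = via_utf8(sample, "cp437")
# Converting back from UTF-8 restores the original bytes.
assert utf8.decode("utf-8").encode("cp437") == sample

# Characters that a direct conversion to ISO8859-1 cannot represent.
print(unmappable(sample, "cp437", "iso8859_1"))
```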

There are no codeset converters for Chinese and Japanese code pages because identical character encoding is provided in existing UNIX codesets. For Traditional Chinese, character encoding in PC code pages is identical to that in the Big-5 (big5) codeset. For Simplified Chinese, character encoding in PC code pages is identical to that in the DEC Hanzi (dechanzi) codeset. For Japanese, character encoding in PC code pages is identical to that in the Shift JIS (SJIS) codeset.

Caution

Converting text from the cp949 (Korean) code page to the DEC Korean (deckorean) codeset may result in loss of data. Every DIGITAL UNIX codeset equivalent for cp949 supports all the Hanja and miscellaneous characters that the code page supports. However, only the UCS-2, UCS-4, and UTF-8 codesets support the complete set of Hangul characters in cp949; the deckorean codeset supports only a subset of them. Therefore, no data is lost when data is converted from cp949 format to UCS-2, UCS-4, or UTF-8. However, if that data is then converted from UCS-2, UCS-4, or UTF-8 to deckorean, the unsupported Hangul characters are lost.
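
One way to check text for Hangul characters at risk before converting to deckorean is sketched below in Python. The sketch rests on an assumption that this page does not confirm: that the Hangul subset supported by deckorean corresponds to the KS C 5601 range, in which both bytes of the cp949 encoding fall between 0xA1 and 0xFE. The helper names are hypothetical, and the check uses Python's cp949 codec, not a DIGITAL UNIX converter.

```python
def in_ksc5601_range(ch: str) -> bool:
    """True if ch encodes in cp949 as one byte, or as two bytes that
    both lie in 0xA1-0xFE (assumed here to match the deckorean subset).

    Raises UnicodeEncodeError if ch is not in cp949 at all.
    """
    raw = ch.encode("cp949")
    return len(raw) == 1 or all(0xA1 <= b <= 0xFE for b in raw)


def hangul_at_risk(text: str) -> list:
    """Return the characters of text that fall outside the assumed subset."""
    return [ch for ch in text if not in_ksc5601_range(ch)]


# Four Hangul syllables; the first (U+B620) lies in the cp949 extension area.
title = "\ub620\ubc29\uac01\ud558"
print(hangul_at_risk(title))
```

An empty result means that, under the stated assumption, every character has a two-byte KS C 5601 encoding and should survive the conversion.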
