HPNLS(7)

NAME

hpnls − HP Native Language Support (NLS) Model

SYNOPSIS

ls /usr/lib/nls/*

HP−UX COMPATIBILITY

Level: HP−UX/STANDARD

Origin: HP

DESCRIPTION

The HP Native Language Support (NLS) model provides several capabilities that reduce or eliminate the barriers that would otherwise make HP−UX difficult to use in a language other than English. Its three main categories, Character Set Support, Local Customs, and Messages, are subdivided into smaller categories to reflect the full extent of Native Language Support.

CHARACTER SET SUPPORT −
A major NLS objective is to provide capabilities for adapting character sequences to local language needs.

CHARACTER CODE SIZE −
The length of the character code governs the number of distinct characters that can be included in the character set.

7−BIT −
The ASCII character set consists of 33 control characters (including DEL), the space character, and 94 printable characters (see ascii(7)). This is enough to cover the Latin alphabet in upper and lowercase, plus punctuation and special symbols. Seven bits are sufficient to distinguish the 128 characters in such a set.

8−BIT −
An 8−bit character code allows 67 control codes, the space character, and 188 printable characters. For European languages, this provides enough room for accented vowels, consonants with special forms, and other special symbols (see roman8(7)). It is also sufficient to hold the phonetic Japanese character set Katakana (see kana8(7)).

16−BIT −
A number of languages have character sets far larger than the 188 printable characters that an 8−bit code provides. Sixteen−bit character codes are available for these languages. To simplify processing, each 16−bit printable character is formed from a pair of 8−bit printable characters (neither byte may be a control code or a space). This allows up to 188 × 188 = 35344 distinct characters.

CHARACTER TYPING −
Character processing that depends on character type must account for the fact that a character's type depends on the character set in use. For example, a code that is alphabetic in the ROMAN8 character set may be a punctuation character in the KANA8 set.

SHIFTING −
While the ROMAN8 character set provides uppercase and lowercase forms for most alphabetic characters, some languages drop accents when characters are shifted to uppercase. Other alphabetic characters may not be shifted at all, because the underlying language has no notion of "case".

COLLATING −
The ASCII collation order, while generally tolerated, is not adequate for American dictionary usage. Different languages sort characters from the ROMAN8 set in different orders. Some languages require that character pairs, such as "ch" and "ll" in Spanish, be sorted as single characters. Ideographic character sets may have multiple orderings. For example, Japanese kanji may be sorted in phonetic order, by the number of strokes in the ideogram, or first by the radical (root) of the character and then by the number of strokes added to the radical.

DIRECTIONALITY −
The assumption that displayed text runs from left to right does not hold for all languages. Some Middle Eastern languages are written from right to left, and Far Eastern languages are traditionally written in vertical columns, starting from the right.

CODING SCHEME CONSIDERATIONS −
Although most HP−supported 8−bit character sets preserve the ASCII codes in the range 0 to 127, 16−bit character sets may use these byte values within 2−byte characters. Software that assigns special meaning to bytes in this range (metacharacters) must therefore distinguish between 1−byte and 2−byte characters. In multilingual environments, standard escape sequences indicate a change to an alternate character set. Because these sequences are not normally printed or displayed, the number of characters output is usually smaller than the number of bytes in the sequence. Any software that must locate a character within a sequence must accommodate this.

LOCAL CUSTOMS −
Some aspects of Native Language Support relate more to local customs of a particular geographic location than to the characters used to write the language.

REPRESENTATION OF NUMBERS −
The character used to denote the radix point of a decimal number varies from region to region. Similarly, the use of a "thousands" separator, or the grouping of digits (usually in threes), may vary with local custom.

CURRENCY REPRESENTATION −
The symbol for currency varies from country to country. The symbol may either precede or follow the numeric value. Some currencies allow decimal fractions while others use alternate methods of representing smaller monetary values.

DATE AND TIME REPRESENTATION −
Month and weekday names vary with language (when they are not omitted entirely). Abbreviations may be longer or shorter than three characters, or may not be used at all. Even when a strictly numeric representation is used, neither the order of year, month, and day nor the delimiters that separate them are universal.

DATE AND TIME ADJUSTMENTS −
The HP−UX system clock runs on Greenwich Mean Time (GMT). Conversion to a local time zone consists of adding whole or fractional hours to GMT or subtracting them from it. The Gregorian calendar is most common, but some locales use different methods for determining meridian day and year, usually based on seasonal, astronomical, or historical events.

MESSAGES −
The need for messages to be readable by users is perhaps the most significant justification for implementing Native Language Support.

MESSAGE CONTENT −
Error messages, prompts, expected responses, and mnemonic command names should be based on the user’s native language.

MESSAGE STRUCTURE −
Messages must often be built from fragments. To accommodate grammatical differences between languages, it may be necessary to change the order in which the fragments are joined.

EXAMPLE

A fully localized version of pr(1) would:

Never strip the 8th bit of a character code.

Properly format the date in each page header.

Use the message catalog system to select user error messages.

FILES

/usr/lib/nls/*
