hpnls(5)

NAME

hpnls − HP Native Language Support (NLS) Model

DESCRIPTION

Native Language Support (NLS) reduces or eliminates the barriers that would otherwise make HP-UX difficult to use in a non-English-speaking work environment.  NLS is available at the user-command level as well as through commands and libraries that can be used to develop international software applications. 

Many existing C library routines have been modified to operate based upon a program’s locale.  A locale is the run-time NLS environment of a program, which is loaded by setlocale() (see setlocale(3C)).  For a complete list of the library routines affected by setlocale(), see setlocale(3C).
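As an illustration of loading a locale, the following is a minimal sketch that uses only the standard C setlocale() interface. The helper names load_user_locale and current_locale are hypothetical, and the locale names that HP-UX accepts are described in setlocale(3C), not here.

```c
#include <locale.h>

/* Load the locale requested by the environment variables.
   Returns the name of the locale now in effect, or NULL if the
   request could not be honored (the locale is then unchanged). */
const char *load_user_locale(void)
{
    return setlocale(LC_ALL, "");
}

/* Query the locale currently in effect without changing it.
   A program that has not yet called setlocale() runs in the "C" locale. */
const char *current_locale(void)
{
    return setlocale(LC_ALL, NULL);
}
```

A program would typically call load_user_locale() once at startup, before using any routine whose behavior depends on the locale.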

In addition to routines that operate based on the program’s locale, there are also commands and routines to provide a messaging system for accessing program messages based on the language requirements of the end-user. 

Many HP-UX commands have been modified to operate in a manner sensitive to the language requirements of the end-user.  These language requirements are established through the internationalization environment variables (see environ(5)). The EXTERNAL INFLUENCES/Environment Variables section of the manual entry for each command that has NLS capabilities describes which environment variables the command is sensitive to. 

In addition, the portnls routines are a set of library routines that perform miscellaneous language-dependent operations.  portnls is intended to provide portability between HP-UX and MPE (another HP operating system).  See portnls(5) for more information.

Below are areas of functionality that are considered language-sensitive:

Character Handling
NLS provides for handling characters outside the 7-bit USASCII codeset.  Most languages require a minimum of 8 bits to support all the characters needed to communicate in that language.  Characters must be handled according to the requirements of the language they represent. 

Codesets with 8-bit characters have been defined to support phonetic languages, such as the Western European languages.  The use of an 8-bit character allows for an additional 128 characters beyond the USASCII codeset. 

More than 8 bits are needed to uniquely define codes for characters required by ideographic languages such as Japanese.  For such languages, multibyte codesets are used in which a character is represented by a sequence of one or more bytes.  Multibyte codesets are defined according to the rules of a multibyte encoding scheme.  Encoding schemes define the particular sequences of byte values that can be used to form characters.  The EUC encoding scheme is supported by HP-UX.  However, only the one- and two-byte forms of EUC are currently supported.  Refer to the Native Language Support User’s Guide for more information about EUC. 
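To show why byte counts and character counts differ in a multibyte codeset, here is a simplified sketch that counts characters in a string holding one- and two-byte characters. It assumes that a byte in the range 0xA1 to 0xFE starts a two-byte character and that every other byte is a single-byte character; it does not validate the second byte and ignores the single-shift forms of EUC. The function name euc_char_count is hypothetical and is not an HP-UX routine.

```c
#include <stddef.h>

/* Count the characters in a NUL-terminated string under a simplified
   one- and two-byte model (an assumption for illustration):
     0xA1..0xFE  starts a two-byte character
     otherwise   the byte is a complete single-byte character
   Returns (size_t)-1 if the string ends in the middle of a character. */
size_t euc_char_count(const unsigned char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        if (*s >= 0xA1 && *s <= 0xFE) {
            if (s[1] == '\0')
                return (size_t)-1;   /* first byte without a second byte */
            s += 2;
        } else {
            s += 1;
        }
        n++;
    }
    return n;
}
```

Portable programs would normally rely on the multibyte routines of the C library instead of inspecting byte values directly.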

Character Classification
Characters have many attributes associated with them. For example, characters may be classified as printable, alphabetic, numeric, etc. These attributes are commonly referred to as ctype characteristics. Characters and their associated attributes differ between languages. Character processing that depends on character classification must be sensitive to these differences.
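A short sketch of locale-sensitive classification follows, using the standard isalpha() routine from <ctype.h>. The function name count_alpha is hypothetical. Which characters count as alphabetic depends on the locale in effect; in the default "C" locale only the USASCII letters qualify.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the bytes of s that the current locale classifies as alphabetic.
   The cast to unsigned char is required: passing a negative char value
   to isalpha() is undefined behavior. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;
    return n;
}
```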

Shifting
The notion of uppercase and lowercase differs between languages. For example, in some languages accents are discarded when characters are shifted to uppercase. Some languages have no notion of uppercase and lowercase characters. For example, shifting a character has no effect in ideographic languages.
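Shifting can be sketched with the standard toupper() routine, which maps a character according to the current locale and returns it unchanged when no uppercase form exists. The function name shift_upper is hypothetical, and this sketch handles single-byte characters only.

```c
#include <ctype.h>

/* Shift every single-byte character of s to uppercase, in place,
   according to the current locale. Characters with no uppercase
   equivalent are left as they are. */
void shift_upper(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```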

Collating
Collating sequences differ between languages and most languages require multiple collating sequences. The following collation features are available to provide a full “dictionary-” or “context-based” language-dependent comparison:

Two-to-one conversions
Some languages, such as Spanish, require two adjacent characters to occupy one position in the collating sequence. Examples are CH (which follows C) and LL (which follows L). 

One-to-two conversions
Some languages, such as German, require one character (such as “sharp S”) to occupy two adjacent positions in the collating sequence.

Don’t-care characters
Some languages designate certain characters to be ignored in character comparisons. For example, if - is a don’t-care character, the strings REACT and RE-ACT would compare as equal. 

Uppercase/lowercase and accent priority
Many languages require a “two-pass” collating algorithm. In the first pass, accents are stripped from their letters and the resulting two strings are compared. If they are equal, a second pass with the accents reinserted is performed to break the tie. Uppercase/lowercase differences can also be first ignored then used to break ties in this fashion.
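The don’t-care and two-pass ideas above can be sketched in a toy comparison routine. This is not the HP-UX collation algorithm: it assumes USASCII text, treats only - as a don’t-care character, and uses uppercase/lowercase (rather than accents) as the tie-breaking attribute. The name two_pass_compare is hypothetical.

```c
#include <ctype.h>

/* Skip don't-care characters; only '-' is ignored (an assumption). */
static const char *skip_ignored(const char *s)
{
    while (*s == '-')
        s++;
    return s;
}

/* Compare a and b. The first pass ignores case; case is consulted only
   if the strings are otherwise equal. Both passes are folded into one
   scan by remembering the first case difference seen.
   Returns a negative, zero, or positive value like strcmp(). */
int two_pass_compare(const char *a, const char *b)
{
    int tie = 0;   /* first case difference, used only to break a tie */

    for (;;) {
        a = skip_ignored(a);
        b = skip_ignored(b);
        if (*a == '\0' || *b == '\0')
            break;

        unsigned char ca = (unsigned char)*a;
        unsigned char cb = (unsigned char)*b;
        int la = tolower(ca);
        int lb = tolower(cb);

        if (la != lb)
            return la < lb ? -1 : 1;   /* first pass: letters differ */
        if (tie == 0 && ca != cb)
            tie = ca < cb ? -1 : 1;    /* remembered for the second pass */
        a++;
        b++;
    }
    if (*a != '\0')
        return 1;                      /* a has characters left over */
    if (*b != '\0')
        return -1;                     /* b has characters left over */
    return tie;
}
```

With this routine REACT and RE-ACT compare as equal, and apple sorts before Banana even though a plain byte comparison would place Banana first.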

Two common methods of collation for phonetic languages are folded and nonfolded (also called unfolded). A folded collating sequence is made up of the uppercase and lowercase characters intermixed. An unfolded collating sequence is made up of all the uppercase characters followed by the lowercase characters. For example, collating the characters a b c A B C with folded collation would result in the following order:

A a B b C c

Collating the same characters with unfolded collation would result in the following order:

A B C a b c

For languages in which folded and unfolded collation methods are defined, HP-UX uses folded as the default.  The setlocale modifier nofold can be used to enable the nonfolded collating method (see environ(5)). The nlsinfo command reports the collating methods supported for each language (see nlsinfo(1)).
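The two orderings shown above can be reproduced with a small sketch built on the standard qsort() routine. It assumes USASCII letters only and does not use the HP-UX collation tables; the names cmp_folded, cmp_nonfolded, and sort_chars are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Folded: compare letters without regard to case; when two characters
   are the same letter, the uppercase form sorts first. */
static int cmp_folded(const void *pa, const void *pb)
{
    unsigned char a = *(const unsigned char *)pa;
    unsigned char b = *(const unsigned char *)pb;
    int la = tolower(a);
    int lb = tolower(b);

    if (la != lb)
        return la < lb ? -1 : 1;
    return (a > b) - (a < b);
}

/* Nonfolded: plain code order. In USASCII every uppercase letter
   has a lower code than every lowercase letter. */
static int cmp_nonfolded(const void *pa, const void *pb)
{
    unsigned char a = *(const unsigned char *)pa;
    unsigned char b = *(const unsigned char *)pb;

    return (a > b) - (a < b);
}

/* Sort the characters of s in place using the chosen method. */
void sort_chars(char *s, int folded)
{
    qsort(s, strlen(s), 1, folded ? cmp_folded : cmp_nonfolded);
}
```

Sorting the characters a b c A B C with folded set to a nonzero value gives A a B b C c; with folded set to zero it gives A B C a b c.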

Directionality
Two properties of text files and Native Languages must be understood to process text in non-Western languages. They are the mode of the language, and the order of the characters.

Mode refers to the direction in which a language is naturally read.  European languages read from left to right, some Middle Eastern languages read from right to left, and Far Eastern languages usually use vertical columns, beginning from the right. 

Order describes the order in which characters are written, stored in a file, or displayed.  Keyboard order refers to the order of keystrokes by a user.  Screen order refers to the order in which characters are displayed on a terminal screen or printed. 

Screen order can differ from keyboard order when using a terminal that supports mixing Latin and non-Latin text, each requiring different directionality.  In the following example, the text mode is right-to-left; n represents a non-Latin character, l represents a Latin character, and the numbers represent the order in which the sequence is typed. 

In keyboard order, the letters would be stored in a file as follows:

n1 n2 n3 l4 l5 l6

In screen order, the letters would be stored in a file as follows:

n1 n2 n3 l6 l5 l4

However, both screen-order and keyboard-order sequences would look identical on the screen because the terminal would be configured to display the characters properly according to the directionality requirements of both the Latin and non-Latin languages. 
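The relationship between the two stored forms in the example above can be sketched as reversing each run of Latin characters within right-to-left text. This is only an illustration of the example, not an HP-UX routine: the name keyboard_to_screen is hypothetical, and lowercase USASCII letters stand in for Latin characters while every other byte stands in for a non-Latin character.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in classification (an assumption): lowercase USASCII letters
   play the role of Latin characters, everything else is non-Latin. */
static int is_latin(char c)
{
    return islower((unsigned char)c);
}

/* Convert s, in place, from keyboard order to screen order for
   right-to-left mode by reversing each run of Latin characters. */
void keyboard_to_screen(char *s)
{
    size_t len = strlen(s);
    size_t i = 0;

    while (i < len) {
        if (!is_latin(s[i])) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < len && is_latin(s[i]))
            i++;
        /* reverse the run s[start] .. s[i - 1] */
        for (size_t lo = start, hi = i - 1; lo < hi; lo++, hi--) {
            char t = s[lo];
            s[lo] = s[hi];
            s[hi] = t;
        }
    }
}
```

Because reversing a run twice restores it, applying the same function to screen-order text yields keyboard order again.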

Local Customs
NLS supports customs that are specific to a particular geographic region such as representation of numeric and monetary data, date, and time.  These customs can differ not only between languages, but also between regions that share a common language. 

Representation of numbers
The character used to denote the radix of a decimal number varies for different regions. Similarly, the use of a "thousands" indicator or grouping of digits can vary with local custom. Characters used to represent digits can also vary for different regions.

Monetary representation
The currency symbol and the formatting of monetary values vary from country to country. For instance, the symbol can either precede or follow the monetary value. Some currencies allow decimal fractions while others use alternate methods of representing smaller monetary values.

Date and time representation
While the Gregorian calendar is most common, some countries use other methods for determining meridian day and year, usually based on seasonal, astronomical, or historical events. Month and weekday names as well as the format of date and time vary from country to country. Even when a strictly numeric date/time representation is used, the order of year, month and day, and the delimiters that separate them, are not universal.

The HP-UX system clock runs on Coordinated Universal Time.  Time zone adjustments for a particular region can be specified through the TZ environment variable (see environ(5)).
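A program can query local customs rather than hard-coding them. The following is a minimal sketch that uses the standard C localeconv() and strftime() routines; the helper names radix_char and format_local_date are hypothetical. The results depend on the locale in effect, so no particular output is implied beyond the "C" locale radix character, which is a period.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Return the radix character string of the current locale. */
const char *radix_char(void)
{
    return localeconv()->decimal_point;
}

/* Format the time t using the date representation preferred by the
   current locale (the %x conversion), adjusted to the local time zone.
   Returns the number of bytes written, or 0 on failure. */
size_t format_local_date(char *buf, size_t size, time_t t)
{
    struct tm *tm = localtime(&t);

    if (tm == NULL)
        return 0;
    return strftime(buf, size, "%x", tm);
}
```

The structure returned by localeconv() also carries the currency symbol and digit-grouping information for the locale.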

Messages
Messages issued by a program should be sensitive to the language of the end-user. NLS provides a messaging facility for extracting hard-coded strings (messages) from an application’s source code and storing them externally to the code.  Utilities are provided to aid the translation of messages such that at runtime the program accesses messages that coincide with the end-user’s native language. 
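Message retrieval at run time can be sketched with the catopen() and catgets() routines listed under SEE ALSO. The wrapper names open_catalog and get_message, and any catalog name, set number, or message number passed to them, are hypothetical. The sketch falls back to a built-in string whenever the catalog cannot be opened or the message is missing.

```c
#include <nl_types.h>

/* Open the named message catalog. Returns (nl_catd)-1 on failure. */
nl_catd open_catalog(const char *name)
{
    return catopen(name, 0);
}

/* Fetch message number msg from set number set of an open catalog.
   If the catalog failed to open, or the message is not present,
   the built-in fallback string is returned instead. */
const char *get_message(nl_catd catd, int set, int msg, const char *fallback)
{
    if (catd == (nl_catd)-1)
        return fallback;
    return catgets(catd, set, msg, fallback);
}
```

The catalog should stay open for as long as the returned strings are in use, and be released with catclose() when the program no longer needs it.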

FILES

/usr/lib/nls/*

AUTHOR

hpnls was developed by HP. 

SEE ALSO

insertmsg(1), gencat(1), catgets(3C), catopen(3C), setlocale(3C), wconv(3X), wctype(3X), wstring(3X), environ(5), lang(5). 

Native Language Support User’s Guide.

For additional information, see the EXTERNAL INFLUENCES/Environment Variables section of applicable manual entries for commands and library routines. 

Hewlett-Packard Company  —  HP-UX Release 9.0: August 1992
