sort(1)

NAME

sort − sort or merge files

SYNOPSIS

sort [-m] [-o output] [-bdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z recsz] [-T dir] [file ...]

sort [-c] [-bdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z recsz] [-T dir] [file ...]

Obsolete forms:

sort [-mu] [-o output] [-bdfilnrM] [-t char] [-y [kmem]] [-z recsz] [-T dir] [+pos1 [-pos2]] [file ...]

sort [-c] [-u] [-bdfilnrM] [-t char] [-y [kmem]] [-z recsz] [-T dir] [+pos1 [-pos2]] [file ...]

DESCRIPTION

sort performs one of the following functions:

1. Sorts lines of all the named files together and writes the result to the specified output.

2. Merges lines of all the named (presorted) files together and writes the result to the specified output.

3. Checks that a single input file is correctly presorted.

The standard input is read if - is used as a file name or no input files are specified.

Comparisons are based on one or more sort keys extracted from each line of input. By default, there is one sort key, the entire input line. Ordering is lexicographic by characters using the collating sequence of the current locale. If the locale is not specified or is set to the POSIX locale, then ordering is lexicographic by bytes in machine-collating sequence. If the locale includes multi-byte characters, single-byte characters are machine-collated before multi-byte characters.

Behavior Modification Options

The following options alter the default behavior:

-c Check that the single input file is sorted according to the ordering rules. No output is produced; the exit code is set to indicate the result.

-m Merge only; the input files are assumed to be already sorted.

-o output The argument given is the name of an output file to use instead of the standard output. This file can be the same as one of the input files.

-u Unique: suppress all but one in each set of lines having equal keys. If used with the -c option, check to see that there are no lines with duplicate keys, in addition to checking that the input file is sorted.

-y [kmem] The amount of main memory used by the sort can have a large impact on its performance. If this option is omitted, sort begins using a system default memory size, and continues to use more space as needed. If this option is presented with a value, kmem, sort starts using that number of kilobytes of memory, unless the administrative minimum or maximum is violated, in which case the corresponding extremum will be used. Thus, -y 0 is guaranteed to start with minimum memory. By convention, -y (with no argument) starts with maximum memory.

-z recsz The size of the longest line read is recorded in the sort phase so that buffers can be allocated during the merge phase. If the sort phase is omitted via the -c or -m options, a popular system default size will be used. Lines longer than the buffer size will cause sort to terminate abnormally. Supplying the actual number of bytes in the longest line to be merged (or some larger value) will prevent abnormal termination.

-T dir Use dir as the directory for temporary scratch files rather than the default directory, which is one of the following, tried in order: the directory specified in the TMPDIR environment variable, /usr/tmp, and finally /tmp.
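As an illustration of -m, -u, and -o used together, the following sketch merges two presorted files into a single output file. The file names and the scratch directory are hypothetical, mktemp is assumed to be available, and LC_ALL=C (a variable not described on this page) is assumed to force byte ordering so that the result does not depend on the caller's locale.

```shell
# Work in a scratch directory so that no existing files are touched.
dir=$(mktemp -d)
printf 'apple\ncherry\n'  > "$dir/a.txt"    # already sorted
printf 'banana\ncherry\n' > "$dir/b.txt"    # already sorted

# -m merges the presorted inputs, -u keeps one line from each set of
# equal keys, and -o names the output file.
LC_ALL=C sort -m -u -o "$dir/merged.txt" "$dir/a.txt" "$dir/b.txt"

merged=$(cat "$dir/merged.txt")
printf '%s\n' "$merged"
rm -rf "$dir"
```

The duplicate line cherry should appear only once in the merged output.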

Ordering Rule Options

When ordering options appear before restricted sort key specifications, the ordering rules are applied globally to all sort keys. When attached to a specific sort key (described below), the ordering options override all global ordering options for that key.

The following options override the default ordering rules:

-d Quasi-dictionary order: only alphanumeric characters and blanks (spaces and tabs), as defined by LC_CTYPE, are significant in comparisons (see environ(5)). The -d option is ignored for languages with multi-byte characters; all characters are significant.

-f Fold letters. Prior to being compared, all lowercase letters are effectively converted into their uppercase equivalents, as defined by LC_CTYPE. The -f option is ignored for languages with multi-byte characters; all characters are collated unfolded.

-i In non-numeric comparisons, ignore all characters which are non-printable, as defined by LC_CTYPE. For the ASCII character set, octal character codes 001 through 037 and 0177 are ignored. For languages with multi-byte characters, the -i option is ignored.

-n The sort key is restricted to an initial numeric string consisting of optional blanks, an optional minus sign, zero or more digits with optional radix character, and optional thousands separators. The radix and thousands separator characters are defined by LC_NUMERIC. The field is sorted by arithmetic value. An empty (missing) numeric field is treated as arithmetic zero. Leading zeros and plus or minus signs on zeros do not affect the ordering. The -n option implies the -b option (see below).

-r Reverse the sense of comparisons.

-l This option is ignored. Previously it was used to activate sorting using the collation rules associated with the user’s LANG variable (see environ(5)). Language-sensitive collation is now the standard behavior.

-M Compare as months. The first several non-blank characters of the field are folded to uppercase and compared with the langinfo(3C) items ABMON_1 < ABMON_2 < ... < ABMON_12. For example, American month names are compared such that JAN < FEB < ... < DEC. An invalid field is treated as being less than all months (that is, less than the ABMON_1 string). The -M option implies the -b option (see below).
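The effect of some of these ordering options can be seen on a few short inputs. This is only a sketch: the sample data is made up, and LC_ALL=C (not described on this page) is assumed to select byte ordering so that the results are reproducible.

```shell
# Default ordering compares character by character, so "10" precedes "9".
lexical=$(printf '10\n9\n100\n' | LC_ALL=C sort)

# -n compares by arithmetic value instead.
numeric=$(printf '10\n9\n100\n' | LC_ALL=C sort -n)

# -r reverses the sense of the comparison.
reversed=$(printf '10\n9\n100\n' | LC_ALL=C sort -n -r)

# -f folds lowercase to uppercase before comparing; lines whose folded
# keys are equal fall back to a comparison with all bytes significant.
folded=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort -f)

printf '%s\n' "$lexical" "$numeric" "$reversed" "$folded"
```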

Field Separator Options

The treatment of field separators can be altered using the options:

-t char Use char as the field separator character; char is not considered to be part of a field (although it can be included in a sort key). Each occurrence of char is significant (for example, <char><char> delimits an empty field). If -t is not specified, <blank> characters will be used as default field separators; each maximal sequence of <blank> characters that follows a non-<blank> character is a field separator.

-b Ignore leading blanks when determining the starting and ending positions of a restricted sort key. If the -b option is specified before the first -k option (+pos1 argument), it is applied to all -k options (+pos1 arguments). Otherwise, the -b option can be attached independently to each -k field_start or field_end option (+pos1 or -pos2 argument; see below). Note that the -b option is only effective when restricted sort key specifications are given.
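The statement that every occurrence of the -t character is significant can be checked with a line containing an empty field. In this sketch the colon-separated sample lines are invented, and LC_ALL=C (not described on this page) is assumed for reproducibility.

```shell
# In "x::2" the two adjacent colons delimit an empty second field,
# so the value 2 is still the third field.
by_third=$(printf 'x::2\ny:b:1\n' | LC_ALL=C sort -t: -k 3n,3)
printf '%s\n' "$by_third"
```

Because the third fields are 2 and 1 respectively, the line y:b:1 should be written first.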

Restricted Sort Key

-k keydef The keydef argument defines a restricted sort key. The format of this definition is

field_start[type][,field_end[type]]

which defines a key field beginning at field_start and ending at field_end. The characters at positions field_start and field_end are included in the key field, provided that field_end does not precede field_start. A missing field_end means the end of the line. Fields and characters within fields are numbered starting with 1. Note that this differs from the obsolete form of restricted sort keys, where numbering starts at 0. See WARNINGS below.

Specifying field_start and field_end involves the notion of a field, a minimal sequence of characters followed by a field separator or a new-line. By default, the first blank of a sequence of blanks acts as the field separator. All blanks in a sequence of blanks are considered to be part of the next field; for example, all blanks at the beginning of a line are considered to be part of the first field.

The arguments field_start and field_end each have the form m.n, optionally followed by one or more of the type options b, d, f, i, n, r, or M. These modifiers have the same effect on this key alone that their command-line counterparts have on the entire record.

A field_start position specified by m.n is interpreted to mean the nth character in the mth field. A missing n means .1, indicating the first character of the mth field. If the -b option is in effect, n is counted from the first non-blank character in the mth field.

A field_end position specified by m.n is interpreted to mean the nth character in the mth field. If n is missing, the mth field ends at the last character of the field. If the -b option is in effect, n is counted from the first non-<blank> character in the mth field.
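Two consequences of these rules are easy to overlook: leading blanks belong to the field that follows them unless b is in effect, and m.n selects individual characters within a field. The sketch below shows both on invented data, assuming LC_ALL=C (not described on this page) for byte ordering.

```shell
# Without b, field 2 of "a  z" is "  z" and field 2 of "b a" is " a".
# A blank sorts before a letter, so "a  z" comes first.
plain=$(printf 'a  z\nb a\n' | LC_ALL=C sort -k 2,2)

# With b attached to field_start, the keys are "z" and "a".
trimmed=$(printf 'a  z\nb a\n' | LC_ALL=C sort -k 2b,2)

# -k 1.2,1.2 restricts the key to the second character of field 1.
second_char=$(printf 'ab\nba\n' | LC_ALL=C sort -k 1.2,1.2)

printf '%s\n' "$plain" "$trimmed" "$second_char"
```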

Multiple -k options are permitted and are significant in command line order. A maximum of 10 -k options can be given. If no -k option is specified, a default sort key of the entire line is used. When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that otherwise compare equal are ordered with all bytes significant. If all the specified keys compare equal, the entire record is used as the final key.
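For instance, a primary key on the first field combined with a numeric secondary key on the second field might look like the following sketch. The sample names and numbers are made up, and LC_ALL=C (not described on this page) is assumed for reproducibility.

```shell
data='smith 30
jones 25
smith 4'

# Key 1: first field, compared as text.
# Key 2: second field, compared numerically, consulted only on ties.
by_name_then_number=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1 -k 2,2n)

# Same keys without n: the second field is compared as text,
# so " 30" sorts before " 4".
by_name_then_text=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1 -k 2,2)

printf '%s\n' "$by_name_then_number" "$by_name_then_text"
```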

The -k option is intended to replace the obsolete [+pos1 [-pos2]] notation, using field_start and field_end respectively. The fully specified [+pos1 [-pos2]] form:

+w.x -y.z

is equivalent to:

-k w+1.x+1,y.0 (if z == 0)
-k w+1.x+1,y+1.z (if z > 0)
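The arithmetic of this conversion can be written out as a small helper. The function name obsolete_to_k is hypothetical and not part of sort; it simply applies the two rules stated above to the four numbers w, x, y, and z, and does not handle type flags.

```shell
# Usage: obsolete_to_k w x y z
# Prints the -k form corresponding to the obsolete "+w.x -y.z".
obsolete_to_k() {
    if [ "$4" -eq 0 ]; then
        # z == 0: the key ends at the last character of field y.
        printf '%s %d.%d,%d.0\n' -k $(($1 + 1)) $(($2 + 1)) "$3"
    else
        # z > 0: the key ends at character z of field y+1.
        printf '%s %d.%d,%d.%d\n' -k $(($1 + 1)) $(($2 + 1)) $(($3 + 1)) "$4"
    fi
}

obsolete_to_k 1 0 2 0    # +1.0 -2.0
obsolete_to_k 1 2 3 4    # +1.2 -3.4
```

Under those rules, the two calls should print -k 2.1,2.0 and -k 2.3,4.4 respectively.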

Obsolete Restricted Sort Key

The notation +pos1 -pos2 restricts a sort key to one beginning at pos1 and ending at pos2. The characters at positions pos1 and pos2 are included in the sort key (provided that pos2 does not precede pos1). A missing -pos2 means the end of the line.

Specifying pos1 and pos2 involves the notion of a field, a minimal sequence of characters followed by a field separator or a new-line. By default, the first blank (space or tab) of a sequence of blanks acts as the field separator. All blanks in a sequence of blanks are considered to be part of the next field; for example, all blanks at the beginning of a line are considered to be part of the first field.

pos1 and pos2 each have the form m.n optionally followed by one or more of the flags bdfinrM. A starting position specified by +m.n is interpreted to mean character n+1 in field m+1. A missing .n means .0, indicating the first character of field m+1. If the b flag is in effect, n is counted from the first non-blank in field m+1; +m.0b refers to the first non-blank character in field m+1.

A last position specified by -m.n is interpreted to mean the nth character (including separators) after the last character of the mth field. A missing .n means .0, indicating the last character of the mth field. If the b flag is in effect, n is counted from the last leading blank in field m+1; -m.1b refers to the first non-blank in field m+1.

EXTERNAL INFLUENCES

Environment Variables

LC_COLLATE determines the default ordering rules applied to the sort.

LC_CTYPE determines the behavior of character classification for the -d, -f, and -i options.

LC_NUMERIC determines the definition of the radix and thousands separator characters for the -n option.

LC_TIME determines the month names for the -M option.

LANG determines the language in which messages are displayed.

If any of LC_COLLATE, LC_CTYPE, LC_NUMERIC, or LC_TIME is not specified in the environment or is set to the empty string, the value of LANG is used as a default for each unspecified or empty variable. If LANG is not specified or is set to the empty string, a default of POSIX (see lang(5)) is used. If any of the internationalization variables contains an invalid setting, sort behaves as if all internationalization variables were set to POSIX. See environ(5).
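The fallback order described above can be sketched as a shell function. The name effective_locale is hypothetical; the function models only the unset-or-empty rule and makes no attempt to detect invalid settings or to reproduce what sort does internally.

```shell
# Usage: effective_locale category_value lang_value
# Prints the locale name that would apply to one category.
effective_locale() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"      # the category variable is set and non-empty
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"      # otherwise fall back to LANG
    else
        printf 'POSIX\n'        # otherwise fall back to the POSIX default
    fi
}

# Report the collation locale implied by the current environment.
effective_locale "${LC_COLLATE:-}" "${LANG:-}"
```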

International Code Set Support

Single- and multi-byte character code sets are supported.

EXAMPLES

Sort the contents of infile with the second field as the sort key:

sort -k 2,2 infile

Sort, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the first two characters of the second field as the sort key:

sort -r -o outfile -k 2.1,2.2 infile1 infile2

Sort, in reverse order, the contents of infile1 and infile2, using the first non-blank character of the fourth field as the sort key:

sort -r -k 4.1b,4.1b infile1 infile2

Print the password file (/etc/passwd) sorted by numeric user ID (the third colon-separated field):

sort -t: -k 3n,3 /etc/passwd

Print the lines of the presorted file infile, suppressing all but the first occurrence of lines having the same third field:

sort -mu -k 3,3 infile

DIAGNOSTICS

sort exits with one of the following values:

0 All input files were output successfully, or -c was specified and the input file was correctly presorted.

1 Under the -c option, the file was not ordered as specified, or if the -c and -u options were both specified, two input lines were found with equal keys. This exit status is not returned if the -c option is not used.

>1 An error occurred such as when one or more input lines are too long.
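A script can branch on these exit values when using -c. In the sketch below, check_sorted is a hypothetical wrapper that reads standard input, passes any extra options to sort, and discards diagnostic messages; LC_ALL=C (not described on this page) is assumed for reproducibility.

```shell
# Usage: some_command | check_sorted [extra sort options]
check_sorted() {
    if LC_ALL=C sort -c "$@" 2>/dev/null; then
        status=0
    else
        status=$?
    fi
    case $status in
        0) echo "ok" ;;          # input is correctly presorted
        1) echo "disorder" ;;    # out of order, or equal keys under -u
        *) echo "error" ;;       # any value greater than 1
    esac
}

printf 'a\nb\n' | check_sorted       # sorted input
printf 'b\na\n' | check_sorted       # out-of-order input
printf 'a\na\n' | check_sorted -u    # sorted, but with a duplicate key
```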

When the last line of an input file is missing a new-line character, sort appends one, prints a warning message, and continues.

If an error occurs when accessing the tables that contain the collation rules for the specified language, sort prints a warning message and defaults to the POSIX locale.

If a -d, -f, or -i option is specified for a language with multi-byte characters, sort prints a warning message and ignores the option.

WARNINGS

Numbering of fields and characters within fields (-k option) has changed to conform to the POSIX standard. Beginning at HP-UX Release 9.0, the -k option numbers fields and characters within fields, starting with 1. Prior to HP-UX Release 9.0, numbering started at 0.

A field separator specified by the -t option is recognized only if it is a single-byte character.

The character type classification categories alpha, digit, space, and print are not defined for multi-byte characters. For languages with multi-byte characters, all characters are significant in comparisons.

FILES

/usr/tmp/stm???
/tmp/stm???

STANDARDS CONFORMANCE

sort: SVID2, XPG2, XPG3, POSIX.2

Hewlett-Packard Company — HP-UX Release 9.0: August 1992
