uprofile(1)  —  Commands

NAME

uprofile − Profile user code with the EV4 performance counters

SYNOPSIS

uprofile [-v] [-i] [-all|-each|-one] [STATS] <command> [<arg> ...]

DESCRIPTION

The uprofile command uses the EV4 performance counters to produce a fine-grained PC profile of a user program. uprofile creates (or overwrites) the file umon.out, enables the performance counters on the chip, runs the specified command, and then writes the profile data to umon.out.

FLAGS

-v
Engages verbose mode, which prints some useful information about the program being profiled.

-i
Uses integer (32-bit) sample buckets in the generated umon.out file(s). This makes the generated files twice as large as the default short (16-bit) buckets, but makes it almost impossible to overflow a bucket. Recommended when profiling programs that spend much of their time in small, tight loops. If a bucket overflows when the -i option is not used, a warning is produced that identifies the offending PC.
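A small model shows the effect of bucket width. The following Python sketch is hypothetical. It assumes one bucket per 4-byte Alpha instruction, and it assumes a saturating bucket that remembers its PC when it overflows. The addresses are arbitrary examples, and the sketch does not reproduce the real umon.out layout or uprofile's internal logic.

```python
# Hypothetical model of a PC-sample histogram; NOT the real umon.out format.
INSTR_BYTES = 4  # Alpha instructions are 4 bytes; one bucket per instruction is assumed


class PCHistogram:
    def __init__(self, text_start, text_end, bucket_bits=16):
        self.text_start = text_start
        self.max_count = (1 << bucket_bits) - 1  # 65535 for 16-bit buckets
        self.buckets = [0] * ((text_end - text_start) // INSTR_BYTES)
        self.overflowed_pcs = set()  # PCs whose bucket has saturated

    def record(self, pc):
        i = (pc - self.text_start) // INSTR_BYTES
        if self.buckets[i] == self.max_count:
            # Assumed behavior: the bucket saturates and its PC is remembered
            # so that a warning can be printed afterwards.
            self.overflowed_pcs.add(pc)
        else:
            self.buckets[i] += 1


# A tight loop whose single hot instruction is sampled 100,000 times.
short = PCHistogram(0x120000000, 0x120001000, bucket_bits=16)
wide = PCHistogram(0x120000000, 0x120001000, bucket_bits=32)
hot_pc = 0x120000040
for _ in range(100_000):
    short.record(hot_pc)
    wide.record(hot_pc)

for pc in sorted(short.overflowed_pcs):
    print(f"warning: bucket overflow at PC {pc:#x}")
```

With 16-bit buckets, the hot instruction's count stops at 65535 and a warning is printed. With 32-bit buckets, all 100,000 samples are kept, but each bucket takes twice the space.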

-all|-each|-one
Specifies which mode to use for profiling on multiprocessor machines. The -all flag (the default) aggregates the data from all CPUs into one umon.out file. The -each flag collects a separate profile for each CPU and writes the output to a set of files named umon.out.n, where n is the CPU number. The -one flag profiles only the current CPU; for this mode to work, uprofile must be run using the runon command.
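Conceptually, the -all and -each modes differ only in whether per-CPU sample counts are merged. The Python sketch below assumes that every CPU's histogram uses the same bucket layout. The helper names and the sample counts are hypothetical.

```python
def each_mode_files(per_cpu):
    """Map each CPU number to the file name that -each would write (umon.out.n)."""
    return {cpu: f"umon.out.{cpu}" for cpu in sorted(per_cpu)}


def all_mode(per_cpu):
    """Merge per-CPU histograms bucket by bucket into one histogram, as in -all mode."""
    return [sum(counts) for counts in zip(*per_cpu.values())]


# Hypothetical bucket counts for a two-CPU machine.
per_cpu = {0: [5, 0, 12, 3], 1: [1, 7, 0, 3]}
print(each_mode_files(per_cpu))  # {0: 'umon.out.0', 1: 'umon.out.1'}
print(all_mode(per_cpu))         # [6, 7, 12, 6]
```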

<command> [<arg> ...]
Specifies the command to profile and its optional arguments. The command need not have been compiled with the normal profiling switch (-p).

STATS

The EV4 has two performance counter registers, each of which can be separately programmed.  The statistics each counter can collect are:

        Counter 0       Counter 1
        ---------       ---------
        0disabled       1disabled
        issues          dcache
        pipedry         icache
        loads           dualissues
        pipefrozen      mispredicts
        branches        floatops
        cycles          intops
        PALcycles       stores
        nonissues       novictims
        victims

When either of these counters reaches 4096 events (e.g. 4096 cycles), an interrupt is triggered. This interrupt causes the PC to be recorded. 
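The overflow-driven sampling described above can be modeled in a few lines. In this Python sketch, the event trace, the routine names (which stand in for PCs), and the helper function are all hypothetical. Each trace entry represents the location at which one counted event, such as one cycle, occurred.

```python
from collections import Counter
from itertools import chain, repeat

SAMPLE_PERIOD = 4096  # counted events between interrupts


def sample_pcs(event_pcs, period=SAMPLE_PERIOD):
    """Return the location recorded at each counter overflow."""
    count = 0
    samples = []
    for pc in event_pcs:
        count += 1
        if count == period:
            samples.append(pc)  # the interrupt records the PC at this event
            count = 0           # the counter starts a new period
    return samples


# Hypothetical trace: 3 periods' worth of cycles in "loop", then 1 in "main".
trace = chain(repeat("loop", 3 * SAMPLE_PERIOD), repeat("main", SAMPLE_PERIOD))
profile = Counter(sample_pcs(trace))
print(profile)  # Counter({'loop': 3, 'main': 1})
```

In this idealized model, the routine that accounts for 75% of the cycles receives 75% of the samples.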

Either counter 0 or counter 1 may be disabled by specifying "0disabled" or "1disabled" as the counter statistic. This can be used to isolate a specific event type, such as loads, without generating extraneous data. The two counters cannot both be disabled.
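The rules for choosing statistics can be summarized as a small validation routine. This Python sketch uses the statistic names from the table above and the documented default of cycles on counter 0 with counter 1 disabled. The function name is hypothetical. The sketch also assumes that a counter not named on the command line keeps its default, which the real uprofile may not do.

```python
# Statistic names from the STATS table, grouped by the counter that collects them.
COUNTER0 = {"0disabled", "issues", "pipedry", "loads", "pipefrozen",
            "branches", "cycles", "PALcycles", "nonissues", "victims"}
COUNTER1 = {"1disabled", "dcache", "icache", "dualissues", "mispredicts",
            "floatops", "intops", "stores", "novictims"}


def select_counters(stats):
    """Return (counter0_stat, counter1_stat) for a list of STATS words."""
    c0, c1 = "cycles", "1disabled"  # documented default
    for stat in stats:
        if stat in COUNTER0:
            c0 = stat
        elif stat in COUNTER1:
            c1 = stat
        else:
            raise ValueError(f"unknown statistic: {stat}")
    if c0 == "0disabled" and c1 == "1disabled":
        raise ValueError("counter 0 and counter 1 cannot both be disabled")
    return c0, c1


print(select_counters([]))                       # ('cycles', '1disabled')
print(select_counters(["PALcycles", "icache"]))  # ('PALcycles', 'icache')
print(select_counters(["0disabled", "dcache"]))  # ('0disabled', 'dcache')
```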

By default, the system counts cycles on counter 0 and disables counter 1. A 150 MHz EV4 therefore produces 36621 samples per second (150,000,000 cycles per second divided by 4096 cycles per sample). This is much higher than the normal hardclock-driven profiling rate of 1024 samples per second. As a result, profiling places a heavy interrupt load on the system, which can noticeably slow performance. Also, because the interrupt rate is not tied to the 1024 Hz hardclock, the number of "seconds" reported by the prof command is incorrect; only the percentages of elapsed time are reliable.
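The rates quoted above follow directly from the clock rate and the 4096-event period. The Python sketch below reproduces that arithmetic. It also shows a hypothetical correction for prof's "seconds" figure, which assumes two things: that prof charges 1/1024 second per sample, and that only cycles are counted (counter 0 on cycles, counter 1 disabled). The helper names are illustrative only.

```python
CLOCK_HZ = 150_000_000  # 150 MHz EV4
SAMPLE_PERIOD = 4096    # cycles counted between interrupts
HARDCLOCK_HZ = 1024     # tick rate assumed when converting samples to seconds

samples_per_sec = CLOCK_HZ / SAMPLE_PERIOD  # 36621.09375
inflation = samples_per_sec / HARDCLOCK_HZ  # about 35.76


def corrected_seconds(prof_seconds):
    """Hypothetical fix-up: rescale prof's 'seconds' to elapsed seconds."""
    return prof_seconds / inflation


def percentages(samples):
    """Per-routine share of samples; independent of the sampling rate."""
    total = sum(samples.values())
    return {name: 100.0 * n / total for name, n in samples.items()}


print(round(samples_per_sec))               # 36621
print(round(inflation, 2))                  # 35.76
print(percentages({"loop": 3, "main": 1}))  # {'loop': 75.0, 'main': 25.0}
```

Scaling every sample count by the same factor leaves the percentages unchanged. This is why they remain reliable even when the absolute "seconds" figure is wrong.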

Alternate events can be specified to produce other useful profiles. For example, "uprofile PALcycles icache <command>" generates a statistical list of which routines tend to use PAL cycles and generate instruction cache misses. For a complete description of each statistic, consult the pfm(7) reference page.

NOTES

The kernel in use must have the pfm pseudo-device configured into it. To do this, add the following line to the kernel configuration file and rebuild the kernel:

        pseudo-device       pfm

The victims and novictims statistics rely on the external performance counter pin connections described in the EV4 chip specification. Currently, only the DEC 3000/400, /500, /600, and /800 workstations have these connections. Collecting either statistic on other platforms is allowed, but typically yields empty data.

User-level profiling is possible only on EV4 Pass 3 or later processors. On a Pass 2 processor, PC samples are gathered for every process running on the system, not just the profiled command. The data stream is therefore heavily contaminated with erroneous samples.

FILES

/dev/pfcntr
The performance counter device file.

umon.out[.n]
The generated statistics file(s). Use them as the mon.out file for the prof command.

RELATED INFORMATION

kprofile(1), prof(1), runon(1), pfm(7)

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026