PMC.MIPS24K(3)    FreeBSD Library Functions Manual    PMC.MIPS24K(3)
pmc.mips24k - measurement events for MIPS24K family CPUs
Performance Counters Library (libpmc, -lpmc)
MIPS PMCs are present in MIPS 24k and other processors in the MIPS family.
There are two counters supported by the hardware and each is 32
bits wide.
MIPS PMCs are documented in MIPS32
24K Processor Core Family Software User's Manual, MIPS
Technologies Inc., December 2008.
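Because each counter is only 32 bits wide, a raw reading can wrap around between two samples. The following is a minimal sketch of wraparound-tolerant accumulation, assuming raw 32-bit readings are sampled often enough that at most one wrap occurs between them. The function names are hypothetical and are not part of libpmc.

```c
#include <stdint.h>

/*
 * Difference between two raw readings of a 32-bit counter.
 * Unsigned subtraction is performed modulo 2^32, so the result is
 * still correct when 'now' is numerically smaller than 'prev',
 * provided the counter wrapped at most once in between.
 */
static uint32_t
counter_delta32(uint32_t prev, uint32_t now)
{
	return (now - prev);
}

/* Fold a new raw reading into a 64-bit running total (hypothetical helper). */
static uint64_t
counter_accumulate(uint64_t total, uint32_t prev, uint32_t now)
{
	return (total + counter_delta32(prev, now));
}
```

For example, a reading of 0x00000010 taken after a previous reading of 0xFFFFFFF0 yields a delta of 0x20 rather than a large negative value.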
MIPS programmable PMCs support the following events:
CYCLE
- (Event 0, Counter 0/1) Total number of cycles. The performance counters
are clocked by the top-level gated clock. If the core is built with that
clock gater present, none of the counters will increment while the clock
is stopped due to a WAIT instruction.
INSTR_EXECUTED
- (Event 1, Counter 0/1) Total number of instructions completed.
BRANCH_COMPLETED
- (Event 2, Counter 0) Total number of branch instructions completed.
BRANCH_MISPRED
- (Event 2, Counter 1) Counts all branch instructions which completed, but
were mispredicted.
RETURN
- (Event 3, Counter 0) Counts all JR $31 instructions completed.
RETURN_MISPRED
- (Event 3, Counter 1) Counts all JR $31 instructions which completed, used
the RPS for a prediction, but were mispredicted.
RETURN_NOT_31
- (Event 4, Counter 0) Counts all JR $xx (not $31) and JALR instructions
(indirect jumps).
RETURN_NOTPRED
- (Event 4, Counter 1) If RPS use is disabled, JR $31 will not be
predicted.
ITLB_ACCESS
- (Event 5, Counter 0) Counts ITLB accesses that are due to fetches showing
up in the instruction fetch stage of the pipeline and which do not use a
fixed mapping or are not in unmapped space. If an address is fetched twice
from the pipe (as in the case of a cache miss), that instruction will count
as 2 ITLB accesses. Since each fetch gets us 2 instructions, there is one
access marked per double word.
ITLB_MISS
- (Event 5, Counter 1) Counts all misses in the ITLB except ones that are on
the back of another miss. We cannot process back to back misses and thus
those are ignored. They are also ignored if there is some form of address
error.
DTLB_ACCESS
- (Event 6, Counter 0) Counts DTLB accesses, including those in unmapped
address spaces.
DTLB_MISS
- (Event 6, Counter 1) Counts DTLB misses. Back to back misses that result
in only one DTLB entry getting refilled are counted as a single miss.
JTLB_IACCESS
- (Event 7, Counter 0) Instruction JTLB accesses are counted exactly the
same as ITLB misses.
JTLB_IMISS
- (Event 7, Counter 1) Counts instruction JTLB accesses that result in no
match or a match on an invalid translation.
JTLB_DACCESS
- (Event 8, Counter 0) Data JTLB accesses.
JTLB_DMISS
- (Event 8, Counter 1) Counts data JTLB accesses that result in no match or
a match on an invalid translation.
IC_FETCH
- (Event 9, Counter 0) Counts every time the instruction cache is accessed.
All replays, wasted fetches etc. are counted. For example, following a
branch, even though the prediction is taken, the fall through access is
counted.
IC_MISS
- (Event 9, Counter 1) Counts all instruction cache misses that result in a
bus request.
DC_LOADSTORE
- (Event 10, Counter 0) Counts cached loads and stores.
DC_WRITEBACK
- (Event 10, Counter 1) Counts cache lines written back to memory due to
replacement or cacheops.
DC_MISS
- (Event 11, Counter 0/1) Counts loads and stores that miss in the
cache.
LOAD_MISS
- (Event 13, Counter 0) Counts number of cacheable loads that miss in the
cache.
STORE_MISS
- (Event 13, Counter 1) Counts number of cacheable stores that miss in the
cache.
INTEGER_COMPLETED
- (Event 14, Counter 0) Non-floating point, non-Coprocessor 2
instructions.
FP_COMPLETED
- (Event 14, Counter 1) Floating point instructions completed.
LOAD_COMPLETED
- (Event 15, Counter 0) Integer and co-processor loads completed.
STORE_COMPLETED
- (Event 15, Counter 1) Integer and co-processor stores completed.
BARRIER_COMPLETED
- (Event 16, Counter 0) Direct jump (and link) instructions completed.
MIPS16_COMPLETED
- (Event 16, Counter 1) MIPS16e instructions completed.
NOP_COMPLETED
- (Event 17, Counter 0) NOPs completed. This includes all instructions that
normally write to a general purpose register, but where the destination
register was set to r0.
INTEGER_MULDIV_COMPLETED
- (Event 17, Counter 1) Integer multiply and divide instructions completed.
(MULxx, DIVx, MADDx, MSUBx).
RF_STALL
- (Event 18, Counter 0) Counts the total number of cycles where no
instructions are issued from the IFU to ALU (the RF stage does not
advance), which includes both of the previous two events. RF_STALL
differs from the sum of them, though, because cycles when both stalls are
active are counted only once.
INSTR_REFETCH
- (Event 18, Counter 1) Replay traps (other than uTLB).
STORE_COND_COMPLETED
- (Event 19, Counter 0) Conditional stores completed. Counts all events,
including failed stores.
STORE_COND_FAILED
- (Event 19, Counter 1) Conditional store instruction that did not update
memory. Note: While this event and the SC instruction count event can be
configured to count in specific operating modes, the timing of the events
is much different and the observed operating mode could change between
them, causing some inaccuracy in the measured ratio.
ICACHE_REQUESTS
- (Event 20, Counter 0) Note that this only counts PREFs that are actually
attempted. PREFs to uncached addresses or ones with translation errors are
not counted.
ICACHE_HIT
- (Event 20, Counter 1) Counts PREF instructions that hit in the cache.
L2_WRITEBACK
- (Event 21, Counter 0) Counts cache lines written back to memory due to
replacement or cacheops.
L2_ACCESS
- (Event 21, Counter 1) Number of accesses to L2 Cache.
L2_MISS
- (Event 22, Counter 0) Number of accesses that missed in the L2 cache.
L2_ERR_CORRECTED
- (Event 22, Counter 1) Single bit errors in L2 Cache that were detected and
corrected.
EXCEPTIONS
- (Event 23, Counter 0) Any type of exception taken.
RF_CYCLES_STALLED
- (Event 24, Counter 0) Counts cycles where the LSU is in fixup and cannot
accept a new instruction from the ALU. Fixups are replays within the LSU
that occur when an instruction needs to re-access the cache or the
DTLB.
IFU_CYCLES_STALLED
- (Event 25, Counter 0) Counts the number of cycles where the fetch unit is
not providing a valid instruction to the ALU.
ALU_CYCLES_STALLED
- (Event 25, Counter 1) Counts the number of cycles where the ALU pipeline
cannot advance.
UNCACHED_LOAD
- (Event 33, Counter 0) Counts uncached and uncached accelerated loads.
UNCACHED_STORE
- (Event 33, Counter 1) Counts uncached and uncached accelerated
stores.
CP2_REG_TO_REG_COMPLETED
- (Event 35, Counter 0) Co-processor 2 register to register instructions
completed.
MFTC_COMPLETED
- (Event 35, Counter 1) Co-processor 2 move to and from instructions as well
as loads and stores.
IC_BLOCKED_CYCLES
- (Event 37, Counter 0) Cycles when IFU stalls because an instruction miss
caused the IFU not to have any runnable instructions. Ignores the stalls
due to ITLB misses as well as the 4 cycles following a redirect.
DC_BLOCKED_CYCLES
- (Event 37, Counter 1) Counts all cycles where integer pipeline waits on
Load return data due to a D-cache miss. The LSU can signal a "long
stall" on a D-cache miss, in which case the waiting TC might be
rescheduled so other TCs can execute instructions till the data
returns.
L2_IMISS_STALL_CYCLES
- (Event 38, Counter 0) Cycles where the main pipeline is stalled waiting
for a SYNC to complete.
L2_DMISS_STALL_CYCLES
- (Event 38, Counter 1) Cycles where the main pipeline is stalled because of
an index conflict in the Fill Store Buffer.
DMISS_CYCLES
- (Event 39, Counter 0) Data miss is outstanding, but not necessarily
stalling the pipeline. The difference between this and D$ miss stall
cycles can show the gain from non-blocking cache misses.
L2_MISS_CYCLES
- (Event 39, Counter 1) L2 miss is outstanding, but not necessarily stalling
the pipeline.
UNCACHED_BLOCK_CYCLES
- (Event 40, Counter 0) Cycles where the processor is stalled on an uncached
fetch, load, or store.
MDU_STALL_CYCLES
- (Event 41, Counter 0) Counts all cycles where integer pipeline waits on
MDU return data.
FPU_STALL_CYCLES
- (Event 41, Counter 1) Counts all cycles where integer pipeline waits on
FPU return data.
CP2_STALL_CYCLES
- (Event 42, Counter 0) Counts all cycles where integer pipeline waits on
CP2 return data.
COREXTEND_STALL_CYCLES
- (Event 42, Counter 1) Counts all cycles where integer pipeline waits on
CorExtend return data.
ISPRAM_STALL_CYCLES
- (Event 43, Counter 0) Counts all pipeline bubbles that are a result of
multicycle ISPRAM access. Pipeline bubbles are defined as all cycles that
IFU doesn't present an instruction to ALU. The four cycles after a
redirect are not counted.
DSPRAM_STALL_CYCLES
- (Event 43, Counter 1) Counts stall cycles created by an instruction
waiting for access to DSPRAM.
CACHE_STALL_CYCLES
- (Event 44, Counter 0) Counts all cycles where the pipeline is stalled due
to CACHE instructions. Includes cycles where CACHE instructions themselves
are stalled in the ALU, and cycles where CACHE instructions cause
subsequent instructions to be stalled.
LOAD_TO_USE_STALLS
- (Event 45, Counter 0) Counts all cycles where integer pipeline waits on
Load return data.
BASE_MISPRED_STALLS
- (Event 45, Counter 1) Counts stall cycles due to skewed ALU where the
bypass to the address generation takes an extra cycle.
CPO_READ_STALLS
- (Event 46, Counter 0) Counts all cycles where integer pipeline waits on
return data from MFC0, RDHWR instructions.
BRANCH_MISPRED_CYCLES
- (Event 46, Counter 1) This counts the number of cycles from a mispredicted
branch until the next non-delay slot instruction executes.
IFETCH_BUFFER_FULL
- (Event 48, Counter 0) Counts the number of times an instruction cache miss
was detected, but both fill buffers were already allocated.
FETCH_BUFFER_ALLOCATED
- (Event 48, Counter 1) Number of cycles where at least one of the IFU fill
buffers is allocated (miss pending).
EJTAG_ITRIGGER
- (Event 49, Counter 0) Number of times an EJTAG Instruction Trigger Point
condition matched.
EJTAG_DTRIGGER
- (Event 49, Counter 1) Number of times an EJTAG Data Trigger Point
condition matched.
FSB_LT_QUARTER
- (Event 50, Counter 0) Fill store buffer less than one quarter full.
FSB_QUARTER_TO_HALF
- (Event 50, Counter 1) Fill store buffer between one quarter and one half
full.
FSB_GT_HALF
- (Event 51, Counter 0) Fill store buffer more than half full.
FSB_FULL_PIPELINE_STALLS
- (Event 51, Counter 1) Cycles where the pipeline is stalled because the
Fill-Store Buffer in LSU is full.
LDQ_LT_QUARTER
- (Event 52, Counter 0) Load data queue less than one quarter full.
LDQ_QUARTER_TO_HALF
- (Event 52, Counter 1) Load data queue between one quarter and one half
full.
LDQ_GT_HALF
- (Event 53, Counter 0) Load data queue more than one half full.
LDQ_FULL_PIPELINE_STALLS
- (Event 53, Counter 1) Cycles where the pipeline is stalled because the
Load Data Queue in the LSU is full.
WBB_LT_QUARTER
- (Event 54, Counter 0) Write back buffer less than one quarter full.
WBB_QUARTER_TO_HALF
- (Event 54, Counter 1) Write back buffer between one quarter and one half
full.
WBB_GT_HALF
- (Event 55, Counter 0) Write back buffer more than one half full.
WBB_FULL_PIPELINE_STALLS
- (Event 55, Counter 1) Cycles where the pipeline is stalled because the
Write Back Buffer is full.
REQUEST_LATENCY
- (Event 61, Counter 0) Measures latency from miss detection until critical
dword of response is returned. Only counts for cacheable reads.
REQUEST_COUNT
- (Event 61, Counter 1) Counts number of cacheable read requests used for
previous latency counter.
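Several of the events above are meant to be read in pairs: BRANCH_MISPRED against BRANCH_COMPLETED, REQUEST_LATENCY against REQUEST_COUNT, and DMISS_CYCLES against the D-cache miss stall cycles. The following is a minimal sketch of that arithmetic. The function names are hypothetical and are not part of libpmc, the inputs are assumed to be counter deltas collected over the same interval, and the "D$ miss stall cycles" mentioned under DMISS_CYCLES are assumed to correspond to DC_BLOCKED_CYCLES.

```c
#include <stdint.h>

/* Fraction of completed branches that were mispredicted (0.0 if none ran). */
static double
branch_mispredict_rate(uint64_t branch_completed, uint64_t branch_mispred)
{
	if (branch_completed == 0)
		return (0.0);
	return ((double)branch_mispred / (double)branch_completed);
}

/*
 * Mean latency per cacheable read request, in whatever units
 * REQUEST_LATENCY accumulates (0.0 if no requests were counted).
 */
static double
average_request_latency(uint64_t request_latency, uint64_t request_count)
{
	if (request_count == 0)
		return (0.0);
	return ((double)request_latency / (double)request_count);
}

/*
 * Cycles in which a data miss was outstanding but the pipeline was not
 * stalled on it: DMISS_CYCLES minus DC_BLOCKED_CYCLES, clamped at zero
 * in case the two samples are slightly out of step.
 */
static uint64_t
overlapped_miss_cycles(uint64_t dmiss_cycles, uint64_t dc_blocked_cycles)
{
	if (dc_blocked_cycles > dmiss_cycles)
		return (0);
	return (dmiss_cycles - dc_blocked_cycles);
}
```

Each pair occupies counter 0 and counter 1 respectively in the list above, so both members of a pair can be collected in the same run.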
The following table shows the mapping between the PMC-independent aliases
supported by Performance Counters Library (libpmc,
-lpmc) and the underlying hardware events used.
pmc(3),
pmc.atom(3),
pmc.core(3),
pmc.iaf(3),
pmc.k7(3),
pmc.k8(3),
pmc.octeon(3),
pmc.soft(3),
pmc.tsc(3),
pmc_cpuinfo(3),
pmclog(3),
hwpmc(4)
The pmc library first appeared in
FreeBSD 6.0.
The Performance Counters Library (libpmc, -lpmc)
was written by Joseph Koshy
<jkoshy@FreeBSD.org>.
MIPS support was added by George Neville-Neil
<gnn@FreeBSD.org>.