Intel(r) Performance Counter Monitor
cpucounters.h File Reference

Main CPU counters header.

#include "types.h"
#include "msr.h"
#include "pci.h"
#include "client_bw.h"
#include "width_extender.h"
#include <vector>
#include <limits>
#include <string>
#include <string.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>


Classes

struct  TopologyEntry

class  ServerPCICFGUncore
 Object to access uncore counters in a socket/processor with microarchitecture codename SandyBridge-EP (Jaketown) or Ivytown-EP or Ivytown-EX.

class  PCIeCounterState

class  PCM
 CPU Performance Monitor.

struct  PCM::CustomCoreEventDescription
 Custom Core event description.

struct  PCM::ExtendedCustomCoreEventDescription
 Extended custom core event description.

class  BasicCounterState
 Basic core counter state.

class  UncoreCounterState
 Basic uncore counter state.

class  ServerUncorePowerState
 Server uncore power counter state.

class  CoreCounterState
 (Logical) core-wide counter state.

class  SocketCounterState
 Socket-wide counter state.

class  SystemCounterState
 System-wide counter state.
 

Macros

#define INTEL_PCM_VERSION   "V2.8 ($Format:%ci ID=%h$)"
 
#define INTELPCM_API
 
#define NOMINMAX
 

Functions

template<class CounterStateType >
uint64 getQPIClocks (uint32 port, const CounterStateType &before, const CounterStateType &after)
 Returns QPI LL clock ticks.

template<class CounterStateType >
int32 getThermalHeadroom (const CounterStateType &, const CounterStateType &after)

template<class CounterStateType >
uint64 getQPIL0pTxCycles (uint32 port, const CounterStateType &before, const CounterStateType &after)
 Returns the number of QPI cycles in power saving half-lane mode.

template<class CounterStateType >
uint64 getQPIL1Cycles (uint32 port, const CounterStateType &before, const CounterStateType &after)
 Returns the number of QPI cycles in power saving shutdown mode.

template<class CounterStateType >
double getNormalizedQPIL0pTxCycles (uint32 port, const CounterStateType &before, const CounterStateType &after)
 Returns the ratio of QPI cycles in power saving half-lane mode.

template<class CounterStateType >
double getNormalizedQPIL1Cycles (uint32 port, const CounterStateType &before, const CounterStateType &after)
 Returns the ratio of QPI cycles in power saving shutdown mode.

template<class CounterStateType >
uint64 getDRAMClocks (uint32 channel, const CounterStateType &before, const CounterStateType &after)
 Returns DRAM clock ticks.

template<class CounterStateType >
uint64 getMCCounter (uint32 channel, uint32 counter, const CounterStateType &before, const CounterStateType &after)
 Direct read of memory controller PMU counter (counter meaning depends on the programming: power/performance/etc)

template<class CounterStateType >
uint64 getPCUCounter (uint32 counter, const CounterStateType &before, const CounterStateType &after)
 Direct read of power control unit PMU counter (counter meaning depends on the programming: power/performance/etc)

template<class CounterStateType >
uint64 getPCUClocks (const CounterStateType &before, const CounterStateType &after)
 Returns clock ticks of power control unit.

template<class CounterStateType >
uint64 getConsumedEnergy (const CounterStateType &before, const CounterStateType &after)
 Returns energy consumed by processor, excluding DRAM (measured in internal units)

template<class CounterStateType >
uint64 getDRAMConsumedEnergy (const CounterStateType &before, const CounterStateType &after)
 Returns energy consumed by DRAM (measured in internal units)

template<class CounterStateType >
double getConsumedJoules (const CounterStateType &before, const CounterStateType &after)
 Returns Joules consumed by processor (excluding DRAM)

template<class CounterStateType >
double getDRAMConsumedJoules (const CounterStateType &before, const CounterStateType &after)
 Returns Joules consumed by DRAM.
 
INTELPCM_API SystemCounterState getSystemCounterState ()
 Reads the counter state of the system.

INTELPCM_API SocketCounterState getSocketCounterState (uint32 socket)
 Reads the counter state of a socket.

INTELPCM_API CoreCounterState getCoreCounterState (uint32 core)
 Reads the counter state of a (logical) core.

template<class CounterStateType >
double getIPC (const CounterStateType &before, const CounterStateType &after)
 Computes average number of retired instructions per core cycle (IPC)

template<class CounterStateType >
uint64 getInstructionsRetired (const CounterStateType &before, const CounterStateType &after)
 Computes the number of retired instructions.

template<class CounterStateType >
double getExecUsage (const CounterStateType &before, const CounterStateType &after)
 Computes average number of retired instructions per time interval.

template<class CounterStateType >
uint64 getInstructionsRetired (const CounterStateType &now)
 Computes the number of retired instructions.

template<class CounterStateType >
uint64 getCycles (const CounterStateType &before, const CounterStateType &after)
 Computes the number of core clock cycles while the clock signal on a specific core is running (not halted)

template<class CounterStateType >
uint64 getRefCycles (const CounterStateType &before, const CounterStateType &after)
 Computes the number of reference clock cycles while clock signal on the core is running.

template<class CounterStateType >
uint64 getCycles (const CounterStateType &now)
 Computes the number of executed core clock cycles.

double getCoreIPC (const SystemCounterState &before, const SystemCounterState &after)
 Computes average number of retired instructions per core cycle for the entire system combining instruction counts from logical cores to corresponding physical cores.

double getTotalExecUsage (const SystemCounterState &before, const SystemCounterState &after)
 Computes average number of retired instructions per time interval for the entire system combining instruction counts from logical cores to corresponding physical cores.
 
template<class CounterStateType >
double getAverageFrequency (const CounterStateType &before, const CounterStateType &after)
 Computes average core frequency also taking Intel Turbo Boost technology into account.

template<class CounterStateType >
double getActiveAverageFrequency (const CounterStateType &before, const CounterStateType &after)
 Computes average core frequency while in the active (non-power-saving) C0 state, also taking Intel Turbo Boost technology into account.

template<class CounterStateType >
double getRelativeFrequency (const CounterStateType &before, const CounterStateType &after)
 Computes average core frequency also taking Intel Turbo Boost technology into account.

template<class CounterStateType >
double getActiveRelativeFrequency (const CounterStateType &before, const CounterStateType &after)
 Computes average core frequency while in the active (non-power-saving) C0 state, also taking Intel Turbo Boost technology into account.

template<class CounterStateType >
double getCyclesLostDueL3CacheMisses (const CounterStateType &before, const CounterStateType &after)
 Estimates how many core cycles were potentially lost due to L3 cache misses.

template<class CounterStateType >
double getCyclesLostDueL2CacheMisses (const CounterStateType &before, const CounterStateType &after)
 Estimates how many core cycles were potentially lost due to missing L2 cache but still hitting L3 cache.

template<class CounterStateType >
double getL2CacheHitRatio (const CounterStateType &before, const CounterStateType &after)
 Computes L2 cache hit ratio.

template<class CounterStateType >
double getL3CacheHitRatio (const CounterStateType &before, const CounterStateType &after)
 Computes L3 cache hit ratio.

template<class CounterStateType >
uint64 getL3CacheMisses (const CounterStateType &before, const CounterStateType &after)
 Computes number of L3 cache misses.

template<class CounterStateType >
uint64 getL2CacheMisses (const CounterStateType &before, const CounterStateType &after)
 Computes number of L2 cache misses.

template<class CounterStateType >
uint64 getL2CacheHits (const CounterStateType &before, const CounterStateType &after)
 Computes number of L2 cache hits.

template<class CounterStateType >
uint64 getL3CacheOccupancy (const CounterStateType &now)
 Computes L3 Cache Occupancy.

template<class CounterStateType >
uint64 getL3CacheHitsNoSnoop (const CounterStateType &before, const CounterStateType &after)
 Computes number of L3 cache hits where no snooping in sibling L2 caches had to be done.

template<class CounterStateType >
uint64 getL3CacheHitsSnoop (const CounterStateType &before, const CounterStateType &after)
 Computes number of L3 cache hits where snooping in sibling L2 caches had to be done.

template<class CounterStateType >
uint64 getL3CacheHits (const CounterStateType &before, const CounterStateType &after)
 Computes total number of L3 cache hits.

template<class CounterStateType >
uint64 getInvariantTSC (const CounterStateType &before, const CounterStateType &after)
 Computes number of invariant time stamp counter ticks.
 
template<class CounterStateType >
double getCoreCStateResidency (int state, const CounterStateType &before, const CounterStateType &after)
 Computes residency in the core C-state.

template<class CounterStateType >
double getPackageCStateResidency (int state, const CounterStateType &before, const CounterStateType &after)
 Computes residency in the package C-state.

template<class CounterStateType >
uint64 getBytesReadFromMC (const CounterStateType &before, const CounterStateType &after)
 Computes number of bytes read from DRAM memory controllers.

template<class CounterStateType >
uint64 getBytesWrittenToMC (const CounterStateType &before, const CounterStateType &after)
 Computes number of bytes written to DRAM memory controllers.

template<class CounterStateType >
uint64 getIORequestBytesFromMC (const CounterStateType &before, const CounterStateType &after)
 Computes number of bytes of read/write requests from all IO sources.

template<class CounterStateType >
uint64 getNumberOfCustomEvents (int32 eventCounterNr, const CounterStateType &before, const CounterStateType &after)
 Returns the number of occurred custom core events.

uint64 getIncomingQPILinkBytes (uint32 socketNr, uint32 linkNr, const SystemCounterState &before, const SystemCounterState &after)
 Get estimation of QPI data traffic per incoming QPI link.

double getIncomingQPILinkUtilization (uint32 socketNr, uint32 linkNr, const SystemCounterState &before, const SystemCounterState &after)
 Get data utilization of incoming QPI link (0..1)

double getOutgoingQPILinkUtilization (uint32 socketNr, uint32 linkNr, const SystemCounterState &before, const SystemCounterState &after)
 Get utilization of outgoing QPI link (0..1)

uint64 getOutgoingQPILinkBytes (uint32 socketNr, uint32 linkNr, const SystemCounterState &before, const SystemCounterState &after)
 Get estimation of QPI (data+nondata) traffic per outgoing QPI link.

uint64 getAllIncomingQPILinkBytes (const SystemCounterState &before, const SystemCounterState &after)
 Get estimation of total QPI data traffic.

uint64 getAllOutgoingQPILinkBytes (const SystemCounterState &before, const SystemCounterState &after)
 Get estimation of total QPI data+nondata traffic.

uint64 getIncomingQPILinkBytes (uint32 socketNr, uint32 linkNr, const SystemCounterState &now)
 Return current value of the counter of QPI data traffic per incoming QPI link.

uint64 getSocketIncomingQPILinkBytes (uint32 socketNr, const SystemCounterState &now)
 Get estimation of total QPI data traffic for this socket.

uint64 getAllIncomingQPILinkBytes (const SystemCounterState &now)
 Get estimation of Socket QPI data traffic.

double getQPItoMCTrafficRatio (const SystemCounterState &before, const SystemCounterState &after)
 Get QPI data to Memory Controller traffic ratio.

uint64 getNumberOfEvents (PCIeCounterState before, PCIeCounterState after)
 Returns the raw count of PCIe events.
 

Detailed Description

Main CPU counters header.

Include this header file to access CPU counters (core and uncore, including memory controller chips and QPI).

Function Documentation

template<class CounterStateType >
double getActiveAverageFrequency ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes average core frequency while in the active (non-power-saving) C0 state, also taking Intel Turbo Boost technology into account.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
frequency in Hz

References PCM::getInstance(), and PCM::getNominalFrequency().

template<class CounterStateType >
double getActiveRelativeFrequency ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes average core frequency while in the active (non-power-saving) C0 state, also taking Intel Turbo Boost technology into account.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
Fraction of nominal frequency (if >1.0 then Turbo was working during the measurement)

uint64 getAllIncomingQPILinkBytes ( const SystemCounterState &  before,
const SystemCounterState &  after
)
inline

Get estimation of total QPI data traffic.

Returns an estimate of the number of data bytes transferred to all sockets over all Intel(r) Quick Path Interconnect links.

Parameters
before  System CPU counter state before the experiment
after  System CPU counter state after the experiment
Returns
Number of bytes

References getIncomingQPILinkBytes(), PCM::getInstance(), PCM::getNumSockets(), and PCM::getQPILinksPerSocket().

Referenced by getQPItoMCTrafficRatio().

uint64 getAllIncomingQPILinkBytes ( const SystemCounterState &  now)
inline

Get estimation of Socket QPI data traffic.

Returns an estimate of the number of data bytes transferred to all sockets over all Intel(r) Quick Path Interconnect links.

Parameters
now  System CPU counter state
Returns
Number of bytes

References PCM::getInstance(), PCM::getNumSockets(), and getSocketIncomingQPILinkBytes().

uint64 getAllOutgoingQPILinkBytes ( const SystemCounterState &  before,
const SystemCounterState &  after
)
inline

Get estimation of total QPI data+nondata traffic.

Returns an estimate of the number of data and non-data bytes transferred from all sockets over all Intel(r) Quick Path Interconnect links.

Parameters
before  System CPU counter state before the experiment
after  System CPU counter state after the experiment
Returns
Number of bytes

References PCM::getInstance(), PCM::getNumSockets(), getOutgoingQPILinkBytes(), and PCM::getQPILinksPerSocket().

template<class CounterStateType >
double getAverageFrequency ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes average core frequency also taking Intel Turbo Boost technology into account.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
frequency in Hz

References PCM::getInstance(), and PCM::getNominalFrequency().
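
As a rough illustration of the calculation described above, the sketch below scales a nominal frequency by the ratio of unhalted core cycles to invariant TSC ticks over the measured interval. The `FreqSample` struct, its fields, and `averageFrequencyHz` are hypothetical stand-ins rather than PCM types, and the formula is an assumption about how such a metric is commonly derived, not a copy of the library's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of two counters (not a PCM type).
struct FreqSample {
    std::uint64_t unhaltedCycles; // core cycles counted while not halted
    std::uint64_t tscTicks;       // invariant TSC ticks
};

// Average frequency in Hz over [before, after].
// Returns -1.0 when no TSC ticks elapsed (a convention chosen for this sketch).
double averageFrequencyHz(const FreqSample &before, const FreqSample &after,
                          double nominalHz)
{
    assert(after.tscTicks >= before.tscTicks);
    assert(after.unhaltedCycles >= before.unhaltedCycles);
    const std::uint64_t cycles = after.unhaltedCycles - before.unhaltedCycles;
    const std::uint64_t ticks = after.tscTicks - before.tscTicks;
    if (ticks == 0)
        return -1.0;
    return nominalHz * static_cast<double>(cycles) / static_cast<double>(ticks);
}
```

A result above `nominalHz` would suggest that Turbo Boost was active for part of the interval.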

template<class CounterStateType >
uint64 getBytesReadFromMC ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes number of bytes read from DRAM memory controllers.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
Number of bytes

Referenced by getQPItoMCTrafficRatio().

template<class CounterStateType >
uint64 getBytesWrittenToMC ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes number of bytes written to DRAM memory controllers.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
Number of bytes

Referenced by getQPItoMCTrafficRatio().

template<class CounterStateType >
uint64 getConsumedEnergy ( const CounterStateType &  before,
const CounterStateType &  after 
)

Returns energy consumed by processor, excluding DRAM (measured in internal units)

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment

Referenced by getConsumedJoules().

template<class CounterStateType >
double getConsumedJoules ( const CounterStateType &  before,
const CounterStateType &  after 
)

Returns Joules consumed by processor (excluding DRAM)

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment

References getConsumedEnergy(), PCM::getInstance(), and PCM::getJoulesPerEnergyUnit().
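
The references above suggest that the Joule value is the raw energy-counter delta multiplied by a per-unit conversion factor. The following is a minimal sketch of that conversion; `energyUnitsToJoules` and its parameters are hypothetical names, and counter wrap-around is deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>

// Convert a raw energy-counter delta into Joules.
// joulesPerUnit plays the role of the value PCM::getJoulesPerEnergyUnit() reports.
double energyUnitsToJoules(std::uint64_t unitsBefore, std::uint64_t unitsAfter,
                           double joulesPerUnit)
{
    assert(unitsAfter >= unitsBefore); // this sketch does not handle wrap-around
    return static_cast<double>(unitsAfter - unitsBefore) * joulesPerUnit;
}
```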

INTELPCM_API CoreCounterState getCoreCounterState ( uint32  core)

Reads the counter state of a (logical) core.

Helper function. Uses PCM object to access counters.

Parameters
core  core id
Returns
State of counters in the core

References PCM::getCoreCounterState(), and PCM::getInstance().

template<class CounterStateType >
double getCoreCStateResidency ( int  state,
const CounterStateType &  before,
const CounterStateType &  after 
)
inline

Computes residency in the core C-state.

Parameters
state  C-state
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
residence ratio (0..1): 0 - 0%, 1.0 - 100%

References PCM::getInstance(), getInvariantTSC(), getRefCycles(), and PCM::isCoreCStateResidencySupported().
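
Because the function references both getInvariantTSC() and getRefCycles(), one plausible reading is that residency in the active state C0 is the share of TSC ticks during which the reference clock was running. The sketch below illustrates only that C0 case; `ResidencySample` and `c0Residency` are hypothetical names, and how the library handles deeper C-states is not covered here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of two counters (not a PCM type).
struct ResidencySample {
    std::uint64_t refCycles; // reference cycles while the core clock is running
    std::uint64_t tscTicks;  // invariant TSC ticks
};

// Estimated C0 residency (0..1) over [before, after].
// Returns -1.0 when no TSC ticks elapsed (a convention chosen for this sketch).
double c0Residency(const ResidencySample &before, const ResidencySample &after)
{
    assert(after.refCycles >= before.refCycles);
    assert(after.tscTicks >= before.tscTicks);
    const std::uint64_t ticks = after.tscTicks - before.tscTicks;
    if (ticks == 0)
        return -1.0;
    return static_cast<double>(after.refCycles - before.refCycles)
           / static_cast<double>(ticks);
}
```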

double getCoreIPC ( const SystemCounterState &  before,
const SystemCounterState &  after
)
inline

Computes average number of retired instructions per core cycle for the entire system combining instruction counts from logical cores to corresponding physical cores.

Use this metric to evaluate the IPC improvement between SMT (Hyper-Threading) on and SMT off.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
IPC

References PCM::getInstance(), getIPC(), PCM::getNumCores(), PCM::getNumOnlineCores(), and PCM::getThreadsPerCore().

template<class CounterStateType >
uint64 getCycles ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes the number of core clock cycles while the clock signal on a specific core is running (not halted)

Returns the number of used cycles (halted cycles are not counted). The counter does not advance in the following conditions:

  • an ACPI C-state is other than C0 for normal operation
  • HLT
  • STPCLK+ pin is asserted
  • being throttled by TM1
  • during the frequency switching phase of a performance state transition

The performance counter for this event counts across performance state transitions using different core clock frequencies.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
number of core clock cycles

template<class CounterStateType >
uint64 getCycles ( const CounterStateType &  now)

Computes the number of executed core clock cycles.

Returns the number of used cycles (halted cycles are not counted).

Parameters
now  Current CPU counter state
Returns
number of core clock cycles

template<class CounterStateType >
double getCyclesLostDueL2CacheMisses ( const CounterStateType &  before,
const CounterStateType &  after 
)

Estimates how many core cycles were potentially lost due to missing L2 cache but still hitting L3 cache.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Currently not supported on Intel(R) Atom(tm) processor
Returns
ratio that is usually between 0 and 1; in some cases it can exceed 1.0 due to a lower access latency estimation

References PCM::getCPUModel(), and PCM::getInstance().

template<class CounterStateType >
double getCyclesLostDueL3CacheMisses ( const CounterStateType &  before,
const CounterStateType &  after 
)

Estimates how many core cycles were potentially lost due to L3 cache misses.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Returns
ratio that is usually between 0 and 1; in some cases it can exceed 1.0 due to a lower memory latency estimation

References PCM::getCPUModel(), and PCM::getInstance().

template<class CounterStateType >
uint64 getDRAMClocks ( uint32  channel,
const CounterStateType &  before,
const CounterStateType &  after 
)

Returns DRAM clock ticks.

Parameters
channel  DRAM channel number
before  CPU counter state before the experiment
after  CPU counter state after the experiment

template<class CounterStateType >
uint64 getDRAMConsumedEnergy ( const CounterStateType &  before,
const CounterStateType &  after 
)

Returns energy consumed by DRAM (measured in internal units)

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment

Referenced by getDRAMConsumedJoules().

template<class CounterStateType >
double getDRAMConsumedJoules ( const CounterStateType &  before,
const CounterStateType &  after 
)

Returns Joules consumed by DRAM.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment

References PCM::getCPUModel(), getDRAMConsumedEnergy(), PCM::getInstance(), and PCM::getJoulesPerEnergyUnit().

template<class CounterStateType >
double getExecUsage ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes average number of retired instructions per time interval.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
usage

Referenced by getTotalExecUsage().

uint64 getIncomingQPILinkBytes ( uint32  socketNr,
uint32  linkNr,
const SystemCounterState &  before,
const SystemCounterState &  after
)
inline

Get estimation of QPI data traffic per incoming QPI link.

Returns an estimate of the number of data bytes transferred to a socket over Intel(r) Quick Path Interconnect.

Parameters
socketNr  socket identifier
linkNr  link number
before  System CPU counter state before the experiment
after  System CPU counter state after the experiment
Returns
Number of bytes

Referenced by getAllIncomingQPILinkBytes(), getIncomingQPILinkUtilization(), and getSocketIncomingQPILinkBytes().

uint64 getIncomingQPILinkBytes ( uint32  socketNr,
uint32  linkNr,
const SystemCounterState &  now
)
inline

Return current value of the counter of QPI data traffic per incoming QPI link.

Returns the number of incoming data bytes to a socket over Intel(r) Quick Path Interconnect.

Parameters
socketNr  socket identifier
linkNr  link number
now  Current System CPU counter state
Returns
Number of bytes

double getIncomingQPILinkUtilization ( uint32  socketNr,
uint32  linkNr,
const SystemCounterState &  before,
const SystemCounterState &  after
)
inline

Get data utilization of incoming QPI link (0..1)

Returns an estimate of the utilization of a QPI link by data traffic transferred to a socket over Intel(r) Quick Path Interconnect.

Parameters
socketNr  socket identifier
linkNr  link number
before  System CPU counter state before the experiment
after  System CPU counter state after the experiment
Returns
utilization (0..1)

References getIncomingQPILinkBytes(), PCM::getInstance(), getInvariantTSC(), PCM::getNominalFrequency(), PCM::getNumCores(), and PCM::getQPILinkSpeed().

template<class CounterStateType >
uint64 getInstructionsRetired ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes the number of retired instructions.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
number of retired instructions

template<class CounterStateType >
uint64 getInstructionsRetired ( const CounterStateType &  now)

Computes the number of retired instructions.

Parameters
now  Current CPU counter state
Returns
number of retired instructions

template<class CounterStateType >
uint64 getInvariantTSC ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes number of invariant time stamp counter ticks.

This counter advances regardless of C-, P-, or T-states.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
number of time stamp counter ticks

Referenced by getCoreCStateResidency(), getIncomingQPILinkUtilization(), getOutgoingQPILinkBytes(), getOutgoingQPILinkUtilization(), getPackageCStateResidency(), and PCM::getTickCount().

template<class CounterStateType >
uint64 getIORequestBytesFromMC ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes number of bytes of read/write requests from all IO sources.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
Number of bytes

template<class CounterStateType >
double getIPC ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes average number of retired instructions per core cycle (IPC)

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
IPC

Referenced by getCoreIPC().
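
To make the before/after convention concrete, the sketch below computes IPC as the difference in retired instructions divided by the difference in core cycles. `IpcSample` and `instructionsPerCycle` are hypothetical names used only for illustration; they are not part of this header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of two counters (not a PCM type).
struct IpcSample {
    std::uint64_t instructionsRetired;
    std::uint64_t coreCycles;
};

// Instructions per cycle over [before, after].
// Returns -1.0 when no core cycles elapsed (a convention chosen for this sketch).
double instructionsPerCycle(const IpcSample &before, const IpcSample &after)
{
    assert(after.instructionsRetired >= before.instructionsRetired);
    assert(after.coreCycles >= before.coreCycles);
    const std::uint64_t cycles = after.coreCycles - before.coreCycles;
    if (cycles == 0)
        return -1.0;
    return static_cast<double>(after.instructionsRetired - before.instructionsRetired)
           / static_cast<double>(cycles);
}
```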

template<class CounterStateType >
double getL2CacheHitRatio ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes L2 cache hit ratio.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Returns
value between 0 and 1

References PCM::getCPUModel(), and PCM::getInstance().
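
A hit ratio between 0 and 1 is conventionally hits divided by total accesses (hits plus misses). The sketch below shows that arithmetic on already-computed deltas; `cacheHitRatio` is a hypothetical helper, and the exact events the library counts as hits and misses may differ by CPU model.

```cpp
#include <cassert>
#include <cstdint>

// Hit ratio (0..1) from hit and miss counts measured over the same interval.
// Returns -1.0 when there were no accesses (a convention chosen for this sketch).
double cacheHitRatio(std::uint64_t hits, std::uint64_t misses)
{
    const std::uint64_t accesses = hits + misses;
    assert(accesses >= hits); // guards against unsigned overflow in the sum
    if (accesses == 0)
        return -1.0;
    return static_cast<double>(hits) / static_cast<double>(accesses);
}
```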

template<class CounterStateType >
uint64 getL2CacheHits ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes number of L2 cache hits.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Returns
number of hits

References PCM::getCPUModel(), and PCM::getInstance().

template<class CounterStateType >
uint64 getL2CacheMisses ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes number of L2 cache misses.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Returns
number of misses

References PCM::getCPUModel(), and PCM::getInstance().

template<class CounterStateType >
double getL3CacheHitRatio ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes L3 cache hit ratio.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Returns
value between 0 and 1

References PCM::getCPUModel(), and PCM::getInstance().

template<class CounterStateType >
uint64 getL3CacheHits ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes total number of L3 cache hits.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Returns
number of hits

References PCM::getCPUModel(), PCM::getInstance(), getL3CacheHitsNoSnoop(), and getL3CacheHitsSnoop().

template<class CounterStateType >
uint64 getL3CacheHitsNoSnoop ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes number of L3 cache hits where no snooping in sibling L2 caches had to be done.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Returns
number of hits

References PCM::getCPUModel(), and PCM::getInstance().

Referenced by getL3CacheHits().

template<class CounterStateType >
uint64 getL3CacheHitsSnoop ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes number of L3 cache hits where snooping in sibling L2 caches had to be done.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Returns
number of hits

References PCM::getCPUModel(), and PCM::getInstance().

Referenced by getL3CacheHits().

template<class CounterStateType >
uint64 getL3CacheMisses ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes number of L3 cache misses.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Warning
Works only in the DEFAULT_EVENTS programming mode (see program() method)
Returns
number of misses

References PCM::getCPUModel(), and PCM::getInstance().

template<class CounterStateType >
uint64 getL3CacheOccupancy ( const CounterStateType &  now)

Computes L3 Cache Occupancy.

template<class CounterStateType >
uint64 getMCCounter ( uint32  channel,
uint32  counter,
const CounterStateType &  before,
const CounterStateType &  after 
)

Direct read of memory controller PMU counter (counter meaning depends on the programming: power/performance/etc)

Parameters
channel  channel number
counter  counter number
before  CPU counter state before the experiment
after  CPU counter state after the experiment

template<class CounterStateType >
double getNormalizedQPIL0pTxCycles ( uint32  port,
const CounterStateType &  before,
const CounterStateType &  after 
)

Returns the ratio of QPI cycles in power saving half-lane mode.

Parameters
port  QPI port number
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
0..1 - ratio of QPI cycles in power saving half-lane mode

References getQPIClocks(), and getQPIL0pTxCycles().
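
Given the two referenced functions, the normalized value is presumably the L0p cycle count divided by the QPI clock ticks for the same port and interval. A minimal sketch of that division follows; `normalizedL0pTxCycles` is a hypothetical helper operating on values already obtained from those two functions.

```cpp
#include <cassert>
#include <cstdint>

// Share of QPI clock ticks (0..1) spent in power saving half-lane mode.
// Returns -1.0 when no QPI clock ticks were counted (a convention chosen for this sketch).
double normalizedL0pTxCycles(std::uint64_t l0pTxCycles, std::uint64_t qpiClocks)
{
    if (qpiClocks == 0)
        return -1.0;
    assert(l0pTxCycles <= qpiClocks); // assumed: a sub-count cannot exceed the total
    return static_cast<double>(l0pTxCycles) / static_cast<double>(qpiClocks);
}
```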

template<class CounterStateType >
double getNormalizedQPIL1Cycles ( uint32  port,
const CounterStateType &  before,
const CounterStateType &  after 
)

Returns the ratio of QPI cycles in power saving shutdown mode.

Parameters
port  QPI port number
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
0..1 - ratio of QPI cycles in power saving shutdown mode

References getQPIClocks(), and getQPIL1Cycles().

template<class CounterStateType >
uint64 getNumberOfCustomEvents ( int32  eventCounterNr,
const CounterStateType &  before,
const CounterStateType &  after 
)

Returns the number of occurred custom core events.

Reads the number of events programmed with CUSTOM_CORE_EVENTS.

Parameters
eventCounterNr  Event/counter number (value from 0 to 3)
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
Number of events

uint64 getNumberOfEvents ( PCIeCounterState  before,
PCIeCounterState  after 
)
inline

Returns the raw count of PCIe events.

Parameters
before  PCIe counter state before the experiment
after  PCIe counter state after the experiment

uint64 getOutgoingQPILinkBytes ( uint32  socketNr,
uint32  linkNr,
const SystemCounterState &  before,
const SystemCounterState &  after
)
inline

Get estimation of QPI (data+nondata) traffic per outgoing QPI link.

Returns an estimate of the number of data and non-data bytes transferred from a socket over Intel(r) Quick Path Interconnect.

Parameters
socketNr  socket identifier
linkNr  link number
before  System CPU counter state before the experiment
after  System CPU counter state after the experiment
Returns
Number of bytes

References PCM::getInstance(), getInvariantTSC(), PCM::getNominalFrequency(), PCM::getNumCores(), getOutgoingQPILinkUtilization(), and PCM::getQPILinkSpeed().

Referenced by getAllOutgoingQPILinkBytes().

double getOutgoingQPILinkUtilization ( uint32  socketNr,
uint32  linkNr,
const SystemCounterState before,
const SystemCounterState after 
)
inline

Gets the utilization of an outgoing QPI link (0..1).

Returns an estimate of the utilization of a QPI link by (data + non-data) traffic transferred from a socket over Intel(r) QuickPath Interconnect.

Parameters
socketNr  socket identifier
linkNr  QPI link number
before  System CPU counter state before the experiment
after  System CPU counter state after the experiment
Returns
utilization (0..1)

References PCM::getInstance(), getInvariantTSC(), PCM::getNominalFrequency(), PCM::getNumCores(), and PCM::getQPILinkSpeed().

Referenced by getOutgoingQPILinkBytes().
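
The cross-references indicate that the byte estimate builds on link utilization, link speed, and elapsed time. The following is a minimal sketch of one way these quantities could combine. It assumes the link speed is given in bytes per second and that elapsed time is the TSC delta divided by the nominal frequency. `estimateLinkBytes` and its parameters are hypothetical and do not reproduce PCM's actual implementation.

```cpp
#include <cstdint>

// Hypothetical helper (not part of cpucounters.h).
// Assumptions: linkSpeedBytesPerSec is the peak link bandwidth in bytes/s,
// tscTicks is the invariant TSC delta for the interval, and
// nominalFrequencyHz is the TSC frequency in Hz.
double estimateLinkBytes(double utilization,
                         std::uint64_t linkSpeedBytesPerSec,
                         std::uint64_t tscTicks,
                         std::uint64_t nominalFrequencyHz)
{
    if (nominalFrequencyHz == 0) {
        return 0.0; // elapsed time cannot be derived
    }
    // Elapsed wall-clock time of the measurement interval in seconds.
    const double seconds =
        static_cast<double>(tscTicks) / static_cast<double>(nominalFrequencyHz);
    // Upper bound on bytes the link could have carried in that time.
    const double maxBytes = static_cast<double>(linkSpeedBytesPerSec) * seconds;
    return utilization * maxBytes;
}
```

Read this way, utilization is the fraction of the link's peak capacity used during the interval, and the byte count is that fraction of the capacity.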

template<class CounterStateType >
double getPackageCStateResidency ( int  state,
const CounterStateType &  before,
const CounterStateType &  after 
)
inline

Computes residency in the package C-state.

Parameters
state  C-state
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
residency ratio (0..1): 0 means 0%, 1.0 means 100%

References getInvariantTSC().
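
A residency ratio of this kind can be read as the time spent in the C-state divided by the total elapsed time, both measured in TSC ticks. The following is a minimal sketch under that assumption. `cStateResidency` and its raw-counter parameters are hypothetical and not part of cpucounters.h.

```cpp
#include <cstdint>

// Hypothetical helper (not part of cpucounters.h).
// Assumption: the C-state residency counter and the invariant TSC advance
// at the same rate, so their deltas can be divided directly.
double cStateResidency(std::uint64_t residencyBefore, std::uint64_t residencyAfter,
                       std::uint64_t tscBefore, std::uint64_t tscAfter)
{
    const std::uint64_t tscDelta = tscAfter - tscBefore;
    if (tscDelta == 0) {
        return 0.0; // empty interval
    }
    const std::uint64_t residencyDelta = residencyAfter - residencyBefore;
    return static_cast<double>(residencyDelta) / static_cast<double>(tscDelta);
}
```

A result of 0.25 would mean the package spent a quarter of the measured interval in the given C-state.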

template<class CounterStateType >
uint64 getPCUClocks ( const CounterStateType &  before,
const CounterStateType &  after 
)

Returns clock ticks of power control unit.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment

References getPCUCounter().

template<class CounterStateType >
uint64 getPCUCounter ( uint32  counter,
const CounterStateType &  before,
const CounterStateType &  after 
)

Directly reads a power control unit (PCU) PMU counter. The meaning of the counter depends on how it was programmed (power, performance, etc.).

Parameters
counter  counter number
before  CPU counter state before the experiment
after  CPU counter state after the experiment

Referenced by getPCUClocks().

template<class CounterStateType >
uint64 getQPIClocks ( uint32  port,
const CounterStateType &  before,
const CounterStateType &  after 
)

Returns QPI LL clock ticks.

Parameters
port  QPI port number
before  CPU counter state before the experiment
after  CPU counter state after the experiment

Referenced by getNormalizedQPIL0pTxCycles(), and getNormalizedQPIL1Cycles().

template<class CounterStateType >
uint64 getQPIL0pTxCycles ( uint32  port,
const CounterStateType &  before,
const CounterStateType &  after 
)

Returns the number of QPI cycles in power saving half-lane mode.

Parameters
port  QPI port number
before  CPU counter state before the experiment
after  CPU counter state after the experiment

Referenced by getNormalizedQPIL0pTxCycles().

template<class CounterStateType >
uint64 getQPIL1Cycles ( uint32  port,
const CounterStateType &  before,
const CounterStateType &  after 
)

Returns the number of QPI cycles in power saving shutdown mode.

Parameters
port  QPI port number
before  CPU counter state before the experiment
after  CPU counter state after the experiment

Referenced by getNormalizedQPIL1Cycles().

double getQPItoMCTrafficRatio ( const SystemCounterState before,
const SystemCounterState after 
)
inline

Gets the ratio of QPI data traffic to memory controller traffic.

Ideally, for NUMA-optimized programs the ratio should be close to 0.

Parameters
before  System CPU counter state before the experiment
after  System CPU counter state after the experiment
Returns
Ratio

References getAllIncomingQPILinkBytes(), getBytesReadFromMC(), and getBytesWrittenToMC().
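
Judging by the functions it references, the ratio compares incoming QPI bytes with the bytes read from and written to the memory controllers. The following is a minimal sketch of that comparison, assuming the three byte counts for the interval are already known. `qpiToMcRatio` is a hypothetical helper, and the handling of zero memory traffic is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper (not part of cpucounters.h): incoming QPI bytes
// relative to total memory controller traffic (reads plus writes).
// Assumption: with no memory controller traffic the ratio is reported as 0.
double qpiToMcRatio(std::uint64_t incomingQpiBytes,
                    std::uint64_t bytesReadFromMc,
                    std::uint64_t bytesWrittenToMc)
{
    const std::uint64_t memoryBytes = bytesReadFromMc + bytesWrittenToMc;
    if (memoryBytes == 0) {
        return 0.0;
    }
    return static_cast<double>(incomingQpiBytes) / static_cast<double>(memoryBytes);
}
```

A value near 0 means that almost none of the memory traffic crossed a QPI link, which is the behavior expected of a program whose threads mostly access memory local to their socket.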

template<class CounterStateType >
uint64 getRefCycles ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes the number of reference clock cycles while the clock signal on the core is running.

The reference clock operates at a fixed frequency, irrespective of core frequency changes due to performance state transitions. See the Intel(r) Software Developer's Manual for more details.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
Number of reference clock cycles

Referenced by getCoreCStateResidency().

template<class CounterStateType >
double getRelativeFrequency ( const CounterStateType &  before,
const CounterStateType &  after 
)

Computes the average core frequency, also taking Intel Turbo Boost Technology into account.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
Fraction of nominal frequency

INTELPCM_API SocketCounterState getSocketCounterState ( uint32  socket)

Reads the counter state of a socket.

Helper function. Uses PCM object to access counters.

Parameters
socket  socket id
Returns
State of counters in the socket

References PCM::getInstance(), and PCM::getSocketCounterState().

uint64 getSocketIncomingQPILinkBytes ( uint32  socketNr,
const SystemCounterState now 
)
inline

Gets an estimate of the total incoming QPI data traffic for this socket.

Returns an estimate of the number of bytes transferred to this socket over all Intel(r) QuickPath Interconnect links on this socket.

Parameters
socketNr  socket identifier
now  System CPU counter state
Returns
Number of bytes

References getIncomingQPILinkBytes(), PCM::getInstance(), and PCM::getQPILinksPerSocket().

Referenced by getAllIncomingQPILinkBytes().

INTELPCM_API SystemCounterState getSystemCounterState ( )

Reads the counter state of the system.

Helper function. Uses PCM object to access counters.

A system consists of several sockets (CPUs). Each socket holds one CPU, which in turn consists of several (logical) cores.

Returns
State of counters in the entire system

References PCM::getInstance(), and PCM::getSystemCounterState().

double getTotalExecUsage ( const SystemCounterState before,
const SystemCounterState after 
)
inline

Computes the average number of retired instructions per time interval for the entire system, combining instruction counts from logical cores into their corresponding physical cores.

Use this metric to evaluate the improvement in core utilization between SMT (Hyper-Threading) on and SMT off.

Parameters
before  CPU counter state before the experiment
after  CPU counter state after the experiment
Returns
usage

References getExecUsage(), PCM::getInstance(), PCM::getNumCores(), PCM::getNumOnlineCores(), and PCM::getThreadsPerCore().