KOJAK Patterns
General Patterns
Execution
- Key words:
- Execution time
- Unit:
- Seconds
- Description:
- Time spent on program execution but without the
idle times of slave threads during OpenMP sequential execution. Note
that for pure MPI applications, this pattern is equal to Time.
- Parent:
- Time
- Children:
- MPI, OpenMP
Time
- Key words:
- CPU allocation time
- Unit:
- Seconds
- Description:
- Time spent on program execution including the idle
times of CPUs reserved for slave threads during OpenMP sequential
execution. This pattern assumes that every thread of a process is
allocated a separate CPU during the entire runtime of the process.
- Parent:
- None
- Children:
- Execution, Idle Threads
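The relationship between Time and its two children can be illustrated with a small calculation. This is only a sketch: it assumes a single process with a fixed thread count, that Execution and Idle Threads together account for all of Time, and hypothetical durations. The function names are illustrative and not part of KOJAK.

```python
def cpu_allocation_time(wall_seconds, num_threads):
    """Time: every thread is assumed to hold its own CPU for the whole run."""
    return wall_seconds * num_threads


def idle_threads_time(sequential_seconds, num_threads):
    """Idle Threads: CPUs reserved for slave threads idle while only the master runs."""
    return sequential_seconds * (num_threads - 1)


wall = 10.0        # hypothetical process runtime in seconds
sequential = 4.0   # hypothetical time spent outside parallel regions
threads = 4

time_total = cpu_allocation_time(wall, threads)
idle = idle_threads_time(sequential, threads)
execution = time_total - idle  # what remains is attributed to Execution
```

With these numbers, Time is 40 s, Idle Threads is 12 s, and Execution is 28 s. For a pure MPI run (one thread per process) the idle share is zero, so Execution equals Time.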
MPI Patterns
Barrier Completion (MPI)
- Key words:
- MPI, synchronization
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
barriers after the first process has left the operation.
- Parent:
- Synchronization (MPI)
- Children:
- None
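As a rough illustration of the description above, the completion time per process could be derived from the times at which the processes leave the barrier. The sketch assumes synchronized clocks and hypothetical exit timestamps. `barrier_completion` is an illustrative name, not a KOJAK function.

```python
def barrier_completion(exit_times):
    """Per-process time still spent in the barrier after the first process left."""
    first_exit = min(exit_times)
    return [t - first_exit for t in exit_times]


# Hypothetical barrier exit timestamps (seconds) for three processes.
exits = [5.0, 5.25, 5.5]
completion = barrier_completion(exits)
```

Here the first process contributes nothing, while the other two contribute 0.25 s and 0.5 s.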
Collective
- Key words:
- MPI, collective communication
- Unit:
- Seconds
- Description:
- Time spent on MPI collective communication.
- Parent:
- Communication
- Children:
- Early Reduce, Late Broadcast, Wait at N x N
Communication
- Key words:
- MPI, communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
communication calls.
- Parent:
- MPI
- Children:
- Collective, Point-to-Point
Early Reduce
- Key words:
- MPI, n-to-1 communication
- Unit:
- Seconds
- Description:
- Collective communication operations that send data
from all processes to one destination process (i.e., n-to-1) may
suffer from waiting times if the destination process enters the
operation earlier than its sending counterparts, that is, before any
data could have been sent. The pattern refers to the time lost as a
result of this situation.
- Parent:
- Collective
- Children:
- None
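One way to read the description above is that the destination process waits from the moment it enters the operation until the first sender enters. The following sketch computes that interval under this assumption, using hypothetical enter timestamps on synchronized clocks. The name `early_reduce_wait` is illustrative only.

```python
def early_reduce_wait(root_enter, sender_enters):
    """Waiting time of the destination (root) process in an n-to-1 operation."""
    first_sender = min(sender_enters)
    # No waiting time is charged if some sender was already inside the operation.
    return max(0.0, first_sender - root_enter)


root_wait = early_reduce_wait(1.0, [3.0, 2.0, 4.5])
no_wait = early_reduce_wait(5.0, [3.0, 2.0, 4.5])
```

In the first call the root enters at 1.0 s and the earliest sender at 2.0 s, giving 1.0 s of waiting. In the second call the root is not early, so the result is zero.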
IO (MPI)
- Key words:
- MPI, IO
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI IO calls.
- Parent:
- MPI
- Children:
- None
Late Broadcast
- Key words:
- MPI, 1-to-n communication
- Unit:
- Seconds
- Description:
- Collective communication operations that send data
from one source process to all processes (i.e., 1-to-n) may suffer
from waiting times if destination processes enter the operation
earlier than the source process, that is, before any data could have
been sent. The pattern refers to the time lost as a result of this
situation.
- Parent:
- Collective
- Children:
- None
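The waiting time described above can be sketched per destination process as the interval between its own entry and the entry of the source process. The timestamps are hypothetical and assumed to come from synchronized clocks. `late_broadcast_waits` is an illustrative name.

```python
def late_broadcast_waits(root_enter, destination_enters):
    """Waiting time of each destination process in a 1-to-n operation."""
    return [max(0.0, root_enter - enter) for enter in destination_enters]


# The source enters at 3.0 s; destinations enter at 1.0 s, 2.5 s, and 3.5 s.
broadcast_waits = late_broadcast_waits(3.0, [1.0, 2.5, 3.5])
```

Only the destinations that entered before the source accumulate waiting time (2.0 s and 0.5 s here).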
Late Receiver
- Key words:
- MPI, delayed sender
- Unit:
- Seconds
- Description:
- A send operation is blocked until the
corresponding receive operation is called. This can happen for several
reasons. Either the MPI implementation is working in synchronous mode
by default or the size of the message to be sent exceeds the available
MPI-internal buffer space and the operation is blocked until the data
is transferred to the receiver. The pattern refers to the time spent
waiting as a result of this situation.
- Parent:
- Point-to-Point
- Children:
- Messages in Wrong Order (Late Receiver)
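A minimal sketch of how the blocked time of a send could be estimated from event timestamps. It assumes the send is blocked from its entry until the matching receive is posted, and it caps the result at the time the send actually returned. All names and values are hypothetical.

```python
def late_receiver_wait(send_enter, send_exit, recv_enter):
    """Time a blocked send waits until the matching receive is posted."""
    # The send cannot have waited longer than it was inside the call.
    blocked_until = min(send_exit, recv_enter)
    return max(0.0, blocked_until - send_enter)


wait_blocked = late_receiver_wait(send_enter=1.0, send_exit=4.5, recv_enter=4.0)
wait_none = late_receiver_wait(send_enter=5.0, send_exit=5.5, recv_enter=4.0)
```

In the first case the send is blocked for 3.0 s. In the second the receive was already posted when the send started, so no waiting time is attributed.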
Late Sender
- Key words:
- MPI, delayed receiver
- Unit:
- Seconds
- Description:
- The time lost in a wait state caused by a blocking
receive operation (e.g., MPI_Recv or MPI_Wait) that is posted earlier
than the corresponding send operation.
- Parent:
- Point-to-Point
- Children:
- Messages in Wrong Order (Late Sender)
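As a sketch, the wait state described above can be measured as the interval between posting the receive and the start of the matching send. The timestamps are hypothetical, clocks are assumed to be synchronized, and the function name is illustrative.

```python
def late_sender_wait(recv_enter, send_enter):
    """Time a blocking receive waits because the matching send starts later."""
    return max(0.0, send_enter - recv_enter)


wait_late = late_sender_wait(recv_enter=2.0, send_enter=3.5)
wait_on_time = late_sender_wait(recv_enter=2.0, send_enter=1.0)
```

The first receive is posted 1.5 s before the send and waits that long. The second finds the send already under way and incurs no waiting time.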
Messages in Wrong Order (Late Receiver)
- Key words:
- MPI, sending order of messages
- Unit:
- Seconds
- Description:
- A Late Receiver
situation may be the result of messages that are sent in the wrong
order. If a process sends messages to processes that are not ready to
receive them, the sender's MPI-internal buffer may overflow so that
from then on the process needs to send in synchronous mode causing a
Late Receiver situation. This pattern refers to the time spent in a
wait state as a result of this situation.
- Parent:
- Late Receiver
- Children:
- None
Messages in Wrong Order (Late Sender)
- Key words:
- MPI, acceptance order of messages
- Unit:
- Seconds
- Description:
- A Late Sender
situation may be the result of messages that are received in the wrong
order. If a process expects messages from one or more processes in a
certain order, although these processes are sending them in a
different order, the receiver may need to wait for a message if it
tries to receive a message early that has been sent late. The
situation can be avoided by receiving messages in the order in which
they are sent instead. This pattern refers to the time spent in a wait
state as a result of this situation.
- Parent:
- Late Sender
- Children:
- None
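The effect of the receive order can be illustrated with a toy simulation. It assumes blocking receives, instantaneous message transfer, and a fixed amount of work after each received message. The process labels and timings are hypothetical.

```python
def total_receive_wait(send_times, receive_order, work_per_message):
    """Sum of the waiting times of blocking receives posted in the given order."""
    clock = 0.0
    waited = 0.0
    for source in receive_order:
        sent_at = send_times[source]
        if sent_at > clock:          # receive posted before the send started
            waited += sent_at - clock
            clock = sent_at
        clock += work_per_message    # process the message before the next receive
    return waited


send_times = {"A": 5.0, "B": 1.0}   # B sends first, A sends late
wait_wrong_order = total_receive_wait(send_times, ["A", "B"], 2.0)
wait_send_order = total_receive_wait(send_times, ["B", "A"], 2.0)
```

Receiving from A first costs 5.0 s of waiting. Receiving in the order in which the messages are sent reduces this to 3.0 s, because part of the wait for A overlaps with processing the message from B.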
MPI
- Key words:
- MPI
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
calls.
- Parent:
- Execution
- Children:
- Communication, IO (MPI), Synchronization (MPI)
Point-to-Point
- Key words:
- MPI, point-to-point communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
point-to-point communication calls.
- Parent:
- Communication
- Children:
- Late Receiver, Late Sender
Synchronization (MPI)
- Key words:
- MPI, barrier
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
barrier calls.
- Parent:
- MPI
- Children:
- Barrier Completion (MPI), Wait at Barrier (MPI)
Wait at Barrier (MPI)
- Key words:
- MPI, barrier
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an MPI barrier, which is the time inside the barrier call
until the last process has reached the barrier. A large amount of
waiting time spent in front of barriers can be an indication of load
imbalance.
- Parent:
- Synchronization (MPI)
- Children:
- None
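A short sketch of the waiting time described above, computed from the times at which each process enters the barrier. It assumes synchronized clocks and hypothetical timestamps. `wait_at_barrier` is an illustrative name.

```python
def wait_at_barrier(enter_times):
    """Per-process time inside the barrier until the last process has arrived."""
    last_enter = max(enter_times)
    return [last_enter - t for t in enter_times]


# Hypothetical barrier enter timestamps (seconds) for three processes.
enters = [1.0, 2.5, 4.0]
barrier_waits = wait_at_barrier(enters)
```

The process arriving last waits 0 s while the earliest waits 3.0 s. A wide spread like this is the kind of result that points to load imbalance.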
Wait at N x N
- Key words:
- MPI, n-to-n communication
- Unit:
- Seconds
- Description:
- Collective communication operations that send data
from all processes to all processes (i.e., n-to-n) exhibit an inherent
synchronization among all participants, that is, no process can finish
the operation until the last process has started it. This pattern
covers the time spent in an n-to-n operation until all processes have
reached it.
- Parent:
- Collective
- Children:
- None
OpenMP Patterns
API Lock Synchronization
- Key words:
- OpenMP, API lock routines
- Unit:
- Seconds
- Description:
- This pattern refers to the time a thread spent in
an OpenMP API lock routine waiting for a lock that had been
previously acquired by another thread.
- Parent:
- Lock Competition
- Children:
- None
Barrier (OpenMP)
- Key words:
- OpenMP, barrier
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in implicit
(compiler-generated) or explicit (user-specified) OpenMP barrier
synchronization. Note that during measurement implicit barriers are
treated similarly to explicit ones. The instrumentation procedure
replaces an implicit barrier with an explicit barrier enclosed by the
parallel construct. This is done by adding a nowait clause and a
barrier directive as the last statement of the parallel construct. In
cases where the implicit barrier cannot be removed (i.e., at the end of a
parallel region), the explicit barrier is executed in front of the implicit
barrier, which will then be negligible because the team will already
be synchronized when reaching it. The synthetic explicit barrier
appears in the display as a special implicit barrier construct.
- Parent:
- Synchronization (OpenMP)
- Children:
- Explicit, Implicit
Critical
- Key words:
- OpenMP, critical section
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent waiting in
front of a critical section occupied by another thread.
- Parent:
- Lock Competition
- Children:
- None
Explicit
- Key words:
- OpenMP, explicit barrier
- Unit:
- Seconds
- Description:
- Time spent in explicit (i.e., user-specified)
OpenMP barriers.
- Parent:
- Barrier (OpenMP)
- Children:
- Wait at Barrier (Explicit)
Flush
- Key words:
- OpenMP, flush directive
- Unit:
- Seconds
- Description:
- Time spent in OpenMP flush directives.
- Parent:
- OpenMP
- Children:
- None
Fork
- Key words:
- OpenMP, team creation
- Unit:
- Seconds
- Description:
- Time spent by the master thread creating a team of
threads.
- Parent:
- OpenMP
- Children:
- None
Idle Threads
- Key words:
- OpenMP, sequential execution
- Unit:
- Seconds
- Description:
- This pattern refers to idle times on CPUs reserved
for slave threads when a process is executed sequentially before or
after an OpenMP parallel region.
- Parent:
- Time
- Children:
- None
Implicit
- Key words:
- OpenMP, implicit barrier
- Unit:
- Seconds
- Description:
- Time spent in implicit (i.e., compiler-generated)
OpenMP barriers.
- Parent:
- Barrier (OpenMP)
- Children:
- Wait at Barrier (Implicit)
Lock Competition
- Key words:
- OpenMP, lock synchronization
- Unit:
- Seconds
- Description:
- This pattern refers to the time a thread spent
waiting for a lock that had been previously acquired by another
thread. The lock may have been acquired either transparently at the
beginning of a critical section or explicitly through an API call.
- Parent:
- Synchronization (OpenMP)
- Children:
- API Lock Synchronization, Critical
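Lock waiting time can be sketched with a toy model in which requests are granted in arrival order. The model assumes a single lock, first-come first-served granting, and hypothetical request times and hold durations. It does not describe how any OpenMP runtime actually schedules lock acquisition.

```python
def lock_wait_times(requests):
    """Waiting time of each request; requests are (request_time, hold_seconds)
    pairs sorted by request time and granted in that order."""
    free_at = 0.0
    waits = []
    for request_time, hold_seconds in requests:
        acquired_at = max(request_time, free_at)   # wait while another thread holds the lock
        waits.append(acquired_at - request_time)
        free_at = acquired_at + hold_seconds
    return waits


lock_waits = lock_wait_times([(0.0, 3.0), (1.0, 2.0), (6.0, 1.0)])
```

The second thread requests the lock at 1.0 s while the first still holds it until 3.0 s, so it waits 2.0 s. The third thread finds the lock free and does not wait.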
OpenMP
- Key words:
- OpenMP
- Unit:
- Seconds
- Description:
- Time spent on behalf of OpenMP. This includes
time spent in OpenMP API calls as well as time spent in code generated
by the OpenMP compiler.
- Parent:
- Execution
- Children:
- Flush, Fork, Synchronization (OpenMP)
Synchronization (OpenMP)
- Key words:
- OpenMP, synchronization
- Unit:
- Seconds
- Description:
- Time spent in OpenMP barrier or lock
synchronization. Lock synchronization may be accomplished using either
API calls or critical sections.
- Parent:
- OpenMP
- Children:
- Barrier (OpenMP), Lock Competition
Wait at Barrier (Explicit)
- Key words:
- OpenMP, explicit barrier
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an explicit (user-specified) OpenMP barrier. It refers to the
time spent in the barrier until all threads have reached it.
- Parent:
- Explicit
- Children:
- None
Wait at Barrier (Implicit)
- Key words:
- OpenMP, implicit barrier
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an implicit (compiler-generated) OpenMP barrier. It refers to
the time spent in the barrier until all threads have reached it.
- Parent:
- Implicit
- Children:
- None
CPU & Memory Patterns
Floating Point Instructions
- Key words:
- Hardware counter
- Unit:
- Number of occurrences
- Description:
- Number of floating-point instructions
- Parent:
- None
- Children:
- None
L1 Cache Misses
- Key words:
- Hardware counter
- Unit:
- Number of occurrences
- Description:
- Number of level 1 data cache misses
- Parent:
- None
- Children:
- None