KOJAK Patterns

General Patterns

Execution

Key words:
Execution time
Unit:
Seconds
Description:
Time spent on program execution but without the idle times of slave threads during OpenMP sequential execution. Note that for pure MPI applications, this pattern is equal to Time.
Parent:
Time
Children:
MPI, OpenMP

Time

Key words:
CPU allocation time
Unit:
Seconds
Description:
Time spent on program execution including the idle times of CPUs reserved for slave threads during OpenMP sequential execution. This pattern assumes that every thread of a process is allocated a separate CPU during the entire runtime of the process.
Parent:
None
Children:
Execution, Idle Threads
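Example:
The relationship between Time, Execution, and Idle Threads can be sketched with a little arithmetic. This is a minimal illustration that assumes a single process with one sequential phase; the function names and the sample numbers are hypothetical and not part of KOJAK.

```python
def cpu_allocation_time(wall_time, num_threads):
    """Time: every thread is assumed to hold its own CPU for the whole run."""
    return wall_time * num_threads


def idle_threads_time(sequential_time, num_threads):
    """Idle Threads: CPUs reserved for slave threads are idle while
    the master thread executes sequential code."""
    return sequential_time * (num_threads - 1)


# Hypothetical process: 10 s wall time, 4 s of it sequential, 4 threads.
wall_time, sequential_time, num_threads = 10.0, 4.0, 4

total = cpu_allocation_time(wall_time, num_threads)
idle = idle_threads_time(sequential_time, num_threads)
execution = total - idle  # Execution excludes the idle times of slave threads

print(total, idle, execution)
```

For a pure MPI process (one thread), the idle term is zero, so Execution equals Time.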

MPI Patterns

Barrier Completion (MPI)

Key words:
MPI, synchronization
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI barriers after the first process has left the operation.
Parent:
Synchronization (MPI)
Children:
None
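Example:
A minimal sketch of how the completion time could be derived from barrier exit timestamps, assuming one exit timestamp per process on a common clock. The function name and data layout are hypothetical; the accounting used by KOJAK itself may differ in detail.

```python
def barrier_completion(exit_times):
    """Per-process time spent in the barrier after the first process left it."""
    first_exit = min(exit_times)
    return [t - first_exit for t in exit_times]


# Hypothetical exit timestamps (seconds) of three processes.
completion = barrier_completion([5.0, 5.5, 6.0])
print(completion)
```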

Collective

Key words:
MPI, collective communication
Unit:
Seconds
Description:
Time spent on MPI collective communication.
Parent:
Communication
Children:
Early Reduce, Late Broadcast, Wait at N x N

Communication

Key words:
MPI, communication
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI communication calls.
Parent:
MPI
Children:
Collective, Point-to-Point

Early Reduce

Key words:
MPI, n-to-1 communication
Unit:
Seconds
Description:
Collective communication operations that send data from all processes to one destination process (i.e., n-to-1) may suffer from waiting times if the destination process enters the operation earlier than its sending counterparts, that is, before any data could have been sent. The pattern refers to the time lost as a result of this situation.
Parent:
Collective
Children:
None
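Example:
A minimal sketch of the waiting time at the destination process, assuming the root can make progress as soon as the earliest sender has entered the operation. The function name and arguments are hypothetical.

```python
def early_reduce_wait(root_enter, sender_enters):
    """Time the root waits before any sender has entered the n-to-1 operation."""
    first_sender = min(sender_enters)
    return max(0.0, first_sender - root_enter)


# Hypothetical enter timestamps (seconds): the root arrives first.
wait = early_reduce_wait(1.0, [3.0, 2.5, 4.0])
print(wait)
```

If the root enters after at least one sender, the result is zero and no Early Reduce time is attributed.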

IO (MPI)

Key words:
MPI, IO
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI IO calls.
Parent:
MPI
Children:
None

Late Broadcast

Key words:
MPI, 1-to-n communication
Unit:
Seconds
Description:
Collective communication operations that send data from one source process to all processes (i.e., 1-to-n) may suffer from waiting times if destination processes enter the operation earlier than the source process, that is, before any data could have been sent. The pattern refers to the time lost as a result of this situation.
Parent:
Collective
Children:
None
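Example:
A minimal sketch of the per-destination waiting time, assuming a destination cannot receive data before the source process has entered the operation. The function name and arguments are hypothetical.

```python
def late_broadcast_waits(root_enter, dest_enters):
    """Waiting time of each destination that entered before the source process."""
    return [max(0.0, root_enter - enter) for enter in dest_enters]


# Hypothetical enter timestamps (seconds): the source enters at t = 4.0.
waits = late_broadcast_waits(4.0, [1.0, 4.5, 3.0])
print(waits)
```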

Late Receiver

Key words:
MPI, delayed sender
Unit:
Seconds
Description:
A send operation is blocked until the corresponding receive operation is called. This can happen for two reasons: either the MPI implementation is working in synchronous mode by default, or the size of the message to be sent exceeds the available MPI-internal buffer space and the operation is blocked until the data is transferred to the receiver. The pattern refers to the time spent waiting as a result of this situation.
Parent:
Point-to-Point
Children:
Messages in Wrong Order (Late Receiver)
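Example:
A minimal sketch of the sender's waiting time, assuming the send counts as blocked only when the matching receive is posted while the send call is still in progress. A send that returns before the receive is posted is treated here as buffered. The function name and arguments are hypothetical.

```python
def late_receiver_wait(send_enter, send_exit, recv_enter):
    """Time a blocked send waits until the matching receive is posted."""
    if send_enter < recv_enter <= send_exit:
        return recv_enter - send_enter
    # Receive was posted first, or the send completed from buffer space.
    return 0.0


# Hypothetical timestamps (seconds).
blocked = late_receiver_wait(1.0, 4.0, 3.0)   # receive posted during the send
buffered = late_receiver_wait(1.0, 1.5, 3.0)  # send returned before the receive
print(blocked, buffered)
```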

Late Sender

Key words:
MPI, delayed receiver
Unit:
Seconds
Description:
This pattern refers to the time lost in a wait state caused by a blocking receive operation (e.g., MPI_Recv or MPI_Wait) that is posted earlier than the corresponding send operation.
Parent:
Point-to-Point
Children:
Messages in Wrong Order (Late Sender)
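Example:
A minimal sketch of the receiver's waiting time, assuming enter timestamps of the matching send and receive calls on a common clock. The function name and arguments are hypothetical.

```python
def late_sender_wait(recv_enter, send_enter):
    """Time a blocking receive waits because the send starts later."""
    return max(0.0, send_enter - recv_enter)


# Hypothetical timestamps (seconds): receive posted at 1.0, send starts at 3.5.
wait = late_sender_wait(1.0, 3.5)
print(wait)
```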

Messages in Wrong Order (Late Receiver)

Key words:
MPI, sending order of messages
Unit:
Seconds
Description:
A Late Receiver situation may be the result of messages that are sent in the wrong order. If a process sends messages to processes that are not ready to receive them, the sender's MPI-internal buffer may overflow, so that from then on the process needs to send in synchronous mode, causing a Late Receiver situation. This pattern refers to the time spent in a wait state as a result of this situation.
Parent:
Late Receiver
Children:
None

Messages in Wrong Order (Late Sender)

Key words:
MPI, acceptance order of messages
Unit:
Seconds
Description:
A Late Sender situation may be the result of messages that are received in the wrong order. If a process expects messages from one or more processes in a certain order, although these processes are sending them in a different order, the receiver may need to wait for a message if it tries to receive a message early that has been sent late. The situation can be avoided by receiving messages in the order in which they are sent instead. This pattern refers to the time spent in a wait state as a result of this situation.
Parent:
Late Sender
Children:
None
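Example:
A minimal sketch of one way to single out Late Sender time caused by the acceptance order. It assumes that a wait counts as "wrong order" when a message received later was in fact sent earlier than the message being waited for. The function name, the data layout, and this criterion are illustrative assumptions, not the exact rule used by KOJAK.

```python
def wrong_order_late_sender(receives):
    """receives: (recv_enter, send_enter) pairs in the order the receiver
    posts its receives. Returns the Late Sender time attributed to order."""
    total = 0.0
    for i, (recv_enter, send_enter) in enumerate(receives):
        wait = max(0.0, send_enter - recv_enter)
        # Was a message that is received later already sent before this one?
        earlier_pending = any(s < send_enter for _, s in receives[i + 1:])
        if wait > 0.0 and earlier_pending:
            total += wait
    return total


# Hypothetical timestamps (seconds).
out_of_order = wrong_order_late_sender([(1.0, 3.0), (3.0, 2.0)])
in_order = wrong_order_late_sender([(1.0, 2.0), (2.0, 3.0)])
print(out_of_order, in_order)
```

In the first case the receiver waits two seconds for a message sent at 3.0 while a message sent at 2.0 is already available; receiving in sending order would have avoided that part of the wait.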

MPI

Key words:
MPI
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI calls.
Parent:
Execution
Children:
Communication, IO (MPI), Synchronization (MPI)

Point-to-Point

Key words:
MPI, point-to-point communication
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI point-to-point communication calls.
Parent:
Communication
Children:
Late Receiver, Late Sender

Synchronization (MPI)

Key words:
MPI, barrier
Unit:
Seconds
Description:
This pattern refers to the time spent in MPI barrier calls.
Parent:
MPI
Children:
Barrier Completion (MPI), Wait at Barrier (MPI)

Wait at Barrier (MPI)

Key words:
MPI, barrier
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of an MPI barrier, which is the time inside the barrier call until the last process has reached the barrier. A large amount of waiting time spent in front of barriers can be an indication of load imbalance.
Parent:
Synchronization (MPI)
Children:
None
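Example:
A minimal sketch of the per-process waiting time, assuming one barrier enter timestamp per process on a common clock. The function name and data layout are hypothetical.

```python
def wait_at_barrier(enter_times):
    """Time each process spends in the barrier until the last one arrives."""
    last_enter = max(enter_times)
    return [last_enter - t for t in enter_times]


# Hypothetical enter timestamps (seconds) of three processes.
waits = wait_at_barrier([1.0, 2.0, 4.0])
print(waits)
```

The process that arrives last waits zero seconds; a large spread between the values hints at the load imbalance mentioned in the description.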

Wait at N x N

Key words:
MPI, n-to-n communication
Unit:
Seconds
Description:
Collective communication operations that send data from all processes to all processes (i.e., n-to-n) exhibit an inherent synchronization among all participants, that is, no process can finish the operation until the last process has started it. This pattern covers the time spent in an n-to-n operation until all processes have entered it.
Parent:
Collective
Children:
None

OpenMP Patterns

API Lock Synchronization

Key words:
OpenMP, API lock routines
Unit:
Seconds
Description:
This pattern refers to the time a thread spent in an OpenMP API lock routine waiting for a lock that had been previously acquired by another thread.
Parent:
Synchronization (OpenMP)
Children:
None

Barrier (OpenMP)

Key words:
OpenMP, barrier
Unit:
Seconds
Description:
This pattern refers to the time spent in implicit (compiler-generated) or explicit (user-specified) OpenMP barrier synchronization. Note that during measurement implicit barriers are treated similarly to explicit ones. The instrumentation procedure replaces an implicit barrier with an explicit barrier enclosed by the parallel construct. This is done by adding a nowait clause and a barrier directive as the last statement of the parallel construct. In cases where the implicit barrier cannot be removed (i.e., at the end of a parallel region), the explicit barrier is executed in front of the implicit barrier, which will then be negligible because the team will already be synchronized when reaching it. The synthetic explicit barrier appears in the display as a special implicit barrier construct.
Parent:
Synchronization (OpenMP)
Children:
Explicit, Implicit

Critical

Key words:
OpenMP, critical section
Unit:
Seconds
Description:
This pattern refers to the time spent waiting in front of a critical section occupied by another thread.
Parent:
Lock Competition
Children:
None

Explicit

Key words:
OpenMP, explicit barrier
Unit:
Seconds
Description:
Time spent in explicit (i.e., user-specified) OpenMP barriers.
Parent:
Barrier (OpenMP)
Children:
Wait at Barrier (Explicit)

Flush

Key words:
OpenMP, flush directive
Unit:
Seconds
Description:
Time spent in OpenMP flush directives.
Parent:
OpenMP
Children:
None

Fork

Key words:
OpenMP, team creation
Unit:
Seconds
Description:
Time spent by the master thread creating a team of threads.
Parent:
OpenMP
Children:
None

Idle Threads

Key words:
OpenMP, sequential execution
Unit:
Seconds
Description:
This pattern refers to idle times on CPUs reserved for slave threads when a process is executed sequentially before or after an OpenMP parallel region.
Parent:
Time
Children:
None

Implicit

Key words:
OpenMP, implicit barrier
Unit:
Seconds
Description:
Time spent in implicit (i.e., compiler-generated) OpenMP barriers.
Parent:
Barrier (OpenMP)
Children:
Wait at Barrier (Implicit)

Lock Competition

Key words:
OpenMP, lock synchronization
Unit:
Seconds
Description:
This pattern refers to the time a thread spent waiting for a lock that had been previously acquired by another thread. The lock may have been acquired either transparently at the beginning of a critical section or using an explicit API call.
Parent:
Synchronization (OpenMP)
Children:
API Lock Synchronization, Critical
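Example:
A minimal sketch of how waiting time for a contended lock accumulates, assuming that requests are served in the order they arrive and that each thread holds the lock for a known duration. The function name, the data layout, and the first-come-first-served assumption are illustrative only.

```python
def lock_wait_times(requests):
    """requests: (request_time, hold_duration) per thread, sorted by
    request_time. Returns how long each thread waits for the lock."""
    free_at = 0.0
    waits = []
    for request_time, hold in requests:
        acquired = max(request_time, free_at)  # wait while another thread holds it
        waits.append(acquired - request_time)
        free_at = acquired + hold
    return waits


# Hypothetical requests (seconds): the second thread asks while the lock is held.
waits = lock_wait_times([(0.0, 2.0), (1.0, 1.0), (5.0, 1.0)])
print(waits)
```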

OpenMP

Key words:
OpenMP
Unit:
Seconds
Description:
Time spent on behalf of OpenMP. This includes time spent in OpenMP API calls as well as time spent in code generated by the OpenMP compiler.
Parent:
Execution
Children:
Flush, Fork, Synchronization (OpenMP)

Synchronization (OpenMP)

Key words:
OpenMP, synchronization
Unit:
Seconds
Description:
Time spent in OpenMP barrier or lock synchronization. Lock synchronization may be accomplished using either API calls or critical sections.
Parent:
OpenMP
Children:
Barrier (OpenMP), Lock Competition

Wait at Barrier (Explicit)

Key words:
OpenMP, explicit barrier
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of an explicit (user-specified) OpenMP barrier. It refers to the time spent in the barrier until all threads have reached it.
Parent:
Explicit
Children:
None

Wait at Barrier (Implicit)

Key words:
OpenMP, implicit barrier
Unit:
Seconds
Description:
This pattern covers the time spent waiting in front of an implicit (compiler-generated) OpenMP barrier. It refers to the time spent in the barrier until all threads have reached it.
Parent:
Implicit
Children:
None

CPU & Memory Patterns

Floating Point Instructions

Key words:
Hardware counter
Unit:
Number of occurrences
Description:
Number of floating-point instructions.
Parent:
None
Children:
None

L1 Cache Misses

Key words:
Hardware counter
Unit:
Number of occurrences
Description:
Number of level 1 data cache misses.
Parent:
None
Children:
None