Ha!
Thanks a lot - that's more than I expected!
The dependency tree will help a lot in deciding
whether the work can be done, and which parts of it.
Your code will help too, but it seems to be a whole
project in C++!? C++ handles data and the
routines that access it very nicely, but speeding it
up is difficult, because the compiler adds a lot of
extra machine code to keep the C++
features available at every point. However...
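If you want to check how many clocks a SINGLE CPU
spent in your routines, you can use the following inline-ASM
macro for Intel 32/64 bits. It works ONLY with GCC and its
front ends such as G++ (from GFortran you would need a small
C wrapper, as far as I know, since Fortran itself has no inline
assembler). Measuring wall-clock time is not the best approach
here.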
1. You need a 64-bit integer type (easy on x86_64, but on IA32...).
These typedefs are for C or C++; in Fortran you have to use
the corresponding Fortran types:
Code:
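/* 64-bit unsigned type that holds the time-stamp counter */
#ifdef __x86_64__
typedef unsigned long int tsc_t;       /* 64 bits on LP64 systems such as Linux */
#else
typedef unsigned long long int tsc_t;  /* 64 bits on IA32 */
#endif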
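If your compiler provides <cstdint> (C++11) or <stdint.h> (C99), a fixed-width type avoids the #ifdef altogether. This is only a sketch of an alternative, not part of the original macro; the static_assert is an extra safety check I added:

```cpp
#include <cstdint>
#include <climits>

// Alternative to the #ifdef above: std::uint64_t is exactly 64 bits
// wide on every platform that provides it.
typedef std::uint64_t tsc_t;

// Compile-time check that the counter type really holds 64 bits.
static_assert(sizeof(tsc_t) * CHAR_BIT == 64, "tsc_t must be 64 bits wide");
```

With this typedef the rest of the steps stay the same.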
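2. Here is the clock-counter macro, brought to you by IBM.
It does not depend on the language (C or C++), but
on the inline-assembler syntax of the GCC compiler suite.
RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX.
The often-quoted "=A" constraint only means the EDX:EAX pair on IA32;
on x86_64 it loses the upper half, so this version reads both
registers explicitly and works on both:
Code:
#define rdtscll(val) do { unsigned int lo_, hi_; __asm__ __volatile__ ("rdtsc" : "=a" (lo_), "=d" (hi_)); \
                          (val) = ((tsc_t)hi_ << 32) | lo_; } while (0)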
That's it.
3. Now create two variables to bracket the routine calls in your code:
Code:
tsc_t start, stop;
4. Now invoke the macro before and after the call:
Code:
rdtscll(start);
/* your C or C++ code goes here */
rdtscll(stop);
cout << stop - start << " clocks" << endl;
This gives you the number of CPU clocks that elapsed,
but the value varies from run to run - the minimum over
several runs should be closest to the real count.
You can give the process a high priority to minimize the
influence of other threads.
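Taking the minimum over several runs could be sketched like this. The names read_tsc, min_clocks and work are made up for the example, and the steady_clock fallback for non-x86 machines is my own addition so the sketch still compiles there:

```cpp
#include <cstdint>
#include <limits>
#if !defined(__i386__) && !defined(__x86_64__)
#include <chrono>
#endif

typedef std::uint64_t tsc_t;

// Read the time-stamp counter. On non-x86 targets fall back to
// steady_clock ticks (an assumption of this sketch, not real CPU clocks).
static inline tsc_t read_tsc() {
#if defined(__i386__) || defined(__x86_64__)
    std::uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return (static_cast<tsc_t>(hi) << 32) | lo;
#else
    return static_cast<tsc_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: call fn() `runs` times and return the smallest
// count seen, i.e. the run least disturbed by interrupts and other threads.
template <typename Fn>
tsc_t min_clocks(Fn fn, int runs) {
    tsc_t best = std::numeric_limits<tsc_t>::max();
    for (int i = 0; i < runs; ++i) {
        tsc_t start = read_tsc();
        fn();
        tsc_t stop = read_tsc();
        if (stop - start < best) best = stop - start;
    }
    return best;
}

// Hypothetical workload standing in for "your routine".
static volatile unsigned long sink = 0;
static void work() {
    for (unsigned long i = 0; i < 1000; ++i) sink = sink + i;
}
```

You would then print something like min_clocks(work, 100) followed by " clocks". The more runs you take, the better the chance that at least one of them was not interrupted.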
Thanks a lot! Looking at the code will take time...
M.
Greetings from Germany.
P.S.
If you already knew all this - sorry for wasting your time.