Back in the day, when I compared how long perfmon and perfctr took to give me some basic readings at runtime (not setup time), I found perfctr to be much faster than perfmon.
The actual numbers escape me right now (I can dig them out if anybody needs them), but the difference was more than an order of magnitude. More importantly, perfctr let me read the registers faster than gettimeofday, so I could simply replace a whole bunch of time-only monitoring code with perfctr. This was through PAPI, BTW. Perfmon was more like an ultra-heavy system call.
Has anybody measured how the new 2.6.30 perf_counters compare? I asked on the perfctr list, but (understandably) they didn't have that info ready.
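
For anyone rerunning this kind of comparison, here is a minimal sketch of the measurement: average the cost of one read over many calls, after a warm-up call so that setup time is not counted. It is in Python only to keep it short. `per_call_ns` is a hypothetical helper, and the two standard-library clocks are stand-ins for the read functions, not perfctr, perfmon, or PAPI calls. A real test would wrap the counter read and gettimeofday in C, where interpreter overhead would not dominate the result.

```python
import time


def per_call_ns(read_fn, iterations=100_000):
    """Return the average wall-clock cost of one read_fn() call, in nanoseconds."""
    read_fn()  # warm-up call, so one-time setup cost is not counted
    start = time.perf_counter_ns()
    for _ in range(iterations):
        read_fn()
    elapsed = time.perf_counter_ns() - start
    # Note: this average includes the loop and call overhead of the interpreter.
    return elapsed / iterations


if __name__ == "__main__":
    # Stand-in read functions (assumption): swap in the readers being compared.
    readers = [
        ("time.time", time.time),
        ("time.perf_counter_ns", time.perf_counter_ns),
    ]
    for name, fn in readers:
        print(f"{name}: {per_call_ns(fn):.1f} ns per call")
```

The absolute numbers from a sketch like this mean little; only the ratio between two readers measured the same way is worth comparing.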