Recently, I had to test the latency of message passing between kernel-land and user-land. I first looked at the time structures and functions available in both the kernel and the libc, and then at possible kernel tuning.
gettimeofday is not precise enough here because it fills in a timeval structure (defined in sys/time.h), which only has microsecond resolution:

```c
struct timeval {
    time_t      tv_sec;   /* seconds */
    suseconds_t tv_usec;  /* microseconds */
};
```

The timespec structure (defined in time.h) looks like a much better fit, with nanosecond resolution:

```c
struct timespec {
    time_t tv_sec;   /* seconds */
    long   tv_nsec;  /* nanoseconds */
};
```
So I stumbled upon the kernel's POSIX clocks and timers, which can be accessed from user-land with the following function (triggering a syscall):

```c
#include <time.h>

int clock_gettime(clockid_t clk_id, struct timespec *tp);
```
According to the manual, a system-wide realtime clock is provided (CLOCK_REALTIME), perfect here.
When you look at the libc and at the kernel source (kernel/posix-timers.c), it is interesting to see how these clocks are implemented... and it is definitely not trivial!
Now, how do you use it? Consider the following code, which measures the latency between two consecutive clock_gettime calls on CLOCK_REALTIME.
```c
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec tps, tpe;

    /* Take two back-to-back readings of the same clock. */
    if ((clock_gettime(CLOCK_REALTIME, &tps) != 0)
        || (clock_gettime(CLOCK_REALTIME, &tpe) != 0)) {
        perror("clock_gettime");
        return 1;
    }

    long sec = (long)(tpe.tv_sec - tps.tv_sec);
    long nsec = tpe.tv_nsec - tps.tv_nsec;
    if (nsec < 0) {  /* borrow one second if a second boundary was crossed */
        sec -= 1;
        nsec += 1000000000L;
    }
    printf("%ld s, %ld ns\n", sec, nsec);
    return 0;
}
```
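Compile and link with librt, which provides clock_gettime. The library has to be named explicitly on the link line (glibc versions before 2.17 keep the clock functions in librt), otherwise the link fails with an undefined reference:

```
$ gcc -Wall -o time time.c -lrt
```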
Now here is what I get on a Core i7 950, Linux 2.6.33, libc 2.10.2-6, Debian sid.
Keep in mind that nanosecond-scale figures like these vary considerably with the hardware.
```
$ ./time
0 s, 113 ns
```
Pretty good!
Interestingly, if I run a do-nothing loop (« while(1); ») in another shell, I get better timings:
```
$ ./time
0 s, 58 ns
```
Apparently, this comes from a feature of modern CPUs: the ability to shut down ("relax") when idle. It then takes some time for the CPU to wake up. You can change this behaviour by passing idle=poll on the kernel command line. That way, the CPU never relaxes and is always ready (polling) to work.