Timer function to provide time in nanoseconds using C++

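I wish to calculate the time it takes for an API to return a value. The time taken for such an action is in the range of nanoseconds. As the API is a C++ class/function, I am using timer.h to calculate it: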

#include <ctime>
#include <iostream>


using namespace std;


int main(int argc, char** argv) {


clock_t start;
double diff;
start = clock();
diff = ( std::clock() - start ) / (double)CLOCKS_PER_SEC;
cout<<"printf: "<< diff <<'\n';


return 0;
}

The above code gives the time in seconds. How do I get the same in nanoseconds and with more precision?


In general, for timing how long it takes to call a function, you want to do it many more times than just once. If you call your function only once and it takes a very short time to run, you still have the overhead of actually calling the timer functions and you don't know how long that takes.

For example, if you estimate your function might take 800 ns to run, call it in a loop ten million times (which will then take about 8 seconds). Divide the total time by ten million to get the time per call.
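As a rough sketch of the loop approach described above: this version uses std::chrono::steady_clock (C++11) instead of clock(), and the helper name average_ns_per_call is purely illustrative, not part of any library.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs f() `iterations` times and returns the
// average wall-clock time per call, in nanoseconds.
template <class F>
double average_ns_per_call(F f, long iterations)
{
    assert(iterations > 0);
    const auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        f();
    const auto t1 = std::chrono::steady_clock::now();
    // Total elapsed time as a floating-point nanosecond count.
    const std::chrono::duration<double, std::nano> total = t1 - t0;
    return total.count() / iterations;
}
```

The result still includes the loop overhead, and an optimizing compiler may remove work that has no observable effect, so the timed callable should do something the compiler cannot discard (for example, write to a volatile variable).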

At that level of accuracy, it is better to reason in CPU ticks than in terms of a call like clock(). And do not forget that if an instruction takes more than one nanosecond to execute, nanosecond accuracy is pretty much impossible.

Still, something like that is a start:

Here's the actual code to retrieve the number of 80x86 CPU clock ticks elapsed since the CPU was last started. It works on Pentium and above (386/486 not supported). This code is MS Visual C++ specific, but can probably be ported quite easily to any other compiler that supports inline assembly.

inline __int64 GetCpuClocks()
{


// Counter
struct { __int32 low, high; } counter;


// Use RDTSC instruction to get clocks count
__asm push EAX
__asm push EDX
__asm __emit 0fh __asm __emit 031h // RDTSC
__asm mov counter.low, EAX
__asm mov counter.high, EDX
__asm pop EDX
__asm pop EAX


// Return result
return *(__int64 *)(&counter);


}

This function also has the advantage of being extremely fast: it usually takes no more than 50 CPU cycles to execute.

Using the Timing Figures:
If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use several very good utilities or the Win32 call, QueryPerformanceFrequency().
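The conversion described above is a single division. A minimal sketch follows; the function name and the 2.8 GHz figure used in the note below are made-up illustrations, and the real tick frequency has to be measured on your own machine.

```cpp
#include <cassert>

// Hypothetical helper: converts a tick count to nanoseconds, given the
// tick frequency in Hz (ticks per second).
inline double ticks_to_ns(unsigned long long ticks, double ticks_per_second)
{
    assert(ticks_per_second > 0.0);
    return static_cast<double>(ticks) * 1e9 / ticks_per_second;
}
```

With an assumed frequency of 2.8 GHz, ticks_to_ns(2800, 2.8e9) comes out at about 1000 ns. If the CPU changes frequency while the measurement runs, the converted value is only an approximation.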

If you need subsecond precision, you need to use system-specific extensions, and will have to check the documentation for your operating system. POSIX supports up to microseconds with gettimeofday; for finer resolution, POSIX also provides clock_gettime, which reports nanoseconds.

If you are using Boost, you can check boost::posix_time.

If this is for Linux, I've been using the function "gettimeofday", which returns a struct that gives the seconds and microseconds since the Epoch. You can then use timersub to subtract the two to get the difference in time, and convert it to whatever precision of time you want. However, you specify nanoseconds, and it looks like the function clock_gettime() is what you're looking for. It puts the time in terms of seconds and nanoseconds into the structure you pass into it.
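A minimal sketch of the clock_gettime() approach mentioned above, assuming a POSIX system. The two helper names are hypothetical, and CLOCK_MONOTONIC is chosen here because it is not affected by wall-clock adjustments.

```cpp
#include <cassert>
#include <time.h>   // clock_gettime, timespec (POSIX)

// Hypothetical helper: returns b - a in nanoseconds.
inline long long timespec_diff_ns(const timespec& a, const timespec& b)
{
    return (static_cast<long long>(b.tv_sec) - a.tv_sec) * 1000000000LL
         + (b.tv_nsec - a.tv_nsec);
}

// Hypothetical helper: times a single call of f() with CLOCK_MONOTONIC.
template <class F>
long long time_call_ns(F f)
{
    timespec t0, t1;
    const int rc0 = clock_gettime(CLOCK_MONOTONIC, &t0);
    f();
    const int rc1 = clock_gettime(CLOCK_MONOTONIC, &t1);
    assert(rc0 == 0 && rc1 == 0);  // the clock is expected to be available
    (void)rc0; (void)rc1;          // avoid unused warnings under NDEBUG
    return timespec_diff_ns(t0, t1);
}
```

Doing the subtraction on a single nanosecond count avoids having to borrow between the seconds and nanoseconds fields by hand. The reported resolution is nanoseconds, but the actual granularity depends on the system.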

What others have posted about running the function repeatedly in a loop is correct.

For Linux (and BSD) you want to use clock_gettime().

#include <time.h> // clock_gettime, timespec


int main()
{
timespec ts;
// clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}

For Windows, you want to use QueryPerformanceCounter (QPC).

Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMDs may also cause a problem. See the second post by sebbbi, where he states:

QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).

EDIT 2013/07/16:

It looks like there is some controversy on the efficacy of QPC under certain circumstances as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx

...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another...

However this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP service pack 2.

This article shows that Windows 7 can determine if the processor(s) have an invariant TSC and falls back to an external timer if they don't. http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html Synchronizing across processors is still an issue.

See the comments for more details.

You can use the following function with gcc on x86 processors:

unsigned long long rdtsc()
{
// RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX
unsigned int low, high;
__asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high));
return ((unsigned long long)high << 32) | low;
}

with Digital Mars C++:

unsigned long long rdtsc()
{
_asm
{
rdtsc
}
}

which reads the high performance timer on the chip. I use this when doing profiling.

I am using the following to get the desired results:

#include <time.h>
#include <iostream>
using namespace std;


int main (int argc, char** argv)
{
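// take a start reading (clock_settime typically fails for CPU-time clocks)
timespec tStart, tEnd;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tStart);
...
... <code to check for the time to be put here>
...
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tEnd);
// subtract the two readings as a single nanosecond count
long long ns = (tEnd.tv_sec - tStart.tv_sec) * 1000000000LL + (tEnd.tv_nsec - tStart.tv_nsec);
cout << "Time taken is: " << ns << " ns" << endl;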


return 0;
}

To do this correctly you can use one of two ways: either RDTSC or clock_gettime(). The second is about 2 times faster and has the advantage of giving the right absolute time. Note that for RDTSC to work correctly you need to use it as indicated (other comments on this page have errors, and may yield incorrect timing values on certain processors):

inline uint64_t rdtsc()
{
uint32_t lo, hi;
__asm__ __volatile__ (
"xorl %%eax, %%eax\n"
"cpuid\n"
"rdtsc\n"
: "=a" (lo), "=d" (hi)
:
: "%ebx", "%ecx" );
return (uint64_t)hi << 32 | lo;
}

and for clock_gettime: (I chose microsecond resolution arbitrarily)

#include <time.h>
#include <stdint.h> // uint64_t
// may need -lrt (real-time lib) on older glibc
// 1970-01-01 epoch UTC time, 1 microsecond resolution (divide by 1M to get time_t)
uint64_t ClockGetTime()
{
timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
}

the timing and values produced:

Absolute values:
rdtsc           = 4571567254267600
clock_gettime   = 1278605535506855


Processing time: (10000000 runs)
rdtsc           = 2292547353
clock_gettime   = 1031119636

I'm using Borland; here is the code. ti_hund sometimes gives me a negative number, but the timing is fairly good.

#include <dos.h>   // gettime, struct time
#include <stdio.h> // printf
#include <conio.h> // getch


void main()
{
struct  time t;
int Hour,Min,Sec,Hun;
gettime(&t);
Hour=t.ti_hour;
Min=t.ti_min;
Sec=t.ti_sec;
Hun=t.ti_hund;
printf("Start time is: %2d:%02d:%02d.%02d\n",
t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);
....
your code to time
...


// read the time here remove Hours and min if the time is in sec


gettime(&t);
printf("\nTid Hour:%d Min:%d Sec:%d  Hundreds:%d\n",t.ti_hour-Hour,
t.ti_min-Min,t.ti_sec-Sec,t.ti_hund-Hun);
printf("\n\nAlt Ferdig Press a Key\n\n");
getch();
} // end main
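The negative values most likely come from subtracting each field separately, with no borrow between fields (for example, when the seconds roll over between the two readings). One possible fix, sketched here in portable C++ rather than Borland-specific code and using a hypothetical helper name, is to collapse each reading into a single count of hundredths of a second before subtracting:

```cpp
#include <cassert>

// Hypothetical helper: collapses hour/minute/second/hundredths fields
// (such as those in Borland's `struct time`) into hundredths of a
// second since midnight.
inline long to_hundredths(int hour, int min, int sec, int hund)
{
    assert(hour >= 0 && min >= 0 && sec >= 0 && hund >= 0);
    return ((hour * 60L + min) * 60L + sec) * 100L + hund;
}
```

For a start reading of 10:59:59.90 and an end reading of 11:00:00.10, the difference of the two converted values is 20 hundredths, whereas subtracting the hundredths fields alone gives -80. This sketch assumes both readings are taken on the same day.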

Using Brock Adams's method, with a simple class:

int get_cpu_ticks()
{
LARGE_INTEGER ticks;
QueryPerformanceFrequency(&ticks);
return ticks.LowPart;
}


__int64 get_cpu_clocks()
{
struct { __int32 low, high; } counter;


__asm cpuid
__asm push EAX // balances the "pop EAX" below
__asm push EDX
__asm rdtsc
__asm mov counter.low, EAX
__asm mov counter.high, EDX
__asm pop EDX
__asm pop EAX


return *(__int64 *)(&counter);
}


class cbench
{
public:
cbench(const char *desc_in)
: desc(strdup(desc_in)), start(get_cpu_clocks()) { }
~cbench()
{
printf("%s took: %.4f ms\n", desc, (float)(get_cpu_clocks()-start)/get_cpu_ticks());
if(desc) free(desc);
}
private:
char *desc;
__int64 start;
};

Usage Example:

int main()
{
{
cbench c("test");
... code ...
}
return 0;
}

Result:

test took: 0.0002 ms

Has some function call overhead, but should be still more than fast enough :)

This new answer uses C++11's <chrono> facility. While there are other answers that show how to use <chrono>, none of them shows how to use <chrono> with the RDTSC facility mentioned in several of the other answers here. So I thought I would show how to use RDTSC with <chrono>. Additionally, I'll demonstrate how you can templatize the testing code on the clock so that you can rapidly switch between RDTSC and your system's built-in clock facilities (which will likely be based on clock(), clock_gettime() and/or QueryPerformanceCounter).

Note that the RDTSC instruction is x86-specific. QueryPerformanceCounter is Windows only. And clock_gettime() is POSIX only. Below I introduce two new clocks: std::chrono::high_resolution_clock and std::chrono::system_clock, which, if you can assume C++11, are now cross-platform.
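Before relying on one of the standard clocks, it can help to look at what your implementation actually provides. The sketch below (the function name describe_clock is hypothetical) prints each clock's compile-time tick period and whether it is steady. Note that high_resolution_clock is allowed to be an alias for one of the other standard clocks, so its properties vary between implementations.

```cpp
#include <chrono>
#include <iostream>

// Hypothetical helper: prints the compile-time tick period and the
// steadiness of a clock type.
template <class Clock>
void describe_clock(const char* name)
{
    typedef typename Clock::period period;
    std::cout << name << ": tick = " << period::num << "/" << period::den
              << " s, is_steady = " << std::boolalpha << Clock::is_steady
              << '\n';
}
```

A call such as describe_clock<std::chrono::high_resolution_clock>("high_resolution_clock") prints implementation-dependent values. The tick period only tells you the unit of the count, not how often the underlying counter actually advances.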

First, here is how you create a C++11-compatible clock out of the Intel rdtsc assembly instruction. I'll call it x::clock:

#include <chrono>


namespace x
{


struct clock
{
typedef unsigned long long                 rep;
typedef std::ratio<1, 2'800'000'000>       period; // My machine is 2.8 GHz
typedef std::chrono::duration<rep, period> duration;
typedef std::chrono::time_point<clock>     time_point;
static const bool is_steady =              true;


static time_point now() noexcept
{
unsigned lo, hi;
asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
return time_point(duration(static_cast<rep>(hi) << 32 | lo));
}
};


}  // x

All this clock does is count CPU cycles and store them in an unsigned 64-bit integer. You may need to tweak the assembly language syntax for your compiler. Or your compiler may offer an intrinsic you can use instead (e.g. now() {return __rdtsc();}).

To build a clock you have to give it the representation (storage type). You must also supply the clock period, which must be a compile time constant, even though your machine may change clock speed in different power modes. And from those you can easily define your clock's "native" time duration and time point in terms of these fundamentals.

If all you want to do is output the number of clock ticks, it doesn't really matter what number you give for the clock period. This constant only comes into play if you want to convert the number of clock ticks into some real-time unit such as nanoseconds. In that case, the more accurately you can supply the clock speed, the more accurate the conversion to nanoseconds (or milliseconds, or whatever) will be.

Below is example code which shows how to use x::clock. Actually I've templated the code on the clock as I'd like to show how you can use many different clocks with the exact same syntax. This particular test is showing what the looping overhead is when running what you want to time under a loop:

#include <iostream>


template <class clock>
void
test_empty_loop()
{
// Define real time units
typedef std::chrono::duration<unsigned long long, std::pico> picoseconds;
// or:
// typedef std::chrono::nanoseconds nanoseconds;
// Define double-based unit of clock tick
typedef std::chrono::duration<double, typename clock::period> Cycle;
using std::chrono::duration_cast;
const int N = 100000000;
// Do it
auto t0 = clock::now();
for (int j = 0; j < N; ++j)
asm volatile("");
auto t1 = clock::now();
// Get the clock ticks per iteration
auto ticks_per_iter = Cycle(t1-t0)/N;
std::cout << ticks_per_iter.count() << " clock ticks per iteration\n";
// Convert to real time units
std::cout << duration_cast<picoseconds>(ticks_per_iter).count()
<< "ps per iteration\n";
}

The first thing this code does is create a "real time" unit to display the results in. I've chosen picoseconds, but you can choose any units you like, either integral or floating point based. As an example there is a pre-made std::chrono::nanoseconds unit I could have used.

As another example I want to print out the average number of clock cycles per iteration as a floating point, so I create another duration, based on double, that has the same units as the clock's tick does (called Cycle in the code).

The loop is timed with calls to clock::now() on either side. If you want to name the type returned from this function it is:

typename clock::time_point t0 = clock::now();

(as clearly shown in the x::clock example, and is also true of the system-supplied clocks).

To get a duration in terms of floating point clock ticks one merely subtracts the two time points, and to get the per iteration value, divide that duration by the number of iterations.

You can get the count in any duration by using the count() member function. This returns the internal representation. Finally I use std::chrono::duration_cast to convert the duration Cycle to the duration picoseconds and print that out.
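One detail of duration_cast worth seeing in isolation is that it truncates when the target duration has an integral representation, while a floating-point-based duration keeps the fraction. A small sketch with hypothetical function names:

```cpp
#include <cassert>
#include <chrono>

// Integer-based target: duration_cast truncates toward zero.
inline long long whole_microseconds(std::chrono::nanoseconds d)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
}

// double-based target: the fractional part is preserved, and the
// conversion is implicit (no duration_cast needed).
inline double fractional_microseconds(std::chrono::nanoseconds d)
{
    return std::chrono::duration<double, std::micro>(d).count();
}

// Self-check of both conversions for 1500 ns.
inline void check_conversions()
{
    const std::chrono::nanoseconds d(1500);
    assert(whole_microseconds(d) == 1);          // 1.5 us truncated to 1
    assert(fractional_microseconds(d) == 1.5);   // fraction kept
}
```

This is why the per-iteration figure is kept in a double-based duration until the final conversion: dividing an integer tick count by the number of iterations first would lose most of the information.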

To use this code is simple:

int main()
{
std::cout << "\nUsing rdtsc:\n";
test_empty_loop<x::clock>();


std::cout << "\nUsing std::chrono::high_resolution_clock:\n";
test_empty_loop<std::chrono::high_resolution_clock>();


std::cout << "\nUsing std::chrono::system_clock:\n";
test_empty_loop<std::chrono::system_clock>();
}

Above I exercise the test using our home-made x::clock, and compare those results with using two of the system-supplied clocks: std::chrono::high_resolution_clock and std::chrono::system_clock. For me this prints out:

Using rdtsc:
1.72632 clock ticks per iteration
616ps per iteration


Using std::chrono::high_resolution_clock:
0.620105 clock ticks per iteration
620ps per iteration


Using std::chrono::system_clock:
0.00062457 clock ticks per iteration
624ps per iteration

This shows that each of these clocks has a different tick period, as the ticks per iteration is vastly different for each clock. However when converted to a known unit of time (e.g. picoseconds), I get approximately the same result for each clock (your mileage may vary).

Note how my code is completely free of "magic conversion constants". Indeed, there are only two magic numbers in the entire example:

  1. The clock speed of my machine in order to define x::clock.
  2. The number of iterations to test over. If changing this number makes your results vary greatly, then you should probably make the number of iterations higher, or empty your computer of competing processes while testing.

What do you think about that:

int iceu_system_GetTimeNow(long long int *res)
{
static struct timespec buffer;
//
#ifdef __CYGWIN__
if (clock_gettime(CLOCK_REALTIME, &buffer))
return 1;
#else
if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &buffer))
return 1;
#endif
*res=(long long int)buffer.tv_sec * 1000000000LL + (long long int)buffer.tv_nsec;
return 0;
}

You can use Embedded Profiler (free for Windows and Linux), which has an interface to a multiplatform timer (in processor cycle counts) and can give you the number of cycles per second:

EProfilerTimer timer;
timer.Start();


... // Your code here


const uint64_t number_of_elapsed_cycles = timer.Stop();
const uint64_t nano_seconds_elapsed =
number_of_elapsed_cycles / (double) timer.GetCyclesPerSecond() * 1000000000;

Converting a cycle count to time can be unreliable on modern processors, where the CPU frequency can change dynamically. Therefore, to be sure that converted times are correct, it is necessary to fix the processor frequency before profiling.

For C++11, here is a simple wrapper:

#include <iostream>
#include <chrono>


class Timer
{
public:
Timer() : beg_(clock_::now()) {}
void reset() { beg_ = clock_::now(); }
double elapsed() const {
return std::chrono::duration_cast<second_>
(clock_::now() - beg_).count(); }


private:
typedef std::chrono::high_resolution_clock clock_;
typedef std::chrono::duration<double, std::ratio<1> > second_;
std::chrono::time_point<clock_> beg_;
};

Or for C++03 on *nix,

class Timer
{
public:
Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }


double elapsed() {
clock_gettime(CLOCK_REALTIME, &end_);
return end_.tv_sec - beg_.tv_sec +
(end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
}


void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }


private:
timespec beg_, end_;
};

Example of usage:

int main()
{
Timer tmr;
double t = tmr.elapsed();
std::cout << t << std::endl;


tmr.reset();
t = tmr.elapsed();
std::cout << t << std::endl;
return 0;
}

From https://gist.github.com/gongzhitaao/7062087

Here is a nice Boost timer that works well:

//Stopwatch.hpp


#ifndef STOPWATCH_HPP
#define STOPWATCH_HPP


//Boost
#include <boost/chrono.hpp>
//Std
#include <cstdint>


class Stopwatch
{
public:
Stopwatch();
virtual         ~Stopwatch();
void            Restart();
std::uint64_t   Get_elapsed_ns();
std::uint64_t   Get_elapsed_us();
std::uint64_t   Get_elapsed_ms();
std::uint64_t   Get_elapsed_s();
private:
boost::chrono::high_resolution_clock::time_point _start_time;
};


#endif // STOPWATCH_HPP




//Stopwatch.cpp


#include "Stopwatch.hpp"


Stopwatch::Stopwatch():
_start_time(boost::chrono::high_resolution_clock::now()) {}


Stopwatch::~Stopwatch() {}


void Stopwatch::Restart()
{
_start_time = boost::chrono::high_resolution_clock::now();
}


std::uint64_t Stopwatch::Get_elapsed_ns()
{
boost::chrono::nanoseconds nano_s = boost::chrono::duration_cast<boost::chrono::nanoseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(nano_s.count());
}


std::uint64_t Stopwatch::Get_elapsed_us()
{
boost::chrono::microseconds micro_s = boost::chrono::duration_cast<boost::chrono::microseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(micro_s.count());
}


std::uint64_t Stopwatch::Get_elapsed_ms()
{
boost::chrono::milliseconds milli_s = boost::chrono::duration_cast<boost::chrono::milliseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(milli_s.count());
}


std::uint64_t Stopwatch::Get_elapsed_s()
{
boost::chrono::seconds sec = boost::chrono::duration_cast<boost::chrono::seconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(sec.count());
}

Minimalistic copy&paste-struct + lazy usage

If the idea is to have a minimalistic struct that you can use for quick tests, then I suggest you just copy and paste anywhere in your C++ file right after the #include's. This is the only instance in which I sacrifice Allman-style formatting.

You can easily adjust the precision in the first line of the struct. Possible values are: nanoseconds, microseconds, milliseconds, seconds, minutes, or hours.

#include <chrono>
#include <iostream>
#include <vector>
struct MeasureTime
{
using precision = std::chrono::microseconds;
std::vector<std::chrono::steady_clock::time_point> times;
std::chrono::steady_clock::time_point oneLast;
void p() {
std::cout << "Mark "
<< times.size()/2
<< ": "
<< std::chrono::duration_cast<precision>(times.back() - oneLast).count()
<< std::endl;
}
void m() {
oneLast = times.back();
times.push_back(std::chrono::steady_clock::now());
}
void t() {
m();
p();
m();
}
MeasureTime() {
times.push_back(std::chrono::steady_clock::now());
}
};

Usage

MeasureTime m; // first time is already in memory
doFnc1();
m.t(); // Mark 1: next time, and print difference with previous mark
doFnc2();
m.t(); // Mark 2: next time, and print difference with previous mark
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.t(); // prints 'Mark 3: 123123' etc...

Standard output result

Mark 1: 123
Mark 2: 32
Mark 3: 433234

If you want summary after execution

If you want the report afterwards (for example, because the code in between also writes to standard output), add the following function to the struct (just before MeasureTime()):

void s() { // summary
int i = 0;
std::chrono::steady_clock::time_point tprev;
for(auto tcur : times)
{
if(i > 0)
{
std::cout << "Mark " << i << ": "
<< std::chrono::duration_cast<precision>(tcur - tprev).count()
<< std::endl;
}
tprev = tcur;
++i;
}
}

So then you can just use:

MeasureTime m;
doFnc1();
m.m();
doFnc2();
m.m();
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.m();
m.s();

Which will list all the marks just like before, but then after the other code is executed. Note that you shouldn't use both m.s() and m.t().

plf::nanotimer is a lightweight option for this; it works on Windows, Linux, Mac, BSD, etc. It has roughly microsecond accuracy, depending on the OS:

#include "plf_nanotimer.h"
#include <iostream>


int main(int argc, char** argv)
{
plf::nanotimer timer;


timer.start();


// Do something here


double results = timer.get_elapsed_ns();
std::cout << "Timing: " << results << " nanoseconds." << std::endl;
return 0;
}