IO Stream performance.

Many of the competitive programming portals on the web advise users, for problems that involve reading or printing huge amounts of data, to avoid cin/cout in favor of scanf/printf:

As input/output can reach huge size it is recommended to use fast input/output methods: for example, prefer to use scanf/printf instead of cin/cout in C++, prefer to use BufferedReader/PrintWriter instead of Scanner/System.out in Java.

This quote comes directly from Codeforces, and it basically assumes that printf and scanf are intrinsically faster than their C++ counterparts cout and cin. But are printf and scanf really faster? I’ve run a couple of tests, with some interesting results.

To obtain the results shown here I tested on Linux (Kubuntu, kernel 4.2.0-23) with g++ (4.9.2) and on Windows with g++ (4.9.2 from MinGW) and Visual Studio 2015. The goal is to estimate which I/O procedure is faster, whether there are any significant differences between running the tests on Windows and on Linux, and how the VS2015 results differ from the g++ ones.

I prepared separate, simple tests for I/O operations on strings with cin/cout and with printf/scanf. The string length is predefined and the strings are 0-terminated. I made separate tests for strings ending with a newline '\n' and, for cout, with std::endl.

The reason is that there’s a performance difference depending on how the string is terminated. These are the very simple test functions:

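void run_cout_endl_test() {
	for(int i{0};i<num_of_runs;i++)
		std::cout<<buffer<<std::endl; // buffer: assumed name of the test string
}

void run_cout_newline_test() {
	for(int i{0};i<num_of_runs;i++)
		std::cout<<buffer<<'\n'; // '\n' does not force a flush
}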
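
The printf counterparts are not shown in this post. A minimal sketch of what one of them might look like follows. The function name, the explicit parameters, and the returned character count are assumptions made to keep the example self-contained; this is not the code from the repository.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical printf counterpart of run_cout_newline_test.
// Writes `buffer` followed by '\n' to `out` num_of_runs times and
// returns the total number of characters written, or -1 on error.
long run_printf_newline_test(std::FILE* out, const char* buffer, int num_of_runs) {
    assert(out != nullptr && buffer != nullptr);
    long written = 0;
    for (int i{0}; i < num_of_runs; i++) {
        const int n = std::fprintf(out, "%s\n", buffer);
        if (n < 0) return -1;  // fprintf reports output errors with a negative value
        written += n;
    }
    return written;
}
```

Passing stdout as `out` and redirecting the program output to a file would mirror the way the cout tests are run.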

I made separate measurements with std::ios_base::sync_with_stdio set to true and to false. It is common knowledge that a C++ stream not synchronized with the standard C streams is faster; we’ll check whether this is true.
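
For reference, switching the synchronization off is a single call, std::ios_base::sync_with_stdio(false), made before any I/O is performed. The sketch below wraps it in a helper; the helper name is hypothetical and only there for illustration.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical helper: turns off the synchronization between the C++
// standard streams and the C stdio streams.
// Returns true if the streams were synchronized before the call.
bool disable_stdio_sync() {
    const bool was_synced = std::ios_base::sync_with_stdio(false);
    // The call returns the previous state, so asking again must now
    // report "not synchronized".
    assert(std::ios_base::sync_with_stdio(false) == false);
    return was_synced;
}
```

Once the synchronization is off, mixing std::cout with printf (or std::cin with scanf) in the same program is no longer safe, because the two libraries buffer independently.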

The testing code is available on my GitHub. Before running the tests, two values need to be defined: buffer_size and num_of_runs. The first defines the size of each I/O input; in my experiment I used two sizes, 2^9 and 2^19 bytes. The second specifies the number of I/O operations to perform in a loop; my values were 5000 for 2^19 and 5×10^5 for 2^9.
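
To give a feeling for the amount of data involved, here is a small sketch of how a test string could be built and how the total payload follows from the two values. Both helper names are hypothetical, and the run count for the small buffer is read as 5×10^5, which is an assumption. With these settings the big test moves about 2.6 GB and the small one about 256 MB.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: a test string of exactly buffer_size characters.
std::string make_buffer(std::size_t buffer_size, char fill = 'a') {
    assert(buffer_size > 0);  // an empty buffer would make the test meaningless
    return std::string(buffer_size, fill);
}

// Total payload of one test: buffer_size bytes per operation,
// line terminators not counted.
unsigned long long total_bytes(unsigned long long buffer_size,
                               unsigned long long num_of_runs) {
    return buffer_size * num_of_runs;
}
```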

To measure the time required to execute the I/O operations I used the C++ std::chrono::high_resolution_clock. The code that triggers each test is the following:

string run_test(const std::string& text,
		std::function<void()> test)
{
	stringstream output_str;
	auto start = chrono::high_resolution_clock::now();
	test(); // run the test being timed
	auto stop = chrono::high_resolution_clock::now();
	output_str<<text<<" "<<chrono::duration_cast<chrono::nanoseconds>(stop-start).count()/num_of_runs<<"ns"<<endl;
	return output_str.str();
}


As you can see, the test code is straightforward. For the remaining implementation details, have a look at my GitHub.
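
The same measurement can be sketched as a self-contained helper that returns the average instead of formatting it. Note that this variant swaps in std::chrono::steady_clock, which is guaranteed to be monotonic, whereas high_resolution_clock may be an alias for the system clock on some implementations. The function name and signature are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical variant of run_test: runs `test` once (the test is expected
// to loop num_of_runs times internally) and returns the average time per
// run in nanoseconds.
long long average_ns(const std::function<void()>& test, int num_of_runs) {
    assert(num_of_runs > 0);
    const auto start = std::chrono::steady_clock::now();
    test();
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return elapsed.count() / num_of_runs;
}
```

A call could look like average_ns(run_cout_newline_test, num_of_runs), with standard output redirected to a file.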

The testing procedure is the following:

  1. Prepare the test code by setting the values of buffer_size and num_of_runs. For the stream-in tests, set up the code to use std::cin or scanf, and so on.
  2. Compile the code with g++ (-O3, -std=c++1y) or VS2015 (full optimization, favoring speed) on the target OS.
  3. Run the tests and collect the results. Each test needs its input or output stream redirected.
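
The stream-in tests need a prepared input to redirect into the program. How that input was produced is not described here, so the following is only a sketch under the assumption that it consists of num_of_runs lines of buffer_size characters each. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical generator for stream-in test data: writes num_of_runs lines,
// each made of buffer_size non-whitespace characters.
void write_test_input(std::ostream& out, std::size_t buffer_size, int num_of_runs) {
    assert(buffer_size > 0);
    const std::string line(buffer_size, 'a');
    for (int i{0}; i < num_of_runs; i++)
        out << line << '\n';
}

// Reads whitespace-separated tokens the way a `std::cin >> token` loop
// would, and returns how many were read.
int count_tokens(std::istream& in) {
    std::string token;
    int count = 0;
    while (in >> token) ++count;
    return count;
}

// Usage example: generate three 512-byte lines in memory and read them back.
bool round_trip_example() {
    std::stringstream data;
    write_test_input(data, 512, 3);
    return count_tokens(data) == 3;
}
```

In a real run the generator would write to a file, which is then redirected to the standard input of the test binary.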

No real magic here.

Let’s have a look at the results I got on Windows for the stream-out operations. In the following pictures, a bar whose name is followed by SYNC refers to a test with std::ios_base::sync_with_stdio(true); if SYNC is missing, the synchronization is set to false.

std::cout terminated with '\n' and compiled with g++ is the fastest: a call to std::cout takes 5664ns on average to complete. The other labels on the columns show the performance relative to the best.
std::cout terminated with '\n' and compiled with g++ is again the fastest. Notice how badly the C printf variants perform. On average, a call to std::cout needs 5114us to complete in this test.

The results are interesting. For the test with the small amount of data (512 bytes), std::cout is clearly the fastest stream-out method. For some reason, compiling with g++ and disabling the synchronization with the C streams causes a huge drop in performance; I repeated this test many times and the results were always very similar.

If we look at the big data test, the fastest method is again std::cout; notice how badly printf performs with g++ here. I don’t know for sure what the reason behind this bad behavior with g++ is; this needs further investigation. Notice also how similar the performance of VS2015 and g++ is for all the bars except printf. In general, g++ seems to produce faster streaming methods.

Let’s have a look at the stream IN tests on Windows:

The best performance is achieved by std::cin with std::ios_base::sync_with_stdio set to false: an average call takes 3802ns to complete. Note how enabling the synchronization causes a drop in performance for std::cin.
Again std::cin with std::ios_base::sync_with_stdio set to false is the fastest, with an average time to complete of 3320us.

VS2015 seems to produce faster binaries than g++ when using scanf. From these results it is rather clear that the synchronization between C++ and C streams should be avoided; in particular, look at how tremendously badly std::cin performs with the synchronization enabled.

Why std::cin with std::ios_base::sync_with_stdio set to true is that much slower than the other stream methods needs further investigation.

The conclusions after this set of tests on Windows are:

  1. printf and scanf are way slower than std::cout and std::cin.
  2. Terminating a string with '\n' produces faster code, probably because std::endl forces the stream to flush. (For big buffers, though, the difference is almost nonexistent.)
  3. std::ios_base::sync_with_stdio should be set to false if the code expects many stream-in operations.
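
The flushing behavior behind the second point can be made visible with a stream buffer that counts flush requests. This is only an illustrative sketch: CountingBuf and the two functions are hypothetical names, and the output goes to memory instead of a real file or console.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical in-memory stream buffer that counts how often the stream
// asks it to flush (std::ostream::flush ends up calling sync()).
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override {
        ++flushes;
        return 0;  // 0 signals success
    }
};

// Writes `lines` lines terminated with std::endl; returns the flush count.
int flushes_with_endl(int lines) {
    assert(lines >= 0);
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i{0}; i < lines; i++) out << "x" << std::endl;
    return buf.flushes;
}

// Writes `lines` lines terminated with '\n'; returns the flush count.
int flushes_with_newline(int lines) {
    assert(lines >= 0);
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i{0}; i < lines; i++) out << "x" << '\n';
    return buf.flushes;
}
```

With std::endl every line triggers one flush request, while '\n' triggers none, so for small strings the flush can dominate the cost of the write itself.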

Now let’s have a look at the results of the very same tests on Linux. The only compiler I used is g++; perhaps one day I will repeat these experiments with clang.

printf is much slower than any other stream-out method; std::cout terminated with '\n' is the fastest, with an average execution time of 411ns.
For big data, std::cout with std::endl is the fastest, with an average execution time of 1885us.

The measurements on Linux confirm that printf is not faster at all, and for small buffers it is way slower than its C++ counterparts.
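
To compare the small and the big buffer tests on one scale, the per-call times can be converted into throughput. The sketch below does this arithmetic under the assumption that each call moves exactly buffer_size bytes (terminators ignored); the function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: converts "bytes moved in this many nanoseconds" into
// megabytes per second (1 MB = 10^6 bytes).
// bytes / ns equals 10^9 bytes per second, hence the factor of 1000.
double throughput_mb_per_s(double bytes, double nanoseconds) {
    assert(nanoseconds > 0);
    return bytes / nanoseconds * 1000.0;
}
```

For the best Linux stream-out results above, throughput_mb_per_s(512, 411) gives roughly 1246 MB/s and throughput_mb_per_s(524288, 1885000) gives roughly 278 MB/s.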

Let’s have a look at the IN measurements:

Interesting: std::cin with std::ios_base::sync_with_stdio enabled is 30 times slower than the identical test with the synchronization disabled. The average execution time of the best-performing method is 369ns.
Again, std::cin with the synchronization enabled is much slower than any other method. The fastest takes 3075us to complete.

I will investigate further why std::cin with the synchronization enabled is so much slower than the same test with it disabled, and I will provide the results in a new post sooner or later.

The data speak for themselves: better to disable the synchronization with the C stream library and avoid printf and scanf. On both platforms their performance is outclassed by std::cout and std::cin.

The conclusions from the measurements on Linux are:

  1. printf and scanf are way slower than std::cout and std::cin.
  2. Disable the synchronization with the C stream library, always!
  3. It is better to terminate a string with '\n' for small lengths; if the string is very long, std::endl performs a little better.

After all this writing, it seems that I need to investigate a little further some odd results I found in my measurements, for example why g++ produces such a slow version of std::cin when the C synchronization is enabled. Fortunately all the source code is available, so finding out the reason is just a matter of digging long enough in the mud.

Thanks for reading!


