Discussion:
Compiling for single-threaded use (implicit threading support difference in 4.9.1 vs. 4.8.1)
Johan Alfredsson
2014-10-14 17:29:52 UTC
Hi,

I've noticed that g++ 4.9.1 behaves differently than 4.8.1 with
regards to (implicit) threading support. The 4.8.1 and 4.9.1 compilers
used were configured with identical options (*) to the configure
script (except --prefix) using --enable-threads=posix.

For the following test-case

#include <string>
#include <iostream>

int main() {
    std::string test("test");
    std::cout << test << std::endl;
}

invoking g++ -O3 test.cc -o test, the 'test' binary is compiled with
multi-threading support using 4.9.1 but not using 4.8.1, e.g. for the
libstdc++ pool allocator a mutex is locked when allocating memory for
the string in the test program above while no such locking is present
in the 'test' binary compiled with 4.8.1. (There is also a difference
in that there is a weak symbol __pthread_key_create in the binary
compiled with 4.9.1 but no such thing for the 4.8.1 case.)

As my application is single-threaded, I don't want to pay the
performance penalty of mutexes etc. Hence, my question is if it is
possible to explicitly request gcc to compile my application in
single-threaded mode.

I'm also curious about what the correct behaviour is -- I found some
PR:s in bugzilla that may be related, like 61144. To me, it seems like
the implicit way of figuring out whether to use locks or not is not a
robust solution as you might dlopen() a library that uses threads from
a single-threaded application and thereby risk data races.

Regards,

/Johan Alfredsson

(*) ./configure --enable-languages=c,c++,fortran
--enable-targets=x86_64-suse-linux,i686-suse-linux
--prefix=/usr/local/gcc/<version> --with-gnu-as
--with-as=/usr/local/binutils-2.23.2/bin/as --with-gnu-ld
--with-ld=/usr/local/binutils-2.23.2/bin/ld
--with-gmp=/usr/local/gmp-5.0.1 --with-mpfr=/usr/local/mpfr-3.0.0
--with-mpc=/usr/local/mpc-0.8.2 --enable-threads=posix --enable-shared
--enable-__cxa_atexit --enable-libstdcxx-allocator=pool
x86_64-suse-linux
Jonathan Wakely
2014-10-15 09:14:24 UTC
Post by Johan Alfredsson
Hi,
I've noticed that g++ 4.9.1 behaves differently than 4.8.1 with
regards to (implicit) threading support. The 4.8.1 and 4.9.1 compilers
used were configured with identical options (*) to the configure
script (except --prefix) using --enable-threads=posix.
For the following test-case
#include <string>
#include <iostream>
int main() {
std::string test("test");
std::cout << test << std::endl;
}
invoking g++ -O3 test.cc -o test, the 'test' binary is compiled with
multi-threading support using 4.9.1 but not using 4.8.1, e.g. for the
libstdc++ pool allocator a mutex is locked when allocating memory for
the string in the test program above while no such locking is present
in the 'test' binary compiled with 4.8.1. (There is also a difference
in that there is a weak symbol __pthread_key_create in the binary
compiled with 4.9.1 but no such thing for the 4.8.1 case.)
Using a mutex in a single-threaded program would be a bug.
Post by Johan Alfredsson
As my application is single-threaded, I don't want to pay the
performance penalty of mutexes etc. Hence, my question is if it is
possible to explicitly request gcc to compile my application in
single-threaded mode.
It should happen automatically, there's no way to request it because
there should be no need.

I'll try to reproduce what you're seeing.
Jonathan Wakely
2014-10-15 10:44:53 UTC
Post by Jonathan Wakely
Post by Johan Alfredsson
Hi,
I've noticed that g++ 4.9.1 behaves differently than 4.8.1 with
regards to (implicit) threading support. The 4.8.1 and 4.9.1 compilers
used were configured with identical options (*) to the configure
script (except --prefix) using --enable-threads=posix.
For the following test-case
#include <string>
#include <iostream>
int main() {
std::string test("test");
std::cout << test << std::endl;
}
invoking g++ -O3 test.cc -o test, the 'test' binary is compiled with
multi-threading support using 4.9.1 but not using 4.8.1, e.g. for the
libstdc++ pool allocator a mutex is locked when allocating memory for
the string in the test program above while no such locking is present
in the 'test' binary compiled with 4.8.1. (There is also a difference
in that there is a weak symbol __pthread_key_create in the binary
compiled with 4.9.1 but no such thing for the 4.8.1 case.)
Using a mutex in a single-threaded program would be a bug.
Post by Johan Alfredsson
As my application is single-threaded, I don't want to pay the
performance penalty of mutexes etc. Hence, my question is if it is
possible to explicitly request gcc to compile my application in
single-threaded mode.
It should happen automatically, there's no way to request it because
there should be no need.
I'll try to reproduce what you're seeing.
I can't reproduce the problem with GCC 4.9.1 or trunk. I'm using a
Fedora 20 x86_64 system, so it's possible there's something different
on your distro.

The code below should be equivalent to what you're running, but
without depending on --enable-libstdcxx-allocator=pool

#include <string>
#include <iostream>
#include <ext/pool_allocator.h>

int main() {
    std::basic_string<char, std::char_traits<char>,
                      __gnu_cxx::__pool_alloc<char> > test("test");
    std::cout << test << std::endl;
}


I don't see any mutex locking or atomic operations because
__gthread_active_p() always returns false.
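
The runtime check described above can be sketched roughly as follows. This is not libstdc++'s actual code: ThreadState, CountingMutex, ConditionalLock and guarded_increment are hypothetical names, and a plain bool stands in for the result of __gthread_active_p() so that the sketch stays portable. The idea is only that the lock is skipped entirely while the "threads active" predicate is false.

```cpp
#include <mutex>

// Hypothetical stand-in for the library's "are threads active?" predicate.
// libstdc++ asks __gthread_active_p(); a plain flag is used here instead.
struct ThreadState {
    bool active = false;
};

// Mutex wrapper that counts lock operations so the skipped path is observable.
class CountingMutex {
public:
    void lock()   { m_.lock(); ++locks_; }
    void unlock() { m_.unlock(); }
    int locks() const { return locks_; }
private:
    std::mutex m_;
    int locks_ = 0;
};

// Scoped lock that only touches the mutex when threads are active.
class ConditionalLock {
public:
    ConditionalLock(CountingMutex& m, const ThreadState& s)
        : m_(m), held_(s.active) { if (held_) m_.lock(); }
    ~ConditionalLock() { if (held_) m_.unlock(); }
    ConditionalLock(const ConditionalLock&) = delete;
    ConditionalLock& operator=(const ConditionalLock&) = delete;
private:
    CountingMutex& m_;
    bool held_;
};

// Allocation-like operation guarded by the conditional lock.
int guarded_increment(int& counter, CountingMutex& m, const ThreadState& s) {
    ConditionalLock guard(m, s);
    return ++counter;
}
```

With state.active left false, guarded_increment never calls lock(), which corresponds to the single-threaded path; setting it to true makes every call take the mutex.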
Johan Alfredsson
2014-10-16 14:43:46 UTC
Post by Jonathan Wakely
Post by Jonathan Wakely
Post by Johan Alfredsson
Hi,
I've noticed that g++ 4.9.1 behaves differently than 4.8.1 with
regards to (implicit) threading support. The 4.8.1 and 4.9.1 compilers
used were configured with identical options (*) to the configure
script (except --prefix) using --enable-threads=posix.
For the following test-case
#include <string>
#include <iostream>
int main() {
std::string test("test");
std::cout << test << std::endl;
}
invoking g++ -O3 test.cc -o test, the 'test' binary is compiled with
multi-threading support using 4.9.1 but not using 4.8.1, e.g. for the
libstdc++ pool allocator a mutex is locked when allocating memory for
the string in the test program above while no such locking is present
in the 'test' binary compiled with 4.8.1. (There is also a difference
in that there is a weak symbol __pthread_key_create in the binary
compiled with 4.9.1 but no such thing for the 4.8.1 case.)
Using a mutex in a single-threaded program would be a bug.
Indeed. I don't use mutexes, but things like the pool allocator do,
even if I don't want or need that (see below).
Post by Jonathan Wakely
Post by Jonathan Wakely
Post by Johan Alfredsson
As my application is single-threaded, I don't want to pay the
performance penalty of mutexes etc. Hence, my question is if it is
possible to explicitly request gcc to compile my application in
single-threaded mode.
It should happen automatically, there's no way to request it because
there should be no need.
I'll try to reproduce what you're seeing.
I can't reproduce the problem with GCC 4.9.1 or trunk. I'm using a
Fedora 20 x86_64 system, so it's possible there's something different
on your distro.
Sorry, my mistake. It turned out that librt was implicitly linked in
in the 4.9.1 case. However, the only things I use from librt are high
precision timers, so a switch to ensure no performance hit in my own
code would be great.

Regards,

/Johan
Marc Glisse
2014-10-16 15:01:43 UTC
Post by Johan Alfredsson
Sorry, my mistake. It turned out that librt was implicitly linked in
in the 4.9.1 case. However, the only things I use from librt are high
precision timers, so a switch to ensure no performance hit in my own
code would be great.
Timers were moved out of librt exactly for this reason. I believe Fedora
20 should have a recent enough glibc that you don't need -lrt.
--
Marc Glisse
Jonathan Wakely
2014-10-16 16:06:32 UTC
Post by Johan Alfredsson
Sorry, my mistake. It turned out that librt was implicitly linked in
in the 4.9.1 case. However, the only things I use from librt are high
precision timers, so a switch to ensure no performance hit in my own
code would be great.
Which suggests you were using --enable-libstdcxx-time=rt and so were
not using the same configuration for both compilers.

If you had told us the actual configurations for both versions that
would have been obvious!

As Marc says, use a newer glibc to get high-precision timers without
needing librt.
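
For reference, glibc 2.17 and later provide clock_gettime() in libc itself, so a monotonic high-resolution timer no longer needs -lrt there. A minimal sketch, assuming a POSIX system; monotonic_ns is a hypothetical helper name, not something from libstdc++ or glibc:

```cpp
#include <cstdint>
#include <time.h>

// Hypothetical helper: nanoseconds read from CLOCK_MONOTONIC, or -1 on failure.
std::int64_t monotonic_ns() {
    timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

Taking the difference of two monotonic_ns() readings gives an elapsed time in nanoseconds. Running ldd on the resulting binary is one way to check that librt (and with it libpthread) is no longer pulled in.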
Johan Alfredsson
2014-10-16 17:51:25 UTC
Post by Jonathan Wakely
Post by Johan Alfredsson
Sorry, my mistake. It turned out that librt was implicitly linked in
in the 4.9.1 case. However, the only things I use from librt are high
precision timers, so a switch to ensure no performance hit in my own
code would be great.
Which suggests you were using --enable-libstdcxx-time=rt and so were
not using the same configuration for both compilers.
No, this was something that happened due to the configuration of our
in-house build system. As I said, the configurations of the compilers
were identical.
Post by Jonathan Wakely
If you had told us the actual configurations for both versions that
would have been obvious!
I did, and that was not the issue here.
Post by Jonathan Wakely
As Marc says, use a newer glibc to get high-precision timers without
needing librt.
I'll look into that.

Thanks,

/Johan

Marc Glisse
2014-10-15 11:40:00 UTC
Post by Jonathan Wakely
Post by Johan Alfredsson
As my application is single-threaded, I don't want to pay the
performance penalty of mutexes etc. Hence, my question is if it is
possible to explicitly request gcc to compile my application in
single-threaded mode.
It should happen automatically, there's no way to request it because
there should be no need.
Well, I would quite like a compilation flag
-fI-promise-not-to-use-threads, that would automatically turn atomics into
plain variables with regular operations, turn TLS into regular memory,
remove locks, etc, and perform all the optimizations this enables. It
isn't quite the same as a runtime test that only skips a few mutexes in
the library.
--
Marc Glisse
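
No such flag exists in GCC as far as this thread goes, but the effect can be approximated by hand for individual types. The following is only a sketch of that idea: Counter and its SingleThreaded parameter are hypothetical, and the choice is made at compile time by the programmer rather than by the compiler.

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical policy switch: SingleThreaded == true means
// "I promise not to use threads", so a plain long replaces the atomic.
template <bool SingleThreaded>
class Counter {
public:
    using storage_type =
        typename std::conditional<SingleThreaded, long, std::atomic<long> >::type;

    Counter() : value_(0) {}

    // Plain increment in single-threaded mode, atomic read-modify-write otherwise.
    long increment() { return ++value_; }

    // Plain load in single-threaded mode, atomic load otherwise.
    long value() const { return value_; }

private:
    storage_type value_;
};
```

Counter<true> compiles down to ordinary integer operations, while Counter<false> keeps the atomic ones. Unlike the runtime test in the library, the decision here is fixed when the code is compiled, which is the distinction drawn in the post above.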
leon zadorin
2014-10-15 23:15:08 UTC
Post by Jonathan Wakely
Post by Johan Alfredsson
As my application is single-threaded, I don't want to pay the
performance penalty of mutexes etc. Hence, my question is if it is
possible to explicitly request gcc to compile my application in
single-threaded mode.
It should happen automatically, there's no way to request it because
there should be no need.
Well, I would quite like a compilation flag -fI-promise-not-to-use-threads,
that would automatically turn atomics into plain variables with regular
operations, turn TLS into regular memory, remove locks, etc, and perform all
the optimizations this enables. It isn't quite the same as a runtime test
that only skips a few mutexes in the library.
Yeah, that would be awesome :) I would love to get a feel for how gcc
is currently poised with respect to devoting development resources
towards retaining single-thread optimizations based on command-line
switches... I guess with limited resources (developers' time et al.)
this may need to be seen in perspective... although, personally, I
would love to see aggressive single-thread optimizations wherever
possible (e.g. programmer setting a "green light" switch for
single-threaded assumptions, even on c++{11,14} code with assumptions
that it doesn't use any of the "multi-threaded" features of the
language) :) :) :)

As far as I know of this stuff, the non-gcc llvm/clang has just
introduced -mthread-model=single to clang (and -thread-model=single
with -loweratomic being available to "opt" bitcode optimizer
previously already for quite some time)... however, at this time,
it mostly appears to be implemented for ARM-based architectures... but
this is taking me outside the GCC context...