Discussion:
shared libraries + lto ?
Alain Meunier
2014-08-01 09:09:38 UTC
Hello,

I would like to know whether one can use LTO with shared libraries and get the benefits of both.

My tests show that it works, but I am not sure whether LTO actually brought anything.

gcc -O3 -shared -fPIC my_shared.c -o libmy_shared.so -flto

and for linking:

gcc -O3 my_app.c -o my_binary -lmy_shared -flto

I would like to keep the ability to use shared libraries. Will gcc make something out of it?

Thanks
Vincent Lefevre
2014-08-01 10:47:16 UTC
Post by Alain Meunier
I would like to know if one can use lto with shared libraries and
leverage all the goodness of both worlds ?
My tests show that it works but not sure if lto brought something or not in the game.
I did some timings with MPFR + GMP two years ago and I found that it
was useless to use LTO with the shared library (I even wonder whether
this can make sense at all). Here are the results:

Precision 10:
                 shared             static
              arg     macro      arg     macro
Default      3.480    3.470     2.670    2.690
LTO paths    4.000    3.980     2.640    2.660
With LTO     4.110    3.970     2.320    2.410

Precision 80:
                 shared             static
              arg     macro      arg     macro
Default      5.520    5.470     4.950    5.000
LTO paths    5.510    5.500     4.440    4.470
With LTO     5.540    5.520     4.040    4.120

Precision 300:
                 shared             static
              arg     macro      arg     macro
Default      6.770    6.560     5.950    5.960
LTO paths    6.140    5.980     5.060    5.020
With LTO     5.980    5.960     4.280    4.400

Conclusion (on these examples):
* There isn't much difference between a precision given in argument
  and a fixed precision given via a macro (known at compile time of
  the main program).
* Using a static library instead of a shared library can yield a
  speedup of up to 44% (this happens with LTO enabled), i.e. that's
  almost twice as fast!
* LTO should be used only with -static (for performance, but also
  when considering practical use, it is pointless to use LTO with
  shared libraries).
* The LTO speedup ("With LTO" compared to "LTO paths" in static) can
  be up to 15% (28% if we compare to the default static library, but
  we are not just measuring LTO in this case).
* The LTO speedup compared to traditional linking (shared library
  from the vendor, here Debian/unstable) can be up to 37%.

Note: The versions of MPFR in "Default" (Debian packages providing
MPFR 3.1.0-p10) and with LTO paths (MPFR 3.1.1-p2) are not exactly
the same, but the differences consist only of bug fixes, so that the
tested source code should be the same.
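
The percentages in the conclusion can be rechecked from the tables. The sketch below assumes that they are relative reductions in the measured time, which is the reading that matches the figures; the helper name pct_reduction is made up for illustration.

```shell
#!/bin/sh
# pct_reduction BEFORE AFTER
# Prints by how many percent the timing drops when going from BEFORE to AFTER.
# (Hypothetical helper, not part of the original benchmark scripts.)
pct_reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", 100 * (before - after) / before }'
}

pct_reduction 4.110 2.320   # precision 10, arg, With LTO: shared vs static
pct_reduction 5.060 4.280   # precision 300, arg, static: LTO paths vs With LTO
pct_reduction 5.950 4.280   # precision 300, arg, static: Default vs With LTO
pct_reduction 6.770 4.280   # precision 300, arg: Default shared vs With LTO static
```

The four calls print 43.6, 15.4, 28.1 and 36.8, which round to the 44%, 15%, 28% and 37% quoted above. For the first case the ratio of the two timings is 4.110 / 2.320, about 1.77, which is what "almost twice as fast" refers to.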
--
Vincent Lefèvre <***@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Alain Meunier
2014-08-01 11:59:28 UTC
Thanks Vincent,

I think I will not use shared libraries in this case.
I will stick to static ones.

But across the net there are many conflicting descriptions of the correct way to use LTO.
Could you clarify:

Say I have a static library libfoo.a:
void cool(int * restrict a, int * restrict b){
       //do something useful
}

I also have libbar.a:
void eatTheWorld(int * restrict a, int * restrict b){
       //do something useful
}
Both compiled with gcc
gcc /*optim. flags here*/ -fPIC foo.c -o libfoo.a -flto
gcc /*optim. flags here*/ -fPIC bar.c -o libbar.a -flto

and a main program my_app.c:

#include here
int main(){
    int f1 = 5;
    int f2 = 3;
    cool(&f1,&f2);
    eatTheWorld(&f1,&f2);

    return 0;
}

I will compile it with
gcc /*optim. flags here*/  my_app.c -lfoo -lbar -flto

Is that it? Nothing more?

This article suggests otherwise: http://hubicka.blogspot.fr/2014/04/linktime-optimization-in-gcc-2-firefox.html

No plugin or all this mess?
I am on Debian testing, 64 bits.

How best to use LTO to get the best performance is still a bit unclear, at least to me.
Vincent Lefevre
2014-08-03 14:48:34 UTC
On 2014-08-01 13:59:28 +0200, Alain Meunier wrote:
[...]
Post by Alain Meunier
Both compiled with gcc
gcc /*optim. flags here*/ -fPIC foo.c -o libfoo.a -flto
gcc /*optim. flags here*/ -fPIC bar.c -o libbar.a -flto
#include here
int main(){
    int f1 = 5;
    int f2 = 3;
    cool(&f1,&f2);
    eatTheWorld(&f1,&f2);
    return 0;
}
I will compile it with
gcc /*optim. flags here*/  my_app.c -lfoo -lbar -flto
Is that it ? Nothing more ?
I compiled everything with "-flto=jobserver -fuse-linker-plugin"
(that was two years ago), but I don't know whether this is still
necessary.