-fprofile-generate instruments the application with profiling code. While it runs, the application logs events that the compiler could exploit if the usage pattern were known at compile time. Branch outcomes, opportunities for inlining, and so on can all be logged, but I'm not sure in detail how GCC implements this.
When the program exits, it dumps all this data into *.gcda files, which are essentially log data for a test run. After rebuilding the application with the -fprofile-use flag, GCC takes the *.gcda data into account when doing its optimizations, which often improves performance noticeably. Of course, how much depends on many factors, in particular on how well the test run matches real usage.
You definitely need separate output directories for object files. I would recommend naming them "profile" and "release". You might have to copy the *.gcda files that result from the profile run so that GCC finds them in the release build step.
The result will almost certainly be faster. It will probably be larger as well. The -fprofile-use option enables a number of additional optimizations; some of them are otherwise only enabled at -O3, and a few (such as -funroll-loops) are not enabled by any -O level.
g++ -O3 -fprofile-generate [more params here, like -march=native ...] [source files] -o executable_name
# run my program's benchmarks, or something to stress its most common path
g++ -O3 -fprofile-use [more params here, like -march=native ...] [source files] -o executable_name
Basically, you first build with one extra flag, -fprofile-generate, passed at both the compile step and the link step.
Then, when you run the program, by default it creates the .gcda files "next" to your .o files, it seems (the full path where they were built is hard-coded into the binary).
You can optionally change where it creates these .gcda files with the -fprofile-dir=XXX setting.
Then you recompile and relink with the -fprofile-use parameter, and GCC builds it using profile-guided goodness.