Subroutines called deep within a loop nest could be called thousands of times. Even if the time spent in each routine per call is small, the total effect might be substantial. Also, subprogram calls inhibit optimization of the loop that contains them because the compiler cannot make assumptions about the state of registers over the call.
Automatic inlining of subprogram calls (using -inline=x,y,..z, or -O4) is one way to let the compiler replace the actual call with the subprogram itself (pulling the subprogram into the loop). The subprogram source code for the routines that are to be inlined must be found in the same file as the calling routine.
There are other ways to eliminate subprogram calls:
Use statement functions. If the external function being called is a simple math function, it might be possible to rewrite the function as a statement function or set of statement functions. Statement functions are compiled in-line and can be optimized.
Push the loop into the subprogram. That is, rewrite the subprogram so that it can be called fewer times (outside the loop) and operate on a vector or array of values per call.