libmvec - vector versions of some elementary mathematical functions

cc [flag...] file... -lmvec [library...]

or

cc [flag...] file... -lmvec_mt [library...]

void vatan_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey);

void vatan2_(int *n, double * restrict y, int *stridey, double * restrict x, int *stridex, double * restrict z, int *stridez);

void vcos_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey);

void vexp_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey);

void vhypot_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey, double * restrict z, int *stridez);

void vlog_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey);

void vpow_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey, double * restrict z, int *stridez);

void vrhypot_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey, double * restrict z, int *stridez);

void vrsqrt_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey);

void vsin_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey);

void vsincos_(int *n, double * restrict x, int *stridex, double * restrict s, int *strides, double * restrict c, int *stridec);

void vsqrt_(int *n, double * restrict x, int *stridex, double * restrict y, int *stridey);

void vatanf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey);

void vatan2f_(int *n, float * restrict y, int *stridey, float * restrict x, int *stridex, float * restrict z, int *stridez);

void vcosf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey);

void vexpf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey);

void vhypotf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey, float * restrict z, int *stridez);

void vlogf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey);

void vpowf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey, float * restrict z, int *stridez);

void vrhypotf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey, float * restrict z, int *stridez);

void vrsqrtf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey);

void vsinf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey);

void vsincosf_(int *n, float * restrict x, int *stridex, float * restrict s, int *strides, float * restrict c, int *stridec);

void vsqrtf_(int *n, float * restrict x, int *stridex, float * restrict y, int *stridey);

These routines evaluate common elementary functions for an entire vector of values at once. The first parameter indicates the number of values to compute. Subsequent parameters specify the argument and result vectors. Each vector is described by a pointer to the first element and a stride, which is the increment between successive elements.

For brevity in the descriptions that follow, *stridex will be denoted sx and similarly for *stridey, *stridez, etc.

vatan_(n, x, stridex, y, stridey) computes y[i*sx] = atan(x[i*sx]) is written per element as y[i*sy] = atan(x[i*sx]) for each i = 0, 1, ..., *n - 1. Analogous descriptions apply to vcos_, vexp_, vlog_, vsin_ and vsqrt_.

vatan2_(n, y, stridey, x, stridex, z, stridez) computes z[i*sz] = atan(y[i*sy] / x[i*sx]) using the signs of both arguments to determine the quadrant in which the resulting angle lies.

vhypot_(n, x, stridex, y, stridey, z, stridez) computes z[i*sz] = sqrt(x[i*sx]**2 + y[i*sy]**2).

vrhypot_(n, x, stridex, y, stridey, z, stridez) computes z[i*sz] = 1 / sqrt(x[i*sx]**2 + y[i*sy]**2).

vpow_(n, x, stridex, y, stridey, z, stridez) computes z[i*sz] = x[i*sx]**y[i*sy], i.e., x[i*sx] raised to the power y[i*sy].

vrsqrt_(n, x, stridex, y, stridey) computes y[i*sy] = 1 / sqrt(x[i*sx]) for each i = 0, 1, ..., *n - 1.

vsincos_(n, x, stridex, s, strides, c, stridec) simultaneously computes s[i*ss] = sin(x[i*sx]) and c[i*sc] = cos(x[i*sx]).

The functions vatanf_, vatan2f_, vcosf_, vexpf_, vhypotf_, vlogf_, vpowf_, vrhypotf_, vrsqrtf_, vsinf_, vsincosf_ and vsqrtf_ are single precision versions of the double precision functions listed above.

For each function, the element count *n must be greater than zero. The strides for the argument and result arrays may be arbitrary integers, but the arrays themselves must not be the same or overlap. For example, the results of the code fragment

     double x[100];
     int n = 100, s = 1;
     /*...*/
     vexp_(&n, x, &s, x, &s);

are undefined.

Note that a zero stride may be specified, which effectively collapses the entire vector into a single element. Thus, for example, one can use vpow_ to compute values of x[i]**y for a fixed value of y by specifying *stridey = 0.

Finally, note that a stride may be negative, but the corresponding pointer must still point to the first element of the vector to be used; if the stride is negative, this will be the highest-addressed element in memory. (This convention differs from the Level 1 BLAS, in which array parameters always refer to the lowest-addressed element in memory even when negative increments are used.)
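
The indexing rule for the three-vector routines can be pictured with a plain scalar loop. The sketch below is only a model of the convention described above, not the library's implementation: model_v2 and multiply are hypothetical names, and the operation is passed in as a function pointer so that the fragment does not depend on libmvec or on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libmvec: a scalar model of the
 * indexing rule z[i*sz] = f(x[i*sx], y[i*sy]) for i = 0, ..., *n - 1.
 * Each pointer refers to the FIRST element used, whatever the sign
 * of the corresponding stride. */
void model_v2(const int *n,
              const double *x, const int *stridex,
              const double *y, const int *stridey,
              double *z, const int *stridez,
              double (*f)(double, double))
{
    assert(*n > 0);  /* the element count must be greater than zero */
    for (int i = 0; i < *n; i++) {
        ptrdiff_t ix = (ptrdiff_t)i * *stridex;
        ptrdiff_t iy = (ptrdiff_t)i * *stridey;
        ptrdiff_t iz = (ptrdiff_t)i * *stridez;
        z[iz] = f(x[ix], y[iy]);
    }
}

/* Stand-in operation (an assumption for illustration only). */
double multiply(double a, double b)
{
    return a * b;
}
```

With this model, a zero stride for y lets a single scalar serve as the whole second operand, and a stride of -1 for z requires passing the address of the last array element, so that results are stored in reverse order. For a general negative stride s and count n, the pointer would be the array base plus (n - 1) * |s|.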

See attributes(5) for descriptions of the following attributes:

 ___________________________________________
| ATTRIBUTE TYPE      | ATTRIBUTE VALUE     |
|_____________________|_____________________|
| Availability        | SPROlang            |
| Interface Stability | Evolving            |
| MT-Level            | MT-Safe             |
|_____________________|_____________________|

atan(3M), atan2(3M), clibmvec(3M), cos(3M), exp(3M), hypot(3M), log(3M), pow(3M), sin(3M), trig_sun(3M), attributes(5)

The vector functions treat exceptional cases in the spirit of IEEE 754, producing essentially the same results as the corresponding scalar functions in Fortran and in C when the -xlibmieee option is used. (See the manual pages for the scalar functions for their behavior on exceptional cases.) Some vector functions may raise the inexact exception even if all elements of the argument array are such that the numerical results are exact.

The vector functions listed above are provided with the Sun Studio compilers in each of two libraries, libmvec.a and libmvec_mt.a. The latter contains parallelized versions of the functions that work in conjunction with the automatic parallelization provided by the compiler. To use libmvec_mt.a, you must link with one of the parallelization options -xparallel, -xexplicitpar, or -xautopar.

The vector functions are also available in Solaris 10 in a shared library, libmvec.so. Using the -lmvec compiler flag, however, will link with the static archives provided with Sun Studio. To link with libmvec.so, specify the library by its full pathname, i.e., /usr/lib/libmvec.so for 32-bit programs or /usr/lib/64/libmvec.so for 64-bit programs.

It is the user's responsibility to ensure that the default round-to-nearest mode is in effect when functions in this library are called. The vector functions assume that the default round-to-nearest mode is in effect. If the calling program changes the rounding mode to a non-default mode, it must re-establish round-to-nearest mode before calling one of the vector functions. The result of calling a vector function with a non-default rounding mode is undefined.

The vector functions may be called indirectly through the use of -xvector=yes or -fast. The -fast compile option asserts that the program only uses default rounding, and so the rounding should never be changed. The -xvector=yes compile option allows the compiler to replace calls to scalar functions in the standard math library by calls to the corresponding vector functions. Therefore, when using this option, the program must ensure that round-to-nearest mode is in effect whenever any math function is called.

In vector and parallel execution, elements need not be processed in the natural order x[0], x[1*sx], x[2*sx], etc. Therefore, exceptions that occur may not be raised in order. For example, if exp(x[1]) would raise the overflow exception, and exp(x[9]) would raise the invalid operation exception, there is no guarantee that a call to vexp_ will indicate the overflow first.

The vector functions vsqrt_ and vsqrtf_, unlike their scalar counterpart, may not produce correctly rounded results. However, the error in each result is less than one unit in the last place.

When a program is linked with one of -xarch=v8plus, v8plusa, v8plusb, v9, v9a, or v9b, the vexpf_ function delivers zero rather than a subnormal result for all sufficiently large negative arguments.