Intel® Math Kernel Library 6.0 Beta for Linux*
Technical User Notes

Contents

Compiler Support
Directory Structure
Linking with the Intel® Math Kernel Library (MKL)
Using MKL Parallelism
Memory Management
Performance
Obtaining Version Information

 
 

Compiler Support

Intel does not support the Intel® Math Kernel Library (MKL) for use with any compilers other than those identified in the release notes. However, other compilers have successfully linked with the libraries.

When using the cblas interface, include the header file mkl.h: it specifies the enumerated values as well as the prototypes of all the functions. The header determines whether the program is being compiled with a C++ compiler and, if so, the included declarations are correct for C++ compilation.

Directory Structure

MKL keeps the IA-32 versions of the library separate from the versions for the Intel® Itanium® and Itanium® 2 processors. The IA-32 versions are located in the lib/32 directory, and the Itanium and Itanium 2 processor versions are located in the lib/64 directory. Logically, MKL consists of two parts: LAPACK and processor-specific kernels. The LAPACK library contains LAPACK routines and drivers that were optimized without regard to processor, so it can be used effectively on any processor from the Intel® Pentium® to the Pentium® 4. The processor-specific kernels contain BLAS, FFT, DFT, VSL, cblas, and VML routines that were optimized for each specific processor. In addition, the threading software is supplied as separate libraries, libguide.a and libguide.so, for static and dynamic linking with MKL, respectively.

The information below indicates the library's directory structure.

lib/32                Contains all libraries for 32-bit applications
  libmkl_ia32.a       Optimized kernels for Intel® Pentium®, Pentium® III, and Pentium® 4 processors
  libmkl_lapack.a     LAPACK routines and drivers
  libguide.a          Threading library for static linking
  libmkl.so           Library dispatcher for dynamic load of processor specific kernel
  libmkl_lapack32.so  LAPACK routines and drivers, single precision data types
  libmkl_lapack64.so  LAPACK routines and drivers, double precision data types
  libmkl_def.so       Default kernel (Intel® Pentium®, Pentium® Pro, and Pentium® II processors)
  libmkl_p3.so        Intel® Pentium® III processor kernel
  libmkl_p4.so        Pentium 4 processor kernel
  libvml.so           Library dispatcher for dynamic load of processor specific VML kernels
  libmkl_vml_def.so   VML part of default kernel (Pentium, Pentium Pro, Pentium II processors)
  libmkl_vml_p3.so    VML part of Pentium III processor kernel
  libmkl_vml_p4.so    VML part of Pentium 4 processor kernel
  libguide.so         Threading library for dynamic linking

lib/64                Contains all libraries for Itanium®-based and Itanium® 2-based applications
  libmkl_ipf.a        Processor kernels for the Intel® Itanium® and Itanium® 2 Processors
  libmkl_lapack.a     LAPACK routines and drivers
  libguide.a          Threading library for static linking
  libmkl_lapack32.so  LAPACK routines and drivers, single precision data types
  libmkl_lapack64.so  LAPACK routines and drivers, double precision data types
  libmkl_itp.so       Itanium processor kernel
  libmkl_vml_itp.so   VML part of Itanium processor kernel
  libguide.so         Threading library for dynamic linking

Linking with MKL

To use LAPACK and BLAS software you must link two libraries: LAPACK and the processor-optimized kernel. Some possible variants:

ld myprog.o lib/32/libmkl_lapack.a lib/32/libmkl_ia32.a
IA-32 static linking, LAPACK library, Pentium III processor kernel.
ld myprog.o lib/32/libmkl_lapack.a lib/32/libmkl_ia32.a
IA-32 static linking, LAPACK library, Pentium 4 processor kernel.
ld myprog.o lib/32/libmkl.so
Dynamic linking on IA-32 platforms. Shared object dispatcher will dynamically load the appropriate shared object for the system at runtime.
ld myprog.o lib/64/libmkl_lapack.a lib/64/libmkl_ipf.a
Itanium®-based and Itanium® 2-based processor static linking of LAPACK and kernel.
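
The variants above can be assembled mechanically. The following Python sketch is illustrative only: mkl_link_command is a hypothetical helper, not part of MKL. It assumes the library names given in the directory listing (including libmkl.so as the IA-32 dispatcher) and paths relative to the MKL installation root.

```python
def mkl_link_command(obj, arch="ia32", dynamic=False):
    """Return an ld argument list for one of the link variants listed above.

    Hypothetical helper for illustration. The file names are taken from the
    directory listing; paths are relative to the MKL installation root.
    """
    if arch == "ia32":
        if dynamic:
            # The dispatcher loads the matching processor kernel at run time.
            return ["ld", obj, "lib/32/libmkl.so"]
        return ["ld", obj, "lib/32/libmkl_lapack.a", "lib/32/libmkl_ia32.a"]
    if arch == "ipf":
        if dynamic:
            raise ValueError("no dynamic Itanium variant is listed in these notes")
        return ["ld", obj, "lib/64/libmkl_lapack.a", "lib/64/libmkl_ipf.a"]
    raise ValueError("unknown architecture: %r" % arch)


if __name__ == "__main__":
    print(" ".join(mkl_link_command("myprog.o")))
    print(" ".join(mkl_link_command("myprog.o", dynamic=True)))
    print(" ".join(mkl_link_command("myprog.o", arch="ipf")))
```

Returning a list rather than a single string keeps the arguments separate, which is convenient if the command is later passed to a process launcher.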

Using MKL Parallelism

MKL is threaded in a number of places: LAPACK (*GETRF, *POTRF, *GBTRF routines), Level 3 BLAS, DFTs, and FFTs. MKL uses OpenMP* threading software.

In some situations, conflicts can make the use of threads in MKL problematic. They are listed below with recommendations for dealing with them. First, a brief explanation of why the problem exists is appropriate.

If the user threads the program using OpenMP directives and compiles it with the Intel compilers, MKL and the user program both use the same threading library. MKL tries to determine whether it is being called from a parallel region of the program and, if so, does not spread its operations over multiple threads. However, MKL can detect a parallel region only if the threaded program and MKL use the same threading library. If the user program is threaded by some other means, MKL may operate in multithreaded mode and the computations may be corrupted. Here are several cases and our recommendations for the user:

  1. User threads the program using OS threads (pthreads on Linux*, Win32* threads on Windows*). If more than one thread calls MKL and the function being called is threaded, it is important that threading in MKL be turned off. Set OMP_NUM_THREADS=1 in the environment.
  2. User threads the program using OpenMP directives and/or pragmas and compiles the program using a compiler other than a compiler from Intel. This is more problematic because setting OMP_NUM_THREADS in the environment affects both the compiler's threading library and the threading library supplied with MKL. At this time the safe approach is to set OMP_NUM_THREADS=1.
  3. Multiple programs are running on a multiple-CPU system. In cluster applications the parallel program can run a separate instance of the program on each processor. However, the threading software sees multiple processors on the system even though each processor has a separate process running on it. In this case OMP_NUM_THREADS should be set to 1.
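
The three cases above can be summarized as a small decision function. This is a Python sketch for illustration only: the function name and its parameters are hypothetical, and case 1 is simplified by assuming that several OS threads do call threaded MKL functions.

```python
def recommended_omp_num_threads(threading, intel_compiler=False,
                                one_process_per_processor=False):
    """Return the OMP_NUM_THREADS value suggested by the cases above, or None.

    threading: "none", "os" (pthreads or Win32 threads) or "openmp".
    None means these notes give no specific value for that situation.
    """
    if one_process_per_processor:
        return 1   # case 3: every processor already runs its own process
    if threading == "os":
        return 1   # case 1: assumes several threads call threaded MKL functions
    if threading == "openmp" and not intel_compiler:
        return 1   # case 2: the compiler's threading library differs from MKL's
    return None    # e.g. OpenMP with an Intel compiler: MKL detects the parallel region


if __name__ == "__main__":
    print(recommended_omp_num_threads("os"))
    print(recommended_omp_num_threads("openmp", intel_compiler=True))
```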

Setting the number of threads: The OpenMP* software responds to the environment variable OMP_NUM_THREADS. Set the variable in the shell in which the program is going to run. In csh or tcsh, enter:

setenv OMP_NUM_THREADS <number of threads to use>

In sh, bash, and ksh, the variable and its value must be exported, as in:

export OMP_NUM_THREADS=<number of threads to use>

If the variable OMP_NUM_THREADS is not set, MKL runs with as many threads as there are processors. We recommend always setting OMP_NUM_THREADS.
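
Instead of changing the variable for a whole shell session, a launcher script can set it for a single child process. The Python sketch below shows one way to do this; run_with_threads is a hypothetical helper, and the child command is a stand-in that only echoes the variable, where a real launcher would start the MKL-linked executable.

```python
import os
import subprocess
import sys


def run_with_threads(command, num_threads):
    """Run command with OMP_NUM_THREADS set only for that child process."""
    env = dict(os.environ)                  # copy, so the parent stays unchanged
    env["OMP_NUM_THREADS"] = str(num_threads)
    return subprocess.run(command, env=env, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True)


if __name__ == "__main__":
    # Stand-in for an MKL-linked program: a child that prints the variable.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_with_threads(child, 2).stdout.strip())
```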

Memory Management

MKL has memory management software that controls memory buffers for use by MKL functions. When a call is made to certain MKL functions (such as those in the Level 3 BLAS or DFTs), new buffers are allocated if no buffers marked as free are currently available. These buffers are not deallocated until the program ends. If the user program needs to free this memory at some point, it may do so with a call to MKL_FreeBuffers(). If an MKL function that needs a memory buffer is called afterwards, the memory manager allocates the buffers again.

This memory management software is turned on by default. To disable it, set the environment variable MKL_DISABLE_FAST_MM to any value; memory is then allocated and freed from call to call. Disabling this feature negatively impacts the performance of routines such as the Level 3 BLAS, especially for small problem sizes.
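
The effect of buffer reuse can be seen in a toy model. The Python sketch below is not MKL's implementation: ToyBufferPool and count_allocations are hypothetical names, the buffer size is arbitrary, and the model only counts how often new memory is requested with and without reuse.

```python
class ToyBufferPool:
    """Toy model of the buffer reuse described above (not MKL's actual code)."""

    def __init__(self, fast_mm=True):
        self.fast_mm = fast_mm      # False models MKL_DISABLE_FAST_MM being set
        self.free = []              # buffers marked as free, kept for reuse
        self.allocations = 0        # how many times new memory was requested

    def acquire(self):
        if self.free:
            return self.free.pop()  # reuse a buffer that is marked as free
        self.allocations += 1
        return bytearray(1024)      # arbitrary size for the illustration

    def release(self, buf):
        if self.fast_mm:
            self.free.append(buf)   # keep the buffer for the next call
        # otherwise the buffer is dropped, i.e. freed from call to call

    def free_buffers(self):
        self.free.clear()           # plays the role of MKL_FreeBuffers()


def count_allocations(calls, fast_mm):
    """Simulate `calls` library calls that each need one buffer."""
    pool = ToyBufferPool(fast_mm)
    for _ in range(calls):
        buf = pool.acquire()
        pool.release(buf)
    return pool.allocations


if __name__ == "__main__":
    print(count_allocations(100, fast_mm=True))
    print(count_allocations(100, fast_mm=False))
```

In this model, repeated calls with reuse enabled request memory once, while the disabled mode requests it on every call, which is why the cost is most visible for small problem sizes.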

Performance

To obtain the best performance with MKL, make sure the following conditions are fulfilled: arrays must be aligned on a 16-byte boundary, and the leading dimension values (n*element_size) of two-dimensional arrays must be divisible by 16. There are additional conditions for the FFT functions: the addresses of the first elements of arrays and the leading dimension values (n*element_size) of two-dimensional arrays should be divisible by the cache line size (32 bytes for the Pentium III processor and 64 bytes for the Pentium 4 processor). Furthermore, for the C-style FFTs on the Pentium 4 processor, the distance L between the arrays that represent the real and imaginary parts should not satisfy the following inequality for any integer k:

k*2**16 <= L < k*2**16+64

These conditions are needed due to the use of Streaming SIMD Extensions (SSE).
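
The arithmetic behind these conditions can be checked with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, addresses and distances are treated as plain integers, and L is taken in whatever units the inequality above intends.

```python
def padded_leading_dimension(n, element_size, boundary=16):
    """Smallest leading dimension >= n with ld * element_size divisible by boundary."""
    ld = n
    while (ld * element_size) % boundary != 0:
        ld += 1
    return ld


def is_aligned(address, boundary=16):
    """True if the integer address lies on the given boundary."""
    return address % boundary == 0


def p4_fft_distance_is_bad(L):
    """True if some integer k satisfies k*2**16 <= L < k*2**16 + 64."""
    return L % 2**16 < 64


if __name__ == "__main__":
    # Double precision (8-byte) matrix with 1001 rows: pad the leading dimension.
    print(padded_leading_dimension(1001, 8))
    # Same matrix, padded to the 64-byte cache line of the Pentium 4 processor.
    print(padded_leading_dimension(1001, 8, boundary=64))
    print(p4_fft_distance_is_bad(2**16 + 10))
```

Pass boundary=32 or boundary=64 to apply the stricter cache line condition stated for the FFT functions.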

For the C-style FFT on the Itanium processor, it is sufficient that the distance L between the arrays that represent the real and imaginary parts is not divisible by 64. The best case is L = k*64 + 16.
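
A short Python sketch of the Itanium rule follows. The function names are hypothetical, and min_distance stands for the smallest separation the two arrays need, in the same units as L.

```python
def itanium_fft_distance_ok(L):
    """True if the distance L is not divisible by 64."""
    return L % 64 != 0


def best_itanium_fft_distance(min_distance):
    """Smallest L >= min_distance of the form k*64 + 16 with k >= 0."""
    k = max(0, -(-(min_distance - 16) // 64))   # ceiling division
    return k * 64 + 16


if __name__ == "__main__":
    print(itanium_fft_distance_ok(1024))
    print(best_itanium_fft_distance(1000))
```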

Obtaining Version Information

MKL provides a facility for obtaining information about the library (e.g., the version number). Two methods are provided for extracting this information. First, you can extract a version string using the function MKLGetVersionString. Second, you can use the MKLGetVersion function to obtain an MKLVersion structure that contains the version information. Example programs for extracting this information are provided in the mkl/examples/versionquery directory. Makefiles are also provided to build the examples automatically and to output summary files containing the version information.

Intel, the Intel logo, Pentium, Xeon and Itanium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.

Copyright(C) 2000-2002, Intel Corporation, All Rights Reserved.