Avoiding Potential Problems – Memory Limits on the Intel(r) Xeon Phi(tm) Coprocessor

EXCEEDING MEMORY SPACE

As with any Linux* system, the operating system on the coprocessor will sometimes allow you to allocate more memory space than is physically available. On many systems this is taken care of by swapping some pages of memory out to disk. On those systems, if the amount of physical memory plus the amount of swap space is exceeded, the operating system will begin killing off jobs. 

The situation on the Intel Xeon Phi coprocessor is complicated by the lack of directly attached disks. So:

  • there is, currently, at most 8GB of physical memory available (I say this, having come from a generation where my first personal computer had 4KB of memory)
  • by default, some of the memory is used for the coprocessor’s file system
  • by default, there is no swap space available for the coprocessor
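
To see how these limits look on a particular card, one option is to read the kernel's own accounting. The sketch below assumes the coprocessor exposes the usual Linux /proc/meminfo file and that its BusyBox build includes awk. The function name mem_summary is made up for this example.

```shell
#!/bin/sh
# Print total memory, free memory and total swap (all in kB) from a
# meminfo-style file. With no argument it reads /proc/meminfo.
# mem_summary is a hypothetical helper name, not part of MPSS.
mem_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"  { total = $2 }
        $1 == "MemFree:"   { free = $2 }
        $1 == "SwapTotal:" { swap = $2 }
        END {
            printf "MemTotal=%d kB MemFree=%d kB SwapTotal=%d kB\n",
                   total, free, swap
        }
    ' "$meminfo"
}
```

Running mem_summary with no arguments on the coprocessor should report a SwapTotal of 0 unless swap space has been added.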

LIMITING MEMORY USED FOR RAM DISK

The Intel Xeon Phi coprocessor does not have any directly accessible disk drives. Because of this, the default root file system for the coprocessor is stored on a RAM disk. This not only affects file storage but reduces the memory available to running programs.

The size of the root file system is kept small by using BusyBox to replace many of the common Linux commands, such as sh, cp and ls, and by limiting the number of shared libraries that are copied to the root file system. Even given these reductions, about 10 MBytes of memory is still consumed for file storage.

Most user programs, whether run in offload or native mode on the coprocessor, will require that additional libraries be copied to the root file system, consuming more memory space and further limiting the space available for user programs. Additionally, programs running in native mode will require access to data files and temporary files, consuming additional space.

The df command will tell you how much space you are using for files:

[knightscorner5-mic0]$ df -h
Filesystem                Size      Used Available Use% Mounted on
none                      7.6G         0      7.6G   0% /dev
none                     12.9G     77.1M     12.8G   1% /
none                      7.6G         0      7.6G   0% /dev
none                      7.6G         0      7.6G   0% /dev/shm
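
If you want a single number rather than a table, the Used column can be added up. This is only a sketch: it assumes the df on the coprocessor accepts -P and -k (POSIX layout, sizes in kB) and that awk is available. The function name df_used_kb is hypothetical. Mount points that appear more than once, like /dev in the listing above, are counted only once.

```shell
#!/bin/sh
# Sum the "Used" column (kB) of `df -P -k` style output read from stdin.
# Field 3 is Used and field 6 is the mount point in the POSIX layout.
# Each mount point is counted once, even if df lists it twice.
df_used_kb() {
    awk 'NR > 1 && !seen[$6]++ { used += $3 }
         END { printf "%d\n", used }'
}

# Example use on the coprocessor (assumes df supports -P and -k):
#   df -P -k | df_used_kb
```

Because every file system in the listing is held in memory, the total is roughly the amount of RAM that files are taking away from running programs.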

The amount of memory consumed for file storage can be reduced by using network file storage. Besides NFS, which comes as part of the standard Linux releases, a number of different network file systems have been successfully used. For examples of using Lustre and Panasas, see Configuring Intel(r) Xeon Phi(tm) Coprocessors Inside a Cluster, under the Case Studies tab at http://software.intel.com/en-us/mic-developer.

Good candidates for networked file systems are:

  • home directories
  • dedicated data storage
  • shared libraries associated with compilers and tools such as MPI

It is also possible to use an NFS mounted file system as the root partition. This requires you to set aside a directory on the host for each coprocessor card and populate those directories with the contents of the root directory for each individual coprocessor. 

When an NFS mounted root is used, the coprocessor will first boot with a very minimal initial root in RAM. This initial root mounts the NFS file system and uses the Linux switch_root command to make this file system the new root. The initial RAM device is then removed, freeing up the memory for use by programs.

A disadvantage to using a networked file system is the increased latency involved in reading from a file. For large data files, use of file systems optimized for large transfers, such as Lustre, can help. For a networked root file system, which requires NFS, keeping the physical disk space close to the coprocessor – in other words, on the host – can help cut down latency. However, any file which will be read from or written to frequently may be better off remaining in a RAM disk on the coprocessor.

Directions for setting up and using networked file systems with the Intel Xeon Phi coprocessor can be found in the Intel(r) Xeon Phi(tm) Coprocessor Intel(r) Manycore Platform Software Stack (Intel(r) MPSS) Boot Configuration Guide which comes with each MPSS release.

ADDING FILE SWAP SPACE

Because the coprocessor has no directly attached disks, there is no default disk space for swapping out pages of memory from a running process. It is possible, however, to add networked swap space by using a virtio block device on the host.

After creating the swap space on the host, the steps found in the MPSS readme file can be used to configure the coprocessor for swapping. The commands below show an example of setting up 4GB of swap space:

sudo service mpss start  
dd bs=1G if=/dev/zero of=/tmp/VirtblkSwap count=4  
sudo bash  
echo /tmp/VirtblkSwap >/sys/devices/virtual/mic/mic0/virtblk_file  
exit  
ssh root@mic0 modprobe mic_virtblk  
ssh root@mic0 mkswap /dev/vda  
ssh root@mic0 swapon /dev/vda  
ssh root@mic0 cat /proc/swaps   

Although adding swap space will increase the size of the programs the coprocessor can run, it will also degrade performance each time the swap space is used. It should only be used when necessary.

When deciding whether or not to set up swap space, consider the data access patterns a program will be using. If it is possible to partition the work so that the amount of data required in memory at any one time is below the maximum available, this is preferable to using swap space. Also, if memory is being oversubscribed because too many processes are running on the coprocessor at once, it may be better to limit the number of processes than to use swap space.
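
The arithmetic behind both alternatives is simple enough to sketch. The two helper names below, chunks_needed and max_procs, are invented for this example, and the memory budget is whatever you measure as available on your card. All values are whole numbers in the same unit, for example MB.

```shell
#!/bin/sh
# chunks_needed DATA BUDGET
#   Smallest number of equal passes so that each pass holds at most
#   BUDGET units of a DATA-unit working set (ceiling division).
chunks_needed() {
    data=$1
    budget=$2
    if [ "$budget" -le 0 ]; then
        echo "budget must be positive" >&2
        return 1
    fi
    echo $(( (data + budget - 1) / budget ))
}

# max_procs AVAILABLE PER_PROC
#   Largest number of processes, each needing PER_PROC units, that fit
#   in AVAILABLE units without oversubscribing memory.
max_procs() {
    avail=$1
    per_proc=$2
    if [ "$per_proc" -le 0 ]; then
        echo "per-process size must be positive" >&2
        return 1
    fi
    echo $(( avail / per_proc ))
}
```

For instance, chunks_needed 20000 6000 prints 4, and max_procs 6000 2500 prints 2. Real programs also need room for code, stacks and libraries, so treat the budget as a conservative estimate.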

Finally, because of the mapping required between addresses on the host and coprocessor, swap space cannot be used by jobs using the offload model of programming.


Avoiding Potential Problems – Intel(r) Xeon Phi(tm) coprocessor predefined macros and Fortran

Consider the following code which uses the __MIC__ macro:

program f_vs_F
  implicit none
  parameter ISIZE = 100000
  real(8), dimension(ISIZE) :: a
  !dir$ attributes offload:mic::run
  !dir$ offload target(mic)
  call run(a,ISIZE)
end program f_vs_F

!dir$ attributes offload:mic::run
subroutine run(a,isize)
  use omp_lib
  implicit none
  integer isize,i
  real(8) :: a(isize)
  !dir$ if defined (__MIC__)
    PRINT *,"Using offload compiler :  Hello from the coprocessor"
  !dir$ endif
  !$omp parallel do  private(i)
  do i = 1, isize
    a(i) = i
  enddo
  !$omp end parallel do
end subroutine run

This code offloads subroutine run if possible. If the offload succeeds, the subroutine prints out a message to let you know the code ran on the coprocessor.

If we put this code in a file named ifdef_test.f90, the code compiles with the ifort command without error. However, if we put this code in a file named ifdef_test.F90, we get the following errors:

ifdef_test.F90(36): remark #5082: *MIC* Directive ignored - Syntax error, found INTEGER_CONSTANT '1' when expecting one of: <IDENTIFIER> <CHAR_CON_KIND_PARAM> <CHAR_NAM_KIND_PARAM> <CHARACTER_CONSTANT>
!dir$ if defined (1)
------------------^
ifdef_test.F90(38): remark #5169: *MIC* Misplaced conditional compilation directive or nesting too deep
!dir$ endif
------^

What is going on here?

When a file ending in F90 is compiled using the ifort command, ifort calls the fpp preprocessor before compiling the code. Other things which will cause the preprocessor to be called are file names ending in F, FOR, FTN, FPP or fpp, adding -fpp on the ifort command line or calling fpp directly.
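
If you want a build script to warn about files that will be preprocessed, the suffix rule can be written as a small check. This sketch covers only the suffixes named in the paragraph above. It knows nothing about -fpp on the command line or direct calls to fpp, and the function name needs_fpp is made up for the example.

```shell
#!/bin/sh
# Return 0 (true) if the file name ends in one of the suffixes that the
# text above says cause ifort to run fpp; return 1 otherwise.
# The match is case sensitive, so test.F90 matches but test.f90 does not.
needs_fpp() {
    case "$1" in
        *.F90|*.F|*.FOR|*.FTN|*.FPP|*.fpp) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this, needs_fpp ifdef_test.F90 succeeds while needs_fpp ifdef_test.f90 fails, which matches the behavior described above.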

The preprocessor modifies the source code before it is handed to the compiler. But wait, I can hear you saying – there are no preprocessor directives in this source code. And indeed there are no lines starting with a #. There is, however, a macro, __MIC__. 

When the code above is placed in a file ending in F90 and compiled, several things occur: 

  1. The code is passed to the fpp preprocessor with __MIC__ undefined. Because __MIC__ is not defined and is not used in a preprocessor directive, the preprocessor does not detect that it is being used as a macro. Hence it does not change the line “!dir$ if defined (__MIC__)”.
  2. The compiler compiles the code from the preprocessor, with the __MIC__ still undefined. It processes the directive “!dir$ if defined (__MIC__)” and selects the code for __MIC__ undefined. It also processes the directive “!dir$ offload target(mic)” and notes that the code must next be compiled for execution on the coprocessor.
  3. The code is passed again to the fpp preprocessor, this time with __MIC__ defined to be 1. The preprocessor notes the existence of the macro and attempts to process it by changing the line “!dir$ if defined (__MIC__)” into “!dir$ if defined (1)”
  4. The compiler compiles this modified code, this time with __MIC__ defined. However, the code no longer contains the line “!dir$ if defined (__MIC__)”. Instead it contains the line “!dir$ if defined (1)”, which is not a valid directive and results in the error messages shown above.
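
The effect of step 3 can be imitated with a plain text substitution. This is not how fpp is implemented and it does not invoke fpp at all. It is only a sed stand-in showing what happens to the directive line once __MIC__ has the value 1. The function name expand_mic is hypothetical.

```shell
#!/bin/sh
# Imitate macro expansion on the second pass: replace every __MIC__
# token on stdin with 1. This is a sed stand-in, not a call to fpp.
expand_mic() {
    sed 's/__MIC__/1/g'
}

# Example:
#   printf '%s\n' '!dir$ if defined (__MIC__)' | expand_mic
```

Feeding the directive line through expand_mic produces the same "!dir$ if defined (1)" text that appears in the compiler remark.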

The problem is the result of trying to provide the user with the best of both worlds – the power of Fortran and the flexibility of C style directives. If the __MIC__ macro were only defined during the compile phase and not during the preprocessing phase, the problem would not exist. However, doing that would mean that preprocessor directives using __MIC__ would not behave as expected.

The simplest change for the code shown above would be to rename the file from ifdef_test.F90 to ifdef_test.f90. But now consider the following code:

#define ISIZE 100000

program f_vs_F
  implicit none
  real(8), dimension(ISIZE) :: a
  !dir$ attributes offload:mic::run
  !dir$ offload target(mic)
  call run(a)
end program f_vs_F

!dir$ attributes offload:mic::run
subroutine run(a)
  use omp_lib
  implicit none
  integer i
  real(8) :: a(ISIZE)
  !dir$ if defined (__MIC__)
    PRINT *,"Using offload compiler :  Hello from the coprocessor"
  !dir$ endif
  !$omp parallel do  private(i)
  do i = 1, ISIZE
    a(i) = i
  enddo
  !$omp end parallel do
end subroutine run

This code relies on a preprocessor definition to set the size of the array. This code must be passed through the preprocessor. Now what do you do?

In this case, the solution is to remove the Fortran style directive and replace it with a C style preprocessor directive:

#ifdef __MIC__
    PRINT *,"Using offload compiler :  Hello from the coprocessor"
#endif

So which should you use? “!dir$ if defined (__MIC__)” or “#ifdef __MIC__”? It depends on what other restrictions you have placed on the code. If your code already uses preprocessor directives or you think you might in the future, or if your naming conventions will cause the preprocessor to be invoked, the choice is simple – use “#ifdef __MIC__”. If your coding standards prohibit the use of non-Fortran style statements in your code, your choice is again simple – use “!dir$ if defined (__MIC__)”. In all other cases, do what you think best.
