Fast computation with efficient object data distribution for large-scale hologram generation on a multi-GPU cluster

T Baba, S Watanabe, BJ Jackin, K Ootsu, T Ohkawa, T Yokota, Y Hayasaki, T Yatagai
IEICE TRANSACTIONS on Information and Systems, 2019
The 3D holographic display has long been anticipated as a future human interface because it does not require users to wear special devices. However, its heavy computational requirements have prevented such displays from being realized. A recent study indicates that objects and holograms of several giga-pixels must be processed in real time to achieve high resolution and a wide viewing angle. To address this problem, we first adapted a conventional FFT algorithm to a GPU cluster environment in order to avoid heavy inter-node communication. We then applied several single-node and multi-node optimization and parallelization techniques. The single-node optimizations comprise a change in the object decomposition method, reduced data transfer between the CPU and GPU, kernel integration, stream processing, and the use of multiple GPUs within a node. The multi-node optimizations comprise methods for distributing object data from the host node to the other nodes. Experimental results show that the intra-node optimizations attain an 11.52-fold speed-up over the original single-node code. Furthermore, the multi-node optimizations, using 8 nodes with 2 GPUs per node, attain an execution time of 4.28 sec for generating a 1.6 giga-pixel hologram from a 3.2 giga-pixel object. This corresponds to a 237.92-fold speed-up over sequential processing on a CPU and a 41.78-fold speed-up over multi-threaded execution on a multicore CPU, both using a conventional FFT-based algorithm.
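The reported figures can be cross-checked with a little arithmetic. The sketch below is not from the paper: it only multiplies the quoted 4.28 sec by the two quoted speed-up factors to recover the baseline run times they imply, and divides the hologram size by the run time to get a throughput. It assumes that "giga-pixel" means 10^9 pixels, which the abstract does not state, and all variable and function names are illustrative. The 11.52-fold intra-node figure is left out because it is measured against a different baseline (the original single-node code).

```python
# Figures quoted in the abstract.
BEST_TIME_S = 4.28                 # 8 nodes, 2 GPUs per node
SPEEDUP_VS_SEQUENTIAL = 237.92     # vs. sequential CPU processing
SPEEDUP_VS_MULTITHREADED = 41.78   # vs. multi-threaded multicore CPU

# Assumption (not stated in the abstract): 1 giga-pixel = 1e9 pixels.
HOLOGRAM_PIXELS = 1.6e9


def implied_baseline_time(best_time_s: float, speedup: float) -> float:
    """Baseline run time implied by: speedup = baseline_time / best_time."""
    return best_time_s * speedup


sequential_s = implied_baseline_time(BEST_TIME_S, SPEEDUP_VS_SEQUENTIAL)
multithreaded_s = implied_baseline_time(BEST_TIME_S, SPEEDUP_VS_MULTITHREADED)

# Hologram pixels generated per second on the 16-GPU configuration.
throughput_px_per_s = HOLOGRAM_PIXELS / BEST_TIME_S

if __name__ == "__main__":
    print(f"implied sequential CPU time:     {sequential_s:.1f} s")
    print(f"implied multi-threaded CPU time: {multithreaded_s:.1f} s")
    print(f"throughput: {throughput_px_per_s / 1e9:.2f} giga-pixels/s")
```

Under these assumptions the implied baselines are roughly 1018.3 sec for sequential processing and 178.8 sec for multi-threaded execution, and the cluster generates about 0.37 giga-pixels of hologram per second.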