Computational Fluid Dynamics with Embedded Cut Cells on Graphics Hardware
The advent of general-purpose computing on graphics cards has led to significant speedups in many fields. Designing code for GPUs, however, requires careful consideration of the underlying hardware. This thesis explores the implementation of fluid dynamics simulations featuring embedded cut cells using the CUDA programming platform. We demonstrate efficient generation and handling of geometric surface data on rectilinear computational grids, which a split Euler solver uses to define piecewise linear cut cells describing solid surfaces in fluid flows. To reduce the memory footprint of embedded boundaries, we present a system of compressed data structures. The software is extended to run on multiple graphics cards and shows good scaling.
Simulating embedded boundaries requires a description of object surfaces. We implement a fast and robust narrow-band signed distance field generator for graphics cards, based on the Characteristic/Scan Conversion algorithm for stereolithography (STL) files. We present an augmented approach that handles commonly occurring complex configurations, and we show that the method is correct for all closed surfaces. We discuss efficient feature construction and work scheduling, and demonstrate high-speed distance generation for complex geometries.
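The narrow-band idea can be pictured with a simplified 2D serial analogue (this is an illustrative sketch, not the thesis's CUDA implementation; all names here are hypothetical, and a real Characteristic/Scan Conversion generator would scan-convert only the band footprint of each surface element rather than test every cell):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

static double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// Signed distance from point p to segment (a, b) with outward unit normal n:
// positive on the fluid side of the surface, negative inside the solid.
double signedDist(Vec2 p, Vec2 a, Vec2 b, Vec2 n) {
    Vec2 ab{b.x - a.x, b.y - a.y};
    double t = std::clamp(dot({p.x - a.x, p.y - a.y}, ab) / dot(ab, ab), 0.0, 1.0);
    Vec2 d{p.x - (a.x + t * ab.x), p.y - (a.y + t * ab.y)};
    double dist = std::sqrt(dot(d, d));
    return dot(d, n) >= 0.0 ? dist : -dist;
}

// Narrow-band update of a row-major distance grid phi (nx * ny, spacing h):
// a segment only touches cells whose centre lies within `band` of it, and
// each cell keeps the value of smallest magnitude seen so far.
void scanSegment(std::vector<double>& phi, int nx, int ny, double h,
                 Vec2 a, Vec2 b, Vec2 n, double band) {
    for (int j = 0; j < ny; ++j)
        for (int i = 0; i < nx; ++i) {
            Vec2 p{(i + 0.5) * h, (j + 0.5) * h};
            double d = signedDist(p, a, b, n);
            if (std::abs(d) <= band && std::abs(d) < std::abs(phi[j * nx + i]))
                phi[j * nx + i] = d;
        }
}
```

The sign test against the surface normal is the part that the augmented approach must make robust near edges and vertices shared by several surface elements.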
At the core of our simulation implementation is a split Euler solver for high-speed flow. We present a one-dimensional method that achieves coalesced memory access and uses shared-memory caching to best harness the potential of GPU hardware. Multidimensional simulations use a framework of data transposes that aligns storage with the current sweep dimension, maintaining optimal memory access. Analysis of the solver shows that compute resources are used efficiently.
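The transpose framework can be sketched in serial host code (a CPU analogue of the GPU scheme; the toy upwind sweep and all names are hypothetical): the y-sweep transposes the grid so that the sweep direction becomes contiguous in memory, which on the GPU is the condition for coalesced access, runs the same contiguous 1D kernel, and transposes back.

```cpp
#include <cassert>
#include <vector>

// Row-major transpose: in is nx wide and ny tall, out becomes ny wide.
void transpose(const std::vector<double>& in, std::vector<double>& out,
               int nx, int ny) {
    for (int j = 0; j < ny; ++j)
        for (int i = 0; i < nx; ++i)
            out[i * ny + j] = in[j * nx + i];
}

// Toy 1D sweep (first-order upwind advection, c in [0,1]) along the
// contiguous rows -- the only access pattern the kernel ever sees.
void sweepRows(std::vector<double>& u, int nx, int ny, double c) {
    for (int j = 0; j < ny; ++j) {
        std::vector<double> old(u.begin() + j * nx, u.begin() + (j + 1) * nx);
        for (int i = 1; i < nx; ++i)
            u[j * nx + i] = old[i] - c * (old[i] - old[i - 1]);
    }
}

// y-sweep: transpose so y is contiguous, reuse the row kernel, transpose back.
void sweepY(std::vector<double>& u, int nx, int ny, double c) {
    std::vector<double> t(u.size());
    transpose(u, t, nx, ny);
    sweepRows(t, ny, nx, c);
    transpose(t, u, ny, nx);
}
```

The pay-off on the GPU is that one well-tuned contiguous kernel serves every sweep dimension, at the cost of two transpose passes per non-x sweep.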
The solver is extended to include cut cells describing solid boundaries in the domain. We present a compression and mapping method that reduces the memory footprint of the surface information. The cut cell solver is validated across different flow regimes, and we simulate shock wave interaction with complex geometries to demonstrate the stability of the implementation.
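One way to picture such a compression and mapping scheme (a minimal sketch under assumed field names, not the thesis's actual layout): regular cells store no geometry at all, only cut cells receive a record in a compacted array, and a per-cell index built by a prefix count over cut flags maps into that array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cut-cell geometry record: fluid volume fraction and
// the normal of the piecewise linear interface segment.
struct CutCellGeom { double alpha, nx, ny; };

// flag[i] != 0 marks cell i as cut. The returned map gives, for each cell,
// its index into a compacted geometry array, or -1 for regular (fully fluid
// or fully solid) cells. The serial prefix count below would be a parallel
// prefix sum (stream compaction) on the GPU.
std::vector<int> buildCutCellMap(const std::vector<char>& flag) {
    std::vector<int> map(flag.size(), -1);
    int count = 0;
    for (std::size_t i = 0; i < flag.size(); ++i)
        if (flag[i]) map[i] = count++;
    return map;
}
```

Since cut cells lie only on the lower-dimensional solid surface, the compacted geometry array is far smaller than a dense per-cell storage of the same records.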
We conclude with multi-card parallelisation, analysing existing literature on domain decomposition and GPU communication. We present a system of domain splitting and message passing with overlapping compute and communication streams. A comparison of naïve and CUDA-aware Open MPI shows the benefits of using CUDA-specific library calls. The complete software pipeline demonstrates good scaling on up to thirty-two cards on a GPU cluster.
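The overlap schedule can be mocked up on a single 1D subdomain (here std::async merely stands in for asynchronous MPI calls and CUDA streams; the stencil and all names are hypothetical): start the halo exchange, update the interior cells that need no halo data while it is in flight, then wait and finish the two boundary cells.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// One time step on a rank's 1D subdomain u = [left ghost | n cells | right ghost].
// 1. start the halo exchange asynchronously (stand-in for nonblocking MPI on
//    a communication stream),
// 2. update interior cells that need no halo data, overlapping the exchange,
// 3. wait for the halo, then update the two boundary cells.
void stepWithOverlap(std::vector<double>& u, double leftHalo, double rightHalo) {
    const int n = static_cast<int>(u.size()) - 2;
    auto halo = std::async(std::launch::async,
                           [=] { return std::make_pair(leftHalo, rightHalo); });
    std::vector<double> out(u.size(), 0.0);
    for (int i = 2; i < n; ++i)                  // interior: no halo needed
        out[i] = 0.5 * (u[i - 1] + u[i + 1]);    // toy 3-point stencil
    std::pair<double, double> g = halo.get();    // boundary: wait for halo
    u[0] = g.first;
    u[n + 1] = g.second;
    out[1] = 0.5 * (u[0] + u[2]);
    out[n] = 0.5 * (u[n - 1] + u[n + 1]);
    for (int i = 1; i <= n; ++i) u[i] = out[i];
}
```

The win is that communication latency is hidden behind the interior update; only the thin boundary layer has to wait for the message.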