Using DFTK on GPUs
In this example we will look at how DFTK can be used on Graphics Processing Units (GPUs). At present, runs on Nvidia GPUs using the CUDA.jl Julia package are better supported and have considerably fewer rough edges than runs on AMD GPUs.
GPU support is still a relatively new feature in DFTK. While basic SCF computations and, for example, forces are supported, this is not yet the case for all parts of the code. In most cases there is no intrinsic limitation, and it typically takes only minor code modifications to make a routine work on GPUs. If you require GPU support in one of our routines where it is not yet available, feel free to open an issue on GitHub or otherwise get in touch.
using AtomsBuilder
using DFTK
using PseudoPotentialData
Model setup. The first step is to set up a Model in DFTK. This proceeds exactly as in the standard CPU case (see also our Tutorial).
silicon = bulk(:Si)
model = model_DFT(silicon;
                  functionals=PBE(),
                  pseudopotentials=PseudoFamily("dojo.nc.sr.pbe.v0_4_1.standard.upf"))
Next comes the selection of the computational architecture. This determines whether the computation runs on the CPU or on a GPU.
Nvidia GPUs. Supported via CUDA.jl. If you install the CUDA package, all required Nvidia CUDA libraries are downloaded automatically. The only thing you have to do is:
using CUDA
architecture = DFTK.GPU(CuArray)
DFTK.GPU{CUDA.CuArray}()
AMD GPUs. Supported via AMDGPU.jl. Here you need to install ROCm manually. With that in place you can then select:
using AMDGPU
architecture = DFTK.GPU(ROCArray)
DFTK.GPU{AMDGPU.ROCArray}()
Portable architecture selection. To make sure this script runs on the GitHub CI (where we don't have GPUs available), we check for the availability of GPUs before selecting an architecture:
architecture = has_cuda() ? DFTK.GPU(CuArray) : DFTK.CPU()
DFTK.CPU()
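The same idea can be extended to cover both GPU vendors in one script. The following is only a sketch under some assumptions: it assumes that both CUDA.jl and AMDGPU.jl are installed in the environment, and it uses `CUDA.functional()` and `AMDGPU.functional()` as the availability checks. The order of the checks (Nvidia first) is an arbitrary choice made for illustration.

```julia
using DFTK
using CUDA
using AMDGPU

# Pick the first usable backend and fall back to the CPU otherwise.
# Assumption: both GPU packages load cleanly even on machines without a GPU.
architecture = if CUDA.functional()
    DFTK.GPU(CuArray)      # Nvidia GPU via CUDA.jl
elseif AMDGPU.functional()
    DFTK.GPU(ROCArray)     # AMD GPU via AMDGPU.jl (needs a working ROCm install)
else
    DFTK.CPU()             # no usable GPU found
end
```

The resulting `architecture` can be passed on exactly like the one selected above.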
Basis and SCF. Based on the architecture we construct a PlaneWaveBasis object as usual:
basis = PlaneWaveBasis(model; Ecut=30, kgrid=(5, 5, 5), architecture)
... and run the SCF and some post-processing:
scfres = self_consistent_field(basis; tol=1e-6)
compute_forces(scfres)
2-element Vector{StaticArraysCore.SVector{3, Float64}}:
[-1.543577034046708e-14, -1.3100040856417465e-14, -1.7936165441728804e-14]
[1.5336087805413073e-14, 1.2525868139611569e-14, 1.8568244227832982e-14]
As of May 2025, our benchmarks show DFTK to have reasonable performance on Nvidia / CUDA GPUs, with up to a 100-fold speed-up over single-threaded CPU execution. Support on AMD GPUs has been benchmarked less thoroughly and there are likely rough edges. Since GPU support in DFTK is relatively new, we appreciate any experience reports or bug reports.
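To get a rough feeling for the speed-up on your own hardware, you can time the same SCF on both architectures. This is a minimal sketch and not a careful benchmark: the helper `time_scf` is a hypothetical name introduced here, the first SCF call in it serves only as a warm-up so that compilation time is not measured, and the speed-up you observe will depend on the system size, the hardware and the number of CPU threads.

```julia
using AtomsBuilder
using DFTK
using PseudoPotentialData
using CUDA

silicon = bulk(:Si)
model = model_DFT(silicon;
                  functionals=PBE(),
                  pseudopotentials=PseudoFamily("dojo.nc.sr.pbe.v0_4_1.standard.upf"))

# Hypothetical helper: wall time in seconds of one SCF on the given architecture.
function time_scf(model, architecture)
    basis = PlaneWaveBasis(model; Ecut=30, kgrid=(5, 5, 5), architecture)
    self_consistent_field(basis; tol=1e-6)            # warm-up run (includes compilation)
    @elapsed self_consistent_field(basis; tol=1e-6)   # timed run
end

t_cpu = time_scf(model, DFTK.CPU())
println("CPU: ", t_cpu, " s")

# Only attempt the GPU run if a working CUDA setup is detected.
if CUDA.functional()
    t_gpu = time_scf(model, DFTK.GPU(CuArray))
    println("GPU: ", t_gpu, " s")
end
```

For a small system such as bulk silicon the difference may be modest. Larger cells are more likely to show the benefit of the GPU.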