Saving SCF results on disk and SCF checkpoints
For longer DFT calculations it is pretty standard to run them on a cluster in advance and to perform postprocessing (band structure calculation, plotting of density, etc.) at a later point and potentially on a different machine.
To support such workflows DFTK offers the two functions save_scfres
and load_scfres
, which allow to save the data structure returned by self_consistent_field
on disk or retrieve it back into memory, respectively. For this purpose DFTK uses the JLD2.jl file format and Julia package.
As JLD2 is an optional dependency of DFTK these three functions are only available once one has both imported DFTK and JLD2 (using DFTK
and using JLD2
).
To illustrate the use of the functions in practice we will compute the total energy of the O₂ molecule at PBE level. To get the triplet ground state we use a collinear spin polarisation (see Collinear spin and magnetic systems for details) and a bit of temperature to ease convergence:
using DFTK
using LinearAlgebra
using JLD2
d = 2.079 # oxygen-oxygen bondlength
a = 9.0 # size of the simulation box
lattice = a * I(3)
O = ElementPsp(:O, load_psp("hgh/pbe/O-q6.hgh"))
atoms = [O, O]
positions = d / 2a * [[0, 0, 1], [0, 0, -1]]
magnetic_moments = [1., 1.]
Ecut = 10 # Far too small to be converged
model = model_DFT(lattice, atoms, positions;
functionals=PBE(),
temperature=0.02, smearing=Smearing.Gaussian(),
magnetic_moments)
basis = PlaneWaveBasis(model; Ecut, kgrid=[1, 1, 1])
scfres = self_consistent_field(basis, tol=1e-2, ρ=guess_density(basis, magnetic_moments))
save_scfres("scfres.jld2", scfres);
n Energy log10(ΔE) log10(Δρ) Magnet Diag Δtime
--- --------------- --------- --------- ------ ---- ------
1 -27.64753524780 -0.13 0.001 7.0 122ms
2 -28.92460390188 0.11 -0.83 0.689 2.0 83.2ms
3 -28.93114309853 -2.18 -1.13 1.194 2.0 101ms
4 -28.93775143701 -2.18 -1.19 1.777 1.5 79.3ms
5 -28.93923445128 -2.83 -1.05 2.000 2.0 83.8ms
6 -28.93958690649 -3.45 -1.88 1.975 1.5 81.2ms
7 -28.93961185800 -4.60 -3.02 1.985 1.0 77.3ms
scfres.energies
Energy breakdown (in Ha):
Kinetic 16.7725986
AtomicLocal -58.4947679
AtomicNonlocal 4.7085582
Ewald -4.8994689
PspCorrection 0.0044178
Hartree 19.3612427
Xc -6.3913071
Entropy -0.0008852
total -28.939611858000
The scfres.jld2
file could now be transferred to a different computer, Where one could fire up a REPL to inspect the results of the above calculation:
using DFTK
using JLD2
loaded = load_scfres("scfres.jld2")
propertynames(loaded)
(:α, :history_Δρ, :converged, :occupation, :occupation_threshold, :algorithm, :basis, :runtime_ns, :n_iter, :n_matvec, :history_Etot, :εF, :energies, :ρ, :timedout, :n_bands_converge, :eigenvalues, :ψ, :ham)
loaded.energies
Energy breakdown (in Ha):
Kinetic 16.7725986
AtomicLocal -58.4947679
AtomicNonlocal 4.7085582
Ewald -4.8994689
PspCorrection 0.0044178
Hartree 19.3612427
Xc -6.3913071
Entropy -0.0008852
total -28.939611858000
Since the loaded data contains exactly the same data as the scfres
returned by the SCF calculation one could use it to plot a band structure, e.g. plot_bandstructure(load_scfres("scfres.jld2"))
directly from the stored data.
Notice that both load_scfres
and save_scfres
work by transferring all data to/from the master process, which performs the IO operations without parallelisation. Since this can become slow, both functions support optional arguments to speed up the processing. An overview:
save_scfres("scfres.jld2", scfres; save_ψ=false)
avoids saving the Bloch wave, which is usually faster and saves storage space.load_scfres("scfres.jld2", basis)
avoids reconstructing the basis from the file, but uses the passed basis instead. This save the time of constructing the basis twice and allows to specify parallelisation options (via the passed basis). Usually this is useful for continuing a calculation on a supercomputer or cluster.
See also the discussion on Input and output formats on JLD2 files.
Checkpointing of SCF calculations
A related feature, which is very useful especially for longer calculations with DFTK is automatic checkpointing, where the state of the SCF is periodically written to disk. The advantage is that in case the calculation errors or gets aborted due to overrunning the walltime limit one does not need to start from scratch, but can continue the calculation from the last checkpoint.
The easiest way to enable checkpointing is to use the kwargs_scf_checkpoints
function, which does two things. (1) It sets up checkpointing using the ScfSaveCheckpoints
callback and (2) if a checkpoint file is detected, the stored density is used to continue the calculation instead of the usual atomic-orbital based guess. In practice this is done by modifying the keyword arguments passed to # self_consistent_field
appropriately, e.g. by using the density or orbitals from the checkpoint file. For example:
checkpointargs = kwargs_scf_checkpoints(basis; ρ=guess_density(basis, magnetic_moments))
scfres = self_consistent_field(basis; tol=1e-2, checkpointargs...);
n Energy log10(ΔE) log10(Δρ) Magnet α Diag Δtime
--- --------------- --------- --------- ------ ---- ---- ------
1 -27.64751491168 -0.13 0.001 0.80 6.5 90.9ms
2 -28.92463825939 0.11 -0.83 0.692 0.80 2.0 107ms
3 -28.93118374485 -2.18 -1.13 1.198 0.80 2.0 106ms
4 -28.93778868973 -2.18 -1.19 1.780 0.80 1.5 87.3ms
5 -28.93920192088 -2.85 -1.02 2.000 0.80 2.0 96.3ms
6 -28.93958274264 -3.42 -1.87 1.975 0.80 1.5 88.6ms
7 -28.93961122014 -4.55 -3.16 1.985 0.80 1.0 83.9ms
Notice that the ρ
argument is now passed to kwargsscfcheckpoints instead. If we run in the same folder the SCF again (here using a tighter tolerance), the calculation just continues.
checkpointargs = kwargs_scf_checkpoints(basis; ρ=guess_density(basis, magnetic_moments))
scfres = self_consistent_field(basis; tol=1e-3, checkpointargs...);
n Energy log10(ΔE) log10(Δρ) Magnet α Diag Δtime
--- --------------- --------- --------- ------ ---- ---- ------
1 -28.93961277370 -3.25 1.986 0.80 10.5 144ms
Since only the density is stored in a checkpoint (and not the Bloch waves), the first step needs a slightly elevated number of diagonalizations. Notice, that reconstructing the checkpointargs
in this second call is important as the checkpointargs
now contain different data, such that the SCF continues from the checkpoint. By default checkpoint is saved in the file dftk_scf_checkpoint.jld2
, which can be changed using the filename
keyword argument of kwargs_scf_checkpoints
. Note that the file is not deleted by DFTK, so it is your responsibility to clean it up. Further note that warnings or errors will arise if you try to use a checkpoint, which is incompatible with your calculation.
We can also inspect the checkpoint file manually using the load_scfres
function and use it manually to continue the calculation:
oldstate = load_scfres("dftk_scf_checkpoint.jld2")
scfres = self_consistent_field(oldstate.basis, ρ=oldstate.ρ, ψ=oldstate.ψ, tol=1e-4);
n Energy log10(ΔE) log10(Δρ) Magnet Diag Δtime
--- --------------- --------- --------- ------ ---- ------
1 -28.93867306393 -2.36 1.985 6.0 89.1ms
2 -28.93948288794 -3.09 -2.75 1.985 1.0 110ms
3 -28.93961088927 -3.89 -2.67 1.985 3.5 110ms
4 -28.93961206783 -5.93 -2.81 1.985 1.0 71.9ms
5 -28.93961272622 -6.18 -2.98 1.985 1.0 72.4ms
6 -28.93961284111 -6.94 -3.04 1.985 1.0 72.6ms
7 -28.93961302113 -6.74 -3.21 1.985 1.0 69.1ms
8 -28.93961308331 -7.21 -3.33 1.985 1.0 74.0ms
9 -28.93961315129 -7.17 -3.74 1.985 1.0 74.4ms
10 -28.93961317153 -7.69 -4.29 1.985 2.0 82.5ms
Some details on what happens under the hood in this mechanism: When using the kwargs_scf_checkpoints
function, the ScfSaveCheckpoints
callback is employed during the SCF, which causes the density to be stored to the JLD2 file in every iteration. When reading the file, the kwargs_scf_checkpoints
transparently patches away the ψ
and ρ
keyword arguments and replaces them by the data obtained from the file. For more details on using callbacks with DFTK's self_consistent_field
function see Monitoring self-consistent field calculations.
(Cleanup files generated by this notebook)
rm("dftk_scf_checkpoint.jld2")
rm("scfres.jld2")