I have analyzed rather carefully the scaling properties of the code, and my conclusion is that using a smaller cutoff for deltapsi does not help. This will be summarized in my notes and in the paper. Here I just report the main conclusions:

* The most expensive part of this procedure is the FFT (forward and back) used to calculate V*deltapsi in h_psi_c, which is called by chpsi_all_eta (within the CG minimizer). The cost of an FFT computed with FFTW is about 39/4*N*log2(N), where N is the number of elements of the transform. This number equals the number of real-space points (in the example 12*12*12=1728; typically we want a 0.1 Å spacing, corresponding to 24*24*24=13824 points in diamond at 60 Ry).

  We could try "pruned FFTWs" (http://www.fftw.org/pruned.html), where the input (or output) array is sparse and the other one is dense. For instance, we could start from deltapsi on a small G-space cutoff with Ns components and transform it with FFTW to real space on the large cutoff with N points. However, the cost would still be about 39/4*N*log2(Ns) = 39/4*N*log2((Ns/N)*N) = 39/4*N*[log2(N) - log2(N/Ns)]. Therefore the gain w.r.t. a full FFT on the large cutoff, defined as the ratio of the two costs (smaller is better), would be [log2(N) - log2(N/Ns)]/log2(N) = 1 - log2(N/Ns)/log2(N). For the example I am using (Ns=51, N=411) this gain is 0.65 (i.e. the pruned FFT takes 65% of the time of a full FFT). With realistic cutoffs, Ns=137 and N=13824, the gain is 0.52. For very large systems N/Ns is a fixed constant while log2(N) increases, so the gain goes to 1 and the two implementations become equivalent.

  These estimates become even worse once we consider that the V*deltapsi calculation also needs the backward FFT to G-space. Since we want V*deltapsi on the large cutoff, we cannot prune this second transform. Therefore we gain a little on the first transform and nothing on the second one, and the two example gains above become (1+0.65)/2 = 0.83 and (1+0.52)/2 = 0.76.

* Another possibility is to avoid the FFT altogether and compute the convolution between V and psi directly.
  This calculation requires Ns*N multiplications, to be compared with 2*39/4*N*log2(N) for the FFT route (the factor 2 accounts for the two FFTs). The gain in this case would therefore be Ns/(2*39/4*log2(N)). In the examples above:
  Ns=51, N=411: 0.24
  Ns=137, N=13824: 0.51
  For a much larger system (a supercell with large N), however, the FFT approach is far more convenient: at fixed cutoffs Ns grows linearly with the system size while log2(N) grows only logarithmically, so this ratio grows without bound.
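The cost ratios above can be reproduced with a few lines of Python. This is a minimal sketch that only encodes the operation counts used in this note (the 39/4*N*log2(N) FFT estimate, log2(Ns) replacing log2(N) for a pruned transform, and Ns*N multiplications for the direct convolution); it does not time any real FFTW call, and the function names are hypothetical. With N=411 the convolution formula gives about 0.30 for the small example; the quoted 0.24 is obtained when log2(N) is evaluated on the 1728-point real-space grid. The last case is a hypothetical 2x2x2 supercell of the realistic example, keeping N/Ns fixed.

```python
import math

# Minimal cost model (assumption: the operation counts quoted in this note,
# not measured FFTW timings). Function names are hypothetical.
PREFACTOR = 39.0 / 4.0  # FFT prefactor as quoted in the note


def fft_cost(n: int) -> float:
    """Estimated cost of one full FFT on n points: 39/4 * n * log2(n)."""
    return PREFACTOR * n * math.log2(n)


def pruned_fft_cost(ns: int, n: int) -> float:
    """Estimated cost of a pruned FFT from ns sparse inputs to n dense outputs."""
    return PREFACTOR * n * math.log2(ns)


def pruned_ratio(ns: int, n: int) -> float:
    """Pruned / full cost of the forward transform: 1 - log2(n/ns)/log2(n)."""
    return pruned_fft_cost(ns, n) / fft_cost(n)


def pruned_round_trip_ratio(ns: int, n: int) -> float:
    """Pruned forward + full (unprunable) backward FFT, relative to two full FFTs."""
    return (pruned_fft_cost(ns, n) + fft_cost(n)) / (2.0 * fft_cost(n))


def convolution_ratio(ns: int, n: int) -> float:
    """Direct convolution (ns * n multiplications) relative to two full FFTs."""
    return ns * n / (2.0 * fft_cost(n))


if __name__ == "__main__":
    cases = [
        (51, 411),       # small example from the note
        (137, 13824),    # realistic cutoffs from the note
        (1096, 110592),  # hypothetical 2x2x2 supercell: N/Ns kept fixed
    ]
    for ns, n in cases:
        print(
            f"Ns={ns:5d} N={n:6d}  "
            f"pruned={pruned_ratio(ns, n):.2f}  "
            f"pruned+full={pruned_round_trip_ratio(ns, n):.2f}  "
            f"convolution={convolution_ratio(ns, n):.2f}"
        )
```

For the supercell, the pruned ratio rises to about 0.60 (moving toward 1) while the convolution ratio climbs to about 3.4, which is the large-system trend described above.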