:::::::::::::: gwdfpt.note.6.11.08 ::::::::::::::

I don't think we can achieve linear scaling as suggested in Baroni&Giannozzi92. In their case the kinetic energy was calculated by finite differences, so it is strictly localized in R-space. Using psinc functions instead, the kinetic energy is still localized, but it is non-vanishing on further neighbors. Mostofi et al. mention that the application of the kinetic energy remains a global operation to be performed with FFTs (http://www.tcm.phy.cam.ac.uk/onetep/intro/node6.html). In other words, if we want to keep the G-space accuracy, we NEED to use the full representation of the kinetic energy WITHOUT truncation. This means that it is useless to compute the matrix elements of the kinetic energy in the psinc basis, since they do not decay fast enough. For Haydock, we still need a localized object to start with, and the psinc choice still seems the most convenient for switching to/from G-space.

:::::::::::::: gwdfpt.note.6.12.08 ::::::::::::::

In order to have a complete mapping of the psinc basis into G-space, we need to keep the entire G-box, and not truncate it within the kinetic-energy cutoff!! All the nice properties of the psincs rest on this full mapping; if we truncate, the orthogonality and all the rest vanish. I think we need to define a grid spacing such that the corresponding G-box is CONTAINED within the sphere of G-vectors smaller than the cutoff. In practice the calculation HAS TO BE RUN with a cutoff smaller than the one used for the wavefunctions. I think this is still fine, since calculations of epsilon are usually done with 5 or 10 Ry.

:::::::::::::: gwdfpt.note.6.17.08 ::::::::::::::

OK, I think I am able to calculate the eigenvalues of H with the recursion method for the special case k=0 and psinc(1) (diagonal element, Hermitian Hamiltonian). The eigenvalues coincide with those of the CG method to within 1 meV after about 150 iterations.
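A minimal Python sketch of this recursion scheme, with a random Hermitian matrix standing in for H and the first basis vector standing in for psinc(1) (all names and sizes are illustrative, not the actual code): the recursion builds tridiagonal coefficients a_n, b_n, whose matrix gives the eigenvalues, and the same coefficients give the diagonal Green's function element as a continued fraction.

```python
# Haydock/Lanczos recursion sketch: a random Hermitian matrix stands in for H.
import numpy as np

def haydock(H, u1, niter):
    # recursion coefficients a_n, b_n from H and a normalized start vector u1
    a, b = [], [0.0]                      # b[0] is a dummy for the first step
    q_prev = np.zeros_like(u1)
    q = u1 / np.linalg.norm(u1)
    for _ in range(niter):
        w = H @ q
        a.append(np.vdot(q, w).real)
        w = w - a[-1] * q - b[-1] * q_prev
        b.append(np.linalg.norm(w))
        q_prev, q = q, w / b[-1]
    return np.array(a), np.array(b[1:])

def green_cf(a, b, z):
    # G(z) = <u1|(z - H)^(-1)|u1> = 1/(z - a1 - b1^2/(z - a2 - b2^2/(...)))
    g = 1.0 / (z - a[-1])
    for an, bn in zip(a[-2::-1], b[-2::-1]):
        g = 1.0 / (z - an - bn**2 * g)
    return g

rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (X + X.conj().T) / 2                  # random Hermitian stand-in for H
u1 = np.zeros(n, complex); u1[0] = 1.0    # start vector, playing the role of psinc(1)

a, b = haydock(H, u1, 150)

# eigenvalues from the tridiagonal matrix of recursion coefficients; without
# reorthogonalization some converged eigenvalues show up more than once ("ghosts")
T = np.diag(a) + np.diag(b[:-1], 1) + np.diag(b[:-1], -1)
ev_tri = np.linalg.eigvalsh(T)
ev_full = np.linalg.eigvalsh(H)
print(ev_tri[0], ev_full[0])              # extremal eigenvalues converge first

# diagonal Green's function element vs the direct ("sum over states") evaluation
z = 0.5 + 1.0j                            # w + i*delta, inside the spectrum
g_cf = green_cf(a, b, z)
g_exact = np.vdot(u1, np.linalg.solve(z * np.eye(n) - H, u1))
print(g_cf, g_exact)
```

The ghost duplicates produced by plain (unreorthogonalized) recursion are harmless for the Green's function itself, since the continued fraction only uses the coefficients.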
I obtained the eigenvalues by diagonalizing the tridiagonal matrix built from the recursion coefficients. Some nondegenerate eigenvalues appear multiple times; I don't know at this stage whether this is a problem for the computation of the Green's function or not. I would say no, since the recursion method makes no assumption about the multiplicity of the calculated spectrum. The Green's function has poles at SOME OF the eigenvalues; the strength is nonzero when the initial state has components on that eigenstate. Now I need to try the nondiagonal elements and the nonzero k-vectors.

:::::::::::::: gwdfpt.note.6.18.08 ::::::::::::::

I tested the Green's function G(r,r',k,w) against the sum-over-states expression. The diagonal element on psinc(1) at k=0 gives EXACTLY the same result as the sum over states (I compared the imaginary parts to avoid calculating many empty states). This is great; now I need to do the off-diagonal matrix elements and the case k/=0.

:::::::::::::: gwdfpt.note.6.19.08 ::::::::::::::

I tested the Green's function G(r,r',k,w) against the sum-over-states expression. OK, the off-diagonal elements work perfectly, and so does the case k/=0. I need about 100 iterations to converge the Green's function with delta = 0.1 eV. It looks like the Green's function converges from the bottom, i.e. the more iterations we do, the larger the energy range over which it matches the full calculation based on the sum over states.

:::::::::::::: gwdfpt.note.7.30.08 ::::::::::::::

The screened Coulomb interaction seems to work now. I tried three things as a sanity check: (1) the inverse dielectric matrix eps^{-1} as defined by W/v_c using the SCF linear response; (2) the dielectric matrix eps = delta - v_c * chi_0 using the non-SCF linear response; (3) the dielectric matrix eps = delta - v_c * chi_0 using the expansion over empty states.
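As an aside on relating eps and eps^{-1}: a small linear-algebra sketch (random symmetric positive-definite matrix standing in for eps; toy numbers, not a real dielectric matrix) of why the head of the inverse matrix is not the reciprocal of the head. By block (Schur-complement) inversion, 1/[eps^-1]_{00} = eps_{00} - w^T B^{-1} w, where w is the wing (first row/column) and B the body, so the two agree only when the wings vanish.

```python
# Toy demonstration: head of the inverse vs inverse of the head.
import numpy as np

rng = np.random.default_rng(1)
n = 6
G = rng.standard_normal((n, n))
E = G @ G.T + n * np.eye(n)     # symmetric positive-definite stand-in for eps(q)

head = E[0, 0]                  # "eps_{00}"
wing = E[0, 1:]                 # "eps_{0G'}", G' /= 0
body = E[1:, 1:]                # "eps_{GG'}", G,G' /= 0

head_of_inv = np.linalg.inv(E)[0, 0]                 # "[eps^-1]_{00}"
schur = head - wing @ np.linalg.solve(body, wing)    # Schur complement of the body

print(1.0 / head_of_inv, schur)   # these two coincide...
print(head)                       # ...and differ from eps_{00} when the wings are nonzero

E0 = E.copy()
E0[0, 1:] = 0.0
E0[1:, 0] = 0.0                   # wings set to zero
print(1.0 / np.linalg.inv(E0)[0, 0], head)   # now 1/[eps^-1]_{00} = eps_{00}
```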
(3) and (2) give identical results (see results/comparison_eps.eps for G=G'=0 and varying q) and agree well with Walter&Cohen: I obtain 11.4 for q=0, W&C obtain 11.3. The comparison for eps^{-1} is more delicate, and I have been struggling with it for a while. The main point is that, if you take eps^{-1} and invert it for q->0, you don't get eps... After some reading, it appears that this is related to the non-analyticity of the dielectric matrix as q approaches 0. See Baldereschi&Tosatti as well as Hybertsen&Louie for a detailed discussion. In particular, in B&T the 10-point local result of Table II gives eps = 12.1, while the Local column of Table VIII gives eps^{-1} = 0.092 (both for q close to 0). Now: 1/0.092 = 10.9, in disagreement with the value 12.1. However, Table VII of B&T gives 0.083, and 1/0.083 = 12.04, in better agreement with 12.1. In practice there is no simple way to relate eps^{-1}(q->0) to eps(q->0) [it looks like one has to set the wings to zero, but I did not try that]. The good news is that in my case I also obtain eps^{-1} = 0.098, with 1/0.098 = 10.2, while eps = 11.4. In particular, the other elements of my IDM also agree with the column "Local" in Table VIII of B&T. Note also in Tables VI and VII of H&L that the RPA result for eps^{-1}, 0.082 (1/0.082 = 12.2), does not correspond to the RPA result for eps, 13.6. So everything is fine and we can proceed with the self-energy. I need to check whether we can run with \omega /= 0 and with \omega + i*\delta...

:::::::::::::: gwdfpt.note.21.12.08 ::::::::::::::

I am putting together G and W. The debug tests on G are successful, and the debug tests on W now seem to work. Since I moved to a uniform, Gamma-centered grid for the purpose of folding, I was obtaining eps0 values around 15.7 for 555-000. After some tests I realized that the code seems correct, but my previous tests giving 11.65 at 555-111 corresponded to 19 inequivalent points, while 555-000 corresponds to only 10 inequivalent points.
By simply bringing 555-000 up to 10,10,10-000 I obtain 11.69, and the frequency dependence is OK. For the q=0, G=0 case I perform a separate calculation in coulomb_q0G0.f90 where the initial {k} grid is kept, the {k+q} grid is generated for a very small xq0, and the eigenstates are calculated explicitly (no folding). Everything seems OK up to this point. The question now is whether I NEED q=0 in the sum, or whether I can skip it in the integration for Sigma. In Steve's code I think q=0 is included explicitly, but I don't remember why...

:::::::::::::: gwdfpt.note.04.01.09 ::::::::::::::

I was able to substantially optimize solve_linter_dyn by using the following scheme: 1) run a full bcgsolve minimization for the first dvscf iteration, and 2) run a "few-steps" minimization (ideally 1 step, but I use 5 for safety) of dpsi in bcgsolve and update the dvscf in solve_linter. The convergence takes place within the same number of dvscf iterations (sometimes fewer). The rationale is that the bcgsolve part runs over k-points and is far more expensive than the update of the potential. This idea is similar to the direct minimization of the energy functional used in Paratec. For the case q=0, G=0 I need to perform the original minimization (separate relaxations of dpsi and dvscf); the combined minimization does not seem to converge. Now I think it's time to seriously optimize bcgsolve, which is the #1 bottleneck of the entire story.

:::::::::::::: gwdfpt.note.05.01.09 ::::::::::::::

Some profiling of bcgsolve_all shows that the most expensive part (by a proportion of 9:1) is the application of the Hamiltonian (A*x, i.e. ch_psi_all_eta), and in particular the FFTs forth and back to compute V*psi. There is not much to optimize here, so I reconsidered the possibility of including some preconditioning.
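The combined-minimization idea of the 04.01.09 note can be sketched on a toy problem (Python; the real code relaxes dpsi/dvscf with bcgsolve over k-points, while here a generic SPD system, a weak coupling kernel, and all names are made up for illustration): a full inner solve at the first outer iteration, then only a few inner steps per potential update, with linear mixing.

```python
# Toy sketch of the combined minimization: partial inner relaxation + mixing.
import numpy as np

def cg(A, rhs, x0, max_steps, tol=1e-12):
    # plain conjugate-gradient relaxation, capped at max_steps iterations
    x = x0.copy()
    r = rhs - A @ x
    p = r.copy()
    for _ in range(max_steps):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        p = r + (r @ r / rr) * p
    return x

rng = np.random.default_rng(2)
n = 40
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)              # SPD stand-in for the linear-response operator
b = rng.standard_normal(n)               # stand-in for the bare-perturbation rhs
K = 0.1 * rng.standard_normal((n, n))    # weak kernel mapping "dpsi" to "dvscf"
alpha_mix = 0.3

def outer_loop(steps_first, steps_later, n_outer=200):
    v = np.zeros(n)                      # "dvscf"
    x = np.zeros(n)                      # "dpsi" (kept between outer iterations)
    for it in range(n_outer):
        steps = steps_first if it == 0 else steps_later
        x = cg(A, b - K @ v, x, steps)   # (partial) relaxation of "dpsi"
        v_new = K.T @ x                  # potential generated by the response
        if np.linalg.norm(v_new - v) < 1e-10:
            return it, v
        v = v + alpha_mix * (v_new - v)  # linear mixing
    return n_outer, v

it_full, v_full = outer_loop(200, 200)   # fully relax "dpsi" at every update
it_few, v_few = outer_loop(200, 5)       # full relaxation once, then 5 steps per update
print(it_full, it_few, np.linalg.norm(v_full - v_few))
```

On this toy problem both variants reach the same fixed point, while the few-steps variant does far less inner work per outer iteration, which is the point of the scheme.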
In order to add the preconditioner I simply modified the linear system from Ax = b to (MA)x = Mb, with M given by h_diag, and replaced A by MA and b by Mb everywhere in the procedure. The new matrix MA is still complex, and it does not need to be symmetric or Hermitian in the complex-BCG algorithm of Jacobs - so no problem here. In practice the preconditioner is applied in only three locations. A comparison of the convergence with and without preconditioning is in the file results/preconditioning.eps. It is clear that preconditioning reduces the number of iterations required for a given accuracy by a factor of 3 to 4. The preconditioner 1/max(1,x) of Giannozzi (?, PH) and the one by Payne-Teter-Allan (PRB 1988) give similar results. WARNING: it seems that preconditioning does not work for several k-points.

:::::::::::::: gwdfpt.note.17.01.09 ::::::::::::::

Trying to parallelize on hbar: I had to declare plan in fftw.f90 as integer*8, otherwise segfault problems occurred. I realized that some q-vectors could not be converged in coulomb for the versions devel-7 and devel-8. This has to do with the combined minimization. By playing a bit with the numbers, it seems that the whole process is quite sensitive to the choice of parameters. For now I stick to maxscf = 1, maxbcgsolve = 8, alpha_mix = 0.3, and I *disabled* the preconditioning.

:::::::::::::: gwdfpt.note.19.01.09 ::::::::::::::

I was able to parallelize over the G-perturbations on hbar and slater. The results look identical (apart from some tiny differences in the minimization steps). The Makefile can now handle both the serial (x61) and the parallel (hbar and slater) cases. Hbar seems hugely faster than slater (at 8 procs) - why is that? Next I need to parallelize the Green's function (currently done only by proc 1) and see whether I can speed up solve_linter by using an ngms cutoff on dpsi. This is reasonable in principle and could lead to an extremely efficient code. I need to work a bit on that.
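Going back to the preconditioning of the 05.01.09 note: that code uses Jacobs' complex BCG with M = h_diag, but the iteration-count effect can be illustrated on a simpler stand-in (Python sketch, all names and numbers made up): preconditioned CG on a Hermitian toy system whose diagonal has a wide, kinetic-energy-like spread, with a diagonal (Jacobi-style) preconditioner.

```python
# Effect of a diagonal preconditioner on iteration count (toy SPD system).
import numpy as np

def pcg(A, b, Minv, tol=1e-8, max_steps=5000):
    # preconditioned conjugate gradient with a diagonal preconditioner Minv;
    # returns the solution and the number of iterations used
    x = np.zeros_like(b)
    r = b.copy()
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for it in range(max_steps):
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, it
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_steps

rng = np.random.default_rng(3)
n = 400
d = np.linspace(1.0, 1000.0, n)          # wide spread, like the kinetic term G^2
S = 0.01 * rng.standard_normal((n, n))
A = np.diag(d) + (S + S.T)               # SPD, strongly diagonal-dominant
b = rng.standard_normal(n)

x_plain, it_plain = pcg(A, b, np.ones(n))        # no preconditioning (M = identity)
x_prec, it_prec = pcg(A, b, 1.0 / np.diag(A))    # diagonal preconditioner
print(it_plain, it_prec)                         # preconditioning needs far fewer steps
```

The observed factor (3 to 4 in results/preconditioning.eps) depends on the spectrum of the operator; the toy problem only shows the qualitative effect.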