Compute bandwith-reducing orderings to assemble into
Reported by Jed Brown | April 17th, 2009 @ 05:36 PM
The natural ordering to start with has global dofs essentially sorted as [vertices, edges, faces, regions]. While this is a fine ordering for matrix-free application, it is not good when using an assembled matrix in the preconditioner (mostly we're thinking of SOR, ILU(k) and LU, perhaps as part of a multigrid scheme). A low-bandwidth ordering (different from a fill-reducing ordering) should give better cache performance.
It is natural to compute this ordering while creating the function space, so that subsequent assembly can use the good ordering. Obviously the corresponding scalar function space is sufficient for computing the reordering, but it might be possible to make it cheaper while keeping all interior dofs for each region together. Suppose we compute an ordering by assigning one dof to all the interior nodes of every entity (with a positive number of nodes). For any space of at least quadratic elements, the reordering would be computed by assembling the nonzero pattern for a matrix corresponding to the Q1 basis induced by quadratic elements. For order 5, this corresponds to more than a factor of 10 savings (i.e. the ordering should be very cheap to compute).
No comments found
Please Sign in or create a free account to add a new ticket.
With your very own profile, you can contribute to projects, track your activity, watch tickets, receive and update tickets through your email and much more.
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile ยป
An implementation of the ``dual order hp'' version of the finite element method. This project targets parallel domain-decomposition methods for strongly coupled nonlinear problems with PDE constraints.