Changeset [b421aafb8359d2cf7f12ebe927cb3df66cb334f9] by Jed Brown

May 10th, 2009 @ 11:36 PM

Reworking of TensorMult_Hex to provide unrolled implementations

Implemented an SSE version for P=Q=4, D=1, which gives a 3x speed improvement for TensorMult_Hex. This is probably the most important size, but we also need D=3, and it would be nice to have P=Q=6 as well (orders higher than 5 are not practical except for simple domains). Using an odd number of points, i.e. P and/or Q odd, is not ideal because it is bad for alignment. This currently only works with GCC and SSE3. If someone needs it on a different architecture, tell me and I'll make it portable (but you won't get the speed without SSE3 and a compiler that is good with intrinsics). http://github.com/jedbrown/dohp/...
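For readers unfamiliar with the idea, here is a minimal sketch of what an unrolled P=Q=4, D=1 tensor multiply along one direction of a hex element can look like. It is not the code in inlinetmulthex.h: the names tmult4_ref, tmult4_unrolled and tmult4_selfcheck are hypothetical, the data layout (a 4x4 row-major matrix applied along the slowest index of a 4x4x4 block of doubles) is an assumption, and it uses SSE2 unaligned loads with a scalar fallback instead of the SSE3 path described above.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 double-precision intrinsics */
#endif

/* Reference: out[q][jk] = sum_p A[q][p] * in[p][jk], with jk = 0..15
 * running over the two fast (contiguous) indices of a 4x4x4 block. */
static void tmult4_ref(const double *A, const double *in, double *out)
{
  for (size_t q = 0; q < 4; q++) {
    for (size_t jk = 0; jk < 16; jk++) {
      double sum = 0.0;
      for (size_t p = 0; p < 4; p++) sum += A[q * 4 + p] * in[p * 16 + jk];
      out[q * 16 + jk] = sum;
    }
  }
}

/* Same contraction with the p-loop unrolled by hand (P = 4 is fixed). */
static void tmult4_unrolled(const double *A, const double *in, double *out)
{
  for (size_t q = 0; q < 4; q++) {
    const double a0 = A[q * 4 + 0], a1 = A[q * 4 + 1];
    const double a2 = A[q * 4 + 2], a3 = A[q * 4 + 3];
    for (size_t jk = 0; jk < 16; jk += 2) {
#if defined(__SSE2__)
      /* Two adjacent output values per iteration, one per SSE lane. */
      __m128d acc = _mm_mul_pd(_mm_set1_pd(a0), _mm_loadu_pd(in + 0 * 16 + jk));
      acc = _mm_add_pd(acc, _mm_mul_pd(_mm_set1_pd(a1), _mm_loadu_pd(in + 1 * 16 + jk)));
      acc = _mm_add_pd(acc, _mm_mul_pd(_mm_set1_pd(a2), _mm_loadu_pd(in + 2 * 16 + jk)));
      acc = _mm_add_pd(acc, _mm_mul_pd(_mm_set1_pd(a3), _mm_loadu_pd(in + 3 * 16 + jk)));
      _mm_storeu_pd(out + q * 16 + jk, acc);
#else
      /* Portable fallback: the same unrolled sum, written as scalars. */
      for (size_t l = 0; l < 2; l++) {
        const size_t i = jk + l;
        out[q * 16 + i] = a0 * in[0 * 16 + i] + a1 * in[1 * 16 + i]
                        + a2 * in[2 * 16 + i] + a3 * in[3 * 16 + i];
      }
#endif
    }
  }
}

/* Returns 1 if the unrolled version agrees with the reference. */
int tmult4_selfcheck(void)
{
  double A[16], in[64], ref[64], out[64];
  for (size_t i = 0; i < 16; i++) A[i] = 1.0 / (1.0 + (double)i);
  for (size_t i = 0; i < 64; i++) in[i] = 0.25 * (double)i - 3.0;
  tmult4_ref(A, in, ref);
  tmult4_unrolled(A, in, out);
  for (size_t i = 0; i < 64; i++) {
    double d = ref[i] - out[i];
    if (d < 0.0) d = -d;
    if (d > 1e-12) return 0;
  }
  return 1;
}
```

With P even, each row of 16 contiguous values splits cleanly into pairs of doubles, which is one way to see why an odd number of points is awkward for alignment. Aligned loads (_mm_load_pd) would require 16-byte aligned buffers, which this sketch does not assume.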

Committed by Jed Brown

  • M CMakeLists.txt
  • M include/private/microbench.h
  • M src/fs/tests/CMakeLists.txt
  • M src/fs/tests/refout/ellip-e0-b4-p17-sse.refout
  • M src/jacobi/impls/tensor/efstopo.c
  • M src/jacobi/impls/tensor/inlinetmulthex.h
  • M src/jacobi/impls/tensor/tensor.c
  • M src/jacobi/impls/tensor/tensor.h

An implementation of the "dual order hp" version of the finite element method. This project targets parallel domain-decomposition methods for strongly coupled nonlinear problems with PDE constraints.