Changeset [b421aafb8359d2cf7f12ebe927cb3df66cb334f9] by Jed Brown

May 10th, 2009 @ 11:36 PM

Reworking of TensorMult_Hex to provide unrolled implementations

Implemented an SSE version for P=Q=4, D=1, which gives a 3x speedup for TensorMult_Hex. This is probably the most important size, but we also need D=3, and it would be nice to have P=Q=6 as well (orders higher than 5 are not practical except for simple domains). Using an odd number of points (i.e. P and/or Q odd) is not ideal because it is bad for alignment. This code only works with GCC and SSE3. If someone needs it on a different architecture, tell me and I'll make it portable (but you won't get the speed without SSE3 and a compiler that handles intrinsics well). http://github.com/jedbrown/dohp/...
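For context, a minimal sketch of the kind of kernel being unrolled, assuming the usual sum-factorization structure of tensor-product hex elements: a 1D matrix is applied in each of the three directions, so the product can be done as three passes of short contractions. This is not the dohp code. It assumes D = 1, square P = Q = N = 4 matrices, a layout x[i][j][k] with k varying fastest, and the hypothetical names tensor_hex_naive, tensor_hex_4, dot4, axpy4 and tensor_hex_4_max_error. It is portable scalar C with the length-4 inner loops unrolled by hand; that is the shape an SSE version exploits, since each contiguous row of 4 doubles fills exactly two 2-wide SSE registers, whereas with an odd P or Q successive rows would not start on 16-byte boundaries.

```c
/* Hypothetical sketch, not the dohp implementation: D = 1, P = Q = N = 4,
 * tensors stored as x[i][j][k] with k varying fastest. */
enum { N = 4 };

/* Reference: direct O(N^6) evaluation of
 *   y[a][b][c] = sum_{i,j,k} A[a][i] * B[b][j] * C[c][k] * x[i][j][k] */
static void tensor_hex_naive(double A[N][N], double B[N][N], double C[N][N],
                             double x[N][N][N], double y[N][N][N])
{
  for (int a = 0; a < N; a++)
    for (int b = 0; b < N; b++)
      for (int c = 0; c < N; c++) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
          for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
              s += A[a][i] * B[b][j] * C[c][k] * x[i][j][k];
        y[a][b][c] = s;
      }
}

/* Length-4 dot product, unrolled by hand into two independent partial sums. */
static inline double dot4(const double *u, const double *v)
{
  return (u[0] * v[0] + u[1] * v[1]) + (u[2] * v[2] + u[3] * v[3]);
}

/* y[0..3] += alpha * x[0..3], unrolled; operates on one contiguous row. */
static inline void axpy4(double alpha, const double *x, double *y)
{
  y[0] += alpha * x[0];
  y[1] += alpha * x[1];
  y[2] += alpha * x[2];
  y[3] += alpha * x[3];
}

/* Sum-factorized version: three passes, each O(N^4). */
static void tensor_hex_4(double A[N][N], double B[N][N], double C[N][N],
                         double x[N][N][N], double y[N][N][N])
{
  double t1[N][N][N], t2[N][N][N];

  /* Pass 1: contract k with C; k is contiguous, so use dot products. */
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int c = 0; c < N; c++)
        t1[i][j][c] = dot4(C[c], x[i][j]);

  /* Pass 2: contract j with B as axpys over contiguous rows of length 4. */
  for (int i = 0; i < N; i++)
    for (int b = 0; b < N; b++) {
      t2[i][b][0] = t2[i][b][1] = t2[i][b][2] = t2[i][b][3] = 0.0;
      for (int j = 0; j < N; j++)
        axpy4(B[b][j], t1[i][j], t2[i][b]);
    }

  /* Pass 3: contract i with A, again as axpys over contiguous rows. */
  for (int a = 0; a < N; a++)
    for (int b = 0; b < N; b++) {
      y[a][b][0] = y[a][b][1] = y[a][b][2] = y[a][b][3] = 0.0;
      for (int i = 0; i < N; i++)
        axpy4(A[a][i], t2[i][b], y[a][b]);
    }
}

/* Self-check helper: largest |naive - factorized| entry for given inputs. */
static double tensor_hex_4_max_error(double A[N][N], double B[N][N],
                                     double C[N][N], double x[N][N][N])
{
  double y1[N][N][N], y2[N][N][N], err = 0.0;
  tensor_hex_naive(A, B, C, x, y1);
  tensor_hex_4(A, B, C, x, y2);
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++) {
        double d = y1[i][j][k] - y2[i][j][k];
        if (d < 0) d = -d;
        if (d > err) err = d;
      }
  return err;
}
```

Pass 1 uses dot products because k is the contiguous index; passes 2 and 3 are written as axpys so the innermost work stays on contiguous rows of 4 instead of striding through the tensor. The factorization reduces the cost from O(N^6) to 3 N^4 multiply-adds per component.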

Committed by Jed Brown

M CMakeLists.txt
M include/private/microbench.h
M src/fs/tests/CMakeLists.txt
M src/fs/tests/refout/ellip-e0-b4-p17-sse.refout
M src/jacobi/impls/tensor/efstopo.c
M src/jacobi/impls/tensor/inlinetmulthex.h
M src/jacobi/impls/tensor/tensor.c
M src/jacobi/impls/tensor/tensor.h


An implementation of the "dual order hp" version of the finite element method. This project targets parallel domain-decomposition methods for strongly coupled nonlinear problems with PDE constraints.