#9 Unrolled and/or SSE tuned versions of TensorMult_*

#9 new

Reported by Jed Brown | November 28th, 2008 @ 01:40 AM

Without unrolling over the last dimension, GCC seems limited to about 1150 MFLOPS, but this operation should be capable of much better. The first step is naive unrolling over the last dimension. That has a chance of putting us above 2 GFLOPS by skipping lots of conditionals and turning lots of scalar mulsd instructions into packed mulpd and the like. If it doesn't, some SSE intrinsics should easily do the trick, but that may call for custom tuning for each number of dofs and each size of the last dimension. Not a priority.
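
A rough sketch of the two steps described above, not the actual TensorMult_* code. It assumes the "last dimension" is the innermost, contiguous index (taken here to be D = 2 values per node) and that the kernel applies a small dense m-by-n matrix A along the other index. The function names, argument order and data layout are all hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_mul_pd, ... */
#endif

/* Hypothetical kernel (not the dohp TensorMult_* signature):
 *   y[i*D + d] = sum_j A[i*n + j] * x[j*D + d],  i < m, j < n, d < D.
 * D, the last (contiguous) dimension, is only known at run time here,
 * so the innermost loop carries a loop test on every iteration. */
void mult_generic(size_t m, size_t n, size_t D,
                  const double *A, const double *x, double *y)
{
  for (size_t i = 0; i < m; i++) {
    for (size_t d = 0; d < D; d++) y[i*D + d] = 0.0;
    for (size_t j = 0; j < n; j++) {
      const double a = A[i*n + j];
      for (size_t d = 0; d < D; d++) y[i*D + d] += a * x[j*D + d];
    }
  }
}

/* Step 1: the same operation with the last dimension fixed at D = 2
 * and unrolled by hand, so the innermost loop disappears. */
void mult_unrolled2(size_t m, size_t n,
                    const double *A, const double *x, double *y)
{
  for (size_t i = 0; i < m; i++) {
    double y0 = 0.0, y1 = 0.0;
    for (size_t j = 0; j < n; j++) {
      const double a = A[i*n + j];
      y0 += a * x[j*2 + 0];
      y1 += a * x[j*2 + 1];
    }
    y[i*2 + 0] = y0;
    y[i*2 + 1] = y1;
  }
}

/* Step 2: D = 2 with SSE2 intrinsics, one packed multiply and one packed
 * add per (i, j) pair. Unaligned loads/stores are used so no alignment
 * is assumed. Falls back to the scalar unrolled version without SSE2. */
void mult_sse2(size_t m, size_t n,
               const double *A, const double *x, double *y)
{
#ifdef __SSE2__
  for (size_t i = 0; i < m; i++) {
    __m128d acc = _mm_setzero_pd();
    for (size_t j = 0; j < n; j++) {
      const __m128d a  = _mm_set1_pd(A[i*n + j]);  /* broadcast A[i][j] */
      const __m128d xv = _mm_loadu_pd(&x[j*2]);    /* both values at node j */
      acc = _mm_add_pd(acc, _mm_mul_pd(a, xv));
    }
    _mm_storeu_pd(&y[i*2], acc);
  }
#else
  mult_unrolled2(m, n, A, x, y);
#endif
}
```

All three variants accumulate in the same order, so they should agree exactly on the same input. Whether the hand-unrolled version alone gets GCC to emit packed instructions would have to be checked in the generated assembly, and other values of D would each need their own variant, which is the per-size tuning mentioned above.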

No comments found

An implementation of the "dual order hp" version of the finite element method. This project targets parallel domain-decomposition methods for strongly coupled nonlinear problems with PDE constraints.
