TY - GEN
T1 - Low-constant parallel algorithms for finite element simulations using linear octrees
AU - Sundar, Hari
AU - Sampath, Rahul S.
AU - Adavani, Santi S.
AU - Davatzikos, Christos
AU - Biros, George
PY - 2007
Y1 - 2007
N2 - In this article we propose parallel algorithms for the construction of conforming finite-element discretizations on linear octrees. Existing octree-based discretizations scale to billions of elements, but the complexity constants can be high. In our approach we use several techniques to minimize overhead: a novel bottom-up tree construction and 2:1 balance constraint enforcement; a Golomb-Rice encoding for compression by representing the octree and element connectivity as a Uniquely Decodable Code (UDC); overlapping communication and computation; and byte alignment for cache efficiency. The cost of applying the Laplacian is comparable to that of applying it using a direct indexing regular grid discretization with the same number of elements. Our algorithm has scaled up to four billion octants on 4096 processors on a Cray XT3 at the Pittsburgh Supercomputing Center. The overall tree construction time is under a minute, in contrast to previous implementations that required several minutes; the evaluation of the discretization of a variable-coefficient Laplacian takes only a few seconds. (c) 2007 ACM.
AB - In this article we propose parallel algorithms for the construction of conforming finite-element discretizations on linear octrees. Existing octree-based discretizations scale to billions of elements, but the complexity constants can be high. In our approach we use several techniques to minimize overhead: a novel bottom-up tree construction and 2:1 balance constraint enforcement; a Golomb-Rice encoding for compression by representing the octree and element connectivity as a Uniquely Decodable Code (UDC); overlapping communication and computation; and byte alignment for cache efficiency. The cost of applying the Laplacian is comparable to that of applying it using a direct indexing regular grid discretization with the same number of elements. Our algorithm has scaled up to four billion octants on 4096 processors on a Cray XT3 at the Pittsburgh Supercomputing Center. The overall tree construction time is under a minute, in contrast to previous implementations that required several minutes; the evaluation of the discretization of a variable-coefficient Laplacian takes only a few seconds. (c) 2007 ACM.
UR - http://www.scopus.com/inward/record.url?scp=56749103058&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56749103058&partnerID=8YFLogxK
U2 - 10.1145/1362622.1362656
DO - 10.1145/1362622.1362656
M3 - Conference contribution
AN - SCOPUS:56749103058
SN - 9781595937643
T3 - Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC'07
BT - Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC'07
T2 - 2007 ACM/IEEE Conference on Supercomputing, SC'07
Y2 - 10 November 2007 through 16 November 2007
ER -