Computer Science > EXAM > CSE 6220 Intro to High Performance Computing - CSE 6220 (OMS CS): Intro to HPCFinal Exam J. _ Georgi (All)

CSE 6220 Intro to High Performance Computing - CSE 6220 (OMS CS): Intro to HPCFinal Exam J. _ Georgia Institute Of Technology

Document Content and Description Below

CSE 6220 (OMS CS): Intro to HPC Final Exam J Apr 27-Apr 30 (AOE) Question 1 (10 points) TopoZoo: Binomial Heap This problem concerns a binomial tree network topology, defined as follows. Let Bh den... ote a binomial tree network of height h. If h = 0, then B0 is a single processing node. Otherwise, it is constructed by taking two binomial tree networks of height h − 1 and making the root of one tree as the child of the root of the other tree, as shown below. Five different binomial tree networks. Figure taken from book.huihoo.com. (a) (6 points) Suppose every node of a given Bh network holds a unique random integer. Give an efficient parallel algorithm to heapify the values as a minHeap. That is, after your parallel-heapify, the integer on each node should be smaller than all integers on its children. Your description must be concise and clear. There is no need to deal with corner cases or to provide an exact algorithm. However, you need to argue convincingly that the parallel running time of your scheme is O(h). To simplify your notation, let Bh[k; l] represent the node at the l-th level and the k-th column of the network. (b) (4 points) Describe an injective, dilation-1 embedding to map a Bh to a Hypercube of dimension h. Your description must be concise and clear. You don’t need to deal with corner cases or provide an exact algorithm.CSE 6220 (OMS CS) Final Exam J, Page 2 of 6 Spring 2019 Question 2 (10 points) Dense Matrix Algorithms (a) (5 points) Consider the matrix-vector multiplication operation, y = Ax, where A is an n×n matrix and x and y are n×1 vectors. Each output y vector element may be computed as y[i] = Pn j=0 A[i; j] · x[j] Suppose that the matrix is partitioned with 1D column-wise partitioning, such that each processor stores np complete columns of the matrix A and has matching np elements of the vectors x and y, i.e., if a processor holds column j (i.e., A[∗; j]), then it also owns x[j] and y[j]. Design an efficient distributed memory parallel algorithm to compute matrixvector multiplication in parallel. Analyze the run-time of this algorithm and specify the computation and communication time. (b) (5 points) Let A be an n × n matrix. The transpose of A, denoted AT is defined as AT[i; j] = A[j; i]. Design a parallel algorithm to compute AT from A using an n × n virtual torus topology. During the execution of the algorithm, each processor can hold at most a constant number of matrix elements at a time (can be greater than one). Specify the total runtime for your algorithm. [Show More]

Last updated: 2 years ago

Preview 1 out of 6 pages

Buy Now

Instant download

We Accept: