Vector Models for Data-Parallel Computing
1 Introduction
1.1 Parallel Vector Models
1.2 Vector Instructions
1.3 Implementation
1.4 Summary and Roadmap
I Models
1.1 Parallel Vector Models
1.2 Vector Instructions
1.3 Implementation
1.4 Summary and Roadmap
I Models
2 Parallel Vector Models
2.1 The Vector Random Access Machine
2.2 Comparison to P-RAM Models
2.3 Comparison to Circuit and Network Models
2.4 ComparisontoBit-VectorModels
2.5 Selecting Primitives
2.6 Other Issues
2.6.1 Serially Optimal Algorithms
2.6.2 SpaceComplexity
2.6.3 Equal Time Assumption
2.6.4 Do We Need the Scalar Memory?
2.7 Conclusion
2.1 The Vector Random Access Machine
2.2 Comparison to P-RAM Models
2.3 Comparison to Circuit and Network Models
2.4 ComparisontoBit-VectorModels
2.5 Selecting Primitives
2.6 Other Issues
2.6.1 Serially Optimal Algorithms
2.6.2 SpaceComplexity
2.6.3 Equal Time Assumption
2.6.4 Do We Need the Scalar Memory?
2.7 Conclusion
3 The Scan Primitives
3.1 Why Scan Primitives?
3.2 Notation
3.3 Example: Line-of-Sight
3.4 Simple Operations
3.4.1 Example: Split Radix Sort
3.5 Segments and Segmented Scans
3.6 Allocating Elements
3.6.1 Example: Line Drawing
3.6.2 Notes on Allocating
3.7 Long Vectors and Load Balancing
3.7.1 Load Balancing
3.7.2 Example: Halving Merge
3.7.3 Notes on Simulating Long Vectors
3.1 Why Scan Primitives?
3.2 Notation
3.3 Example: Line-of-Sight
3.4 Simple Operations
3.4.1 Example: Split Radix Sort
3.5 Segments and Segmented Scans
3.6 Allocating Elements
3.6.1 Example: Line Drawing
3.6.2 Notes on Allocating
3.7 Long Vectors and Load Balancing
3.7.1 Load Balancing
3.7.2 Example: Halving Merge
3.7.3 Notes on Simulating Long Vectors
4 The Scan Vector Model
4.1 The Scan Vector Instruction Set
4.1.1 Scalar Instructions
4.1.2 Elementwise Instructions
4.1.3 Permute Instructions
4.1.4 Scan Instructions
4.1.5 Vector-Scalar Instructions
4.2 Simple Operations
4.3 Segments and Segmented Instructions
4.4 Segmented Operations
4.5 Additional Instructions
4.5.1 Merge Instruction
4.5.2 Combine Instructions
4.5.3 Multi-Extract Instruction
4.5.4 Keyed-Scan Instructions
4.1 The Scan Vector Instruction Set
4.1.1 Scalar Instructions
4.1.2 Elementwise Instructions
4.1.3 Permute Instructions
4.1.4 Scan Instructions
4.1.5 Vector-Scalar Instructions
4.2 Simple Operations
4.3 Segments and Segmented Instructions
4.4 Segmented Operations
4.5 Additional Instructions
4.5.1 Merge Instruction
4.5.2 Combine Instructions
4.5.3 Multi-Extract Instruction
4.5.4 Keyed-Scan Instructions
II Algorithms
5 Data Structures
5.1 Graphs
5.1.1 Vector Graph Representations
5.1.2 Neighbor Reducing
5.1.3 Distributing an Excess Across Edges
5.2 Trees
5.2.1 Vector Tree Representation
5.2.2 Leaffix and Rootfix Operations
5.2.3 Tree Manipulations
5.3 Multidimensional Arrays
5 Data Structures
5.1 Graphs
5.1.1 Vector Graph Representations
5.1.2 Neighbor Reducing
5.1.3 Distributing an Excess Across Edges
5.2 Trees
5.2.1 Vector Tree Representation
5.2.2 Leaffix and Rootfix Operations
5.2.3 Tree Manipulations
5.3 Multidimensional Arrays
6 Computational-Geometry Algorithms
6.1 Generalized Binary Search
6.2 Building a k-DTree
6.3 ClosestPair
6.4 Quickhull
6.5 n MergeHull
6.6 Line of Sight
6.1 Generalized Binary Search
6.2 Building a k-DTree
6.3 ClosestPair
6.4 Quickhull
6.5 n MergeHull
6.6 Line of Sight
7 Graph Algorithms
7.1 Minimum Spanning Tree and Connectivity
7.2 Maximum Flow
7.3 Maximal Independent Set and Biconnectivity
7.1 Minimum Spanning Tree and Connectivity
7.2 Maximum Flow
7.3 Maximal Independent Set and Biconnectivity
8 Numerical Algorithms
8.1 Matrix-Vector Multiplication
8.2 Linear-SystemsSolver
8.3 Simplex
8.4 Outer Product
8.5 Sparse-Matrix Multiplication
8.1 Matrix-Vector Multiplication
8.2 Linear-SystemsSolver
8.3 Simplex
8.4 Outer Product
8.5 Sparse-Matrix Multiplication
III Languages and Compilers
9 Collection-Oriented Languages
9.1 Collections
9.2 Collection Operations
9.3 Mapping Collections onto Vectors
9 Collection-Oriented Languages
9.1 Collections
9.2 Collection Operations
9.3 Mapping Collections onto Vectors
10 Flattening Nested Parallelism
10.1 Nested Parallelism and Replicating
10.2 The Replicating Theorem
10.3 Access-Restricted Code
10.4 Access-Fixed Replicating Theorem
10.5 Indirect Addressing
10.6 Conditional Control
10.6.1 Branch-Packing
10.6.2 Contained Programs
10.6.3 Containment of Functions in Book
10.6.4 Round-Robin Simulation
10.1 Nested Parallelism and Replicating
10.2 The Replicating Theorem
10.3 Access-Restricted Code
10.4 Access-Fixed Replicating Theorem
10.5 Indirect Addressing
10.6 Conditional Control
10.6.1 Branch-Packing
10.6.2 Contained Programs
10.6.3 Containment of Functions in Book
10.6.4 Round-Robin Simulation
11 A Compiler for Paralation Lisp
11.1 Source Code: Paralation Lisp
11.2 TargetCode:Scan-VectorLisp
11.3 Translation
11.1 Source Code: Paralation Lisp
11.2 TargetCode:Scan-VectorLisp
11.3 Translation
IV Architecture
12 Implementing Parallel Vector Models
12.1 Implementation on the Connection Machine
12.1.1 The Vector Memory
12.1.2 The Instructions
12.1.3 Optimizations
12.1.4 Running Times
12.2 Simulating on P-RAM
12 Implementing Parallel Vector Models
12.1 Implementation on the Connection Machine
12.1.1 The Vector Memory
12.1.2 The Instructions
12.1.3 Optimizations
12.1.4 Running Times
12.2 Simulating on P-RAM
13 Implementing the Scan Primitives
13.1 Unsigned +-Scan and Max-Scan
13.1.1 Tree Scan
13.1.2 Hardware Implementation of Tree Scan
13.1.3 An Example System
13.2 Directly Implementing Other Scans
13.2.1 Backward and Segmented Scans
13.2.2 Multidimensional Grid Scans
13.3 Floating-Point +-Scan
13.1 Unsigned +-Scan and Max-Scan
13.1.1 Tree Scan
13.1.2 Hardware Implementation of Tree Scan
13.1.3 An Example System
13.2 Directly Implementing Other Scans
13.2.1 Backward and Segmented Scans
13.2.2 Multidimensional Grid Scans
13.3 Floating-Point +-Scan
14 Conclusion
Download From
No comments:
Post a Comment