This space is set up for discussing the Parallel Computing Curriculum.
Proposed Courses:
(Instructions for course proposals)
Each course below links to its Banner form and syllabus:

CS 421: Parallel Programming (for CS majors) – CS421_parProg_BannerForm, syllabus_421
CS 4xx: Performance Programming – CS4xx_perfProg_BannerForm, syllabus_4xx_perfProg
CS 5yy: Theory of Parallel Computing – CS 5yy ParallelTheory Form, CS 5yy Syllabus
CS 5zz: Advanced(?) Topics in Parallel Computing – ParallelTopics_Banner, ParallelTopicsSyllabus
Existing: CS 554: Parallel Numerical Algorithms

Points for Discussion
Why do we feel we should teach shared-memory parallel programming models as an application-level programming model (or at least to all but the most advanced application programmers)? It is clearly important to understand what happens at the low levels (the OS, what the compiler for a data-parallel programming language might need, implementing a message-passing model), but is it really important for building applications? If message passing is not sufficient (and I don't necessarily mean MPI; we could use Erlang, for example), is there something other than general threads and locks that makes sense? This comment was stimulated by a discussion at EC2 ( http://www.cse.psu.edu/~swarat/ec2/program.html ) about teaching concurrency. Should we go out on a limb and make this a virtue?
4xx Parallel Programming (ParP).
Course is intended for CS majors and would be the main introduction to parallel programming.
Taught once a year – possibly twice if it has many non-CS students.
Taught by Kale/Padua/Gropp/Snir/Garzaran/...
Curriculum:
Overview of architectures.
Architectural characterization of most important parallel systems today. Issues in effective programming of parallel architectures: exploitation of parallelism, locality (cache, registers), load balancing, communication, overhead, consistency, coherency, latency avoidance.
Programming paradigms.
By the data: partitioned data, global view of data, no state. By control: partitioned control, global view of control, functional control. Parallel programming for performance vs. parallel programming for productivity. Survey of programming languages: OpenMP, MPI, TBB, Charm++.
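As a toy illustration of the partitioned-data, message-passing style next to these paradigms, a minimal Python sketch (threads with queues as channels are our choice for illustration, not part of the course material):

```python
# Message-passing style in miniature: workers own disjoint data
# partitions and send partial sums over an explicit channel (a queue),
# rather than updating shared state under a lock.
import threading
from queue import Queue

def worker(partition, out):
    out.put(sum(partition))  # send local result; no shared mutable state

data = list(range(100))
chunks = [data[i::4] for i in range(4)]  # partitioned view of the data
out = Queue()
threads = [threading.Thread(target=worker, args=(c, out)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = sum(out.get() for _ in range(4))
print(total)  # → 4950
```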
Tools.
Performance monitors. Debuggers.
Models.
Simplified models of the issues mentioned above. Exploitation of parallelism (e.g. PRAM, dependence graphs), communication (latency/ bandwidth), load balancing, (scheduling theory, practical scheduling algorithms), speedup, efficiency, redundancy, isoefficiency.
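A few of these quantities can be made concrete in a couple of lines; a minimal Python sketch of speedup and efficiency under Amdahl's law (the serial fraction used here is purely illustrative):

```python
# Speedup and efficiency under Amdahl's law: a program with serial
# fraction s on p processors achieves S(p) = 1 / (s + (1 - s) / p),
# so speedup is bounded by 1/s no matter how large p grows.
def amdahl_speedup(serial_fraction, p):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def efficiency(speedup, p):
    return speedup / p

s = 0.05  # illustrative serial fraction
for p in (1, 16, 256):
    sp = amdahl_speedup(s, p)
    print(p, round(sp, 2), round(efficiency(sp, p), 3))
```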
Programming principles.
Reactive parallel programming. Synchronization strategies, critical regions, atomic updates, races, deadlock avoidance, prevention, livelock, starvation, scheduling fairness.
Optimizations.
Enhancing parallelism, dependence analysis. Tiling for locality and communication. Aggregation for communication. Load balancing strategies, virtualization. Speculative parallelization, transactional memories. Bottlenecks, small critical sections.
Algorithms.
Basic algorithms: fully (embarrassingly) parallel computation, reduction, parallel prefix, ... Linear algebra algorithms: matrix multiplication, LU decomposition, Jacobi relaxation, fixed-point iterations. Sorting and searching. Graph algorithms, data-mining algorithms. Database algorithms. N-body problem. Computer network (routing) algorithms.
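Parallel prefix, one of the basic algorithms above, can be sketched by simulating its rounds sequentially; a minimal Python illustration (Hillis-Steele style, where each round reads the previous round's values as a PRAM step would):

```python
# Inclusive parallel prefix (scan) simulated round by round: in round d,
# every element x[i] with i >= d adds x[i - d] "in parallel";
# d doubles each round, so there are ceil(log2(n)) rounds in total.
def inclusive_scan(values):
    x = list(values)
    d = 1
    while d < len(x):
        prev = list(x)  # all reads use the previous round's values
        for i in range(d, len(x)):
            x[i] = prev[i - d] + prev[i]
        d *= 2
    return x

print(inclusive_scan([1, 2, 3, 4, 5]))  # → [1, 3, 6, 10, 15]
```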
Remarks:
• Should probably reduce the models part (assuming we have a theory course) – but still cover basic concepts (speedup, weak/strong scaling, etc.)
• Should reduce the optimization part (assume we have performance programming course)
• Should cover message passing (dare I say MPI?), and possibly PGAS, as well as shared memory.
• MPs to program parallel codes – need time on parallel platforms (shared memory / distributed memory)
• Would lock-free algorithms be covered here? If so, we also need to cover the related memory consistency issues.
Resources:
• Principles of Parallel Programming by Calvin Lin and Larry Snyder (2008) (have not seen yet)
• Parallel Programming in C with MPI and OpenMP by Michael J. Quinn (2003) (Google books)
• Patterns for Parallel Programming by Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill (2004) (not a textbook)
• Introduction to Parallel Computing (2nd Edition) by Ananth Grama, George Karypis, Vipin Kumar, and Anshul Gupta (2003) (lecture slides)
• Parallel Programming (2nd Edition) by Barry Wilkinson and Michael Allen (2005) (looks decent, and is soft cover; slides for the book)
• The Art of Parallel Programming (2nd Edition) by Bruce Lester (2006) (preface)
4xx Performance Programming (PerfP)
David/Mahe – please send current draft submitted by Garzaran to committee
Intended for CS majors
Taught once a year
Taught by Garzaran/Padua/Kale/Snir/Gropp
Possible topics.
 Sequential performance bottlenecks: CPU (execution units, vectorization, registers, register renaming, pipelining); caches (temporal and spatial locality, cache conflicts); memory (latency, row/column, read/write); I/O...
 Parallel performance bottlenecks: Amdahl, load imbalance, communication, false sharing, granularity of communication (distributed memory)
 Tools: profiling, performance tools, compilers
 Some queuing models (memory, networks)?
 Algorithm choice and performance (say something about the need to evaluate algorithm choice). Also transformations that do not provide bit-for-bit identity in the computation (e.g., associativity in floating-point computations).
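The last point (non-associativity of floating-point addition) can be demonstrated in a few lines; the values below are chosen so that the rounding is visible:

```python
# Floating-point addition is not associative: regrouping a reduction,
# as any parallel sum must, can change the rounded result.
# At magnitude 1e16 the spacing between adjacent doubles is 2.0,
# so adding 1.0 rounds away while adding 2.0 is exact.
a, b, c = 1e16, 1.0, 1.0

left_to_right = (a + b) + c   # 1e16 + 1.0 rounds back to 1e16, twice
regrouped = a + (b + c)       # 1e16 + 2.0 is exactly representable

print(left_to_right == regrouped)  # → False
```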
Remarks:
• Is parallel programming a prereq/coreq?
• Multiple opportunities for handson MPs
Resources:
• Parallel Computer Architecture; A Hardware/Software Approach by David Culler and Jaswinder Pal Singh (1999) (Google books^{})
• Software Performance and Scalability: A Quantitative Approach Henry H. Liu (2009)
• The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling R. K. Jain (1991)
• Performance Optimization of Numerically Intensive Codes by Adolfy Hoisie (2001)
• Measuring Computer Performance: A Practitioner's Guide, David Lilja (2001)
5xx Theory of parallel computing (TPP)
Course is intended for graduates or undergraduates with suitable background
Taught once a year
Taught by Snir/Gropp
Topics
Parallel computing models:
 PRAM, bulk-synchronous (BSP), postal model, LogP + congestion.
 Discussion of the tradeoff between model simplicity and features represented
Simple parallel algorithms:
 reduction, scan, sorting, matrix algorithms, FFT
 graph algorithms
 computational geometry (convex hull, triangulation)
Lower bounds:
 I/O complexity, communication complexity
Synchronization and communication structures and algorithms:
 Counting networks, diffracting trees, sorting networks
 Load balancing and work stealing algorithms
 Skip lists, hash sets, parallel search
 barriers, distributed termination (static, dynamic, dense, sparse)
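Sorting networks from the list above run a fixed, data-oblivious schedule of compare-exchange steps; odd-even transposition sort is the simplest instance, sketched here in Python purely for illustration:

```python
# A sorting network applies comparators in a fixed order that does not
# depend on the data. Odd-even transposition sort uses n rounds of
# alternating even/odd comparator columns; within a column every
# compare-exchange could run in parallel, since the pairs are disjoint.
def odd_even_transposition_sort(values):
    x = list(values)
    n = len(x)
    for rnd in range(n):
        start = rnd % 2  # alternate even and odd comparator columns
        for i in range(start, n - 1, 2):
            if x[i] > x[i + 1]:
                x[i], x[i + 1] = x[i + 1], x[i]  # compare-exchange
    return x

print(odd_even_transposition_sort([5, 1, 4, 2, 3]))  # → [1, 2, 3, 4, 5]
```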
Systolic algorithms:
 Retiming, VLSI complexity
Networks:
 Topologies: Butterfly, Benes, Fat trees, meshes, hypercubes
 Routing: circuit switching, packet switching, wormhole routing
 Deterministic and randomized routing – lower and upper bounds
 Graph embedding: upper and lower bounds
Relations between models
 Simulation of PRAM on network
Remarks:
• Topics are preliminary – list will evolve
• Need to coordinate with parallel programming course to reduce overlap.
Resources:
• Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes by Frank Thomson Leighton, 1991
• Parallel Algorithms (Chapman & Hall/CRC Numerical Analy & Scient Comp. Series) by Henri Casanova, Arnaud Legrand, and Yves Robert, 2008
• The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit (2008) (focused on concurrent programming)
• Introduction to Parallel Algorithms by Joseph JaJa (1992)
5xx Advanced Topics in parallel computing
Regular course (it is not a "special topics" course). It will have a fixed component (about half) and a variable component that the instructor may choose depending on their interests and the current hot topics, in such areas as heterogeneous programming (GPGPU), runtimes for parallel programming, emerging parallel languages, etc.
The following is the current list of fixed topics. Feel free to edit to add. I will paste it into the standard-form Word document later.
The numbers at the left are the number of 50-minute lectures devoted to each topic (in the proposal; in reality, the instructor will control it more).
 1: Introduction and Review: Parallel Architectures, basic programming models
 4: Advanced interconnects, network contention, latency tolerance, topology aware mapping
 4: Advanced parallel global address space (PGAS) models
 3: Jitter/noise: origins of computational and communication noise, its impact on performance, and strategies for mitigation: asynchronous reductions, transient load balancing
 3: Fault Tolerance
 3: Petascale architectures, exascale architecture issues
 3: Energy and Thermal considerations in parallel computing
 3: Advanced and Hybrid parallel programming, multiparadigm programming
 3: Domain specific frameworks
 3: Adaptive refinement: structured AMR, Delaunay mesh refinement
 3: Performance Modeling, simulation and prediction
 15: Variable content, covering recent advances or optional topics, at the instructor's discretion
 1 Midterm
 42 Total (the lecture counts above currently sum to 49, so this doesn't add up at the moment)
CS 554: Parallel Numerical Algorithms
Has significant overlap with other proposed courses – should involve Mike in discussion
Syllabus
PARALLEL COMPUTING
 Motivation
 Architectures
 Taxonomy
 Memory Organization
 Networks
 Network Topologies
 Graph Embedding
 Communication
 Message Routing
 Communication Concurrency
 Collective Communication
PARALLEL ALGORITHM DESIGN
 Computational Model
 Design Methodology
 Partitioning
 Communication
 Agglomeration
 Mapping
 Example
PARALLEL PROGRAMMING
 Parallel Programming Paradigms
 Functional languages
 Parallelizing compilers
 Object parallel
 Data parallel
 Shared memory
 Remote memory access
 Message passing
 MPI – MessagePassing Interface
 MPI Basics
 Communication and Communicators
 Virtual Process Topologies
 Performance Monitoring and Visualization
PARALLEL PERFORMANCE
 Analysis
 Basic Definitions
 Amdahl's Law
 Problem Scaling
 Isoefficiency and Scalability
 Speedup and efficiency
 Scalability
 Communication modeling
 Isoefficiency function
 Modeling
VECTOR AND MATRIX PRODUCTS
 Inner product
 Outer product
 Matrixvector product
 Matrixmatrix product
 Row, column, and submatrix partitioning
LU FACTORIZATION
 ijk forms of Gaussian elimination
 Row, column, and submatrix partitioning
 Partial pivoting and alternatives
CHOLESKY FACTORIZATION
 ijk forms of Cholesky factorization
 Memory access patterns
 Data dependences
 Sparse Cholesky factorization
 Elimination trees
 Fanout, fanin, and multifrontal algorithms
TRIANGULAR SYSTEMS
 Row vs. column partitioning
 Fanin and fanout algorithms
 Wavefront algorithms
 Cyclic algorithms
 Submatrix partitioning
BAND AND TRIDIAGONAL SYSTEMS
 Cyclic reduction
ITERATIVE METHODS FOR LINEAR SYSTEMS
 Stationary iterative methods
 Conjugate gradient method
 Preconditioning
 Partitioning
 Coloring
QR FACTORIZATION
 Householder QR factorization
 Givens QR factorization
EIGENVALUE PROBLEMS
 Power and inverse iteration
 Preliminary reduction and QR iteration
 Arnoldi and Lanczos methods
 Spectrumslicing methods
 Divideandconquer method
 Singular value decomposition
FAST FOURIER TRANSFORM
 Discrete Fourier transform
 FFT algorithm
 Binary exchange algorithm
 Transpose algorithm
OTHER NUMERICAL PROBLEMS
 Nonlinear equations
 Optimization
 Numerical integration
 Ordinary differential equations
PARTIAL DIFFERENTIAL EQUATIONS
 Domain decomposition
 Schwarz method
 Schur method
 Multigrid
PARTICLE SIMULATIONS
Additional resources:
Ganesh Gopalakrishnan has some notes on multicore programming at http://www.eng.utah.edu/~cs5966/ .
Course at IIT (Spring 2010): CS546 Parallel and Distributed Processing