Skip to end of metadata
Go to start of metadata

This space is set for discussing the Parallel Computing Curriculum

Proposed Courses:

(Instructions for course proposals)

Banner Form
CS421: Parallel programming (for cs majors)
CS421_parProg_BannerForm syllabus_421  
CS 4xx: Performance programming
CS4xx_perfProg_BannerForm syllabus_4xx_perfProg  
CS 5yy: Theory of Parallel Computing
CS 5yy ParallelTheory Form CS 5yy Syllabus  
CS 5zz: Advanced(?) Topics in Parallel Computing
ParallelTopics_Banner ParallelTopicsSyllabus  
Existing: CS 554: Parallel Numerical Algorithms

Recently Updated

Navigate space

Existing & Proposed Courses

Points for Discussion

Why do we feel we should teach shared memory parallel programming models as an application-level programming model (or at least all but the most advanced application programmer)?  It is clearly important to understand what happens at the low levels (OS, what the compiler for a data-parallel programming language might need, implementing a message-passing model), but is it really important for building applications?  If message-passing is not sufficient (and I don't necessarily mean MPI; we could use erlang, for example), is there something other than general threads and locks that makes sense?  This comment was stimulated by a discussion at EC2 ( http://www.cse.psu.edu/~swarat/ec2/program.html ) about using teaching concurrency.  Should we go out on a limb and make this a virture?

4xx Parallel Programming (ParP).

Course is intended for CS majors and would be the main introduction to parallel programming.
Taught once a year – possibly twice if has many non-CS students.
Taught by Kale/Padua/Gropp/Snir/Garzaran/...


Overview of architectures.

Architectural characterization of most important parallel systems today. Issues in effective programming of parallel architectures: exploitation of parallelism, locality (cache, registers), load balancing, communication, overhead, consistency, coherency, latency avoidance.

Programming paradigms.

By the data: Partitioned data, global view of data, no state. By control: Partitioned control, global view of control, functional control . Parallel programming for performance vs. parallel programming for productivity. Survey of programming languages. OpenMP, MPI, TBB, Charm++.


Performance monitors. Debuggers.


Simplified models of the issues mentioned above. Exploitation of parallelism (e.g. PRAM, dependence graphs), communication (latency/ bandwidth), load balancing, (scheduling theory, practical scheduling algorithms), speedup, efficiency, redundancy, isoefficiency.

Programming principles.

Reactive parallel programming. Synchronization strategies, critical regions, atomic updates, races, deadlock avoidance, prevention, livelock, starvation, scheduling fairness.


Enhancing parallelism, dependence analysis. Tiling for locality and communication. Aggregation for communication. Load balancing strategies, virtualization. Speculative parallelization, transactional memories. Bottlenecks, small critical sections.


Basic algorithms: Fully (embarrassingly) parallel computation, reduction, parallel prefix, ... Linear algebra algorithms: Matrix multiplication, LU decomposition, Jacobi relaxation, fixed point iterations. Sorting and searching. Graph algorithms, datamining algorithms. Database algorithms. N Body Problem. Computer network (routing) algorithms.

• Should probably reduce the models part (assuming we have a theory course) – but still cover basic concepts (speedup, weak/strong scaling, etc.)
• Should reduce the optimization part (assume we have performance programming course)
• Should cover message-passing (dare I say MPI?), and possibly PGAS, as well as shared memory.
• MPs to program parallel codes – need time on parallel platforms (shared memory / distributed memory)
• Would lock-free algorithms be covered here? If so, also need to cover related memory consistency issues.


• Principles of Parallel Programming Calvin Lin, Larry Snyder (2008) (have not seen yet)• Parallel Programming in C with MPI and OpenMP by Michael J. Quinn (2003) (Google books)• Patterns for Parallel Programming by Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill (2004) (not a textbook)• Introduction to Parallel Computing (2nd Edition) by Ananth Grama, George Karypis, Vipin Kumar, and Anshul Gupta (2003). (Lecture slides).
• Parallel Programming (2nd Edition) by Barry Wilkinson and Michael Allen (2005) (looks decent, and is soft cover; slides for the book)

  • The art of Parallel Programming (2nd Edition) by Bruce Lester (2006). (preface)

4xx Performance Programming (PerfP)

David/Mahe – please send current draft submitted by Garzaran to committee

Intended for CS majors
Taught once a year
Taught by Garzaran/Padua/Kale/Snir/Gropp

Possible topics.

  • Sequential Performance bottlenecks: CPU (execution units, vectorization, registers, register renaming, pipelining); caches (temporal and spatial locality, cache conflicts); memory (latency, row/column, read/write), I/O...
  • Parallel performance bottlenecks: Amdahl, load imbalance, communication, false sharing, granularity of communication (distributed memory)
  • Tools: profiling, performance tools, compilers
  • Some queuing models (memory, networks)?
  • Algorithm choice and performance (say something about need to evaluate algorithm choice). Also transformations that do not provide bit-for-bit identity in the computation (e.g., associativity in floating-point computations).


• Is parallel programming a prereq/coreq?
• Multiple opportunities for hands-on MPs

• Parallel Computer Architecture; A Hardware/Software Approach by David Culler and Jaswinder Pal Singh (1999) (Google books)
• Software Performance and Scalability: A Quantitative Approach Henry H. Liu (2009)
• The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling R. K. Jain (1991)
• Performance Optimization of Numerically Intensive Codes by Adolfy Hoisie (2001)
• Measuring Computer Performance: A Practitioner's Guide, David Lilja (2001)

5xx Theory of parallel computing (TPP)

Course is intended for graduates or undergraduates with suitable background
Taught once a year
Taught by Snir/Gropp


Parallel computing models:

  • PRAM, Bulk, postal model, LogP + congestion.
  • Discussion of the trade-off between model simplicity and features represented

Simple parallel algorithms:

  • reduction, scan, sorting, matrix algorithms, FFT
  • graph algorithms
  • computational geometry (convex hull, triangulation)

Lower bounds:

  • I/O complexity, communication complexity

Synchronization and communication structures and algorithms:

  • Counting networks, diffracting trees, sorting networks
  • Load balancing and work stealing algorithms
  • skip-lists, hash sets, parallel search
  • barriers, distributed termination (static, dynamic, dense, sparse)

Systolic algorithms:

  • Retiming, VLSI complexity


  • Topologies: Butterfly, Benes, Fat trees, meshes, hypercubes
  • Routing: circuit switching, packet switching, wormhole routing
  • Deterministic and randomized routing – lower and upper bounds
  • Graph embedding: upper and lower bounds

Relations between models

  • Simulation of PRAM on network


• Topics are preliminary – list will evolve
• Need to coordinate with parallel programming course to reduce overlap.

• Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes by Frank Thomson Leighton, 1991
• Parallel Algorithms (Chapman & Hall/CRC Numerical Analy & Scient Comp. Series) by Henri Casanova, Arnaud Legrand, and Yves Robert, 2008

  • The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit (2008) +(focused on concurrent programming)_
  • Introduction to Parallel Algorithms by Joseph JaJa (1992)

5xx Advanced Topics in parallel computing

Regular course (it is not a "special topics" course). It will have a fixed component (about half) and a variable component that the instructor may choose depenind on their interest and the current hot topics.

in such issues as heterogeneous programming (GPGPU), run-times for parallel programming, emerging parallel languages, etc.

The following is a current list of fixed topics. Feel free to edit to add. I will paste into the standard form word document later.

The numbers at the left are number of 50 minute lectures devoted to that topic (in the proposal.. in reality, the instructor will control it more)

  • 1: Introduction and Review:  Parallel Architectures, basic programming models
  • 4: Advanced interconnects, network contention, latency tolerance, topology aware mapping
  • 4: Advanced parallel global address space (PGAS) models
  • 3: Jiiter/noise: Origins of computational and communication noise, and its impact on performance, strategies for mitigation: asynchronous reductions, transient load balancing
  • 3: Fault Tolerance
  • 3: Petascale architectures, exascale architecture issues
  • 3: Energy and Thermal considerations in parallel computing
  • 3: Advanced and Hybrid parallel programming, multi-paradigm programming
  • 3: Domain specific frameworks
  • 3: adaptive refinement: Structured AMR, Delauny mesh refinement,
  • 3: Performance Modeling, simulation and prediction
  • 15: Variable content, covering recent advances, or optional topics, at the instructors discretion
  • 1 Midterm
  • 42 Total(Doesn't add up at the moment)

CS 554 Parallel Numerical Algorithms

Has significant overlap with other proposed courses – should involve Mike in discussion



  • Motivation
  • Architectures
    • Taxonomy
    • Memory Organization
  • Networks
    • Network Topologies
    • Graph Embedding
  • Communication
    • Message Routing
    • Communication Concurrency
    • Collective Communication


  • Computational Model
  • Design Methodology
    • Partitioning
    • Communication
    • Agglomeration
    • Mapping
  • Example


  • Parallel Programming Paradigms
    • Functional languages
    • Parallelizing compilers
    • Object parallel
    • Data parallel
    • Shared memory
    • Remote memory access
    • Message passing
  • MPI – Message-Passing Interface
    • MPI Basics
    • Communication and Communicators
    • Virtual Process Topologies
    • Performance Monitoring and Visualization


  • Analysis
    • Basic Definitions
    • Amdahl's Law
    • Problem Scaling
    • Isoefficiency and Scalability
    • Speedup and efficiency
    • Scalability
    • Communication modeling
    • Isoefficiency function
  • Modeling


  • Inner product
  • Outer product
  • Matrix-vector product
  • Matrix-matrix product
  • Row, column, and submatrix partitioning


  • ijk forms of Gaussian elimination
  • Row, column, and submatrix partitioning
  • Partial pivoting and alternatives


  • ijk forms of Cholesky factorization
  • Memory access patterns
  • Data dependences
  • Sparse Cholesky factorization
  • Elimination trees
  • Fan-out, fan-in, and multifrontal algorithms


  • Row vs. column partitioning
  • Fan-in and fan-out algorithms
  • Wavefront algorithms
  • Cyclic algorithms
  • Submatrix partitioning


  • Cyclic reduction


  • Stationary iterative methods
  • Conjugate gradient method
  • Preconditioning
  • Partitioning
  • Coloring


  • Householder QR factorization
  • Givens QR factorization


  • Power and inverse iteration
  • Preliminary reduction and QR iteration
  • Arnoldi and Lanczos methods
  • Spectrum-slicing methods
  • Divide-and-conquer method
  • Singular value decomposition


  • Discrete Fourier transform
  • FFT algorithm
  • Binary exchange algorithm
  • Transpose algorithm


  • Nonlinear equations
  • Optimization
  • Numerical integration
  • Ordinary differential equations


  • Domain decomposition
  • Schwarz method
  • Schur method
  • Multigrid


Additional resources:

Ganesh Gopalakrishnan has some notes on multicore programming at http://www.eng.utah.edu/~cs5966/ .

Course at IIT (Spring 2010): CS546 Parallel and Distributed Processing
<!-- /* Style Definitions */ table.MsoNormalTable

Unknown macro: {mso-style-name}

-->| 1 | Introduction and Review:  Parallel Architectures, basic programming models |

4 Advanced interconnects, network contention, latency tolerance, topology aware mapping
4 Advanced parallel global addresss space (PGAS) models
3 Jiiter/noise: Origins of computational and communication noise, and its impact on performance, strategies for mitigation: asynchronous reductions, transient load balancing
3 Petascale architectures, exascale architecture issues
2 Energy and Thermal considerations in parallel computing
4 Advanced parallel global addresss space (PGAS) models
Advanced and Hybrid parallel programming
Performance Modeling, simulation and prediction
Variable content, covering recent advances, or optional topics, at the instructors discretion
1 Midterm
42 Total
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.