21-765.
Introduction to Parallel Computing and Scientific
Computation
Location: POS 147
Time: Fri 2:00-3:50pm
First Lecture: Jan 17, 2025
Projects due: graduating students - May 4, 2025; non-graduating
students - May 9, 2025
Level: Introductory
Coverage: General
The objectives of this course are:
- to develop a structural intuition for how the hardware and the
software work together, from simple systems to complex
shared-resource architectures;
- to provide guidelines on how to write and document a
software package;
- to familiarize the audience with the main parallel programming
techniques and the common software packages/libraries.
General Considerations
The course is intended to be self-contained; no prior computer
skills are required. However, familiarity with the C programming
language and the Unix command line should give the student more
time to concentrate on the core issues of the course, such as
hardware structure, operating system and networking insights, and
numerical methods.
The main idea of the course is to give the student a hands-on
experience of writing a simple software package that eventually
can be implemented on a parallel computer architecture. All the
steps and components of the process (defining the problem,
numerical algorithms, program design, coding, different levels of
documentation) are treated at a basic level. Everything is done in
the context of a structured vision of the computing environment.
The typical programming environment makes the computer hardware
and operating system transparent to the user. In contrast, each
program intended for efficient parallel execution must consider
the custom physical and logical communication topology of the
processors in a parallel system. The course gives an overview of
the entire range of issues that a developer should consider
when designing a parallel algorithm, from principles to details.
The knowledge provided by the course should be enough to help the
audience decide which technique is most appropriate for a problem
on a given computer architecture. However, developing an efficient
algorithm will require considerable additional study, practice,
and experimental work.
The examples, exercises, and projects were determined by the
computers and software available for practice. The following were
preferred: the C language, the x86_64 hardware platform, and the
Linux operating system. However, the presentation will be kept
general enough that the student is prepared for any real parallel
computing environment. The individual study and the midterm
project are based on Python.
The course contains three parts:
The first part makes the connection between real life and the
computer world.
- Module 1: software package structure, design, development, and
maintenance concerns.
- Module 2: parallel computing basic concepts and programming
techniques: SMP, MPI, domain/data decomposition, deadlocks,
hybrid programming.
- Module 3: tools for programming and cluster management: git,
remote access/key management, schedulers.
- Module 4: how to transform a real life problem into a
sequential computer algorithm, with reference to basic numerical
methods.
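As a small illustration of the domain/data decomposition idea named
in Module 2, the sketch below splits a list of values into nearly
equal contiguous blocks, one per worker, and combines the partial
sums. It is only a serial Python sketch under stated assumptions:
the function names block_range and decomposed_sum are hypothetical,
and in a real parallel program each block would be handled by a
separate process or thread.

```python
def block_range(n_items, n_workers, rank):
    """Return the half-open index range [start, stop) owned by `rank`."""
    base, extra = divmod(n_items, n_workers)
    # The first `extra` workers each take one additional item,
    # so block sizes differ by at most one.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def decomposed_sum(values, n_workers):
    """Sum `values` by adding per-worker partial sums (run serially here)."""
    partials = []
    for rank in range(n_workers):
        start, stop = block_range(len(values), n_workers, rank)
        # In a parallel setting this partial sum would be computed
        # independently by worker `rank`.
        partials.append(sum(values[start:stop]))
    # Combining the partial results plays the role of a reduction step.
    return sum(partials)
```

For example, block_range(10, 3, 0) gives the first worker the indices
0 through 3, and the remaining two workers get three items each.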
The second part provides the background needed to understand how
computer systems work.
- Module 5: the basics of computer hardware, presented as a
layered model.
- Module 6: a model of structural information
organization with applications to filesystems and storage.
- Module 7: a typical operating system, user interfaces, shell,
process communications, user level issues.
- Module 8: programming notions with applications to the C
language, libraries, compilers, debuggers.
- Module 9: computer networks, topology, and layered
communication protocols.
The third part explores the world of high-performance computing.
- Module 10: how to take advantage of multiple cores (SMP)
through multi-threading and OpenMP.
- Module 11: the MPI standard, several common implementations,
additional library issues.
- Module 12: the PETSc library, an interesting application of
MPI for real life simulations.
- Module 13: GPU computing: CUDA and OpenACC.
- Module 14: modern developments: Big Data (Spark), Artificial
Intelligence (Decision Trees, Neural Networks/TensorFlow).
Credit:
Grading is based on three components:
- Class attendance (30%)
- Merits of the final project (50%)
- Midterm take-home test and/or in-class short quizzes (20%)
Students will need to pick a final project no later than the third
week of the semester and to deliver milestones every other week. A
project consists of developing a (simple) software package or module
that has a defined practical purpose. More details and alternatives
are here. Students are welcome to discuss with the instructor
projects close to their scientific interests, or to pick one of the
offered projects.
Merits of the final project considered for grading are:
- how well the program is structured to allow easy further
development and easy debugging (diagram of modules and
APIs)
- the quality (and not the length) of the documentation
- how functional, efficient, and/or innovative the numerical
algorithm is (tested on the provided examples)
Homework will be assigned after most
lectures. Submitting solutions is not mandatory, but can add a
bonus to the final grade. The purpose of the homework is to help
develop practical skills and get used to the computing
environment. The homework is recommended for students auditing the
lectures as well. Students are encouraged to submit solutions
containing interesting approaches or comments.
Administrivia
- Students are welcome to participate for credit or for fun.
Unregistered students should express their interest by e-mail to
florin@andrew.cmu.edu
or in person (Wean Hall, Room 6218) at any time before the second
day of classes of the Spring Semester.
- Third-party websites used during the class (please create accounts
if you don't already have them):
- I'm always available for consultations and for discussions
regarding the projects and the curriculum.