Multiprocessor Scheduling

Thread Scheduling

How is the execution of threads actually scheduled?

You know that a GPU SM (streaming multiprocessor) processes 4 warps at the same time.

Warp vs. block scheduling? https://stackoverflow.com/questions/64624793/warp-and-block-scheduling-in-cuda-what-exactly-happens-and-questions-about-el

4 ways of doing thread scheduling:

  1. Load sharing: Processes are not assigned to a particular processor
  2. Dedicated processor assignment: Threads are assigned to specific processors
  3. Gang scheduling: A set of related threads is scheduled to run on a set of processors at the same time
  4. Dynamic scheduling: The number of threads in a process can be altered during the course of execution

In the simplest design, each processor has its own queue; load sharing replaces these per-processor queues with a single global one.

1. Load Sharing

What is the difference between load sharing and load balancing?

Load sharing does not take each processor's current load or capacity into account when distributing work; work simply goes to whichever processor is free next.

In load sharing, there is a global queue, and idle processors pick processes from this queue. As a result, the load ends up distributed evenly across the processors.
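
A minimal C sketch of this mechanism (the integer "tasks", the counts, and all names are illustrative, not a real scheduler API): one global queue protected by a single mutex, with every worker popping from it.

```c
/*
 * Minimal load-sharing sketch: one global queue, one mutex, N workers.
 * The integer "tasks" and all names are illustrative, not a real API.
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4
#define NUM_TASKS   16

static int tasks[NUM_TASKS];
static int next_task = 0;   /* index of the next unclaimed task */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;
    for (;;) {
        /* Every pop takes the same lock: this is the mutual-exclusion
         * bottleneck listed under the disadvantages below. */
        pthread_mutex_lock(&queue_lock);
        if (next_task >= NUM_TASKS) {        /* queue drained */
            pthread_mutex_unlock(&queue_lock);
            return NULL;
        }
        int task = tasks[next_task++];
        pthread_mutex_unlock(&queue_lock);

        /* "Run" the task: whichever worker is free grabs the next one,
         * so no processor idles while work is available. */
        printf("worker %ld runs task %d\n", id, task);
    }
}

int main(void)
{
    pthread_t workers[NUM_WORKERS];
    for (int i = 0; i < NUM_TASKS; i++)
        tasks[i] = i;
    for (long i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    return 0;
}
```

Note that every pop goes through the same lock, which is exactly the contention problem raised in the disadvantages below.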

Advantages of Load Sharing

  • No processor is left idle while there are processes available.
  • No centralized scheduler required
  • Transparent for the developer

This seems good at first. But with multiple processors extracting from the same queue there is a race condition, so the synchronization needed to protect the queue may end up being a bottleneck (imagine when you have dozens of cores). And whenever a thread resumes on another core, it starts with a cold cache, so you pay cache misses.

Disadvantages of Load Sharing

  • The central queue needs mutual exclusion and can become a bottleneck
  • Preempted threads are unlikely to resume execution on the same processor (cache misses)
  • If all threads go into the global queue, it is unlikely that all threads of one program will gain access to processors at the same time

2. Dedicated Processor Assignment

Threads are dedicated to specific processors until the application completes.

There is no multiprogramming! This looks bad and extremely wasteful: if a thread blocks, its processor simply sits idle. However, look at the scenario where you have 1000 processor cores: utilization of any single processor stops being the critical metric, and because threads never switch, there is zero scheduling overhead.
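
On Linux, dedicated assignment can be approximated from user space with CPU affinity. A sketch, assuming the GNU extension pthread_attr_setaffinity_np() and a machine with at least NUM_THREADS cores (the thread count and names are illustrative):

```c
/*
 * Dedicated-assignment sketch for Linux, using the GNU extension
 * pthread_attr_setaffinity_np(). Thread i is pinned to core i before
 * it starts, so it never migrates. An approximation only: the kernel
 * may still run other processes on those cores.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NUM_THREADS 4

static void *work(void *arg)
{
    /* sched_getcpu() reports the core this thread actually runs on. */
    printf("thread %ld pinned to CPU %d\n", (long)arg, sched_getcpu());
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++) {
        /* Build an affinity mask containing exactly one CPU. */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET((int)i, &set);

        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
        pthread_create(&threads[i], &attr, work, (void *)i);
        pthread_attr_destroy(&attr);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```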

3. Gang Scheduling

The concept of scheduling a set of processes simultaneously on a set of processors predates the use of threads; [JONE80] refers to the concept as group scheduling.

This approach has the following performance benefits:

  • If processes in the group are related or coordinated in some fashion, synchronization blocking may be reduced, less process switching may be necessary, and performance will increase
  • A single scheduling decision affects a number of processors and processes at one time, reducing scheduling overhead

Threads often need to synchronize with each other. Therefore, the idea is to run related threads together by simultaneously scheduling all the threads that make up a single process.
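
A small C sketch of why this matters (illustration only; a normal user program cannot request gang scheduling from a stock Linux kernel): threads that synchronize at a barrier advance at the pace of the slowest member, so if the scheduler deschedules one thread of the gang, every other member stalls at the barrier.

```c
/*
 * Gang-scheduling motivation: threads that synchronize at a barrier
 * advance at the pace of the slowest member. If the OS deschedules one
 * member, the whole gang stalls at the barrier until it runs again.
 */
#include <pthread.h>
#include <stdio.h>

#define GANG_SIZE 4
#define STEPS     3

static pthread_barrier_t barrier;

static void *member(void *arg)
{
    long id = (long)arg;
    for (int step = 0; step < STEPS; step++) {
        /* ...this thread's share of the computation for `step`... */
        printf("thread %ld finished step %d\n", id, step);

        /* Nobody proceeds to step+1 until ALL members arrive here,
         * which is exactly why the gang should run simultaneously. */
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

int main(void)
{
    pthread_t gang[GANG_SIZE];
    pthread_barrier_init(&barrier, NULL, GANG_SIZE);
    for (long i = 0; i < GANG_SIZE; i++)
        pthread_create(&gang[i], NULL, member, (void *)i);
    for (int i = 0; i < GANG_SIZE; i++)
        pthread_join(gang[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```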

Success

This is useful for applications where performance severely degrades when any part of the application is not running.

Disadvantages

  • When the gangs do not exactly fill the available processors, some processors are left idle, so processor time can be wasted

4. Dynamic Scheduling

The number of threads in a process is altered dynamically by the application. This requires a layer of indirection that maps computation tasks to threads.
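
A C sketch of the idea (the pool_grow() helper and MAX_WORKERS limit are made up for illustration, not a real API): tasks sit behind a shared counter, which acts as the layer of indirection between tasks and threads, and the application itself decides how many worker threads exist at any moment.

```c
/*
 * Dynamic-scheduling sketch. pool_grow() and MAX_WORKERS are made up
 * for illustration: the point is that the APPLICATION changes its own
 * thread count at run time, while tasks sit behind a shared counter
 * (the layer of indirection between computation tasks and threads).
 */
#include <pthread.h>
#include <stdio.h>

#define MAX_WORKERS 8
#define NUM_TASKS   32

static int next_task = 0;    /* the "queue": tasks 0..NUM_TASKS-1 */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static pthread_t workers[MAX_WORKERS];
static int nworkers = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        if (next_task >= NUM_TASKS) {   /* no tasks left */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        int task = next_task++;
        pthread_mutex_unlock(&lock);
        printf("task %d done\n", task);
    }
}

/* The application, not the OS, decides how many threads it wants now;
 * tasks are mapped to whichever workers currently exist. */
static void pool_grow(int target)
{
    while (nworkers < target && nworkers < MAX_WORKERS)
        pthread_create(&workers[nworkers++], NULL, worker, NULL);
}

int main(void)
{
    pool_grow(2);   /* start small...                    */
    pool_grow(4);   /* ...then add workers mid-execution */
    for (int i = 0; i < nworkers; i++)
        pthread_join(workers[i], NULL);
    return 0;
}
```

Shrinking the pool would additionally need a per-worker stop flag; this sketch only grows it.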