Computer Organization

Part 44 – Cluster Processors, UMA, NUMA

UNIT – VI

Tushar B. Kute,
Department of Information Technology,
Sandip Institute of Technology & Research Centre, Nashik.
http://tusharkute.com
Clusters

• A computer cluster is a group of linked computers that work together so closely that in many respects they form a single computer.
• The components of a cluster are commonly, but not always, connected to each other through a fast LAN.
• Here, “computer” means a whole system that can run on its own, apart from the cluster. Such a computer in a cluster is typically referred to as a node.
Advantages of clustering

- Absolute scalability
- Incremental scalability
- High availability
- Cost effective
Cluster configurations

(a) Two-node cluster with no shared disk
(b) Two-node cluster with shared disk
Homogeneous clusters

- Every single node is exactly the same.
Heterogeneous Cluster

• Made from different kinds of computers. For example: a few Sun SPARCstation IPXs, a few Intel 486 machines, and a DEC Alpha.

• Made from different machines in the same architecture family. For example: a collection of Intel boxes of different generations, such as a mixture of 486, Pentium I, and Pentium II machines.
Operating System Design Issues

• Failure management
• Load balancing
• Parallelizing computation
  – Parallelizing compiler
  – Parallelized applications
  – Parametric computing
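The first two issues above can be sketched in a few lines. This is only a toy model under stated assumptions: the `Cluster` class, the node names, and the job costs are hypothetical and invented for this example. Real cluster software detects failures with heartbeats and tracks far more state.

```python
class Cluster:
    """Toy model: each node holds a list of (job, cost) pairs."""

    def __init__(self, node_names):
        self.jobs = {name: [] for name in node_names}

    def load(self, node):
        return sum(cost for _, cost in self.jobs[node])

    def submit(self, job, cost):
        # Load balancing: pick the live node with the smallest total cost.
        target = min(self.jobs, key=self.load)
        self.jobs[target].append((job, cost))
        return target

    def fail(self, node):
        # Failure management (failover): drop the node, resubmit its jobs.
        orphaned = self.jobs.pop(node)
        return {job: self.submit(job, cost) for job, cost in orphaned}


cluster = Cluster(["node-a", "node-b", "node-c"])
placement = {job: cluster.submit(job, cost)
             for job, cost in [("j1", 5), ("j2", 3), ("j3", 4), ("j4", 2)]}
moved = cluster.fail("node-a")  # j1 is restarted on a surviving node
```

Here the jobs of the failed node are simply restarted elsewhere. Resuming them from saved state would need checkpointing.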
Cluster Computer Architecture

Layers, from top to bottom:

- Sequential applications and parallel applications
- Parallel programming environments
- Cluster middleware (single system image and availability infrastructure)
- Nodes: several PCs / workstations, each with its own communication software and network interface hardware
- High-speed network / switch
Cluster middleware services and functions

- Single entry point
- Single file hierarchy
- Single control unit
- Single virtual networking
- Single memory space
- Single job management system
- Single I/O space
- Single process space
- Checkpointing
- Process migration
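The last two services can be illustrated with a very small sketch. It assumes the whole state of a job fits in one Python dictionary, and the names `run`, `state_on_node1`, and `state_on_node2` are hypothetical. Real middleware saves a complete process image rather than one dictionary.

```python
import pickle


def run(state, steps):
    """Advance a toy computation: keep adding consecutive integers."""
    for _ in range(steps):
        state["total"] += state["next"]
        state["next"] += 1
    return state


# Run part of the job on the first node, then take a checkpoint.
state_on_node1 = run({"next": 1, "total": 0}, steps=4)
checkpoint = pickle.dumps(state_on_node1)

# "Migrate": a second node restores the checkpoint and continues the job.
state_on_node2 = pickle.loads(checkpoint)
result = run(state_on_node2, steps=6)
```

The same checkpoint could equally be restored on the original node after a crash, which is how checkpointing supports availability.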
Comparison

<table>
<thead>
<tr>
<th>Sr. No.</th>
<th>Cluster</th>
<th>SMP</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Multiprocessor system.</td>
<td>Multiprocessor system.</td>
</tr>
<tr>
<td>2</td>
<td>Difficult to manage and configure.</td>
<td>Easy to manage and configure.</td>
</tr>
<tr>
<td>3</td>
<td>Requires more physical space.</td>
<td>Requires less physical space.</td>
</tr>
<tr>
<td>4</td>
<td>Draws more power.</td>
<td>Draws less power.</td>
</tr>
<tr>
<td>5</td>
<td>Further from the original single-processor model.</td>
<td>Much closer to the original single-processor model.</td>
</tr>
<tr>
<td>6</td>
<td>Cluster products are less established and stable than SMP products.</td>
<td>SMP products are well established and stable.</td>
</tr>
<tr>
<td>7</td>
<td>Offer greater incremental and absolute scalability.</td>
<td>Offer less incremental and absolute scalability than clusters.</td>
</tr>
<tr>
<td>8</td>
<td>Because they scale better, clusters can dominate the high-performance server market.</td>
<td>Because they scale less well, SMPs are less preferred in the high-performance server market.</td>
</tr>
<tr>
<td>9</td>
<td>Clusters have higher availability.</td>
<td>SMPs have lower availability than clusters.</td>
</tr>
</tbody>
</table>
Uniform Memory Access

• UMA is a shared-memory architecture used in parallel computers. All the processors in the UMA model share the physical memory uniformly.

• In a UMA architecture, the access time to a memory location is independent of which processor makes the request and of which memory chip contains the transferred data.
Types of UMA

• UMA using bus-based SMP architectures
• UMA using crossbar switches
• UMA using multistage switching networks
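One way to compare the last two options is to count switching elements. The sketch below assumes n processors connected to n memory modules, with n a power of two. It uses an omega network of 2x2 switches as one specific example of a multistage network, which is an assumption, since the slide does not name a particular network. The function names are made up for this example.

```python
import math


def crossbar_crosspoints(n):
    # An n x n crossbar has one crosspoint per (processor, memory) pair.
    return n * n


def omega_switches(n):
    # An omega network of 2x2 switches has log2(n) stages of n/2 switches.
    stages = int(math.log2(n))
    if 2 ** stages != n:
        raise ValueError("n must be a power of two")
    return (n // 2) * stages


for n in (8, 64, 1024):
    print(n, crossbar_crosspoints(n), omega_switches(n))
```

The crossbar grows as n squared while the multistage network grows as n log n, so the multistage network is cheaper for large n. The price is that the multistage network can block when two requests need the same switch output.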
Example: UMA
Non-Uniform Memory Access

• It is a computer memory design used in multiprocessors, where the memory access time depends on the memory location relative to a processor.

• Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors.
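The effect of data placement can be estimated with a weighted average. The latencies below are hypothetical values chosen only for illustration, and `average_access_time` is a name invented for this sketch.

```python
def average_access_time(local_fraction, t_local_ns, t_remote_ns):
    """Weighted average of local and remote memory access times."""
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns


# Hypothetical latencies, not measurements of any real machine.
T_LOCAL_NS = 100.0
T_REMOTE_NS = 300.0

for local_fraction in (1.0, 0.5, 0.0):
    avg = average_access_time(local_fraction, T_LOCAL_NS, T_REMOTE_NS)
    print(f"{local_fraction:.0%} local -> {avg:.0f} ns")
```

In a UMA machine the two latencies are equal, so the average does not depend on where the data lives. Under NUMA, keeping data close to the processor that uses it lowers the average.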
Cache-Coherent NUMA

• The system runs only one OS and shows only a single memory image to the user even though the memory is physically distributed over processors.

• Each processor can access its own memory much faster than the memory of other processors, so memory access is non-uniform.
CC-NUMA
Vector Processing

- It is a CPU design in which the instruction set includes instructions that perform mathematical operations on multiple data elements simultaneously.
- This is in contrast to a scalar processor, which handles one data element per instruction and therefore needs multiple instructions to process an array.
Examples and Applications

• Radar and Signal processing for detection of space/underwater targets.
• Remote sensing for earth resource exploration.
• Computational wind tunnel experiments.
• 3D stop-action computer-assisted tomography.
• Weather forecasting.
• Medical diagnosis.
Vector Processing Approach

• Instead of pipelining just the instructions, vector processors also pipeline the data itself. They are fed instructions that say not just “add A to B” but “add all of the numbers in array A to all of the numbers in array B”.

Illustrations

• Programming language (scalar loop)
  ▪ Execute this loop 10 times
  ▪ Read the next instruction and decode it
  ▪ Fetch the first number
  ▪ Fetch the second number
  ▪ Add them
  ▪ Put the result here
  ▪ End loop

• Vector processing
  ▪ Read the instruction and decode it
  ▪ Fetch 10 numbers
  ▪ Fetch another 10 numbers
  ▪ Add them
  ▪ Put the results here
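The two step lists above can be mirrored in a short sketch that counts how many instructions get decoded. The names `scalar_add` and `vector_add` are hypothetical. The list comprehension in `vector_add` is only a software stand-in for what a single vector instruction would do in hardware.

```python
def scalar_add(a, b):
    """Scalar style: one fetch/decode and one add per element."""
    result, decodes = [], 0
    for x, y in zip(a, b):
        decodes += 1          # read the next instruction and decode it
        result.append(x + y)  # fetch two numbers, add, store the result
    return result, decodes


def vector_add(a, b):
    """Vector style: one decoded instruction covers the whole array."""
    decodes = 1                             # read the instruction and decode it
    result = [x + y for x, y in zip(a, b)]  # stand-in for one vector add
    return result, decodes


a = list(range(10))
b = list(range(10, 20))
scalar_result, scalar_decodes = scalar_add(a, b)
vector_result, vector_decodes = vector_add(a, b)
```

Both versions produce the same sums. The difference is the decode count: ten for the scalar loop, one for the vector instruction.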
Vector computations

• Pipelined ALU
• Parallel ALU
• Parallel Processors
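The benefit of the first two approaches can be estimated with simple cycle counts. This sketch assumes each ALU operation takes `stages` cycles, each pipeline stage takes one cycle, there are no stalls, and n is at least 1. The function names are invented for this example.

```python
import math


def cycles_sequential(n, stages):
    # Each operation occupies the single ALU for all of its stages.
    return n * stages


def cycles_pipelined(n, stages):
    # First result after `stages` cycles, then one result per cycle.
    return stages + (n - 1)


def cycles_parallel(n, stages, alus):
    # Elements are handled in rounds of `alus` operations at a time.
    return math.ceil(n / alus) * stages


n, stages, alus = 100, 4, 4
print(cycles_sequential(n, stages))      # one non-pipelined ALU
print(cycles_pipelined(n, stages))       # one pipelined ALU
print(cycles_parallel(n, stages, alus))  # several non-pipelined ALUs
```

With these assumed numbers, a 4-stage pipeline and four parallel ALUs give a similar speedup of close to 4 over the single non-pipelined ALU.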
Pipelined ALU
Parallel ALU

[Figure: Memory -> Input Registers -> several ALUs operating side by side -> Output Register]
Parallel Processors

SIMD Array Processor
Bus Arbitration

• The device that is allowed to initiate data transfers on the bus at any given time is called the bus master. There may be more than one potential bus master, such as the processor, a DMA controller, etc.

• These masters share the system bus. When the current master relinquishes control of the bus, another master can acquire it.

• Bus arbitration is the process by which the next device to become the bus master is selected and bus mastership is transferred to it. The selection is usually made on a priority basis.
Centralized arbitration

• A single bus arbiter performs the required arbitration. The bus arbiter may be the processor or a separate controller connected to the bus.

• Methods:
  – Daisy chaining
  – Polling
  – Independent request
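The three methods differ mainly in how the arbiter decides which requesting master wins. The sketch below models only that decision, not the bus signals themselves. The function names, the `poll_order` list, and the `priority` values are assumptions made for this example.

```python
def daisy_chain(requests):
    """Grant travels down the chain; the first requesting master keeps it.

    requests[i] is True if master i (0 = closest to the arbiter) wants the bus.
    """
    for master, wants_bus in enumerate(requests):
        if wants_bus:
            return master
    return None


def polling(requests, poll_order):
    """The arbiter polls masters in poll_order; the first requester wins."""
    for master in poll_order:
        if requests[master]:
            return master
    return None


def independent_request(requests, priority):
    """Each master has its own request line; highest priority value wins."""
    candidates = [m for m, wants_bus in enumerate(requests) if wants_bus]
    return max(candidates, key=lambda m: priority[m], default=None)


requests = [False, True, False, True]  # masters 1 and 3 request the bus
print(daisy_chain(requests))
print(polling(requests, poll_order=[3, 2, 1, 0]))
print(independent_request(requests, priority=[1, 2, 3, 4]))
```

In daisy chaining the priority is fixed by position in the chain. With polling or independent request it can be changed by altering `poll_order` or `priority`.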
Daisy chaining
Polling
Independent request

Diagram showing a controller managing requests and grants for bus access logic in masters 1, 2, and N.