akash verma: SMU ASSIGNMENT OF MCA 2ND SEM: MCA 2050 - COMPUTER ARCHITECTURE


Question No 1. What is the difference between process and thread?
Process
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
Thread
A thread is the entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process. Threads can also have their own security context, which can be used for impersonating clients.
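The points above (one process identifier, one shared address space, a unique identifier per thread) can be illustrated with a small Python sketch. The names `run_threads` and `worker` are made up for this illustration, and the barrier is only there so that all threads are alive at the same moment.

```python
import os
import threading

def run_threads(count=3):
    """Start `count` threads that all write into one list owned by the process."""
    shared = []                      # lives in the process's address space
    lock = threading.Lock()
    barrier = threading.Barrier(count)

    def worker(tag):
        barrier.wait()               # wait until every thread has started
        with lock:
            # Record this thread's own identifier and the process identifier.
            shared.append((tag, threading.get_ident(), os.getpid()))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared

records = run_threads(3)
for tag, thread_id, process_id in sorted(records):
    print(f"thread {tag}: thread id {thread_id}, process id {process_id}")
```

Every record carries the same process id but a different thread id, and all of them ended up in the same list because the threads share the memory of their process.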

Question No 2. Explain in detail the techniques to handle hazards.

Structural hazards can occur when a functional unit is not fully pipelined (initiation interval > 1) → need to add interlocking.
More than one register write per cycle is possible → either add ports to the register file or treat the conflict as a hazard and stall.
Hazards are possible between integer and FP instructions → use separate register files.
Stalls for hazards become longer and more frequent.
WAW hazards are possible → stall the second instruction or prevent the first instruction from writing.

Structural hazards: stalling (pipeline interlock); code scheduling.
Data hazards: stalling (pipeline interlock); forwarding. For the load delay: stalling (pipeline interlock), or code scheduling to fill the load delay slot.
Control hazards: early branch resolution; stalling (flushing the pipeline); delayed branch; predict not taken (or taken); static branch prediction; dynamic branch prediction.
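To see how forwarding reduces data hazard stalls, here is a minimal Python sketch. It assumes a classic five-stage in-order pipeline (IF, ID, EX, MEM, WB) and a register file that is written in the first half of a cycle and read in the second half. Neither assumption is stated in the text above, and the function name `stall_cycles` and the instruction tuples are made up for the illustration.

```python
def stall_cycles(program, forwarding):
    """Count stall cycles caused by RAW hazards in an assumed 5-stage pipeline.

    Each instruction is (dest_register, source_registers, is_load).
    """
    ex_cycle = []        # cycle in which each instruction occupies EX
    last_writer = {}     # register -> (EX cycle of producer, producer is a load)
    for dest, sources, is_load in program:
        earliest = ex_cycle[-1] + 1 if ex_cycle else 0   # in-order issue
        for reg in sources:
            if reg in last_writer:
                producer_ex, producer_is_load = last_writer[reg]
                if forwarding:
                    # ALU result is forwarded after EX, load data after MEM.
                    gap = 2 if producer_is_load else 1
                else:
                    # Consumer must read the register file in ID during or
                    # after the producer's WB stage.
                    gap = 3
                earliest = max(earliest, producer_ex + gap)
        ex_cycle.append(earliest)
        if dest is not None:
            last_writer[dest] = (earliest, is_load)
    if not ex_cycle:
        return 0
    return ex_cycle[-1] - ex_cycle[0] - (len(program) - 1)

# lw r1, ... ; add r2, r1, r3 ; sub r4, r2, r1
program = [
    ("r1", [], True),
    ("r2", ["r1", "r3"], False),
    ("r4", ["r2", "r1"], False),
]
print(stall_cycles(program, forwarding=True))   # only the load delay remains
print(stall_cycles(program, forwarding=False))  # every dependence stalls
```

With forwarding only the load delay costs a cycle. A compiler could remove even that by scheduling an independent instruction into the load delay slot.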
Floating point operations take multiple cycles in EXE. Assume a system with 1 integer ALU, 1 FP/integer multiplier, 1 FP adder, and 1 FP/integer divider.
Instruction latency: cycles to wait for the result of an instruction, e.g. 0 cycles for the integer ALU since no wait is necessary. Usually the number of cycles in the execution pipeline minus 1.
Instruction initiation interval: time to wait before issuing another instruction of the same type. It is not equal to the number of cycles if the multicycle operation is pipelined or partially pipelined.
Examples:
Integer ALU: 1 EXE cycle → latency = 0; initiation interval = 1
FP add, fully pipelined: 4 EXE cycles → latency = 3; initiation interval = 1
FP divide, not pipelined: 25 EXE cycles → latency = 24; initiation interval = 25
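The three examples follow one rule, sketched below in Python. The helper name `timing` is made up, and the sketch only covers the two extremes (fully pipelined or not pipelined at all), not partially pipelined units.

```python
def timing(exe_cycles, fully_pipelined):
    """Return (latency, initiation_interval) for a functional unit."""
    latency = exe_cycles - 1                      # cycles a dependent instruction waits
    initiation_interval = 1 if fully_pipelined else exe_cycles
    return latency, initiation_interval

units = {
    "Integer ALU": (1, True),
    "FP add": (4, True),
    "FP divide": (25, False),
}
for name, (cycles, pipelined) in units.items():
    latency, interval = timing(cycles, pipelined)
    print(f"{name}: latency = {latency}, initiation interval = {interval}")
```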
Question No.3 Explain the Tomasulo approach. Write the differences between Tomasulo’s scheme and scoreboarding.
Tomasulo's approach: A technique to allow execution to proceed in the presence of hazards. It was first introduced in the IBM 360/91, where it was applied only to floating-point operations (including FP loads and stores). We have already seen that the compiler can rename registers (statically) to avoid WAW and WAR hazards. Tomasulo's scheme performs this function dynamically. It buffers the operands of instructions waiting to issue, fetching them as soon as they are available and avoiding the register file. The register specifiers of instructions are renamed to reservation station numbers as they are issued, eliminating WAW and WAR hazards.
Differences between scoreboarding and Tomasulo's approach:
Register renaming
Register renaming is used to eliminate WAR and WAW hazards, in contrast to scoreboarding, which must wait for WAR and WAW hazards to clear.
Distributed control
Hazard detection and execution control are distributed to each functional unit, in contrast to scoreboarding, in which they are centralized.
Common Data Bus
The common data bus is used to forward results directly to the functional units without going through the register file.
Reservation stations: The reservation stations are the heart of Tomasulo's approach. They are located at each functional unit and determine when an instruction can begin execution.
Operation steps:
Issue
Take an instruction from the FP operation queue. If there's a station available for it, send the instruction to the station. Otherwise, stall for a structural hazard. This step also checks whether the source operands will be produced by a current instruction. If so, renaming is performed by checking whether the desired register is being written by an instruction already at a reservation station.
The reservation station fields:
Operation (Op)
The operation to be performed.
Operand sources (Qj, Qk)
The reservation stations that will produce the values for the two operands. A 0 in either slot means the source operand is already in Vj or Vk, or that the slot is not needed.
Operand values (Vj, Vk)
The values for the two operands. They are valid if and only if the corresponding Q is 0.
Busy
Indicates the reservation station and the accompanying functional unit are busy.
Other components: The register file and store buffer have a field for every register: Qi. This is the number of the reservation station that contains the operation that will eventually write the register/buffer. If no operation is pending, this value is 0 (blank).
An example (the code sequence and status tables of the original figure are not preserved):
The information in the instruction status table is actually distributed in the hardware.
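The issue step, the reservation station fields, and the common data bus can be put together in a small Python sketch. This is a simplified illustration under stated assumptions, not the IBM 360/91 design. The class and method names (`TomasuloSketch`, `issue`, `broadcast`) are made up, all stations are treated as one pool instead of being attached to separate functional units, and execution itself is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # operand values, valid only when Q is 0
    vk: Optional[float] = None
    qj: int = 0                  # station numbers that will produce the operands
    qk: int = 0

class TomasuloSketch:
    def __init__(self, num_stations, registers):
        self.stations = [ReservationStation() for _ in range(num_stations)]
        self.regs = dict(registers)
        self.qi = {name: 0 for name in registers}   # pending writer per register

    def _read(self, reg):
        # Return (value, 0) if the register is ready, else (None, producing station).
        tag = self.qi[reg]
        return (None, tag) if tag else (self.regs[reg], 0)

    def issue(self, op, dest, src1, src2):
        """Return the station number used, or 0 for a structural-hazard stall."""
        for number, rs in enumerate(self.stations, start=1):
            if not rs.busy:
                rs.vj, rs.qj = self._read(src1)
                rs.vk, rs.qk = self._read(src2)
                rs.busy, rs.op = True, op
                self.qi[dest] = number      # rename dest to this station
                return number
        return 0

    def broadcast(self, number, value):
        """Common data bus: deliver a result to waiting stations and registers."""
        for rs in self.stations:
            if rs.busy and rs.qj == number:
                rs.vj, rs.qj = value, 0
            if rs.busy and rs.qk == number:
                rs.vk, rs.qk = value, 0
        for reg, tag in self.qi.items():
            if tag == number:               # skipped if a later instruction renamed reg
                self.regs[reg] = value
                self.qi[reg] = 0
        self.stations[number - 1] = ReservationStation()

machine = TomasuloSketch(2, {"F0": 1.0, "F2": 2.0, "F4": 4.0})
first = machine.issue("ADD", "F0", "F2", "F4")    # operands ready, Qj = Qk = 0
second = machine.issue("MUL", "F2", "F0", "F4")   # F0 pending, so Qj = first
stalled = machine.issue("ADD", "F4", "F0", "F2")  # no free station
machine.broadcast(first, 6.0)                     # result reaches MUL and F0
```

The second instruction never reads F0 from the register file. It waits on station number `first` and receives the value from the broadcast, which is how renaming removes the WAR and WAW hazards.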
Question No 4. Explain any five types of vector instructions in detail.
1. Vector-scalar instructions: Using these instructions, a scalar operand can be combined with a vector one. If A and B are vector registers and f is a function that performs some operation on each element of one or two vector operands, a vector-scalar operation can be defined as follows:
Ai := f(scalar, Bi)
2. Vector-vector instructions: Using these instructions, one or two vector operands are fetched from their respective vector registers and produce a result in another vector register. If A, B and C are three vector registers, a vector-vector operation can be defined as follows:
Ai := f(Bi, Ci)
3. Vector-memory instructions: These instructions correspond to vector load or vector store. The vector load can be defined as follows, where M is a memory register:
A := f(M)
The vector store can be defined as follows:
M := f(A)
4. Gather and scatter instructions: Gather is an operation that fetches the non-zero elements of a sparse vector from memory, as defined below:
A x Vo := f(M)
Scatter stores a vector into memory as a sparse vector, as defined below:
M := f(A x Vo)
5. Masking instructions: These instructions use a mask vector to expand or compress a vector, as defined below:
V := f(A x VM), where VM is a mask vector.
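The five instruction types can be mimicked with Python lists, as in the sketch below. All function names are made up for the illustration. The index vector (Vo in the text) is assumed to hold the positions of the non-zero elements, and the mask vector (VM) is assumed to hold one boolean per element. Real vector hardware operates on vector registers, not lists.

```python
def vector_scalar(f, scalar, b):
    # Ai := f(scalar, Bi)
    return [f(scalar, bi) for bi in b]

def vector_vector(f, b, c):
    # Ai := f(Bi, Ci)
    return [f(bi, ci) for bi, ci in zip(b, c)]

def vector_load(memory, base, length):
    # A := f(M): copy a contiguous block of memory into a vector register.
    return memory[base:base + length]

def vector_store(memory, base, a):
    # M := f(A): copy a vector register back to memory.
    memory[base:base + len(a)] = a

def gather(memory):
    # A x Vo := f(M): keep the non-zero elements and remember their positions.
    v0 = [i for i, x in enumerate(memory) if x != 0]
    a = [memory[i] for i in v0]
    return a, v0

def scatter(a, v0, length):
    # M := f(A x Vo): put each element back at its recorded position.
    memory = [0] * length
    for value, index in zip(a, v0):
        memory[index] = value
    return memory

def compress(a, vm):
    # V := f(A x VM): keep only the elements whose mask bit is set.
    return [x for x, keep in zip(a, vm) if keep]

sparse = [0, 7, 0, 0, 3, 0]
values, indices = gather(sparse)
```

Scattering `values` with `indices` into a vector of length 6 rebuilds the original sparse vector.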

Question No.5 DIFFERENCE BETWEEN MULTI PROCESSOR AND MULTI COMPUTER
MULTI PROCESSOR
1. A multiprocessor system is simply a computer that has more than one CPU on its motherboard.
2. Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system.
3. Multiprocessors have a single physical address space (memory) shared by all the CPUs.
4. Its speed is limited by what fits in ONE computer, so for large parallel workloads it may run slower than a multicomputer.
5. A multiprocessor is a single system with multiple CPUs.
MULTI COMPUTER
1. A computer made up of several computers; similar to parallel computing.
2. It is a form of distributed computing, which deals with hardware and software systems containing more than one processing element and multiple programs, running under a loosely or tightly controlled regime.
3. Multicomputers have one physical address space per CPU.
4. It can run faster for workloads that divide well across machines.
5. A multicomputer is multiple computers, each of which can have multiple processors. Used for true parallel processing.
Question No 6. Write short notes on:
1. DSP Processor 2. Dual core technology
DSP PROCESSOR:
A digital signal processor (DSP) is a specialized microprocessor (or a SIP block), with its architecture optimized for the operational needs of digital signal processing.[1][2]
The goal of DSPs is usually to measure, filter and/or compress continuous real-world analog signals. Most general-purpose microprocessors can also execute digital signal processing algorithms successfully, but dedicated DSPs usually have better power efficiency, so they are more suitable for portable devices such as mobile phones, where power consumption is constrained.[3] DSPs often use special memory architectures that are able to fetch multiple data items and/or instructions at the same time.
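As an illustration of the kind of filtering work a DSP performs, here is a minimal finite impulse response (FIR) filter in Python. The choice of an FIR filter, the function name `fir_filter` and the sample values are assumptions made for this sketch. A real DSP would run the inner multiply-and-add loop in dedicated hardware.

```python
def fir_filter(samples, coefficients):
    """Each output is a weighted sum of the current and previous input samples."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if n - k >= 0:                  # samples before the start count as 0
                acc += c * samples[n - k]
        output.append(acc)
    return output

# A two-tap averaging filter smooths an alternating input signal.
smoothed = fir_filter([0, 4, 0, 4, 0, 4], [0.5, 0.5])
print(smoothed)
```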
DUAL CORE TECHNOLOGY:
A multi-core processor is a single computing component with two or more independent actual central processing units (called "cores"), which are the units that read and execute program instructions.[1] The instructions are ordinary CPU instructions such as add, move data, and branch, but the multiple cores can run multiple instructions at the same time, increasing overall speed for programs amenable to parallel computing.[2] Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor or CMP), or onto multiple dies in a single chip package.
Processors were originally developed with only one core. In the mid 1980s Rockwell International manufactured versions of the 6502 with two 6502 cores on one chip as the R65C00, R65C21, and R65C29,[3][4] sharing the chip's pins on alternate clock phases. Other multi-core processors were developed in the early 2000s by Intel, AMD and others.
Multicore processors may have two cores (dual-core CPUs, for example AMD Phenom II X2 and Intel Core Duo), four cores (quad-core CPUs, for example AMD Phenom II X4, Intel's i5 and i7 processors), six cores (hexa-core CPUs, for example AMD Phenom II X6 and Intel Core i7 Extreme Edition 980X), eight cores (octo-core CPUs, for example Intel Xeon E7-2820 and AMD FX-8350), ten cores (for example, Intel Xeon E7-2850), or more.
A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device tightly or loosely. For example, cores may or may not share caches, and they may implement message passing or shared memory inter-core communication methods. Common network topologies to interconnect cores include bus, ring, two-dimensional mesh, and crossbar. Homogeneous multi-core systems include only identical cores, heterogeneous multi-core systems have cores that are not identical. Just as with single-processor systems, cores in multi-core systems may implement architectures such as superscalar, VLIW, vector processing, SIMD, or multithreading.
Multi-core processors are widely used across many application domains including general-purpose, embedded, network, digital signal processing (DSP), and graphics.
The improvement in performance gained by the use of a multi-core processor depends very much on the software algorithms used and their implementation. In particular, possible gains are limited by the fraction of the software that can be run in parallel simultaneously on multiple cores; this effect is described by Amdahl's law. In the best case, so-called embarrassingly parallel problems may realize speedup factors near the number of cores, or even more if the problem is split up enough to fit within each core's cache(s), avoiding use of much slower main system memory. Most applications, however, are not accelerated so much unless programmers invest a prohibitive amount of effort in re-factoring the whole problem.[5] The parallelization of software is a significant ongoing topic of research.
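Amdahl's law can be written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the program that can run in parallel and n is the number of cores. The short Python sketch below evaluates it. The function name `amdahl_speedup` and the sample fractions are chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a program can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for cores in (2, 4, 8):
    print(cores, "cores:", round(amdahl_speedup(0.9, cores), 2))
```

Even with 90% of the work parallelized, the speedup can never exceed 10, because the remaining serial 10% still takes the same time on any number of cores.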

Wednesday 8 July 2015
