Chapter 4: Computer Organization and Architecture (Set-10)
A system is byte-addressable and uses 48-bit virtual addresses; the theoretical size of its virtual address space is
A 1 PB
B 16 TB
C 256 TB
D 4 GB
A 48-bit address space contains 2^48 unique byte addresses. 2^48 bytes equals 256 terabytes. Real usable space can be smaller due to OS and hardware limits, but the theoretical space is 256 TB.
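A quick Python check of the arithmetic above (a sketch; 1 TB is taken as 2^40 bytes):

```python
# Size of a 48-bit byte-addressable virtual address space.
address_bits = 48
space_bytes = 2 ** address_bits      # total distinct byte addresses
space_tb = space_bytes // 2 ** 40    # convert bytes to terabytes (1 TB = 2^40 B)
print(space_tb)                      # 256
```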
A cache has 32 KB size with 64-byte blocks; the number of cache lines is
A 128 lines
B 512 lines
C 256 lines
D 1024 lines
32 KB equals 32×1024 = 32768 bytes. Each line holds one 64-byte block. Lines = 32768/64 = 512. This value is independent of mapping type; associativity changes sets, not total lines.
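The line count above can be sketched in Python (parameters taken from the question):

```python
# Number of cache lines = total cache size / block size.
cache_size = 32 * 1024   # 32 KB in bytes
block_size = 64          # bytes per block
lines = cache_size // block_size
print(lines)             # 512
```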
For 64-byte cache blocks, how many block offset bits are required
A 4 bits
B 5 bits
C 6 bits
D 8 bits
Block size 64 bytes equals 2^6. Offset selects a byte within the block, so 6 low-order address bits are used as offset. Remaining bits are split into index and tag.
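A minimal sketch of the offset-bit calculation, assuming the block size is a power of two:

```python
# Offset bits = log2(block size); bit_length() - 1 gives this exactly
# for powers of two (64 = 0b1000000 has bit length 7, so 6 offset bits).
block_size = 64
offset_bits = block_size.bit_length() - 1
print(offset_bits)  # 6
```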
A 32 KB 4-way set-associative cache with 64-byte blocks has how many sets
A 128 sets
B 64 sets
C 256 sets
D 512 sets
Total lines = 32 KB / 64 B = 512 lines. With 4-way associativity, each set has 4 lines. Sets = 512/4 = 128 sets. Index bits select among these sets.
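The same calculation as code (parameters from the question):

```python
# Sets = total cache size / (block size * associativity).
cache_size = 32 * 1024   # bytes
block_size = 64          # bytes
ways = 4                 # lines per set
sets = cache_size // (block_size * ways)
print(sets)              # 128
```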
In a 4-way set-associative cache with 128 sets, the set index bits count is
A 6 bits
B 8 bits
C 10 bits
D 7 bits
128 sets equals 2^7. Therefore, 7 bits are needed to select the set. Offset bits depend on block size; the remaining higher bits form the tag field.
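Putting the whole address breakdown together, a sketch assuming a 32-bit physical address (the address width is an assumption, not stated in the question):

```python
# Field breakdown for a set-associative cache:
# offset bits from block size, index bits from set count, tag is the rest.
address_bits = 32        # assumed address width
sets = 128
block_size = 64
offset_bits = (block_size - 1).bit_length()  # log2(64) = 6
index_bits = (sets - 1).bit_length()         # log2(128) = 7
tag_bits = address_bits - index_bits - offset_bits
print(offset_bits, index_bits, tag_bits)     # 6 7 19
```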
In a write-back cache, a dirty block must be written to memory
A On every write
B On eviction only
C On every read
D Never written
Write-back updates only the cache during writes and marks the block dirty. Main memory is updated only when the dirty block is evicted, reducing memory traffic but requiring dirty-bit tracking.
In write-through caching, the biggest disadvantage compared to write-back is usually
A More cache hits
B Less memory traffic
C More memory writes
D No tag checks
Write-through writes to main memory on each store, increasing bus traffic and memory writes. This keeps memory consistent but can reduce performance when programs perform frequent write operations.
A page fault service time is large mainly because it requires
A Register access
B ALU operation
C Cache hit check
D Disk or SSD access
A page fault means the required page is not in RAM. The OS must fetch it from secondary storage, which is far slower than RAM, causing a big delay compared to normal memory access.
The most direct purpose of a TLB in virtual memory systems is to
A Increase disk capacity
B Speed address translation
C Lower clock speed
D Replace cache memory
A Translation Lookaside Buffer caches recent virtual-to-physical address translations. This reduces extra memory accesses needed for page table lookup, improving average memory access time.
A TLB miss typically causes the CPU to
A Consult page table
B Skip the instruction
C Flush the cache
D Reset program counter
If translation is not in TLB, the system must look up the page table to find the physical frame number. This is slower than a TLB hit and may require additional memory accesses.
In a pipeline, a structural hazard happens when
A Branch changes flow
B Data dependency exists
C Hardware resource conflicts
D Cache hit occurs
Structural hazards occur when two pipeline stages need the same hardware at the same time, like one memory port. Without duplicated resources, one instruction must stall, lowering throughput.
A control hazard is mostly caused by
A Register shortage
B Branch instruction
C Cache replacement
D DMA transfer
Branches change the next instruction address. Until the branch decision is known, the pipeline may fetch wrong-path instructions, requiring stalling or flushing, which reduces pipeline efficiency.
A data hazard occurs when
A ALU is too fast
B Disk is full
C Instruction depends on result
D Bus is too wide
Data hazards occur when an instruction needs a value that a previous instruction has not yet produced or written back. Pipelines handle this using stalls, forwarding, or reordering techniques.
Forwarding reduces data hazard stalls because it
A Bypasses register write-back
B Delays all writes
C Removes branching
D Increases ROM size
Forwarding sends results from an execution stage directly to a later stage needing them, without waiting for the result to be written into the register file. This prevents unnecessary stalls in dependent sequences.
Branch prediction improves pipeline performance mainly by
A Increasing cache size
B Changing ISA rules
C Lowering bus width
D Reducing control stalls
If the predictor guesses the correct next PC, the pipeline keeps fetching useful instructions without waiting. Correct predictions reduce flushes and stalls caused by branches, improving throughput.
In a von Neumann design, the main bottleneck arises because
A Separate memories exist
B Same path for code/data
C ALU has no flags
D Registers are missing
Von Neumann uses the same memory and bus path for both instructions and data. Instruction fetch and data access compete for bandwidth, limiting throughput; Harvard reduces this by separating paths.
Harvard architecture reduces contention mainly by
A No CPU clock
B No interrupts used
C Separate code/data paths
D No memory hierarchy
Harvard uses separate instruction and data memories or buses. This allows simultaneous instruction fetch and data access, reducing contention and often improving throughput in embedded and DSP designs.
CPI is best defined as
A Cycles per instruction
B Cores per interrupt
C Cache per index
D Clocks per input
CPI measures the average number of clock cycles needed for each instruction. Lower CPI generally improves performance if clock speed is unchanged, but CPI depends on pipeline, cache, and instruction mix.
If CPI falls from 2.0 to 1.5 while the clock stays the same, the performance change is about
A 25% faster
B 33% faster
C 50% slower
D No change
Execution time is proportional to CPI for a fixed instruction count and clock rate. Speedup = old/new = 2.0/1.5 = 1.333…, so performance improves by about 33%.
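The speedup arithmetic can be checked with a short sketch:

```python
# Speedup from a CPI reduction at fixed clock rate and instruction count:
# execution time is proportional to CPI, so speedup = old CPI / new CPI.
old_cpi, new_cpi = 2.0, 1.5
speedup = old_cpi / new_cpi
improvement_pct = (speedup - 1) * 100
print(round(improvement_pct, 1))  # 33.3
```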
Amdahl’s law states overall speedup is limited mainly by the
A Cache size only
B Disk speed only
C Serial program part
D Bus width only
Even with many cores, any portion that cannot be parallelized must run serially, limiting total speedup. Amdahl’s law quantifies this upper bound using the serial fraction.
If 90% of a program is parallelizable, the maximum speedup with infinite cores is
A 2×
B 5×
C 90×
D 10×
Amdahl’s law gives a maximum speedup of 1/(1−P). With P = 0.9, the serial fraction is 0.1, so the maximum speedup is 1/0.1 = 10, even with unlimited processors.
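A sketch of Amdahl's law showing how speedup approaches the 10× bound as cores grow:

```python
# Amdahl's law: speedup(N) = 1 / ((1 - P) + P / N),
# where P is the parallelizable fraction and N the core count.
def amdahl_speedup(parallel_fraction, n_cores):
    serial = 1 - parallel_fraction
    return 1 / (serial + parallel_fraction / n_cores)

print(round(amdahl_speedup(0.9, 10), 2))        # 5.26 with 10 cores
print(round(amdahl_speedup(0.9, 10**6), 2))     # ~10.0: the serial 10% caps speedup
```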
In shared-memory multiprocessors, cache coherence is needed because
A RAM never changes
B Caches may hold stale data
C CPU has no registers
D Bus has no control lines
Each core may cache the same memory location. If one core updates it, others must see the new value. Coherence protocols ensure consistent views and prevent reading outdated cached data.
The main purpose of an instruction set (ISA) is to define
A Physical motherboard design
B Screen resolution rules
C Software-visible behavior
D Disk partition format
ISA defines instructions, registers, addressing modes, and data formats available to software. Compilers target the ISA, and hardware must implement it for programs to run correctly.
A typical RISC advantage for pipelining is that RISC often uses
A Fixed-length formats
B Variable-length formats
C No load/store
D No branch instructions
Fixed-length instructions make fetch and decode more regular and predictable, simplifying pipeline timing. Variable-length instructions can complicate decoding and alignment, making pipelining harder.
In classic RISC, memory is usually accessed using
A Any instruction type
B ALU instructions only
C Load/store only
D Branch instructions only
RISC load/store architecture restricts memory access to dedicated load and store instructions. ALU operations typically use registers, simplifying datapath and helping pipeline efficiency.
In cache terms, miss penalty mainly depends on
A Printer speed
B Lower-level access time
C Monitor refresh rate
D Keyboard latency
Miss penalty is the extra time to fetch data from the next lower memory level, such as RAM or disk. Higher latency or slower bandwidth in lower levels increases the miss penalty.
Increasing block size can improve hit rate due to
A Spatial locality
B Temporal locality
C Disk locality
D Printer locality
Larger blocks bring nearby bytes into cache. If the program soon accesses neighboring addresses, it benefits from spatial locality and hits in the fetched block, improving hit rate up to a point.
Too large a cache block may hurt performance mainly by causing
A More tag bits
B Faster RAM access
C More cache pollution
D Less miss penalty
Large blocks may load many unused bytes, wasting bandwidth and evicting useful data. This cache pollution reduces effective capacity and can increase misses for workloads with weak spatial locality.
In cache mapping, conflict misses are most common in
A Fully associative
B Direct mapped
C Larger RAM
D Write-through only
Direct-mapped cache forces each block to one cache line. Different blocks mapping to the same line conflict and replace each other, causing conflict misses that associative designs reduce.
In set-associative caches, LRU replacement tends to work well mainly due to
A Spatial locality only
B Disk caching
C CPU word length
D Temporal locality
Temporal locality means recently used blocks are likely to be used again soon. LRU keeps recently used blocks in cache and evicts the least recently used, often improving hit rates.
In a byte-addressable system, the lowest address bits are used for
A Tag selection
B Set selection
C Block offset
D Opcode decode
Low-order bits choose the byte within a cache block or within a word. Higher bits are used as index (set) and tag. Offset bits depend on block size.
A non-maskable interrupt is typically used for
A Critical hardware faults
B Normal timer ticks
C Keyboard typing
D Printer ready
NMI is reserved for urgent events like power failure warnings or severe hardware errors. It cannot be ignored or disabled by normal masking, ensuring immediate CPU attention.
In interrupt handling, saving PC and flags is required mainly to
A Speed up DMA
B Resume correctly later
C Increase cache hits
D Change word length
PC and flags record where execution stopped and the CPU state. After servicing the interrupt routine, restoring these values allows the CPU to return to the exact instruction flow safely.
A microcontroller differs from a microprocessor mainly because microcontrollers usually
A Have no control unit
B Use no ROM
C Include on-chip peripherals
D Use no ALU
Microcontrollers integrate CPU, RAM/ROM, and I/O peripherals on one chip. This supports embedded control tasks with compact design and low cost, unlike microprocessors needing external components.
SMT improves utilization mainly because it
A Fills idle execution slots
B Doubles cache always
C Removes branch hazards
D Lowers RAM latency
Simultaneous multithreading lets multiple threads share a core’s execution resources. When one thread stalls for memory, another can use idle units, improving throughput and overall resource utilization.
A “cache hit” means the data is found in
A Disk storage
B Page file only
C Cache memory
D ROM firmware
A cache hit indicates the requested block is already in cache, so it is returned quickly. Hits reduce average access time and help avoid stalls that happen on cache misses.
A “TLB hit” means the required translation is found in
A Hard disk
B TLB cache
C ALU register
D Control bus
A TLB hit means the virtual-to-physical mapping is available in the TLB. This avoids page table lookup, reducing translation overhead and improving effective memory access speed.
If both TLB miss and page fault happen, the main reason for large delay is
A Extra ALU steps
B Larger bus width
C More registers used
D Disk read required
A page fault requires fetching the missing page from secondary storage into RAM. Disk/SSD access dominates the delay, making it far slower than TLB miss alone, which only requires page table lookup.
In a CPU, instruction throughput increases most directly when
A RAM decreases
B Disk increases
C CPI decreases
D ROM decreases
With fixed clock rate, throughput in instructions per second increases when cycles per instruction (CPI) decreases. Lower CPI means more instructions complete per unit time, improving overall performance.
If clock speed increases but CPI increases equally, the instruction throughput
A Increases a lot
B Stays roughly same
C Drops to zero
D Doubles always
Instruction throughput is proportional to clock rate divided by CPI. If both increase proportionally, their ratio stays similar. Real performance may still vary due to memory effects and workload behavior.
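A minimal sketch of the ratio argument, with hypothetical clock rates chosen for illustration:

```python
# Throughput (instructions/second) = clock rate / CPI.
# Scaling both by the same factor leaves throughput unchanged.
def throughput_mips(clock_hz, cpi):
    return clock_hz / cpi / 1e6  # millions of instructions per second

base = throughput_mips(2e9, 2.0)     # 2 GHz at CPI 2.0
faster = throughput_mips(3e9, 3.0)   # clock and CPI both scaled by 1.5
print(base, faster, base == faster)  # 1000.0 1000.0 True
```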
In DMA operations, which component temporarily becomes bus master to control transfers
A CPU control unit
B Output unit
C Cache memory
D DMA controller
During DMA, the DMA controller takes control of the bus to move data between memory and the device. This “bus mastering” reduces CPU involvement in data transfers.
In caches, “write allocate” means on a write miss the cache
A Loads block into cache
B Ignores the write
C Writes only to disk
D Flushes entire cache
Write allocate fetches the missed block into cache, then performs the write in cache. This works well with write-back policies, improving locality for future writes to the same block.
A common pairing is “write-back” with
A No allocation
B Read through
C Write allocate
D Disk paging
Write-back often pairs with write allocate: on a write miss, the block is brought into cache and modified, marked dirty, and written to memory later on eviction. This reduces repeated memory writes.
Which cache miss type happens when the cache is too small for the working set
A Compulsory miss
B Capacity miss
C Conflict miss
D TLB miss
Capacity misses occur when the cache cannot hold all needed blocks, even with full associativity. The working set exceeds cache size, causing evictions and later re-fetching from lower memory levels.
The first access to a block causing a miss is called
A Conflict miss
B Capacity miss
C Compulsory miss
D Dirty miss
Compulsory (cold) misses happen the first time a block is accessed because it has never been in cache before. They occur even with infinite cache and are reduced by prefetching.
In a multi-level cache, L1 is usually smaller than L2 mainly because L1 must be
A Non-volatile
B Very high capacity
C Located on disk
D Extremely low latency
L1 cache is closest to the core and must be very fast. To keep latency minimal, it is kept small. Larger caches like L2 trade slightly higher latency for more capacity.
A CPU core stalling frequently on memory suggests improvement from
A Smaller cache
B Better cache hierarchy
C Lower address width
D Fewer registers
Frequent stalls often come from cache misses and high memory latency. Improving cache size, associativity, or levels can increase hit rate and reduce stalls, improving effective IPC and throughput.
A classic advantage of microprogrammed control is that it
A Removes control unit
B Eliminates memory access
C Simplifies adding instructions
D Eliminates registers
Microprogrammed control stores control sequences as microinstructions. This makes it easier to implement complex instructions and modify instruction behavior, though it can be slower than hardwired control in some designs.
Hardwired control is often faster than microprogrammed control because it
A Uses fixed logic gates
B Uses disk storage
C Uses larger RAM
D Uses more paging
Hardwired control generates control signals through fixed combinational logic, avoiding microinstruction fetch. This can speed up control signal generation, though it is less flexible than microprogrammed design.
The best statement about “hard” CPU performance questions is that real speed depends on
A Clock only
B Many architecture factors
C MIPS only
D Cache only
Real performance depends on clock rate, CPI/IPC, cache hierarchy, memory latency, branch prediction, pipeline hazards, and workload nature. No single metric alone predicts performance accurately across different systems.