Chapter 4: Computer Organization and Architecture (Set-9)
A CPU has a 32-bit address bus and is byte-addressable; ignoring reserved regions, the theoretical maximum directly addressable memory space is
A B) 2 GB
B C) 8 GB
C D) 16 GB
D A) 4 GB
With 32 address bits, the CPU can form 2^32 unique addresses. If each address refers to 1 byte, total directly addressable space is 2^32 bytes = 4,294,967,296 bytes = 4 GB.
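The address-space arithmetic above can be checked in a few lines of Python (a minimal sketch using only the figures from the question):

```python
# Byte-addressable memory: each of the 2^32 addresses names exactly one byte.
ADDRESS_BITS = 32

addressable_bytes = 2 ** ADDRESS_BITS   # number of unique addresses x 1 byte each
gib = addressable_bytes // (1024 ** 3)  # convert bytes to GB (binary gigabytes)

print(addressable_bytes)  # 4294967296
print(gib)                # 4
```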
If a system uses word-addressable memory where each address refers to a 4-byte word, then a 16-bit address bus can directly address how much memory
A B) 256 KB
B A) 64 KB
C C) 1 MB
D D) 4 MB
A 16-bit address bus provides 2^16 = 65,536 addresses. If each address is a 4-byte word, total bytes = 65,536 × 4 = 262,144 bytes, which equals 256 KB.
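The word-addressable case differs only by the per-address multiplier, as this short sketch shows (constants taken from the question):

```python
# Word-addressable memory: each address names a whole 4-byte word, not a byte.
ADDRESS_BITS = 16
WORD_BYTES = 4  # bytes reachable per address

total_bytes = (2 ** ADDRESS_BITS) * WORD_BYTES
print(total_bytes)                 # 262144
print(total_bytes // 1024, "KB")   # 256 KB
```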
In a 4 KB direct-mapped cache with 16-byte blocks, the number of cache lines is
A A) 64 lines
B B) 128 lines
C C) 256 lines
D D) 512 lines
Cache size is 4 KB = 4096 bytes. Block size is 16 bytes. Number of lines = 4096 / 16 = 256 lines. Each line holds exactly one block in direct mapping.
For the same 4 KB cache with 16-byte blocks, the block offset requires how many bits
A A) 2 bits
B C) 8 bits
C D) 16 bits
D B) 4 bits
Block size is 16 bytes, and 16 = 2^4. Therefore, 4 bits are needed to select a byte within the block. These are the block offset bits in the address.
In that 4 KB direct-mapped cache, the index field needs how many bits
A C) 8 bits
B A) 4 bits
C B) 6 bits
D D) 12 bits
There are 256 cache lines, and 256 = 2^8. Therefore, 8 bits are needed as index bits to select the cache line. The remaining bits become the tag (for a given address size).
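The three preceding questions share one address breakdown, which can be derived together in Python; the 32-bit address width is an assumption for illustration, since the cache questions do not fix it:

```python
# Direct-mapped cache geometry for the 4 KB cache with 16-byte blocks.
CACHE_BYTES = 4 * 1024
BLOCK_BYTES = 16
ADDRESS_BITS = 32  # assumed address width (not stated in the questions)

lines = CACHE_BYTES // BLOCK_BYTES           # one block per line in direct mapping
offset_bits = BLOCK_BYTES.bit_length() - 1   # log2(16) = 4: byte within the block
index_bits = lines.bit_length() - 1          # log2(256) = 8: which line
tag_bits = ADDRESS_BITS - index_bits - offset_bits  # whatever is left identifies the block

print(lines, offset_bits, index_bits, tag_bits)  # 256 4 8 20
```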
In a 4-way set-associative cache of 4 KB with 16-byte blocks, the number of sets is
A A) 16 sets
B C) 64 sets
C B) 32 sets
D D) 256 sets
Total lines = 4096/16 = 256 lines. With 4-way associativity, sets = 256/4 = 64 sets. Each set contains 4 lines where the block may be placed.
In that 4-way set-associative cache, the set index needs how many bits
A A) 4 bits
B C) 8 bits
C D) 10 bits
D B) 6 bits
There are 64 sets, and 64 = 2^6. So 6 bits are needed to select the set. The offset remains 4 bits; the remaining upper bits are used as the tag.
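The set-associative variant of the same cache follows by dividing lines among sets, as this sketch (constants from the questions) illustrates:

```python
# 4-way set-associative version of the same 4 KB / 16-byte-block cache.
CACHE_BYTES = 4 * 1024
BLOCK_BYTES = 16
WAYS = 4  # lines per set

lines = CACHE_BYTES // BLOCK_BYTES           # 256 lines total, as before
sets = lines // WAYS                         # lines grouped 4 per set
set_index_bits = sets.bit_length() - 1       # log2(64) = 6: which set
offset_bits = BLOCK_BYTES.bit_length() - 1   # log2(16) = 4: byte within block

print(sets, set_index_bits, offset_bits)  # 64 6 4
```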
Which cache organization requires comparing the tag of a memory block with tags in all cache lines during lookup
A B) Fully associative
B A) Direct mapped
C C) Set associative
D D) Write back
Fully associative cache allows a block to be placed anywhere, so the cache must compare the requested tag against all line tags in parallel (or via search hardware), increasing complexity and cost.
A key reason fully associative caches are uncommon for large caches is that they
A A) Need no tags
B C) Increase disk space
C B) Require many comparators
D D) Reduce bus width
Fully associative caches need tag comparisons against many lines, requiring more comparators and hardware. This increases cost, power, and complexity, so set-associative designs are preferred for larger caches.
In virtual memory, a page fault occurs when
A A) Cache hit happens
B C) Disk becomes full
C D) CPU changes opcode
D B) Page not in RAM
A page fault happens when the requested page is not currently in physical memory (RAM). The OS must load it from secondary storage into RAM, which is much slower than a normal memory access.
A major performance cost of a page fault is that it involves
A A) Register transfer only
B B) Disk access delay
C C) Faster cache lookup
D D) Higher clock speed
Handling a page fault requires reading a page from disk or SSD into RAM. Secondary storage access is far slower than RAM, so page faults can cause large execution delays and CPU waiting.
Which concept best explains why caches improve average access time without increasing RAM speed
A B) Monitor resolution
B C) Disk formatting
C D) Network routing
D A) Temporal locality
Temporal locality means recently used data is likely to be used again soon. Cache keeps such data close to the CPU, so repeated accesses hit in cache, reducing average access time.
Spatial locality means that when a memory location is accessed, nearby locations are likely to be
A B) Accessed soon
B A) Never accessed
C C) Stored in ROM
D D) Sent to printer
Spatial locality indicates that programs tend to access memory in nearby addresses, like arrays. Cache fetches blocks containing neighboring bytes, so subsequent accesses often hit in cache.
In a pipeline, a control hazard is most closely related to
A A) Data dependency
B C) Cache replacement
C B) Branch instruction
D D) Disk scheduling
Control hazards arise when the pipeline fetches instructions before knowing the outcome of a branch. If the branch changes flow, the pipeline may need flushing or stalling until the correct path is known.
A typical method to reduce control hazard penalty is
A A) Lower bus width
B C) Remove cache
C D) Reduce word length
D B) Branch prediction
Branch prediction guesses whether a branch will be taken and what the next PC will be. Correct prediction reduces stalls and pipeline flushes, improving throughput in pipelined processors.
A pipeline data hazard can be reduced using forwarding because forwarding
A A) Deletes instructions
B B) Sends results early
C C) Slows down ALU
D D) Removes registers
Forwarding (bypassing) sends an ALU result directly from a pipeline stage to a later stage needing it, without waiting for register write-back. This reduces stalls caused by dependent instructions.
If instruction latency stays the same but throughput increases, the most likely technique used is
A A) Pipelining
B B) Disk compression
C C) File encryption
D D) Screen scaling
Pipelining overlaps instruction stages so more instructions complete per unit time. Individual instruction latency may not reduce, but throughput increases because the pipeline keeps hardware stages busy.
Which metric better reflects performance for continuous workloads with many tasks
A A) Latency
B C) Address width
C D) Word size
D B) Throughput
Throughput measures how much work completes per unit time, such as transactions per second. For continuous workloads like servers or batch processing, throughput is often more important than single-request latency.
Which metric is more critical for a single quick response, like reading one memory value once
A A) Throughput
B C) Cache size
C B) Latency
D D) MIPS
Latency measures the delay before a response begins. For one-off operations, lower latency makes the response faster, whereas throughput matters more when many operations must be processed continuously.
A CPU with the same clock speed but a better cache hit rate often performs faster mainly because it has
A B) Fewer memory waits
B A) More stalls
C C) Less register use
D D) Smaller bus width
Cache hits serve data quickly. With fewer cache misses, the CPU spends less time stalled waiting for slow RAM or disk, increasing effective IPC and reducing total execution time.
When comparing two CPUs using MIPS, a major limitation is that MIPS
A A) Measures cache size
B C) Equals FLOPS always
C D) Measures bus width
D B) Ignores instruction mix
MIPS counts instructions per second but does not reflect how much work each instruction performs. Different instruction sets and program mixes lead to misleading comparisons across different architectures.
In CISC vs RISC, a key RISC feature that helps pipelining is often
A A) Variable-length instructions
B B) Fixed-length instructions
C C) No registers available
D D) No branch instructions
Many RISC ISAs use fixed-length instructions, simplifying fetch and decode stages. This regularity helps pipeline design and timing, improving throughput compared to variable-length instructions common in CISC designs.
In cache write policies, “write-through” means
A B) Write cache and memory
B A) Write only in cache
C C) Write only to disk
D D) Never update memory
Write-through updates both cache and main memory on each write. This keeps memory consistent but increases memory traffic. Write-back reduces traffic by writing to memory only when evicting a block.
In cache write policies, write-back typically reduces memory traffic because it
A A) Writes every time
B C) Disables cache hits
C D) Doubles bus width
D B) Writes on eviction
Write-back updates cache only during writes and marks blocks dirty. Main memory is updated only when the dirty block is evicted, reducing memory writes and bus usage compared to write-through.
In a write-back cache, a “dirty bit” indicates that a cache block
A A) Is empty now
B C) Has wrong tag
C B) Differs from memory
D D) Has parity error
The dirty bit shows that the cache block has been modified since it was loaded. It must be written back to main memory before eviction, ensuring memory eventually reflects the updated data.
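The dirty-bit mechanism can be illustrated with a toy one-line write-back cache. This is a hypothetical sketch, not any real hardware design: memory only sees a write when a dirty line is evicted.

```python
# Toy one-line write-back cache: main memory is updated only on eviction of a dirty line.
memory = {0x10: 1, 0x20: 2}
line = {"tag": None, "data": None, "dirty": False}

def write(addr, value):
    global line
    if line["tag"] != addr:        # miss: must evict the current line first
        if line["dirty"]:          # dirty bit set -> write the old data back
            memory[line["tag"]] = line["data"]
        line = {"tag": addr, "data": memory[addr], "dirty": False}
    line["data"] = value
    line["dirty"] = True           # modified since load; memory copy is now stale

write(0x10, 99)
print(memory[0x10])  # 1 -- memory not yet updated (write stayed in cache)
write(0x20, 42)      # evicts the dirty line for 0x10, writing it back
print(memory[0x10])  # 99 -- write-back happened on eviction
```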
A common replacement policy used in set-associative caches to reduce misses is
A B) LRU
B A) FIFO
C C) ASCII
D D) JPEG
LRU (Least Recently Used) replaces the block that has not been used for the longest time. It works well with temporal locality and often improves hit rate compared to simpler policies.
If a cache line size is increased too much, one possible negative effect is
A A) More spatial locality
B C) More address bits
C D) Faster disk access
D B) More wasted bandwidth
Very large blocks can fetch many unused bytes, increasing memory traffic and cache pollution. This can waste bandwidth and reduce effective cache capacity, especially for programs with poor spatial locality.
In a byte-addressable system, the lowest address bits are commonly used as
A A) Tag bits
B C) Set bits
C B) Offset bits
D D) Opcode bits
Low-order address bits select the byte within a block or word. These are offset bits used by cache and memory systems to locate the exact byte position after a block is selected.
In an interrupt, which CPU action helps ensure correct return to the interrupted program
A A) Clears cache
B B) Saves PC and flags
C C) Changes clock speed
D D) Flushes disk buffers
The CPU saves the Program Counter and status flags (and sometimes registers) so it can resume exactly where it left off. After the interrupt service routine, saved state is restored.
A non-maskable interrupt (NMI) is typically used for
A A) Normal keyboard input
B C) Screen refresh control
C D) Printer status check
D B) Critical hardware events
NMI cannot be ignored and is used for urgent conditions like hardware failures, power loss warnings, or severe errors. It ensures the CPU responds immediately to critical events.
In DMA, the CPU is freed because the DMA controller handles
A B) Data transfer cycles
B A) Instruction decode
C C) Cache replacement
D D) Register renaming
The DMA controller manages transferring data between I/O device and memory, including address generation and bus control. The CPU only sets up DMA and is interrupted when transfer finishes.
DMA is most beneficial for transferring
A A) Single byte input
B C) One instruction only
C B) Large data blocks
D D) Small register values
DMA setup overhead is worthwhile when moving large blocks like disk buffers or network packets. For tiny transfers, CPU-driven I/O may be simpler, but for large data DMA boosts performance.
In multiprocessor systems, “shared memory” means processors
A B) Access common RAM
B A) Use separate RAM
C C) Share only cache
D D) Share only registers
Shared-memory multiprocessors allow all processors to access the same main memory space. This simplifies communication using shared variables, but requires synchronization to avoid data conflicts.
A major challenge in shared-memory multiprocessors is ensuring
A A) Screen brightness
B C) Printer alignment
C D) Disk formatting
D B) Cache coherence
Cache coherence ensures all processors see consistent values when each has its own cache. Without coherence protocols, one CPU may read stale data while another has updated it in its cache.
In general, increasing the number of CPU cores improves performance most when
A A) Program is single-thread
B B) Program is parallel
C C) Disk is full
D D) Cache is disabled
Only parallel workloads can use multiple cores effectively. If software is single-threaded, extra cores may remain idle, so performance gains depend on how well tasks can be split into threads.
Which best describes why RISC may need more instructions for the same task compared to CISC
A A) No memory access
B C) No control unit
C D) No addressing modes
D B) Simpler instructions
RISC uses simpler instructions that do less per instruction. Complex operations are built by combining multiple simple instructions, which can increase instruction count but often improves pipelining and execution efficiency.
In the instruction cycle, the “effective address” is typically finalized during
A C) Execute stage
B A) Fetch stage
C B) Decode stage
D D) Store stage
For many instructions, address calculation happens during execution using the ALU and addressing-mode logic. The effective address is then used to access memory for the operand read/write as part of execution.
For an indexed addressing mode, the effective address is commonly formed by
A A) Tag plus index
B C) Opcode plus flags
C D) Offset minus cache
D B) Base plus index
Indexed addressing adds an index value (often from a register) to a base address to compute the effective address. This is useful for arrays and table lookups, supporting efficient address calculations.
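The base-plus-index calculation can be made concrete with a small sketch; the base address, element size, and index here are illustrative assumptions, not values from the question:

```python
# Indexed addressing for an array access: EA = base + index * element_size.
base = 0x1000     # assumed start address of the array
element_size = 4  # assumed 4-byte elements
i = 5             # index, e.g. from an index register

effective_address = base + i * element_size
print(hex(effective_address))  # 0x1014
```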
In a pipeline, structural hazards occur when
A A) Branch taken
B B) Hardware resource conflicts
C C) Data dependency exists
D D) Cache miss happens
Structural hazards arise when multiple pipeline stages need the same hardware resource at the same time, like a shared memory port. This forces stalls unless resources are duplicated or scheduled.
A key benefit of Harvard architecture over von Neumann in basic design is
A B) Separate code and data
B A) No registers needed
C C) No interrupts possible
D D) Only one bus used
Harvard architecture uses separate memory paths for instructions and data, reducing contention and allowing simultaneous instruction fetch and data access. Von Neumann shares one path, which can create bottlenecks.
The von Neumann bottleneck mainly refers to the limitation caused by
A A) Small keyboard
B C) Large cache size
C B) Shared bus for code/data
D D) Many registers
In von Neumann architecture, instructions and data share the same memory and bus path. This can limit throughput because instruction fetch and data access compete for the same bandwidth.
In CPU performance, “CPI” refers to
A B) Cache per index
B C) Cores per interrupt
C D) Clocks per input
D A) Cycles per instruction
CPI indicates how many clock cycles an average instruction takes. Lower CPI means fewer cycles per instruction, often improving performance if clock speed and other factors remain similar.
If CPI decreases while clock speed stays the same, the CPU’s performance generally
A B) Increases
B A) Decreases
C C) Stays constant always
D D) Becomes slower always
Performance improves when CPI decreases because fewer cycles are needed per instruction. With the same clock rate, more instructions can complete per second, reducing total execution time for programs.
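The effect of CPI on run time follows from the standard relation time = instruction_count × CPI / clock_rate; the instruction count and clock rate below are assumed for illustration:

```python
# Execution time = instruction_count * CPI / clock_rate.
instructions = 1_000_000
clock_hz = 1_000_000_000  # assumed 1 GHz clock

time_cpi_2 = instructions * 2.0 / clock_hz  # 0.002 s at CPI = 2
time_cpi_1 = instructions * 1.0 / clock_hz  # 0.001 s at CPI = 1

print(time_cpi_2, time_cpi_1)  # halving CPI halves execution time
```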
In cache, a “hit” means the requested data
A A) Is on disk
B B) Found in cache
C C) Not in memory
D D) Causes page fault
A cache hit means requested data or instruction is already in cache, so it can be served quickly. This reduces average access time compared to a miss that requires slower RAM access.
In cache, a “miss penalty” is mainly the time to
A A) Print the result
B C) Increase bus width
C D) Change instruction set
D B) Fetch from lower level
Miss penalty is the extra time needed to bring data from a lower memory level like RAM or disk into cache. This penalty causes CPU stalls and reduces effective performance.
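How the miss penalty feeds into average access time is captured by the usual AMAT formula; the cycle counts below are assumptions chosen for illustration, not figures from the question:

```python
# Average memory access time: AMAT = hit_time + miss_rate * miss_penalty.
hit_time = 1        # assumed cache hit latency, in cycles
miss_penalty = 100  # assumed cycles to fetch from the lower level

amat_95 = hit_time + 0.05 * miss_penalty  # 95% hit rate
amat_99 = hit_time + 0.01 * miss_penalty  # 99% hit rate

print(amat_95, amat_99)  # a small drop in miss rate cuts AMAT sharply
```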
A common reason L1 cache is smaller than L2 cache is that L1 must be
A A) Very large capacity
B C) Always non-volatile
C B) Extremely fast
D D) Stored on disk
L1 cache is closest to the CPU core and must have very low latency. To keep it extremely fast, it is kept small. L2 is larger but slightly slower, balancing speed and capacity.
In multicore CPUs, “cache coherence” protocols are needed because
A B) Caches may hold differing data
B A) Cores share keyboard
C C) RAM never changes
D D) ALU becomes slow
Each core may cache the same memory location. If one core updates it, others must see the updated value. Coherence protocols ensure caches remain consistent and avoid reading stale data.
A CPU that supports simultaneous multithreading (SMT) improves utilization mainly by
A A) Adding more RAM
B C) Removing cache memory
C D) Slowing clock speed
D B) Sharing core resources
SMT allows multiple threads to share execution resources within a core. When one thread stalls for memory, another thread can use idle resources, improving overall throughput and core utilization.
In performance analysis, Amdahl’s law mainly highlights that speedup is limited by
A A) Fastest component only
B B) Serial portion of work
C C) Cache size always
D D) Disk speed only
Amdahl’s law shows overall speedup from parallelism is limited by the fraction that cannot be parallelized. Even with many cores, a serial part of the program sets an upper bound on speedup.
If a program is 80% parallelizable, the maximum theoretical speedup as cores approach infinity is
A B) 5×
B A) 1.25×
C C) 10×
D D) 80×
Amdahl’s law gives the maximum speedup as 1/(1−P). With P = 0.8, the serial fraction is 0.2, so the maximum speedup is 1/0.2 = 5×. Infinite cores cannot exceed this limit.
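The full Amdahl's-law curve, not just its limit, can be sketched in a few lines (P = 0.8 comes from the question; the core counts are illustrative):

```python
# Amdahl's law: speedup(n) = 1 / ((1 - P) + P / n); the limit as n -> inf is 1 / (1 - P).
P = 0.8  # parallelizable fraction

def speedup(n):
    return 1 / ((1 - P) + P / n)

print(round(speedup(4), 2))      # 2.5 with 4 cores
print(round(speedup(1000), 2))   # 4.98 -- already close to the bound
print(round(1 / (1 - P), 2))     # 5.0, the hard upper limit
```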