5.1 In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within the same row are stored contiguously. Assume each word is a 32-bit integer.
for (I = 0; I < 8; I++)
    for (J = 0; J < 8000; J++)
        A[I][J] = B[I][0] + A[J][I];
5.1.1 [5] How many 32-bit integers can be stored in a 16-byte cache block?
5.1.2 [5] References to which variables exhibit temporal locality?
5.1.3 [5] References to which variables exhibit spatial locality?
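The layout reasoning behind 5.1.1 to 5.1.3 can be sketched in C. This is a minimal illustration, not the textbook's solution: the helper names are hypothetical, and the column count `ncols` is an assumption, since the exercise never declares the array dimensions.

```c
#include <stddef.h>

/* Number of whole words that fit in one cache block. */
size_t words_per_block(size_t block_bytes, size_t word_bytes)
{
    return block_bytes / word_bytes;
}

/* Word offset of element [i][j] in a row-major matrix with ncols columns
   (ncols is an assumed parameter, not given by the exercise). */
size_t row_major_offset(size_t i, size_t j, size_t ncols)
{
    return i * ncols + j;
}

/* Distance in words between A[I][J] and A[I][J+1] (inner loop step). */
size_t stride_A_I_J(size_t ncols)
{
    return row_major_offset(0, 1, ncols) - row_major_offset(0, 0, ncols);
}

/* Distance in words between A[J][I] and A[J+1][I] (inner loop step). */
size_t stride_A_J_I(size_t ncols)
{
    return row_major_offset(1, 0, ncols) - row_major_offset(0, 0, ncols);
}
```

With 16-byte blocks and 4-byte words, `words_per_block(16, 4)` gives 4. A stride of 1 word means consecutive inner-loop accesses fall in the same or the next block, while a stride of a full row does not.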
Locality is affected by both the reference order and data layout. The same computation can also be written below in Matlab, which differs from C by storing matrix elements within the same column contiguously in memory.
for I = 1:8
    for J = 1:8000
        A(I,J) = B(I,0) + A(J,I);
    end
end
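To compare with the row-major case, the column-major addressing that Matlab uses can be sketched in C. The helper names and the row count `nrows` are assumptions made for illustration; the exercise does not state the matrix dimensions.

```c
#include <stddef.h>

/* Word offset of the 1-based element (i, j) in a column-major matrix
   with nrows rows (nrows is an assumed parameter). */
size_t col_major_offset(size_t i, size_t j, size_t nrows)
{
    return (j - 1) * nrows + (i - 1);
}

/* Distance in words between A(I,J) and A(I,J+1) (inner loop step). */
size_t stride_A_I_J_colmajor(size_t nrows)
{
    return col_major_offset(1, 2, nrows) - col_major_offset(1, 1, nrows);
}

/* Distance in words between A(J,I) and A(J+1,I) (inner loop step). */
size_t stride_A_J_I_colmajor(size_t nrows)
{
    return col_major_offset(2, 1, nrows) - col_major_offset(1, 1, nrows);
}
```

Under this layout the strides swap relative to C: stepping the row subscript moves one word, and stepping the column subscript moves a whole column.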
5.1.4 [5] How many 16-byte cache blocks are needed to store all 32-bit matrix elements being referenced?
5.1.5 [5] References to which variables exhibit temporal locality?
5.1.6 [5] References to which variables exhibit spatial locality? - ANSWER 5.1.1 [5] How many 32-bit integers can be stored in a 16-byte cache block?
https://drive.google.com/file/d/1LhsMKJsc48EbXZqL7HsSIbF23FRAkfQR/view?usp=sharing
5.1.2 [5] References to which variables exhibit temporal locality?
5.1.4 [5] How many 16-byte cache blocks are needed to store all 32-bit matrix elements being referenced?
https://docs.google.com/document/d/1jx4qQAGnxk_OoQo7sunqA3q52AUvcMSEIy4OLyBIKFE/edit?usp=sharing
5.1.5 [5] References to which variables exhibit temporal locality?
Locality Quiz - Georgia Tech - HPCA: Part 3 - ANSWER https://www.youtube.com/watch?v=z9LHetPW0Vs&list=PLn4mZps3Wx0_6thXcBr99-Y4n49t_lIb0&index=2&t=0s
5.2 Caches are important to providing a high-performance memory hierarchy to processors. Below is a list of 32-bit memory address references, given as word addresses.
3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253
5.2.1 [10] For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.
5.2.2 [10] For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with two-word blocks and a total size of 8 blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty. - ANSWER 5.2.1 [10] For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.
https://drive.google.com/file/d/1584dgRGktv0oeJysRWcdYaJwcx37RepX/view?usp=sharing
5.2.2 [10] For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with two-word blocks and a total size of 8 blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.
https://drive.google.com/file/d/15Th2yjeNU_zyNCQva_dsk35arcHaR0lJ/view?usp=sharing
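The tag and index columns asked for in 5.2.1 and 5.2.2 follow from integer division and remainder on the word address. The sketch below is one way to write that split; the struct and function names are hypothetical and are not taken from the linked answers.

```c
#include <stdint.h>

/* Fields of a word address in a direct-mapped cache. */
struct cache_fields {
    uint32_t tag;     /* identifies which memory block occupies the entry */
    uint32_t index;   /* selects the cache entry */
    uint32_t offset;  /* selects the word inside the block */
};

/* Split a word address. words_per_block and num_blocks must be non-zero. */
struct cache_fields split_word_address(uint32_t word_addr,
                                       uint32_t words_per_block,
                                       uint32_t num_blocks)
{
    struct cache_fields f;
    uint32_t block_addr = word_addr / words_per_block;

    f.offset = word_addr % words_per_block;
    f.index  = block_addr % num_blocks;
    f.tag    = block_addr / num_blocks;
    return f;
}
```

For example, word address 180 with 16 one-word blocks splits into tag 11 and index 4; word address 181 with 8 two-word blocks splits into tag 11, index 2, and word offset 1.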
5.2.3 You are asked to optimize a cache design for the given references. There are three direct-mapped cache designs possible, all with a total of 8 words of data: C1 has 1-word blocks, C2 has 2-word blocks, and C3 has 4-word blocks. In terms of miss rate, which cache design is the best? If the miss stall time is 25 cycles, and C1 has an access time of 2 cycles, C2 takes 3 cycles, and C3 takes 5 cycles, which is the best cache design? - ANSWER
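One way to compare C1, C2, and C3 is to replay the reference list through a small direct-mapped cache model and count misses. The sketch below assumes the cache starts empty and uses a simple cost model (misses times stall time plus accesses times access time); both the cost model and the function names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 64   /* illustrative upper bound, enough for these small caches */

/* Count misses for a direct-mapped cache that starts empty.
   Returns -1 if the geometry is unsupported. */
int count_misses(const uint32_t *word_addrs, size_t n,
                 uint32_t words_per_block, uint32_t num_blocks)
{
    uint32_t tags[MAX_BLOCKS] = {0};
    int valid[MAX_BLOCKS] = {0};
    int misses = 0;

    if (words_per_block == 0 || num_blocks == 0 || num_blocks > MAX_BLOCKS)
        return -1;

    for (size_t k = 0; k < n; k++) {
        uint32_t block = word_addrs[k] / words_per_block;
        uint32_t index = block % num_blocks;
        uint32_t tag   = block / num_blocks;

        if (!valid[index] || tags[index] != tag) {
            /* Miss: load the block, replacing whatever was there. */
            valid[index] = 1;
            tags[index] = tag;
            misses++;
        }
    }
    return misses;
}

/* Assumed cost model: every access pays access_time, every miss adds miss_stall. */
long total_cycles(int misses, size_t accesses, int miss_stall, int access_time)
{
    return (long)misses * miss_stall + (long)accesses * access_time;
}
```

Under these assumptions the 12 references give 12, 10, and 11 misses for C1 (8 blocks of 1 word), C2 (4 blocks of 2 words), and C3 (2 blocks of 4 words), which works out to 324, 286, and 335 cycles respectively.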
5.2.4 [15] Calculate the total number of bits required for the cache listed above, assuming a 32-bit address. Given that total size, find the total size of the closest direct-mapped cache with 16-word blocks of equal size or greater. Explain why the second cache, despite its larger data size, might provide slower performance than the first cache.
5.2.5 [20] Generate a series of read requests that have a lower miss rate on a 2 KiB 2-way set associative cache than the cache listed above. Identify one possible solution that would make the cache listed have an equal or lower miss rate than the 2 KiB cache. Discuss the advantages and disadvantages of such a solution.
5.2.6 [15] The formula (Block address) modulo (Number of blocks in the cache) shows the typical method to index a direct-mapped cache. Assuming a 32-bit address and 1024 blocks in the cache, consider a different indexing function, specifically (Block address[31:27] XOR Block address[26:22]). Is it possible to use this to index a direct-mapped cache? If so, explain why and discuss any changes that might need to be made to the cache. If it is not possible, explain why. - ANSWER (take note)
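The two indexing functions in 5.2.6 can be written side by side to see what range of entries each one can select. This is a sketch with hypothetical function names; it only evaluates the functions and does not settle the design question.

```c
#include <stdint.h>

/* Proposed index: block address bits [31:27] XOR bits [26:22]. */
uint32_t xor_index(uint32_t block_addr)
{
    uint32_t hi = (block_addr >> 27) & 0x1Fu;  /* bits 31..27 */
    uint32_t lo = (block_addr >> 22) & 0x1Fu;  /* bits 26..22 */
    return hi ^ lo;
}

/* Conventional index: block address modulo the number of blocks. */
uint32_t modulo_index(uint32_t block_addr, uint32_t num_blocks)
{
    return block_addr % num_blocks;
}
```

The XOR of two 5-bit fields is itself a 5-bit value, so `xor_index` can produce only 32 distinct indices, and block addresses that differ only in bits 21..0 produce the same index. Both points are worth weighing against the 1024 entries the cache provides.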
5.3 For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache.
https://drive.google.com/file/d/1D6Og46lAEiIqEn4_HN4S-O8pqY2EZrKt/view?usp=sharing
5.3.1 [5] What is the cache block size (in words)?
5.3.2 [5] How many entries does the cache have?
5.3.3 [5] What is the ratio between total bits required for such a cache implementation over the data storage bits?
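The ratio in 5.3.3 depends on the tag, index, and offset widths, which are given only in the linked figure and are not reproduced here. The sketch below therefore takes the widths as parameters. It assumes one valid bit per entry and no other status bits; the function names are hypothetical.

```c
#include <stdint.h>

/* Data bits in one block: 2^offset_bits bytes, 8 bits per byte.
   offset_bits is the total byte-offset width within a block. */
uint64_t data_bits_per_block(unsigned offset_bits)
{
    return (uint64_t)8 << offset_bits;
}

/* Tag width is whatever remains of the address after index and offset. */
unsigned tag_bits(unsigned addr_bits, unsigned index_bits, unsigned offset_bits)
{
    return addr_bits - index_bits - offset_bits;
}

/* Total storage: 2^index_bits entries of (data + tag + 1 valid bit). */
uint64_t total_cache_bits(unsigned addr_bits, unsigned index_bits,
                          unsigned offset_bits)
{
    uint64_t per_entry = data_bits_per_block(offset_bits)
                       + tag_bits(addr_bits, index_bits, offset_bits)
                       + 1;                 /* assumed single valid bit */
    return per_entry << index_bits;
}

/* Ratio of total bits to data bits; the entry count cancels out. */
double total_to_data_ratio(unsigned addr_bits, unsigned index_bits,
                           unsigned offset_bits)
{
    uint64_t data = data_bits_per_block(offset_bits);
    uint64_t per_entry = data + tag_bits(addr_bits, index_bits, offset_bits) + 1;
    return (double)per_entry / (double)data;
}
```

As an illustration with made-up widths, a 32-bit address with 10 index bits and 4 offset bits gives 128 data bits and 18 tag bits per entry, so 147 bits per entry and a ratio of 147/128.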
Starting from power on, the following byte-addressed cache references are recorded.
https://drive.google.com/file/d/1z_n-aFvx5QP9y6GQ9ChI_EwC6fn_hcsD/view?usp=sharing
5.3.4 [10] How many blocks are replaced?
5.3.5 [10] What is the hit ratio?
5.3.6 [10] List the final state of the cache, with each valid entry represented as a record of <index, tag, data>. - ANSWER https://www.chegg.com/homework-help/questions-and-answers/53-direct-mapped-cache-design-32-bit-address-following-bits-address-used-access-cache-tag--q28324666
5.4 Recall that we have two write policies and write allocation policies, and their combinations can be implemented either in L1 or L2 cache. Assume the following choices for L1 and L2 caches:
https://drive.google.com/file/d/1PbmOvJtnFdIkraqltHpnepAodFdIun0c/view?usp=sharing
5.4.1 [5] Buffers are employed between different levels of memory hierarchy to reduce access latency. For this given configuration, list the possible buffers needed between L1 and L2 caches, as well as L2 cache and memory.
5.4.2 [20] Describe the procedure of handling an L1 write-miss, considering the component involved and the possibility of replacing a dirty block.
5.4.3 [20] For a multilevel exclusive cache configuration (a block can only reside in one of the L1 and L2 caches), describe the procedure of handling an L1 write-miss, considering the components involved and the possibility of replacing a dirty block.
Consider the following program and cache behaviors.
https://drive.google.com/file/d/15ostiXsIERCfNi6pDWFX35NoLg6ZAueq/view?usp=sharing
5.4.4 [5] For a write-through, write-allocate cache, what are the minimum read and write bandwidths (measured in bytes per cycle) needed to achieve a CPI of 2?
5.4.5 [5] For a write-back, write-allocate cache, assuming 30% of replaced data cache blocks are dirty, what are the minimal read and write bandwidths needed for a CPI of 2?
5.4.6 [5] What are the minimal bandwidths needed to achieve the performance of CPI = 1.5? - ANSWER https://www.chegg.com/homework-help/questions-and-answers/6-recall-two-write-policies-two-write-allocation-policies-combinations-implemented-either--q19793126
https://www.coursehero.com/file/22662573/Assignment-5/#/doc/qa
Cache Memory (Computer Architecture "A") - ANSWER https://www.youtube.com/watch?v=uK5ZpWRCT2s
Computer Structure - 4.2 Cache Memory - José Luis Abellán Miguel - ANSWER https://www.youtube.com/watch?v=AeJSk6Q9Tvg
4) Which block in the cache is replaced by memory block 29?
Cache configuration: 4-way set-associative cache with 8 one-word blocks
Replacement scheme: LRU
Sequence of previously accessed block addresses: 5, 13, 21, 13, 5
(Note: All memory block
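The LRU question above can be explored with a small model of one cache set. The note in the question is cut off, so the initial state of the set is not known here; the sketch assumes the set starts empty. The struct and function names are hypothetical, and block addresses are assumed to be non-negative so that -1 can mean "nothing evicted".

```c
#include <stddef.h>

#define WAYS 4   /* associativity stated in the question */

/* One cache set: way[0] is most recently used, way[count-1] least recently used. */
struct lru_set {
    long way[WAYS];
    size_t count;   /* number of valid ways */
};

/* Set selection: 8 blocks / 4 ways = 2 sets, so num_sets would be 2 here. */
long set_index(long block, long num_sets)
{
    return block % num_sets;
}

/* Access a block in the set. Returns the evicted block address,
   or -1 if nothing was evicted (a hit, or a free way was available). */
long lru_access(struct lru_set *s, long block)
{
    size_t pos = s->count;          /* position of the block, if present */
    long evicted = -1;

    for (size_t k = 0; k < s->count; k++) {
        if (s->way[k] == block) {
            pos = k;
            break;
        }
    }
    if (pos == s->count) {          /* miss */
        if (s->count == WAYS) {
            evicted = s->way[WAYS - 1];   /* least recently used */
            pos = WAYS - 1;
        } else {
            s->count++;             /* use a free way */
        }
    }
    /* Shift more recent entries down one slot and put the block in front. */
    for (size_t k = pos; k > 0; k--)
        s->way[k] = s->way[k - 1];
    s->way[0] = block;
    return evicted;
}
```

Blocks 5, 13, 21, and 29 are all odd, so with two sets they share the same set. Starting from an empty set, the sequence 5, 13, 21, 13, 5 leaves one way free and 21 as the least recently used block, so 29 would take the free way; if the set were already full, the block at the LRU position is the one that would be replaced.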