Prefetching occurs when a processor requests an instruction or data block from main memory before it is actually needed. Once the block comes back from memory, it is placed in a cache. When the instruction or data block is actually needed, it can be accessed much more quickly from the cache than if the processor had to request it from main memory. Prefetching thus hides memory access latency, making it a useful technique for addressing the memory wall.
Since programs are generally executed sequentially, performance is likely to be best when instructions are prefetched in program order. Alternatively, the prefetch may be part of a complex branch prediction algorithm, where the processor tries to anticipate the result of a calculation and fetch the right instructions in advance. In the case of dedicated hardware (like a Graphics Processing Unit) the prefetch can take advantage of the spatial coherence usually found in the texture mapping process. In this case, the prefetched data are not instructions, but texture elements (texels) that are candidates to be mapped on a polygon.
The first mainstream microprocessors to use some form of instruction prefetch were the Intel 8086 (with a six-byte prefetch queue) and the Motorola 68000 (with a four-byte queue). In recent years, many high-performance processors have adopted prefetching techniques.
Types of prefetching
Data or instruction prefetching
As the name implies, the prefetching can be performed for either data blocks or instruction blocks. Since data access patterns show less regularity than instruction patterns, accurate data prefetching is generally more challenging than instruction prefetching.
Hardware or software prefetching
Prefetching can be performed in either hardware or software. Hardware prefetchers use dedicated storage to record recent access patterns and issue prefetch requests based on the patterns they detect. Software prefetchers insert prefetch instructions into the program code based on knowledge of the program's control flow and data-access patterns.
Prefetch degree is the number of cache lines prefetched in each prefetching operation.
Prefetch distance indicates how far ahead of the demand access stream the data blocks are prefetched.
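The two parameters can be seen together in a single loop. The sketch below doubles an array's elements and, once per demand cache line, issues `DEGREE` line prefetches starting `DISTANCE` elements ahead. The line size, degree, and distance values are assumptions for illustration, not recommendations.

```c
/* Illustrative constants: a 64-byte line holding 16 ints is assumed. */
#define LINE 16              /* ints per cache line (assumed) */
#define DEGREE 2             /* prefetch degree: lines fetched per operation */
#define DISTANCE (8 * LINE)  /* prefetch distance: elements ahead of demand */

/* Double each element; at every demand-line boundary, prefetch DEGREE
   consecutive lines located DISTANCE elements ahead of the current access. */
void double_all(int *a, int n) {
    for (int i = 0; i < n; i++) {
        if (i % LINE == 0) {
            for (int d = 0; d < DEGREE; d++) {
                int ahead = i + DISTANCE + d * LINE;
                if (ahead < n)
                    __builtin_prefetch(&a[ahead], 1, 3); /* 1 = for write */
            }
        }
        a[i] *= 2;
    }
}
```

A larger degree or distance fetches more data earlier, which helps when memory latency is long but raises the risk of evicting blocks that are still needed.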
A prefetch operation is useful if the block it brings in eliminates a future cache miss, and useless if it does not. A prefetch operation is harmful if the block it brings in evicts a useful block, possibly increasing the number of cache misses; harmful prefetches lead to cache pollution. A prefetch operation is redundant if the block it brings in is already present in the cache.