References
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
All APIs for Intel intrinsics, with examples.
https://developer.arm.com/architectures/instruction-sets/intrinsics
All APIs for ARM intrinsics.
15-418/15-618: Parallel Computer Architecture and Programming, Spring 2018: Schedule
https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15418-s18/www/schedule.html
How to Write Fast Code 18-645 (CMU, ECE)
https://users.ece.cmu.edu/~pueschel/teaching/18-645-CMU-spring08/course.html
http://spcl.inf.ethz.ch/Teaching/2018-dphpc/lectures/lecture8-simd.pdf
- SSE: 128-bit; types __m128, __m128d; e.g. _mm_load_ps, _mm_add_pd
- AVX: 256-bit; types __m256, __m256d; e.g. _mm256_load_ps, _mm256_add_pd
- AVX512: 512-bit
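
To make the naming pattern in the list above concrete, here is a minimal sketch in C that adds two small arrays with the SSE and AVX types and intrinsics named there. It assumes an x86-64 machine and GCC or Clang; the helper names (add4_ps_sse, add2_pd_sse, add8_ps_avx, add4_pd_avx) are made up for illustration, and the target("avx") attribute is one assumed way to enable AVX for a single function without passing -mavx for the whole file.

```c
#include <immintrin.h>  /* x86 SSE/AVX intrinsics */

/* SSE, single precision: one __m128 register holds 4 floats.
   _mm_load_ps and _mm_store_ps are the aligned variants, so
   a, b and out must be 16-byte aligned. */
static void add4_ps_sse(const float *a, const float *b, float *out) {
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));
}

/* SSE2, double precision: one __m128d register holds 2 doubles
   (16-byte alignment required). */
static void add2_pd_sse(const double *a, const double *b, double *out) {
    __m128d va = _mm_load_pd(a);
    __m128d vb = _mm_load_pd(b);
    _mm_store_pd(out, _mm_add_pd(va, vb));
}

/* AVX, single precision: one __m256 register holds 8 floats.
   _mm256_load_ps requires 32-byte alignment.
   Assumption: GCC/Clang target attribute enables AVX for this function. */
__attribute__((target("avx")))
static void add8_ps_avx(const float *a, const float *b, float *out) {
    __m256 va = _mm256_load_ps(a);
    __m256 vb = _mm256_load_ps(b);
    _mm256_store_ps(out, _mm256_add_ps(va, vb));
}

/* AVX, double precision: one __m256d register holds 4 doubles
   (32-byte alignment required). */
__attribute__((target("avx")))
static void add4_pd_avx(const double *a, const double *b, double *out) {
    __m256d va = _mm256_load_pd(a);
    __m256d vb = _mm256_load_pd(b);
    _mm256_store_pd(out, _mm256_add_pd(va, vb));
}
```

The pattern is regular: the prefix (_mm_, _mm256_) selects the register width, and the suffix (_ps, _pd) selects packed single or packed double precision. Callers should declare their buffers with matching alignment (for example _Alignas(32) in C11) and should only call the AVX helpers on CPUs that support AVX, which GCC and Clang can check at run time with __builtin_cpu_supports("avx").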