References
- All APIs for Intel intrinsics, with examples - https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
- All APIs for ARM intrinsics - https://developer.arm.com/architectures/instruction-sets/intrinsics
- 15-418/15-618: Parallel Computer Architecture and Programming, Spring 2018: Schedule - https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15418-s18/www/schedule.html 
- How to Write Fast Code 18-645 (CMU, ECE) - https://users.ece.cmu.edu/~pueschel/teaching/18-645-CMU-spring08/course.html 
- http://spcl.inf.ethz.ch/Teaching/2018-dphpc/lectures/lecture8-simd.pdf 
- SSE:
  - 128-bit registers
  - Types: __m128, __m128d
  - Example intrinsics: _mm_load_ps, _mm_add_pd

- AVX:
  - 256-bit registers
  - Types: __m256, __m256d
  - Example intrinsics: _mm256_load_ps, _mm256_add_pd

- AVX512:
  - 512-bit registers
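
The types and intrinsics listed above can be combined roughly as follows. This is a minimal sketch, not taken from any of the references: the helper names `add_f32_sse` and `add_f64` are hypothetical, the pointers are assumed to be suitably aligned (16 bytes for the SSE loads, 32 bytes for the AVX loads), and each vector path is compiled only when the compiler defines the matching feature macro, with a scalar loop as fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n floats.
 * With SSE, a __m128 holds 4 floats, so the main loop handles 4 lanes
 * per iteration. Assumes 16-byte aligned pointers (needed by _mm_load_ps). */
void add_f32_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    assert((uintptr_t)a % 16 == 0 && (uintptr_t)b % 16 == 0 &&
           (uintptr_t)out % 16 == 0);
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail (or the whole array when SSE is unavailable). */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Hypothetical helper: out[i] = a[i] + b[i] for n doubles.
 * Uses AVX (__m256d, 4 doubles) when available, else SSE2 (__m128d,
 * 2 doubles), else plain scalar code. Assumes 32-byte aligned pointers. */
void add_f64(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    assert((uintptr_t)a % 32 == 0 && (uintptr_t)b % 32 == 0 &&
           (uintptr_t)out % 32 == 0);
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_load_pd(a + i);
        __m256d vb = _mm256_load_pd(b + i);
        _mm256_store_pd(out + i, _mm256_add_pd(va, vb));
    }
#elif defined(__SSE2__)
    assert((uintptr_t)a % 16 == 0 && (uintptr_t)b % 16 == 0 &&
           (uintptr_t)out % 16 == 0);
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_load_pd(a + i);
        __m128d vb = _mm_load_pd(b + i);
        _mm_store_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar tail for the elements that do not fill a full vector. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

With GCC or Clang, the AVX path is typically enabled by compiling with `-mavx`; without it, the same source falls back to the SSE2 or scalar loop. If alignment cannot be guaranteed, the unaligned variants (`_mm_loadu_ps`, `_mm256_loadu_pd`, and the matching stores) are the usual substitute.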