I'm asking about the speed of memcpy on an Intel Core i7-10700K CPU, using GCC 10.2 on Linux kernel 5.10. My assumption is that the time for a copy should be close to the time it takes to transfer one long, multiplied by the number of longs being copied. Could memcpy be optimized to beat this expectation, possibly by using SIMD or other CPU-specific features?
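To make the comparison concrete, here is a minimal sketch of how it could be measured. It assumes a POSIX system with `clock_gettime`; the names `copy_longs`, `copy_memcpy`, and `time_copy` are hypothetical helpers made up for this illustration, not part of any library. The long-by-long loop stands in for the "one long at a time" assumption, and `memcpy` is timed the same way.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical baseline: copy one long per loop iteration.
   Note: GCC may vectorize this loop or turn it into a memcpy call
   at higher optimization levels, so check the generated assembly. */
static void copy_longs(long *dst, const long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* The library routine under test, wrapped to share a signature. */
static void copy_memcpy(long *dst, const long *src, size_t n)
{
    memcpy(dst, src, n * sizeof(long));
}

typedef void (*copy_fn)(long *, const long *, size_t);

/* Returns the best wall-clock time in seconds over `reps` runs of
   copying n longs, or -1.0 if n == 0, reps <= 0, or allocation fails. */
static double time_copy(copy_fn fn, size_t n, int reps)
{
    assert(fn != NULL);
    if (n == 0 || reps <= 0)
        return -1.0;

    long *src = malloc(n * sizeof *src);
    long *dst = malloc(n * sizeof *dst);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++)
        src[i] = (long)i;

    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn(dst, src, n);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (double)(t1.tv_sec - t0.tv_sec)
                 + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (best < 0.0 || s < best)
            best = s;
    }

    /* Read one element back so the copy has an observable effect. */
    volatile long sink = dst[n - 1];
    (void)sink;

    free(src);
    free(dst);
    return best;
}
```

Calling `time_copy(copy_longs, n, reps)` and `time_copy(copy_memcpy, n, reps)` for the same `n` gives two numbers to compare. The result will likely depend on whether the buffers fit in cache, so it is probably worth trying several sizes.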