Impact of Using Extended Registers (r8, r9, etc.) on Code Size and Performance

I have a question about using the extended registers (r8, r9, r10, ...). I want to use them frequently in my program, but I've noticed a problem with code size.

For instance:

mov eax, DWORD [rdi+4]

This generates the machine code 8b 47 04, which is 3 bytes.

But when I use an extended register like r9, it becomes:

mov eax, DWORD [r9+4]

This now generates 41 8b 41 04, which is 4 bytes. The extra byte is probably the 41 prefix.
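To check my guess about the prefix, here is a minimal Python sketch that decodes just these two encodings. It assumes the 41 byte is a REX prefix (bit pattern 0100WRXB) whose B bit supplies the fourth bit of the base register number. The function name decode_base_register and the register table are my own, purely for illustration, and the sketch handles only this simple case (opcode 8b, 8-bit displacement, no SIB byte, no other prefixes).

```python
# 64-bit register names indexed by their 4-bit register number (0-15).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_base_register(code: bytes) -> dict:
    """Decode the base register of a `mov r32, [base+disp8]` encoding.

    Illustrative only: accepts an optional REX prefix, opcode 8b,
    and a ModRM byte with mod=01 and no SIB byte.
    """
    pos = 0
    has_rex = False
    rex_b = 0

    # In 64-bit mode a REX prefix is any byte of the form 0100WRXB (0x40-0x4f).
    if code[pos] & 0xF0 == 0x40:
        has_rex = True
        rex_b = code[pos] & 0x01  # REX.B extends the ModRM.rm field
        pos += 1

    if code[pos] != 0x8B:
        raise ValueError("only opcode 8b is handled in this sketch")
    pos += 1

    modrm = code[pos]
    mod = modrm >> 6      # top two bits: addressing mode
    rm = modrm & 0x07     # low three bits: base register (without REX.B)
    if mod != 1 or rm == 4:
        raise ValueError("only [base+disp8] without a SIB byte is handled")

    # The 3-bit rm field can only name registers 0-7; REX.B adds the 4th bit.
    base = (rex_b << 3) | rm
    return {"has_rex": has_rex, "base": REGS64[base]}


print(decode_base_register(bytes.fromhex("8b4704")))
print(decode_base_register(bytes.fromhex("418b4104")))
```

If this reading is right, the ModRM byte only has room for 3 register bits, so r8 to r15 cannot be named without the extra prefix byte.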

Why does using the rN registers (r8, r9, etc.) result in larger code size compared to using the lower registers like rdi?

Apart from code size, are there other performance-related concerns, such as cache performance or execution cycles, when using the rN registers instead of the lower registers?