I’ve tried to estimate this probability before, and there are at least two factors you didn’t take into account:
- Only probable primes can be chosen for p and q by key generation routines, ideally so that n = p×q is a semiprime (the product of exactly two primes).
- There is other logic that requires n to be larger than some minimum value, so that we don’t end up with a much shorter modulus by chance. I think the most common interpretation is “a modulus which takes 2048 bits to write out in binary”, which I take below to mean a number between 2^2048 and 2^2049. (Strictly speaking, a number with exactly 2048 binary digits lies between 2^2047 and 2^2048, but shifting the range by one bit changes the counts below only by a small constant factor.) I’m not sure whether that’s the exact interpretation used by all key-generation routines, but running `openssl genrsa 2048 | openssl rsa -modulus -noout | cut -d= -f2 | tr -d '\n' | wc -c` many times suggests that OpenSSL uses it: the hexadecimal representation of the modulus is always exactly 512 hex digits long, never, say, 511 or 510.
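If you would rather tally the lengths automatically than rerun the pipeline by hand, something like the following rough sketch might do. It assumes the `openssl` command-line tool is on your PATH; the helper names (`modulus_hex_from_output`, `generate_modulus_hex`, `tally_lengths`) are made up for illustration and are not part of OpenSSL.

```python
import subprocess
from collections import Counter


def modulus_hex_from_output(text: str) -> str:
    """Extract the hex digits from a line such as 'Modulus=ABCD...'."""
    return text.strip().split("=", 1)[1]


def generate_modulus_hex(bits: int = 2048) -> str:
    """Generate a throwaway RSA key with the openssl CLI and return its modulus in hex.

    Assumption: 'openssl' is installed and reachable on PATH.
    """
    key = subprocess.run(
        ["openssl", "genrsa", str(bits)],
        check=True, capture_output=True,
    ).stdout
    out = subprocess.run(
        ["openssl", "rsa", "-modulus", "-noout"],
        input=key, check=True, capture_output=True,
    ).stdout
    return modulus_hex_from_output(out.decode())


def tally_lengths(trials: int = 20, bits: int = 2048) -> Counter:
    """Count how often each hex-digit length occurs over several fresh keys."""
    return Counter(len(generate_modulus_hex(bits)) for _ in range(trials))


# Example (generates real keys, so it takes a few seconds):
#   print(tally_lengths(20))
```

If every generated modulus has the same length, the resulting counter has a single entry.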
So the collision probability should be based on the number of semiprimes between 2^2048 and 2^2049 whose factors are roughly equal in size (potentially excluding some of them because the key generation deliberately or accidentally excludes some primes from being used in a key).
I think this is about the square of the number of primes from 2^1024 to sqrt(2^2049) (which is a bit smaller than 2^1025), since p and q would each be drawn from that range.
I just chose 1000000 integers at random in this range and 1425 of them were prime. If that sample were typical, that would mean there were roughly 106109615035909173991822291315914873021653897686062968341193540595665511543244825485791419726267136989394312557693091048073884629203739655747091199916336469562459901109366621456544966554847548011018687734419813914851962110837880014622865473073207030444050620634256500741057437943967654418934532734104462573 primes in this range, or around 10^305. The square of that number is around 10^610, which is then an extremely rough estimate for the number of 2048-bit RSA moduli.
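For anyone who wants to repeat the sampling experiment, here is a minimal sketch of one way to do it. It is not necessarily how I did it: it uses a hand-written Miller-Rabin probable-prime test, and the function names and the sample size in the usage comment are arbitrary choices for illustration. As a cross-check, the prime number theorem predicts a density of about 1/ln(2^1024) ≈ 0.00141 near 2^1024, which is in line with the 1425 hits per million above.

```python
import math
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probable-prime test with random bases."""
    if n < 2:
        return False
    # Cheap trial division removes most composites before the expensive part.
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # base a is a witness that n is composite
    return True


def estimate_prime_count(lo: int, hi: int, samples: int):
    """Return (sampled prime density, estimated number of primes in [lo, hi))."""
    hits = sum(is_probable_prime(random.randrange(lo, hi)) for _ in range(samples))
    return hits / samples, (hi - lo) * hits // samples


LO = 2**1024
HI = math.isqrt(2**2049)                 # floor of sqrt(2^2049)
PNT_DENSITY = 1 / (1024 * math.log(2))   # prime number theorem estimate near 2^1024

# Example (a smaller sample than the one million used above, so noisier):
#   density, count = estimate_prime_count(LO, HI, 20000)
#   print(density, PNT_DENSITY, len(str(count)))
```

With a small sample the density will bounce around quite a bit, but the number of digits in the estimated count should stay stable.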
According to this estimate, an individual collision is about three million times more likely than in your estimate, but that doesn’t make much difference at this level of improbability.
Another thing to think about is the birthday paradox, relating to the probability that a collision will ever happen in a particular application of RSA.
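To put a rough number on the birthday effect, the usual approximation is that among k keys drawn uniformly from N possibilities, the chance of any collision is about k(k−1)/(2N) when that value is small. The sketch below works in log10 because 10^610 does not fit in a floating-point number. The uniformity assumption is an idealization (real key generators only approximate it), and the figure of 10^12 keys in the usage comment is a made-up example, not a measurement.

```python
import math


def log10_birthday_collision(num_keys: int, log10_space: float) -> float:
    """log10 of the approximate probability that any two of num_keys draws collide.

    Uses p ~ k(k-1)/(2N) with N = 10**log10_space, which is only valid
    when the resulting probability is small. Assumes uniform draws and
    num_keys >= 2.
    """
    pairs = num_keys * (num_keys - 1) // 2  # number of distinct pairs of keys
    return math.log10(pairs) - log10_space


# Example with a hypothetical 10^12 keys and the ~10^610 moduli estimated above:
#   log10_birthday_collision(10**12, 610.0)   # about -586.3
```

So even with a trillion keys the chance that any two share a modulus would be around 10^-586 under these assumptions, still far too small to matter.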