Internationalized Domain Names

I would be happy if you’d also support Romanian characters șțăîâȘȚÂĂÎ in domain names. Romanian registrar ROTLD announced a few months ago it supports these characters in domain names.

Why would it matter if a certain character looks like another when it comes to certificates? As per a recent blog post, Let’s Encrypt isn’t going to have any direct malware and phishing oversight beyond querying an external API for known URIs. I’m no expert on issuing certificates, but I’m interested in knowing what potential problems there would be in supporting requests to issue certificates for Punycode-encoded FQDNs.
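For reference, the Punycode form is what DNS and a certificate request actually carry; the Unicode form is only what the browser displays. A minimal sketch of the round trip, assuming Python's built-in `idna` codec (which implements the older IDNA 2003 rules; IDNA 2008 would need the third-party `idna` package). The domain name is a hypothetical example:

```python
# Sketch only: Python's built-in "idna" codec implements IDNA 2003 (RFC 3490),
# not the newer IDNA 2008 rules.

def to_ascii(fqdn: str) -> str:
    """Unicode domain name -> ASCII form with Punycode "xn--" labels."""
    return fqdn.encode("idna").decode("ascii")


def to_unicode(fqdn: str) -> str:
    """ASCII domain name -> Unicode form, as a browser would display it."""
    return fqdn.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "bra\u0219ov.ro"        # hypothetical name: "brașov.ro"
    ascii_name = to_ascii(name)
    print(ascii_name)              # an "xn--" label followed by ".ro"
    print(to_unicode(ascii_name))  # back to the Unicode form
```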

Well, the problem is that the browser knows what is IDN and what is Punycode, and I believe the browser will trust the IDN as an IDN. Punycode isn't just a plain server-set alias or something that can change over time; it is defined by a spec. So it should be obvious that certain charsets shouldn't be mixed. There are certain standard levels, and I think it would be okay to say that Greek, Latin and Cyrillic may not be mixed while the rest should be okay (but the registrars should also manage it that way).
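A rough sketch of the "Greek, Latin and Cyrillic may not mix" rule, applied per label so that a purely Cyrillic label under .com is still allowed. It guesses a letter's script from its Unicode character name, which is a heuristic and not the real Unicode Script property; the function names are hypothetical:

```python
import unicodedata
from typing import Optional

# The three scripts suggested above as "must not be mixed".
RESTRICTED_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC"}


def rough_script(ch: str) -> Optional[str]:
    """Guess a letter's script from its name, e.g. 'CYRILLIC SMALL LETTER A'
    -> 'CYRILLIC'. Heuristic only; non-letters (digits, '-') return None."""
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split(" ", 1)[0]


def label_mixes_scripts(label: str) -> bool:
    """True if one label uses letters from two or more restricted scripts."""
    scripts = {rough_script(ch) for ch in label} & RESTRICTED_SCRIPTS
    return len(scripts) > 1


def domain_mixes_scripts(fqdn: str) -> bool:
    """Check each label on its own, so e.g. a Cyrillic label under .com passes."""
    return any(label_mixes_scripts(label) for label in fqdn.split("."))
```

With this rule, `paypal.com` spelled with a Cyrillic "а" (U+0430) is rejected because its first label mixes Latin and Cyrillic letters.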

Also, IIRC the certificate policy says it’s forbidden to create confusing names that could impersonate others.

For the purpose of IDN, there is only one charset: Unicode. Charset mixing shouldn't be an issue when processing FQDNs. I don't know the details of the certificate specs although I would assume the aliases will be written in Unicode and/or IDN in the issued certificate so charsets shouldn't be (much of) a problem there.

I don't think the other CAs enforce such a policy in that manner. LE is going to check each non-IDN domain name against Google's API of bad domain names, so why can't the same thing apply to IDN domains? LE is making a good-faith effort to enforce the policy without resorting to overly restrictive limitations on which domains can be registered and which cannot.

If the concern is purely potential human confusion, LE could mark FQDNs with suspicious characters and run extra checks against that API at regular intervals, then stop the extra checks after a certain time limit if the name doesn't appear on that list (checking only on renewal requests from then on).
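A rough sketch of that re-check schedule. The one-day interval, the 90-day watch period and the `is_listed` callback (standing in for the external reputation API) are all hypothetical; the post gives no concrete values:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional

# Hypothetical parameters, not taken from any real policy.
CHECK_INTERVAL = timedelta(days=1)   # how often a marked name is re-checked
WATCH_PERIOD = timedelta(days=90)    # after this, check only on renewal


@dataclass
class WatchedName:
    fqdn: str
    first_issued: datetime
    last_checked: Optional[datetime] = None


def due_for_check(w: WatchedName, now: datetime) -> bool:
    if now - w.first_issued > WATCH_PERIOD:
        return False  # watch period over; renewals would still be checked
    return w.last_checked is None or now - w.last_checked >= CHECK_INTERVAL


def run_checks(watched: List[WatchedName], now: datetime,
               is_listed: Callable[[str], bool]) -> List[str]:
    """Query the (hypothetical) blocklist for every name that is due and
    return the names it reports."""
    hits = []
    for w in watched:
        if due_for_check(w, now):
            w.last_checked = now
            if is_listed(w.fqdn):
                hits.append(w.fqdn)
    return hits
```

Keeping the schedule state per name means the extra API traffic stays bounded: each marked name costs at most one query per interval, and only during the watch period.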

Rather than charset, I probably should have said character classes.
And there are pseudo-standards against this.


and here are the standards I talked about:
http://www.unicode.org/reports/tr39/#Restriction_Level_Detection

If visually ambiguous characters are going to be the justification for not supporting IDN then ASCII domain names should also not be supported as there are also visually ambiguous ASCII characters.

Seems like a pretty weak and inconsistent justification to me.

Depending on the URL display font these characters can look very much like each other. Especially so when they are not right next to each other.

B8 g9 l1 O0 qg
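The "skeleton" idea that Unicode TR39 uses for confusable detection works for these ASCII pairs too: map every character to a representative of its look-alike group and compare the results. A sketch with a deliberately tiny, hypothetical table containing only the pairs above (lowercased, since DNS names are case-insensitive):

```python
# Only the pairs listed above; TR39's real confusables data is far larger.
CONFUSABLE_TO = {"8": "b", "9": "g", "q": "g", "1": "l", "0": "o"}


def skeleton(name: str) -> str:
    """Map each character to a representative of its look-alike group."""
    return "".join(CONFUSABLE_TO.get(ch, ch) for ch in name.lower())


def confusable(a: str, b: str) -> bool:
    """Different names (ignoring case, as DNS does) that share a skeleton."""
    return a.lower() != b.lower() and skeleton(a) == skeleton(b)
```

For example, `confusable("paypa1.com", "paypal.com")` is true, while `paypal.com` and `PayPal.com` are the same name and therefore not "confusable".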

Well, at least for those it can easily be seen if you use the right font (monospace fonts especially help here).

Also, there’s a difference between looking similar and looking exactly the same.

I understand the issue with homoglyphs in internationalized domains, but what about internationalized country-code top-level domains? For example, .рф, .বাংলা, .中国 don’t have a homoglyph problem.

I think if we used the “Highly Restrictive” level of the Unicode restriction levels, that shouldn’t really be a problem, and that level is also what Gmail uses, for example, when it handles IDN email addresses.
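A sketch of the TR39 "Highly Restrictive" test: a label passes if it uses a single script, or only scripts from one of the combinations Latin + Han + Hiragana + Katakana, Latin + Han + Bopomofo, or Latin + Han + Hangul. Script detection here is again a name-based heuristic (not the real Script property), non-letters are simply skipped, and applying the test per label is an assumption about how a CA might use it:

```python
import unicodedata
from typing import Set

# Multi-script combinations TR39 permits at the "Highly Restrictive" level.
HIGHLY_RESTRICTIVE_SETS = [
    {"LATIN", "HAN", "HIRAGANA", "KATAKANA"},
    {"LATIN", "HAN", "BOPOMOFO"},
    {"LATIN", "HAN", "HANGUL"},
]


def scripts_in(label: str) -> Set[str]:
    """Heuristic script detection from character names. Only letters count,
    so digits, hyphens and combining marks are ignored."""
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue
        first = unicodedata.name(ch, "UNKNOWN").split(" ", 1)[0]
        scripts.add("HAN" if first == "CJK" else first)  # 'CJK UNIFIED IDEOGRAPH-...'
    return scripts


def is_highly_restrictive(label: str) -> bool:
    scripts = scripts_in(label)
    if len(scripts) <= 1:
        return True  # single script, or no letters at all
    return any(scripts <= allowed for allowed in HIGHLY_RESTRICTIVE_SETS)


def domain_is_highly_restrictive(fqdn: str) -> bool:
    return all(is_highly_restrictive(label) for label in fqdn.split("."))
```

Under this rule Latin mixed with Cyrillic or Greek fails, while Japanese names mixing kanji and kana (or kanji with Latin) still pass.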

I would like to add to the request to support IDN. I don’t think it’s Let’s Encrypt’s job to restrict domain names that are completely valid in their syntax, even though they might be misused. In any case, if people register domains to misuse them, a block by Let’s Encrypt won’t stop them.

I’m using an IDN for legit reasons (my name) and I would like to be able to encrypt via letsencrypt as well.

Any chance to receive support with the public beta?

Well, a certain part of IDN should be restricted, like mixing Cyrillic, Greek and Latin, so that the identical-symbol attack can be prevented. I mean, what browser will add LE if it literally allows a cert for paypal.com but with a Cyrillic a?

But I want to know how ICANN handles IDNs and homoglyphs in gTLDs. I mean, for gTLDs ICANN sets the rules; for example, country names are forbidden (like germany.com).

I don’t see why LE should limit hostnames to anything other than hostnames that can be looked up via DNS.

Yes, there is a potential issue with e.g. Cyrillic a and ASCII a, but in my view that is the registry’s problem.

If I register abc.ru and bc.ru for whatever reason, I don’t see anyone other than ru-nic having a potential problem. The cert only verifies that the server is the correct host, not who owns the server.

There is no proof that they belong to abc.com (broadcasting) or abc.dk (a Danish business school).

But somebody could register google.com with a Cyrillic o or whatever, and since Google just uses a DV certificate, it would be a really nice impersonation attack…

All 3 variants of google.com with cyrillic О were registered by Google itself on 7 February 2005.

https://who.is/whois/gоogle.com
https://who.is/whois/goоgle.com
https://who.is/whois/gооgle.com

Also, where have you ever seen Google using DV? It’s always OV from Google Internet Authority G2 under GeoTrust Global CA.

At least FF (Firefox) always said “the cert doesn’t contain any info about the owner” in earlier versions.

Firefox had been showing “which is run by (unknown)” for every non-EV cert regardless, and it even continues to say “This website does not supply ownership information” for sites that do. For example, the https://selecadm.name cert contains a street address, unlike many EVs from Symantec. Firefox and other browsers rely on EV policy OIDs and often fail to perform other checks.

Oh, I didn’t know that one. Why don’t they just check whether data like company, address, etc. is there?

I hit this issue while trying to encrypt 2π.com. I would love to contribute to letsencrypt to enable IDN.

Here is my idea:

  • If the domain registrar for the TLD has a strong IDN policy, accept.
  • Generate all homograph domains using a character homograph table.
  • If there are more than (say) 1000, reject
  • Check all domains for existence (DNS lookup).
  • If one exists, reject
  • If none exists, accept.
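The steps above can be sketched roughly as follows. The homograph table is a tiny hypothetical stand-in for a real one, `registry_has_strong_idn_policy` is assumed to come from some per-TLD list, and `dns_exists` is only an approximation (a registered domain without address records would wrongly look unused):

```python
import socket
from itertools import product
from typing import Callable, Dict, List, Optional

# Hypothetical, deliberately tiny homograph table: each group holds characters
# that render (nearly) identically. A real table would be much larger.
HOMOGRAPH_GROUPS = [
    ["a", "\u0430"],            # Latin a, Cyrillic a
    ["e", "\u0435"],            # Latin e, Cyrillic ie
    ["o", "\u043e", "\u03bf"],  # Latin o, Cyrillic o, Greek omicron
    ["p", "\u0440"],            # Latin p, Cyrillic er
]
LOOKALIKES: Dict[str, List[str]] = {ch: g for g in HOMOGRAPH_GROUPS for ch in g}


def homograph_variants(name: str, limit: int) -> Optional[List[str]]:
    """Every spelling of `name` reachable via the table, excluding `name`
    itself; None if there would be more than `limit` of them."""
    choices = [LOOKALIKES.get(ch, [ch]) for ch in name]
    total = 1
    for c in choices:
        total *= len(c)
    if total - 1 > limit:
        return None
    return [v for v in map("".join, product(*choices)) if v != name]


def dns_exists(name: str) -> bool:
    """Rough existence check: does the name resolve to any address?"""
    try:
        socket.getaddrinfo(name.encode("idna").decode("ascii"), None)
        return True
    except (OSError, UnicodeError):  # socket.gaierror is an OSError
        return False


def accept(name: str, registry_has_strong_idn_policy: bool,
           exists: Callable[[str], bool] = dns_exists,
           limit: int = 1000) -> bool:
    if registry_has_strong_idn_policy:
        return True
    variants = homograph_variants(name, limit)
    if variants is None:
        return False  # too many look-alikes to check
    return not any(exists(v) for v in variants)
```

Even with this tiny table, `paypal.com` already has 47 look-alike spellings, which gives a feel for how quickly the count (and the number of lookups) grows with a realistic table.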

This should allow a large fraction of honest domains to be accepted without accepting any harmful ones.

I would lower it, because almost a thousand lookups is a bit much…

Sometimes it’s the same entity running both “the original” and the name being applied for. When that’s the case, it would be foolish to reject the request just because a lookup comes back positive. Registering both the name with special characters and the anglicised version is starting to gain ground in Iceland. One of the reasons is that the ccTLD registry offers the non-anglicised versions much cheaper (if one also gets the anglicised version).
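One way to fold this exception into the homograph check proposed earlier in the thread: reject only when a look-alike name exists and the applicant has not shown control of it. `controlled_by_applicant` is a hypothetical callback (for example, "the same account has already validated that name"), and the names in the example are made up:

```python
from typing import Callable, Iterable


def accept_with_owner_exception(
        variants: Iterable[str],
        exists: Callable[[str], bool],
        controlled_by_applicant: Callable[[str], bool]) -> bool:
    """Reject only when some look-alike exists AND the applicant has not
    shown control of it (hypothetical ownership check)."""
    return not any(exists(v) and not controlled_by_applicant(v)
                   for v in variants)
```

So someone applying for a hypothetical hús.is who has already validated hus.is would be accepted, while a stranger applying for the same name would not.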

The current policy is making it impossible for people with both to ensure HTTPS protection from the moment visitors go to the non-anglicised version (assuming HSTS preload). It also excludes people with only the non-anglicised version from getting a Let’s Encrypt certificate.