Internationalized Domain Names

By the way, what exactly are the issues? I can understand there could be issues on the client side (inputting the '+e é instead of the native é). But if I enter it in the xn--* form (as I still have to do for most mail clients anyway…), it should look just like a regular domain name to the CA.

As @jsha previously pointed out the main concern is homoglyphs, characters that visually look alike but have different code points. This basically means somebody might be able to create a domain name that looks exactly like a high-value domain name but if you did a string comparison between the two domains they wouldn’t match and could point to two different websites.
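To make the homoglyph concern concrete, here is a minimal Python sketch. The lookalike name is a hypothetical example built for illustration, and the built-in `idna` codec used here implements the older IDNA 2003 rules, so treat it as a demonstration rather than what any CA actually runs.

```python
import unicodedata

real = "google.com"
# Hypothetical lookalike: the first "o" is U+043E CYRILLIC SMALL LETTER O.
spoof = "g\u043eogle.com"


def describe(domain):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in domain]


# The two names render almost identically but are different strings.
same_string = real == spoof

# In the ASCII-compatible (punycode) form the difference becomes visible.
spoof_ace = spoof.encode("idna")

print(same_string)
print(describe(spoof)[1])
print(spoof_ace)
```

A plain string comparison against a blacklist entry for `google.com` would not match the lookalike, which is exactly the problem described above.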

This means we could accidentally issue a certificate for a domain that looks like google.com even though that name is already part of a high-value anti-phishing blacklist.

(Wikipedia has a pretty good article on duplicate Unicode characters, which are part of the concern, here – https://en.wikipedia.org/wiki/Duplicate_characters_in_Unicode)


This is possibly a stupid question, but do the newer TLDs such as .app, .docs, .beer etc. fall under “international” domains as well? Or will they be supported at launch? I have a few domains that use some of the new TLDs that were introduced this year.

Sorry, I just saw the reply in the “Do You Support Free Domains” thread stating they should be supported at launch.

@Taubin, that’s right! There’s a distinction between international domain names (fully supported) and internationalized domain names (not currently supported). The international domain names are those under a foreign-based country-code top-level domain (like .de, .ru, .za, .uk) and the internationalized domain names are those that contain a non-ASCII character (like παράδειγμα.com or 例子.com).
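The distinction can be sketched in a few lines of Python. The function name `is_internationalized` is made up for this illustration, and the check is deliberately simple: a name counts as an IDN if any label contains a non-ASCII character or is already in the `xn--` encoded form.

```python
def is_internationalized(domain):
    """True if any label is non-ASCII or already in xn-- (punycode) form."""
    for label in domain.lower().split("."):
        if not label.isascii() or label.startswith("xn--"):
            return True
    return False


# Plain ASCII names under a ccTLD are "international" but not IDNs.
print(is_internationalized("example.de"))
# Names with non-ASCII characters are IDNs.
print(is_internationalized("παράδειγμα.com"))
```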


@Taubin

Also a couple of good Wikipedia page links that may assist with clarifying the Internationalized Domain Names (IDN) distinction are provided earlier in this thread.


What about internationalized domain names under sensible ccTLDs which do not allow homoglyphs?

For example, it is not possible to register аbc.de or аbc.fi (аbc.de or аbc.fi) containing a Cyrillic a, because these ccTLDs accept only a limited set of accented letters and the like, not symbols, other scripts or anything like that.

For this kind of ccTLD there is no need for extra homoglyph tests, as the registry authorities have prevented them altogether.

I would be happy if you’d also support Romanian characters șțăîâȘȚÂĂÎ in domain names. Romanian registrar ROTLD announced a few months ago it supports these characters in domain names.

Why would it matter if a certain character looks like another when it comes to certificates? As per a recent blog post, Let’s Encrypt isn’t going to have any direct malware and phishing oversight except to query an external API for any known URIs. I’m no expert on issuing certificates, but I’m interested in knowing what potential problems there would be in supporting requests to issue certificates for punycode-encoded FQDNs.

Well, the problem is that the browser knows which IDN corresponds to which punycode, and I believe the browser will trust the IDN as an IDN. Punycode isn’t just a plain server-set alias or something that can change over time; it is defined by the spec. So it should be obvious that certain charsets shouldn’t be mixed either. There are certain standard levels, and I think it would be okay to say that Greek, Latin and Cyrillic may not be mixed and the rest should be okay (but that should also be managed that way by the registrars).
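The point that the punycode form is fixed by the spec rather than being an alias can be illustrated with Python's built-in `idna` codec. This is only a sketch: that codec follows IDNA 2003, and browsers apply their own, newer rules on top.

```python
# "bücher.de", written with an escape so the code point is unambiguous.
unicode_name = "b\u00fccher.de"

# Unicode form -> ASCII-compatible form (what DNS and the CA see).
ace_name = unicode_name.encode("idna")

# ASCII-compatible form -> Unicode form (what the browser may display).
round_trip = ace_name.decode("idna")

print(ace_name)      # b'xn--bcher-kva.de'
print(round_trip == unicode_name)
```

The mapping is deterministic in both directions, so a certificate issued for the `xn--` name is effectively a certificate for whatever the browser renders from it.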

Also, the cert policy says, IIRC, that it’s forbidden to create confusing names that could impersonate others.

For the purpose of IDN, there is only one charset: Unicode. Charset mixing shouldn't be an issue when processing FQDNs. I don't know the details of the certificate specs although I would assume the aliases will be written in Unicode and/or IDN in the issued certificate so charsets shouldn't be (much of) a problem there.

I don't think the other CAs enforce such a policy in such a manner. LE is going to check each non-IDN domain name against Google's API of bad domain names, so why can't the same thing apply to IDN domains? LE is making a good faith effort to enforce the policy without resorting to overly restrictive limitations on which domains can be registered and which cannot.

If the matter is purely potential confusion of humans, LE could mark FQDNs with suspicious characters and execute extra checks against that API at regular intervals and stop the extra checks after a certain time limit if it doesn't appear on that list (only checking when there are renewal requests from then on).


Rather than charset I probably should have said character classes.
And there are pseudo-standards against this.


And here are the standards I talked about:
http://www.unicode.org/reports/tr39/#Restriction_Level_Detection
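As a rough illustration of the idea behind those restriction levels, the sketch below flags labels that mix Latin, Greek and Cyrillic characters. It is not an implementation of UTS #39: Python's standard library has no Script property, so this guesses the script from the first word of each character's Unicode name, and the function names are made up for the example.

```python
import unicodedata

# Scripts whose letters are most easily confused with one another.
CONFUSABLE_SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def scripts_in_label(label):
    """Crude script guess based on the prefix of each character's name."""
    found = set()
    for ch in label:
        name = unicodedata.name(ch, "")
        for script in CONFUSABLE_SCRIPTS:
            if name.startswith(script + " "):
                found.add(script)
    return found


def mixes_confusable_scripts(domain):
    """True if any single label combines two or more of the scripts above."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


print(mixes_confusable_scripts("google.com"))
print(mixes_confusable_scripts("g\u043eogle.com"))   # Cyrillic "о" inside Latin
print(mixes_confusable_scripts("παράδειγμα.com"))    # each label is single-script
```

Checking per label means an all-Greek name under `.com` passes, while a single Cyrillic letter hidden in a Latin label does not. A whole-script lookalike (every letter replaced) would still slip through this check.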

If visually ambiguous characters are going to be the justification for not supporting IDN then ASCII domain names should also not be supported as there are also visually ambiguous ASCII characters.

Seems like a pretty weak and inconsistent justification to me.

Depending on the URL display font these characters can look very much like each other. Especially so when they are not right next to each other.

B8 g9 l1 O0 qg


Well, at least for those it can easily be seen if you use the right font (monospace fonts especially help here).

Also, there’s a difference between looking similar and looking exactly the same.

I understand the issue with homoglyphs in internationalized domains, but what about internationalized country code top-level domains? For example, .рф, .বাংলা, .中国: they don’t have the homoglyph problem.

I think if we used the “Highly Restrictive” level of the Unicode restriction levels, that shouldn’t really be a problem. That level is also used, for example, when Gmail handles IDN email addresses.

I would like to add to the request to support IDN. I don’t think it’s letsencrypt’s job to restrict domain names that are completely valid in their syntax, even though they might be misused. In any case, if people register domains in order to misuse them, a block by letsencrypt won’t stop them.

I’m using an IDN for legit reasons (my name) and I would like to be able to encrypt via letsencrypt as well.

Any chance to receive support with the public beta?


Well, a certain part of IDN should be restricted, like using Cyrillic, Greek and Latin together, so the identical-symbol attack can be prevented. I mean, what browser will add LE if they literally allow a cert for paypal.com but with a Cyrillic a?

But I want to know how ICANN handles IDNs and homoglyphs in gTLDs. With gTLDs, ICANN sets the rules; for example, country names are forbidden (like germany.com).

I don’t see why LE should limit hostnames to anything other than hostnames that can be looked up via DNS.

Yes, there is a potential issue with e.g. Cyrillic a and ASCII a, but in my view that is the registry’s problem.

If I register abc.ru and bc.ru for whatever reason, I don’t see anyone other than RU-NIC having a potential problem. The cert only verifies that the server is the correct host, not who owns the server.

There is no proof that they belong to abc.com (broadcasting) or abc.dk (a Danish business school).


But somebody could make google.com with a Cyrillic o or whatever, and since Google just uses DV it is a really nice impersonation attack…

All 3 variants of google.com with cyrillic О were registered by Google itself on 7 February 2005.

https://who.is/whois/gоogle.com
https://who.is/whois/goоgle.com
https://who.is/whois/gооgle.com
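For reference, the three variants can be enumerated mechanically. This is a small sketch that assumes the lowercase Cyrillic о (U+043E) is the substituted character; the helper name is made up for the example.

```python
from itertools import product

CYRILLIC_O = "\u043e"  # CYRILLIC SMALL LETTER O


def cyrillic_o_variants(label):
    """All spellings of label with at least one Latin "o" swapped for Cyrillic."""
    positions = [i for i, ch in enumerate(label) if ch == "o"]
    variants = []
    for choice in product((False, True), repeat=len(positions)):
        if not any(choice):
            continue  # skip the unmodified, all-Latin spelling
        chars = list(label)
        for pos, swap in zip(positions, choice):
            if swap:
                chars[pos] = CYRILLIC_O
        variants.append("".join(chars))
    return variants


variants = cyrillic_o_variants("google")
for v in variants:
    # Show the ASCII-compatible form, since the Unicode forms look identical.
    print((v + ".com").encode("idna"))
```

With two occurrences of "o" there are 2² − 1 = 3 non-trivial substitutions, which matches the three registrations listed above.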

Also, where have you ever seen Google using DV? It’s always OV from Google Internet Authority G2 under GeoTrust Global CA.
