The idea can be modified a bit:
- If a homograph of the requested domain exists (i.e. is registered), it must also be listed as a domain in the certificate request.
Sorry, there is a major problem with my approach as stated above:
Suppose I own happy.com and hαppy.com and have been happily using Let's Encrypt for years. Now evil.com registers hαρρy.com. Suddenly my Let's Encrypt renewal fails. This is, of course, a no-go.
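To make the problem concrete, here is a minimal sketch of the rule being discussed (every registered look-alike of a requested name must also be in the request). The three-entry `CONFUSABLES` table and the `registered` set are hypothetical stand-ins for the full Unicode confusables data and a real registry lookup; neither comes from any Let's Encrypt code.

```python
# Hypothetical three-entry confusables table; a real check would use the
# full Unicode confusables data (UTS #39), e.g. through ICU.
CONFUSABLES = {
    "\u03b1": "a",  # GREEK SMALL LETTER ALPHA looks like Latin a
    "\u03c1": "p",  # GREEK SMALL LETTER RHO looks like Latin p
    "\u043e": "o",  # CYRILLIC SMALL LETTER O looks like Latin o
}


def skeleton(domain: str) -> str:
    """Replace each character with its Latin look-alike, if the table has one."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in domain.lower())


def missing_homographs(requested: set, registered: set) -> set:
    """Return registered look-alikes of requested names that the request lacks.

    `registered` is a hypothetical stand-in for a registry/DNS lookup.
    An empty result means the request satisfies the proposed rule.
    """
    missing = set()
    for name in requested:
        for other in registered:
            if other not in requested and skeleton(other) == skeleton(name):
                missing.add(other)
    return missing


if __name__ == "__main__":
    registered = {"happy.com", "h\u03b1ppy.com"}       # both owned by me
    requested = {"happy.com", "h\u03b1ppy.com"}
    print(missing_homographs(requested, registered))   # set(): renewal passes

    registered.add("h\u03b1\u03c1\u03c1y.com")          # evil.com registers hαρρy.com
    print(missing_homographs(requested, registered))   # reports hαρρy.com: renewal now fails
```

The second call shows the check failing as soon as a third party registers another look-alike, even though nothing changed on the certificate holder's side.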
Maybe we should not apply the procedure if the user also owns the non-punycode domain. But there could be multiple non-punycode homograph domains.
In my opinion, there should be strict Unicode script matching even for domain registrations, so that Cyrillic, Greek and Latin characters cannot be mixed in the same domain. That would help with most issues.
Hi,
while I do understand the security concerns regarding phishing, I'm disappointed that a valid, existing, and widely used technology is basically banned from participation. I'm running a server with an IDN, and unfortunately I had to learn that I cannot use the great service that LE provides.
Is there any update on this? Is there a timetable for reviewing the decision on whether IDNs will be supported by LE at some point?
Kind regards,
Jin
A quick solution could be to create a whitelist of allowed characters as opposed to adding full support.
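A whitelist check could be as small as the following sketch. The `ALLOWED` set is a hypothetical example (ASCII letters, digits, the hyphen and a handful of accented Latin letters); deciding what actually belongs in it is the hard part.

```python
import string

# Hypothetical whitelist: ASCII letters, digits and the hyphen, plus a few
# accented Latin letters picked purely for illustration.
ALLOWED = set(string.ascii_lowercase + string.digits + "-") | set("äöüéèàçñ")


def domain_is_allowed(domain: str) -> bool:
    """True if every character of every label is on the whitelist."""
    return all(
        ch in ALLOWED
        for label in domain.lower().split(".")
        for ch in label
    )


if __name__ == "__main__":
    print(domain_is_allowed("académie-française.fr"))  # True
    print(domain_is_allowed("h\u03b1ppy.com"))          # False: Greek alpha
```

Here `str.lower()` is only a rough stand-in for the case mapping that IDNA processing performs.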
Well, my question was aimed more at getting a response from the LE devs on whether there is a roadmap for IDN, or whether IDN users should not get their hopes up for LE support in the near future.
I second jin_eld on that. I'd like to know whether there is any hope of using Let's Encrypt with IDNs or whether we need to start looking for alternatives.
Interestingly, on other occasions the staff has stated that they do NOT want to be gatekeepers against phishing and the like: https://letsencrypt.org/2015/10/29/phishing-and-malware.html
Pretty inconsistent to ban IDNs then.
Thanks for the link, I was not aware of this. Then it's indeed strange that we have to discuss IDN support in the context of phishing and malware, given the above statement.
So once again, a question to the LE team: is the info in the above post still valid, and if so, when will we be able to get certificates for IDNs? =)
Kind regards,
Jin
I have a working prototype that checks IDNs for spoofing attempts. It still needs some testing, but once that is finished I'll prepare a PR. Please give me two weeks to finish it, since I'm busy with other urgent stuff right now.
Great, thanks! I have a staging environment where I can test things, so I'm looking forward to it.
LE could simply require proof of ownership of the NFKC-normalized domain; then only the owner of both google.com and google.com (with a Cyrillic o) could request a certificate for google.com (with a Cyrillic o).
RFC 7700 discusses the comparison of internationalized strings used as human-friendly names for, among other things, websites.
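One caveat, easy to verify with Python's standard `unicodedata` module: NFKC folds compatibility variants such as fullwidth letters, but it leaves the Cyrillic о (U+043E) untouched, because that is a distinct letter rather than a compatibility variant of Latin o. Catching that case would need a confusables "skeleton" as described in Unicode UTS #39, not NFKC alone.

```python
import unicodedata

ascii_google = "google"
fullwidth_google = "\uff47\uff4f\uff4f\uff47\uff4c\uff45"  # FULLWIDTH LATIN SMALL letters
cyrillic_google = "g\u043eogle"  # second letter is CYRILLIC SMALL LETTER O


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


print(nfkc(fullwidth_google) == ascii_google)  # True: compatibility variants fold
print(nfkc(cyrillic_google) == ascii_google)   # False: Cyrillic о is left as-is
```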
Well, the Unicode Standard handles confusable characters quite well; see [1], [2], [3] and [4]. The ICU project implements spoofing detection using these recommendations [5]. Since there is a Python wrapper for libicu [6], it is quite simple to incorporate spoofing detection into Let's Encrypt. As I already wrote, I have just done that, and it only needs further testing. This should address the concerns the Let's Encrypt team has, so once I have prepared a PR it will hopefully be accepted.
PS: As a new user I'm restricted to two links per post, so I dropped the leading http from the references below; you'll have to copy and paste them. Sorry about that.
[1] unicode.org/reports/tr36/
[2] unicode.org/reports/tr39/
[3] unicode.org/reports/tr46/#Processing
[4] unicode.org/reports/tr39/#Restriction_Level_Detection
[5] icu-project.org/apiref/icu4c/uspoof_8h.html
[6] github.com/ovalhub/pyicu
Could a technical solution to this problem be to allow only punycode domains where all characters belong to the same Unicode code block, and to refuse domains that mix characters from two or more different code blocks?
This would mean allowing only punycode domains written with characters from a single Unicode code block. I assume that all punycode domains created with good intentions have a name in only one language, for example Chinese, Cyrillic or Hindi. But if you mix a Cyrillic a with Latin letters in a domain, that domain probably has no legitimate real-world use and should therefore not be allowed by Let's Encrypt.
This could be a way to allow punycode domains used with good intentions while still preventing homoglyph attacks. Punycode domains will be the future, considering the growth of the Chinese and Indian domain markets.
When it comes to Latin characters, there would probably have to be exceptions to the one-Unicode-block rule, if implemented. Some Latin alphabets use characters from two or more different blocks.
Also, digits should be exempt from that rule if, e.g., you want a page named 2(sign-of-pi).com.
Yes, there would still be some alphabets and cases that would not be allowed under such a rule. But the "one Unicode block rule" would still work for 98% of all IDN cases, and that is a good start! The system might not need to handle cases such as 2(unicode-pi).com, but it would be a BIG improvement if it could handle most IDNs. And most IDNs don't mix characters from several Unicode blocks.
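The one-block rule proposed above can be sketched in Python as follows. The standard library does not expose Unicode block data, so `BLOCKS` is a hand-copied excerpt of a few ranges from the Unicode Character Database (Blocks.txt); characters outside the excerpt all land in one "Unknown" bucket, so this is an illustration, not a complete check. Digits and the hyphen are skipped, and each label is checked separately so that a Cyrillic label under .com still passes.

```python
# Excerpt of a few Unicode block ranges (from Blocks.txt), for illustration only.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0900, 0x097F, "Devanagari"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def block_of(ch: str) -> str:
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Unknown"  # anything outside the excerpt above


def single_block(domain: str) -> bool:
    """True if no label mixes characters from more than one block.

    ASCII digits and the hyphen are ignored so they can appear in any label.
    """
    for label in domain.lower().split("."):
        blocks = {block_of(ch) for ch in label if ch not in "0123456789-"}
        if len(blocks) > 1:
            return False
    return True


if __name__ == "__main__":
    print(single_block("пример.com"))             # True: each label uses one block
    print(single_block("h\u03b1ppy.com"))         # False: Basic Latin + Greek
    print(single_block("académie-française.fr"))  # False: Basic Latin + Latin-1 Supplement
```

Note that this strict version rejects a name like académie-française, because é and ç sit in Latin-1 Supplement while the remaining letters are in Basic Latin.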
All French IDNs use at least two Unicode blocks, for example http://www.académie-française.fr/,
which uses characters from the Basic Latin block and the Latin-1 Supplement block.
Well, I think we can count Latin as one group,
as well as Cyrillic, CJK or Greek.
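Grouping by script instead of by block might look like the sketch below. Python's `unicodedata` has no Script property, so `script_of` approximates it from the first word of the character name (e.g. "LATIN SMALL LETTER E WITH ACUTE" counts as Latin); a real implementation would use the Unicode Script_Extensions data, for example via ICU's spoof checker. The `SCRIPT_PREFIXES` list is a hypothetical, deliberately short selection.

```python
import unicodedata

# Hypothetical, deliberately short list of scripts to recognise.
SCRIPT_PREFIXES = ("LATIN", "GREEK", "CYRILLIC", "CJK", "HIRAGANA", "KATAKANA", "DEVANAGARI")


def script_of(ch: str):
    """Approximate a character's script from its Unicode name; None = shared."""
    if ch in "0123456789-":
        return None  # digits and hyphen may appear alongside any script
    name = unicodedata.name(ch, "")
    for prefix in SCRIPT_PREFIXES:
        if name.startswith(prefix):
            return prefix
    return "OTHER"


def single_script(domain: str) -> bool:
    """True if no label mixes letters from more than one (approximate) script."""
    for label in domain.lower().split("."):
        scripts = {script_of(ch) for ch in label} - {None}
        if len(scripts) > 1:
            return False
    return True


if __name__ == "__main__":
    print(single_script("académie-française.fr"))  # True: all Latin
    print(single_script("2π.com"))                 # True: digit + Greek
    print(single_script("h\u03b1ppy.com"))         # False: Latin + Greek
    print(single_script("g\u043eogle.com"))        # False: Latin + Cyrillic
```

A known gap: Japanese names legitimately mix Han, Hiragana and Katakana, which this naive check would reject.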
It is possible to write the same Latin characters in several ways, though. For example, ä can be written as the single precomposed character ä or as an a followed by a combining diaeresis (¨). There are normalization forms that specify which representation to use: https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization. There are also specific tools for Unicode equivalence. So it is possible to choose between several policies, from strict to permissive:
1) very strict: allow only ASCII domains
2) less strict: allow only domains whose characters come from the same block
3) more permissive: allow all domains that do not have an equivalent homoglyph
4) completely free: allow any combination
…and so on. Life is not always black or white.
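Normalization is the step that makes the two spellings of ä compare equal before any of these policies are applied. A quick check with Python's standard `unicodedata` module:

```python
import unicodedata

precomposed = "\u00e4"  # ä as one code point (LATIN SMALL LETTER A WITH DIAERESIS)
decomposed = "a\u0308"  # a followed by COMBINING DIAERESIS

print(precomposed == decomposed)                                # False: different code points
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True: NFC composes
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True: NFD decomposes
```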