A non-English domain is considered invalid. #21

Rombobeorn · 2014-07-19T12:39:51Z

When I enter my email address Bjorn@Rombobjörn.se I get this error
message:

{
"reason": "value must be an email address"
}

Perhaps you prefer to not support IDNA (RFC 5890) until full support
for international email addresses (RFC 6530) is in place, but in that
case the error message should be "International email addresses aren't
supported yet. Please enter an address with only English letters and
digits."

ralphbean · 2014-07-19T14:14:05Z

Yes! My apologies for missing this.

The code where the regular expressions are evaluated is here: https://github.com/fedora-infra/fmn.lib/blob/develop/fmn/lib/__init__.py#L166-L179

And the regular expressions themselves are defined here: https://github.com/fedora-infra/fmn.lib/blob/develop/fmn/lib/__init__.py#L24-L26

(those code locations are for anyone who is interested in submitting a patch/pull-request for this).

abdelgmartinezl · 2014-07-30T04:06:50Z

Hi!

I tried this regex and locally worked. But I want to know if this could work properly on the fmn project:

import re
re.match("^.+@.+.([a-zA-Z]{2,3}|[0-9]{1,3})$", "Bjorn@Rombobjörn.se")

Rombobeorn · 2014-07-30T10:39:00Z

That pattern fails in at least three cases:

Non-English top-level domains exist now.
There are top-level domains with more than three letters.
Email addresses directly in a top-level domain may be rare, but they can technically exist, and they may become more common now that corporations can buy top-level domains.

abdelgmartinezl · 2014-07-30T12:53:17Z

So the best approach for now is changing the message or would you like me to improve the pattern?

Rombobeorn · 2014-08-03T11:10:23Z

What the pattern and the error message should be depends on what the rest of the system can handle.

If the underlying email machinery receives an address that it considers invalid, can the error it returns be passed back to the web interface? If the address is syntactically valid but the domain part can't be resolved to a mail server address, can that error be passed back to the web interface? If the email is sent but bounces, can that error be passed back to the web interface? If that error reporting is in place, then the web interface should only do basic sanity checking and delegate the full address validation to the email machinery.

Does the underlying email machinery implement RFC 6530? If not, does it support IDNA by converting the domain part of the address to ACE before it composes the message? If not, then the error message I suggested earlier should be displayed in the case when the address conforms to this syntax:
https://tools.ietf.org/html/rfc6531#section-3.3
but does not conform to this syntax:
https://tools.ietf.org/html/rfc5322#section-3.4.1
https://tools.ietf.org/html/rfc5322#section-3.2.3
https://tools.ietf.org/html/rfc5321#section-4.1.2

If address validation is delegated to the email machinery, but the email machinery doesn't support international addresses, then the web interface should check whether the input is valid UTF-8 but not ASCII, and in that case display the error message I suggested earlier.

By the way, in the proposed new pattern it looks like you tried to support an IPv4 address in the domain part, but you forgot the square brackets and you forgot about IPv6.

One problem with using a single regular expression for input validation is that it gives only a yes-or-no result. It can't distinguish between different ways that an email address can be invalid or unsupported. To compensate you have to write a long error message that enumerates all the possible reasons for rejecting the input.

As you might have guessed by now I don't agree with the "EasyFix" label. I suspect that you have a fair bit of coding to do before you can give accurate error messages.

abdelgmartinezl · 2014-08-07T14:11:30Z

Understood and agree all you mention. Maybe we can work it as multiple scenario. First, solve a regex that satisfies an English domain (first scenario). If an error appears, then try another regex that satisfies a non-English domain (second scenario) and so on. What do you think of this approach?

pypingou · 2014-08-08T10:06:04Z

@Rombobeorn You seem to know the context and problem quite well, could you advise us on what would be the best way to deal with this?
You say regex isn't so what is?

Rombobeorn · 2014-09-15T06:50:34Z

Hi. Sorry for the long silence. I've been very busy lately.

I can't give very specific advice without knowing what components you use to send email, what addresses those components can handle, what their APIs are like, how your error handling works, and so on – and I don't have time to read your code to find out.

Validating an email address needs to be done in multiple steps, and different error messages are needed for different errors. Therefore a single regex isn't an adequate solution, but regexes may well be used in some of the steps.

One of the validation steps is to actually send email to the address, which means that it will be passed to some email-handling library, or at least to a mail server, and will be validated there. It would make sense to leverage that validation rather than duplicating it, but it depends on how well that library or server reports errors. I think you should start by finding answers to the questions in my previous comment to get a better idea of what you can delegate and what you need to implement. If you're using a good library you might find that you only need to check that the input is a valid UTF-8 string containing exactly one @, or something like that.

pypingou · 2014-11-27T16:54:35Z

Maybe we could try using this library to help us: https://github.com/clarete/emailverifier @Rombobeorn what do you think?

abdelgmartinezl · 2014-12-03T20:57:20Z

Hi @pypingou! I have been playing around with https://github.com/syrusakbary/validate_email, perhaps could help us here. Does it worth it?

robert-scheck · 2015-02-21T02:32:59Z

Given that there are IDNA 2003 and IDNA 2008, it would be important that especially IDNA 2008 is supported (think of "straße.tld" which gets "strasse.tld" on IDNA 2003, but "xn--strae-oqa.tld" on IDNA 2008).

IIRC more than one @ in an e-mail address is allowed (in the local part), too. By the way, spaces in the local part of e-mail addresses are also allowed. Oh and finally there is also UTF8SMTP.

ralphbean · 2015-03-23T23:28:46Z

It would make sense to leverage that validation rather than duplicating it

This does make a lot of sense.. but I didn't know how to go about it. But I just found this, which I think will do:

>>> import smtplib
>>> server = smtplib.SMTP("bastion")  # our mail sending host
>>> server.verify("ralph@fedoraproject.org")
(252, '2.0.0 ralph@fedoraproject.org')

it depends on how well that library or server reports errors.

I think the error we get back here is going to be good enough to surface to users.

>>> server.verify("not-a-real-user-for-sure@fedoraproject.org")
(550, '5.1.1 <not-a-real-user-for-sure@fedoraproject.org>: Recipient address rejected: User unknown in local recipient table')
>>> server.verify("Bjorn@Rombobjörn.se")
(501, '5.1.3 Bad recipient address syntax')
>>> server.verify("Bjorn@Rombobjorn.se")
(252, '2.0.0 Bjorn@Rombobjorn.se')

Note that "full validation" at this step would involve really sending an email to the user and we do do that in a subsequent step. This will do for the preliminary step, I think.

puiterwijk · 2015-03-23T23:33:01Z

This sounds like a nice solution to me, as we'll automatically upgrade/downgrade the email addresses supported based on the mail server we're using (if it doesn't support international domains, we can't either).

...instead of trying to do it ourselves with a regex. This should fix fedora-infra/fmn#21 but will require one more change to fmn.web.

robert-scheck · 2015-03-25T00:47:51Z

@puiterwijk, I think the conversion from UTF-8 to ACE should usually happen on client side not on MTA side. Instead of passing UTF-8 we IMHO should pass ACE to server.verify().

…-date Add a relative arrow date to the irc message

create new fmn rules

…echa If a rule throws an exception, then the match should fail.

need to have endpoints so fedmsg load_configs doesnt complain

Clarify irc delivery details.

ralphbean added EasyFix labels Jul 19, 2014

pypingou mentioned this issue Dec 3, 2014

validate_email clarete/emailverifier#1

Open

ralphbean added a commit to fedora-infra/fmn.lib that referenced this issue Mar 24, 2015

Ask an SMTP server to validate our email addresses.

1f69c0e

...instead of trying to do it ourselves with a regex. This should fix fedora-infra/fmn#21 but will require one more change to fmn.web.

ralphbean mentioned this issue Mar 24, 2015

Ask an SMTP server to validate our email addresses. fedora-infra/fmn.lib#44

Merged

ralphbean closed this as completed in fedora-infra/fmn.lib#44 Mar 24, 2015

jeremycline pushed a commit to jeremycline/fmn that referenced this issue Dec 7, 2016

Merge pull request fedora-infra#21 from fedora-infra/feature/relative…

7ca396c

…-date Add a relative arrow date to the irc message

jeremycline pushed a commit to jeremycline/fmn that referenced this issue Dec 7, 2016

Merge pull request fedora-infra#21 from sayanchowdhury/fedora-infragh-31

8cb2ca6

create new fmn rules

jeremycline pushed a commit to jeremycline/fmn that referenced this issue Dec 7, 2016

Merge pull request fedora-infra#21 from fedora-infra/feature/for-boch…

d46c7cc

…echa If a rule throws an exception, then the match should fail.

jeremycline pushed a commit that referenced this issue Apr 26, 2017

Merge pull request #21 from fedora-infra/fixing-fedmsg-config

a0a1e1e

need to have endpoints so fedmsg load_configs doesnt complain

jeremycline pushed a commit that referenced this issue Apr 26, 2017

Merge pull request #21 from fedora-infra/feature/delivery-clarification

45fbc8b

Clarify irc delivery details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A non-English domain is considered invalid. #21

A non-English domain is considered invalid. #21

Rombobeorn commented Jul 19, 2014

ralphbean commented Jul 19, 2014

abdelgmartinezl commented Jul 30, 2014

Rombobeorn commented Jul 30, 2014

abdelgmartinezl commented Jul 30, 2014

Rombobeorn commented Aug 3, 2014

abdelgmartinezl commented Aug 7, 2014

pypingou commented Aug 8, 2014

Rombobeorn commented Sep 15, 2014

pypingou commented Nov 27, 2014

abdelgmartinezl commented Dec 3, 2014

robert-scheck commented Feb 21, 2015

ralphbean commented Mar 23, 2015

puiterwijk commented Mar 23, 2015

robert-scheck commented Mar 25, 2015

A non-English domain is considered invalid. #21

A non-English domain is considered invalid. #21

Comments

Rombobeorn commented Jul 19, 2014

ralphbean commented Jul 19, 2014

abdelgmartinezl commented Jul 30, 2014

Rombobeorn commented Jul 30, 2014

abdelgmartinezl commented Jul 30, 2014

Rombobeorn commented Aug 3, 2014

abdelgmartinezl commented Aug 7, 2014

pypingou commented Aug 8, 2014

Rombobeorn commented Sep 15, 2014

pypingou commented Nov 27, 2014

abdelgmartinezl commented Dec 3, 2014

robert-scheck commented Feb 21, 2015

ralphbean commented Mar 23, 2015

puiterwijk commented Mar 23, 2015

robert-scheck commented Mar 25, 2015