com.internationalnetwork.util
Class ElectronicMailAddress

Object
  extended by com.internationalnetwork.util.ElectronicMailAddress

public class ElectronicMailAddress
extends Object

This class provides methods for parsing or constructing internet eMail addresses that are RFC2822-compliant. java.text.ParseException is the only exception used to indicate non-compliance in both parsing and construction.

eMail addresses are usually supplied by users and, as such, may include serious formatting errors that can lead to a wide variety of problems when they are parsed. Some of these problems are due to flawed approaches based on incorrect assumptions about, or a total disregard for, well-established internet standards. For example, many programs incorrectly treat special characters such as "@" inside a quoted-string as delimiters when they really should be taken literally:

"I'm @ work"@example.com

The above example also encloses the local-part within quotation marks so that spaces, and other special characters, may be used freely as literal characters (and not be interpreted as functional characters). To include a quotation mark (") or backslash (\) as a literal character in a quoted-string, the quoted-pair form is required: \" or \\ respectively.
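The literal treatment of "@" inside a quoted-string can be sketched as follows. This is only an illustration, not the parser used by this class: the class name AddrSpecDelimiter and the method indexOfDelimiter are hypothetical, and the sketch ignores comments, angle brackets, and display-names.

```java
final class AddrSpecDelimiter {
    private AddrSpecDelimiter() {}

    /**
     * Returns the index of the first "@" that lies outside a quoted-string,
     * or -1 if there is none. (Hypothetical helper, not part of this class.)
     */
    static int indexOfDelimiter(String address) {
        boolean inQuotes = false;
        for (int i = 0; i < address.length(); i++) {
            char c = address.charAt(i);
            if (c == '\\' && inQuotes) {
                i++;                     // quoted-pair: skip the escaped character
            } else if (c == '"') {
                inQuotes = !inQuotes;    // entering or leaving a quoted-string
            } else if (c == '@' && !inQuotes) {
                return i;                // first functional "@"
            }
        }
        return -1;
    }
}
```

For the example address above, the "@" inside the quotation marks is skipped and the one immediately before "example.com" is reported.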

The Display Name (text preceding the eMail address) is also permitted, and does not need to be in the form of a quoted-string:

Display Name <someone@example.com>

In most cases, all that's really needed is the local-part and the internet domain name portions, thus "someone@example.com" (without the enclosing angle brackets) is made available with ease through the use of accessor methods getLP() and getIDN(), or even eMailAddress(int). The getDN() accessor method returns the Display Name.

Although RFC2822 defines certain maximum limits, it also contradictorily specifies that addresses in excess of these limits must be supported.

The following table shows the various portions of an eMail address, in a number of similar examples, each split into numbered portions, followed by key interpretations:

   1       2           3  4       5  6            7  8
A                         nobody  @  example.com        A
B                      <  nobody  @  example.com  >     B
C          "Some One"  <  nobody  @  example.com  >     C
D  Group:  "Some One"  <  nobody  @  example.com  >  ;  D
   1       2           3  4       5  6            7  8

Columns
  1. Group construct (a display-name followed by a colon)
  2. Display-name (may also be enclosed in a quoted-string)
  3. Angle-addr (an opening "<" requires a closing ">")
  4. Local-part (may also be enclosed in a quoted-string)
  5. @ (addr-spec delimiter between local-part and domain)
  6. Domain (the Internet Domain Name or a domain-literal)
  7. Angle-addr (a closing ">" must match an opening "<")
  8. Group construct (a terminating ";" must match an opening group construct)
Rows
  A. Addr-spec (an eMail address in its most basic form)
  B. Angle-addr (an eMail address enclosed in angle brackets)
  C. Mailbox (an eMail address with a human-readable description)
  D. Group (a group containing one or more eMail addresses)
Notes

The simplest form of an eMail address is defined as addr-spec and consists of columns 4, 5, and 6 (as shown in row "A"). Nearly all eMail client applications include an Address Book feature that provides a convenient way for users to also use human-readable names, which results in common use of addresses consisting of columns 2, 3, 4, 5, 6 and 7 (as shown in row "C").

The group construct (see columns 1 and 8 in row "D") is optional. It usually includes a list of eMail addresses delimited by commas, can't be nested (unlike a white space comment), and a matching terminating semi-colon is required (as shown in column 8 in row "D").

A list of eMail addresses is delimited by commas (not a semi-colon) regardless of the presence of a group construct (although the group construct's terminating semi-colon may take the place of a delimiting comma). Each eMail address in the list may be in a different format (e.g., there may be a mixture of any of the forms shown in rows "A" through "D," along with any other RFC2822-compliant addresses).
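As an illustration of comma delimiting, the sketch below splits a list only at commas that are outside quoted-strings and comments. It is a simplified stand-in under stated assumptions, not this class's implementation: the names AddressListSplitter and split are hypothetical, and group constructs (including the terminating semi-colon) are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class AddressListSplitter {
    private AddressListSplitter() {}

    /** Splits at commas that are not inside a quoted-string or a comment. */
    static List<String> split(String list) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        int commentDepth = 0;
        for (int i = 0; i < list.length(); i++) {
            char c = list.charAt(i);
            if (c == '\\' && (inQuotes || commentDepth > 0) && i + 1 < list.length()) {
                cur.append(c).append(list.charAt(++i)); // keep a quoted-pair intact
                continue;
            }
            if (inQuotes) {
                if (c == '"') inQuotes = false;
            } else if (commentDepth > 0) {
                if (c == '(') commentDepth++;           // comments may nest
                else if (c == ')') commentDepth--;
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == '(') {
                commentDepth++;
            } else if (c == ',') {
                addIfNotBlank(out, cur);                // functional delimiter
                continue;
            }
            cur.append(c);
        }
        addIfNotBlank(out, cur);
        return out;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder cur) {
        String s = cur.toString().trim();
        if (!s.isEmpty()) out.add(s);
        cur.setLength(0);
    }
}
```

Each element of the returned list may be in a different format (addr-spec, angle-addr, or mailbox), which matches the mixture of forms described above.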

Commented folding white space can be included almost everywhere. A comment uses a format very similar to that of a quoted-string, except that it is enclosed within parentheses instead of quotation marks. Comment nesting is also permitted, thus unbalanced parentheses will result in a ParseException.
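The balancing requirement for nested comments can be sketched as a simple depth counter. This is a minimal illustration, assuming that parentheses inside a quoted-string are literal and that a quoted-pair escapes the next character; the names CommentNesting and maxDepth are hypothetical and are not part of this class.

```java
import java.text.ParseException;

final class CommentNesting {
    private CommentNesting() {}

    /**
     * Returns the deepest comment nesting level found in the text (0 when
     * there are no comments). Throws ParseException when the opening and
     * closing parentheses do not balance.
     */
    static int maxDepth(String text) throws ParseException {
        int depth = 0;
        int max = 0;
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && (inQuotes || depth > 0)) {
                i++;                     // quoted-pair: skip the escaped character
                continue;
            }
            if (depth == 0 && c == '"') {
                inQuotes = !inQuotes;    // quotation marks inside a comment are literal
                continue;
            }
            if (inQuotes) continue;      // parentheses inside a quoted-string are literal
            if (c == '(') {
                depth++;
                max = Math.max(max, depth);
            } else if (c == ')') {
                if (depth == 0) throw new ParseException("Unmatched \")\"", i);
                depth--;
            }
        }
        if (depth != 0) throw new ParseException("Unclosed comment", text.length());
        return max;
    }
}
```

A result greater than 1 indicates a nested hierarchy of comments.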

Domain-literals are also handled in accordance with RFC2822. The address enclosed within the square brackets is only validated against the ABNF Syntax rules defined in RFC2822; it is not verified, since verification could require Socket I/O or other application-specific techniques that are beyond the scope of this Java class. Some additional testing is anticipated for future releases, but only to make syntax checking more strict without violating RFC2822.

Certain obsolete rules are defined (search for the phrase "Obsolete Syntax" in RFC2822), and various is...Obsolete() methods are provided as a convenience for situations where obsolete formats might be undesirable or prohibited. As far as we can determine, obsolete forms are typically used only in a display-name, so if you would like to eliminate the obsolete formats, we recommend doing so only for local-part and domain.


Field Summary
static String VERSION
          Version number of this Package (read-only).
 
Constructor Summary
ElectronicMailAddress()
          Constructs an empty ElectronicMailAddress object.
ElectronicMailAddress(String mailbox)
          Constructs an ElectronicMailAddress object containing the specified eMail address split into its various pieces.
 
Method Summary
 int checkCFWS(String cfws)
          Checks a String for RFC2822 CFWS compliance, and returns a set of flags describing what was encountered.
 int checkDN(String displayName)
          Checks a String for RFC2822 display-name compliance, and returns a set of flags describing what was encountered.
 int checkIDN(String domain)
          Checks a String for RFC2822 domain compliance, and returns a set of flags describing what was encountered.
 int checkLP(String localPart)
          Checks a String for RFC2822 local-part compliance, and returns a set of flags describing what was encountered.
 String getAdditionalData()
          Returns the additional unparsed data: any comma-delimited eMail addresses that followed the first address in the original address string. The first address, and the comma (or the semi-colon, if a group construct is being terminated) that followed it, are excluded, so there will be no leading comma or semi-colon.
 String getDN()
          Returns the parsed Display Name.
 int getDNFlags()
          Returns the flags associated with the display-name.
 String getErrorMessage()
          Returns error message text describing any parsing error that occurred.
 String getGN()
          Returns the name of the parsed group construct, without the colon (":").
 int getGNFlags()
          Returns the flags associated with the group construct display-name.
 String getIDN()
          Returns the parsed Internet Domain Name (which can be a domain-literal).
 int getIDNFlags()
          Returns the flags associated with the Internet Domain Name.
 String getLP()
          Returns the parsed Local-part.
 int getLPFlags()
          Returns the flags associated with the local-part.
 boolean getTabConversionMode()
          Returns a boolean that indicates if non-literal tabs will be converted to spaces.
 boolean isAngleAddr()
          Returns a boolean that indicates if the specified address contains an RFC2822 compliant angle-addr.
 boolean isDNObsolete()
          Returns a boolean that indicates if the obsolete syntax was used.
 boolean isGNObsolete()
          Returns a boolean that indicates if the obsolete syntax was used.
 boolean isIDL()
          Returns a boolean that indicates if the domain is a domain-literal (an IP address enclosed within square brackets).
 boolean isIDNObsolete()
          Returns a boolean that indicates if the obsolete syntax was used.
 boolean isInGroup()
          Returns a boolean that indicates if a group construct was defined.
 boolean isLocalAddress()
          Returns a boolean that indicates if the specified address didn't include an at ("@") symbol delimiter and a domain.
 boolean isLPObsolete()
          Returns a boolean that indicates if the obsolete syntax was used.
 boolean isNullAngleAddr()
          Returns a boolean that indicates if the specified address was a null angle-addr.
 boolean isValid()
          Indicates if the eMail address is valid.
static void main(String... args)
           
 String setAdditionalData(String additionalData)
          Sets the AdditionalData, but doesn't check validity.
 String setDN(String displayName)
          Sets the Display name, but doesn't check validity.
 String setGN(String groupName)
          Sets the group construct name, but doesn't check validity.
 String setIDN(String internetDomainName)
          Sets the Internet Domain Name, but doesn't check validity.
 String setLP(String localPart)
          Sets the Local-part, but doesn't check validity.
 boolean setTabConversionMode(boolean tabConversion)
          Sets non-literal tabs conversion mode.
 String[] toArray()
          Returns a String array[] containing the following elements:
 String toString(String... defaults)
          Returns a String containing an RFC2822 name-addr formatted eMail address, but without the group construct display-name portion.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

VERSION

public static final String VERSION
Version number of this Package (read-only).

See Also:
Constant Field Values
Constructor Detail

ElectronicMailAddress

public ElectronicMailAddress()
Constructs an empty ElectronicMailAddress object. Although this eMail address is technically invalid, the validity will change as soon as it is populated with an eMail address by way of various set methods.


ElectronicMailAddress

public ElectronicMailAddress(String mailbox)
                      throws java.text.ParseException
Constructs an ElectronicMailAddress object containing the specified eMail address split into its various pieces. The group construct is supported, and when multiple addresses are specified (regardless of the use of any number of group constructs), as per RFC 2822 conventions, the getAdditionalData() method makes it possible to parse every one of them in a loop.

If the mailbox is actually a mailbox-list, then the extra addresses will be available via the getAdditionalData() method, with a group construct carried over if one was defined (this simplifies using a loop to process a mailbox-list).

Parameters:
mailbox - RFC-compliant eMail address (if a null or a blank String is specified, a new object is created but with an error condition set)
Throws:
java.text.ParseException - When a non-compliant address is encountered
Method Detail

checkCFWS

public int checkCFWS(String cfws)
              throws java.text.ParseException
Checks a String for RFC2822 CFWS compliance, and returns a set of flags describing what was encountered. The archaic RFC2234 ABNF Syntax used to describe the CFWS rules in RFC2822 (rules that begin with an "obs-" prefix are obsolete, but are still supported) is:

        ccontent = ctext / quoted-pair / comment
        CFWS     = *([FWS] comment) (([FWS] comment) / FWS)
        comment  = "(" *([FWS] ccontent) [FWS] ")"
        CR       = %x0D
        CRLF     = CR LF
        ctext    = NO-WS-CTL / %d33-39 / %d42-91 / %d93-126
        FWS      = ([*WSP CRLF] 1*WSP) / obs-FWS
        HTAB     = %x09
        LF       = %x0A
        obs-FWS  = 1*WSP *(CRLF 1*WSP)
        SP       = %x20
        WSP      = SP / HTAB
      

Although "comment"s are optional (and may be nested), the number of opening and closing parentheses must match.

Returns:
Flags, as per the following bit values:
2 = Contains at least one RFC2822 quoted-pair
8 = Contains at least one RFC2822 comment
16 = Contains a nested hierarchy of RFC2822 comment
32 = Contains at least one RFC2822 "CRLF 1*WSP" character sequence
Throws:
java.text.ParseException - For non-compliant CFWS
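
The returned flags are a bit field, so each value is tested with a bitwise AND. The sketch below decodes the four CFWS values listed above; the constant names and the CfwsFlags class are hypothetical (this documentation lists only the numeric values).

```java
import java.util.ArrayList;
import java.util.List;

final class CfwsFlags {
    // Numeric values are taken from the "Returns" list; the names are assumptions.
    static final int QUOTED_PAIR    = 2;
    static final int COMMENT        = 8;
    static final int NESTED_COMMENT = 16;
    static final int FOLDING        = 32;  // "CRLF 1*WSP" sequence

    private CfwsFlags() {}

    /** Returns a human-readable label for every bit that is set. */
    static List<String> describe(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & QUOTED_PAIR) != 0)    names.add("quoted-pair");
        if ((flags & COMMENT) != 0)        names.add("comment");
        if ((flags & NESTED_COMMENT) != 0) names.add("nested comment");
        if ((flags & FOLDING) != 0)        names.add("CRLF 1*WSP");
        return names;
    }
}
```

The same pattern applies to the values returned by checkDN(), checkIDN(), and checkLP(), which define additional bits.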

checkDN

public int checkDN(String displayName)
            throws java.text.ParseException
Checks a String for RFC2822 display-name compliance, and returns a set of flags describing what was encountered. The archaic RFC2234 ABNF Syntax used to describe the display-name rules in RFC2822 (rules that begin with an "obs-" prefix are obsolete, but are still supported) is:

        atext         = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" /
                        "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" /
                        "{" / "|" / "}" / "~"
        atom          = [CFWS] 1*atext [CFWS]
        ccontent      = ctext / quoted-pair / comment
        comment       = "(" *([FWS] ccontent) [FWS] ")"
        ctext         = NO-WS-CTL / %d33-39 / %d42-91 / %d93-126
        display-name  = phrase
        obs-char      = %d0-9 / %d11 / %d12 / %d14-127
        obs-phrase    = word *(word / "." / CFWS)
        obs-qp        = "\" (%d0-127)
        obs-text      = *LF *CR *(obs-char *LF *CR)
        phrase        = 1*word / obs-phrase
        qcontent      = qtext / quoted-pair
        qtext         = NO-WS-CTL / %d33 / %d35-91 / %d93-126
        quoted-pair   = ("\" text) / obs-qp
        quoted-string = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
        text          = %d1-9 / %d11 / %d12 / %d14-127 / obs-text
        word          = atom / quoted-string
      

Some interpretive notes of interest:

Returns:
Flags, as per the following bit values:
1 = Contains at least one RFC2822 quoted-string
2 = Contains at least one RFC2822 quoted-pair
4 = Contains at least one RFC2822 obs-phrase and/or at least one obs-qp
8 = Contains at least one RFC2822 comment
16 = Contains a nested hierarchy of RFC2822 comment
32 = Contains at least one RFC2822 "CRLF 1*WSP" character sequence
Throws:
java.text.ParseException - For a non-compliant display-name

checkIDN

public int checkIDN(String domain)
             throws java.text.ParseException
Checks a String for RFC2822 domain compliance, and returns a set of flags describing what was encountered. The archaic RFC2234 ABNF Syntax used to describe the domain rules in RFC2822 (rules that begin with an "obs-" prefix are obsolete, but are still supported) is:

        atext          = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" /
                         "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" /
                         "{" / "|" / "}" / "~"
        atom           = [CFWS] 1*atext [CFWS]
        ctext          = NO-WS-CTL / %d33-39 / %d42-91 / %d93-126
        ccontent       = ctext / quoted-pair / comment
        comment        = "(" *([FWS] ccontent) [FWS] ")"
        dcontent       = dtext / quoted-pair
        domain         = dot-atom / domain-literal / obs-domain
        domain-literal = [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]
        dot-atom       = [CFWS] dot-atom-text [CFWS]
        dot-atom-text  = 1*atext *("." 1*atext)
        dtext          = NO-WS-CTL / %d33-90 / %d94-126 ; Not "[", "]", or "\"
        obs-char       = %d0-9 / %d11 / %d12 / %d14-127
        obs-domain     = atom *("." atom)
        obs-qp         = "\" (%d0-127)
        obs-text       = *LF *CR *(obs-char *LF *CR)
        qtext          = NO-WS-CTL / %d33 / %d35-91 / %d93-126
        quoted-pair    = ("\" text) / obs-qp
        text           = %d1-9 / %d11 / %d12 / %d14-127 / obs-text
        word           = atom / quoted-string
      

Some interpretive notes of interest:

Returns:
Flags, as per the following bit values:
1 = Contains one RFC2822 domain-literal
2 = Contains at least one RFC2822 quoted-pair
4 = Contains at least one RFC2822 obs-domain and/or at least one obs-qp
8 = Contains at least one RFC2822 comment
16 = Contains a nested hierarchy of RFC2822 comment
32 = Contains at least one RFC2822 "CRLF 1*WSP" character sequence
64 = Contains a multi-level domain (one or more periods are functioning as delimiters)
Throws:
java.text.ParseException - For a non-compliant domain

checkLP

public int checkLP(String localPart)
            throws java.text.ParseException
Checks a String for RFC2822 local-part compliance, and returns a set of flags describing what was encountered. The archaic RFC2234 ABNF Syntax used to describe the local-part rules in RFC2822 (rules that begin with an "obs-" prefix are obsolete, but are still supported) is:

        atext          = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" /
                         "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" /
                         "{" / "|" / "}" / "~"
        atom           = [CFWS] 1*atext [CFWS]
        ctext          = NO-WS-CTL / %d33-39 / %d42-91 / %d93-126
        ccontent       = ctext / quoted-pair / comment
        comment        = "(" *([FWS] ccontent) [FWS] ")"
        dot-atom       = [CFWS] dot-atom-text [CFWS]
        dot-atom-text  = 1*atext *("." 1*atext)
        local-part     = dot-atom / quoted-string / obs-local-part
        obs-char       = %d0-9 / %d11 / %d12 / %d14-127
        obs-local-part = word *("." word)
        obs-phrase     = word *(word / "." / CFWS)
        obs-qp         = "\" (%d0-127)
        obs-text       = *LF *CR *(obs-char *LF *CR)
        phrase         = 1*word / obs-phrase
        qcontent       = qtext / quoted-pair
        qtext          = NO-WS-CTL / %d33 / %d35-91 / %d93-126
        quoted-pair    = ("\" text) / obs-qp
        quoted-string  = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
        text           = %d1-9 / %d11 / %d12 / %d14-127 / obs-text
        word           = atom / quoted-string
      

Some interpretive notes of interest:

Returns:
Flags, as per the following bit values:
1 = Contains at least one RFC2822 quoted-string
2 = Contains at least one RFC2822 quoted-pair
4 = Contains at least one RFC2822 obs-local-part and/or at least one obs-qp
8 = Contains at least one RFC2822 comment
16 = Contains a nested hierarchy of RFC2822 comment
32 = Contains at least one RFC2822 "CRLF 1*WSP" character sequence
64 = Contains more than one RFC2822 quoted-string
Throws:
java.text.ParseException - For a non-compliant local-part
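
The atext and dot-atom-text rules above translate directly into a regular expression. The sketch below tests whether a string is a bare dot-atom-text, that is, a local-part that needs neither a quoted-string nor the obsolete form. It is only an illustration: the DotAtomText class is hypothetical, and surrounding CFWS (which dot-atom permits) is not handled.

```java
import java.util.regex.Pattern;

final class DotAtomText {
    // atext = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" /
    //         "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~"
    private static final String ATEXT = "[A-Za-z0-9!#$%&'*+\\-/=?^_`{|}~]";

    // dot-atom-text = 1*atext *("." 1*atext)
    private static final Pattern DOT_ATOM_TEXT =
            Pattern.compile(ATEXT + "+(?:\\." + ATEXT + "+)*");

    private DotAtomText() {}

    /** Returns true when the whole string matches dot-atom-text. */
    static boolean matches(String s) {
        return DOT_ATOM_TEXT.matcher(s).matches();
    }
}
```

Leading, trailing, or doubled periods fail this test, as do spaces and "@", which is why such local-parts must be written as a quoted-string.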

getAdditionalData

public String getAdditionalData()
Returns the additional unparsed data: any comma-delimited eMail addresses that followed the first address in the original address string. The first address, and the comma (or the semi-colon, if a group construct is being terminated) that followed it, are excluded, so there will be no leading comma or semi-colon.

If a group construct was used, a semi-colon is required to terminate the group after the final recipient, and if the extracted eMail address does not end with a semi-colon, then the group construct will be re-inserted at the beginning of the additional unparsed data so that the group construct won't get lost in the context of a loop (in the caller's code) that is processing a list of addresses.

Returns:
Additional unparsed data (will never be null)

getDN

public String getDN()
Returns the parsed Display Name.

Returns:
Display name (will never be null)

getDNFlags

public int getDNFlags()
Returns the flags associated with the display-name.

Returns:
See checkDN() for the flag values

getErrorMessage

public String getErrorMessage()
Returns error message text describing any parsing error that occurred.

Returns:
Error text (or an empty string if no error, will never be null)

getGN

public String getGN()
Returns the name of the parsed group construct, without the colon (":").

Returns:
Group name (will never be null)

getGNFlags

public int getGNFlags()
Returns the flags associated with the group construct display-name.

Returns:
See checkDN() for the flag values

getIDN

public String getIDN()
Returns the parsed Internet Domain Name (which can be a domain-literal).

Returns:
Internet Domain Name (will never be null)

getIDNFlags

public int getIDNFlags()
Returns the flags associated with the Internet Domain Name.

Returns:
See checkIDN() for the flag values

getLP

public String getLP()
Returns the parsed Local-part.

Returns:
Local-part (will never be null)

getLPFlags

public int getLPFlags()
Returns the flags associated with the local-part.

Returns:
See checkLP() for the flag values

getTabConversionMode

public boolean getTabConversionMode()
Returns a boolean that indicates if non-literal tabs will be converted to spaces. By default, they will be as long as they are not quoted.

This behaviour is not dictated by RFC2822, but we decided to default to it because many systems aren't expecting control characters anywhere.

Returns:
True, if non-literal tabs will be converted to spaces
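
The effect of this mode can be sketched as follows, assuming that "non-literal" means a tab outside a quoted-string. The TabConversion class and its method are hypothetical stand-ins and do not reproduce this class's internal logic.

```java
final class TabConversion {
    private TabConversion() {}

    /** Replaces each tab outside a quoted-string with a single space. */
    static String convertUnquotedTabs(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean inQuotes = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && inQuotes && i + 1 < s.length()) {
                out.append(c).append(s.charAt(++i)); // keep a quoted-pair untouched
                continue;
            }
            if (c == '"') inQuotes = !inQuotes;
            out.append(c == '\t' && !inQuotes ? ' ' : c);
        }
        return out.toString();
    }
}
```

A tab inside the quotation marks is preserved as a literal character, while a tab between the display-name and the angle-addr becomes a space.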

isAngleAddr

public boolean isAngleAddr()
Returns a boolean that indicates if the specified address contains an RFC2822 compliant angle-addr.

Since invalid characters could follow the greater-than sign, this method is not an overall validity indicator.

Returns:
True, if address contains an RFC2822 compliant angle-addr

isDNObsolete

public boolean isDNObsolete()
Returns a boolean that indicates if the obsolete syntax was used.

Returns:
True, if obs-phrase detected in formation of display-name

isGNObsolete

public boolean isGNObsolete()
Returns a boolean that indicates if the obsolete syntax was used.

Returns:
True, if obs-phrase detected in formation of group construct

isIDL

public boolean isIDL()
Returns a boolean that indicates if the domain is a domain-literal (an IP address enclosed within square brackets).

Returns:
True, if domain is a domain-literal

isIDNObsolete

public boolean isIDNObsolete()
Returns a boolean that indicates if the obsolete syntax was used.

Returns:
True, if obs-domain detected in formation of domain

isInGroup

public boolean isInGroup()
Returns a boolean that indicates if a group construct was defined.

Returns:
True, if this eMail address is in a group

isLocalAddress

public boolean isLocalAddress()
Returns a boolean that indicates if the specified address didn't include an at ("@") symbol delimiter and a domain. Such local addresses are accepted by some systems as being part of the primary domain, thus the local address <nobody> being delivered to the mail exchanger for the "example.com" domain would be delivered to "nobody@example.com" by default.

Note that this will always be true when isNullAngleAddr() is true.

The "Missing @domain" error is still returned since this violates RFC 2822.

Returns:
True, if a local address was specified

isLPObsolete

public boolean isLPObsolete()
Returns a boolean that indicates if the obsolete syntax was used.

Returns:
True, if obs-local-part detected in formation of local-part

isNullAngleAddr

public boolean isNullAngleAddr()
Returns a boolean that indicates if the specified address was a null angle-addr. Examples of null addresses are:

Note that a null angle-addr also qualifies as a local address, so isLocalAddress() will also return true.

The "Missing local-part" error is still returned since this violates RFC 2822.

Returns:
True, if a null angle-addr was specified

isValid

public boolean isValid()
Indicates if the eMail address is valid.

Returns:
True = valid, false = invalid

main

public static void main(String... args)

setAdditionalData

public String setAdditionalData(String additionalData)
Sets the AdditionalData, but doesn't check validity.

Returns:
Previous AdditionalData

setDN

public String setDN(String displayName)
             throws java.text.ParseException
Sets the Display name, but doesn't check validity.

Returns:
Previous Display name
Throws:
java.text.ParseException - For a non-compliant display-name

setGN

public String setGN(String groupName)
Sets the group construct name, but doesn't check validity.

Returns:
Previous group name

setIDN

public String setIDN(String internetDomainName)
              throws java.text.ParseException
Sets the Internet Domain Name, but doesn't check validity.

Returns:
Previous Internet Domain Name
Throws:
java.text.ParseException - For a non-compliant domain

setLP

public String setLP(String localPart)
             throws java.text.ParseException
Sets the Local-part, but doesn't check validity.

Returns:
Previous Local-part
Throws:
java.text.ParseException - For a non-compliant local-part

setTabConversionMode

public boolean setTabConversionMode(boolean tabConversion)
Sets non-literal tabs conversion mode.

Returns:
Previous setting

toArray

public String[] toArray()
Returns a String array[] containing the following elements:

  1. Group construct display-name (without colon)
  2. Display-name
  3. Local-part
  4. Domain
  5. Additional data

This method is provided mainly for efficiency, as it eliminates the need to call the individual "get" methods separately. If you need two or more of these pieces of information, calling this method instead of each respective "get" method will require fewer CPU cycles, and could also be easier to handle from a coding perspective.

Returns:
String[5] (elements will never contain null)

toString

public String toString(String... defaults)
Returns a String containing an RFC2822 name-addr formatted eMail address, but without the group construct display-name portion.

If the local-part is empty, then "nobody" is automatically provided in the local-part portion of addr-spec, unless a default value is provided.

If the address is local (because no domain was defined), then "localhost" is automatically provided in the domain portion of addr-spec, unless a default value is provided.

Parameters:
defaults - Optional. The first parameter is the default to use when the local-part is empty. The second parameter is the default to use when the domain is empty.
Returns:
RFC2822 name-addr eMail address
Throws:
IllegalArgumentException - if more than two default values are provided (there generally isn't a need to catch this exception, unless you are using an array based on user or configuration file input)
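
The default-value rules can be sketched with a varargs parameter. This is a simplified illustration and not the method's actual source: the NameAddrFormatter class is hypothetical, the pieces are passed in as plain strings, and the sketch assumes the display-name is simply omitted when it is empty.

```java
final class NameAddrFormatter {
    private NameAddrFormatter() {}

    /**
     * Builds "display-name <local-part@domain>", substituting defaults for an
     * empty local-part or domain. defaults[0] replaces "nobody" and
     * defaults[1] replaces "localhost" when supplied.
     */
    static String format(String displayName, String localPart, String domain,
                         String... defaults) {
        if (defaults.length > 2) {
            throw new IllegalArgumentException("At most two default values are accepted");
        }
        String lp = !localPart.isEmpty() ? localPart
                  : defaults.length > 0 ? defaults[0] : "nobody";
        String dom = !domain.isEmpty() ? domain
                   : defaults.length > 1 ? defaults[1] : "localhost";
        String angleAddr = "<" + lp + "@" + dom + ">";
        return displayName.isEmpty() ? angleAddr : displayName + " " + angleAddr;
    }
}
```

Passing an array built from user or configuration file input is the case in which the IllegalArgumentException is worth guarding against, since its length is not known at compile time.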