ITP Techblog

Brought to you by IT Professionals NZ
Programming Language Hacks

Don Hollander, Guest post. 12 October 2017, 8:00 am
Universal Acceptance is the simple concept that all legitimate domain names and email addresses work across all applications. 

A recent Universal Acceptance Steering Group (UASG) study found that many users were being denied access to applications because those applications lacked a simple fix.

Top-level domains (TLDs) and email addresses have evolved markedly since 2010, when non-ASCII characters were first introduced. Hundreds of these new-style TLD names, including TLDs longer than three characters, have been added to the root zone. In 2012 non-ASCII characters became available in the mailbox portion of email addresses.

Examples taken from the UASG's working test cases include:

Style of Address          Example Test Case

ascii@ascii.newshort      info1@ua-test.link
ascii@ascii.newlong       info2@ua-test.technology
ascii@idn.ascii           info3@普遍接受-测试.top
ascii@ascii.idn           info4@ua-test.世界
Unicode@ascii.ascii       测试1@ua-test.link
Unicode@idn.idn           测试5@普遍接受-测试.世界
Arabic.arabic@arabic      دون@رسيل.السعودية

In a recent study of 1000 popular websites, too few accepted the full range of email addresses to be used as unique identifiers. We found no consistency in the programming of the Regular Expressions used to validate email addresses and very little use of competent server-side libraries for validation, contributing to these poor results.

The UASG was established in 2015 to raise awareness of issues like this and to facilitate resolution. It is an initiative of the Internet community and is supported by ICANN. The UASG has developed a range of documentation and resources for becoming UA-ready, for both management and developers.

Developers must update their code to accommodate this growing number of domain names and email addresses. Here is some guidance for modernizing your applications:

Input

Data fields that accept domain names or email addresses must accept ASCII and non-ASCII characters. Many of the next billion Internet users to come online (and existing users who prefer addresses that better reflect their sense of identity) require text that doesn't use only ASCII. UTF-8 is the key here. This will affect input, storage and output of data from keyboards, databases and other data sources. Most modern software components are capable of supporting this. They just need to be configured correctly.
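As a rough illustration of what "configured correctly" can mean in code, the sketch below decodes and encodes text explicitly as UTF-8 at the edges of an application rather than relying on platform defaults. The function names decode_field and encode_field are hypothetical helpers made up for this example, not part of any UASG guidance.

```python
def decode_field(raw: bytes) -> str:
    """Decode bytes from a form, file or socket as UTF-8.

    Strict decoding raises UnicodeDecodeError on invalid bytes instead of
    silently replacing characters, so bad input is caught at the boundary.
    """
    return raw.decode("utf-8")


def encode_field(text: str) -> bytes:
    """Encode text as UTF-8 before writing it to a file, database or network."""
    return text.encode("utf-8")


# One of the UASG test addresses from the table above.
address = "测试1@ua-test.link"
wire_bytes = encode_field(address)

# The round trip is lossless, and the byte length differs from the
# character count, which matters when sizing database columns.
assert decode_field(wire_bytes) == address
assert len(wire_bytes) > len(address)
```

The same principle applies to database connections and file handles: where a component lets you name the encoding, naming UTF-8 explicitly avoids surprises from system defaults.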

Validation

The easiest way to deal with this is to use a simple syntactic validation of the email address on the client side and more extensive validation through server-side libraries. There are other ways of making sure the data entered is what the user meant, such as requiring entry of the field twice and comparing the two values, or sending an email to verify receipt. Extensive and complicated regular expressions are often difficult to debug and may not cater to the now dynamic set of top-level domain names.
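A simple syntactic check of the kind described above might look like the following sketch. It is deliberately loose: it makes no assumption about TLD length or script, and it is not a full implementation of the email address standards. The function name looks_like_email is hypothetical, and requiring at least one dot in the domain is an assumption that suits public-facing sign-up forms but would reject addresses on single-label internal hosts.

```python
def looks_like_email(address: str) -> bool:
    """Loose syntactic check: mailbox part, '@', and a dotted domain part."""
    # Whitespace is almost always a typing mistake in a sign-up form.
    if any(ch.isspace() for ch in address):
        return False

    # Split on the last '@' so the domain is whatever follows it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False

    # Every label between dots must be non-empty. No limits are placed on
    # the length or the script of the final label (the TLD).
    labels = domain.split(".")
    return len(labels) >= 2 and all(labels)


print(looks_like_email("info2@ua-test.technology"))
print(looks_like_email("测试5@普遍接受-测试.世界"))
print(looks_like_email("missing-at-sign.example"))
```

Anything that passes a check like this would still need confirmation, for example by sending a verification email.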

If you need to validate further, use a DNS lookup - that's the most certain. Or if you're going to use a local table of TLDs, make sure that it's from an authoritative source and that your local table is updated at least daily. 
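Both options can be sketched with the Python standard library. The helper names below are hypothetical. The lookup uses socket.getaddrinfo, which only checks for address records; a mail domain may publish only MX records, and checking those would need a DNS library, so treat a failed lookup here as a hint rather than proof. The table parser assumes a plain-text list with one TLD per line and comment lines starting with "#", which is the layout IANA uses for its published TLD list.

```python
import socket


def parse_tld_list(text: str) -> set[str]:
    """Parse a list with one TLD per line; lines starting with '#' are comments."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def tld_is_listed(domain: str, tlds: set[str]) -> bool:
    """Compare the last label, in its ASCII form, with the local table."""
    try:
        ascii_domain = domain.rstrip(".").encode("idna").decode("ascii")
    except UnicodeError:
        return False
    return ascii_domain.lower().rsplit(".", 1)[-1] in tlds


def domain_resolves(domain: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the resolver finds an address record for the domain."""
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
        resolver(ascii_domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


# A tiny stand-in for a downloaded table; a real one must be refreshed often.
sample_table = "# sample list\nLINK\nTECHNOLOGY\n"
tlds = parse_tld_list(sample_table)
print(tld_is_listed("ua-test.technology", tlds))
print(tld_is_listed("ua-test.invalid", tlds))
```

The resolver parameter exists only so the lookup can be replaced in tests; in normal use the default performs a real DNS query.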

Storage

The easiest way to deal with storage is to support Unicode. This ensures that the data is reproducible exactly as received. But for applications or systems that can't, there is an algorithm (Punycode) that allows transformation of domain names between ASCII and non-ASCII strings.
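The transformation can be tried out with Python's built-in "idna" codec, as in the sketch below. Two caveats: the built-in codec implements the older IDNA 2003 rules, so a production system may prefer a library that follows IDNA 2008, and the conversion applies to domain names only, not to the mailbox portion of an email address. The function names are hypothetical.

```python
def domain_to_ascii(domain: str) -> str:
    """Convert a domain name to its ASCII-compatible (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(domain: str) -> str:
    """Convert an ASCII-compatible domain name back to its Unicode form."""
    return domain.encode("ascii").decode("idna")


original = "普遍接受-测试.世界"
stored = domain_to_ascii(original)

print(stored)                     # every non-ASCII label now starts with "xn--"
print(domain_to_unicode(stored))  # the Unicode form again

# Names that are already ASCII pass through unchanged.
assert domain_to_ascii("ua-test.link") == "ua-test.link"
```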

Processing

When processing or sorting, it's important that equivalent names are treated as equivalent. Examples of equivalent but different representations include Unicode vs. Punycode forms of the same name, different Unicode normalization forms, and the use of different native scripts. Handling these equivalences will require some policies for the application or indeed the organization.
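One possible policy, shown here only as a sketch, is to reduce every domain name to a single canonical form before comparing or sorting. The hypothetical canonical_domain helper below applies Unicode normalization form NFC, lowercases the name, and converts any Punycode labels to Unicode using Python's built-in "idna" codec (IDNA 2003 rules). It does not address equivalence between different native scripts, which remains a policy decision.

```python
import unicodedata


def canonical_domain(name: str) -> str:
    """Reduce a domain name to one comparable form (NFC, lowercase, Unicode labels)."""
    # Compose characters so "e" + combining accent equals the precomposed letter.
    name = unicodedata.normalize("NFC", name.strip().rstrip(".")).lower()
    # Encoding then decoding turns any "xn--" labels into their Unicode form.
    return name.encode("idna").decode("idna")


# The same name typed with a precomposed letter and with a combining accent.
precomposed = "caf\u00e9.example"
decomposed = "cafe\u0301.example"
assert precomposed != decomposed
assert canonical_domain(precomposed) == canonical_domain(decomposed)

# The Unicode and Punycode forms of a name compare equal once canonicalized.
unicode_form = "ua-test.世界"
punycode_form = unicode_form.encode("idna").decode("ascii")
assert canonical_domain(unicode_form) == canonical_domain(punycode_form)
```

Storing the canonical form alongside the value as entered keeps lookups consistent while still letting the application display what the user typed.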

Display

Public-facing applications should be capable of displaying TLDs and email addresses in native scripts with appropriate fonts.

Validation Libraries

Programming language libraries, particularly open source programming language libraries, are creating or correcting validation routines, so becoming UA-ready may be as simple as re-compiling the code using the latest version of the library. The UASG is encouraging remediation work in many libraries.

When systems are UA-ready, they will work with the continuously expanding domain name space. Being UA-ready also sets businesses up for future opportunities and success by supporting their customers using their customers' chosen identities. It's time to get applications up to scratch.

 

Don Hollander is a New Zealand-based former CIO for very large domestic and international corporations. He has been involved in the New Zealand IT industry for many years and served as the Chair of TUANZ in the late 1990s. Besides travelling the world raising awareness, he operates a wee second-hand bookshop in Newtown.

