What is the Cost of Bad Addressing?
According to an IBM study cited in an IDS report, bad data costs the US economy around 3.1 trillion dollars each year. Bad addresses are one part of that cost. For example, an audit of sorting machine efficiency at New York City’s Queens Processing and Distribution Center found that in one machine, 17% of trays were bogged down by recirculating packages because the sorter was unable to read the address (reference: Sorting the Savings).
Companies that use private shipping companies also face seemingly small address change fees that can quickly add up and eat away at the bottom line. However, the cost of wrong addresses goes beyond re-shipping fees. Other negative effects include failed direct mail campaigns, customer dissatisfaction, and product mis-shipments.
A great solution to these problems might be easier than you imagine. This blog shows you how to nip the problem in the bud by implementing AI-driven address normalization.
What is Address Normalization?
Address normalization is the process of formatting an address in the form specified by the appropriate postal authority. For example, in the US, the United States Postal Service (USPS) Publication 28 defines the postal addressing standards.
Address normalization involves verifying and revising address records against a reliable database so that they follow a standard format. It entails correcting spelling, formatting, and abbreviation issues. Once the process completes, you’ll have consistently formatted addresses that you can use for shipping, billing, and customer segmentation for marketing campaigns.
In Image 1, note how the addresses present in the original data are transformed into a standard format.
In the example shown in Image 1, “Avenue” becomes “AVE,” “Northwest” becomes “NW,” and “Place” becomes “PL.” Converting “Twelth” to “12TH” is an interesting challenge because “Twelth” is misspelled and should actually be “Twelfth.” In this case, your normalization implementation needs to account for spelling mistakes.
You might also notice that the spelled-out state name “Massachusetts” is normalized to the two-letter state code. Finally, the USPS prefers that addresses are all uppercase, as you can see in Image 1.
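The transformations described above can be sketched with a few lookup tables and a tolerant match for misspelled ordinals. This is a minimal rule-based illustration, not the AI-driven approach discussed later: the tables are tiny samples (Publication 28 defines the full lists), and the function names, the sample address, and the 0.85 similarity cutoff are assumptions chosen for the example.

```python
import difflib
import re

# Tiny sample tables for illustration only; USPS Publication 28 has the full lists.
SUFFIXES = {"AVENUE": "AVE", "PLACE": "PL", "STREET": "ST"}
DIRECTIONALS = {"NORTHWEST": "NW", "NORTHEAST": "NE"}
STATES = {"MASSACHUSETTS": "MA"}
ORDINALS = {"TENTH": "10TH", "ELEVENTH": "11TH", "TWELFTH": "12TH"}
EXACT = {**SUFFIXES, **DIRECTIONALS, **STATES, **ORDINALS}


def _replace_word(match):
    """Map one uppercase word to its standard form, tolerating misspelled ordinals."""
    word = match.group()
    if word in EXACT:
        return EXACT[word]
    # Assumed cutoff: close enough to catch "TWELTH" without matching unrelated words.
    close = difflib.get_close_matches(word, list(ORDINALS), n=1, cutoff=0.85)
    return ORDINALS[close[0]] if close else word


def normalize_address(raw):
    """Uppercase the address and rewrite each alphabetic word via the tables."""
    return re.sub(r"[A-Z]+", _replace_word, raw.upper())


print(normalize_address("100 Twelth Avenue Northwest, Boston, Massachusetts 02110"))
# 100 12TH AVE NW, BOSTON, MA 02110
```

A fixed table plus fuzzy matching handles the cases in this example, but it will not scale to every misspelling or format, which is part of the motivation for a trained model.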
How Does Address Normalization Help Businesses?
Address normalization records addresses accurately and converts them to official specifications, which improves shipping and delivery procedures. Software that reads through text to extract and compile the data needed for billing and shipping can help businesses standardize addresses.
There are more than 130 different address formats used across the world. With artificial intelligence and machine learning (AI/ML) technology, global e-commerce enterprises can manage these variations and reach clients in any locality.
An address is “standardized” when the pertinent information (such as the street number, unit designators like “apartment,” street suffixes, city, state, and postal code) is in the proper format.
When customers share their billing and delivery information, the following problems frequently occur:
|Problem|Correct Address|What Was Sent|
|---|---|---|
|Incorrect Information|221B Baker Street|221B Baker Place|
|Incomplete Information|123 Main Street, Apt B|123 Main Street|
|Pronunciation Mistake|Gloucester, MA 01930|Glawster, MA 01930|
|Number Format Errors|Abington, MA 02351|Abington, MA 2,351|
|Abbreviation Misformatting|San Francisco, CA 94101|SF Calif 94101|
|Missing Vital Information|205 West 400 South, Salt Lake City, UT|205 West, Salt Lake City, UT|
|Historic Mistakes|Willis Tower, Chicago, IL|Sears Tower, Chicago, IL|
These errors make it difficult for postal carriers to find the location, so items either cannot be delivered or are returned to the sender. Customers experience shipment delays because of these mistakes, and businesses are forced to bear the costs. Address normalization has emerged as an effective solution to such problems.
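Some of these problems can be caught with simple checks before any model is involved. As one illustration, the sketch below repairs the number format error from the table, where a ZIP code was treated as a number and lost its leading zero. The function name `repair_zip` and the rule of only padding values with three to five digits are assumptions made for this example.

```python
import re
from typing import Optional


def repair_zip(raw: str) -> Optional[str]:
    """Return a 5-digit ZIP (or ZIP+4) from a numerically mangled value, else None."""
    digits = re.sub(r"\D", "", raw)  # drop thousands separators, spaces, hyphens
    if 3 <= len(digits) <= 5:
        # A ZIP stored as a number loses its leading zeros; pad them back.
        return digits.zfill(5)
    if len(digits) == 9:
        return f"{digits[:5]}-{digits[5:]}"
    return None  # too short or too long to repair safely


print(repair_zip("2,351"))  # 02351
```

A check like this only restores the format. Confirming that the repaired ZIP code actually matches the city and state still requires a reference database.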
Address Normalization Implementation
Our development teams have determined that a T5-based transformer model is one of the most effective ways to implement an address normalization solution. You can use T5 models for several natural language processing (NLP) tasks, such as summarization, question answering, translation, text generation, and more. However, the potentially daunting process of training a custom model prevents many companies from utilizing this relatively simple tool.
Many NLP engineers are happy to use the pre-trained models already available from Hugging Face. These community-provided models can be used out of the box, without additional fine-tuning or custom training.
For Python developers, an easy-to-use pip package named simpleT5, built on top of PyTorch Lightning, is available for immediate use. The objective of this library is to help you custom-train your own T5 models rather than download pre-trained models and use them out of the box. Custom training can improve output accuracy and tune the model precisely to the kind of task that you wish to perform.
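A custom training run with simpleT5 might look roughly like the sketch below. The library trains on a data frame with `source_text` and `target_text` columns; everything else here is an assumption made for illustration, including the `normalize address: ` task prefix, the `t5-small` checkpoint, the hyperparameters, and the helper names. The call names follow the simpleT5 README and should be checked against the version you install.

```python
from typing import Dict, List, Tuple

TASK_PREFIX = "normalize address: "  # hypothetical prefix, not defined by simpleT5


def build_rows(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Shape (raw, normalized) pairs into the two columns simpleT5 trains on."""
    return [
        {"source_text": TASK_PREFIX + raw.strip(), "target_text": norm.strip()}
        for raw, norm in pairs
    ]


def train_address_model(pairs: List[Tuple[str, str]], output_dir: str = "outputs"):
    """Fine-tune a small T5 checkpoint on address pairs (sketch, untuned settings)."""
    # Imported here so build_rows stays usable without the ML dependencies.
    import pandas as pd
    from simplet5 import SimpleT5

    rows = build_rows(pairs)
    split = max(1, int(len(rows) * 0.9))  # naive 90/10 train/eval split
    train_df = pd.DataFrame(rows[:split])
    eval_df = pd.DataFrame(rows[split:] or rows[:1])

    model = SimpleT5()
    model.from_pretrained(model_type="t5", model_name="t5-small")
    model.train(
        train_df=train_df,
        eval_df=eval_df,
        source_max_token_len=64,  # addresses are short
        target_max_token_len=64,
        batch_size=8,
        max_epochs=3,
        use_gpu=False,
        outputdir=output_dir,
    )
    return model
```

After training, `model.predict(TASK_PREFIX + "12 main strt")` returns a list of generated strings. The quality of the result depends heavily on how many raw and normalized pairs you can supply.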
Building a Pipeline for Address Normalization
In a pipeline for address normalization, business users can upload addresses in bulk using CSV, JSON, or XML files. Your implementation then converts any non-standard addresses to standard ones.
The word tokenizer in the T5 model (in this example, simpleT5) breaks each address apart and produces tokens. The system then looks up each token as a key in a dictionary. If a match is found, the final output includes the value associated with that key; if not, the token appears in the output unchanged.
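The upload step can be sketched with the Python standard library alone. The layout of each file type is an assumption for this example: a CSV with an `address` column, a JSON array of objects with an `address` key, and XML containing `<address>` elements. The function name `load_addresses` is likewise hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import List


def load_addresses(payload: str, fmt: str) -> List[str]:
    """Extract raw address strings from an uploaded CSV, JSON, or XML payload."""
    fmt = fmt.lower()
    if fmt == "csv":
        reader = csv.DictReader(io.StringIO(payload))
        values = [row.get("address") or "" for row in reader]
    elif fmt == "json":
        values = [item.get("address") or "" for item in json.loads(payload)]
    elif fmt == "xml":
        values = [el.text or "" for el in ET.fromstring(payload).iter("address")]
    else:
        raise ValueError(f"Unsupported format: {fmt}")
    # Skip blank entries so later stages only see real addresses.
    return [v.strip() for v in values if v.strip()]


print(load_addresses('address\n"12 Main strt, Boston"\n', "csv"))
# ['12 Main strt, Boston']
```

Returning a plain list of strings keeps the loader independent of whatever normalization step runs next.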
As you can see in the example shown in Image 2, “strt” becomes “ST,” and “Apartment 505” becomes “APT 505”.
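The lookup step described above can be sketched as follows. Note that this uses a plain whitespace split as a stand-in for the T5 tokenizer, which works on subword pieces, and the dictionary holds only a few sample entries. The names `ABBREVIATIONS` and `normalize_tokens`, the sample address, and the choice to uppercase unmatched tokens and drop commas are assumptions for the example.

```python
# Sample entries only; a real dictionary would cover far more variants.
ABBREVIATIONS = {
    "STRT": "ST",
    "STREET": "ST",
    "APARTMENT": "APT",
    "AVENUE": "AVE",
}


def normalize_tokens(address: str) -> str:
    """Replace each token found in the dictionary; pass other tokens through."""
    output = []
    for token in address.split():
        key = token.strip(",.").upper()  # trim punctuation, match case-insensitively
        if not key:
            continue  # token was punctuation only
        output.append(ABBREVIATIONS.get(key, key))
    return " ".join(output)


print(normalize_tokens("123 Main strt, Apartment 505"))
# 123 MAIN ST APT 505
```

The dictionary only covers variants someone has thought to list, which is where a trained model that generalizes from examples becomes useful.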
Bad addressing is an often-overlooked source of lost revenue for many businesses. It can result in failed direct mail campaigns, customer dissatisfaction, product mis-shipments, and attendant costs. The T5 model provides an elegant foundation for a powerful AI/ML-based address normalization solution. Pre-trained models are readily available; however, the simpleT5 Python package makes it straightforward to train your own model and arrive at an excellent, low-cost solution.