Excel Formulas For Common Data Cleanup

For years, I’ve used the publication as a resource to describe how to do things and keep a record for myself to look up later! A client handed us a customer data file that was a disaster. Virtually every field was misformatted, and as a result, we could not import the data. While there are some great add-ons for Excel to do the cleanup using Visual Basic, we run Office for Mac, which won’t support macros. Instead, we look for straight formulas to assist. I thought I’d share some of those here so you can use them.
Remove Non-Numeric Characters
Systems often require phone numbers inserted in a specific 11-digit formula with the country code and no punctuation. However, folks often enter this data with dashes and periods instead. Here’s an excellent formula for removing all non-numeric characters in Excel. The formula reviews the data in cell A2:
=IF(A2="","",SUMPRODUCT(MID(0&A2,LARGE(INDEX(ISNUMBER(--MID(A2,ROW($1:$25),1))*
ROW($1:$25),0),ROW($1:$25))+1,1)*10^ROW($1:$25)/10))
You can copy the resulting column and use Edit > Paste Values to write over the data with the properly formatted result.
Evaluate Multiple Fields with an OR
We often purge incomplete records from an import. Users don’t realize that you don’t always have to write complex hierarchical formulas and that you can write an OR statement instead. I want to check A2, B2, C2, D2, or E2 for missing data in the example below. If any data is missing, I will return a 0; otherwise, a 1. That will allow me to sort the data and delete the incomplete records.
=IF(OR(A2="",B2="",C2="",D2="",E2=""),0,1)
Trim and Concatenate Fields
If your data has First and Last Name fields, but your import has a full name field, you can concatenate the fields together neatly using the built-in Excel Function Concatenate, but be sure to use TRIM to remove any empty spaces before or after the text. We wrap the entire field with TRIM if one of the fields doesn’t have data:
=TRIM(CONCATENATE(TRIM(A1)," ",TRIM(B1)))
Check for Valid Email Address
A pretty simple formula that looks for both the @ and . in an email address (not the RFC standard):
=AND(FIND(“@”,A2),FIND(“.”,A2),ISERROR(FIND(” “,A2)))
Extract First and Last Names
Sometimes, the problem is the opposite. Your data has a full name field, but you need to parse out the first and last names. These formulas look for the space between the first and last name and grab text where necessary. It also handles if there is no last name or a blank entry in A2.
=IFERROR(IF(SEARCH(" ",A2,1),LEFT(A2, SEARCH(" ",A2,1)),A2),IF(LEN(A2)>0,A2,""))
And the last name:
=IFERROR(IF(SEARCH(" ",A2,1),RIGHT(A2,LEN(A2)-SEARCH(" ",A2,1)),A2),"")
Limit the Number of Characters and Add …
Did you ever want to clean up your meta descriptions? If you want to pull content into Excel and then trim the content for use in a Meta Description field (150 to 160 characters), you can do that using this formula. It cleanly breaks the description at a space and then adds the …:
=IF(LEN(A1)>155,LEFT(A1,FIND("*",SUBSTITUTE(A1," ","*",LEN(LEFT(A1,154))-LEN(SUBSTITUTE(LEFT(A1,154)," ",""))))) & IF(LEN(A1)>FIND("*",SUBSTITUTE(A1," ","*",LEN(LEFT(A1,154))-LEN(SUBSTITUTE(LEFT(A1,154)," ","")))),"…",""),A1)
Of course, these aren’t meant to be comprehensive… just some quick formulae to help you get a jump start! What other formulas do you find yourself using? Add them in the comments, and I’ll give you credit as I update this article.