Recently there was a request to create a function that would parse out the city, state and postal code information from a single string into separate columns for both US and Canadian addresses. There are obviously many different ways to approach this problem, and in my first few attempts I tried using the PATINDEX function with regular expression to first extract the postal code and then to identify the state and city in that order. Since using regular expressions in T-SQL does not quite work the same as using regular expression in managed code, I was finding it difficult to account for all the different variations without explicitly defining each and everyone of them first. The comments in the code below explain some of these variations.

I wanted to keep it simple avoid solutions involving CLR or using OLE functions. Yes, a CLR solution would probably be the best in terms of both performance and robustness, but I wanted to keep the solution within the scope of a single T-SQL function and I think the solution below works quite well.

Below you’ll find the complete code for the function along with comments and usage documentation. Please take note of the assumptions being made about the data in the code comments notes section. I did test this against a large data set with many different variations which are also mentioned in the code comments. If you come across any examples that don’t parse correctly with this code, please let me know in a comment below. Hope you enjoy.
 

SET ANSI_NULLS ON; 
GO
SET QUOTED_IDENTIFIER ON; 
GO
IF EXISTS (SELECT * FROM [sys].[objects] WHERE [object_id] = OBJECT_ID(N'[dbo].[ParseAddress]') AND [type] IN (N'FN', N'IF', N'TF', N'FS', N'FT')) 
BEGIN 
	DROP FUNCTION [dbo].[ParseAddress]; 
END; 
GO 

/*========================================================================================================= 
Script:		dbo.ParseAddress.sql 

Synopsis:	Function to parse an address string containing city, state and zip code information. This 
			should work for any United States or Canada address string. Tested using multiple variations 
			including 5-digit zip codes, 5-digit zip code + 4-digit extension separated with 1+ space(s),
			hyphen, or no separation and Canadian 6-character postal codes with the same variations. This
			function will also discard non-alpha characters and multiple spaces used to separate city, 
			state and zip code information.

Usage:		SELECT	 [Address] 
					,[City]  = [dbo].[ParseAddress]([Address], N'City') 
					,[State] = [dbo].[ParseAddress]([Address], N'State') 
					,[Zip]   = [dbo].[ParseAddress]([Address], N'Zip') 

Notes:		The following assumptions are made about the data for parsing: 

					- Zip code information is the last part of the string
					- Zip code will always contain 5+ characters. 
					- State will always be two letters. 
					- City will always end with a letter. 

=========================================================================================================== 
Revision History: 

Date			Author				Description 
----------------------------------------------------------------------------------------------------------- 


===========================================================================================================*/ 
CREATE FUNCTION [dbo].[ParseAddress] 
( 
	@String NVARCHAR(64), 
	@Get NVARCHAR(64) 
) 
RETURNS NVARCHAR(64) 
AS 
BEGIN 
	DECLARE @Address AS NVARCHAR(64); 
	DECLARE @City    AS NVARCHAR(25); 
	DECLARE @State   AS NVARCHAR( 2); 
	DECLARE @Zip     AS NVARCHAR(10); 
	DECLARE @Index   AS TINYINT     ; 
	DECLARE @Char    AS NCHAR(    1); 
	DECLARE @Value   AS NVARCHAR(64); 

	-- Remove any leading or trailing white space
 	SET @Address = LTRIM(RTRIM(@String)); 
	-- Initialize string index
	SET @Index = 1; 
	
	WHILE (@Index <= LEN(@Address)) 
	BEGIN 
		SET @Char = SUBSTRING(REVERSE(@Address), @Index, 1); 

		IF (@Zip IS NULL OR LEN(@Zip) < 5) 
		BEGIN 
			-- Continue reading valid characters for @Zip
			WHILE (PATINDEX(N'[a-zA-Z0-9]', @Char) = 1) 
			BEGIN 
				SET @Zip = ISNULL(STUFF(@Zip, 1, 0, UPPER(@Char)), UPPER(@Char)); 
				SET @Index = @Index + 1; 
				SET @Char = SUBSTRING(REVERSE(@Address), @Index, 1); 
			END; 
		END; 

		IF (@State IS NULL OR LEN(@State) <> 2) 
		BEGIN 
			-- Continue reading valid characters for @State
			WHILE (PATINDEX(N'[a-zA-Z]', @Char) = 1) 
			BEGIN 
				SET @State = ISNULL(STUFF(@State, 1, 0, UPPER(@Char)), UPPER(@Char)); 
				SET @Index = @Index + 1; 
				SET @Char = SUBSTRING(REVERSE(@Address), @Index, 1); 
			END; 
		END; 

		-- The last character of city should be an alpha character
		IF (PATINDEX(N'[a-zA-Z]', @Char) = 1) 
		BEGIN 
			-- Just assign the rest of the string to the @City variable
			SET @City = SUBSTRING(@Address, 1, LEN(@Address) - @Index + 1); 
			BREAK; 
		END; 

		SET @Index = @Index + 1; 
	END; 

	-- Removes double-spaces from the city name 
	WHILE (CHARINDEX(SPACE(2), @City) > 0) 
	BEGIN 
		SET @City = REPLACE(@City, SPACE(2), SPACE(1)); 
	END; 

	-- Format US Postal Codes that have 4 digit extension by stuffing the hyphen in correct position 
	IF (PATINDEX(N'[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]', @Zip) = 1) 
	BEGIN 
		SET @Zip = STUFF(@Zip, 6, 0, N'-'); 
	END; 

	-- Format Canadian Postal Codes by stuffing a space in correct position 
	IF (PATINDEX(N'[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]', @Zip) = 1) 
	BEGIN 
		SET @Zip = STUFF(@Zip, 4, 0, N' '); 
	END; 

	IF (@Get = N'City')  SET @Value = @City; 
	IF (@Get = N'State') SET @Value = @State; 
	IF (@Get = N'Zip')   SET @Value = @Zip; 

	RETURN @Value; 
END; 
GO