Return only digits 0-9 from a string

I need a regular expression that I can use in VBScript and .NET that will return only the numbers that are found in a string.

For example, any of the following strings should return only 1231231234:

  • 123 123 1234
  • (123) 123-1234
  • 123-123-1234
  • (123)123-1234
  • 123.123.1234
  • 123 123 1234
  • 1 2 3 1 2 3 1 2 3 4

This will be used in an email parser to find telephone numbers that customers may provide in the email and do a database search.

I may have missed a similar regex but I did search on regexlib.com.

[EDIT] - Added code generated by RegexBuddy after setting up musicfreak's answer

VBScript Code

Dim myRegExp, ResultString
Set myRegExp = New RegExp
myRegExp.Global = True
myRegExp.Pattern = "[^\d]"
ResultString = myRegExp.Replace(SubjectString, "")

VB.NET

Dim ResultString As String
Try
Dim RegexObj As New Regex("[^\d]")
ResultString = RegexObj.Replace(SubjectString, "")
Catch ex As ArgumentException
'Syntax error in the regular expression
End Try

C#

string resultString = null;
try {
Regex regexObj = new Regex(@"[^\d]");
resultString = regexObj.Replace(subjectString, "");
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}

Have you gone through the phone number category on regexlib? Quite a few of the patterns there seem to do what you need.

By the looks of things, you're trying to catch any 10-digit phone number.

Why not first do a string replace on the text to remove any of the following characters?

<SPACE> , . ( ) - [ ]

Then afterwards, you can just do a regex search for a 10 digit number.

\d{10}
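
A minimal C# sketch of this two-step approach might look like the following. The class and method names (PhoneFinder, FindTenDigitNumber) are made up for illustration. Keep in mind that stripping separators from a whole email can fuse unrelated numbers together, so treat this as a starting point rather than a finished parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneFinder
{
    // Hypothetical helper: removes the separator characters listed above,
    // then returns the first run of ten consecutive digits ("" if none).
    public static string FindTenDigitNumber(string text)
    {
        // Step 1: strip space , . ( ) - [ ]
        string stripped = Regex.Replace(text, @"[ ,.()\-\[\]]", "");

        // Step 2: search for ten digits in a row
        Match m = Regex.Match(stripped, @"\d{10}");
        return m.Success ? m.Value : string.Empty;
    }
}
```

For example, PhoneFinder.FindTenDigitNumber("(123) 123-1234") returns "1231231234".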

I don't know if VBScript has some kind of a "regular expression replace" function, but if it does, then you could do something like this pseudocode:

reg_replace(/\D+/g, '', your_string)

I don't know VBScript so I can't give you the exact code, but this would remove anything that is not a digit.

EDIT: Make sure to have the global flag (the "g" at the end of the regexp), otherwise it will only replace the first run of non-digits in your string.

In .NET, you could extract just the digits from the string using LINQ, like this:

string justNumbers = new String(text.Where(Char.IsDigit).ToArray());

Don't forget to add using System.Linq; at the top of the file.

Note: you've only solved half the problem here.

For US phone numbers entered "in the wild", you may have:

  • Phone numbers with or without the "1" prefix
  • Phone numbers with or without the area code
  • Phone numbers with extension numbers (if you blindly remove all non-digits, you'll miss the "x" or "Ext." or whatever also on the line).
  • Possibly, numbers encoded with mnemonic letters (800-BUY-THIS or whatever)

You'll need to add some smarts to your code to conform the resulting list of digits to a single standard that you actually search against in your database.

Some simple things you could do to fix this:

  • Before the RegEx removal of non-digits, see if there's an "x" in the string. If there is, chop everything off after it (will handle most versions of writing an extension number).

  • For any number with more than 10 digits beginning with a "1", chop off the 1. It's not part of the area code, since US area codes start in the 2xx range.

  • For any number still exceeding 10 digits, assume the remainder is an extension of some sort, and chop it off.

  • Do your database search using an "ends-with" pattern search (SELECT * FROM mytable WHERE phonenumber LIKE '%blah'). This will handle situations (although with the possibility of error) where the area code is not provided, but your database has the number with the area code.
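
The first three steps above could be sketched in C# roughly as follows. PhoneNormalizer and Normalize are hypothetical names, and the rules are deliberately naive (for instance, any "x" anywhere in the input is treated as the start of an extension).

```csharp
using System;
using System.Linq;

public static class PhoneNormalizer
{
    // Hypothetical helper: reduces a raw phone string to at most 10 digits.
    public static string Normalize(string raw)
    {
        // 1. Drop everything from the first "x" on (covers "x", "ext", "extension").
        int x = raw.IndexOf("x", StringComparison.OrdinalIgnoreCase);
        if (x >= 0) raw = raw.Substring(0, x);

        // 2. Keep only the ASCII digits 0-9.
        string digits = new string(raw.Where(c => c >= '0' && c <= '9').ToArray());

        // 3. A leading 1 on a number longer than 10 digits is taken to be the US country code.
        if (digits.Length > 10 && digits[0] == '1') digits = digits.Substring(1);

        // 4. Anything still beyond 10 digits is assumed to be an extension.
        if (digits.Length > 10) digits = digits.Substring(0, 10);

        return digits;
    }
}
```

The result (say "1234567" when no area code was given) can then be used as the suffix in the ends-with search from the last step.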

With respect to the points made by richardtallent, this code will handle most of the issues around extension numbers and a prepended US country code (+1).

Not the most elegant solution, but I had to quickly solve the problem so I could move on with what I'm doing.

I hope it helps someone.

 Public Shared Function JustNumbers(inputString As String) As String
Dim outString As String = ""
Dim nEnds As Integer = -1


' Cycle through and test the ASCII character code of each character in the string. Remove everything non-numeric except "x" (in the event an extension is in the string as follows):
'    331-123-3451 extension 405  becomes 3311233451x405
'    226-123-4567 ext 405        becomes 2261234567x405
'    226-123-4567 x 405          becomes 2261234567x405
For l = 1 To inputString.Length
Dim tmp As String = Mid(inputString, l, 1)
If (Asc(tmp) >= 48 And Asc(tmp) <= 57) Then
outString &= tmp
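ElseIf Asc(tmp.ToLower) = 120 Then
outString &= tmp
' Record how many digits precede the first "x" (not its position in the raw input)
If nEnds = -1 Then nEnds = outString.Length - 1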
End If
Next




' Remove the leading US country code 1 after doing some validation
If outString.Length > 0 Then
If Strings.Left(outString, 1) = "1" Then


' If the nEnds flag is still -1, that means no extension was added above, set it to the full length of the string
' otherwise, an extension number was detected, and that should be the nEnds (number ends) position.
If nEnds = -1 Then nEnds = outString.Length


' We hit a phone number with more than 10 digits, this means an area code is prefixed;
' Remove the leading 1 in case someone put in the US country code
' This is technically safe, since there are no US area codes that start with a 1. The start digits are 2-9
If nEnds > 10 Then
outString = Right(outString, outString.Length - 1)
End If
End If
End If


Debug.Print(inputString + "          : became : " + outString)


Return outString
End Function

As an alternative to the main .NET solution, adapted from a similar question's answer:

string justNumbers = string.Concat(text.Where(char.IsDigit));

The simplest solution, without a regular expression:

public string DigitsOnly(string s)
{
string res = "";
for (int i = 0; i < s.Length; i++)
{
if (Char.IsDigit(s[i]))
res += s[i];
}
return res;
}
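
One caveat with the loop above: Char.IsDigit returns true for any Unicode decimal digit, not only 0-9, and building the result with += allocates a new string on every append. A possible variant, assuming you want strictly ASCII digits (the names DigitFilter and AsciiDigitsOnly are made up here):

```csharp
using System;
using System.Text;

public static class DigitFilter
{
    // Hypothetical helper: keeps only the characters '0' through '9'.
    public static string AsciiDigitsOnly(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            // Explicit range check instead of Char.IsDigit, which also accepts
            // non-ASCII digits such as Arabic-Indic ones.
            if (c >= '0' && c <= '9')
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

The same caveat applies to \d in .NET regular expressions, which also matches other Unicode digits unless RegexOptions.ECMAScript is set; a pattern such as [^0-9] avoids the issue.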