These days, legislation increasingly obliges processors of personal information to ensure that the information is accurate. Input sanitisation and data validation are key ways of maintaining that accuracy.
I scrounged details of active South African mobile number prefixes from the Wikipedia article on South African telephone numbers, a February 2016 post on the Asterisk.org.za mailing list, and HLR validation logs.
After munging the information from those sources, I arrived at the following regular expression. At the time of writing, it matches all South African cellphone numbers that use a known active prefix and rejects the rest.
I tried to keep the regex as short as possible, in characters, while still covering all known 3- and 4-digit prefixes.
The regex is:
0((60[3-9]|64[0-5]|66[0-5])\d{6}|(7[1-4689]|6[1-3]|8[1-4])\d{7})
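To see exactly which prefixes the expression admits, here is a short Python sketch (Python is just an illustrative choice; the post itself uses SQL and grep). It brute-forces every 4-digit block starting with 0 and groups complete blocks back into 3-digit prefixes. It assumes, as the pattern implies, that every valid number is exactly 10 digits and that the digits after the prefix are unconstrained, so padding with zeros is enough to test a prefix.

```python
import re

# The regex from the post, compiled with re.ASCII so \d only matches 0-9.
SA_MOBILE = re.compile(
    r"0((60[3-9]|64[0-5]|66[0-5])\d{6}|(7[1-4689]|6[1-3]|8[1-4])\d{7})",
    re.ASCII,
)


def accepted_prefixes():
    """Return (three_digit, four_digit) prefixes that the regex accepts."""
    # Every valid number is 10 digits, so test each 4-digit block "0XYZ"
    # padded with six zeros; the trailing digits are unconstrained.
    blocks = [f"0{n:03d}" for n in range(1000)]
    accepted = {b for b in blocks if SA_MOBILE.fullmatch(b + "000000")}

    three, four = [], []
    for n in range(100):
        prefix = f"0{n:02d}"
        members = [prefix + str(d) for d in range(10)]
        if all(m in accepted for m in members):
            # All ten blocks accepted: this is really a 3-digit prefix.
            three.append(prefix)
        else:
            # Otherwise keep only the individual 4-digit blocks accepted.
            four.extend(m for m in members if m in accepted)
    return three, four


if __name__ == "__main__":
    three, four = accepted_prefixes()
    print("3-digit:", ", ".join(three))
    print("4-digit:", ", ".join(four))
```

Running it lists 14 three-digit prefixes (061 to 063, 071 to 074, 076, 078, 079 and 081 to 084) plus the 4-digit blocks 0603 to 0609, 0640 to 0645 and 0660 to 0665, which is a handy way to re-check the regex whenever the prefix list changes.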
An example PostgreSQL query, with start (^) and end ($) anchors added, that lists the rows whose number fails validation would be:
SELECT * FROM cellnumber_listing WHERE NOT (trim(celltelephone) ~ '^0((60[3-9]|64[0-5]|66[0-5])\d{6}|(7[1-4689]|6[1-3]|8[1-4])\d{7})$');
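For numbers held in memory rather than in a database, the same check can be sketched in Python. The helper name find_invalid is hypothetical; strip() stands in for trim() and re.fullmatch() plays the role of the ^ and $ anchors. One deliberate difference: this sketch reports a missing (None) value as invalid, whereas in SQL NOT (NULL ~ '...') evaluates to NULL, so the query above does not return rows where celltelephone is NULL.

```python
import re
from typing import Iterable, List, Optional

# Same pattern as the post; fullmatch() anchors it like ^...$ in the SQL.
SA_MOBILE = re.compile(
    r"0((60[3-9]|64[0-5]|66[0-5])\d{6}|(7[1-4689]|6[1-3]|8[1-4])\d{7})",
    re.ASCII,
)


def find_invalid(numbers: Iterable[Optional[str]]) -> List[Optional[str]]:
    """Return the entries that are not valid South African mobile numbers."""
    invalid = []
    for raw in numbers:
        # Roughly mirrors trim(): Python's strip() also removes tabs and
        # newlines, while PostgreSQL's trim() removes only spaces by default.
        if raw is None or not SA_MOBILE.fullmatch(raw.strip()):
            invalid.append(raw)
    return invalid


if __name__ == "__main__":
    sample = [" 0821234567 ", "0601234567", "0661234567", "082123456", None]
    print(find_invalid(sample))
```

With that sample, 0601234567 (060 is only valid as 0603 to 0609), the 9-digit 082123456 and the None entry are reported, while the padded 0821234567 and 0661234567 pass.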
Or, using a Perl Compatible Regular Expression (PCRE) with grep:
grep -P '0((60[3-9]|64[0-5]|66[0-5])\d{6}|(7[1-4689]|6[1-3]|8[1-4])\d{7})' file.txt
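The grep example has no anchors, so it reports any line that merely contains a matching run of digits. That suits searching, but not validation (GNU grep's -x option, which requires the whole line to match, is one way to get the latter). The Python sketch below shows the difference. The lookarounds (?<!\d) and (?!\d) are my own addition, not part of the original regex: they stop a match from starting or ending inside a longer run of digits when pulling numbers out of free text.

```python
import re

CORE = r"0((60[3-9]|64[0-5]|66[0-5])\d{6}|(7[1-4689]|6[1-3]|8[1-4])\d{7})"

# Unanchored, like the grep -P example: a match may sit anywhere in the text.
UNANCHORED = re.compile(CORE, re.ASCII)
# Hypothetical variant: lookarounds reject matches that touch other digits.
BOUNDED = re.compile(r"(?<!\d)" + CORE + r"(?!\d)", re.ASCII)


def check(text):
    """Return (unanchored search, bounded search, whole-string match)."""
    return (
        bool(UNANCHORED.search(text)),
        bool(BOUNDED.search(text)),
        bool(UNANCHORED.fullmatch(text)),
    )


if __name__ == "__main__":
    for text in ["call 0821234567 now", "00821234567", "08212345678"]:
        print(repr(text), check(text))
```

The last two strings are 11 digits long, yet the unanchored search still finds a "valid" number inside each; only the bounded search and the whole-string match reject them.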
If you have any comments on the above or suggestions on how it could be improved, please comment!
Hey Bruce, good job man. Just one question: what if you want one regex that works for both cases, e.g. numbers starting with +27… or just 07…?
This one only handles numbers that start with 0.
If you also want to handle the +27 form, use this:
(\+27|0)((60[3-9]|64[0-5]|66[0-5])\d{6}|(7[1-4689]|6[1-3]|8[1-4])\d{7})
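If numbers arrive in both forms, it can also help to normalise them to a single representation once they validate. Here is a minimal Python sketch, assuming you want to store the national 0-prefixed form. It uses the +27/0 alternation with the 66[0-5] block from the main regex included, and the function name to_national is hypothetical.

```python
import re
from typing import Optional

# (\+27|0) alternation, plus the 66[0-5] block from the main regex.
SA_MOBILE_INTL = re.compile(
    r"(\+27|0)((60[3-9]|64[0-5]|66[0-5])\d{6}|(7[1-4689]|6[1-3]|8[1-4])\d{7})",
    re.ASCII,
)


def to_national(number: str) -> Optional[str]:
    """Return the number as 0XXXXXXXXX, or None if it is not a valid mobile."""
    m = SA_MOBILE_INTL.fullmatch(number.strip())
    if m is None:
        return None
    # Group 2 is everything after the "+27" or "0" prefix.
    return "0" + m.group(2)


if __name__ == "__main__":
    for n in ["+27821234567", "0661234567", "+270821234567"]:
        print(n, "->", to_national(n))
```

A mixed-up form such as +270821234567 is rejected, because the pattern does not allow a 0 after +27. Returning "+27" + m.group(2) instead would give E.164-style output, if that is what you store.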
Your regex is out of date.
You missed 0660 to 0665 (cellular, used by Vodacom).
See https://en.wikipedia.org/wiki/Telephone_numbers_in_South_Africa
Thanks