I'd have to strongly second the answer from Frank Krueger.
Whilst you say you need a regular expression to match an IPv6 address, I'm assuming what you really need is to be able to check if a given string is a valid IPv6 address. There is a subtle but important distinction here.
There is more than one way to check if a given string is a valid IPv6 address and regular expression matching is only one solution.
Use an existing library if you can. The library will have fewer bugs and its use will result in less code for you to maintain.
The regular expression suggested by Factor Mystic is long and complex. It most likely works, but you should also consider how you'd cope if it unexpectedly fails. The point I'm trying to make here is that if you can't form a required regular expression yourself you won't be able to easily debug it.
If you have no suitable library it may be better to write your own IPv6 validation routine that doesn't depend on regular expressions. If you write it you understand it and if you understand it you can add comments to explain it so that others can also understand and subsequently maintain it.
Act with caution when using a regular expression whose functionality you can't explain to someone else.
I don't think you have to have IPv6 compiled in to Python to get inet_pton, which can also parse IPv4 addresses if you pass in socket.AF_INET as the first parameter. Note: this may not work on non-Unix systems.
to answer "is a valid ipv6" it look like ok to me. To break it down in parts... forget it. I've omitted the unspecified one (::) since there is no use to have "unpecified adress" in my database.
the beginning:
^([0-9A-Fa-f]{0,4}:){2,7} <-- match the compressible part, we can translate this as: between 2 and 7 colon who may have heaxadecimal number between them.
followed by:
[0-9A-Fa-f]{1,4}$ <-- an hexadecimal number (leading 0 omitted)
OR
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4} <-- an Ipv4 adress
A simple regex that will match, but I wouldn't recommend for validation of any sort is this:
([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4}
Note this matches compression anywhere in the address, though it won't match the loopback address ::1. I find this a reasonable compromise in order to keep the regex simple.
I successfully use this in iTerm2 smart selection rules to quad-click IPv6 addresses.
I was unable to get @Factor Mystic's answer to work with POSIX regular expressions, so I wrote one that works with POSIX regular expressions and PERL regular expressions.
Beware! In Java, the use of InetAddress and related classes (Inet4Address, Inet6Address, URL) may involve network trafic! E.g. DNS resolving (URL.equals, InetAddress from string!). This call may take long and is blocking!
For IPv6 I have something like this. This of course does not handle the very subtle details of IPv6 like that zone indices are allowed only on some classes of IPv6 addresses. And this regex is not written for group capturing, it is only a "matches" kind of regexp.
S - IPv6 segment = [0-9a-f]{1,4}
I - IPv4 = (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})
Schematic (first part matches IPv6 addresses with IPv4 suffix, second part matches IPv6 addresses, last patrt the zone index):
Looking at the patterns included in the other answers there are a number of good patterns that can be improved by referencing groups and utilizing lookaheads. Here is an example of a pattern that is self referencing that I would utilize in PHP if I had to:
^(?<hgroup>(?<hex>[[:xdigit:]]{0,4}) # grab a sequence of up to 4 hex digits
# and name this pattern for usage later
(?<!:::):{1,2}) # match 1 or 2 ':' characters
# as long as we can't match 3
(?&hgroup){1,6} # match our hex group 1 to 6 more times
(?:(?:
# match an ipv4 address or
(?<dgroup>2[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3}(?&dgroup)
# match our hex group one last time
|(?&hex))$
Note: PHP has a built in filter for this which would be a better solution than this
pattern.
Here's what I came up with, using a bit of lookahead and named groups. This is of course just IPv6, but it shouldn't interfere with additional patterns if you want to add IPv4:
It is difficult to find a regular expression which works for all IPv6 cases. They are usually hard to maintain, not easily readable and may cause performance problems. Hence, I want to share an alternative solution which I have developed: Regular Expression (RegEx) for IPv6 Separate from IPv4
Now you may ask that "This method only finds IPv6, how can I find IPv6 in a text or file?" Here are methods for this issue too.
Note: If you do not want to use IPAddress class in .NET, you can also replace it with my method. It also covers mapped IPv4 and special cases too, while IPAddress does not cover.
class IPv6
{
public List<string> FindIPv6InFile(string filePath)
{
Char ch;
StringBuilder sbIPv6 = new StringBuilder();
List<string> listIPv6 = new List<string>();
StreamReader reader = new StreamReader(filePath);
do
{
bool hasColon = false;
int length = 0;
do
{
ch = (char)reader.Read();
if (IsEscapeChar(ch))
break;
//Check the first 5 chars, if it has colon, then continue appending to stringbuilder
if (!hasColon && length < 5)
{
if (ch == ':')
{
hasColon = true;
}
sbIPv6.Append(ch.ToString());
}
else if (hasColon) //if no colon in first 5 chars, then dont append to stringbuilder
{
sbIPv6.Append(ch.ToString());
}
length++;
} while (!reader.EndOfStream);
if (hasColon && !listIPv6.Contains(sbIPv6.ToString()) && IsIPv6(sbIPv6.ToString()))
{
listIPv6.Add(sbIPv6.ToString());
}
sbIPv6.Clear();
} while (!reader.EndOfStream);
reader.Close();
reader.Dispose();
return listIPv6;
}
public List<string> FindIPv6InText(string text)
{
StringBuilder sbIPv6 = new StringBuilder();
List<string> listIPv6 = new List<string>();
for (int i = 0; i < text.Length; i++)
{
bool hasColon = false;
int length = 0;
do
{
if (IsEscapeChar(text[length + i]))
break;
//Check the first 5 chars, if it has colon, then continue appending to stringbuilder
if (!hasColon && length < 5)
{
if (text[length + i] == ':')
{
hasColon = true;
}
sbIPv6.Append(text[length + i].ToString());
}
else if (hasColon) //if no colon in first 5 chars, then dont append to stringbuilder
{
sbIPv6.Append(text[length + i].ToString());
}
length++;
} while (i + length != text.Length);
if (hasColon && !listIPv6.Contains(sbIPv6.ToString()) && IsIPv6(sbIPv6.ToString()))
{
listIPv6.Add(sbIPv6.ToString());
}
i += length;
sbIPv6.Clear();
}
return listIPv6;
}
bool IsEscapeChar(char ch)
{
if (ch != ' ' && ch != '\r' && ch != '\n' && ch!='\t')
{
return false;
}
return true;
}
bool IsIPv6(string maybeIPv6)
{
IPAddress ip;
if (IPAddress.TryParse(maybeIPv6, out ip))
{
return ip.AddressFamily == AddressFamily.InterNetworkV6;
}
else
{
return false;
}
}
}
libraryDependencies += "commons-validator" % "commons-validator" % "1.4.1"
import org.apache.commons.validator.routines._
/**
* Validates if the passed ip is a valid IPv4 or IPv6 address.
*
* @param ip The IP address to validate.
* @return True if the passed IP address is valid, false otherwise.
*/
def ip(ip: String) = InetAddressValidator.getInstance().isValid(ip)
Following the test's of the method ip(ip: String):
"The `ip` validator" should {
"return false if the IPv4 is invalid" in {
ip("123") must beFalse
ip("255.255.255.256") must beFalse
ip("127.1") must beFalse
ip("30.168.1.255.1") must beFalse
ip("-1.2.3.4") must beFalse
}
"return true if the IPv4 is valid" in {
ip("255.255.255.255") must beTrue
ip("127.0.0.1") must beTrue
ip("0.0.0.0") must beTrue
}
//IPv6
//@see: http://www.ronnutter.com/ipv6-cheatsheet-on-identifying-valid-ipv6-addresses/
"return false if the IPv6 is invalid" in {
ip("1200::AB00:1234::2552:7777:1313") must beFalse
}
"return true if the IPv6 is valid" in {
ip("1200:0000:AB00:1234:0000:2552:7777:1313") must beTrue
ip("21DA:D3:0:2F3B:2AA:FF:FE28:9C5A") must beTrue
}
}
I generated the following using python and works with the re module. The look-ahead assertions ensure that the correct number of dots or colons appear in the address. It does not support IPv4 in IPv6 notation.
pattern = '^(?=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$)(?:(?:25[0-5]|[12][0-4][0-9]|1[5-9][0-9]|[1-9]?[0-9])\.?){4}$|(?=^(?:[0-9a-f]{0,4}:){2,7}[0-9a-f]{0,4}$)(?![^:]*::.+::[^:]*$)(?:(?=.*::.*)|(?=\w+:\w+:\w+:\w+:\w+:\w+:\w+:\w+))(?:(?:^|:)(?:[0-9a-f]{4}|[1-9a-f][0-9a-f]{0,3})){0,8}(?:::(?:[0-9a-f]{1,4}(?:$|:)){0,6})?$'
result = re.match(pattern, ip)
if result: result.group(0)
Just matching local ones from an origin with square brackets included. I know it's not as comprehensive but in javascript the other ones had difficult to trace issues primarily that of not working, so this seems to get me what I needed for now. extra capitals A-F aren't needed either.
As stated above, another way to get an IPv6 textual representation validating parser is to use programming. Here is one that is fully compliant with RFC-4291 and RFC-5952. I've written this code in ANSI C (works with GCC, passed tests on Linux - works with clang, passed tests on FreeBSD). Thus, it does only rely on the ANSI C standard library, so it can be compiled everywhere (I've used it for IPv6 parsing inside a kernel module with FreeBSD).
// IPv6 textual representation validating parser fully compliant with RFC-4291 and RFC-5952
// BSD-licensed / Copyright 2015-2017 Alexandre Fenyo
#include <string.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
typedef enum { false, true } bool;
static const char hexdigits[] = "0123456789abcdef";
static int digit2int(const char digit) {
return strchr(hexdigits, digit) - hexdigits;
}
// This IPv6 address parser handles any valid textual representation according to RFC-4291 and RFC-5952.
// Other representations will return -1.
//
// note that str input parameter has been modified when the function call returns
//
// parse_ipv6(char *str, struct in6_addr *retaddr)
// parse textual representation of IPv6 addresses
// str: input arg
// retaddr: output arg
int parse_ipv6(char *str, struct in6_addr *retaddr) {
bool compressed_field_found = false;
unsigned char *_retaddr = (unsigned char *) retaddr;
char *_str = str;
char *delim;
bzero((void *) retaddr, sizeof(struct in6_addr));
if (!strlen(str) || strchr(str, ':') == NULL || (str[0] == ':' && str[1] != ':') ||
(strlen(str) >= 2 && str[strlen(str) - 1] == ':' && str[strlen(str) - 2] != ':')) return -1;
// convert transitional to standard textual representation
if (strchr(str, '.')) {
int ipv4bytes[4];
char *curp = strrchr(str, ':');
if (curp == NULL) return -1;
char *_curp = ++curp;
int i;
for (i = 0; i < 4; i++) {
char *nextsep = strchr(_curp, '.');
if (_curp[0] == '0' || (i < 3 && nextsep == NULL) || (i == 3 && nextsep != NULL)) return -1;
if (nextsep != NULL) *nextsep = 0;
int j;
for (j = 0; j < strlen(_curp); j++) if (_curp[j] < '0' || _curp[j] > '9') return -1;
if (strlen(_curp) > 3) return -1;
const long val = strtol(_curp, NULL, 10);
if (val < 0 || val > 255) return -1;
ipv4bytes[i] = val;
_curp = nextsep + 1;
}
sprintf(curp, "%x%02x:%x%02x", ipv4bytes[0], ipv4bytes[1], ipv4bytes[2], ipv4bytes[3]);
}
// parse standard textual representation
do {
if ((delim = strchr(_str, ':')) == _str || (delim == NULL && !strlen(_str))) {
if (delim == str) _str++;
else if (delim == NULL) return 0;
else {
if (compressed_field_found == true) return -1;
if (delim == str + strlen(str) - 1 && _retaddr != (unsigned char *) (retaddr + 1)) return 0;
compressed_field_found = true;
_str++;
int cnt = 0;
char *__str;
for (__str = _str; *__str; ) if (*(__str++) == ':') cnt++;
unsigned char *__retaddr = - 2 * ++cnt + (unsigned char *) (retaddr + 1);
if (__retaddr <= _retaddr) return -1;
_retaddr = __retaddr;
}
} else {
char hexnum[4] = "0000";
if (delim == NULL) delim = str + strlen(str);
if (delim - _str > 4) return -1;
int i;
for (i = 0; i < delim - _str; i++)
if (!isxdigit(_str[i])) return -1;
else hexnum[4 - (delim - _str) + i] = tolower(_str[i]);
_str = delim + 1;
*(_retaddr++) = (digit2int(hexnum[0]) << 4) + digit2int(hexnum[1]);
*(_retaddr++) = (digit2int(hexnum[2]) << 4) + digit2int(hexnum[3]);
}
} while (_str < str + strlen(str));
return 0;
}
Regexes for ipv6 can get really tricky when you consider addresses with embedded ipv4 and addresses that are compressed, as you can see from some of these answers.
The open-source IPAddress Java library will validate all standard representations of IPv6 and IPv4 and also supports prefix-length (and validation of such). Disclaimer: I am the project manager of that library.