检查字符串是否为有效URL的最佳正则表达式是什么?

如何检查给定字符串是否为有效的URL地址?

我对正则表达式的了解是基本的,不允许我从我已经在网络上看到的数百个正则表达式中进行选择。

633350 次浏览

文章获取URL的部分(正则表达式)讨论了解析URL以识别其各种组件。如果您想检查URL是否格式良好,它应该足以满足您的需求。

如果你需要检查它是否有效,你最终必须尝试访问另一端的任何内容。

不过,一般来说,您最好使用框架或其他库提供给您的函数。许多平台都包含解析URL的函数。例如,有Python的urlparse模块,在. NET中,您可以使用System. Uri类的构造函数作为验证URL的手段。

什么平台?如果使用. NET,请使用System.Uri.TryCreate,而不是regex。

例如:

static bool IsValidUrl(string urlString){Uri uri;return Uri.TryCreate(urlString, UriKind.Absolute, out uri)&& (uri.Scheme == Uri.UriSchemeHttp|| uri.Scheme == Uri.UriSchemeHttps|| uri.Scheme == Uri.UriSchemeFtp|| uri.Scheme == Uri.UriSchemeMailto/*...*/);}
// In test fixture...
[Test]void IsValidUrl_Test(){Assert.True(IsValidUrl("http://www.example.com"));Assert.False(IsValidUrl("javascript:alert('xss')"));Assert.False(IsValidUrl(""));Assert.False(IsValidUrl(null));}

(感谢@王耀西关于javascript:的提示)

如果你真的搜索最终匹配,你可能会在“一个好的URL正则表达式?”上找到它。

但是真正匹配所有可能的域并允许根据RFC允许的任何内容的正则表达式非常长且不可读,相信我;-)

非验证URI引用解析器

出于参考目的,这是IETF规范:(TXT|超文本标记语言)。特别是,附录B.用正则表达式解析URI引用演示了如何解析有效正则表达式。这被描述为,

举一个非验证URI引用解析器的例子,该解析器将采用任何给定的字符串并提取URI组件。

以下是他们提供的regex:

 ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

正如其他人所说,最好将其留给您已经在使用的lib/框架。

这就是RegexBuddy的用法。

(\b(https?|ftp|file)://)?[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|]

它匹配下面的这些(在** **标记内):

**http://www.regexbuddy.com****http://www.regexbuddy.com/****http://www.regexbuddy.com/index.html****http://www.regexbuddy.com/index.html?source=library****http://www.regexbuddy.com/index.html?source=library#copyright**

您可以在http://www.regexbuddy.com/download.html下载RegexBuddy。

我编写了我的URL(实际上是IRI,国际化)模式以符合RFC 3987(http://www.faqs.org/rfcs/rfc3987.html)。这些是PCRE语法。

对于绝对IRI(国际化):

/^[a-z](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,1}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+\.[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?$/i

也允许相对IRI:

/^(?:[a-z](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,1}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+\.[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?|(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,1}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+\.[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?)$/i

它们是如何编译的(在PHP中):

<?php
/* Regex convenience functions (character class, non-capturing group) */function cc($str, $suffix = '', $negate = false) {return '[' . ($negate ? '^' : '') . $str . ']' . $suffix;}function ncg($str, $suffix = '') {return '(?:' . $str . ')' . $suffix;}
/* Preserved from RFC3986 */
$ALPHA = 'a-z';$DIGIT = '0-9';$HEXDIG = $DIGIT . 'a-f';
$sub_delims = '!\\$&\'\\(\\)\\*\\+,;=';$gen_delims = ':\\/\\?\\#\\[\\]@';$reserved = $gen_delims . $sub_delims;$unreserved = '-' . $ALPHA . $DIGIT . '\\._~';
$pct_encoded = '%' . cc($HEXDIG) . cc($HEXDIG);
$dec_octet = ncg(implode('|', array(cc($DIGIT),cc('1-9') . cc($DIGIT),'1' . cc($DIGIT) . cc($DIGIT),'2' . cc('0-4') . cc($DIGIT),'25' . cc('0-5'))));
$IPv4address = $dec_octet . ncg('\\.' . $dec_octet, '{3}');
$h16 = cc($HEXDIG, '{1,4}');$ls32 = ncg($h16 . ':' . $h16 . '|' . $IPv4address);
$IPv6address = ncg(implode('|', array(ncg($h16 . ':', '{6}') . $ls32,'::' . ncg($h16 . ':', '{5}') . $ls32,ncg($h16, '?') . '::' . ncg($h16 . ':', '{4}') . $ls32,ncg($h16 . ':' . $h16, '?') . '::' . ncg($h16 . ':', '{3}') . $ls32,ncg(ncg($h16 . ':', '{0,2}') . $h16, '?') . '::' . ncg($h16 . ':', '{2}') . $ls32,ncg(ncg($h16 . ':', '{0,3}') . $h16, '?') . '::' . $h16 . ':' . $ls32,ncg(ncg($h16 . ':', '{0,4}') . $h16, '?') . '::' . $ls32,ncg(ncg($h16 . ':', '{0,5}') . $h16, '?') . '::' . $h16,ncg(ncg($h16 . ':', '{0,6}') . $h16, '?') . '::',)));
$IPvFuture = 'v' . cc($HEXDIG, '+') . cc($unreserved . $sub_delims . ':', '+');
$IP_literal = '\\[' . ncg(implode('|', array($IPv6address, $IPvFuture))) . '\\]';
$port = cc($DIGIT, '*');
$scheme = cc($ALPHA) . ncg(cc('-' . $ALPHA . $DIGIT . '\\+\\.'), '*');
/* New or changed in RFC3987 */
$iprivate = '\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}';
$ucschar = '\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}' .'\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}' .'\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}' .'\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}' .'\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}' .'\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}';
$iunreserved = '-' . $ALPHA . $DIGIT . '\\._~' . $ucschar;
$ipchar = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':@'));
$ifragment = ncg($ipchar . '|' . cc('\\/\\?'), '*');
$iquery = ncg($ipchar . '|' . cc($iprivate . '\\/\\?'), '*');
$isegment_nz_nc = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '+');$isegment_nz = ncg($ipchar, '+');$isegment = ncg($ipchar, '*');
$ipath_empty = '(?!' . $ipchar . ')';$ipath_rootless = ncg($isegment_nz) . ncg('\\/' . $isegment, '*');$ipath_noscheme = ncg($isegment_nz_nc) . ncg('\\/' . $isegment, '*');$ipath_absolute = '\\/' . ncg($ipath_rootless, '?'); // Spec says isegment-nz *( "/" isegment )$ipath_abempty = ncg('\\/' . $isegment, '*');
$ipath = ncg(implode('|', array($ipath_abempty,$ipath_absolute,$ipath_noscheme,$ipath_rootless,$ipath_empty))) . ')';
$ireg_name = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '*');
$ihost = ncg(implode('|', array($IP_literal, $IPv4address, $ireg_name)));$iuserinfo = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':'), '*');$iauthority = ncg($iuserinfo . '@', '?') . $ihost . ncg(':' . $port, '?');
$irelative_part = ncg(implode('|', array('\\/\\/' . $iauthority . $ipath_abempty . '','' . $ipath_absolute . '','' . $ipath_noscheme . '','' . $ipath_empty . '')));
$irelative_ref = $irelative_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?');
$ihier_part = ncg(implode('|', array('\\/\\/' . $iauthority . $ipath_abempty . '','' . $ipath_absolute . '','' . $ipath_rootless . '','' . $ipath_empty . '')));
$absolute_IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?');
$IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?');
$IRI_reference = ncg($IRI . '|' . $irelative_ref);

2011年3月7日编辑:由于PHP处理引用字符串中的反斜杠的方式,默认情况下这些反斜杠是不可用的。您需要双转义反斜杠,除非反斜杠在正则表达式中具有特殊含义。您可以这样做:

$escape_backslash = '/(?<!\\)\\(?![\[\]\\\^\$\.\|\*\+\(\)QEnrtaefvdwsDWSbAZzB1-9GX]|x\{[0-9a-f]{1,4}\}|\c[A-Z]|)/';$absolute_IRI = preg_replace($escape_backslash, '\\\\', $absolute_IRI);$IRI = preg_replace($escape_backslash, '\\\\', $IRI);$IRI_reference = preg_replace($escape_backslash, '\\\\', $IRI_reference);

关于眼睑'的回答帖子,上面写着“这是基于我对URI规范的阅读。”:感谢Eyelidness,你的是我寻求的完美解决方案,因为它基于URI规范!出色的工作。:)

我不得不做两个修改。第一个是让正则表达式在PHP(v5.2.10)中正确匹配IP地址URL与preg_match()函数。

我不得不在管道周围的“IP地址”上方的行中添加一组括号:

)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#

不知道为什么。

我还将顶级域的最小长度从3个字母减少到2个字母以支持。co.uk和类似的。

最终代码:

/^(https?|ftp):\/\/(?#                                      protocol)(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?#         username)(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?#      password)@)?(?#                                                     auth requires @)((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?#             domain segments AND)[a-z][a-z0-9-]*[a-z0-9](?#                                 top level domain  OR)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#)(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?#             IP address))(:\d+)?(?#                                                port))(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*(?# path)(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)(?#      query string)?)?)?(?#                                                   path and query string optional)(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?(?#      fragment)$/i

这个修改后的版本没有根据URI规范进行检查,所以我不能保证它的合规性,它被修改为处理本地网络环境中的URL和两位数的TLD以及其他类型的Web URL,并在我使用的PHP设置中更好地工作。

作为php代码:

define('URL_FORMAT','/^(https?):\/\/'.                                         // protocol'(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'.         // username'(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'.      // password'@)?(?#'.                                                  // auth requires @')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'.                      // domain segments AND'[a-z][a-z0-9-]*[a-z0-9]'.                                 // top level domain  OR'|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'.'(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'.                 // IP address')(:\d+)?'.                                                // port')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path'(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'.      // query string'?)?)?'.                                                   // path and query string optional'(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'.      // fragment'$/i');

这是一个PHP测试程序,它使用正则表达式验证各种URL:

<?php
define('URL_FORMAT','/^(https?):\/\/'.                                         // protocol'(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'.         // username'(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'.      // password'@)?(?#'.                                                  // auth requires @')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'.                      // domain segments AND'[a-z][a-z0-9-]*[a-z0-9]'.                                 // top level domain  OR'|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'.'(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'.                 // IP address')(:\d+)?'.                                                // port')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path'(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'.      // query string'?)?)?'.                                                   // path and query string optional'(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'.      // fragment'$/i');
/*** Verify the syntax of the given URL.** @access public* @param $url The URL to verify.* @return boolean*/function is_valid_url($url) {if (str_starts_with(strtolower($url), 'http://localhost')) {return true;}return preg_match(URL_FORMAT, $url);}

/*** String starts with something** This function will return true only if input string starts with* niddle** @param string $string Input string* @param string $niddle Needle string* @return boolean*/function str_starts_with($string, $niddle) {return substr($string, 0, strlen($niddle)) == $niddle;}

/*** Test a URL for validity and count results.* @param url url* @param expected expected result (true or false)*/
$numtests = 0;$passed = 0;
function test_url($url, $expected) {global $numtests, $passed;$numtests++;$valid = is_valid_url($url);echo "URL Valid?: " . ($valid?"yes":"no") . " for URL: $url. Expected: ".($expected?"yes":"no").". ";if($valid == $expected) {echo "PASS\n"; $passed++;} else {echo "FAIL\n";}}
echo "URL Tests:\n\n";
test_url("http://localserver/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true);test_url("http://www.google.com", true);test_url("http://www.google.co.uk/projects/my%20folder/test.php", true);test_url("https://myserver.localdomain", true);test_url("http://192.168.1.120/projects/index.php", true);test_url("http://192.168.1.1/projects/index.php", true);test_url("http://projectpier-server.localdomain/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true);test_url("https://2.4.168.19/project-pier?c=test&a=b", true);test_url("https://localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true);test_url("http://user:password@localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true);
echo "\n$passed out of $numtests tests passed.\n\n";
?>

再次感谢眼睑的正则表达式!

我一直在写一篇深入的文章,讨论使用正则表达式进行URI验证。它基于RFC3986。

正则表达式URI验证

虽然这篇文章还没有完成,但我已经提出了一个PHP函数,它在验证HTTP和FTP URL方面做得很好。这是当前版本:

// function url_valid($url) { Rev:20110423_2000//// Return associative array of valid URI components, or FALSE if $url is not// RFC-3986 compliant. If the passed URL begins with: "www." or "ftp.", then// "http://" or "ftp://" is prepended and the corrected full-url is stored in// the return array with a key name "url". This value should be used by the caller.//// Return value: FALSE if $url is not valid, otherwise array of URI components:// e.g.// Given: "http://www.jmrware.com:80/articles?height=10&width=75#fragone"// Array(//    [scheme] => http//    [authority] => www.jmrware.com:80//    [userinfo] =>//    [host] => www.jmrware.com//    [IP_literal] =>//    [IPV6address] =>//    [ls32] =>//    [IPvFuture] =>//    [IPv4address] =>//    [regname] => www.jmrware.com//    [port] => 80//    [path_abempty] => /articles//    [query] => height=10&width=75//    [fragment] => fragone//    [url] => http://www.jmrware.com:80/articles?height=10&width=75#fragone// )function url_valid($url) {if (strpos($url, 'www.') === 0) $url = 'http://'. $url;if (strpos($url, 'ftp.') === 0) $url = 'ftp://'. $url;if (!preg_match('/# Valid absolute URI having a non-empty, valid DNS host.^(?P<scheme>[A-Za-z][A-Za-z0-9+\-.]*):\/\/(?P<authority>(?:(?P<userinfo>(?:[A-Za-z0-9\-._~!$&\'()*+,;=:]|%[0-9A-Fa-f]{2})*)@)?(?P<host>(?P<IP_literal>\[(?:(?P<IPV6address>(?:                                                (?:[0-9A-Fa-f]{1,4}:){6}|                                                ::(?:[0-9A-Fa-f]{1,4}:){5}| (?:                          [0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}| (?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}| (?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}| (?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::   [0-9A-Fa-f]{1,4}:| (?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::)(?P<ls32>[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}| (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|   (?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::   [0-9A-Fa-f]{1,4}|   (?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::)| (?P<IPvFuture>[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&\'()*+,;=:]+))\])| (?P<IPv4address>(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))| (?P<regname>(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})+))(?::(?P<port>[0-9]*))?)(?P<path_abempty>(?:\/(?:[A-Za-z0-9\-._~!$&\'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)(?:\?(?P<query>       (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))?(?:\#(?P<fragment>    (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))?$/mx', $url, $m)) return FALSE;switch ($m['scheme']) {case 'https':case 'http':if ($m['userinfo']) return FALSE; // HTTP scheme does not allow userinfo.break;case 'ftps':case 'ftp':break;default:return FALSE;   // Unrecognized URI scheme. Default to FALSE.}// Validate host name conforms to DNS "dot-separated-parts".if ($m['regname']) { // If host regname specified, check for DNS conformance.if (!preg_match('/# HTTP DNS host name.^                      # Anchor to beginning of string.(?!.{256})             # Overall host length is less than 256 chars.(?:                    # Group dot separated host part alternatives.[A-Za-z0-9]\.        # Either a single alphanum followed by dot|                      # or... part has more than one char (63 chars max).[A-Za-z0-9]          # Part first char is alphanum (no dash).[A-Za-z0-9\-]{0,61}  # Internal chars are alphanum plus dash.[A-Za-z0-9]          # Part last char is alphanum (no dash).\.                   # Each part followed by literal dot.)*                     # Zero or more parts before top level domain.(?:                    # Explicitly specify top level domains.com|edu|gov|int|mil|net|org|biz|info|name|pro|aero|coop|museum|asia|cat|jobs|mobi|tel|travel|[A-Za-z]{2})         # Country codes are exactly two alpha chars.\.?                  # Top level domain can end in a dot.$                      # Anchor to end of string./ix', $m['host'])) return FALSE;}$m['url'] = $url;for ($i = 0; isset($m[$i]); ++$i) unset($m[$i]);return $m; // return TRUE == array of useful named $matches plus the valid $url.}

此函数使用两个正则表达式;一个用于匹配有效泛型URI的子集(具有非空主机的绝对URI),另一个用于验证DNS“dot-分离部分”主机名。尽管此函数目前仅验证HTTP和FTP方案,但它的结构使其可以轻松扩展以处理其他方案。

我认为有些人由于隐含的修饰符而无法使用您的php代码。我按原样复制了您的代码并用作示例:

if(preg_match("/^{$IRI_reference}$/iu",'http://www.url.com')){echo 'true';}

注意“i”和“u”修饰符。没有“u”的php抛出异常说:

Warning: preg_match() [function.preg-match]: Compilation failed: character value in \x{...} sequence is too large at offset XX

我刚刚写了一篇博客文章,介绍了一个很好的解决方案,用于识别最常用格式的URL,例如:

  • www.google.com
  • http://www.google.com
  • mailto:somebody@google.com
  • somebody@google.com
  • www.url-with-querystring.com/?url=has-querystring

使用的正则表达式是:

/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/

Mathias Bynens有一篇关于许多正则表达式的最佳比较的好文章:寻找完美的URL验证正则表达式

最好的一个发布有点长,但它几乎匹配你可以扔给它的任何东西。

javascript版本

/^(?:(?:(?:https?|ftp):)?\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9\u00a1-\uffff][a-z0-9\u00a1-\uffff_-]{0,62})?[a-z0-9\u00a1-\uffff]\.)+(?:[a-z\u00a1-\uffff]{2,}\.?))(?::\d{2,5})?(?:[/?#]\S*)?$/i

PHP版本(使用%符号作为分隔符)

%^(?:(?:(?:https?|ftp):)?\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9\x{00a1}-\x{ffff}][a-z0-9\x{00a1}-\x{ffff}_-]{0,62})?[a-z0-9\x{00a1}-\x{ffff}]\.)+(?:[a-z\x{00a1}-\x{ffff}]{2,}\.?))(?::\d{2,5})?(?:[/?#]\S*)?$%iuS

这是一个相当古老的线程,现在的问题要求一个基于正则表达式的URL验证器。我在寻找完全相同的东西时遇到了这个线程。虽然很可能可以编写一个非常全面的正则表达式来验证URL。我最终决定用另一种方式来做事情-使用PHP的parse_url函数。

如果无法解析url,则返回布尔值false。否则,它将返回方案、主机和其他信息。这可能不足以进行全面的URL检查,但可以深入研究以进行进一步分析。如果目的是简单地捕获拼写错误、无效方案等。这是完全足够的!

function validateURL(textval) {var urlregex = new RegExp("^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&amp;%\$#\=~])*$");return urlregex.test(textval);}

匹配http://www.asdah.com/~joe|ftp://ftp.asdah.co.uk:2828/asdah%20asdah.gif|https://asdah.gov/asdh-ah.as

        function validateURL(textval) {var urlregex = new RegExp("^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&amp;%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&amp;%\$#\=~_\-]+))*$");return urlregex.test(textval);}

匹配http://site.com/dir/file.php?var=moo|ftp://user:pass@site.com: 21/file/dir

不匹配site.com|http://site.com/dir//

我写了个小版本你可以运行

它匹配以下URL(这对我来说已经足够了)

public static void main(args) {String url = "go to http://www.m.abut.ly/abc its awesome"url = url.replaceAll(/https?:\/\/w{0,3}\w*?\.(\w*?\.)?\w{2,3}\S*|www\.(\w*?\.)?\w*?\.\w{2,3}\S*|(\w*?\.)?\w*?\.\w{2,3}[\/\?]\S*/ , { it ->"woof${it}woof"})println url}
http://google.comhttp://google.com/help.phphttp://google.com/help.php?a=5
http://www.google.comhttp://www.google.com/help.phphttp://www.google.com?a=5
google.com?a=5google.com/help.phpgoogle.com/help.php?a=5
http://www.m.google.com/help.php?a=5 (and all its permutations)www.m.google.com/help.php?a=5 (and all its permutations)m.google.com/help.php?a=5 (and all its permutations)

对于任何不以httpwww开头的URL,重要的是它们必须包含/?

我敢打赌,这可以再调整一点,但它的工作非常好,因为它是如此的短和紧凑……因为你几乎可以把它分成3部分:

查找以http开头的任何内容:

https?:\/\/w{0,3}\w*?\.\w{2,3}\S*

查找以www开头的任何内容:

www\.\w*?\.\w{2,3}\S*

或者找到任何必须有文本,然后是点,然后至少有2个字母,然后是?/的东西:

\w*?\.\w{2,3}[\/\?]\S*

我试图制定我的url版本。我的要求是在字符串中捕获实例,其中可能的url可以cse.uom.ac.mu-注意它前面没有超文本传输协议也没有www

String regularExpression = "((((ht{2}ps?://)?)((w{3}\\.)?))?)[^.&&[a-zA-Z0-9]][a-zA-Z0-9.-]+[^.&&[a-zA-Z0-9]](\\.[a-zA-Z]{2,3})";
assertTrue("www.google.com".matches(regularExpression));assertTrue("www.google.co.uk".matches(regularExpression));assertTrue("http://www.google.com".matches(regularExpression));assertTrue("http://www.google.co.uk".matches(regularExpression));assertTrue("https://www.google.com".matches(regularExpression));assertTrue("https://www.google.co.uk".matches(regularExpression));assertTrue("google.com".matches(regularExpression));assertTrue("google.co.uk".matches(regularExpression));assertTrue("google.mu".matches(regularExpression));assertTrue("mes.intnet.mu".matches(regularExpression));assertTrue("cse.uom.ac.mu".matches(regularExpression));
//cannot contain 2 '.' after wwwassertFalse("www..dr.google".matches(regularExpression));
//cannot contain 2 '.' just before comassertFalse("www.dr.google..com".matches(regularExpression));
// to test case where url www must be followed with a '.'assertFalse("www:google.com".matches(regularExpression));
// to test case where url www must be followed with a '.'//assertFalse("http://wwwe.google.com".matches(regularExpression));
// to test case where www must be preceded with a '.'assertFalse("https://www@.google.com".matches(regularExpression));

对于Python,这是Django 1.5.1中使用的实际URL验证regex:

import reregex = re.compile(r'^(?:http|ftp)s?://'  # http:// or https://r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...r'localhost|'  # localhost...r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # ...or ipv4r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # ...or ipv6r'(?::\d+)?'  # optional portr'(?:/?|[/?]\S+)$', re.IGNORECASE)

这同时执行ipv4和ipv6地址以及端口和GET参数。

这里的代码,第44行找到。

简单明了的FILTER_VALIDATE_URL有什么不好?

 $url = "http://www.example.com";
if(!filter_var($url, FILTER_VALIDATE_URL)){echo "URL is not valid";}else{echo "URL is valid";}

我知道这不是确切的问题,但当我需要验证URL时,它为我做了这项工作,所以认为它可能对其他人有用,他们遇到这篇文章寻找同样的事情

我找不到我要找的正则表达式,所以我修改了一个正则表达式来满足我的要求,显然它现在工作得很好。我的要求是:

在这里我想出了什么,任何建议是赞赏:

@Testpublic void testWebsiteUrl(){String regularExpression = "((http|ftp|https):\\/\\/)?[\\w\\-_]+(\\.[\\w\\-_]+)+([\\w\\-\\.,@?^=%&amp;:/~\\+#]*[\\w\\-\\@?^=%&amp;/~\\+#])?";
assertTrue("www.google.com".matches(regularExpression));assertTrue("www.google.co.uk".matches(regularExpression));assertTrue("http://www.google.com".matches(regularExpression));assertTrue("http://www.google.co.uk".matches(regularExpression));assertTrue("https://www.google.com".matches(regularExpression));assertTrue("https://www.google.co.uk".matches(regularExpression));assertTrue("google.com".matches(regularExpression));assertTrue("google.co.uk".matches(regularExpression));assertTrue("google.mu".matches(regularExpression));assertTrue("mes.intnet.mu".matches(regularExpression));assertTrue("cse.uom.ac.mu".matches(regularExpression));
assertTrue("http://www.google.com/path".matches(regularExpression));assertTrue("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e".matches(regularExpression));assertTrue("http://www.google.com/?queryparam=123".matches(regularExpression));assertTrue("http://www.google.com/path?queryparam=123".matches(regularExpression));
assertFalse("www..dr.google".matches(regularExpression));
assertFalse("www:google.com".matches(regularExpression));
assertFalse("https://www@.google.com".matches(regularExpression));
assertFalse("https://www.google.com\"".matches(regularExpression));assertFalse("https://www.google.com'".matches(regularExpression));
assertFalse("http://www.google.com/path'".matches(regularExpression));assertFalse("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e'".matches(regularExpression));assertFalse("http://www.google.com/?queryparam=123'".matches(regularExpression));assertFalse("http://www.google.com/path?queryparam=12'3".matches(regularExpression));
}

这对我来说很好。(https?|ftp)://(www\d?|[a-zA-Z0-9]+)?\.[a-zA-Z0-9-]+(\:|\.)([a-zA-Z0-9.]+|(\d+)?)([/?:].*)?

对我来说最好的URL正则表达式是:

"(([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?"

这可能不是正则表达式的工作,而是您选择的语言中的现有工具的工作。您可能希望使用已经编写、测试和调试的现有代码。

在PHP中,使用parse_url函数。

Perl:#0模块

Ruby:#0模块

. NET:Uri类

正则表达式不是一根魔杖,你可以在每个涉及字符串的问题上挥舞。

以下RegEx将起作用:

"@((((ht)|(f))tp[s]?://)|(www\.))([a-z][-a-z0-9]+\.)?([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+[/]?[a-z0-9._\/~#&=;%+?-]*@si"

检查URL正则表达式将是:

^http(s{0,1})://[a-zA-Z0-9_/\\-\\.]+\\.([A-Za-z/]{2,5})[a-zA-Z0-9_/\\&\\?\\=\\-\\.\\~\\%]*

用这个它对我有用

function validUrl(Url) {var myRegExp  =/^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?$/i;
if (!RegExp.test(Url.value)) {$("#urlErrorLbl").removeClass('highlightNew');return false;}
$("#urlErrorLbl").addClass('highlightNew');return true;}

这将匹配所有URL

  • 有或没有超文本传输协议/https
  • 有或没有www

…包括子域和那些新的顶级域名扩展名,例如.博物馆,.学院,.基金会等等,最多可以有63个字符(不仅仅是.com、.、.信息等)

(([\w]+:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?

因为今天可用顶级域名扩展名的最大长度是13个字符,例如.国际,您可以将表达式中的数字63更改为13,以防止有人滥用它。

如javascript

var urlreg=/(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?/;
$('textarea').on('input',function(){var url = $(this).val();$(this).toggleClass('invalid', urlreg.test(url) == false)});
$('textarea').trigger('input');
textarea{color:green;}.invalid{color:red;}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script><textarea>http://www.google.com</textarea><textarea>http//www.google.com</textarea><textarea>googlecom</textarea><textarea>https://www.google.com</textarea>

维基百科文章:所有互联网顶级域名列表

我使用这个regex:

((https?:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?

支持两者:

http://stackoverflow.comhttps://stackoverflow.com

还有:

//stackoverflow.com

这里是最好的和最匹配的正则表达式为这种情况

^(?:http(?:s)?:\/\/)?(?:www\.)?(?:[\w-]*)\.\w{2,}$

为方便起见,这里有一个用于URL的单行正则表达式,它也将匹配localhost您更有可能拥有比.com或类似端口的端口。

(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}(\.[a-z]{2,6}|:[0-9]{3,4})\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)

要将URL与域匹配:

(^(\bhttp)(|s):\/{2})(?=[a-z0-9-_]{1,255})\.\1\.([a-z]{3,7}$)

它可以简化为:

(^(\bhttp)(|s):\/{2})(?=[a-z0-9-_.]{1,255})\.([a-z]{3,7})

后者不检查结束行的结束,以便以后可以使用它创建具有完整路径和查询字符串的完整URL。

这是来自Android源代码的现成Java版本。这是我找到的最好的。

public static final Matcher WEB  = Pattern.compile(new StringBuilder().append("((?:(http|https|Http|Https|rtsp|Rtsp):").append("\\/\\/(?:(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)").append("\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_").append("\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?").append("((?:(?:[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}\\.)+")   // named host.append("(?:")   // plus top level domain.append("(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])").append("|(?:biz|b[abdefghijmnorstvwyz])").append("|(?:cat|com|coop|c[acdfghiklmnoruvxyz])").append("|d[ejkmoz]").append("|(?:edu|e[cegrstu])").append("|f[ijkmor]").append("|(?:gov|g[abdefghilmnpqrstuwy])").append("|h[kmnrtu]").append("|(?:info|int|i[delmnoqrst])").append("|(?:jobs|j[emop])").append("|k[eghimnrwyz]").append("|l[abcikrstuvy]").append("|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])").append("|(?:name|net|n[acefgilopruz])").append("|(?:org|om)").append("|(?:pro|p[aefghklmnrstwy])").append("|qa").append("|r[eouw]").append("|s[abcdeghijklmnortuvyz]").append("|(?:tel|travel|t[cdfghjklmnoprtvwz])").append("|u[agkmsyz]").append("|v[aceginu]").append("|w[fs]").append("|y[etu]").append("|z[amw]))").append("|(?:(?:25[0-5]|2[0-4]") // or ip address.append("[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(?:25[0-5]|2[0-4][0-9]").append("|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1]").append("[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}").append("|[1-9][0-9]|[0-9])))").append("(?:\\:\\d{1,5})?)") // plus option port number.append("(\\/(?:(?:[a-zA-Z0-9\\;\\/\\?\\:\\@\\&\\=\\#\\~")  // plus option query params.append("\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?").append("(?:\\b|$)").toString()).matcher("");

这应该工作:

function validateUrl(value){return /^(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)$/gi.test(value);}
console.log(validateUrl('google.com')); // trueconsole.log(validateUrl('www.google.com')); // trueconsole.log(validateUrl('http://www.google.com')); // trueconsole.log(validateUrl('http:/www.google.com')); // falseconsole.log(validateUrl('www.google.com/test')); // true

您没有指定您使用的语言。如果是PHP,则有一个本机函数:

$url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/';
if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) {// Wrong}else {// Valid}

返回过滤后的数据,如果过滤失败则返回FALSE。

点击这里>>

希望有帮助。

我找到了以下URL的正则表达式,通过500多个URL成功测试

/\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b/gi

我知道它看起来很丑,但好的是它有效。:)

在regex101上使用581个随机URL进行解释和演示。

来源:寻找完美的URL验证正则表达式

我想我找到了一个更通用的正则表达式来验证URL,特别是网站

​(https?:\/\/)?(www\.)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)|(https?:\/\/)?(www\.)?(?!ww)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)

它不允许例如www.something或超文本传输协议://wwwhttp://www.something

点击这里:http://regexr.com/3e4a2

我创建了一个类似的正则表达式(PCRE),与RFC3987以及其他RFC文档提供的@eyelidless ness类似。@eyelidless ness和我的正则表达式之间的主要区别主要是易读性和URN支持。

下面的正则表达式都是一体的(而不是与PHP混合),因此它可以很容易地在不同的语言中使用(只要它们支持PCRE)

测试此正则表达式的最简单方法是使用regex101并使用适当的修饰符复制粘贴下面的代码和测试字符串(gmx)。

要在PHP中使用此正则表达式,请将下面的正则表达式插入以下代码:

$regex = <<<'EOD'// Put the regex hereEOD;


您可以通过执行以下操作来匹配没有方案的链接:
要匹配没有方案的链接(即john.doe@gmail.comwww.google.com/pathtofile.php?query),请替换此部分:

  (?:(?<scheme>(?<urn>urn)|(?&d_scheme)):)?

用这个:

  (?:(?<scheme>(?<urn>urn)|(?&d_scheme)):)?

但是,请注意,通过替换它,正则表达式不会变得100%可靠。


正则表达式(PCRE)带有gmx修饰符,用于下面的多行测试字符串

(?(DEFINE)# Definitions(?<ALPHA>[\p{L}])(?<DIGIT>[0-9])(?<HEX>[0-9a-fA-F])(?<NCCHAR>(?&UNRESERVED)|(?&PCT_ENCODED)|(?&SUB_DELIMS)|@)(?<PCHAR>(?&UNRESERVED)|(?&PCT_ENCODED)|(?&SUB_DELIMS)|:|@|\/)(?<UCHAR>(?&UNRESERVED)|(?&PCT_ENCODED)|(?&SUB_DELIMS)|:)(?<RCHAR>(?&UNRESERVED)|(?&PCT_ENCODED)|(?&SUB_DELIMS))(?<PCT_ENCODED>%(?&HEX){2})(?<UNRESERVED>((?&ALPHA)|(?&DIGIT)|[-._~]))(?<RESERVED>(?&GEN_DELIMS)|(?&SUB_DELIMS))(?<GEN_DELIMS>[:\/?#\[\]@])(?<SUB_DELIMS>[!$&'()*+,;=])# URI Parts(?<d_scheme>(?!urn)(?:(?&ALPHA)((?&ALPHA)|(?&DIGIT)|[+-.])*(?=:)))(?<d_hier_part_slashes>(\/{2})?)(?<d_authority>(?&d_userinfo)?)(?<d_userinfo>(?&UCHAR)*)(?<d_ipv6>(?![^:]*::[^:]*::[^:]*)((((?&HEX){0,4}):){1,7}((?&d_ipv4)|:|(?&HEX){1,4})))(?<d_ipv4>((?&octet)\.){3}(?&octet))(?<octet>(25[]0-5]|2[0-4](?&DIGIT)|1(?&DIGIT){2}|[1-9](?&DIGIT)|(?&DIGIT)))(?<d_reg_name>(?&RCHAR)*)(?<d_urn_name>(?&UCHAR)*)(?<d_port>(?&DIGIT)*)(?<d_path>(\/((?&PCHAR)*)*(?=\?|\#|$)))(?<d_query>(((?&PCHAR)|\/|\?)*)?)(?<d_fragment>(((?&PCHAR)|\/|\?)*)?))^(?<link>(?:(?<scheme>(?<urn>urn)|(?&d_scheme)):)(?(urn)(?:(?<namespace_identifier>[0-9a-zA-Z\-]+):(?<namespace_specific_string>(?&d_urn_name)+))|(?<hier_part>(?<slashes>(?&d_hier_part_slashes))(?<authority>(?:(?<userinfo>(?&d_authority))@)?(?<host>(?<ipv4>\[?(?&d_ipv4)\]?)|(?<ipv6>\[(?&d_ipv6)\])|(?<domain>(?&d_reg_name)))(?::(?<port>(?&d_port)))?)(?<path>(?&d_path))?)(?:\?(?<query>(?&d_query)))?(?:\#(?<fragment>(?&d_fragment)))?))$

测试弦

# Valid URIsftp://cnn.example.com&story=breaking_news@10.0.0.1/top_story.htmftp://ftp.is.co.za/rfc/rfc1808.txthttp://www.ietf.org/rfc/rfc2396.txtldap://[2001:db8::7]/c=GB?objectClass?onemailto:John.Doe@example.comnews:comp.infosystems.www.servers.unixtel:+1-816-555-1212telnet://192.0.2.16:80/urn:isbn:0451450523urn:oid:2.16.840urn:isan:0000-0000-9E59-0000-O-0000-0000-2urn:oasis:names:specification:docbook:dtd:xml:4.1.2http://localhost/test/somefile.php?query=someval&variable=value#fragmenthttp://[2001:db8:a0b:12f0::1]/testftp://username:password@domain.com/path/to/file/somefile.html?queryVariable=value#fragmenthttps://subdomain.domain.com/path/to/file.php?query=value#fragmenthttps://subdomain.example.com/path/to/file.php?query=value#fragmentmailto:john.smith(comment)@example.commailto:user@[2001:DB8::1]mailto:user@[255:192:168:1]mailto:M.Handley@cs.ucl.ac.ukhttp://localhost:4433/path/to/file?query#fragment# Note that the example below IS a valid as it does follow RFC standardslocalhost:4433/path/to/file
# These work with the optional scheme group although I'd suggest making the scheme mandatory as misinterpretations can occurjohn.doe@gmail.comwww.google.com/pathtofile.php?query[192a:123::192.168.1.1]:80/path/to/file.html?query#fragment

要匹配URL,有各种选项,这取决于您的要求。下面是一些

_(^|[\s.:;?\-\]<\(])(https?://[-\w;/?:@&=+$\|\_.!~*\|'()\[\]%#,☺]+[\w/#](\(\))?)(?=$|[\s',\|\(\).:;?\-\[\]>\)])_i
#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#iS

还有一个链接,它为您提供了10多种不同的URL验证变体。

这不是正则表达式,但完成了同样的事情(仅限Javascript):

function isAValidUrl(url) {try {new URL(url);return true;} catch(e) {return false;}}

希望对你有帮助……

^(http|https):\/\/+[\www\d]+\.[\w]+(\/[\w\d]+)?

这个怎么样:

^(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9]\.[^\s]{2,})$

这些是测试用例:

测试用例

你可以在这里:https://regex101.com/r/mS9gD7/41中尝试一下

据我所发现,这种表达对我有好处-

(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9]\.[^\s]{2,})

工作实例-

function RegExForUrlMatch(){var expression = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9]\.[^\s]{2,})/g;
var regex = new RegExp(expression);var t = document.getElementById("url").value;
if (t.match(regex)) {document.getElementById("demo").innerHTML = "Successful match";} else {document.getElementById("demo").innerHTML = "No match";}}
<input type="text" id="url" placeholder="url" onkeyup="RegExForUrlMatch()">
<p id="demo">Please enter a URL to test</p>

下面的表达式将适用于所有流行的域。它将接受以下网址:

此外,它还会使用url作为链接发送消息
例如please visit yourwebsite.com
在上面的例子中,它将yourwebsite.com作为超链接

if (new RegExp("([-a-z0-9]{1,63}\\.)*?[a-z0-9][-a-z0-9]{0,61}[a-z0-9]\\.(com|com/|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au|org/|gov/|cm/|net/|online/|live/|biz/|us/|uk/|co.us/|co.uk/|in/|co.in/|int/|info/|edu/|mil/|ca/|co/|co.au/)(/[-\\w@\\+\\.~#\\?*&/=% ]*)?$").test(strMessage) || (new RegExp("^[a-z ]+[\.]?[a-z ]+?[\.]+[a-z ]+?[\.]+[a-z ]+?[-\\w@\\+\\.~#\\?*&/=% ]*").test(strMessage) && new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(strMessage)) || (new RegExp("^[a-z ]+[\.]?[a-z ]+?[-\\w@\\+\\.~#\\?*&/=% ]*").test(strMessage) && new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(strMessage))) {if (new RegExp("^[a-z ]+[\.]?[a-z ]+?[\.]+[a-z ]+?[\.]+[a-z ]+?$").test(strMessage) && new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(strMessage)) {var url1 = /(^|&lt;|\s)([\w\.]+\.(?:com|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au))(\s|&gt;|$)/g;var html = $.trim(strMessage);if (html) {html = html.replace(url1, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="http://$2">$2</a>$3');}returnString = html;return returnString;} else {var url1 = /(^|&lt;|\s)(www\..+?\.(?:com|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au)[^,\s]*)(\s|&gt;|$)/g,url2 = /(^|&lt;|\s)(((https?|ftp):\/\/|mailto:).+?\.(?:com|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au)[^,\s]*)(\s|&gt;|$)/g,url3 = /(^|&lt;|\s)([\w\.]+\.(?:com|org|gov|cm|net|online|live|biz|us|uk|co.us|co.uk|in|co.in|int|info|edu|mil|ca|co|co.au)[^,\s]*)(\s|&gt;|$)/g;
var html = $.trim(strMessage);if (html) {html = html.replace(url1, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="http://$2">$2</a>$3').replace(url2, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="$2">$2</a>$5').replace(url3, '$1<a style="color:blue; text-decoration:underline;" target="_blank"  href="http://$2">$2</a>$3');}returnString = html;
return returnString;}}

经过严格的搜索,我终于解决了以下问题

^[a-zA-Z0-9]+\:\/\/[a-zA-Z0-9]+\.[-a-zA-Z0-9]+\.?[a-zA-Z0-9]+$|^[a-zA-Z0-9]+\.[-a-zA-Z0-9]+\.[a-zA-Z0-9]+$

这个东西适用于未来的URL。

我找到的最好的正则表达式是:/(^|\s)((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/gi

对于iOS Swift:(^|\\s)((https?:\\/\\/)?[\\w-]+(\\.[\\w-]+)+\\.?(:\\d+)?(\\/\\S*)?)

http://jsfiddle.net/9BYdp/1/

找到这里

有趣的是,上面的答案都不符合我的需要,所以我想我会提供我的解决方案。我需要能够做到以下几点:

  • 匹配http(s)://www.google.comhttp://google.comwww.google.comgoogle.com
  • 匹配GitHub Markdown样式的链接,如[Google](http://www.google.com)
  • 匹配所有可能的域扩展名,如. com、. io或. guru等。基本上长度在2-6个字符之间
  • 将所有内容拆分为适当的分组,以便我可以根据需要访问每个部分。

以下是解决方案:

/^(\[[A-z0-9 _]*\]\()?((?:(http|https):\/\/)?(?:[\w-]+\.)+[a-z]{2,6})(\))?$

这给了我上述所有要求。如果需要,您可以选择添加ftp和文件的功能:

/^(\[[A-z0-9 _]*\]\()?((?:(http|https|ftp|file):\/\/)?(?:[\w-]+\.)+[a-z]{2,6})(\))?$

我认为这是一个非常简单的方法。而且效果非常好。

var hasURL = (str) =>{var url_pattern = new RegExp("(www.|http://|https://|ftp://)\w*");if(!url_pattern.test(str)){document.getElementById("demo").innerHTML = 'No URL';}elsedocument.getElementById("demo").innerHTML = 'String has a URL';};
<p>Please enter a string and test it has any url or not</p><input type="text" id="url" placeholder="url" onkeyup="hasURL(document.getElementById('url').value)"><p id="demo"></p>

这是我制作的一个正则表达式,它从URL中提取不同的部分:

^((?:(?:http|ftp|ws)s?|sftp):\/\/?)?([^:/\s.#?]+\.[^:/\s#?]+|localhost)(:\d+)?((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?([^#]+)?(#[\w-]*)?$

((?:(?:http|ftp|ws)s?|sftp):\/\/?)?(第1组):提取协议
([^:/\s.#?]+\.[^:/\s#?]+|localhost)(第2组):提取主机名
(:\d+)?(第3组):提取端口号
((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?(第4组和第5组):提取路径部分
([^#]+)?(第6组):提取查询部分
(#[\w-]*)?(第7组):提取哈希部分

对于上面列出的正则表达式的每个部分,您可以删除结尾?以强制它(或添加一个以使其成为兼性的)。您还可以删除正则表达式开头的^和结尾的$,这样它就不需要匹配整个字符串。

regex101上看到它。

注意:这个正则表达式不是100%安全的,可能会接受一些不一定是有效URL的字符串,但它确实验证了一些标准。它的主要目标是提取URL的不同部分而不是验证它。

https?:\/{2}(?:[\/-\w.]|(?:%[\da-fA-F]{2}))+

您可以使用此模式来检测URL。

以下是概念验证

RegExr:URL检测器

^(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$

现场演示:https://regex101.com/r/HUNasA/2

我测试了各种表达式来满足我的要求。

作为用户,我可以使用以下字符串点击浏览器搜索栏:

有效网址

无效链接

改进

检测像这样的URL:

正则表达式:

/^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$/gm

如果你想应用更严格的规则,以下是我开发的:

isValidUrl(input) {var regex = /^(((H|h)(T|t)(T|t)(P|p)(S|s)?):\/\/)?[-a-zA-Z0-9@:%._\+~#=]{2,100}\.[a-zA-Z]{2,10}(\/([-a-zA-Z0-9@:%_\+.~#?&//=]*))?/return regex.test(input)}

这是一个涵盖所有可能情况的好规则:端口、参数等

/(https?:\/\/(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9])(:?\d*)\/?([a-z_\/0-9\-#.]*)\??([a-z_\/0-9\-#=&]*)/g

不管问的是什么广泛的问题,我为将来寻找简单东西的人发布这个……因为我认为验证URL没有完美的正则表达式可以满足所有需求,这取决于你的要求,即:在我的情况下,我只需要验证URL是否为domain.extension的形式,我想允许www或任何其他子域,如blog.domain.extension我不在乎超文本传输协议,因为在我的应用程序中,我有一个字段,上面写着“输入URL”,所以很明显输入的字符串是什么。

这里是regEx:

/^(www\.|[a-zA-Z0-9](.*[a-zA-Z0-9])?\.)?((?!www)[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9])\.[a-z]{2,5}(:[0-9]{1,5})?$/i

这个regExp中的第一个块是:

(www\.|[a-zA-Z0-9](.*[a-zA-Z0-9])?\.)?--->我们开始检查URL是以www.还是[a-zA-Z0-9](.*[a-zA-Z0-9])?开头,这意味着一个字母OrNumber+(anyCharacter(0或多次)+另一个letterOrNumber)后跟一个点

请注意,我们由翻译的(.*[a-zA-Z0-9])?\.)?(anyCharacter(0或多次)+另一个letterOrNumber)是可选的(可以是或不是),这就是为什么我们将它分组在括号之间,然后加上问号?

到目前为止,我们讨论的整个块也放在括号之间,后面跟着?这意味着www或任何其他单词(代表子域)都是可选的。

第二部分是:((?!www)[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9])\.--->代表“域”部分,它可以是任何单词(除了www),以字母或数字开头+任何其他字母(包括破折号“-”)重复一次或多次,并以任何字母或数字结尾,后跟一个点。

最后一部分是[a-z]{2,}--->代表“扩展”,它可以是任何字母重复2次或更多次,所以它可以是com、net、org、art基本上任何扩展

/^(http|HTTP)+(s|S)?:\/\/[\w.-]+(?:\.[\w\.-]+)+[\w\-\._\$\(\)/]+$/g

使用测试检查演示:

https://regexr.com/5cedu

一个简单的URL检查是

^(ftp|http|https):\/\/[^ "]+$

以下Regex对我有用:

(http(s)?:\/\/.)?(ftp(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{0,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)

匹配:

https://google.com t.me https://t.me ftp://google.com http://sm.tj http://bro.tj t.me/rshss https:google.com www.cool.com.au http://www.cool.com.au http://www.cool.com.au/ersdfs http://www.cool.com.au/ersdfs?dfd=dfgd@s=1 http://www.cool.com:81/index.html

最好的正则表达式是这里最好的答案的组合!哈哈哈!我刚刚测试了它们,并把最好的放在一起!我稍微改变了一下,只有一个捕获组!我在这个页面的源代码中找到了637个URL!只有几个误报!

((?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)|(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+(?::[0-9]+)?|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)|(?:(?:(?:(?:[A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)(?:(?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?))|(?:(?:(?:[\\w]+:)?//)?(?:(?:[\\d\\w]|%[a-fA-f\\d]{2,2})+(?::(?:[\\d\\w]|%[a-fA-f\\d]{2,2})+)?@)?(?:[\\d\\w][-\\d\\w]{0,253}[\\d\\w]\\.)+[\\w]{2,4}(?::[\\d]+)?(?:/(?:[-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})*)*(?:\\?(?:&?(?:[-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})=?)*)?(?:#(?:[-+_~.\\d\\w]|%[a-fA-f\\d]{2,2})*)?)|(?:https?:\/\/(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9])(?::?\d*)\/?(?:[a-z_\/0-9\-#.]*)\??(?:[a-z_\/0-9\-#=&]*)|(?:(?:(?:https?:)?(?:\/?\/))(?:(?:[\d\w]|%[a-fA-f\d]{2,2})+(?::(?:[\d\w]|%[a-fA-f\d]{2,2})+)?@)?(?:[\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(?::[\d]+)?(?:/(?:[-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(?:\?(?:&?(?:[-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(?:#(?:[-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?)|(?:(?:https?|ftp)://(?:www\d?|[a-zA-Z0-9]+)?\.[a-zA-Z0-9-]+(?:\:|\.)(?:[a-zA-Z0-9.]+|(?:\d+)?)(?:[/?:].*)?)|(?:\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b))

Javascript现在有一个名为new URL()的URL构造函数。它允许您完全跳过REGEX。

/**** The URL() constructor returns a newly created URL object representing* the URL defined by the parameters.** https://developer.mozilla.org/en-US/docs/Web/API/URL/URL**/let requestUrl = new URL('https://username:password@developer.mozilla.org:8080/en-US/docs/search.html?par1=abc&par2=123&par3=true#Recent');
let urlParts = {origin: requestUrl.origin,href: requestUrl.href,protocol: requestUrl.protocol,username: requestUrl.username,password: requestUrl.password,host: requestUrl.host,hostname: requestUrl.hostname,port: requestUrl.port,pathname: requestUrl.pathname,search: requestUrl.search,searchParams: {par1: String(requestUrl.searchParams.get('par1')),par2: Number(requestUrl.searchParams.get('par2')),par3: Boolean(requestUrl.searchParams.get('par3')),},hash: requestUrl.hash};
console.log(urlParts);

感谢@eyelidless ness提供了非常全面(尽管很长)的基于RFC的正则表达式。

然而,对于我们这些使用EICMAScript/JavaScript/Apps脚本的人来说,它不起作用。这是他的答案的一个精确副本,可以使用这些(以及一个要运行的片段-例如-整洁的新功能!):

regEx_valid_URL = /^[a-z](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,1}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+\.[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0}-\uD7FF}\uF900-\uFDCF}\uFDF0}-\uFFEF}\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0}-\uD7FF}\uF900-\uFDCF}\uFDF0}-\uFFEF}\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0}-\uD7FF}\uF900-\uFDCF}\uFDF0}-\uFFEF}\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0}-\uD7FF}\uF900-\uFDCF}\uFDF0}-\uFFEF}\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0}-\uD7FF}\uF900-\uFDCF}\uFDF0}-\uFFEF}\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0}-\uD7FF}\uF900-\uFDCF}\uFDF0}-\uFFEF}\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0}-\uD7FF}\uF900-\uFDCF}\uFDF0}-\uFFEF}\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0}-\uD7FF}\uF900-\uFDCF}\uFDF0}-\uFFEF}\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=:@])|[\uE000-\uF8FF}\uF0000-\uFFFFD\u100000-\u10FFFD\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\uA0}-\uD7FF}\uF900-\uFDCF}\uFDF0}-\uFFEF}\u10000-\u1FFFD\u20000-\u2FFFD\u30000-\u3FFFD\u40000-\u4FFFD\u50000-\u5FFFD\u60000-\u6FFFD\u70000-\u7FFFD\u80000-\u8FFFD\u90000-\u9FFFD\uA0000-\uAFFFD\uB0000-\uBFFFD\uC0000-\uCFFFD\uD0000-\uDFFFD\uE1000-\uEFFFD!\$&'\(\)\*\+,;=:@])|[\/\?])*)?$/i;
checkedURL = RegExp(regEx_valid_URL).exec('gopher://example.somewhere.university/');
if (checkedURL != null) {console.log('The URL ' + checkedURL + ' is valid');}

我用这个:/((https?:\/\/|ftp:\/\/|www\.)\S+\.[^()\n ]+((?:\([^)]*\))|[^.,;:?!"'\n\)\]<* ])+)/

它很短,但它处理边缘情况,如某些以括号结尾的维基百科链接(https://en.wikipedia.org/wiki/Sally_(名称)),这里投票最多的答案似乎没有涵盖。

[(http(s)?):\/\/(www\.)?a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)