Go Regular Expressions by Example

Published: 2022-04-25 13:38:01 | Source: 億速云 | Views: 150 | Author: iii | Category: Development Technology

This article walks through regular expressions in Go using worked examples. The material is detailed but easy to follow, and each example is short enough to reproduce quickly.

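Preface

In computing, we often need to find a particular pattern of characters, or a subset of characters, inside another string. This is done with a special syntax that describes the set of characters to search for in a given string.

If the pattern matches, that is, the described subset is found in the target string, the search is said to succeed; otherwise it fails.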

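What is a regular expression

A regular expression (or RegEx) is a special sequence of characters that defines a search pattern for matching specific text. Go ships with a built-in regular expression package, regexp, which covers common operations such as filtering, modifying, replacing, validating, and extracting text.

Regular expressions can be used for text searching and for more advanced text manipulation. They are built into tools such as grep and sed, text editors such as vi and emacs, and programming languages such as Go, Java, and Python. Go's regexp package accepts RE2 syntax, which is broadly the same general syntax used by these popular languages. RE2 syntax is roughly a subset of PCRE, with various caveats; for example, backreferences and lookaround assertions are not supported.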
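As a rough illustration of those caveats, the sketch below asks regexp.Compile whether a few patterns are accepted. The helper name isSupported is made up for this example; the backreference and lookahead patterns stand in for typical PCRE features that the RE2-based engine rejects.

```go
package main

import (
	"fmt"
	"regexp"
)

// isSupported is a hypothetical helper: it reports whether Go's regexp
// package accepts the given pattern.
func isSupported(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`(\w+)\.(\w+)`, // plain capture groups
		`(a)\1`,        // backreference (PCRE feature)
		`foo(?=bar)`,   // lookahead (PCRE feature)
	}
	for _, p := range patterns {
		fmt.Printf("%-14s supported: %v\n", p, isSupported(p))
	}
}
```

Only the first pattern compiles; the other two return an error from regexp.Compile.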

MatchString function

The MatchString() function reports whether the string passed as an argument contains any match of the regular expression pattern.

package main

import (
	"fmt"
	"log"
	"regexp"
)

func main() {
	words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
	for _, word := range words {
		// The package-level MatchString compiles the pattern on every call.
		found, err := regexp.MatchString(".even", word)
		if err != nil {
			log.Fatal(err)
		}
		if found {
			fmt.Printf("%s matches\n", word)
		} else {
			fmt.Printf("%s does not match\n", word)
		}
	}
}

Running the code prints:

Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches
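Note that "eleven" matches because MatchString looks for the pattern anywhere in the string. A minimal sketch of the difference, assuming we also want a whole-string match: the helper matchBoth and the anchored pattern `^.even$` are additions for illustration, not part of the original example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchBoth is a hypothetical helper that runs an unanchored and an
// anchored version of the same pattern against word.
func matchBoth(word string) (unanchored, anchored bool, err error) {
	unanchored, err = regexp.MatchString(`.even`, word)
	if err != nil {
		return false, false, err
	}
	// ^ and $ force the pattern to cover the whole string.
	anchored, err = regexp.MatchString(`^.even$`, word)
	if err != nil {
		return false, false, err
	}
	return unanchored, anchored, nil
}

func main() {
	for _, word := range []string{"Seven", "eleven"} {
		u, a, err := matchBoth(word)
		if err != nil {
			fmt.Println("bad pattern:", err)
			return
		}
		fmt.Printf("%s: unanchored=%v anchored=%v\n", word, u, a)
	}
}
```

With the anchors in place, "Seven" still matches but "eleven" does not, since it is one character too long.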

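The editor also shows a hint at this point (the screenshot from the original page is not reproduced here). The tooling, not the compiler itself, warns that calling regexp.MatchString directly performs poorly, because the pattern is compiled again on every call; it suggests using the regexp.Compile function instead.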

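Compile function

The Compile function parses a regular expression and, if successful, returns a Regexp object that can be used to match against text. Compiling the expression once and reusing it is faster than recompiling it for every match.

MustCompile is a convenience function that compiles a regular expression and panics if the expression cannot be parsed.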

package main

import (
	"fmt"
	"log"
	"regexp"
)

func main() {
	words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
	// Compile once, outside the loop, and reuse the result.
	re, err := regexp.Compile(".even")
	if err != nil {
		log.Fatal(err)
	}
	for _, word := range words {
		found := re.MatchString(word)
		if found {
			fmt.Printf("%s matches\n", word)
		} else {
			fmt.Printf("%s does not match\n", word)
		}
	}
}

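In this example we use a compiled regular expression.

re, err := regexp.Compile(".even")

Compile parses the regular expression. We then call the MatchString method on the returned Regexp object:

found := re.MatchString(word)

Running the program produces the same output as before: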

Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches
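Compile earns its error return when the pattern is not a fixed literal, for instance when it comes from user input. The sketch below assumes a made-up helper, compileOrReport, that reports the failure instead of terminating the program; the invalid pattern is chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileOrReport is a hypothetical helper that turns a compile failure
// into a boolean plus a printed message instead of exiting.
func compileOrReport(pattern string) (*regexp.Regexp, bool) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		fmt.Printf("cannot use %q: %v\n", pattern, err)
		return nil, false
	}
	return re, true
}

func main() {
	if re, ok := compileOrReport(`.even`); ok {
		fmt.Println(re.MatchString("Seven"))
	}
	// The unbalanced parenthesis makes this pattern invalid.
	compileOrReport(`.even(`)
}
```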

MustCompile function

package main

import (
	"fmt"
	"regexp"
)

func main() {
	words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
	// MustCompile panics on an invalid pattern, so no error check is needed.
	re := regexp.MustCompile(".even")
	for _, word := range words {
		found := re.MatchString(word)
		if found {
			fmt.Printf("%s matches\n", word)
		} else {
			fmt.Printf("%s does not match\n", word)
		}
	}
}
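A common way to use MustCompile, sketched here under the assumption that the pattern is a fixed literal, is to compile it once in a package-level variable and share it between calls. The names evenRe and hasEven are invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// evenRe is compiled once, when the package is initialized. MustCompile
// fits here because the pattern is a constant known to be valid.
var evenRe = regexp.MustCompile(`.even`)

// hasEven is a hypothetical helper that reuses the shared compiled pattern.
func hasEven(word string) bool {
	return evenRe.MatchString(word)
}

func main() {
	fmt.Println(hasEven("Seven"), hasEven("Amen"))
}
```

A Regexp value is safe for concurrent use by multiple goroutines, so sharing one in this way does not need extra locking for matching.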

FindAllString function

The FindAllString function returns a slice of all successive matches of the regular expression.

package main

import (
	"fmt"
	"os"
	"regexp"
)

func main() {
	var content = `Foxes are omnivorous mammals belonging to several genera
of the family Canidae. Foxes have a flattened skull, upright triangular ears,
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every
continent except Antarctica. By far the most common and widespread species of
fox is the red fox.`
	re := regexp.MustCompile("(?i)fox(es)?")
	found := re.FindAllString(content, -1)
	fmt.Printf("%q\n", found)
	// FindAllString returns nil when there is no match.
	if found == nil {
		fmt.Printf("no match found\n")
		os.Exit(1)
	}
	for _, word := range found {
		fmt.Printf("%s\n", word)
	}
}

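In this example we find all occurrences of the word fox, including its plural form.

re := regexp.MustCompile("(?i)fox(es)?")

The (?i) flag makes the regular expression case-insensitive. The group (es)? means that the characters "es" may appear zero times or once.

found := re.FindAllString(content, -1)

We use FindAllString to find all occurrences of the regular expression. The second argument is the maximum number of matches to return; -1 means all of them.

Output: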

["Foxes" "Foxes" "Foxes" "fox" "fox"]
Foxes
Foxes
Foxes
fox
fox
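The second argument can also cap the number of results. A small sketch with a shorter input string (the helper firstFoxes and the sample sentences are assumptions for illustration):

```go
package main

import (
	"fmt"
	"regexp"
)

var foxRe = regexp.MustCompile(`(?i)fox(es)?`)

// firstFoxes is a hypothetical helper returning at most n matches
// (or all matches when n is negative).
func firstFoxes(text string, n int) []string {
	return foxRe.FindAllString(text, n)
}

func main() {
	text := "Foxes live here. A fox is a fox."
	fmt.Printf("%q\n", firstFoxes(text, 2))                // stops after two matches
	fmt.Printf("%q\n", firstFoxes(text, -1))               // every match
	fmt.Println(firstFoxes("no canids here", -1) == nil)   // nil slice when nothing matches
}
```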

FindAllStringIndex function

package main

import (
	"fmt"
	"regexp"
)

func main() {
	var content = `Foxes are omnivorous mammals belonging to several genera
of the family Canidae. Foxes have a flattened skull, upright triangular ears,
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every
continent except Antarctica. By far the most common and widespread species of
fox is the red fox.`
	re := regexp.MustCompile("(?i)fox(es)?")
	// Each element is a two-item slice: the start and end offset of a match.
	idx := re.FindAllStringIndex(content, -1)
	for _, j := range idx {
		match := content[j[0]:j[1]]
		fmt.Printf("%s at %d:%d\n", match, j[0], j[1])
	}
}

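In this example we find every occurrence of the word fox in the text, together with its byte offsets. The offsets shown come from the original run; they depend on the exact whitespace inside the raw string literal, including any trailing spaces at the ends of lines.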

Foxes at 0:5
Foxes at 81:86
Foxes at 196:201
fox at 296:299
fox at 311:314

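Split function

The Split function cuts a string into substrings separated by matches of the regular expression. It returns a slice of the substrings found between those matches.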

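package main

import (
	"fmt"
	"log"
	"regexp"
	"strconv"
)

func main() {
	var data = `22, 1, 3, 4, 5, 17, 4, 3, 21, 4, 5, 1, 48, 9, 42`
	sum := 0
	// A raw string literal is required here: "\s" is not a valid escape
	// in an interpreted ("...") Go string and would not compile.
	re := regexp.MustCompile(`,\s*`)
	vals := re.Split(data, -1)
	for _, val := range vals {
		n, err := strconv.Atoi(val)
		// Check the error before using the converted value.
		if err != nil {
			log.Fatal(err)
		}
		sum += n
	}
	fmt.Println(sum)
}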

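In this example we have a comma-separated list of values. We cut the values out of the string and compute their sum.

re := regexp.MustCompile(`,\s*`)

The regular expression consists of a comma followed by any amount of adjacent whitespace. It is written as a raw string literal (backticks) so that the backslash reaches the regexp package unchanged.

vals := re.Split(data, -1)

We get a slice of the values.

for _, val := range vals {
	n, err := strconv.Atoi(val)
	if err != nil {
		log.Fatal(err)
	}
	sum += n
}

We iterate over the slice and compute the sum. The slice contains strings, so we use the strconv.Atoi function to convert each string to an integer, checking the error before adding the value.

Running the code prints: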

189
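The second argument of Split works as a limit on the number of pieces. A brief sketch, assuming a made-up helper splitFields and a deliberately irregular input string:

```go
package main

import (
	"fmt"
	"regexp"
)

var sepRe = regexp.MustCompile(`,\s*`)

// splitFields is a hypothetical helper exposing Split's second argument.
func splitFields(data string, n int) []string {
	return sepRe.Split(data, n)
}

func main() {
	data := "22, 1,3,   4"
	fmt.Printf("%q\n", splitFields(data, -1)) // negative n: all fields
	fmt.Printf("%q\n", splitFields(data, 2))  // n > 0: at most n pieces, the last one is the unsplit rest
	fmt.Println(splitFields(data, 0) == nil)  // n == 0: nil
}
```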

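Capture groups in Go regular expressions

Parentheses () create capture groups. A group lets us apply a quantifier to the whole group, or restrict alternation to one part of the regular expression. To retrieve the capture groups (Go calls them subexpressions), we use the FindStringSubmatch function.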

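package main

import (
	"fmt"
	"regexp"
)

func main() {
	websites := [...]string{"webcode.me", "zetcode.com", "freebsd.org", "netbsd.org"}
	// A raw string literal keeps the backslashes intact; "\w" and "\."
	// are not valid escapes in an interpreted ("...") Go string.
	re := regexp.MustCompile(`(\w+)\.(\w+)`)
	for _, website := range websites {
		// parts[0] is the whole match; parts[1] and parts[2] are the groups.
		parts := re.FindStringSubmatch(website)
		for i := range parts {
			fmt.Println(parts[i])
		}
		fmt.Println("---------------------")
	}
}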

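In this example we use groups to split a domain name into two parts.

re := regexp.MustCompile(`(\w+)\.(\w+)`)

We define two groups with parentheses.

parts := re.FindStringSubmatch(website)

FindStringSubmatch returns a slice of strings holding the whole match followed by the text captured by each group.

Running the code prints: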

$ go run capturegroups.go
webcode.me
webcode
me
---------------------
zetcode.com
zetcode
com
---------------------
freebsd.org
freebsd
org
---------------------
netbsd.org
netbsd
org
---------------------
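Groups can also be given names with the (?P<name>...) syntax, which avoids relying on positions. The sketch below is one possible way to collect named groups into a map; the group names "name" and "tld" and the helper parseDomain are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// The group names "name" and "tld" are chosen for this example.
var domainRe = regexp.MustCompile(`(?P<name>\w+)\.(?P<tld>\w+)`)

// parseDomain is a hypothetical helper that maps each named group to the
// text it captured. It returns nil when the input does not match.
func parseDomain(site string) map[string]string {
	parts := domainRe.FindStringSubmatch(site)
	if parts == nil {
		return nil
	}
	result := make(map[string]string)
	for i, name := range domainRe.SubexpNames() {
		if name != "" { // index 0 is the whole match and has no name
			result[name] = parts[i]
		}
	}
	return result
}

func main() {
	fmt.Println(parseDomain("zetcode.com"))
}
```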

Replacing strings with regular expressions

Strings can be replaced with ReplaceAllString. The method returns the modified string.

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"regexp"
	"strings"
)

func main() {
	resp, err := http.Get("http://webcode.me")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// ioutil.ReadAll still works, but since Go 1.16 it is deprecated
	// in favor of io.ReadAll.
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	content := string(body)
	// Match anything that looks like a tag: "<", any non-">" characters, ">".
	re := regexp.MustCompile("<[^>]*>")
	replaced := re.ReplaceAllString(content, "")
	fmt.Println(strings.TrimSpace(replaced))
}

The example reads the HTML of a web page and strips its HTML tags with a regular expression.

resp, err := http.Get("http://webcode.me")

We issue a GET request with the Get function from the http package.

body, err := ioutil.ReadAll(resp.Body)

We read the body of the response object.

re := regexp.MustCompile("<[^>]*>")

This pattern defines a regular expression that matches HTML tags.

replaced := re.ReplaceAllString(content, "")

We remove all tags with the ReplaceAllString method.
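To try the same replacement without a network request, the sketch below applies it to an in-memory string, and also shows that the replacement text may refer to capture groups with ${1}, ${2}. The helpers stripTags and swapParts and the sample inputs are assumptions for illustration; a pattern like this is only a rough tag stripper, not an HTML parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	tagRe    = regexp.MustCompile(`<[^>]*>`)
	domainRe = regexp.MustCompile(`(\w+)\.(\w+)`)
)

// stripTags is a hypothetical helper that removes anything shaped like a tag.
func stripTags(html string) string {
	return strings.TrimSpace(tagRe.ReplaceAllString(html, ""))
}

// swapParts is a hypothetical helper; ${1} and ${2} in the replacement
// refer to the first and second capture groups.
func swapParts(site string) string {
	return domainRe.ReplaceAllString(site, "${2}.${1}")
}

func main() {
	fmt.Println(stripTags("<p>Hello, <b>Go</b>!</p>"))
	fmt.Println(swapParts("webcode.me"))
}
```

The braces in ${1} matter when the reference is followed by letters or digits: $1x would be read as a group named "1x".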

ReplaceAllStringFunc function

ReplaceAllStringFunc returns a copy of a string in which every match of the regular expression has been replaced by the return value of the given function.

package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	content := "an old eagle"
	// [^aeiou] matches any single character that is not a lowercase vowel.
	re := regexp.MustCompile(`[^aeiou]`)
	fmt.Println(re.ReplaceAllStringFunc(content, strings.ToUpper))
}

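In this example, strings.ToUpper is applied to every character that is not a lowercase vowel. Spaces also match the pattern, but uppercasing leaves them unchanged.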

$ go run replaceallfunc.go
aN oLD eaGLe
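The function passed in does not have to come from the strings package; any func(string) string works. As a sketch, the made-up helper doubleNumbers below doubles every run of digits in a sample sentence.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var numRe = regexp.MustCompile(`\d+`)

// doubleNumbers is a hypothetical helper that doubles every run of digits.
func doubleNumbers(text string) string {
	return numRe.ReplaceAllStringFunc(text, func(m string) string {
		n, err := strconv.Atoi(m)
		if err != nil {
			return m // leave the match unchanged (for example on overflow)
		}
		return strconv.Itoa(n * 2)
	})
}

func main() {
	fmt.Println(doubleNumbers("3 apples and 12 pears"))
}
```

For the sample sentence this prints "6 apples and 24 pears".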

