您好,登錄后才能下訂單哦!
這篇文章主要講解了“golang基本數(shù)據(jù)結(jié)構(gòu)與算法之如何使用哈希表”,文中的講解內(nèi)容簡單清晰,易于學(xué)習(xí)與理解,下面請大家跟著小編的思路慢慢深入,一起來研究和學(xué)習(xí)“golang基本數(shù)據(jù)結(jié)構(gòu)與算法之如何使用哈希表”吧!
哈希表存儲的是由鍵(key)和值(value)組成的數(shù)據(jù)。 在哈希表中,我們可以利用哈希函數(shù)快速訪問到數(shù)組中的目標(biāo)數(shù)據(jù)。 如果發(fā)生哈希沖突(哈希函數(shù)對不同的鍵, 返回了相同的哈希值), 就使用鏈表進(jìn)行存儲(因此, 哈希表一般至少是數(shù)組+鏈表的二維結(jié)構(gòu))。 如果數(shù)組的空間太小, 使用哈希表的時候就容易發(fā)生沖突, 線性查找的使用頻率也會更高; 反過來,如果數(shù)組的空間太大, 就會出現(xiàn)很多空箱子,造成內(nèi)存的浪費(fèi)。 摘自 <<我的第一本算法書>>(【日】石田保輝;宮崎修一)
實現(xiàn)一個哈希表, 提供對鍵值對數(shù)據(jù)的增刪改查, 且性能不能太差
物理結(jié)構(gòu)
采用分區(qū) + 數(shù)組 + 鏈表的三維結(jié)構(gòu)
分區(qū)是一個雙向鏈表, 由小到大呈2次冪增長
當(dāng)前分區(qū)總是指向最新也是最大的那個分區(qū)
查找某個鍵時, 需要遍歷所有分區(qū)
哈希函數(shù)
合適的哈希函數(shù)對性能影響非常巨大
對哈希函數(shù)的要求一般是足夠快, 足夠"散", 對不同鍵值最好能隨機(jī)均勻分布
本示例采用hash/crc64, 多項式為ECMA
如果哈希函數(shù)的計算比較耗時, 則應(yīng)當(dāng)盡可能重復(fù)利用計算結(jié)果
擴(kuò)容與縮容
填充因子為3/4, 即當(dāng)前分區(qū)的元素數(shù)>容量的3/4時, 創(chuàng)建2倍大的新分區(qū)
擴(kuò)容不做任何拷貝和rehash.
擴(kuò)容后, 非當(dāng)前分區(qū)變成只讀和只刪.
擴(kuò)容
縮容: 當(dāng)某個分區(qū)未持有任何元素, 且不為當(dāng)前分區(qū)時, 刪除此分區(qū)
IHasher: 哈希支持接口, 定義哈希計算函數(shù), 和鍵值相等比較函數(shù)
IMap: 哈希表接口
IMapIterator: 哈希表迭代器接口
tCrc64Hasher:
基于hash/crc64, 實現(xiàn)IHasher接口.
支持多種鍵類型
對于整型的鍵, 返回自身;
其他類型的鍵, 轉(zhuǎn)為%v字符串再計算crc64的哈希值.
tHashMap: 哈希表的實現(xiàn), 使用分區(qū)(雙鏈)+數(shù)組+鏈表(單鏈)三維結(jié)構(gòu)
tPartition: 哈希表分區(qū). 其大小呈2次冪
tBucket: 哈希桶, 每個桶內(nèi)的任何元素, 哈希值是一致的
tHashMapIterator: 哈希表迭代器的實現(xiàn)
hashmap_test.go, 除基本功能測試, 還測試了百萬鍵值插入+遍歷的性能
package data_structure import ( "fmt" "learning/gooop/data_structure/hashmap" "strconv" "testing" "time" ) func Test_HashMap(t *testing.T) { fnAssertTrue := func(b bool, msg string) { if !b { t.Fatal(msg) } } fnEquals := func(a interface{}, b interface{}) bool { i1,b1 := a.(int) if b1 { i2,b2 := b.(int) if b2 { return i1 == i2 } } s1,b1 := a.(string) if b1 { s2,b2 := b.(string) if b2 { return s1 == s2 } } return a == b } hasher := hashmap.NewCrc64Hashful(fnEquals) hm := hashmap.NewHashMap(hasher, 2) hm.Put(1, "10") t.Log(hm) fnAssertTrue(hm.Size() == 1, "expecting size == 1") fnAssertTrue(hm.IsNotEmpty(), "expecting not empty") ok,v := hm.Get(1) fnAssertTrue(ok, "expecting ok") fnAssertTrue(v == "10", "expecting 10") hm.Put(2, "20") hm.Put(3, "30") hm.Put(4, "40") hm.Put(5, "50") t.Log(hm) fnAssertTrue(hm.Size() == 5, "expecting size == 5") ok,v = hm.Get(2) fnAssertTrue(ok == true && v == "20", "expecting true and 20") hm.Clear() t.Log(hm) fnAssertTrue(hm.Size() == 0, "expecting size == 0") fnAssertTrue(hm.IsEmpty(), "expecting empty") iter := hm.Iterator() fnAssertTrue(!iter.More(), "expecting no more") hm.Put(1, "10") hm.Put(2, "20") hm.Put(3, "30") t.Log(hm) fnAssertTrue(hm.Has(1) && hm.Has(2) && hm.Has(3) && !hm.Has(4), "expecting has 1,2,3") hm.Put(4, "40") hm.Put(5, "50") hm.Put(6, "60") t.Log(hm) iter = hm.Iterator() fnAssertTrue(iter.More(), "expecting more") e,k,v := iter.Next() t.Logf("%v>%s", k, v) fnAssertTrue(e == nil, "e == nil") e,k,v = iter.Next() t.Logf("%v>%s", k, v) fnAssertTrue(e == nil, "e == nil") e,k,v = iter.Next() t.Logf("%v>%s", k, v) fnAssertTrue(e == nil, "e == nil") e,k,v = iter.Next() t.Logf("%v>%s", k, v) fnAssertTrue(e == nil, "e == nil") e,k,v = iter.Next() t.Logf("%v>%s", k, v) fnAssertTrue(e == nil, "e == nil") e,k,v = iter.Next() t.Logf("%v>%s", k, v) fnAssertTrue(e == nil, "e == nil") e,k,v = iter.Next() fnAssertTrue(e != nil, "expecting e != nil") ok,v = hm.Remove(3) t.Log(hm) fnAssertTrue(ok && v == "30" && hm.Size() == 5, "expecting remove 30") ok,v = hm.Remove(2) t.Log(hm) fnAssertTrue(ok && v == "20" && hm.Size() == 4, "expecting remove 20") t0 := time.Now().UnixNano() hm.Clear() size := 1000 * 1000 for i := 0; i < size;i++ { hm.Put(strconv.Itoa(i), i) } millis := (time.Now().UnixNano() - t0) / 1000000 t.Logf("putting %v string>int = %v ms", size, millis) fnAssertTrue(hm.Size() == size, fmt.Sprintf("expecting %v", size)) for i := 0;i < size;i++ { ok, v = hm.Get(strconv.Itoa(i)) fnAssertTrue(ok == true && v == i, "expecting i") } }
$ go test -v hashmap_test.go === RUN Test_HashMap hashmap_test.go:42: s=1,v=1 p[1:b[1 1]] hashmap_test.go:54: s=5,v=5 p[4:b[1 4],5:b[1 5]] p[1:b[1 1],2:b[1 2],3:b[1 3]] hashmap_test.go:60: s=0,v=6 p[] hashmap_test.go:70: s=3,v=9 p[1:b[1 1],2:b[1 2],3:b[1 3]] hashmap_test.go:76: s=6,v=12 p[1:b[1 1],2:b[1 2],3:b[1 3],4:b[1 4],5:b[1 5],6:b[1 6]] hashmap_test.go:80: 1>10 hashmap_test.go:83: 2>20 hashmap_test.go:86: 3>30 hashmap_test.go:89: 4>40 hashmap_test.go:92: 5>50 hashmap_test.go:95: 6>60 hashmap_test.go:101: s=5,v=13 p[1:b[1 1],2:b[1 2],4:b[1 4],5:b[1 5],6:b[1 6]] hashmap_test.go:105: s=4,v=14 p[1:b[1 1],4:b[1 4],5:b[1 5],6:b[1 6]] hashmap_test.go:115: putting 1000000 string>int = 1590 ms --- PASS: Test_HashMap (2.17s) PASS ok command-line-arguments 2.181s
哈希支持接口, 定義哈希計算函數(shù), 和鍵值相等比較函數(shù)
package hashmap type IHasher interface { Hash(key interface{}) uint64 Equals(a interface{}, b interface{}) bool }
哈希表接口
package hashmap type IMap interface { Size() int IsEmpty() bool IsNotEmpty() bool Put(key interface{}, value interface{}) Get(key interface{}) (bool,interface{}) Has(key interface{}) bool Remove(key interface{}) (bool, interface{}) Clear() Iterator() IMapIterator String() string }
哈希表迭代器接口
package hashmap type IMapIterator interface { More() bool Next() (err error, key interface{}, value interface{}) }
基于hash/crc64, 實現(xiàn)IHasher接口.
支持多種鍵類型
對于整型的鍵, 返回自身;
其他類型的鍵, 轉(zhuǎn)為%v字符串再計算crc64的哈希值.
package hashmap import ( "fmt" "hash/crc64" ) var gCrc64Table = crc64.MakeTable(crc64.ECMA) type FnEquals func(a interface{}, b interface{}) bool type tCrc64Hasher struct { fnEquals FnEquals } const INT_MAX = int(^uint(0) >> 1) const INT_MIN = ^INT_MAX const INT32_MAX = int32(^uint32(0) >> 1) const INT32_MIN = ^INT32_MAX const INT64_MAX = int64(^uint64(0) >> 1) const INT64_MIN = ^INT64_MAX func (me *tCrc64Hasher) Hash(it interface{}) uint64 { if it == nil { return 0 } if v,ok := it.(int);ok { return uint64(v - INT_MIN) } else if v,ok := it.(int64);ok { return uint64(v - INT64_MIN) } else if v,ok := it.(int32);ok { return uint64(v - INT32_MIN) } else if v,ok := it.(uint32);ok { return uint64(v) } else if v,ok := it.(uint64);ok { return v } else if v,ok := it.(string);ok { return crc64.Checksum([]byte(v), gCrc64Table) } else { data := []byte(fmt.Sprintf("%v", it)) return crc64.Checksum(data, gCrc64Table) } } func (me *tCrc64Hasher) Equals(a interface{}, b interface{}) bool { if a == nil && b == nil { return true } if a == nil || b == nil { return false } fn := me.fnEquals if fn == nil { return a == b } else { return fn(a, b) } } func NewCrc64Hashful(fn FnEquals) IHasher { return &tCrc64Hasher{ fnEquals: fn, } }
哈希表的實現(xiàn), 使用分區(qū)(雙鏈)+數(shù)組+鏈表(單鏈)三維結(jié)構(gòu)
package hashmap import ( "fmt" "strings" ) type tHashMap struct { hasher IHasher partitions *tPartition size int version int } func NewHashMap(hasher IHasher, capacity int) IMap { if capacity < 4 { capacity = 4 } part := newPartition(hasher, capacity) return &tHashMap{ hasher: hasher, partitions: part, size: 0, version: 0, } } func (me *tHashMap) Size() int { return me.size } func (me *tHashMap) IsEmpty() bool { return me.Size() <= 0 } func (me *tHashMap) IsNotEmpty() bool { return !me.IsEmpty() } func (me *tHashMap) Put(key interface{}, value interface{}) { hash := me.hasher.Hash(key) ok, _, bucket, node, _ := me.findByKeyAndHash(key, hash) if ok { bucket.putAt(node, key, value) } else { if me.partitions.nearlyFull() { // create new partition part := newPartition(me.hasher, int(me.partitions.bucketCount * 2)) part.next = me.partitions me.partitions.prev = part me.partitions = part part.appendByKeyAndHash(key, value, hash) } else { me.partitions.appendByKeyAndHash(key, value, hash) } me.size++ } me.version++ } func (me *tHashMap) findByKey(key interface{}) (ok bool, part *tPartition, bucket *tBucket, node *tLinkedNode, prev *tLinkedNode) { hash := me.hasher.Hash(key) return me.findByKeyAndHash(key, hash) } func (me *tHashMap) findByKeyAndHash(key interface{}, hash uint64) (ok bool, part *tPartition, bucket *tBucket, node *tLinkedNode, prev *tLinkedNode) { for part = me.partitions; part != nil; part = part.next { ok, bucket, node, prev = part.findByKeyAndHash(key, hash) if ok { return ok, part, bucket, node, prev } } return false, nil, nil, nil, nil } func (me *tHashMap) Get(key interface{}) (bool,interface{}) { ok, _, _, node, _ := me.findByKey(key) if ok { return true, node.value } else { return false, nil } } func (me *tHashMap) Has(key interface{}) bool { ok, _, _, _, _ := me.findByKey(key) return ok } func (me *tHashMap) Remove(key interface{}) (ok bool, value interface{}) { ok, part, bucket, node, prev := me.findByKey(key) if ok { value = node.value bucket.removeAt(node, prev) // 縮容 if part.size <= 0 && part != me.partitions { me.removePartition(part) } me.size-- me.version++ return ok, value } else { return false, nil } } func (me *tHashMap) removePartition(part *tPartition) { prev := part.prev next := part.next if prev != nil { prev.next = next } if next != nil { next.prev = prev } if me.partitions == part { me.partitions = next } } func (me *tHashMap) Clear() { if me.IsEmpty() { return } part := newPartition(me.hasher, len(me.partitions.buckets)) me.partitions = part me.size = 0 me.version++ } func (me *tHashMap) Iterator() IMapIterator { return newHashMapIterator(me) } func (me *tHashMap) String() string { itemStrings := make([]string, 0) for p := me.partitions;p != nil;p = p.next { itemStrings = append(itemStrings, p.String()) } return fmt.Sprintf("s=%v,v=%v %s", me.size, me.version, strings.Join(itemStrings, " ")) }
哈希表分區(qū). 其大小呈2次冪
package hashmap import ( "fmt" "strings" ) type tPartition struct { hasher IHasher buckets []*tBucket bucketCount uint64 prev *tPartition next *tPartition size int threshhold int } func newPartition(hasher IHasher, bucketCount int) *tPartition { it := &tPartition{ hasher: hasher, buckets: make([]*tBucket, bucketCount), bucketCount: uint64(bucketCount), prev: nil, next: nil, size: 0, threshhold: bucketCount * 3 / 4, } for i,_ := range it.buckets { it.buckets[i] = newBucket(hasher) } return it } func (me *tPartition) putByKey(key interface{}, value interface{}) bool { hash := me.hasher.Hash(key) return me.putByKeyAndHash(key, value, hash) } func (me *tPartition) putByKeyAndHash(key interface{}, value interface{}, hash uint64) bool { if me.getBucketByHash(hash).put(key, value) { me.size++ return true } return false } func (me *tPartition) appendByKeyAndHash(key interface{}, value interface{}, hash uint64) { me.getBucketByHash(hash).append(key, value) me.size++ } func (me *tPartition) getBucketByKey(key interface{}) *tBucket { hash := me.hasher.Hash(key) return me.getBucketByHash(hash) } func (me *tPartition) getBucketByHash(hash uint64) *tBucket { return me.buckets[int(hash % me.bucketCount)] } func (me *tPartition) get(key interface{}) (bool,interface{}) { return me.getBucketByKey(key).get(key) } func (me *tPartition) findByKey(key interface{}) (ok bool,bucket *tBucket,node *tLinkedNode, prev *tLinkedNode) { bucket = me.getBucketByKey(key) ok,node,prev = bucket.find(key) return ok,bucket,node, prev } func (me *tPartition) findByKeyAndHash(key interface{}, hash uint64) (ok bool,bucket *tBucket,node *tLinkedNode, prev *tLinkedNode) { bucket = me.getBucketByHash(hash) ok,node,prev = bucket.find(key) return ok,bucket,node, prev } func (me *tPartition) remove(key interface{}) (bool, value interface{}) { ok,node := me.getBucketByKey(key).remove(key) if ok { me.size-- return true, node.value } else { return false, nil } } func (me *tPartition) nearlyFull() bool { return me.size >= me.threshhold } func (me *tPartition) String() string { itemStrings := make([]string, 0) for i,b := range me.buckets { if b.size > 0 { itemStrings = append(itemStrings, fmt.Sprintf("%v:%s", i, b.String())) } } return fmt.Sprintf("p[%s]", strings.Join(itemStrings, ",")) }
哈希桶, 每個桶內(nèi)的任何元素, 哈希值是一致的
package hashmap import ( "fmt" "strings" ) type tBucket struct { hasher IHasher nodes *tLinkedNode size int } type tLinkedNode struct { key interface{} value interface{} next *tLinkedNode } func newBucket(hasher IHasher) *tBucket { return &tBucket{ hasher: hasher, nodes: nil, size: 0, } } func newLinkedNode(key interface{}, value interface{}) *tLinkedNode { return &tLinkedNode{ key: key, value: value, next: nil, } } func (me *tBucket) put(key interface{}, value interface{}) bool { existed, node, _ := me.find(key) me.putAt(node, key, value) return !existed } func (me *tBucket) append(key interface{}, value interface{}) { me.putAt(nil, key, value) } func (me *tBucket) putAt(node *tLinkedNode, key interface{}, value interface{}) { if node != nil { node.value = value return } it := newLinkedNode(key, value) if me.nodes == nil { me.nodes = it } else { it.next = me.nodes me.nodes = it } me.size++ } func (me *tBucket) get(key interface{}) (bool, interface{}) { ok, node, _ := me.find(key) if ok { return true, node.value } return false, nil } func (me *tBucket) find(key interface{}) (ok bool, node *tLinkedNode, prev *tLinkedNode) { prev = nil for it:=me.nodes;it != nil;it = it.next { if me.hasher.Equals(it.key, key) { return true, it, prev } prev = it } return false, nil, nil } func (me *tBucket) remove(key interface{}) (bool, *tLinkedNode) { ok,node, prev := me.find(key) if !ok { return false, nil } me.removeAt(node, prev) return true, node } func (me *tBucket) removeAt(node *tLinkedNode, prev *tLinkedNode) { if prev == nil { me.nodes = node.next } else { prev.next = node.next } me.size-- } func (me *tBucket) String() string { itemStrings := make([]string, me.size) i := 0 for it := me.nodes;it != nil;it = it.next { itemStrings[i] = fmt.Sprintf("%v", it.key) i++ } return fmt.Sprintf("b[%v %s]", me.size, strings.Join(itemStrings, ",")) }
哈希表迭代器的實現(xiàn)
package hashmap import "errors" type tHashMapIterator struct { hashMap *tHashMap pindex *tPartition bindex int nindex *tLinkedNode version int visited int } func newHashMapIterator(hashMap *tHashMap) IMapIterator { return &tHashMapIterator{ hashMap: hashMap, pindex: hashMap.partitions, bindex: -1, nindex: nil, version: hashMap.version, visited: 0, } } func (me *tHashMapIterator) nextNode() *tLinkedNode { node := me.nindex for { if node == nil { bkt := me.nextBucket() if bkt == nil { return nil } else { me.nindex = bkt.nodes return me.nindex } } else { node = node.next if node != nil { me.nindex = node return node } } } } func (me *tHashMapIterator) nextBucket() *tBucket { part := me.pindex bi := me.bindex + 1 for { if bi >= len(part.buckets) { part = me.nextPartition() if part == nil { return nil } bi = 0 } bkt := part.buckets[bi] if bkt.nodes != nil { me.bindex = bi return bkt } bi++ } } func (me *tHashMapIterator) nextPartition() *tPartition { if me.pindex == nil { return nil } me.pindex = me.pindex.next return me.pindex } func (me *tHashMapIterator) More() bool { if me.version != me.hashMap.version { return false } return me.visited < me.hashMap.size } func (me *tHashMapIterator) Next() (err error, key interface{}, value interface{}) { if me.version != me.hashMap.version { return gConcurrentModificationError, nil, nil } if !me.More() { return gNoMoreElementsError, nil, nil } node := me.nextNode() if node == nil { return gNoMoreElementsError, nil, nil } else { me.visited++ return nil, node.key, node.value } } var gConcurrentModificationError = errors.New("concurrent modification error") var gNoMoreElementsError = errors.New("no more elements")
感謝各位的閱讀,以上就是“golang基本數(shù)據(jù)結(jié)構(gòu)與算法之如何使用哈希表”的內(nèi)容了,經(jīng)過本文的學(xué)習(xí)后,相信大家對golang基本數(shù)據(jù)結(jié)構(gòu)與算法之如何使用哈希表這一問題有了更深刻的體會,具體使用情況還需要大家實踐驗證。這里是億速云,小編將為大家推送更多相關(guān)知識點(diǎn)的文章,歡迎關(guān)注!
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報,并提供相關(guān)證據(jù),一經(jīng)查實,將立刻刪除涉嫌侵權(quán)內(nèi)容。