哈希表】 哈希算法和哈希表的区别

2022-01-14 18:27发布

9条回答
猫的想法不敢猜
2022-01-20 13:41

哈希表(也称为哈希图)是一种将键映射到值的数据结构。 它是一种称为散列的技术,另一部分是散列函数。 哈希函数是一种算法,它生成一个索引,该索引可在哈希表中找到或存储值。

Some important notes about hash tables:

有关哈希表的一些重要说明:

  1. Values are not stored in a sorted order.


    值未按排序顺序存储。

  2. You mush account for potential collisions. This is usually done with a technique called chaining. Chaining means to create a linked list of values, the keys of which map to a certain index.


    您必须考虑潜在的碰撞。 通常使用称为链接的技术来完成此操作。 链接意味着创建值的链接列表,其键映射到某个索引。

哈希表的实现 (Implementation of a hash table)

The basic idea behind hashing is to distribute key/value pairs across an array of placeholders or "buckets" in the hash table.

散列背后的基本思想是在散列表中的一组占位符或“存储桶”之间分布键/值对。

A hash table is typically an array of linked lists. When you want to insert a key/value pair, you first need to use the hash function to map the key to an index in the hash table. Given a key, the hash function can suggest an index where the value can be found or stored:

哈希表通常是链接列表的数组。 当您要插入键/值对时,首先需要使用哈希函数将键映射到哈希表中的索引。 给定一个键,哈希函数可以建议一个索引,可以在其中找到或存储该值:

index = f(key, array_size)

This is often done in two steps:

这通常分两个步骤完成:

hash = hashfunc(key)index = hash % array_size

Using this method, hash is independent of the size of the hash table. hash is reduced to an index – a number between 0, the start of the array, and array_size - 1, the end of the array – using the modulo (%) operator.

使用此方法, hash与哈希表的大小无关。 使用模(%)运算符将hash减少为一个索引-一个介于0(在数组的开头)和array_size - 1 (在数组的结尾)之间的数字。

Consider the following string, S:

考虑以下字符串S :

string S = “ababcd”

You need to count the frequency of all the characters in S. The easiest way to do this is to iterate through all the possible characters and count the frequency of each, one by one.

您需要计算S中所有字符的频率。 最简单的方法是遍历所有可能的字符并逐个计数每个字符的频率。

This works, but it's slow – the time complexity of such an approach is O(26*N), with N being the size of the string S multiplied by 26 possible characters from A-Z.

这可行,但是很慢–这种方法的时间复杂度为O(26 * N),其中N是字符串S大小乘以AZ中的26个可能的字符。

void countFre(string S){for(char c = ‘a’;c <= ‘z’;++c){int frequency = 0;for(int i = 0;i < S.length();++i)if(S[i] == c)frequency++;cout << c << ‘ ‘ << frequency << endl;}}

Output:

输出:

a 2b 2c 1d 1e 0f 0…z 0

Let's take a look at a solution that uses hashing.

让我们看一下使用哈希的解决方案。

Take an array and use the hash function to hash the 26 possible characters with indices of the array. Then iterate over S and increase the value of the current character of the string with the corresponding index for each character.

取得一个数组,并使用哈希函数对带有数组索引的26个可能的字符进行哈希处理。 然后遍历S并使用每个字符的相应索引增加字符串的当前字符的值。

The complexity of this hashing approach is O(N), where N is the size of the string.

这种哈希方法的复杂度为O(N),其中N是字符串的大小。

int Frequency[26];int hashFunc(char c){return (c - ‘a’);}void countFre(string S){for(int i = 0;i < S.length();++i){int index = hashFunc(S[i]);Frequency[index]++;}for(int i = 0;i < 26;++i)cout << (char)(i+’a’) << ‘ ‘ << Frequency[i] << endl;}

Output

输出量

a 2b 2c 1d 1e 0f 0…z 0

哈希冲突 (Hash Collisions)

Since your hash map will probably be significantly smaller than the amount of data you're processing, hash collisions are unavoidable. There are two main approaches to handling collisions: chaining and open addressing.

由于您的哈希映射可能会大大小于您正在处理的数据量,因此哈希冲突不可避免。 处理冲突有两种主要方法: 链接开放式寻址 。

链式 (Chaining)

As mentioned earlier, chaining means that each key/value pair in the hash table, the value is a linked list of data rather than a single cell.

如前所述,链接意味着哈希表中的每个键/值对,值都是数据的链表,而不是单个单元格。

For example, imagine that the key 152 holds the value "John Smith". If the value "Sandra Dee" is added to the same key, "Sandra Dee" is added as another element to key 152, just after "John Smith".

例如,假设键152的值是“ John Smith”。 如果将值“ Sandra Dee”添加到同一键,则在“ John Smith”之后,将“ Sandra Dee”作为另一个元素添加到键152。

152: [["John Smith", "p01"]]...152: [["John Smith", "p01"] ["Sandra Dee", "p02"]]

The main drawback of chaining is the increase in time complexity. Instead of 0(1) as with a regular hash table, each lookup will take more time since we need to traverse each linked list to find the correct value.

链接的主要缺点是时间复杂度的增加。 代替常规哈希表的0(1),每次查找将花费更多时间,因为我们需要遍历每个链表以找到正确的值。

开放式寻址 (Open addressing)

Open addressing means that, once a value is mapped to a key that's already occupied, you move along the keys of the hash table until you find one that's empty. For example, if "John Smith" was mapped to 152, "Sandra Dee" will be mapped to the next open index:

开放式寻址意味着,将值映射到已被占用的键之后,您将沿着哈希表的键移动,直到找到一个空的键。 例如,如果“ John Smith”映射到152,则“ Sandra Dee”将映射到下一个打开的索引:

152: ["John Smith", "p01"]...152: ["John Smith", "p01"],153: ["Sandra Dee", "p02"]

The main drawback to open addressing is that, when you look up values, they might not be at the key map you expect them at. Instead, you have to traverse different parts of the hash table to find the value you're looking for.

开放式寻址的主要缺点是,当您查找值时,它们可能不在您期望的键映射中。 相反,您必须遍历哈希表的不同部分以查找所需的值。


一周热门 更多>