The Bloom filter is named after Burton Bloom, who proposed it in 1970.
What is a Bloom filter?
http://blog.csdn.net/v_july_v/article/details/6685894
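Define k hash functions, each mapping an element to a position in a bit array of size m.
For any element, the k functions give k hashed positions.
To add the element to the set, set the bits at these k positions to 1.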
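How to check whether some element is in the set?
Compute the k positions for this element and check each one: if any bit is 0, the element is definitely not in the set; if all k bits are 1, it is probably in the set.
A Bloom filter can report false positives, but never false negatives.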
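A tiny worked example may make the false positive case concrete. The sketch below uses an 8-bit array and two toy hash functions, h1 and h2, which are made up for illustration and are far too weak for real use. After inserting only 2 and 5, a query for 10 lands on bits that are already set.

```java
import java.util.BitSet;

// Illustrative only: M, h1 and h2 are hypothetical choices, not from the notes.
final class FalsePositiveDemo {
    static final int M = 8; // bit array size

    static int h1(int x) { return Math.floorMod(x, M); }
    static int h2(int x) { return Math.floorMod(3 * x + 1, M); }

    // Insert: set both hashed positions to 1.
    static void add(BitSet bits, int x) {
        bits.set(h1(x));
        bits.set(h2(x));
    }

    // Query: true only if both hashed positions are 1.
    static boolean mightContain(BitSet bits, int x) {
        return bits.get(h1(x)) && bits.get(h2(x));
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(M);
        add(bits, 2); // sets bits 2 and 7
        add(bits, 5); // sets bits 5 and 0
        System.out.println(mightContain(bits, 2));  // true: it was inserted
        System.out.println(mightContain(bits, 7));  // false: bit 6 is still 0
        System.out.println(mightContain(bits, 10)); // true: bits 2 and 7 are set, a false positive
    }
}
```

An inserted element always finds its own bits set, which is why the opposite error cannot happen.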
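import java.util.BitSet;
import java.util.Iterator;
import java.util.List;

class BloomFilter<T> {
    // A nested interface is implicitly static, so it needs its own type parameter.
    interface Hash<E> { int hash(E e); }

    private final int m;                     // number of bits in the array
    private final BitSet bitArray;           // all bits start at 0
    private final List<Hash<T>> hashMethods; // the k hash functions

    BloomFilter(int m, List<Hash<T>> hashMethods) {
        this.m = m;
        this.bitArray = new BitSet(m);
        this.hashMethods = hashMethods;
    }

    // Insert every element: set the k positions it hashes to.
    void init(Iterator<T> data) {
        while (data.hasNext()) {
            T t = data.next();
            for (Hash<T> hash : hashMethods) {
                bitArray.set(position(hash, t));
            }
        }
    }

    // false: definitely absent. true: probably present (may be a false positive).
    boolean contains(T t) {
        for (Hash<T> hash : hashMethods) {
            if (!bitArray.get(position(hash, t))) return false;
        }
        return true;
    }

    // Map an arbitrary int hash value to an index in [0, m).
    private int position(Hash<T> hash, T t) {
        return Math.floorMod(hash.hash(t), m);
    }
}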
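The notes do not say where the k hash functions come from. One common option, sketched here as an assumption rather than as the method the notes intend, is double hashing: compute two base hashes h1 and h2 and derive position i as (h1 + i * h2) mod m. The class name DoubleHashing, the seeded FNV-1a variant, and the seed constant are all illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: derives k bit positions from two base hashes.
final class DoubleHashing {
    // 32-bit FNV-1a over bytes; XOR-ing a seed into the offset basis is an
    // assumption made here to obtain a second, different hash.
    static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193; // int overflow wraps, as FNV expects
        }
        return h;
    }

    // Returns k positions in [0, m) for the given item.
    static int[] positions(String item, int k, int m) {
        byte[] bytes = item.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(bytes, 0);
        int h2 = fnv1a(bytes, 0x9E3779B9) | 1; // force odd so the step is never zero
        int[] pos = new int[k];
        for (int i = 0; i < k; i++) {
            pos[i] = Math.floorMod(h1 + i * h2, m);
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(positions("hello", 3, 1024)));
    }
}
```

The same item always yields the same k positions, which is the only property the insert and lookup steps rely on.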
How to keep the false positive rate low?
Given a maximum false positive rate, how large should the bit array be?
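Another important question is how to choose the bit array size m and the number of hash functions k from the number of input elements n.
The error rate is lowest when k = (ln 2) * (m/n).
For an error rate no greater than E, m must be at least n * lg(1/E) to represent any set of n elements.
But m should be somewhat larger, because at least half of the bits in the array should remain 0, so m >= n * lg(1/E) * lg(e), which is about 1.44 times n * lg(1/E) (lg denotes the base-2 logarithm).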
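The two formulas above can be turned into a small sizing helper. This is a minimal sketch: the class and method names are made up, and rounding m up and k to the nearest integer are choices made here, not something the notes specify.

```java
// Hypothetical sizing helper based on m >= n * lg(1/E) * lg(e) and k = ln(2) * (m/n).
final class BloomSizing {
    // Minimum number of bits for n elements at error rate e, rounded up.
    static long bitsFor(long n, double e) {
        double lgOneOverE = Math.log(1.0 / e) / Math.log(2.0); // lg(1/E)
        double lgOfE = 1.0 / Math.log(2.0);                    // lg(e), about 1.44
        return (long) Math.ceil(n * lgOneOverE * lgOfE);
    }

    // Number of hash functions for m bits and n elements, at least 1.
    static int hashCountFor(long m, long n) {
        return Math.max(1, (int) Math.round(Math.log(2.0) * m / n));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        double e = 0.01;
        long m = bitsFor(n, e);
        int k = hashCountFor(m, n);
        System.out.println("m = " + m + " bits, k = " + k);
    }
}
```

For n = 1,000,000 and E = 0.01 this gives roughly 9.59 million bits (about 1.2 MB) and k = 7.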