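The paper below studies intersection algorithms for sorted sets, specifically the inverted lists used by search engines. Its approach resembles quicksort. Well worth reading: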
http://www.cs.ucr.edu/~stelo/cpm/cpm04/25_Baeza-yates.pdf
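The quicksort-like idea can be sketched as follows: take the median of the smaller list, binary-search it in the larger list, emit it if found, then recurse on the two left parts and the two right parts. This is a minimal sketch, assuming both inputs are ascending lists of distinct integers; the function names are hypothetical and this is not code from the paper.

```python
from bisect import bisect_left


def baeza_yates(a, b):
    """Intersect two sorted lists of distinct values (Baeza-Yates style sketch)."""
    out = []
    _intersect(a, 0, len(a), b, 0, len(b), out)
    return out


def _intersect(a, alo, ahi, b, blo, bhi, out):
    # An empty range on either side means no common elements here.
    if alo >= ahi or blo >= bhi:
        return
    # Always pick the pivot from the smaller range, search in the larger one.
    if ahi - alo > bhi - blo:
        a, alo, ahi, b, blo, bhi = b, blo, bhi, a, alo, ahi
    mid = (alo + ahi) // 2
    x = a[mid]
    pos = bisect_left(b, x, blo, bhi)  # first index in b[blo:bhi] with b[pos] >= x
    # Everything left of the pivot is < x on both sides: recurse there first
    # so that the output stays sorted.
    _intersect(a, alo, mid, b, blo, pos, out)
    if pos < bhi and b[pos] == x:
        out.append(x)
        pos += 1  # skip the matched element on the larger side
    _intersect(a, mid + 1, ahi, b, pos, bhi, out)
```

For example, `baeza_yates([1, 3, 5, 7, 9], [2, 3, 4, 5, 6, 7])` yields `[3, 5, 7]`. Splitting on the smaller list is what keeps the number of comparisons low when the two lists differ greatly in size.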
Algorithm for merging sorted sequences:
https://github.com/rklaehn/rklaehn.github.io/blob/master/_posts/2016-01-05-binarymerge.md
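As I understand the linked post, a "binary merge" uses the same divide-and-conquer shape as the intersection above, but keeps every element: split on the middle of the smaller range, binary-search that element in the larger range, and merge the left and right parts recursively. This is my own minimal sketch of that idea, not code from the post, assuming ascending lists of distinct values and computing their set union; the function names are hypothetical.

```python
from bisect import bisect_left


def binary_merge_union(a, b):
    """Union of two sorted lists of distinct values via divide and conquer."""
    out = []
    _merge(a, 0, len(a), b, 0, len(b), out)
    return out


def _merge(a, alo, ahi, b, blo, bhi, out):
    # If one range is exhausted, the rest of the other is copied unchanged.
    if alo >= ahi:
        out.extend(b[blo:bhi])
        return
    if blo >= bhi:
        out.extend(a[alo:ahi])
        return
    # Take the pivot from the smaller range so the binary search is in the larger.
    if ahi - alo > bhi - blo:
        a, alo, ahi, b, blo, bhi = b, blo, bhi, a, alo, ahi
    mid = (alo + ahi) // 2
    x = a[mid]
    pos = bisect_left(b, x, blo, bhi)
    _merge(a, alo, mid, b, blo, pos, out)  # all elements < x
    out.append(x)
    if pos < bhi and b[pos] == x:
        pos += 1  # drop b's copy of x so it appears only once
    _merge(a, mid + 1, ahi, b, pos, bhi, out)  # all elements > x
```

When one input is much smaller than the other, whole runs of the larger list are copied with `extend` instead of being compared element by element.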
Collected resources: https://github.com/TechConf/CodeMash2016/blob/master/Great%20Galloping%20Cuckoos-%20Algorithms%20Faster%20than%20log(n)/index.html
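The "galloping" in the talk title refers to exponential search: from the current position, probe at distances 1, 2, 4, 8, … until the target is overshot, then binary-search only the last window. This is a generic sketch, assuming ascending lists; `gallop` and `gallop_intersect` are hypothetical names, not code from the talk.

```python
from bisect import bisect_left


def gallop(arr, x, lo):
    """Return the first index i >= lo with arr[i] >= x (len(arr) if none)."""
    n = len(arr)
    if lo >= n or arr[lo] >= x:
        return lo
    # Invariant: arr[lo + prev] < x. Double the step until we reach or pass x.
    prev, step = 0, 1
    while lo + step < n and arr[lo + step] < x:
        prev = step
        step *= 2
    # The answer lies in (lo + prev, lo + step], clipped to the array end.
    return bisect_left(arr, x, lo + prev + 1, min(lo + step + 1, n))


def gallop_intersect(small, large):
    """Intersect a short sorted list with a long one by galloping through the long one."""
    out, j = [], 0
    for x in small:
        j = gallop(large, x, j)  # resume from where the previous search stopped
        if j == len(large):
            break
        if large[j] == x:
            out.append(x)
            j += 1
    return out
```

Each search costs about 2·log2(d) comparisons, where d is the distance skipped, which is why galloping pays off when matches are far apart.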
Key information:
## Comparisons of Set Intersections

<small>Excerpted from [Faster Adaptive Set Intersections for Text Searching](http://www.cs.toronto.edu/~tl/papers/wea06.pdf)</small>

Algorithm | # of comparisons
-----------|----------------:
Sequential | 119479075
Adaptive | 83326341
Small Adaptive | 68706234
Interpolation Sequential | 55275738
Interpolation Adaptive | 58558408
Interpolation Small Adaptive | 44525318
Extrapolation Small Adaptive | 50018852
Extrapolate Many Small Adaptive | 44087712
Extrapolate Ahead Small Adaptive | 43930174
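As the row names suggest, the "Interpolation" variants replace binary or galloping probes with interpolation search, which guesses a position from the values at both ends of the current window. This is a minimal sketch, assuming ascending integer keys; the function names are hypothetical, it is not the paper's exact algorithm, and it does not count comparisons the way the table does.

```python
def interpolation_lower_bound(arr, x, start=0):
    """First index i >= start with arr[i] >= x, using interpolation probes."""
    n = len(arr)
    if start >= n or x <= arr[start]:
        return start
    if x > arr[-1]:
        return n
    # Invariant: arr[lo] < x <= arr[hi], so the answer lies in (lo, hi].
    lo, hi = start, n - 1
    while hi - lo > 1:
        # Linear-interpolation guess, clamped strictly inside (lo, hi)
        # so that every probe shrinks the window.
        guess = lo + (x - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        pos = min(max(guess, lo + 1), hi - 1)
        if arr[pos] < x:
            lo = pos
        else:
            hi = pos
    return hi


def interpolation_intersect(small, large):
    """Sequential intersection: search each element of `small` in `large`."""
    out, j = [], 0
    for x in small:
        j = interpolation_lower_bound(large, x, j)
        if j == len(large):
            break
        if large[j] == x:
            out.append(x)
            j += 1
    return out
```

On evenly spaced keys the first guess is often exact, which is the intuition behind the lower comparison counts of the interpolation rows.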
## Resources: Sets

- [A Fast Set Intersection Algorithm for Sorted Sequences](http://www.cs.ucr.edu/~stelo/cpm/cpm04/25_Baeza-yates.pdf)
- [Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences](https://cs.uwaterloo.ca/~ajsaling/papers/paper-spire.pdf)
- [Experimental Comparison of Set Intersection Algorithms for Inverted Indexing](http://ceur-ws.org/Vol-1003/58.pdf)
- [Fast Set Intersection in Memory](http://research.microsoft.com/pubs/142850/p255-dingkoenig.pdf)
- [Faster Adaptive Set Intersections for Text Searching](http://www.cs.toronto.edu/~tl/papers/wea06.pdf)
- [Faster Set Intersection with SIMD instructions by Reducing Branch Mispredictions](http://www.vldb.org/pvldb/vol8/p293-inoue.pdf)
- [SIMD Compression and the Intersection of Sorted Integers](http://arxiv.org/abs/1401.6399)
https://github.com/lemire/SIMDCompressionAndIntersection
A C++ library to compress and intersect sorted lists of integers using SIMD instructions
Some of the references it mentions:
Documentation
- Daniel Lemire, Leonid Boytsov, Nathan Kurz, SIMD Compression and the Intersection of Sorted Integers, Software Practice & Experience 46 (6), 2016 http://arxiv.org/abs/1401.6399
- Daniel Lemire and Leonid Boytsov, Decoding billions of integers per second through vectorization, Software Practice & Experience 45 (1), 2015. http://arxiv.org/abs/1209.2137 http://onlinelibrary.wiley.com/doi/10.1002/spe.2203/abstract
- Jeff Plaisance, Nathan Kurz, Daniel Lemire, Vectorized VByte Decoding, International Symposium on Web Algorithms 2015, 2015. http://arxiv.org/abs/1503.07387
- Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen, A General SIMD-based Approach to Accelerating Compression Algorithms, ACM Transactions on Information Systems 33 (3), 2015. http://arxiv.org/abs/1502.01916
This work has also inspired other work such as...
- T. D. Wu, Bitpacking techniques for indexing genomes: I. Hash tables, Algorithms for Molecular Biology 11 (5), 2016. http://almob.biomedcentral.com/articles/10.1186/s13015-016-0069-5
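The compression side of these papers typically starts from differential coding (store gaps between consecutive sorted integers, which are small) followed by a byte-oriented code such as VByte, whose decoding the papers above vectorize. Below is a scalar sketch, assuming ascending non-negative integers; it sets the high bit on every byte except the last of each value, which is one common convention, and the library's actual byte layout and API may differ. The function names are hypothetical.

```python
def delta_vbyte_encode(sorted_ints):
    """Differential coding + VByte: 7 payload bits per byte, low bits first."""
    out = bytearray()
    prev = 0
    for v in sorted_ints:
        gap = v - prev  # small gaps compress to a single byte
        prev = v
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # high bit set: more bytes follow
            gap >>= 7
        out.append(gap)  # high bit clear: last byte of this gap
    return bytes(out)


def delta_vbyte_decode(data):
    """Inverse of delta_vbyte_encode: rebuild gaps, then prefix-sum them."""
    out, prev, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            out.append(prev)
            gap, shift = 0, 0
    return out
```

For instance, `[1, 2, 3]` encodes to three bytes, while the single value 300 needs two (`0xAC 0x02`). The per-byte branch in the decoder is exactly what SIMD decoders try to eliminate.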
Frequently mentioned:
https://github.com/Randl/CS/tree/master/Hwang-Lin
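The directory name refers to the Hwang-Lin merging algorithm (described in Knuth, TAOCP Vol. 3, §5.3.2). With m ≤ n, let t = ⌊log2(n/m)⌋ and compare the largest remaining element of the short list with the element 2^t positions from the end of the long list. If it is smaller, that whole block of 2^t elements is emitted in one step; otherwise it is placed by binary search within that block. This is my own sketch of the classic algorithm, not the code in that repository; `hwang_lin_merge` is a hypothetical name.

```python
from bisect import bisect_right


def hwang_lin_merge(a, b):
    """Merge two ascending lists Hwang-Lin style, returning a new sorted list."""
    a, b = list(a), list(b)
    m, n = len(a), len(b)
    rev = []  # output built from the largest element downwards
    while m > 0 and n > 0:
        if m > n:
            a, b, m, n = b, a, n, m  # keep `a` as the shorter remaining list
        t = (n // m).bit_length() - 1  # t = floor(log2(n / m))
        probe = n - (1 << t)  # 0-indexed position of b_{n+1-2^t}
        if a[m - 1] < b[probe]:
            # b[probe:n] exceed every remaining element of a: emit the block.
            rev.extend(reversed(b[probe:n]))
            n = probe
        else:
            # b[probe] <= a[m-1]: binary-insert a[m-1] into b[probe+1:n].
            k = bisect_right(b, a[m - 1], probe + 1, n)
            rev.extend(reversed(b[k:n]))
            rev.append(a[m - 1])
            m, n = m - 1, k
    # At most one list is non-empty here, and it is below everything emitted.
    rev.extend(reversed(a[:m]))
    rev.extend(reversed(b[:n]))
    rev.reverse()
    return rev
```

When one list is much shorter, most steps emit a block of 2^t elements after a single comparison, which is where the savings over a plain two-pointer merge come from.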