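Alibaba written-test question: you are given an English product description containing M English letters, in which words are separated by spaces and there is no other punctuation, together with N English keywords. Explain your approach and implement the method String extractSummary(String description, String[] keyWords). The goal is to find the shortest substring of the description that contains all N keywords (each keyword appearing at least once) and to output it as the product summary. (Any programming language; 20 points.)
This question is similar to the shortest-summary generation problem in 编程之美 (The Beauty of Programming). Consider the following sequence, where each w is an ordinary word and each q is a keyword: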
w0,w1,w2,w3,q0,w4,w5,q1,w6,w7,w8,q0,w9,q1
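The question is how to scan all the keywords in a single pass without missing any. Scanning cannot be avoided, but how can the results of two successive scans be linked? That is worth thinking about.
Reusing the scanning idea from before, look at the sequence again. The first scan has to cover every keyword, so starting from the first position w0 it runs up to w6, that is, it stops just past q1: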
[w0,w1,w2,w3,q0,w4,w5,q1],w6,w7,w8,q0,w9,q1
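To make the first scan concrete, here is a minimal C++ sketch. The helper name `firstCoverEnd` and the convention of returning a one-past-the-end index (or -1 when the keywords cannot all be found) are assumptions made for illustration; they are not part of the original solution.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the index one past the last word of the
// shortest prefix of `words` that contains every keyword, or -1 if the
// whole sequence does not contain them all.
int firstCoverEnd(const std::vector<std::string>& words,
                  const std::set<std::string>& keywords) {
    std::set<std::string> missing = keywords;  // keywords not seen yet
    std::size_t end = 0;
    while (!missing.empty() && end < words.size()) {
        missing.erase(words[end]);             // no-op for ordinary words
        ++end;
    }
    return missing.empty() ? static_cast<int>(end) : -1;
}
```

On the example sequence with keywords q0 and q1 this returns 8, the index of w6, so the first scan covers the half-open range from w0 up to (but not including) w6.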
What should the next scan do? First move the start of the scanned range forward to q0.
w0,w1,w2,w3,[q0,w4,w5,q1],w6,w7,w8,q0,w9,q1
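Then move the start one more step forward, so that the range no longer contains the keyword q0. The end of the range can now be moved forward again to find the next range that contains all the keywords: scanning from w4 up to w9 covers q1 and q0: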
w0,w1,w2,w3,q0,[w4,w5,q1,w6,w7,w8,q0],w9,q1
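The situation is now the same as after the first scan. Continuing in this way finds every range of w that contains all of q, and the shortest of these ranges is the final result.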
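The scan described above can be traced with a small sketch that records every candidate range instead of only the shortest one. The function name `candidateWindows` and the use of word indices with half-open ranges [begin, end) are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns each range [begin, end) of word indices that
// contains all keywords and can no longer be shrunk from the left.
std::vector<std::pair<std::size_t, std::size_t>>
candidateWindows(const std::vector<std::string>& words,
                 const std::vector<std::string>& keywords) {
    std::map<std::string, int> count;    // keyword -> occurrences in the range
    for (const std::string& k : keywords) count[k] = 0;
    std::size_t missing = count.size();  // distinct keywords still absent
    std::vector<std::pair<std::size_t, std::size_t>> result;
    if (missing == 0) return result;     // nothing to cover

    std::size_t begin = 0;
    for (std::size_t end = 0; end < words.size(); ++end) {
        // Extend the right end by one word.
        auto hit = count.find(words[end]);
        if (hit != count.end() && hit->second++ == 0) --missing;
        // While every keyword is covered, advance the left end.
        while (missing == 0) {
            auto left = count.find(words[begin]);
            if (left != count.end() && --left->second == 0) {
                // words[begin] was the last copy of a keyword in the range,
                // so [begin, end + 1) is a candidate.
                result.emplace_back(begin, end + 1);
                ++missing;
            }
            ++begin;
        }
    }
    return result;
}
```

For the example sequence with keywords q0 and q1 this yields the ranges [4, 8), [7, 12) and [11, 14). The last one, q0 w9 q1, is the shortest.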
#include <iostream>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>
using namespace std;

void FindMinLenAbstract()
{
    int n; // number of document words
    int m; // number of keywords
    while (cin >> n >> m) {
        // input
        vector<string> seq;
        string word;
        while (n-- > 0 && cin >> word)
            seq.push_back(word);
        set<string> kwords;
        while (m-- > 0 && cin >> word)
            kwords.insert(word);
        // find shortest abstract
        typedef vector<string>::iterator Vsit;
        // q is the current scan range [q.first, q.second),
        // r is the shortest range found so far
        pair<Vsit, Vsit> q(seq.begin(), seq.begin()), r(seq.begin(), seq.end());
        // becomes true once some range containing all keywords has been found
        bool hasResult = false;
        // keywords not yet found between q.first and q.second
        set<string> notfound = kwords;
        // keywords found between q.first and q.second,
        // each with its occurrence count
        map<string, int> found;
        // with no keywords there is nothing to cover, so the scan is skipped
        while (!kwords.empty()) {
            // some keyword is still missing: extend the right end
            if (!notfound.empty()) {
                // the whole sequence has been considered
                if (q.second == seq.end())
                    break;
                set<string>::iterator it = notfound.find(*q.second);
                // current word is a keyword that was still missing
                if (it != notfound.end()) {
                    ++found[*it];
                    notfound.erase(it);
                }
                else {
                    // current word may be a repeat of a keyword already found
                    map<string, int>::iterator it2 = found.find(*q.second);
                    if (it2 != found.end())
                        ++(it2->second);
                }
                // next word in the sequence
                ++(q.second);
            }
            // all keywords have been found: shrink from the left end
            else {
                map<string, int>::iterator it = found.find(*q.first);
                // *q.first is the last copy of a keyword in the range, so
                // [q.first, q.second) cannot be shrunk further
                if (it != found.end() && !--(it->second)) {
                    if (!hasResult ||
                        distance(r.first, r.second) > distance(q.first, q.second)) {
                        r = q;
                        hasResult = true;
                    }
                    notfound.insert(it->first);
                    found.erase(it);
                }
                ++(q.first);
            }
        }
        // output
        if (!hasResult)
            cout << "No abstract available.";
        else {
            if (r.first != seq.begin())
                cout << "... ";
            for (bool first = true; r.first != r.second; ++r.first, first = false) {
                if (!first)
                    cout << ' ';
                cout << *r.first;
            }
            if (r.second != seq.end())
                cout << " ...";
        }
        cout << '\n';
    }
}

int main()
{
    FindMinLenAbstract();
    return 0;
}
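The program above reads its input from standard input, whereas the question asks for a method extractSummary(description, keyWords). A possible C++ counterpart is sketched below. It assumes that "shortest" is measured in words, as in the program above, and that an empty string is returned when some keyword is absent; both choices, along with the C++ parameter types, are assumptions rather than part of the original question.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the requested method; returns "" if the description does not
// contain every keyword (an assumed convention).
std::string extractSummary(const std::string& description,
                           const std::vector<std::string>& keywords) {
    // Split on whitespace; the question states there is no punctuation.
    std::vector<std::string> words;
    std::istringstream in(description);
    for (std::string w; in >> w;) words.push_back(w);

    std::unordered_map<std::string, int> count;  // keyword -> occurrences in range
    for (const std::string& k : keywords) count[k] = 0;
    std::size_t missing = count.size();          // distinct keywords still absent
    if (missing == 0) return "";

    std::size_t bestBegin = 0, bestLen = 0;      // bestLen == 0 means none yet
    std::size_t begin = 0;
    for (std::size_t end = 0; end < words.size(); ++end) {
        auto hit = count.find(words[end]);
        if (hit != count.end() && hit->second++ == 0) --missing;
        while (missing == 0) {
            // [begin, end] covers all keywords: record it if it is shorter.
            std::size_t len = end + 1 - begin;
            if (bestLen == 0 || len < bestLen) { bestBegin = begin; bestLen = len; }
            auto left = count.find(words[begin]);
            if (left != count.end() && --left->second == 0) ++missing;
            ++begin;
        }
    }

    // Join the chosen words back together with single spaces.
    std::string summary;
    for (std::size_t i = 0; i < bestLen; ++i) {
        if (i) summary += ' ';
        summary += words[bestBegin + i];
    }
    return summary;
}
```

Called with the example sequence written as a space-separated string and the keywords q0 and q1, this returns "q0 w9 q1". Each word enters and leaves the range at most once, so the scan itself is linear in the number of words.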