Disjunction Max (maximum over a union of clauses)


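In essence, DisMax is a joint search across multiple fields, with a different weight assigned to each field. When a document matches, the score of the highest-scoring field is used as the result score. This gives quite different results from simply summing per-field boosts. It is very complicated to use: you need to inspect the debugQuery output and experiment repeatedly.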


http://wiki.apache.org/solr/DisMax

http://searchhub.org/dev/2010/05/23/whats-a-dismax/



What’s a “DisMax”? Posted by hossman

The term “dismax” gets tossed around on the Solr lists frequently, which can be fairly confusing to new users. It originated as a shorthand name for the DisMaxRequestHandler (which I named after the DisjunctionMaxQueryParser, which I named after the DisjunctionMaxQuery class that it uses heavily). In recent years, the DisMaxRequestHandler and the StandardRequestHandler were both refactored into a single SearchHandler class, and now the term “dismax” usually refers to the DisMaxQParser.


Note: “dismax” now refers to the DisMaxQParser; the DisMaxRequestHandler and the StandardRequestHandler were both refactored into SearchHandler.


Clear as mud, right?

Regardless of whether you use the DisMaxRequestHandler via the qt=dismax parameter, or use the SearchHandler with the DisMaxQParser via defType=dismax, the end result is that your q parameter gets parsed by the DisjunctionMaxQueryParser.


Note: qt=dismax selects the DisMaxRequestHandler, while defType=dismax makes the SearchHandler use the DisMaxQParser. In both cases the q parameter is parsed by the DisjunctionMaxQueryParser.
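To make the two routes concrete, here is a minimal sketch that builds both request URLs. The host, port, and core name are assumptions made for illustration, and qt=dismax only works if a handler with that name is actually registered in your solrconfig.xml.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: host, port, and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def dismax_url(q, use_request_handler=False):
    """Build a select URL whose q parameter is parsed by dismax."""
    params = {"q": q}
    if use_request_handler:
        # Older style: route the request to a handler named "dismax".
        params["qt"] = "dismax"
    else:
        # SearchHandler with the DisMaxQParser selected as the default parser.
        params["defType"] = "dismax"
    return SOLR_SELECT + "?" + urlencode(params)


print(dismax_url("apache solr"))
print(dismax_url("apache solr", use_request_handler=True))
```

Either URL should lead to the same parsing of q; which one applies depends on how the handlers are configured.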


The original goals of dismax (whichever meaning you might infer) have never changed:

… supports a simplified version of the Lucene QueryParser syntax. Quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses … but all other Lucene query parser special characters are escaped to simplify the user experience. The handler takes responsibility for building a good query from the user’s input using BooleanQueries containing DisjunctionMaxQueries across fields and boosts you specify. It also allows you to provide additional boosting queries, boosting functions, and filtering queries to artificially affect the outcome of all searches. These options can all be specified as default parameters for the handler in your solrconfig.xml or overridden in the Solr query URL.

In short: You worry about what fields and boosts you want to use when you configure it, your users just give you words w/o worrying too much about syntax.


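Note: the dismax handler is mainly responsible for wrapping DisjunctionMaxQueries in BooleanQueries. It also lets you apply boost queries, boost functions, and filter queries by hand to influence the final search results. All of these parameters can be configured in solrconfig.xml as global defaults, or added as URL parameters so they apply dynamically to a single query or a class of queries.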


The magic of dismax (in my opinion) comes from the query structure it produces. What it essentially boils down to is matrix multiplication: a one column matrix of each “chunk” of your user’s input, multiplied by a one row matrix of the qf fields to produce a big matrix of every field:chunk permutation. The matrix is then turned into a BooleanQuery consisting of DisjunctionMaxQueries for each row in the matrix. DisjunctionMaxQuery is used because its score is determined by the maximum score of its subclauses, instead of the sum like a BooleanQuery, so no one word from the user input dominates the final score. The best way to explain this is with an example, so let’s consider the following input…

    defType = dismax
         mm = 50%
         qf = features^2 name^3
          q = +"apache solr" search server

First off, we consider the “markup” characters of the parser that appear in this q string:

  • whitespace – divides the input string into chunks (tokenization)

  • quotes – make a single phrase chunk (grouping)

  • + – makes a chunk mandatory (clause combination)

So we have 3 “chunks” of user input:

  • “apache solr” (must match)

  • “search” (should match)

  • “server” (should match)

If we “multiply” that with our qf list (features, name) we get a matrix like this…

    features:"apache solr"    name:"apache solr"    (must match)
    features:"search"         name:"search"         (should match)
    features:"server"         name:"server"         (should match)
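The “multiplication” above can be sketched in a few lines of Python. This is only an illustration of the idea, not Solr’s parser: the regular expression and the helper names (split_chunks, parse_qf, field_chunk_matrix) are made up for this example, and real dismax parsing handles escaping and analysis that this sketch ignores.

```python
import re

# A chunk is an optional +/- sign followed by a quoted phrase or a bare word.
CHUNK = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def split_chunks(q):
    """Return (text, mandatory) pairs: whitespace splits, quotes group, + marks mandatory."""
    return [(phrase or word, sign == "+") for sign, phrase, word in CHUNK.findall(q)]


def parse_qf(qf):
    """Turn 'features^2 name^3' into [('features', 2.0), ('name', 3.0)]."""
    fields = []
    for item in qf.split():
        name, _, boost = item.partition("^")
        fields.append((name, float(boost) if boost else 1.0))
    return fields


def field_chunk_matrix(q, qf):
    """One row per chunk, one cell per qf field."""
    fields = parse_qf(qf)
    return [
        ("must" if mandatory else "should",
         [f'{name}:"{text}"^{boost:g}' for name, boost in fields])
        for text, mandatory in split_chunks(q)
    ]


for occur, row in field_chunk_matrix('+"apache solr" search server', "features^2 name^3"):
    print(occur, row)
```

Each printed row corresponds to one row of the matrix, and each row is what later becomes a single DisjunctionMaxQuery.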

If we then factor in the mm param to determine the “minimum number of ‘Should Match’ clauses that (ahem) must match” (50% of 2 == 1) we get the following query structure (in pseudo-code)…

    q = BooleanQuery(
      minNumberShouldMatch => 1,
      booleanClauses => ClauseList(
        MustMatch(DisjunctionMaxQuery(
          PhraseQuery("features","apache solr")^2,
          PhraseQuery("name","apache solr")^3)
        ),
        ShouldMatch(DisjunctionMaxQuery(
          TermQuery("features","search")^2,
          TermQuery("name","search")^3)
        ),
        ShouldMatch(DisjunctionMaxQuery(
          TermQuery("features","server")^2,
          TermQuery("name","server")^3))
    ));
Note: the boolean query is the most basic, atomic query; other advanced queries are combinations or wrappers built on top of it, and DisMax is no exception. Judging from how the dismax query parser decomposes the input, dismax is also broken down into boolean queries, and its field boosts work the same way as ordinary per-field boosts. The difference is that dismax scores each chunk by its highest-scoring field, whereas ordinary independent multi-field boosting sums the field scores.
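The difference between taking the maximum and taking the sum is easy to see with numbers. The sketch below uses made-up per-field scores for a single chunk (they are not real Lucene scores) and ignores the tie parameter.

```python
def boolean_sum(field_scores):
    """Score a chunk the way plain multi-field boosting would: add up every field."""
    return sum(field_scores.values())


def disjunction_max(field_scores):
    """Score a chunk the way DisjunctionMaxQuery does: keep only the best field."""
    return max(field_scores.values(), default=0.0)


# Illustrative, already-boosted scores for one chunk in two documents.
doc_a = {"features": 0.5, "name": 0.5}    # mediocre match in both fields
doc_b = {"features": 0.0, "name": 0.75}   # strong match in one field only

for label, doc in (("A", doc_a), ("B", doc_b)):
    print(label, "sum =", boolean_sum(doc), "max =", disjunction_max(doc))
```

With summing, document A ranks first because two mediocre matches add up; with the maximum, document B ranks first because its single strong match wins.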

With me so far, right?

Where people tend to get tripped up is in thinking about how Solr’s per-field analysis configuration (in schema.xml) impacts all of this. Our example above was pretty straightforward, but let’s consider for a moment what might happen if:

  • The name field uses the WordDelimiterFilter at query time but features does not.

  • The features field is configured so that “the” is a stop word, but name is not.

Now let’s look at what we get when our input parameters are structurally similar to what we had before, but just different enough for WordDelimiterFilter and StopFilter to come into play…

    defType = dismax
         mm = 50%
         qf = features^2 name^3
          q = +"apache solr" the search-server

Our resulting query is going to be something like…

    q = BooleanQuery(
      minNumberShouldMatch => 1,
      booleanClauses => ClauseList(
        MustMatch(DisjunctionMaxQuery(
          PhraseQuery("features","apache solr")^2,
          PhraseQuery("name","apache solr")^3)
        ),
        ShouldMatch(DisjunctionMaxQuery(
          TermQuery("name","the")^3)
        ),
        ShouldMatch(DisjunctionMaxQuery(
          TermQuery("features","search-server")^2,
          PhraseQuery("name","search server")^3))
      ));

The use of WordDelimiterFilter hasn’t changed things very much: features is treating “search-server” as a single Term, while in the name field we are searching for the phrase “search server”. Hopefully this shouldn’t surprise anyone given the use of WordDelimiterFilter for the name field (presumably that’s why it’s being used). This DisjunctionMaxQuery still “makes sense”, but other fields with odd analysis that produce fewer or more Tokens than a “typical” field for the same chunk might produce queries that aren’t as easy to understand. In particular, consider what has happened in our example with the word “the”: because “the” is a stop word in the features field, no Query object is produced for that field/chunk combination. But a Query is produced for the name field, which means the total number of “Should Match” clauses in our top level query is still 2, so our minNumberShouldMatch is still 1 (50% of 2 == 1).

This type of situation tends to confuse a lot of people: since “the” is a stop word in one field, they don’t expect it to matter in the final query. But as long as at least one qf field produces a Token for it (name in our example) it will be included in the final query, and will contribute to the count of “Should Match” clauses.
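The clause counting described above can be sketched as follows. The two analyzer functions are toy stand-ins for the schema.xml analysis chains (a stop word check for features, a hyphen split in place of WordDelimiterFilter for name), and the rounded-down percentage mirrors the “50% of 2 == 1” arithmetic from the example; none of this is Solr’s actual code.

```python
def analyze_features(chunk):
    """Toy analyzer: 'the' is a stop word, nothing else is changed."""
    return [] if chunk.lower() == "the" else [chunk]


def analyze_name(chunk):
    """Toy analyzer: crude stand-in for WordDelimiterFilter, splits on hyphens."""
    return [part for part in chunk.split("-") if part]


ANALYZERS = {"features": analyze_features, "name": analyze_name}


def build_clauses(chunks):
    """chunks is a list of (text, mandatory); returns (occur, {field: tokens}) per kept chunk."""
    clauses = []
    for text, mandatory in chunks:
        per_field = {}
        for field, analyze in ANALYZERS.items():
            tokens = analyze(text)
            if tokens:                      # a field with no tokens yields no sub-query
                per_field[field] = tokens
        if per_field:                       # one surviving field is enough to keep the clause
            clauses.append(("must" if mandatory else "should", per_field))
    return clauses


def min_should_match(clauses, percent):
    """Percentage of the 'should' clauses, rounded down."""
    should = sum(1 for occur, _ in clauses if occur == "should")
    return should * percent // 100


chunks = [("apache solr", True), ("the", False), ("search-server", False)]
clauses = build_clauses(chunks)
for clause in clauses:
    print(clause)
print("minNumberShouldMatch =", min_should_match(clauses, 50))
```

The clause for “the” survives with only the name field in it, so there are still two “should” clauses and the minimum stays at 1.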

So, what’s the takeaway from all of this?

DisMax is a complicated creature. When using it, you need to consider all of its options carefully, and look at the debugQuery=true output while experimenting with different query strings and different analysis configurations to make really sure you understand how queries from your users will be parsed.

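Note: the way dismax builds queries is very complicated. When using it, consider every option carefully, and turn on debugQuery=true to check the parsed query against different query strings and analyzers.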

For qf (Query Fields), pf (Phrase Fields), mm (Minimum ‘Should’ Match), and tie (Tie Breaker), see: the Solr Wiki DisMaxQParserPlugin.


Solr: Forcing items with all query terms to the top of a Solr search (Robot Librarian)

http://robotlibrarian.billdueber.com/solr-forcing-items-with-all-query-terms-to-the-top-of-a-solr-search/



Lucid Imagination: Solr Powered ISFDB – Part #10: Tweaking Relevancy

http://searchhub.org/dev/2011/06/20/solr-powered-isfdb-part-10/

Lucid Imagination: Solr Powered ISFDB – Part #11: Using DisMax

http://searchhub.org/dev/2011/08/08/solr-powered-isfdb-part-11/


http://tm.durusau.net/?p=21573

Using Solr’s Dismax Tie Parameter (Another Word For It), on the tie breaker

http://java.dzone.com/articles/using-solrs-dismax-tie

