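To round out my understanding of TrieField, here are three articles that cover it systematically and thoroughly. See the links below; I won't explain further.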
http://blog.csdn.net/fancyerii/article/details/7256379
http://hadoopcn.iteye.com/blog/1550402
http://rdc.taobao.com/team/jm/archives/1699
public final class NumericRangeQuery<T extends Number>
extends MultiTermQuery
A Query that matches numeric values within a specified range. To use this, you must first index the numeric values using NumericField (expert: NumericTokenStream). If your terms are instead textual, you should use TermRangeQuery. NumericRangeFilter is the filter equivalent of this query.
You create a new NumericRangeQuery with the static factory methods, e.g.:
Query q = NumericRangeQuery.newFloatRange("weight", 0.03f, 0.10f, true, true);
matches all documents whose float-valued "weight" field ranges from 0.03 to 0.10, inclusive.
The performance of NumericRangeQuery is much better than the corresponding TermRangeQuery because the number of terms that must be searched is usually far fewer, thanks to trie indexing, described below.
You can optionally specify a precisionStep when creating this query. This is necessary if you've changed this configuration from its default (4) during indexing. Lower values consume more disk space but speed up searching. Suitable values are between 1 and 8. A good starting point to test is 4, which is the default value for all Numeric* classes. See below for details.
This query defaults to MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT for 32 bit (int/float) ranges with precisionStep ≤ 8 and 64 bit (long/double) ranges with precisionStep ≤ 6. Otherwise it uses MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE as the number of terms is likely to be high. With precision steps of ≤ 4, this query can be run with one of the BooleanQuery rewrite methods without changing BooleanQuery's default max clause count.
How it works
See the publication about panFMP, where this algorithm was described (referred to as TrieRangeQuery):
Schindler, U, Diepenbroek, M, 2008. Generic XML-based Framework for Metadata Portals. Computers & Geosciences 34 (12), 1947-1955. doi:10.1016/j.cageo.2008.02.023
A quote from this paper: Because Apache Lucene is a full-text search engine and not a conventional database, it cannot handle numerical ranges (e.g., field value is inside user defined bounds, even dates are numerical values). We have developed an extension to Apache Lucene that stores the numerical values in a special string-encoded format with variable precision (all numerical values like doubles, longs, floats, and ints are converted to lexicographic sortable string representations and stored with different precisions (for a more detailed description of how the values are stored, see NumericUtils)). A range is then divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
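The recursive splitting described in the quote can be sketched in plain Java. This is only an illustration of the idea, not Lucene's NumericUtils code: the class name TrieRangeSplitSketch and its methods are made up for this post, and the sketch assumes non-negative values that fit in at most 32 bits, so it ignores sign handling and the string encoding of terms.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only (hypothetical class, not part of Lucene). */
class TrieRangeSplitSketch {

    /**
     * Splits the inclusive range [min, max] into runs of prefix terms.
     * Each result is {shift, firstPrefix, lastPrefix}: at precision level
     * "shift" the query visits every prefix (value >>> shift) in that run.
     * Assumes 0 <= min <= max < 2^bits, bits <= 32, 1 <= precisionStep <= 30.
     */
    static List<long[]> split(int bits, int precisionStep, long min, long max) {
        if (bits < 1 || bits > 32 || precisionStep < 1 || precisionStep > 30
                || min < 0 || min > max || max >= (1L << bits)) {
            throw new IllegalArgumentException("arguments outside the sketch's assumptions");
        }
        List<long[]> out = new ArrayList<>();
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;   // lower edge not aligned at this level
            boolean hasUpper = (max & mask) != mask; // upper edge not aligned at this level
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;

            if (shift + precisionStep >= bits || nextMin > nextMax) {
                // Coarsest level reached, or no aligned middle part is left.
                out.add(new long[] {shift, min >>> shift, max >>> shift});
                return out;
            }
            // Match the ragged edges exactly at the current precision ...
            if (hasLower) out.add(new long[] {shift, min >>> shift, (min | mask) >>> shift});
            if (hasUpper) out.add(new long[] {shift, (max & ~mask) >>> shift, max >>> shift});
            // ... and continue with the aligned middle at the next coarser precision.
            min = nextMin;
            max = nextMax;
        }
    }

    /** Total number of prefix terms over all runs. */
    static long countTerms(List<long[]> runs) {
        long n = 0;
        for (long[] r : runs) n += r[2] - r[1] + 1;
        return n;
    }

    public static void main(String[] args) {
        List<long[]> runs = split(8, 4, 1, 254);
        for (long[] r : runs) {
            System.out.println("shift=" + r[0] + " prefixes " + r[1] + ".." + r[2]);
        }
        System.out.println("terms: " + countTerms(runs));
    }
}
```

For 8-bit values with a step of 4, the range 1..254 needs 15 exact terms on each edge plus 14 coarse terms in the middle, 44 in total, where a plain term range would enumerate up to 254 distinct values.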
For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the lowest precision. Overall, a range could consist of a theoretical maximum of 7*255*2 + 255 = 3825 distinct terms (when there is a term for every distinct value of an 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used because it would always be possible to reduce the full 256 values to one term with degraded precision). In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records and a uniform value distribution).
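To make "8 different precisions, each reduced by 8 bits" concrete, here is a small sketch that lists the prefixes one long value would contribute. The class TriePrefixTermsSketch is hypothetical, and the output format is invented for readability; Lucene's real terms are string-encoded and handle the sign bit, which this sketch deliberately skips.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only (hypothetical class, not part of Lucene). */
class TriePrefixTermsSketch {

    /**
     * Returns one entry per precision level for a 64-bit value: the value with
     * its lowest "shift" bits dropped, for shift = 0, step, 2*step, ...
     */
    static List<String> prefixTerms(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<String> terms = new ArrayList<>();
        for (long shift = 0; shift < 64; shift += precisionStep) {
            terms.add("shift=" + shift + " prefix=0x" + Long.toHexString(value >>> (int) shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        // With a step of 8, a long is indexed at 8 precision levels.
        for (String t : prefixTerms(0x0102030405060708L, 8)) {
            System.out.println(t);
        }
    }
}
```

The coarsest level keeps only the top byte, which is why that level can hold at most 256 distinct terms no matter how many documents are indexed.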
Precision Step
You can choose any precisionStep when encoding values. Lower step values mean more precisions and so more terms in the index (and the index gets larger). On the other hand, the maximum number of terms to match is reduced, which optimizes query speed. The formula to calculate the maximum term count is:
n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1) * 2 ] + (2^precisionStep - 1)
(this formula is only correct when bitsPerValue/precisionStep is an integer; in other cases, the value must be rounded up and the last summand must contain the modulo of the division as precision step). For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465, and for a precision step of 2, n = 31*3*2 + 3 = 189. But the faster search speed is reduced by more seeking in the term enum of the index. Because of this, the ideal precisionStep value can only be found out by testing.
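The formula is easy to check with a few lines of Java. This is a sketch of the arithmetic only; TrieMaxTermsSketch is a made-up helper, and the handling of the non-divisible case is my reading of the parenthetical remark (round the division up, and use the remainder as the step in the last summand).

```java
/** Illustrative sketch only (hypothetical class, not part of Lucene). */
class TrieMaxTermsSketch {

    /**
     * Theoretical maximum number of terms a range can expand to.
     * Assumes 1 <= bitsPerValue <= 64 and 1 <= precisionStep <= 30.
     */
    static long maxTerms(int bitsPerValue, int precisionStep) {
        if (bitsPerValue < 1 || bitsPerValue > 64 || precisionStep < 1 || precisionStep > 30) {
            throw new IllegalArgumentException("arguments outside the sketch's assumptions");
        }
        // Number of precision levels, rounded up when the division is not exact.
        int levels = (bitsPerValue + precisionStep - 1) / precisionStep;
        // The coarsest level only covers the leftover bits when the division is not exact.
        int remainder = bitsPerValue % precisionStep;
        int topBits = (remainder == 0) ? precisionStep : remainder;

        long perEdge = (1L << precisionStep) - 1; // terms per level on one edge
        long top = (1L << topBits) - 1;           // terms at the coarsest level
        return (levels - 1) * perEdge * 2 + top;
    }

    public static void main(String[] args) {
        System.out.println("long, step 8: " + maxTerms(64, 8));
        System.out.println("long, step 4: " + maxTerms(64, 4));
        System.out.println("long, step 2: " + maxTerms(64, 2));
    }
}
```

For longs this reproduces the figures in the text: 3825 for a step of 8, 465 for a step of 4, and 189 for a step of 2.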
Important: You can index with a lower precision step value and test search speed using a multiple of the original step value.
Good values for precisionStep depend on usage and data type:
- The default for all data types is 4, which is used when no precisionStep is given.
- Ideal value in most cases for 64 bit data types (long, double) is 6 or 8.
- Ideal value in most cases for 32 bit data types (int, float) is 4.
- For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use Integer.MAX_VALUE (see below).
- Steps ≥ 64 for long/double and ≥ 32 for int/float produce one token per value in the index, and querying is as slow as a conventional TermRangeQuery. But it can be used to produce fields that are solely used for sorting (in this case simply use Integer.MAX_VALUE as precisionStep).
Using NumericFields for sorting is ideal, because building the field cache is much faster than with text-only numbers. These fields have one term per value and therefore also work with term enumeration for building distinct lists (e.g. facets / preselected values to search for). Sorting is also possible with range query optimized fields using one of the above precisionSteps.
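The index-size side of these recommendations comes down to how many terms each value contributes. A minimal sketch, assuming one term per precision level and a single term once the step covers the whole value (TrieTermsPerValueSketch is a hypothetical helper, not a Lucene class):

```java
/** Illustrative sketch only (hypothetical class, not part of Lucene). */
class TrieTermsPerValueSketch {

    /** Number of terms indexed per value for a given value width and precision step. */
    static int termsPerValue(int bitsPerValue, int precisionStep) {
        if (bitsPerValue < 1 || precisionStep < 1) {
            throw new IllegalArgumentException("bitsPerValue and precisionStep must be >= 1");
        }
        if (precisionStep >= bitsPerValue) {
            return 1; // e.g. Integer.MAX_VALUE: a sort-only field with one term per value
        }
        return (bitsPerValue + precisionStep - 1) / precisionStep; // division rounded up
    }

    public static void main(String[] args) {
        int[] steps = {4, 6, 8, Integer.MAX_VALUE};
        for (int step : steps) {
            System.out.println("long, step " + step + ": " + termsPerValue(64, step) + " terms per value");
        }
    }
}
```

For a long this gives 16 terms per value at step 4, 11 at step 6, 8 at step 8, and 1 at Integer.MAX_VALUE.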
Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed that TermRangeQuery in boolean rewrite mode (with raised BooleanQuery clause count) took about 30-40 secs to complete, TermRangeQuery in constant score filter rewrite mode took 5 secs, and executing this class took < 100 ms to complete (on an Opteron64 machine, Java 1.5, 8 bit precision step). This query type was developed for a geographic portal, where the performance for e.g. bounding boxes or exact date/time stamps is important.
Since: 2.9