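To round out my understanding of trieField, here are three more documents that are quite systematic and comprehensive. See the links below; no further explanation from me.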


http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html

http://blog.csdn.net/fancyerii/article/details/7256379

http://hadoopcn.iteye.com/blog/1550402

http://rdc.taobao.com/team/jm/archives/1699

public final class NumericRangeQuery<T extends Number> extends MultiTermQuery

A Query that matches numeric values within a specified range. To use this, you must first index the numeric values using NumericField (expert: NumericTokenStream). If your terms are instead textual, you should use TermRangeQuery. NumericRangeFilter is the filter equivalent of this query.

You create a new NumericRangeQuery with the static factory methods, e.g.:

Query q = NumericRangeQuery.newFloatRange("weight", 0.03f, 0.10f, true, true);

matches all documents whose float-valued "weight" field ranges from 0.03 to 0.10, inclusive.

The performance of NumericRangeQuery is much better than the corresponding TermRangeQuery because the number of terms that must be searched is usually far fewer, thanks to trie indexing, described below.

You can optionally specify a precisionStep when creating this query. This is necessary if you've changed this configuration from its default (4) during indexing. Lower values consume more disk space but speed up searching. Suitable values are between 1 and 8. A good starting point to test is 4, which is the default value for all Numeric* classes. See below for details.

This query defaults to MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT for 32 bit (int/float) ranges with precisionStep ≤8 and 64 bit (long/double) ranges with precisionStep ≤6. Otherwise it uses MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE as the number of terms is likely to be high. With precision steps of ≤4, this query can be run with one of the BooleanQuery rewrite methods without changing BooleanQuery's default max clause count.
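The default selection described above can be written down as a small decision function. This is only a sketch of the stated rule: the enum and the method name `defaultRewrite` are hypothetical stand-ins for illustration, not Lucene types, and the thresholds (8 for 32 bit, 6 for 64 bit) are taken directly from the paragraph above.

```java
/** Sketch of the default rewrite rule described in the text; not Lucene code. */
final class RewriteDefaultSketch {

    /** Local stand-ins named after the MultiTermQuery constants mentioned above. */
    enum Rewrite { CONSTANT_SCORE_AUTO_REWRITE_DEFAULT, CONSTANT_SCORE_FILTER_REWRITE }

    /** valSize is 32 for int/float ranges and 64 for long/double ranges. */
    static Rewrite defaultRewrite(int valSize, int precisionStep) {
        if (valSize != 32 && valSize != 64) {
            throw new IllegalArgumentException("valSize must be 32 or 64");
        }
        // Threshold from the text: step <= 8 for 32 bit, step <= 6 for 64 bit.
        int limit = (valSize == 64) ? 6 : 8;
        return precisionStep <= limit
                ? Rewrite.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT
                : Rewrite.CONSTANT_SCORE_FILTER_REWRITE;
    }

    public static void main(String[] args) {
        System.out.println(defaultRewrite(64, 4));
        System.out.println(defaultRewrite(64, 8));
    }
}
```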

How it works

See the publication about panFMP, where this algorithm was described (referred to as TrieRangeQuery):

Schindler, U, Diepenbroek, M, 2008.
Generic XML-based Framework for Metadata Portals.
Computers & Geosciences 34 (12), 1947-1955. doi:10.1016/j.cageo.2008.02.023

A quote from this paper: Because Apache Lucene is a full-text search engine and not a conventional database, it cannot handle numerical ranges (e.g., field value is inside user defined bounds, even dates are numerical values). We have developed an extension to Apache Lucene that stores the numerical values in a special string-encoded format with variable precision (all numerical values like doubles, longs, floats, and ints are converted to lexicographic sortable string representations and stored with different precisions (for a more detailed description of how the values are stored, see NumericUtils)). A range is then divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
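To make the "exact edges, coarse center" idea concrete, here is a minimal sketch of such a split. It is an illustration under simplifying assumptions, not the library's implementation: it only handles non-negative values of at most 62 bits, and the names `TrieRangeSketch`, `SubRange`, and `split` are made up for this example. Each sub-range is matched by terms in which the lowest `shift` bits have been dropped.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative trie-style range splitting; a sketch, not Lucene's implementation. */
final class TrieRangeSketch {

    /** Inclusive value range [min, max] that is matched by terms of the given shift. */
    static final class SubRange {
        final long min;
        final long max;
        final int shift;

        SubRange(long min, long max, int shift) {
            this.min = min;
            this.max = max;
            this.shift = shift;
        }

        /** Number of distinct prefix terms needed to cover this sub-range. */
        long termCount() {
            return ((max - min) >>> shift) + 1;
        }
    }

    /** Splits [minBound, maxBound] (both inclusive, unsigned, valSize bits wide). */
    static List<SubRange> split(int valSize, int precisionStep, long minBound, long maxBound) {
        if (valSize < 1 || valSize > 62 || precisionStep < 1) {
            throw new IllegalArgumentException("need 1 <= valSize <= 62 and precisionStep >= 1");
        }
        if (minBound < 0 || maxBound >= (1L << valSize)) {
            throw new IllegalArgumentException("bounds must fit into valSize unsigned bits");
        }
        List<SubRange> out = new ArrayList<>();
        if (minBound > maxBound) {
            return out; // empty range, nothing to match
        }
        for (int shift = 0; ; shift += precisionStep) {
            long lowBits = (1L << shift) - 1; // bits already dropped at this precision
            if (shift + precisionStep >= valSize) {
                // Lowest precision reached: the rest is the "center" of the range.
                out.add(new SubRange(minBound, maxBound | lowBits, shift));
                return out;
            }
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1) << shift;
            boolean hasLower = (minBound & mask) != 0;    // lower edge not block-aligned
            boolean hasUpper = (maxBound & mask) != mask; // upper edge not block-aligned
            long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
            long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
            if (nextMin > nextMax) {
                // Nothing would be left for a coarser level: match the rest here.
                out.add(new SubRange(minBound, maxBound | lowBits, shift));
                return out;
            }
            if (hasLower) {
                out.add(new SubRange(minBound, minBound | mask | lowBits, shift));
            }
            if (hasUpper) {
                out.add(new SubRange(maxBound & ~mask, maxBound | lowBits, shift));
            }
            minBound = nextMin;
            maxBound = nextMax;
        }
    }

    public static void main(String[] args) {
        // 8-bit values, precision step 4, range [3, 200].
        for (SubRange r : split(8, 4, 3, 200)) {
            System.out.println("shift=" + r.shift + " [" + r.min + ", " + r.max
                    + "] terms=" + r.termCount());
        }
    }
}
```

In the `main` example the two edges are matched at full precision and everything in between at the coarse 4-bit precision, so far fewer terms are visited than the 198 distinct values the range contains.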

For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the lowest precision. Overall, a range could consist of a theoretical maximum of 7*255*2 + 255 = 3825 distinct terms (when there is a term for every distinct value of an 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used because it would always be possible to reduce the full 256 values to one term with degraded precision). In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records and a uniform value distribution).
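The quoted passage relies on numeric values being converted into a form whose sort order matches the numeric order. The sketch below shows one common way to do this for doubles, stopping at a sortable long rather than the final string encoding. It is an assumption-laden illustration: the class and method names are hypothetical, and NumericUtils remains the place to look for how the values are actually stored.

```java
/** Sketch: map doubles to longs whose signed order matches the numeric order. */
final class SortableBitsSketch {

    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Raw IEEE 754 bits of negative doubles sort in reverse order;
        // flipping the 63 non-sign bits restores ascending order.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    static double fromSortableLong(long sortable) {
        // The same flip undoes itself, because the sign bit is left untouched.
        if (sortable < 0) {
            sortable ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(sortable);
    }

    public static void main(String[] args) {
        double[] samples = {-2.0, -1.0, 0.0, 0.03, 0.10};
        for (double d : samples) {
            System.out.println(d + " -> " + toSortableLong(d));
        }
    }
}
```

Once values are in this form, range bounds can be compared and split bit by bit without special handling for the sign.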

Precision Step

You can choose any precisionStep when encoding values. Lower step values mean more precisions and so more terms in the index (and the index gets larger). On the other hand, the maximum number of terms to match is reduced, which optimizes query speed. The formula to calculate the maximum term count is:

n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1 ) * 2 ] + (2^precisionStep - 1 )

(this formula is only correct when bitsPerValue/precisionStep is an integer; in other cases, the value must be rounded up and the last summand must contain the modulo of the division as precision step). For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465, and for a precision step of 2, n = 31*3*2 + 3 = 189. But the faster search speed is reduced by more seeking in the term enum of the index. Because of this, the ideal precisionStep value can only be found out by testing. Important: You can index with a lower precision step value and test search speed using a multiple of the original step value.
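The formula is easy to check numerically. The following sketch evaluates it, including one reading of the non-integer case described in the parenthesis (round the number of levels up and use the remainder as the bit width of the last summand). The method name `maxTerms` is hypothetical, and the sketch only accepts steps from 1 to 32.

```java
/** Sketch: evaluates the maximum term count formula from the text. */
final class MaxTermCountSketch {

    static long maxTerms(int bitsPerValue, int precisionStep) {
        if (bitsPerValue < 1 || bitsPerValue > 64 || precisionStep < 1 || precisionStep > 32) {
            throw new IllegalArgumentException("need 1 <= bitsPerValue <= 64 and 1 <= precisionStep <= 32");
        }
        // Number of precision levels, rounded up when the division is not exact.
        int levels = (bitsPerValue + precisionStep - 1) / precisionStep;
        // Bit width of the coarsest level: the remainder, or a full step if exact.
        int remainder = bitsPerValue % precisionStep;
        int topBits = (remainder == 0) ? precisionStep : remainder;
        long perLevel = (1L << precisionStep) - 1; // 2^precisionStep - 1
        long top = (1L << topBits) - 1;            // last summand
        return (levels - 1) * perLevel * 2 + top;
    }

    public static void main(String[] args) {
        System.out.println("64 bit, step 4: " + maxTerms(64, 4));
        System.out.println("64 bit, step 2: " + maxTerms(64, 2));
        System.out.println("64 bit, step 8: " + maxTerms(64, 8));
    }
}
```

For 64 bit values this reproduces the figures quoted in the text for steps 4, 2, and 8 (465, 189, and 3825).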

Good values for precisionStep depend on usage and data type:

  • The default for all data types is 4, which is used when
    no precisionStep is given.

  • Ideal value in most cases for 64 bit data types
    (long, double) is 6 or 8.

  • Ideal value in most cases for 32 bit data types
    (int, float) is 4.

  • For low cardinality fields larger precision steps are good. If
    the cardinality is < 100, it is fair to use
    Integer.MAX_VALUE (see below).

  • Steps ≥64 for long/double and ≥32 for
    int/float produce one token per value in the index and
    querying is as slow as a conventional TermRangeQuery. But it can
    be used to produce fields that are solely used for sorting (in
    this case simply use Integer.MAX_VALUE as precisionStep).
    Using NumericFields for sorting is ideal, because
    building the field cache is much faster than with text-only
    numbers. These fields have one term per value and therefore also
    work with term enumeration for building distinct lists (e.g. facets
    / preselected values to search for). Sorting is also possible with
    range query optimized fields using one of the above
    precisionSteps.
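The index-size side of these recommendations can be estimated with a one-line calculation: each value is indexed once per precision level. The sketch below assumes one term per level, which matches the "8 different precisions" and "one token per value" statements in the text. The method name `termsPerValue` is hypothetical.

```java
/** Sketch: how many terms one numeric value contributes to the index. */
final class TermsPerValueSketch {

    static int termsPerValue(int bitsPerValue, int precisionStep) {
        if (bitsPerValue < 1 || precisionStep < 1) {
            throw new IllegalArgumentException("bitsPerValue and precisionStep must be >= 1");
        }
        // A step at least as wide as the value leaves only the full-precision term.
        if (precisionStep >= bitsPerValue) {
            return 1;
        }
        // Otherwise one term per precision level, rounded up.
        return (bitsPerValue + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        System.out.println("long, step 4: " + termsPerValue(64, 4));
        System.out.println("long, step 8: " + termsPerValue(64, 8));
        System.out.println("long, step Integer.MAX_VALUE: " + termsPerValue(64, Integer.MAX_VALUE));
    }
}
```

A sort-only field indexed with Integer.MAX_VALUE therefore costs a single term per value, while the default step of 4 costs 16 terms for a long and 8 for an int.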

Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed that TermRangeQuery in boolean rewrite mode (with raised BooleanQuery clause count) took about 30-40 secs to complete, TermRangeQuery in constant score filter rewrite mode took 5 secs and executing this class took <100ms to complete (on an Opteron64 machine, Java 1.5, 8 bit precision step). This query type was developed for a geographic portal, where the performance for e.g. bounding boxes or exact date/time stamps is important.

Since: 2.9