Thursday, May 18, 2017

Lucene/Solr Test inconsistent endOffset

I'm using BaseTokenStreamTestCase to run some tests against a custom TokenFilter.

The test is failing in a way I can't explain. As the debug output below shows, the token it complains about has an endOffset of 17:

inconsistent endOffset 1 pos=1 posLen=1 token=hello expected:<11> but was:<17>

   original: wheel chair hello there foo bar
  increment:      1        1     1      1   
     tokens: wheel chair hello there foo bar
  positions: ----------- ----- ----- -------
    lengths:      2        1     1      2   
   sequence:      1        2     3      4   
             0123456789012345678901234567890
                      10        20        30
  start-end: 1:[0-11], 2:[12-17], 3:[18-23], 4:[24-31]

Here's the test code:

assertAnalyzesTo(analyzer, input,
        new String[] {"wheel chair", "hello", "there", "foo bar"},
        new int[] {0, 12, 18, 24},  // start offsets
        new int[] {11, 17, 23, 31}, // end offsets
        null,                       // types
        new int[] {1, 1, 1, 1},     // positionIncrement
        new int[] {2, 1, 1, 2});    // positionLength

Why does it think the second token should end at 11?
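One way to look into this is to reproduce the bookkeeping outside Lucene. The sketch below assumes, as far as I understand the check, that the test case remembers the end offset first seen for each end position (pos + posLength) and requires every later token ending at that same position to report the same end offset. The class and method names are hypothetical and are not part of the Lucene API; this is a simplified reconstruction, not the library's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the consistency check; not Lucene code.
final class EndOffsetCheckSketch {

    /** Returns a description of the first inconsistent end offset, or null if there is none. */
    static String firstInconsistency(String[] terms, int[] endOffsets,
                                     int[] posIncs, int[] posLens) {
        // Assumption: one end offset is recorded per end position.
        Map<Integer, Integer> endPosToEndOffset = new HashMap<>();
        int pos = -1; // position before the first token
        for (int i = 0; i < terms.length; i++) {
            pos += posIncs[i];               // position where this token starts
            int endPos = pos + posLens[i];   // position where this token ends
            Integer seen = endPosToEndOffset.putIfAbsent(endPos, endOffsets[i]);
            if (seen != null && seen != endOffsets[i]) {
                return "token=" + terms[i] + " pos=" + pos + " posLen=" + posLens[i]
                        + " expected:<" + seen + "> but was:<" + endOffsets[i] + ">";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] terms = {"wheel chair", "hello", "there", "foo bar"};
        int[] endOffsets = {11, 17, 23, 31};
        int[] posIncs = {1, 1, 1, 1};
        int[] posLens = {2, 1, 1, 2};
        // Prints: token=hello pos=1 posLen=1 expected:<11> but was:<17>
        System.out.println(firstInconsistency(terms, endOffsets, posIncs, posLens));
    }
}
```

Under that assumption, "wheel chair" starts at position 0 with length 2 and "hello" starts at position 1 with length 1, so both end at position 2. The first one recorded end offset 11 for that position, which would explain why 11 is then expected for "hello".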
