Quantcast
Channel: Forum SQL Server Database Engine
Viewing all articles
Browse latest Browse all 15694

Containstable queries not making sense.

$
0
0
I have a few questions related to using CONTAINSTABLE in a query that I hope someone can help with.

I am working on a project to add document search capabilities to my companies product using fulltext indexing. Part of this requirement is an ability to breakdown the component parts of of the search query and provide information on *why* documentX ranked higher than documentY.
This is a bit convoluted, but taking this (very simple) example - the user wishes to search for 2 skills - "HTML" and/or "XML".
The generated query looks a little like :-

Select DOC.DOC_ID, RANK1.RANK, RANK2.RANK, RANK3.RANK
from DOCS DOC
  inner join CONTAINSTABLE(docs, doc, 'HTML AND XML') as RANK1 on RANK1.DOC_ID=DOC.DOC_ID
  inner join CONTAINSTABLE(docs, doc, 'HTML') as RANK2 on RANK2.DOC_ID=DOC.DOC_ID 
  inner join CONTAINSTABLE(docs, doc, 'XML') as RANK3 on RANK3.DOC_ID=DOC.DOC_ID

This returns the "overall" rank, and a rank for the 2 component parts, so I can say this doc ranked XXX overall because it scored "rank1" for HTML and scored "rank2" for XML  etc....

My question on this part is about the values for the "overall rank". If the query contained an OR it always seems to return the highest of the "rankX" values, and if it doesnt, it returns the lowest.
e.g. for the example
for  java and word and excel and access - the overall ranking is 2 , java=36, word=2, excel=16 and access=36
for  java and word or excel and access - the overall ranking is 16 , java=36, word=2, excel=16 and access=36
for  (java and word) or (excel and access) - the overall ranking is 16 , java=36, word=2, excel=16 and access=36

So in the first example, regardless of what the other values are, the rank returned is always 2 (the score for "word"). My resultset has 100ish rows, all with a rank of < 5 for word, but all with ranks of 18-100 for the other 3 values - yet the "overall" rank always matched the "word" rank.....??
This doesnt feel right to me somehow, I would expect a different value as if the document ranked really highly for one value but low for the other, it doesnt feel right the value is clamped to the lowest? Or am I just understanding it wrong?
If I use "freetexttable" the overall rank is a little more meaningful - but unfortunately I also need to use weighting, which brings me to my next question . . .


This question is about rankings returned from the ISABOUT function.
In the following example,
select * from documents as DOC
    inner join containstable(docs,doc,'project') as doc0 on DOC.DOC_ID=doc0."key"
    inner join containstable(docs,doc,'ISABOUT (project weight (1.0))') as doc1 on DOC.DOC_ID=doc1."key"
    inner join containstable(docs,doc,'ISABOUT (project weight (0.5))') as doc2 on DOC.DOC_ID=doc2."key"
    inner join containstable(docs,doc,'ISABOUT (project weight (0.1))') as doc3 on DOC.DOC_ID=doc3."key"
    inner join containstable(docs,doc,'ISABOUT (project weight (0.0))') as doc4 on DOC.DOC_ID=doc4."key"
order by doc0.rank desc

The values I get from the doc1/2/3/4.RANK columns dont seem right.
In this example,
doc0.rank = 133
doc1.rank = 150
doc2.rank = 330
doc3.rank = 924
doc4.rank = 0

These values dont make any sense to me, as the rank seems to go UP when the documentation on ISABOUT says it goes down (I think it says somewhere the calculated rank is multiplied by the weight?).
Once again, is there something I missed or am I understanding it wrong?

Thanks in advance for any help into understanding the whys of this...






Viewing all articles
Browse latest Browse all 15694

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>