Relieving@Sharing: February 2013

Friday, February 22, 2013

Muddiest Point after week 6

1. Is there an existing system for evaluating an information retrieval model? For example, in the final project we are supposed to construct a model; how can we claim that it is good or bad?
2. How do we evaluate a multilingual information retrieval model?

Relevance feedback

Relevance feedback is an information retrieval system feature that helps improve the search query using judgments supplied explicitly by the user (explicit feedback), observations of the user's interaction with the system (implicit feedback), or automatic local analysis (blind feedback or pseudo feedback). Relevance feedback is very helpful for either adjusting the weights of the original query terms or adding new terms that are closer to what the user is searching for.
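To make the re-weighting and term-adding idea concrete, here is a minimal sketch using the Rocchio formula, which is one common way to apply explicit feedback (it is my choice for illustration, not something stated in the notes above). The function name `rocchio`, the toy vectors, and the default values of `alpha`, `beta`, and `gamma` are all assumptions.

```python
from collections import Counter

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a re-weighted query vector (term -> weight).

    query, and each document in relevant / non_relevant, is a dict
    mapping a term to its weight. alpha, beta, and gamma are tuning
    constants; the defaults here are illustrative assumptions.
    """
    new_query = Counter()
    # Keep the original query terms, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Move toward the centroid of the documents judged relevant.
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    # Move away from the centroid of the documents judged non-relevant.
    for doc in non_relevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(non_relevant)
    # Drop terms whose weight is zero or negative.
    return {t: w for t, w in new_query.items() if w > 0}

# Hypothetical toy data: the user marks two documents relevant, one not.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "habitat": 1.0}]
non_relevant = [{"jaguar": 1.0, "car": 1.0}]

expanded = rocchio(query, relevant, non_relevant)
print(expanded)
```

In this toy run the weight of the original term "jaguar" changes, the new terms "cat" and "habitat" are added, and "car" is left out because it only appears in the non-relevant document.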

Friday, February 15, 2013

Muddiest Point after week 5

In class, Dr. He talked about two choices among language models: unigram versus higher-order models, and multinomial versus multiple-Bernoulli. He also talked about three models for ranking: query likelihood, document likelihood, and divergence of query and document models. I am wondering what factors need to be taken into consideration when choosing a language model or a ranking model.
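As a note to myself, here is a minimal sketch of query-likelihood ranking with a unigram multinomial model. The smoothing method (linear interpolation with the collection model, weight `lam`), the function name, and the two toy documents are assumptions for illustration and were not specified in class.

```python
import math
from collections import Counter

def query_log_likelihood(query_terms, doc_terms, collection_counts,
                         collection_len, lam=0.5):
    """Log P(query | document) under a smoothed unigram model.

    lam is the weight given to the collection model; 0.5 is an
    arbitrary illustrative value.
    """
    doc_counts = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / len(doc_terms)
        p_col = collection_counts[term] / collection_len
        p = (1 - lam) * p_doc + lam * p_col
        if p == 0:
            # The term does not occur anywhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score

# Hypothetical toy collection.
docs = {
    "d1": "relevance feedback improves the query".split(),
    "d2": "the boolean model has no ranking".split(),
}
collection_counts = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(len(terms) for terms in docs.values())

query = "query feedback".split()
ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(
        query, docs[d], collection_counts, collection_len),
    reverse=True,
)
print(ranking)
```

Smoothing is what keeps d2 from getting a probability of zero just because it lacks the query terms, so it is one of the factors that has to be chosen along with the model itself.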

One criticism of Information Retrieval Evaluation

I think IR evaluation needs to come from the user side. This is generally difficult because different users may understand the returned document set differently and thus disagree about which documents are relevant. Because the subjectiveness associated with deciding relevance is difficult to dismiss, IR evaluation lacks a solid formal framework as its foundation. I guess that is why user-oriented measures are used. I am wondering under what circumstances we would use the different measures, such as the harmonic mean and the E measure.
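For reference, here is a small sketch of the two measures as I understand their usual textbook definitions: the harmonic mean F = 2PR / (P + R) and the E measure E = 1 - (1 + b^2) / (b^2 / R + 1 / P), where P is precision, R is recall, and b is a user-chosen parameter. The function names and the sample values are my own assumptions.

```python
def harmonic_mean(precision, recall):
    """Harmonic mean F of precision and recall (0 if either is 0)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def e_measure(precision, recall, b=1.0):
    """E measure; b sets the relative weight of recall and precision."""
    if precision == 0 or recall == 0:
        return 1.0
    return 1 - (1 + b ** 2) / (b ** 2 / recall + 1 / precision)

# Hypothetical values for one query.
p, r = 0.5, 1.0
print(harmonic_mean(p, r))
print(e_measure(p, r, b=1.0))   # with b = 1, E is 1 - F
print(e_measure(p, r, b=0.0))   # with b = 0, E reduces to 1 - P
```

Under this formula, b = 1 makes E the complement of the harmonic mean, b = 0 reduces it to 1 - P, and a very large b pushes it toward 1 - R, so the choice of b reflects which of the two the user cares about more.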

Friday, February 1, 2013

Modeling

I wrote down the following comparison as my notes on classic modeling.

Boolean Model

Advantages:

Allows for logic
Returns every document that matches the query

Disadvantages:

Has no particular order of output
Treats all retrieved documents equally, with no distinction between the most and least relevant ones
Often requires examination of large output
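A minimal sketch of Boolean retrieval over a toy inverted index, to illustrate the points above. The three documents and the variable names are made up for this example.

```python
# Hypothetical toy collection: doc id -> text.
docs = {
    1: "information retrieval evaluation",
    2: "boolean retrieval model",
    3: "vector space model",
}

# Inverted index: term -> set of ids of the documents containing it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

# retrieval AND model
both = index.get("retrieval", set()) & index.get("model", set())
# retrieval OR model
either = index.get("retrieval", set()) | index.get("model", set())
# NOT boolean
not_boolean = set(docs) - index.get("boolean", set())

print(both, either, not_boolean)
```

Each result is a plain set: a document either matches or it does not, and nothing in the output says which match is better. That is where the lack of ordering comes from.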

Vector Model

Advantages:

Returns ranked retrieval
Terms are weighted by importance
Allows partial matches

Disadvantages:

Assumes terms are independent
Weighting is intuitive, but not very formal
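A minimal sketch of ranking in the vector model, assuming tf-idf term weights and cosine similarity (a common choice, though other weighting schemes exist). The toy documents, the query, and the function names are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build a tf-idf vector (term -> weight) for every document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for terms in docs.values() for t in set(terms))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = {
        d: {t: count * idf[t] for t, count in Counter(terms).items()}
        for d, terms in docs.items()
    }
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy collection.
docs = {
    "d1": "vector model ranks documents".split(),
    "d2": "boolean model matches documents exactly".split(),
    "d3": "weather report".split(),
}
vectors, idf = tf_idf_vectors(docs)

query_terms = "vector model".split()
query_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(query_terms).items()}

scores = {d: cosine(query_vec, vec) for d, vec in vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Here d2 shares only one of the two query terms with the query, so it is a partial match with a low but nonzero score, while d3 shares none and scores zero. The scores only order the documents relative to each other.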

My question about the vector model is that it is hard to see how a document that is 10% relevant to the query conveys less meaningful information for resolving the information need than one that is 40% relevant. Is there a clear difference between them in terms of fulfilling the query need?