Wednesday, October 23, 2013

Research Perspective from Google

October 2 @CMU

I was in the talk by Alfred Z. Spector, Vice President of Research & Special Initiatives at Google. He talked about some of their key research initiatives and projects in natural language, vision, user experience, and systems. I am quite interested in their efforts in the education domain, as that's my primary research interest. He introduced their crowd education platform CourseBuilder, which is, in my view, an excellent product, even better than Coursera, which has become overwhelmingly popular recently. It does not have a better UI or more features than Coursera, but I appreciate the idea that the best way to learn something is to teach it. So everyone is encouraged to create a whole course on their particular expertise, which sounds more interesting to me, since on Coursera only distinguished lecturers with the support of a world-class university can develop a course. An interesting question is which one will succeed in the end, CourseBuilder or Coursera, collective intelligence or the individual expert?

Friday, March 29, 2013

Muddiest Point for week 10

In class Dr. He talked about web transaction log studies. Due to privacy and trade-secret concerns, most of these logs are not public, even for research purposes. I am concerned that, in this case, industry will probably come to lead the field, not only in information retrieval but also in other domains such as social network analysis, since companies possess the huge amounts of data needed to conduct research. So my question is: how can academic researchers deal with this situation and still make breakthroughs?

Reading Note for Week 11

The main idea of clustering is to divide a set into subsets according to shared properties. In the domain of information retrieval, we try to divide documents into categories. Each document can be represented as a vector over the terms in the collection. Documents within a cluster are supposed to be as similar as possible to each other. The key in this task is how to measure the similarity between document representations. Distance measures are primarily used, along with algorithms such as the most popular one, K-Means, as well as hierarchical and spectral algorithms. The other topic discussed in the book is text classification. It differs from clustering, where documents are divided into subsets without knowing anything about the categories and the machine has to decide both how to divide them and where to put each document; in text classification, the categories are predefined and the machine only has to decide where to put each document. Naive Bayes and SVM algorithms are used in this task. A small sketch of both tasks follows below.
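To make the contrast concrete, here is a minimal sketch of both tasks using scikit-learn. The toy documents, labels, and the choice of two clusters are made up for illustration, not taken from the book.

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

docs = [
    "the stock market fell sharply today",
    "investors worry about rising interest rates",
    "the team won the championship game",
    "the coach praised the players after the match",
]

# Clustering: no labels given; K-Means groups documents by vector similarity.
X = TfidfVectorizer().fit_transform(docs)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments:", clusters)

# Classification: categories are predefined; Naive Bayes learns from labels.
labels = ["finance", "finance", "sports", "sports"]
counts = CountVectorizer().fit(docs)
clf = MultinomialNB().fit(counts.transform(docs), labels)
print(clf.predict(counts.transform(["rates and the stock market"])))
```

The difference shows up directly in the code: K-Means never sees the labels list, while the classifier cannot be trained without it.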

Friday, March 22, 2013

Reading notes for Week 10

Chapter 19 talks about the history of the web from an information retrieval perspective and provides some implications of web retrieval systems for business. It also discusses the technical issues of improving web retrieval systems; for instance, as the number of documents online increases exponentially, it is becoming very difficult for search engines to index them all.

Friday, March 8, 2013

Muddiest Point after week 8

Dr. He talked about some innovations in search result visualization, which were actually pretty cool. But sadly, most of them were eventually shut down, each for its own reasons. My question is: for now, and maybe the future, what are the biggest opportunities for innovation in search interfaces or search result visualization?
Also, besides the unsuccessful examples of user interface innovation in IR, can Dr. He give us some examples of successful innovations other than the well-known ones, such as Google, Bing, or Yahoo?

Thanks!

Friday, February 22, 2013

Muddiest Point after week 6

1. Are there any existing systems that can evaluate an information retrieval model? For example, in the final project we are supposed to construct one model; how can we claim it is good or bad? (A sketch of the standard metric-based approach follows below.)
2. How do we evaluate a multilingual information retrieval model?
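As far as I understand, the usual answer is to compare the model's ranked results against human relevance judgments using metrics such as precision@k and average precision. A minimal sketch; the ranking and relevance judgments below are made up:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for k, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

ranked = ["d3", "d1", "d7", "d2", "d5"]   # one query's ranked results
relevant = {"d1", "d2", "d9"}             # documents judged relevant
print(precision_at_k(ranked, relevant, 5))  # 2/5 = 0.4
print(average_precision(ranked, relevant))  # (1/2 + 2/4) / 3 = 0.333...
```

Averaging average precision over many queries gives MAP, which is one standard way to claim one model is better than another on the same test collection.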

Relevance feedback

Relevance feedback is an information retrieval system feature that helps improve the search query using explicit relevance judgments supplied by the user (explicit feedback), by observing the user's interaction with the system (implicit feedback), or by a method of automatic local analysis (blind or pseudo relevance feedback). Relevance feedback is very helpful for either adjusting the weights of the original query terms or adding new terms that are closer to what the user is searching for.
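The classic way to do this adjustment in the vector space model is the Rocchio algorithm. Here is a minimal sketch; the toy vectors are made up, and the alpha/beta/gamma values are the conventional textbook defaults rather than anything required:

```python
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    q = alpha * query
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)      # move toward relevant docs
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)  # move away from non-relevant
    return np.maximum(q, 0)  # negative term weights are usually clipped to 0

# Toy example over a 4-term vocabulary (weights are made-up TF-IDF values).
query = np.array([1.0, 0.0, 0.0, 0.0])
relevant = np.array([[0.8, 0.6, 0.0, 0.0],
                     [0.9, 0.4, 0.1, 0.0]])
nonrelevant = np.array([[0.0, 0.0, 0.9, 0.7]])
print(rocchio(query, relevant, nonrelevant))  # [1.6375 0.375 0. 0.]
```

Note how the second term picks up weight from the relevant documents even though it was not in the original query; that is exactly the "adding new terms" effect described above.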