Friday, March 29, 2013

Muddiest Point for Week 10

In class, Dr. He talked about Web transaction log studies. Because of privacy concerns and business secrets, most of these logs are not public, even for research purposes. I am concerned that, as a result, industry will probably lead the field. This is true not only in information retrieval but also in other domains, such as social network analysis, since companies possess huge amounts of data for research. So my question is: how do academic researchers deal with this situation and still make breakthroughs?

Reading Notes for Week 11

The main idea of clustering is to divide a set into subsets according to shared properties. In information retrieval, we are trying to divide documents into groups. Each document can be represented as a vector over the words in the collection. Documents within a cluster are supposed to be as similar as possible to each other, so the key to this task is how to measure the similarity between document representations. Distance measures are primarily used, along with algorithms such as K-Means (the most popular one), hierarchical algorithms, and spectral algorithms. The other topic discussed in the book is text classification. In clustering, the machine divides documents into subsets without knowing anything about the categories beforehand, so it has to decide both how to divide the collection and where to put each document. In text classification, by contrast, the categories are predefined, and the machine only has to decide which category each document belongs to. Naive Bayes and SVM are among the algorithms used for this task.
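To make the clustering vs. classification contrast concrete, here is a minimal sketch in plain Python. It assumes a simple bag-of-words term-frequency representation and cosine similarity as the similarity measure. The book discusses distance measures generally, and standard K-Means uses Euclidean distance, so cosine-based assignment is a swapped-in variant that is common for text. Multinomial Naive Bayes with add-one smoothing stands in for the classification side. All function names (`tf_vector`, `cosine`, `kmeans`, `train_naive_bayes`, `classify`) and the toy documents are hypothetical, made up for illustration only.

```python
import math
import random
from collections import Counter


def tf_vector(text, vocab):
    """Bag-of-words term-frequency vector over a fixed vocabulary (assumed representation)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(vectors, k, iterations=20, seed=0):
    """Unsupervised: no labels given; returns a cluster index for each vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial centroids
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to the most similar centroid (cosine variant).
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the mean vector of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def train_naive_bayes(docs, labels):
    """Supervised: categories are predefined by the training labels."""
    classes = set(labels)
    vocab = {w for d in docs for w in d.lower().split()}
    prior = {c: labels.count(c) / len(labels) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    return classes, vocab, prior, word_counts


def classify(model, doc):
    """Pick the class with the highest log posterior (add-one smoothing)."""
    classes, vocab, prior, word_counts = model

    def log_score(c):
        total = sum(word_counts[c].values())
        score = math.log(prior[c])
        for w in doc.lower().split():
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        return score

    return max(classes, key=log_score)


# Toy data (hypothetical): two IR documents and two sports documents.
docs = [
    "search engine index query",
    "query ranking search engine",
    "football match goal team",
    "team goal score match",
]
labels = ["ir", "ir", "sports", "sports"]

vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [tf_vector(d, vocab) for d in docs]
clusters = kmeans(vectors, k=2)          # the machine decides the groups itself
model = train_naive_bayes(docs, labels)  # the categories are given up front
print(clusters)
print(classify(model, "search query"))
```

The two halves use the same documents. K-Means only receives the vectors and must invent its own groups, whose cluster numbers carry no meaning beyond "same" or "different". Naive Bayes is told the categories in advance and only has to place a new document into one of them.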

Friday, March 22, 2013

Reading notes for Week 10

Chapter 19 covers the history of the Web from an information retrieval perspective and discusses some implications of Web retrieval systems for business. It also discusses the technical issues involved in improving Web retrieval systems. For instance, as the number of documents online grows exponentially, it is becoming very difficult for search engines to index them all.

Friday, March 8, 2013

Muddiest Point after Week 8

Dr. He talked about some innovations in search result visualization, which were actually pretty cool. Sadly, though, most of them were eventually shut down, each for its own reasons. My question is: now and in the future, what are the biggest opportunities for innovation in search interfaces or search result visualization?
Also, besides the unsuccessful examples of user interface innovation in IR, could Dr. He give us some examples of successful innovations other than the well-known ones, such as Google, Bing, or Yahoo?

Thanks!