Wednesday, December 11, 2013

Completing the Virtuous Cycle between Payment and Social Engagement in Freemium Social Communities

I attended this talk by Dr. Ravi. The talk was about the different business models currently employed by some social communities, for example Last.fm and Spotify. He mainly talked about the freemium model, where users can use the service for free within certain limits but are charged for use beyond those limits. His team has led some experiments aimed at converting free users into paying users.

Detecting Associations Between Genetic Variants and Output Traits Using Prior Biological Knowledge

In this talk, Dr. Seunghak Lee presented a novel method that uses prior biological knowledge to boost the statistical power of detecting genetic variants associated with traits. This work reminds me of our research on CourseAgent, in which we plan to use prior knowledge from the course histories of past students to suggest courses that might interest current students on the same path.

The Password That Never Was

Dec 9, 2013 11AM
I attended the talk by Dr. Ari Juels, in which he described the threats to the most common defense for passwords, hashing. Although hashes are supposed to be very hard for attackers to reverse, there are password cracking tools that can easily defeat hashing. He introduced a new defense called honeywords, which are decoys designed to be indistinguishable from legitimate passwords, and a related idea called honey encryption, which creates ciphertexts that decrypt under incorrect keys to seemingly valid messages.
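To make the honeywords idea concrete for myself, here is a minimal sketch in Python. It is not the speaker's implementation. I am assuming that decoys are made by replacing the digits of the real password with random digits, and that the position of the real password is kept in a separate store (called honeychecker here). The names make_sweetwords, HoneyLogin and honeychecker are hypothetical, and plain salted SHA-256 is used only to keep the example short.

```python
import hashlib
import os
import random


def make_sweetwords(password, k, rng):
    """Return the real password plus k-1 decoys, in shuffled order.

    Assumption: decoys are built by replacing every digit with a random digit.
    """
    digit_count = sum(ch.isdigit() for ch in password)
    if k - 1 > 10 ** digit_count - 1:
        raise ValueError("password has too few digits to build k-1 distinct decoys")
    decoys = set()
    while len(decoys) < k - 1:
        candidate = "".join(
            str(rng.randrange(10)) if ch.isdigit() else ch for ch in password
        )
        if candidate != password:
            decoys.add(candidate)
    sweetwords = sorted(decoys) + [password]
    rng.shuffle(sweetwords)
    return sweetwords


def digest(salt, word):
    # Salted SHA-256 keeps the sketch short; a real system would use a slow password hash.
    return hashlib.sha256(salt + word.encode("utf-8")).hexdigest()


class HoneyLogin:
    """Hypothetical login service that stores decoys next to the real password."""

    def __init__(self):
        self.password_file = {}  # user -> (salt, digests of all sweetwords)
        self.honeychecker = {}   # user -> index of the real password (assumed separate store)

    def register(self, user, password, k=5, rng=None):
        rng = rng or random.SystemRandom()
        sweetwords = make_sweetwords(password, k, rng)
        salt = os.urandom(16)
        self.password_file[user] = (salt, [digest(salt, w) for w in sweetwords])
        self.honeychecker[user] = sweetwords.index(password)

    def check(self, user, attempt):
        """Return "ok", "wrong", or "alarm" (a decoy was submitted)."""
        salt, digests = self.password_file[user]
        attempt_digest = digest(salt, attempt)
        if attempt_digest not in digests:
            return "wrong"
        if digests.index(attempt_digest) == self.honeychecker[user]:
            return "ok"
        return "alarm"
```

An attacker who cracks every hash in password_file still sees k equally plausible passwords, and submitting any of the decoys returns "alarm" instead of logging in.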

Opening the Mind to Learning: Intellectual Humility Predicts Beneficial Approaches to Learning

I attended a talk by Dr. Karina Schumann of Stanford University. She introduced a term that was new to me: intellectual humility. Intellectual humility means being aware that you don't know everything, that you are probably wrong as often as you are right, and that others have as much opportunity to be right as you do if they apply themselves. It also means not fearing being wrong, but instead viewing it as an opportunity to be right later. She presented several studies related to this topic. One study found that intellectual humility was associated with greater motivation to learn and greater self-reported engagement in adaptive learning strategies. In another study, intellectual humility was associated with greater openness to learning from others in the context of a disagreement. These are really interesting pointers for designing online social learning systems, where learners' motivation is a critical factor influencing their learning outcomes.

Wednesday, October 23, 2013

Big Learning Systems

I attended a talk by Tyson Condie, who is a very smart guy. He used to work at Microsoft and has now moved to academia as a faculty member at UCLA. His talk came from the engineering side of machine learning. He introduced his work at Microsoft on the YARN project, a Java-based framework that splits the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons.

Content-Based Cross-Domain Recommendations Using Segmented Models

I attended the talk by Sherry, my distinguished colleague, who just finished an excellent internship at LinkedIn. In the talk, she introduced the work she did at LinkedIn on cross-domain recommendation. She and her collaborators at LinkedIn evaluated their logistic regression algorithm with both offline metrics, such as precision, recall and accuracy, and online metrics using A/B testing. Great work! I hope I will have such a valuable internship experience in the next couple of years.

WithU - Collective Support System for People with Disease

After my surgery, I went through a long period of depression: thinking about dying, crying at midnight, complaining about the unfairness, questioning the meaning of life. It was very hard for me because very few friends knew about my situation, and I did not tell anyone in my family. So I went through it all alone. Fortunately, I decided to carry on with my life. Looking back on that period, I wished I could have had some friends in similar situations so that we could talk and support each other, since we would have shared the same feelings while getting through the pain. I then added a goal to my bucket list: to create a system for people with disease (PSD). I am so happy that this goal is coming closer each day, as I have convinced a group of people to work on it together. They are kind people with compassion for those who are suffering and with creative ideas. We have come up with a proposal for it. I have attached the proposal for this project.
See the presentation at
http://prezi.com/yorpxqg601so/?utm_campaign=share&utm_medium=copy

Interesting project! Really hope it's being developed, being used and being helpful to PSD!

Introduction

General social networks like Facebook or Twitter provide people with a good way to communicate with each other. However, they don’t provide a strong community for people with serious disease (PSD) to share their similar experiences and thereby solve their problems. We want to build a more purposeful social network for people who are experiencing difficult health problems. These people may suffer greatly, both physically and mentally. Sharing experiences could help them get through similar situations more easily. Besides, people who have the financial means could be more easily encouraged to provide funding support to others facing similar health problems.

Purpose

The goal of our project is to design and implement a collective support website for PSD. The website will allow them to guide, help and support each other emotionally and financially, for example by encouraging others and by making donations to fund other members who cannot afford to pay for things such as procedure costs. We will also provide some functions to help them plan their limited time.

Targeted Audience

People who are experiencing serious health problems.

Functions:

As the purpose of this website is to provide emotional support for PSD, we would like to have particular functions that serve that purpose. Given that people in the most common scenarios also require financial support, we believe the web application will be of great help if it provides functions that enable fund raising. We also believe that the functions supporting these two processes should not be completely separated. It would be even better if these functions interacted with each other, making the users of the website feel more real and closer to one another.

The web application will be a social website. Users of this website will get their own pages, like on Facebook. However, instead of showing a timeline for each user as Facebook does, we will show information more relevant to our particular users. We have thought about what people care about when they have a serious disease. Before coming up with answers to this question, we need to understand their needs within the information technology domain. Although these needs should be collected through surveys or interviews, our team has been brainstorming ideas about what our users might need. To understand our users better, though it is hard to simulate their feelings or actions, we tried to imagine as closely as possible what they are thinking and how they deal with this period of suffering.

1. Story Sharing

At the beginning, it is very hard for PSD to accept that they are dying. They are constantly troubled by questions like ‘Is this really going to happen to me?’ and ‘Why me?’. It’s very natural for people to complain and to question the meaning of life. PSD need to vent their emotions. They may go to their families or friends for comfort. But what if that’s not enough, what if friends or family are not around, or what if PSD think they are not supposed to know? Other PSD may share the same struggles, so they understand better and can offer each other more comfort. We would like to ask users to share their stories and receive comforting comments from others. In this way, they can find relief from despair and gain hope for life.

2. Future Letter

The next step is to say goodbye to all their beloved friends and family, although it’s always the hardest. PSD hesitate to tell them because they cannot imagine how their loved ones will bear the loss for the rest of their days. But they need to do it anyway. We are thinking about having a future letter function, so that PSD can write letters in advance to their friends and families. Future letters could help PSD write down their feelings when they have trouble talking about their situation in the moment. They can also record words that PSD want to say to their loved ones in the future. When their friends and families receive the letters in the future, words from the past will be a comfort. Besides, after coming up with this function, we thought it would also be nice for PSD to write future letters to themselves, maybe as a reward after every life-threatening operation or treatment.

3. Bucket-List

PSD are dying, but they are still living and moving on with their limited time. Now could be the best time for them to revisit their goals in life; the goals that remain and still matter could go on the bucket list. PSD can use the bucket list to write down their remaining wishes. The bucket list gives PSD hope and energy to live their remaining time to the fullest.

4. Joint Calendar

As mentioned above, each operation or treatment for PSD could lead to death. Everyone fears it and feels helpless about it. There is nothing to do about it except be strong. We understand that people can be strong for many reasons, for their families or their friends. They can also be strong by receiving support from their online peers. We want to implement a joint calendar in the website, so that users can share their important dates on it. All users can see it and join in to offer support. In this way, we can build an optimistic and collective support system for PSD.

5. Fund Raising

Although money is not the most important thing at this point, some PSD do need money to pay their medical bills. We would like to implement fund raising in the system, so that every PSD in poverty can raise funds from the crowd to continue or undergo treatment or surgery. The functions above make each person more real and more sympathetic to the audience, as PSD build their own image through their posts and interactions with others. To some extent, we believe PSD would be more willing to donate than others, since they understand the feelings better and money is, in some cases, not crucial to them.

To sum up, in order to provide crowd support to PSD, we want to have five main functions: story sharing, fund raising, future letter, bucket list, and joint calendar, listed in order of implementation priority given the limited time in the final project.

Standards

For the front end, we plan to use HTML and CSS, and we will use JavaScript to implement the functions.

1. HTML

HTML gives authors the means to publish online documents with headings, text, tables, lists and photos; design forms for conducting transactions with remote services, for use in searching for information, making reservations and ordering products; retrieve online information via hypertext links, at the click of a button; and include video clips, sound clips, and other applications directly in their documents.

For our project, we will use HTML to design the structure of the social network pages, embed images and objects, create structured documents by denoting structural semantics for text, and add scripts written in languages like JavaScript.

2. CSS

CSS can be used to describe the presentation of Web pages, including colors, layout, and fonts. It allows us to adapt the presentation to different types of devices, such as large screens, small screens, or printers. CSS is independent of HTML and can be used with any XML-based markup language.

CSS makes sites clear and stylish. We will use CSS to share style sheets across pages and to tailor pages to different environments, which reduces complexity and repetition in the structural content and makes our sites easier to maintain.

3. ECMAScript (JavaScript)

ECMAScript is the scripting language standardized by Ecma International in the ECMA-262 specification and ISO/IEC 16262. The language is widely used for client-side scripting on the web, in the form of several well-known implementations such as JavaScript.

With JavaScript, we will write functions that are embedded in or included from the HTML pages and that interact with the DOM of those pages. We will use JavaScript to implement the functions in many ways. With JavaScript we can animate page elements, fading them in and out, resizing them and moving them, or add interactive content to the pages.

Research Perspective from Google

October 2 @CMU

I attended the talk by Alfred Z. Spector, Vice President of Research & Special Initiatives at Google. He talked about some of their key research initiatives and projects in natural language, vision, user experience, and systems. I am quite interested in their efforts in the education domain, as that's my primary research interest. He introduced their crowd education platform CourseBuilder, which in my opinion is an excellent product, even better than Coursera, which has recently become overwhelmingly popular. It does not have a better UI or better functions than Coursera, but I appreciate the idea that the best way to learn something is to teach it. Everyone is encouraged to create a whole course in their particular area of expertise, which sounds more interesting to me, since in Coursera only distinguished lecturers with the support of a world-class university can develop a course. An interesting question is which one will succeed in the end: CourseBuilder or Coursera, collective intelligence or the individual expert?

Friday, March 29, 2013

Muddiest Point for week 10

In class, Dr. He talked about web transaction log studies. For privacy and business confidentiality reasons, most of these logs are not public, even for research purposes. I am concerned that, in this situation, industry is probably going to lead the field, not only in information retrieval but also in other domains such as social network analysis, since companies possess huge amounts of data on which to conduct research. So my question is: how can academic researchers deal with this situation and still make breakthroughs?

Reading Note for Week 11

The main idea of clustering is to divide a set into subsets according to shared properties. In the domain of information retrieval, we are trying to divide documents into categories. Each document can be represented by a vector over the different words in the collection. Documents within a cluster are supposed to be as similar as possible to each other. The key to this task is how to measure the similarity between document representations. Distance is the primary measure, used along with algorithms such as K-Means (the most popular one), hierarchical algorithms, and spectral algorithms. The other topic discussed in the book is text classification. In clustering, the documents are divided into subsets without knowing anything about the categories, so the machine has to decide both how to divide and where to put each document. In text classification, by contrast, the categories are predefined and the machine only has to decide where to put each document. Naive Bayes and SVM are used for this task.
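To check my understanding of K-Means on document vectors, I wrote the small Python sketch below. The four toy documents, the plain term-count vectors, the Euclidean distance and the choice of the first and third documents as starting centroids are all my own assumptions for illustration; the book does not prescribe them.

```python
from collections import Counter
import math


def tf_vector(text, vocab):
    """Represent a document as term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(vectors, initial_centroids, max_iter=100):
    """Alternate between assigning vectors to the nearest centroid and
    recomputing each centroid as the mean of its members."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda j: euclidean(v, centroids[j]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no document changed cluster, so the algorithm has converged
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


docs = [
    "apple banana fruit",
    "banana fruit juice",
    "engine car wheel",
    "car wheel road",
]
vocab = sorted({word for doc in docs for word in doc.split()})
vectors = [tf_vector(doc, vocab) for doc in docs]
assignment, centroids = k_means(vectors, [vectors[0], vectors[2]])
print(assignment)  # the two fruit documents share one cluster, the two car documents the other
```

Nothing tells the algorithm what the two groups mean; they fall out of the distances alone, which is the difference from classification described above.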

Friday, March 22, 2013

Reading notes for Week 10

Chapter 19 covers the history of the web from an information retrieval perspective and discusses some implications of web retrieval systems for business. It also discusses the technical issues in improving web retrieval systems. For instance, as the number of documents online is increasing exponentially, it is becoming very difficult for search engines to index them all.

Friday, March 8, 2013

Muddiest point after week 8

Dr. He talked about some innovations in search result visualization, which were actually pretty cool. Sadly, most of them were eventually shut down for various reasons. My question is: now and in the future, what are the biggest opportunities for innovation in search interfaces or search result visualization?
Also, besides the unsuccessful examples of user interface innovation in IR, can Dr. He give us some examples of successful innovations other than the well-known ones, such as Google, Bing, or Yahoo?

Thanks!

Friday, February 22, 2013

Muddiest Point after week 6

1. Are there any existing systems that can evaluate an information retrieval model? For example, in the final project we are supposed to construct a model. How can we claim that it is good or bad?
2. How do we evaluate a multilingual information retrieval model?

Relevance feedback

Relevance feedback is an information retrieval system feature that helps improve the search query using explicit judgments supplied by the user (explicit feedback), observations of the user's interaction with the system (implicit feedback), or a method of automatic local analysis (blind feedback or pseudo feedback). Relevance feedback is very helpful for either adjusting the weights of the original query terms or adding new terms that are closer to what the user is searching for.
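One well-known way to do both things at once (reweight the original terms and add new ones) is the Rocchio formula, which my note above does not name. Here is a minimal Python sketch of it, assuming queries and documents are dictionaries of term weights. The alpha, beta and gamma values and the toy data are illustrative choices, not values from the course.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the relevant documents and away from the
    non-relevant ones. Terms whose new weight is not positive are dropped."""
    terms = set(query)
    for doc in relevant + non_relevant:
        terms |= set(doc)
    new_query = {}
    for term in terms:
        weight = alpha * query.get(term, 0.0)
        if relevant:
            weight += beta * sum(d.get(term, 0.0) for d in relevant) / len(relevant)
        if non_relevant:
            weight -= gamma * sum(d.get(term, 0.0) for d in non_relevant) / len(non_relevant)
        if weight > 0:
            new_query[term] = weight
    return new_query


# Toy explicit feedback: the user marked one document relevant and one not.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "car": 1.0}]
non_relevant = [{"jaguar": 1.0, "cat": 1.0}]
new_query = rocchio(query, relevant, non_relevant)
print(sorted(new_query))  # "car" has been added, "cat" has not
```

With blind feedback the same function could be used by treating the top-ranked documents as relevant and passing an empty list for non_relevant.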

Friday, February 15, 2013

Muddiest Point after week 5

In class, Dr. He talked about two choices for language models: unigram versus higher-order models, and multinomial versus multiple-Bernoulli. He also talked about three models for ranking: query likelihood, document likelihood, and divergence of query and document models. I am wondering what kinds of factors need to be taken into consideration when choosing a language model or a ranking model.
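To make the query-likelihood idea concrete, here is a small Python sketch with a unigram model. The smoothing method (linear interpolation with the collection model, usually called Jelinek-Mercer), the lam value and the two toy documents are my own assumptions; the lecture summary above does not fix any of them.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(query | document) under a unigram model.

    Each term probability mixes the document estimate with the collection
    estimate, so a query term missing from the document does not force the
    whole score to zero.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_coll = coll_counts[term] / len(collection)
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score


d1 = "information retrieval ranks documents".split()
d2 = "clustering groups similar documents".split()
collection = d1 + d2
query = ["information", "retrieval"]

score_d1 = query_log_likelihood(query, d1, collection)
score_d2 = query_log_likelihood(query, d2, collection)
print(score_d1 > score_d2)  # d1 contains both query terms, so it ranks higher
```

Even in this tiny example one factor in choosing a model is visible: the unigram assumption ignores word order, and the amount of smoothing changes how harshly missing terms are punished.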

One criticism for Information Retrieval Evaluation

I think IR evaluation needs to come from the user side. This is generally difficult, because different users may understand the returned document set differently and thus interpret differently which documents are relevant. Because the subjectiveness involved in deciding relevance is hard to eliminate, IR evaluation lacks a solid formal framework as its foundation. I guess that's why user-oriented measures are used. I am wondering under what circumstances we would use the different measures, such as the harmonic mean or the E measure.
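To see how the two measures behave, I wrote them out in Python the way I remember them from the textbook: the harmonic mean F = 2 / (1/r + 1/P) and the E measure E = 1 - (1 + b^2) / (b^2/r + 1/P), where P is precision, r is recall and b is a user-chosen parameter. Treat the exact forms as my recollection, and the sample precision and recall values as made up.

```python
def harmonic_mean(precision, recall):
    """F = 2 / (1/recall + 1/precision); defined as 0 if either value is 0."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2.0 / (1.0 / recall + 1.0 / precision)


def e_measure(precision, recall, b=1.0):
    """E = 1 - (1 + b^2) / (b^2/recall + 1/precision); lower is better."""
    if precision == 0 or recall == 0:
        return 1.0
    return 1.0 - (1.0 + b * b) / (b * b / recall + 1.0 / precision)


precision, recall = 0.5, 0.25
f_value = harmonic_mean(precision, recall)
e_value = e_measure(precision, recall, b=1.0)
print(f_value, e_value)  # with b = 1 the E measure is just 1 - F
```

With this form of the formula, E approaches 1 - recall as b grows and 1 - precision as b shrinks toward 0, so b is the knob for how much each side matters to the user. That suggests the harmonic mean fits when both matter equally, and the E measure when the user cares more about one of them.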

Friday, February 1, 2013

Modeling

I listed the following comparisons as my note for classic modeling.

Boolean Model:

Advantages:

Allows for logic
Provides all that has been matched

Disadvantages:

Has no particular order of output
Treats all retrieved documents equally, from the most to the least relevant
Often requires examination of large output

Vector Model:

Advantages:

Returns ranked retrieval
Terms are weighted by importance
Partial matches

Disadvantages:

Assumes terms are independent
Weighting is intuitive, but not very formal

My question about the vector model: it is really hard to make sense of how a document that is 10% relevant to the query conveys less meaningful information for resolving the information need than one that is 40% relevant. Is there a clear difference between them in terms of fulfilling the query need?
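The contrast in the lists above can be seen in a few lines of Python. This is a minimal sketch with made-up documents and uniform term weights of 1.0; real systems would use tf-idf weights, which I leave out here. The function names are my own.

```python
import math


def boolean_and(query_terms, docs):
    """Boolean model: an unordered set of documents containing every query term."""
    return {name for name, vec in docs.items() if all(t in vec for t in query_terms)}


def cosine(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Vector model: every document gets a score, highest first."""
    scored = [(cosine(query, vec), name) for name, vec in docs.items()]
    return sorted(scored, reverse=True)


docs = {
    "d1": {"huffman": 1.0, "code": 1.0},
    "d2": {"huffman": 1.0, "tree": 1.0, "prefix": 1.0, "length": 1.0},
    "d3": {"vector": 1.0},
}
query = {"huffman": 1.0, "code": 1.0}

matched = boolean_and(query, docs)
ranking = rank(query, docs)
print(matched)
print([name for _, name in ranking])
```

The Boolean model returns only d1 with no order, while the vector model also gives d2 partial credit and places it between d1 and d3. The score is only a measure of vector similarity under the independence assumption, which is part of why I find percentages like 10% versus 40% hard to interpret.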

Friday, January 25, 2013

Huffman Code on all small probability distribution

I am just curious about a question on Huffman codes. According to the book, the Huffman code for a probability distribution p is the prefix code with the minimum weighted average codeword length ∑ p_i ℓ_i, where ℓ_i is the length of the i-th codeword. It works great on symbols that have different probabilities, by which I mean there is high diversity among them. What if there is an upper bound on the largest probability, or say all the probabilities are small? Is the Huffman code still the best?
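To explore the question, I sketched the Huffman construction in Python and compared a skewed distribution with a uniform one. The two distributions are made-up examples, and the function only computes codeword lengths, which is all that the average length ∑ p_i ℓ_i needs.

```python
import heapq
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: codeword length} for a Huffman code over probs."""
    if len(probs) == 1:
        return {sym: 1 for sym in probs}
    tiebreak = count()  # unique counter so equal probabilities never compare the lists
    heap = [(p, next(tiebreak), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside them
        # moves one level deeper, so its codeword grows by one bit.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths


def average_length(probs, lengths):
    return sum(probs[sym] * lengths[sym] for sym in probs)


skewed = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
uniform = {sym: 0.125 for sym in "abcdefgh"}

skewed_lengths = huffman_code_lengths(skewed)
uniform_lengths = huffman_code_lengths(uniform)
print(average_length(skewed, skewed_lengths))
print(average_length(uniform, uniform_lengths))
```

In the uniform case every codeword comes out with the same length, so the Huffman code is no shorter than a plain fixed-length code. As far as I understand, it is still optimal among prefix codes that encode one symbol at a time, since the optimality argument does not depend on the shape of the distribution; it just has nothing left to exploit.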

Friday, January 11, 2013

Personalization and Privacy in Information Retrieval

An information retrieval process is used to deal with the problem of information overload caused by the huge and complex volume of documents (HTML pages, sound tracks, videos) available online. Since I have primarily been working on adaptive personalization systems, I have never doubted the need for such systems to help users obtain the information they want efficiently and effectively. However, some questions are always on my mind, and they correspond to two dimensions of IR success: novelty and diversity. An AIS (adaptive information system) aims to provide the user with items personalized to him by studying the user profile and recommending similar items. If the system works perfectly, this might lead to the problem that users always get the same stream of documents at the sacrifice of diversity, or even novelty, because they can never get something 'new'. This happened to Google search when Google implemented personalized search. One example is that when I typed weather in the search box, weather pittsburgh was automatically suggested. Interesting, but it might not be what I want all the time.

Another problem with information retrieval is privacy. As I have also done research on social networks and online communities, I have found that an increasing number of people take privacy into consideration when engaging in online activities. One obvious example from my perspective is, again, Google. For the same account holder (a Gmail account, in this case), Google records every search query you make and keeps it in its database. I do not worry too much about it because I am a less sensitive person with few things to hide, but some people do, and we should do more about it.