My research ideas have recently been mainly about how we can characterize and exploit collective discourse.
So what is collective discourse?
With the growth of Web 2.0, millions of individuals engage in collective discourse. They participate in online discussions, share their opinions, and generate content about the same artifacts, objects, and news events on Web portals such as amazon.com, epinions.com, and imdb.com. This massive amount of text is written mainly by non-expert individuals with different perspectives, yet as a whole it exhibits accurate knowledge.
In social media, collective discourse is often a collective reaction to an event. A collective reaction to a well-defined subject emerges in response to an event (a movie release, a breaking story, a newly published paper) in the form of independent writings (movie reviews, news headlines, citation sentences) by many individuals.
A common characteristic of collective discourse, as of many other collective behaviors, is the diversity among the individuals engaging in it. This diversity emerges in the form of the different perspectives people have about the topic under discussion.
The diversity of perspectives in non-expert contributions in collective discourse can be exploited to discover various aspects about a subject that are otherwise hard to unveil.
What I liked most about ACL submissions this year was the opportunity to upload datasets and source code. Although it seems far-fetched to me that reviewers would try to reproduce the results reported in each paper, it at least, to some extent, encourages reporting transparent, reproducible results.
Thanks to Chris Brockett, who shared an interesting and relevant article a few days ago:
“John P. A. Ioannidis. Why Most Published Research Findings Are False. PLoS Medicine”. (A summary of this paper was written by the mathematician John Allen Paulos: http://abcnews.go.com/print?id=12510202)
What I liked about this paper was its six corollaries on the validity of scientific findings:
Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
Corollary 6, which is my favorite, implies that the competitive nature of research and the urge to publish have had a negative impact on the quality of published research.
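Corollary 1 can be seen numerically. In Ioannidis's paper, the probability that a claimed finding is actually true (the positive predictive value, PPV) is PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is true, α is the type I error rate, and 1 − β is the statistical power. Smaller studies have lower power, so PPV drops. The sketch below covers only the no-bias case of that formula; the values R = 0.25 and α = 0.05 are illustrative choices, not numbers from the paper.

```python
def ppv(R, alpha, power):
    """Probability that a claimed research finding is true (no-bias case).

    R     -- pre-study odds that a tested relationship is true (illustrative)
    alpha -- type I error rate
    power -- statistical power, i.e. 1 - beta
    """
    beta = 1.0 - power
    return (1.0 - beta) * R / (R - beta * R + alpha)


# Smaller studies -> lower power -> less likely that a "finding" is true.
for power in (0.8, 0.5, 0.2):
    print(f"power={power:.1f}  PPV={ppv(R=0.25, alpha=0.05, power=power):.2f}")
# power=0.8  PPV=0.80
# power=0.5  PPV=0.71
# power=0.2  PPV=0.50
```

With everything else held fixed, dropping power from 0.8 to 0.2 cuts the chance that a reported finding is true from 80% to 50%.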
Finally, I am excited about ACL’s move to request datasets, and I think (and hope) we will start to see stronger measures in the near future.
It seems natural that the focus of a research community changes over time. This focus, which presumably reflects the community’s main interests, is easy to observe by reading the papers that appear within the community every year.
ACL is no exception: Statistical Machine Translation is considered the popular topic these days. But was that always the case? To answer this question, I collected all the papers that appeared in ACL in each year and found the article those papers cited most. To some degree, this shows what the ACL community has written about in each year.
Interestingly, the most-cited topics were Dialogue and Discourse until the mid-90s; the focus then shifted to Parsing, and in the past decade to Machine Translation.
1979: A Goal Oriented Model Of Human Dialogue
1980: Ungrammaticality And Extra-Grammaticality In Natural Language Understanding Systems
1981: A Snapshot Of KDS A Knowledge Delivery System
1982: Linguistic Analysis Of Natural Language Communication With Computers
1983: A Practical Comparison Of Parsing Strategies
1984: Relaxation Techniques For Parsing Grammatically Ill-Formed Input In Natural Language Understanding Systems
1985: Parsing As Deduction
1986: Providing A Unified Account Of Definite Noun Phrases In Discourse
1987: Categorical Unification Grammars
1988: Attention, Intentions, And The Structure Of Discourse
1989: Attention, Intentions, And The Structure Of Discourse
1990: A Logical Semantics For Feature Structures
1991: Attention, Intentions, And The Structure Of Discourse
1992: Attention, Intentions, And The Structure Of Discourse
1993: Attention, Intentions, And The Structure Of Discourse
1994: Attention, Intentions, And The Structure Of Discourse
1995: Attention, Intentions, And The Structure Of Discourse
1996: A Stochastic Parts Program And Noun Phrase Parser For Unrestricted Text
1997: The Mathematics Of Statistical Machine Translation: Parameter Estimation
1998: Building A Large Annotated Corpus Of English: The Penn Treebank
1999: Building A Large Annotated Corpus Of English: The Penn Treebank
2000: Building A Large Annotated Corpus Of English: The Penn Treebank
2001: The Mathematics Of Statistical Machine Translation: Parameter Estimation
2002: A Maximum-Entropy-Inspired Parser
2003: The Mathematics Of Statistical Machine Translation: Parameter Estimation
2004: The Mathematics Of Statistical Machine Translation: Parameter Estimation
2005: A Maximum-Entropy-Inspired Parser
2006: The Mathematics Of Statistical Machine Translation: Parameter Estimation
2007: Minimum Error Rate Training In Statistical Machine Translation
2008: Minimum Error Rate Training In Statistical Machine Translation
2009: Minimum Error Rate Training In Statistical Machine Translation
2010: Moses: Open Source Toolkit for Statistical Machine Translation
(Raw data from http://clair.si.umich.edu/clair/anthology/)
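The counting behind a list like this can be sketched in a few lines. This is a minimal sketch, not the script used for this post: it assumes the citation data has already been parsed into hypothetical `(year, citing_id, cited_title)` tuples, and it does not assume anything about the actual file format of the raw anthology data.

```python
from collections import Counter, defaultdict


def most_cited_per_year(citations):
    """Return {year: (cited_title, count)} for the most-cited article per year.

    citations -- iterable of hypothetical (year, citing_id, cited_title) tuples,
                 where year is the publication year of the citing paper.
    """
    counts = defaultdict(Counter)
    seen = set()
    for year, citing, cited in citations:
        # Count each citing-paper -> cited-article link only once.
        key = (year, citing, cited)
        if key in seen:
            continue
        seen.add(key)
        counts[year][cited] += 1
    # most_common(1) picks the highest count; ties go to the title seen first.
    return {year: c.most_common(1)[0] for year, c in sorted(counts.items())}


# Toy input with made-up paper IDs, for illustration only.
data = [
    (2008, "paper-a", "Minimum Error Rate Training"),
    (2008, "paper-b", "Minimum Error Rate Training"),
    (2008, "paper-b", "Moses"),
    (2010, "paper-c", "Moses"),
    (2010, "paper-d", "Moses"),
]
for year, (title, n) in most_cited_per_year(data).items():
    print(year, title, n)
```

Counting each citing paper at most once per cited article keeps a single paper that cites something repeatedly from skewing that year's result.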
I knew that lexical choice leads to a diversity of ways in which people talk about the same thing within one language.
However, it was interesting to find out that different cultures use the same idiom with different wordings. Being familiar with a few of these cultures, I can tell where, for instance, "blind" in Azeri, "camel" in Arabic, and "pot" in English, Turkish, and Persian come from.
The complete list, with translations, is at:
http://en.wikipedia.org/wiki/Pot_calling_the_kettle_black