Monthly Archives: January 2011

Some Useful NLP Datasets (to be completed)

Bibliometric and Survey Generation

Paraphrase Generation

Sentiment Analysis

Co-occurrence Graphs


Reproducibility and Scientific Findings

What I liked most about ACL submissions this year was the opportunity to upload datasets and source code. Although it seems far-fetched to me that reviewers would try to reproduce the results reported in each paper, it does, at least to some extent, encourage reporting transparent, reproducible results.

Thanks to Chris Brockett, who shared an interesting and relevant article a few days ago:
John P. A. Ioannidis, “Why Most Published Research Findings Are False,” PLoS Medicine. (A summary of this paper was written by the mathematician John Allen Paulos: http://abcnews.go.com/print?id=12510202)

What I liked about this paper was its six corollaries on the validity of scientific findings:

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.
Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
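
To make a couple of these concrete, here is a minimal sketch of the positive predictive value (PPV) that the corollaries build on, as I understand the paper: the probability that a claimed finding is true, given the pre-study odds R of a true relationship, the Type I error rate alpha, and the Type II error rate beta. It assumes no bias. The function names and the example numbers are my own and purely illustrative.

```python
def ppv(R, alpha=0.05, beta=0.2):
    """PPV for a single study with no bias.

    R     -- pre-study odds that a tested relationship is true
    alpha -- Type I error rate (false positive rate)
    beta  -- Type II error rate (1 - power)
    """
    return (1 - beta) * R / (R - beta * R + alpha)


def ppv_many_teams(R, n, alpha=0.05, beta=0.2):
    """PPV when n independent teams test the same relationship and a
    positive result from at least one of them is what gets claimed."""
    true_positive = R * (1 - beta ** n)      # at least one team detects a true effect
    false_positive = 1 - (1 - alpha) ** n    # at least one team hits a false alarm
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Corollary 1: smaller studies have lower power (larger beta), hence lower PPV.
    for beta in (0.2, 0.5, 0.8):
        print(f"beta={beta:.1f}  PPV={ppv(R=1.0, beta=beta):.3f}")

    # Corollary 6: more teams chasing the same question, lower PPV.
    for n in (1, 5, 10):
        print(f"teams={n:2d}  PPV={ppv_many_teams(R=1.0, n=n):.3f}")
```

With the other parameters held fixed, the PPV drops as beta grows (Corollary 1) and as the number of teams grows (Corollary 6), which is the pattern the corollaries describe.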

Corollary 6, which is my favorite, implies that the competitive nature of research and the urge to publish have had a negative impact on the quality of published research.

Finally, I am excited about ACL’s move to request datasets, and I think we will start to see stronger measures in the (hopefully) near future.

