Data Resources


  • Sam Roweis' Page (NYU, CS)
  • Andrew McCallum's Page (UMass, CS)
  • Ron Bekkerman's Page (UMass, CS)
  • Gregor Heinrich (University of Leipzig, Germany)
  • Stanford Microarray Database
  • Stanford Genomic Resources
  • University of Edinburgh-School of Informatics
  • 20 Newsgroups dataset in a MATLAB file
  • Some popular text datasets in MATLAB format
  • UCI Machine Learning Repository
  • US National Public Health Datasets
  • IMS Stuttgart Computational Linguistics - Resources and Institutions
  • UMass Project: Labeled Faces in the Wild, a database of face photographs
  • Natural Language Toolkit Data
  • Public Domain Text by Norman Herr from California State University
  • Collection of Blog Data: TREC-BLOG
  • ADNI

Webpages, Blogs and Wikis


  • Jieping Ye (Arizona State, CS)
  • Chong Wang (Princeton, CS)
  • Xuerui Wang (Yahoo! Labs)
  • Shibamouli Lahiri (PSU, CS)
  • Jing Gao (UIUC, CS)
  • Hanna Wallach (UMass, CS)
  • NLP Blog by Hal Daume III
  • Information Retrieval Wiki Page
  • TREC Tracks
  • Google PhD Fellowship

Software and Tools


  • LingPipe: toolkit for processing text using computational linguistics
  • Guidelines for installing R
  • Guidelines for installing R, WinBUGS and OpenBUGS
  • Tutorial for OpenBUGS
  • Hierarchical Bayes Compiler
  • FACTORIE
  • FACTORIE Guidelines
  • MALLET
  • CppBugs: C++ Library for MCMC Sampling
  • Tutorial for Scala
  • Apache Mahout
  • Apache Hadoop
  • Infer.NET Microsoft Research

Miscellaneous


  • If you are working on topic modeling, do join the topic modeling mailing list.
  • List of stop words in text cleaning
  • Ubuntu Guides
  • LaTeX Help
  • Unix Help
  • Scala Help
  • EE380L Course Website

Hyperspectral Dataset


  • Botswana 9-Class Dataset (.zip)
  • Botswana Temporal Data (May to July) (.zip)
