TMW 2008

History of the Workshop

This is the sixth in the series of Text Mining workshops held in conjunction with SDM. Previous ones took place in 2001, 2002, 2003, 2006, and 2007. Last year in Minneapolis, 31 authors representing industry, academia and national research laboratories from 8 different countries submitted a total of 12 papers. After careful review, 7 papers were selected for publication and presentation. In addition, NASA Ames sponsored a text mining competition based on anomaly detection using documents from the Aviation Safety Reporting System (ASRS). Photos shown above and below are from the 2007 SDM Text Mining Workshop.

General Topics

The proliferation of digital computing devices and their use in communication has resulted in an increased demand for systems and algorithms capable of mining textual data. Thus, the development of techniques for mining unstructured, semi-structured, and fully structured textual data has become quite important in both academia and industry. As a result, this Workshop will survey the emerging field of Text Mining - the application of techniques of machine learning in conjunction with natural language processing, information extraction and algebraic/mathematical approaches to computational information retrieval. Many issues are being addressed in this field ranging from the development of new learning approaches to the parallelization of existing algorithms. The goal of this workshop is to provide a venue for researchers to share initial approaches and preliminary results of recent research in Text Mining. Through the careful selection and review of submitted workshop papers, we hope to provide a suitable selection of topics that will both generate interest and provide insight into the state of the field of Text Mining.

Special Topics - Text Mining with the Enron Data Set and VAST 2007 Contest Data

Because of the continued interest generated by the availability of the Enron data set of 1.3 million email messages (see Enron Email Dataset) and its versatility in terms of potential research topics (link analysis, pattern matching), researchers are encouraged to submit papers based on this data set to this workshop. In addition, the data set of news stories and blog entries used in the IEEE Symposium on Visual Analytics Science and Technology (VAST) 2007 Contest is an interesting corpus for research in topic detection/tracking, role playing, and scenario analysis (see VAST 2007 Contest for more details on this data set). Researchers whose work is more focused on social networking models of the Enron and VAST 2007 data sets should contact the organizers of the SDM Link Analysis (SLA) Workshop. With the authors' permission, a paper may be re-assigned to the SLA workshop (especially if the Program Committee makes the recommendation based on the content of the paper).

Other Specific Topics of Interest Include:

Algorithms and Models

* Bayesian Models
* Concept Decomposition
* Orthogonal Decomposition
* Probabilistic Models
* Vector Space Models
* Latent Semantic Indexing
* Graph-based Models
* Text Streaming Models

Applications

* Clustering
* Factor Analysis
* Visualization Techniques
* Metadata Generation
* Information Extraction
* Text Classification
* Text Purification
* Text Segmentation
* Text Summarization
* Query Structures
* Trend Detection
* Distributed Storage and Retrieval

This CfP was obtained from WikiCFP