MMIS 2009

The need for mining multiple information sources (MMIS) is almost ubiquitous, and applications abound in all spheres, including business domains, government/enterprise organizations, and many scientific disciplines. For example, biological and biomedical research studies microarray data of different species. Although genomes of different species are not identical, they are not entirely independent: even remote species exhibit correlations in their genomes. An interesting research problem that arises here is how to unify heterogeneous data sources from different species to facilitate data mining. As another example, in business intelligence, movie rental companies (such as Netflix) usually maintain and utilize at least two major datasets, namely the movie data and the customer data. This allows us to build models for different purposes, e.g., predicting which types of movies customers are interested in, or identifying high-attrition customers. A more interesting problem is how to leverage information from other sources, such as the actor data, for multi-task learning.

While most data mining algorithms are conceived for mining data from a single source, the need to develop general theories, frameworks, data structures, and heuristics for mining multiple heterogeneous information sources that share dependencies has become increasingly crucial. Unleashing the full power of multiple information sources is, however, a very challenging task: the schemas of different data collections may differ greatly (data heterogeneity), the distributions and patterns underlying different data sources may change continuously (concept evolution), and the mining tasks for each data source may also differ (mining diversity). Although several approaches for utilizing multiple information sources have been proposed, these methods are usually rather ad hoc and do not adequately address some of the most fundamental research issues in this field: (1) Harnessing Complex Data Relationships: Because multiple information sources represent a collection of highly correlated data, issues such as data integration, model integration, and model transferring across different domains play fundamental roles in supporting KDD from multiple information sources; (2) Integrative and Cooperative Mining: For heterogeneous information sources with diverse mining tasks, our goal is to unify such data to generate enhanced global models, as well as to help individual data collections best achieve their respective mining goals; and (3) Differentiation and Correlation: Differentiating and coordinating the differences between data sources at the knowledge level is a crucial step for users to gain a high-level understanding of their data.

The aim of this workshop is to bring together data mining experts to advance research on pattern discovery from multiple information sources and to identify current needs in this area. Representative issues to be addressed include but are not limited to:

1. Machine Learning in Multi-source Environments

* Multi-view learning, multi-task learning, transfer learning
* Ensemble learning and ensemble clustering

2. Harnessing Complex Data Relationships

* Database similarity assessment
* Automatic schema mapping and relationship discovery
* New mapping frameworks for multiple information sources
* Data source classification and clustering
* Data cleansing, data preparation, data/pattern selection, conflict and inconsistency resolution

3. Integrative and Cooperative Mining

* Model integration for heterogeneous information sources
* Model transferring across different data domains
* Incremental and scalable data mining algorithms

4. Differentiation and Correlation

* Local pattern analysis and fusion
* Global pattern synthesizing and assessment
* Merging local rules for global pattern discovery
* Pattern summarization from multiple datasets
* Multi-dimensional pattern search and comparison
* Pattern comparison across multiple data sources
* Inter-pattern discovery from complex data sources

5. Stream data mining algorithms

* Clustering and classification of data with changing distributions
* Data stream processing, storage, and retrieval systems
* Sensor networking

6. Interactive data mining systems

* Query languages for mining multiple information sources
* Query optimization for distributed data mining
* Distributed data mining operators supporting interactive data mining queries

Paper Types

We solicit two types of papers: research papers and application papers (8 pages for all submissions, inclusive of all references and figures).

Research papers should focus on new designs, algorithms, and solutions for mining multiple information sources, whereas application papers may present frameworks and systems related to real-world multi-source mining applications. Alternatively, authors can submit a data track application paper (2 pages) that solely discusses real-world multi-source data and related research topics. A copy of the multi-source data must be submitted (by email) for verification. If there is any copyright issue related to the submitted data, the authors should clearly state it in the submission.

All papers should be submitted in IEEE proceedings format (two columns). Please follow the IEEE Computer Society Press Proceedings Author Guidelines at http://www.computer.org/portal/pages/cscps/cps/final/icdm06.xml

The workshop proceedings will be published by the IEEE Digital Library and distributed during the workshop.

Important Dates

* July 17, 2009: Submission due date (late submissions can be sent directly to the workshop co-chairs)
* Sept. 8, 2009: Author notification
* Sept. 28, 2009: Submission of camera-ready papers
* Dec. 6, 2009: Workshop in Miami, FL