COLLOQUIUM
Department of Computer Science and Engineering
University of South Carolina

Topic Detection and Tracking Using Nonnegative Matrix and Tensor Factorizations

Michael W. Berry
Department of Electrical Engineering and Computer Science
University of Tennessee

Date: January 24, 2008
Time: 1400-1500
Place: 300 Main B103

Abstract

Automated approaches for the identification and clustering of semantic features or topics are highly desired for text mining applications. Using a low rank non-negative matrix factorization (NNMF) algorithm to retain natural data non-negativity, we eliminate the need to use subtractive basis vector and encoding calculations present in techniques such as principal component analysis for semantic feature abstraction. Using non-negative tensor factorization (NNTF), temporal and semantic proximity can be exploited to enable tracking of focused discussions as well as latent (unknown) communication patterns. Demonstrations of NNMF and NNTF algorithms for topic (or discussion) detection and tracking using the Enron Email Collection and documents from the Aviation Safety Reporting System (ASRS) are provided.

Michael W. Berry holds the title of Full Professor and Coordinator for Computer Science in the newly formed Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville. He received the BS degree in Mathematics from the University of Georgia in 1981 and the MS degree in Applied Mathematics from North Carolina State University in 1983. He worked in the Communications Product Division of IBM in Raleigh, NC for about one year before accepting a research staff position in the Center for Supercomputing Research and Development at the University of Illinois at Urbana-Champaign. In 1990, he received the PhD degree in Computer Science from the University of Illinois at Urbana-Champaign.

Prof. Berry is the co-author of Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods (SIAM, 1994) and Understanding Search Engines: Mathematical Modeling and Text Retrieval, Second Edition (SIAM, 2005) and editor of Computational Information Retrieval (SIAM, 2001), Survey of Text Mining: Clustering, Classification, and Retrieval (Springer, 2003, 2007), and Lecture Notes in Data Mining (World Scientific, 2006). He has authored over 100 refereed journal and conference publications. His research interests include information retrieval, data and text mining, computational ecology, bioinformatics, and parallel computing. Prof. Berry is currently supported by grants and contracts from the National Science Foundation, the National Institutes of Health, and the National Aeronautics and Space Administration.