Deep Learning Based Sound Event Detection and Classification

Monday, March 29, 2021 - 01:00 pm
DISSERTATION DEFENSE

Department of Computer Science and Engineering University of South Carolina

Author : Alireza Nasiri

Advisor : Dr. Jianjun Hu

Date : March 29, 2021

Time : 1:00 - 3:00 pm

Place : Virtual Defense (link below)

Zoom link: https://us02web.zoom.us/j/83125251774?pwd=NDEwK3M4b3NuT0djQ25BMlQ2cGtuZ…

Abstract

The sense of hearing plays an important role in our daily lives. In recent years, there have been many studies aiming to transfer this capability to computers. In this dissertation, we design and implement deep learning based algorithms to improve the ability of computers to recognize different sound events. In the first topic, we investigate sound event detection, which identifies the time boundaries of sound events in addition to the type of each event. For sound event detection, we propose a new method, AudioMask, that benefits from object-detection techniques in computer vision. In this method, we convert the problem of identifying time boundaries of sound events into the problem of identifying objects in images, by treating the spectrograms of the sounds as images. AudioMask first applies Mask R-CNN, an algorithm for detecting objects in images, to the log-scaled mel-spectrograms of the sound files. Then we use a frame-based sound event classifier, trained independently from Mask R-CNN, to analyze each individual frame in the candidate segments. Our experiments show that this approach has promising results and can successfully identify the exact time boundaries of sound events. In the second topic, we present SoundCLR, a supervised contrastive learning based method for effective environmental sound classification with state-of-the-art performance, which works by learning representations that disentangle the samples of each class from those of other classes. We also exploit transfer learning and strong data augmentation to improve the results. Our extensive benchmark experiments show that our hybrid deep network models, trained with a combined contrastive and cross-entropy loss, achieve state-of-the-art performance on the three benchmark datasets ESC-10, ESC-50, and US8K, with validation accuracies of 99.75%, 93.4%, and 86.49%, respectively. The ensemble version of our models also outperforms other top ensemble methods. Finally, we analyze the acoustic emissions generated during the degradation process of SiC composites. The aim here is to identify the state of degradation in the material by classifying its emitted acoustic signals. As our baseline, we use the random forest method on expert-defined features. We also propose a deep neural network of convolutional layers to identify patterns in the raw sound signals. Our experiments show that both of our methods are reliably capable of identifying the degradation state of the composite, and on average, the convolutional model significantly outperforms the random forest technique.
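
As a point of reference for the SoundCLR objective described above, here is a minimal PyTorch sketch of a combined supervised-contrastive and cross-entropy loss of the kind the abstract mentions. It illustrates the general technique only, not SoundCLR's actual implementation; the temperature, the weighting parameter alpha, and the function names are assumptions.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss: for each anchor, same-class samples in
    the batch are positives; all other samples are negatives."""
    z = F.normalize(embeddings, dim=1)          # unit-norm projections
    sim = z @ z.T / temperature                 # pairwise cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # softmax denominator over all pairs except the anchor itself
    exp_sim = torch.exp(sim).masked_fill(self_mask, 0.0)
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))
    # average log-probability of each anchor's positives
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask).sum(dim=1).div(pos_counts).mean()

def hybrid_loss(embeddings, logits, labels, alpha=0.5):
    """Weighted combination of contrastive and cross-entropy terms; the
    alpha weighting here is an assumed value, not SoundCLR's setting."""
    return (alpha * supervised_contrastive_loss(embeddings, labels)
            + (1 - alpha) * F.cross_entropy(logits, labels))
```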

From the Lab to Community: AI for Document Understanding and Public Health

Monday, March 29, 2021 - 11:00 am
Topic: Seminar: Muhammad Rahman Time: Mar 29, 2021 11:00 AM Eastern Time (US and Canada) Join Zoom Meeting https://zoom.us/j/97536411087?pwd=RExTdkVQcEg4OERFMUJhWm5rQThndz09 Title: From the Lab to Community: AI for Document Understanding and Public Health Abstract: Artificial intelligence (AI) has made incredible scientific and technological contributions in many areas, including business, healthcare, and psychology. Owing to its multidisciplinary nature and transformative potential, almost every field has begun to embrace AI. The last decade has witnessed remarkable progress in AI and machine learning and their applications. In this talk, I will present my work using AI and machine learning to solve interesting research challenges. The first part of my talk will describe an AI-powered framework that I have developed for large document understanding. This research contributes methods for modeling and extracting the logical and semantic structure of electronic documents using machine learning techniques. In the second part of my talk, I will present ongoing work that uses computational technology to design a study measuring the effects of COVID-19 on people with substance use disorders. I will conclude the talk by introducing a few other AI-powered initiatives in mental health, substance use, and addiction that I am currently working on. Bio: Dr. Muhammad Rahman is a Postdoctoral Researcher at the National Institutes of Health (NIH). Before that, he was a Postdoctoral Fellow in the Center for Language and Speech Processing (CLSP) research lab at Johns Hopkins University. He obtained his Ph.D. in computer science from the University of Maryland, Baltimore County. His research is at the intersection of artificial intelligence (AI), machine learning, natural language processing, mental health, addiction, and public health. Dr. Rahman's current research focuses mostly on real-world applications of advanced AI and machine learning techniques in addiction, mental health, and behavioral psychology. At NIH, he is working on designing and developing real-time digital intervention techniques to support patients with substance use disorders and mental illness. During his Ph.D., Dr. Rahman worked on large document understanding, automatically identifying the different sections of a document and understanding their purpose within it. He also held research internships at AT&T Labs and eBay Research, where he worked on large-scale industrial research projects. https://irp.drugabuse.gov/staff-members/muhammad-mahbubur-rahman-ph-d/

Sonar sensing algorithm inspired by the auditory system of big brown bats

Friday, March 26, 2021 - 11:00 am
Friday, March 26 at 11 am Zoom Meeting details: https://zoom.us/j/95651622905?pwd=M3IxbGY0WWpBUEJnRE5XRmhnRW91UT09 Echolocating animals rely on biosonar for navigation and foraging, operating on lower energy input yet achieving higher accuracy than engineered sonar. My research focuses on understanding the mechanisms of bat biosonar by simulating different acoustic scenes involving vegetation and identifying the invariants in foliage echoes that provide tree-type information for bats to use as landmarks. Additionally, I have developed a Spectrogram Correlation and Transformation (SCAT) model that simulates the bat's auditory system with a gammatone filterbank, a half-wave rectifier, and a low-pass filter. The SCAT model splits a signal into many parallel frequency channels and maps the acoustic "image" of the target by finding the crossings of a common threshold in each channel. It can estimate the range delay between a sound source and targets, as well as the fine delay between reflecting points within a single target, resolving signal delays as short as a few microseconds. Currently, I am extending the SCAT model with a convolutional neural network for binaural localization of small targets. Bio: Chen Ming, Ph.D., received a BS and an MS in Mechanical Engineering from Hunan University in China. She then moved to the US to study bioacoustics at Virginia Tech, with a focus on foliage echoes in natural environments. After graduation, she joined the Neuroscience department at Brown University as a postdoc, where she has been working on modeling the auditory system of big brown bats and on acoustic scene reconstruction as part of a Multidisciplinary University Research Initiative (MURI) Program to inspire advanced Navy sonar designs. Her long-term research goal is to design sonar for small autonomous aerial vehicles and to incorporate AI for precise sensing. Recently, she was selected as a speaker for the Neuroscience Institute's Rising Star Postdoctoral Seminar Series at the University of Chicago for her research on bioacoustics. Link to my webpage: https://cming8.wixsite.com/mysite
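
For readers curious about the SCAT signal chain described above (filterbank, half-wave rectifier, low-pass filter, threshold crossings), here is a toy numpy/scipy sketch of that pipeline. It is illustrative only: Butterworth band-pass filters stand in for the gammatone filterbank, and the function name, band edges, cutoff, and threshold are assumptions rather than the speaker's actual model.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def scat_like_delays(signal, fs, bands, threshold=0.1):
    """Toy SCAT-style front end: band-pass filterbank, half-wave
    rectification, low-pass smoothing; the first threshold crossing in
    each channel marks the echo arrival time for that frequency band."""
    lowpass = butter(2, 3_000, btype="low", fs=fs, output="sos")
    delays_us = []
    for lo, hi in bands:   # Butterworth bands stand in for gammatone filters
        bandpass = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        ch = sosfilt(bandpass, signal)
        ch = np.maximum(ch, 0.0)                    # half-wave rectifier
        ch = sosfilt(lowpass, ch)                   # low-pass envelope
        idx = np.argmax(ch > threshold * ch.max())  # first threshold crossing
        delays_us.append(idx / fs * 1e6)            # samples -> microseconds
    return delays_us
```

For a bat-like echo sampled at, say, 400 kHz, one might call scat_like_delays(echo, fs=400_000, bands=[(25_000, 35_000), (35_000, 45_000)]).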

Dynamic Learning and Control for Complex Population Systems and Networks

Wednesday, March 24, 2021 - 11:00 am
Systems commonly encountered in diverse scientific domains are complex, highly interconnected, and dynamic. These include processes and mechanisms from biology, quantum science, social science, and other fields that are increasingly studied and analyzed from a systems-theoretic viewpoint. The ability to decode the structural and dynamic information of such dynamical systems from observation (or measurement) data, and the capability to precisely manipulate them, are both essential steps toward enabling their safe and efficient deployment in critical applications. In this talk, I will present some of the emerging learning and control problems associated with dynamic population systems and networks, including data-integrated methods for control synthesis, perturbation-based inference of nonlinear dynamic networks, and moment-based methods for ensemble control. In particular, I will present the bottlenecks associated with these challenging yet critical problems, motivating the need for a synergy between systems and control theory and techniques from artificial intelligence to build novel, mathematically grounded tools that enable systematic solutions. In this context, in the first part of my talk, I will present some recent developments in solving inference problems for decoding the dynamics and connectivity structure of nonlinear dynamical networks. Then, I will present model-agnostic, data-integrated methods for solving optimal control problems associated with complex dynamic population systems such as neural networks and robotic systems. Bio: Vignesh Narayanan (Member, IEEE) received the B.Tech. degree in Electrical and Electronics Engineering from SASTRA University, Thanjavur, India, the M.Tech. degree with specialization in Control Systems from the National Institute of Technology Kurukshetra, Haryana, India, in 2012 and 2014, respectively, and the Ph.D. degree from the Missouri University of Science and Technology, Rolla, MO, USA, in 2017. He joined the Applied Mathematics Lab and the Brain Dynamics and Control Research Group in the Dept. of Electrical and Systems Engineering at Washington University in St. Louis, where he is currently working as a postdoctoral research associate. His current research interests include learning and adaptation in dynamic population systems, complex dynamic networks, reinforcement learning, and computational neuroscience. Wednesday, March 24 at 11 am https://zoom.us/j/95594086334?pwd=dlJvdmhhOENOZE9qY1dhM1g4SmVPUT09 Meeting ID: 955 9408 6334 Passcode: 1928

Towards Machine Learning-Driven Precision Mental Health

Monday, March 22, 2021 - 12:00 pm
Topic: Dr. Wei Wu's Seminar Time: Mar 22, 2021 12:00 PM Eastern Time (US and Canada) Dr. Wei Wu is a candidate for the AI-Neuroscience Faculty position. Abstract: Psychiatric disorders are a major contributor to the global burden of disease, affecting more than 1 billion people worldwide. Current psychiatric diagnoses are defined by constellations of symptoms. However, patients with identical diagnoses may in fact fall into biologically heterogeneous subgroups, each of which may require a different therapy. Yet to date, we still lack validated neurobiological biomarkers that can reliably dissect such heterogeneity and allow us to objectively diagnose and treat psychiatric disorders. In this talk, I will present our recent discoveries of EEG biomarkers for dissecting the biological heterogeneity of psychiatric disorders, enabled by tailored machine learning methods for decoding disease-relevant information from EEG. These biomarkers can also be leveraged to drive therapeutic development using brain stimulation tools. Our findings therefore lay a path toward machine learning-driven personalized treatment of psychiatric disorders and have the potential to be translated to the clinic as point-of-care biological tests. Short Bio: Wei Wu is the Co-Founder and Chief Technology Officer of Alto Neuroscience, Inc., Los Altos, CA. He is also an Instructor affiliated with the Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA. He received the Ph.D. degree in Biomedical Engineering from Tsinghua University, Beijing, China, in 2012. From 2012 to 2016, he was an Associate Professor with the School of Automation Science and Engineering, South China University of Technology, Guangzhou, China. His research interests include computational psychiatry, brain signal processing, neural engineering, and brain stimulation. Dr. Wu is an IEEE Senior Member, an Associate Editor of Neural Processing Letters and Frontiers in Computational Neuroscience, and served as an Associate Editor of Neurocomputing from 2013 to 2019. He is also a member of the IEEE Biomedical Signal Processing Technical Committee. Homepage: https://weiwuneuro.wixsite.com/home Zoom Meeting details: https://zoom.us/j/99931576592?pwd=dGZEQlJ1NzNjeWVXWXd2SDlvQ2ZLQT09

Deep Learning Based Models for Classification: From Natural Language Processing to Computer Vision

Friday, February 12, 2021 - 02:30 pm
Online

DISSERTATION DEFENSE

Department of Computer Science and Engineering University of South Carolina

Author : Xianshan Qu

Advisor : Dr. John Rose

Date : Feb 12, 2021

Time : 02:30 pm

Place : Virtual Defense (link below)

Please use the following link to participate in my defense (scheduled for Friday, Feb 12th, 2:30-4:30 pm EST): https://zoom.us/j/98430235673?pwd=SlNLSU9TOVJ4c29nZHU2cytkTEZHQT09

Abstract

With the availability of large-scale data sets, researchers in many different areas, such as natural language processing, computer vision, and recommender systems, have started making use of deep learning models and have achieved great progress in recent years. In this dissertation, we study three important classification problems based on deep learning models.

First, with the fast growth of e-commerce, more people choose to purchase products online and browse reviews before making decisions, so it is essential to build a model that identifies helpful reviews automatically. Our work is inspired by the observation that a customer's expectation of a review can be greatly affected by the review's sentiment and the degree to which the customer is aware of pertinent product information. To model such customer expectations and capture important information from the review text, we propose a novel neural network that encodes the sentiment of a review through an attention module and introduces a product attention layer that fuses information from both the target product and related products. The results demonstrate that both attention layers contribute to the model's performance, and their combination has a synergistic effect. We also evaluate our model's performance as a recommender system using three commonly used metrics: NDCG@10, Precision@10, and Recall@10. Our model outperforms PRH-Net, a state-of-the-art model, on all three of these metrics.

Second, real-time bidding (RTB), which features per-impression-level real-time ad auctions, has become a popular practice in today's digital advertising industry. In RTB, click-through rate (CTR) prediction is a fundamental problem for ensuring the success of an ad campaign and boosting revenue. We present a dynamic CTR prediction model designed for the Samsung demand-side platform (DSP). We identify two key technical challenges that have not been fully addressed by existing solutions: the dynamic nature of RTB and user information scarcity. To address both challenges, we develop a model that effectively captures the dynamic evolution of both users and ads and integrates auxiliary data sources to better model users' preferences. We evaluate our model using a large amount of data collected from the Samsung advertising platform and compare our method against several state-of-the-art methods that are likely suitable for real-world deployment. The evaluation results demonstrate the effectiveness of our method and its potential for production.

Third, for Highway Performance Monitoring System (HPMS) purposes, the South Carolina Department of Transportation (SCDOT) must provide the Federal Highway Administration (FHWA) with a classification of vehicles. However, due to limited lighting conditions, classifying vehicles at nighttime is quite challenging. To solve this problem, we designed three CNN models with different architectures that operate on thermal images; of these, model 2 achieves the best performance. To avoid overfitting and improve performance further, we propose two training-test methods based on data augmentation. The experimental results demonstrate that the second training-test method further improves the performance of model 2 with regard to both accuracy and F1-score.
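
For reference, the ranking metrics cited above (NDCG@10 and Precision@10) can be computed as in the following self-contained numpy sketch. This is the standard linear-gain formulation of the metrics, not the authors' evaluation code; `relevance` is assumed to hold ground-truth relevance labels in the order the model ranks the reviews.

```python
import numpy as np

def ndcg_at_k(relevance, k=10):
    """NDCG@k with linear gains: DCG of the model's ranking divided by
    the DCG of the ideal (descending-relevance) ranking."""
    rel = np.asarray(relevance, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))   # positions 1..k
    dcg = (rel[:k] * discounts[: rel[:k].size]).sum()
    ideal = np.sort(rel)[::-1][:k]                   # best possible ordering
    idcg = (ideal * discounts[: ideal.size]).sum()
    return dcg / idcg if idcg > 0 else 0.0

def precision_at_k(relevance, k=10):
    """Fraction of the top-k ranked items that are relevant (binary labels)."""
    return np.asarray(relevance, dtype=bool)[:k].sum() / k
```

For example, ndcg_at_k([1, 0, 1, 1, 0, 0, 0, 1, 0, 0]) scores a ranking of reviews with binary helpfulness labels.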

Learning How to Search: Generating Effective Test Cases Through Adaptive Fitness Function Selection

Monday, December 21, 2020 - 10:00 am
DISSERTATION DEFENSE

Department of Computer Science and Engineering University of South Carolina

Author : Hussien Almulla

Advisor : Dr. Gregory Gay

Date : Dec 21, 2020

Time : 10:00 am

Place : Virtual Defense

Abstract

Search-based test generation is guided by feedback from one or more fitness functions: scoring functions that judge solution optimality. Choosing informative fitness functions is crucial to meeting the goals of a tester. Unfortunately, many goals, such as forcing the class-under-test to throw exceptions, increasing test suite diversity, and attaining Strong Mutation Coverage, do not have effective fitness function formulations. We propose that meeting such goals requires treating fitness function identification as a secondary optimization step. An adaptive algorithm that can vary its selection of fitness functions throughout the generation process could adjust that selection, based on the current population of test suites, to maximize goal attainment. To test this hypothesis, we have implemented two reinforcement learning algorithms in the EvoSuite framework and used them to dynamically set the fitness functions used during generation for the three goals identified above. We have evaluated our framework, EvoSuiteFIT, on a set of real Java faults. The EvoSuiteFIT techniques attain significant improvements for two of the three goals and show small improvements on the third when the number of generations of evolution is fixed. For all goals, EvoSuiteFIT detects faults missed by the other techniques. The ability to adjust fitness functions allows EvoSuiteFIT to make strategic choices that efficiently produce more effective test suites, and examining its choices offers insight into how to attain our testing goals. We find that adaptive fitness function selection (AFFS) is a powerful technique to apply when an effective fitness function does not exist for a testing goal.
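
EvoSuiteFIT itself is built on the Java-based EvoSuite framework; as a language-agnostic illustration of the core idea, the sketch below treats fitness function selection as an epsilon-greedy multi-armed bandit. The reward signal, epsilon value, and function names are assumptions for illustration, not the dissertation's actual algorithm.

```python
import random

def epsilon_greedy_affs(fitness_fns, run_generation, rounds=100, epsilon=0.2):
    """Sketch of adaptive fitness function selection as a multi-armed
    bandit: each round picks one fitness function, runs a generation of
    search with it, and credits it with the observed gain toward the
    testing goal (e.g., mutation score or exceptions triggered)."""
    totals = {fn: 0.0 for fn in fitness_fns}    # cumulative reward per arm
    counts = {fn: 0 for fn in fitness_fns}      # times each arm was chosen
    for _ in range(rounds):
        if random.random() < epsilon:           # explore a random arm
            fn = random.choice(fitness_fns)
        else:                                   # exploit the best estimate
            fn = max(fitness_fns, key=lambda f:
                     totals[f] / counts[f] if counts[f] else float("inf"))
        totals[fn] += run_generation(fn)        # reward = observed gain
        counts[fn] += 1
    return max(fitness_fns, key=lambda f: totals[f] / max(counts[f], 1))
```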

Internet of Acoustic Things (IoAT): Challenges, Opportunities, and Threats

Friday, November 20, 2020 - 02:20 pm
We have another exciting talk this week as part of CSCE 791, and I invite you to join. Please encourage your students to join as well via the link below: https://us.bbcollab.com/guest/e51c76fda53042b790ea62ff2d7b2895 On Friday, 11/20/2020, from 2:20 pm to 3:10 pm EST, we have a talk from Nirupam Roy, Assistant Professor, UMD College Park. Abstract: The recent proliferation of acoustic devices, ranging from voice assistants to wearable health monitors, is leading to a sensing ecosystem around us -- referred to as the Internet of Acoustic Things, or IoAT. My research focuses on developing hardware-software building blocks that enable new capabilities for this emerging future. In this talk, I will sample some of my projects. For instance, (1) I will demonstrate carefully designed sounds that are completely inaudible to humans but recordable by all microphones. (2) I will discuss our work with physical vibrations from mobile devices, and how they conduct through finger bones to enable new modalities of short-range, human-centric communication. (3) Finally, I will draw attention to the various acoustic leakages and threats that come with sensor-rich environments. I will conclude this talk with a glimpse of my ongoing and future projects targeting a stronger convergence of sensing, computing, and communications in tomorrow's IoT, cyber-physical systems, and healthcare technologies. Bio: Nirupam Roy is an Assistant Professor in Computer Science at the University of Maryland, College Park (UMD). He received his Ph.D. from the University of Illinois, Urbana-Champaign (UIUC) in 2018. His research interests are in wireless networking, mobile computing, and embedded systems, with applications to IoT, cyber-physical systems, and security. His recent projects include low-power sensing techniques to enable self-defense in robots and drones. His doctoral thesis was selected for the 2019 CSL Ph.D. thesis award at UIUC. Nirupam is the recipient of the Valkenburg graduate research award, the Lalit Bahl fellowship, and outstanding thesis awards from both his Bachelor's and Master's institutes. His research received the MobiSys best paper award and was selected for the ACM SIGMOBILE research highlights. Many of his research projects have been featured in news media such as the MIT Technology Review, The Telegraph, and The Huffington Post. https://www.cs.umd.edu/~nirupam/ I hope to see you all on Friday at Blackboard Collaborate.
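
As a rough illustration of how a sound can be inaudible to humans yet recordable by microphones, the numpy sketch below models a microphone's slight nonlinearity with a quadratic term: two ultrasonic tones intermodulate into an audible-band difference frequency. This simplified model is an assumption for illustration and is not necessarily the mechanism in the speaker's work.

```python
import numpy as np

fs = 192_000                     # sampling rate high enough for ultrasound
t = np.arange(0, 0.1, 1 / fs)    # 100 ms of signal
f1, f2 = 40_000, 41_000          # both tones far above human hearing
emitted = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# First-order model of a microphone's slight nonlinearity: the quadratic
# term creates an intermodulation product at f2 - f1 = 1 kHz.
recorded = emitted + 0.1 * emitted ** 2

spectrum = np.abs(np.fft.rfft(recorded))
freqs = np.fft.rfftfreq(recorded.size, 1 / fs)
band = (freqs > 100) & (freqs < 20_000)        # audible band, excluding DC
peak = freqs[band][np.argmax(spectrum[band])]
print(f"strongest audible-band component: {peak:.0f} Hz")   # ~1000 Hz
```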

Wireless and Mobile Sensing problems in IoT: Sports, Drones, and Material Sensing

Friday, November 13, 2020 - 02:20 pm
We have another exciting talk this week as part of CSCE 791, and I invite you to join. Please encourage your students to join as well via the link below: https://us.bbcollab.com/guest/e51c76fda53042b790ea62ff2d7b2895 On Friday, 11/13/2020, from 2:20 pm to 3:10 pm EST, we have a talk from Mahanth Gowda, Assistant Professor, Pennsylvania State University. Title: Wireless and Mobile Sensing problems in IoT: Sports, Drones, and Material Sensing Abstract: Motion tracking and RF sensing are broad areas with classical problems dating back many decades. While significant advances have come from robotics, control systems, and signal processing, the emergence of mobile and IoT devices is ushering in a new age of embedded, human-centric applications. Fitbit is a simple example that has rapidly mobilized proactive healthcare; medical rehabilitation centers are utilizing wearable devices for injury diagnosis and prediction. In this talk, I will discuss a variety of (new and old) IoT applications that present unique challenges at the intersection of mobility, multi-modal sensing, and indirect inference. For instance, I will discuss how inertial sensors embedded in balls, racquets, and shoes can be harnessed to deliver real-time sports analytics on your phone. In a separate application, I will show how GPS signals can be utilized to track the 3D orientation of an aggressively flying drone, ultimately delivering much-needed reliability against crashes. Finally, I will discuss sensing liquid materials by passing WiFi-like signals through containers holding liquids. In general, I hope to show that information fusion across wireless signals, sensors, and physical models can deliver motion-related insights useful to a range of applications in IoT, healthcare, and cyber-physical systems. Bio: Mahanth Gowda is an Assistant Professor in Computer Science and Engineering at Penn State. His research interests include wireless networking, mobile sensing, and wearable computing, with applications to IoT, cyber-physical systems, and human gesture recognition. He has published across diverse research forums, including NSDI, MobiCom, WWW, Infocom, HotNets, ASPLOS, etc. http://www.cse.psu.edu/~mkg31/ I hope to see you all on Friday at Blackboard Collaborate.

Infusing External Knowledge into Natural Language Inference Models

Monday, November 9, 2020 - 04:00 pm
Date and time: Monday, Nov 9, 2020; 4:00-5:00 pm Abstract: Natural Language Inference is a fundamental task in natural language processing, particularly because it evaluates the reasoning ability of models. Most approaches to the problem use only the textual content present in the training data, and the use of knowledge graphs for Natural Language Inference remains underexplored. In this presentation, I will detail two novel approaches that harness ConceptNet as a knowledge base for Natural Language Inference. The framework underlying both approaches involves selecting relevant information from the knowledge graph and encoding it to augment text-based models. The first approach selects concepts mentioned in the text and shows how knowledge graph embeddings of these concepts can augment the text-based embeddings. However, it ignores the primary issue of noise from the knowledge graph when selecting relevant concepts. The second approach builds upon the first by mitigating this noise and using graph convolutional networks to consider not only the concepts mentioned in the text but also their neighborhood structure. Overall, we show that knowledge graphs can augment existing text-based NLI models while making them more robust than models that use text alone. Bio: Pavan Kapanipathi is a Research Staff Member in the AI-foundations reasoning group at IBM Research. He is broadly interested in Knowledge Graphs, Semantic Web, Reasoning, and Natural Language Processing. He graduated with a PhD from Wright State University in 2016. Pavan had a winning entry in the open track of the Triplification Challenge at I-Semantics and received a best paper award at the MRQA workshop at ACL 2018. He has served as a Program Committee member of prominent AI, NLP, and Web conferences. Blackboard link: https://us.bbcollab.com/guest/4bae3374fe194ee0a0fd2ef232d48aec
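
To make the graph-convolution step concrete, here is a minimal PyTorch sketch of a single graph-convolution layer (the row-normalized mean-aggregator variant) that updates each concept's embedding from its knowledge-graph neighborhood. The talk's actual models may differ, for example by using relational variants over typed ConceptNet edges; the class name and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer: each node's embedding is updated from
    the features of its neighbors (and itself, via self-loops)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        # adj: (n, n) 0/1 adjacency matrix that includes self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees
        h = (adj / deg) @ node_feats                     # mean over neighbors
        return torch.relu(self.linear(h))
```

For instance, GCNLayer(300, 128) would map 300-dimensional concept embeddings to 128-dimensional neighborhood-aware representations that can then be fused with the text encoder's output.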