
Research Projects

1. SLIC (Semantically Linked Instructional Content)

The SLIC (Semantically Linked Instructional Content) project aims to help students and scholars efficiently browse and find segments of interest in educational videos of lectures and talks. In particular, it focuses on lectures that use slides, where the content of the slide file gives valuable hints as to how to break the video into meaningful parts (segments) and how to let students access these segments. In this way, a student who is looking for a specific topic in one or more lecture videos can first find the relevant slide(s) and then start watching the video from the segment(s) where those slides were used. Using similar ideas, the system has the potential to significantly improve the understandability of the video, improve its quality, and increase the overall effectiveness of the learning process. Additionally, the system shows promise for helping students with disabilities and bilingual students access the video.

We have developed a fully automatic and robust slide-to-video matching algorithm that can handle a variety of videos captured by one or multiple PTZ cameras. Our approach uses SIFT keypoints to match frames to slides under a homography constraint, using RANSAC. We propose a multi-phase algorithm to achieve both high accuracy and efficiency. To further improve accuracy, we integrate the visual features with temporal information and camera cues into dynamic Hidden Markov Models to find an optimal slide sequence for the frame sequence.
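As a rough illustration of the last step, the sketch below runs standard Viterbi decoding over a matrix of per-frame slide scores. It is not the project's model: the function names, the input format (`emission_logp[t][s]` as a log-score that frame `t` shows slide `s`, for example derived from RANSAC inlier counts), and the transition weights are all assumptions chosen for illustration, and camera cues are left out.

```python
import math

def log_trans(prev, cur):
    """Illustrative, unnormalized transition weights (assumed values)."""
    if cur == prev:
        return math.log(0.90)   # lecturer usually stays on the same slide
    if cur == prev + 1:
        return math.log(0.08)   # advancing to the next slide is common
    return math.log(0.02)       # any other jump is rare

def viterbi_slides(emission_logp):
    """Return the most likely slide index for every frame."""
    n_slides = len(emission_logp[0])
    score = list(emission_logp[0])          # uniform prior over slides
    back = []                               # back-pointers, one list per frame
    for frame in emission_logp[1:]:
        prev_best, new_score = [], []
        for cur in range(n_slides):
            best = max(range(n_slides),
                       key=lambda p: score[p] + log_trans(p, cur))
            prev_best.append(best)
            new_score.append(score[best] + log_trans(best, cur) + frame[cur])
        back.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_slides), key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(back):
        state = prev_best[state]
        path.append(state)
    path.reverse()
    return path
```

With weights like these, a single frame whose visual match weakly favors an unrelated slide is overruled by its temporal neighbors, which is the intuition behind combining visual and temporal information.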

More details about the project and a demo can be found on the project webpage.

2. Ambiguity Correspondence Reduction in Loosely Labeled Data

Obtaining labeled data in large quantities for model learning is tedious and time-consuming. In contrast, loosely labeled data, where data items are associated with sets of plausible labels, are becoming increasingly available, for example from Corel and Flickr. An interesting question is whether we can turn loosely labeled data into more tightly labeled data without help from any supervisory data. One clear advantage of doing so is that the improved data, with correspondence ambiguity reduced, would allow us to directly apply up-to-date supervised learning algorithms to various classification and recognition tasks.

In this project, we focus on the domain of image annotation and region labeling. We explore the idea of exclusion reasoning to turn the keywords of Corel images into region labels, i.e., we associate a keyword with each region/segment in an image.

3. Evaluation of Localized Semantics In Images

In this project, we create a new data set of 1014 images with manual segmentations and semantic labels for each segment, and present a methodology for using this kind of data for recognition evaluation. The evaluation methodology establishes protocols for mapping machine segmentation to human segmentation, scoring matches at different levels of specificity, and taking synonyms, sense ambiguity, and multiple labels into account. Based on these protocols, we develop two evaluation approaches for measuring the range and the frequency of semantics that an algorithm can recognize correctly.
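To make the first protocol step concrete, here is a minimal sketch of one plausible way to map machine segments to human segments: assign each machine segment to the human segment it shares the most pixels with. The majority-overlap rule, the function name, and the representation of segmentations as 2D lists of segment ids are assumptions for illustration, not necessarily the protocol used in the project.

```python
from collections import Counter, defaultdict

def map_segments(machine, human):
    """Map each machine segment id to the human segment id it overlaps most.

    `machine` and `human` are same-sized 2D lists holding one segment id
    per pixel (a hypothetical representation chosen for this sketch).
    """
    overlap = defaultdict(Counter)
    for m_row, h_row in zip(machine, human):
        for m_id, h_id in zip(m_row, h_row):
            overlap[m_id][h_id] += 1        # count shared pixels
    return {m_id: counts.most_common(1)[0][0]
            for m_id, counts in overlap.items()}

machine = [[0, 0, 1],
           [0, 1, 1]]
human = [["sky", "sky", "sky"],
         ["grass", "grass", "grass"]]
mapping = map_segments(machine, human)      # {0: "sky", 1: "grass"}
```

Once each machine segment is tied to a human segment, the predicted label can be scored against that segment's manual labels.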


More details such as data and software can be found on the project webpage.

3. Continuous Dynamic Time Warping

While best known from speech recognition, Dynamic Time Warping (DTW) is a ubiquitous measure across different domains. For example, it can be applied directly to curve matching. One drawback of this measure is that it is defined between sequences of points rather than between curves. In this work, we generalize discrete DTW to continuous domains and are the first to propose efficient polynomial-time algorithms to compute continuous DTW.

We reformulate the problem, which leads to an interesting connection with finding shortest paths in a combinatorial manifold constructed on the input chains. This special manifold also yields a faster approximate algorithm based on dynamic time warping. We demonstrate the quality of this measure in signature verification.

The top image shows the matching computed with discrete DTW; the bottom one shows the matching from continuous DTW.
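For reference, the discrete DTW that this work generalizes can be sketched with the textbook dynamic program below. This is the standard point-sequence formulation, not the continuous algorithm described above; the function name and the choice of Euclidean distance as the local cost are assumptions.

```python
import math

def dtw_distance(a, b):
    """Discrete DTW cost between two sequences of points (tuples of floats)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])       # local matching cost
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# The same straight segment, sampled with two points and with three points.
coarse = [(0.0, 0.0), (2.0, 0.0)]
fine = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sampling_gap = dtw_distance(coarse, fine)
```

Here `sampling_gap` is 1.0 even though both sequences sample the same curve, which illustrates the dependence on sampling that motivates a continuous definition.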

4. Hardware-assisted Natural Neighbor Interpolation

Natural Neighbor Interpolation (NNI) is a popular interpolation method based on Voronoi tessellation. However, it is computationally expensive. In this project, we explore the power of the Graphics Processing Unit (GPU) to speed up NNI. Unlike traditional software-based approaches that process one query at a time, we develop a scheme that computes the entire scalar field induced by NNI, at which point a query is a trivial array lookup, and range queries over the field are easy to compute. We also present a simple scheme that requires no advanced graphics capabilities and can process NNI queries faster than existing software-based approaches. The precision loss caused by the bounded size of graphics frame buffers is also considered.
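The idea of computing the whole field at once can be illustrated on the CPU with a discrete form of Sibson's area-stealing weights. The sketch below is not the GPU scheme from this project: it is a brute-force pure-Python approximation on a small pixel grid, and the function name, the integer-grid site format, and the tie handling are assumptions. Each grid cell finds its nearest site, then contributes that site's value to every query location that would "steal" the cell, that is, every location at least as close to the cell as its nearest site.

```python
import math

def nni_field(sites, width, height):
    """Approximate the NNI scalar field on a width x height grid.

    `sites` is a list of (x, y, value) with integer grid coordinates.
    Returns a list of rows; field[y][x] is the interpolated value.
    """
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for cy in range(height):
        for cx in range(width):
            # Discrete Voronoi step: nearest site and its squared distance.
            d2, value = min(((cx - sx) ** 2 + (cy - sy) ** 2, v)
                            for sx, sy, v in sites)
            r = math.isqrt(d2)
            # Scatter this cell's value to every query point inside the disc.
            for qy in range(max(0, cy - r), min(height, cy + r + 1)):
                for qx in range(max(0, cx - r), min(width, cx + r + 1)):
                    if (qx - cx) ** 2 + (qy - cy) ** 2 <= d2:
                        acc[qy][qx] += value
                        cnt[qy][qx] += 1
    # Every cell steals at least itself, so cnt is never zero.
    return [[acc[y][x] / cnt[y][x] for x in range(width)]
            for y in range(height)]

sites = [(0, 0, 1.0), (7, 0, 3.0), (0, 7, 5.0), (7, 7, 7.0)]
field = nni_field(sites, 8, 8)
center_value = field[3][3]   # a query is now a plain array lookup
```

Because every grid cell is processed independently and the accumulation is a sum, this scatter pattern maps naturally onto rasterization and blending hardware, whereas a per-query software approach repeats the Voronoi work for each query.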

5. Grounded Emotion Modeling

This work involves modeling the interplay of emotional components during conversation. Initial work focuses on predicting emotional experience from facial behavior and physiological responses. Quantitative models of emotion processes are desirable for informing psychological theory and for HCI applications, such as on-line education systems and robots, that need to interact sensibly with humans.

Human emotion is a complex multi-component system that includes emotional experience, behaviors, and physiological responses. Prior work has mainly focused on recognizing facial expressions posed by actors who may or may not experience any particular emotion while doing so. By contrast, we have an interesting psychology data set collected during an emotional conversation between two women who have just met. It includes facial/vocal behaviors captured with video cameras and physiological measurements as well as a self-reported quantitative measure of feelings that are spontaneously induced in conversation. In this study, we are developing models for statistically linking relevant facial behaviors, physiological responses, and emotional experience (within and between partners) in this noisy data set.

We use an Active-Appearance-Model-based approach to track faces under large pose changes and self-occlusion. A simple but effective method is then applied to extract pose-invariant facial features. We then link facial and physiological features to emotional experience using a statistical model. Our preliminary results suggest that physiological responses are more reliable cues for predicting self-reported positive/negative experience, and that facial behaviors can differ significantly from experience if one attempts to suppress emotional expression (one of the manipulations in the experiment).