T-1: High-resolution Depth Map Generation for Free-viewpoint 3DTV Services Speaker: Yo-Sung Ho (Gwangju Institute of Science and Technology) Venue: Room 308
T-2: Audio Signal Feature Extraction: Methods and Applications Speaker: Sri Krishnan (Ryerson University) Venue: Room 309
T-3: Distributed Video Coding: Theory, Practice, and New Promises Speaker: Wenjun Zeng (University of Missouri) Venue: Room 310
T-4: Multimedia Signal Processing on Platforms with Multiple Cores Speaker: Yen-Kuang Chen (Intel Corporation) Venue: Room 308
T-5: Video Quality Assessment of 2D and 3D Video Speaker: Anil Fernando (University of Surrey) Venue: Room 309
T-6: Sensor based Human Activity Recognition Speakers: Narayanan C. Krishnan and Sethuraman Panchanathan (Arizona State University) Venue: Room 310
In recent years, various multimedia services have become available and the demand for three-dimensional television (3DTV) is growing rapidly. Since 3DTV is considered the next-generation broadcasting service that can deliver realistic and immersive experiences by supporting user-friendly interactions, a number of advanced three-dimensional video technologies have been studied. Among them, multi-view video coding (MVC) is the key technology for various applications including free-viewpoint video (FVV), free-viewpoint television (FVT), 3DTV, immersive teleconferencing, and surveillance systems. In order to support free-viewpoint video services, we need to obtain accurate depth information of the 3D scene.
In this tutorial lecture, after reviewing the current status of 3DTV research activities, we are going to explain several approaches for obtaining depth information of the 3D scene. Generally speaking, they can be classified into two major categories: passive and active methods. Although the stereo matching algorithm is widely employed among passive methods, there are still some problems remaining to be solved. The most popular active method is using the time-of-flight active sensor to obtain the depth information of the scene in real-time. However, TOF depth cameras also have some problems to overcome. In this tutorial lecture, we are going to discuss various problems of the conventional methods and introduce a new method to generate high-resolution depth information of the 3D scene for free-viewpoint 3DTV services by combining both the active and passive approaches.
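As a rough illustration of the two categories named above, the sketch below computes depth from a stereo disparity (passive) and from a time-of-flight round-trip time (active). It assumes a rectified stereo pair and an idealized TOF sensor; the function and parameter names are hypothetical and are not taken from the tutorial itself.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Passive (stereo) depth: Z = f * B / d for a rectified camera pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


def depth_from_time_of_flight(round_trip_s):
    """Active (TOF) depth: the light travels to the scene and back."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant objects, which is one reason stereo and TOF measurements are often treated as complementary.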
Yo-Sung Ho received the B.S. and M.S. degrees in electronic engineering from Seoul National University, Seoul, Korea, in 1981 and 1983, respectively, and the Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in 1990. He joined ETRI (Electronics and Telecommunications Research Institute), Daejon, Korea, in 1983. From 1990 to 1993, he was with Philips Laboratories, Briarcliff Manor, New York, where he was involved in development of the Advanced Digital High-Definition Television (AD-HDTV) system. In 1993, he rejoined the technical staff of ETRI and was involved in development of the Korean DBS Digital Television and High-Definition Television systems. Since September 1995, he has been with Gwangju Institute of Science and Technology (GIST), where he is currently Professor of Information and Communications Department. Since August 2003, he has been Director of Realistic Broadcasting Research Center (RBRC) at GIST in Korea. He gave several tutorial lectures at various international conferences, including the IEEE Region Ten Conference (TenCon) in 1999 and 2000, the Pacific-Rim Conference on Multimedia (PCM) in 2006, 2007 and 2008, the IEEE Pacific-Rim Symposium on Image and Video Technology (PSIVT) in 2006 and 2007, the 3DTV Conference in 2008, and the IEEE International Conference on Image Processing (ICIP) in 2009. He is presently serving as an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT). His research interests include Digital Image and Video Coding, Image Analysis and Image Restoration, Three-dimensional Image Modeling and Representation, Advanced Source Coding Techniques, Three-dimensional Television (3DTV) and Realistic Broadcasting Technologies.
Audio signals are important sources of information in understanding the content of multimedia, and audio signal features play an essential role in many key applications such as content-based retrieval, audio fingerprinting, auditory scene analysis, audio-based biometrics and auditory display. The first step in all audio signal analysis methods is to extract low-level representative and discriminative features from an audio clip. Over the last several years, several feature extraction techniques have been introduced. In general, all feature extraction methods use one of three signal representation domains: temporal, spectral, or joint time-frequency (TF). The tutorial will cover the main techniques associated with each of the three domains.
Temporal domain feature extraction approaches such as signal energy, pitch, zero-crossing rate and entropy modulation will be discussed first. Spectral feature analysis methods such as spectral roll-off point, spectral centroid, mean frequency, cepstral coefficients, and high and low frequency slopes will be covered in some detail. Advantages and challenges associated with audio feature extraction from temporal and spectral domains will be discussed. Spectral features generally assume the stationarity of the signal in the analysis frame, and do not provide any information on the temporal evolution or localization of the extracted features. As a result, spectral features alone are not enough for audio analysis. Because of the non-stationary nature of audio signals, especially artificially created sounds such as music, and the shortcomings of the temporal and spectral features, there have been some recent attempts to derive joint TF features. In contrast to the two previous methods, TF features are effective for revealing non-stationary aspects of signals such as trends, discontinuities, repeated patterns and long-term feature representation where other signal processing approaches fail or are not as effective. The tutorial will cover the recent advancements in feature extraction using TF methods based on adaptive signal representations such as pursuit-based and empirical mode decompositions. The applications of these methodologies to areas such as content-based retrieval, audio fingerprinting/watermarking, auditory scene analysis, audio-based biometrics and auditory display will be discussed with some new results.
Sridhar (Sri) Krishnan received the B.E. degree in Electronics and Communication Engineering from Anna University, Madras, India, in 1993, and the M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Calgary, Calgary, Alberta, Canada, in 1996 and 1999, respectively. He joined Ryerson University, Toronto, Canada, in 1999 and has held a Canada Research Chair since October 2007. Sri Krishnan is a recipient of many national and provincial awards including the 2007 Young Engineer Achievement Award from Engineers Canada; 2006 South Asian Community Achiever Award; 2006 New Pioneers Award in Science and Technology; 2006 Best IEEE Chapter Chair Award (Toronto Section); and 2005 Research Excellence Award from the Faculty of Engineering, Ryerson University.
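A few of the temporal and spectral features named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tutorial's implementation: the function names, the naive DFT, and the 85% roll-off fraction are all choices made here for the example.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (temporal)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame (temporal)."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz (spectral)."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def spectral_rolloff(frame, sample_rate, fraction=0.85):
    """Lowest frequency below which `fraction` of the spectral energy lies."""
    energy = [m * m for m in magnitude_spectrum(frame)]
    threshold = fraction * sum(energy)
    cumulative = 0.0
    for k, e in enumerate(energy):
        cumulative += e
        if cumulative >= threshold:
            return k * sample_rate / len(frame)
    return sample_rate / 2.0
```

Each function looks at a single frame in isolation, which matches the stationarity assumption noted above: none of them says where within the frame a given frequency component occurs.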
Distributed video coding (DVC) is a new coding paradigm that shifts the burden of exploiting the video source redundancy to the decoder and provides better error resilience. It is suitable for many emerging applications where constraints on battery power, processing capability, and/or bandwidth for moving data around (for centralized coding) are significant concerns. Example scenarios include wireless sensor networking, "uplink" wireless video communication, extremely high rate and high dynamic range imaging and compression, multi-view 3D video acquisition and compression, etc.
Despite the promising information theoretic results of distributed source coding (DSC), the performance of practical DVC systems has been shown to exhibit a significant gap from that of conventional video coding systems such as H.264. As a result, the excitement and enthusiasm about the great promise of DSC/DVC seem to have started to fade, and abundant confusion has surfaced in recent years. What is the fate of DVC?
The primary purpose of this tutorial is to provide the participants with a comprehensive and balanced coverage of the theoretical foundation and architectures of DVC, discuss the significant challenges that have prevented most existing DVC schemes from performing anywhere near as well as conventional video coding, present and advocate a framework that explores the important concept of decoder-progressive-learning in DVC, and address both theoretical analysis and practical design of progressive-learning-based DVC schemes. With the progressive approach, the decoder can learn from the already decoded partial source data to refine the estimation of the side information and the source correlation models, which in turn would significantly improve the coding performance of the remaining majority of the source data. The lectures will also demonstrate the promise of progressive-learning-based DVC in a number of applications, including efficient low-complexity video encoding, efficient compression of encrypted images/videos, power-efficient video communications in wireless sensor networks, and efficient distributed multi-view video coding. Trends, new promises and future directions will be discussed.
Wenjun Zeng is an Associate Professor with the Computer Science Department of University of Missouri, Columbia, MO. He received his B.E., M.S., and Ph.D. degrees from Tsinghua University, China, in 1990, the University of Notre Dame in 1993, and Princeton University in 1997, respectively, all in electrical engineering. His current research interests include multimedia communications and networking, distributed source and video coding, and content and network security. A number of his papers have been widely cited.
Prior to joining Univ. of Missouri in 2003, he had worked for PacketVideo Corp., Sharp Labs of America, Bell Labs, and Matsushita Info. Tech. Lab of Panasonic. He has also consulted with Microsoft Research, Huawei Technologies, and a couple of start-up companies. From 1998 to 2002, he was an active contributor to the MPEG4 Intellectual Property Management & Protection standard and the JPEG 2000 image coding standard, where four of his proposals were adopted. He is an Associate Editor of the IEEE Trans. on Info. Forensics & Security, and of IEEE Multimedia Magazine, and is on the Steering Committee of IEEE Trans. on Multimedia, of which he also served as an Associate Editor from 2005 to 2008. He is serving as the Steering Committee Chair of IEEE Inter. Conf. Multimedia and Expo (ICME), and has served as the TPC Vice-Chair of ICME 2009, the TPC Chair of the 2007 IEEE Consumer Communications and Networking Conference, and the TPC Co-Chair of the Multimedia Comm. and Home Networking Sym. of the 2005 IEEE Inter. Conf. Communications. He was a Guest Editor of the Proceedings of the IEEE's Special Issue on Recent Advances in Distributed Multimedia Communications published in January 2008 and the Lead Guest Editor of IEEE Trans. on Multimedia's Special Issue on Streaming Media published in April 2004.
Platforms with multiple cores are now prevalent everywhere from desktops and graphics processors to laptops and embedded systems. By adding more parallel computational resources while managing power consumption, multi-core platforms offer better programmability, performance, and power efficiency. Multimedia signal processing systems of tomorrow will be and must be implemented on platforms with multiple cores. Writing efficient parallel applications that utilize the computing capability of many processing cores requires some effort. Algorithm designers must understand the nuances of a multi-core computing engine; only then can the tremendous computing power that such platforms provide be harnessed efficiently. The goal of this tutorial is to bring awareness of the trends and challenges of the many-core era to the development of multimedia applications. It will answer the following questions:
The tutorial will cover (1) the overview and high-level introduction of the multi-core CPUs and GPUs, (2) the basic principles for algorithm designs for thread-level parallelism, data-level parallelism, and cache localities on the multi-core processors, and (3) some design examples of multimedia applications on the multi-core platforms.
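The data-level decomposition mentioned in item (2) can be sketched as follows. This is only an illustration of the partitioning pattern, with hypothetical function names and a toy brightening kernel chosen here. It uses Python threads, and in CPython the global interpreter lock means pure-Python work like this will not actually run faster in parallel; a real implementation would use native threads, SIMD, or a GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_rows(rows, offset):
    """Per-pixel kernel: add an offset and clamp to the 8-bit maximum."""
    return [[min(255, pixel + offset) for pixel in row] for row in rows]


def split_into_chunks(rows, num_chunks):
    """Split rows into contiguous chunks so each worker touches adjacent data."""
    size = max(1, -(-len(rows) // num_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def brighten_parallel(image, offset, num_workers=4):
    """Apply the same kernel to independent row chunks, one chunk per task."""
    chunks = split_into_chunks(image, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda chunk: brighten_rows(chunk, offset), chunks))
    # pool.map preserves input order, so the chunks can be concatenated directly.
    return [row for chunk in results for row in chunk]
```

Keeping each chunk contiguous is a simple nod to cache locality: a worker reads and writes neighboring rows rather than rows scattered across the image.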
Yen-Kuang Chen received the Ph.D. from Princeton University and is a Principal Engineer at Intel Corporation. His research interests include developing innovative multimedia applications, analyzing the performance bottleneck in current computers, and designing next generation microprocessor/platform with many cores. In particular, he is analyzing the emerging multimedia applications and providing inputs to the definition of the next-generation CPUs and GPUs with many cores. He has 20+ US patents, 25+ pending patent applications, and 85+ technical publications.
He is an associate editor of the Journal of Signal Processing Systems (including a special issue on "Multi-core Enabled Multimedia Applications & Architectures"), of IEEE Transactions on Circuits and Systems for Video Technology (including a special issue on "Algorithm/Architectures Co-Exploration of Visual Computing"), of IEEE Transactions on Multimedia, and of IEEE Transactions on Circuits and Systems I. He is a guest editor of the special issue on "Signal Processing on Platforms with Multiple Cores: Part 1 -- Overview and Methodology" and the special issue on "Signal Processing on Platforms with Multiple Cores: Part 2 -- Design and Applications" for IEEE Signal Processing Magazine. He is a member of Multimedia Signal Processing TC, IEEE Signal Processing Society, Design and Implementation of Signal Processing Systems TC, IEEE Signal Processing Society, Multimedia Systems and Applications TC, IEEE Circuits and Systems Society, and Visual Signal Processing and Communications TC, IEEE Circuits and Systems Society. He has served as a program committee member of 35+ international conferences and workshops.
Currently, he is working to bring awareness of the trends and challenges of the many-core era to the development of multimedia applications. In ICME 2007 and 2008, he organized the special sessions on "Multi-Core Enabled Multimedia Applications and Standards" and "Multimedia Signal Processing on Graphics Processors with Hundreds of Cores." In ICME 2009, he organized the workshop on "Multimedia Signal Processing and Novel Parallel Computing." In ICASSP 2008, he gave a tutorial on "Multimedia Signal Processing on Processors with Many Cores." In ICASSP 2009 and ISCAS 2009, he gave a tutorial on "Multimedia Signal Processing on CPUs and GPUs with Many Cores." He is an IEEE Senior Member and an ACM Senior Member.
The definition of video quality evaluation mechanisms plays a major role in the overall design of video communication systems. Most of the efforts in the research community have been focused on image quality assessment, and only recently has video quality assessment received more attention. The most reliable way of assessing the quality of a video is subjective evaluation, because human beings are the ultimate receivers in most 2D and 3D video applications. The Mean Opinion Score (MOS), which is a subjective quality measurement obtained from a number of human observers, has been regarded for many years as the most reliable form of quality measurement. However, the MOS method is too inconvenient, slow and expensive for most applications. Therefore, researchers have identified the need for an automatic means of measuring video quality, which is known as objective quality measurement. Objective quality metrics are important because they provide video designers and standards organizations with means for making meaningful quality evaluations without convening viewer panels. The goal of objective video quality assessment research is to design quality metrics that can predict perceived video quality automatically.
In this tutorial we will discuss the motivations behind the quality evaluation of 2D and 3D video and its importance for future multimedia services, as well as existing objective metrics and their limitations. We further focus on cost-efficient delivery of rich interactive media services in real-life environments by jointly addressing user perceptual experience and quality/resource trade-offs. This would allow for a supplier-independent, interoperable multimedia network and service infrastructure that focuses on the users and their needs and expectations, and allows for a seamless, personalized, trusted and, most importantly, satisfying experience.
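To make the subjective/objective distinction concrete, the sketch below computes a MOS as the mean of observer ratings and a peak signal-to-noise ratio (PSNR) between a reference and a distorted frame. PSNR is used here only because it is a simple, widely known full-reference metric; the abstract does not say which metrics the tutorial covers, and the function names and flat pixel-list representation are assumptions of this example.

```python
import math


def mean_opinion_score(ratings):
    """Subjective score: average of the ratings given by human observers."""
    return sum(ratings) / len(ratings)


def psnr(reference, distorted, peak=255.0):
    """Objective score in dB between two equal-length lists of pixel values."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)
```

PSNR needs no viewer panel, but it treats every pixel error equally, so it does not always track what observers report. Closing that gap is the stated goal of objective quality assessment research.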
Anil Fernando leads the Video Codec group at the University of Surrey, UK. He has been working in video coding since 1998 and has published more than 200 international refereed journal and proceedings papers in this area. Furthermore, he has published more than 15 international refereed journal and conference papers in video quality assessment. He is a member of the editorial board of the international journal Multimedia Tools and Applications. He has also been nominated as the guest editor for several journals. He has recently given tutorials at the ICME, ICASSP and ICIP international conferences.
His main research interests are video quality assessment, Quality of Experience (QoE) in multimedia, 3D and multiview video coding/processing, distributed video coding and content-aware coding. Currently he is leading video quality assessment and QoE work in one of the largest EU-funded projects (MUSCADE) on 3D video broadcasting.
Human physical activity recognition has captured the attention of the multimedia community for the past two decades. The variety of applications that it supports motivates research in this area. There is abundant literature available on human activity recognition that uses images and videos as the primary sensing modality. The challenges involved in estimating motion and recognizing people and objects from images and videos have limited the success of computer vision based approaches for human activity recognition. However, the past 6-7 years have seen significant advancement in the area of ubiquitous, pervasive and wearable computing, resulting in the development of a variety of low bandwidth, data rich environmental and body sensor networks. These sensors provide a reliable and non-intrusive methodology for capturing activity data from humans and the environments they inhabit. This paradigm of using a multitude of low bandwidth, highly specific sensors for activity recognition is commonly known as sensor based activity recognition.
The aim of the tutorial is to bring awareness about sensors, techniques, and models for human activity recognition, with the objective of promoting further research in this area. To facilitate this, the tutorial will be divided into three parts: sensor technologies - introduces the different sensors and discusses their roles in capturing activity data; processing and learning methodologies for modeling activities - discusses the pattern recognition and machine learning tools and techniques commonly adopted for extracting activity information from a variety of sensor streams; and finally applications of sensor based activity recognition - presents application contexts from the literature where this paradigm has been demonstrated. We will cover the state-of-the-art in each of these three parts, providing relevant references from the literature. We will highlight the different research challenges that interest the multimedia community as we walk through the tutorial.
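A common first step when processing a sensor stream is to cut it into overlapping windows and summarize each window with simple statistics that a classifier can consume. The sketch below shows that step for a one-dimensional stream such as a single accelerometer axis. The window length, step, and the choice of mean and variance as features are assumptions made here for illustration, not methods attributed to the tutorial.

```python
from statistics import mean, pvariance


def sliding_windows(samples, window_size, step):
    """Return every full window of `window_size` samples, advancing by `step`."""
    last_start = len(samples) - window_size
    return [samples[s:s + window_size] for s in range(0, last_start + 1, step)]


def window_features(window):
    """Summarize one window as (mean, population variance)."""
    return (mean(window), pvariance(window))


def extract_features(samples, window_size=4, step=2):
    """Map a raw sensor stream to one feature tuple per window."""
    return [window_features(w) for w in sliding_windows(samples, window_size, step)]
```

In the toy case of a constant signal followed by an oscillating one with the same average, the mean is identical for both segments while the variance separates them, which hints at why several features are usually combined.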
Narayanan C. Krishnan is a doctoral candidate in the Department of Computer Science and Engineering at Arizona State University. He is also a research associate at the Center for Cognitive Ubiquitous Computing. He received his B.Sc. degree in Mathematics in 2000, M.Sc. degree in Mathematics in 2002, and M.Tech. degree in Computer Science in 2004, all from the Sri Sathya Sai University, Puttaparthi, India. His graduate supervisor is Prof. Panchanathan and his PhD thesis is on a computational framework for wearable accelerometer based activity/gesture recognition. CK's research interests include activity recognition, human centered multimedia computing, pattern recognition and machine learning for pervasive health care and assistive technologies. He is a student member of IEEE and ACM.
Sethuraman Panchanathan received his B.Sc. degree in Physics from the University of Madras, India in 1981; B.E. degree in Electronics and Communication Engineering from the Indian Institute of Science, Bangalore, India in 1984; M.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Madras, India in 1986; and his Ph.D. degree in Electrical and Computer Engineering from the University of Ottawa in 1989. Dr. Panchanathan is currently the Deputy Vice-President for Research and Economic Affairs at Arizona State University, Tempe, Arizona. He is a Professor in the Department of Computer Science and Engineering and Director of the Center for Cognitive Ubiquitous Computing (CUbiC) at Arizona State University. He is also an affiliate faculty member in the University of Arizona College of Medicine, Phoenix program, an Affiliate Professor in the Department of Electrical Engineering at ASU, and co-founder and President of a start-up company, MotionEase Inc., which is focused on developing video based motion capture solutions for rehabilitative applications.
He leads a team of researchers and graduate students working in various areas, including: Ubiquitous Multimedia Computing; Activity Recognition, Human Centered Multimedia Computing, Visual Computing and Communications; Media Processor Designs; Multimedia Communication; Face/Gait Analysis and Recognition; Genomic Signal Processing; and Ubiquitous Computing Environments for Blind Persons.
Dr. Panchanathan has published over 300 papers in refereed journals and conferences. He has also served as the Editor-in-Chief of the IEEE Multimedia Magazine and is an associate editor of seven other journals and transactions. Dr. Panchanathan is a Fellow of the IEEE and SPIE. He is a member of the European Association for Signal Processing (EURASIP), the Association for Computing Machinery (ACM) and ASEE.