Current projects
Projects that are currently underway are shown here and include work on biometrics and language change detection.
Current research projects
- Biometrics
- Speaker Recognition (Voice Biometrics)
- Speaker Change Detection
- Multimodal Biometrics
- Language Change Detection
- Singing Voice Separation
- Automatic Number Plate Recognition on FPGA
- Voice/Face De-Identification in Multimedia Documents (for Privacy Protection)
Biometrics
Biometrics has been a subject of extensive research within this research group over the past several years. The scope of activities has covered such areas as speaker recognition (voice biometrics), face recognition and multimodal biometrics (e.g. voice and face).
These have a wide range of applications in physical access, logical access, and law enforcement. The research has been conducted mainly in collaboration with industrial and EU partners.
Speaker recognition (voice biometrics)
Speaker recognition is defined as the automatic authentication of the identity of an individual based on a presented sample utterance.
The impetus for research into speaker recognition has been its applications in many diverse areas including telephone and internet banking, online trading, and forensics. The work conducted at the University of Hertfordshire over the past several years has resulted in major advances in the field.
An important aspect of these advances has been the development of novel and effective methods for robust speaker verification under adverse conditions.
The current research activities in this field are concerned with enhancing the quality of speaker modelling methods and improving the reliability of real-time speaker verification.
Speaker change detection
The automatic detection of speaker changes in audio/video documents has various applications in such areas as audio indexation, speaker tracking, and authorisation control in smart environments.
A major aspect of efforts in this area has been the development of effective and computationally efficient methods for multiple-speaker change detection in a large audio stream.
A major facet of the current work in this area is research into methods for enhancing the effectiveness of speaker change detection in the presence of varying background conditions.
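As an illustration of what detecting a change point involves, the sketch below applies the Bayesian Information Criterion (BIC) test, a widely used baseline for this task. The text above does not say which method the group uses, so this is an assumption made purely for illustration. The function names, the one-dimensional features and the `margin` and `penalty_weight` parameters are all hypothetical; practical systems work on multi-dimensional features such as cepstral coefficients.

```python
import math


def _log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))


def delta_bic(features, split, penalty_weight=1.0):
    """Delta-BIC for splitting a 1-D feature sequence at index `split`.

    Positive values favour two separate Gaussians (a change) over one.
    """
    n = len(features)
    left, right = features[:split], features[split:]
    gain = 0.5 * (n * _log_var(features)
                  - len(left) * _log_var(left)
                  - len(right) * _log_var(right))
    # The two-model hypothesis has 2 extra parameters (mean, variance) for d = 1.
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


def detect_change(features, margin=5, penalty_weight=1.0):
    """Return the split index with the largest positive delta-BIC, or None."""
    best_split, best_score = None, 0.0
    for split in range(margin, len(features) - margin + 1):
        score = delta_bic(features, split, penalty_weight)
        if score > best_score:
            best_split, best_score = split, score
    return best_split


# Hypothetical data: 20 frames from one "speaker", then 20 from another.
speaker_a = [0.0, 0.1, -0.1, 0.05, -0.05] * 4
speaker_b = [5.0, 5.1, 4.9, 5.05, 4.95] * 4
change_point = detect_change(speaker_a + speaker_b)
```

Handling several speaker changes in a long stream would typically mean sliding such a test along the audio in windows. Varying background conditions shift the feature statistics as well, which is one reason a fixed penalty weight can be unreliable.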
Multimodal biometrics
An area of considerable interest in biometric recognition is the use of multiple modalities. This is partly because unimodal biometric techniques are subject to limitations such as non-universality and vulnerability to impersonation.
However, a main attraction of multimodal biometrics is that it provides the opportunity for enhancing the recognition accuracy beyond that achievable with unimodal biometrics. The focus of investigations at the University of Hertfordshire has been that of effectively fusing voice and face biometrics.
The concern in the current work in this area is the development of methods for minimising the adverse effects of relative and absolute degradations in the individual biometric data types. The outcome can be of considerable value in enhancing the recognition accuracy when operating in uncontrolled environments.
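One simple way to picture the fusion of voice and face biometrics is score-level fusion, in which each matcher's scores are normalised and then combined by a weighted sum. The sketch below shows that generic baseline only; it is not a description of the method developed in this work. The function names and the `voice_weight` parameter are hypothetical. Lowering `voice_weight` mimics placing less trust in the voice score when the audio is the more degraded modality.

```python
def min_max_normalise(scores):
    """Map a list of raw matcher scores onto the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no ranking information, return a neutral value.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(voice_scores, face_scores, voice_weight=0.5):
    """Weighted-sum fusion of two score lists, one entry per trial."""
    if len(voice_scores) != len(face_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must lie in [0, 1]")
    voice = min_max_normalise(voice_scores)
    face = min_max_normalise(face_scores)
    return [voice_weight * v + (1.0 - voice_weight) * f
            for v, f in zip(voice, face)]


# Hypothetical raw scores for three trials from each matcher.
voice_raw = [2.0, 4.0, 6.0]
face_raw = [10.0, 30.0, 20.0]
equal_trust = fuse_scores(voice_raw, face_raw)
face_favoured = fuse_scores(voice_raw, face_raw, voice_weight=0.25)
```

A fixed weight cannot follow changes in the relative quality of the two modalities between trials. That limitation is the kind of problem that degradation-aware fusion methods aim to address.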
Language change detection
Automatic language change detection without prior knowledge of the type of language is an essential capability for accurate and efficient navigation of large audio data with multiple languages spoken by different speakers.
It has applications in a variety of scenarios including the indexation of multilingual audio documents/recordings for the purpose of transcription or conversion to a single language.
The purpose of the current research in this area is the development of an effective approach for detecting the points of language change in multi-speaker, multilingual documents, in the presence of variation in data conditions.
Singing Voice Separation (SVS)
Singing Voice Separation (SVS) can be defined as the process of extracting the vocal element from a given song recording. The impetus for research in this area is mainly that of facilitating certain important applications of Music Information Retrieval (MIR) such as lyrics recognition, singer identification, and melody extraction.
The current research in this field includes investigations into effective methods for unsupervised extraction of singing voice from stereophonic studio recordings. The approaches considered include time- and frequency-domain methods as well as a fusion of these.
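To make the idea of a time-domain method on stereo recordings concrete, the sketch below performs a mid/side decomposition. It relies on the common studio practice, assumed here, of panning the lead vocal to the centre: centre-panned content is identical in both channels, so it cancels in the side signal and is kept in the mid signal. This is a crude baseline for illustration, not the approach investigated in this project, and the function name and sample values are hypothetical.

```python
def mid_side(left, right):
    """Split a stereo signal into mid (centre) and side (difference) channels."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


# Hypothetical mix: vocal panned to the centre, guitar panned hard left.
vocal = [0.5, -0.25, 0.125]
guitar = [0.25, 0.5, -0.5]
left_channel = [v + g for v, g in zip(vocal, guitar)]
right_channel = list(vocal)

mid, side = mid_side(left_channel, right_channel)
# side holds only the guitar at half amplitude (the vocal has cancelled);
# mid holds the vocal plus the guitar at half amplitude.
```

The mid signal still contains every instrument that is not panned fully to one side, so it is only a rough estimate of the vocal.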
Automatic Number Plate Recognition on FPGA
Automatic Number Plate Recognition (ANPR) systems allow users to track, identify and monitor moving vehicles by automatically extracting their number plates. These systems are rapidly being adopted for a vast number of applications, including automatic congestion charge systems, access control and the tracing of stolen cars.
The fundamental requirements of an ANPR system are image capture using an ANPR camera, and processing of the captured image. The image processing part, which is computationally intensive, comprises two stages: plate localisation and character recognition.
The common hardware choice for its implementation is high-performance workstations or expensive supercomputers. However, the cost, size and power consumption of these solutions motivate the search for other platforms.
Recent improvements in the computing power of Field-Programmable Gate Arrays (FPGAs) and Digital Signal Processors (DSPs) have motivated researchers to consider them as a low-cost solution for accelerating such computationally intensive tasks.
The aim of this research project is to propose new algorithms (or improve existing ones) for ANPR, together with efficient FPGA implementations in terms of power consumption, maximum running frequency (i.e. execution time) and on-chip resource usage.
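The first of the two stages, plate localisation, can be pictured with a toy heuristic: characters on a plate produce many sharp dark-to-light transitions, so rows crossing the plate tend to contain more strong horizontal edges than the rest of the image. The sketch below is a minimal software illustration of that heuristic, not the algorithm proposed in this project. The function names, the thresholds and the tiny synthetic image are all hypothetical.

```python
def row_edge_counts(image, edge_threshold=50):
    """Count strong horizontal intensity jumps in each row of a greyscale image."""
    return [sum(1 for a, b in zip(row, row[1:]) if abs(a - b) >= edge_threshold)
            for row in image]


def localise_plate_rows(image, edge_threshold=50, min_edges=4):
    """Return (top, bottom) of the longest run of edge-dense rows, or None.

    Row indices are inclusive; `min_edges` must be at least 1.
    """
    counts = row_edge_counts(image, edge_threshold)
    best, start = None, None
    # A trailing zero acts as a sentinel that closes a run ending on the last row.
    for i, count in enumerate(counts + [0]):
        if count >= min_edges and start is None:
            start = i
        elif count < min_edges and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


# Hypothetical 6x10 image: uniform background with a striped band in rows 2-4.
flat_row = [100] * 10
striped_row = [0, 255] * 5
image = [flat_row, flat_row, striped_row, striped_row, striped_row, flat_row]
plate_band = localise_plate_rows(image)
```

A complete system would also bound the plate horizontally and pass the cropped region to the character recognition stage. Per-pixel operations of this kind are regular and independent, which makes them candidates for parallel hardware such as an FPGA.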
Voice/face de-identification in multimedia documents (for privacy protection)
Recent advances in electronic recording devices and signal processing have greatly facilitated audio and video acquisition. This capability is now widely exploited either for immediate inspection of captured data or for storage and subsequent analysis/sharing.
As a result, concerns have been raised regarding the privacy of people identifiable in the recordings. The purpose of this research work is to develop effective methods for voice and face de-identification in multimedia documents whilst preserving the naturalness of the data and its usability.