Global information
- Repository: https://github.com/georgesterpu/pyVSR
- Contact:
- License:
- Reference:
@inproceedings{Sterpu2017,
  author       = {George Sterpu and Naomi Harte},
  title        = {Towards Lipreading Sentences with Active Appearance Models},
  year         = 2017,
  booktitle    = {Proc. The 14th International Conference on Auditory-Visual Speech Processing},
  pages        = {70--75},
  doi          = {10.21437/AVSP.2017-14},
  url          = {http://dx.doi.org/10.21437/AVSP.2017-14}
}
Description
pyVSR is a Python toolkit aimed at running Visual Speech Recognition (VSR) experiments in a traditional framework (e.g. handcrafted visual features, Hidden Markov Models for pattern recognition).
The main goal of pyVSR is to make VSR experiments easy to reproduce, providing baseline results on most publicly available audio-visual datasets.
What can you do with pyVSR:
- Fetch a filtered list of files from a dataset (a minimal filtering sketch follows this list)
    - currently supported:
        - TCD-TIMIT
            - speaker-dependent protocol
            - speaker-independent protocol
            - single person
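The snippet below sketches the kind of file filtering this refers to; the directory layout, file extension, and function name are assumptions for illustration, not pyVSR's actual API.

```python
from pathlib import Path


def list_tcdtimit_videos(root, speakers=None):
    """Return video files under `root`, optionally restricted to given
    speaker IDs (e.g. ['01M'] for a single-person protocol)."""
    videos = sorted(Path(root).rglob('*.mp4'))  # assumed file extension
    if speakers is not None:
        videos = [v for v in videos if v.parent.name in speakers]
    return videos


# e.g. a single-person subset vs. the full speaker-independent list
# single = list_tcdtimit_videos('/data/tcdtimit', speakers=['01M'])
# all_files = list_tcdtimit_videos('/data/tcdtimit')
```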
        
- Extract visual features (a DCT feature pipeline is sketched after this list):
    - Discrete Cosine Transform (DCT)
        - Automatic ROI extraction
        - Configurable window size
        - Fourth-order accurate derivatives
        - Sample rate interpolation
        - Storage in HDF5 format
    - Active Appearance Models (AAM)
        - Do NOT require manually annotated landmarks
        - Face, lips, and chin models supported
        - Parameters obtainable either through fitting or projection
        - Implementation based on Menpo
    - Point cloud of facial landmarks
        - OpenFace wrapper
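As a rough illustration of the DCT pipeline above (ROI crop, 2-D DCT, low-frequency coefficients, fourth-order accurate temporal derivatives, HDF5 storage), here is a minimal sketch using NumPy, SciPy, and h5py; pyVSR's own function names and parameter choices may differ.

```python
import h5py
import numpy as np
from scipy.fftpack import dct


def dct_features(roi_frames, block=(6, 5)):
    """roi_frames: (T, H, W) grayscale mouth ROIs -> (T, block[0]*block[1]) features."""
    feats = []
    for frame in roi_frames:
        coeffs = dct(dct(frame.astype(float), axis=0, norm='ortho'),
                     axis=1, norm='ortho')                     # 2-D DCT-II
        feats.append(coeffs[:block[0], :block[1]].flatten())  # low-frequency corner
    return np.asarray(feats)


def fourth_order_derivative(x, dt=1.0):
    """Fourth-order accurate central differences along the time (first) axis."""
    d = np.zeros_like(x)
    d[2:-2] = (-x[4:] + 8 * x[3:-1] - 8 * x[1:-3] + x[:-4]) / (12 * dt)
    return d


static = dct_features(np.random.rand(75, 36, 36))  # dummy 75-frame ROI clip
delta = fourth_order_derivative(static)
with h5py.File('dct_features.h5', 'w') as f:       # HDF5 storage
    f.create_dataset('sample_001', data=np.hstack([static, delta]))
```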
        
- Train Hidden Markov Models (HMMs) (a parallel decoding sketch follows this list)
    - easy HTK wrapper for Python
    - optional bigram language model
    - multi-threaded support (both training and decoding can use full CPU power)
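To illustrate the wrapping and multi-threading described above, the sketch below splits the test set into several HTK script (.scp) files and decodes the chunks in parallel; the HVite flags and file names are placeholders based on typical HTK usage, not pyVSR's actual wrapper.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor


def decode_chunk(scp_chunk):
    """Run HTK's HVite decoder on one list of feature files."""
    out_mlf = scp_chunk.replace('.scp', '.mlf')
    cmd = ['HVite',
           '-H', 'hmm/hmmdefs',   # trained HMM definitions
           '-S', scp_chunk,       # feature files to decode
           '-i', out_mlf,         # recognition output (MLF)
           '-w', 'wdnet',         # word network (can encode a bigram LM)
           'dict', 'hmmlist']     # pronunciation dictionary and model list
    subprocess.run(cmd, check=True)
    return out_mlf


chunks = ['test_00.scp', 'test_01.scp', 'test_02.scp', 'test_03.scp']
with ProcessPoolExecutor(max_workers=4) as pool:   # one HVite process per core
    results = list(pool.map(decode_chunk, chunks))
```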
 
- Extend the toolkit with additional features
    - pyVSR has a simple, modular, object-oriented architecture (a minimal extension pattern is sketched below)
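As a hint at how such an extension might look, here is a minimal sketch of a new feature extractor plugging into an object-oriented design; the base class and method names are hypothetical, not pyVSR's actual interface.

```python
import numpy as np


class Feature:
    """Minimal interface a feature extractor could expose."""

    def extract(self, video_frames):
        raise NotImplementedError


class MeanIntensityFeature(Feature):
    """Toy extractor: one scalar (mean pixel intensity) per frame."""

    def extract(self, video_frames):
        return np.asarray([frame.mean() for frame in video_frames])[:, None]
```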
 
