E6620 Applied Signal Recognition (2019)

Instructor:

  Prof. Homayoon Beigi <beigi@recotechnologies.com>

Textbook:

  H. Beigi, "Fundamentals of Speaker Recognition," Springer, New York 2011.

Grading:

  Midterm (40%): 

    - Implementation of a signal recognition homework project
      and data selection/acquisition.

    - 2-page extended abstract describing the results and proposing
      modifications to one specific part of the engine to increase
      performance (accuracy, speed, or both).

    - 10-minute presentation of the above.

    - Scripts and results.

  Final (60%):

    - Discussion and implementation of an improvement in one aspect
      of the signal recognition engine.

    - 6-page IEEE conference-style paper describing the system and
      results obtained from the modification.

    - Code and Results.

    - 10-minute presentation of the results.

Course Description:

Applied Signal Recognition is a comprehensive course covering all
aspects of Signal Recognition, from theory to practice.  Topics
covered in some detail include Time and Spatial Signals (such as
Audio, Image, and Vibration signals), Signal Representation, Signal
Processing and Feature Extraction, Probability Theory and Statistics,
Information Theory, Metrics and Divergences, Decision Theory,
Parameter Estimation, Clustering and Learning, Transformation, Hidden
Markov Modeling, Search Techniques, Deep Neural Networks, Support
Vector Machines, and other recent machine learning techniques used in
signal recognition.  Applications in Machine and Structural Health
analysis/prognosis, Object Detection and Recognition, Audio Event
Detection, Multimodal analysis, Image Recognition, and Video Analysis
are also covered in detail.

Also, several open source software packages are introduced, with
detailed hands-on projects using Kaldi, Darknet, and Caffe to produce
a fully functional signal recognition engine.  The lectures cover the
theoretical aspects as well as practical coding techniques.  The
course is graded based on a project.  The Midterm (40% of the grade)
is in the form of a two-page proposal for the project, and the Final
(60% of the grade) is an oral presentation of the project plus a 6-page
conference-style paper describing the results of the research project.
The instructor uses his own textbook for the course: Homayoon Beigi,
"Fundamentals of Speaker Recognition," Springer-Verlag, New York,
2011.  Every week, the slides of the lecture are made available to the
students.

Topics to be covered:

- Introduction (Overview of Speaker Recognition and its history)

- Audio, Image, Vibration, and brain-wave signals; applications include
  human biometrics, imaging, geophysics, machinery, electronics,
  networking, languages, communications, and finance

- Signal Representation of time-dependent signals
    Sampling, Quantization and Amplitude Errors
    Practical Sampling and Associated Errors

- Signal Processing of time-dependent signals and Feature Extraction
    The Sampling Process
    Integral Transforms
    Spectral Analysis and Direct Method Features
    Linear Predictive Cepstral Coefficients (LPCC)
    Perceptual Linear Predictive (PLP) Analysis
    Alternative Cepstral-Based Features
    Other Features
    Signal Enhancement and Pre-Processing

- Audio, Image, Video, Vibration, and Natural Language Processing
    Audio Event Detection
    Machine Health Analysis/Prognosis
    Structural Health Analysis/Prognosis
    Object and Face Detection and Recognition
    Emotion Analysis
    Multimodal Corpora
    Natural Language Processing
    
- Recognition Software
    Creating a complete Recognition System
      Training
        Kaldi
        Darknet
        Caffe

- Probability Theory and Statistics
    Set Theory
    Measure Theory
    Probability Measure
    Integration
    Functions
    Statistical Moments
    Discrete Random Variables
    Moment Estimation
    Multi-Variate Normal Distribution

- Information Theory
    Sources
    The Relation between Uncertainty and Choice
    Discrete Sources
    Discrete Channels
    Continuous Sources
    Relative Entropy
    Fisher Information

- Metrics and Divergences
    Distance (Metric)
    Divergences and Directed Divergences

- Decision Theory
    Hypothesis Testing
    Bayesian Decision Theory
    Bayesian Classifier
    Decision Trees

- Parameter Estimation
    Maximum Likelihood Estimation
    Maximum A-Posteriori (MAP) Estimation
    Maximum Entropy Estimation
    Minimum Relative Entropy Estimation
    Maximum Mutual Information Estimation (MMIE)
    Model Selection (AIC and BIC)

- Unsupervised Clustering and Learning
    Vector Quantization (VQ)
    Basic Clustering Techniques
    Estimation using Incomplete Data

- Transformation
    Principal Component Analysis (PCA)
    Generalized Eigenvalue Problem
    Nonlinear Component Analysis
    Linear Discriminant Analysis (LDA)
    Factor Analysis
    Probabilistic Linear Discriminant Analysis (PLDA)

- Hidden Markov Modeling (HMM)
    Memoryless Models
    Discrete Markov Chains
    Markov Models
    Hidden Markov Models
    Model Design and States
    Training and Decoding
    Gaussian Mixture Models (GMM)
    Practical Issues

- Search Techniques
    Practical Issues
    Finite State Transducers

- Deep Neural Networks
    Perceptron
    Feedforward Networks
    Convolutional Neural Networks (CNN)
    Recurrent Neural Networks (RNN)
    Time-Delay Neural Networks (TDNN)
    Long Short-Term Memory Networks (LSTM)
    End-to-End Sequence (Encoder/Decoder) Neural Networks
    Hierarchical Mixtures of Experts (HME)
    Deep Learning Network and Practical Issues
    Transfer Learning

- Support Vector Machines
    Risk Minimization
    The Two-Class Problem
    Kernel Mapping
    Positive Semi-Definite Kernels
    Non-Positive Semi-Definite Kernels
    Kernel Normalization
    Kernel Principal Component Analysis (Kernel PCA)
    Nuisance Attribute Projection (NAP)
    The Multiclass (Γ-Class) Problem