Research

Interpretable and Representation-Driven Machine Learning: Neural Models and Loss Functions for Fairness,
Natural Language, Bioinformatics, Image Processing, and Information Retrieval

    My dissertation focuses on developing accurate and fair neural models for natural language tasks, including toxicity detection and related domains, with an emphasis on computationally efficient, interpretable architectures and on loss functions that promote group-specific diversity measures. The nature and form of toxic language targeting demographic groups can vary markedly across groups. The target group adds a layer of context: what is considered toxic when directed at one group might not carry the same meaning and significance when directed at another. A "one-size-fits-all" model may therefore yield sub-optimal performance, over-fitting to the forms of toxic language most relevant to the majority group(s) at the expense of weaker performance on minority group(s), thereby raising concerns of algorithmic fairness. My work addresses these issues through two parallel tasks: 1) multi-task learning for group-targeted toxicity detection models, and 2) a fairness loss function that promotes symmetric errors in fair target-group detection.

    Broadly, I focus on developing low-redundancy, interpretable neural models. There is a common misconception in the ML community that accuracy and interpretability are inversely correlated; my goal is to show the opposite: models with proper representations are highly accurate and at the same time fully interpretable. This matters when AI models are used in key aspects of society, where shedding light on the workings of neural nets and ensuring that they rest on sound mathematical foundations rather than overfitting is a required check. Interpretations and explanations of models are often tailored toward a particular audience; my work provides interpretation and verification for both domain experts and modelers. I address these issues through model-guided losses and orthogonality, as applicable to a variety of problems in Natural Language, Bioinformatics, Image Processing, and Information Retrieval.

Neural Nets

Multi Task Learning and Group-targeted Labels

The Multi Task Learning (MTL) paradigm applies when a single data instance is associated with multiple labels or losses. MTL learns shared representations that are common across all tasks, while learning task-specific representations in individual branches. Our work explores under-explored aspects of MTL, including i) cases where only partial (sparse) labels are present, ii) the design of scalable neural architectural frameworks, iii) training and evaluation of such networks under soft labels, and iv) how MTL maps input data to output labels, specifically when the overall distribution is a mixture of several underlying distributions.
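
To make the sparse-label setting concrete, here is a minimal sketch (not the dissertation architecture; all names are illustrative) of a shared trunk with per-group heads, where unobserved labels are masked out of the loss so that partially labeled instances still train the shared representation:

    import torch
    import torch.nn as nn

    class MultiTaskToxicity(nn.Module):
        def __init__(self, in_dim, hidden_dim, num_groups):
            super().__init__()
            # shared representation common to all tasks
            self.shared = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
            # one task-specific branch per target group
            self.heads = nn.ModuleList(
                [nn.Linear(hidden_dim, 1) for _ in range(num_groups)])

        def forward(self, x):
            h = self.shared(x)
            return torch.cat([head(h) for head in self.heads], dim=1)

    def masked_bce(logits, labels, mask):
        """BCE over observed labels only; mask[i, g] = 1 if label present."""
        per_label = nn.functional.binary_cross_entropy_with_logits(
            logits, labels, reduction="none")
        return (per_label * mask).sum() / mask.sum().clamp(min=1)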

Differentiable Fairness Loss Function for Symmetric Errors

The lack of differentiable fairness measures prevents gradient-based training on the fairness metric itself, or forces surrogate losses that diverge from the true metric of interest. We introduce a differentiable fairness loss function that maps to balanced accuracy across groups, thereby avoiding metric divergence. It optimizes for symmetric errors across groups, where mis-predicting one group is as harmful as mis-predicting any other group -- a concept often overlooked in traditional fairness applications. Our work enhances fairness for minority and marginalized groups through a well-defined optimization.
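
A hedged sketch of the idea (the exact loss in our work differs in its details): soft per-class recalls are computed from predicted probabilities, so balanced accuracy stays differentiable, and a spread penalty discourages asymmetric errors between groups. At least two groups are assumed.

    import torch

    def soft_balanced_accuracy(probs, labels):
        """probs: (N,) sigmoid outputs; labels: (N,) in {0, 1}."""
        pos, neg = labels, 1.0 - labels
        tpr = (probs * pos).sum() / pos.sum().clamp(min=1e-8)       # soft recall
        tnr = ((1 - probs) * neg).sum() / neg.sum().clamp(min=1e-8)  # soft specificity
        return 0.5 * (tpr + tnr)

    def symmetric_fairness_loss(probs, labels, groups):
        """Maximize per-group balanced accuracy; penalize spread across groups."""
        bals = torch.stack([
            soft_balanced_accuracy(probs[groups == g], labels[groups == g])
            for g in torch.unique(groups)])
        # first term rewards accuracy; second term enforces symmetric errors
        return (1.0 - bals.mean()) + bals.std()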

Optimal Pareto Set Detection

Pareto-optimal solutions trace frontiers that jointly optimize multiple competing objectives and are increasingly utilized in classification and ranking scenarios. We propose a consistent two-stage, hybrid neural Pareto optimization approach that is accurate and scalable in runtime, input dimension, and number of functions and constraints. The first stage is a lightweight neural network that efficiently extracts the weak Pareto front using Fritz-John conditions, without assuming convexity. The second stage uses this weak front to approximate the strong Pareto-optimal set through an efficient Pareto filter.
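
A minimal NumPy sketch of the second-stage filtering idea (names are illustrative, and the actual filter is more efficient): given candidate objective vectors from the weak front, keep only non-dominated points, under a minimization convention.

    import numpy as np

    def pareto_filter(F):
        """F: (n_points, n_objectives). Returns boolean mask of the strong front."""
        n = F.shape[0]
        keep = np.ones(n, dtype=bool)
        for i in range(n):
            if not keep[i]:
                continue
            # j dominates i if j is <= on all objectives and < on at least one
            dominates = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
            if dominates.any():
                keep[i] = False
        return keep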

HyperSpectral Unmixing using Self-Correcting Autoencoder

The Linear Mixture Model separates a mixed pixel into a linear combination of its endmembers and abundances. Our work shows for the first time that a two-layer autoencoder (SCA-Net), with 2FK parameters (F features, K endmembers), achieves error metrics (1e-5) that are orders of magnitude below previously reported values of around 1e-2. SCA-Net converges to this low-error solution starting from a random initialization of weights. We also show that SCA-Net, based upon a bi-orthogonal representation, performs a self-correction when the number of endmembers is over-specified.
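
A simplified sketch in the spirit of SCA-Net (not the released implementation): a bias-free F -> K encoder produces abundances and a bias-free K -> F linear decoder plays the role of the endmember matrix, giving roughly 2FK parameters.

    import torch
    import torch.nn as nn

    class UnmixingAE(nn.Module):
        def __init__(self, num_features, num_endmembers):
            super().__init__()
            self.encoder = nn.Linear(num_features, num_endmembers, bias=False)
            self.decoder = nn.Linear(num_endmembers, num_features, bias=False)

        def forward(self, x):
            # softmax enforces nonnegative, sum-to-one abundances
            abundances = torch.softmax(self.encoder(x), dim=-1)
            reconstruction = self.decoder(abundances)  # linear mixture
            return reconstruction, abundances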

Streaming Singular Value Decomposition for Big Data

Singular Value Decomposition in a Big Data setting is often restricted by the main-memory requirements imposed by the dataset. We present a two-stage neural optimization approach whose memory requirement depends explicitly on the feature dimension and desired rank, independent of the sample size. The proposed scheme reads data samples in a streaming setting, with the network minimization problem converging to a low-rank approximation with high precision. Our architecture is fully interpretable: all network outputs and weights have a specific meaning.
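
An illustrative sketch of the streaming idea (the actual two-stage scheme differs in its details): only an F x k factor is kept in memory, and minibatches are streamed through a reconstruction objective, so memory scales with the feature dimension F and target rank k rather than the sample count.

    import torch

    def streaming_low_rank(batches, num_features, rank, lr=1e-2, epochs=5):
        W = torch.randn(num_features, rank, requires_grad=True)
        opt = torch.optim.Adam([W], lr=lr)
        for _ in range(epochs):
            for X in batches():  # X: (batch, F), streamed from disk
                loss = ((X - X @ W @ W.T) ** 2).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
        # at the optimum, W spans the dominant right singular subspace;
        # QR extracts an orthonormal basis for it
        Q, _ = torch.linalg.qr(W.detach())
        return Q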

Intersecting Manifold Detection for Non-Convex Functions

Constrained optimization algorithms are typically restricted to point-based solutions. Single or multiple objectives must be satisfied, where both the objective functions and the constraints can be non-convex, resulting in multiple optimal solutions; such scenarios include intersecting surfaces defined as implicit functions. We present neural solutions for extracting optimal sets as approximate manifolds, where non-convex objectives and constraints are encoded as a modeler-guided, domain-informed L2 loss function. This promotes interpretability, since modelers can confirm the results against known analytical forms.
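
A toy sketch of the modeler-guided L2 loss (the surfaces here are illustrative placeholders): a small network maps latent samples to candidate points, and the loss is the sum of squared residuals of the implicit surfaces, so minimizers lie on the intersection manifold, here a circle that can be checked against its known analytical form.

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 3))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    def residual_loss(p):
        sphere = (p ** 2).sum(dim=1) - 1.0        # f1(x) = ||x||^2 - 1
        plane = p[:, 2] - 0.5                     # f2(x) = z - 0.5
        return (sphere ** 2 + plane ** 2).mean()  # L2 penalty on both residuals

    for step in range(2000):
        z = torch.rand(256, 2) * 2 - 1            # latent samples
        loss = residual_loss(net(z))
        opt.zero_grad()
        loss.backward()
        opt.step()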

CNN for Identifying Kernels of Physical Processes

We present a fully convolutional architecture that captures the invariant structure of the domain to reconstruct an observable system defined by partial differential equations. Our intent is to learn coupled dynamic processes, interpreted as deviations from the true kernels representing isolated processes. The architecture is robust and transparent in capturing process kernels and system anomalies. We also show that high-weight representations are not only redundant but also impair network interpretability. This allows us to identify redundant kernels and their manifestations in activation maps, guiding better designs.
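
A toy sketch of the kernel-identification idea (synthetic data; not our full architecture): a single bias-free convolution is fit to the one-step update of a diffusion process, and the learned 3x3 weights can then be inspected against the known finite-difference Laplacian stencil, with deviations flagging coupled or anomalous processes.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # ground-truth 5-point Laplacian stencil used to simulate the process
    lap = torch.tensor([[[[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]]])

    def step(u, dt=0.1):  # explicit diffusion update
        return u + dt * F.conv2d(u, lap, padding=1)

    conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
    opt = torch.optim.Adam(conv.parameters(), lr=1e-2)
    for _ in range(500):
        u = torch.rand(16, 1, 32, 32)  # random initial fields
        loss = ((conv(u) - (step(u) - u)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # conv.weight should now approximate dt * Laplacian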

Machine Learning

Higher Order Feature Selection

Feature selection is the process of choosing a subset of relevant features so that the quality of prediction models can be improved. We present a higher-order mutual information (MI) based approximation technique called Higher Order Feature Selection (HOFS). Instead of producing a single list of features, our method produces a ranked collection of feature subsets that maximizes MI, giving better comprehension (feature ranking) of which features work best together when selected, due to their underlying interdependent structure.
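
A simplified NumPy sketch of ranking feature subsets by joint MI with the label (discrete features assumed); HOFS itself uses a higher-order approximation rather than this brute-force enumeration.

    from itertools import combinations
    import numpy as np

    def joint_mi(X_sub, y):
        """MI between the joint tuple of columns in X_sub and label y."""
        _, xi = np.unique(X_sub, axis=0, return_inverse=True)
        mi = 0.0
        for i in np.unique(xi):
            for c in np.unique(y):
                pxy = np.mean((xi == i) & (y == c))
                if pxy > 0:
                    mi += pxy * np.log(pxy / (np.mean(xi == i) * np.mean(y == c)))
        return mi

    def rank_subsets(X, y, k=2):
        """Score every size-k feature subset and rank by joint MI."""
        scored = [(joint_mi(X[:, list(s)], y), s)
                  for s in combinations(range(X.shape[1]), k)]
        return sorted(scored, reverse=True)  # best-interacting subsets first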

Minimum Noise Fraction based Denoising of HyperSpectral Images

Minimum Noise Fraction (MNF), developed by Green et al., has been extensively studied as a method for noise removal in HSI data. However, it entails a speed-accuracy trade-off, namely the process of manually selecting the relevant bands in the MNF space. We present a low-rank model that reduces the computational time of the algorithm and fully automates band selection. The automated approximations produced by the algorithm show favorable reconstruction accuracy against storage (50×) and runtime (60×) trade-offs.
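
A compact NumPy sketch of an automated MNF pipeline (the thresholding rule and noise estimator here are illustrative, not our exact method): noise covariance is estimated from spatial shift differences, bands are ordered by noise-whitened SNR, and the rank is picked automatically from the eigenvalue spectrum instead of manual inspection.

    import numpy as np

    def auto_mnf(cube, energy=0.999):
        """cube: (rows, cols, bands) hyperspectral image."""
        X = cube.reshape(-1, cube.shape[-1])
        N = (cube[1:] - cube[:-1]).reshape(-1, cube.shape[-1])  # noise proxy
        Cn = np.cov(N, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        Cs = np.cov(X, rowvar=False)
        # generalized eigenproblem Cs v = lambda Cn v, sorted by SNR;
        # a symmetric whitening would be used in practice
        vals, vecs = np.linalg.eig(np.linalg.solve(Cn, Cs))
        order = np.argsort(vals.real)[::-1]
        vals, vecs = vals.real[order], vecs.real[:, order]
        k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), energy)) + 1
        Y = X @ vecs[:, :k]                          # low-rank MNF bands
        denoised = Y @ np.linalg.pinv(vecs[:, :k])   # back to spectral space
        return denoised.reshape(cube.shape), k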

Information Retrieval

Ranking of IR metrics based on Information Content

Given limited time and space, IR studies report only a few evaluation metrics, which must be carefully selected. We quantify the correlation between popular IR metrics on TREC test collections. Next, we investigate prediction of unreported metrics: given 1-3 metrics, we assess the best predictors for 10 others. We further explore whether high-cost evaluation measures can be predicted using low-cost measures. We also present a novel model for ranking evaluation metrics based on covariance, enabling selection of a set of metrics that are most informative and distinctive.
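
An illustrative sketch of covariance-based metric selection (not the exact model from the paper; names are placeholders): from a systems x metrics score matrix, greedily pick the metric with the highest variance (most informative), then metrics least correlated with those already chosen (most distinctive).

    import numpy as np

    def select_metrics(scores, names, k=3):
        """scores: (n_systems, n_metrics) evaluation results."""
        corr = np.abs(np.corrcoef(scores, rowvar=False))
        chosen = [int(np.argmax(scores.var(axis=0)))]  # most informative first
        while len(chosen) < k:
            rest = [j for j in range(len(names)) if j not in chosen]
            # pick the metric with the lowest max-correlation to chosen ones
            j = min(rest, key=lambda j: corr[j, chosen].max())
            chosen.append(j)
        return [names[j] for j in chosen]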