What I'm researching
Three threads run through my work: the geometry of learned representations, the cost of multimodal inference, and the small engineering choices that decide whether a model is usable in practice. Below are the threads themselves, then the papers that fall out of them.
01 : Interests
-
Representation geometry
How embedding spaces deform under fine-tuning, quantization, and LoRA adaptation — and whether unlabeled, model-agnostic metrics can predict downstream degradation before it shows up in eval. This is the thread behind SemanticSentry.
-
Cost-aware multimodal inference
Practical ways to keep vision-language pipelines cheap: frame deduplication, hierarchical attention budgets, and adaptive sampling that preserves signal at a fraction of the API cost. This is the thread behind AdaFrame and the AdLovin pipeline.
-
Systems for ML at scale
The plumbing — Rust OCR engines, TensorRT inference, structured downstream analytics — that lets research artifacts move from notebook to production. This is the thread behind my work at Skop Intelligence.
02 : Publications
-
Geometric Drift Metrics are Insufficient: A Matched-Magnitude Dissociation Between Aligned and Anti-Aligned Fine-Tuning
Geometric similarity metrics such as Centered Kernel Alignment (CKA), neighborhood preservation, and isotropy are widely used to compare neural representations and to infer functional similarity from them. We show this inference fails in two complementary directions. On E5-base-v2, conditions matched on Neighborhood Preservation Score (NPS) produce order-of-magnitude differences in retrieval damage depending on the fine-tuning objective; on BERT, near-trivial geometric drift coexists with a large functional gap, with non-overlapping confidence intervals, between MLM and contrastive fine-tuning on the same corpus. Geometric drift magnitude does not predict the sign or scale of functional change without knowledge of the gradient–pretraining alignment.
contributions
- A matched-magnitude dissociation on E5 between gradient-aligned and gradient-anti-aligned fine-tuning at fixed Neighborhood Preservation Score, tested against a pre-registered hypothesis and replicated across three seeds.
- A converse dissociation on BERT where the more-drifted condition is functionally better, opposite to what NPS would predict.
- A pre-registered corpus control ruling out distribution shift, plus structural controls (matched-Frobenius random rank-4, full-rank fine-tuning) ruling out low-rank confinement as the driver.
- A linear-probe methodology finding: accuracy swings of up to 28.5 percentage points on identical embeddings between weak and cross-validated probe configurations, large enough to contaminate published fine-tuning evaluations.
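To make the matched quantity concrete, here is a minimal sketch of a neighborhood-overlap score. It assumes NPS is the mean fraction of each point's k nearest neighbors (by cosine similarity) that are still among its k nearest neighbors after fine-tuning; the paper's exact definition may differ, and the function names below are illustrative rather than taken from the paper's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_indices(embeddings, i, k):
    """Indices of the k points most cosine-similar to point i, excluding i itself."""
    scored = [(cosine(embeddings[i], e), j)
              for j, e in enumerate(embeddings) if j != i]
    # Sort by descending similarity; break ties by index so results are deterministic.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return {j for _, j in scored[:k]}


def neighborhood_preservation(base, tuned, k):
    """Mean k-NN overlap between two embeddings of the same items (assumed definition).

    base[i] and tuned[i] must embed the same item before and after fine-tuning.
    Returns a value in [0, 1]; 1.0 means every neighborhood is unchanged.
    """
    if len(base) != len(tuned):
        raise ValueError("base and tuned must embed the same items")
    if not 0 < k < len(base):
        raise ValueError("k must satisfy 0 < k < number of items")
    overlaps = [
        len(knn_indices(base, i, k) & knn_indices(tuned, i, k)) / k
        for i in range(len(base))
    ]
    return sum(overlaps) / len(overlaps)
```

A score like this is purely geometric: two fine-tuning runs can land on the same value while moving the neighborhoods in functionally very different ways, which is the gap the dissociation results above point at.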
-
AdaFrame: Hierarchical Multimodal Deduplication with Adaptive Information Budgeting for Cost-Efficient Video Advertisement Analysis
Modern video advertisement analysis pipelines spend most of their compute and API budget on near-duplicate frames. AdaFrame tackles this with a hierarchical multimodal deduplication scheme that allocates an adaptive information budget across frames, prioritizing visual-textual diversity over uniform sampling. On an advertising-video corpus the method achieves 70–90% frame deduplication and reduces downstream vision-API cost by roughly 70% while preserving structured signal extraction.
contributions
- Hierarchical deduplication combining CLIP image embeddings with audio-state features from HuBERT.
- An adaptive per-clip information budget driven by local visual-textual diversity.
- 70–90% frame deduplication with negligible quality loss on downstream extraction tasks.
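As a rough illustration of the deduplication idea only, the sketch below greedily keeps a frame when its embedding is not too similar to any frame already kept. It is a simplification and not the AdaFrame method: it uses a single embedding per frame and a fixed threshold, with no hierarchy, audio features, or adaptive budget. The function names and the 0.95 threshold are assumptions, and in practice the vectors would come from an image encoder such as CLIP rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def deduplicate(frame_embeddings, threshold=0.95):
    """Return indices of frames to keep, in their original order.

    A frame is kept only if its cosine similarity to every previously kept
    frame is below `threshold` (a hypothetical default, not a tuned value).
    """
    kept = []
    for idx, emb in enumerate(frame_embeddings):
        if all(cosine(emb, frame_embeddings[j]) < threshold for j in kept):
            kept.append(idx)
    return kept


def dedup_ratio(num_frames, num_kept):
    """Fraction of frames dropped, e.g. 0.8 means 80% deduplication."""
    return 1.0 - num_kept / num_frames if num_frames else 0.0
```

Comparing against every kept frame, rather than only the previous one, also catches a shot that reappears later in the clip, at the cost of quadratic work in the number of kept frames.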
-
Advanced Facial Emotion Classification with 135 Classes for Enhanced Cybersecurity Applications
A fine-grained facial-emotion classifier scaling to 135 compound emotion classes, oriented toward cybersecurity applications where coarse 7-class emotion models miss subtle deception, stress, and intent signals. Presented at the IEEE International Conference on Artificial Intelligence in Cybersecurity (ICAIC) 2025.
Code for these papers lives at github.com/abtonmoy. For talks, drafts, or collaboration: atonmoy27@wabash.edu.