Bhargav Kumar Nath.
Data Scientist & ML Engineer.

Available for full-time work

Building predictive models on millions of real-world records and connecting raw data pipelines to deployed ML systems.

4.5×

Conversion Uplift

£0.139

Forecasting MAE

<1ms

Inference Latency

8–32×

Serving Throughput

40%

VRAM Reduction

100M+

Event Pipeline

About Me

Researcher who ships
production systems.

I hold an MSc in Data Science and Analytics from the University of Leeds and a BTech in Computer Science and Engineering from Assam Don Bosco University. My expertise bridges Machine Learning and Software Engineering: not just building models, but deploying and scaling them reliably.

I specialize in predictive modeling, deep learning architectures, and end-to-end MLOps pipelines, from processing datasets of more than 100 million records with PySpark to deploying high-performance inference engines.

Current Focus Areas

Building scalable prediction systems
Natural Language Processing & LLMs
Real-time analytics & streaming pipelines
Bridging data science research and production engineering

MSc

Data Science & Analytics, Leeds

~2

Years professional experience

11

End-to-end ML projects shipped

Philosophy

"A model is a mathematical fantasy, but an ML system is a living entity. I design for the shifting reality of the human world, not the static perfection of a laboratory."
01

Escape the State-of-the-Art Trap

Leaderboard victories rarely survive reality. I start with the simplest model to establish an honest baseline and to test whether ML is needed at all.

02

Algorithms Fade, Data is Foundational

Architectures change but long-term success depends on data quality. Real-world data is noisy and evolving; inflexible systems quickly become obsolete.

03

Deployment is the Starting Line

Standard software fails loudly, but ML systems fail silently via confident incorrect predictions. Production models need continuous monitoring to stay reliable.

Proof of Work

Data Science & ML Projects

Demonstrating end-to-end expertise uniting predictive modeling with resilient MLOps.

View all on GitHub
8–32× throughput

PageForge

Paged KV-Cache Memory Manager for LLM Inference

Problem

Standard KV-cache pre-allocates contiguous tensors for max sequence length, wasting up to 90% of VRAM.

Approach

Engineered a custom PagedAttention memory manager from scratch using Rust for O(1) page allocation and custom CuPy CUDA kernels.

Results

8–32× more concurrent sequences on the same GPU: 424 sequences/GB vs. 53 sequences/GB in naive approaches.

Rust · CUDA · Python · PagedAttention · PyO3 · CuPy
View case study
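
The O(1) page allocation idea behind PageForge can be illustrated with a free-list allocator. This is a minimal Python sketch, not the project's Rust implementation: the class name `PagedKVCache`, its methods, and the page sizes are hypothetical, and it tracks only page bookkeeping, not the actual key/value tensors.

```python
class PagedKVCache:
    """Toy page-table bookkeeping for a paged KV-cache (illustrative only)."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        # A list used as a stack gives O(1) allocation (pop) and release (append).
        self.free_pages = list(range(num_pages - 1, -1, -1))
        self.page_tables = {}  # sequence id -> list of physical page ids
        self.lengths = {}      # sequence id -> number of cached tokens

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (physical_page, offset)."""
        length = self.lengths.get(seq_id, 0)
        table = self.page_tables.setdefault(seq_id, [])
        if length % self.page_size == 0:
            # The last page is full (or the sequence has no page yet).
            if not self.free_pages:
                raise MemoryError("KV-cache page pool exhausted")
            table.append(self.free_pages.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.page_size

    def release(self, seq_id):
        """Return all pages of a finished sequence to the pool."""
        self.free_pages.extend(self.page_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because pages are claimed only when a sequence actually grows into them, short sequences never reserve memory for the maximum sequence length, which is the waste described in the problem statement.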
1.847 Sharpe OOS

Andria Systems

Hedge Fund Signal Intelligence Platform

Problem

Quantitative hedge funds struggle to ingest and extract actionable signals from fragmented high-velocity alternative data streams.

Approach

Engineered a high-throughput signal intelligence platform leveraging async stream processing, vector embeddings, and low-latency pipelines.

Results

Real-time dashboard analytics enabling sub-second signal extraction and predictive insights for algorithmic trading decisions.

Next.js · React · Alternative Data · Signal Processing · Financial Analytics
View case study
11.7s · 17T configs

EMPAS

Evolutionary Mixed Precision Architecture Search

Problem

Edge LLMs face severe memory bottlenecks because uniform quantization degrades accuracy.

Approach

NSGA-II optimizer in PyTorch coupled with a zero-cost Hessian-based sensitivity proxy to automate mixed-precision search.

Results

40% VRAM reduction and 20% throughput increase on TinyLlama-1.1B. Slashed search from days to minutes.

PyTorch · CUDA · Genetic Algorithms · LLM Quantization · Hessian Analysis
View case study
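
EMPAS is described as using NSGA-II, whose core operation is picking out candidates that no other candidate beats on every objective. The sketch below shows only that non-dominated filtering step in plain Python (no crowding distance, crossover, or mutation). The function names and the example objective values are made up for illustration and are not results from the project.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Both tuples hold objectives to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (the first NSGA-II front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (memory in GB, accuracy loss) pairs for five precision configs.
candidates = [(2.0, 0.05), (1.2, 0.08), (1.5, 0.09), (1.0, 0.20), (2.5, 0.06)]
front = pareto_front(candidates)
```

Here `front` keeps the three configurations that represent a genuine trade-off between memory and accuracy, and drops the two that another configuration beats on both.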
67K products · £0.14 MAE

PricePoint Dynamics

UK Supermarket Competitive Intelligence

Problem

Classical string matching fails to track competitive UK supermarket pricing across 9.5M listings.

Approach

Semantic vector matching via Sentence-BERT and FAISS with LightGBM predictive modeling under strict data contracts.

Results

Expanded match rate to 67,000+ products and achieved £0.139 MAE with R² = 0.98 on 9.5M listings.

FAISS · NLP Matching · LightGBM · SHAP · Pandera · Sentence-BERT
View case study
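
The semantic matching step in PricePoint Dynamics can be pictured as a nearest-neighbour search over embedding vectors. The sketch below is a brute-force stand-in written in plain Python: it does not call Sentence-BERT or FAISS, the three-dimensional vectors are invented toy values, and `best_match` and its `threshold` parameter are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query, catalogue, threshold=0.8):
    """Return the catalogue name most similar to query, or None below threshold."""
    name, score = max(
        ((n, cosine(query, vec)) for n, vec in catalogue.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else None


# Toy embeddings; a real pipeline would use vectors from a sentence encoder.
catalogue = {
    "semi-skimmed milk 2L": [0.9, 0.1, 0.0],
    "white bread 800g": [0.0, 0.2, 0.9],
}
match = best_match([0.8, 0.2, 0.1], catalogue)
```

The threshold is what separates a confident match from "no equivalent product", which is the case that plain string matching tends to get wrong in either direction.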
0.91 Faithfulness · +56% F1

FinSight-Alpha

Production-Grade Agentic RAG Pipeline

Problem

Standard RAG architectures frequently hallucinate on complex financial documents and fail during multi-hop reasoning.

Approach

Architected an Agentic RAG pipeline using LangGraph, integrating hybrid retrieval via Qdrant and strict tool-execution constraints.

Results

Highly faithful, traceable reasoning with resilient error boundaries, validated end-to-end using automated RAGAS metrics.

Agentic RAG · LangGraph · Qdrant · Python · RAGAS
View case study
+$0.14/user · 2,667×

Dynamic Experimentation Engine

A/B Testing & Unified Uplift Modeling

Problem

Standard A/B tests optimize vanity metrics and miss true incremental value.

Approach

Causal Inference pipeline targeting persuadable segments using X-Learners and Thompson Sampling, distilled into a fast decision tree.

Results

<1ms inference latency while identifying micro-segments driving 70% of total algorithmic uplift.

CausalML · X-Learners · Thompson Sampling · Knowledge Distillation
View case study
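
Thompson Sampling, named in the Dynamic Experimentation Engine approach, can be sketched for the simplest case of binary outcomes with Beta posteriors. This is a generic textbook version using only the standard library, not the project's pipeline; the function names, the uniform Beta(1, 1) prior, and the simulated conversion rates are assumptions for illustration.

```python
import random


def thompson_pick(successes, failures, rng):
    """Draw one sample from each arm's Beta posterior and pick the largest."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_bandit(true_rates, rounds, seed=0):
    """Simulate a Bernoulli bandit; return per-arm (successes, failures)."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_pick(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Calling `run_bandit([0.05, 0.5], rounds=2000)` shows the typical behaviour: traffic shifts toward the better-converting arm as evidence accumulates, rather than staying split evenly as in a fixed A/B test.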
97% memory reduction · 4.5× lift

Customer Intelligence Platform

109.9M Events Analyzed on Commodity Hardware

Problem

Processing 100M+ event logs typically requires expensive cloud data warehouses.

Approach

End-to-end local analytical engine using DuckDB and Polars to train LightGBM classifiers entirely in-memory.

Results

Reduced 109.9M events from 14.7GB to 1.9GB, powering models that drove a 4.5× conversion lift.

DuckDB · Polars · LightGBM · Propensity Modeling
View case study
O(N) vs O(N log N) · +5.1% AUPRC

Synthetic Intelligence

Generative Pipeline for Data Scarcity

Problem

Classical oversampling algorithms generate noise when scaling complex tabular datasets.

Approach

Model-driven rejection sampling pipeline using PyTorch autoencoders to guarantee strict manifold alignment.

Results

Scaled generation linearly and systematically outperformed classical SMOTE baselines while preserving predictive fidelity.

PyTorch · Generative Modeling · t-SNE · Privacy AI
View case study
98% compression · R²=0.91

Fitness Tracker Analytics

Production-ready Analytics Platform

Problem

Generating reliable insights from noisy, unstructured fitness sensor data.

Approach

Decoupled system separating PySpark ETL pipelines from Scikit-Learn inference with FFT feature extraction.

Results

Extracted 198 critical temporal features via FFT and deployed a low-latency interactive clustering dashboard.

Apache Spark · Scikit-Learn · FFT/PCA · Docker
View case study
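
As an illustration of the FFT feature extraction mentioned for Fitness Tracker Analytics, the sketch below computes one common frequency-domain feature, the dominant frequency of a sensor window. Whether the project uses this particular feature is not stated; the function names are hypothetical, and the hand-written radix-2 FFT (which needs a power-of-two window length) stands in for a library routine.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("window length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(signal, sample_rate_hz):
    """Frequency (Hz) of the strongest non-DC bin in the lower half-spectrum."""
    n = len(signal)
    spectrum = fft(signal)
    peak = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    return peak * sample_rate_hz / n


# A synthetic 4 Hz oscillation sampled at 32 Hz for two seconds.
window = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
freq = dominant_frequency(window, sample_rate_hz=32)
```

A feature like this turns a raw accelerometer window into a single number (for example, step cadence) that a classical model such as those in Scikit-Learn can consume directly.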
24.59 K MAE · <50ms

Melting Point Prediction

GNN Fusion Architecture for Material Science

Problem

Experimental hardware screening for thermodynamic material properties is exceptionally slow and costly.

Approach

Hybrid neural network blending molecular graphs (PyTorch Geometric) with classical RDKit features and LightGBM.

Results

Sub-50ms inference latency, 20% MAE reduction vs pure deep learning architectures.

PyTorch Geometric · RDKit · Optuna · XGBoost
View case study
0.53 F1 · +197% over GRU

MALLORN

Multi-Channel Rare Transient Detection

Problem

Identifying rare astronomical events is hampered by extreme multi-band data sparsity and severe class imbalance (4.86% target).

Approach

Automated statistical feature extraction via tsfresh and LightGBM classifiers replacing brittle neural networks.

Results

Distilled complex signals into 198 optimal features, maximizing F1 on an extremely rare target class.

PyTorch (RNN/GRU) · Signal Processing · Time Series · Imbalanced Data
View case study

Technical Arsenal

Tools & Technologies

The stack I use to explore data and engineer solutions.

PyTorch
TensorFlow
Scikit-Learn
Pandas
NumPy
Python
R
SQL
Spark
Hadoop
Docker
Kubernetes
AWS
GCP
FastAPI
React
Next.js
TypeScript
CUDA
MLflow
PostgreSQL
TensorRT
Redis
Kafka
C
Hugging Face
Airflow
GitHub
Bash / Shell
NLTK
Streamlit
DuckDB
LightGBM
XGBoost
Optuna
PyTorch Geometric

Engineering Track

Experience & Education

A track record of engineering impact across institutions.

Professional Engineering
M/S Sanjog Trading, Guwahati, India
Jul 2020 to Nov 2021

End-to-End ETL Infrastructure

Architected Python ETL pipelines using Pandas & NumPy, eliminating manual processing bottlenecks and establishing reliable data infrastructure for downstream ML.

Predictive Forecasting System

Developed and validated statistical time-series forecasting models for daily/monthly sales trends, enabling algorithmic inventory planning and supply chain optimization.

Real-Time BI Dashboard

Engineered interactive Streamlit dashboards to visualize sales KPIs, seasonal trends, and performance metrics, enabling live data-driven pricing strategies.

IIT Guwahati, Guwahati, India
Jul 2022 to Aug 2022

Relational Database Architecture

Designed normalized MySQL schemas with optimized indexing for ACID-compliant data integrity across 15+ frontend modules.

Constraint-Satisfaction Optimization

Conceptualized a heuristic constraint-satisfaction algorithm for automated timetable generation, foundational to operations research and resource-allocation ML.

Evidence-Based UX Research

Conducted structured quantitative user research across multiple institutions and performed competitive feature analysis of 4 EdTech platforms.
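
The timetable work described under Constraint-Satisfaction Optimization can be illustrated with a small backtracking solver. This is a generic sketch rather than the algorithm from that project: `assign_slots`, the conflict map, and the example courses are hypothetical, and the only heuristic applied is placing the most-constrained courses first.

```python
def assign_slots(courses, slots, conflicts):
    """Give each course a slot so that conflicting courses never share one.

    conflicts maps a course to the set of courses it clashes with and is
    assumed to be symmetric. Returns a dict, or None if no timetable exists.
    """
    # Degree heuristic: schedule the courses with the most clashes first.
    ordered = sorted(courses, key=lambda c: -len(conflicts.get(c, ())))
    assignment = {}

    def backtrack(i):
        if i == len(ordered):
            return True
        course = ordered[i]
        for slot in slots:
            clash = any(assignment.get(o) == slot for o in conflicts.get(course, ()))
            if not clash:
                assignment[course] = slot
                if backtrack(i + 1):
                    return True
                del assignment[course]  # undo and try the next slot
        return False

    return assignment if backtrack(0) else None


# Hypothetical example: DB clashes with both ML and OS (shared students).
conflicts = {"ML": {"DB"}, "DB": {"ML", "OS"}, "OS": {"DB"}}
timetable = assign_slots(["ML", "DB", "OS"], ["Mon 9:00", "Mon 11:00"], conflicts)
```

Ordering by number of clashes tends to expose dead ends early, which keeps the search small on realistic timetables even though the worst case remains exponential.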

Airports Authority of India, NER Regional HQ, India
Jul 2023 to Aug 2023

Enterprise Asset Analytics

Analysed operational data for 1,053+ IT assets through an enterprise Asset Management System, tracking deployment status, warranty lifecycles, and maintenance history.

SAP ERP Data Workflows

Documented cross-functional SAP ERP workflows covering HR, Finance, and Procurement, gaining hands-on exposure to enterprise ETL architecture and data governance.

Network Infrastructure Mapping

Mapped enterprise network infrastructure including MPLS/ILL load balancing, core switching, and firewall architectures, critical for production MLOps deployment.

Academic Foundations
Sep 2024 to Nov 2025

MSc Data Science and Analytics

University of Leeds, UK

Specialized in advanced machine learning, predictive modeling, data mining, and big data architecture. Developed expertise in end-to-end data pipelines, real-time analytics, and MLOps principles.

Aug 2020 to Jun 2024

BTech Computer Science and Engineering

Assam Don Bosco University, India

Solid foundation in software engineering, algorithms, data structures, and database management. Led projects integrating classical software design with early predictive modeling applications.

Key Capabilities

Agentic AI Workflows
MLOps & ResOps
High-Performance Computing
Quantitative Research
End-to-End ML Pipelines
LLM Inference Optimization

Deep Learning Lab

A dependency-free mathematical engine built in TypeScript. Experiment with hyperparameters, inject live training noise, and compare train/test behavior in real time.

Epoch: 0
Train / Test: 1.0000 / 1.0000
Data Volume: 100%

Layer 1: 4
Layer 2: 4

Activation Function
Learning Rate Strategy
Current LR: 0.03000
Base Learning Rate: 0.030
Training Noise Level: 0.00
L1 Regularization: 0
L2 Regularization: 0

Let's build something
exceptional together.

I'm currently seeking new full-time opportunities. If you have an open role or just want to connect, my inbox is always open.

Designed & built by Bhargav Kumar Nath