Bhargav Kumar Nath.
Data Scientist & ML Engineer.

Available for full-time work

Building predictive models on millions of real-world records and bridging the gap between raw data pipelines and deployed ML systems.


bhargav@ml-cluster ~

$ system-status --live

ALL SYSTEMS NOMINAL | GPU: 94% util | Uptime: 847d

4.5×

Conversion Uplift

£0.139

Forecasting MAE

<1ms

Inference Latency

8–32×

Serving Throughput

40%

VRAM Reduction

100M+

Event Pipeline

About Me

Researcher who ships
production systems.

I hold an MSc in Data Science and Analytics from the University of Leeds and a BTech in Computer Science and Engineering from Assam Don Bosco University. My expertise bridges Machine Learning and Software Engineering: not just building models, but deploying and scaling them reliably.

I specialize in predictive modeling, deep learning architectures, and end-to-end MLOps pipelines. From processing datasets exceeding 100 million records using PySpark to deploying high-performance inference engines, I focus on unlocking tangible value through robust data ecosystems.

Current Focus Areas

Building scalable prediction systems

Natural Language Processing & LLMs

Real-time analytics & streaming pipelines

Bridging data science research and production engineering

MSc

Data Science & Analytics, Leeds

Years professional experience

End-to-end ML projects shipped

Philosophy

"A model is a mathematical fantasy, but an ML system is a living entity. I design for the shifting reality of the human world, not the static perfection of a laboratory."

Escape the State-of-the-Art Trap

Leaderboard victories rarely survive reality. I start with the simplest model to establish an honest baseline and to test whether ML is needed at all.

Algorithms Fade, Data is Foundational

Architectures change, but long-term success depends on data quality. Real-world data is noisy and evolving, so inflexible systems quickly become obsolete.

Deployment is the Starting Line

Standard software fails loudly, but ML systems fail silently through confident but incorrect predictions. Production models need continuous monitoring to stay reliable.
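One common way to surface this kind of silent failure is to compare the live distribution of a feature or model score against its training-time distribution. The following is a minimal sketch using the Population Stability Index (PSI). The function name `population_stability_index`, the bin count, and the sample data are illustrative assumptions and are not taken from any project on this page.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range live values into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up data: training-time scores and the same scores shifted upward.
reference = [i / 100 for i in range(100)]
shifted = [min(v + 0.3, 1.0) for v in reference]

psi_same = population_stability_index(reference, reference)
psi_shift = population_stability_index(reference, shifted)
```

A PSI near zero means the two samples fill the bins in the same proportions. Teams often alert above a threshold of roughly 0.2, but that cut-off is a rule of thumb and should be tuned per feature.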

Proof of Work

Data Science & ML Projects

Demonstrating end-to-end expertise uniting predictive modeling with resilient MLOps.


8–32× throughput

PageForge

Paged KV-Cache Memory Manager for LLM Inference

Problem

A standard KV cache pre-allocates contiguous tensors for the maximum sequence length, wasting up to 90% of VRAM.

Approach

Engineered a custom PagedAttention memory manager from scratch using Rust for O(1) page allocation and custom CuPy CUDA kernels.

Results

8–32× more concurrent sequences on the same GPU: 424 sequences/GB vs. 53 sequences/GB with naive pre-allocation.

Rust · CUDA · Python · PagedAttention · PyO3 · CuPy
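The core idea behind paged allocation can be sketched in a few lines of plain Python. This is an illustration of the technique only, not the project's Rust implementation: the class name `PagePool`, its methods, and the page size are hypothetical. Pages come from a free list, so taking or returning one is O(1), and a sequence only holds as many pages as its current length requires.

```python
from typing import Dict, List


class PagePool:
    """Fixed-size KV-cache pages handed out from a free list in O(1)."""

    def __init__(self, num_pages: int, page_size: int) -> None:
        self.page_size = page_size                          # tokens per page
        self.free_pages: List[int] = list(range(num_pages))  # stack of free page ids
        self.page_tables: Dict[int, List[int]] = {}          # seq_id -> page ids
        self.lengths: Dict[int, int] = {}                    # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> int:
        """Reserve a slot for one new token and return the page id that holds it."""
        table = self.page_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.page_size == 0:        # last page is full, or none yet
            if not self.free_pages:
                raise MemoryError("no free KV-cache pages")
            table.append(self.free_pages.pop())  # O(1) pop from the free list
        self.lengths[seq_id] = length + 1
        return table[-1]

    def release(self, seq_id: int) -> None:
        """Return all pages of a finished sequence to the pool."""
        self.free_pages.extend(self.page_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


pool = PagePool(num_pages=4, page_size=16)
for _ in range(20):
    pool.append_token(seq_id=0)

pages_used = len(pool.page_tables[0])   # 20 tokens fit in 2 pages of 16
pages_free = len(pool.free_pages)
```

In this toy setup a 20-token sequence occupies two pages rather than a slab sized for the maximum sequence length, which is where the extra concurrency comes from.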


1.847 Sharpe OOS

Andria Systems

Hedge Fund Signal Intelligence Platform

Problem

Quantitative hedge funds struggle to ingest and extract actionable signals from fragmented high-velocity alternative data streams.

Approach

Engineered a high-throughput signal intelligence platform leveraging async stream processing, vector embeddings, and low-latency pipelines.

Results

Real-time dashboard analytics enabling sub-second signal extraction and predictive insights for algorithmic trading decisions.

Next.js · React · Alternative Data · Signal Processing · Financial Analytics


11.7s · 17T configs

EMPAS

Evolutionary Mixed Precision Architecture Search

Problem

Edge LLMs face severe memory bottlenecks, and uniform quantization relieves them only at the cost of accuracy.

Approach

NSGA-II optimizer in PyTorch coupled with a zero-cost Hessian-based sensitivity proxy to automate mixed-precision search.

Results

40% VRAM reduction and 20% throughput increase on TinyLlama-1.1B. Slashed search from days to minutes.

PyTorch · CUDA · Genetic Algorithms · LLM Quantization · Hessian Analysis
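NSGA-II ranks candidates by Pareto dominance rather than by a single score. The sketch below shows only that building block, the extraction of the first non-dominated front, under the assumption that each mixed-precision configuration is scored on two objectives to minimize (memory and accuracy loss). The objective pair, the function names, and the candidate values are made up for illustration.

```python
from typing import List, Tuple

Objectives = Tuple[float, float]  # (memory_gb, accuracy_loss), both minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Return the non-dominated points (the first front in NSGA-II terms)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up candidates: (memory in GB, accuracy loss) for four bit-width configs.
candidates = [(2.2, 0.00), (1.3, 0.02), (1.5, 0.05), (0.9, 0.10)]
front = pareto_front(candidates)
```

Here `(1.5, 0.05)` drops out because `(1.3, 0.02)` is better on both objectives; the other three remain as genuine trade-offs.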


67K products · £0.14 MAE

PricePoint Dynamics

UK Supermarket Competitive Intelligence

Problem

Classical string matching fails to track competitive UK supermarket pricing dynamics across 9.5M product listings.

Approach

Semantic vector matching via Sentence-BERT and FAISS with LightGBM predictive modeling under strict data contracts.

Results

Expanded match rate to 67,000+ products and achieved £0.139 MAE with R² = 0.98 on 9.5M listings.

FAISS · NLP Matching · LightGBM · SHAP · Pandera · Sentence-BERT
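Semantic matching of this kind reduces to nearest-neighbour search over embedding vectors. The sketch below swaps in a brute-force cosine-similarity scan for FAISS and hand-written 3-dimensional vectors for Sentence-BERT embeddings, so it runs with the standard library alone. The product names, the vectors, and the 0.8 threshold are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(
    query: List[float],
    catalogue: Dict[str, List[float]],
    threshold: float = 0.8,
) -> Optional[Tuple[str, float]]:
    """Return (name, score) of the closest catalogue item, or None below threshold."""
    name, vec = max(catalogue.items(), key=lambda kv: cosine(query, kv[1]))
    score = cosine(query, vec)
    return (name, score) if score >= threshold else None


# Toy 3-d vectors standing in for sentence embeddings.
catalogue = {
    "Semi-skimmed milk 2L": [0.9, 0.1, 0.0],
    "Wholemeal bread 800g": [0.0, 0.2, 0.9],
}
match = best_match([0.8, 0.2, 0.1], catalogue)
no_match = best_match([0.0, 1.0, 0.0], catalogue)
```

The threshold is what separates "same product, different wording" from "nearest but unrelated"; a linear scan like this is only practical for small catalogues, which is why an approximate index is the usual choice at millions of listings.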


0.91 Faithfulness · +56% F1

FinSight-Alpha

Production-Grade Agentic RAG Pipeline

Problem

Standard RAG architectures frequently hallucinate on complex financial documents and fail during multi-hop reasoning.

Approach

Architected an Agentic RAG pipeline using LangGraph, integrating hybrid retrieval via Qdrant and strict tool-execution constraints.

Results

Highly faithful, traceable reasoning with resilient error boundaries, validated end-to-end using automated RAGAS metrics.

Agentic RAG · LangGraph · Qdrant · Python · RAGAS
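Hybrid retrieval needs a way to merge a dense (embedding) ranking with a sparse (keyword) ranking. The page does not say which fusion method the project uses; the sketch below assumes reciprocal rank fusion (RRF), a common choice, and uses made-up document ids. It does not call Qdrant.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); k damps the top ranks.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical results from a dense retriever and a keyword retriever.
dense = ["10-K_p12", "10-K_p40", "10-Q_p3"]
sparse = ["10-K_p40", "8-K_p1", "10-K_p12"]
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents that appear high in both lists rise to the top, while a document found by only one retriever still survives in the fused ranking.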


+$0.14/user · 2,667×

Dynamic Experimentation Engine

A/B Testing & Unified Uplift Modeling

Problem

Standard A/B tests optimize vanity metrics and miss true incremental value.

Approach

Causal Inference pipeline targeting persuadable segments using X-Learners and Thompson Sampling, distilled into a fast decision tree.

Results

<1ms inference latency while identifying micro-segments driving 70% of total algorithmic uplift.

CausalML · X-Learners · Thompson Sampling · Knowledge Distillation
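Thompson Sampling, one of the techniques named above, can be sketched as a Beta-Bernoulli bandit: each variant keeps a Beta posterior over its conversion rate, and traffic goes to whichever variant draws the highest sample. The class name, the simulated conversion rates, and the seeds below are assumptions for illustration; the sketch covers the bandit only, not the X-Learner or the distillation step.

```python
import random
from typing import List


class BetaBernoulliBandit:
    """Thompson Sampling over arms with binary (convert / no-convert) rewards."""

    def __init__(self, n_arms: int, seed: int = 0) -> None:
        self.successes: List[int] = [1] * n_arms  # Beta(1, 1) uniform prior per arm
        self.failures: List[int] = [1] * n_arms
        self.rng = random.Random(seed)

    def select_arm(self) -> int:
        # Draw a plausible conversion rate per arm and play the best draw.
        samples = [
            self.rng.betavariate(s, f)
            for s, f in zip(self.successes, self.failures)
        ]
        return samples.index(max(samples))

    def update(self, arm: int, reward: int) -> None:
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


# Simulated variants with made-up true conversion rates.
true_rates = [0.02, 0.10]
bandit = BetaBernoulliBandit(n_arms=2, seed=42)
env = random.Random(7)
pulls = [0, 0]
for _ in range(2000):
    arm = bandit.select_arm()
    pulls[arm] += 1
    bandit.update(arm, int(env.random() < true_rates[arm]))
```

Over the simulated rounds the bandit shifts most traffic to the stronger variant while still occasionally exploring the weaker one.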


97% memory reduction · 4.5× lift

Customer Intelligence Platform

109.9M Events Analyzed on Commodity Hardware

Problem

Processing 100M+ event logs typically requires expensive cloud data warehouses.

Approach

End-to-end local analytical engine using DuckDB and Polars to train LightGBM classifiers entirely in-memory.

Results

Reduced 109.9M events from 14.7GB to 1.9GB, powering models that drove a 4.5× conversion lift.

DuckDB · Polars · LightGBM · Propensity Modeling


O(N) vs O(N log N) · +5.1% AUPRC

Synthetic Intelligence

Generative Pipeline for Data Scarcity

Problem

Classical oversampling algorithms generate noise when scaling complex tabular datasets.

Approach

Model-driven rejection sampling pipeline using PyTorch autoencoders to guarantee strict manifold alignment.

Results

Scaled generation linearly and systematically outperformed classical SMOTE baselines while preserving predictive fidelity.

PyTorch · Generative Modeling · t-SNE · Privacy AI


98% compression · R²=0.91

Fitness Tracker Analytics

Production-ready Analytics Platform

Problem

Generating reliable insights from noisy, unstructured fitness sensor data.

Approach

Decoupled system separating PySpark ETL pipelines from Scikit-Learn inference with FFT feature extraction.

Results

Extracted 198 critical temporal features via FFT and deployed a low-latency interactive clustering dashboard.

Apache Spark · Scikit-Learn · FFT/PCA · Docker
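Frequency-domain features turn a window of raw sensor readings into quantities such as the dominant movement frequency. The sketch below uses a naive O(N²) discrete Fourier transform from the standard library as a stand-in for an FFT routine; it computes the same spectrum, only more slowly. The function name, the sampling rate, and the synthetic signal are assumptions for illustration.

```python
import cmath
import math
from typing import List


def dominant_frequency(signal: List[float], sample_rate: float) -> float:
    """Frequency in Hz of the strongest non-DC component, via a naive DFT."""
    n = len(signal)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin k = 0
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


# One second of a synthetic 2 Hz movement signal sampled at 50 Hz.
rate = 50.0
wave = [math.sin(2 * math.pi * 2.0 * t / rate) for t in range(50)]
peak_hz = dominant_frequency(wave, rate)
```

Per-band power, spectral entropy, and similar statistics follow the same pattern: compute the spectrum once per window, then summarize it into a fixed-length feature vector.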


24.59 K MAE · <50ms

Melting Point Prediction

GNN Fusion Architecture for Material Science

Problem

Experimental hardware screening for thermodynamic material properties is exceptionally slow and costly.

Approach

Hybrid neural network blending molecular graphs (PyTorch Geometric) with classical RDKit features and LightGBM.

Results

Sub-50ms inference latency, 20% MAE reduction vs pure deep learning architectures.

PyTorch Geometric · RDKit · Optuna · XGBoost


0.53 F1 · +197% over GRU

MALLORN

Multi-Channel Rare Transient Detection

Problem

Identifying rare astronomical events is hampered by extreme multi-band data sparsity and severe class imbalance (4.86% target).

Approach

Automated statistical feature extraction via tsfresh and LightGBM classifiers replacing brittle neural networks.

Results

Distilled complex signals into 198 optimal features, maximizing F1 on an extremely rare target class.

PyTorch (RNN/GRU) · Signal Processing · Time Series · Imbalanced Data
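With a rare positive class, the default 0.5 decision threshold is seldom the one that maximizes F1. The sketch below shows a simple threshold sweep on made-up labels and scores; the function names and the candidate thresholds are assumptions, and nothing here reproduces the project's data or results.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class, defined as 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, f1) maximizing F1 over the candidate thresholds."""
    results: List[Tuple[float, float]] = [
        (th, f1_score(y_true, [int(s >= th) for s in scores]))
        for th in candidates
    ]
    return max(results, key=lambda r: r[1])


# Made-up classifier scores: 2 positives among 10 samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.20, 0.20, 0.30, 0.35, 0.45, 0.40, 0.80]
threshold, f1 = best_threshold(y_true, scores, [0.3, 0.4, 0.5])
```

In this toy example a threshold of 0.5 misses one of the two positives, while 0.4 recovers both at the cost of a single false positive. In practice the sweep should run on a validation split, not on the test set.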


Technical Arsenal

Tools & Technologies

The stack I use to explore data and engineer solutions.

PyTorch

TensorFlow

Scikit-Learn

Pandas

NumPy

Python

SQL

Spark

Hadoop

Docker

Kubernetes

AWS

GCP

FastAPI

React

Next.js

TypeScript

CUDA

MLflow

PostgreSQL

TensorRT

Redis

Kafka

Hugging Face

Airflow

GitHub

Bash / Shell

NLTK

Streamlit

DuckDB

LightGBM

XGBoost

Optuna

PyTorch Geometric


Engineering Track

Experience & Education

A track record of engineering impact across institutions.

Professional Engineering

M/S Sanjog Trading · Guwahati, India

Jul 2020 to Nov 2021

End-to-End ETL Infrastructure

Architected Python ETL pipelines using Pandas & NumPy, eliminating manual processing bottlenecks and establishing reliable data infrastructure for downstream ML.

Predictive Forecasting System

Developed and validated statistical time-series forecasting models for daily/monthly sales trends, enabling algorithmic inventory planning and supply chain optimization.

Real-Time BI Dashboard

Engineered interactive Streamlit dashboards to visualize sales KPIs, seasonal trends, and performance metrics, enabling live data-driven pricing strategies.

IIT Guwahati · Guwahati, India

Jul 2022 to Aug 2022

Relational Database Architecture

Designed normalized MySQL schemas with optimized indexing for ACID-compliant data integrity across 15+ frontend modules.

Constraint-Satisfaction Optimization

Conceptualized a heuristic constraint-satisfaction algorithm for automated timetable generation, foundational to operations research and resource-allocation ML.

Evidence-Based UX Research

Conducted structured quantitative user research across multiple institutions and performed competitive feature analysis of 4 EdTech platforms.

Airports Authority of India · NER Regional HQ, India

Jul 2023 to Aug 2023

Enterprise Asset Analytics

Analysed operational data for 1,053+ IT assets through an enterprise Asset Management System, tracking deployment status, warranty lifecycles, and maintenance history.

SAP ERP Data Workflows

Documented cross-functional SAP ERP workflows covering HR, Finance, and Procurement, gaining hands-on exposure to enterprise ETL architecture and data governance.

Network Infrastructure Mapping

Mapped enterprise network infrastructure including MPLS/ILL load balancing, core switching, and firewall architectures, critical for production MLOps deployment.

Academic Foundations

Sep 2024 to Nov 2025

MSc Data Science and Analytics

University of Leeds, UK

Specialized in advanced machine learning, predictive modeling, data mining, and big data architecture. Developed expertise in end-to-end data pipelines, real-time analytics, and MLOps principles.

Aug 2020 to Jun 2024

BTech Computer Science and Engineering

Assam Don Bosco University, India

Solid foundation in software engineering, algorithms, data structures, and database management. Led projects integrating classical software design with early predictive modeling applications.

Key Capabilities

Agentic AI Workflows

MLOps & ResOps

High-Performance Computing

Quantitative Research

End-to-End ML Pipelines

LLM Inference Optimization

Writing

Published Articles

Exploring algorithms, trends, and the intersection of technology, AI research, and society.

AI History

Deep Learning Lab

A dependency-free Mathematical Engine built in TypeScript. Experiment with hyperparameters, inject live training noise, and compare train/test behavior in real-time.

Let's build something
exceptional together.

I'm currently seeking new full-time opportunities. If you have an open role or just want to connect, my inbox is always open.

Designed & built by Bhargav Kumar Nath


Published Articles

The Evolution of Artificial Intelligence: From Symbolic AI to Deep Learning

Beyond the Hill: The Modern Algorithm's Quest for Global Optima

ESG in the Age of AI: Why the Stakes Have Never Been Higher
