Bhargav Kumar Nath.
Engineer at heart.

Available for full-time work

Data Scientist | Machine Learning Engineer crafting intelligent, scalable systems to solve complex problems.

bhargav@ml-cluster ~
$ system-status --live
> ALL SYSTEMS NOMINAL | GPU: 94% util | Uptime: 847d

A bit about me.

I'm a recent graduate with a deep passion for unlocking value from data. My foundation bridges Machine Learning and Software Engineering, allowing me to not just build models, but to deploy and scale them reliably.

While I'm at the beginning of my professional journey, I've spent my academic career and personal time diving deep into end-to-end ML pipelines, big data technologies like Spark, and modern web frameworks to bring data to life.

Goals & Interests

  • Building scalable prediction systems
  • Natural Language Processing & LLMs
  • Bridging the gap between data science research and production engineering

Projects that ship

Demonstrating end-to-end expertise uniting Machine Learning with resilient software engineering.

Evolutionary Mixed Precision Architecture Search

Automated Mixed Precision Quantization in LLMs

Edge LLMs face memory bottlenecks because uniform quantization degrades accuracy.

Automated mixed-precision architecture search using genetic algorithms.

Engineered an NSGA-II optimizer in PyTorch coupled with a zero-cost proxy.

Cut search time from days to minutes and reduced TinyLlama VRAM usage by 40%.

Python | Streamlit | PyTorch | CUDA | Genetic Algorithms | LLM Quantization | Hessian Analysis
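The search described above trades accuracy against memory, so candidates are naturally compared by Pareto dominance rather than by a single score. Below is a minimal sketch of the non-dominated filtering step that NSGA-II builds on. It is plain Python for illustration only: the `dominates` and `pareto_front` helpers and the candidate numbers are hypothetical, not the project's code or results.

```python
from typing import List, Tuple

# Each candidate is (proxy_error, memory_gb); both objectives are minimized.
# The values used below are invented purely for illustration.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    configs = [(0.10, 4.0), (0.12, 3.0), (0.11, 4.5), (0.20, 2.0), (0.25, 2.5)]
    for error, memory in pareto_front(configs):
        print(f"error={error:.2f} memory={memory:.1f} GB")
```

A full NSGA-II loop would add ranking into successive fronts, crowding distance, and genetic operators on top of this filter.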

Customer Intelligence Platform

109.9M Events Analyzed on Commodity Hardware

Processing 100M+ event logs typically requires expensive cloud data warehouses.

Built an end-to-end local analytical engine to predict purchase probabilities.

Used DuckDB and Polars for in-memory processing to train a LightGBM classifier.

Reduced 109.9M events to a 1.9GB footprint, powering models that drove a 4.5x conversion lift.

DuckDB | LightGBM | Behavioral Analytics | Propensity Modeling

Dynamic Experimentation Engine

A/B Testing & Unified Uplift Modeling

Standard A/B tests optimize vanity metrics and miss true incremental value.

Designed a Causal Inference pipeline to target highly persuadable user segments.

Combined X-Learners and Thompson Sampling, distilled into a fast decision tree.

Achieved sub-millisecond inference latency and turned ad spend into net profit.

CausalML | Thompson Sampling | Bootstrapping | Knowledge Distillation
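For readers unfamiliar with Thompson Sampling, here is a minimal Beta-Bernoulli sketch using only the standard library. It illustrates the general technique, not this project's pipeline: the arm conversion rates, the `thompson_pick` and `simulate` helpers, and the uniform Beta(1, 1) prior are all assumptions made for the example.

```python
import random
from typing import List


def thompson_pick(successes: List[int], failures: List[int], rng: random.Random) -> int:
    """Draw a conversion rate from each arm's Beta posterior and pick the highest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def simulate(true_rates: List[float], rounds: int, seed: int = 0) -> List[int]:
    """Run a Beta-Bernoulli bandit and return how many times each arm was played."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_pick(successes, failures, rng)
        # Simulated feedback: the arm converts with its (hypothetical) true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    print(simulate([0.05, 0.10, 0.50], rounds=2000))
```

Because each arm is chosen in proportion to the probability that it is the best one, traffic drifts toward the strongest arm while weaker arms still receive occasional exploration.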

PricePoint Dynamics

UK Supermarket Competitive Intelligence

Fuzzy matching fails to track competitive UK supermarket pricing dynamics.

Engineered semantic vector matching to reliably identify identical products.

Used Sentence-BERT, FAISS, and LightGBM integrated via strict data contracts.

Expanded match rate to 67,000+ products and forecasted prices to within £0.14.

9.5 million product listings | FAISS | NLP product matching | LightGBM | Pandera | SHAP | Anomaly detection
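The idea behind semantic vector matching can be sketched with a brute-force cosine search. This is a simplified stand-in, not the project's implementation: it assumes embeddings are already computed (in practice by a sentence-embedding model), replaces the FAISS index with a linear scan, and uses hypothetical product names, toy vectors, and a hypothetical 0.9 similarity threshold.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    query: Vector, catalog: Dict[str, Vector], threshold: float = 0.9
) -> Optional[Tuple[str, float]]:
    """Return the most similar catalog item, or None if nothing clears the threshold."""
    if not catalog:
        return None
    name, score = max(
        ((n, cosine(query, v)) for n, v in catalog.items()),
        key=lambda pair: pair[1],
    )
    return (name, score) if score >= threshold else None


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" for two hypothetical products.
    catalog = {"semi_skimmed_milk_2l": [1.0, 0.0, 0.1], "white_bread_800g": [0.0, 1.0, 0.0]}
    print(best_match([0.9, 0.0, 0.1], catalog))
    print(best_match([0.5, 0.5, 0.0], catalog))
```

Rejecting matches below a threshold matters as much as finding the nearest neighbor, since a forced match between different products would corrupt any downstream price comparison.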

Synthetic Intelligence

Generative Pipeline for Data Scarcity

Classical oversampling algorithms generate noise in complex tabular datasets.

Developed a model-driven rejection sampling pipeline to create synthetic data.

Used PyTorch autoencoders and AutoML to guarantee strict manifold alignment.

Scaled generation linearly and significantly outperformed SMOTE AUPRC baselines.

PyTorch | Generative Modeling | t-SNE | Privacy AI

Fitness Tracker Analytics

Production-ready Analytics Platform

Generating insights from noisy fitness sensors requires resilient processing.

Architected a decoupled system separating batch ETL pipelines from ML inference.

Processed data lakes using PySpark and built structured Scikit-Learn pipelines.

Deployed a low-latency interactive dashboard for algorithmic user clustering.

Apache Spark | Scikit-Learn | FFT/PCA | Docker | Random Forest

Melting Point Prediction

GNN Fusion Architecture

Experimental screening for thermodynamic material properties is slow and costly.

Engineered a hybrid neural network blending molecular graphs with descriptors.

Fused PyTorch Geometric with RDKit features and fine-tuned LightGBM estimators.

Delivered sub-50ms inference latency and reduced the mean absolute error by 20%.

PyTorch Geometric | RDKit | Optuna | XGBoost

MALLORN

Multi-Channel Rare Transient Detection

Identifying rare astronomical events is hampered by extreme data sparsity.

Replaced brittle neural networks with automated statistical feature extraction.

Leveraged LightGBM and tsfresh to process irregular multi-band optical signals.

Distilled complex inputs into 198 optimal features to maximize the F1 score.

PyTorch (RNN/GRU) | Signal Processing | Time Series | Imbalanced Data

Writing

Exploring algorithms, trends, and the intersection of technology and society.

The Evolution of Artificial Intelligence: From Symbolic AI to Deep Learning

LeedsFINsights

Beyond the Hill: The Modern Algorithm’s Quest for Global Optima

LeedsFINsights

ESG in the Age of AI: Why the Stakes Have Never Been Higher

LeedsFINsights

Deep Learning Lab

A dependency-free mathematical engine built in TypeScript. Experiment with hyperparameters, inject live training noise, and compare train/test behavior in real time.

Controls: data volume (100%), Layer 1 and Layer 2 sizes (4 each), activation function, learning rate strategy, base learning rate (0.030), training noise level (0.00), and L1/L2 regularization (0).
Readouts: epoch, train/test metrics, train and test loss curves, and current learning rate.

Tools & Technologies

The stack I use to explore data and engineer solutions.

PyTorch
TensorFlow
Scikit-Learn
Pandas
NumPy
Python
R
SQL
Spark
Hadoop
Docker
Kubernetes
AWS
GCP
FastAPI
React
Next.js
TypeScript
CUDA
MLflow
PostgreSQL
TensorRT
Redis
Kafka
C
Hugging Face
Airflow
GitHub
Bash / Shell
NLTK
Streamlit
DuckDB
LightGBM
XGBoost
Optuna
PyTorch Geometric

My journey so far

Data Analyst Intern

Airports Authority of India, NER Regional HQ, India | Jul 2023 to Aug 2023
  • Audited lifecycle data for 1,053 IT assets in SAP ERP, analyzing failure patterns and maintenance logs to flag sources of unplanned downtime across 137 airport management units.
  • Compared GeM digital procurement workflows against legacy processes and found roughly 15–20% of administrative overhead that could be cut. These findings fed directly into vendor selection decisions.
  • Cleaned and validated 19,000+ employee records across 8 departments ahead of a SAP migration, building checks that caught inconsistencies before they could cause issues at go-live.

Software Development Intern

Indian Institute of Technology, Guwahati, India | Jul 2022 to Aug 2022
  • Built a scheduling engine that automatically detected conflicts across 500+ weekly constraints, saving faculty coordinators significant time they'd previously spent resolving timetable clashes by hand.
  • Gathered feedback from 50+ teachers and reviewed 4 competitor platforms to understand real pain points, then put together a prioritized feature list that the dev team actually used to guide their roadmap.
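The conflict detection mentioned in the first bullet can be illustrated with a small interval-overlap check. This is a sketch under stated assumptions, not the internship code: the `Slot` fields, the `find_conflicts` helper, and the sample timetable are hypothetical, and a "conflict" is taken to mean two bookings of the same resource on the same day with overlapping times.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Slot(NamedTuple):
    resource: str  # a room or a teacher (hypothetical field)
    day: str
    start: int     # minutes since midnight, inclusive
    end: int       # minutes since midnight, exclusive
    label: str


def find_conflicts(slots: List[Slot]) -> List[Tuple[str, str]]:
    """Return pairs of labels whose time ranges overlap on the same resource and day."""
    by_key: Dict[Tuple[str, str], List[Slot]] = defaultdict(list)
    for slot in slots:
        by_key[(slot.resource, slot.day)].append(slot)

    conflicts: List[Tuple[str, str]] = []
    for group in by_key.values():
        group.sort(key=lambda s: s.start)
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                if second.start >= first.end:
                    break  # sorted by start, so no later slot can overlap `first`
                conflicts.append((first.label, second.label))
    return conflicts


if __name__ == "__main__":
    timetable = [
        Slot("Room A", "Mon", 540, 600, "Math"),
        Slot("Room A", "Mon", 570, 630, "Physics"),
        Slot("Room A", "Mon", 600, 660, "Chemistry"),
        Slot("Room B", "Mon", 540, 600, "Biology"),
    ]
    print(find_conflicts(timetable))
```

Grouping by resource and day before sorting keeps each comparison local, so back-to-back bookings (one ending exactly when the next starts) are not flagged.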

Junior Data Analyst

M/S Sanjog Trading, Guwahati, India | Jul 2020 to Nov 2021
  • Wrote Python ETL pipelines to automate data ingestion and transformation, cutting manual preprocessing time by around 40% and keeping the data clean enough to reliably feed into ML models.
  • Built time-series forecasting models that blended statistical and ML approaches to help the team make better daily inventory decisions.
  • Tracked down and fixed silent data anomalies caused by edge-case transaction logs, resolving bugs before they could skew downstream reporting and dashboards.

"A model is a mathematical fantasy, but an ML system is a living entity. I design for the shifting reality of the human world, not the static perfection of a laboratory."

We Must Escape the State-of-the-Art Trap

Leaderboard victories rarely survive reality. I start with the simplest model to establish an honest baseline and prove whether ML is even necessary.

Algorithms Fade but Data is Foundational

Architectures change, but long-term success depends on data quality. Real-world data is noisy and evolving, so inflexible systems quickly become obsolete.

Deployment is the Starting Line

Standard software fails loudly, but ML systems fail silently by making confident, incorrect predictions. Production models need continuous monitoring to stay reliable.
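One common way to make silent failure visible is to compare the distribution of a feature or score in production against its training-time distribution. The sketch below computes the Population Stability Index (PSI) with the standard library. The `psi` helper, the bin count, and the synthetic data are assumptions for illustration, and the often-quoted 0.25 alert level is a rule of thumb rather than a fixed standard.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def histogram(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]      # synthetic training-time scores
    shifted = [0.5 + i / 200 for i in range(100)]  # synthetic drifted scores
    print(f"no drift: {psi(reference, reference):.3f}")
    print(f"drifted:  {psi(reference, shifted):.3f}")
```

A scheduled job could compute this per feature and raise an alert when the value crosses a chosen threshold, long before labeled outcomes arrive to confirm the degradation.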

Let's build something exceptional together.

I'm currently seeking new full-time opportunities. If you have an open role or just want to connect, my inbox is always open.

Designed & built by Bhargav Kumar Nath