Bhargav Kumar Nath.
Engineer at heart.
Data Scientist | Machine Learning Engineer crafting intelligent, scalable systems to solve complex problems.
A bit about me.
I'm a recent graduate with a deep passion for unlocking value from data. My foundation bridges Machine Learning and Software Engineering, allowing me to not just build models, but to deploy and scale them reliably.
While I'm at the beginning of my professional journey, I've spent my academic career and personal time diving deep into end-to-end ML pipelines, big data technologies like Spark, and modern web frameworks to bring data to life.
Goals & Interests
- Building scalable prediction systems
- Natural Language Processing & LLMs
- Bridging the gap between data science research and production engineering
Projects that ship
Demonstrating end-to-end expertise uniting Machine Learning with resilient software engineering.
Automated Mixed Precision Quantization in LLMs
On-device LLMs face tight memory budgets, and uniform quantization frees memory only at the cost of accuracy.
Automated mixed-precision architecture search using genetic algorithms.
Engineered an NSGA-II optimizer in PyTorch coupled with a zero-cost proxy.
Slashed search from days to minutes and reduced TinyLlama VRAM usage by 40%.
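The heart of NSGA-II is non-dominated sorting: keeping only candidates that no other candidate beats on every objective. A minimal pure-Python sketch of that first Pareto front, with a synthetic search space (per-layer bit-widths) and a made-up stand-in for the zero-cost proxy; all names and objective formulas here are illustrative, not the project's actual code:

```python
import random

def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population, evaluate):
    """Return the non-dominated subset of `population` (NSGA-II's first front)."""
    scored = [(ind, evaluate(ind)) for ind in population]
    return [(ind, obj) for ind, obj in scored
            if not any(dominates(other, obj) for _, other in scored)]

# Toy search space: a bit-width (2, 4, or 8) for each of 4 transformer layers.
random.seed(0)
population = [tuple(random.choice([2, 4, 8]) for _ in range(4)) for _ in range(30)]

def evaluate(bits):
    # Objective 1: memory footprint (sum of bit-widths; lower is better).
    memory = sum(bits)
    # Objective 2: a synthetic stand-in for a zero-cost accuracy proxy --
    # the penalty grows as layers are quantized more aggressively.
    proxy_loss = sum((8 - b) ** 2 for b in bits)
    return (memory, proxy_loss)

for bits, (mem, loss) in sorted(pareto_front(population, evaluate), key=lambda t: t[1]):
    print(f"bits={bits}  memory={mem}  proxy_loss={loss}")
```

Because the proxy is cheap to evaluate, the whole front can be recomputed every generation, which is what collapses the search time from days to minutes.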
109.9M Events Analyzed on Commodity Hardware
Processing 100M+ event logs typically requires expensive cloud data warehouses.
Built an end-to-end local analytical engine to predict purchase probabilities.
Used DuckDB and Polars for in-memory processing to train a LightGBM classifier.
Reduced 109M events to 1.9GB, powering models that drove a 4.5x conversion lift.
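The core reduction is collapsing raw events into one feature row per user. A stdlib-only sketch of that aggregation (no DuckDB/Polars dependency), with a hypothetical clickstream schema invented for illustration:

```python
from collections import defaultdict

# Hypothetical clickstream rows: (user_id, event_type, price)
events = [
    ("u1", "view", 9.99), ("u1", "cart", 9.99), ("u1", "purchase", 9.99),
    ("u2", "view", 4.50), ("u2", "view", 12.00),
    ("u3", "view", 7.25), ("u3", "cart", 7.25),
]

def user_features(rows):
    """Collapse raw events into one feature row per user -- the same shape of
    reduction that turns 100M+ raw events into a few GB of model-ready features."""
    agg = defaultdict(lambda: {"views": 0, "carts": 0, "purchases": 0, "spend": 0.0})
    for user, etype, price in rows:
        f = agg[user]
        if etype == "view":
            f["views"] += 1
        elif etype == "cart":
            f["carts"] += 1
        elif etype == "purchase":
            f["purchases"] += 1
            f["spend"] += price
    # Label for a purchase-probability model: did the user ever buy?
    return {u: {**f, "bought": f["purchases"] > 0} for u, f in agg.items()}

features = user_features(events)
print(features["u1"])
```

At scale the same group-by runs as a single DuckDB SQL statement or Polars lazy query over Parquet, which is what keeps the whole pipeline on commodity hardware.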
A/B Testing & Unified Uplift Modeling
Standard A/B tests optimize vanity metrics and miss true incremental value.
Designed a Causal Inference pipeline to target highly persuadable user segments.
Combined X-Learners and Thompson Sampling, distilled into a fast decision tree.
Achieved sub-millisecond inference latency and turned ad spend into net profit.
UK Supermarket Competitive Intelligence
Fuzzy string matching is too unreliable to track competitive UK supermarket pricing dynamics.
Engineered semantic vector matching to reliably identify identical products.
Used Sentence-BERT, FAISS, and LightGBM integrated via strict data contracts.
Expanded match rate to 67,000+ products and forecasted prices to within £0.14.
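The matching step reduces to nearest-neighbour search over L2-normalised embeddings, where inner product equals cosine similarity. A NumPy sketch using random vectors as stand-ins for Sentence-BERT embeddings (FAISS's `IndexFlatIP` computes the same exact inner-product search, just much faster at scale):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for Sentence-BERT embeddings: one vector per product title.
our_products = rng.normal(size=(5, 16))
# A rival retailer's listings: near-duplicates with slightly different wording.
their_products = our_products + 0.05 * rng.normal(size=(5, 16))

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cosine similarity via inner product of L2-normalised vectors -- the metric
# a FAISS IndexFlatIP would use, done here with plain matrix algebra.
sims = normalize(our_products) @ normalize(their_products).T
best_match = sims.argmax(axis=1)
print(best_match)  # each product should pair with its perturbed twin
```

A similarity threshold on `sims` then acts as the data contract's acceptance gate before matches flow into the LightGBM price forecaster.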
Generative Pipeline for Data Scarcity
Classical oversampling algorithms generate noise in complex tabular datasets.
Developed a model-driven rejection sampling pipeline to create synthetic data.
Used PyTorch autoencoders and AutoML to enforce strict manifold alignment.
Scaled generation linearly and significantly outperformed SMOTE AUPRC baselines.
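The key idea is model-guided rejection: propose candidates broadly, but keep only those a trained model judges to lie on the real data manifold. A toy 1-D sketch where distance-to-mean stands in for an autoencoder's reconstruction loss (all values and thresholds are illustrative):

```python
import random
import statistics

random.seed(7)

# Real minority-class samples (1-D for clarity); assume they lie near a
# manifold -- here crudely summarised by their mean.
real = [4.8, 5.1, 5.0, 4.9, 5.2]
center = statistics.mean(real)

def reconstruction_error(x):
    """Stand-in for an autoencoder's reconstruction loss: distance to manifold."""
    return abs(x - center)

def rejection_sample(n, proposal_width=3.0, threshold=0.5):
    """Propose widely, accept only candidates the 'model' deems on-manifold."""
    accepted = []
    while len(accepted) < n:
        candidate = center + random.uniform(-proposal_width, proposal_width)
        if reconstruction_error(candidate) < threshold:
            accepted.append(candidate)
    return accepted

synthetic = rejection_sample(10)
print(all(reconstruction_error(x) < 0.5 for x in synthetic))
```

Unlike SMOTE's interpolation between neighbours, the filter vetoes off-manifold candidates outright, which is what suppresses the noise that inflates AUPRC on complex tabular data.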
Production-ready Analytics Platform
Generating insights from noisy fitness sensors requires resilient processing.
Architected a decoupled system separating batch ETL pipelines from ML inference.
Processed data lakes using PySpark and built structured Scikit-Learn pipelines.
Deployed a low-latency interactive dashboard for algorithmic user clustering.
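The structured-pipeline part matters because scaling and clustering must travel together: whatever preprocessing was fit at training time is replayed verbatim at inference. A minimal sketch with synthetic fitness-style features (the feature names and group centres are invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical per-user features: [weekly sessions, avg pace (min/km), avg HR]
casual = rng.normal([2, 9.0, 120], [0.5, 0.5, 5], size=(50, 3))
serious = rng.normal([6, 5.5, 155], [0.5, 0.4, 5], size=(50, 3))
X = np.vstack([casual, serious])

# One Pipeline object owns both steps, so serialising it captures the exact
# preprocessing alongside the clustering model.
model = Pipeline([
    ("scale", StandardScaler()),
    ("cluster", KMeans(n_clusters=2, n_init=10, random_state=0)),
])
labels = model.fit_predict(X)

print(len(set(labels[:50])), len(set(labels[50:])))  # each group -> one cluster
```

The batch ETL side (PySpark over the data lake) only needs to emit rows in this feature schema; the pipeline object is the contract between the two halves.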
GNN Fusion Architecture
Experimental screening for thermodynamic material properties is slow and costly.
Engineered a hybrid neural network blending molecular graphs with descriptors.
Fused PyTorch Geometric with RDKit features and fine-tuned LightGBM estimators.
Delivered sub-50ms inference latency and reduced the mean absolute error by 20%.
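The fusion itself is a late-fusion concatenation: pool the GNN's node embeddings into one graph vector, then append the hand-crafted descriptors before the boosted-tree head. A NumPy sketch with random embeddings and made-up descriptor values standing in for PyTorch Geometric and RDKit outputs:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical molecule: learned node features from a GNN encoder (atoms x dims)
# plus a vector of global descriptors (e.g. RDKit-style mol. weight, logP, HBD).
node_embeddings = rng.normal(size=(12, 8))   # 12 atoms, 8-dim learned features
descriptors = np.array([180.2, 1.3, 4.0])    # hand-crafted global features

def fuse(node_embeddings, descriptors):
    """Late fusion: mean-pool the graph into one vector, then concatenate the
    descriptors -- the fused vector feeds a downstream estimator (e.g. LightGBM)."""
    graph_vector = node_embeddings.mean(axis=0)
    return np.concatenate([graph_vector, descriptors])

fused = fuse(node_embeddings, descriptors)
print(fused.shape)  # (11,)
```

Because pooling and concatenation are cheap, nearly all of the inference budget goes to the encoder, which is what keeps end-to-end latency under 50 ms.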
Multi-Channel Rare Transient Detection
Identifying rare astronomical events is hampered by extreme data sparsity.
Replaced brittle neural networks with automated statistical feature extraction.
Leveraged LightGBM and tsfresh to process irregular multi-band optical signals.
Distilled complex inputs into 198 optimal features to maximize the F1 score.
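Summary statistics sidestep the irregular-sampling problem entirely: they need no fixed grid, so sparse multi-band light curves become fixed-width feature rows. A stdlib sketch of the tsfresh-style extraction (the band names and magnitudes are synthetic):

```python
import statistics

# One hypothetical light curve per optical band, irregularly sampled.
bands = {
    "g": [12.1, 12.3, 15.8, 12.0],
    "r": [11.9, 12.0, 12.1],
}

def extract_features(series):
    """tsfresh-style summary statistics that need no fixed sampling grid --
    robust where a sequence model would choke on irregular, sparse inputs."""
    return {
        "mean": statistics.mean(series),
        "std": statistics.pstdev(series),
        "max_jump": max(abs(b - a) for a, b in zip(series, series[1:])),
        "n_obs": len(series),
    }

features = {band: extract_features(s) for band, s in bands.items()}
print(round(features["g"]["max_jump"], 2))  # 3.8 -- the spike in the g band
```

tsfresh generates hundreds of such statistics per band and its relevance tests prune them; here that pruning kept 198 features for the final LightGBM classifier.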
Writing
Exploring algorithms, trends, and the intersection of technology and society.
Deep Learning Lab
A dependency-free Mathematical Engine built in TypeScript. Experiment with hyperparameters, inject live training noise, and compare train/test behavior in real-time.
Tools & Technologies
The stack I use to explore data and engineer solutions.
My journey so far
Data Analyst Intern
- Audited lifecycle data for 1,053 IT assets in SAP ERP, analysing failure patterns and maintenance logs to flag sources of unplanned downtime across 137 airport management units.
- Compared GeM digital procurement workflows against legacy processes and identified roughly 15–20% of administrative overhead that could be cut; the findings fed directly into vendor selection decisions.
- Cleaned and validated 19,000+ employee records across 8 departments ahead of a SAP migration, building checks that caught inconsistencies before they could cause issues at go-live.
Software Development Intern
- Built a scheduling engine that automatically detected conflicts across 500+ weekly constraints, saving faculty coordinators significant time they'd previously spent resolving timetable clashes by hand.
- Gathered feedback from 50+ teachers and reviewed 4 competitor platforms to understand real pain points, then put together a prioritized feature list that the dev team actually used to guide their roadmap.
Junior Data Analyst
- Wrote Python ETL pipelines to automate data ingestion and transformation, cutting manual preprocessing time by around 40% and keeping the data clean enough to reliably feed into ML models.
- Built time-series forecasting models that blended statistical and ML approaches to help the team make better daily inventory decisions.
- Tracked down and fixed silent data anomalies caused by edge-case transaction logs, resolving bugs before they could skew downstream reporting and dashboards.
"A model is a mathematical fantasy, but an ML system is a living entity. I design for the shifting reality of the human world, not the static perfection of a laboratory."
We Must Escape the State of the Art Trap
Leaderboard victories rarely survive reality. I start with the simplest model to establish an honest baseline and prove whether ML is even necessary.
Algorithms Fade but Data is Foundational
Architectures change, but long-term success depends on data quality. Real-world data is noisy and evolving, so inflexible systems quickly become obsolete.
Deployment is the Starting Line
Standard software fails loudly, but ML systems fail silently, serving confident yet incorrect predictions. Production models need continuous monitoring to stay reliable.