About Me

I’m a Senior Software Engineer at o9 Solutions, building production GenAI systems: agents that can use tools, pull the right context through RAG, and maintain memory across long-running conversations.

What I enjoy most is the part that turns demos into reliable products: shaping clean APIs, managing context and memory without blowing up latency or cost, and making “non-deterministic” behavior something you can actually measure, debug, and operate in the real world.

Before this, I worked on large-scale AutoML for time-series forecasting, helping build pipelines and services that supported thousands of retail locations. Earlier at o9, I also spent time on data and platform foundations: PySpark/Hadoop workflows, reliability and monitoring, CI/CD, and improving the day-to-day data science experience. In grad school, I worked on research problems in computer vision and federated learning.


Skills

Legend: Skills marked with * are ones I haven't actively used in a while.

Backend

Python
SQL
C#*
Java*

ML & GenAI

LLMs & GenAI Systems
Machine Learning
LangChain / LangGraph
RAG and Vector Databases
Time Series Forecasting
TensorFlow / PyTorch*
Computer Vision
NLP

Tools & Infra

PySpark*
Git
Docker
PostgreSQL
MongoDB
Redis
Kubernetes / ArgoCD
Hadoop*
Databricks
Splunk
Jupyter / JupyterHub
Jenkins
Airflow
Linux
Tableau*

Frontend

TypeScript*
HTML*
CSS*

Professional Timeline

GenAI Agentic Workflow

Senior Software Engineer, GenAI

o9 Solutions

April 2025 - Present

Building production-grade GenAI systems with agentic workflows, retrieval-augmented generation, and long-running conversation support.

AutoML Time Series Forecasting

Software Engineer II, Machine Learning

o9 Solutions

April 2022 - April 2025

Built large-scale AutoML pipelines for time series forecasting using Python, PySpark, TensorFlow, and Kubernetes.

Software Engineer, Machine Learning

o9 Solutions

August 2019 - April 2022

Worked on scalability, optimization, and monitoring for a large retail forecasting system deployed across 10,000 stores.

Cursor Tracking System

Graduate Research Assistant

University at Buffalo

September 2018 - May 2019

Cursor detection and tracking in Adobe Photoshop screen-recording videos (with Adobe Research).

Distributed Learning

Student Research Assistant

University at Buffalo

September 2017 - November 2018

Research with Prof. Haimonti Dutta on distributed machine learning and NLP.

Object Tracking System

Applied Machine Learning Intern

Clarifai

May 2018 - August 2018

Built hybrid tracking approaches and analyzed depth sensor noise characteristics.

Enterprise Chatbot

Software Engineer

Fidelity Investments

August 2016 - June 2017

Built an enterprise customer-service chatbot and NLP analytics tools in the pre-LLM era.

Executive Graduate Trainee

Fidelity Investments

June 2015 - July 2016

Analytics, KPI reporting, and Tableau-based dashboards for distributed teams.


Education

Summer Research Intern (Computer Vision)

Indian Institute of Science

May 2014 - July 2014

Classical CV feature extraction and supervised learning for offline OCR, under Dr. Rathna G. N.

Master of Science (MS), Computer Science

University at Buffalo

2017 - 2019

Focused on computer vision, distributed machine learning, and NLP through research and coursework.

Bachelor of Technology (B.Tech.), Electrical and Electronics Engineering

National Institute of Technology Karnataka

2011 - 2015

Undergraduate training with foundations in math, signals, and systems.


Selected Publications

Mouse Cursor Detection and Tracking in Instructional Videos

Nataraj, N., Zhou, C., & Yuan, J. (2019). Technical Report, University at Buffalo. [Submitted to WACV 2019]

Abstract: Many expert users of particular software, e.g., Adobe Photoshop, regularly post instructional videos online, imparting knowledge on how to perform certain tasks. An important way to understand the instructor's actions in an instructional video is to detect and track the mouse cursor throughout the entire video. Despite recent progress in object detection and tracking, identifying the mouse cursor in such videos is a unique and difficult problem, since the mouse cursor typically occupies a very small percentage of the frame (0.05-1%), exhibits fast movement, and is prone to instant appearance changes and background clutter. We propose a novel three-step tracking-by-detection approach for mouse cursor detection and tracking: unsupervised cursor discovery, multi-scale template matching, and optimal spatiotemporal path search. Our approach is completely unsupervised and is able to handle instant appearance changes and fast movements of the mouse cursor. We present evaluations on a dataset of annotated Adobe Photoshop instructional videos and show that our method beats conventional online tracking methods such as the TLD, MIL, and CSRT trackers by a large margin. For a fairer comparison, we also compare our results with Faster R-CNN, a deep-learning-based object detector, and show comparable success rates.
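The third step in the abstract, optimal spatiotemporal path search, links per-frame detections into one trajectory. The sketch below is a minimal illustration under stated assumptions: each frame already has candidate cursor positions with template-matching scores, and the path search is a Viterbi-style dynamic program that trades detection score against a squared-displacement penalty. The function name `best_cursor_path`, the `motion_weight` parameter, and the quadratic motion cost are hypothetical choices for illustration, not the formulation used in the report.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def best_cursor_path(positions: Sequence[Sequence[Point]],
                     scores: Sequence[Sequence[float]],
                     motion_weight: float = 0.01) -> List[int]:
    """Choose one candidate per frame, maximizing the summed detection scores
    minus motion_weight * squared displacement between consecutive frames.

    positions[t][k] is the (x, y) centre of candidate k in frame t and
    scores[t][k] its matching score (both are assumed, hypothetical inputs).
    Returns the index of the chosen candidate in every frame.
    """
    acc = list(scores[0])          # best path score ending at each candidate
    back: List[List[int]] = []     # back[t - 1][j]: best predecessor of candidate j in frame t
    for t in range(1, len(scores)):
        new_acc, ptrs = [], []
        for j, (xj, yj) in enumerate(positions[t]):
            best_i, best_val = 0, float("-inf")
            for i, (xi, yi) in enumerate(positions[t - 1]):
                # Penalize large jumps between consecutive frames.
                val = acc[i] - motion_weight * ((xj - xi) ** 2 + (yj - yi) ** 2)
                if val > best_val:
                    best_i, best_val = i, val
            ptrs.append(best_i)
            new_acc.append(best_val + scores[t][j])
        acc = new_acc
        back.append(ptrs)
    # Start from the best final candidate and follow the back-pointers.
    path = [max(range(len(acc)), key=acc.__getitem__)]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]
```

With `motion_weight=0` the result reduces to picking the highest-scoring candidate in each frame independently; a positive weight lets a slightly weaker but spatially consistent candidate win, which is the reason to search over whole paths rather than frames in isolation.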


Consensus Based Vertically Partitioned Multi-layer Perceptrons for Edge Computing

Dutta, H., Mahindre, S., & Nataraj, N. (2021). Proceedings of the 20th IEEE International Conference on Machine Learning and Applications (ICMLA).

Abstract: Storing large volumes of data on distributed devices has become commonplace in recent years. Applications involving sensors capture data in different modalities, including image, video, audio, GPS, and others. Novel distributed algorithms are required to learn from this rich, multi-modal data. We present an algorithm for learning consensus-based multi-layer perceptrons on resource-constrained devices. Assuming nodes (devices) in the distributed system are arranged in a graph and contain vertically partitioned data and labels, the goal is to learn a global function that minimizes the loss. Each node learns a feed-forward multi-layer perceptron and obtains a loss on data stored locally. It then gossips with a neighbor, chosen uniformly at random, and exchanges information about the loss. The updated loss is used to run a backpropagation algorithm and adjust local weights appropriately. This method enables nodes to learn the global function without exchanging data over the network. Empirical results reveal that the consensus algorithm converges to the centralized model and has performance comparable to centralized multi-layer perceptrons and tree-based algorithms.


Consensus Based Multi-Layer Perceptrons for Edge Computing

Dutta, H., Nataraj, N., & Mahindre, S. (2021). arXiv preprint.

Abstract: In recent years, storing large volumes of data on distributed devices has become commonplace. Applications involving sensors capture data in different modalities, including image, video, audio, GPS, and others. Novel algorithms are required to learn from this rich distributed data. We present consensus-based multi-layer perceptrons for resource-constrained devices. Assuming nodes (devices) in the distributed system are arranged in a graph and contain vertically partitioned data, the goal is to learn a global function that minimizes the loss. Each node learns a feed-forward multi-layer perceptron and obtains a loss on data stored locally. It then gossips with a neighbor, chosen uniformly at random, and exchanges information about the loss. The updated loss is used to run a backpropagation algorithm and adjust weights appropriately. This method enables nodes to learn the global function without exchanging data over the network.


GADGET SVM: A Gossip-bAseD sub-GradiEnT Solver for Linear SVMs

Dutta, H., & Nataraj, N. (2018). arXiv preprint.

Abstract: In the era of big data, an important weapon in a machine learning researcher's arsenal is a scalable Support Vector Machine (SVM) algorithm. SVMs are extensively used for solving classification problems. Traditional algorithms for learning SVMs often scale superlinearly with training set size, which quickly becomes infeasible for large data sets. In recent years, scalable algorithms have been designed that study the primal or dual formulations of the problem. This often suggests a way to decompose the problem and facilitate the development of distributed algorithms. We present a distributed algorithm for learning linear Support Vector Machines in the primal form for binary classification, called Gossip-bAseD sub-GradiEnT (GADGET) SVM. The algorithm is designed so that it can be executed locally on the nodes of a distributed system. Each node processes its local, homogeneously partitioned data and learns a primal SVM model. It then gossips with random neighbors about the classifier learnt and uses this information to update the model. Extensive theoretical and empirical results suggest that this anytime algorithm has performance comparable to its centralized and online counterparts.
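The abstract describes two alternating steps per node: a local sub-gradient update on the primal SVM objective, then gossip with a random neighbor. A minimal sketch of that loop follows, assuming a Pegasos-style step size of 1/(λt), full-batch local sub-gradients, and plain pairwise averaging as the gossip step. These choices and all names (`gossip_svm`, `local_subgradient_step`, `dot`) are hypothetical illustrations, not the update rule or schedule from the paper.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[List[float], int]  # (feature vector, label in {-1, +1})


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def local_subgradient_step(w: List[float], data: List[Example],
                           lam: float, t: int) -> List[float]:
    """One sub-gradient step on lam/2 * ||w||^2 + mean hinge loss over local data."""
    eta = 1.0 / (lam * t)  # assumed Pegasos-style step size
    grad = [lam * wi for wi in w]
    for x, y in data:
        if y * dot(w, x) < 1:  # hinge term active: contributes -y * x / n
            for k, xk in enumerate(x):
                grad[k] -= y * xk / len(data)
    return [wi - eta * gi for wi, gi in zip(w, grad)]


def gossip_svm(node_data: Dict[int, List[Example]],
               neighbors: Dict[int, List[int]],
               lam: float = 0.1, rounds: int = 200,
               seed: int = 0) -> Dict[int, List[float]]:
    """Hypothetical gossip sub-gradient loop: returns one linear model per node."""
    rng = random.Random(seed)
    dim = len(next(iter(node_data.values()))[0][0])
    w = {n: [0.0] * dim for n in node_data}
    for t in range(1, rounds + 1):
        # 1. Each node updates its model using only its own data.
        w = {n: local_subgradient_step(w[n], node_data[n], lam, t) for n in node_data}
        # 2. Each node gossips with a neighbor chosen uniformly at random;
        #    the pair replaces both models with their average.
        for n in node_data:
            m = rng.choice(neighbors[n])
            avg = [(a + b) / 2 for a, b in zip(w[n], w[m])]
            w[n], w[m] = avg, list(avg)
    return w
```

Because each pairwise average preserves the sum of the two models, the network-wide mean model evolves roughly like a centralized sub-gradient run, while gossip pulls individual nodes toward that mean. A constant 1 feature appended to each example can stand in for a bias term.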