AI/ML Engineer | Agentic AI | Multi-Agent RAG | Applied Research

Software Engineer | Cloud & Distributed Systems | Full-Stack

Deepak Sai Pendyala

I build AI systems that work in the real world. My focus is on agentic architectures, multi-agent orchestration, retrieval-augmented generation, and end-to-end ML platforms that are reliable, explainable, and ready for production.

Right now, I do applied AI research at NC State while building enterprise agentic platforms in industry.

I build scalable, high-performance systems across full-stack AI/ML platforms, microservices, distributed workflows, and cloud-native infrastructure. I care about clean code, reliability, and things that work under load.

My work spans backend systems, frontend platforms, DevOps, and data-heavy applications. I ship full-stack platforms at NC State and have built production systems in industry.

  • Autonomous multi-agent reasoning systems
  • Hybrid retrieval combining vector search, web data, and knowledge graphs
  • Automated AI evaluation for reliability and failure detection
  • Production ML systems with solid MLOps foundations
  • Clean service architectures that scale
  • Optimized databases and well-designed APIs
  • CI/CD pipelines and containerized deployments
  • Real-time data processing and monitoring

Currently Building

Advanced agentic AI platform for scientific labs, with citation-grounded reasoning and automated evaluation.

Currently Shipping

Scalable full-stack AI/ML platforms with clean architectures, fast APIs, and cloud-native deployments.

2 Research Publications

2,000+ Community Members Led

Top 10 Global DeepLearning.AI Ambassador (2022)

Experience

Applied research and production engineering across AI, analytics, and MLOps.

Full-stack systems engineering across research platforms, cloud infrastructure, and distributed workflows.

Graduate Research Assistant, Applied AI

STEPS Center, NC State University

  • Built a scalable on-prem agentic AI platform for scientific labs.
  • Integrated paper collection, document parsing, and knowledge graph ingestion into a single pipeline.
  • Deployed autonomous analysis agents for extraction, comparison, and visualization.
  • Ran local LLM deployments from 3B to 70B parameters across mixed hardware.
  • Automated AI evaluation with synthetic Q/A, LLM-judge scoring, and three-stage checks.
  • Built failure detection and rerouting for continuous reliability monitoring.

AI Engineer Intern

SproutsAI

  • Built an agentic analytics platform that turns natural language into chart-ready insights.
  • Implemented multi-agent query planning, hybrid retrieval with Qdrant + Neo4j, and automated evaluation.
  • Deployed on Kubernetes with CI/CD and monitoring.

Impact: 60% accuracy gain, 40% latency reduction, 99.9% uptime, 2x engagement.

Research Assistant, Machine Learning

CAMAL Lab, NC State University

  • Built real-time ML monitoring for DARPA-funded additive manufacturing systems.
  • Integrated predictive models into production pipelines for quality control.
  • Developed anomaly detection with FastAPI, OpenCV, and GPU-optimized inference.

Impact: 4.3x reduction in inference time, 70% improvement in accuracy and efficiency.

Applied Scientist Intern

Amazon

  • Developed ML and GenAI automation for finance and tax workflows.
  • Built a multilingual commodity code classifier using transformer models.
  • Created event-driven MLOps pipelines with automated retraining loops.

Impact: 60% reduction in manual review, 90% decrease in effort, 80% scalability improvement.

Graduate Research Assistant, Full-Stack Platform Engineering

STEPS Center, NC State University

  • Designed and managed a full-stack Django + React AI platform for research workflows and demos.
  • Implemented RBAC authentication, audit logging, and optimized REST APIs for high-traffic usage.
  • Built responsive UI components for research visualization.
  • Redesigned databases with normalized schemas, indexing, and query tuning.

Impact: ~40% improvement in API latency and frontend responsiveness, with improved stability.

AI Engineer Intern (Full-Stack & Cloud)

SproutsAI

  • Built a full-stack analytics platform using React, FastAPI, and MongoDB.
  • Containerized microservices, parallelized query execution, and added caching layers.
  • Deployed on AWS and GCP Kubernetes clusters with GitHub Actions CI/CD.
  • Integrated monitoring and evaluation pipelines for reliability.

Impact: 3x faster queries, 40% latency reduction, 99.9% uptime.

Research Assistant, Real-Time Systems

CAMAL Lab, NC State University

  • Built a FastAPI real-time anomaly detection platform for manufacturing workflows.
  • Implemented live video processing with OpenCV and GPU-optimized inference.
  • Added socket-based alerts, async processing, and monitoring dashboards.

Impact: 4.3x reduction in inference latency and earlier fault detection.

Applied Scientist Intern (Systems + MLOps)

Amazon

  • Engineered event-driven ML workflows with SageMaker, Lambda, and Step Functions.
  • Built fault-tolerant async inference with retries and clean service boundaries.
  • Integrated microservices with ECS, API Gateway, and CloudWatch.
  • Defined REST API contracts and connected frontend systems to ML backends.

Impact: 90% reduction in manual review, 80% scalability gain, ~9 minutes faster per job.

Core Expertise

Deep focus on agentic AI, RAG, and scalable ML systems.

Full-stack engineering with rigor around reliability, performance, and maintainability.

Agentic AI & Multi-Agent Systems

LangGraph orchestration, modular agent roles, shared state models, and self-healing pipelines.

Retrieval-Augmented Generation

Hybrid retrieval, RAG-Fusion, query expansion, multi-hop retrieval, citation-backed generation.

LLM Engineering

Fine-tuning large language models, multilingual NLP systems, prompt engineering, agent workflows.

Machine Learning

Deep learning with PyTorch and TensorFlow, time-series forecasting, computer vision, optimization.

MLOps & AI Infrastructure

Automated retraining, event-driven workflows, CI/CD for ML systems, scalable inference on Kubernetes.

Data & Knowledge Systems

Vector databases, knowledge graphs, structured and unstructured data ingestion at scale.

Full-Stack Development

Django + React platforms, FastAPI microservices, REST API design, contract testing, responsive UI.

Backend & Distributed Systems

Microservices architecture, async I/O, parallel execution, caching, query optimization, event-driven workflows.

Cloud & DevOps

Dockerized applications, Kubernetes on AWS/GCP, GitHub Actions CI/CD, monitoring and logging.

Data & Infrastructure

PostgreSQL, MySQL, MongoDB, DynamoDB, Redis, schema design, migrations, index tuning.

Reliability & Engineering Practices

Unit and integration testing, error handling and retries, observability dashboards, clean architecture.

Publications

RL-CURATE-KG: Multi-Agent RL for Scalable KG Curation (IEEE Big Data 2025).

Generative Transformers and Text Generation Models (IET Generative AI Unleashed 2025).

Leadership

Founder and Lead, Intel IoT Club (2,000+ members, 10+ national hackathons and AI trainings).

Top 10 Global DeepLearning.AI Ambassador (2022).

Top 0.05% in Amazon ML Summer School (converted to an Applied Scientist internship).

Certifications

Intel Edge AI Developer.

AWS Machine Learning Foundations.

Google IT Support Specialization.

Tech Stack

Languages: Python, C, C++, JavaScript, Bash, MATLAB, Embedded C, Assembly.

Frameworks: React, Node.js, Django, FastAPI, Flask, Streamlit.

Cloud & DevOps: AWS, GCP, Docker, Kubernetes, GitHub Actions CI/CD.

Data Systems: PostgreSQL, MySQL, MongoDB, DynamoDB, Redis.

Engineering Philosophy

I focus on clean, maintainable architectures, measurable performance improvements, reliability over unnecessary complexity, and strong automation and testing.

I like building systems that scale, stay stable under load, and are easy for teams to extend.

Flagship Projects

Selected AI/ML systems that combine research depth with production impact.

Selected software systems focused on clean architecture, performance, and reliability.

Python, FastAPI, LangGraph, Streamlit, Qdrant, Neo4j
Agentic AI Platform

ActiveRAG Next

A production-grade, explainable multi-agent RAG platform. Built with LangGraph orchestration across coordinator, retrieval, reasoning, validation, and feedback agents. Features hybrid retrieval with RAG-Fusion and web augmentation, citation-backed responses with confidence-triggered reruns, and a real-time Streamlit UI with traceable agent execution.
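
As a rough illustration of the fusion step in RAG-Fusion-style retrieval, the sketch below merges several ranked result lists with reciprocal rank fusion. It is a minimal example under stated assumptions, not the ActiveRAG Next code: the function name, the document IDs, and the smoothing constant `k=60` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. The constant k (assumed here to be 60)
    dampens the influence of any single top-ranked hit.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical results from three retrievers for the same question.
    vector_hits = ["doc_a", "doc_b", "doc_c"]
    keyword_hits = ["doc_b", "doc_d", "doc_a"]
    graph_hits = ["doc_b", "doc_c", "doc_e"]
    print(reciprocal_rank_fusion([vector_hits, keyword_hits, graph_hits]))
```

Because the fusion only needs ranks, not raw similarity scores, it can combine retrievers whose scores are not directly comparable, such as vector search and graph lookups.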

PyTorch, LSTM, Genetic Algorithms, Optimization
Forecasting & Optimization

Predictive Power Price Tagging

A hybrid optimization and forecasting system for energy markets. Uses LSTM deep learning for trajectory forecasting, genetic algorithms for resource allocation and pricing strategies, and Economic Load Dispatch optimization for market clearing price predictions.

Django, React, PostgreSQL, REST API, CI/CD
Full-Stack Platform

Research Platform (Django + React)

A production-ready web platform supporting research demos and workflows. Includes secure authentication with RBAC and audit logging, optimized REST APIs with a responsive UI, a database schema redesign with migrations and performance tuning, and CI/CD automation with rapid server portability.

FastAPI, Pydantic, Streamlit, Async I/O
Microservice Architecture

ActiveRAG Next (System Design)

The system-design view of ActiveRAG Next, with a focus on software engineering and orchestration. Modular, microservice-like agents share state through Pydantic models, with async I/O communication between a FastAPI backend and a Streamlit frontend and an emphasis on clean interfaces, scalability, and observability.

Python, Discord.py, Spotify API, GCP, CI/CD
Distributed System

Enigma: Discord Music Platform

A scalable music streaming and recommendation platform for Discord. Supports multi-source streaming with Spotify-integrated search, queue management, and a recommendation engine over 24K+ tracks with playlist curation. Deployed on GCP with CI/CD, logging, and monitoring.

AWS Step Functions, Lambda, SageMaker, CloudWatch
Cloud MLOps

Event-Driven ML Orchestration

Reusable cloud workflows for automated ML operations. Combines Step Functions orchestration with Lambda triggers and SageMaker jobs, fault tolerance through retries and async execution, and operational monitoring with clean service boundaries.
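
The retry behavior mentioned above can be pictured with a small in-process sketch. This is not the project's workflow definition and makes no AWS calls; it only mirrors the general idea of retrying with an interval, a maximum number of attempts, and a backoff rate. The function name, parameters, and the flaky task are hypothetical.

```python
import time


def call_with_retries(task, max_attempts=3, base_delay=0.5,
                      backoff_rate=2.0, sleep=time.sleep):
    """Run task(), retrying on any exception with exponential backoff.

    The delay starts at base_delay seconds and is multiplied by
    backoff_rate after each failed attempt. The last failure is re-raised.
    The sleep argument is injectable so the function can be tested
    without actually waiting.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= backoff_rate


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_job():
        # Hypothetical job that fails twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise RuntimeError("transient failure")
        return "done"

    print(call_with_retries(flaky_job, base_delay=0.01))
```

In a managed orchestrator the same policy is usually declared in the workflow definition rather than written by hand, which keeps retry logic out of the individual jobs.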

Get In Touch

Have a project, research idea, or opportunity? I would love to talk.

Location

Raleigh, NC, USA

Availability

Open to research collaborations, AI product work, and internships.