
Principal Machine Learning Engineer with 14 years of experience architecting enterprise-scale AI platforms and distributed infrastructure. Delivered secure execution environments and GenAI orchestration frameworks that support critical business workflows. Focused on driving adoption of LLM-based systems and MLOps practices, mentoring engineers, and solving complex infrastructure challenges.
Agent-to-Agent (A2A) Server Framework:
• Automated Agent Orchestration: Designed and implemented end-to-end A2A workflows that coordinate agent task execution, message handling, and client-server interactions across local Python services and API endpoints.
• Modular Integration Framework: Built a dynamic agent registration mechanism that uses AST-based scanning and custom decorators to automatically discover Python functions and register them with the agent server.
• Production Reliability & Security: Hardened A2A endpoints with configuration-driven authentication controls, resilient client/server patterns, and standardized health and metrics reporting.
• Resilience Engineering: Developed fault-tolerant middleware with retries, timeouts, and circuit-breaker controls to reduce cascading failures and improve uptime for distributed agent interactions.
• Observability & Diagnostics: Integrated tracing, structured logging, and metrics across client/server flows to improve visibility into A2A transactions and reduce troubleshooting time for production incidents.
• Test Automation & Delivery: Established Docker-based test execution and comprehensive pytest coverage for unit and integration scenarios, improving release confidence and reducing regression risk.
Model Context Protocol (MCP):
• Automated Tool Orchestration: Built automated workflows with FastMCP that chain and invoke local Python tools and remote API endpoints through MCP servers.
• Authorization & Security: Designed and implemented an end-to-end authentication and authorization pipeline, covering JWT validation, OAuth2 token acquisition, and group-based access control, to enforce tool-level permissions across all MCP endpoints.
• Production Reliability: Implemented per-tool circuit breakers, comprehensive error handling, and an SQLite-backed authorization cache to ensure high availability and low-latency access control under production load.
• Observability: Integrated OpenTelemetry distributed tracing, Prometheus metrics, and custom observability middleware to capture tool execution latency, error rates, and cross-system trace context for end-to-end debugging and monitoring.
Enterprise GenAI Orchestration Platform (IR Package) & Automated RAG Framework:
• Strategic AI Framework: Architected the Information Retrieval (IR) enterprise package to standardize LLM and vector database workflows, reducing engineering overhead and accelerating GenAI deployment across business units.
• Unified AI Gateway: Integrated multiple model providers (AWS Bedrock, SageMaker, OpenAI) and vector stores (OpenSearch, FAISS) behind a modular interface, enabling low-configuration model invocation and semantic search.
• Advanced Data Pipelines: Developed configuration-driven workflows for custom embeddings, intelligent chunking, and document processing to streamline data ingestion and evaluation.
• Extensible Orchestration: Used LangChain to build unified wrappers for embeddings, vector database connectors, and configurable retrieval chains, providing an extensible interface for complex AI workflows.
• AutoRAG Innovation: Designed and implemented an automated RAG system that executes parallel experiments to identify optimal retrieval and generation strategies using diverse performance metrics.
• Cloud-Native Deployment: Orchestrated containerized applications via Kubernetes (Kompass), managing end-to-end deployments across Dev, Test, and Prod with automated scaling and rollout policies.
• Technical Leadership & Adoption: Authored comprehensive API documentation and onboarding materials to promote consistent usage patterns across multidisciplinary teams.
Other Projects:
• Strategic Insights Delivery: Led the design and deployment of an enterprise-grade social intelligence platform aggregating data from Reddit, X, and LinkedIn to deliver real-time insights to executives.
• Standardized Feature Engineering: Co-developed a Python-based 'Model Ready Data' framework generating standardized features for 50+ models with metadata-driven lineage and Control-M scheduling.