Professional Summary

Site Reliability Engineer with 6+ years of experience building and maintaining resilient, high-availability systems for enterprise-scale platforms. Currently ensuring 99.99% uptime for Cisco Webex Meetings, managing 50+ Kubernetes clusters across global data centers with a focus on incident response, observability, and deployment automation. Reduced MTTD by 40% and MTTR by 35% through proactive monitoring and automation. Skilled in CI/CD orchestration, infrastructure as code, and cloud platforms (AWS, Azure). Passionate about GenAI and AIOps-driven innovation.

Technical Skills

☁️ Cloud Platforms

AWS (EC2, EKS, VPC, IAM, ECR), Azure (DevOps, AKS, VMs, Azure OpenAI)

🐳 Containers & Orchestration

Docker, Kubernetes, ArgoCD, Helm

🚀 CI/CD

Jenkins, GitLab CI/CD, GitHub Actions

🏗️ Infrastructure as Code

Terraform, Ansible

📊 Observability & Monitoring

Prometheus, Grafana, ELK/EFK Stack, AppDynamics, ThousandEyes

💻 Programming & Scripting

Python, Bash/Shell Scripting, Java

🐧 Operating Systems

Linux (RHEL, Ubuntu, CentOS), System Administration

🔧 Tools & Services

Git, GitHub, GitLab, Apache Kafka

Professional Experience

Site Reliability Engineer

Cisco Systems
📍 Bangalore, India | 2023 - Present
  • Maintain 99.99% uptime for Cisco Webex Meetings platform, managing 50+ Kubernetes clusters across 8 global data centers serving 100M+ users
  • Automate CI/CD workflows using Jenkins, GitLab CI/CD, GitHub Actions, and ArgoCD, reducing deployment time by 40%
  • Design and implement 25+ Grafana dashboards tracking service availability, error rates, latency percentiles, and capacity metrics
  • Use AppDynamics for end-to-end transaction tracing and ThousandEyes to monitor global endpoint reachability across 15+ regions
  • Reduce MTTD by 40% and MTTR by 35% through proactive alerting and automated runbook execution
  • Develop 20+ Python and Bash automation scripts, saving 15+ hours of manual operations weekly
  • Conduct root cause analysis (RCA) for 100+ production incidents and implement preventive measures that have reduced repeat incidents by 60%
  • Built and published the "Copilot Chat History Search" VS Code extension on the VS Code Marketplace
  • Lead AIOps initiatives integrating ML models for anomaly detection and intelligent alerting

Software Engineer

Torry Harris Integration Solutions
📍 Bangalore, India | Sep 2021 - 2023
  • Developed 10+ enterprise Java applications using the AWS SDK for cloud-native integrations
  • Built and maintained CI/CD pipelines using Jenkins and Docker, achieving a 95% deployment success rate
  • Designed and deployed production-grade Kafka clusters on Kubernetes, processing 500K+ messages daily
  • Built an intelligent PDF Q&A chatbot using LangChain and LLMs, achieving 85% query accuracy
  • Integrated Hugging Face models for NLP tasks, reducing document processing time by 50%
  • Collaborated with cross-functional teams to optimize deployment workflows, reducing release cycles by 30%

Associate Software Engineer

Torry Harris Integration Solutions
📍 Bangalore, India | Aug 2019 - Sep 2021
  • Provisioned and managed Azure cloud infrastructure using Terraform, reducing provisioning time by 60%
  • Containerized 15+ Kafka applications using Docker for consistent deployment across environments
  • Implemented configuration management solutions standardizing infrastructure across 3 environments
  • Developed 10+ RPA automation workflows using UiPath, automating 200+ hours of manual tasks monthly
  • Participated in agile development cycles with a 95% sprint completion rate

Key Projects

☁️ Cloud Instance Manager (CIM)

Enterprise AWS EC2 instance management platform with a Spring Boot backend. Features JWT authentication, role-based access control, and multi-region AWS resource management, including instance lifecycle, security groups, and EBS volumes.

Java, Spring Boot, AWS SDK, MySQL, JWT, REST API

🔍 Copilot Chat History Search (VS Code Extension)

Developed a TypeScript-based VS Code extension enabling local search and navigation of GitHub Copilot chat conversations. Published on the VS Code Marketplace.

TypeScript, VS Code API, Node.js

🤖 PDF Q&A Chatbot (AI/ML Application)

Built an intelligent conversational AI chatbot using the LangChain framework and LLMs for PDF document analysis. Features document chunking, embedding generation, and a vector store for semantic search with context-aware responses.

Python, LangChain, LLMs, Hugging Face, Vector DB, Streamlit

📊 SLO Dashboards & Observability Platform

Designed and implemented SLO/SLI dashboards in Grafana for tracking service reliability metrics. Includes error budget burn-rate alerts, availability tracking, and Prometheus integration with custom PromQL queries.

Grafana, Prometheus, PromQL, Alertmanager

Education

Bachelor of Engineering - Computer Science

Velammal Engineering College, Anna University
📍 Chennai, India | Jun 2015 – May 2019
Score: 74%

High School

Adhiyaman Matric H.R. Secondary School
📍 Uthangarai, India | Jun 2014 – Apr 2015
Score: 94%

Interests & Future Goals

Open Source Contributions, GenAI & LLMs, AIOps, Cloud-Native Technologies, Technical Writing, Model Fine-tuning