Shahzad Khawar

DevOps Developer, Cloud Developer, Full Stack Developer, +2





Available to hire

Hi, I’m Shahzad Khawar, a Site Reliability Engineer (SRE) and DevOps professional with 6+ years of experience designing, automating, and maintaining highly available, multi-cloud infrastructure. I’m passionate about operational excellence, scalability, and reliability across hybrid and multi-cloud environments.

I enjoy bridging the gaps between development, operations, and business teams, mentoring junior engineers, and building robust monitoring, incident response, and automation solutions that prevent problems before they occur.

Skills

Experience Level

Expert

Expert

Expert

Expert

Expert

Expert

Intermediate

Work Experience

Site Reliability Engineer / DevOps / Cloud Automation at IBM

June 1, 2023 - Present

Mentored junior engineers on best practices in CI/CD, IaC (Terraform), and SRE principles, improving team capability and code quality. Bridged communication gaps between development, operations, and business teams to increase deployment velocity. Implemented comprehensive monitoring and alerting with Grafana to proactively detect and mitigate performance bottlenecks. Led end-to-end CI/CD pipelines in Jenkins, automating builds, tests, and deployments, and established GitOps workflows with GitLab. Automated configuration management and deployments across hundreds of nodes using Ansible. Deployed and managed Kubernetes clusters, optimizing resource allocation and deployment strategies. Wrote robust Python and Bash automation scripts for daily operations. Drove high availability across multi-cloud production environments. Led incident management with structured root cause analyses and post-mortems, reducing MTTR. Established…

SRE, DevOps, & Cloud Automation at IBM

June 1, 2023 - Present

Mentored junior engineers on best practices in CI/CD, IaC (Terraform), and SRE principles, significantly improving team capability and code quality. Drove the adoption of SRE principles and practices to maintain high availability across multi-cloud production environments. Established end-to-end incident management, structured root cause analysis, and post-mortems to reduce MTTR. Implemented comprehensive monitoring and alerting using Grafana to proactively detect and mitigate performance bottlenecks. Managed end-to-end CI/CD pipelines with Jenkins and automated build/test/deploy cycles for speed and reliability. Configured GitOps workflows with GitLab to provide a single source of truth and immutability, and maintained core cloud resources on AWS, GCP, and Azure with Terraform. Automated configuration management across hundreds of nodes using Ansible, improving infrastructure consistency and scalability. Deployed, managed, and troubleshot Kubernetes clusters, and optimized resource allocation and deploym…

Linux Systems & Infrastructure at AT&T

April 1, 2019 - June 30, 2023

Administered critical enterprise systems on RHEL and Ubuntu with a focus on security and compliance. Managed centralized patching, provisioning, and content management for RHEL environments using Red Hat Satellite. Configured and managed AWS services (EC2, S3, VPC) with appropriate security controls and resource tagging. Codified initial infrastructure deployments in AWS using Terraform and kept core cloud resources under version control. Supported physical bare-metal servers and VMware-based virtual infrastructure, ensuring hybrid environment stability. Wrote Bash scripts to automate routine administration, health checks, and data backups, and performed in-depth performance analysis to optimize resource usage. Implemented structured post-mortems and runbooks to retain institutional knowledge and make operational procedures repeatable.

SRE, DevOps, & Cloud Automation at AT&T

April 1, 2019 - June 1, 2023

Led incident management processes, bridged communication gaps between development, operations, and business teams, and implemented proactive monitoring using Grafana. Built and maintained CI/CD pipelines with Jenkins, implemented IaC with Terraform, and used Ansible for configuration management. Managed Kubernetes clusters and automated deployment pipelines for speed and reliability.

Linux Systems & Infrastructure at AT&T

April 1, 2015 - April 1, 2019

Early-career role focused on Linux systems administration and infrastructure management. Administered RHEL and Ubuntu systems, implemented patching strategies, and supported cloud and on-premises resources. Contributed to automation and scripting efforts that streamlined operational tasks and improved reliability.