Muhammad Rafi Austin

Available to hire

I’m a seasoned Site Reliability Engineer and DevOps professional with 10+ years of experience designing, automating, and supporting enterprise cloud platforms across AWS, Azure, GCP, and hybrid multi-cloud environments. I focus on cloud engineering, infrastructure automation, platform reliability, observability, incident response, and operational excellence to deliver scalable and resilient production systems.

I thrive in cross-functional teams, mentoring engineers, creating reusable automation, and driving cost optimization and security in regulated environments. I enjoy partnering with engineering, security, product, and operations to deliver secure, scalable cloud solutions that align with business goals and performance targets.

Experience Level

Expert

Language

English
Fluent

Work Experience

Senior DevOps & Site Reliability Engineer at PracticeTek
February 1, 2023 - Present
Designed and managed highly available AWS and Azure cloud infrastructure supporting healthcare SaaS applications using Terraform and Pulumi for repeatable and scalable deployments. Built and administered Kubernetes platforms on Amazon EKS and Azure AKS, improving scalability and operational efficiency. Managed core cloud networking components (VPCs, subnets, NAT gateways, routing policies, Transit Gateway) to ensure secure connectivity. Implemented advanced observability using Prometheus, Grafana, Datadog, Azure Monitor, and CloudWatch to enhance service visibility and reduce alert fatigue. Led major incident response activities, coordinated recovery, performed root cause analysis, and delivered remediation plans to prevent recurrence. Automated TLS certificate lifecycle management and platform maintenance with Bash and PowerShell. Developed reusable Terraform modules and Git-based deployment workflows to standardize provisioning. Implemented CI/CD pipelines with GitHub Actions and Arg
Cloud Infrastructure Engineer at CSI
June 1, 2021 - January 31, 2023
Designed and supported secure Azure and Kubernetes infrastructure for enterprise transaction systems requiring high availability and strong compliance controls. Managed AKS and GKE clusters, including upgrades, node scaling, networking configuration, workload troubleshooting, and platform performance tuning. Built and maintained Terraform-based IaC across multiple environments. Developed CI/CD pipelines using Jenkins, Azure DevOps, and Bitbucket. Automated operational tasks with Python, Bash, and PowerShell. Implemented secure cloud networking with Azure VNets, NSGs, and routing controls. Managed secrets and privileged access using enterprise vault solutions with least-privilege principles. Supported audit readiness by implementing technical controls and maintaining infrastructure documentation for regulated environments. Monitored platform health with Prometheus and ELK and resolved performance bottlenecks before they affected production. Served as escalation support during critical incidents to e
DevOps Engineer at Bank of America
November 1, 2019 - May 31, 2021
Supported production AWS and Azure environments hosting critical financial applications, with emphasis on reliability, compliance, and secure operations. Developed Infrastructure as Code using Terraform and CloudFormation, embedding security controls and deployment standards into cloud provisioning workflows. Built and enhanced enterprise observability platforms using Prometheus, Grafana, ELK, and CloudWatch, improving system visibility and operational response times. Designed Docker container platforms and Kubernetes deployment models using Helm and blue-green release strategies to support zero-downtime deployments. Managed secrets, certificates, and secure credentials using HashiCorp Vault and AWS Secrets Manager across distributed environments. Automated operating system patching and routine infrastructure maintenance using Ansible and PowerShell. Responded to critical incidents, led post-incident reviews, and implemented corrective actions to improve platform resilience. Worked closely wi
DevOps Engineer at LexisNexis
April 1, 2016 - October 31, 2019
Supported enterprise CI/CD platforms for multiple engineering teams, ensuring reliable build, release, and deployment processes across development environments. Automated deployments and recurring operational tasks using Python and Bash. Implemented infrastructure automation using Terraform and Ansible. Containerized legacy and modern applications using Docker and supported Kubernetes-based deployments. Built and maintained Jenkins pipelines using Git-driven workflows to support continuous integration and continuous delivery practices. Collaborated with engineering and data teams to improve deployment reliability and support large-scale production workloads. Implemented monitoring and logging tools to improve system health visibility and detect issues early in the software delivery lifecycle. Created technical documentation, operational runbooks, and onboarding materials, improving team knowledge transfer and support readiness.

Education

Associate in Business & Science at Richland Community College
January 11, 2030 - January 1, 2015

Industry Experience

Software & Internet, Professional Services, Healthcare, Financial Services, Media & Entertainment, Education