

Ibrahim Jafri

DevOps Developer, Cloud Developer, Full Stack Developer, +2






Available to hire

Hi, I’m Ibrahim Jafri, a Site Reliability Engineer with over a decade of hands-on experience automating, architecting, and operating cloud-based, large-scale systems. I specialize in Linux, cloud platforms, Kubernetes, and infrastructure as code to drive reliability, scalability, and cost efficiency. I’ve led incident response and built robust observability to help product teams move faster with confidence. I’m passionate about turning complex reliability challenges into repeatable, automated solutions that empower teams.

I thrive in collaborative environments, partner closely with development and product teams, and continually refine deployment safety, resilience, and performance.

Skills

Experience Level: Expert (all eight listed skills)

Language

English: Fluent

Work Experience

Site Reliability Engineer at Shopify

January 1, 2021 - Present

Design, operate, and improve the reliability of large-scale distributed systems running on Kubernetes. Define and implement SLOs/SLIs and error budgets. Automate infrastructure provisioning and lifecycle management using Terraform and internal platform tooling. Participate in on-call rotations, incident command, and post-incident reviews. Improve observability through metrics, logging, alerting, and reliability dashboards. Collaborate with application teams to improve deployment safety, resilience, and system performance. Contribute to platform reliability initiatives supporting high-traffic production services.

Senior DevOps Engineer at Clio

January 1, 2017 - December 31, 2020

Led DevOps initiatives for a growing SaaS platform. Designed and managed Kubernetes (EKS) clusters for production workloads. Built and maintained cloud infrastructure using Terraform and Helm. Developed and maintained CI/CD pipelines for safe, repeatable deployments. Implemented monitoring and alerting using Prometheus and Grafana. Led incident response efforts and authored postmortems. Mentored junior engineers and helped establish DevOps best practices across teams.

Cloud Engineer at TextNow

June 1, 2014 - December 31, 2016

Managed and scaled AWS-based infrastructure supporting high-traffic services. Designed cloud networking, load balancing, and auto-scaling solutions. Introduced Docker-based workflows for application deployment. Built CI/CD pipelines to automate build and release processes. Improved system reliability and uptime through proactive monitoring and automation. Worked closely with development teams to optimize cloud performance and operational costs.

Automation Engineer at Absorb Software

June 1, 2012 - May 31, 2014

Automated deployment and operational workflows using Bash and Python. Managed Linux-based servers and early cloud infrastructure. Built and maintained CI pipelines using Jenkins. Assisted development teams with release automation and environment provisioning. Implemented basic monitoring and alerting solutions. Supported production systems and participated in on-call rotations.