Ahmed Fahmy

Available to hire

I’m a DevOps Engineer with 20+ years of hands-on experience designing and managing scalable Linux, Windows, and multi-cloud environments. I specialize in building automated deployment pipelines, GPU clusters for AI workloads, and migrating petabyte-scale storage, all while maintaining production systems with strict SLA requirements.

I excel at incident response, root cause analysis, and developing robust runbooks. My passion is turning complex infrastructure into reliable, observable platforms that empower research and enterprise teams to move fast and safely.

Experience Level

Expert

Language

English: Fluent
German: Intermediate
Russian: Intermediate
Ukrainian: Intermediate
Arabic: Intermediate

Work Experience

Senior System Administrator, Infrastructure at AINOSTICS
January 1, 2025 - January 1, 2026
Led infrastructure automation initiatives for an AI research startup, managing hybrid cloud and on-premises environments supporting machine learning workloads. Designed and deployed AWS cloud infrastructure using CloudFormation templates with GitHub Actions automation, implementing services across compute (EC2, ECS, Parallel Computing Service), networking (VPC, ELB, Route 53), storage (S3, EFS), databases (RDS), security (WAF, Secrets Manager, KMS), monitoring (CloudWatch), data processing (Glue), messaging (SNS, SQS), and content delivery (CloudFront, ECR). Built large-scale storage provisioning with IaC, handling petabyte-scale data requirements for AI training workflows. Maintained Proxmox virtualization clusters supporting GPU farms with 99% uptime. Created comprehensive infrastructure documentation and automated runbooks that reduced incident resolution time. Provided on-call support for critical AI training infrastructure, performing root cause analysis and implementing permanent
DevOps, Technical Support Specialist at TransTech UK & Europe Ltd
June 1, 2024 - December 1, 2024
Supported logistics technology systems with a focus on Golang applications and Azure DevOps automation. Built CI/CD pipelines using Azure DevOps for transport management systems, automating testing and deployment workflows. Provided technical support and troubleshooting for production environments, managing incidents across multiple customer deployments. Reduced incident resolution time through automated monitoring, diagnostic workflows, and systematic root cause analysis. Implemented infrastructure monitoring solutions that improved system visibility and enabled proactive issue detection.
Linux System Administrator at Cubic
September 1, 2022 - May 1, 2024
Managed infrastructure for transportation technology solutions across Linux and Windows environments. Maintained high availability across production systems, consistently meeting strict SLA requirements. Built PowerShell and Bash automation scripts that reduced manual maintenance tasks by 85%. Developed disaster recovery procedures and validated RTOs through regular testing. Participated in on-call rotation, managing incidents from detection to resolution and implementing permanent fixes. Collaborated with hardware vendors to resolve compatibility issues, ensuring optimal performance.
Senior System Administrator at EPAM
March 1, 2017 - September 1, 2022
Managed infrastructure automation for Fortune 500 clients across financial services and other industries. Designed Active Directory forests and hybrid cloud environments for large enterprises, planning and implementing complex directory services architectures. Implemented Infrastructure as Code using Terraform and Ansible for multi-cloud deployments, creating reusable modules and ensuring consistency. Managed client SaaS environments and automated deployment workflows, reducing deployment time and improving reliability. Built comprehensive documentation and maintained infrastructure version control to enable team collaboration and knowledge transfer.
IT Manager at NAOS
May 1, 2014 - March 1, 2017
Managed IT operations for a telecommunications company, leading infrastructure automation initiatives and directing a small technical team. Oversaw delivery of VoIP solutions on Linux and Windows, covering both technical delivery and team development. Managed Asterisk/SIP systems handling 500+ concurrent calls with high availability, implementing redundancy and failover. Established ITIL-based service management processes, improving incident response and change management. Built resilient infrastructure supporting VoIP services across client sites, ensuring service continuity and quality.
Web Developer at Sparksight
January 1, 2014 - May 1, 2014
Gained early exposure to DevOps practices, including CI/CD concepts, Jenkins automation, and containerization. Implemented early CI/CD workflows for web applications, automating build and test processes. Used Docker to create consistent development environments and reduce environment-related issues. Built automated testing pipelines for web applications, improving code quality and deployment confidence.
System Administrator at UCGCC
January 1, 2005 - January 1, 2014
Managed IT infrastructure operations, building a foundation in systems automation and expertise across Windows and Linux. Maintained server environments, ensuring system availability and performance. Implemented backup and monitoring solutions, protecting critical data and enabling proactive issue detection. Developed automation scripts for routine maintenance tasks to reduce manual effort and improve consistency.

Education

BSc at Middlesex University London
Until January 1, 2005

Qualifications

LPIC-2 Linux Engineer (202-450)
January 11, 2030 - January 26, 2026
Learning Kubernetes
January 1, 2022 - January 26, 2026
Graphite and Grafana: Visualizing Application Performance
January 1, 2022 - January 26, 2026
Creating a Dev Environment in AWS with Terraform
January 1, 2022 - January 26, 2026
AWS Certified Solutions Architect Associate (SAA-C02) Prep: Storage Design
January 1, 2022 - January 26, 2026
DevOps for Databases in Azure with MySQL and Terraform
January 1, 2022 - January 26, 2026
Learning Puppet
January 1, 2022 - January 26, 2026
Puppet Essential Training
January 1, 2022 - January 26, 2026
Microsoft Azure Fundamentals (AZ-900) - Currently Pursuing
January 11, 2030 - January 26, 2026
AWS Solutions Architect Associate (Full Certification) - Currently Pursuing
January 11, 2030 - January 26, 2026
RHCSA - Currently Pursuing
January 11, 2030 - January 26, 2026
Certified Kubernetes Administrator (CKA) - Currently Pursuing
January 11, 2030 - January 26, 2026

Industry Experience

Computers & Electronics, Software & Internet, Professional Services, Other, Transportation & Logistics