Christopher Walsh

Available to hire

Hi, I’m Christopher Walsh, a Software Engineer with extensive experience building scalable, production-grade AI infrastructure at major tech companies such as Meta and Google. I specialize in developing end-to-end systems for AI model serving, real-time inference, and ML tooling, working across the stack with technologies ranging from Kubernetes orchestration to React frontends.

I’m passionate about creating reliable and efficient AI platforms that are deeply integrated into cloud-native infrastructure, enabling teams to manage AI model deployment, monitoring, and incident response with confidence. I enjoy working with modern ML frameworks and architecting microservices to deliver performant solutions and seamless user experiences.

Experience Level

Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Intermediate
Beginner

Language

Aragonese
Advanced
Bashkir
Advanced

Work Experience

Software Engineer at Meta
May 1, 2024 - Present
Contributed to SentinelAI, an internal AI-driven incident response platform that uses fine-tuned LLaMA models to support Site Reliability Engineers and infrastructure teams during live outages. Developed the engineering console backend with Python and FastAPI, implementing server-sent events (SSE) streaming and modular prompt routing. Built the core React + TypeScript interface, including token-level LLM output rendering, interactive system actions, and telemetry visualizations. Integrated Firebase Auth and applied TailwindCSS for internal design consistency. Implemented and deployed GPU-based inference services with TorchScript and Triton on internal GKE clusters using Helm and ArgoCD, optimizing for low-latency startup. Developed Redis-based context caching and Celery task workers for background and batch processing. Added observability instrumentation using OpenTelemetry, Prometheus, and internal monitoring tools for real-time metrics and tracing.
Software Engineer at Google
May 1, 2024 - July 25, 2025
Worked on the ML Infrastructure team supporting large-scale AI model deployment and inference optimization for Ads and Search. Developed distributed model serving pipelines using TensorFlow Serving, gRPC, and Envoy across multi-region GCP clusters with autoscaling and request tracing. Created custom JAX/Flax training workflows optimized for TPU v4 pods and integrated them with internal orchestration systems (Borg and Blaze). Refactored data preprocessing jobs using Apache Beam, reducing latency by 45%. Implemented feature caching with Cloud Memorystore (Redis) and Bigtable, coordinated via Kafka and Pub/Sub topics. Developed Go-based CLI tools for managing experiments and evaluation datasets. Built real-time monitoring dashboards using React + TypeScript. Designed internal LLM prompt evaluation frameworks. Contributed to ML observability with OpenCensus. Partnered with SRE teams on graceful fallbacks and automatic traffic shifting during node degradation. Participated in optimization sprints to improve inference efficiency and batching and to reduce inference costs. Documented best practices for AI system reliability using Pyth

Education

Bachelor of Science at ETH Zurich – Swiss Federal Institute of Technology
May 1, 2021 - August 31, 2022

Industry Experience

Software & Internet, Computers & Electronics, Professional Services