AI Infrastructure Engineer - EA Experiences

Full time on site
Job Description

General Information

Locations: Galway, Ireland

Role ID: 214202
Worker Type: Regular Employee
Studio/Department: Marketing
Work Model: Hybrid

Description & Requirements

Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter. A team where everyone makes play happen.

The EA Experiences group (XO) is dedicated to ensuring great experiences for our growing communities centered around our world-renowned brands, including fan favorites like Apex, Battlefield, EA SPORTS FC, Madden NFL, and The Sims, to name a few. We're a multi-functional group with world-class expertise in building fandoms, driving interactive storytelling, and positioning our franchises at the center of the broader entertainment ecosystem. We inspire, connect, and engage fans through culturally relevant content, intentionally architected journeys across channels, and meaningful fan care. Our goal is to provide valuable, easy experiences that fans love – in our games, around our games, and through innovative adjacent experiences – to grow and enrich how fans experience EA as we shape the future of entertainment.

To empower more players and fans in new and amazing ways, we need more innovators to join our world-class team. The future of entertainment is interactive, and you can help lead that future, by growing and enriching how hundreds of millions of people (and counting) find joy and belonging, forge friendships, and celebrate their lived experiences through the work we do every single day, together.

You will be the hands-on AI Infrastructure Engineer for our AI and machine learning platform, reporting to the Director, Agentic Solutions. You will design, build, and operate the cloud foundation our models and production AI agents run on, going deep in AWS to make the platform reliable, secure, and cost-effective at scale. You'll bring MLOps and AIOps together: the training, serving, and monitoring infrastructure teams build on, with MLflow-based experiment tracking, model registry, and pipelines on one side, and self-monitoring, self-healing systems on the other. You'll architect and ship the CI/CD, observability, and infrastructure-as-code standards that the rest of XO builds on, and you'll still go deep in the code when the work calls for it. You will define requirements, rapidly prototype, iterate with stakeholders, and establish reusable architectures, standards, and patterns using the latest AI engineering methodologies, models, tools, and platforms. You're creative, innovative, self-motivated, and team-first, equally strong at problem-solving and collaborating across product, data, security, IT, and engineering teams. You will build scalable ML and AI pipelines that let teams spend more time on high-value, creative, and strategic work. You will be a hybrid worker, collaborating with teams 3 days a week from the office; international travel to collaborate with global teams is an added bonus.

Responsibilities

  • Own the MLOps platform: build and operate the platform teams use to train, track, version, and deploy models, with MLflow for experiment tracking, model registry, and lineage.
  • Run the ML pipelines: design and operate training, validation, and deployment pipelines, including automated retraining when data or model performance drifts.
  • Serve models at scale: stand up real-time and batch inference infrastructure, including GPU-backed and LLM serving, and make the calls on hosted versus self-managed serving.
  • Monitor models in production: put drift detection, data quality checks, and performance tracking in place, with alerts that trigger action.
  • Drive AIOps: build self-monitoring, self-healing systems on event-driven automation, with anomaly detection, predictive alerting, and automated remediation.
  • Architect infrastructure as software: implement programmable IaC (AWS CDK preferred) plus reusable patterns, shared libraries, and platform standards across teams.
  • Establish observability and traceability: make services, pipelines, models, and data flows visible end to end.
  • Govern CI/CD and continuous training: design pipelines with security and compliance controls built in (DevSecOps and MLSecOps).
  • Secure the platform: enforce least privilege, identity management, and continuous validation across infrastructure, models, and data.
  • Own reliability: define SLIs/SLOs, run incident response and postmortems, and continuously improve reliability.
  • Partner and mentor: work with teams across XO, guide engineers, and shape architecture decisions.

Your Qualifications

  • 7+ years designing, building, and operating production-grade infrastructure and platforms, with strong software engineering, security, and reliability best practices.
  • Hands-on MLOps experience is the core of this role: building and operating ML platforms with experiment tracking, model registry, and automated training and deployment pipelines (MLflow, or equivalents such as Kubeflow or SageMaker).
  • Deep, hands-on AWS experience across compute and serverless (Lambda, ECS/Fargate, containers), storage, networking (VPC), IAM, observability and telemetry (CloudWatch, tracing, structured logging), and secrets management; experience with SageMaker and Amazon Bedrock is a strong plus.
  • Experience running AIOps practices: anomaly detection, predictive alerting, automated remediation, and self-healing systems built on event-driven automation.
  • Strong infrastructure-as-code and CI/CD experience (CDK preferred; Terraform or CloudFormation), with a track record of building for reliability, scale, and cost efficiency.
  • Experience with ML pipeline orchestration (Airflow, Kubeflow, SageMaker Pipelines, or Step Functions) and model serving and inference (SageMaker, Bedrock, KServe, Seldon, or Triton).
  • Experience with model and data monitoring, including drift detection and data quality.
  • Strong Python skills; working knowledge of at least one additional language (TypeScript/Node.js, Go, Java, or C#).
  • Deep experience with observability tools (Datadog, Prometheus, Grafana, OpenTelemetry) and debugging distributed systems.
  • Solid grasp of the ML lifecycle, from training and evaluation through deployment, monitoring, and retraining.
  • Experience navigating the legal, ethical, and security implications of AI, including data privacy, IP, and safety, and translating policy into engineering controls.
  • Thrive working both collaboratively and independently, with excellent creative, critical thinking, and problem-solving skills, and a demonstrated ability to clearly articulate complex technical concepts.
  • LLMOps experience (serving and fine-tuning LLMs, vector databases, and RAG infrastructure), feature stores (Feast, Tecton, or SageMaker Feature Store), GPU and accelerator infrastructure, Kubernetes (EKS), or Data Lakehouse platforms (e.g., Databricks) is beneficial.
  • Experience working in a gaming company or large-scale consumer platform is beneficial.

About Electronic Arts

We’re proud to have an extensive portfolio of games and experiences, locations around the world, and opportunities across EA. We value adaptability, resilience, creativity, and curiosity. From leadership that brings out your potential, to creating space for learning and experimenting, we empower you to do great work and pursue opportunities for growth.

We adopt a holistic approach to our benefits programs, emphasizing physical, emotional, financial, career, and community wellness to support a balanced life. Our packages are tailored to meet local needs and may include healthcare coverage, mental well-being support, retirement savings, paid time off, family leaves, complimentary games, and more. We nurture environments where our teams can always bring their best to what they do.

Electronic Arts is an equal opportunity employer. All employment decisions are made without regard to race, color, national origin, ancestry, sex, gender, gender identity or expression, sexual orientation, age, genetic information, religion, disability, medical condition, pregnancy, marital status, family status, veteran status, or any other characteristic protected by law. We will also consider for employment qualified applicants with criminal records in accordance with applicable law. EA also makes workplace accommodations for qualified individuals with disabilities as required by applicable law.
