Elastic Site Reliability Engineer (SRE)
Boston, Massachusetts
Job Id: 166742
Job Category:
Job Location: Boston, Massachusetts
Security Clearance: Secret
Business Unit: Zachary Piper
Division: Zachary Piper Solutions
Position Owner: Ryan Lucas
Zachary Piper Solutions is seeking an experienced Elastic Site Reliability Engineer (SRE) to support a high-visibility federal engagement focused on observability, platform reliability, and security operations across classified environments. This position will support mission-critical Elastic infrastructure deployments at either Hanscom AFB, MA or Langley AFB, VA. The ideal candidate will have deep expertise supporting enterprise Elastic Stack environments, Kubernetes-based deployments, and production SRE operations within secure or regulated infrastructure environments.
Key Responsibilities:
- Operate, maintain, and optimize large-scale Elastic Stack environments supporting logging, search, observability, and telemetry operations.
- Ensure platform reliability, uptime, scalability, and performance across production mission systems.
- Manage Kubernetes-based Elastic deployments, including ECK operator environments.
- Develop and maintain automation for deployment workflows, monitoring, alerting, and incident response processes.
- Integrate Elastic infrastructure with SIEM and security tooling including Splunk, EDR platforms, and telemetry systems.
- Troubleshoot complex issues across distributed systems, infrastructure, and application environments.
- Implement and support observability frameworks including logging, metrics, tracing, and monitoring solutions.
- Support CI/CD pipelines and infrastructure-as-code initiatives within DevOps environments.
- Maintain operational runbooks, escalation procedures, and technical documentation.
- Participate in on-call support rotations and incident response activities.
Qualifications:
- 5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure operations.
- Strong hands-on experience with Elastic Stack in enterprise production environments.
- Advanced Kubernetes experience, including ECK operator deployments.
- Strong Linux/Unix administration and networking fundamentals.
- Experience supporting observability, telemetry, logging, and monitoring platforms.
- Experience working within secure, classified, federal, or highly regulated environments.
- Ability to work onsite at Hanscom AFB (MA) or Langley AFB (VA).
- U.S. citizenship with the ability to obtain or maintain a Secret clearance.
Nice-to-Haves:
- Elastic certifications including Elastic Engineer, Security, or Observability.
- Experience with Terraform, Ansible, and CI/CD pipeline automation.
- Exposure to SIEM and EDR technologies including Splunk, CrowdStrike, or Trellix.
- Experience supporting GovCloud, DoD, or federal infrastructure environments.
- Prior experience supporting distributed logging or telemetry platforms.
Soft Skills:
- Strong incident response and operational troubleshooting mindset.
- Ability to remain calm and effective during production outages or high-pressure situations.
- Strong collaboration skills across security, infrastructure, DevOps, and operations teams.
- Excellent communication skills for escalation and operational coordination.
- Self-sufficient and capable of operating independently within classified environments.
Compensation & Benefits:
- Target compensation: $180,000 – $200,000 annually.
- Long-term federal engagement supporting mission-critical infrastructure initiatives.
- Opportunity to support advanced observability and security operations within classified environments.