Site Reliability Engineer
Reston, Virginia
Job Id: 154232
Job Category:
Job Location: Reston, Virginia
Security Clearance: No Clearance
Business Unit: Zachary Piper
Division: Zachary Piper Solutions
Position Owner: Nevine Rehan
Zachary Piper Solutions is seeking a mid-level Site Reliability Engineer for a hybrid role based in Reston, VA. The ideal candidate will have strong DevOps and systems engineering experience, with exposure to big data technologies.
Responsibilities for the Site Reliability Engineer include:
• Architect, deploy, and manage large-scale data platforms (Kafka, Spark, Hadoop, Druid) on Kubernetes
• Automate cluster provisioning, scaling, and monitoring using Ansible, Python, and Jenkins
• Participate in technical designs combining open-source, commercial, and custom components
• Ensure platform SLOs through telemetry collection, visualization, and alerting
• Upgrade data platforms to improve capabilities and security with minimal customer impact
• Troubleshoot complex issues in large, distributed environments
• Stay current with data platform best practices and standards, especially for hybrid cloud environments
• Support internal data platform customers
• Participate in a 24x7 on-call rotation (approximately once per month)
• Contribute to platform evolution, including migration to Kubernetes and object storage solutions
Required Qualifications for the Site Reliability Engineer include:
• Bachelor’s degree in Computer Science or related technical field, or equivalent experience
• 5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)
• Strong Linux configuration and administration skills
• Deep understanding of infrastructure-as-code, especially Ansible
• Experience with Docker or Kubernetes in production environments
• Strong automation skills and understanding of CI/CD principles
• Basic ability to debug Java code
• Experience creating dashboards is a plus
Compensation for the Site Reliability Engineer includes:
• Salary Range: $100,000–$130,000 depending on experience
• Full standard benefits: PTO, paid holidays, medical, dental, vision, 401(k) plan, and sick leave as required by law
This job opens for applications on 11/10/2025. Applications for this job will be accepted for at least 30 days from the posting date.
Keywords: Site Reliability Engineering, DevOps, Big Data, Hadoop, Kafka, Spark, Druid, Jenkins, Ansible, Python, Kubernetes, Docker, Linux, Infrastructure-as-Code, CI/CD, dashboarding, Java debugging, physical servers, bare metal, object storage, hybrid cloud, 24x7 on-call, Verisign, Reston VA