Site Reliability Engineer - Big Data

Reston, Virginia

Job Id:
155933

Job Category:

Job Location:
Reston, Virginia

Security Clearance:
Public Trust or Uncleared

Business Unit:
Zachary Piper

Division:
Zachary Piper Solutions

Position Owner:
Gillian Contillo

Zachary Piper Solutions is seeking a Site Reliability Engineer - Big Data to build and manage a Data Platform that enables the creation of large-scale, high-throughput data products and services delivering actionable operational and business intelligence. This position is hybrid, with two days a week onsite in Reston, VA.

 

**Candidate must not require any work authorization**


Key Responsibilities:

Architecting, deploying, and managing large-scale data platforms (Kafka, Spark, Hadoop, Druid) running on top of Kubernetes

Automating cluster provisioning (CI/CD), scaling, and monitoring using Ansible, Python, and Jenkins

Participating in technical designs for software solutions that combine open-source, commercial, and custom-developed components

Ensuring platform SLOs by collecting, visualizing, and alerting on relevant telemetry

Upgrading large-scale data platforms to improve system capabilities and security while minimizing customer impact

Troubleshooting complex issues in large, distributed environments

Staying up to date with industry best practices and standards for data platforms, focusing on hybrid cloud environments

Supporting data platform customers

Participating in the on-call rotation, monitoring production systems, and responding to incidents


Requirements:

Candidate must not require any work authorization

Bachelor’s degree in computer science or a related technical field, or equivalent combination of education and experience

5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)

Excellent understanding of Linux configuration and administration

Strong automation experience: not just developing automation, but knowing why we automate and what to automate

Strong understanding of infrastructure-as-code tools such as Ansible

Experience with Docker or Kubernetes in a production environment

Strong written and verbal communication skills, with the ability to describe complex issues clearly and succinctly

 

Compensation:

 $140,000-$150,000/year **depending on years of experience and degree**

Full Benefits: Medical, Dental, Vision, 401K, Paid Holidays, PTO, and Sick Leave if required by law

 

This job opens for applications on 12/4/2025. Applications for this job will be accepted for at least 30 days from the posting date.

 

Keywords: Site Reliability Engineer, SRE, Big Data, Data Platform, Hybrid Cloud, Operational Intelligence, Business Intelligence, High-throughput Data Products, Distributed Systems, Kafka, Spark, Hadoop, Druid, Kubernetes, Docker, Linux Administration, Cluster Provisioning, CI/CD, Ansible, Python, Jenkins, Infrastructure-as-Code, Telemetry, Monitoring, Automation, Upgrades & Security, Troubleshooting, Open-Source Integration, Data Platform Management, Containerization, Configuration Management, Visualization & Alerting, On-call Rotation, Production Systems Monitoring, DevOps, Linux, automation, design, automate, large-scale, ideation, implementation, deployment, customer onboarding, support, cross-team collaboration, Data Engineering, Infrastructure, Engineering, Security, Operation Teams.
