HPC Support Engineer

Rockville, MD


Job Id:
129429

Job Location:
Rockville, MD

Security Clearance:
Public Trust or Uncleared

Business Unit:
Zachary Piper

Division:
Zachary Piper Solutions

Position Owner:
Cameron Bagwell

Zachary Piper Solutions is seeking an HPC Support Engineer to support the National Institute of Allergy and Infectious Diseases (NIAID). The HPC Support Engineer will support HPC hardware, install scientific applications, and monitor the health of NIAID's HPC clusters, among other duties. This is a remote position based out of Rockville, MD. The HPC Support Engineer will collaborate with clients to define and document IT project scope, priorities, budgets, and schedules, and will oversee implementation of projects.



Responsibilities Include:

  • Support the hardware and operating system environments of a GPU-focused 4,000+ core HPC cluster and a 1,500+ HPC cluster
  • Support bioinformatics applications for a large and diverse research community with needs in genomics, cryo-electron microscopy, and AI/ML
  • Monitor the portfolio of software applications and be proactive in planning upgrades and license renewals
  • Monitor and report on cluster performance and generate data to show usage and trends
  • Triage support requests from the research community and work with others in the Scientific Infrastructure team to resolve issues and complete service requests
  • Collaborate with researchers to guide them in effective use of the HPC resources, such as job scheduler submission, data formats, and building data workflows
  • Engage with researchers to understand their HPC needs, including data life cycle management, integration of scientific instruments with HPC, and storage capacity and compute requirements


Requirements Include:

  • Bachelor's degree in Information Technology or a related field
  • Minimum of 5 years of experience with servers, datacenters, networking, and related technologies
  • Minimum of 5 years of experience managing Linux systems
  • Experience with Spack, EasyBuild, Lmod, or a similar tool for streamlining software package management
  • Experience installing and packaging GPU applications and optimizing job submission scripts that are used for ML model training, data mining operations, or high-res graphics rendering
  • Experience with Python scripting, Git, Ansible, and Terraform
  • Ability to obtain an NIH Public Trust


Compensation Includes:

  • $140,000 - $150,000 *depending on experience*
  • Health, Dental, Vision, 401K, PTO, Paid Holidays, etc.



Keywords: Systems Engineer, HPC, High performance compute, cluster, HPC cluster, GPU, GPU focused, core, hardware, operating system, HPC Engineer, python, scripting, python scripting, git, git workflows, git distributed workflows, ansible manage system configuration, ansible, terraform, system, cluster performance, scheduler, schedule, job schedule, slurm, spack, lmod, easybuilder, easy builder, conda, anaconda, storage, storage arrays
