Team Lead System Integration, BIDMC

Boston, MA • Beth Israel Deaconess Medical Center • Full-time • Day

When you join the growing BILH team, you're not just taking a job, you’re making a difference in people’s lives.

As the Team Lead, High-Performance Computing, you will lead the design, engineering, operation, and lifecycle management of high-performance computing (HPC) environments that support scalable and secure research workflows across BILH. You will administer compute clusters, manage workload scheduling systems such as Slurm, support secure and policy-compliant compute platforms, and oversee user access and software environments. In this role, you will set technical direction and carry it out with operational oversight, collaborating across research and infrastructure teams to establish and maintain a robust foundation for computational research across the institution. You will also serve as a key contributor on HPC infrastructure initiatives spanning multiple projects. The ideal candidate will possess strong infrastructure engineering skills, automation expertise, and a commitment to reliability and performance in support of scientific computing.

Job Description:

Essential responsibilities include, but are not limited to:

  • Oversee the design, provisioning, configuration, and decommissioning of HPC compute clusters, ensuring system performance and lifecycle sustainability.
  • Engineer, administer and tune workload schedulers (e.g., Slurm) and cluster management to optimize job throughput, resource utilization, and system availability.
  • Design, maintain, and support secure, regulated compute environments (e.g., NIST 800-171), ensuring that technical safeguards and documentation align with the frameworks required for regulated biomedical research.
  • Design user account and identity management and ensure its integration with institutional systems, supporting secure and streamlined access to HPC resources.
  • Design and maintain customized and sustainable researcher software environments, including module systems and containerized applications within security standards.
  • Lead the team through the software development life cycle for operational tooling and infrastructure automation, and contribute expert-level code.
  • Research, design, and implement technical solutions to meet infrastructure and research requirements.
  • Identify opportunities to improve and simplify compute platform services and implement related enhancements.
  • Contribute to the creation and maturing of operational and automation best practices, including Service Level Agreements.
  • Act as a technical liaison to internal and external stakeholders and collaborators and mentor junior staff.
  • Participate in the off-hours on-call schedule.
  • Other duties as assigned.

Qualifications:

Education:

Bachelor’s degree required; Master’s degree preferred.

Experience:

8-10 years of post-secondary education or relevant work experience, including 5 years of experience managing Linux-based HPC systems in a research or academic environment.

Strong experience with workload schedulers (Slurm preferred), cluster provisioning, and performance tuning is preferred.

Experience with infrastructure monitoring, configuration management tools (e.g., Ansible), and containerization tools (e.g., Singularity/Apptainer, Docker).

Familiarity with security and compliance requirements in regulated research environments.

Excellent troubleshooting, communication, and collaboration skills.

Ability to work collaboratively in a team and adapt to evolving technologies and priorities.

Excellent interpersonal skills, including the ability to build and cultivate strong relationships and work effectively with diverse groups.

Demonstrated “can do” work ethic coupled with effective time management.

Pay Range:

$120,000.00 USD – $160,000.00 USD

The pay range listed for this position is the annual base salary range the organization reasonably and in good faith expects to pay for this position at this time. Actual compensation is determined based on several factors that may include seniority, education, training, relevant experience, relevant certifications, geography of work location, job responsibilities, or other applicable factors permissible by law.

As a health care organization, we have a responsibility to do everything in our power to care for and protect our patients, our colleagues and our communities. Beth Israel Lahey Health requires that all staff be vaccinated against influenza (flu) as a condition of employment.

More than 35,000 people working together. Nurses, doctors, technicians, therapists, researchers, teachers and more, making a difference in patients' lives. Your skill and compassion can make us even stronger.

Equal Opportunity Employer/Veterans/Disabled
