City: Nashville, TN, US
Employer Reference: 10005524
Company Description
At Vanderbilt University, our work - regardless of title or role - is in service to an important and noble mission in which every member of our community serves in advancing knowledge and transforming lives on a daily basis. Located in Nashville, Tennessee, on a 330+ acre campus and arboretum dating back to 1873, Vanderbilt is proud to have been named as one of “America’s Best Large Employers” as well as a top employer in Tennessee and the Nashville metropolitan area by Forbes for several years running. We welcome those who are interested in learning and growing professionally with an employer that strives to create, foster and sustain opportunities as an employer of choice.
We understand you have a choice when choosing where to work and pursue a career. We understand you are unique and have a story. We want to hear it. We encourage you to apply today so that you might become a part of our story.
Vanderbilt University is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran, or any other characteristic protected by law.
Job Description
Position Summary:
The Storage Systems Administrator is part of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University and is a key contributor responsible for maintaining the performance and availability of tier 2 and archival storage systems. Reporting directly to the Director of Research Computing Operations, the Storage Systems Administrator will be part of ACCRE's Infrastructure Group and collaborate with members of other ACCRE units.
The Storage Systems Administrator will ensure the proper functioning of hardware components in use by ACCRE's storage and tape systems. This includes running diagnostics and monitoring overall system health. As a member of the Infrastructure Group, this role serves as 'remote hands' in the data center and participates in the on-call rotation.
About the Work Unit:
The Advanced Computing Center for Research and Education is a provider of technology and support to Vanderbilt faculty and students in the areas of high-performance computing, large data storage, software development, and visualization. Our goal is to meet faculty and students 'where they are' to give them the easiest access to the broad range of computing tools available for both research and education.
Key Functions and Expected Performance:
ACCRE Storage Systems Administration
Maintain, administer, and improve ACCRE's storage services
Set, implement, and audit user access controls
Aid in operational security implementations
Solve user support issues
Troubleshoot hardware and software problems related to the storage systems
Be the primary support for the tape library
Be a member of the team developing, deploying, and supporting a distributed NAS system
Work on adapting existing software tools to support the transport and management of research data between various storage pools both on and off campus
ACCRE Compute Cluster Administration
Set up/configure cluster hardware related to storage systems and cluster management infrastructure
Install operating systems and related utility software
Monitor the status of the cluster using tools such as Checkmk, including customizing the tools for ACCRE-specific needs
Serve as a technical resource to users and other ACCRE staff members
Coordinate critical tasks with other team members to meet project guidelines
Act as internal technical consultant to ACCRE staff, particularly related to projects on which this position is serving as the primary systems administrator
Supervisory Relationships:
This position does not have supervisory responsibility; this position reports administratively and functionally to the Director of Research Computing Operations.
Education and Certifications
A Bachelor's degree from an accredited institution of higher education.
Experience and Skills:
Required
Vanderbilt Export Compliance regulations designate that this position is limited to US citizens and permanent residents only
The ability to physically move and lift hardware up to 50 pounds
Five years of experience in system administration of UNIX/Linux-based operating systems or in managing compute cluster subsystems
Demonstrated experience with Bash and/or Python scripting of moderate complexity
Demonstrated self-driven, inquisitive, and productive troubleshooting abilities
Strong ability to work individually and in a team environment
Preferred
Experience with parallel clustered storage solutions, including one of: IBM Spectrum Scale (GPFS), AuriStor, PanFS, or OpenAFS
Knowledge of and experience with Git version control
Knowledge and experience with configuration management tools such as Ansible
Demonstrated success in taking initiative, meeting deadlines, and adjusting to operational shifts
Experience with Red Hat-based systems
Experience in an HPC environment
Experience with disk storage hardware (SAS, JBOD, RAID, HBA, RAID controllers, etc.)