Presenting the Holland Computing Center grid
Creating a new User
But on a more relevant note, the Campus Grid in Puerto Rico gained a new user while I was there. I always feel that computing infrastructure is best built in response to user demand. That has certainly been the case at HCC, where we run at 95%+ utilization on our HPC clusters. I met with Steve Massey to find out how the Island Grid can help him.
Steve Massey is a bioinformatician at the University of Puerto Rico, Río Piedras. His work is an ideal fit for High Throughput Computing: his processing runs the same executable against many, many protein PDB files. We talked for a while on Tuesday, both before and after the power was cut to the UPR campus, about how we can move this work onto the UPR campus grid, then flock to UNL, and finally to the OSG. While I was in PR, I worked with Steve to run one of his workflows on the OSG.
I'm not going to pretend to know what is really happening in the workflow, but it takes as input a set of DNA sequences that were pre-calculated (somewhat randomly, I believe) and a PDB file, and calculates the robustness using an external application, Scwrl4. The output is a robustness file that lists the robustness of the protein for each of the DNA sequences.
I was able to run this workflow on the OSG using Nebraska's GlideinWMS interface. I created two scripts: submit.sh, which writes out a simple Condor submit file, and wrapper.sh, which configures the environment on the worker node. Both scripts are available on GitHub. Together, these two components make up the workflow.
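To give a feel for what "writing out a simple Condor submit file" means, here is a rough Python sketch of the idea, not the actual submit.sh from GitHub. It emits one job per PDB file, each running the same wrapper. The file names (`sequences.txt`, `workflow.log`, the example PDB names) and the choice to pass the PDB name as the only argument are my assumptions for illustration.

```python
from pathlib import Path


def build_submit_description(pdb_files, executable="wrapper.sh",
                             sequences="sequences.txt"):
    """Return the text of a Condor submit file with one job per PDB file.

    All file names here are hypothetical placeholders, not the real ones
    used in Steve's workflow.
    """
    # Settings shared by every job in the cluster.
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = workflow.log",
        "",
    ]
    # Per-job settings: each PDB file gets its own queue statement.
    for pdb in pdb_files:
        name = Path(pdb).name  # the file lands in the job's scratch directory
        lines += [
            f"arguments = {name}",
            f"transfer_input_files = {pdb}, {sequences}",
            "queue",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example PDB names are made up.
    print(build_submit_description(["1abc.pdb", "2xyz.pdb"]))
```

The printed text can be saved to a file and handed to condor_submit. The wrapper on the worker node would then be responsible for setting up the environment before calling the real executable.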
There is still work to be done. The executable that Steve wrote does not properly detect the length of the amino acid strand, and therefore cannot properly calculate robustness or pass its data to Scwrl4.
Also, there is another workflow that Steve would like to run. I hope to continue to work with him to enable these workflows.