Update: IBM continues to analyze the poor cluster filesystem (GPFS) performance over the hi-speed network and has several personnel working to identify improvements via tuning. However, there is no estimated time for a resolution at this point.
If you are seeing markedly slower job performance, this is most likely the reason. For example, we have received reports that Gaussian jobs are taking significantly longer to complete.
IBM appreciates your patience while they work to restore GPFS filesystem performance to at least the level it had on the GigE network. Unfortunately, rolling back to the GigE configuration would require another full outage.
Update: IBM continues to investigate - nothing new to report as of this afternoon. IBM is attempting to put together a lab system mirroring our configuration for performance analysis.
We will keep you updated as we find out more info. Thank you for your patience.
The recent scheduled downtime focused primarily on IBM reconfiguring the cluster filesystem (GPFS) to use the hi-speed Infiniband network as opposed to standard GigE. In theory, this should have improved performance.
However, we are receiving user reports that job turn-around times have become longer since this change. Performance data was collected both before and after the reconfiguration, and IBM is still analyzing it.
We are having ongoing discussions regarding this and IBM is currently formulating a plan.
We appreciate your patience.
Please report any issues to the support address listed in the login message.
Due to unforeseen issues, the downtime will continue into Wed, 4/23, time TBD. We apologize for any inconvenience.
Update: 16:30; work still in progress.
Update: 21:30; preliminary open access.
The HPC clusters are down Mon & Tues, 4/21-4/22, so that IBM can complete pending upgrades.
As previously noted in the login message, there will be a cluster-wide outage starting Sunday evening at 6:00PM on 4/20, possibly continuing up to the morning of Wed 4/23. This will affect both the BCX Intel cluster and the P-Series cluster.
This downtime is primarily for IBM to make the necessary configuration changes to route the cluster filesystem traffic (GPFS) over the hi-speed Infiniband network and thereby improve performance.
If you have questions or concerns, please direct them to the support list address identified in the login message.
We have rebuilt the old HPC Listserv list and will use it to let the HPC community at UK know what is happening on our supercomputing facilities. If you are not on the list already and would like to subscribe, send email to LISTSERV@LSV.UKY.EDU with the single line SUB HPC in it. If you want off of the list, use UNSUB HPC instead. If you have trouble, please let us know.
Connie Shoemake, IBM Vice President Sales, Public Sector, Central Region, will be hosting a discussion on the HPC Implementation at UK. This session will take place on Friday, September 21st at 10:30 in room 327 of McVey Hall. You are welcome to attend and listen to Connie and ask any questions you have concerning the IBM HPC installation here at UK.
The BCX login nodes are currently inaccessible.
Currently running jobs are not affected.
Support staff are working on the problem; thank you for your patience.