For a very long time (about 11 years… wow), since I was in high school, I have been a proud supporter of Folding@Home. It is a great project, not only because of the research it supports but also because of the technical aspects of its distributed system.
But I have always wanted to run it diskless, just to see if I could. Well, I finally had the need, the time and the resources to try it… and after much hair pulling, I did it!
I have run a few diskless systems before, but they always booted from live media. This is slightly different.
Why? Why the hell are you doing this?
Recently I had about 60 servers doing nothing after they were used for memory and disk testing. Rather than just unracking them, I thought I would try this.
So the goals:
- Easy. Quick. Simple. Dirty if need be. This is designed to be temporary.
- Be diskless
- Each node should be able to be booted quickly and not need checking afterwards (e.g. over SSH)
- Support multiple CPU configs (single dual-core, single quad-core, dual quad-core) and dynamically add folding slots
- CentOS (if possible). Spoiler alert: I use CentOS 😀
So the setup:
- 1x Installed (to disk) CentOS system for TFTP booting, DHCP, NFS and management
- Hardware that supports PXE
- Servers to run as nodes (Diskless clients to run F@H)
Setting up the control node
So, install a copy of CentOS 7 (I tried 6, but couldn’t get it to work)
Skip to here: http://www.server-world.info/en/note?os=CentOS_7&p=pxe&f=4
because why bother rewriting it? That’s what I followed 😀
Also follow the DHCP setup if you need to (you don’t have to use a DHCP server on the management node, but it’s easier in my opinion)
One change from that guide: instead of
yum groups -y install "Server with GUI" --releasever=7 --installroot=/var/lib/tftpboot/centos7/root/
I ran
yum groups -y install "Core" --releasever=7 --installroot=/var/lib/tftpboot/centos7/root/
That way it’s without a GUI.
Also, run this to prevent issues later (adjust --installroot to match the root you installed into):
yum install nfs-utils --releasever=7 --installroot=/var/lib/tftpboot/centos7min/root/
TIPS (I chased my tail here, so do follow these):
- Use DHCP reservations for your nodes. It prevents issues with IPs changing between PXE boot and OS
- If your nodes have multiple connected NICs, unplug all but one. It prevents issues with IPs changing
- Use a long DHCP lease time. It prevents leases being renewed unnecessarily
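For the reservations tip, here is a rough sketch of generating a reservation entry, assuming you are running ISC dhcpd on the management node. The node name, MAC and IP below are made-up placeholders, not values from my setup.

```shell
# Print an ISC dhcpd "host" stanza that pins one MAC address to one IP.
# Arguments: node name, MAC address, IP address (all placeholders here).
dhcp_reservation() {
    name="$1"; mac="$2"; ip="$3"
    printf 'host %s {\n' "$name"
    printf '    hardware ethernet %s;\n' "$mac"
    printf '    fixed-address %s;\n' "$ip"
    printf '}\n'
}

# Example: generate one stanza for a hypothetical node.
dhcp_reservation node01 00:11:22:33:44:55 192.168.1.101
```

Paste the output into dhcpd.conf (one stanza per node) and restart dhcpd.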
Boot up one of your nodes and make sure you get CentOS running before you proceed
A quick note on how this SHOULD work
So from here, you should create an installed environment for each of your nodes and modify the PXE boot line for each node (by MAC address) to point to the correct location. Each node would then have its own installed environment and you’re done.
BUT I didn’t like that idea. I wanted one installed environment to share because (per the goals) this is a quick, dirty, temporary solution.
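If you do want to go the proper per-node route, PXELINUX picks a per-machine config file by MAC address: it looks in pxelinux.cfg/ for a file named 01- followed by the MAC in lowercase with dashes, before it eventually falls back to default. A minimal sketch for working out that file name (the MAC shown is a made-up example):

```shell
# Convert a MAC address into the per-node config file name PXELINUX looks for:
# "01-" (the ARP hardware type for Ethernet) plus the MAC, lowercase, dash-separated.
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-Z:' 'a-z-')"
}

# Example with a hypothetical MAC address.
pxe_cfg_name 00:1A:2B:3C:4D:5E
```

That prints 01-00-1a-2b-3c-4d-5e. A file with that name in pxelinux.cfg/ can then carry a boot line pointing at that node’s own root directory.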
Installing F@H to the diskless environment
The next bit assumes that you have booted a diskless node successfully, and that you are either going to accept this as a quick and dirty setup OR amend this guide to do it properly.
Download the latest F@H client RPM for CentOS from the F@H website and then run:
yum install fahclient*.rpm --installroot=/var/lib/tftpboot/centos7/root/
So if you have read this far you might be thinking “But… The work units… This won’t work as they will all be in the same directory.”
And you would be right – my answer: RAM disk
(Yes, if the power drops you lose your progress, but where I’m running this has UPS, Generator, Dual power)
If you want to save the work units per node, then instead of the RAM disk below for /var/lib/fahclient you will need an NFS mount per node, or they will conflict.
So this is where things get interesting
mkdir -p /var/lib/tftpboot/centos7/root/etc/FAH
mv /var/lib/tftpboot/centos7/root/etc/fahclient/config.xml /var/lib/tftpboot/centos7/root/etc/FAH/config.xml
Modify your config in /var/lib/tftpboot/centos7/root/etc/FAH/config.xml
This is the static config that will go across all nodes.
- Add CPU declarations
- Remove the closing </config> tag at the end (the boot-time script further down appends it again after the generated slots)
Add the following 2 lines to /var/lib/tftpboot/centos7/root/etc/fstab
These create one RAM disk for the work units and one for the per-node config.
tmpfs /var/lib/fahclient tmpfs nodev,noatime,nodiratime,size=256m 0 0
tmpfs /etc/fahclient tmpfs nodev,noatime,nodiratime,size=8m 0 0
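If you end up re-running your setup steps a few times (I did), it is easy to add these lines twice. A small sketch of a helper that appends a line only when it is not already there; append_once is just a name I made up for illustration:

```shell
# Append a line to a file only if that exact line is not already present.
# Arguments: target file, line to add.
append_once() {
    file="$1"; line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Intended use on the control node (commented out so nothing is touched by accident):
# FSTAB=/var/lib/tftpboot/centos7/root/etc/fstab
# append_once "$FSTAB" 'tmpfs /var/lib/fahclient tmpfs nodev,noatime,nodiratime,size=256m 0 0'
# append_once "$FSTAB" 'tmpfs /etc/fahclient tmpfs nodev,noatime,nodiratime,size=8m 0 0'
```

Running it twice leaves a single copy of each line in the file.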
Change this line:
This moves the PID file into the RAM disk so that it’s unique to this node
Add these lines immediately below the variable declarations
# Generate the available CPU slots and add them to the config
cat /etc/FAH/config.xml > /etc/fahclient/config.xml
c=$(grep -c '^processor' /proc/cpuinfo)
i=0
while [ "$i" -lt "$c" ]
do
echo "<slot id='$i' type='CPU'/>" >> /etc/fahclient/config.xml
i=$((i + 1))
done
echo "</config>" >> /etc/fahclient/config.xml
(Yes, the above could be cleaned up and/or done with sed. This evolved quickly and was dirty… my sed skills are amateur at best.)
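For anyone who wants to try the slot generation without booting a node, here is a sketch of the same logic wrapped in a function, so it can be pointed at scratch files. gen_fah_config is a name I made up, and it assumes the static config passed in does not end with a </config> tag.

```shell
# Build a per-node config: static part, one CPU slot per processor, closing tag.
# Arguments: static config (without </config>), output file, processor count.
gen_fah_config() {
    base="$1"; out="$2"; count="$3"
    cat "$base" > "$out"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "<slot id='$i' type='CPU'/>" >> "$out"
        i=$((i + 1))
    done
    echo "</config>" >> "$out"
}

# On a node this would be called as (commented out so it is safe to paste anywhere):
# gen_fah_config /etc/FAH/config.xml /etc/fahclient/config.xml "$(grep -c '^processor' /proc/cpuinfo)"
```

Passing the processor count in as an argument makes it easy to check what a dual quad-core box would get (8 slots) from a laptop.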
Disable logging (prevents issues with the root file system, reduces network traffic)
Change all the logs to go to /dev/null
(or you could get fancy and modify it so the hostname is in the log entry and then have it log.)
A couple of quick optimisations and useful tools:
- yum remove iw*firmware a*firmware --releasever=7 --installroot=/var/lib/tftpboot/centos7min/root/
- yum install screen wget bind-utils --releasever=7 --installroot=/var/lib/tftpboot/centos7min/root/
And voilà, that should work.
Boot the node and it should come up with the FAH client started automatically.
You can log in and check progress with:
tail -f /var/lib/fahclient/log.txt