ReFS is Microsoft’s newer file system, designed for some really fun, highly scalable systems… but in a world where things don’t always get deployed “by the book”, what are some of the drawbacks? This blog talks specifically about space reclamation on ReFS.
If you look around on the net, you will find a lot of “Just use ReFS”, “ReFS is better” or “It’s newer, so use that”. Throw all that out. If you don’t understand what you are doing, you can’t understand what the impact will be.
If you are using ReFS on top of a network storage system (SAN / NAS), nested inside another file system (e.g. VMFS) or in some other quirky config, you should be aware that ReFS has some fun things to do with space reclamation because it DOESN’T support TRIM or UNMAP.
On dedicated hardware, this isn’t a problem because you don’t care. The disk will do what it wants to. This is only an issue where your physical disk is abstracted a couple of times over.
Straight up: the ReFS manual even says this, but people (including me) gloss over this really important point. So we can’t even be mad at Microsoft for “hiding this”, as I have seen some forum posts say. It’s right there! In purple!
The short version is, consult your vendor on what needs to happen for your specific setup.
I will be going on to talk about what I have seen, which is ReFS on top of VMFS and ReFS on top of Dell Compellent (SC Series), where thin provisioning is expected (and to be clear here, I mean that the space used on the underlying system reflects what is actually in use inside the OS).
When I started out, I understood what the problem was, but there wasn’t a lot of in-depth info on how this worked with VMFS and how to get from “Broken” to “Fixed”.
The setup on VMware VMFS
So this one had me stumped for a bit. In our setup we have:
VMware VM with a ReFS disk > iSCSI Datastore as VMFS > Dell Compellent
The expectation is that:
You delete a file from ReFS, it sends a note to VMFS to say it’s free, VMFS space reclamation cleans it up and marks it as free, then tells Compellent it’s free, and the space is reclaimed. Everyone parties at the pub.
What happens in reality:
You delete a file from ReFS. Space is retained and never shrinks. At some point at 3am, you get an alarm and you start to cry.
The setup on Compellent LUNs
This one drove me crazy for a while because it just didn’t make sense. In our setup we have:
Windows (as a VM) with an iSCSI disk attached from inside the VM, formatted as ReFS > Dell Compellent
The expectation is that:
You delete a file from ReFS, it sends a note to Compellent to say it’s free > a Compellent snapshot happens, space is released.
What happens in reality:
You delete a file from ReFS. Space is retained and never shrinks on the Compellent.
In the short term, for both VMFS and Compellent, it’s not a huge problem as long as there is free space for it to eat into. If you are doing thin provisioning, though, you will end up with bloat on your arrays because you never actually get the space back.
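To make the expectation-versus-reality gap concrete, here is a toy Python model. It is not real storage code, and `ThinVolume`, `GuestFileSystem` and `allocated_after_delete` are hypothetical names used purely for illustration. All it shows is that if the file system never passes a delete down as an UNMAP, the thin-provisioned layer underneath keeps every block allocated.

```python
class ThinVolume:
    """Toy thin-provisioned volume: tracks which blocks the backing store has allocated."""

    def __init__(self):
        self.allocated = set()

    def write(self, block):
        # The first write to a block makes the backing store allocate it.
        self.allocated.add(block)

    def unmap(self, block):
        # Only an explicit UNMAP/TRIM tells the backing store the block is free.
        self.allocated.discard(block)


class GuestFileSystem:
    """Toy file system sitting on top of a thin volume (hypothetical model)."""

    def __init__(self, volume, sends_unmap):
        self.volume = volume
        self.sends_unmap = sends_unmap
        self.files = {}

    def create(self, name, blocks):
        self.files[name] = list(blocks)
        for block in self.files[name]:
            self.volume.write(block)

    def delete(self, name):
        blocks = self.files.pop(name)
        if self.sends_unmap:
            for block in blocks:
                self.volume.unmap(block)
        # Without UNMAP, only the file system's own metadata changes;
        # the volume underneath never hears that the blocks are free.


def allocated_after_delete(sends_unmap):
    """Create and delete a 100-block file, then report what the volume still holds."""
    volume = ThinVolume()
    fs = GuestFileSystem(volume, sends_unmap)
    fs.create("backup.vhdx", range(100))
    fs.delete("backup.vhdx")
    return len(volume.allocated)
```

With `sends_unmap=True` the volume ends up empty again; with `sends_unmap=False` all 100 blocks stay allocated, which is the bloat described above. Every extra layer (VMFS, then the array) is another place where that message has to be passed on.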
So how do we get to the fix?
Well, first up, you need these 2 settings on any server running ReFS (on VMFS or Compellent) to allow blocks to be released:
fsutil behavior set DisableDeleteNotify ReFS 0
REG ADD HKLM\System\CurrentControlSet\Control\FileSystem /v RefsEnableLargeWorkingSetTrim /t REG_DWORD /d 1
The first allows a DeleteNotify to be sent to the underlying storage (a value of 0 means delete notifications are not disabled, i.e. they are enabled).
The second allows an unmap to happen more frequently.
A reboot is recommended by MS, but in my testing it was not required. It’s Windows though, so probably reboot.
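If you want to confirm what a server is currently set to, `fsutil behavior query DisableDeleteNotify` prints the current values. Below is a small Python sketch for parsing that output. It assumes lines shaped like `ReFS DisableDeleteNotify = 0` (the exact wording around the value varies between Windows versions), and the function names are hypothetical.

```python
import re


def parse_delete_notify(query_output):
    """Parse `fsutil behavior query DisableDeleteNotify` output into a dict.

    Assumed line shapes (not guaranteed for every Windows version):
        ReFS DisableDeleteNotify = 0
        DisableDeleteNotify = 0        (older, single-value form, stored as "ALL")
    """
    settings = {}
    for line in query_output.splitlines():
        match = re.match(r"\s*(?:(\w+)\s+)?DisableDeleteNotify\s*=\s*(\d+)", line)
        if match:
            fs_name = match.group(1) or "ALL"
            settings[fs_name] = int(match.group(2))
    return settings


def refs_trim_enabled(query_output):
    """True when delete notifications are NOT disabled for ReFS (value 0)."""
    settings = parse_delete_notify(query_output)
    value = settings.get("ReFS", settings.get("ALL"))
    return value == 0
```

On the server itself you would feed it the real output, for example from `subprocess.run(["fsutil", "behavior", "query", "DisableDeleteNotify"], capture_output=True, text=True).stdout` in an elevated session. If `refs_trim_enabled` comes back `False`, the first command above has not taken effect.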
Once those settings are in place, you need to do a free space wipe to mark the free space as zeros. SDelete is the tool most places recommend.
A word of caution: This WILL fill the disk with zeros before releasing them all. If you are space constrained on the underlying system, you may deplete the storage. Use with caution.
sdelete64.exe -z YourDisk:
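Given that warning, it is worth doing a quick sanity check before kicking off the wipe. The sketch below is only a rough pre-flight calculation: it assumes the worst case, where none of the guest’s free space is currently allocated on the backing store, so the zero-fill could allocate all of it. The function names and the 10% default margin are hypothetical, and you have to get the backing store’s free capacity from your array or datastore yourself.

```python
import shutil


def free_bytes_on(path):
    """Free space, in bytes, on the volume that holds `path` (e.g. "R:\\")."""
    return shutil.disk_usage(path).free


def zero_fill_is_safe(guest_free_bytes, backing_free_bytes, safety_margin=0.10):
    """Rough check that the backing store can absorb a full zero-fill.

    Worst case assumed: every free byte inside the guest becomes a newly
    allocated (zero-filled) byte on the thin-provisioned backing store.
    """
    required = guest_free_bytes * (1 + safety_margin)
    return backing_free_bytes >= required
```

For example, a ReFS disk with 4 TB free sitting on an array with only 1 TB of unallocated capacity would fail this check, and running the wipe in one go could deplete the array.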
On VMware VMFS
Once the sdelete finishes, you will have a heap of free space. This is the bit that I was most curious about: how do we get that space back from VMFS?
Space reclamation improved greatly in VMFS6, but even when left alone with no other IO for days, the space was never reclaimed.
Now, if you are a VMware expert, right now you are probably thinking the same thing I did: “I’ll just Storage vMotion the VM! That will sort it out!”
Narrator: It would not sort it out.
Thanks, VMware. Unless you have a datastore with a different block size to move the VM to, you will also need to run a PunchZero, which requires the VM to be offline… Awesome.
Once you run a PunchZero, though, you should see the space reclaimed pretty much instantly.
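For reference, PunchZero is exposed on the ESXi command line as `vmkfstools -K` (long form `--punchzero`). Here is a minimal sketch of a wrapper that builds the command and refuses to proceed if the VM is still running. The function name, the power-state flag and the example path are hypothetical, and pointing it at the descriptor `.vmdk` rather than the `-flat.vmdk` is my assumption about typical usage, so check VMware’s documentation for your version.

```python
def punchzero_command(vmdk_path, vm_powered_off):
    """Build the vmkfstools PunchZero command for one virtual disk.

    Returns the argument list only; nothing is executed here.
    """
    if not vm_powered_off:
        # PunchZero needs the VM offline, as noted above.
        raise RuntimeError("Power the VM off before running PunchZero")
    if not vmdk_path.endswith(".vmdk") or vmdk_path.endswith("-flat.vmdk"):
        # Assumption: the tool is pointed at the descriptor file, not the flat extent.
        raise ValueError("Expected the path to the descriptor .vmdk file")
    return ["vmkfstools", "--punchzero", vmdk_path]
```

Calling `punchzero_command("/vmfs/volumes/datastore1/myvm/myvm_1.vmdk", vm_powered_off=True)` gives you the argument list to run from an ESXi shell; the datastore and VM names there are placeholders.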
On Compellent LUNs
Once you run the sdelete (command above) and take a Compellent snapshot, the space will be reclaimed. There is nothing more to do here, but you may need to run these semi-frequently depending on your setup.
ReFS has its uses, but with so many layers, it can get tricky to work out exactly where this issue lies and what’s gumming up the works.
In summary, if you are using ReFS on storage systems:
- Read the manufacturer’s guide to what is supported
- You probably need those 2 commands above
- Free space wipes may be required from time to time
- On VMFS, you might need a PunchZero now and then
- Really evaluate what ReFS GIVES YOU nested this deep. You may lose all gains it claims to provide
References and further reading:
On my journey to answer this question, the following helped greatly. Please consult them for further information: