Patching is an essential part of any IT infrastructure, whether it runs on cloud-based virtual systems, on-premises virtual systems, or physical servers in a dedicated data center. Patch management has become a core concern for corporate IT organizations. It is the process of acquiring, testing, and installing code changes (patches) to system software and applications.
When applying such changes, we need a reliable fallback method in case something fails after patching. The method should be resilient and should return the infrastructure to the steady state it was in before patching. Let's briefly look at some of the industry best practices in this space. We exclude third-party tools and practices outside of the native Linux infrastructure.
The following are the possible fault-tolerance options at system/patch level.
 System Snapshot from hypervisor or cloud platform for virtual systems.
This is one of the most commonly used and recommended approaches for virtual and cloud-based estates. The snapshot is initiated and completed on the virtual infrastructure side or the cloud side, as desired. It is widely regarded as a best practice in this area.
- Industry recommended practice
- Easy to execute and restore from either hypervisor or cloud level.
- System level Fault tolerance.
- Requires additional storage space on the backend infrastructure.
- Any changes made after the snapshot are lost when it is restored.
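The exact snapshot commands depend on the hypervisor or cloud platform in use. As a rough illustration only, the sketch below assumes a libvirt/KVM host and uses `virsh`; the guest name `web01` and snapshot name `pre-patch` are hypothetical. The `run` helper is a dry-run wrapper added for this example, so commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a libvirt/KVM host. Guest and snapshot names are hypothetical.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Take a named snapshot of the guest before patching.
vm_snapshot_create() {
    local domain="$1" name="$2"
    run virsh snapshot-create-as --domain "$domain" --name "$name" \
        --description "State before patching"
}

# Revert the guest to the named snapshot if patching fails.
vm_snapshot_revert() {
    local domain="$1" name="$2"
    run virsh snapshot-revert --domain "$domain" --snapshotname "$name"
}
```

Calling `vm_snapshot_create web01 pre-patch` before patching and `vm_snapshot_revert web01 pre-patch` after a failed patch run would cover the basic cycle. Other platforms have their own equivalents.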
 System level fault-tolerance using native LVM snapshot with Boom utility in RHEL7.5 onwards.
We often need a simple, native Linux solution that can save the system state (a snapshot) and restore it quickly later. LVM snapshots provide exactly that. This feature lets us capture a snapshot of the root file system (/) and later revert changes using that snapshot. The only prerequisites are that the root file system (/) resides on an LVM logical volume and that free space is available within the volume group. One use case is restoring the system state after making changes that turned out to be undesired or unexpected. The other main use case is restoring the system state after unsuccessful patching. This is another best-practice method.
- Natively supported on RHEL 7.5 and later.
- Can provide a second level of system fault tolerance on top of a hypervisor or cloud snapshot.
- Requires additional disk space on each system where it is implemented.
- Requires an enterprise Linux distribution.
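The LVM snapshot and Boom workflow described above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive procedure: it assumes the root file system lives on logical volume `root` in volume group `rhel`, that the volume group has enough free space, and that the `boom` package is installed. The snapshot name and size are hypothetical, and the `run` helper is a dry-run wrapper added for this example.

```shell
#!/usr/bin/env bash
# Sketch only: VG/LV names, snapshot name and size are illustrative assumptions.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Snapshot the root LV, then add a boom boot entry that boots from the snapshot.
root_snapshot_create() {
    local vg="$1" lv="$2" snap="$3" size="$4"
    run lvcreate -s "$vg/$lv" -n "$snap" -L "$size"
    run boom create --title "Root snapshot before patching" --rootlv "$vg/$snap"
}

# Merge the snapshot back into its origin to discard changes made after it.
# For an in-use root LV the merge is deferred until the next activation,
# so a reboot is needed afterwards.
root_snapshot_rollback() {
    local vg="$1" snap="$2"
    run lvconvert --merge "$vg/$snap"
}
```

For example, `root_snapshot_create rhel root root_pre_patch 2G` before patching, and `root_snapshot_rollback rhel root_pre_patch` followed by a reboot if the patching has to be reverted. On RHEL 7 the GRUB configuration may also need to be regenerated before the new boot entry appears. If patching succeeds, the snapshot and its boot entry should be removed so the snapshot does not fill up.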
 Package level roll back using "dnf history" or "yum history" commands.
This is one of the easiest native methods of restoring package state. Most system administration teams already know these commands, so it should not be complicated to implement.
- Native Linux feature.
- Easy to implement and execute.
- No additional disk space is needed, since the package manager already keeps track of the package transaction history.
- Doesn't provide system level fault-tolerance.
- Some packages, such as selinux, selinux-policy-*, kernel, and glibc, can't be rolled back using this feature.
- Open-source Linux feature.
- No subscription or license is needed.
- Provides a complete backup solution well suited in case of disaster.
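For the package-level rollback with "dnf history" or "yum history" described earlier, a minimal sketch might look like the following. The transaction IDs are hypothetical and would normally be read from the output of `dnf history list`. The `PKG_MGR` variable and the `run` dry-run wrapper are helpers added for this example, and the older package versions are assumed to still be available in the configured repositories.

```shell
#!/usr/bin/env bash
# Sketch only: transaction IDs are hypothetical; look them up with "history list".
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
# Use "yum" on RHEL 7, "dnf" on RHEL 8 and later.
PKG_MGR="${PKG_MGR:-dnf}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show recent transactions so the ID of the patch run can be found.
pkg_history_list() {
    run "$PKG_MGR" history list
}

# Undo one specific transaction (for example, the patch run itself).
pkg_history_undo() {
    run "$PKG_MGR" -y history undo "$1"
}

# Undo every transaction performed after the given one.
pkg_history_rollback() {
    run "$PKG_MGR" -y history rollback "$1"
}
```

If the patch run was transaction 42, `pkg_history_undo 42` would revert just that transaction, while `pkg_history_rollback 41` would return the package set to its state right after transaction 41.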