Persistent data is a key resource provided by the operating system. The classical persistent store is files, and filesystems are often topics found in classic operating system textbooks. Other forms of persistence are archives, on-disk swap storage, source code control systems such as Subversion and Git, databases, and configuration systems such as the Microsoft Registry and its Hives.
The seriousness of research allows for each of these providers of persistence to be their own subjects. They respond to very different needs, and are in constant compromise between what is wanted and what can be reliably provided. Many ultimately use a file system as their ultimate data store, where file system is broadly defined as any set of data structures targeting a block device for data retention. However, even if generally the occupations of operating system architect and data store architect are distinct, issues in persistence are fundamental to operating system design.
Among the persistent data stores, the giant is the Database Management System, DBMS, better known as a database. These are certainly topics and occupations all to themselves. Computers are mostly useful because of databases. The information revolution is mostly the realization of vast databases tracking stuff, information and people. The software that implements a database, the database engine is a substantial and highly valuable software product. Our discussion of persistence will borrow from the database community, for they are the experts on persistence, and for many years have provided both problems and solutions for persistent datastore.
Data stores are read and updated. The updates can be considered as state transitions, from the entirety of the data and metadata before the update versus this entirety after the update. Reads function on the entirety at arbitrary moments, and generate results, which are thought of as revealing portions of the state.
Coming out of the vocabulary of traditional database theory are the four properties of Atomicity, Consistency, Isolation and Durability. The theory of these four properties is known as the ACID theory.
Atomicity: Atomicity is the property that the update is all-or-nothing, including all of the intermediate steps or none of them. Either the change, which might be implemented as a series of changes, completes fully, or is rolled back completely.
The fully completed transition is called a transaction. Transactional databases allow for a series of commonly provided atomic actions to be enveloped into an all-or-nothing unit, also called a transaction. Often the failure of completing a series of actions in a transaction is recovered by actively recovering the state from before the start of the transaction. This is called a roll-back.
Isolation: Isolation enhances to the atomicity property by specifying in which way to reads respect the all-or-nothing quality of a transaction.
The strongest form of isolation is serializable actions, meaning that only either the initial or final states of the object are available, and interactions have a consistent time order of occurrence based on whether it observed the initial or final state.
Atomic non-serializable actions are easily demonstrated: two bank accounts which are each asked to be set $10 more than the other. A serialized solution will have the two accounts ultimately settling with one account being $10 less than the other. However they could end up each $10 more than the other had at first without contradicting isolation, but with a failure of serializability.
In this case, to block advance of one transaction by another, during an update, could lead to deadlock, as both transactions wait for the other to finish. One technique used in these cases is the optimistic locking in which the input value is read before and after the output value is calculated and committed into the world. If the two reads are not equal, the transaction is retried. This turns a deadlock into a possible livelock of continual retries.
It is also possible that one action depend on both the initial and final states caused by another action, essentially straddling the time ordering of the other action. Perhaps one transaction updates two variables and a second transaction posts the sum of the two variables. Isolation would speak to whether the summing transaction must know that its read of the two variables occurred both before or both after the updating action.
Durability: Durability is the very practical concern that a transaction, once completed, never be lost. It does not need to go so far as to ensure a consistency such as a value, once committed, will never be (permanently) reverted. Durability is a feature of database backup and recovery, and it is a possibility that a failure might cause fall back on a previous database condition, while recovery is in progress.
For instance, a replicated server might be available when the primary server fails. However, a gap in update propagation exists, although is impervious to server failure. The replicated server would show old values until the eventual arrival of the propagated update.
Consistency: The persistent store has constraints on the relationships between data items. Consistency states that these constraints are always satisfied. This is an abstract definition, that depends on the constraints that one wishes to have; and these constraints work with and build off of the other three properties.
For instance, consistency can mean a read-consistency that once a change is made to the database, all reads consistently see that change.
Example: As an example of a filesystem that enforces line-semantics. Writes to files are append-only, and transitions are only on full lines of text appended (ending with and special end-of-line character).
Atomicity will say that the ultimate state of any file has only full lines appended. Isolation says that each append is the result of starting from a previously, correctly appended file.
However, it is possible that two simultaneous appends propose two new conditions for the file. For instance, two transactions appending to the file “good\n” the lines “night\n” and “day\n” propose either “good\nnight\n” or “good\nday\n”. This is a failure of isolation, since as transactions one transaction must have identified the state as “good\n” when, respecting atomicity, it is one the way to transitioning to “good\nnight\n”. Serialized, the two possibilities are either “good\nnight\nday\n” or “good\nday\nnight”.