Transactions for Non-Volatile Memory

Now that the SNIA Technical Working Group (TWG) on Non-Volatile Memory (NVM) Programming has published version 1 of the NVM Programming Model, the TWG is looking at friendlier programming interfaces such as transactions. I think it's far too early to attempt to standardize persistent transactional memory, but perhaps some rudimentary infrastructure can be specified that can later become part of the run-time of a full-blown persistent transactional memory implementation or some other full-featured transaction library. In the meantime, programmers could code directly to this lower-level interface and at least gain some benefit over writing their own transaction system.

Additionally, providing even a primitive transaction interface gives the storage system visibility into the application's transaction boundaries. This visibility enables the storage system to optimize data management operations such as making off-node copies for high availability or disaster recovery, or making an application-consistent snapshot.

One interesting line of investigation follows Satyanarayanan's Lightweight Recoverable Virtual Memory (RVM) concept. RVM is a simple library that provides atomicity and durability but leaves implementation of the rest of the ACID properties to the application. It gains its simplicity by requiring the programmer to specify the ranges of recoverable memory which it intends to modify during a transaction. This avoids the need for hooks into the virtual memory system or the compiler and language run-time that discover the write-set of the transaction. (Since the library does not provide isolation, there is no need to discover the read-set. There is a modern implementation of RVM that provides a static analysis tool to assist the programmer in specifying the write-set.) The RVM system was used in the implementation of the Coda file system from CMU.

Peter Chen's work in Reliable Memory (RAM I/O, or Rio) is very applicable to NVM. The programming model of the Rio File Cache is nearly identical to the Persistent Memory mode of version 1 of the NVM Programming Model. David Lowell and Peter Chen later created a variant of Satyanarayanan's RVM that is optimized for reliable memory. They described their Vista RVM in their paper "Free Transactions with Rio Vista." They were able to simplify and speed up the RVM implementation by eliminating the on-disk redo log, since their in-memory undo log is persisted by the Rio file cache.
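The declared-range, before-image style described above can be illustrated with a small sketch. This is not the RVM or Vista API: the class name UndoLogTransaction and its methods set_range, commit, and abort are hypothetical, and a Python bytearray stands in for a region of persistent memory. It only shows the shape of the idea, namely that the programmer declares a range before modifying it, the library saves the before-image, and the data is then updated in place.

```python
class UndoLogTransaction:
    """Toy before-image (undo) log; names are hypothetical, not a real API."""

    def __init__(self, memory: bytearray):
        self.memory = memory      # stand-in for a persistent memory region
        self.undo_log = []        # list of (offset, before_image) pairs

    def set_range(self, offset: int, length: int) -> None:
        # The programmer declares the range *before* modifying it, so the
        # library can save a copy of the old contents.
        before = bytes(self.memory[offset:offset + length])
        self.undo_log.append((offset, before))

    def commit(self) -> None:
        # The data was modified in place, so committing only has to
        # discard the saved before-images.
        self.undo_log.clear()

    def abort(self) -> None:
        # Roll back by restoring before-images, newest first.
        for offset, before in reversed(self.undo_log):
            self.memory[offset:offset + len(before)] = before
        self.undo_log.clear()


memory = bytearray(b"hello world")
txn = UndoLogTransaction(memory)

txn.set_range(0, 5)          # declare the write-set
memory[0:5] = b"HELLO"       # modify in place
txn.abort()                  # before-image is restored
aborted_state = bytes(memory)

txn.set_range(6, 5)
memory[6:11] = b"WORLD"
txn.commit()                 # in-place update stands
committed_state = bytes(memory)
```

Note that the sketch ignores durability of the log itself; in Vista that is what the Rio file cache supplies.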

One criticism of the RVM model, recently raised in the TWG by Hans Boehm, is that it is ill-suited to today's more modular programming styles. These RVM systems were developed in the 1980s and 1990s. At that time systems programming was done in a raw C style where the programmer was explicitly aware of memory allocation. In this style, it is fairly easy to pre-declare the write-set of a transaction. However, in today's more modular or object-oriented programming style it is quite common for modules to hide memory allocation from the programmer. One might even argue that such hiding is one of the principal benefits of this style of programming. In the modern style, it is effectively impossible for the programmer to pre-declare the write-set of a transaction that calls libraries or sub-modules.

I think this is quite a persuasive argument. Hans argues for a write-ahead after-image redo log design instead of RVM and Vista's before-image undo log. In place of pre-declaring the memory to be changed by a transaction, the application using such a design would give the transaction library both the location of the memory and the data to be written there. This is equivalent to an ordinary filesystem write system call except that a set of such calls is grouped into an atomic transaction. While the application does not know where a library or sub-module will allocate memory, it may be able to discover the location and size of the allocated structures after the fact, and so be able to utilize this write-ahead logging variant of RVM for transactional persistence.

The write-ahead log approach introduces an additional copy of the data that is not needed in the before-image logging approach. The additional copy is unfortunate because one of the principal goals of the NVM Programming TWG is to maximize the performance of the interface. The underlying persistent memory, whether it be power-protected DRAM backed by flash in an NV-DIMM or some new kind of storage-class memory, is expected to be nearly as fast as DRAM. The before-image logging of Vista requires one copy of the before-image of the write-set. The write-ahead log approach requires one copy of the after-image of the write-set to the write-ahead intent log followed by copying the after-image to the persistent data. This additional copy doubles the time required for the I/O.
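A minimal sketch of the write-ahead, after-image style may make the extra copy concrete. Everything here is an assumption for illustration: the class RedoLogTransaction, its write, commit, and abort methods, and the bytes_copied counter are hypothetical names, and a bytearray again stands in for persistent memory. A real implementation would also have to persist the log and a commit record before applying it, and replay the log during recovery; none of that is shown.

```python
class RedoLogTransaction:
    """Toy after-image (redo) log; names are hypothetical, not a real API."""

    def __init__(self, memory: bytearray):
        self.memory = memory      # stand-in for a persistent memory region
        self.redo_log = []        # list of (offset, after_image) pairs
        self.bytes_copied = 0     # counts every byte moved by the library

    def write(self, offset: int, data: bytes) -> None:
        # First copy: the after-image goes into the intent log.
        # The persistent data itself is not touched yet.
        self.redo_log.append((offset, bytes(data)))
        self.bytes_copied += len(data)

    def commit(self) -> None:
        # Second copy: apply each logged after-image to the persistent
        # data, in the order the writes were issued.
        for offset, data in self.redo_log:
            self.memory[offset:offset + len(data)] = data
            self.bytes_copied += len(data)
        self.redo_log.clear()

    def abort(self) -> None:
        # Nothing was modified in place, so aborting just drops the log.
        self.redo_log.clear()


memory = bytearray(16)
txn = RedoLogTransaction(memory)
txn.write(0, b"abcd")
txn.write(8, b"wxyz")
before_commit = bytes(memory)   # still all zero bytes
txn.commit()
after_commit = bytes(memory)
```

Here 8 bytes of application data cause 16 bytes of copying inside the library, which is the second copy discussed above. The interface resembles a grouped sequence of write calls: the caller supplies a location and the data, and never pre-declares a range.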

Moreover, today's highly modular object-oriented systems usually completely hide the location and size of their internal data structures. These programs perform I/O by iterating over all their object instances and asking each object to serialize itself on a supplied output stream, rather than discovering the memory locations of the object instances and explicitly writing them to an output stream. So even a write-ahead log design for transactions is difficult for a programmer to apply in a modern object-oriented program.

An alternative transaction interface uses the page protection features of the virtual memory subsystem to determine the write-set of the application at run-time. Lowell and Chen's Vista version of RVM provides this option as well. Such a VM-based mechanism solves the modularity problem but it may introduce a high overhead. The write-set is only discovered with VM-page-level granularity. If the program's persistent data structures do not have a high degree of spatial locality, this coarse granularity results in logging many unnecessary memory locations with the attendant high cost in both log storage space and execution time. One of the attractions of the new NVM technology is that, due to the zero cost for random access, it enables programs to use persistent data structures that do not have a high degree of spatial locality; that is, to not be restricted to disk-friendly data structures. A low degree of spatial locality imposes unacceptable overhead costs on a transaction design that uses VM page-protection to detect the write-set at run-time.
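The cost of page-level granularity can be estimated with a little arithmetic. The sketch below assumes a 4096-byte page and that the library logs one full page image for every page the transaction dirties; both are assumptions for illustration, as are the function names pages_touched and logging_cost. The writes are given as non-overlapping (offset, length) pairs.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched(writes, page_size=PAGE_SIZE):
    """Return the set of page numbers dirtied by (offset, length) writes."""
    pages = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))
    return pages


def logging_cost(writes, page_size=PAGE_SIZE):
    """Return (bytes actually written, bytes logged at page granularity)."""
    bytes_written = sum(length for _, length in writes if length > 0)
    bytes_logged = len(pages_touched(writes, page_size)) * page_size
    return bytes_written, bytes_logged


# 100 eight-byte updates, each landing on a different page
# (low spatial locality, e.g. a pointer-rich structure).
scattered = [(i * 10 * PAGE_SIZE, 8) for i in range(100)]

# The same 100 updates packed contiguously into one page
# (high spatial locality, the disk-friendly layout).
clustered = [(i * 8, 8) for i in range(100)]

scattered_cost = logging_cost(scattered)   # (800, 409600)
clustered_cost = logging_cost(clustered)   # (800, 4096)
```

Under these assumptions the scattered transaction logs 512 bytes for every byte it changes, while the clustered one logs about 5, which is the gap the paragraph above describes.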

Perhaps the TWG should provide all three interfaces and, in the near term, let the programmer decide which trade-off is appropriate. In the long term we can hope for persistent transactional memory extensions to programming languages, compilers, and language run-times, or CPU hardware support for fine-grained tracking of writes at run-time.

© Steve Byan 2011-2019