Web Operations is a terrific book. Baron Schwartz’s chapter on web-centric databases is especially strong. For databases that serve web sites, availability is key. This is a change from traditional systems, such as back-office financial databases, where consistency was probably most important. Web-based databases tend to be read dominated and to deal with simple queries. Writes, when they occur, tend to affect only a single row. Schwartz emphasizes simplicity wherever possible, to the extent of avoiding stored procedures, foreign key constraints, triggers, and views. These features hurt performance and complicate solutions that address scaling.
When trying to improve performance, caching is usually an easy win. It’s also important to apply user quotas to potentially costly resources. (For example, enforcing a maximum number of friends may significantly decrease the load required to compute a page.) Since web applications are read dominated, they can often be supported by database servers arranged in a master/slave hierarchy. Only database writes are sent to the master. These updates are then replicated to the slaves in a timely manner, and all read operations go to a slave. One slave is often dedicated to backups and computationally expensive tasks. As an alternative to master/slave replication, one can use knowledge of the application to partition the database along functional lines. Instead of a series of identical slaves, the data is divided by value, required access time, or some other sensible metric.
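The read/write split described above can be sketched in a few lines of Python. This is only an illustration, not anything taken from the chapter: the `ReplicatedRouter` class, the server names, and the round-robin choice of slave are all assumptions made for the example, and a real deployment would hand back connections rather than names.

```python
import itertools


class ReplicatedRouter:
    """Hypothetical helper: send writes to the master, reads to a slave."""

    # Statement verbs treated as writes (an assumed, deliberately short list).
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        # Rotate through the slaves so reads are spread evenly.
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        parts = sql.split(None, 1)
        if not parts:
            raise ValueError("empty statement")
        if parts[0].upper() in self.WRITE_VERBS:
            return self.master
        return next(self._slaves)


router = ReplicatedRouter("master", ["slave1", "slave2"])
targets = [
    router.route("SELECT name FROM users WHERE id = 7"),
    router.route("UPDATE users SET name = 'x' WHERE id = 7"),
    router.route("SELECT COUNT(*) FROM friends WHERE user_id = 7"),
]
print(targets)  # ['slave1', 'master', 'slave2']
```

One thing the sketch glosses over: because slaves receive updates slightly after the master does, a read issued immediately after a write may return stale data, so an application that needs to read its own writes would have to route those particular reads differently.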
Schwartz gives a nice discussion of sharding and why it often isn’t the right solution. In a section on riskier solutions that should not be used, he lists multiple write masters, multilevel replication, ring replication, and entity-attribute-value databases.
Date: February 27, 2011