Architecture Overview
H-Store is highly optimized for three salient features of OLTP applications:
- Most transactions in an OLTP workload access only a small subset of tuples.
- Most transactions have short execution times and contain no user stalls.
- Most transactions are repeatedly drawn from a pre-defined set of stored procedures.
Based on these observations, H-Store was designed as a parallel, row-storage relational DBMS that runs on a cluster of shared-nothing, main memory executor nodes.
We define a single H-Store instance as a cluster of two or more computational nodes deployed within the same administrative domain. A node is a single physical computer system that hosts one or more execution sites and one transaction coordinator. A site is the logical operational entity in the system: a single-threaded execution engine that manages some portion of the database and executes transactions on behalf of its transaction coordinator. The transaction coordinator, together with the coordinators located at other nodes, is responsible for ensuring the serializability of transactions. We assume that a typical H-Store node contains one or more multi-core CPUs, each with multiple hardware threads per core. Multiple sites can therefore be assigned to the same node, but each site operates independently and does not share any data structures or memory with the others.
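As a rough illustration of this hierarchy, the sketch below models an instance with two nodes, each hosting two single-threaded sites that own disjoint partitions. The class and field names are invented for this example and are not H-Store's internal catalog types.

```java
import java.util.List;

// Hypothetical model of the deployment hierarchy described above; H-Store's
// real catalog types differ. Each node also runs one transaction coordinator.
record Site(int siteId, List<Integer> partitionIds) {}   // single-threaded execution engine
record Node(String hostname, List<Site> sites) {}        // physical machine (+ coordinator)
record Cluster(List<Node> nodes) {}                       // one H-Store instance

class DeploymentSketch {
    public static void main(String[] args) {
        // Two nodes, two sites per node; each site manages exactly one partition
        // and shares no data structures or memory with the other sites.
        Cluster cluster = new Cluster(List.of(
            new Node("host-0", List.of(new Site(0, List.of(0)), new Site(1, List.of(1)))),
            new Node("host-1", List.of(new Site(2, List.of(2)), new Site(3, List.of(3))))));
        System.out.println(cluster);
    }
}
```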
Every table in H-Store is horizontally divided into multiple fragments (sometimes called shards), each consisting of a disjoint subset of the table's tuples. The boundaries of a table's fragments are determined by the choice of a partitioning attribute (i.e., column) for that table; each tuple is assigned to a fragment based on the hash value of this attribute. H-Store also supports partitioning tables on multiple columns. Related fragments from multiple tables are combined into a partition that is distinct from all other partitions. Each partition is assigned to exactly one site.
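As a minimal sketch of this hash-based assignment (assuming a simple modulo hash, which is not necessarily the hash function H-Store uses internally), a tuple's partitioning-attribute value maps to a partition id as follows:

```java
public class PartitioningSketch {
    // Map a tuple's partitioning-attribute value to a partition id.
    // This modulo scheme is illustrative only; H-Store's actual hash function may differ.
    static int hashToPartition(Object partitioningValue, int numPartitions) {
        // floorMod keeps the result non-negative even if hashCode() is negative.
        return Math.floorMod(partitioningValue.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // e.g., a CUSTOMER table partitioned on C_ID: every tuple with the same
        // C_ID value always lands in the same fragment, and thus the same partition.
        for (long customerId : new long[]{1001, 1002, 1003, 1004}) {
            System.out.printf("C_ID %d -> partition %d%n",
                    customerId, hashToPartition(customerId, numPartitions));
        }
    }
}
```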
All tuples in H-Store are stored in main memory at each node; the system never needs to access a disk in order to execute a query. Because the data resides entirely in memory, however, the system must replicate partitions to ensure both durability and availability of data in the event of a node failure. Data replication in H-Store occurs in two ways: (1) replicating partitions on multiple nodes, and (2) replicating an entire table in every partition (i.e., the table has a single fragment, and each partition contains a copy of that fragment). For the former, we adopt the k-safety concept, where $k$ is defined by the administrator as the number of node failures the database can tolerate before it is deemed unavailable.
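To make the k-safety idea concrete, the sketch below places $k+1$ copies of each partition on distinct nodes in round-robin fashion; this placement policy is an assumption for illustration, not H-Store's actual replica-placement algorithm.

```java
import java.util.ArrayList;
import java.util.List;

public class KSafetySketch {
    // Place k+1 copies of every partition on distinct nodes so that any k node
    // failures still leave at least one copy of each partition available.
    // (Round-robin placement is illustrative only.)
    static List<List<Integer>> placeReplicas(int numPartitions, int numNodes, int k) {
        List<List<Integer>> placement = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            List<Integer> nodes = new ArrayList<>();
            for (int replica = 0; replica <= k; replica++) {
                nodes.add((partition + replica) % numNodes);
            }
            placement.add(nodes);
        }
        return placement;
    }

    public static void main(String[] args) {
        // 4 partitions across 3 nodes with k = 1: every partition lives on two nodes,
        // so the loss of any single node leaves the database fully available.
        System.out.println(placeReplicas(4, 3, 1));
    }
}
```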
Client applications make calls to the H-Store system to execute pre-defined stored procedures. Each procedure is identified by a unique name and consists of user-written Java control code intermixed with parameterized SQL commands. The input parameters to a stored procedure can be scalar or array values, and the queries within a procedure can be executed multiple times with different input parameters.
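For illustration, the following is a sketch of what such a procedure can look like. It follows the VoltDB-style procedure API that H-Store's codebase derives from; the package names, the annotation usage, and the ACCOUNTS schema are assumptions made for this example.

```java
import org.voltdb.ProcInfo;
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Hypothetical single-partition procedure; schema and names are invented.
@ProcInfo(partitionInfo = "ACCOUNTS.A_ID: 0", singlePartition = true)
public class GetAccount extends VoltProcedure {

    // Parameterized SQL; the '?' placeholder is bound when the query is queued.
    public final SQLStmt getAccount =
            new SQLStmt("SELECT * FROM ACCOUNTS WHERE A_ID = ?");

    // The Java control code: queue the query into a batch, then block while the
    // execution engine runs the batch and returns its results.
    public VoltTable[] run(long acctId) {
        voltQueueSQL(getAccount, acctId);
        return voltExecuteSQL();
    }
}
```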
Each instance of a stored procedure executing in response to a client request is called a transaction. As with tables, each stored procedure in H-Store is assigned a partitioning attribute consisting of one or more of its input parameters. When a new request arrives at a node, the transaction coordinator hashes the value of the procedure's partitioning parameter and routes the transaction request to the site whose id matches that hash value. Once the request arrives at the proper site, the system invokes the procedure's Java control code, which uses the H-Store API to queue one or more queries in a batch. The control code then invokes another H-Store command that blocks the transaction while the execution engine executes the batched queries.
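A client-side invocation of the procedure sketched above might then look like the following. The connection calls, host name, and port number follow the VoltDB-style client interface and are assumptions for this example; the key point is that the value passed for the partitioning parameter (here the first argument, the account id) is what the coordinator hashes to pick the destination site.

```java
import org.voltdb.VoltTable;
import org.voltdb.client.Client;
import org.voltdb.client.ClientFactory;
import org.voltdb.client.ClientResponse;

// Hypothetical client invocation; host name and port are placeholders.
public class ClientSketch {
    public static void main(String[] args) throws Exception {
        Client client = ClientFactory.createClient();
        client.createConnection("hstore-node-0", 21212);

        // The first argument (1001) is the partitioning parameter; the coordinator
        // hashes it to route this transaction to the site owning that partition.
        ClientResponse response = client.callProcedure("GetAccount", 1001L);
        if (response.getStatus() == ClientResponse.SUCCESS) {
            for (VoltTable result : response.getResults()) {
                System.out.println(result);
            }
        }
        client.close();
    }
}
```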