{"id":2360,"date":"2013-12-16T20:48:10","date_gmt":"2013-12-17T01:48:10","guid":{"rendered":"http:\/\/hstore.cs.brown.edu\/?page_id=2360"},"modified":"2014-03-02T12:09:11","modified_gmt":"2014-03-02T17:09:11","slug":"jvm-snapshots","status":"publish","type":"page","link":"https:\/\/hstore.cs.brown.edu\/documentation\/deployment\/jvm-snapshots\/","title":{"rendered":"OLAP JVM Snapshots"},"content":{"rendered":"
<p>This page describes how to use the new, experimental JVM Snapshots feature to execute analytical (OLAP) queries in H-Store.<\/p>\n
<h2>Overview<\/h2>\n <p>JVM Snapshots allow H-Store to execute analytical queries without interfering with the normal OLTP workload. When an analytical query is executed directly on H-Store, it runs as a distributed transaction that blocks all other transactions, and it usually takes much longer than a transactional query. This significantly degrades the performance of the whole system. With JVM Snapshots, the H-Store site instead sends the analytical transaction to a JVM Snapshot. This avoids expensive concurrency control, because the snapshot database only ever runs a single distributed transaction at a time.<\/p>\n <p>The trickier part is creating the snapshot. Because H-Store is a main-memory database, it can use <b>fork()<\/b> to create a consistent virtual-memory snapshot that contains all the data the system needs to execute an OLAP query.<\/p>\n <p>The key property of fork() here is copy-on-write. When a process is forked, the operating system does not make physical copies of its memory. Instead, the virtual memory pages of both processes refer to the same physical pages until one of the processes writes to a page; only then is that page copied. The operating system handles this transparently.<\/p>\n <p>As a result, creating a snapshot is lightweight and fast. The downside is that it can hurt the performance of write operations. After the snapshot is created, the first write to each shared page triggers a page fault and a copy of that page, an overhead that is not negligible in a main-memory DBMS.<\/p>\n
<h2>Implementation Details<\/h2>\n <p>Most of the work is done by the JVM Snapshot Manager, which runs in a separate thread inside the H-Store site. The Manager is responsible for creating and refreshing snapshots, communicating with them, and queuing OLAP queries.<\/p>\n <p>When H-Store receives an OLAP query, it parses and compiles the SQL statement, generates a transaction object, and puts that object in the JVM Snapshot Manager's queue. In its main loop, the Manager takes the next OLAP transaction from the queue and checks whether a current snapshot is available. If not, it forks a new snapshot and waits for that snapshot to finish initializing. It then sends the OLAP transaction to the snapshot and waits for the response.<\/p>\n <p>The snapshot process first respawns the partition execution engine threads and a few other essential utility threads. This is necessary because fork() copies only the calling thread into the child process. It then sets up a socket connection with the parent process and waits for either an OLAP transaction or a shutdown command. Because all queuing happens in the JVM Snapshot Manager, the snapshot executes only one query at a time, which keeps its logic much simpler.<\/p>\n <p>The snapshot and the Manager communicate over this socket connection using three types of messages: OLAP transaction request, OLAP transaction response, and shutdown.<\/p>\n <p>The snapshot must be recreated periodically so that its data does not become too stale. However, creating a new snapshot introduces overhead and can reduce OLTP performance, so there is a tradeoff between data staleness and performance. The refresh interval is controlled by the <b>site.jvmsnapshot_interval<\/b> parameter.<\/p>\n
<h2>Evaluation<\/h2>\n <p>While H-Store runs either the TPC-C workload or a read-only TPC-C workload, we manually submit OLAP queries to the system to measure their impact on performance.<\/p>\n <p>The OLAP query is drawn from the OLAP benchmark TPC-H. It requires a full table scan of ORDER_LINE, the busiest table in the TPC-C benchmark:<\/p>\n