Student Research Projects

The H-Store project is looking for capable undergraduate and graduate students to assist with the development of the database and its supporting environment. The following is a list of open research projects. Interested students need not be at Brown, MIT, or Yale, but we ask that students who are not at one of those schools coordinate with us before starting work.

TPC-E Benchmark Port

Estimated Time: 20-30 hours.
We currently have a partial implementation of the TPC-E Benchmark written for H-Store. We are looking for a student to complete the port and get the benchmark fully running on the H-Store system (or as close to the official specification as possible). There are two tasks in this project: (1) integrate the official TPC-E C++ workload generator into H-Store’s Java-based benchmark framework and (2) refactor any complex queries that use SQL features not currently supported by H-Store into equivalent sets of simpler queries and Java code. For the first task, we have already integrated the TPC-E EGenLoader into an H-Store benchmark data loader, but we still need to add JNI hooks into the EGenClient driver in order to generate the transactional workload (e.g., stored procedure invocations with input parameters). Once this is done, the next step is to go through the TPC-E stored procedures written for H-Store and refactor them as needed so that they produce the correct results. For example, the H-Store distributed planner has trouble with the seven-table join query in the BrokerVolume transaction; the student will need to break the single large query up into multiple queries and combine the results in the Java portion of the stored procedure. Students will need some previous C++ and Java development experience.
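The query-splitting task can be illustrated with a minimal sketch in plain Java. It does not use the H-Store stored procedure API or the real TPC-E schema: the class SplitJoinSketch, the two row types, and the method volumeByBroker are all hypothetical names, and the sketch simply assumes that two simpler queries have already returned their rows. The Java code then performs the join and aggregation that the single large query would otherwise have done.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitJoinSketch {
    /** Hypothetical row returned by a first, simple query over brokers. */
    public static final class BrokerRow {
        final long brokerId;
        final String brokerName;
        public BrokerRow(long brokerId, String brokerName) {
            this.brokerId = brokerId;
            this.brokerName = brokerName;
        }
    }

    /** Hypothetical row returned by a second, simple query over trade requests. */
    public static final class TradeRequestRow {
        final long brokerId;
        final int quantity;
        final double bidPrice;
        public TradeRequestRow(long brokerId, int quantity, double bidPrice) {
            this.brokerId = brokerId;
            this.quantity = quantity;
            this.bidPrice = bidPrice;
        }
    }

    /**
     * Joins the two result sets on broker id in application code and sums
     * quantity * bidPrice per broker name. Trade requests whose broker is
     * missing from the first result set are dropped (inner-join semantics).
     */
    public static Map<String, Double> volumeByBroker(List<BrokerRow> brokers,
                                                     List<TradeRequestRow> requests) {
        // Build the hash side of the join from the smaller result set.
        Map<Long, String> nameById = new HashMap<>();
        for (BrokerRow b : brokers) {
            nameById.put(b.brokerId, b.brokerName);
        }
        // Probe with each trade request and aggregate per broker name.
        Map<String, Double> volume = new TreeMap<>();
        for (TradeRequestRow r : requests) {
            String name = nameById.get(r.brokerId);
            if (name == null) {
                continue; // no matching broker row
            }
            volume.merge(name, r.quantity * r.bidPrice, Double::sum);
        }
        return volume;
    }

    public static void main(String[] args) {
        List<BrokerRow> brokers = Arrays.asList(new BrokerRow(1L, "Alice"));
        List<TradeRequestRow> requests = Arrays.asList(
                new TradeRequestRow(1L, 10, 2.5),
                new TradeRequestRow(1L, 4, 5.0));
        System.out.println(volumeByBroker(brokers, requests));
    }
}
```

In an actual port, the two lists would come from separate queries issued by the stored procedure, and the choice of how to split the seven-table join would depend on which sub-joins the planner can already handle.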

Airline Booking OLTP Benchmark

Estimated Time: 30-40 hours.
Much like TPC-E, we also have a half-completed implementation of a new OLTP benchmark that is based on airline ticket booking systems. This work is an enhancement of a ticketing system benchmark designed by Mike Stonebraker in 2008. We are looking to add non-deterministic transactions that are more complex than those in the original benchmark specification. Specifically, we are interested in adding affinity-based skew to the data based on real-world information (e.g., flights out of Madison, WI are more likely to fly to Chicago, IL than to any other destination). This project consists of (1) writing a synthetic data loader that is based on publicly available air travel information from the FAA, (2) creating stored procedures that model an airline ticketing system, and (3) writing documentation on the data loader and workload generator. All code will be written in Java using the H-Store benchmark framework.
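As a rough illustration of affinity-based skew, the sketch below picks a destination for a given origin with probability proportional to a per-route weight. It is only one possible approach, not the benchmark's actual design: the class AffinitySkewSketch and its methods are hypothetical, it does not use the H-Store benchmark framework, and the weights in the example are made up for illustration rather than taken from FAA data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class AffinitySkewSketch {
    // origin -> (destination -> relative weight); weights need not sum to 1
    private final Map<String, Map<String, Double>> affinity = new LinkedHashMap<>();

    /** Registers a route; repeated calls for the same route add their weights. */
    public void addRoute(String origin, String destination, double weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        affinity.computeIfAbsent(origin, k -> new LinkedHashMap<>())
                .merge(destination, weight, Double::sum);
    }

    /** Samples a destination for the origin, proportionally to the route weights. */
    public String sampleDestination(String origin, Random rng) {
        Map<String, Double> weights = affinity.get(origin);
        if (weights == null) {
            throw new IllegalArgumentException("unknown origin: " + origin);
        }
        double total = 0;
        for (double w : weights.values()) {
            total += w;
        }
        // Walk the cumulative weights until the random target is used up.
        double target = rng.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            last = e.getKey();
            target -= e.getValue();
            if (target < 0) {
                return last;
            }
        }
        return last; // only reached through floating-point rounding
    }

    public static void main(String[] args) {
        AffinitySkewSketch skew = new AffinitySkewSketch();
        // Illustrative weights only, not real air travel statistics.
        skew.addRoute("Madison, WI", "Chicago, IL", 70);
        skew.addRoute("Madison, WI", "Minneapolis, MN", 20);
        skew.addRoute("Madison, WI", "Denver, CO", 10);
        Random rng = new Random(42);
        for (int i = 0; i < 5; i++) {
            System.out.println(skew.sampleDestination("Madison, WI", rng));
        }
    }
}
```

A data loader built along these lines would derive the weights from the public air travel data and could use the same sampler both when generating flights and when generating booking requests, so that the workload skew matches the data skew.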

System Development & Optimization

Estimated Time: 10-40 hours.
The original VoltDB transaction coordination system was completely removed and rewritten in H-Store in order to support more complex transaction workloads in our research projects. As a result, it is not as optimized as VoltDB's. We are looking for a resourceful student to profile the Java-based frontend code in order to identify performance bottlenecks and then work with one of the H-Store developers to optimize the code. The complexity and scope of this project can vary based on the student’s skill set. Some issues may be as simple as correcting bad coding practices, while others could involve integrating a shared-memory message-passing subsystem. Familiarity with Java-based profiling tools (e.g., JProfiler) is preferred but not required.

Universal OLTP Benchmark Framework

Estimated Time: 30-50 hours.
We are working in conjunction with the RelationalCloud project to develop a general benchmark framework for comparing parallel OLTP systems. This project is similar to the Yahoo! Cloud Serving Benchmark platform. Responsibilities could include (1) integrating a harness for benchmarks written using the H-Store benchmark framework, (2) writing plug-ins for other DBMSs (e.g., Oracle, DB2), and (3) collecting workload samples and analyzing performance results. This project is more research-focused than the others listed above.