Automatic Database Designer

The following are instructions on how to use H-Store’s database design tool, called Horticulture. This work is was part of the automatic database partitioning project.

Source Code Overview

All of the source code for Horticulture is available in the main source code repository.

The design tool consists of two components: (1) the search algorithm and (2) the cost model. The search algorithm is the process that Horticulture uses to find the best design for an arbitrary application. The main large-neighborhood search algorithm described in the SIGMOD paper is here. This algorithm executes multiple rounds of a branch-and-bound search algorithm.

The search is guided by a cost model (i.e., objective function). The default cost model for Horticulture computes the ratio of the number of single-partition transactions to distributed transactions in the sample workload.

Workload Traces

The sample workloads that used in the experiments are available in H-Store’s supplemental file repository. See these instructions on how to create these workload files for other benchmark implementations in H-Store.

Setup

Once you have the workloads and your project jar, you can create the “mappings” and “stats” files automatically using this command:

ant designer-prepare -Dproject=$BENCHMARK

And then you can invoke the designer code:

ant designer-benchmark -Dproject=$BENCHMARK

Execution

ant designer-benchmark -Dproject=$BENCHMARK \
   -Dworkload=files/workloads/${BENCHMARK}.trace.gz \
   -Dstats=files/stats/${BENCHMARK}.stats.gz \
   -Dlimit=5000 \
   -Dhints=files/designhints/${BENCHMARK}.hints \
   -Dpartitioner=edu.brown.designer.partitioners.BranchAndBoundPartitioner \
   -Dcostmodel=edu.brown.costmodel.SingleSitedCostModel

Options:

The limit option limits the number of txn traces to load from the sample workload.
The hints file is a JSON file that contains additional config options. I made it a separate file instead of using commandline arguments to make it easier for me to prototype things.
The partitioner option sets the class file to use for the partitioning algorithm. As you can see in src/frontend/edu/brown/designer/partitioners, there are other algorithms:
- MostPopularPartitioner – Choose the partitioning column for each table as the one that is most accessed by queries in the sample workload.
- PrimaryKeyPartitioner – Choose each table’s primary key as the partitioning column.
- RandomPartitioner – Choose a random column in each table as its partitioning column.
- BranchAndBoundPartitioner – Run a branch-and-bound search to find the optimal partitioning scheme for the database. This is an exhaustive search.
- LNSPartitioner – Run multiple rounds of the BranchAndBoundPartitioner to find the optimal partitioning scheme for a database.
The costmodel option sets the class file for objective function to use in the designer. SingleSitedCostModel is the basic one that will count the ratio of the number of single-partition txns with dtxns. This is probably the one you’ll want to use. There is a more complex one called TimeIntervalCostModel that can handle discrete intervals when computing skew.

« Deferred Query Execution

OLAP JVM Snapshots »

Last Updated on October 13th, 2013 by Andy Pavlo

Documentation > Deployment > Automatic Database Designer

Source Code Overview

Workload Traces

Setup

Execution

Site Information

Latest Development Updates