Compiling and Running TreeLine

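With modern storage hardware such as NVMe SSDs, the cost gap between random writes and sequential writes is far smaller than it was on traditional storage devices. LSM-based key-value stores such as RocksDB rely on sequential writes and give up some read performance in exchange. TreeLine, proposed in 2021, instead uses random writes: it is an update-in-place key-value store built around three key ideas for improving read and write performance.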

Preparing the Environment

Installing CMake

Requirement: version >= 3.17

Download the source and extract it:

wget https://cmake.org/files/v3.24/cmake-3.24.0.tar.gz
tar -zxvf cmake-3.24.0.tar.gz

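Build and install. The option --prefix=/usr/local/cmake-3.24.0 installs CMake into /usr/local/cmake-3.24.0. You may choose another directory, but it must match the directory used in the environment variable below.

cd cmake-3.24.0
./bootstrap --prefix=/usr/local/cmake-3.24.0
make
sudo make install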

Open ~/.bashrc to add the environment variable:

vim ~/.bashrc

Add /usr/local/cmake-3.24.0/bin to PATH by appending the following line to the end of the file:

export PATH=/usr/local/cmake-3.24.0/bin:$PATH

Make the change to ~/.bashrc take effect immediately:

source ~/.bashrc
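
To confirm that the shell now picks up a new enough CMake, the reported version can be compared against the 3.17 minimum. The following is only a minimal Python sketch: the helper names parse_cmake_version and meets_minimum are made up for illustration, and it assumes the output of cmake --version contains a line of the form "cmake version X.Y.Z".

```python
import re
import shutil
import subprocess

MIN_CMAKE = (3, 17)  # minimum version stated in the requirement above


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output (hypothetical helper)."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized cmake version output: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum=MIN_CMAKE):
    """Tuples compare lexicographically, so (3, 24, 0) >= (3, 17) holds."""
    return version >= minimum


if __name__ == "__main__":
    if shutil.which("cmake") is None:
        print("cmake is not on PATH; check the export line in ~/.bashrc")
    else:
        out = subprocess.run(
            ["cmake", "--version"], capture_output=True, text=True
        ).stdout
        version = parse_cmake_version(out)
        print(version, "OK" if meets_minimum(version) else "too old")
```

The same tuple comparison could be reused for the Python >= 3.8 requirement later on.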

Installing gcc/g++

Requirement: the compiler must support C++17

Download the source and extract it:

wget https://mirror.tuna.tsinghua.edu.cn/gnu/gcc/gcc-9.3.0/gcc-9.3.0.tar.gz
tar -zxvf gcc-9.3.0.tar.gz

Download the required prerequisites:

cd gcc-9.3.0
./contrib/download_prerequisites

Build and install gcc/g++ into /usr/local/gcc-9.3.0:

./configure --prefix=/usr/local/gcc-9.3.0 --enable-bootstrap --enable-languages=c,c++ --enable-checking=release --disable-multilib
make -j4
sudo make install

Open ~/.bashrc to add the environment variables:

vim ~/.bashrc

Append the following lines to the end of the file:

export PATH=/usr/local/gcc-9.3.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/gcc-9.3.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/gcc-9.3.0/lib64:$LD_LIBRARY_PATH

Make the change to ~/.bashrc take effect immediately:

source ~/.bashrc

Installing Python

Requirement: version >= 3.8

Install the required tools and libraries:

sudo apt install build-essential libssl-dev zlib1g-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev tk-dev libffi-dev

Download the source and extract it:

wget https://www.python.org/ftp/python/3.8.16/Python-3.8.16.tgz
tar -xf Python-3.8.16.tgz

Build and install into /usr/local/python-3.8.16:

cd Python-3.8.16
./configure --prefix=/usr/local/python-3.8.16 --enable-optimizations
make
sudo make install

Open ~/.bashrc to add the environment variable:

vim ~/.bashrc

Add /usr/local/python-3.8.16/bin to PATH by appending the following line to the end of the file:

export PATH=/usr/local/python-3.8.16/bin:$PATH

Make the change to ~/.bashrc take effect immediately:

source ~/.bashrc

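Install the other required dependencies:

sudo apt install libtbb-dev autoconf libjemalloc-dev

If libtbb-dev and libjemalloc-dev cannot be installed this way, you can build them from source instead; see "Installing the tbb library" and "Installing the jemalloc library" below.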

Installing the tbb library

Download the source:

git clone https://github.com/oneapi-src/oneTBB.git

Build and install into /tmp/my_installed_onetbb:

cd oneTBB
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/tmp/my_installed_onetbb -DTBB_TEST=OFF ..
cmake --build .
cmake --install .

Copy the required headers and libraries to /usr/local/gcc-9.3.0/include/c++/9.3.0/ and /usr/local/gcc-9.3.0/lib64/:

cd /tmp/my_installed_onetbb
sudo cp -r include/tbb /usr/local/gcc-9.3.0/include/c++/9.3.0/  # copy the headers
cd lib64
sudo cp *.so *.2 *.11 *.12 /usr/local/gcc-9.3.0/lib64/  # copy the shared libraries

Installing the jemalloc library

Download the source and extract it:

wget https://github.com/jemalloc/jemalloc/archive/5.2.1.tar.gz
tar -zxvf 5.2.1.tar.gz

Build and install. The default install prefix is /usr/local.

cd jemalloc-5.2.1
./autogen.sh
./configure --with-version="5.2.1-0-g0"
make dist
make
sudo make install

Open ~/.bashrc to add the environment variables:

vim ~/.bashrc

Add /usr/local/bin and /usr/local/lib to the environment by appending the following lines to the end of the file:

export PATH=/usr/local/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

Make the change to ~/.bashrc take effect immediately:

source ~/.bashrc

Downloading the TreeLine Source

git clone https://github.com/mitdbg/treeline.git

Compiling

From the root of the cloned treeline directory, create a build folder to hold the build output:

mkdir build && cd build

Run the build (this excludes the files under treeline/tests/ and treeline/benchmarks/):

cmake -DCMAKE_BUILD_TYPE=Release .. && make -j

To also build the files under treeline/tests/, use this command instead:

cmake -DCMAKE_BUILD_TYPE=Release -DTL_BUILD_TESTS=ON .. && make -j

Likewise, to also build the files under treeline/benchmarks/, use this command instead:

cmake -DCMAKE_BUILD_TYPE=Release -DTL_BUILD_BENCHMARKS=ON .. && make -j

You can also build everything under treeline/:

cmake -DCMAKE_BUILD_TYPE=Release -DTL_BUILD_TESTS=ON -DTL_BUILD_BENCHMARKS=ON .. && make -j

The benchmark executable produced by the build is treeline/build/bench/run_custom.

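Running

Installing cond

The cond command runs tasks in batches. treeline/scripts/ycsb_v2/COND already contains the commands for running the benchmark experiments in bulk.

sudo apt install python3-pip  # install pip first
pip install conductor-cli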

Type cond on the command line. If the shell reports Command cond not found, open ~/.bashrc to add an environment variable:

vim ~/.bashrc

Add the directory that contains cond, /home/xxx/.local/bin, to PATH by appending the following line to the end of the file (xxx is your user name):

export PATH=/home/xxx/.local/bin:$PATH

Make the change to ~/.bashrc take effect immediately:

source ~/.bashrc
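
Whether cond is reachable can also be checked programmatically. The sketch below uses Python's standard shutil.which; the helper name locate_cond is hypothetical, and the suggested ~/.local/bin directory is an assumption about where a per-user pip install places scripts.

```python
import os
import shutil


def locate_cond(search_path=None):
    """Return the full path of the `cond` executable, or None if it is not found.

    search_path is a PATH-style string; None means "use the current PATH".
    """
    return shutil.which("cond", path=search_path)


if __name__ == "__main__":
    found = locate_cond()
    if found:
        print("cond found at", found)
    else:
        # Assumption: pip placed the script in the per-user bin directory.
        user_bin = os.path.expanduser("~/.local/bin")
        print("cond not found; try adding this line to ~/.bashrc:")
        print("export PATH={}:$PATH".format(user_bin))
```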

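Setting the Checkpoint Storage Path

Each database starts empty, loads the same dataset, and performs the same number of update operations; a snapshot of the database at that point is saved as its checkpoint. Before a benchmark runs, each database's checkpoint is restored as the starting point of the experiment.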

Create the file experiment_config.sh under treeline/scripts/:

cd treeline/scripts
touch experiment_config.sh

Copy the contents of treeline/scripts/experiment_config_example.sh into experiment_config.sh.

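Edit experiment_config.sh (note that a shell assignment must not have spaces around =):

# Path where checkpoints are stored
DB_CHECKPOINT_PATH=xxxx/llsm-checkpoint
# Path the checkpoint is loaded into
DB_PATH=xxxx/llsm
# Path where custom datasets are stored
TP_DATASET_PATH=xxxx/datasets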

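Modifying the Experiment Configuration

The experiment parameters and the cond run commands are defined in treeline/scripts/ycsb_v2/COND.

By editing WORKLOADS, DBS, DISTRIBUTIONS, THREADS, CONFIGS, and so on in this COND file, you can change which workload types, databases, data distributions, thread counts, record sizes, and other settings are run.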

Example:

WORKLOADS = ["a", "b"]
DBS = [
  "leanstore",
  "pg_llsm",
  "rocksdb",
]
DISTRIBUTIONS = ["zipfian"]
THREADS = [1, 2, 4, 8, 16]
CONFIGS = [CONFIG_64B]

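This runs only workloads A and B, on the three databases listed, with a zipfian data distribution, once for each of the five thread counts (1, 2, 4, 8, and 16), and with a record size of 64 B. In total, 2 × 3 × 1 × 5 × 1 = 30 experiments are executed.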
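
The count of 30 can be reproduced by enumerating the same Cartesian product that the COND file iterates over. This is only an illustrative sketch: the lists are copied from the example above, and CONFIG_64B is reduced to the string "64B", which is an assumption about its name field.

```python
from itertools import product

# Values copied from the example; "64B" stands in for CONFIG_64B (assumed name).
WORKLOADS = ["a", "b"]
DBS = ["leanstore", "pg_llsm", "rocksdb"]
DISTRIBUTIONS = ["zipfian"]
THREADS = [1, 2, 4, 8, 16]
CONFIGS = ["64B"]

combos = [
    (db, config, workload, dist, threads)
    for db, config, workload, dist, threads in product(
        DBS, CONFIGS, WORKLOADS, DISTRIBUTIONS, THREADS
    )
    # Same filter as the COND file: skip the uniform "d" workload.
    if not (workload == "d" and dist == "uniform")
]

print(len(combos))  # 30
```

Adding "uniform" to DISTRIBUTIONS would double the count to 60 here, because neither workload is "d" and the filter therefore removes nothing.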

The COND file also already contains the corresponding cond run commands. The following is the definition of one of them:

run_experiment_group(
  name="synth",
  run="./run.sh",
  experiments=[
    # e.g. synth-pg_llsm-64B-a-zipfian-1
    ExperimentInstance(
      name="synth-{}-{}-{}-{}-{}".format(db, config["name"], workload, dist, threads),
      options={
        **COMMON_OPTIONS,
        **process_config(db, config, SYNTH_DATASET, workload=workload),
        "db": db,
        "checkpoint_name": "ycsb-synth-{}-{}".format(db, config["name"]),
        "threads": threads,
        "gen_template": "workloads/{}.yml".format(workload),
        "gen_distribution": dist,
      },
    )
    for db, config, workload, dist, threads in product(
      DBS,
      CONFIGS,
      WORKLOADS,
      DISTRIBUTIONS,
      THREADS,
    )
    # The uniform and zipfian "d" workloads are the same, so just run one.
    if not (workload == "d" and dist == "uniform")
  ],
  deps=[
    # e.g. :preload-synth-pg_llsm-64B
    ":preload-synth-{}-{}".format(db, config["name"])
    for db, config in product(DBS, CONFIGS)
  ],
)

Here name is the name of the command, and options holds the arguments passed to run.sh (including the experiment parameters).

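After cond run //scripts/ycsb_v2:synth is executed, run.sh is run with the values in options as its arguments. Before run.sh runs, however, the tasks listed in deps are executed first, that is, the checkpoints are loaded.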
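
To make the naming scheme concrete, the format strings from the COND definition can be evaluated on their own. The helper functions below are hypothetical wrappers around those format strings, and "64B" as the config name is inferred from the "e.g." comments in the COND file rather than confirmed.

```python
def experiment_name(db, config_name, workload, dist, threads):
    """Name of one experiment instance, using the format string from the COND file."""
    return "synth-{}-{}-{}-{}-{}".format(db, config_name, workload, dist, threads)


def checkpoint_name(db, config_name):
    """Checkpoint that the experiment starts from (the "checkpoint_name" option)."""
    return "ycsb-synth-{}-{}".format(db, config_name)


def preload_dep(db, config_name):
    """Dependency task that must finish before the experiments run."""
    return ":preload-synth-{}-{}".format(db, config_name)


print(experiment_name("pg_llsm", "64B", "a", "zipfian", 1))
print(checkpoint_name("pg_llsm", "64B"))
print(preload_dep("pg_llsm", "64B"))
```

Note that the dependency and checkpoint names depend only on the database and the config, so all thread counts and workloads for one database share a single preloaded checkpoint.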

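Once the experiment parameters have been modified, substitute a value for command in the "Running the Benchmark" section below to run one particular task or several of them:

command = synth   # test on the synthetic dataset
command = amzn    # test on the amzn dataset
command = osm     # test on the osm dataset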

See the COND file for the other values that command can take.

Running the Benchmark

Run it with the cond command:

cd treeline
cond run //scripts/ycsb_v2:[command]

Results are saved under treeline/cond-out/.