What is LevelDB

LevelDB 是 KV 型数据库，为 BSD 协议，为 Leveling+分区 实现的 LSM 代表，适合写多读少。

特性

Key Value 支持任意 Byte 类型数组，不单单支持字符串
为持久化存储的 KV 系统，大部分数据存在磁盘上
按照 key 值顺序存储数据，按照用户定义的比较函数进行排序
操作接口简单，包括写/读记录以及删除记录，也支持针对多条操作的原子批量操作
支持 snapshot，读写操作互不影响
支持 snappy 压缩，I/O 效率高

源码编译

直接 clone

1	git clone https://github.com/google/leveldb.git

initialize submodule issues#1004

# 一步到位
git clone --recurse-submodules https://github.com/google/leveldb.git

# without re-cloning
git submodule update --init --recursive

编译

cd leveldb
mkdir -p build && cd build 
# Release
cmake -DCMAKE_BUILD_TYPE=Release .. && cmake --build .

# Debug 方便跟踪还是 Debug 伐
cmake -DCMAKE_BUILD_TYPE=Debug .. && cmake --build .

为方便将 levelDB 加入到系统库

# Linux
cp -r ./include/leveldb /usr/include/
cp build/libleveldb.a /usr/local/lib/

# MacOS waiting

基本组件

字节序 Endian

一般都是小端序，将低序字节存在起始位置的为小端序，反之为大端序。以 short 2B 数据作测试:

/* RETURN "Little Endian" */  
union {
    short s;
    char c[sizeof(short)]; 
    // 共享一块内存时 char 是 1B 不需要做额外除法
    // 如果是 int 为 4B，需要 sizeof(int) / 4，内存大小要对齐
  } tmp;

  tmp.s = 0x0102;
  if (tmp.c[0] == 1 && tmp.c[1] == 2) {
    std::cout << "Big Endian" << std::endl;
  } else if (tmp.c[0] == 2 && tmp.c[1] == 1) {
    std::cout << "Little Endian" << std::endl;
  }

slice

slice 为 levelDB 自定义的数据结构，在 include\leveldb\slice.h 中，不用 string 的原因是 CPP 中 string 为拷贝变量，不支持 remove_prefix, starts_with 的操作，不灵活。

// Snippets of `slice.h`

// Private variables:
private:
 const char* data_;
 size_t size_;

// Updated Function

// Return true iff "x" is a prefix of "*this"
 bool starts_with(const Slice& x) const {
   return ((size_ >= x.size_) && (memcmp(data_, x.data_, x.size_) == 0));
 }

 // Drop the first "n" bytes from this slice.
 void remove_prefix(size_t n) {
   assert(n <= size());
   data_ += n;
   size_ -= n;
 }

status

status.h
用于记录 levelDB 中状态信息，保存错误码和对应的字符串错误信息(不支持自定义)。基本组成为:

Status 具有特色的私有变量为:

private:
 // 不支持扩展
 enum Code {
   kOk = 0,
   kNotFound = 1,
   kCorruption = 2,
   kNotSupported = 3,
   kInvalidArgument = 4,
   kIOError = 5
 };
 Status(Code code, const Slice& msg, const Slice& msg2);
 // OK status has a null state_.  Otherwise, state_ is a new[] array
 // of the following form:
 //    state_[0..3] == length of message
 //    state_[4]    == code
 //    state_[5..]  == message
 const char* state_;

具体实现为:

/*status.cc*/
Status::Status(Code code, const Slice& msg, const Slice& msg2) {
  assert(code != kOk);
  const uint32_t len1 = static_cast<uint32_t>(msg.size());
  const uint32_t len2 = static_cast<uint32_t>(msg2.size());
  const uint32_t size = len1 + (len2 ? (2 + len2) : 0);
  char* result = new char[size + 5];
  std::memcpy(result, &size, sizeof(size));
  result[4] = static_cast<char>(code);
  std::memcpy(result + 5, msg.data(), len1);
  if (len2) {
    result[5 + len1] = ':';
    result[6 + len1] = ' ';
    std::memcpy(result + 7 + len1, msg2.data(), len2);
  }
  state_ = result;
}

编码

levelDB 中分为定长和变长编码，变长编码目的是为了减少空间占用，基本思想为：每一个 Byte 最高位 bit 用 0/1 表示该整数是否结束，用剩余 7-bit 表示实际的数值，在 protobuf 中被广泛使用。

if-else 根据数值 v 来判断是否需要继续变长。B 作为最高位来表示整数是否结束。

/* util\coding.cc */
char* EncodeVarint32(char* dst, uint32_t v) {
  // Operate on characters as unsigneds
  uint8_t* ptr = reinterpret_cast<uint8_t*>(dst);
  static const int B = 128;
  if (v < (1 << 7)) {
    *(ptr++) = v;
  } else if (v < (1 << 14)) {
    *(ptr++) = v | B;
    *(ptr++) = v >> 7;
  } else if (v < (1 << 21)) {
    *(ptr++) = v | B;
    *(ptr++) = (v >> 7) | B;
    *(ptr++) = v >> 14;
  } else if (v < (1 << 28)) {
    *(ptr++) = v | B;
    *(ptr++) = (v >> 7) | B;
    *(ptr++) = (v >> 14) | B;
    *(ptr++) = v >> 21;
  } else {
    *(ptr++) = v | B;
    *(ptr++) = (v >> 7) | B;
    *(ptr++) = (v >> 14) | B;
    *(ptr++) = (v >> 21) | B;
    *(ptr++) = v >> 28;
  }
  return reinterpret_cast<char*>(ptr);
}

Option

Option 记录了 levelDB 中的参数信息，用来控制 levelDB 的控制信息
通用(如 block 数据大小和读写设置)：options.h GitHub
LSM存储设置: dbformat.h GitHub

skiplist

常规线段跳表

定义

skiplist 用于代替平衡树，可以看作是并联的有序链表。skiplist 通过概率保证平衡，平衡树采用严格的旋转来保证平衡，因此 skiplist 比较容易实现，并且相比平衡树有着较高的运行效率。

Redis 中也是 skiplist，其默认最大的 level 为64

db/skiplist.h

LevelDB 源码分析 (一) ｜前沿和基本组件