database

btcd数据库设计

块的存储

由于区块链块信息的不变性,btcd将整个比特币的区块想象成一个超级大的平坦文件,块与块相邻存放.
由于文件系统的限制,将这个超级大文件拆分成n多文件块,每块都是512M.
假定我知道我要读的第3000block的起始是1000000000000,大小是3M,那么很简单的算法.
一个文件块是512M字节=51210241024=536870912字节,
因此,我位于第1862(1000000000000/536870912)文件块,文件块内偏移是346361856(1000000000000%536870912)字节,
那么我就读取第1862文件块中的346361856到349507584即可.

当然具体实现还要考虑到缓存以及block跨越文件块的问题

接口

btcd关于数据的设计完全是面向接口的,最终暴露出来的是ffldb/interface.go中的几个接口,
主要是DB,Cursor,Bucket,Tx 全部都是接口. 真正的实现位于ffldb/db.go中

  • Bucket是真正存数据的地方,并且提供了直接读写KV接口
  • 一组读写Bucket使用Tx进行管理,
  • Cursor是用来遍历Bucket

// db represents a collection of namespaces which are persisted and implements
// the database.DB interface.  All database access is performed through
// transactions which are obtained through the specific Namespace.
type db struct {
	writeLock sync.Mutex   // Limit to one write transaction at a time.
	closeLock sync.RWMutex // Make database close block while txns active.
	closed    bool         // Is the database closed?
	store     *blockStore  // Handles read/writing blocks to flat files.
	cache     *dbCache     // Cache layer which wraps underlying leveldb DB.
}
// transaction represents a database transaction.  It can either be read-only or
// read-write and implements the database.Bucket interface.  The transaction
// provides a root bucket against which all read and writes occur.
type transaction struct {
	managed        bool             // Is the transaction managed?
	closed         bool             // Is the transaction closed?
	writable       bool             // Is the transaction writable?
	db             *db              // DB instance the tx was created from.
	snapshot       *dbCacheSnapshot // Underlying snapshot for txns.
	metaBucket     *bucket          // The root metadata bucket.
	blockIdxBucket *bucket          // The block index bucket.

	// Blocks that need to be stored on commit.  The pendingBlocks map is
	// kept to allow quick lookups of pending data by block hash.
	pendingBlocks    map[chainhash.Hash]int
	pendingBlockData []pendingBlock

	// Keys that need to be stored or deleted on commit.
	pendingKeys   *treap.Mutable
	pendingRemove *treap.Mutable

	// Active iterators that need to be notified when the pending keys have
	// been updated so the cursors can properly handle updates to the
	// transaction state.
	activeIterLock sync.RWMutex
	activeIters    []*treap.Iterator
}
// bucket is an internal type used to represent a collection of key/value pairs
// and implements the database.Bucket interface.
type bucket struct {
	tx *transaction
	id [4]byte
}
// cursor is an internal type used to represent a cursor over key/value pairs
// and nested buckets of a bucket and implements the database.Cursor interface.
type cursor struct {
	bucket      *bucket
	dbIter      iterator.Iterator
	pendingIter iterator.Iterator
	currentIter iterator.Iterator
}

这里面实现机制比较复杂,基本思路是用leveldb来存block的元数据信息,真正的block是按照文件进行存储的. 这样的设计好处是规避了leveldb写大量数据时的低性能问题.

同时还有缓存管理,尤其是transaction中使用的treap模块 很有意思,需要专门花时间研究, treap这个数据结构可以做到检索,插入,删除都是O(logn), 是一个比较巧妙的设计.