Reaves.dev

v0.1.0

built using

Phoenix v1.7.12

Db Index

Stephen M. Reaves

::

2023-07-18

Notes about Lesson 11 of CS-6400

Summary

Computer Architecture

Main memory is fast, volatile, small, and (relatively) expensive

Disks are permanent, slow, big, and cheap

Why should you care

Time:

That would be like reading a Phonebook:

Cost cmputation:

Disk

Typical high-end pack capacity:

Records

RegularUser(
  Email varchar(50),
  Sex char(1),
  Birthdate datetime,
  CurrentCity varchar(50),
  Hometown varchar(50),
);

Storage used for one record:

Assumptions

Page fault:

Bulk/Extent transfers (e.g. 250 blocks) save seek time and rotational delay, but require more buffer space.

buffer needs to be managed

File Organization

Heap

Unsorted File

Lookup time:

Block pointer: 4 bytes

num of records: 4 million

num of data blocks: ~200,000

file size: ~800MB

Sorting

Sorting helps

Allows us to use binary search

Primary Index

Skip list holding pointers to the beginning of blocks

This reduces the size of data we search in.

GindexABCDdata_aAaAbAcAdindex:A->data_a:ndata_bBaBbBcBdindex:B->data_b:ndata_cCaCbCcCdindex:C->data_c:ndata_dDaDbDcDdindex:D->data_d:ndata_a:D->data_b:Adata_b:D->data_c:Adata_c:D->data_d:A

a dense key means all the key values in the data are in the index.

a sparse index means only a subset are.

Lookup time:

Secondary Index

Same thing as a primary index, but values in the data are not necessarily sorted

GindexABCDdata_dDaDbDcDdindex:D->data_d:ndata_aAaAbAcAdindex:A->data_a:ndata_cCaCbCcCdindex:C->data_c:ndata_bBaBbBcBdindex:B->data_b:ndata_d:e->data_a:Adata_a:e->data_c:Adata_c:e->data_b:A

This is tricky for non-unique fields

Multi-leveled Index

GindexABCDindex_aABindex:A->index_a:Aindex:B->index_a:Bindex_cCDindex:C->index_c:Cindex:D->index_c:Dindex_a:B->index_c:Cdata_aAaAbAcAdindex_a:A->data_a:ndata_bBaBbBcBdindex_a:B->data_b:ndata_cCaCbCcCdindex_c:C->data_c:ndata_dDaDbDcDdindex_c:D->data_d:ndata_a:D->data_b:Adata_b:D->data_c:Adata_c:D->data_d:A

Lookup Time:

Multi-level Index B-Tree

Multi-level indexes can be arranged in a B-Tree.

Insertion, deletion, and udpate operations are implemented to keep tree balanced.

Only rarely does an overflow at a lower level of the tree propagate more than 1-2 levels up.

Static Hashing

Instead of indexing, we can have a hash table

Lookup Time: