Fundamentals Of Databases

Stephen M. Reaves

2023-05-15

Notes about Lesson 1 of CS-6400

Summary

Why use Models
Message to Model Makers
To Use or Not To Use
Data Modeling
Process Modeling
Data Models
Example of Data Models
Relational Model Data Structures
Relational Model Constraints
Data Model Operations
Keys and Identifiers
Integrity and Consistency
Null Values
Surrogates - Things and Names
ANSI/SPARC
Physical Data Independence
Logical Data Independence
ANSI/SPARC DBMS Framework
Metadata

Why use Models

Models can be useful when we want to examine or manage part of the real world.

The costs of using a model are often considerably lower than the cost of using or experimenting with the real world itself.

Models also use conventions to increase speed in making a point.

Message to Model Makers

Users of a model must have a certain amount of knowledge in common (conventions).

A model is:

a means of communication
only emphasizes selected aspects
is described in some language
can be erroneous
may have features that do not exist in reality
- contour lines on a mountain

To Use or Not To Use

Database Management Systems (DBMS) are good at:

maintaining data intensive applications (as opposed to process intensive)
persistent storage of data
centralized control of data
control redundancy
control consistency and integrity
- can you contrive contradictions from within the database itself
multi-user support
sharing of data
data documentation (schema)
data independence
control of access and security
backup and recovery utility

Why not use a DBMS:

initial investment in hardware, software, and training can be high
generality is not always needed
- overhead for security, concurrency control, and recovery can be too high
data and application are simple and stable
real-time requirements cannot be met by it
maybe multi-user access is not needed

Data Modeling

The model represents a perception of structures of reality.

The data modeling process is to fix a perception of structures of reality and represent this perception.

In the data modeling process, we select aspects and we abstract.

Process Modeling

The use of the model reflects processes of reality.

Processes may be represented:

embedded in program code
executed ad hoc

Data Models

Data models contain:

data structures
constraints
operations
keys and identifiers
integrity and consistency
null values
surrogates

DB architecture and DBMS architecture are concerned with the layers of the system.

Example of Data Models

A data model is not the same as a model of data

A database is a model of the structures of reality.

A data model is the tool (or formalism) used to create the model.

A data model includes:

data structures
integrity constraints
operations

Relational Model Data Structures

Data is represented in tables. Tables consist of a name, and a matrix where columns a fields (consisting of name and type) and rows are entries.

Example:

Email (varchar(50))	BirthDate (datetime)	Hometown (varchar(50))	Salary (integer)
[email protected]	1985-11-07	Atlanta	10,000
[email protected]	1969-11-28	Austion	10,000
[email protected]	1967-11-03	Portland	12,000
[email protected]	1988-11-15	Atlanta	13,000
[email protected]	1973-03-12	Portland	14,000
[email protected]	1988-11-09	Atlanta	15,000

Degree := Number of columns

Schema := The aspect of the table that is considered stable, or not expected to change over time

State := The aspect of the table that is expected to change over time

Relational Model Constraints

Constraints express rules that cannot be expressed by the data structures alone

Examples:

Emails must be unique
Must not be null
BirthDate must be after 1900-01-01
Hometown must by city in US

Data Model Operations

Operations support change and retrieval of data.

Keys and Identifiers

Keys are uniqueness constraints

Integrity and Consistency

Integrity answers the question Does the db reflect reality well?

Consistency answers the question Is the db without internal conflicts/contradictions?

Null Values

Email (varchar(50))	BirthDate (datetime)	Hometown (varchar(50))	SSRegist	Sex	Salary (integer)
[email protected]	1985-11-07	Atlanta	Yes	M	10,000
[email protected]	1969-11-28	Austion	INAPPLICABLE	F	10,000
[email protected]	1967-11-03	Portland	INAPPLICABLE	F	12,000
[email protected]	1988-11-15	Atlanta	INAPPLICABLE	F	13,000
[email protected]	1973-03-12	Portland	INAPPLICABLE	F	14,000
[email protected]	1988-11-09	Atlanta	No	M	15,000

Null values can be used for unknown, but not for Inapplicable

Surrogates - Things and Names

In name-based representation you are what is known about you, no more, no less.

Surrogate keys solve this problem by adding an artificial ID. This allows you to change everything else about a record and still determine if it is the same record.

ANSI/SPARC

We already separate data and schema, why not separate schema into conceptual and internal schemas?

Conceptual Schemas

Conceptual Schema describes all conceptually relevant, general, time-invariant, structural aspects of reality, and excludes aspects of data representation and physical organization, and access.

This is typically how we normally think of schemas.

External Schemas

Describes parts of the information in the conceptual schema in a form convenient to a particular user group’s view. It is derived from the conceptual schema.

This is typically how we think of views.

Internal Schemas

Describes how the information described in the conceptual schema is physically represented to provide the overall best performance.

This is typically how we think of indexes.

Physical Data Independence

Physical data independence is a measure of how much the internal schema can change without affecting the application programs.

This is similar to encapsulation in OOP.

Logical Data Independence

Logical data independence is a measure of how much the conceptual schema can change without affecting the application programs.

ANSI/SPARC DBMS Framework

Framework came out as early as 1975.

Metadata

Systems Metadata:
- Where did data come from
- How data was changed
- How data are stored
- How data are mapped
- Who owns data
- Who can access data
- Data usage history
- Data usage statistics
Business Metadata:
- What data are available
- Where are data located
- What the data means
- How to access the data
- Predefined reports
- Predefined queries
- How current the data are

System metadata are critical in a DBMS.

Business metadata are critical in a data warehouse.