Fundamentals Of Databases
Summary
- Why use Models
- Message to Model Makers
- To Use or Not To Use
- Data Modeling
- Process Modeling
- Data Models
- Example of Data Models
- Relational Model Data Structures
- Relational Model Constraints
- Data Model Operations
- Keys and Identifiers
- Integrity and Consistency
- Null Values
- Surrogates - Things and Names
- ANSI/SPARC
- Physical Data Independence
- Logical Data Independence
- ANSI/SPARC DBMS Framework
- Metadata
Why use Models
Models can be useful when we want to examine or manage part of the real world.
The costs of using a model are often considerably lower than the cost of using or experimenting with the real world itself.
Models also use conventions to increase speed in making a point.
Message to Model Makers
Users of a model must have a certain amount of knowledge in common (conventions).
A model is:
- a means of communication
- only emphasizes selected aspects
- is described in some language
- can be erroneous
- may have features that do not exist in reality
- contour lines on a mountain
To Use or Not To Use
Database Management Systems (DBMS) are good at:
- maintaining data intensive applications (as opposed to process intensive)
- persistent storage of data
- centralized control of data
- control redundancy
- control consistency and integrity
- can you contrive contradictions from within the database itself
- multi-user support
- sharing of data
- data documentation (schema)
- data independence
- control of access and security
- backup and recovery utility
Why not use a DBMS:
- initial investment in hardware, software, and training can be high
- generality is not always needed
- overhead for security, concurrency control, and recovery can be too high
- data and application are simple and stable
- real-time requirements cannot be met by it
- maybe multi-user access is not needed
Data Modeling
The model
represents a perception of structures of reality.
The data modeling process
is to fix a perception of structures of reality and
represent this perception.
In the data modeling process, we select aspects and we abstract.
Process Modeling
The use of the model reflects processes of reality.
Processes may be represented:
- embedded in program code
- executed ad hoc
Data Models
Data models contain:
- data structures
- constraints
- operations
- keys and identifiers
- integrity and consistency
- null values
- surrogates
DB architecture and DBMS architecture are concerned with the layers of the system.
Example of Data Models
A data model is not the same as a model of data
A database is a model of the structures of reality.
A data model is the tool (or formalism) used to create the model.
A data model includes:
- data structures
- integrity constraints
- operations
Relational Model Data Structures
Data is represented in tables. Tables consist of a name, and a matrix where columns a fields (consisting of name and type) and rows are entries.
Example:
Email (varchar(50)) | BirthDate (datetime) | Hometown (varchar(50)) | Salary (integer) |
---|---|---|---|
[email protected] | 1985-11-07 | Atlanta | 10,000 |
[email protected] | 1969-11-28 | Austion | 10,000 |
[email protected] | 1967-11-03 | Portland | 12,000 |
[email protected] | 1988-11-15 | Atlanta | 13,000 |
[email protected] | 1973-03-12 | Portland | 14,000 |
[email protected] | 1988-11-09 | Atlanta | 15,000 |
Degree := Number of columns
Schema := The aspect of the table that is considered stable, or not expected to change over time
State := The aspect of the table that is expected to change over time
Relational Model Constraints
Constraints express rules that cannot be expressed by the data structures alone
Examples:
- Emails must be unique
- Must not be null
- BirthDate must be after 1900-01-01
- Hometown must by city in US
Data Model Operations
Operations support change and retrieval of data.
Keys and Identifiers
Keys are uniqueness constraints
Integrity and Consistency
Integrity answers the question Does the db reflect reality well?
Consistency answers the question Is the db without internal conflicts/contradictions?
Null Values
Email (varchar(50)) | BirthDate (datetime) | Hometown (varchar(50)) | SSRegist | Sex | Salary (integer) |
---|---|---|---|---|---|
[email protected] | 1985-11-07 | Atlanta | Yes | M | 10,000 |
[email protected] | 1969-11-28 | Austion | INAPPLICABLE | F | 10,000 |
[email protected] | 1967-11-03 | Portland | INAPPLICABLE | F | 12,000 |
[email protected] | 1988-11-15 | Atlanta | INAPPLICABLE | F | 13,000 |
[email protected] | 1973-03-12 | Portland | INAPPLICABLE | F | 14,000 |
[email protected] | 1988-11-09 | Atlanta | No | M | 15,000 |
Null values can be used for unknown, but not for Inapplicable
Surrogates - Things and Names
In name-based representation you are what is known about you, no more, no less.
Surrogate keys solve this problem by adding an artificial ID. This allows you to change everything else about a record and still determine if it is the same record.
ANSI/SPARC
We already separate data and schema, why not separate schema into conceptual and internal schemas?
Conceptual Schemas
Conceptual Schema describes all conceptually relevant, general, time-invariant, structural aspects of reality, and excludes aspects of data representation and physical organization, and access.
This is typically how we normally think of schemas.
External Schemas
Describes parts of the information in the conceptual schema in a form convenient to a particular user group’s view. It is derived from the conceptual schema.
This is typically how we think of views.
Internal Schemas
Describes how the information described in the conceptual schema is physically represented to provide the overall best performance.
This is typically how we think of indexes.
Physical Data Independence
Physical data independence is a measure of how much the internal schema can change without affecting the application programs.
This is similar to encapsulation in OOP.
Logical Data Independence
Logical data independence is a measure of how much the conceptual schema can change without affecting the application programs.
ANSI/SPARC DBMS Framework
Framework came out as early as 1975.
Metadata
- Systems Metadata:
- Where did data come from
- How data was changed
- How data are stored
- How data are mapped
- Who owns data
- Who can access data
- Data usage history
- Data usage statistics
- Business Metadata:
- What data are available
- Where are data located
- What the data means
- How to access the data
- Predefined reports
- Predefined queries
- How current the data are
System metadata are critical in a DBMS.
Business metadata are critical in a data warehouse.