Relational Database
Introduction:
A database is a shared collection of logically related data, designed to meet the information needs of multiple users in an organization
Features of Database
1.Efficiency
2.Accuracy
3.Ease of Updating
4.Security
5.Controlled Redundancy
Database terms
1.Data
2.Information
3.Relational database terminology
4.Flat database vs. Relational database
5.Database
6.Database Management Systems (DBMS)
7.Structured Query Language (SQL)
Data: Collection of raw facts collected for a specific purpose
Information: Data converted into a meaningful form is called information
Database Terminology:
Attribute: Column or field: Name and data type
Attribute value: Valid value for an attribute
Tuple: Set of attribute values: Row or record
Relation: Table: set of tuples sharing the same attributes
Flat vs Relational Database
Flat database:
All the data is stored into a single table.
Data duplication occurs.
For example: A phone directory
Relational database:
Splits the data into several tables.
Reduces data duplication.
For example: A phone directory (broken down into two tables)
Database:
are the data and the associated rules
DBMS:
Software that manages data and performs certain operations on data
SQL:
DBMS use SQL to Interact with the database, Create new tables, Insert data into tables, Update data, Delete data from tables, Retrieve information across tables
Level of data:
1.Real World Data
2.Metadata
3.Data Occurrences
Real World Data: Objects that exist in reality
For example: Any student, Any employee.
Metadata: Data about data
It tells you about type, size, and number of attributes for a real world object.
For example: Any student:
First Name, Text, 25 character size field,
Last Name, Text, 25 character size field,
Age, Date type, 8 byte size,
Data Occurrences: Data Existence
Actual data about real world objects.
Meeting rules defined at the meta data level.
For example: Any student:
Andy, Murphy, 27/09/1983, 39 lower clanbrassil street - Dublin 8
Adam, Brennan, 27/12/1981, some address here - Dublin 7
Database Architecture:
1.Internal
2.Conceptual
3.External
Internal:
Describes the physical storage structure of the database.
How the data is going to be stored on the disk.
Conceptual:
A complete description of the information content of the database.
Represents all entities, attributes, their relationships, and constraints.
Conceptual schema defines present and future needs of an organization.
External:
Each end user has a data view of the database subject to limited permissions.
View of the data
Can be different, for example DOB
Calculated data, for example age
Barrier/limitations
Restricts user when adding the data or making changes in the data.
Normalization:
Transforming data from a problem into relations while ensuring data integrity and eliminating data redundancy
1NF - Table contains no repeating attributes or groups of attributes and has a primary key.
2NF - Table is in 1NF and has no partial key dependencies.
3NF - Table is in 2NF and there are no non-key transitive functional dependencies.
BCNF - Table is in 3NF and every determinant is a candidate key.
Functions of DBMS:
1.Data Processing
2.A User Accessible Catalog
3.Transaction Support
4.Concurrency Control Services
5.Recovery Services
6.Authorization Services
7.Support for Data Communication
8.Integrity Services
1.Data Processing:
Operations on the data:
Creation of data
Storing of the data,
Providing access to the data,
Placing data in storage devices, etc.
2.A User Accessible Catalog:
DBMS provides access to administrative users about database catalog
3.Transaction Support:
DBMS is responsible for supporting all required operations/actions on the database.
4.Concurrency Control Services:
DBMS provides concurrency control services to make sure all operations are correct
5.Recovery Services:
A database can enter an inconsistent state due to some invalid action.
DBMS should be able to recover -> minimal data loss.
6.Authorization Services:
Who is authorized to access the database?
If a user is authorized then what kind of user actions are authorized?
7.Support for Data Communication:
DBMS allows the communication of the data
8.Integrity Services:
Integrity services make sure that the information is real and correct as defined by the rules
MySQL:
Datatype:
Numeric
String
Date and time
Large storage
Numeric:
Integer – to store whole numbers
Floating point – real numbers
Fixed point – decimal numbers with fixed precision
String:
CHAR(M) – The range is 0 to 255 characters.
VARCHAR(M) – The range of M is 0 to 65,535.
TEXT – If we are not sure about the length!
Date and Time:
DATE – To store dates. For example: 1982-10-23
TIME – To store time values. For example: 10:55:10
DATETIME – To store date/time combinations. For example: 1970-06-18 11:49:59
TIMESTAMP – To store event logging.
Larger storage:
TINYBLOB(M) – Maximum length of 255 bytes.
BLOB(M) – Maximum length of 64K bytes.
MEDIUMBLOB(M) – Maximum length of 16M bytes.
LONGBLOB(M) – Maximum length of 4G bytes.
Constructing an ERD:
Identify Entities:
Remove duplicate entities:
list the attributes of each entity:
Mark the primary keys:
Define relationships:
Describe the cardinality of the relationships:
Remove redundant relationships:
Combine into single diagram.
If there are many to many relationships hiding another entity, introduce a new entity, and turn many to many relationship into 1-M and M-1.
Codds Rule
Rule 1: The Information Rule
All Information in the relational database is represented in exactly one and only one way – by values in tables.
Rule 2: Guaranteed Access Rule
Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name
Rule 3: Systematic Treatment of NULL Values
NULL values (distinct from empty character string or a string of blank characters and distinct from zero or any other number) are supported in the fully relational RDBMS for representing missing information in a systematic way, independent of data type.
Rule 4: Dynamic Online Catalog Based on the Relational Model
The database description is represented at the logical level in the same way as ordinary data, so authorized users can apply the same relational language to its interrogation as they apply to regular data.
Rule 5: Comprehensive Data Sublanguage Rule
A relational system may support several languages and various modes of terminal use.
Rule 6: View Updating Rule
All views that are theoretically updatable are also updatable by the system.
Rule 7: High-Level Insert, Update, and Delete
The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data.
Rule 8: Physical Data Independence
Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods.
Rule 9: Logical Data Independence
Application programs and terminal activities remain logically unimpaired when information preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
Rule 10: Integrity Independence
Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
Rule 11: Distribution Independence
The data manipulation sublanguage of a relational DBMS must enable application programs and terminal activities to remain logically unimpaired whether and whenever data are physically centralized or distributed.
Rule 12: Non-Subversion Rule
If a relational system has or supports a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-time) relational language
Relational Algebra:
Relational Algebra is collection of algebraic operators
Selection:
Horizontal decomposition.
Notation: σcondition relation
Example:
Students with GPA > 3.6:
σGPA > 3.6 Student
Projection:
Vertical decomposition.
Notation: Π attribute names relation
Example: tudents with Fname and Lname:
ΠFname, Lname Student
Cartesian Product:
Cross-product to combine two or more relations.
Notation: Relation1 X Relation2
Example: ID, Fname, Lname of students with GPA > 3.4 who signed up for CS courses.
σStudent.ID==Course.Sid ∧ GPA > 3.4 ∧ Dept. == “CS” (Student X Course)
Union:
Combine information vertically
Duplicate tuples are eliminated.
Notation: Relation1 U Relation2
Example:List of Fname and Lname from Student and Employee
ΠFname, Lname (Student) U ΠFname, Lname (Employee)
Intersection:
Similar information between two relations
Example:Same rules as in Union operation apply here.
Notation: Relation1 ∩ Relation2
Example: Intersection between Fname and Lname of Student and Employee.
ΠFname, Lname (Student) ∩ ΠFname, Lname (Employee)
Difference:
Difference of two relations
Same rules as in Union operation apply here.
Notation: Relation1 - Relation2
Example:Difference between the Fname and Lname of Student and Employee.
ΠFname, Lname (Student) - ΠFname, Lname (Employee)
Database User:
1.Application Programmers
2.End Users
3.Database Administrators (DBA)
Application Programmers:
Write applications for database users.
Know structure of the database.
Know needs of an organization.
End Users
Naïve Users:
Use database applications.
No interaction with other parts of the database.
Not interested in the inner workings of the database.
Sophisticated Users:
Have some additional rights over the Naïve users.
Can access data in more ways.
They need to be aware of the database structure.
Database Administrators (DBA):
Most technical class of database users.
Know how to design and manage the database.
Responsible for making database backups and recovering data in case of problems.
Components of DBMS:
1.User Interface
2.Data Definition Language (DDL) and Data Manipulation Language (DML) Compilers
3.System Catalog/Data Dictionary
4.Runtime Database Processor
5.Concurrency Control, Backup, & Recovery Manager
1.User Interface:
Web Forms
Application Front Ends
SQL Command Line Interface
2. DDL and DML Compilers:
DDL compiler processes schema definitions and stores descriptions of the schema in the system catalog.
DML compiler is responsible for compiling DML commands into object code to access the database.
3.System Catalog/Data Dictionary:
Stores information about the database itself.
Including schema, procedures, functions, triggers, and much more.
Every time a DDL statement is run, the data dictionary becomes modified
4. Runtime Database Processor:
Executes privileged commands while communicating with system catalog.
Executes the compiled queries.
5.Concurrency Control, Backup, & Recovery Manager:
Supports multiple user interaction.
Backup the database (scheduled or nonscheduled).
Recovers the database in case of system crash.