Introduction to database concepts and terminologies in RDBMS
Greetings, everyone! The aim of this article is to introduce you to the world of RDBMS, which is an acronym for Relational Database Management System. We will see what databases are, why they are used, and how to use them in the R language. The article is divided into two blog posts. This part describes various terms used in RDBMS. The next part will be all about R and how to work with databases in RStudio. Please note that this tutorial is for learning purposes only; it is not meant as preparation for a data analyst or data scientist position.
What is a Database?
A database is a collection of information organized by the type of data involved, such as sales transactions, product catalogs, or personnel records. Wikipedia gives a formal definition of the software that manages one: “In computer science, a database management system (DBMS) is software that interacts with the user, other programs, and the operating system to store and retrieve data”. We can summarize it in yet another way, which I find interesting: a database stores structured data in tables according to the relationships between them. These tables are divided into two types:
Relational Database: Tabular Data
Why are databases used?
The answer might be obvious to some of you, but I will not leave the question without a reply. Databases store data in an organized way so that it can be found whenever it is required, maintain data integrity between tables, and keep related information in one place. These are just some of the many reasons why RDBMSs are used these days across every field of knowledge, from simple personal uses like keeping track of the movies or songs you own up to complex research done by scientists and engineers in subjects such as biology, chemistry, and marketing. A database basically brings organization to work that involves a huge amount of data. Databases can scale up as required and can easily be divided into different sections or containers called tables.
Terminologies used in RDBMS
Every field has its own terminology, and some of it is necessary to know when you go through a particular subject. The same is the case with databases, only you need to know a little more about the terms to understand what databases are and how they work. The following are a few of those terms.
A table is nothing but a container for data, generally with rows and columns similar to an Excel sheet. Each row holds data about one item of the same subject, whereas each column stores one specific piece of information about every row. For example, say we have collected information about people born in each month of the year, from January to December. We might have collected name, age, gender, month of birth, and other information that is common to all of these people. Columns might be named name, date of birth, and gender, while each row contains the data for a different person.
Types of the table:
There are basically two types of tables, namely the flat-file table and the Relational Database Table (RDB). A flat-file table stands alone: it has rows and columns but no structure relating it to other tables. RDB tables, by contrast, can be attached to each other, so the overall structure depends on the number of tables we use to construct it; we can attach any number of tables to each other, at any degree, as per requirements. For example, if we want to store info about movies along with the actors and their characters, we might need three tables: one for movies, another for actors, and one more for the actors’ characters. This makes a structure of three linked tables (informally, a 3-dimensional table), where the actors’ table can be attached to the movie table (2nd degree) and this, in turn, can be attached to the character table (1st degree).
1st Degree: Attached at the outer edge of a structure.
2nd Degree: Attached at the inner edge of a structure.
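To make the movie example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The language choice, the table and column names, and the placeholder rows ("Movie A", "Actor X", and so on) are all assumptions made for illustration, not part of the original example. In practice, tables are usually attached to one another through key columns, so in this sketch the characters table points at both a movie and an actor.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the links

# Three tables: movies, actors, and the characters that connect them.
# All names here are hypothetical.
conn.executescript("""
CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE actors (
    actor_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE characters (
    character_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    movie_id     INTEGER NOT NULL REFERENCES movies(movie_id),
    actor_id     INTEGER NOT NULL REFERENCES actors(actor_id)
);
""")

# Placeholder data, invented for this sketch.
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [(1, "Movie A"), (2, "Movie B")])
conn.executemany("INSERT INTO actors VALUES (?, ?)",
                 [(1, "Actor X"), (2, "Actor Y")])
conn.executemany("INSERT INTO characters VALUES (?, ?, ?, ?)",
                 [(1, "Hero", 1, 1), (2, "Villain", 1, 2), (3, "Detective", 2, 1)])

# Follow the links to read movie, actor, and character together.
rows = conn.execute("""
    SELECT m.title, a.name, c.name
    FROM characters AS c
    JOIN movies AS m ON m.movie_id = c.movie_id
    JOIN actors AS a ON a.actor_id = c.actor_id
    ORDER BY c.character_id
""").fetchall()

for title, actor, character in rows:
    print(f"{title}: {actor} plays {character}")
```

Each fact is stored once (a movie title lives only in movies), and the query reassembles the combined view when it is needed. A flat-file version would have to repeat the movie title and actor name on every character row.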
Relational Database Management System (RDBMS)
I am sure that by now you have an idea of what an RDBMS is and how it works, but I would still like to explain it in simple words so that you understand it even better! A database management system (DBMS) is a software application that interacts with the user, other programs, and the operating system to store and retrieve data. An RDBMS is a DBMS that organizes this data into related tables.
An RDBMS basically consists of two components:
Database
Contains all the tables and views available in it. All the information about how these tables are created and what they contain is kept here as the database schema. A collection of schemas, i.e. related databases such as schema-movies and schema-characters, is what we use to form the single RDBMS, called My Movie Db in this example.
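As a rough illustration of a database keeping track of its own tables and views, the sketch below uses Python's built-in sqlite3 module. SQLite is only one possible RDBMS, chosen here as an assumption because it needs no setup; the movies table, its year column, and the recent_movies view are hypothetical names. SQLite records every table and view in a catalog called sqlite_master, and other systems expose similar information in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table and one view; both names are made up for this sketch.
conn.executescript("""
CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    year     INTEGER
);
CREATE VIEW recent_movies AS
    SELECT title FROM movies WHERE year >= 2000;
""")

# The database describes its own contents in the sqlite_master catalog.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(movies)")]

print(catalog)
print(columns)
```

The catalog lists what exists in the database, and the column listing shows how one table is laid out, which is the kind of information the schema holds.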
Data Manipulation Language (DML)
Data manipulation language (DML) refers to a language used by an application program or end-user to retrieve and modify data stored in a database or files on a disk.
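In SQL, the usual statements for this are INSERT, UPDATE, DELETE, and SELECT (some references file SELECT under a separate query language). The sketch below runs each of them through Python's built-in sqlite3 module, reusing the people-and-birth-month idea from earlier. The table name, the column names, and the three sample people are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Creating the table is data definition, not data manipulation;
# it is only here so the statements below have something to work on.
conn.execute("CREATE TABLE people (name TEXT, gender TEXT, birth_month TEXT)")

# INSERT: add rows (sample data invented for this sketch).
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Asha", "F", "Jan"), ("Ben", "M", "Jan"), ("Chen", "M", "Dec")],
)

# UPDATE: change a value in an existing row.
conn.execute("UPDATE people SET birth_month = 'Feb' WHERE name = 'Ben'")

# DELETE: remove a row.
conn.execute("DELETE FROM people WHERE name = 'Chen'")

# SELECT: retrieve what is left.
remaining = conn.execute(
    "SELECT name, birth_month FROM people ORDER BY name"
).fetchall()

print(remaining)
```

The same four statements work, with small dialect differences, in most relational systems, so the pattern carries over once you move to a different database or language.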
Conclusion:
I hope this post helps you understand what databases are, how they work, and what role they play in today’s world. There are other components that complete the RDBMS architecture, but for now I will leave those for you to discover if you want or need to know more.