Databases. Data independence. Basic concepts

Data independence. Basic concepts. Structures for storing mutable data



1. The concept of data independence. Need for data independence

Data dependence means that the requirements of the application that manages the database determine how the data is organized in secondary storage and how it is accessed. In such a system, changing the representation of the data requires reprogramming the application itself. Data independence is the opposite property: the application is insulated from changes in how the data is stored and accessed.

Data dependence was typical of older systems that represented data in their own formats rather than, for example, in the form of a relational model. Such applications contained program code that embedded knowledge of how the data was organized and how it was accessed, and this is what caused the dependence. As a result, developing and maintaining the application involved additional complications that were not directly related to the problem the application was meant to solve.

For any database system, it is important to ensure data independence.

The reasons for this need are as follows:

  1. Different applications need different representations of the same data. For example, the data may be presented in different formats, with the database system performing the appropriate conversion each time. The database system receives a request from an application (client) and presents the data in the form that application expects. This approach avoids the need to store the data in several formats, which in turn reduces data redundancy.
  2. An administrator must be able to change the physical representation of the data or the method of accessing it without having to modify the client applications that use the data. The database may need to hold new kinds of data, and the performance requirements of applications may change. If applications depend on the data, such changes require changes to the applications, which means extra reprogramming work for programmers.

Ideally, any database system should be designed to provide complete data independence. This independence separates the data model from its implementation. The more completely the data model is separated from the implementation, the greater the data independence. Building systems on SQL (Structured Query Language) is considered a good approach to ensuring data independence.
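The separation described above can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module; the table subscriber, its columns, the sample rows, and the helper find_phone are all assumptions made up for this illustration. The application code only states which data it wants. When the administrator later changes the access method by adding an index, the application code stays the same and returns the same result.

```python
import sqlite3

def find_phone(conn, name):
    # The application states WHAT it wants, not how the rows are located.
    row = conn.execute(
        "SELECT phone FROM subscriber WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

# Hypothetical table and data, used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriber (name TEXT, address TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO subscriber VALUES (?, ?, ?)",
    [("Ivanov", "12 Oak St", "555-0101"),
     ("Petrenko", "7 Elm St", "555-0102")],
)

before = find_phone(conn, "Petrenko")

# The administrator changes the access method by adding an index.
# find_phone() is not modified in any way.
conn.execute("CREATE INDEX idx_subscriber_name ON subscriber (name)")

after = find_phone(conn, "Petrenko")
print(before == after)
```

The index may change how quickly the row is found, but not what the application has to write to find it.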

 

2. Basic components that ensure data independence

To ensure data independence, the data model should be separated from the implementation as much as possible.

The data itself is stored and managed by the database management system (DBMS). To ensure data independence, the following elements need to be implemented in this system:

  • a stored field is the smallest unit of stored data. It holds one significant element of information of a given type. For example, in a relational data model presented as a table, a field corresponds to a column holding data of the same type;
  • a stored record is a collection of related stored fields that describes a single case (fact) to be recorded in the database. The number of records in a database can run to tens, hundreds, thousands, or even millions. As the number of records grows, fast data access becomes more important. Each record is a separate instance (object) holding the values of the corresponding fields, and each field is defined by its data type;
  • a stored file is the set of all stored records of the same type.

Figure 1 shows an example of a stored database containing stored fields, records, and files. As an example, this database uses one table that lists subscribers with their addresses and phone numbers.

As Figure 1 shows, in addition to the basic information, the database stores additional information in the form of files. This is unavoidable, since everything in the information world is based on files.


Figure 1. Stored database
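The relationship between stored fields, stored records, and a stored file can be sketched in a few lines of Python. This is only an in-memory illustration, not how a real DBMS lays out data on disk; the class names StoredField and StoredFile, the field names, and the sample subscriber are assumptions chosen to match the subscriber table described above.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class StoredField:
    # One unit of stored data: a name and the type of its values.
    name: str
    data_type: type

@dataclass
class StoredFile:
    # All stored records of one type, described by the same set of fields.
    fields: tuple
    records: list = field(default_factory=list)

    def append(self, **values):
        # A stored record must supply exactly the declared fields...
        expected = {f.name for f in self.fields}
        if set(values) != expected:
            raise ValueError(f"expected fields {sorted(expected)}")
        # ...and every value must match the type of its field.
        for f in self.fields:
            if not isinstance(values[f.name], f.data_type):
                raise TypeError(f"{f.name} must be {f.data_type.__name__}")
        self.records.append(dict(values))

# Hypothetical subscriber file with three text fields.
subscribers = StoredFile(fields=(
    StoredField("name", str),
    StoredField("address", str),
    StoredField("phone", str),
))
subscribers.append(name="Ivanov", address="12 Oak St", phone="555-0101")
print(len(subscribers.records))
```

Each dictionary in records plays the role of one stored record, and the StoredFile groups records that share the same fields.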

 

3. Basic aspects of storage structures for mutable data

Data independence should protect the application as much as possible from changes that may occur in the data. With active use, the database can grow and evolve. If the data is independent, increasing the amount of data should not adversely affect the performance of the application.

To protect applications that interact with database management systems, the following aspects of the storage structure for mutable data must be considered.

  1. Representation of numeric data. This refers to the format in which numeric fields are stored, for example as a string or in a binary numeric form. Each format has its own parameters (number base, number of decimal places, etc.). The database administrator chooses these to ensure maximum performance.
  2. Representation of character data. If a field holds character data, it is important to use a well-known existing character encoding (ASCII, Unicode, etc.).
  3. Units of numeric data. This refers to the units in which numeric values in the fields of a table are expressed (meters, kilometers, inches, etc.). The stored units may change without the application noticing.
  4. Data encoding. What matters here is how particular categories of data are encoded. For example, colors can be encoded as numbers, bit sets, strings, and the like.
  5. Data materialization and virtual fields. The logical fields seen by the application do not have to be represented in the same way as the stored fields. Some logical fields are calculated, that is, their values are derived from the values of other (stored) fields, so such a logical field has no stored equivalent. For example, it makes no sense to store the sum of stored fields if it can be calculated on the fly and presented as a logical field. Calculated fields are also called virtual fields, and their materialization is called indirect.
  6. Structure of stored records. The structure of stored records can be changed by one of the following operations:
    • combining the fields of several tables into one table. This is the case when previously created tables are integrated into the current database system;
    • splitting the records of a table into several sub-tables. This is the reverse of the previous operation. It is used to optimize performance by moving rarely used parts of the table to slower storage devices.

For example, the following two tables can be combined into one (Figure 2).


Figure 2. Merging tables in case of application integration

Conversely, the fields of the following table can be divided into two parts (Figure 3).


Figure 3. Splitting table records into sub-tables in order to optimize performance
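One way to keep the application unaware of such a split is to hide it behind a view. The following is a minimal sketch using Python's sqlite3 module. The table names subscriber_main and subscriber_extra, the notes column, and the sample rows are assumptions for illustration and are not taken from Figure 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Frequently used fields stay in the main sub-table.
    CREATE TABLE subscriber_main  (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    -- Rarely used fields are moved to a separate sub-table.
    CREATE TABLE subscriber_extra (id INTEGER PRIMARY KEY, notes TEXT);
    -- The view presents the original, unsplit record to the application.
    CREATE VIEW subscriber AS
        SELECT m.id, m.name, m.phone, e.notes
        FROM subscriber_main AS m
        LEFT JOIN subscriber_extra AS e ON e.id = m.id;
""")
conn.execute("INSERT INTO subscriber_main VALUES (1, 'Ivanov', '555-0101')")
conn.execute("INSERT INTO subscriber_extra VALUES (1, 'prefers evening calls')")
conn.execute("INSERT INTO subscriber_main VALUES (2, 'Petrenko', '555-0102')")

# The application queries "subscriber" as if it were a single table.
rows = conn.execute(
    "SELECT name, phone, notes FROM subscriber ORDER BY id"
).fetchall()
print(rows)
```

The same idea works in the other direction: if two tables are merged into one, views with the old table names can keep existing queries working.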

 

  7. Structure of stored files. A database is stored in files, and these files can be placed on storage media in a variety of ways (for example, on the same medium or on different media). There are also various ways to organize access to the stored data: indexes, chains of pointers, or hash tables. Whichever method is chosen, it should let the application run as efficiently as possible while meeting the main requirement of data storage: no storage method should affect the application that works with the data.
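Several of the aspects listed in this section (units of numeric data, data encoding, and virtual fields) come down to a mapping between the stored representation and the logical one. The sketch below shows such a mapping in Python. The record layout, the color code table, and the function name materialize are assumptions invented for this example.

```python
from dataclasses import dataclass

# Hypothetical encoding: colors are stored as small integers.
COLOR_CODES = {1: "red", 2: "green", 3: "blue"}

@dataclass
class StoredPart:
    length_cm: int     # stored in centimeters
    color_code: int    # stored as a numeric code
    price_cents: int   # stored as an integer number of cents
    quantity: int

@dataclass
class LogicalPart:
    length_m: float    # presented in meters
    color: str         # presented as a color name
    price: float       # presented in whole currency units
    total: float       # virtual field: it has no stored equivalent

def materialize(stored: StoredPart) -> LogicalPart:
    # Convert units, decode the color, and compute the virtual field.
    price = stored.price_cents / 100
    return LogicalPart(
        length_m=stored.length_cm / 100,
        color=COLOR_CODES[stored.color_code],
        price=price,
        total=price * stored.quantity,
    )

part = materialize(StoredPart(length_cm=250, color_code=2,
                              price_cents=1250, quantity=4))
print(part)
```

If the stored units or the color encoding change, only materialize and the code table need to be updated. Code that works with LogicalPart is not affected.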

 

