Python. Files. General concepts. Opening/closing a file. Functions open(), close()









1. General concepts about files

In Python programs, you can work with files, which are one of the basic built-in object types. A file is a named area of data on a storage device (non-volatile memory). The file system is managed by the operating system. For each opened file, Python creates a file object that provides access to the file. The file can be located on any storage device of the computer (hard drive, flash drive, etc.).

Any file has a name, which can be full (absolute) or abbreviated (relative). A full name (the full path to the file) includes:

  • the media (resource) on which the file is located, for example the drive C: or D:;
  • a list of directories (folders) from top to bottom, separated on Windows by the ‘\’ (backslash) character. An example of a full name: “C:\ABC\myfile.txt”.

If an abbreviated name is used, it contains a relative path, namely:

  • a list of directories (folders) relative to the current working directory of the program. This is often, but not necessarily, the directory that contains the *.py source file being run. An example of an abbreviated name is “TEXT\myfile.txt”.
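The difference between full and abbreviated names can be checked with the standard pathlib module. This is a minimal sketch: PureWindowsPath only manipulates strings and does not access the disk, so the example paths from the text do not need to exist.

```python
from pathlib import Path, PureWindowsPath

# The two example names from the text, in Windows notation
rel = PureWindowsPath(r"TEXT\myfile.txt")
full = PureWindowsPath(r"C:\ABC\myfile.txt")
print(rel.is_absolute(), full.is_absolute())   # False True

# A relative path is resolved against the current working directory
local = Path("TEXT") / "myfile.txt"
resolved = Path.cwd() / local
print(resolved)
```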

A file can contain any information that is interpreted by different programs in different ways depending on the format. A file format is a way of encoding information with a view to its further efficient use and storage. Different programs use different file formats. Many widespread file formats are standardized and published.

 

2. Features of working with files in Python

When working with files in Python, the following features can be distinguished:

  • files in Python belong neither to numbers, nor to sequences, nor to mappings;
  • data read from a file is returned as a string. This is str in text mode and bytes in binary mode. To get objects of other types (numbers, lists, etc.), you must convert the data explicitly. The same applies to writing: the write methods accept only strings that are already formed, so other objects must be converted to strings first;
  • calling the close() method is optional. An open file is closed automatically when its file object is destroyed or when the program ends. However, closing the file explicitly with close() is recommended (see section 6);
  • files provide I/O buffering. By default, output to files passes through intermediate buffers (memory areas of a predetermined size). During a write operation, the data is first placed in the buffer. It reaches the file later, for example when the buffer fills up or when the flush() or close() method is called. If desired, buffering can be disabled, though only for binary files. The buffering mechanism is described in more detail in section 4;
  • files support positioning. The tell() method returns the current read/write position in the file, and the seek() method changes it.
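The sketch below illustrates data conversion and positioning. It writes a file in a temporary directory under the hypothetical name numbers.txt. The numbers are converted to strings before writing and back to int after reading. Then seek() moves the position back to the start of the file.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")

values = [10, 20, 30]
f = open(path, "w")
for v in values:
    f.write(str(v) + "\n")       # numbers must be converted to str before writing
f.close()

f = open(path, "r")
first = f.readline()             # '10\n' - a string, not the number 10
pos = f.tell()                   # current position after the first line
rest = [int(line) for line in f.readlines()]   # convert the remaining lines back to int
f.seek(0)                        # move the position back to the beginning
again = f.readline()             # the first line is read once more
f.close()

print(repr(first), pos, rest, repr(again))
```

In text mode, treat the value returned by tell() as an opaque marker that you pass back to seek(). Do not rely on it as a byte count, because newline translation differs between platforms.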

 

3. Types of files in Python. Text files. Binary files

In the Python programming language, as in other programming languages, there are the following types of files:

  • text files. Their contents are represented as strings of type str. When such files are read or written, Python automatically decodes/encodes Unicode characters using the chosen encoding and translates end-of-line characters;
  • binary files. Their contents are represented as objects of type bytes. This data is transferred to and from the file as is, without any encoding or end-of-line translation.
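You can see the difference between the two types by writing a string in text mode and then reading the same file back in both modes. In this sketch, demo.txt is a hypothetical file name. UTF-8 and newline='\n' are set explicitly so that the stored bytes do not depend on the platform.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")   # hypothetical file name

# Text mode: the str is encoded to bytes using UTF-8
f = open(path, "w", encoding="utf-8", newline="\n")
f.write("café\n")
f.close()

# Text mode: the bytes are decoded back into a str
f = open(path, "r", encoding="utf-8")
text = f.read()
f.close()

# Binary mode: the raw bytes are returned unchanged
f = open(path, "rb")
raw = f.read()
f.close()

print(type(text).__name__, len(text))        # str 5
print(type(raw).__name__, len(raw), raw)     # bytes 6 b'caf\xc3\xa9\n'
```

The text contains 5 characters, but the file stores 6 bytes, because UTF-8 encodes the character é with two bytes.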

 



4. Opening a file. Function open(). General form

To access a file, you must first open it. After the file is opened, you can read from it or write information to it. A file is opened with the open() function, which returns the corresponding file object. If the file cannot be opened, an OSError exception is raised. This exception, or one of its subclasses such as FileNotFoundError, is raised when I/O errors occur, for example when the file is not found.
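Below is a minimal sketch of handling an open() failure. The path points to a hypothetical file, no_such_file.txt, inside a freshly created empty temporary directory, so the file is guaranteed not to exist.

```python
import os
import tempfile

missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")  # hypothetical, never created

caught = None
try:
    f = open(missing, "r")
except FileNotFoundError as e:           # a subclass of OSError
    print("Cannot open file:", e.filename)
    caught = type(e)
else:
    f.close()

print(issubclass(FileNotFoundError, OSError))   # True
```

If you catch OSError instead of FileNotFoundError, the handler also covers other failures, such as PermissionError.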

According to the Python documentation, the open() function has the following general form:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

where

  • file is a string, a path-like object (such as pathlib.Path) or an integer. If a string is specified, file determines the path to the file (full or abbreviated file name), for example: ‘C:\1.txt’, ‘file.txt’. If an integer is specified, file is a file descriptor. The file parameter is required in the open() function. All other parameters are optional;
  • mode – an optional string that sets the file opening mode. By default, mode = ‘r’, which means that the file is opened for reading in text mode. The possible mode values are described in section 5. For text mode, the ‘t’ character may be omitted. If the file is opened in binary mode, ‘b’ must be added to the mode value;
  • buffering is an optional integer that sets the buffering policy. Buffering means that information read from a file (or written to a file) first passes through a special area of memory called the buffer. A correctly chosen buffer size can speed up programs that work with files intensively. The buffering parameter can take one of the following values:
    • buffering = 0. In this case, buffering is disabled. The value 0 can only be set for binary files (mode symbol ‘b’).
    • buffering = 1 – line buffering, which is available only in text mode. The buffer is flushed each time a newline character is written.
    • buffering > 1 – buffering is enabled for both text and binary files. The value of buffering specifies the size of a fixed buffer in bytes.
    • buffering is not set – this is the default case when buffering = -1. If you do not specify buffering values, the buffering policy works as follows:
      • 1. For binary files, a fixed-size buffer is used. Its size is typically chosen from the block size of the underlying device, falling back on io.DEFAULT_BUFFER_SIZE.
      • 2. So-called “interactive” text files (files for which isatty() returns True) use line buffering (buffering = 1). Other text files use the same buffering as the binary files described above;
  • encoding is the encoding name used to encode or decode a file. This option is used only for text files. If encoding is not specified, then the default encoding is used, which is platform dependent;
  • errors is an optional string parameter that defines how encoding and decoding errors are handled. This option is used only for text files. The errors parameter can take one of the values: ‘strict’, ‘ignore’, ‘replace’, ‘surrogateescape’, ‘xmlcharrefreplace’, ‘backslashreplace’, ‘namereplace’. For more information on encoding error handling, see the Python documentation;
  • newline – an optional parameter that controls the operation of the universal newline mode. This parameter is used only in text mode. The newline parameter can take one of the following values: None, ‘\n’, ‘\r’ or ‘\r\n’. For more information on using newlines in different modes, see the Python documentation;
  • closefd – an optional parameter that matters only if a file descriptor is given instead of a file name. If a file name is given, closefd must be True (the default value), otherwise an error occurs. If a file descriptor is given, two situations are possible:
    • closefd = False. The underlying file descriptor remains open after the file is closed.
    • closefd = True. The underlying file descriptor is closed when the file is closed.
  • opener – an optional parameter that specifies a custom callable used to open the file. It is called as opener(file, flags) and must return an open file descriptor. By default, opener = None. Passing os.open as opener gives similar behavior.
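The following sketch tries out the encoding, errors, opener and closefd parameters. The file name params.txt and the function logging_opener are hypothetical. The byte b'\xff' is used only because it is not valid UTF-8.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "params.txt")   # hypothetical file name

# Write raw bytes; b'\xff' cannot be decoded as UTF-8
f = open(path, "wb")
f.write(b"abc\xff\n")
f.close()

# errors='replace': undecodable bytes become U+FFFD
f = open(path, "r", encoding="utf-8", errors="replace")
replaced = f.read()
f.close()

# Default errors='strict': the same read raises UnicodeDecodeError
strict_failed = False
f = open(path, "r", encoding="utf-8")
try:
    f.read()
except UnicodeDecodeError:
    strict_failed = True
finally:
    f.close()

# opener: a hypothetical custom callable that receives (path, flags)
# and must return an open file descriptor
def logging_opener(p, flags):
    print("opening", p, "with flags", flags)
    return os.open(p, flags)

f = open(path, "rb", opener=logging_opener)
data = f.read()
f.close()

# closefd=False: closing the file object leaves the descriptor open
fd = os.open(path, os.O_RDONLY)
f = open(fd, "rb", closefd=False)
first = f.read(3)
f.close()
os.lseek(fd, 0, os.SEEK_SET)     # the descriptor is still usable
still_usable = os.read(fd, 3)
os.close(fd)                     # the program must close it itself
```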

 

5. Parameter mode. File open modes

Listed below are the characters that can be specified in the mode parameter of the open() function to define the file opening mode. The characters can be combined, for example ‘rb’, ‘rt’, ‘wb’, ‘wt’, ‘r+b’, ‘w+b’.

File open modes depending on mode value:

  • ‘r’ – the file is opened for reading (default mode);
  • ‘w’ – the file is opened for writing. If a file with the same name already exists, then the previous content of the file is destroyed;
  • ‘x’ – the file is opened for exclusive creation. If a file with this name already exists, a FileExistsError exception is raised;
  • ‘a’ – the file is opened for appending: new information is written at the end of the file. If the file does not exist, it is created;
  • ‘b’ – binary file type. This value can be combined with the values ‘r’, ‘w’, ‘x’, ‘a’;
  • ‘t’ – text file type. This value can be combined with ‘r’, ‘w’, ‘x’, ‘a’. The text file type is set by default;
  • ‘+’ – open the file on the disk for updating (reading/writing);
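  • ‘U’ – universal newline mode, in which all newline conventions are recognized: ‘\n’ (Unix), ‘\r\n’ (Windows) and ‘\r’ (classic Macintosh). This mode is obsolete. It was deprecated since Python 3.3 and removed in Python 3.11, where it raises ValueError. Use the newline parameter instead.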
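The sketch below shows how the ‘x’, ‘a’ and ‘w’ modes differ on the same file. The file name log.txt is hypothetical and is created in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")   # hypothetical file name

f = open(path, "x")              # exclusive creation succeeds: the file is new
f.write("first\n")
f.close()

exists_error = False
try:
    open(path, "x")              # the file now exists, so this fails
except FileExistsError:
    exists_error = True

f = open(path, "a")              # append: the old content is kept
f.write("second\n")
f.close()

f = open(path, "r")
content = f.read()               # 'first\nsecond\n'
f.close()

f = open(path, "w")              # 'w' truncates the existing content
f.write("third\n")
f.close()

f = open(path, "r")
after_w = f.read()               # 'third\n'
f.close()
```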

 

6. Close the file. Function close()

After the file has been opened and processed (read from or written to), it should be closed. The file is closed with the close() method. Calling close() breaks the connection between the file object and the external file.

In Python, calling close() is not strictly necessary, because the file is closed automatically when its file object is destroyed. However, calling close() is recommended when the program keeps running after the file is no longer needed, which is typical for large programs.

Calling the close() method frees the resources that the system allocated for the file object. If the file was opened with buffering, any data remaining in the buffers is first written (flushed) to the file, and then the buffers are released. In large software systems, calling close() is recommended so that the system has more free resources for its work.

We recommend the following file usage algorithm:

  • open file;
  • implement file operations (read/write);
  • if the file is no longer needed, then close this file with the close() function.

Example.

If the file was opened as follows:

f = open('myfile.txt', 'r')

then after using this file, close it with the close() function

f.close()

which breaks the link to the external file.

If you call file methods (read or write) after calling close(), a ValueError exception is raised.
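You can check the behavior after close() as follows. The sketch also shows the with statement, a standard Python alternative to calling close() by hand. It closes the file automatically when the block ends, even if an exception occurs inside the block. The file name myfile.txt is placed in a temporary directory for the illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")   # hypothetical file

f = open(path, "w")
f.write("data\n")
f.close()
print(f.closed)                  # True

error_raised = False
try:
    f.write("more\n")            # I/O on a closed file
except ValueError as e:
    print("ValueError:", e)      # I/O operation on closed file.
    error_raised = True

# The with statement calls close() automatically when the block ends,
# even if an exception is raised inside the block
with open(path, "r") as g:
    text = g.read()
print(g.closed)                  # True
```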

 

7. Functions open(), close(). Opening/closing a file for different types of files

The following are examples that include processing text and binary files using the following operations:

  • opening a file with the open() function;
  • processing of files – reading/writing files;
  • closing the file with the close() function.

 

7.1. Opening in text format
7.1.1. Reading from a file, modes ‘r’, ‘rt’. Example

 

# Open file for reading in text mode
# Method 1. Open the file 'myfile1.txt' - an abbreviated path to the file
f1 = open('myfile1.txt', 'r')

# Method 2. Open the file that is located on the path c:\2\myfile2.txt
#           'rt' - reading a file in text mode
f2 = open('c:\\2\\myfile2.txt', 'rt')

# After opening, you can work with the contents of the file
# ...

# For example, read some line from the file myfile1.txt
s1 = f1.readline()
print(s1)

s2 = f2.readline() # Read line from myfile2.txt file
print(s2)

# Close files (optional in Python)
f1.close()
f2.close()
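Method 2 writes the path with doubled backslashes because, in an ordinary string literal, a backslash starts an escape sequence. For example, '\2' is read as the character with code 2. A raw string (prefix r) is an equivalent way to write such paths. The sketch below only compares strings and does not access the disk, so the Windows path c:\2\myfile2.txt does not need to exist.

```python
from pathlib import PureWindowsPath

escaped = 'c:\\2\\myfile2.txt'    # backslashes doubled, as in Method 2
raw = r'c:\2\myfile2.txt'         # raw string: backslashes are kept as written
print(escaped == raw)             # True

# Without doubling, '\2' is an escape sequence for the character with code 2
print(len('\2'), ord('\2'))       # 1 2

# pathlib can split a Windows path into parts without touching the disk
p = PureWindowsPath(raw)
print(p.drive, p.name)            # c: myfile2.txt
```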

 

7.1.2. Writing to a file, modes ‘w’, ‘wt’

To open a file for writing in text format, you need to use the mode designation ‘w’ or ‘wt’.

# Open a text file for writing,
# you can also set 'wt'
f3 = open('myfile3.txt', 'wt')

# Write to file
# ...

# For example, write multiple lines to a file
f3.write("using static System.Console;\n")
f3.write("class Program\n")
f3.write("{ }\n")

# Close the file
f3.close()

 

7.2. Opening in binary mode
7.2.1. Modes ‘wb’, ‘w+b’ – write information to files

 

# Binary mode
# Writing a list to a file

# For convenient work in binary format
# it is advisable to use the capabilities of the pickle module
import pickle

# The specified list
A = [1, True, 2.88]

# Open files for writing
f1 = open('myfile1.bin', 'wb')
f2 = open('myfile2.bin', 'w+b')

# method dump() - writes an object to a file
pickle.dump(A, f1)
pickle.dump(A, f2)

# Close files associated with objects f1, f2
f1.close()
f2.close()

  

7.2.2. Modes ‘rb’, ‘r+b’ – read information from files

 

# Binary mode
# Reading a list from file

# For convenient work in binary format
# it is advisable to use the capabilities of the pickle module
import pickle

# Read list from file
f1 = open('myfile1.bin', 'rb')
f2 = open('myfile2.bin', 'r+b')

# load() method - loads data from a file
B1 = pickle.load(f1)
B2 = pickle.load(f2)

print("B1 = ", B1)
print("B2 = ", B2)

f1.close()
f2.close()

The result of the program

B1 = [1, True, 2.88]
B2 = [1, True, 2.88]

 

8. Features of opening a file for updating in binary format. Modes ‘w+b’, ‘r+b’

If the mode parameter of the open() function contains the ‘+’ character, the file is opened for updating (both reading and writing). For binary files, two options for the mode parameter are possible:

  • mode = ‘w+b’. The file is created if it does not exist, or truncated to 0 bytes if it does;
  • mode = ‘r+b’. The file is opened without truncation, and its previous contents are preserved. The file must already exist. The read/write position is initially set to the beginning of the file.
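The sketch below shows both update modes. After ‘w+b’ writes the data, seek(0) is needed to read it back. ‘r+b’ then starts at position 0 and overwrites the first bytes in place. The file name data.bin is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")   # hypothetical file

f = open(path, "w+b")        # creates/truncates the file; reading and writing allowed
f.write(b"ABCDEF")
f.seek(0)                    # move back to read what was just written
written = f.read()
f.close()

f = open(path, "r+b")        # no truncation; the position starts at 0
start = f.tell()
f.write(b"xy")               # overwrites the first two bytes
f.seek(0)
updated = f.read()
f.close()

print(written, start, updated)   # b'ABCDEF' 0 b'xyCDEF'
```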

 

9. An example that demonstrates different cases of buffering policy (value buffering)

 

# Reading the file. Demonstration of buffering.
# Different buffering parameter values

# --------------------------------------
# 1. buffering = 0
# 1.1. For text files this is an error: ValueError with the message
# "can't have unbuffered text I/O"
# f1 = open("myfile1.txt", buffering=0) - allowed only in binary mode

# 1.2. For binary files
f1 = open("myfile1.bin", mode='rb', buffering=0)
buffer = f1.read() # read the information
print("Read from myfile1.bin. buffer = ", buffer)   # display it
f1.close()

# -------------------------------------
# 2. buffering = 1
# 2.1. For text files
f1 = open("myfile1.txt", buffering=1)
s = f1.readline() # read a line
print("Read from myfile1.txt. s = ", s)
f1.close()

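# 2.2. For binary files (line buffering is not supported in binary mode:
# Python 3.8+ issues a RuntimeWarning and uses the default buffer size)
f1 = open("myfile1.bin", mode='rb', buffering=1)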
buffer = f1.read()
print("Read from myfile1.bin. buffer = ", buffer)
f1.close()

# --------------------------------------
# 3. buffering = 128 - the buffer size is 128 bytes
f1 = open("myfile1.txt", buffering=128)
f2 = open("myfile1.bin", 'rb', buffering=128)
s = f1.readline() # read the string
print("Read from myfile1.txt. s = ", s)
buffer = f2.readline()
print("Read from myfile1.bin. buffer = ", buffer)

f1.close()
f2.close()

The result of the program

Read from myfile1.bin. buffer =   b'\x80\x03]q\x00(K\x01\x88G@\x07\n=p\xa3\xd7\ne.'
Read from myfile1.txt. s = #include <iostream.h>

Read from myfile1.bin. buffer = b'\x80\x03]q\x00(K\x01\x88G@\x07\n=p\xa3\xd7\ne.'
Read from myfile1.txt. s = #include <iostream.h>

Read from myfile1.bin. buffer = b'\x80\x03]q\x00(K\x01\x88G@\x07\n'
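You can observe the effect of the write buffer directly. The sketch below writes a few bytes and reads the file through a second file object, both before and after calling flush(). It assumes that the default buffer is larger than the 3 bytes written, which holds for the default policy. The file name buffered.bin is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered.bin")   # hypothetical file

# Default buffering: data stays in the buffer until flush() or close()
w = open(path, "wb")
w.write(b"abc")
r = open(path, "rb")
before_flush = r.read()      # b'' - nothing has reached the file yet
w.flush()
r.seek(0)
after_flush = r.read()       # b'abc'
w.close()
r.close()

# buffering=0: every write() goes straight to the operating system
w = open(path, "wb", buffering=0)
w.write(b"xyz")
r = open(path, "rb")
unbuffered = r.read()        # b'xyz'
w.close()
r.close()
```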

 

10. How to display the default buffer size? Example

The default buffer size in bytes is stored in the io.DEFAULT_BUFFER_SIZE constant.

import io
print("Buffer size =", io.DEFAULT_BUFFER_SIZE)

The result of the program

Buffer size = 8192
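The line_buffering attribute of a text file object shows whether the file uses line buffering. A short sketch follows; the file name lines.txt is hypothetical. A regular disk file is not interactive, so with the default policy it does not use line buffering.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")   # hypothetical file

f = open(path, "w", buffering=1)     # line buffering requested explicitly
explicit = f.line_buffering
f.close()

g = open(path, "w")                  # default policy for a regular file
interactive = g.isatty()
default = g.line_buffering
g.close()

print(explicit, interactive, default)          # True False False
print("Buffer size =", io.DEFAULT_BUFFER_SIZE)
```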

 

