How to solve UnicodeDecodeError: ‘UTF-8’ codec can’t decode bytes
So, I just hit this error while reading a CSV file with Python:
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 7250-7251: invalid continuation byte
I was opening a .csv file with:
with open('filapath/filename.ext') as file:
    print(file.read())
It's important to note that leaving out the encoding and the access mode means mode='r' (read, text mode) and the platform's default encoding, which is what locale.getpreferredencoding(False) returns. On my machine that default is UTF-8, as the error message shows.
That means the code above is effectively the same as passing encoding='utf-8' explicitly in the open() call, which is what you would normally do to read special characters:
with open('filapath/filename.ext', 'r', encoding='utf-8') as file:
    print(file.read())
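To check what open() falls back to on a given machine, something like the sketch below can help. The file name sample.csv and its contents are made up for illustration; the point is only to print the default encoding and to show a read with the encoding stated explicitly.

```python
import locale
import tempfile
from pathlib import Path

# The encoding open() uses in text mode when none is passed.
# This is platform dependent: often UTF-8 on Linux/macOS, often cp1252 on Windows.
default_encoding = locale.getpreferredencoding(False)
print("default encoding:", default_encoding)

# Write a small UTF-8 file (hypothetical name and contents), then read it back
# with the encoding spelled out so the result does not depend on the platform.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.csv"
    path.write_text("name,city\nZoë,Málaga\n", encoding="utf-8")
    with open(path, "r", encoding="utf-8") as file:
        text = file.read()

print(text)
```

Spelling out the encoding is the safer habit, since the same script can otherwise behave differently from one machine to another.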
So, what's UTF-8?
Because obviously Python can't read some of the file's characters with it.
This reminded me of the HTML metadata that declares utf-8 inside the <head> tags.
UTF-8 (Unicode Transformation Format, 8-bit) is an encoding standard. It translates every Unicode character (letters, digits, symbols) into a sequence of one to four bytes.
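A quick way to see this is to encode a few characters and count the bytes. The second half of this sketch is my guess at what happened with my file, not something I have confirmed: text saved in another encoding (Latin-1 is assumed here) and then decoded as UTF-8. The sample strings are made up.

```python
# UTF-8 uses between one and four bytes per character.
samples = ["A", "é", "€", "\U0001F600"]  # the last one is an emoji
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch in samples:
    print(repr(ch), ch.encode("utf-8"))

# Assumption for illustration: the text was saved as Latin-1, where "é" is
# the single byte 0xE9. In UTF-8, 0xE9 announces a three-byte character, so
# the space that follows it is not a valid continuation byte.
raw = "café au lait".encode("latin-1")
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    reason = err.reason
    position = err.start
    print(err)

# Decoding with the encoding the bytes were actually written in works.
print(raw.decode("latin-1"))
```

If the real encoding of a file is unknown, this only shows the mechanism. It does not tell you which encoding the file was written in.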
ASCII was the first character encoding standard used on the web, although it predates the web itself. It defined 128 different characters that could be used on the internet. Then came ANSI (the Windows code pages such as Windows-1252), which added special characters, later…