I noticed that the CSV files (when exporting reports) do not display special characters properly, such as “é” in “Numéro”, instead displaying their UTF-8 codes.
Would it be possible to support decoding UTF-8 codes in exported CSV files so that special characters display correctly? For now I can just fix the spelling in Excel after importing the CSVs, but it would be nice if this were done automatically.
According to my research, a BOM (byte order mark) has to be written at the start of the CSV to tell other software what its encoding is.
The byte order mark (BOM) is a particular usage of the special Unicode character, U+FEFF BYTE ORDER MARK, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:[1]
The byte order, or endianness, of the text stream in the cases of 16-bit and 32-bit encodings;
The fact that the text stream’s encoding is Unicode, to a high level of confidence;
Which Unicode character encoding is used.
BOM use is optional. Its presence interferes with the use of UTF-8 by software that does not expect non-ASCII bytes at the start of a file but that could otherwise handle the text stream.
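To make the quoted description a bit more concrete, here is a minimal sketch of how a reader could use the BOM to guess the encoding and byte order. The function name `sniff_bom` is made up for illustration; the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# Check longer signatures first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A file without a BOM simply returns `None` here, which is the case where the importing program has to guess or ask the user.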
Indeed. If adding a BOM is implemented, maybe it could be a toggle option? This way, people using software that is not compatible with a BOM could export a plain CSV, and the rest of us wouldn’t have to guess and manually pick the encoding when importing the values into compatible software.
The “ON” option could be default too, IMO, since Excel, the go-to spreadsheet program, does look for an encoding mark when importing CSVs, according to this:
Excel looks for a special signature string at the beginning of a CSV file to determine its encoding. For UTF-8 we can add 3 special bytes to hint the UTF-8 signature (the signature is a type of “BOM” for Byte Order Mark), the actual bytes are: \xEF\xBB\xBF . So the solution to this problem is very simple: add those 3 bytes at the beginning of the CSV file. (Also, remember to remove the bytes before parsing the file into a script.)
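For what it's worth, here is a rough sketch of what the toggle could look like, assuming the export were written in Python (I don't know what the actual exporter uses). `export_csv`, `read_csv` and the `add_bom` flag are hypothetical names. Python's built-in `utf-8-sig` codec writes the three bytes `\xEF\xBB\xBF` when encoding and strips them when decoding, so it covers both halves of the quoted advice.

```python
import csv
import io


def export_csv(rows, add_bom=True):
    """Serialize rows to CSV bytes; add_bom is the hypothetical toggle."""
    # "utf-8-sig" prepends the UTF-8 signature; plain "utf-8" does not.
    encoding = "utf-8-sig" if add_bom else "utf-8"
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode(encoding)


def read_csv(data):
    """Parse CSV bytes, dropping a leading UTF-8 BOM if one is present."""
    # Decoding with "utf-8-sig" also accepts input that has no BOM.
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [["Numéro", "Total"], ["1", "2"]]
with_bom = export_csv(rows)
without_bom = export_csv(rows, add_bom=False)
```

With the toggle on, the output starts with the signature bytes that Excel looks for; with it off, the file is plain UTF-8 for tools that choke on a BOM. Either version round-trips through `read_csv` unchanged.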