CSV reports: support for special characters | v5.2.16-C56

Hi,

I noticed that exported report CSV files do not display special characters properly, such as “é” in “Numéro”; their UTF-8 codes are shown instead:

The admin panel displays those characters fine before the export, though:

Would it be possible to decode the UTF-8 codes in exported CSV files so that special characters display correctly? For now I can just fix the spelling in Excel after importing the CSVs, but it would be nice if this were done automatically.

Thanks!

Hi,

We’ll try to correct this. Does it also look wrong if you manually inspect the file using Notepad?


Nope, it’s fine in Notepad!

"Numéro","Montant","Nom de la taxe","Montant de taxe","Date"

So could it be an Excel error then?

It could be related to the encoding selected when opening the file.

Aah, great, that’s exactly it, thanks! Excel automatically detects “1252: West Europe”, which of course is wrong.
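
The mismatch can be reproduced outside Excel. This is a minimal Python sketch, not the application's code; it assumes the export is plain UTF-8 without a BOM and that the reader falls back to Windows-1252, as Excel did here.

```python
# The header cell from the export, encoded the way the CSV file stores it.
raw = "Numéro".encode("utf-8")  # b'Num\xc3\xa9ro': "é" takes two bytes

# What a reader shows when it correctly decodes the bytes as UTF-8.
as_utf8 = raw.decode("utf-8")

# What a reader shows when it wrongly assumes Windows-1252: each of the
# two bytes of "é" is rendered as a separate character.
as_cp1252 = raw.decode("cp1252")

print(as_utf8)
print(as_cp1252)
```

The file itself is correct in both cases; only the second decoding produces the garbled “NumÃ©ro”.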

Would it be possible to tag the CSV files so that other software knows how they are encoded?

I’m not sure that’s possible. Are you aware of a solution for this?

According to my research, a BOM (Byte order mark) has to be set in the CSV to tell other software what its encoding is.

The byte order mark (BOM) is a particular usage of the special Unicode character, U+FEFF BYTE ORDER MARK, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:

  • The byte order, or endianness, of the text stream in the cases of 16-bit and 32-bit encodings;
  • The fact that the text stream’s encoding is Unicode, to a high level of confidence;
  • Which Unicode character encoding is used.

BOM use is optional. Its presence interferes with the use of UTF-8 by software that does not expect non-ASCII bytes at the start of a file but that could otherwise handle the text stream.
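
To make the "magic number" idea concrete, here is a small Python sketch of how a reader might check for a BOM. The function name `sniff_bom` is hypothetical; the byte sequences come from Python's standard `codecs` module.

```python
import codecs

# Known BOM byte sequences. UTF-32 is checked before UTF-16 because the
# UTF-32 LE mark (FF FE 00 00) starts with the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A file without a BOM yields `(None, 0)`, which is the situation where a program like Excel has to guess the encoding.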

More info here:
Byte order mark - Wikipedia

I don’t code PHP, but according to the following post this is how to add a BOM when using the PHP fputcsv() function:

php - Adding BOM to CSV file using fputcsv - Stack Overflow

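$BOM = "\xEF\xBB\xBF"; // UTF-8 byte order mark

if (file_exists($file)) {
    // Existing file: append the new row only (the BOM is already there).
    $fp = fopen($file, 'a');
    if ($fp &&
        fputcsv($fp, $submittedForm) &&
        fclose($fp)) {
        return true;
    }
} else {
    // New file: write the BOM first, then the header row and the first data row.
    $fp = fopen($file, 'w');
    if ($fp &&
        fwrite($fp, $BOM) && // NEW LINE
        fputcsv($fp, $fields) &&
        fputcsv($fp, $submittedForm) &&
        fclose($fp)) {
        return true;
    }
}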

Thanks, that’s interesting!

The ‘presence interferes…’ part is a little concerning though.

Indeed. If adding a BOM is implemented, maybe it could be a toggle option? That way, people using software that is not compatible with a BOM could export a plain CSV, and the rest of us wouldn’t have to guess and manually pick the encoding when importing the values into compatible software.
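
As a rough illustration of such a toggle, here is a minimal Python sketch. The function name `export_csv` and the `with_bom` parameter are hypothetical and not part of the product; the header row is the one from the exported file above.

```python
import csv
import io

UTF8_BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def export_csv(rows, with_bom=True):
    """Serialize rows to UTF-8 CSV bytes, optionally prefixed with a BOM."""
    buf = io.StringIO(newline="")
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
    body = buf.getvalue().encode("utf-8")
    return UTF8_BOM + body if with_bom else body


header = ["Numéro", "Montant", "Nom de la taxe", "Montant de taxe", "Date"]
with_mark = export_csv([header], with_bom=True)   # for Excel and similar
plain = export_csv([header], with_bom=False)      # for BOM-unaware software
```

The two outputs differ only in the three leading bytes, so the toggle would not change the CSV content itself.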

The “ON” option could be default too, IMO, since Excel, the go-to spreadsheet program, does look for an encoding mark when importing CSVs, according to this:

Excel looks for a special signature string at the beginning of a CSV file to determine its encoding. For UTF-8 we can add 3 special bytes to hint the UTF-8 signature (the signature is a type of “BOM” for Byte Order Mark), the actual bytes are: \xEF\xBB\xBF . So the solution to this problem is very simple: add those 3 bytes at the beginning of the CSV file. (Also, remember to remove the bytes before parsing the file into a script.)

Quick Fix for UTF-8 CSV files in Microsoft Excel — Edmundo Fuentes’ Blog
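
The quoted reminder about removing the bytes before parsing matters for scripts that read the export back. A minimal Python sketch, assuming the file is UTF-8 with or without a BOM (the name `read_csv_bytes` is hypothetical):

```python
import csv
import io


def read_csv_bytes(data: bytes):
    """Parse UTF-8 CSV bytes into rows, tolerating an optional leading BOM."""
    # "utf-8-sig" drops a leading EF BB BF if present and otherwise
    # behaves like plain "utf-8".
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


with_bom = b"\xef\xbb\xbf" + '"Numéro","Montant"\r\n'.encode("utf-8")
without_bom = '"Numéro","Montant"\r\n'.encode("utf-8")

rows_a = read_csv_bytes(with_bom)
rows_b = read_csv_bytes(without_bom)
```

Both inputs parse to the same rows. Decoding the first one with plain "utf-8" instead would leave U+FEFF at the very start of the text, ahead of the first cell.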

Thanks, definitely worth considering…
