Bug 157929

Summary: Lost PUA chars with xlsx/ods formats
Product: LibreOffice
Reporter: yukiguo <yukisama>
Component: Calc
Assignee: Not Assigned <libreoffice-bugs>
Status: RESOLVED INSUFFICIENTDATA    
Severity: normal
CC: ilmari.lauhakangas
Priority: medium    
Version: 7.6.2.1 release   
Hardware: x86-64 (AMD64)   
OS: Windows (All)   
Whiteboard:
Crash report or crash signature:
Regression By:
Attachments: PUAtest.xls
PUAtest2.png

Description yukiguo 2023-10-26 08:57:04 UTC
Description:
Opening a file in Calc and saving it in xlsx/ods format loses PUA characters.

Unicode has three Private Use Areas (PUA):

    U+00E000-U+00F8FF Private Use Area
    U+0F0000-U+0FFFFF Supplementary Private Use Area-A
    U+100000-U+10FFFF Supplementary Private Use Area-B

However, after the original file is saved as ODS and reopened, the PUA characters are gone.
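
For reference, the three ranges listed above can be checked mechanically. The following is a minimal Python sketch; the names `PUA_RANGES`, `is_private_use` and `count_private_use` are hypothetical helpers introduced for illustration and are not part of LibreOffice.

```python
# The three Unicode Private Use blocks listed above, as inclusive ranges.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area (Basic Multilingual Plane)
    (0xF0000, 0xFFFFF),    # Supplementary Private Use Area-A (Plane 15)
    (0x100000, 0x10FFFF),  # Supplementary Private Use Area-B (Plane 16)
)


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in one of the PUA blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def count_private_use(text: str) -> int:
    """Count how many characters of text fall in a PUA block."""
    return sum(1 for ch in text if is_private_use(ch))
```

Comparing `count_private_use` on the cell text before and after a save/reopen cycle would be one way to confirm whether characters were actually dropped.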

I lost a lot of data because of this issue

The same problem occurs when accessing XLSX files

Environment: Windows 10 / LibreOffice 7.6.2.1 x64

Steps to Reproduce:
1. Open the xls/xlsx file containing PUA characters.
2. Save it in ods/xlsx format.
3. Close all files and Calc windows.
4. Open the newly saved ods/xlsx file.
5. All PUA characters are deleted and cannot be recovered.
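
One way to check step 5 independently of Calc's display is to look inside the saved file. This is a rough Python sketch that assumes the .ods file is a ZIP archive whose cell text lives in `content.xml` (as in the OpenDocument format); the function name `private_use_in_ods` is hypothetical, and text stored only in XML attributes is not inspected.

```python
import zipfile
import xml.etree.ElementTree as ET

# Inclusive ranges of the three Unicode Private Use blocks.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFF), (0x100000, 0x10FFFF))


def private_use_in_ods(ods_file) -> list:
    """List the PUA code points found in the element text of content.xml.

    ods_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(ods_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    text = "".join(root.itertext())
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if any(lo <= ord(ch) <= hi for lo, hi in PUA_RANGES)
    ]
```

An empty list for the newly saved file, when the source file is known to contain PUA characters, would suggest they were lost on save rather than merely not rendered.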

Actual Results:
PUA characters are deleted.

Expected Results:
The original characters should be kept.


Reproducible: Always


User Profile Reset: Yes

Additional Info:
test file:
https://ask.libreoffice.org/t/lost-pua-chars-with-xlsx-ods-formats/97477
Comment 1 yukiguo 2023-10-26 09:01:36 UTC
Created attachment 190434 [details]
PUAtest.xls

PUAtest.xls is a file containing PUA chars.
Comment 2 yukiguo 2023-10-26 09:05:46 UTC
Created attachment 190435 [details]
PUAtest2.png

PUAtest2.png shows the result of converting the xls file to ods format.
Comment 3 ajlittoz 2023-10-26 11:51:42 UTC
Before concluding that something is wrong in Calc, it is necessary to check what is in the test file.

Experimenting with Alt-X shows that the problematic cells contain a pair of Unicode code points taken from the Surrogate block, but the order within these pairs is incorrect: the first code point is a low surrogate instead of a high one.

When members of the pair are switched, the surrogate pair is recognised as such and Calc displays an X-crossed rectangle (missing glyph in font) in my 7.5.7.1 under Fedora 38, KDE Plasma desktop.

OP claims the characters are taken from a PUA block but decoding the surrogate pairs (at least in A9 and A10) shows they are somewhere in Plane 2.
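
The decoding mentioned above follows the standard UTF-16 rule for surrogate pairs. Here is a minimal Python sketch of it; the function names are hypothetical, and the sample values in the usage note are illustrative only, not read from the attachment.

```python
def is_high_surrogate(unit: int) -> bool:
    """High (leading) surrogates occupy U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    """Low (trailing) surrogates occupy U+DC00..U+DFFF."""
    return 0xDC00 <= unit <= 0xDFFF


def decode_pair(first: int, second: int):
    """Combine a high/low surrogate pair into a code point.

    Returns None when the units are not in high-then-low order,
    which is the situation described for the problematic cells.
    """
    if not (is_high_surrogate(first) and is_low_surrogate(second)):
        return None
    return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00)


def plane(code_point: int) -> int:
    """Unicode plane number: the bits of the code point above the low 16."""
    return code_point >> 16
```

For example, `decode_pair(0xD840, 0xDC00)` gives 0x20000, which is in Plane 2, while the swapped call `decode_pair(0xDC00, 0xD840)` returns None. A pair meant for the supplementary Private Use Areas would have to start with a high surrogate in the range DB80 to DBFF.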

I'd first suspect an incorrect designation for the intended characters.

More information is needed about the intended characters.
Comment 4 Buovjaga 2023-11-01 18:04:03 UTC
(In reply to ajlittoz from comment #3)
> Before concluding that something is wrong in Calc, it is necessary to check
> what is in the test file.
> 
> From experiment (using Alt-X), problematic cells contain a pair of Unicode
> codepoints taken from the Surrogate block, but order in these pairs is
> incorrect. First codepoint is low surrogate instead of high.
> 
> When members of the pair are switched, the surrogate pair is recognised as
> such and Calc displays an X-crossed rectangle (missing glyph in font) in my
> 7.5.7.1 under Fedora 38, KDE Plasma desktop.
> 
> OP claims the characters are taken from a PUA block but decoding the
> surrogate pairs (at least in A9 and A10) shows they are somewhere in Plane 2.
> 
> I'd first suspect an incorrect designation for the intended characters.
> 
> More information is needed about the intended characters.

NEEDINFO while we wait for the reporter to respond.
Comment 5 QA Administrators 2024-04-30 03:14:21 UTC Comment hidden (obsolete)
Comment 6 QA Administrators 2024-05-31 03:15:27 UTC
Dear yukiguo,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INSUFFICIENTDATA due to inactivity and
a lack of information which is needed in order to accurately
reproduce and confirm the problem. We encourage you to retest
your bug against the latest release. If the issue is still
present in the latest stable release, we need the following
information (please ignore any that you've already provided):

a) Provide details of your system including your operating
   system and the latest version of LibreOffice that you have
   confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED
and we will attempt to reproduce the issue. Please do not:

a) respond via email 

b) update the version field in the bug or any of the other details
   on the top section of our bug tracker

Warm Regards,
QA Team

MassPing-NeedInfo-FollowUp