As part of setting up this system for a friend, I wanted to test how Unicode data is handled.
It turns out there are two bugs when receiving messages and one when sending. Each of them causes the text to be truncated at the first character the code mishandles.
For receiving, any code point at U+8000 or above causes truncation if the char data type is signed and sizeof(unsigned int) > 2. I believe this holds on most systems: plain char is signed on x86, for example, although it is unsigned by default on ARM Linux. The reason is that sign extension takes place here:
Because of the sign extension, c > 0xffff whenever sizeof(c) > 2, so this function returns 0 and the string is truncated.
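The following is a minimal sketch of the effect, not the smstools code itself. It assumes the big-endian UCS2 bytes are combined into an unsigned int as (high byte << 8) | low byte. The function names combine_naive and combine_fixed are hypothetical, and signed char is used explicitly so that the demonstration does not depend on whether plain char is signed on the build platform.

```c
#include <assert.h>

/* Hypothetical: combine the two bytes of one big-endian UCS2 unit the way
 * the buggy code effectively does. The high byte is converted straight to
 * unsigned int; if it is negative (byte value 0x80..0xFF stored in a signed
 * char), it is sign-extended first, e.g. -128 becomes 0xFFFFFF80 when
 * unsigned int is 32 bits wide, and the shifted result exceeds 0xFFFF. */
static unsigned int combine_naive(signed char hi, unsigned char lo)
{
    return ((unsigned int)hi << 8) | lo;
}

/* Hypothetical fix: convert through unsigned char first, so only the
 * 8 bits of the byte survive and the result stays within 0..0xFFFF. */
static unsigned int combine_fixed(signed char hi, unsigned char lo)
{
    unsigned int c = ((unsigned int)(unsigned char)hi << 8) | lo;
    assert(c <= 0xFFFFu); /* always holds: both operands are 8-bit values */
    return c;
}
```

With a 32-bit unsigned int, combine_naive(-128, 0x00) yields 0xFFFF8000 instead of 0x8000, which is exactly the c > 0xffff condition described above; combine_fixed returns 0x8000 regardless of how char is defined.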
Another problem, affecting both receiving and sending, is that UTF-16 data is not supported. UTF-16 is basically an extension of UCS2 that represents the code points U+10000 to U+10FFFF as two 16-bit code units, called a surrogate pair. The first (high) surrogate is in the range 0xd800..0xdbff, and the second (low) one is in the range 0xdc00..0xdfff.
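A minimal sketch of surrogate-pair handling in C, assuming the message text has already been split into 16-bit code units. The function names are hypothetical and not taken from smstools or from the patch; the arithmetic is the standard UTF-16 mapping, where the 20 bits of (code point - 0x10000) are split into a high 10-bit half and a low 10-bit half.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if u is a high (leading) surrogate, 0xD800..0xDBFF. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* Nonzero if u is a low (trailing) surrogate, 0xDC00..0xDFFF. */
static int is_low_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Decode one code point from the UTF-16 units in[0..len-1]. Stores it in
 * *cp and returns the number of units consumed (1 or 2), or 0 if the input
 * is empty or starts with an unpaired surrogate. */
static size_t utf16_decode(const uint16_t *in, size_t len, uint32_t *cp)
{
    assert(cp != NULL);
    if (len == 0)
        return 0;
    if (is_high_surrogate(in[0])) {
        if (len < 2 || !is_low_surrogate(in[1]))
            return 0; /* high surrogate not followed by a low surrogate */
        *cp = 0x10000u + (((uint32_t)in[0] - 0xD800u) << 10)
                       + ((uint32_t)in[1] - 0xDC00u);
        return 2;
    }
    if (is_low_surrogate(in[0]))
        return 0; /* low surrogate without a preceding high surrogate */
    *cp = in[0]; /* BMP character: identical to UCS2 */
    return 1;
}

/* Encode cp (U+0000..U+10FFFF, excluding surrogates) as UTF-16. Writes 1 or
 * 2 units to out and returns the count, or 0 if cp is not encodable. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return 0;
    if (cp < 0x10000u) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000u;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800u + (cp >> 10));    /* top 10 bits */
    out[1] = (uint16_t)(0xDC00u + (cp & 0x3FFu)); /* bottom 10 bits */
    return 2;
}
```

For example, U+1F600 encodes to the pair 0xD83D 0xDE00, and decoding that pair gives back U+1F600.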
I fixed these problems. I registered on this forum only to report these bugs and submit my patch; I would have preferred to use email, but I did not find an address. So I am posting my patch below (more discussion follows it):
The above patch also includes some cleanup, mainly adding static storage-class specifiers (for internal linkage) and const qualifiers.
To test the patch, I also applied the following patch:
and the following test program that I wrote first so that I could reproduce the problem without having any hardware attached:
The test string that I used was from the "PDU" header of a received message. With that message, the output of the program would be as follows:
Without the patch applied, the string would be truncated at the first emoji (U+1F600), leaving the string "Koe ja toinen " (with a trailing space).
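For the emoji to survive, the conversion step that returns 0 for c > 0xffff has to accept code points above U+FFFF. Assuming the decoded text is written out as UTF-8 (the actual smstools function is not shown here), a minimal sketch of such an encoder follows; utf8_encode is a hypothetical name. Code points U+10000 and above need the 4-byte form, which a UCS2-only converter never produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out, which must
 * have room for 4 bytes. Returns the number of bytes written (1..4), or 0
 * for values that are not Unicode scalar values (surrogates, > U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80u) {                 /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    if (cp >= 0xD800u && cp <= 0xDFFFu)
        return 0;                     /* surrogates are not characters */
    if (cp < 0x10000u) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0u | (cp >> 12));
        out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 3;
    }
    if (cp <= 0x10FFFFu) {            /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0u | (cp >> 18));
        out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 4;
    }
    return 0;
}
```

With this encoder, U+1F600 becomes the four bytes F0 9F 98 80, and the Finnish "ä" (U+00E4) becomes C3 A4.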
I would appreciate it if you could include this patch in the next smstools release.