Working with Unicode data in NiceWatch triggers
Article ID: 187 - Updated: Feb 25, 2010 - Products: - Version: V5 - Category: How-To
Supporting multilingual data on labels is now essential. To print labels with data in different languages, support must be available at the system level (which is not a problem with Windows operating systems) and in the labeling software. In practice, support for multilingual data means support for the Unicode standard.
In generation 5, NiceLabel software is a Unicode-aware application, not only at the label-design level (NiceLabel Pro) but also at the integration level (NiceWatch). Any NiceWatch-supported trigger can accept Unicode data. In most cases multilingual data is provided in the UTF-8 encoding.
UTF-8 encodes each character in 1 to 4 bytes; only the 128 US-ASCII characters are encoded with 1 byte. UTF-8 can encode any Unicode character, which avoids the need to figure out and set a code page or otherwise indicate what character set is in use. Without Unicode, characters with codes above 127 look different depending on the code page used in the Windows system.
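The two points above can be checked with a short Python sketch. It is only an illustration and has nothing to do with NiceWatch internals; the function names are made up for this example. The first function counts the UTF-8 bytes of a character, the second shows how the same single byte turns into different characters under different Windows code pages.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


def decode_with_code_page(raw: bytes, code_page: str) -> str:
    """Interpret raw single-byte data using the given code page."""
    return raw.decode(code_page)


if __name__ == "__main__":
    # One sample character for each UTF-8 length (1, 2, 3 and 4 bytes).
    for ch in ["A", "\u010d", "\u20ac", "\U0001F600"]:
        print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")

    # The same byte 0xE8 is a different character in each code page.
    for cp in ["cp1250", "cp1251", "cp1252"]:
        decoded = decode_with_code_page(b"\xe8", cp)
        print(f"{cp}: U+{ord(decoded):04X}")
```

Byte 0xE8 decodes to U+010D in code page 1250 (Central European), to U+0438 in 1251 (Cyrillic) and to U+00E8 in 1252 (Western European), which is why data without a declared encoding prints differently on differently configured systems.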
NiceWatch accepts the data from any trigger type as-is and does not modify the incoming encoding in any way. It is good practice to indicate which encoding is used for the data sent to triggers. When the encoding is indicated, NiceWatch decodes the data automatically.
If you send text files to NiceWatch, include the UTF-8 byte order mark (BOM) at the beginning of the file. The BOM identifies the encoding of the data. For UTF-8 the BOM consists of three bytes at the start of the document: 0xEF, 0xBB, 0xBF.
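As a minimal sketch of how an application could produce such a file, the Python code below writes text with the UTF-8 BOM and checks that the three bytes are present. The function names and the file name label_data.txt are assumptions made for this example; use whatever file your trigger actually monitors.

```python
import codecs
from pathlib import Path


def write_utf8_with_bom(path: str, text: str) -> None:
    """Write text as UTF-8 with the BOM (0xEF, 0xBB, 0xBF) in front."""
    # The "utf-8-sig" codec adds the BOM automatically when writing.
    Path(path).write_text(text, encoding="utf-8-sig")


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 BOM."""
    return Path(path).read_bytes().startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    write_utf8_with_bom("label_data.txt", "caf\u00e9")
    print(has_utf8_bom("label_data.txt"))
```

Many text editors offer the same choice when saving, typically as "UTF-8 with BOM" or "UTF-8 with signature".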
If you send data to the HTTP trigger, make sure your client declares the UTF-8 encoding in the header, usually like this: content-type: text/html; charset=utf-8.
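A client-side sketch in Python, using only the standard library, might look like the following. The URL http://localhost:8080/ is a placeholder and the POST method is an assumption; use the address, port and method that your HTTP trigger is actually configured for. The helper name is also made up for this example.

```python
import urllib.request


def build_trigger_request(url: str, text: str) -> urllib.request.Request:
    """Build a POST request whose body is UTF-8 and whose header says so."""
    body = text.encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Declare the encoding so the receiver does not have to guess it.
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder address of the HTTP trigger (assumption).
    request = build_trigger_request("http://localhost:8080/", "caf\u00e9")
    print(request.get_method(), request.header_items())
    # To actually send the data, uncomment the next line:
    # urllib.request.urlopen(request)
```

The important part is that the charset in the header matches the encoding actually used for the body.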
However, for various reasons it is not always possible to add the encoding information to the incoming data. For example, NiceWatch might receive a data stream encoded in UTF-8 but without the BOM or the charset setting. Such data will not be automatically recognized as UTF-8, and NiceWatch will treat it as plain single-byte characters in the Windows system code page. That is not a problem if your Windows system locale (code page) matches the encoding of the data that you print. The problem occurs when the data contains characters from other scripts or languages. For example, Latin letters with diacritics will not print correctly under your system locale.
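The effect can be reproduced outside NiceWatch with a few lines of Python. This is only a sketch of the general mechanism, assuming a Western European system code page (1252); the function name is made up for this example.

```python
def misread_utf8(text: str, code_page: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode it with a single-byte code page.

    This mimics a receiver that was not told the data is UTF-8.
    Bytes that are undefined in the code page become U+FFFD.
    """
    return text.encode("utf-8").decode(code_page, errors="replace")


if __name__ == "__main__":
    # Plain ASCII survives unchanged, so the problem often goes unnoticed.
    print(misread_utf8("Label"))
    # A letter with a diacritic turns into two unrelated characters.
    print(misread_utf8("caf\u00e9"))
```

The letter é is stored in UTF-8 as the two bytes 0xC3 0xA9; read with code page 1252 they become the two characters Ã and ©, which is the kind of garbled output that appears on the printed label.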
Apart from the obvious workaround (changing the system locale in Control Panel -> Regional and Language Options), there is a better solution: you can manually define the data encoding in the trigger. If you know that the data arrives encoded as UTF-8 (without an explicit definition), do the following:
- Open the properties of your trigger
- Go to the Filter tab
- Click the Advanced button
- Select UTF-8 for the file encoding
NiceWatch will then no longer try to detect the encoding automatically and will default to UTF-8.
Note: If you have NiceWatch Enterprise or NiceWatch Enterprise Business Connector 184.108.40.2063 or earlier, Unicode is not supported correctly in the HTTP trigger. Upgrade to the latest version, or contact Technical Support for the hotfix.