string.convertCharset

Rewrites a string from one encoding (character set) to another.

Syntax

string.convertCharset( charsetIn, charsetOut, s )

Params

  • charsetIn is a string, the “internet name” of the character set used by the input string s
  • charsetOut is a string, the “internet name” of the character set to which the input string s is to be re-encoded
  • s is the string whose encoding is to be converted from charsetIn to charsetOut

Returns

The string s, re-encoded in the charsetOut character set.

Examples

string.convertCharset( "macintosh", "iso-8859-1", "ñ é î - those are 'n e i', with accents" )

"Ò È Ó - those are 'n e i', with accents"
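The result looks wrong only because the re-encoded bytes appear to be displayed as if they were still macintosh text. The following is a rough Python sketch of that effect, not UserTalk. The helper convert_charset is a hypothetical stand-in for the verb, and the Python codec names mac_roman and latin-1 are assumed to correspond to "macintosh" and "iso-8859-1".

```python
def convert_charset(charset_in: str, charset_out: str, data: bytes) -> bytes:
    """Hypothetical stand-in for string.convertCharset, working on raw bytes."""
    return data.decode(charset_in).encode(charset_out)

# The accented characters as a macintosh-encoded string stores them.
source = "ñ é î".encode("mac_roman")

# Re-encode the same characters as ISO-8859-1.
converted = convert_charset("mac_roman", "latin-1", source)

# Viewing the ISO-8859-1 bytes as if they were still macintosh text
# gives the characters shown in the example result.
as_seen_on_mac = converted.decode("mac_roman")
print(as_seen_on_mac)
```

Decoding the converted bytes as latin-1 instead returns the original accented characters, which suggests the conversion itself is lossless for this input.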

Errors

  • An error is generated if either of the specified character sets (charsetIn or charsetOut) is not available on the current system. (See “string.isCharsetAvailable”)
  • An error is generated if the input string contains characters that are not valid in the specified input encoding (charsetIn).
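Both error conditions have close analogues in other environments. The sketch below shows them in Python, not UserTalk. The names is_charset_available and convert_charset are hypothetical stand-ins for the verbs, and the exception types are Python's own, not those raised by this verb.

```python
import codecs

def is_charset_available(name: str) -> bool:
    """Hypothetical analogue of string.isCharsetAvailable."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

def convert_charset(charset_in: str, charset_out: str, data: bytes) -> bytes:
    """Hypothetical analogue of string.convertCharset, working on raw bytes."""
    for name in (charset_in, charset_out):
        if not is_charset_available(name):
            raise LookupError(f"character set not available: {name}")
    # Decoding fails if the input bytes are not valid in charset_in.
    return data.decode(charset_in).encode(charset_out)

# Case 1: an unavailable character set.
try:
    convert_charset("no-such-charset", "latin-1", b"abc")
except LookupError as err:
    print("unavailable:", err)

# Case 2: input that does not belong to the declared input encoding.
try:
    convert_charset("ascii", "latin-1", b"\xff")
except UnicodeDecodeError as err:
    print("invalid input:", err)
```

Checking availability before converting, as the sketch does, lets a caller tell the two failures apart.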

Notes

  • This verb is especially useful for standardizing an internet application’s input. Email and NNTP messages, for example, can be in virtually any encoding. Use this verb to convert from any character set (a.k.a. “encoding”) to whatever set you need, so long as such a conversion is possible.
  • If a character in the input cannot be mapped to a character in the output encoding, the character will generally be replaced with a ?. In some cases, the character will be replaced with multiple characters that represent the original. For example, the macintosh set has a single character for “not equals” at byte 173. Look at what happens when that character is converted to the windows set, which doesn’t have it:
    • string.convertCharset( "macintosh", "windows-1252", char( 173 ) )
      • "!="
  • Some character sets contain characters which are not represented in other sets at all, even with multiple characters. For example, the common Japanese encoding shift_jis contains many Japanese characters which do not exist in iso-8859-1 or macintosh, so attempting such a conversion will generally result in text with many question marks where the Japanese characters were.
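The question-mark substitution described in these notes can be illustrated with a short Python sketch (Python, not UserTalk). The helper convert_lossy is hypothetical, and the codec names mac_roman, cp1252, shift_jis and latin-1 are Python's. Note one difference: Python's "replace" error handler always substitutes a single ?, so it does not produce a multi-character fallback such as "!=".

```python
def convert_lossy(charset_in: str, charset_out: str, data: bytes) -> bytes:
    """Hypothetical lossy conversion: unmappable characters become '?'."""
    text = data.decode(charset_in)
    return text.encode(charset_out, errors="replace")

# Byte 173 is the "not equals" character in the macintosh set;
# windows-1252 has no such character.
not_equals = convert_lossy("mac_roman", "cp1252", bytes([173]))

# Three Japanese characters, none of which exist in ISO-8859-1.
japanese = convert_lossy("shift_jis", "latin-1", "日本語".encode("shift_jis"))

print(not_equals, japanese)
```

Under these assumptions not_equals is a single question mark and japanese is three of them, one per character that could not be mapped.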
