Re: [Audacity-help] Audacity CVS compile on Ubuntu
On Sun, 2005-09-25 at 15:27 -0400, A. D. Olson wrote:
> > I've just checked out a copy and built it using GCC 3.3.6 against wxGTK
> > 2.6.1 Unicode GTK2 without any problems. From the fact that you are
> > getting character set conversion errors it sounds like you are trying to
> > use an ansi version and unicode won't back-convert to ansi (not
> > surprising).
> I've tried with GCC 4.0 and 3.4, same result on both.
Ah. Just tried 3.4.3 and got the exact same error - so it is an audacity
problem, but only shows up with GCC 3.4 and higher.
Not yet sure what is wrong, except the file does have a load of extended
character set characters in it.
g++ -c -I../lib-src/portaudio/pa_common -I../lib-src/portmixer/px_common
-g -O2 -I../lib-src/libresample/include
-I../lib-src/expat -I../lib-src/allegro -Wall -I./include -I.
-DGTK_NO_CHECK_CASTS -D__WXGTK__ -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES
-D_LARGEFILE_SOURCE=1 Languages.cpp -o Languages.o
Languages.cpp:96:1: converting to execution character set: Invalid
Languages.cpp:98:1: converting to execution character set: Invalid or
incomplete multibyte or wide character
Languages.cpp:106:1: converting to execution character set: Invalid
make: *** [Languages.o] Error 1
make: Leaving directory
Re: Audacity CVS compile with GCC >=3.4 (was Audacity CVS compile on Ubuntu)
On Sun, 2005-09-25 at 22:16 +0100, Richard Ash wrote:
> > I've tried with GCC 4.0 and 3.4, same result on both.
> Ah. Just tried 3.4.3 and got the exact same error - so it is an audacity
> problem, but only shows up with GCC 3.4 and higher.
> Not yet sure what is wrong, except the file does have a load of extended
> character set characters in it.
The same file in 1.2.4 doesn't produce an error, so the line
localLanguageName["es"] = "Espanol";
(only with a ~ on the n that my mail client can't cope with) is valid,
but the equivalent line
localLanguageName[wxT("es")] = wxT("Espanol");
from 1.3 isn't if you use gcc 3.4 and above.
I presume the "n with ~" character is the source of the problem, as
Evolution can't cope with it being pasted, gnome-terminal displays a
square box character, but vim inside Eterm knows what it is and prints
it correctly.
What's not clear is what to do with the source file in order that gcc
3.4 can compile the 1.3 version (we will also need this to work to build
on OS X as gcc 4.0 is standard, and a number of distros are using it
(Ubuntu and Fedora to my knowledge)).
A quick look with a hex editor reveals the file has one byte per
character for all characters, with the English characters all leaving
the most significant bit unset (i.e. having hex values less than 80h),
whereas the problem characters have the high bit set, giving a value
of 80h or greater. This is normal for an 8-bit text file in one of the
extended ASCII character sets. The problem is that it's getting
converted to Unicode for wxGTK, and that's where the errors come from.
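
The hex-editor check can also be done programmatically. The following
is only a minimal sketch: highBitOffsets is a hypothetical helper name,
not something in the Audacity tree, and it assumes the file contents
have already been read into a std::string in binary mode.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the offsets of all bytes whose most
// significant bit is set (value 0x80 or greater), i.e. the bytes that
// are not plain 7-bit ASCII.
std::vector<std::size_t> highBitOffsets(const std::string& bytes) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b >= 0x80) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

An empty result means the data is pure 7-bit ASCII and so is already
valid UTF-8; any offsets returned point at the bytes worth inspecting.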
Reading the GCC 3.4 release notes:
"The C, C++, and Objective-C compilers can now handle source files
written in any character encoding supported by the host C library. The
default input character set is taken from the current locale, and may be
overridden with the -finput-charset command line option. In the future
we will add support for inline encoding markers."
this implies that -finput-charset=ascii should be added to the compiler
flags for that file. Unfortunately all that gives us is the error
"cc1plus: failure to convert ascii to UTF-8"
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf8 takes the cause of this
a little further, as it points out that characters with the high bit set
in an 8-bit extension of ASCII do not map directly to single UTF-8 bytes -
for Latin-1 each maps to two UTF-8 bytes (UTF-8 being a variable-width
encoding), both of which have the high bit set. The aim is that ASCII
files using only the 7-bit character set (e.g. nearly all source code,
config files etc) are automatically valid UTF-8.
The problem in Languages.cpp is we have an ASCII file using the high
bit, so a character set conversion is needed to get valid UTF-8 that
GCC 3.4 and higher can use. iconv would seem to be the relevant tool,
except that you need to work out which charset the source is really in:
iconv -f ascii -t UTF-8 Languages.cpp
gives an error on the first accented character.
iconv -f ISO-8859-15 -t UTF-8 Languages.cpp
works, but doesn't give the correct output characters, nor do
iconv -f ISO-8859-1 -t UTF-8 Languages.cpp
iconv -f Latin1 -t UTF-8 Languages.cpp
All get two characters from the one in the source file, neither of which
is the right one.
Opening the file in Vim, doing :set fileencoding=utf-8 and then saving
works, and the result compiles in GCC 3.4.
Now waiting for wxGTK to rebuild so I can test the resulting binary to
find out what has happened, and then see if GCC 3.3 can still cope with
the input file.
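
One way to sanity-check a converted file before handing it to the
compiler is to test whether its bytes are structurally valid UTF-8.
The sketch below is a simplified check under stated limits: it verifies
lead and continuation byte patterns and rejects the lead bytes C0h, C1h
and anything above F4h, but it does not catch every overlong or
surrogate encoding. looksLikeUtf8 is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is structurally
// well-formed UTF-8 (simplified check, see the note above).
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0; // number of continuation bytes expected
        if (lead < 0x80) {
            extra = 0;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (extra > bytes.size() - i - 1) {
            return false; // sequence is cut off at the end of the data
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false; // expected a 10xxxxxx continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

A lone F1h byte followed by ordinary letters fails this check because
F1h looks like the start of a four-byte sequence, which may be related
to the "Invalid or incomplete multibyte or wide character" wording of
the GCC error, though that link is a guess rather than something
confirmed in this thread.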
Converted code is attached in tarball to protect it from email!
> g++ -c -I../lib-src/portaudio/pa_common -I../lib-src/portmixer/px_common
> -g -O2 -I../lib-src/libresample/include
> -I../lib-src/soundtouch/include -I../lib-src/libnyquist/nyx
> -I../lib-src/expat -I../lib-src/allegro -Wall -I./include -I.
> -I/usr/lib/wx/include/gtk2-unicode-release-2.6 -I/usr/include/wx-2.6
> -DGTK_NO_CHECK_CASTS -D__WXGTK__ -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES
> -D_LARGEFILE_SOURCE=1 Languages.cpp -o Languages.o
> Languages.cpp:96:1: converting to execution character set: Invalid
> Languages.cpp:98:1: converting to execution character set: Invalid or
> incomplete multibyte or wide character
> Languages.cpp:106:1: converting to execution character set: Invalid
> make: *** [Languages.o] Error 1
> make: Leaving directory