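The title is probably vague and cryptic, so let me explain from the beginning.
Yesterday I used youtube-dl to download 林俊傑 (JJ Lin)'s song 江南 from 酷我音樂 (Kuwo Music). The downloaded file was in APE format, which I had never seen before. VLC threw a pile of errors when opening it, so I listened to it with mpv instead: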
$ mpv 江南-93157.ape
Playing: 江南-93157.ape
(+) Audio --aid=1 (ape)
File tags:
Artist: ÁÖ¿¡½Ü
Album: ½ÄÏ
Genre: R&B
Title: ½ÄÏ
Track: 2
AO: [pulse] 44100Hz stereo 2ch s16
A: 00:00:01 / 00:04:27 (0%)
Everything worked, except that the metadata was full of mojibake. (I later found out that the tags are GB2312.) ffprobe gives much the same result:
$ ffprobe ~/Music/songs/江南-93157.ape
ffprobe version 2.7.1 Copyright (c) 2007-2015 the FFmpeg developers
built with gcc 5.1.0 (GCC)
configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gnutls --enable-gpl --enable-libass --enable-libbluray --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-shared --enable-version3 --enable-x11grab
libavutil 54. 27.100 / 54. 27.100
libavcodec 56. 41.100 / 56. 41.100
libavformat 56. 36.100 / 56. 36.100
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 16.101 / 5. 16.101
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 2.100 / 1. 2.100
libpostproc 53. 3.100 / 53. 3.100
Input #0, ape, from '江南-93157.ape':
Metadata:
title : ½ÄÏ
artist : ÁÖ¿¡½Ü
album : ½ÄÏ
encoded_by : Exact Audio Copy (Secure mode)
track : 2
genre : R&B
date : 2004
Duration: 00:04:27.95, start: 0.000000, bitrate: 907 kb/s
Stream #0:0: Audio: ape (APE / 0x20455041), 44100 Hz, stereo, s16p
My guess was that ffmpeg was at fault. The first step is to find out which format the metadata is stored in:
$ file 江南-93157.ape
江南-93157.ape: Audio file with ID3 version 2.3.0, contains: Monkey's Audio compressed format version 3970 with fast compression, stereo, sample rate 44100
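What `file` reports can be cross-checked by hand. An ID3v2 header is 10 bytes long: the magic "ID3", a major version byte, a revision byte, a flags byte, and a 4-byte "syncsafe" size that uses only 7 bits per byte. The following is a minimal Python sketch of that layout; `parse_id3v2_header` is a name made up for illustration and has nothing to do with ffmpeg's own code.

```python
def parse_id3v2_header(data: bytes):
    """Return (major, revision, flags, tag_size), or None if this is not an ID3v2 header."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size = 0
    for b in data[6:10]:
        if b & 0x80:
            # Syncsafe bytes never have the top bit set.
            return None
        size = (size << 7) | b
    return major, revision, flags, size


# Synthetic header for illustration: version 2.3.0, no flags, tag size 257.
header = b"ID3" + bytes([3, 0, 0]) + bytes([0, 0, 2, 1])
print(parse_id3v2_header(header))  # (3, 0, 0, 257)
```

To try it on a real file, pass in the first 10 bytes, for example `parse_id3v2_header(open(path, "rb").read(10))`.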
OK, it is ID3v2.3. An ID3v2 header begins with "ID3", so start by searching the ffmpeg source for that string:
$ grep -r '"ID3"'
libavformat/asfdec_o.c: else if (!strcmp(name, "ID3")) // handle ID3 tag
libavformat/asfdec_f.c: } else if (!strcmp(key, "ID3")) { // handle ID3 tag
libavformat/id3v2.h: * Default magic bytes for ID3v2 header: "ID3"
libavformat/id3v2.h:#define ID3v2_DEFAULT_MAGIC "ID3"
asfdec_f.c and asfdec_o.c are the ASF demuxers, so they do not sound like the files that handle APE, but the ID3 handling should be much the same. Opening asfdec_o.c:
else if (!strcmp(name, "ID3")) // handle ID3 tag
get_id3_tag(s, val_len);
Tracing the code step by step gives the following call chain:
1. get_id3_tag() in libavformat/asfdec_o.c
2. ff_id3v2_read() in libavformat/id3v2.c (steps 3 to 5 are in the same file)
3. id3v2_read_internal()
4. id3v2_parse()
5. read_ttag()
Inside read_ttag(), the first thing that happens is a call to decode_str():
if (decode_str(s, pb, encoding, &dst, &taglen) < 0) {
av_log(s, AV_LOG_ERROR, "Error reading frame %s, skipped\n", key);
return;
}
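The encoding argument passed to decode_str() is read from the byte just before the string. In the file I downloaded that byte is 0, which means ISO-8859-1. The strings are clearly not ISO-8859-1, though, so they are displayed as mojibake. This is the branch of decode_str() that handles them: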
case ID3v2_ENCODING_ISO8859:
while (left && ch) {
ch = avio_r8(pb);
PUT_UTF8(ch, tmp, avio_w8(dynbuf, tmp);)
left--;
}
break;
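The key here is PUT_UTF8, which converts a UTF-32 code point into UTF-8 (see libavutil/common.h). Since each byte of the tag is treated as a code point of its own, the original bytes can in principle be recovered by decoding the UTF-8 back into code points and writing each one out as a single byte. Interpreting those bytes as GB2312 then gives the correct text.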
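The round trip can be reproduced in a few lines of Python. This is only a sketch and not ffmpeg code: `ffmpeg_latin1_to_utf8` and `undo` are hypothetical helpers, the first one imitating the ISO-8859-1 branch above, and the input is the GB2312 encoding of the artist name as implied by the mojibake in the ffprobe output.

```python
def ffmpeg_latin1_to_utf8(raw: bytes) -> bytes:
    # Imitates the ISO-8859-1 branch: every byte is taken as one code point
    # (U+0000..U+00FF), and that code point is written out as UTF-8.
    return "".join(chr(b) for b in raw).encode("utf-8")


def undo(utf8_bytes: bytes, real_encoding: str = "gb2312") -> str:
    # Step 1: UTF-8 back to code points.
    code_points = utf8_bytes.decode("utf-8")
    # Step 2: each code point back to the single byte it came from.
    original = code_points.encode("latin-1")
    # Step 3: interpret the recovered bytes in the encoding they were really in.
    return original.decode(real_encoding)


tag = "林俊杰".encode("gb2312")       # the bytes stored in the tag
shown = ffmpeg_latin1_to_utf8(tag)    # what ffprobe ends up printing
print(shown.decode("utf-8"))          # ÁÖ¿¡½Ü
print(undo(shown))                    # 林俊杰
```

Python's `latin-1` codec maps code points 0 to 255 straight to bytes 0 to 255, which is why it can be used for step 2.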
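There is a catch, though. In the ffprobe output, the line containing "江南-93157.ape" is correct UTF-8, while the garbled parts are GB2312, so the whole output cannot be handled with a single encoding. My approach is to process each line separately.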
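One way to sketch the per-line idea without any detection library is to try undoing the Latin-1 round trip on each line and to keep the line unchanged when that fails. This is a simple heuristic for illustration only: `repair_line` is a hypothetical name, GB2312 is assumed to be the real encoding, and a line that genuinely contains accented Latin-1 text could be converted wrongly.

```python
def repair_line(line: str, real_encoding: str = "gb2312") -> str:
    """Undo 'bytes read as ISO-8859-1' mojibake on one line, if possible."""
    try:
        # Fails with UnicodeEncodeError when the line holds characters above
        # U+00FF, i.e. when it is already proper text such as the file name.
        raw = line.encode("latin-1")
        # Fails with UnicodeDecodeError when the bytes are not valid GB2312.
        return raw.decode(real_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return line


sample = [
    "Input #0, ape, from '江南-93157.ape':",
    "    artist          : ÁÖ¿¡½Ü",
    "    genre           : R&B",
]
for line in sample:
    print(repair_line(line))
```

Pure ASCII lines survive the round trip unchanged, so only the garbled tag values are affected. To use it as a filter, loop over `sys.stdin` and write out `repair_line(line)` for each line.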
I had recently heard about chardet, a Python module that detects encodings automatically, so I gave it a try. The final code is posted here.
Here is the result. Done!
$ ffprobe ~/Music/songs/江南-93157.ape |& convert_encoding.py
Encoding: GB2312 with confidence 0.99
ffprobe version 2.7.1 Copyright (c) 2007-2015 the FFmpeg developers
built with gcc 5.1.0 (GCC)
configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gnutls --enable-gpl --enable-libass --enable-libbluray --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-shared --enable-version3 --enable-x11grab
libavutil 54. 27.100 / 54. 27.100
libavcodec 56. 41.100 / 56. 41.100
libavformat 56. 36.100 / 56. 36.100
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 16.101 / 5. 16.101
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 2.100 / 1. 2.100
libpostproc 53. 3.100 / 53. 3.100
Input #0, ape, from '江南-93157.ape':
Metadata:
title : 江南
artist : 林俊杰
album : 江南
encoded_by : Exact Audio Copy (Secure mode)
track : 2
genre : R&B
date : 2004
Duration: 00:04:27.95, start: 0.000000, bitrate: 907 kb/s
Stream #0:0: Audio: ape (APE / 0x20455041), 44100 Hz, stereo, s16p