<p>Linux迷+Python粉 - LANG | https://blog.pythonwood.com/ | A strange environment bug that makes \udc-prefixed strings appear in Python 3 | pythonwood | 2018-11-07T15:30:00+08:00</p>
<h2 id="langzh_cnutf-8langen_usutf-8">Note: do not confuse <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 with <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8!<a class="headerlink" href="#langzh_cnutf-8langen_usutf-8" title="Permanent link">¶</a></h2>
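<p><strong><span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 and <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8 are not interchangeable</strong>, so do not mix them up. I had been bitten by encoding problems back in the Python 2 days, and did not expect that in Python 3, where strings are unified as Unicode, an inconsistent environment could still cause an encoding error.</p>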
<h2 id="_1">Environment and task at the time:<a class="headerlink" href="#_1" title="Permanent link">¶</a></h2>
<p>The VPS runs Ubuntu 14.04 with Python 3.4, Selenium 3+, Chrome 66, and chromedriver. crontab starts a shell script, the shell script starts a Python script, and the Python script uses Selenium to launch Chrome, ...</p>
<h2 id="bug">Execution flow that triggers the bug:<a class="headerlink" href="#bug" title="Permanent link">¶</a></h2>
<ol>
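<li>crontab launches a.sh as <strong><span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 bash a.sh</strong></li>
<li>The last line of a.sh calls "b中文名.py" (a script with a Chinese file name) and passes it a Chinese argument, "《xxx》"</li>
<li>Inside b中文.py, print(argument 1) fails with the string-encoding error 'ascii' codec can't encode characters</li>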
</ol>
<h2 id="_2">Debugging findings:<a class="headerlink" href="#_2" title="Permanent link">¶</a></h2>
<ol>
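<li>print(repr(argument 1)) shows escapes that start with \udc instead of the UTF-8 style bytes that start with \x.</li>
<li>For example, "《" is normally encoded in UTF-8 as <strong>'\xe3\x80\x8a', but here '\udce3\udc80\udc8a' was printed instead</strong>.</li>
<li>Changing the approach and running <strong>b中文.py 《xxx》</strong> directly over ssh on the VPS works without any problem!</li>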
</ol>
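<p>The \udc values match what Python 3 produces when it decodes bytes with the surrogateescape error handler (PEP 383): each undecodable byte 0xNN becomes the lone surrogate U+DCNN. The following is a minimal sketch of that mapping, assuming the argument bytes were decoded with an ASCII codec; the variable names are illustrative and do not come from the original script.</p>

```python
# UTF-8 bytes of the character quoted in the findings above.
raw = b'\xe3\x80\x8a'

# With a UTF-8 codec the three bytes decode to a single character.
as_utf8 = raw.decode('utf-8')

# With an ASCII codec plus surrogateescape, every byte from 0x80 upward
# is carried through as a lone surrogate in the range U+DC80..U+DCFF.
as_ascii = raw.decode('ascii', 'surrogateescape')

# ascii() keeps the output printable under any locale.
print(ascii(as_utf8))   # '\u300a'
print(ascii(as_ascii))  # '\udce3\udc80\udc8a'

# The mapping is reversible, so the original bytes are not lost.
restored = as_ascii.encode('ascii', 'surrogateescape')
```

<p>Because the mapping is reversible, the original bytes can still be recovered from such a string; only printing or encoding it with a strict codec fails.</p>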
<h2 id="_3">Locating the problem:<a class="headerlink" href="#_3" title="Permanent link">¶</a></h2>
<ol>
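<li>The bug does not reproduce on my own Ubuntu machine, only on the VPS, so it is most likely a problem with the shell environment or the Python environment.</li>
<li>Dumping the shell environment that a.sh runs in and comparing the two: my machine has <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 and <span class="caps">LANGUAGE</span>=zh_CN:zh, while the VPS has only <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8.</li>
<li>After removing the <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 variable that crontab was forcing, a.sh runs with <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8 and the VPS works normally again. (Two hours of troubleshooting to get here!)</li>
<li>Takeaway: I used to think <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 and <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8 made no practical difference. Not any more.</li>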
</ol>
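<p>To compare the cron environment with an interactive ssh session, something like the following could be run in both places. It is only a sketch: encoding_report is a hypothetical helper name and is not part of the original scripts. It collects the locale variables together with the encodings Python derived from them.</p>

```python
import locale
import os
import sys


def encoding_report():
    """Collect locale variables and the encodings Python derived from them."""
    report = {}
    # Environment variables that influence locale selection.
    for name in ('LANG', 'LANGUAGE', 'LC_ALL', 'LC_CTYPE'):
        report[name] = os.environ.get(name)
    # Encoding used for file names and command-line arguments.
    report['filesystem'] = sys.getfilesystemencoding()
    # Encoding the locale module reports for text data.
    report['preferred'] = locale.getpreferredencoding(False)
    # Encoding print() uses when writing to standard output.
    report['stdout'] = getattr(sys.stdout, 'encoding', None)
    return report


if __name__ == '__main__':
    for key, value in sorted(encoding_report().items()):
        # The !a conversion keeps the output ASCII-only.
        print('{}={!a}'.format(key, value))
```

<p>If the cron run reports an ASCII-family encoding for filesystem or stdout while the ssh session reports UTF-8, that would be consistent with the 'ascii' codec error seen in the traceback.</p>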
<h2 id="_4">Fix:<a class="headerlink" href="#_4" title="Permanent link">¶</a></h2>
<p>Remove <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 from the crontab entry; the run then falls back to the default <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8.</p>
<h2 id="_5">Investigating the cause:<a class="headerlink" href="#_5" title="Permanent link">¶</a></h2>
<p>To be determined.</p>
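<p>One plausible explanation, which the original post does not confirm, is that the zh_CN.UTF-8 locale is simply not generated on the VPS. When the requested locale cannot be activated, the process stays in the C locale, and Python 3.4 then uses ASCII with surrogateescape for command-line arguments and ASCII for standard output. The sketch below checks whether a locale name can be activated; locale_available is a hypothetical helper name.</p>

```python
import locale


def locale_available(name):
    """Return True if the C library can activate the locale `name`."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Raised when the locale is unknown to the system.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the old setting


for candidate in ('zh_CN.UTF-8', 'en_US.UTF-8', 'C'):
    print(candidate, locale_available(candidate))
```

<p>On the VPS, a False result for zh_CN.UTF-8 alongside True for en_US.UTF-8 would support this explanation. The shell command locale -a lists the installed locales and can serve as a cross-check.</p>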
<h2 id="python-reprudc-print">Python repr shows \udc-prefixed strings, and print(argument) raises an exception<a class="headerlink" href="#python-reprudc-print" title="Permanent link">¶</a></h2>
<div class="highlight"><pre><span></span>'/home/maskuser/path/to/ts/\udce3\udc80\udc8a\udce9\udc80\udc97.....mp4'
Traceback (most recent call last):
  File "/home/maskuser/pathtodir/script/20181105\udce8\udca7\udc86\.....py", line 73, in &lt;module&gt;
    video_upload_testsite(*sys.argv[1:])
  File "/home/maskuser/pathtodir/script/20181105\udce8\udca7\udc86\.....py", line 29, in video_upload_testsite
    print (videopath)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 27-50: ordinal not in range(128)
</pre></div>
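<p>If the environment cannot be changed, the script itself can be made more tolerant. The following is a sketch, assuming the argument was originally UTF-8 bytes that Python decoded with surrogateescape. repair_argument is a hypothetical helper, and the path is illustrative rather than the one from the traceback.</p>

```python
def repair_argument(arg):
    """Re-decode a surrogate-escaped argument as UTF-8.

    Encoding with surrogateescape turns each lone surrogate U+DCNN back
    into the byte 0xNN, which recovers the original byte string. Text
    that was decoded correctly passes through unchanged.
    """
    return arg.encode('utf-8', 'surrogateescape').decode('utf-8')


broken = '/path/\udce3\udc80\udc8a.mp4'  # illustrative, not the real path
fixed = repair_argument(broken)

# ascii() avoids UnicodeEncodeError even when stdout is ASCII-only.
print(ascii(broken))  # '/path/\udce3\udc80\udc8a.mp4'
print(ascii(fixed))   # '/path/\u300a.mp4'
```

<p>Printing the repaired string directly still needs a standard output that can encode it. Setting the PYTHONIOENCODING=utf-8 environment variable for the cron job is one way to arrange that.</p>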