<h1>A strange environment bug that produces \udc-prefixed strings in Python 3</h1>
<p>Linux迷+Python粉 - LANG, https://blog.pythonwood.com/, by pythonwood, 2018-11-07T15:30:00+08:00</p>
<h2 id="langzh_cnutf-8langen_usutf-8">Note: <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 and <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8 are not interchangeable!<a class="headerlink" href="#langzh_cnutf-8langen_usutf-8" title="Permanent link">&para;</a></h2>
<p><strong><span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 and <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8 behave differently</strong>, so do not treat them as interchangeable! I was bitten by encoding problems back in the Python 2 days, and did not expect that in Python 3, where strings are uniformly Unicode, an inconsistent environment could still cause one.</p>
<h2 id="_1">Environment and setup at the time:<a class="headerlink" href="#_1" title="Permanent link">&para;</a></h2>
<p>The VPS ran Ubuntu 14.04 with Python 3.4, Selenium 3+, Chrome 66, and chromedriver. crontab launched a shell script, the shell script launched a Python script, and the Python script used Selenium to start Chrome, ...</p>
<h2 id="bug">Steps that triggered the bug:<a class="headerlink" href="#bug" title="Permanent link">&para;</a></h2>
<ol>
<li>crontab starts a.sh as <strong><span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 bash&nbsp;a.sh</strong></li>
<li>At its end, a.sh calls &ldquo;b中文名.py&rdquo; with a Chinese argument, &ldquo;《xxx》&rdquo;</li>
<li>Inside b中文.py, print(argument 1) fails with a string encoding error: &lsquo;ascii&rsquo; codec can&rsquo;t encode&nbsp;characters</li>
</ol>
<h2 id="_2">Debugging findings:<a class="headerlink" href="#_2" title="Permanent link">&para;</a></h2>
<ol> <li>print 
repr(Chinese argument 1) shows escapes starting with \udc rather than the \x-prefixed UTF-8 byte form.</li>
<li>For example, &ldquo;《&rdquo; is normally encoded as <strong>&lsquo;\xe3\x80\x8a&rsquo;, but here it was printed as &lsquo;\udce3\udc80\udc8a&rsquo;</strong>.</li>
<li>Changing the approach and running <strong>b中文.py 《xxx》</strong> directly over ssh on the VPS works without any problem!</li>
</ol>
<h2 id="_3">Locating the problem:<a class="headerlink" href="#_3" title="Permanent link">&para;</a></h2>
<ol>
<li>The bug does not occur on my own Ubuntu machine, only on the VPS, so it should be a problem with either the shell environment or the Python environment.</li>
<li>Printing the shell environment that runs a.sh and comparing the two: my machine has <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 and <span class="caps">LANGUAGE</span>=zh_CN:zh, while the VPS has only <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8.</li>
<li>After removing the <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 variable that was forced in crontab, a.sh runs with <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8 and the VPS works normally again. (Two hours of debugging to get this result!)</li>
<li>Conclusion: I used to think <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8 and <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8 made no difference. Not anymore.</li>
</ol>
<h2 id="_4">Solution:<a class="headerlink" href="#_4" title="Permanent link">&para;</a></h2>
<p>Remove <span class="caps">LANG</span>=zh_CN.<span class="caps">UTF</span>-8; the run then falls back to the default <span class="caps">LANG</span>=en_US.<span class="caps">UTF</span>-8.</p>
<h2 id="_5">Root cause:<a class="headerlink" href="#_5" title="Permanent link">&para;</a></h2>
<p>To be determined.</p>
<h2 id="python-reprudc-print">Python repr outputs a string starting with \udc, and print(argument) raises an exception<a class="headerlink" href="#python-reprudc-print" title="Permanent link">&para;</a></h2>
<div class="highlight"><pre><span></span>&#39;/home/maskuser/path/to/ts/\udce3\udc80\udc8a\udce9\udc80\udc97.....mp4&#39;
Traceback (most recent call last):
  File &quot;/home/maskuser/pathtodir/script/20181105\udce8\udca7\udc86\.....py&quot;, line 73, in &lt;module&gt;
    video_upload_testsite(*sys.argv[1:])
  File &quot;/home/maskuser/pathtodir/script/20181105\udce8\udca7\udc86\.....py&quot;, line 29, in video_upload_testsite
    print (videopath)
UnicodeEncodeError: &#39;ascii&#39; codec can&#39;t encode characters in position 27-50: ordinal not in range(128)
</pre></div>
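<p>The \udc escapes in the output above match what Python 3 produces when it decodes command-line bytes as ASCII with the "surrogateescape" error handler: each undecodable byte 0xNN becomes the lone surrogate U+DCNN. One plausible explanation, which the post does not confirm, is that the zh_CN.UTF-8 locale was not usable on the VPS, so Python fell back to ASCII. The sketch below only simulates that decoding step on the bytes of "《". The function names <code>simulate_ascii_argv</code> and <code>repair_surrogates</code> are hypothetical helpers, not part of the original script.</p>

```python
# Sketch: reproduce the \udc escapes from the traceback without touching the locale.
# Assumption (not confirmed by the post): Python decoded argv as ASCII with the
# "surrogateescape" error handler because the requested locale was unusable.


def simulate_ascii_argv(arg_bytes):
    """Decode raw argument bytes the way Python 3 does under an ASCII locale."""
    return arg_bytes.decode("ascii", errors="surrogateescape")


def repair_surrogates(text):
    """Hypothetical helper: map escaped bytes back to the intended UTF-8 text."""
    return text.encode("utf-8", errors="surrogateescape").decode("utf-8")


raw = "\u300a".encode("utf-8")        # "《" as UTF-8 bytes: b'\xe3\x80\x8a'
broken = simulate_ascii_argv(raw)     # three lone surrogates, one per byte
print(ascii(broken))

try:
    broken.encode("ascii")            # roughly what print() attempts on an ASCII stdout
except UnicodeEncodeError as exc:
    print("print() would fail:", exc.reason)

fixed = repair_surrogates(broken)     # round-trips back to the original character
print(ascii(fixed))
```

<p>If this explanation holds, comparing the output of <code>locale -a</code> on the VPS with the <span class="caps">LANG</span> value set in crontab would show whether zh_CN.<span class="caps">UTF</span>-8 is actually available there. A repair step like the one above is only a workaround; running the script under a locale that exists on the machine avoids the escapes in the first place.</p>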