Setting the Source File Encoding in Python

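When a Python source file contains Chinese characters (or any other non-ASCII text), we need to declare the file's encoding so that the interpreter decodes the source with that encoding. If no encoding is declared, Python 2 assumes ASCII (Python 3 assumes UTF-8).
According to the official Python documentation, the encoding has to be declared in a special comment on the first or second line of the source file, as shown here.
For example:
# coding=<encoding name>
such as
# -*- coding: UTF-8 -*-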

Alternatively, use one of the formats that popular editors also recognize:

#!/usr/bin/python
# -*- coding: <encoding name> -*-

or

#!/usr/bin/python
# vim: set fileencoding=<encoding name> :

The following passage is important, because it explains why the declaration has to be written this way:

More precisely, the first or second line must match the regular expression "coding[:=]\s*([-\w.]+)". The first group of this expression is then interpreted as encoding name. If the encoding is unknown to Python, an error is raised during compilation. There must not be any Python statement on the line that contains the encoding declaration

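For readers who would rather skip the formal wording, the gist is this: the first or second line must match the regular expression
coding[:=]\s*([-\w.]+)
The first group, ([-\w.]+), is taken as the encoding name. If Python does not recognize that encoding, an error is raised at compile time. The line that carries the encoding declaration must not contain any Python statement.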
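
As a rough illustration of the rule above, the following sketch applies the quoted regular expression to the first two lines of a source text. The function name declared_encoding is made up for this example, and the check that the line is a comment is a simplification assumed here, not the interpreter's actual implementation.

```python
import re

# Regular expression quoted from the documentation above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source_text):
    """Return the encoding named on line 1 or 2 of source_text, or None.

    Hypothetical helper: it only mimics the documented rule and is not
    how the interpreter itself is implemented.
    """
    for line in source_text.splitlines()[:2]:
        # Assumption: the declaration must sit in a comment line.
        if not line.lstrip().startswith("#"):
            continue
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    print(declared_encoding("#!/usr/bin/python\n# -*- coding: UTF-8 -*-\n"))
    print(declared_encoding("# vim: set fileencoding=latin-1 :\n"))
    print(declared_encoding("#!/usr/bin/python\n#\n# -*- coding: latin-1 -*-\n"))
```

The vim form works because "fileencoding=" ends in "coding=", which is all the expression looks for. The third call returns None because the declaration sits on line 3.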

To aid with platforms such as Windows, which add Unicode BOM marks to the beginning of Unicode files, the UTF-8 signature '\xef\xbb\xbf' will be interpreted as 'utf-8' encoding as well (even if no magic encoding comment is given). If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'.  Any other encoding will cause an error.
 
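In other words, this helps on platforms such as Windows, where a Unicode BOM is added to the beginning of Unicode files: the UTF-8 signature '\xef\xbb\xbf' is itself treated as a declaration of utf-8, even when the file has no encoding comment at all. If a source file has both the UTF-8 BOM and an encoding comment, the comment may only name utf-8; any other encoding causes an error.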
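
The BOM rule can be observed with the standard library's tokenize.detect_encoding in Python 3, which is documented to follow the same rules. This is a small sketch: the helper name detect and the sample byte strings are assumptions made for illustration.

```python
import codecs
import io
import tokenize


def detect(source_bytes):
    """Return the encoding that tokenize detects for source_bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines = tokenize.detect_encoding(readline)
    return encoding


# Illustrative sources, each starting with the UTF-8 BOM b'\xef\xbb\xbf'.
bom_only = codecs.BOM_UTF8 + b"print('hi')\n"
bom_and_utf8 = codecs.BOM_UTF8 + b"# -*- coding: utf-8 -*-\n"
bom_and_latin1 = codecs.BOM_UTF8 + b"# -*- coding: latin-1 -*-\n"

if __name__ == "__main__":
    print(detect(bom_only))      # BOM alone is enough to select UTF-8
    print(detect(bom_and_utf8))  # BOM and comment agree
    try:
        detect(bom_and_latin1)   # BOM and comment disagree
    except SyntaxError as exc:
        print("rejected:", exc)
```

When a BOM is present, detect_encoding reports the encoding as 'utf-8-sig', the codec that also strips the signature while decoding.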

    1. With interpreter binary and using Emacs style file encoding
       comment:

          #!/usr/bin/python
          # -*- coding: latin-1 -*-
          import os, sys
          ...

          #!/usr/bin/python
          # -*- coding: iso-8859-15 -*-
          import os, sys
          ...

          #!/usr/bin/python
          # -*- coding: ascii -*-
          import os, sys
          ...

    2. Without interpreter line, using plain text:

          # This Python file uses the following encoding: utf-8
          import os, sys
          ...

    3. Text editors might have different ways of defining the file's
       encoding, e.g.

          #!/usr/local/bin/python
          # coding: latin-1
          import os, sys
          ...

    4. Without encoding comment, Python's parser will assume ASCII
       text:

          #!/usr/local/bin/python
          import os, sys
          ...

    5. Encoding comments which don't work:

       Missing "coding:" prefix:

          #!/usr/local/bin/python
          # latin-1
          import os, sys
          ...

       Encoding comment not on line 1 or 2:

          #!/usr/local/bin/python
          #
          # -*- coding: latin-1 -*-
          import os, sys
          ...

       Unsupported encoding:

          #!/usr/local/bin/python
          # -*- coding: utf-42 -*-
          import os, sys
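
Several of the examples above can be checked against Python 3's own detection logic. The sketch below feeds them to tokenize.detect_encoding from the standard library; the helper name encoding_of and the samples dictionary are made up for illustration. Two differences from the quoted text are worth keeping in mind: Python 3 falls back to UTF-8 rather than ASCII when no declaration is found, and it reports latin-1 under its normalized name iso-8859-1.

```python
import io
import tokenize


def encoding_of(source_bytes):
    """Return the detected encoding name, or an error string on failure."""
    readline = io.BytesIO(source_bytes).readline
    try:
        encoding, _lines = tokenize.detect_encoding(readline)
    except SyntaxError as exc:
        return "error: %s" % exc
    return encoding


# Byte-for-byte versions of some of the examples listed above.
samples = {
    "emacs style": b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\nimport os, sys\n",
    "plain text": b"# This Python file uses the following encoding: utf-8\nimport os, sys\n",
    "missing prefix": b"#!/usr/local/bin/python\n# latin-1\nimport os, sys\n",
    "comment on line 3": b"#!/usr/local/bin/python\n#\n# -*- coding: latin-1 -*-\nimport os, sys\n",
    "unsupported": b"#!/usr/local/bin/python\n# -*- coding: utf-42 -*-\nimport os, sys\n",
}

if __name__ == "__main__":
    for label, source in samples.items():
        print(label, "->", encoding_of(source))
```

The "missing prefix" and "comment on line 3" cases do not raise an error; the declaration is simply ignored and the default encoding is used, which is why such mistakes are easy to overlook.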