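Module Overview

A module is a unit for organizing Python code, and it is also an object. Each module has its own namespace and can contain arbitrary Python objects.

A .py file is a module, and a folder is a module too (a package). A folder (package) can in turn contain .py files (submodules) and folders (subpackages). Examples are the standard library module random and the third-party package pandas.

A folder that contains an __init__.py file is a regular package; a folder without one is a namespace package, which serves only as a container for subpackages.

A module may be implemented in Python, C, or another language.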

import random, pandas
type(random), type(pandas)
(module, module)
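
As a small additional check that relies only on the standard library, the sketch below confirms that a module is an ordinary object of type types.ModuleType, that it carries its own namespace dictionary, and that some modules are compiled into the interpreter rather than loaded from a .py file. The choice of sys and random as examples is ours, and the remark about random being a .py file assumes CPython.

```python
import random
import sys
import types

# A module is an ordinary object whose type is types.ModuleType.
is_module = isinstance(random, types.ModuleType)

# Each module keeps its names in its own namespace dictionary.
has_randint = "randint" in vars(random)

# 'sys' is compiled into the interpreter, so it appears among the
# built-in module names; in CPython 'random' comes from a .py file.
sys_is_builtin = "sys" in sys.builtin_module_names
random_is_builtin = "random" in sys.builtin_module_names

print(is_module, has_randint, sys_is_builtin, random_is_builtin)
```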

All packages are modules, but not all modules are packages. In other words, a package is just a special kind of module.
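
One way to tell the two apart, sketched here with standard library modules only, is the __path__ attribute: the import system treats any module that has __path__ as a package. The text above does not mention __path__; it is brought in here as an extra check, with json (a package) and random (a single-file module) chosen as examples.

```python
import json      # a package: a folder with an __init__.py
import random    # a plain module: a single .py file
import types

# Both are module objects...
both_are_modules = all(isinstance(m, types.ModuleType) for m in (json, random))

# ...but only the package has a __path__ attribute.
json_is_package = hasattr(json, "__path__")
random_is_package = hasattr(random, "__path__")

print(both_are_modules, json_is_package, random_is_package)
print(repr(json.__package__), repr(random.__package__))
```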

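The __package__ attribute gives the package name. For a top-level module that is not a package, it is the empty string.

The __name__ attribute is the module's name.

In particular, for the main module (the module whose code you are running directly), __name__ is always '__main__' (__main__ is a special module initialized directly at interpreter startup), and __package__ is None when the code is run as a script or in an interactive session such as this notebook. Checking __name__ makes it possible to write code that runs when the module is executed directly but not when it is imported into another module, which is handy for testing the module.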

random.__package__, pandas.__package__
('', 'pandas')
print(__package__)
None
random.__name__, pandas.__name__
('random', 'pandas')
__name__
'__main__'
a = 3 + 2 - 5
def f():
    print(a+1)
print(a)

if __name__ == '__main__':
    # The code below does not run when this module is imported elsewhere
    print(a == 0)
0
True
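
To see both sides of the __name__ check outside a notebook, the following sketch writes a small throwaway module (the file name guard_demo.py is hypothetical) into a temporary directory, then runs it once as a script and once through an import. It assumes the current Python executable can be launched as a subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical module used only for this demonstration.
SOURCE = '''
a = 3 + 2 - 5
print("module body:", a)

if __name__ == "__main__":
    print("guarded block:", a == 0)
'''

def run(args, cwd):
    """Run the current interpreter with args and return its stdout."""
    done = subprocess.run([sys.executable, *args], cwd=cwd,
                          capture_output=True, text=True, check=True)
    return done.stdout

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "guard_demo.py").write_text(SOURCE, encoding="utf-8")
    # Run directly: __name__ is '__main__', so the guarded block runs.
    as_script = run(["guard_demo.py"], tmp)
    # Import from another program: __name__ is 'guard_demo', so it is skipped.
    as_import = run(["-c", "import guard_demo"], tmp)

print(as_script)
print(as_import)
```

The module body runs in both cases; only the direct run reaches the guarded block.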

Use the import statement to bring another module into the current one, and use attribute notation to access the module's attributes.

import pandas as pd
pd.core
<module 'pandas.core' from 'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\pandas\\core\\__init__.py'>
pd.core.series.Series
pandas.core.series.Series

A module that does not belong to a package can also be executed as a script (in which case its __name__ attribute is "__main__").

import this
this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


<module 'this' from 'C:\\ProgramData\\Anaconda3\\lib\\this.py'>
# %run is a Jupyter magic command; in a terminal, use the python command instead
# Adjust F:\anaconda\lib\this.py to the path on your own machine
%run F:\anaconda\lib\this.py
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

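Creating Modules

Creating a module is very simple. There are two ways:

  • Create a .py file, and you have a module;

  • Create a folder, and that folder is also a module (a package).

A module may be empty. In practice, though, modules exist to organize code that solves particular problems or provides particular functionality, which makes the code easier to use and develop. Examples include the regular-expression module re and the data analysis library pandas.

When you create a folder, it is a regular package if it contains an __init__.py module; otherwise it is a namespace package. Subpackages and submodules can be created inside a package.

The __init__.py file may be empty. Because it is executed implicitly whenever the regular package is imported, it usually holds code that should run on import, or imports subpackages, or imports attributes from modules in subpackages so that they can be accessed directly from the package. Examples are the pandas package's documentation attribute __doc__ and its DataFrame data structure.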

import folder # an empty folder created beforehand
folder
<module 'folder' (namespace)>
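
The difference between the two kinds of package can be reproduced without touching your working directory. The sketch below builds two folders in a temporary directory (the names regular_pkg_demo and namespace_pkg_demo are hypothetical), one with an __init__.py and one without, and imports both. It reads __file__ through getattr because namespace packages either lack that attribute or have it set to None, depending on the Python version.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Regular package: a folder with an (empty) __init__.py.
    (root / "regular_pkg_demo").mkdir()
    (root / "regular_pkg_demo" / "__init__.py").write_text("", encoding="utf-8")
    # Namespace package: a folder without __init__.py.
    (root / "namespace_pkg_demo").mkdir()

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        regular = importlib.import_module("regular_pkg_demo")
        namespace = importlib.import_module("namespace_pkg_demo")
    finally:
        sys.path.remove(tmp)

    regular_file = Path(regular.__file__).name
    namespace_file = getattr(namespace, "__file__", None)
    both_have_path = hasattr(regular, "__path__") and hasattr(namespace, "__path__")

print(regular_file, namespace_file, both_have_path)
```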
import pandas as pd

print(pd.__doc__)
pandas - a powerful data analysis and manipulation library for Python
=====================================================================

**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, **real world** data analysis in Python. Additionally, it has
the broader goal of becoming **the most powerful and flexible open source data
analysis / manipulation tool available in any language**. It is already well on
its way toward this goal.

Main Features
-------------
Here are just a few of the things that pandas does well:

  - Easy handling of missing data in floating point as well as non-floating
    point data.
  - Size mutability: columns can be inserted and deleted from DataFrame and
    higher dimensional objects
  - Automatic and explicit data alignment: objects can be explicitly aligned
    to a set of labels, or the user can simply ignore the labels and let
    `Series`, `DataFrame`, etc. automatically align the data for you in
    computations.
  - Powerful, flexible group by functionality to perform split-apply-combine
    operations on data sets, for both aggregating and transforming data.
  - Make it easy to convert ragged, differently-indexed data in other Python
    and NumPy data structures into DataFrame objects.
  - Intelligent label-based slicing, fancy indexing, and subsetting of large
    data sets.
  - Intuitive merging and joining data sets.
  - Flexible reshaping and pivoting of data sets.
  - Hierarchical labeling of axes (possible to have multiple labels per tick).
  - Robust IO tools for loading data from flat files (CSV and delimited),
    Excel files, databases, and saving/loading data from the ultrafast HDF5
    format.
  - Time series-specific functionality: date range generation and frequency
    conversion, moving window statistics, date shifting and lagging.
# Directly access DataFrame, an attribute of the frame module in the core subpackage
pd.DataFrame
pandas.core.frame.DataFrame
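
The same re-export pattern can be tried on a toy package. In this sketch the package init_demo, its submodule shapes, and the function area are all hypothetical names; the __init__.py sets a docstring and imports area from the submodule, so that the function becomes reachable directly on the package, much as pd.DataFrame is above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "init_demo")            # hypothetical package name
    pkg.mkdir()
    (pkg / "shapes.py").write_text(
        "def area(w, h):\n    return w * h\n", encoding="utf-8"
    )
    # __init__.py runs on import: it sets __doc__ and re-exports 'area'.
    (pkg / "__init__.py").write_text(
        '"""init_demo: a tiny example package."""\n'
        "from .shapes import area\n",
        encoding="utf-8",
    )

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        init_demo = importlib.import_module("init_demo")
    finally:
        sys.path.remove(tmp)

doc = init_demo.__doc__
result = init_demo.area(3, 4)               # no need to name 'shapes'
defined_in = init_demo.area.__module__      # where the function really lives
print(doc, result, defined_in)
```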

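Executable Files

Every .py module file can be executed by Python. The file may be empty, in which case running it does nothing.

You can open and run the file in an editor, or run it from the command line with python <file path> or python -m <module>. In all of these cases the file is executed directly as the main module.

The __name__ attribute of the main module (the module whose code you are running directly) is always '__main__' (__main__ is a special module initialized directly at interpreter startup). When a file is executed directly, the code under if __name__ == '__main__': therefore runs; when the file is imported into another module, it does not.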

__name__
'__main__'

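An example:

In the current directory there is a package named myfile containing the modules space.py and mycode.py, with the following contents:

# space.py is empty
# Contents of mycode.py, executed here directly from source
_a = '自学'  # "self-study"

def __f():
    print(_a)

msg1 = '我是mycode模块中的代码'  # "I am code in the mycode module"
print(msg1)

if __name__ == '__main__':
    msg2 = '我是导入其他模块不会执行的代码'  # "I am code that does not run on import"
    print(msg2)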
我是mycode模块中的代码
我是导入其他模块不会执行的代码
# Importing does not run the code under __name__ == '__main__'
from myfile import mycode
print(mycode.msg1)
mycode.msg2
我是mycode模块中的代码
我是mycode模块中的代码


---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_21344/86150704.py in <module>
      2 from myfile import mycode
      3 print(mycode.msg1)
----> 4 mycode.msg2


AttributeError: module 'myfile.mycode' has no attribute 'msg2'

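Running the module from the command line has the same effect. The commands below do so (%run is a Jupyter magic command; in a terminal, replace it with python):

# The file path must include .py and may be relative or absolute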
%run myfile/space.py 
%run myfile/mycode.py
我是mycode模块中的代码
我是导入其他模块不会执行的代码
import warnings # suppress warnings
warnings.filterwarnings("ignore")

# With -m, use dotted module notation, without .py
%run -m myfile.mycode 
我是mycode模块中的代码
我是导入其他模块不会执行的代码
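
The two ways of running a file differ in one visible detail, the value of __package__. The sketch below assumes a throwaway package (run_demo and show.py are hypothetical names) and launches it both ways with the current interpreter, printing __name__ and __package__ each time.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run(args, cwd):
    """Run the current interpreter with args and return its stripped stdout."""
    done = subprocess.run([sys.executable, *args], cwd=cwd,
                          capture_output=True, text=True, check=True)
    return done.stdout.strip()

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "run_demo")             # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "show.py").write_text("print(__name__, __package__)\n",
                                 encoding="utf-8")

    # python run_demo/show.py : a file path, with the .py suffix
    by_path = run([str(Path("run_demo", "show.py"))], tmp)
    # python -m run_demo.show : a dotted module name, without .py
    by_module = run(["-m", "run_demo.show"], tmp)

print(by_path)
print(by_module)
```

In both runs __name__ is '__main__'; only the -m form records the containing package in __package__.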

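Import Operations

Imports are performed with the import statement; for the detailed syntax rules, see the documentation of the import statement.

  • import ... can only import modules: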
import random as r, pandas.core as pc
r, pc
(<module 'random' from 'F:\\anaconda\\lib\\random.py'>,
 <module 'pandas.core' from 'F:\\anaconda\\lib\\site-packages\\pandas\\core\\__init__.py'>)
# Importing a method this way raises an error
import random.randint
---------------------------------------------------------------------------

ModuleNotFoundError                       Traceback (most recent call last)

<ipython-input-2-8d4ecd1fe339> in <module>
      1 # Importing a method this way raises an error
----> 2 import random.randint


ModuleNotFoundError: No module named 'random.randint'; 'random' is not a package
  • from ... import ... imports submodules, classes, functions, and so on from a module:
from pandas import core
core
<module 'pandas.core' from 'F:\\anaconda\\lib\\site-packages\\pandas\\core\\__init__.py'>
from pandas import DataFrame as df
df
pandas.core.frame.DataFrame
from random import randint
randint
<bound method Random.randint of <random.Random object at 0x000001E44ED52020>>
from math import pi
pi
3.141592653589793
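
The two forms also differ in which name they bind. The following sketch uses the standard library package json (our choice of example) to show that import json.decoder binds the top-level name json, while from ... import binds the imported object itself. importlib.import_module stands in for the import statement so that the failure can be caught and inspected.

```python
import importlib
import json.decoder                   # binds the top-level name 'json'
from json import decoder as dec       # binds 'dec' to the submodule
from json import loads                # binds the function itself

same_submodule = json.decoder is dec
same_function = loads is json.loads

# 'json.loads' is a function, not a submodule, so it cannot be
# imported as if it were a module.
try:
    importlib.import_module("json.loads")
    error_name = None
except ModuleNotFoundError as exc:
    error_name = type(exc).__name__

print(same_submodule, same_function, error_name)
```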
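  • from ... import * imports all public attributes of a module:

In the current directory there is a package named myfile containing the modules space.py, mycode.py, __init__.py, and others. When __init__.py is empty, the modules inside myfile are not attributes of the package, so from myfile import * does not import them.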

import myfile
dir(myfile)
['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__']
from myfile import *
mycode
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_21108/1508979804.py in <module>
      1 from myfile import *
----> 2 mycode


NameError: name 'mycode' is not defined

If __init__.py defines the __all__ attribute, from myfile import * imports only the names listed in it.

# Modify myfile/__init__.py
__all__ = ['mycode','xue']
from myfile import *
mycode
我是mycode模块中的代码


<module 'myfile.mycode' from 'D:\\Jupyter\\jupyter\\jupyter-python\\15_module\\myfile\\mycode.py'>

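In myfile/mycode.py, names that begin with an underscore are not imported by from myfile.mycode import *.

# myfile/mycode.py
_a = '自学'  # "self-study"

def __f():
    print(_a)

msg1 = '我是mycode模块中的代码'  # "I am code in the mycode module"
print(msg1)

if __name__ == '__main__':
    msg2 = '我是导入其他模块不会执行的代码'  # "I am code that does not run on import"
    print(msg2)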
from myfile.mycode import *
_a
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_9512/1217097939.py in <module>
      1 from myfile.mycode import *
----> 2 _a


NameError: name '_a' is not defined
__f
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_9512/3733874203.py in <module>
----> 1 __f


NameError: name '__f' is not defined

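In general, from ... import * should be avoided: it introduces an unknown set of names into the interpreter, which may well overwrite names you have already defined, and it makes the code much harder to read.

# Underscore-prefixed names can still be imported explicitly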
from myfile.mycode import __f
print(__f)
del __f
<function __f at 0x00000157E4BA53A0>
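
Both rules for from ... import * can be checked side by side. This sketch writes two throwaway modules (star_no_all and star_with_all are hypothetical names), one without __all__ and one whose __all__ lists an underscore-prefixed name, and runs the star import into an empty namespace through exec so that the imported names can be listed.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical modules for the demonstration.
NO_ALL = "_hidden = 1\nvisible = 2\n"
WITH_ALL = "__all__ = ['_hidden']\n_hidden = 1\nvisible = 2\n"

def imported(ns):
    """Names added to ns by the import (exec itself adds __builtins__)."""
    return sorted(name for name in ns if name != "__builtins__")

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "star_no_all.py").write_text(NO_ALL, encoding="utf-8")
    Path(tmp, "star_with_all.py").write_text(WITH_ALL, encoding="utf-8")

    sys.path.insert(0, tmp)
    try:
        ns_no_all, ns_with_all = {}, {}
        # Without __all__: every name not starting with '_' is imported.
        exec("from star_no_all import *", ns_no_all)
        # With __all__: exactly the listed names, underscore or not.
        exec("from star_with_all import *", ns_with_all)
    finally:
        sys.path.remove(tmp)

print(imported(ns_no_all))
print(imported(ns_with_all))
```

When __all__ is present it takes precedence over the underscore convention, which is why _hidden comes through in the second case and visible does not.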
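  • Relative imports:

A module containing relative imports usually cannot be executed directly. When it is run directly, the interpreter treats it as a top-level module, and its __package__ attribute is None.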

print(__package__)
None

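It can, however, be executed with the python -m <module> command.

For example, the relative import in myfile/test.py under the current directory works when run with python -m <module>, because that command sets __package__ to the name of the containing package:

# xue.py is a module in the myfile package; its content is "msg = '自学是门手艺'" ("self-study is a craft")
# Contents of myfile/test.py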
from . import xue
print(xue.msg)
print(__package__)
%run -m myfile.test
自学是门手艺
myfile
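
The failure case and the working case can be compared in one run. The sketch below assumes a throwaway package (rel_demo, data.py, and main.py are hypothetical names) whose main.py uses a relative import, and launches it by file path and then with -m. The subprocess is not asked to raise on failure, because the first run is expected to exit with an error.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run(args, cwd):
    """Run the current interpreter with args; do not raise on failure."""
    return subprocess.run([sys.executable, *args], cwd=cwd,
                          capture_output=True, text=True)

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "rel_demo")             # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "data.py").write_text("msg = 'hello'\n", encoding="utf-8")
    (pkg / "main.py").write_text(
        "from . import data\nprint(data.msg, __package__)\n",
        encoding="utf-8",
    )

    # Run by path: no known parent package, so the relative import fails.
    by_path = run([str(Path("rel_demo", "main.py"))], tmp)
    # Run with -m: __package__ is 'rel_demo', so the import resolves.
    by_module = run(["-m", "rel_demo.main"], tmp)

path_failed = by_path.returncode != 0 and "ImportError" in by_path.stderr
module_output = by_module.stdout.strip()
print(path_failed)
print(module_output)
```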

Once a module has been imported, its attributes can be accessed with attribute notation:

import random, math
random.randint, math.pi
(<bound method Random.randint of <random.Random object at 0x000002811CF98C80>>,
 3.141592653589793)