Pandas DataFrame filter()

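The filter() method in Pandas selects rows or columns of a DataFrame by their labels (exact names, substrings, or regular expression patterns). It does not filter on the values stored in the data.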

Example

import pandas as pd

# create a sample DataFrame
data = {'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}

df = pd.DataFrame(data)

# use filter() to select specific columns by name
selected_columns = df.filter(items=['A', 'C'])

# print the resulting DataFrame
print(selected_columns)

'''
Output

   A  C
0  1  7
1  2  8
2  3  9

'''

filter() Syntax

The syntax of the filter() method in Pandas is:

df.filter(items=None, like=None, regex=None, axis=None)

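filter() Parameters

The filter() method takes the following parameters:

items (optional) - a list of the labels we want to keep
like (optional) - a string; labels that contain it as a substring are kept
regex (optional) - a regular expression pattern; labels that match it are kept
axis (optional) - the axis to filter on (0 or 'index' for rows, 1 or 'columns' for columns); for a DataFrame the default is the columns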
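The following is a minimal sketch of how these parameters behave in practice. It relies on two facts about pandas: items, like, and regex are mutually exclusive (passing more than one raises a TypeError), and an axis argument chooses whether labels are matched against the index or the columns. The DataFrame and its row labels are made up for illustration.

```python
import pandas as pd

# hypothetical data with string row labels
df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6]},
    index=['row_x', 'row_y', 'other'],
)

# default for a DataFrame: match against column labels
cols = df.filter(items=['A'])

# axis=0: match against index (row) labels instead
rows = df.filter(like='row', axis=0)

# combining two selectors is rejected with a TypeError
try:
    df.filter(items=['A'], like='B')
    combined_error = None
except TypeError as exc:
    combined_error = exc

print(cols.columns.tolist())          # ['A']
print(rows.index.tolist())            # ['row_x', 'row_y']
print(type(combined_error).__name__)  # TypeError
```

In short, choose exactly one of items, like, or regex per call, and set axis=0 when the labels of interest are row labels.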

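filter() Return Value

The filter() method returns a new object of the same type as the input, containing only the rows or columns whose labels match the specified criterion (a list of labels, a substring, or a regular expression pattern).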
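A short sketch of what the returned object looks like in a few edge cases may help here. It assumes a small made-up DataFrame, and the label 'Z' is deliberately one that does not exist. The point is that filter() leaves the original DataFrame untouched and quietly skips labels it cannot find, whereas plain bracket indexing raises a KeyError.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# 'Z' is not a column of df; filter() skips it rather than raising
subset = df.filter(items=['A', 'Z'])

# nothing matches: the result keeps the rows but has no columns
empty = df.filter(like='zzz')

# plain indexing with the same labels is strict and raises KeyError
try:
    df[['A', 'Z']]
    strict_error = None
except KeyError as exc:
    strict_error = exc

print(subset.columns.tolist())     # ['A']
print(df.columns.tolist())         # ['A', 'B', 'C'], df itself is unchanged
print(empty.shape)                 # (2, 0)
print(type(strict_error).__name__) # KeyError
```

This tolerance is convenient for optional columns, but it also means a misspelled label produces a smaller result instead of an error.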

Example 1: Select Columns by Name Using the items Parameter

import pandas as pd

# create a dictionary 
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

# create a DataFrame df from data
df = pd.DataFrame(data)

# use filter() to select specific columns ('Name' and 'Age') from df
selected_columns = df.filter(items=['Name', 'Age'])

# display the selected columns
print(selected_columns)

Output

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22

In the above example, we first created the df DataFrame with three columns: Name, Age, and City.

Then, we used the filter() method with the items parameter to select only the Name and Age columns.

Example 2: Select Columns Containing a Substring Using the like Parameter

import pandas as pd

# sample DataFrame
data = {'apple_count': [3, 2, 5],
        'banana_count': [1, 4, 6],
        'orange_count': [4, 3, 2]}

df = pd.DataFrame(data)

# select columns containing the substring "apple"
filtered_columns = df.filter(like='apple')

print(filtered_columns)

Output

   apple_count
0            3
1            2
2            5

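In this example, we used the filter() method with the like parameter to select the columns of the DataFrame whose names contain the substring apple.

The result is stored in the filtered_columns DataFrame. It contains only the apple_count column, because that is the only column name that contains the substring apple.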
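Two details of like are easy to miss: the substring may appear anywhere in the label, and the match is case-sensitive. The sketch below illustrates both with hypothetical column names (Apple_price and pineapple_count are not part of the example above).

```python
import pandas as pd

# hypothetical columns chosen to show substring and case behavior
df = pd.DataFrame({
    'apple_count': [3, 2],
    'Apple_price': [1.5, 2.0],    # capital 'A', so 'apple' is not a substring
    'pineapple_count': [1, 0],    # contains 'apple' in the middle
})

matched = df.filter(like='apple')

print(matched.columns.tolist())  # ['apple_count', 'pineapple_count']
```

When the match should be restricted to the start of the name, or should ignore case, the regex parameter is usually the better fit.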

Example 3: Select Columns Using a Regular Expression Pattern

import pandas as pd

# create a sample DataFrame
data = {'A_column': [1, 2, 3],
        'B_column': [4, 5, 6],
        'C_Column': [7, 8, 9]}
df = pd.DataFrame(data)

# use filter() with a regular expression pattern to select columns
filtered_df = df.filter(regex='^A|C_')

print(filtered_df)

Output

   A_column  C_Column
0         1         7
1         2         8
2         3         9

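Here, we created the df DataFrame with the columns A_column, B_column, and C_Column.

We used the filter() method with the regex argument set to '^A|C_'. The ^ anchor applies only to the first alternative, so the pattern selects columns whose names start with 'A' or contain 'C_' anywhere in the name.

As a result, filtered_df contains only the 'A_column' and 'C_Column' columns.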
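To see why the placement of the ^ anchor matters, the sketch below adds a hypothetical fourth column, B_C_extra, and compares the pattern '^A|C_' with a grouped variant '^(A|C_)'. It assumes, as in current pandas, that the pattern is applied to each label as a search rather than a full match.

```python
import pandas as pd

# 'B_C_extra' is a made-up column that contains 'C_' but does not start with it
df = pd.DataFrame({
    'A_column': [1],
    'B_column': [2],
    'C_Column': [3],
    'B_C_extra': [4],
})

# '^A' OR 'C_' anywhere in the name
loose = df.filter(regex='^A|C_')

# the group puts both alternatives under the anchor: starts with 'A' or 'C_'
strict = df.filter(regex='^(A|C_)')

print(loose.columns.tolist())   # ['A_column', 'C_Column', 'B_C_extra']
print(strict.columns.tolist())  # ['A_column', 'C_Column']
```

With the three original columns both patterns give the same result, which is why the difference only shows up once a name like B_C_extra is present.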

Note: To learn more about regular expressions, visit Python RegEx.
