A Detailed Look at the Content and Structure of the Iris Dataset in sklearn, with load_iris() as a Worked Example of Looking Up a Function

The iris dataset is widely used in machine learning with sklearn. Once sklearn is installed, the iris data sits in the datasets package, as shown in Figure 1. The dataset holds petal and sepal measurements for three types of iris (Setosa, Versicolour, and Virginica), stored in a 150x4 numpy.ndarray. Rows are samples; columns are sepal length, sepal width, petal length, and petal width. This article examines the content and structure of the iris data in depth and uses load_iris as one worked example of how to look up and learn a function; the same approach carries over to other functions.
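1. Location of the iris data
After sklearn is installed, the iris data sits in the datasets package, as shown in Figure 1. For the structure of the iris plant itself, see the author's earlier post on the structure of the iris plant and on installing the scikit-learn package in Python.
Figure 1. Location of the datasets package
2. Loading the iris data
The following Python code loads the iris data: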
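    # 1. Import the datasets package from sklearn
    from sklearn import datasets

    # 2. Load the iris data and assign it to iris
    iris = datasets.load_iris()
    # iris is a dictionary-like object; load_iris() is a function in the
    # datasets package of the sklearn machine learning library.
    # Figures 2 to 5 show how to look up its usage.

2.1 Looking up how to use load_iris()
(1) Step 1: open the Python Module Docs in a browser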
Figure 2. Looking up load_iris, step 1: open the Python Module Docs in a browser
Figure 3. Looking up load_iris, step 2: open the sklearn (package) documentation
Figure 4. Looking up load_iris, step 3: open the datasets (package) page inside the sklearn documentation
Figure 5. Looking up load_iris, step 4: on the datasets page, search further down for the load_iris function
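The documentation reached through the browser steps above can also be read directly from Python. The sketch below is one possible shortcut, not the procedure shown in the figures: it uses the standard-library inspect module to pull the signature and docstring of load_iris. It assumes scikit-learn is installed, and the variable names are illustrative.

```python
import inspect

from sklearn.datasets import load_iris

# The signature lists the keyword-only parameters and their defaults.
signature = inspect.signature(load_iris)
print(signature)

# getdoc() returns the cleaned docstring, the same text the HTML page renders.
doc = inspect.getdoc(load_iris)
first_line = doc.splitlines()[0]
print(first_line)
```

The browser view in Figure 2 appears to be pydoc's module browser, which `python -m pydoc -b` starts from a terminal; that is an inference from the figure title, since the post does not give the command.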
2.2 Usage notes for load_iris
load_iris(*, return_X_y=False, as_frame=False)
Load and return the iris dataset (classification).
The iris dataset is a classic and very easy multi-class classification
dataset.
================= ==============
Classes 3
Samples per class 50
Samples total 150
Dimensionality 4
Features real, positive
================= ==============
Read more in the :ref:`User Guide <iris_dataset>`.
2.2.1 Parameters
return_X_y : bool, default=False
If True, returns (data, target) instead of a Bunch object. See
below for more information about the data and target object.
.. versionadded:: 0.18
as_frame : bool, default=False
If True, the data is a pandas DataFrame including columns with
appropriate dtypes (numeric). The target is
a pandas DataFrame or Series depending on the number of target columns.
If return_X_y is True, then (data, target) will be pandas
DataFrames or Series as described below.
.. versionadded:: 0.23
2.2.2 Returns
data : :class:`~sklearn.utils.Bunch`
Dictionary-like object, with the following attributes.
    data : {ndarray, dataframe} of shape (150, 4)
        The data matrix. If `as_frame=True`, `data` will be a pandas
        DataFrame.
    target: {ndarray, Series} of shape (150,)
        The classification target. If `as_frame=True`, `target` will be
        a pandas Series.
    feature_names: list
        The names of the dataset columns.
    target_names: list
        The names of target classes.
    frame: DataFrame of shape (150, 5)
        Only present when `as_frame=True`. DataFrame with `data` and
        `target`.
        .. versionadded:: 0.23
    DESCR: str
        The full description of the dataset.
    filename: str
        The path to the location of the data.
        .. versionadded:: 0.20
(data, target) : tuple if return_X_y is True
A tuple of two ndarray. The first containing a 2D array of shape
(n_samples, n_features) with each row representing one sample and
each column representing the features. The second ndarray of shape
(n_samples,) containing the target samples.
.. versionadded:: 0.18
2.2.3 Notes
.. versionchanged:: 0.20
    Fixed two wrong data points according to Fisher's paper.
    The new version is the same as in R, but not as in the UCI
    Machine Learning Repository.
2.2.4 Examples
Let’s say you are interested in the samples 10, 25, and 50, and want to
know their class name.
    from sklearn.datasets import load_iris
    data = load_iris()
    data.target[[10, 25, 50]]
    # returns array([0, 0, 1])
    list(data.target_names)
    # returns [np.str_('setosa'), np.str_('versicolor'), np.str_('virginica')]
See :ref:sphx_glr_auto_examples_datasets_plot_iris_dataset.py for a more
detailed example of how to work with the iris dataset.
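The two parameters described in 2.2.1 change what load_iris returns. The following is a minimal sketch, assuming scikit-learn 0.23 or later and pandas are installed; the variable names are illustrative.

```python
from sklearn.datasets import load_iris

# Default call: a dictionary-like Bunch holding NumPy arrays.
bunch = load_iris()
print(type(bunch.data).__name__, bunch.data.shape)

# return_X_y=True: a plain (data, target) tuple instead of a Bunch.
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)

# as_frame=True together with return_X_y=True: pandas objects
# (this call requires pandas to be installed).
X_df, y_series = load_iris(return_X_y=True, as_frame=True)
print(type(X_df).__name__, type(y_series).__name__)
```

The tuple form is convenient when only the feature matrix and labels are needed, for example to pass them straight to an estimator; the Bunch form keeps the names and description alongside the arrays.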
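3. Analyzing the iris data
3.1 Inspecting the data structure
    # 1. Import the datasets package from sklearn
    from sklearn import datasets

    # 2. Load the iris data and assign it to iris
    iris = datasets.load_iris()  # iris is a dictionary-like object

    # The object itself has no shape, so the following line would fail:
    # print("Shape of iris:\n{}".format(iris.shape()))

    # 3. Print all keys of iris
    print("Keys of iris:\n{}".format(iris.keys()))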
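Output:
Figure 6. Inspecting the structure of the iris data
Figure 6 shows that iris is a dictionary-like object with 8 keys. Going further, in PyCharm's Python console (the position circled as 1 in Figure 7), the panel on the left side of Figure 7 shows the structure of iris and the data it contains.
Figure 7. Inspecting the structure of the iris data in the Python console of PyCharm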
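What "dictionary-like" means in practice can be sketched as follows: the same entry is reachable by key, by .get(), and as an attribute. This assumes scikit-learn is installed; "no_such_key" is a made-up key used only to show the fallback behaviour of .get().

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Key access, .get(), and attribute access all return the same object.
same_by_key = iris["data"] is iris.data
same_by_get = iris.get("data") is iris.data
print(same_by_key, same_by_get)

# Iterating over the keys gives a one-line overview of every entry.
for key in iris.keys():
    print(key, type(iris[key]).__name__)

# .get() returns None for a missing key instead of raising KeyError.
missing = iris.get("no_such_key")
print(missing)
```

The exact set of keys depends on the installed scikit-learn version, so the loop is a safer way to explore the object than relying on a fixed list.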
3.2 The parts of the iris data
3.2.1 The DESCR entry
Call:
    print("Values of key 'DESCR' of iris:\n{}".format(iris.get('DESCR')))
Output:
Values of key 'DESCR' of iris:
.. _iris_dataset:
Iris plants dataset
Data Set Characteristics:
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica
:Summary Statistics:
============== ==== ==== ======= ===== ====================
Min Max Mean SD Class Correlation
============== ==== ==== ======= ===== ====================
sepal length: 4.3 7.9 5.84 0.83 0.7826
sepal width: 2.0 4.4 3.05 0.43 -0.4194
petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)
petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)
============== ==== ==== ======= ===== ====================
:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988
The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher’s paper. Note that it’s the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.
This is perhaps the best known database to be found in the
pattern recognition literature. Fisher’s paper is a classic in the field and
is referenced frequently to this day. (See Duda & Hart, for example.) The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant. One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.
.. dropdown:: References
- Fisher, R.A. "The use of multiple measurements in taxonomic problems"
  Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
  Mathematical Statistics" (John Wiley, NY, 1950).
- Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
  (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
- Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
  Structure and Classification Rule for Recognition in Partially Exposed
  Environments". IEEE Transactions on Pattern Analysis and Machine
  Intelligence, Vol. PAMI-2, No. 1, 67-71.
- Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
  on Information Theory, May 1972, 431-433.
- See also: 1988 MLC Proceedings, 54-64. Cheeseman et al"s AUTOCLASS II
  conceptual clustering system finds 3 classes in the data.
- Many, many more ...
3.2.2 The data entry
Call:
    print("Values of key 'data' of iris:\n{}".format(iris.get('data')))
Output:
Values of key ‘data’ of iris:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]
[5.4 3.7 1.5 0.2]
[4.8 3.4 1.6 0.2]
[4.8 3. 1.4 0.1]
[4.3 3. 1.1 0.1]
[5.8 4. 1.2 0.2]
[5.7 4.4 1.5 0.4]
[5.4 3.9 1.3 0.4]
[5.1 3.5 1.4 0.3]
[5.7 3.8 1.7 0.3]
[5.1 3.8 1.5 0.3]
[5.4 3.4 1.7 0.2]
[5.1 3.7 1.5 0.4]
[4.6 3.6 1. 0.2]
[5.1 3.3 1.7 0.5]
[4.8 3.4 1.9 0.2]
[5. 3. 1.6 0.2]
[5. 3.4 1.6 0.4]
[5.2 3.5 1.5 0.2]
[5.2 3.4 1.4 0.2]
[4.7 3.2 1.6 0.2]
[4.8 3.1 1.6 0.2]
[5.4 3.4 1.5 0.4]
[5.2 4.1 1.5 0.1]
[5.5 4.2 1.4 0.2]
[4.9 3.1 1.5 0.2]
[5. 3.2 1.2 0.2]
[5.5 3.5 1.3 0.2]
[4.9 3.6 1.4 0.1]
[4.4 3. 1.3 0.2]
[5.1 3.4 1.5 0.2]
[5. 3.5 1.3 0.3]
[4.5 2.3 1.3 0.3]
[4.4 3.2 1.3 0.2]
[5. 3.5 1.6 0.6]
[5.1 3.8 1.9 0.4]
[4.8 3. 1.4 0.3]
[5.1 3.8 1.6 0.2]
[4.6 3.2 1.4 0.2]
[5.3 3.7 1.5 0.2]
[5. 3.3 1.4 0.2]
[7. 3.2 4.7 1.4]
[6.4 3.2 4.5 1.5]
[6.9 3.1 4.9 1.5]
[5.5 2.3 4. 1.3]
[6.5 2.8 4.6 1.5]
[5.7 2.8 4.5 1.3]
[6.3 3.3 4.7 1.6]
[4.9 2.4 3.3 1. ]
[6.6 2.9 4.6 1.3]
[5.2 2.7 3.9 1.4]
[5. 2. 3.5 1. ]
[5.9 3. 4.2 1.5]
[6. 2.2 4. 1. ]
[6.1 2.9 4.7 1.4]
[5.6 2.9 3.6 1.3]
[6.7 3.1 4.4 1.4]
[5.6 3. 4.5 1.5]
[5.8 2.7 4.1 1. ]
[6.2 2.2 4.5 1.5]
[5.6 2.5 3.9 1.1]
[5.9 3.2 4.8 1.8]
[6.1 2.8 4. 1.3]
[6.3 2.5 4.9 1.5]
[6.1 2.8 4.7 1.2]
[6.4 2.9 4.3 1.3]
[6.6 3. 4.4 1.4]
[6.8 2.8 4.8 1.4]
[6.7 3. 5. 1.7]
[6. 2.9 4.5 1.5]
[5.7 2.6 3.5 1. ]
[5.5 2.4 3.8 1.1]
[5.5 2.4 3.7 1. ]
[5.8 2.7 3.9 1.2]
[6. 2.7 5.1 1.6]
[5.4 3. 4.5 1.5]
[6. 3.4 4.5 1.6]
[6.7 3.1 4.7 1.5]
[6.3 2.3 4.4 1.3]
[5.6 3. 4.1 1.3]
[5.5 2.5 4. 1.3]
[5.5 2.6 4.4 1.2]
[6.1 3. 4.6 1.4]
[5.8 2.6 4. 1.2]
[5. 2.3 3.3 1. ]
[5.6 2.7 4.2 1.3]
[5.7 3. 4.2 1.2]
[5.7 2.9 4.2 1.3]
[6.2 2.9 4.3 1.3]
[5.1 2.5 3. 1.1]
[5.7 2.8 4.1 1.3]
[6.3 3.3 6. 2.5]
[5.8 2.7 5.1 1.9]
[7.1 3. 5.9 2.1]
[6.3 2.9 5.6 1.8]
[6.5 3. 5.8 2.2]
[7.6 3. 6.6 2.1]
[4.9 2.5 4.5 1.7]
[7.3 2.9 6.3 1.8]
[6.7 2.5 5.8 1.8]
[7.2 3.6 6.1 2.5]
[6.5 3.2 5.1 2. ]
[6.4 2.7 5.3 1.9]
[6.8 3. 5.5 2.1]
[5.7 2.5 5. 2. ]
[5.8 2.8 5.1 2.4]
[6.4 3.2 5.3 2.3]
[6.5 3. 5.5 1.8]
[7.7 3.8 6.7 2.2]
[7.7 2.6 6.9 2.3]
[6. 2.2 5. 1.5]
[6.9 3.2 5.7 2.3]
[5.6 2.8 4.9 2. ]
[7.7 2.8 6.7 2. ]
[6.3 2.7 4.9 1.8]
[6.7 3.3 5.7 2.1]
[7.2 3.2 6. 1.8]
[6.2 2.8 4.8 1.8]
[6.1 3. 4.9 1.8]
[6.4 2.8 5.6 2.1]
[7.2 3. 5.8 1.6]
[7.4 2.8 6.1 1.9]
[7.9 3.8 6.4 2. ]
[6.4 2.8 5.6 2.2]
[6.3 2.8 5.1 1.5]
[6.1 2.6 5.6 1.4]
[7.7 3. 6.1 2.3]
[6.3 3.4 5.6 2.4]
[6.4 3.1 5.5 1.8]
[6. 3. 4.8 1.8]
[6.9 3.1 5.4 2.1]
[6.7 3.1 5.6 2.4]
[6.9 3.1 5.1 2.3]
[5.8 2.7 5.1 1.9]
[6.8 3.2 5.9 2.3]
[6.7 3.3 5.7 2.5]
[6.7 3. 5.2 2.3]
[6.3 2.5 5. 1.9]
[6.5 3. 5.2 2. ]
[6.2 3.4 5.4 2.3]
[5.9 3. 5.1 1.8]]
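The full printout is long, so a compact summary is often easier to read. The sketch below prints the shape and per-column statistics of the data array; the results can be compared by eye against the Summary Statistics table in DESCR. It assumes scikit-learn is installed, and the choice of the sample standard deviation (ddof=1) is an assumption about how the SD column was computed.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

# 150 samples (rows) by 4 features (columns), stored as floats.
print(X.shape, X.dtype)

# Per-feature statistics, one value per column.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
col_mean = X.mean(axis=0)
col_std = X.std(axis=0, ddof=1)  # assumption: sample standard deviation

for name, lo, hi, mean, std in zip(iris.feature_names, col_min, col_max,
                                   col_mean, col_std):
    print(f"{name:20s} min={lo:.1f} max={hi:.1f} mean={mean:.2f} sd={std:.2f}")
```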
3.2.3 The target entry
Call:
    print("Values of key 'target' of iris:\n{}".format(iris.get('target')))
Output:
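The target array holds one integer label per sample. A short sketch, assuming scikit-learn and NumPy are installed, that counts the samples per class and checks that the labels are stored in sorted blocks:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
y = iris.target

# Count how many samples carry each integer label.
counts = np.bincount(y)
print(counts)

# The labels are stored in blocks: class 0 first, then 1, then 2.
is_sorted = bool(np.all(np.diff(y) >= 0))
print(is_sorted)
```

Because the rows are ordered by class, taking the first rows as a subset would leave out whole classes, so the data is usually shuffled before it is split.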
3.2.4 The target_names entry
Call:
    print("Values of key 'target_names' of iris:\n{}".format(iris.get('target_names')))
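The integer labels in target index into target_names, which finishes the docstring example from 2.2.4 that asks for the class names of samples 10, 25, and 50. A minimal sketch, assuming scikit-learn is installed; the variable names are illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# target_names is a NumPy array, so it can be indexed by an array of labels.
picked = [10, 25, 50]
labels = iris.target[picked]
names = iris.target_names[labels]
print(labels.tolist(), names.tolist())
```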
3.2.5 The feature_names entry
Call:
    print("Values of key 'feature_names' of iris:\n{}".format(iris.get('feature_names')))
Output:
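Each entry of feature_names labels the column of data at the same position. The sketch below, assuming scikit-learn is installed, pairs the names with the first sample and looks up a column by name instead of by a hard-coded index.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# feature_names[i] is the label of column i in iris.data.
first_sample = dict(zip(iris.feature_names, iris.data[0].tolist()))
print(first_sample)

# Look up a column index by name rather than hard-coding it.
petal_width_col = iris.feature_names.index("petal width (cm)")
print(iris.data[:3, petal_width_col])
```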
3.2.6 The filename entry
Call:
    print("Values of key 'filename' of iris:\n{}".format(iris.get('filename')))
Output:
3.2.7 The frame entry
Call:
    print("Values of key 'frame' of iris:\n{}".format(iris.get('frame')))
Output:
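The frame entry depends on the as_frame parameter. A minimal sketch, assuming scikit-learn 0.23 or later and pandas are installed, that compares the default call with as_frame=True:

```python
from sklearn.datasets import load_iris

# With the default as_frame=False, the frame entry is None.
default_frame = load_iris().frame
print(default_frame)

# With as_frame=True, frame holds the four features plus a target column.
frame = load_iris(as_frame=True).frame
print(frame.shape)
print(frame.head(3))
```

Having features and labels in one DataFrame makes it easy to use pandas tools such as groupby on the target column.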
3.2.8 The data_module entry
Call:
    print("Values of key 'data_module' of iris:\n{}".format(iris.get('data_module')))
Output:
4. Plotting the iris data
4.1 Scatter plot of the sepal data
Code:
    # 1. Import the datasets package from sklearn
    from sklearn import datasets

    # 2. Load the iris data and assign it to iris
    iris = datasets.load_iris()  # iris is a dictionary-like object

    # 3. Print all keys of iris
    print("Keys of iris:\n{}".format(iris.keys()))
    # Output:
    # Keys of iris:
    # dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

    # 4. Print the value stored under each key
    print("Values of key 'DESCR' of iris:\n{}".format(iris.get('DESCR')))
    print("Values of key 'data' of iris:\n{}".format(iris.get('data')))
    print("Values of key 'target' of iris:\n{}".format(iris.get('target')))
    print("Values of key 'target_names' of iris:\n{}".format(iris.get('target_names')))
    print("Values of key 'feature_names' of iris:\n{}".format(iris.get('feature_names')))
    print("Values of key 'filename' of iris:\n{}".format(iris.get('filename')))
    print("Values of key 'data_module' of iris:\n{}".format(iris.get('data_module')))
    print("Values of key 'frame' of iris:\n{}".format(iris.get('frame')))

    # 5. Draw the sepal scatter plot
    import matplotlib.pyplot as plt  # plt is the usual short alias for matplotlib.pyplot
    fig1, ax1 = plt.subplots()  # unpack the figure and axes returned by plt.subplots()
    scatter1 = ax1.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
    ax1.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])

    # 6. Add the legend
    _ = ax1.legend(scatter1.legend_elements()[0], iris.target_names,
                   loc="lower right", title="Classes")
    plt.show()  # display the figure
Output:
Figure 8. Scatter plot of sepal length and width
Figure 9. Data printed while the script runs
4.2 Scatter plot of the petal data
Code:
    # 1. Import the datasets package from sklearn
    from sklearn import datasets

    # 2. Load the iris data and assign it to iris
    iris = datasets.load_iris()  # iris is a dictionary-like object

    # 3. Print all keys of iris
    print("Keys of iris:\n{}".format(iris.keys()))
    # Output:
    # Keys of iris:
    # dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

    # 4. Print the value stored under each key
    print("Values of key 'DESCR' of iris:\n{}".format(iris.get('DESCR')))
    print("Values of key 'data' of iris:\n{}".format(iris.get('data')))
    print("Values of key 'target' of iris:\n{}".format(iris.get('target')))
    print("Values of key 'target_names' of iris:\n{}".format(iris.get('target_names')))
    print("Values of key 'feature_names' of iris:\n{}".format(iris.get('feature_names')))
    print("Values of key 'filename' of iris:\n{}".format(iris.get('filename')))
    print("Values of key 'data_module' of iris:\n{}".format(iris.get('data_module')))
    print("Values of key 'frame' of iris:\n{}".format(iris.get('frame')))

    # 5. Draw the petal scatter plot
    import matplotlib.pyplot as plt  # plt is the usual short alias for matplotlib.pyplot
    fig2, ax2 = plt.subplots()  # unpack the figure and axes returned by plt.subplots()
    scatter2 = ax2.scatter(iris.data[:, 2], iris.data[:, 3], c=iris.target)
    ax2.set(xlabel=iris.feature_names[2], ylabel=iris.feature_names[3])

    # 6. Add the legend
    _ = ax2.legend(scatter2.legend_elements()[0], iris.target_names,
                   loc="lower right", title="Classes")
    plt.show()  # display the figure
Output:
Figure 10. Scatter plot of petal length and width
Figure 11. Data printed while the script runs
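The two scripts in 4.1 and 4.2 differ only in which columns they plot, so both scatter plots can be drawn in one figure. The sketch below is one way to do that, not the author's script: it assumes matplotlib and scikit-learn are installed, selects the off-screen Agg backend so no window is needed, and writes to iris_scatter.png, a file name chosen here for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen instead of opening a window
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()

# (x column, y column) pairs: sepal length/width, then petal length/width.
column_pairs = [(0, 1), (2, 3)]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (x_col, y_col) in zip(axes, column_pairs):
    scatter = ax.scatter(iris.data[:, x_col], iris.data[:, y_col], c=iris.target)
    ax.set(xlabel=iris.feature_names[x_col], ylabel=iris.feature_names[y_col])
    ax.legend(scatter.legend_elements()[0], list(iris.target_names),
              loc="lower right", title="Classes")

fig.tight_layout()
fig.savefig("iris_scatter.png")  # hypothetical output file name
```

Placing the panels side by side makes it easier to compare how well the sepal and petal measurements separate the three classes.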
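5. Summary
The iris dataset is widely used in machine learning with sklearn. This article examined the content and structure of the iris data in depth and used load_iris as one worked example of how to look up and learn a function. The same approach carries over to other functions, and it is a solid second step toward mastering and intuitively understanding classification problems.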
Source URL: https://www.huajiangbk.com/newsview1548663.html