
AlexNet

Source: 花匠小妙招 · 2025-08-14 11:39

I. The AlexNet Network Structure

(The original post shows the AlexNet architecture diagram here.)

II. model.py

Notes on building the network

1. nn.Sequential()

Unlike demo1, nn.Sequential() is used here to package a series of layers into a single module, which mainly simplifies the code.
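As a quick illustration (the layer sizes below are only for the example), a block packaged with nn.Sequential runs its layers in the listed order and behaves like a single module:

```python
import torch
import torch.nn as nn

# Package a conv block into one module with nn.Sequential
block = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.randn(1, 3, 224, 224)   # dummy batch: [N, C, H, W]
y = block(x)                      # conv -> relu -> pool, in order
print(y.shape)                    # torch.Size([1, 48, 27, 27])
```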

2. The padding parameter of PyTorch's Conv2d()

padding = 1 pads one row/column of zeros on all four sides.

padding = (2, 1) pads two rows of zeros on the top and bottom, and one column of zeros on the left and right.

For finer control, use nn.ZeroPad2d(), whose argument order is (left, right, top, bottom): nn.ZeroPad2d((1, 2, 1, 2)) pads one column on the left, two on the right, one row on the top, and two on the bottom.

Note that when the convolution is applied after padding, if the size does not divide evenly, PyTorch floors the result, discarding the rightmost columns and bottom rows.
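A small sketch of both padding styles on a toy 5×5 input, checking the resulting shapes:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)

# padding=(2, 1): two rows of zeros top and bottom, one column left and right
conv = nn.Conv2d(1, 1, kernel_size=3, padding=(2, 1))
print(conv(x).shape)   # torch.Size([1, 1, 7, 5])

# nn.ZeroPad2d takes (left, right, top, bottom) for asymmetric padding
pad = nn.ZeroPad2d((1, 2, 1, 2))
print(pad(x).shape)    # torch.Size([1, 1, 8, 8])
```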

3. nn.ReLU(inplace=True)

With inplace=True, the output overwrites the input tensor's memory instead of allocating a new tensor, which reduces memory use.
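A minimal demonstration that the in-place version modifies the input tensor itself rather than allocating a new one:

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 2.0])
y = nn.ReLU(inplace=True)(x)
print(x)       # tensor([0., 2.]) — the input tensor itself was overwritten
print(y is x)  # True: no new tensor was allocated
```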

4. nn.Dropout()

Randomly zeroes activations to reduce the effective number of parameters and prevent overfitting; the argument is the probability of dropping each element.
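A short sketch of the behavior: in training mode roughly p of the elements are zeroed (survivors are rescaled by 1/(1-p)), while in eval mode Dropout is a no-op:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(0.5)   # each element is zeroed with probability 0.5

x = torch.ones(10)
drop.train()             # training mode: dropout active
out = drop(x)
print(out)               # surviving entries are scaled by 1/(1-p) = 2.0

drop.eval()              # eval mode: dropout does nothing
print(drop(x))           # the input comes back unchanged
```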

5. isinstance()

Checks whether an object is an instance of a given type.

The full model.py:

```python
import torch.nn as nn
import torch


class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # [3, 224, 224] -> [48, 55, 55]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # -> [48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),           # -> [128, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # -> [128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),          # -> [192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),          # -> [192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),          # -> [128, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # -> [128, 6, 6]
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(128 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
```

III. train.py

1. device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Trains on the GPU if one is available, otherwise falls back to the CPU.

2. dataset = datasets.ImageFolder(root, transform)

root is the image root directory and transform the preprocessing pipeline. The returned dataset exposes three useful attributes:

1) dataset.classes: a list of the class names

2) dataset.class_to_idx: a dict mapping each class name to its index

3) dataset.imgs: a list of (image_path, class_index) tuples

```python
print(dataset.classes)
print(dataset.class_to_idx)
print(dataset.imgs)
```

Output:

```
['cat', 'dog']
{'cat': 0, 'dog': 1}
[('./data/train/cat/1.jpg', 0),
 ('./data/train/cat/2.jpg', 0),
 ('./data/train/dog/1.jpg', 1),
 ('./data/train/dog/2.jpg', 1)]
```


3. json.dumps()

```python
json_str = json.dumps(cla_list, indent=4)
with open('class_indices.json', 'w') as json_file:
    json_file.write(json_str)
```

Encodes the dict as JSON and saves it to disk; indent=4 indents each entry by four spaces.
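A round-trip sketch (the class names here are made up for the example). It also shows why predict.py later looks the class up with a string key:

```python
import json

# illustrative index -> class-name mapping
cla_list = {0: 'daisy', 1: 'rose'}
json_str = json.dumps(cla_list, indent=4)
print(json_str)

# JSON object keys are always strings, which is why predict.py
# uses class_indict[str(predict_cla)] rather than class_indict[predict_cla]
restored = json.loads(json_str)
print(restored['0'])   # 'daisy'
```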

4. torchvision.utils.make_grid(images, padding=0)

Tiles a batch of images into a single image; padding controls the width of the border between tiles.

(The original post shows the resulting grids for padding = 0 and padding = 5.)

5. Managing dropout with net.train() and net.eval()

PyTorch modules have two modes, train() and eval(), used for training and validation respectively. For most layers the two modes behave identically; they only differ when the model contains dropout or batch-norm layers.

During training, if the model contains BN or dropout, call net.train() to enable batch normalization and dropout.

model.train() ensures the BN layers use each batch's own mean and variance; for Dropout, model.train() randomly selects a subset of connections to train and update.
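A minimal sketch of the mode flag: train()/eval() simply toggle a `training` attribute, and the call propagates to every submodule:

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5))
net.train()
print(net.training)     # True
net.eval()
print(net[1].training)  # False: eval() propagates to the Dropout submodule
```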

6. Progress bar

```python
rate = (step + 1) / len(train_loader)
a = "*" * int(rate * 50)
b = "." * int((1 - rate) * 50)
print("\rtrain loss: {:^3.0f}%[{}->{}] {:.3f}".format(int(rate * 100), a, b, loss), end="")
```

Displays a training progress bar.

"\r" is a carriage return: it moves the cursor back to the start of the line, so each print overwrites the previous one (together with end="") instead of starting a new line.

The full train.py:

```python
import torch
import torch.nn as nn
from torchvision import transforms, datasets, utils
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
from model import AlexNet
import os
import time
import json

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
    "val": transforms.Compose([transforms.Resize((224, 224)),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))
image_root = data_root + "/data_set/flower_data"
train_dataset = datasets.ImageFolder(root=image_root + "/train",
                                     transform=data_transform["train"])
train_num = len(train_dataset)

# map index -> class name and save it for predict.py
flower_list = train_dataset.class_to_idx
cla_list = dict((val, key) for key, val in flower_list.items())
json_str = json.dumps(cla_list, indent=4)
with open('class_indices.json', 'w') as json_file:
    json_file.write(json_str)

batch = 32
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch, shuffle=True, num_workers=0)

validate_dataset = datasets.ImageFolder(root=image_root + "/val",
                                        transform=data_transform["val"])
val_num = len(validate_dataset)
val_loader = torch.utils.data.DataLoader(validate_dataset, batch_size=batch, shuffle=False, num_workers=0)

test_data_iter = iter(val_loader)
test_image, test_label = next(test_data_iter)

net = AlexNet(num_classes=5, init_weights=True)
net.to(device)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0002)

save_path = './AlexNet_gpu.pth'
best_acc = 0
for epoch in range(10):
    # train
    net.train()
    running_loss = 0
    t1 = time.perf_counter()
    for step, data in enumerate(train_loader, start=0):
        images, labels = data
        optimizer.zero_grad()
        output = net(images.to(device))
        loss = loss_function(output, labels.to(device))
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        rate = (step + 1) / len(train_loader)
        a = "*" * int(rate * 50)
        b = "." * int((1 - rate) * 50)
        print("\rtrain loss: {:^3.0f}%[{}->{}] {:.3f}".format(int(rate * 100), a, b, loss), end="")
    print()
    print(time.perf_counter() - t1)

    # validate
    net.eval()
    acc = 0.0
    with torch.no_grad():
        for data_set in val_loader:
            test_images, test_labels = data_set
            outputs = net(test_images.to(device))
            predict_y = torch.max(outputs, dim=1)[1]
            acc += (predict_y == test_labels.to(device)).sum().item()
        accurate = acc / val_num
        if accurate > best_acc:
            best_acc = accurate
            torch.save(net.state_dict(), save_path)
        print('[epoch %d] train_loss: %.3f  test_accuracy: %.3f' %
              (epoch + 1, running_loss / step, acc / val_num))

print("Finished Training")
```

IV. predict.py

1. torch.unsqueeze()

Adds a dimension: the network expects a 4-D input [batch, channel, height, width], so a single [C, H, W] image needs an extra batch dimension.
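A one-line shape check (random data stands in for a real image):

```python
import torch

im = torch.rand(3, 224, 224)       # a single image: [C, H, W]
im = torch.unsqueeze(im, dim=0)    # add a batch dimension: [N, C, H, W]
print(im.shape)                    # torch.Size([1, 3, 224, 224])
```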

2. Inference with torch.no_grad()

```python
with torch.no_grad():
    output = torch.squeeze(net(im))
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).numpy()
    print(class_indict[str(predict_cla)], predict[predict_cla].item())
```

This code:

1) disables gradient computation with torch.no_grad()

2) removes the batch dimension with torch.squeeze

3) converts the network output to probabilities with softmax

4) takes the index of the largest probability with argmax and converts it to a numpy scalar

5) prints the corresponding class name from the dict together with the predicted probability; .item() extracts the Python scalar
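The same steps can be traced on a dummy 5-class logit vector standing in for net(im):

```python
import torch

# stand-in for net(im): logits for one image over 5 classes
raw = torch.tensor([[1.0, 3.0, 0.5, 2.0, -1.0]])   # shape [1, 5]
output = torch.squeeze(raw)                        # -> shape [5]
predict = torch.softmax(output, dim=0)             # probabilities summing to 1
predict_cla = torch.argmax(predict).numpy()        # index of the top class
print(int(predict_cla))                            # 1
print(round(predict[predict_cla].item(), 3))       # its predicted probability
```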

The full predict.py:

```python
import torch
import torchvision.transforms as transforms
from PIL import Image
from model import AlexNet
import matplotlib.pyplot as plt
import json
import os

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

transform = transforms.Compose([transforms.Resize((224, 224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

im = Image.open('test.jpg')
plt.imshow(im)
im = transform(im)                 # [C, H, W]
im = torch.unsqueeze(im, dim=0)    # [N, C, H, W]

try:
    json_file = open('./class_indices.json', 'r')
    class_indict = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)

net = AlexNet(num_classes=5)
net.load_state_dict(torch.load('AlexNet_gpu.pth'))
net.eval()
with torch.no_grad():
    output = torch.squeeze(net(im))
    predict = torch.softmax(output, dim=0)
    predict_cla = torch.argmax(predict).numpy()
    print(class_indict[str(predict_cla)], predict[predict_cla].item())
plt.show()
```

Video tutorial from the original author:

3.2 使用pytorch搭建AlexNet并训练花分类数据集_哔哩哔哩_bilibili


URL: https://www.huajiangbk.com/newsview2259259.html
