# Hyperparameter Tuning

Estimated time: 25 minutes

This tutorial shows how to tune model hyperparameters systematically to get the best performance.
## Key Hyperparameters

### 1. Learning Rate

The most important hyperparameter: it affects both training speed and final performance.

Recommended ranges:

- Adam: 1e-4 to 1e-3
- SGD: 1e-2 to 1e-1
- AdamW: 1e-5 to 1e-3
Tuning strategies:

```python
# Method 1: learning rate range test
learning_rates = [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3]
for lr in learning_rates:
    config.training.optimizer.lr = lr
    # train for a few epochs and watch the loss
```

```yaml
# Method 2: use a learning rate scheduler
training:
  optimizer:
    name: "adam"
    lr: 0.001
  scheduler:
    name: "cosine"
    warmup_epochs: 5
    min_lr: 1e-6
```
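If you want the same shape of schedule outside the framework config, here is a minimal PyTorch sketch of warmup followed by cosine decay; the `LinearLR`/`CosineAnnealingLR`/`SequentialLR` combination and the 50-epoch budget are illustrative assumptions, not med_core API:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

model = torch.nn.Linear(16, 2)  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# 5 warmup epochs ramping from 10% to 100% of the base lr,
# then cosine decay over the remaining 45 epochs down to min_lr
warmup = LinearLR(optimizer, start_factor=0.1, total_iters=5)
cosine = CosineAnnealingLR(optimizer, T_max=45, eta_min=1e-6)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[5])

for epoch in range(50):
    # ... train one epoch ...
    scheduler.step()  # step once per epoch
```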
### 2. Batch Size

Affects training stability and memory usage.

Recommended ranges:

- Small datasets (<5k samples): 16-32
- Medium datasets (5k-50k): 32-64
- Large datasets (>50k): 64-128

Trade-offs:

- Larger batches: more stable gradients, but may generalize worse
- Smaller batches: often better generalization, but noisier training
```yaml
data:
  batch_size: 32

# If you run out of memory, use gradient accumulation
training:
  gradient_accumulation_steps: 2  # effective batch size = 32 * 2 = 64
```
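For intuition, gradient accumulation amounts to delaying the optimizer step. A minimal sketch in plain PyTorch, where `train_loader` is a placeholder for your own data loader:

```python
import torch

model = torch.nn.Linear(16, 2)  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()
accumulation_steps = 2  # effective batch = batch_size * accumulation_steps

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(train_loader):
    loss = criterion(model(inputs), targets)
    # Scale so the accumulated gradient matches a large-batch average
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()        # one update per accumulation_steps mini-batches
        optimizer.zero_grad()
```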
### 3. Choosing an Optimizer

Adam (recommended):

```yaml
training:
  optimizer:
    name: "adam"
    lr: 0.001
    betas: [0.9, 0.999]
    weight_decay: 0.0001
```

AdamW (better regularization):
```yaml
training:
  optimizer:
    name: "adamw"
    lr: 0.0001
    weight_decay: 0.01  # stronger regularization
```

SGD (needs more tuning):
```yaml
training:
  optimizer:
    name: "sgd"
    lr: 0.01
    momentum: 0.9
    weight_decay: 0.0001
```

### 4. Data Augmentation
Basic augmentation:

```yaml
data:
  augmentation:
    random_flip: true
    random_rotation: 15
    random_scale: [0.9, 1.1]
    color_jitter: 0.2
```

Strong augmentation (large datasets):
```yaml
data:
  augmentation:
    random_flip: true
    random_rotation: 30
    random_scale: [0.8, 1.2]
    color_jitter: 0.4
    random_erasing: 0.2
    mixup_alpha: 0.2
```

Weak augmentation (small datasets):
```yaml
data:
  augmentation:
    random_flip: true
    random_rotation: 10
    normalize: true
```
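The `mixup_alpha` option in the strong preset refers to standard mixup (Zhang et al., 2018): each batch is blended with a shuffled copy of itself, and the loss is interpolated the same way. A minimal sketch in plain PyTorch, independent of the framework's pipeline:

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=0.2):
    """Blend each sample with a randomly chosen partner from the same batch."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam

# Usage inside a training step:
# mixed_x, y_a, y_b, lam = mixup_batch(inputs, targets, alpha=0.2)
# logits = model(mixed_x)
# loss = lam * criterion(logits, y_a) + (1 - lam) * criterion(logits, y_b)
```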
### 5. Regularization

Dropout:

```yaml
model:
  fusion:
    dropout: 0.3  # 0.1-0.5
  head:
    dropout: 0.5  # classification heads usually use higher dropout
```

Weight decay:
```yaml
training:
  optimizer:
    weight_decay: 0.0001  # 1e-5 to 1e-3
```

Label smoothing:
```yaml
training:
  label_smoothing: 0.1  # 0.0-0.2
```
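In plain PyTorch, label smoothing is built into the cross-entropy loss, so the config above corresponds (assuming a standard classification head) to roughly:

```python
import torch

# With smoothing 0.1 and K classes, the true class keeps probability
# 1 - 0.1 + 0.1 / K and the remaining mass is spread uniformly
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(4, 3)            # batch of 4, 3 classes
targets = torch.tensor([0, 2, 1, 0])
loss = criterion(logits, targets)
```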
## A Systematic Tuning Workflow

### Phase 1: Coarse Tuning

Goal: quickly find a reasonable range.

```python
# 1. Fix everything else and tune only the learning rate
learning_rates = [1e-5, 1e-4, 1e-3, 1e-2]
epochs = 10  # short runs for quick feedback

# 2. With the best learning rate fixed, tune the batch size
batch_sizes = [16, 32, 64]

# 3. Pick the optimizer
optimizers = ["adam", "adamw", "sgd"]
```

### Phase 2: Fine Tuning
Goal: search precisely within the best range.

```python
# Search around the best learning rate
best_lr = 1e-3
learning_rates = [best_lr * 0.5, best_lr, best_lr * 2]

# Adjust regularization
weight_decays = [1e-5, 1e-4, 1e-3]
dropouts = [0.1, 0.3, 0.5]
```

### Phase 3: Micro Tuning
Goal: squeeze out the last bit of performance.

```python
# Learning rate schedulers
schedulers = ["cosine", "step", "exponential"]

# Augmentation strength
augmentation_strengths = ["weak", "medium", "strong"]
```

## Automated Tuning with Optuna
### Install Optuna

```bash
uv pip install optuna
```

### Basic example
```python
import optuna
from med_core.models import MultiModalModelBuilder
from med_core.trainers import MultimodalTrainer
# assumes load_config, build_model_from_config, and the data loaders
# are available from the framework / surrounding script

def objective(trial):
    # Define the search space
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    dropout = trial.suggest_float("dropout", 0.1, 0.5)
    weight_decay = trial.suggest_float("weight_decay", 1e-5, 1e-3, log=True)

    # Build the configuration
    config = load_config("configs/starter/quickstart.yaml")
    config.training.optimizer.lr = lr
    config.data.batch_size = batch_size
    config.model.fusion.dropout = dropout
    config.training.optimizer.weight_decay = weight_decay

    # Train the model
    model = build_model_from_config(config)
    trainer = MultimodalTrainer(model, train_loader, val_loader, config)
    history = trainer.train()

    # Return the validation metric to maximize
    return history['val_auc'][-1]

# Create a study
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

# Inspect the best parameters
print("Best params:", study.best_params)
print("Best AUC:", study.best_value)
```
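Optuna can also stop unpromising trials early, much like the ASHA scheduler in the Ray Tune section below. A sketch using a median pruner; the per-epoch `trainer.train_epoch()` call is an assumed interface, not necessarily what `MultimodalTrainer` exposes:

```python
import optuna

def objective(trial):
    # ... build config, model, and trainer as in the basic example ...
    val_auc = 0.0
    for epoch in range(50):
        metrics = trainer.train_epoch()  # assumed per-epoch API
        val_auc = metrics["val_auc"]
        trial.report(val_auc, epoch)     # stream the metric to Optuna
        if trial.should_prune():         # the pruner decides here
            raise optuna.TrialPruned()
    return val_auc

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=10),
)
study.optimize(objective, n_trials=50)
```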
### Advanced search spaces

```python
def objective(trial):
    # Conditional search: SGD needs a different lr range and a momentum term
    optimizer_name = trial.suggest_categorical("optimizer", ["adam", "adamw", "sgd"])
    if optimizer_name == "sgd":
        lr = trial.suggest_float("lr", 1e-3, 1e-1, log=True)
        momentum = trial.suggest_float("momentum", 0.8, 0.99)
    else:
        lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
        momentum = None

    # Backbone choice
    backbone = trial.suggest_categorical(
        "backbone", ["resnet18", "resnet50", "efficientnet_b0"])

    # Fusion strategy
    fusion_type = trial.suggest_categorical(
        "fusion_type", ["concat", "gated", "attention"])

    # Augmentation strength
    aug_strength = trial.suggest_float("aug_strength", 0.0, 1.0)

    # ... build and train the model ...
    return val_auc
```

### Visualizing results
```python
# Optimization history
optuna.visualization.plot_optimization_history(study)

# Parameter importances
optuna.visualization.plot_param_importances(study)

# Parallel coordinates
optuna.visualization.plot_parallel_coordinate(study)

# Slice plot
optuna.visualization.plot_slice(study)
```

## Using Ray Tune
### Install Ray Tune

```bash
uv pip install "ray[tune]"
```

### Basic example
```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler
# assumes build_model_from_config, MultimodalTrainer, and the data loaders
# are in scope, as in the Optuna example

def train_model(config):
    # Train the model with this trial's config
    model = build_model_from_config(config)
    trainer = MultimodalTrainer(model, train_loader, val_loader, config)
    for epoch in range(config["epochs"]):
        metrics = trainer.train_epoch()
        # Report metrics back to Ray Tune
        tune.report(
            loss=metrics["val_loss"],
            accuracy=metrics["val_acc"],
            auc=metrics["val_auc"],
        )

# Define the search space
config = {
    "lr": tune.loguniform(1e-5, 1e-2),
    "batch_size": tune.choice([16, 32, 64]),
    "dropout": tune.uniform(0.1, 0.5),
    "weight_decay": tune.loguniform(1e-5, 1e-3),
    "epochs": 50,  # fixed, not searched; matches max_t below
}

# Use the ASHA scheduler (early-stops poorly performing trials)
scheduler = ASHAScheduler(
    metric="auc",
    mode="max",
    max_t=50,
    grace_period=10,
    reduction_factor=2,
)

# Run the tuning job
analysis = tune.run(
    train_model,
    config=config,
    num_samples=50,
    scheduler=scheduler,
    resources_per_trial={"cpu": 4, "gpu": 1},
)

# Get the best config
best_config = analysis.get_best_config(metric="auc", mode="max")
print("Best config:", best_config)
```

## Grid Search vs. Random Search
### Grid search

Best when the parameter space is small and you want exhaustive coverage.
```python
from itertools import product
# assumes a loaded `config` and a train_and_evaluate(config) -> AUC helper

# Define the parameter grid
param_grid = {
    'lr': [1e-4, 1e-3, 1e-2],
    'batch_size': [16, 32, 64],
    'dropout': [0.1, 0.3, 0.5],
}

# Evaluate every combination (3 * 3 * 3 = 27 runs here)
results = []
for lr, bs, dropout in product(*param_grid.values()):
    config.training.optimizer.lr = lr
    config.data.batch_size = bs
    config.model.fusion.dropout = dropout
    # Train and record the result
    auc = train_and_evaluate(config)
    results.append({
        'lr': lr,
        'batch_size': bs,
        'dropout': dropout,
        'auc': auc,
    })

# Find the best combination
best = max(results, key=lambda x: x['auc'])
```

### Random search
Best when the parameter space is large and compute is limited.

```python
import numpy as np

# Randomly sample the search space
n_trials = 50
results = []
for _ in range(n_trials):
    # Sample parameters (log-uniform for lr and weight decay)
    lr = 10 ** np.random.uniform(-5, -2)
    batch_size = np.random.choice([16, 32, 64])
    dropout = np.random.uniform(0.1, 0.5)
    weight_decay = 10 ** np.random.uniform(-5, -3)

    # Train and record
    config.training.optimizer.lr = lr
    config.data.batch_size = batch_size
    config.model.fusion.dropout = dropout
    config.training.optimizer.weight_decay = weight_decay
    auc = train_and_evaluate(config)
    results.append({
        'lr': lr,
        'batch_size': batch_size,
        'dropout': dropout,
        'weight_decay': weight_decay,
        'auc': auc,
    })
```

## Learning Rate Finder
### Implementing a learning rate range test
```python
import numpy as np
import torch
import matplotlib.pyplot as plt

def find_lr(model, train_loader, optimizer, criterion,
            start_lr=1e-7, end_lr=10, num_iter=100):
    """Learning rate range test."""
    model.train()
    lr_mult = (end_lr / start_lr) ** (1 / num_iter)
    lr = start_lr
    optimizer.param_groups[0]['lr'] = lr

    lrs = []
    losses = []
    best_loss = float('inf')

    for i, (inputs, targets) in enumerate(train_loader):
        if i >= num_iter:
            break

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # Record
        lrs.append(lr)
        losses.append(loss.item())

        # Stop if the loss explodes
        if loss.item() > 4 * best_loss:
            break
        if loss.item() < best_loss:
            best_loss = loss.item()

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Increase the learning rate geometrically
        lr *= lr_mult
        optimizer.param_groups[0]['lr'] = lr

    # Plot the curve
    plt.figure(figsize=(10, 6))
    plt.plot(lrs, losses)
    plt.xscale('log')
    plt.xlabel('Learning Rate')
    plt.ylabel('Loss')
    plt.title('Learning Rate Finder')
    plt.savefig('lr_finder.png')

    # Recommended learning rate: where the loss drops fastest
    gradients = np.gradient(losses)
    recommended_lr = lrs[np.argmin(gradients)]
    print(f"Recommended learning rate: {recommended_lr:.2e}")
    return lrs, losses, recommended_lr

# Usage
lrs, losses, best_lr = find_lr(model, train_loader, optimizer, criterion)
```

## Practical Tips
### 1. Start from a known-good configuration

```yaml
# A solid starting point for medical image classification
training:
  optimizer:
    name: "adam"
    lr: 0.0001
    weight_decay: 0.0001
  scheduler:
    name: "cosine"
    warmup_epochs: 5
data:
  batch_size: 32
  augmentation:
    random_flip: true
    random_rotation: 15
    normalize: true
model:
  vision:
    backbone: "resnet50"
    pretrained: true
  fusion:
    type: "gated"
    dropout: 0.3
```

### 2. Use early stopping to save time
```yaml
training:
  early_stopping:
    enabled: true
    patience: 5  # use a smaller patience while tuning
    min_delta: 0.001
```
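For intuition, the `patience`/`min_delta` logic boils down to a counter like the following generic sketch (not the framework's internal implementation; `validate()` is a hypothetical helper returning the validation AUC):

```python
best_metric = float('-inf')
patience, min_delta, wait = 5, 0.001, 0

for epoch in range(100):
    val_auc = validate()            # hypothetical per-epoch validation
    if val_auc > best_metric + min_delta:
        best_metric = val_auc       # meaningful improvement: reset the counter
        wait = 0
    else:
        wait += 1                   # no meaningful improvement
        if wait >= patience:
            print(f"Early stopping at epoch {epoch}")
            break
```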
### 3. Log every experiment

```python
import json

# Save the experiment's config and results
experiment_log = {
    'config': config.to_dict(),
    'results': {
        'val_auc': val_auc,
        'val_acc': val_acc,
        'train_time': train_time,
    }
}
with open(f'experiments/exp_{timestamp}.json', 'w') as f:
    json.dump(experiment_log, f, indent=2)
```

### 4. Run experiments in parallel
```bash
# Run on different GPUs in parallel
CUDA_VISIBLE_DEVICES=0 uv run medfusion train --config config1.yaml &
CUDA_VISIBLE_DEVICES=1 uv run medfusion train --config config2.yaml &
CUDA_VISIBLE_DEVICES=2 uv run medfusion train --config config3.yaml &
wait
```

## FAQ
### Q1: In what order should I tune?

A: Recommended order:

1. Learning rate (most important)
2. Batch size
3. Optimizer
4. Regularization (dropout, weight decay)
5. Data augmentation
6. Learning rate scheduler
### Q2: How many trials do I need?

A: It depends on your budget:

- Minimum: 10-20 trials
- Recommended: 50-100 trials
- With ample resources: 200+ trials
### Q3: How do I tell if the model is overfitting?

A: Compare the training and validation curves:

```python
if train_acc > 0.95 and val_acc < 0.80:
    print("Overfitting! Increase regularization")
    # Raise dropout / weight_decay,
    # or reduce model capacity
```

### Q4: Why did performance drop after tuning?
A: Possible causes:

- Randomness: average over multiple runs (see the sketch below)
- Validation set too small: use cross-validation
- Over-tuning: you have overfit to the validation set
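To separate a real improvement from run-to-run noise, average the metric over a few seeds before trusting a tuned configuration. A sketch reusing the `train_and_evaluate` placeholder from the grid search example:

```python
import numpy as np
import torch

def evaluate_with_seeds(config, seeds=(0, 1, 2)):
    """Average the validation metric over several random seeds."""
    scores = []
    for seed in seeds:
        torch.manual_seed(seed)
        np.random.seed(seed)
        scores.append(train_and_evaluate(config))
    return float(np.mean(scores)), float(np.std(scores))

mean_auc, std_auc = evaluate_with_seeds(config)
print(f"val AUC = {mean_auc:.4f} ± {std_auc:.4f}")
```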