PyBrain - Testing Network

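In this chapter, we will look at an example in which we train a network on data and then test the error of the trained network.

We will make use of the following trainer and method: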

BackpropTrainer

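BackpropTrainer is a trainer that trains the parameters of a module according to a supervised or ClassificationDataSet dataset (potentially sequential) by backpropagating the errors (through time).

trainUntilConvergence

This method of the trainer is used to train the module on the dataset until it converges.

When we create a neural network, it is trained on the training data given to it. Whether the network has been trained properly, however, depends on how well it predicts test data that it has not seen during training.

Let us walk step by step through a working example that builds a neural network and reports the training errors, the validation errors, and the test error.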

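Testing the Network

Following are the steps we will follow to test the network:

Import the required PyBrain and other packages
Create a ClassificationDataSet
Split the dataset into 25% test data and 75% training data
Convert the test data and training data back into ClassificationDataSet
Create the neural network
Train the network
Visualize the training and validation errors
Compute the percent error on the test data

Step 1

Import the required PyBrain and other packages.

The required packages are imported as shown below: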

from sklearn import datasets
import matplotlib.pyplot as plt
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel

Step 2

The next step is to create a ClassificationDataSet.

For the data, we will use one of the datasets that ships with sklearn, as shown below.

Refer to the load_digits dataset from sklearn at the link below:

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits

digits = datasets.load_digits()
X, y = digits.data, digits.target

ds = ClassificationDataSet(64, 1, nb_classes=10)
# Each input is a 64-dimensional array, and since the digits range from 0 to 9,
# the number of classes is 10.

for i in range(len(X)):
   ds.addSample(ravel(X[i]), y[i]) # adding sample to datasets
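
The input dimension of 64 comes from the 8x8 pixel images in the digits dataset. As a minimal sketch of that flattening, using only the standard library and a made-up image (the pixel values and the helper name flatten are illustrative assumptions, not part of PyBrain or sklearn):

```python
# Hypothetical 8x8 image; pixel values in load_digits range from 0 to 16.
image = [[(r * 8 + c) % 17 for c in range(8)] for r in range(8)]


def flatten(rows):
    """Row-major flattening, the same ordering numpy.ravel uses by default."""
    return [pixel for row in rows for pixel in row]


features = flatten(image)
print(len(features))  # 64
```

Each sample added to the dataset is one such 64-element vector together with its digit label.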

Step 3

Split the dataset into 25% test data and 75% training data:

test_data_temp, training_data_temp = ds.splitWithProportion(0.25)

Here we have called the dataset's splitWithProportion() method with the value 0.25. It splits the dataset so that 25% becomes the test data and the remaining 75% becomes the training data.
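
To make the proportions concrete, here is a rough stand-alone sketch of a proportional split. The helper split_with_proportion is a hypothetical illustration that assumes a random shuffle followed by a cut; it is not PyBrain's implementation.

```python
import random


def split_with_proportion(samples, proportion, seed=None):
    """Shuffle the samples and return (first_part, second_part).

    first_part holds int(len(samples) * proportion) items and the rest
    go to second_part. Illustrative helper, not PyBrain's own code.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]


samples = list(range(1797))  # load_digits contains 1797 samples
test_part, train_part = split_with_proportion(samples, 0.25, seed=0)
print(len(test_part), len(train_part))  # 449 1348
```

Under these assumptions, the digits data would yield 449 test samples and 1348 training samples.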

Step 4

Convert the test data and the training data back into ClassificationDataSet.

test_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, test_data_temp.getLength()):
   test_data.addSample( test_data_temp.getSample(n)[0], test_data_temp.getSample(n)[1] )
training_data = ClassificationDataSet(64, 1, nb_classes=10)

for n in range(0, training_data_temp.getLength()):
   training_data.addSample(
      training_data_temp.getSample(n)[0], training_data_temp.getSample(n)[1]
   )
test_data._convertToOneOfMany()
training_data._convertToOneOfMany()

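Calling splitWithProportion() on the dataset returns SupervisedDataSet objects, so we copy the samples back into ClassificationDataSet objects as shown in the step above. The calls to _convertToOneOfMany() then encode each class label as a one-of-many (one-hot) target vector.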
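
As a rough illustration of one-of-many encoding, the following stand-alone sketch turns integer labels into target vectors. The function to_one_of_many is a hypothetical helper written for this example and is not PyBrain's code.

```python
def to_one_of_many(labels, nb_classes):
    """Encode each integer label as a vector with a single 1 at its index."""
    return [
        [1 if k == label else 0 for k in range(nb_classes)]
        for label in labels
    ]


targets = to_one_of_many([3, 0, 9], 10)
print(targets[0])  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

With ten classes, each target therefore has ten entries, which is why the network built later needs ten output units.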

Step 5

The next step is to create the neural network.

net = buildNetwork(training_data.indim, 64, training_data.outdim, outclass=SoftmaxLayer)

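We create a network whose input and output dimensions are taken from the training data, with one hidden layer of 64 units and a softmax output layer.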
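
A softmax output layer turns the raw output activations into values that sum to 1, so they can be read as class probabilities. A minimal stand-alone sketch of that function follows; the input scores are made up for illustration and the code is not taken from PyBrain.

```python
import math


def softmax(values):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
predicted = probs.index(max(probs))
print(predicted)  # 0
```

The predicted class is simply the index of the largest probability.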

Step 6

Train the network

The important part now is to train the network on the dataset, as shown below:

trainer = BackpropTrainer(net, dataset=training_data,
momentum=0.1,learningrate=0.01,verbose=True,weightdecay=0.01)

We create a BackpropTrainer for the network we built and pass it the training dataset, along with the momentum, learning rate, and weight decay settings.
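
To give a feel for what the momentum, learningrate, and weightdecay settings do, here is a sketch of a common textbook form of gradient descent with momentum and L2 weight decay, applied to a single weight. This is an assumption made for illustration: PyBrain's internal update rule may differ in detail, and sgd_step is a hypothetical helper.

```python
def sgd_step(w, velocity, grad, learningrate=0.01, momentum=0.1,
             weightdecay=0.01):
    """One update: keep part of the previous step (momentum) and pull the
    weight towards zero (weight decay) on top of the plain gradient step."""
    velocity = momentum * velocity - learningrate * (grad + weightdecay * w)
    return w + velocity, velocity


# Toy problem: minimise (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(2000):
    grad = 2.0 * (w - 3.0)
    w, v = sgd_step(w, v, grad)

print(round(w, 3))  # 2.985
```

The weight settles slightly below 3 because the weight decay term keeps pulling it towards zero.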

Step 7

The next step is to visualize the training errors and the validation errors.

trnerr,valerr = trainer.trainUntilConvergence(dataset=training_data,maxEpochs=10)
plt.plot(trnerr,'b',valerr,'r')
plt.show()

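We call the trainUntilConvergence method on the training data with maxEpochs=10, so training stops after at most 10 epochs even if the network has not yet converged. The method returns the training errors and the validation errors, which we plot as shown below. The blue line shows the training errors and the red line shows the validation errors.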

[Figure: training errors (blue) and validation errors (red)]

The total errors reported while executing the above code are shown below:

Total error: 0.0432857814358
Total error: 0.0222276374185
Total error: 0.0149012052174
Total error: 0.011876985318
Total error: 0.00939854792853
Total error: 0.00782202445183
Total error: 0.00714707652044
Total error: 0.00606068893793
Total error: 0.00544257958975
Total error: 0.00463929281336
Total error: 0.00441275665294
('train-errors:', '[0.043286 , 0.022228 , 0.014901 , 0.011877 , 0.009399 , 0.007822 , 0.007147 , 0.006061 , 0.005443 , 0.004639 , 0.004413 ]')
('valid-errors:', '[0.074296 , 0.027332 , 0.016461 , 0.014298 , 0.012129 , 0.009248 , 0.008922 , 0.007917 , 0.006547 , 0.005883 , 0.006572 , 0.005811 ]')

The training error starts at about 0.04 and decreases with each epoch, which means the network is learning and improves epoch by epoch.
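
The idea behind training until convergence can be sketched as an early-stopping loop: train one epoch at a time, measure the error on held-out validation data, and stop when that error stops improving or the epoch limit is reached. The sketch below is a simplified assumption about how such a loop works, not PyBrain's code. The function name, the continue_epochs parameter, and the canned error values are all hypothetical.

```python
def train_until_convergence(train_epoch, validation_error,
                            max_epochs=10, continue_epochs=3):
    """Train until the validation error has not improved for
    continue_epochs epochs in a row, or until max_epochs is reached."""
    train_errors, valid_errors = [], []
    best = float("inf")
    epochs_since_best = 0
    for _ in range(max_epochs):
        train_errors.append(train_epoch())
        err = validation_error()
        valid_errors.append(err)
        if err < best:
            best, epochs_since_best = err, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= continue_epochs:
                break
    return train_errors, valid_errors


# Canned error curves standing in for a real network (made-up numbers).
canned_train = iter([0.5, 0.4, 0.3, 0.25, 0.2, 0.18, 0.17, 0.16, 0.15, 0.14])
canned_valid = iter([0.6, 0.5, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52])
trn, val = train_until_convergence(lambda: next(canned_train),
                                   lambda: next(canned_valid))
print(len(trn), min(val))  # 6 0.45
```

In this toy run the training error keeps falling, but the loop stops after six epochs because the validation error has not improved since the third one.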

Step 8

Percent error on the test data

We can check the percent error using the percentError function, as shown below:

print('Percent Error on testData:', percentError(
   trainer.testOnClassData(dataset=test_data), test_data['class']
))

Percent Error on testData: 3.34075723830735

We get a percent error of 3.34%, which means the neural network classifies about 96.66% of the test data correctly, or roughly 97% accuracy.
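
A percent error of this kind is the share of predictions that differ from the true labels, multiplied by 100. The sketch below uses a hypothetical helper percent_error rather than PyBrain's percentError. It also assumes a test set of 449 samples with 15 misclassifications, which is one combination consistent with the figure above and not a count reported by the run.

```python
def percent_error(predicted, actual):
    """Percentage of positions where the prediction differs from the label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)


# Assumed counts: 449 test samples, 15 of them misclassified.
actual = [0] * 449
predicted = [1] * 15 + [0] * 434
error = percent_error(predicted, actual)
accuracy = 100.0 - error
print(round(error, 2), round(accuracy, 2))  # 3.34 96.66
```

Accuracy is the complement of the percent error, so a lower percent error on the test data means better generalization.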

Below is the complete code:

from sklearn import datasets
import matplotlib.pyplot as plt
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel
digits = datasets.load_digits()
X, y = digits.data, digits.target

ds = ClassificationDataSet(64, 1, nb_classes=10)

for i in range(len(X)):
   ds.addSample(ravel(X[i]), y[i])

test_data_temp, training_data_temp = ds.splitWithProportion(0.25)
test_data = ClassificationDataSet(64, 1, nb_classes=10)
for n in range(0, test_data_temp.getLength()):
   test_data.addSample( test_data_temp.getSample(n)[0], test_data_temp.getSample(n)[1] )
training_data = ClassificationDataSet(64, 1, nb_classes=10)

for n in range(0, training_data_temp.getLength()):
   training_data.addSample(
      training_data_temp.getSample(n)[0], training_data_temp.getSample(n)[1]
   )
test_data._convertToOneOfMany()
training_data._convertToOneOfMany()

net = buildNetwork(training_data.indim, 64, training_data.outdim, outclass=SoftmaxLayer)
trainer = BackpropTrainer(
   net, dataset=training_data, momentum=0.1,
   learningrate=0.01,verbose=True,weightdecay=0.01
)
trnerr,valerr = trainer.trainUntilConvergence(dataset=training_data,maxEpochs=10)
plt.plot(trnerr,'b',valerr,'r')
plt.show()

trainer.trainEpochs(10)
print('Percent Error on testData:',percentError(
   trainer.testOnClassData(dataset=test_data), test_data['class']
))
