I am working on a curve-fitting (regression) problem. I tried several machine learning algorithms, including random forest regression, MLP regression, and SVR, but I get the best result with a decision tree regressor. My plots for training and testing are:
My question is: can the test accuracy be improved further, or is my result already good? I tried changing the hyperparameters of each of the algorithms mentioned above, but could not get a better result than this. The data is fixed; I cannot get more data. Please help.
Here is my code.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import mean_squared_error
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
np.random.seed(1337)
data = pd.read_csv('C:/Users/....../..../..../plot.csv',header=None)
x = data.iloc[:,0:4].values
y = data.iloc[:,4:5].values
from sklearn.model_selection import train_test_split
x11, xtt1, y11, yt1 = train_test_split(x, y, test_size=0.2,
random_state=None, shuffle=False)
y11=y11.reshape(-1,1)
scaler2 = RobustScaler()
y1 = scaler2.fit_transform(y11)
k=x11[:,-1]
kk=xtt1[:,-1]
# All hyperparameters are left at their scikit-learn defaults.
clf1 = DecisionTreeRegressor(random_state=None)
clf1.fit(x11, y1)
test_y = clf1.predict(x11)
test_y=test_y.reshape(-1,1)
Y = scaler2.inverse_transform(test_y)
y11=np.squeeze(y11)
# Training RMSE in original units, so it is comparable with the test RMSE below.
rm=mean_squared_error(y11, Y, squared=False)
print(rm)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(k, y11, s=25, c='b', marker="s", label='real')
ax1.scatter(k,Y, s=25, c='r', marker="o", label='DT Prediction')
ax1.set_yscale('log')
plt.legend();
plt.show()
ypredict = clf1.predict(xtt1)
ypredict=ypredict.reshape(-1,1)
Yt = scaler2.inverse_transform(ypredict)
rm=mean_squared_error(yt1, Yt, squared=False)
print(rm)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(kk, yt1, s=15, c='b', marker="s", label='real')
ax1.scatter(kk,Yt, s=15, c='r', marker="o", label='DT Prediction')
ax1.set_yscale('log')
plt.legend();
plt.show()
I have four inputs: I1 takes 50 different values, I2 takes 3 different values, I3 has a single value, and I4 takes two different values. There are 250 data points in total; I used 200 for training and 50 for testing.
The plots show the output versus I1 for the different values of I2, I3, and I4.
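
For context, the kind of hyperparameter tuning described above could be set up as a cross-validated grid search. The following is only a minimal sketch under stated assumptions: the data is a synthetic stand-in (250 rows, 4 inputs, one positive output) because plot.csv is not shown, and the parameter grid, the 5-fold shuffled cross-validation, and the variable names are hypothetical choices for illustration, not the settings actually used in this post.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical): 250 rows, 4 inputs, 1 output.
rng = np.random.default_rng(0)
X = rng.uniform(size=(250, 4))
y = np.exp(3.0 * X[:, 0]) + rng.normal(scale=0.1, size=250)

# Same 200/50 unshuffled split as in the post: the last 50 rows are the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Hypothetical grid; the values are illustrative only.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 2, 5, 10],
}

# Cross-validate on the training set only, scoring by (negated) RMSE.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

best_params = search.best_params_
cv_rmse = -search.best_score_  # mean cross-validated RMSE of the best setting
test_rmse = float(np.sqrt(np.mean((search.predict(X_test) - y_test) ** 2)))
print(best_params, cv_rmse, test_rmse)
```

Comparing cv_rmse with test_rmse gives a rough indication of whether the held-out 50 points behave like the 200 training points; a large gap may point to the unshuffled split rather than to the model itself.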