testing: How to split into train and test data in regression?

Wednesday, 18 March 2020

How to split into train and test data in regression?

I have an application for regression spline forecasting. I have split the data into train and test sets, but I don't know how to train the model so that 30% of the data is held out as the test set and the remaining 70% is used for training. So far, the regression uses all the CO2 values from the dataframe group_by_df. The full code is here: https://pastebin.com/18ZzMxwK
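
For reference, a 70/30 split of a dataframe like group_by_df can be sketched with pandas alone. The toy dataframe and its `co2` column below are hypothetical stand-ins for the real data. Because the goal is forecasting, a chronological split (train on the earlier days, test on the later ones) is shown as well; with that split the spline has to extrapolate beyond the training range, which splines tend to do poorly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for group_by_df: one row per day with a CO2 reading
group_by_df = pd.DataFrame({"day": np.arange(1, 101),
                            "co2": np.linspace(400.0, 420.0, 100)})

# Option 1: random split, 70% of the rows for training, the other 30% for testing
train = group_by_df.sample(frac=0.7, random_state=42)
test = group_by_df.drop(train.index)

# Option 2: chronological split, first 70% of the days train, last 30% test
ordered = group_by_df.sort_values("day")
cut = len(ordered) * 7 // 10  # integer arithmetic avoids float rounding surprises
train_chrono, test_chrono = ordered.iloc[:cut], ordered.iloc[cut:]

print(len(train), len(test), len(train_chrono), len(test_chrono))
```

If scikit-learn is already a dependency, `train_test_split(group_by_df, test_size=0.3, random_state=42)` produces an equivalent random split.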

The part that fits the model:

import numpy as np
import statsmodels.api as sm
from patsy import dmatrix

# group_by_df, sensor_name and max_grade are defined earlier in the full script
# Calculate the 25th, 50th and 75th percentiles of 'day'; they are used as knots
percentile_25 = np.percentile(group_by_df['day'], 25)
percentile_50 = np.percentile(group_by_df['day'], 50)
percentile_75 = np.percentile(group_by_df['day'], 75)

for count, degree in enumerate(range(1, max_grade + 1)):
    # Specifying 3 knots for the regression spline
    transformed_x1 = dmatrix(
        "bs(day, knots=(percentile_25, percentile_50, percentile_75), degree=degree, include_intercept=False)",
        {"day": group_by_df['day']}, return_type='dataframe')
    # Build a regular linear model from the spline basis
    fit_spline = sm.GLM(group_by_df[sensor_name], transformed_x1).fit()
    # Make predictions (on the same rows that were used for fitting)
    pred_spline = fit_spline.predict(transformed_x1)

I followed this tutorial: http://www.science.smith.edu/~jcrouser/SDS293/labs/2016/lab13/Lab%2013%20-%20Splines%20in%20Python.pdf and applied it to my application, but the tutorial does not split the data into training and test sets.