Sunday, December 16, 2018

How to create an efficient logical vector for a large data frame in R?

I need help with the following: I have a 7.5 GB CSV file with 185 million rows.

So far, I've done the following:

library(caTools)
library(data.table)
library(dplyr)

dados_treino <- fread('train.csv')

vetor_TF  <- sample.split(dados_treino, SplitRatio = 0.70)

At this point, RStudio returns the following error:

Can not allocate vector size 7.5 GB

The intent is to split the object into training and test data.

I would like help with two things: 1) being able to use a sampling command (it may come from a package other than caTools); 2) applying the resulting vector to build the two sets of data.

Follow the link to the data: download data

I am using a computer with 16 GB of RAM and an Intel i7 processor.
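For reference, here is a minimal sketch of one way the two requests above could be approached with base R sampling instead of sample.split. As far as I know, sample.split is documented as taking a vector of labels rather than a whole data frame, which may be related to the allocation error, but that is an assumption on my part. The small data.table and its column names (id, x) are hypothetical stand-ins for the real file, and the names treino and teste are illustrative only.

```r
library(data.table)

# Hypothetical stand-in for the real data. In the question this would be:
# dados_treino <- fread('train.csv')
set.seed(42)
dados_treino <- data.table(id = 1:1000, x = rnorm(1000))

n <- nrow(dados_treino)

# Draw the row numbers of the training set: exactly 70% of the rows,
# sampled without replacement.
idx_treino <- sample.int(n, size = round(0.70 * n))

# Build a logical vector with one element per row (not per cell).
# A logical vector uses 4 bytes per element in R, so 185 million rows
# would need roughly 0.74 GB.
vetor_TF <- logical(n)
vetor_TF[idx_treino] <- TRUE

# Apply the vector to build the two sets.
treino <- dados_treino[vetor_TF]
teste  <- dados_treino[!vetor_TF]
```

Note that subsetting copies the selected rows, so holding dados_treino, treino and teste at the same time roughly doubles the memory in use. With 16 GB of RAM it may be necessary to call rm(dados_treino) followed by gc() once both subsets exist, or to write one subset to disk with fwrite before creating the other.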
