I am running some tests in Spark. For that, I'm loading a CSV file to compare my results against.
My reference (etalon) file looks like this:
;;NULL;2017-03-21
;;NULL;2017-03-21
;;NULL;2017-03-21
This is how I'm loading the file:
spark.read
  .schema(Table.schema)
  .format("com.databricks.spark.csv")
  .option("delimiter", ";")      // fields are separated by semicolons
  .option("nullValue", "NULL")   // the literal string NULL should be read as null
  .load(pathTable)
  .createTempView(param.TABLE)
This is my schema:
import org.apache.spark.sql.types._

val fields = Seq(
  StructField("balance", StringType, nullable = true),
  StructField("status", StringType, nullable = true),
  StructField("status_date", DateType, nullable = true),
  StructField("time_key", StringType, nullable = true)
)
val schema = StructType(fields)
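Putting it together, this minimal standalone snippet reproduces the behaviour for me (the local SparkSession and the path /tmp/etalon.csv are placeholders; the real job reads pathTable and registers a temp view instead):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("etalon-read-check")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .schema(schema)                          // schema defined above
  .format("com.databricks.spark.csv")
  .option("delimiter", ";")
  .option("nullValue", "NULL")
  .load("/tmp/etalon.csv")                 // placeholder path to the sample rows above

df.show()   // balance and status come back as null instead of empty strings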
For some reason, balance and status are loaded as null when they should be empty strings:
+-------+------+-----------+----------+
|balance|status|status_date| time_key|
+-------+------+-----------+----------+
| null| null| null|2017-03-21|
| null| null| null|2017-03-21|
| null| null| null|2017-03-21|
| null| null| null|2017-03-21|
| null| null| null|2017-03-21|
| null| null| null|2017-03-21|
| null| null| null|2017-03-21|
+-------+------+-----------+----------+
Why is that, and how can I have them loaded as empty strings instead?