Friday, February 1, 2019

When loading a tempView, empty strings are shown as null

I am running some tests in Spark. For that, I'm loading a CSV file to compare my results against.

My reference file:

;;NULL;2017-03-21
;;NULL;2017-03-21
;;NULL;2017-03-21

This is how I'm loading the file

spark.read.schema(Table.schema)
      .format("com.databricks.spark.csv")
      .option("delimiter", ";")
      .option("nullValue", "NULL")
      .load(pathTable)
      .createTempView(param.TABLE)

This is my schema

  import org.apache.spark.sql.types._

  // balance and status are plain strings; only status_date is parsed as a date
  val fields = Seq(
    StructField("balance", StringType, nullable = true),
    StructField("status", StringType, nullable = true),
    StructField("status_date", DateType, nullable = true),
    StructField("time_key", StringType, nullable = true)
  )
  val schema = StructType(fields)

For some reason balance and status are loaded as NULL when they should be empty strings.

+-------+------+-----------+----------+
|balance|status|status_date|  time_key|
+-------+------+-----------+----------+
|   null|  null|       null|2017-03-21|
|   null|  null|       null|2017-03-21|
|   null|  null|       null|2017-03-21|
|   null|  null|       null|2017-03-21|
|   null|  null|       null|2017-03-21|
|   null|  null|       null|2017-03-21|
|   null|  null|       null|2017-03-21|
+-------+------+-----------+----------+

Why is that, and how can I have them shown as empty strings?
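
One possible workaround, given only as a sketch, is to leave the reader as it is and replace nulls in the two string columns after loading, using `DataFrameNaFunctions.fill`. The object name `EmptyStringWorkaround`, the helper `nullsToEmpty`, the temporary file, and the local `SparkSession` are all hypothetical and exist only to make the example self-contained. The built-in `csv` format name is assumed here in place of `com.databricks.spark.csv`. This does not explain why the reader returns null; it only normalises the result.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object EmptyStringWorkaround {

  // Hypothetical helper: replaces null with "" in the given string columns only.
  def nullsToEmpty(df: DataFrame, cols: Seq[String]): DataFrame =
    df.na.fill("", cols)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for a test run.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-string-workaround")
      .getOrCreate()

    // Same schema as in the question.
    val schema = StructType(Seq(
      StructField("balance", StringType, nullable = true),
      StructField("status", StringType, nullable = true),
      StructField("status_date", DateType, nullable = true),
      StructField("time_key", StringType, nullable = true)
    ))

    // Hypothetical temporary file holding one row of the reference data.
    val path = Files.createTempFile("reference", ".csv")
    Files.write(path, ";;NULL;2017-03-21\n".getBytes(StandardCharsets.UTF_8))

    val loaded = spark.read.schema(schema)
      .format("csv")
      .option("delimiter", ";")
      .option("nullValue", "NULL")
      .load(path.toString)

    // status_date is left untouched, so its NULL marker still loads as null.
    val fixed = nullsToEmpty(loaded, Seq("balance", "status"))
    fixed.show()

    spark.stop()
  }
}
```

The trade-off is that a literal `NULL` marker in `balance` or `status` would also end up as an empty string, so this only fits data where those two columns are never meant to be null.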
