Examining Data Types JSON Columns

The example data contains a couple of rows with unusual data:

{
  "id":6,
  "person" : {
      "mynumber":5,
      "myarray":[1,2,3,4]
  }
}

{
  "id":7,
  "person" : {
      "mynumber":"5",
      "myarray":["1","2","3","4"]
  }
} 

You can locate them using the query:

sql-> SELECT * FROM JSONPersons j WHERE EXISTS j.person.mynumber;
 +----+-------------------+
 | id |      person       |
 +----+-------------------+
 |  6 | myarray           |
 |    |            1      |
 |    |            2      |
 |    |            3      |
 |    |            4      |
 |    | mynumber | 5      |
 +----+-------------------+
 |  7 | myarray           |
 |    |            1      |
 |    |            2      |
 |    |            3      |
 |    |            4      |
 |    | mynumber | 5      |
 +----+-------------------+

2 rows returned 

However, notice that these two rows actually contain numbers stored as different types. ID 6 stores integers while ID 7 stores strings. You can select a row based on its type:

sql-> SELECT * FROM JSONPersons j 
WHERE j.person.mynumber IS OF TYPE (integer);
 +----+-------------------+
 | id |      person       |
 +----+-------------------+
 |  6 | myarray           |
 |    |            1      |
 |    |            2      |
 |    |            3      |
 |    |            4      |
 |    | mynumber | 5      |
 +----+-------------------+ 

Notice that if you use IS NOT OF TYPE then every row in the table is returned except id 6. This is because for all the other rows, j.person.mynumber evaluates to jnull, which is not an integer.


sql-> SELECT id FROM JSONPersons j 
WHERE j.person.mynumber IS NOT OF TYPE (integer);
 +----+
 | id |
 +----+
 |  3 |
 |  2 |
 |  5 |
 |  4 |
 |  1 |
 |  7 |
 +----+

6 rows returned 

To solve this problem, also check for the existence of j.person.mynumber:

sql-> SELECT id from JSONPersons j WHERE EXISTS j.person.mynumber 
and j.person.mynumber IS NOT OF TYPE (integer); 
 +----+
 | id |
 +----+
 |  7 |
 +----+

1 row returned 

You can also perform type checking based on the type of data contained in the array. Recall that our rows contain arrays with integers and arrays with strings. You can return the row with just the array of strings using:


sql-> SELECT id, j.person.myarray FROM JSONPersons j 
WHERE j.person.myarray[] IS OF TYPE (string+);
 +----+-------------------+
 | id |      myarray      |
 +----+-------------------+
 |  7 | 1                 |
 |    | 2                 |
 |    | 3                 |
 |    | 4                 |
 +----+-------------------+

1 row returned

Here, we use the array filter step ([]) in the WHERE clause to unpack the array into a sequence. This allows is-of-type to iterate over the sequence, checking the type of each element. If every element in the sequence matches the identified type (string, in this case), then the is-of-type returns true.

Also notice that the query uses the + cardinality modifier. This means that is-of-type will return true only if the input sequence (myarray[], in this case) contains ONE OR MORE elements that match the identified type (string). If we used *, then 0 or more elements would have to match the identified type in order for true to return. Because our table contains a mix of rows with different schema, the result is that every row except id 6 is returned:

sql-> SELECT id, j.person.myarray FROM JSONPersons j 
WHERE j.person.myarray[] IS OF TYPE (string*);
 +----+-------------------+
 | id |      myarray      |
 +----+-------------------+
 |  3 | NULL              |
 +----+-------------------+
 |  5 | NULL              |
 +----+-------------------+
 |  1 | NULL              |
 +----+-------------------+
 |  7 | 1                 |
 |    | 2                 |
 |    | 3                 |
 |    | 4                 |
 +----+-------------------+
 |  4 | NULL              |
 +----+-------------------+
 |  2 | NULL              |
 +----+-------------------+

6 rows returned 

Finally, if we do not provide a cardinality modifier at all, then is-of-type returns true if ONE AND ONLY one member of the input sequence matches the identified type. In this example, the result is that no rows are returned.

sql-> SELECT id, j.person.myarray FROM JSONPersons j 
WHERE j.person.myarray[] IS OF TYPE (string);

0 row returned