Flatten a nested JSON using the built-in SQL Pool

This article will show you how to access differents nested nodes in a JSON file via the built-in SQL Pool !

This example will use the randomuser API : https://randomuser.me/

This API can return an amount of user depending of the parameter that you pass to the API.

This example will return 10 users :

https://randomuser.me/api/?results=10

Save the result as a JSON file and upload it to your data storage.

Let’s attack the Azure Synapse part !

Open Synapse studio and go to “Data” then create a “linked service” to your folder/file.

Once created you can go through it and right click and New SQL Script > Select TOP 100 rows.

This will generate a basic script

SELECT TOP 100
    jsonContent
FROM
    OPENROWSET(
        BULK 'https://LinkToYourStorage/YourJsonFolder/10person.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR ='0x0b',
        ROWTERMINATOR = '0x0b' --<== Remove that line
    )
    WITH (
        jsonContent varchar(MAX)
    ) AS [result]

Then remove the line that define the ROWTERMINATOR.

Executing this you will have that kind of result :

Which is great, you can now see your JSON.

Here is a look to how a “user” is defined:

{
  "results": [
    {
      "gender": "female",
      "name": {
        "title": "Miss",
        "first": "Jennie",
        "last": "Nichols"
      },
      "location": {
        "street": {
          "number": 8929,
          "name": "Valwood Pkwy",
        },
        "city": "Billings",
        "state": "Michigan",
        "country": "United States",
        "postcode": "63104",
        "coordinates": {
          "latitude": "-69.8246",
          "longitude": "134.8719"
        },
        "timezone": {
          "offset": "+9:30",
          "description": "Adelaide, Darwin"
        }
      },
      "email": "jennie.nichols@example.com",
      "login": {
        "uuid": "7a0eed16-9430-4d68-901f-c0d4c1c3bf00",
        "username": "yellowpeacock117",
        "password": "addison",
        "salt": "sld1yGtd",
        "md5": "ab54ac4c0be9480ae8fa5e9e2a5196a3",
        "sha1": "edcf2ce613cbdea349133c52dc2f3b83168dc51b",
        "sha256": "48df5229235ada28389b91e60a935e4f9b73eb4bdb855ef9258a1751f10bdc5d"
      },
      "dob": {
        "date": "1992-03-08T15:13:16.688Z",
        "age": 30
      },
      "registered": {
        "date": "2007-07-09T05:51:59.390Z",
        "age": 14
      },
      "phone": "(272) 790-0888",
      "cell": "(489) 330-2385",
      "id": {
        "name": "SSN",
        "value": "405-88-3636"
      },
      "picture": {
        "large": "https://randomuser.me/api/portraits/men/75.jpg",
        "medium": "https://randomuser.me/api/portraits/med/men/75.jpg",
        "thumbnail": "https://randomuser.me/api/portraits/thumb/men/75.jpg"
      },
      "nat": "US"
    }
  ],
  "info": {
    "seed": "56d27f4a53bd5441",
    "results": 1,
    "page": 1,
    "version": "1.4"
  }
}

What we want to accomplish here is to navigate to “results” node and access underlying nodes.

To do that, simply add a CROSS APPLY and an OPENJSON statement to your query. That way you will have access to the underlying nodes.

SELECT *
FROM
    OPENROWSET(
        BULK 'https://LinkToYourStorage/YourJsonFolder/10person.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR ='0x0b'
    )
    WITH (
        jsonContent nvarchar(MAX)
    ) AS [result]
    CROSS APPLY OPENJSON(jsonContent, '$.results') --Go to the "results" node
    WITH (
        gender varchar(100) '$.gender'
        ,title varchar(100) '$.name.title'
        ,first varchar(100) '$.name.first'
        ,last  varchar(100) '$.name.last'
        ,streetnumber varchar(100) '$.location.street.number'
        ,streetname varchar(100) '$.location.street.name'
    )

And here you are !

Share