Examples#

Nested JSON to tables#

The original use-case for dict2rel was taking a list of JSON objects with lots of nesting and flattening them into tables, one table for each field-path of a nested list, while maintaining a way to reconstruct the original list again after transformations had taken place. dict2rel() is provided to accomplish this goal and takes one or more JSON objects and will produce one or more tables. The data-type of the tables is dependent on the provider argument, which must be supplied and can be values like pd.DataFrame, pl.DataFrame, or simply lambda rows: rows. For example, if you have the following JSON:

{
    "catalog_id": "CAT001",
    "name": "Tech Skill Bootcamp",
    "courses": [
        {
            "course_id": "CS101",
            "title": "Introduction to Programming",
            "instructor": "Dr. Alex Lee",
            "modules": [
                {
                    "module_id": 1,
                    "name": "Fundamentals",
                    "lessons": [
                        "Setting up the environment",
                        "Basic data types",
                        "Control flow (if/else)"
                    ]
                },
                {
                    "module_id": 2,
                    "name": "Data Structures",
                    "lessons": [
                        "Lists and Tuples",
                        "Dictionaries and Sets"
                    ]
                }
            ]
        },
        {
            "course_id": "WD201",
            "title": "Web Development Basics",
            "instructor": "Ms. Jamie Chen",
            "modules": [
                {
                    "module_id": 1,
                    "name": "HTML & CSS",
                    "lessons": [
                        "Structuring content with HTML5",
                        "Styling with CSS Grid"
                    ]
                }
            ]
        }
    ]
}

>>> from dict2rel import dict2rel
>>> import pandas as pd
>>> tables = dict2rel(json_data, pd.DataFrame)
>>> tables
{
    '*.courses.*.modules.*.lessons': pd.DataFrame([...])
    '*.courses.*.modules': pd.DataFrame([...])
    '*.courses': pd.DataFrame([...])
    '*': pd.DataFrame([...])
}

where the tables look like:

*:

catalog_id	name	_id
CAT001	Tech Skill Bootcamp	0

*.courses:

course_id	title	instructor	_id
CS101	Introduction to Programming	Dr. Alex Lee	0.courses.0
WD201	Web Development Basics	Ms. Jamie Chen	0.courses.1

*.courses.*.modules:

module_id	name	_id
1	Fundamentals	0.courses.0.modules.0
2	Data Structures	0.courses.0.modules.1
1	HTML & CSS	0.courses.1.modules.0

*.courses.*.modules.*.lessons:

_value	_id
Setting up the environment	0.courses.0.modules.0.lessons.0
Basic data types	0.courses.0.modules.0.lessons.1
Control flow (if/else)	0.courses.0.modules.0.lessons.2
Lists and Tuples	0.courses.0.modules.1.lessons.0
Dictionaries and Sets	0.courses.0.modules.1.lessons.1
Structuring content with HTML5	0.courses.1.modules.0.lessons.0
Styling with CSS Grid	0.courses.1.modules.0.lessons.1

This last table varies a little in format from the others due the _value key which indicates that the list was of singletons and not nested objects.

Now, we can modify these tables if we’d like. For example, lets add the number of credits each course is:

>>> tables["*.courses"]["credits"] = [2, 3]

If we tried to do that in the original JSON, we’d have to do something like the following, which involves a lot more indexing and for which there isn’t an easy way to add a new column to all of the objects at a certain level.

>>> json_data["courses"][0]["credits"] = 2
>>> json_data["courses"][1]["credits"] = 3

Tables back to JSON#

dict2rel also provides a way to take those flattened tables and convert them back to the nested JSON with dict2rel.rel2dict(). Applying that to the example we modified above gives the following, which has been shortened just to highlight the new fields we added:

[
    {
        "catalog_id": "CAT001",
        "name": "Tech Skill Bootcamp",
        "courses": [
            {
                "course_id": "CS101",
                "title": "Introduction to Programming",
                "instructor": "Dr. Alex Lee",
                "credits": 2,
                "modules": [

                ]
            },
            {
                "course_id": "WD201",
                "title": "Web Development Basics",
                "instructor": "Ms. Jamie Chen",
                "credits": 3,
                "modules": [

                ]
            }
        ]
    }
]

JSON to a single table#

Now, sometimes you have a highly nested JSON object, but just want a single table. dict2rel also support this scenario with dict2rel.flatten(). Similar to dict2rel.dict2rel(), it takes one or more JSON objects and a provider argument. Unlike dict2rel(), it will only return a single table objects.

>>> from dict2rel import flatten
>>> table = flatten(json_data, pd.DataFrame)  # data from earlier example
>>> table
pd.DataFrame([...])

_id	catalog_id	name	courses.0.course_id	courses.0.title	courses.0.instructor	courses.0.modules.0.module_id	courses.0.modules.0.name	courses.0.modules.0.lessons.0	courses.0.modules.0.lessons.1	courses.0.modules.0.lessons.2	courses.0.modules.1.module_id	courses.0.modules.1.name	courses.0.modules.1.lessons.0	courses.0.modules.1.lessons.1	courses.1.course_id	courses.1.title	courses.1.instructor	courses.1.modules.0.module_id	courses.1.modules.0.name	courses.1.modules.0.lessons.0	courses.1.modules.0.lessons.1
0	CAT001	Tech Skill Bootcamp	CS101	Introduction to Programming	Dr. Alex Lee	1	Fundamentals	Setting up the environment	Basic data types	Control flow (if/else)	2	Data Structures	Lists and Tuples	Dictionaries and Sets	WD201	Web Development Basics	Ms. Jamie Chen	1	HTML & CSS	Structuring content with HTML5	Styling with CSS Grid

dict2rel.inflate() provides the complement to flatten() as rel2dict() does for dict2rel.dict2rel().

Heterogeneous data#

Heterogeneous means diverse or varied. In the context of dict2rel, this refers to fields where the value has different types across objects. In particular, the situation where the value is sometimes an object and other times a list of objects stands out as the object will be flattened inline and the list of objects will be placed in a separate table. This can result in data for the same field being placed in two different places.

To address this, dict2rel.UnravelOptions.support_heterogeneous_data can be set. When it is, a pass will be done to analyze the data and detect these mismatched type scenarios. Then, the fields will be handled consistently and the data will always be placed in a separate table. See the examples below to contrast the flag being set versus not set.

Data for examples below#

[
    {
        "addresses": {
            "address1": "101 North Street",
            "city": "Waco",
            "state": "Texas",
            "zip": "76711",
        }
    },
    {
        "addresses": [
            {
                "address1": "500 W 6th St",
                "city": "Waterloo",
                "state": "Iowa",
                "zip": "50701",
            }
        ]
    }
]

With support_heterogeneous_data = False:

*

addresses.address1

addresses.city

addresses.state

addresses.zip

_id

101 North Street

Waco

Texas

76711

0
*.addresses

address1

city

state

zip

_id

500 W 6th St

Waterloo

Iowa

50701

1.addresses.0

addresses.address1	addresses.city	addresses.state	addresses.zip	_id
101 North Street	Waco	Texas	76711	0

address1	city	state	zip	_id
500 W 6th St	Waterloo	Iowa	50701	1.addresses.0

With support_heterogeneous_data = True

addresses

address1

city

state

zip

_id

101 North Street

Waco

Texas

76711

0.addresses.0

500 W 6th St

Waterloo

Iowa

50701

1.addresses.0

Examples#

Nested JSON to tables#

Tables back to JSON#

JSON to a single table#

Heterogeneous data#

This Page