Package Reference

exception dict2rel.ToRowsRequiredError(message: str = "A function to turn tables into rows must be provided when they aren't in row form already or from a supported provider")

class dict2rel.UnravelOptions(*, fields_to_expand: Iterable[str] | None = None, marker: str | None = None, support_heterogeneous_data: bool = False)

Options for configuring how an object is unraveled and expanded into one or more tables. Used by dict2rel.dict2rel().

fields_to_expand: Iterable[str] | None = None

Field paths pointing to nested objects that should be expanded into their own tables instead of being flattened inline. In effect, each such nested object is treated as if it were a nested list.

The field paths should ignore any nested arrays and mirror the dotted field paths used in query languages such as Elasticsearch’s Query DSL.

>>> data = [
...     {
...         "addresses": {
...
...         }
...     }
... ]
>>> from dict2rel import UnravelOptions
>>> UnravelOptions(
...     fields_to_expand=["addresses"]  # not *.addresses
... )
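To make the "ignore nested arrays" rule concrete, the sketch below strips list-index segments from an `_id`-style path, leaving only field names. It is illustrative only: `field_path` is a hypothetical helper, not part of dict2rel, and the sample paths are assumptions modeled on the `_id` values shown elsewhere in this reference.

```python
def field_path(row_id: str) -> str:
    """Drop all-digit (list index) segments from a dotted path.

    Hypothetical helper for illustration; not a dict2rel function.
    """
    return ".".join(part for part in row_id.split(".") if not part.isdigit())


# Assumed sample paths in the style of the documented _id values.
top_level = field_path("0.addresses")
nested = field_path("0.phones.1.address")
```

Here `top_level` is `"addresses"` and `nested` is `"phones.address"`: neither keeps the root marker or any array position.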

Added in version 0.0.2.

Changed in version 0.0.3: rel2dict() will now correctly reconstruct the original object even when specific fields were set to be expanded. Fields which were objects are no longer reconstructed as lists of a single object.

marker: str | None = None

The value, if any, placed in a column whose original value was a list and was therefore expanded into its own table. By default, the column is omitted.

String interpolation is supported. The available values are:

  • field: str - the name of the field being expanded

  • id: str - the _id of the current row

  • len: int - the length of the nested list

  • sheet: str - the name of the sheet where the nested values were placed

An example marker value would be: "{len} values placed in {sheet}".
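As a rough illustration of how such a marker might render, the sketch below fills the documented placeholders with Python's `str.format`. That the library interpolates in exactly this way is an assumption, and the sample values are hypothetical.

```python
# Illustrative only: assumes the marker behaves like a str.format template.
marker = "{len} values placed in {sheet}"

# Hypothetical values for one expanded list, keyed by the documented placeholders.
values = {
    "field": "phones",
    "id": "0",
    "len": 2,
    "sheet": "*.phones",
}

# Placeholders that the template does not mention are simply ignored.
rendered = marker.format(**values)
print(rendered)
```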

support_heterogeneous_data: bool = False

By default, the value at a given field path is expected to have the same type across all objects. That is not always the case, and it matters most when the value is sometimes an object, which is flattened inline, and at other times a list of objects, which is placed in its own sheet.

When this flag is set, fields whose values are objects in some records and lists in others are handled consistently and always placed in a separate table.

Note

The produced table names may be different for the same data depending on whether this flag is set or not. The data will be reconstructed to the same original objects, but the intermediate tables may be named differently.

Added in version 0.0.3.
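One way to decide whether the flag is worth setting is to scan the input first. The sketch below is a hypothetical pre-check, not part of dict2rel: it reports top-level fields that hold an object in some records and a list in others. The name `heterogeneous_fields` and the sample records are assumptions.

```python
from typing import Any


def heterogeneous_fields(objects: list[dict[str, Any]]) -> set[str]:
    """Return top-level fields that are an object in some records and a list in others."""
    kinds: dict[str, set[str]] = {}
    for obj in objects:
        for field, value in obj.items():
            if isinstance(value, dict):
                kinds.setdefault(field, set()).add("object")
            elif isinstance(value, list):
                kinds.setdefault(field, set()).add("list")
    return {field for field, seen in kinds.items() if seen == {"object", "list"}}


# Hypothetical records: "address" is an object in one and a list in the other.
records = [
    {"name": "A", "address": {"city": "Phoenix"}},
    {"name": "B", "address": [{"city": "Albany"}, {"city": "Madrid"}]},
]
mixed = heterogeneous_fields(records)
```

For these records `mixed` contains only `"address"`. The check covers top-level fields only; nested paths would need a recursive variant.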

dict2rel.dict2rel(obj: list[JsonObject] | JsonObject, provider: Callable[[list[Row]], P], options: UnravelOptions | None = None) → dict[str, P]

Take a list of (or single) JSON object(s) and convert them to tables using the provider of your choice to construct the tables (like Polars, Pandas, etc.). Nested arrays of JSON objects will be broken out into their own tables while nested objects will be flattened inline. options can be provided to do things like place a marker whenever a list is expanded to a new table instead of dropping the column.

rel2dict() can be used to convert the results of this function back to obj, such that rel2dict(dict2rel(obj, ...)) == obj.

>>> import pandas as pd
>>> from dict2rel import dict2rel
>>> dict2rel(
...     [
...         {
...             "name": {"first": "John", "last": "Smith"},
...             "phones": [
...                 {"country": "USA", "number": "1234567890"},
...                 {"country": "ESP", "number": "987654321"},
...             ],
...         }
...     ],
...     pd.DataFrame,
... )
{
    "*": pd.DataFrame([
        {
            "_id": "0",
            "name.first": "John",
            "name.last": "Smith"
        }
    ]),
    "*.phones": pd.DataFrame([
        {
            "_id": "0.phones.0",
            "country": "USA",
            "number": "1234567890"
        },
        {
            "_id": "0.phones.1",
            "country": "ESP",
            "number": "987654321"
        }
    ])
}

Changed in version 0.0.3: Rows are no longer produced if they would otherwise be empty (or contain just _id), and tables are no longer produced when they have no rows. Previously, an empty row was generated when an object's only key held nested values that were placed in a separate table.

Parameters:
  • obj – A JsonObject or a list of them

  • provider – A function which converts a list of rows into a table. Typically, this will be a value like pandas.DataFrame or polars.DataFrame, but it can also be an identity lambda, which returns the results as lists of dictionaries.

  • options – Options to configure how obj is unraveled, like whether to place markers whenever a column is a list which gets expanded to its own table.
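The provider contract is small: it is a callable that receives a list of row dictionaries and returns whatever table type you want. The sketch below shows two plain-Python providers applied directly to rows copied from the "*.phones" table above. `identity` and `to_columns` are hypothetical names, and the `Row` alias is assumed to be a dictionary of column names to values.

```python
from typing import Any

Row = dict[str, Any]  # assumed shape of a row: column name -> value

# Rows as they appear in the "*.phones" table of the example above.
rows: list[Row] = [
    {"_id": "0.phones.0", "country": "USA", "number": "1234567890"},
    {"_id": "0.phones.1", "country": "ESP", "number": "987654321"},
]


def identity(rows: list[Row]) -> list[Row]:
    """Return the rows unchanged, so each table stays a list of dictionaries."""
    return rows


def to_columns(rows: list[Row]) -> dict[str, list[Any]]:
    """Pivot rows into a column-oriented mapping, filling missing cells with None."""
    names = dict.fromkeys(key for row in rows for key in row)  # keeps first-seen order
    return {name: [row.get(name) for row in rows] for name in names}


as_rows = identity(rows)
as_columns = to_columns(rows)
```

Either function could be passed where pandas.DataFrame or polars.DataFrame would otherwise go, since each accepts a list of rows and returns a table-like value.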

dict2rel.flatten(obj: list[JsonObject] | JsonObject, provider: Callable[[list[Row]], P]) → P

Take a list of objects, or a single dict, and flatten it into a single sheet. Unlike dict2rel(), nested lists are kept on the primary sheet and given unique column names.

inflate() can be used to reverse this process such that inflate(flatten(obj, ...)) == obj.

>>> import polars as pl
>>> from dict2rel import flatten
>>> flatten(
...     [
...         {
...             "name": {"first": "John", "last": "Smith"},
...             "phones": [
...                 {"country": "USA", "number": "1234567890"},
...                 {"country": "ESP", "number": "987654321"},
...             ],
...         }
...     ],
...     pl.DataFrame,
... )
pl.DataFrame([
    {
        "_id": "0",
        "name.first": "John",
        "name.last": "Smith",
        "phones.0.country": "USA",
        "phones.0.number": "1234567890",
        "phones.1.country": "ESP",
        "phones.1.number": "987654321"
    }
])
Parameters:
  • obj – A JsonObject or a list of them

  • provider – A function which converts a list of rows into a table. Typically, this will be a value like pandas.DataFrame or polars.DataFrame, but it can also be an identity lambda, which returns the results as a list of dictionaries.
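The column-naming scheme in the example can be reproduced with a short recursive function. This is a minimal sketch of the idea, not the library's implementation: `flatten_object` is a hypothetical name, and unlike flatten() it neither adds an `_id` column nor calls a provider.

```python
from typing import Any


def flatten_object(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)  # list positions become path segments
    else:
        return {prefix: value}
    flat: dict[str, Any] = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten_object(child, path))
    return flat


person = {
    "name": {"first": "John", "last": "Smith"},
    "phones": [
        {"country": "USA", "number": "1234567890"},
        {"country": "ESP", "number": "987654321"},
    ],
}
row = flatten_object(person)
```

`row` has the same dotted keys as the output above, minus `_id`. Note that in this sketch empty dicts and empty lists produce no columns at all.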

dict2rel.inflate(table: P, to_rows: Callable[[P], Iterable[Row]] | None = None) → list[JsonObject]

Undo flatten() and take a sheet with nesting represented by column names and inflate it back to a list of dictionaries with actual nesting.

>>> import polars as pl
>>> from dict2rel import inflate
>>> inflate(
...     pl.DataFrame(
...         [
...             {
...                 "name": "Bravo",
...                 "version.major": "1",
...                 "version.minor": "0",
...                 "version.patch": "12",
...                 "releases.0.date": "2025-02-12",
...                 "releases.0.version": "0.0.1",
...                 "releases.1.date": "2025-02-18",
...                 "releases.1.version": "0.1.0",
...             }
...         ]
...     )
... )
[{
    'name': 'Bravo',
    'version': {
        'major': '1',
        'minor': '0',
        'patch': '12'
    },
    'releases': [
        {'date': '2025-02-12', 'version': '0.0.1'},
        {'date': '2025-02-18', 'version': '0.1.0'}
    ]
}]
Parameters:
  • table – The table to inflate. This can either be a list of dictionaries, or a table such as pandas.DataFrame or polars.DataFrame.

  • to_rows – A function to convert the table data to dictionaries. This is only needed if the data isn’t already in that format or the table is a datatype other than pandas.DataFrame or polars.DataFrame.

Raises:

ToRowsRequiredError – If the table isn’t a list or a known table-type like pandas.DataFrame or polars.DataFrame.
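The reverse mapping, from dotted column names back to nesting, can be sketched in plain Python. This is an illustration of the idea rather than the library's implementation: `inflate_row` and `_listify` are hypothetical names, and the sketch assumes that a group of sibling keys made up entirely of digits represents a list.

```python
from typing import Any


def inflate_row(row: dict[str, Any]) -> Any:
    """Rebuild nesting from dotted column names."""
    tree: dict[str, Any] = {}
    for column, value in row.items():
        node = tree
        *parents, leaf = column.split(".")
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate levels on demand
        node[leaf] = value
    return _listify(tree)


def _listify(node: Any) -> Any:
    """Turn dicts whose keys are all digits into lists ordered by index."""
    if not isinstance(node, dict):
        return node
    converted = {key: _listify(child) for key, child in node.items()}
    if converted and all(key.isdigit() for key in converted):
        return [converted[key] for key in sorted(converted, key=int)]
    return converted


flat_row = {
    "name": "Bravo",
    "version.major": "1",
    "version.minor": "0",
    "version.patch": "12",
    "releases.0.date": "2025-02-12",
    "releases.0.version": "0.0.1",
    "releases.1.date": "2025-02-18",
    "releases.1.version": "0.1.0",
}
inflated = inflate_row(flat_row)
```

A limitation of this sketch: an object whose keys all happen to be numeric strings would come back as a list, and a column that is both a leaf and a prefix of another column is not handled.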

dict2rel.rel2dict(tables: dict[str, P], to_rows: Callable[[P], Iterable[Row]] | None = None) → list[JsonObject]

Take a mapping of tables, likely produced by dict2rel(), and reconstruct the nested JSON from them. The tables themselves can be objects like pandas.DataFrame or polars.DataFrame, or other table-types if to_rows is provided.

>>> import polars as pl
>>> from dict2rel import rel2dict
>>> rel2dict(
...     {
...         "*": pl.DataFrame(
...             [
...                 {
...                     "_id": "0",
...                     "name": "Acme Corp.",
...                     "state": "AZ",
...                     "board": "2 board members in *.board",
...                 },
...                 {
...                     "_id": "1",
...                     "name": "ZZZ Consulting",
...                     "state": "NY",
...                     "board": "2 board members in *.board",
...                 },
...             ]
...         ),
...         "*.board": pl.DataFrame(
...             [
...                 {"_id": "0.board.0", "name": "Wile E. Coyote"},
...                 {"_id": "0.board.1", "name": "Someone Else"},
...                 {"_id": "1.board.0", "name": "Leonhard Euler"},
...                 {"_id": "1.board.1", "name": "Carl Gauss"},
...             ]
...         ),
...     }
... )
[
    {
        'name': 'Acme Corp.',
        'state': 'AZ',
        'board': [
            {'name': 'Wile E. Coyote'},
            {'name': 'Someone Else'}
        ]
    },
    {
        'name': 'ZZZ Consulting',
        'state': 'NY',
        'board': [
            {'name': 'Leonhard Euler'},
            {'name': 'Carl Gauss'}
        ]
    }
]
Parameters:
  • tables – A mapping of table names to table data

  • to_rows – A function to convert the table data to dictionaries. This is only needed if the data isn’t already in that format or the tables are a datatype other than pandas.DataFrame or polars.DataFrame.

Raises:

ToRowsRequiredError – If any of the tables aren’t lists or a known table-type like pandas.DataFrame or polars.DataFrame.
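When tables are stored in a type other than the supported ones, `to_rows` only needs to yield one dictionary per row. The sketch below assumes a hypothetical column-oriented table (a mapping of column name to a list of values) and converts it to rows. `ColumnTable` and `columns_to_rows` are illustrative names, not part of dict2rel.

```python
from collections.abc import Iterable
from typing import Any

ColumnTable = dict[str, list[Any]]  # hypothetical table type: column name -> values


def columns_to_rows(table: ColumnTable) -> Iterable[dict[str, Any]]:
    """Yield one dictionary per row from a column-oriented table."""
    names = list(table)
    # zip(*columns) walks the columns in lockstep, one tuple per row.
    for values in zip(*table.values()):
        yield dict(zip(names, values))


# The "*.board" table from the example above, stored column-wise.
board: ColumnTable = {
    "_id": ["0.board.0", "0.board.1", "1.board.0", "1.board.1"],
    "name": ["Wile E. Coyote", "Someone Else", "Leonhard Euler", "Carl Gauss"],
}
board_rows = list(columns_to_rows(board))
```

A function with this shape matches the documented `Callable[[P], Iterable[Row]]` signature, so it could be supplied as `to_rows` when every table in the mapping uses the column-oriented layout.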