xsd_to_dataclasses

XSD to Python Dataclasses Generator.

This script converts XSD (XML Schema Definition) files to Python dataclasses. It's specifically designed to work with FFprobe's XSD schema but can be adapted for other XSD files.

Requirements
  • Python 3.7+

Usage

python xsd_to_dataclasses.py ffprobe.xsd

This will generate a new file called 'ffprobe_dataclasses.py' containing all the dataclass definitions.

Features
  • Converts XSD complex types to Python dataclasses
  • Handles optional fields and unbounded sequences
  • Preserves type information from XSD
  • Generates frozen dataclasses with keyword-only arguments
  • Generates a registered_types dictionary for type resolution

Classes:

Name | Description
XSDAttribute | Represents an XSD attribute with its properties.
XSDElement | Represents an XSD element with its properties.

Functions:

Name | Description
find_elements_with_namespace | Find elements using XPath-like syntax with namespace support.
generate_dataclass | Generate a dataclass definition from an XSD complex type.
generate_dataclass_fields | Generate field definitions for a dataclass from an XSD complex type.
generate_dataclasses | Generate dataclass definitions from an XSD root element.
generate_registered_types | Generate the registered_types dictionary.
get_choice_types | Get the types from a choice element.
get_field_type | Generate the Python type annotation for an XSD element.
get_python_type | Convert XSD type to Python type.
main | Convert XSD to Python dataclasses.
parse_xsd_attribute | Parse an XSD attribute into a structured object.
parse_xsd_element | Parse an XSD element into a structured object.
parse_xsd_file | Parse an XSD file and return its root element and namespace mapping.

XSDAttribute dataclass

XSDAttribute(name: str, type_name: str)

Represents an XSD attribute with its properties.

XSDElement dataclass

XSDElement(
    name: str, type_name: str, max_occurs: str = "1"
)

Represents an XSD element with its properties.

find_elements_with_namespace

find_elements_with_namespace(
    element: Element, xpath: str, ns: dict[str, str]
) -> list[Element]

Find elements using XPath-like syntax with namespace support.

Parameters:

Name | Type | Description | Default
element | Element | The root element to search from | required
xpath | str | The XPath-like expression to search for | required
ns | dict[str, str] | The namespace mapping | required

Returns:

Type | Description
list[Element] | A list of matching elements
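
A namespace-aware lookup of this kind can be sketched with the standard library's `Element.findall`, which accepts a prefix-to-URI mapping. The name `find_elements_sketch`, the `xsd` prefix, and the inline schema are assumptions made for illustration. The real function may do more than delegate to `findall`.

```python
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element

# Assumed prefix mapping; the real mapping comes from the parsed XSD file.
XSD_NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}


def find_elements_sketch(
    element: Element, xpath: str, ns: dict[str, str]
) -> list[Element]:
    """Return all descendants matching a prefixed path such as './xsd:complexType'."""
    return element.findall(xpath, ns)


# Illustrative schema fragment, not the real ffprobe.xsd.
schema = ET.fromstring(
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType name="streamType">'
    "<xsd:sequence>"
    '<xsd:element name="tag" type="tagType" maxOccurs="unbounded"/>'
    "</xsd:sequence>"
    "</xsd:complexType>"
    '<xsd:complexType name="tagType"/>'
    "</xsd:schema>"
)

names = [t.get("name") for t in find_elements_sketch(schema, "./xsd:complexType", XSD_NS)]
nested = find_elements_sketch(schema, ".//xsd:element", XSD_NS)
```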

generate_dataclass

generate_dataclass(
    class_name: str,
    complex_type: Element,
    ns: dict[str, str],
) -> str

Generate a dataclass definition from an XSD complex type.

Parameters:

Name | Type | Description | Default
class_name | str | The name of the class to generate | required
complex_type | Element | The XSD complex type element | required
ns | dict[str, str] | The XML namespace mapping | required

Returns:

Type | Description
str | A string containing the dataclass definition

generate_dataclass_fields

generate_dataclass_fields(
    complex_type: Element, ns: dict[str, str]
) -> list[str]

Generate field definitions for a dataclass from an XSD complex type.

Parameters:

Name | Type | Description | Default
complex_type | Element | The XSD complex type element | required
ns | dict[str, str] | The XML namespace mapping | required

Returns:

Type | Description
list[str] | A list of field definition strings

generate_dataclasses

generate_dataclasses(
    root: Element, ns: dict[str, str]
) -> tuple[str, list[str]]

Generate dataclass definitions from an XSD root element.

Parameters:

Name | Type | Description | Default
root | Element | The XSD root element | required
ns | dict[str, str] | The XML namespace mapping | required

Returns:

Type | Description
tuple[str, list[str]] | A tuple containing the generated code and list of class names

generate_registered_types

generate_registered_types(class_names: list[str]) -> str

Generate the registered_types dictionary.

Parameters:

Name | Type | Description | Default
class_names | list[str] | List of all generated class names | required

Returns:

Type | Description
str | A string containing the registered_types dictionary definition
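
One plausible shape for the emitted dictionary is a literal that maps each class name to the class object. The sketch below assumes that layout. The function name `generate_registered_types_sketch` and the example class names are hypothetical.

```python
def generate_registered_types_sketch(class_names: list[str]) -> str:
    """Return source text for a dict literal mapping class names to classes."""
    lines = ["registered_types = {"]
    lines += [f'    "{name}": {name},' for name in class_names]
    lines.append("}")
    return "\n".join(lines)


code = generate_registered_types_sketch(["StreamType", "TagType"])
print(code)
```

Because the values are bare names, the dictionary has to be written after the class definitions in the generated file.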

get_choice_types

get_choice_types(
    choice: Element, ns: dict[str, str]
) -> list[str]

Get the types from a choice element.

Parameters:

Name | Type | Description | Default
choice | Element | The choice element | required
ns | dict[str, str] | The namespace mapping | required

Returns:

Type | Description
list[str] | A list of type names
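
Collecting the alternatives of an `xsd:choice` could look like the sketch below. It assumes the mapping contains an `xsd` prefix and that type names are returned without their namespace prefix. Neither detail is confirmed by the documentation above, and the `ffprobe:`-prefixed type names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element

XSD_NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}


def get_choice_types_sketch(choice: Element, ns: dict[str, str]) -> list[str]:
    """List the type of each direct xsd:element child, with the prefix removed."""
    return [
        el.get("type", "").split(":")[-1]
        for el in choice.findall("xsd:element", ns)
        if el.get("type")  # skip elements that declare no type
    ]


choice = ET.fromstring(
    '<xsd:choice xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:element name="packet" type="ffprobe:packetType"/>'
    '<xsd:element name="frame" type="ffprobe:frameType"/>'
    '<xsd:element name="untyped"/>'
    "</xsd:choice>"
)
choice_types = get_choice_types_sketch(choice, XSD_NS)
```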

get_field_type

get_field_type(element: XSDElement) -> str

Generate the Python type annotation for an XSD element.

Parameters:

Name | Type | Description | Default
element | XSDElement | The XSD element to generate type for | required

Returns:

Type | Description
str | A string representing the Python type annotation

Examples:

>>> get_field_type(XSDElement("test", "string", "1"))
'Optional["string"]'
>>> get_field_type(XSDElement("test", "string", "unbounded"))
'Optional[tuple["string", ...]]'
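
The two examples above are consistent with a rule along these lines: quote the type name as a forward reference, wrap it in a variadic tuple when `max_occurs` is `"unbounded"`, and make the result optional. The sketch below follows that reading. `get_field_type_sketch` is a hypothetical name, and other `max_occurs` values may be treated differently in the real function.

```python
from dataclasses import dataclass


@dataclass
class XSDElement:
    """Mirror of the documented signature."""

    name: str
    type_name: str
    max_occurs: str = "1"


def get_field_type_sketch(element: XSDElement) -> str:
    """Build the annotation string for one element."""
    quoted = f'"{element.type_name}"'  # quoted so it works as a forward reference
    if element.max_occurs == "unbounded":
        return f"Optional[tuple[{quoted}, ...]]"
    return f"Optional[{quoted}]"


single = get_field_type_sketch(XSDElement("test", "string", "1"))
many = get_field_type_sketch(XSDElement("test", "string", "unbounded"))
```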

get_python_type

get_python_type(xsd_type: str) -> str

Convert XSD type to Python type.

Parameters:

Name | Type | Description | Default
xsd_type | str | The XSD type to convert | required

Returns:

Type | Description
str | The corresponding Python type

Examples:

>>> get_python_type("string")
'str'
>>> get_python_type("int")
'int'
>>> get_python_type("unknown")
'Any'
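
A lookup table with an `'Any'` fallback reproduces the three examples above. In this sketch only the `string` and `int` entries and the fallback are backed by those examples. The remaining table entries and the prefix stripping are assumptions.

```python
# Only "string" and "int" are confirmed by the documented examples.
XSD_TO_PYTHON = {
    "string": "str",
    "int": "int",
    "long": "int",       # assumed
    "float": "float",    # assumed
    "double": "float",   # assumed
    "boolean": "bool",   # assumed
}


def get_python_type_sketch(xsd_type: str) -> str:
    """Map an XSD type name to a Python type name, defaulting to 'Any'."""
    local_name = xsd_type.split(":")[-1]  # assumed: "xsd:int" is treated as "int"
    return XSD_TO_PYTHON.get(local_name, "Any")
```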

main

main() -> None

Convert XSD to Python dataclasses.

Usage

python xsd_to_dataclasses.py

parse_xsd_attribute

parse_xsd_attribute(attr: Element) -> XSDAttribute

Parse an XSD attribute into a structured object.

Parameters:

Name | Type | Description | Default
attr | Element | The XSD attribute to parse | required

Returns:

Type | Description
XSDAttribute | An XSDAttribute object containing the parsed information

parse_xsd_element

parse_xsd_element(element: Element) -> XSDElement

Parse an XSD element into a structured object.

Parameters:

Name | Type | Description | Default
element | Element | The XSD element to parse | required

Returns:

Type | Description
XSDElement | An XSDElement object containing the parsed information
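
Parsing an `xsd:element` into the documented `XSDElement` fields can be sketched by reading its `name`, `type`, and `maxOccurs` XML attributes. The empty-string fallbacks and the name `parse_xsd_element_sketch` are assumptions. The `"1"` default follows the documented signature.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from xml.etree.ElementTree import Element


@dataclass
class XSDElement:
    """Mirror of the documented signature."""

    name: str
    type_name: str
    max_occurs: str = "1"


def parse_xsd_element_sketch(element: Element) -> XSDElement:
    """Copy the relevant XML attributes into a structured object."""
    return XSDElement(
        name=element.get("name", ""),
        type_name=element.get("type", ""),
        max_occurs=element.get("maxOccurs", "1"),
    )


node = ET.fromstring(
    '<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
    'name="tag" type="tagType" maxOccurs="unbounded"/>'
)
parsed = parse_xsd_element_sketch(node)
```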

parse_xsd_file

parse_xsd_file(
    xsd_file: str,
) -> tuple[Element, dict[str, str]]

Parse an XSD file and return its root element and namespace mapping.

Parameters:

Name | Type | Description | Default
xsd_file | str | Path to the XSD file | required

Returns:

Type | Description
tuple[Element, dict[str, str]] | A tuple containing the root element and namespace mapping
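
One way to obtain both the root element and the prefix mapping in a single pass is `ElementTree.iterparse` with the `start-ns` event. The sketch below assumes that approach. The real function may build the mapping differently, and `parse_xsd_file_sketch` and `demo.xsd` are hypothetical names.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element


def parse_xsd_file_sketch(xsd_file: str) -> tuple[Element, dict[str, str]]:
    """Parse the file once, collecting namespace declarations along the way."""
    ns: dict[str, str] = {}
    root = None
    for event, payload in ET.iterparse(xsd_file, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload  # emitted for each xmlns declaration
            ns[prefix] = uri
        elif root is None:
            root = payload  # the first "start" event is the root element
    if root is None:
        raise ValueError(f"no root element found in {xsd_file}")
    return root, ns


# Write a tiny illustrative schema to disk and parse it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xsd")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(
            '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
            '<xsd:complexType name="tagType"/>'
            "</xsd:schema>"
        )
    root, ns = parse_xsd_file_sketch(path)
```

The returned mapping can be passed directly to `findall`, so later lookups can use prefixed paths such as `xsd:complexType`.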