
Working with JSON Schema

JSON Schema is a tool used to validate data. In Synapse, a JSON Schema can validate the metadata applied to an entity, such as a project, file, folder, table, or view, including its annotations. To learn more about JSON Schema, see JSON-Schema.org. Synapse also publishes a full list of the entities it supports.

Synapse supports a subset of features from json-schema-draft-07. For the features currently supported, see the JSON Schema object definition in Synapse's REST API documentation.

In this tutorial, you will learn how to bind a JSON Schema to a folder, which allows you to enforce data standards on the folder and all its children. You can also bind a JSON Schema to a project, file, table, or view. This tutorial will walk you through each step, from binding a schema to validation.

Tutorial Purpose

By the end of this tutorial, you will:

  1. Log in and create a project and folder
  2. Learn how to retrieve an organization and schema. If they do not exist, understand how to create them.
  3. Bind the JSON Schema to a folder
  4. Add annotations to the folder and validate them against the schema.
  5. Attach a file to the folder, add annotations, and validate the file against the schema.
  6. View schema validation statistics and results

Prerequisites

  • You have a working installation of the Synapse Python Client.
  • You have completed the Project tutorial, which covers creating and managing projects in Synapse. You need a project to hold the folder used in this tutorial.
  • You are familiar with Synapse concepts: Project, Folder, File.
  • You are familiar with adding annotations to a Synapse entity.

1. Set Up Synapse Python Client and Retrieve Project

import time
from pprint import pprint

import synapseclient
from synapseclient.core.utils import make_bogus_data_file
from synapseclient.models import File, Folder

# 1. Set up Synapse Python client and retrieve project
syn = synapseclient.Synapse()
syn.login()

# Retrieve test project
PROJECT_ID = syn.findEntityId(
    name="My uniquely named project about Alzheimer's Disease"
)

# Create a test folder for JSON schema experiments
test_folder = Folder(name="clinical_data_folder", parent_id=PROJECT_ID).store()

2. Take a Look at the Constants and Structure of the JSON Schema

ORG_NAME = "myUniqueAlzheimersResearchOrgTutorial"
VERSION = "0.0.1"
NEW_VERSION = "0.0.2"

SCHEMA_NAME = "clinicalObservations"

title = "Alzheimer's Clinical Observation Schema"
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "https://example.com/schema/alzheimers_observation.json",
    "title": title,
    "type": "object",
    "properties": {
        "patient_id": {
            "type": "string",
            "description": "A unique identifier for the patient",
        },
        "cognitive_score": {
            "type": "integer",
            "description": "Quantitative cognitive function score",
        },
        "diagnosis_stage": {
            "type": "string",
            "description": "Stage of Alzheimer's diagnosis (e.g., Mild, Moderate, Severe)",
            "const": "Mild",  # Example constant for derived annotation
        },
    },
}

Derived annotations let you define default annotation values from schema rules, which keeps annotations consistent and reduces manual input errors. In the schema above, the const keyword on diagnosis_stage prescribes "Mild" as the value of a derived annotation. See the Synapse documentation on derived annotations for details.
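
To make the idea concrete, here is a minimal sketch that lists which properties of a schema carry a const value and would therefore be candidates for derived annotations. Synapse computes derived annotations on the server; the helper const_defaults is hypothetical and only illustrates the rule, under the assumption that const is the keyword driving the default.

```python
# Illustrative only: Synapse derives annotations server-side.
# `const_defaults` is a hypothetical helper, not part of synapseclient.
schema = {
    "type": "object",
    "properties": {
        "patient_id": {"type": "string"},
        "cognitive_score": {"type": "integer"},
        "diagnosis_stage": {"type": "string", "const": "Mild"},
    },
}


def const_defaults(json_schema: dict) -> dict:
    """Return {property: value} for every property that declares `const`."""
    return {
        name: spec["const"]
        for name, spec in json_schema.get("properties", {}).items()
        if "const" in spec
    }


print(const_defaults(schema))  # {'diagnosis_stage': 'Mild'}
```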

3. Create a Test Organization and JSON Schema if They Do Not Exist

Next, create a test organization and register a schema if they do not already exist:

js = syn.service("json_schema")
all_orgs = js.list_organizations()
for org in all_orgs:
    if org["name"] == ORG_NAME:
        syn.logger.info(f"Organization {ORG_NAME} already exists.")
        break
else:
    syn.logger.info(f"Creating organization {ORG_NAME}.")
    js.create_organization(ORG_NAME)

my_test_org = js.JsonSchemaOrganization(ORG_NAME)
test_schema = my_test_org.get_json_schema(SCHEMA_NAME)
if not test_schema:
    test_schema = my_test_org.create_json_schema(schema, SCHEMA_NAME, VERSION)

Note: If you update your schema, re-register it with the organization under a new version number. Synapse does not allow re-creating a schema with an existing version number, so each schema version within an organization must be unique. The example below reuses VERSION, so the request is rejected and the warning is logged; passing NEW_VERSION instead would register the update:

updated_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "https://example.com/schema/alzheimers_observation.json",
    "title": "my new title",
    "type": "object",
    "properties": {
        "patient_id": {
            "type": "string",
            "description": "A unique identifier for the patient",
        },
        "cognitive_score": {
            "type": "integer",
            "description": "Quantitative cognitive function score",
        },
        "updated_field": {
            "type": "string",
            "description": "Updated description for the field",
        },
    },
}

try:
    new_test_schema = my_test_org.create_json_schema(
        updated_schema, SCHEMA_NAME, VERSION
    )
except synapseclient.core.exceptions.SynapseHTTPError as e:
    if e.response.status_code == 400 and "already exists" in e.response.text:
        syn.logger.warning(
            f"Schema {SCHEMA_NAME} already exists. Please switch to use a new version number."
        )
    else:
        raise e
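
The constants define NEW_VERSION = "0.0.2", while the call above reuses VERSION and so ends up in the "already exists" branch once version 0.0.1 is registered. As a small sketch, the next version can be computed rather than hard-coded. The helper bump_patch is hypothetical (it is not part of synapseclient) and assumes plain MAJOR.MINOR.PATCH version strings.

```python
def bump_patch(version: str) -> str:
    """Increment the patch part of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


VERSION = "0.0.1"
NEW_VERSION = bump_patch(VERSION)
print(NEW_VERSION)  # 0.0.2
```

The resulting string could then be passed as the version argument when registering the updated schema.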

4. Bind the JSON Schema to the Folder

After creating the organization and registering the schema, you can bind the JSON Schema to the test folder. When you bind a JSON Schema to a container entity such as a project or folder, all items inside it inherit the schema binding, unless an item has a schema bound to itself.
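
The inheritance rule can be pictured as a walk up the parent chain until a binding is found. The sketch below uses a made-up in-memory hierarchy and made-up schema URIs; it illustrates the rule as described here and is not how Synapse implements it.

```python
from typing import Optional

# Toy hierarchy: entity -> parent (None marks the root). All IDs are made up.
PARENTS = {
    "project": None,
    "folder": "project",
    "file_a": "folder",
    "file_b": "folder",
}

# Direct bindings only: the folder and one file carry their own schema.
BINDINGS = {
    "folder": "org-clinicalObservations-0.0.1",
    "file_b": "org-otherSchema-1.0.0",
}


def effective_schema(entity_id: str) -> Optional[str]:
    """Return the nearest binding, starting at the entity and moving upward."""
    current: Optional[str] = entity_id
    while current is not None:
        if current in BINDINGS:
            return BINDINGS[current]
        current = PARENTS[current]
    return None


print(effective_schema("file_a"))   # org-clinicalObservations-0.0.1 (inherited)
print(effective_schema("file_b"))   # org-otherSchema-1.0.0 (own binding wins)
print(effective_schema("project"))  # None (nothing bound at or above it)
```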

When you bind the schema, you may also include the boolean property enable_derived_annotations to have Synapse automatically calculate derived annotations based on the schema:

schema_uri = ORG_NAME + "-" + SCHEMA_NAME + "-" + VERSION
bound_schema = test_folder.bind_schema(
    json_schema_uri=schema_uri, enable_derived_annotations=True
)
json_schema_version_info = bound_schema.json_schema_version_info
syn.logger.info("JSON schema was bound successfully. Please see details below:")
pprint(vars(json_schema_version_info))

You should be able to see:
JSON schema was bound successfully. Please see details below:
{'created_by': '<your synapse user id>',
 'created_on': '2025-06-13T21:46:37.457Z',
 'id': 'myUniqueAlzheimersResearchOrgTutorial-clinicalObservations-0.0.1',
 'json_sha256_hex': 'f01270d61cf9a317b9f33a8acc1d86d330effc3548ad350c60d2a072de33f3fd',
 'organization_id': '571',
 'organization_name': 'myUniqueAlzheimersResearchOrgTutorial',
 'schema_id': '5650',
 'schema_name': 'clinicalObservations',
 'semantic_version': '0.0.1',
 'version_id': '41294'}
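
The schema URI is the organization name, schema name, and version joined with hyphens, which matches the id field in the output above. A minimal sketch of building and splitting such a URI follows. SchemaRef, build_schema_uri, and parse_schema_uri are hypothetical names, and the parser assumes that none of the three parts contains a hyphen.

```python
from typing import NamedTuple


class SchemaRef(NamedTuple):
    organization: str
    name: str
    version: str


def build_schema_uri(ref: SchemaRef) -> str:
    """Join the three parts the same way the tutorial builds schema_uri."""
    return f"{ref.organization}-{ref.name}-{ref.version}"


def parse_schema_uri(uri: str) -> SchemaRef:
    """Split a URI back into its parts; assumes no part contains '-'."""
    parts = uri.split("-")
    if len(parts) != 3:
        raise ValueError(f"Expected 'org-name-version', got: {uri!r}")
    return SchemaRef(*parts)


ref = SchemaRef("myUniqueAlzheimersResearchOrgTutorial", "clinicalObservations", "0.0.1")
uri = build_schema_uri(ref)
print(uri)
print(parse_schema_uri(uri) == ref)  # True
```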

5. Retrieve the Bound Schema

Next, we can retrieve the bound schema:

schema = test_folder.get_schema()
syn.logger.info("JSON Schema was retrieved successfully. Please see details below:")
pprint(vars(schema))

You should be able to see:
JSON Schema was retrieved successfully. Please see details below:
{'created_by': '<your synapse user id>',
 'created_on': '2025-06-17T15:26:13.718Z',
 'enable_derived_annotations': True,
 'json_schema_version_info': JSONSchemaVersionInfo(
   organization_id='571',
   organization_name='myUniqueAlzheimersResearchOrgTutorial',
   schema_id='5650',
   id='myUniqueAlzheimersResearchOrgTutorial-clinicalObservations-0.0.1',
   schema_name='clinicalObservations',
   version_id='41294',
   semantic_version='0.0.1',
   json_sha256_hex='f01270d61cf9a317b9f33a8acc1d86d330effc3548ad350c60d2a072de33f3fd',
   created_on='2025-06-13T21:46:37.457Z',
   created_by='<your synapse user id>'),
 'object_id': 68294149,
 'object_type': 'entity'}

6. Add Invalid Annotations to the Folder and Store, and Validate the Folder against the Schema

Try adding invalid annotations to your folder. This step and the next one demonstrate how Synapse handles invalid annotations and how schema validation works.

test_folder.annotations = {
    "patient_id": "1234",
    "cognitive_score": "invalid str",
}
test_folder.store()
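
A rough local check can flag the same kind of type mismatch before any annotations are stored. The sketch below covers only the type keyword for a few primitive types. It is not a JSON Schema validator and does not replace the validation Synapse performs; local_type_errors and TYPE_MAP are hypothetical names.

```python
# JSON Schema type name -> Python type(s). Only a few types are covered.
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

schema = {
    "type": "object",
    "properties": {
        "patient_id": {"type": "string"},
        "cognitive_score": {"type": "integer"},
    },
}


def local_type_errors(annotations: dict, json_schema: dict) -> list:
    """Return one message per annotation whose value has the wrong type."""
    errors = []
    properties = json_schema.get("properties", {})
    for key, value in annotations.items():
        expected = properties.get(key, {}).get("type")
        if expected not in TYPE_MAP:
            continue  # property not in the schema, or type not handled here
        ok = isinstance(value, TYPE_MAP[expected])
        # bool is a subclass of int in Python, so reject it for numeric types
        if expected in ("integer", "number") and isinstance(value, bool):
            ok = False
        if not ok:
            errors.append(
                f"#/{key}: expected type: {expected}, found: {type(value).__name__}"
            )
    return errors


annotations = {"patient_id": "1234", "cognitive_score": "invalid str"}
print(local_type_errors(annotations, schema))
# ['#/cognitive_score: expected type: integer, found: str']
```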

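Try validating the folder. Validation results may take a moment to update after the annotations are stored; the full source code at the end of this tutorial calls time.sleep(2) before validating. You should see messages describing the invalid annotations.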

validation_results = test_folder.validate_schema()
syn.logger.info("Validation was completed. Please see details below:")
pprint(vars(validation_results))

You should be able to see:
Validation was completed. Please see details below:
{'all_validation_messages': ['#/cognitive_score: expected type: Integer, '
                             'found: String'],
 'validation_error_message': 'expected type: Integer, found: String',
 'validation_exception': ValidationException(pointer_to_violation='#/cognitive_score',
 message='expected type: Integer, '
          'found: String',
 schema_location='#/properties/cognitive_score',
 causing_exceptions=[]),
 'validation_response': JSONSchemaValidation(object_id='syn68294149',
 object_type='entity',
 object_etag='251af76f-56d5-49a2-aada-c268f24d699d',
 id='https://repo-prod.prod.sagebase.org/repo/v1/schema/type/registered/myUniqueAlzheimersResearchOrgTutorial-clinicalObservations-0.0.1',
 is_valid=False,
 validated_on='2025-06-17T15:26:14.878Z')}
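
The complete script at the end of this tutorial pauses with time.sleep(2) before reading validation results. A fixed pause can be too short or longer than needed, so one alternative is to poll with a timeout. The sketch below is generic: poll_until is a hypothetical helper, and how to tell that Synapse results are ready is left to the fetch callable, because that signal is an assumption rather than something this tutorial specifies.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], Optional[T]], timeout: float = 10.0, interval: float = 0.5
) -> T:
    """Call `fetch` until it returns a non-None value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No result after {timeout} seconds")
        time.sleep(interval)


# Stand-in for a real lookup: it reports "not ready" twice, then succeeds.
attempts = {"count": 0}


def fake_fetch() -> Optional[str]:
    attempts["count"] += 1
    return "validated" if attempts["count"] >= 3 else None


print(poll_until(fake_fetch, timeout=5.0, interval=0.01))  # validated
```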

In the Synapse web UI, the invalid annotations are also marked with a yellow label.

7. Create a File with Invalid Annotations, Upload It, and View Validation Details

This step is only relevant for container entities, such as a folder or a project.

Create a test file locally and store it, with invalid annotations, in the folder created earlier. This step demonstrates that files inside a folder inherit the schema bound to the parent entity.

path_to_file = make_bogus_data_file(n=5)

annotations = {"patient_id": "123456", "cognitive_score": "invalid child str"}

child_file = File(path=path_to_file, parent_id=test_folder.id, annotations=annotations)
child_file = child_file.store()

You can then use get_schema_validation_statistics to get information such as the number of children with invalid annotations inside a container.

validation_statistics = test_folder.get_schema_validation_statistics()
syn.logger.info(
    "Validation statistics were retrieved successfully. Please see details below:"
)
pprint(vars(validation_statistics))

You should be able to see:
Validation statistics were retrieved successfully. Please see details below:
{'container_id': 'syn68294149',
 'number_of_invalid_children': 1,
 'number_of_unknown_children': 0,
 'number_of_valid_children': 0,
 'total_number_of_children': 1}
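
To show how these counters relate to each other, here is a small sketch that tallies per-child outcomes into the same field names. The input data is made up, summarize is a hypothetical helper, and treating "unknown" as "not yet validated" is an assumption; Synapse computes the real statistics on the server.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical outcomes: True = valid, False = invalid, None = unknown.
child_results: Dict[str, Optional[bool]] = {
    "syn1": False,
    "syn2": True,
    "syn3": None,
}


def summarize(container_id: str, results: Dict[str, Optional[bool]]) -> dict:
    """Count valid, invalid, and unknown children of one container."""
    counts = Counter(results.values())
    return {
        "container_id": container_id,
        "number_of_invalid_children": counts[False],
        "number_of_unknown_children": counts[None],
        "number_of_valid_children": counts[True],
        "total_number_of_children": len(results),
    }


stats = summarize("syn0", child_results)
print(stats)
```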

You can also use get_invalid_validation to see detailed results for the invalid children inside a container, including all validation messages and validation exception details.

invalid_validation = test_folder.get_invalid_validation()
for child in invalid_validation:
    syn.logger.info("See details of validation results: ")
    pprint(vars(child))

You should be able to see:
See details of validation results:
{'all_validation_messages': ['#/cognitive_score: expected type: Integer, '
                             'found: String'],
 'validation_error_message': 'expected type: Integer, found: String',
 'validation_exception': ValidationException(pointer_to_violation='#/cognitive_score',
 message='expected type: Integer, '
          'found: String',
 schema_location='#/properties/cognitive_score',
 causing_exceptions=[]),
 'validation_response': JSONSchemaValidation(object_id='syn68294165',
 object_type='entity',
 object_etag='4ab1d39d-d1bf-45ac-92be-96ceee576d72',
 id='https://repo-prod.prod.sagebase.org/repo/v1/schema/type/registered/myUniqueAlzheimersResearchOrgTutorial-clinicalObservations-0.0.1',
 is_valid=False,
 validated_on='2025-06-17T15:26:18.650Z')}
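
When a container has many invalid children, it can help to group the messages by the annotation they refer to. The sketch below parses strings shaped like the all_validation_messages entries shown above. It assumes each message has the form "#/<field>: <detail>", which is inferred from this sample output only; group_messages_by_field is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_messages_by_field(messages: Iterable[str]) -> Dict[str, List[str]]:
    """Group '#/<field>: <detail>' messages under their field name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for message in messages:
        # Split at the first ': ' so colons inside the detail are kept.
        pointer, _, detail = message.partition(": ")
        field = pointer[2:] if pointer.startswith("#/") else pointer
        grouped[field].append(detail)
    return dict(grouped)


messages = ["#/cognitive_score: expected type: Integer, found: String"]
print(group_messages_by_field(messages))
# {'cognitive_score': ['expected type: Integer, found: String']}
```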

In the Synapse web UI, the file's invalid annotations are also marked with a yellow label.

Source Code for this Tutorial

import time
from pprint import pprint

import synapseclient
from synapseclient.core.utils import make_bogus_data_file
from synapseclient.models import File, Folder

# 1. Set up Synapse Python client and retrieve project
syn = synapseclient.Synapse()
syn.login()

# Retrieve test project
PROJECT_ID = syn.findEntityId(
    name="My uniquely named project about Alzheimer's Disease"
)

# Create a test folder for JSON schema experiments
test_folder = Folder(name="clinical_data_folder", parent_id=PROJECT_ID).store()

# 2. Take a look at the constants and structure of the JSON schema
ORG_NAME = "myUniqueAlzheimersResearchOrgTutorial"
VERSION = "0.0.1"
NEW_VERSION = "0.0.2"

SCHEMA_NAME = "clinicalObservations"

title = "Alzheimer's Clinical Observation Schema"
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "https://example.com/schema/alzheimers_observation.json",
    "title": title,
    "type": "object",
    "properties": {
        "patient_id": {
            "type": "string",
            "description": "A unique identifier for the patient",
        },
        "cognitive_score": {
            "type": "integer",
            "description": "Quantitative cognitive function score",
        },
        "diagnosis_stage": {
            "type": "string",
            "description": "Stage of Alzheimer's diagnosis (e.g., Mild, Moderate, Severe)",
            "const": "Mild",  # Example constant for derived annotation
        },
    },
}

# 3. Create the test organization and JSON schema if they do not exist
js = syn.service("json_schema")
all_orgs = js.list_organizations()
for org in all_orgs:
    if org["name"] == ORG_NAME:
        syn.logger.info(f"Organization {ORG_NAME} already exists.")
        break
else:
    syn.logger.info(f"Creating organization {ORG_NAME}.")
    js.create_organization(ORG_NAME)

my_test_org = js.JsonSchemaOrganization(ORG_NAME)
test_schema = my_test_org.get_json_schema(SCHEMA_NAME)
if not test_schema:
    test_schema = my_test_org.create_json_schema(schema, SCHEMA_NAME, VERSION)

# To make an update, re-register your schema with the organization under a new
# version number. Reusing VERSION here triggers the "already exists" warning below.
updated_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "https://example.com/schema/alzheimers_observation.json",
    "title": "my new title",
    "type": "object",
    "properties": {
        "patient_id": {
            "type": "string",
            "description": "A unique identifier for the patient",
        },
        "cognitive_score": {
            "type": "integer",
            "description": "Quantitative cognitive function score",
        },
        "updated_field": {
            "type": "string",
            "description": "Updated description for the field",
        },
    },
}

try:
    new_test_schema = my_test_org.create_json_schema(
        updated_schema, SCHEMA_NAME, VERSION
    )
except synapseclient.core.exceptions.SynapseHTTPError as e:
    if e.response.status_code == 400 and "already exists" in e.response.text:
        syn.logger.warning(
            f"Schema {SCHEMA_NAME} already exists. Please switch to use a new version number."
        )
    else:
        raise e

# 4. Bind the JSON schema to the folder
schema_uri = ORG_NAME + "-" + SCHEMA_NAME + "-" + VERSION
bound_schema = test_folder.bind_schema(
    json_schema_uri=schema_uri, enable_derived_annotations=True
)
json_schema_version_info = bound_schema.json_schema_version_info
syn.logger.info("JSON schema was bound successfully. Please see details below:")
pprint(vars(json_schema_version_info))

# 5. Retrieve the Bound Schema
schema = test_folder.get_schema()
syn.logger.info("JSON Schema was retrieved successfully. Please see details below:")
pprint(vars(schema))

# 6. Add Invalid Annotations to the Folder and Store
test_folder.annotations = {
    "patient_id": "1234",
    "cognitive_score": "invalid str",
}
test_folder.store()

time.sleep(2)

validation_results = test_folder.validate_schema()
syn.logger.info("Validation was completed. Please see details below:")
pprint(vars(validation_results))

# 7. Create a File with Invalid Annotations and Upload It
# Then, view validation statistics and invalid validation results
path_to_file = make_bogus_data_file(n=5)

annotations = {"patient_id": "123456", "cognitive_score": "invalid child str"}

child_file = File(path=path_to_file, parent_id=test_folder.id, annotations=annotations)
child_file = child_file.store()
time.sleep(2)

validation_statistics = test_folder.get_schema_validation_statistics()
syn.logger.info(
    "Validation statistics were retrieved successfully. Please see details below:"
)
pprint(vars(validation_statistics))

invalid_validation = test_folder.get_invalid_validation()
for child in invalid_validation:
    syn.logger.info("See details of validation results: ")
    pprint(vars(child))

Reference