
Table

Contained within this file are experimental interfaces for working with the Synapse Python Client. Unless otherwise noted these interfaces are subject to change at any time. Use at your own risk.

API Reference

synapseclient.models.Table dataclass

Bases: AccessControllable, TableBase, TableStoreRowMixin, TableDeleteRowMixin, DeleteMixin, ColumnMixin, GetMixin, QueryMixin, TableUpsertMixin, TableStoreMixin, TableSynchronousProtocol, BaseJSONSchema

A Table represents the metadata of a table.

ATTRIBUTE DESCRIPTION
id

The unique immutable ID for this table. A new ID will be generated for new Tables. Once issued, this ID is guaranteed to never change or be re-issued

TYPE: Optional[str]

name

The name of this table. Must be 256 characters or less. Names may only contain: letters, numbers, spaces, underscores, hyphens, periods, plus signs, apostrophes, and parentheses

TYPE: Optional[str]

description

The description of this entity. Must be 1000 characters or less.

TYPE: Optional[str]

parent_id

The ID of the Entity that is the parent of this table.

TYPE: Optional[str]

columns

The columns of this table. This is an ordered dictionary where the key is the name of the column and the value is the Column object. When creating a new instance of a Table object you may pass any of the following types as the columns argument:

  • A list of Column objects
  • A dictionary where the key is the name of the column and the value is the Column object
  • An OrderedDict where the key is the name of the column and the value is the Column object

The order of the columns will be the order they are stored in Synapse. If you need to reorder the columns the recommended approach is to use the .reorder_column() method. Additionally, you may add and delete columns using the .add_column() and .delete_column() methods on your table class instance.
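
As a hedged sketch of those helper methods (the exact signatures are assumptions here, so check the method reference for add_column, delete_column, and reorder_column before relying on them):

from synapseclient import Synapse
from synapseclient.models import Table, Column, ColumnType

syn = Synapse()
syn.login()

table = Table(id="syn1234").get()

# Add a new column (assumed to accept a Column object)
table.add_column(Column(name="my_new_column", column_type=ColumnType.STRING))

# Delete an existing column (assumed to accept the column name)
table.delete_column(name="my_old_column")

# Move the new column to the front of the table (assumed to accept a name and index)
table.reorder_column(name="my_new_column", index=0)

# Persist the column changes to Synapse
table.store()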

You may modify the attributes of the Column object to change the column type, name, or other attributes. For example, suppose you'd like to change a column from an INTEGER to a DOUBLE. You can do so by changing the column_type attribute of the Column object. The next time you store the table the column will be updated in Synapse with the new type.

from synapseclient import Synapse
from synapseclient.models import Table, Column, ColumnType

syn = Synapse()
syn.login()

table = Table(id="syn1234").get()
table.columns["my_column"].column_type = ColumnType.DOUBLE
table.store()

Note that the keys in this dictionary should match the column names as they are in Synapse. However, know that the name attribute of the Column object is used for all interactions with the Synapse API. The OrderedDict key is purely for the usage of this interface. For example, if you wish to rename a column you may do so by changing the name attribute of the Column object. The key in the OrderedDict does not need to be changed. The next time you store the table the column will be updated in Synapse with the new name and the key in the OrderedDict will be updated.

TYPE: Optional[Union[List[Column], OrderedDict[str, Column], Dict[str, Column]]]

etag

Synapse employs an Optimistic Concurrency Control (OCC) scheme to handle concurrent updates. Since the E-Tag changes every time an entity is updated it is used to detect when a client's current representation of an entity is out-of-date.

TYPE: Optional[str]

created_on

The date this table was created.

TYPE: Optional[str]

created_by

The ID of the user that created this table.

TYPE: Optional[str]

modified_on

The date this table was last modified. In YYYY-MM-DD-Thh:mm:ss.sssZ format

TYPE: Optional[str]

modified_by

The ID of the user that last modified this table.

TYPE: Optional[str]

version_number

(Read Only) The version number issued to this version of the object. Use the .snapshot() method to create a new version of the table.

TYPE: Optional[int]

version_label

(Read Only) The version label for this table. Use the .snapshot() method to create a new version of the table.

TYPE: Optional[str]

version_comment

(Read Only) The version comment for this table. Use the .snapshot() method to create a new version of the table.

TYPE: Optional[str]

is_latest_version

(Read Only) If this is the latest version of the object.

TYPE: Optional[bool]

is_search_enabled

Specifies whether full text search should be enabled when creating or updating a table or view. Note that enabling full text search might slow down the indexing of the table or view.

TYPE: Optional[bool]
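
For example, a minimal sketch of enabling full text search on an existing table (the Synapse ID is a placeholder):

from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

table = Table(id="syn1234").get()
table.is_search_enabled = True
table.store()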

activity

The Activity model represents the main record of Provenance in Synapse. It is analogous to the Activity defined in the W3C Specification on Provenance. Activity cannot be removed during a store operation by setting it to None. You must use: synapseclient.models.Activity.delete_async or synapseclient.models.Activity.disassociate_from_entity_async.

TYPE: Optional[Activity]

annotations

Additional metadata associated with the table. The key is the name of your desired annotations. The value is an object containing a list of values (use empty list to represent no values for key) and the value type associated with all values in the list. To remove all annotations set this to an empty dict {} or None and store the entity.

TYPE: Optional[Dict[str, Union[List[str], List[bool], List[float], List[int], List[date], List[datetime]]]]
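
For example, a minimal sketch of setting annotations on an existing table and then clearing them (the Synapse ID, keys, and values are placeholders):

from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

table = Table(id="syn1234").get()

# Each annotation key maps to a list of values
table.annotations = {
    "study": ["my_study_name"],
    "cohort_size": [123],
}
table.store()

# To remove all annotations, set the attribute to an empty dict and store again
table.annotations = {}
table.store()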

Create a table with data without specifying columns

This API is set up to allow the data to define which columns are created on the Synapse table automatically. The limitation with this behavior is that the columns created will only be of the following types:

  • STRING
  • LARGETEXT
  • INTEGER
  • DOUBLE
  • BOOLEAN
  • DATE

The determination of the column type is based on the data that is passed in using the pandas function infer_dtype. If you need a more specific column type, or need to add options to the columns, follow the examples below.

import pandas as pd

from synapseclient import Synapse
from synapseclient.models import Table, SchemaStorageStrategy

syn = Synapse()
syn.login()

my_data = pd.DataFrame(
    {
        "my_string_column": ["a", "b", "c", "d"],
        "my_integer_column": [1, 2, 3, 4],
        "my_double_column": [1.0, 2.0, 3.0, 4.0],
        "my_boolean_column": [True, False, True, False],
    }
)

table = Table(
    name="my_table",
    parent_id="syn1234",
).store()

table.store_rows(values=my_data, schema_storage_strategy=SchemaStorageStrategy.INFER_FROM_DATA)

# Prints out the stored data about this specific column
print(table.columns["my_string_column"])
Rename an existing column

This example shows how you may retrieve a table from Synapse, rename a column, and then store the table back in Synapse.

from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

table = Table(
    name="my_table",
    parent_id="syn1234",
).get()

# You may also get the table by id:
table = Table(
    id="syn4567"
).get()

table.columns["my_old_column"].name = "my_new_column"

# Before the data is stored in synapse you'll still be able to use the old key to access the column entry
print(table.columns["my_old_column"])

table.store()

# After the data is stored in synapse you'll be able to use the new key to access the column entry
print(table.columns["my_new_column"])
Create a table with a list of columns

A list of columns may be passed in when creating a new table. The order of the columns in the list will be the order they are stored in Synapse. If the table already exists and you create the Table instance in this way the columns will be appended to the end of the existing columns.

from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

columns = [
    Column(name="my_string_column", column_type=ColumnType.STRING),
    Column(name="my_integer_column", column_type=ColumnType.INTEGER),
    Column(name="my_double_column", column_type=ColumnType.DOUBLE),
    Column(name="my_boolean_column", column_type=ColumnType.BOOLEAN),
]

table = Table(
    name="my_table",
    parent_id="syn1234",
    columns=columns
)

table.store()
Creating a table with a dictionary of columns

When specifying columns via a dict, setting the name attribute on the Column object is optional. When it is not specified, it will be pulled from the key of the dict.

from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

columns = {
    "my_string_column": Column(column_type=ColumnType.STRING),
    "my_integer_column": Column(column_type=ColumnType.INTEGER),
    "my_double_column": Column(column_type=ColumnType.DOUBLE),
    "my_boolean_column": Column(column_type=ColumnType.BOOLEAN),
}

table = Table(
    name="my_table",
    parent_id="syn1234",
    columns=columns
)

table.store()
Creating a table with an OrderedDict of columns

When specifying columns via a dict, setting the name attribute on the Column object is optional. When it is not specified, it will be pulled from the key of the dict.

from collections import OrderedDict
from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

columns = OrderedDict({
    "my_string_column": Column(column_type=ColumnType.STRING),
    "my_integer_column": Column(column_type=ColumnType.INTEGER),
    "my_double_column": Column(column_type=ColumnType.DOUBLE),
    "my_boolean_column": Column(column_type=ColumnType.BOOLEAN),
})

table = Table(
    name="my_table",
    parent_id="syn1234",
    columns=columns
)

table.store()
Source code in synapseclient/models/table.py
@dataclass()
@async_to_sync
class Table(
    AccessControllable,
    TableBase,
    TableStoreRowMixin,
    TableDeleteRowMixin,
    DeleteMixin,
    ColumnMixin,
    GetMixin,
    QueryMixin,
    TableUpsertMixin,
    TableStoreMixin,
    TableSynchronousProtocol,
    BaseJSONSchema,
):
    """A Table represents the metadata of a table.

    Attributes:
        id: The unique immutable ID for this table. A new ID will be generated for new
            Tables. Once issued, this ID is guaranteed to never change or be re-issued
        name: The name of this table. Must be 256 characters or less. Names may only
            contain: letters, numbers, spaces, underscores, hyphens, periods, plus
            signs, apostrophes, and parentheses
        description: The description of this entity. Must be 1000 characters or less.
        parent_id: The ID of the Entity that is the parent of this table.
        columns: The columns of this table. This is an ordered dictionary where the key is the
            name of the column and the value is the Column object. When creating a new instance
            of a Table object you may pass any of the following types as the `columns` argument:

            - A list of Column objects
            - A dictionary where the key is the name of the column and the value is the Column object
            - An OrderedDict where the key is the name of the column and the value is the Column object

            The order of the columns will be the order they are stored in Synapse. If you need
            to reorder the columns the recommended approach is to use the `.reorder_column()`
            method. Additionally, you may add and delete columns using the `.add_column()`
            and `.delete_column()` methods on your table class instance.

            You may modify the attributes of the Column object to change the column
            type, name, or other attributes. For example, suppose you'd like to change a
            column from an INTEGER to a DOUBLE. You can do so by changing the column_type
            attribute of the Column object. The next time you store the table the column
            will be updated in Synapse with the new type.

            ```python
            from synapseclient import Synapse
            from synapseclient.models import Table, Column, ColumnType

            syn = Synapse()
            syn.login()

            table = Table(id="syn1234").get()
            table.columns["my_column"].column_type = ColumnType.DOUBLE
            table.store()
            ```

            Note that the keys in this dictionary should match the column names as they are in
            Synapse. However, know that the name attribute of the Column object is used for
            all interactions with the Synapse API. The OrderedDict key is purely for the usage
            of this interface. For example, if you wish to rename a column you may do so by
            changing the name attribute of the Column object. The key in the OrderedDict does
            not need to be changed. The next time you store the table the column will be updated
            in Synapse with the new name and the key in the OrderedDict will be updated.
        etag: Synapse employs an Optimistic Concurrency Control (OCC) scheme to handle
            concurrent updates. Since the E-Tag changes every time an entity is updated
            it is used to detect when a client's current representation of an entity is
            out-of-date.
        created_on: The date this table was created.
        created_by: The ID of the user that created this table.
        modified_on: The date this table was last modified.
            In YYYY-MM-DD-Thh:mm:ss.sssZ format
        modified_by: The ID of the user that last modified this table.
        version_number: (Read Only) The version number issued to this version of the
            object. Use the `.snapshot()` method to create a new version of the
            table.
        version_label: (Read Only) The version label for this table. Use the
            `.snapshot()` method to create a new version of the table.
        version_comment: (Read Only) The version comment for this table. Use the
            `.snapshot()` method to create a new version of the table.
        is_latest_version: (Read Only) If this is the latest version of the object.
        is_search_enabled: Specifies whether full text search should be enabled when
            creating or updating a table or view. Note that enabling full text search might
            slow down the indexing of the table or view.
        activity: The Activity model represents the main record of Provenance in
            Synapse. It is analogous to the Activity defined in the
            [W3C Specification](https://www.w3.org/TR/prov-n/) on Provenance. Activity
            cannot be removed during a store operation by setting it to None. You must
            use: [synapseclient.models.Activity.delete_async][] or
            [synapseclient.models.Activity.disassociate_from_entity_async][].
        annotations: Additional metadata associated with the table. The key is the name
            of your desired annotations. The value is an object containing a list of
            values (use empty list to represent no values for key) and the value type
            associated with all values in the list. To remove all annotations set this
            to an empty dict `{}` or None and store the entity.

    Example: Create a table with data without specifying columns
        This API is set up to allow the data to define which columns are created on the
        Synapse table automatically. The limitation with this behavior is that the
        columns created will only be of the following types:

        - STRING
        - LARGETEXT
        - INTEGER
        - DOUBLE
        - BOOLEAN
        - DATE

        The determination of the column type is based on the data that is passed in
        using the pandas function
        [infer_dtype](https://pandas.pydata.org/docs/reference/api/pandas.api.types.infer_dtype.html).
        If you need a more specific column type, or need to add options to the columns,
        follow the examples below.

        ```python
        import pandas as pd

        from synapseclient import Synapse
        from synapseclient.models import Table, SchemaStorageStrategy

        syn = Synapse()
        syn.login()

        my_data = pd.DataFrame(
            {
                "my_string_column": ["a", "b", "c", "d"],
                "my_integer_column": [1, 2, 3, 4],
                "my_double_column": [1.0, 2.0, 3.0, 4.0],
                "my_boolean_column": [True, False, True, False],
            }
        )

        table = Table(
            name="my_table",
            parent_id="syn1234",
        ).store()

        table.store_rows(values=my_data, schema_storage_strategy=SchemaStorageStrategy.INFER_FROM_DATA)

        # Prints out the stored data about this specific column
        print(table.columns["my_string_column"])
        ```

    Example: Rename an existing column
        This example shows how you may retrieve a table from Synapse, rename a column,
        and then store the table back in Synapse.

        ```python
        from synapseclient import Synapse
        from synapseclient.models import Table

        syn = Synapse()
        syn.login()

        table = Table(
            name="my_table",
            parent_id="syn1234",
        ).get()

        # You may also get the table by id:
        table = Table(
            id="syn4567"
        ).get()

        table.columns["my_old_column"].name = "my_new_column"

        # Before the data is stored in synapse you'll still be able to use the old key to access the column entry
        print(table.columns["my_old_column"])

        table.store()

        # After the data is stored in synapse you'll be able to use the new key to access the column entry
        print(table.columns["my_new_column"])
        ```

    Example: Create a table with a list of columns
        A list of columns may be passed in when creating a new table. The order of the
        columns in the list will be the order they are stored in Synapse. If the table
        already exists and you create the Table instance in this way the columns will
        be appended to the end of the existing columns.

        ```python
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        columns = [
            Column(name="my_string_column", column_type=ColumnType.STRING),
            Column(name="my_integer_column", column_type=ColumnType.INTEGER),
            Column(name="my_double_column", column_type=ColumnType.DOUBLE),
            Column(name="my_boolean_column", column_type=ColumnType.BOOLEAN),
        ]

        table = Table(
            name="my_table",
            parent_id="syn1234",
            columns=columns
        )

        table.store()
        ```


    Example: Creating a table with a dictionary of columns
        When specifying columns via a dict, setting the `name` attribute
        on the `Column` object is optional. When it is not specified, it will be
        pulled from the key of the dict.

        ```python
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        columns = {
            "my_string_column": Column(column_type=ColumnType.STRING),
            "my_integer_column": Column(column_type=ColumnType.INTEGER),
            "my_double_column": Column(column_type=ColumnType.DOUBLE),
            "my_boolean_column": Column(column_type=ColumnType.BOOLEAN),
        }

        table = Table(
            name="my_table",
            parent_id="syn1234",
            columns=columns
        )

        table.store()
        ```

    Example: Creating a table with an OrderedDict of columns
        When specifying columns via a dict, setting the `name` attribute
        on the `Column` object is optional. When it is not specified, it will be
        pulled from the key of the dict.

        ```python
        from collections import OrderedDict
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        columns = OrderedDict({
            "my_string_column": Column(column_type=ColumnType.STRING),
            "my_integer_column": Column(column_type=ColumnType.INTEGER),
            "my_double_column": Column(column_type=ColumnType.DOUBLE),
            "my_boolean_column": Column(column_type=ColumnType.BOOLEAN),
        })

        table = Table(
            name="my_table",
            parent_id="syn1234",
            columns=columns
        )

        table.store()
        ```
    """

    id: Optional[str] = None
    """The unique immutable ID for this table. A new ID will be generated for new
    Tables. Once issued, this ID is guaranteed to never change or be re-issued"""

    name: Optional[str] = None
    """The name of this table. Must be 256 characters or less. Names may only
    contain: letters, numbers, spaces, underscores, hyphens, periods, plus signs,
    apostrophes, and parentheses"""

    description: Optional[str] = None
    """The description of this entity. Must be 1000 characters or less."""

    parent_id: Optional[str] = None
    """The ID of the Entity that is the parent of this table."""

    columns: Optional[
        Union[List[Column], OrderedDict[str, Column], Dict[str, Column]]
    ] = field(default_factory=OrderedDict, compare=False)
    """
    The columns of this table. This is an ordered dictionary where the key is the
    name of the column and the value is the Column object. When creating a new instance
    of a Table object you may pass any of the following types as the `columns` argument:

    - A list of Column objects
    - A dictionary where the key is the name of the column and the value is the Column object
    - An OrderedDict where the key is the name of the column and the value is the Column object

    The order of the columns will be the order they are stored in Synapse. If you need
    to reorder the columns the recommended approach is to use the `.reorder_column()`
    method. Additionally, you may add and delete columns using the `.add_column()`
    and `.delete_column()` methods on your table class instance.

    You may modify the attributes of the Column object to change the column
    type, name, or other attributes. For example, suppose you'd like to change a
    column from an INTEGER to a DOUBLE. You can do so by changing the column_type
    attribute of the Column object. The next time you store the table the column
    will be updated in Synapse with the new type.

    ```python
    from synapseclient import Synapse
    from synapseclient.models import Table, Column, ColumnType

    syn = Synapse()
    syn.login()

    table = Table(id="syn1234").get()
    table.columns["my_column"].column_type = ColumnType.DOUBLE
    table.store()
    ```

    Note that the keys in this dictionary should match the column names as they are in
    Synapse. However, know that the name attribute of the Column object is used for
    all interactions with the Synapse API. The OrderedDict key is purely for the usage
    of this interface. For example, if you wish to rename a column you may do so by
    changing the name attribute of the Column object. The key in the OrderedDict does
    not need to be changed. The next time you store the table the column will be updated
    in Synapse with the new name and the key in the OrderedDict will be updated.
    """

    _columns_to_delete: Optional[Dict[str, Column]] = field(default_factory=dict)
    """
    Columns to delete when the table is stored. The key in this dict is the ID of the
    column to delete. The value is the Column object that represents the column to
    delete.
    """

    etag: Optional[str] = field(default=None, compare=False)
    """
    Synapse employs an Optimistic Concurrency Control (OCC) scheme to handle
    concurrent updates. Since the E-Tag changes every time an entity is updated it is
    used to detect when a client's current representation of an entity is out-of-date.
    """

    created_on: Optional[str] = field(default=None, compare=False)
    """The date this table was created."""

    created_by: Optional[str] = field(default=None, compare=False)
    """The ID of the user that created this table."""

    modified_on: Optional[str] = field(default=None, compare=False)
    """The date this table was last modified. In YYYY-MM-DD-Thh:mm:ss.sssZ format"""

    modified_by: Optional[str] = field(default=None, compare=False)
    """The ID of the user that last modified this table."""

    version_number: Optional[int] = field(default=None, compare=False)
    """(Read Only) The version number issued to this version on the object. Use this
    `.snapshot()` method to create a new version of the table."""

    version_label: Optional[str] = None
    """(Read Only) The version label for this table. Use this `.snapshot()` method
    to create a new version of the table."""

    version_comment: Optional[str] = None
    """(Read Only) The version comment for this table. Use this `.snapshot()` method
    to create a new version of the table."""

    is_latest_version: Optional[bool] = field(default=None, compare=False)
    """(Read Only) If this is the latest version of the object."""

    is_search_enabled: Optional[bool] = None
    """When creating or updating a table or view specifies if full text search
    should be enabled. Note that enabling full text search might slow down the
    indexing of the table or view."""

    activity: Optional[Activity] = field(default=None, compare=False)
    """The Activity model represents the main record of Provenance in Synapse.  It is
    analygous to the Activity defined in the
    [W3C Specification](https://www.w3.org/TR/prov-n/) on Provenance. Activity cannot
    be removed during a store operation by setting it to None. You must use:
    [synapseclient.models.Activity.delete_async][] or
    [synapseclient.models.Activity.disassociate_from_entity_async][].
    """

    annotations: Optional[
        Dict[
            str,
            Union[
                List[str],
                List[bool],
                List[float],
                List[int],
                List[date],
                List[datetime],
            ],
        ]
    ] = field(default_factory=dict, compare=False)
    """Additional metadata associated with the table. The key is the name of your
    desired annotations. The value is an object containing a list of values
    (use empty list to represent no values for key) and the value type associated with
    all values in the list. To remove all annotations set this to an empty dict `{}`
    or None and store the entity."""

    _last_persistent_instance: Optional["Table"] = field(
        default=None, repr=False, compare=False
    )
    """The last persistent instance of this object. This is used to determine if the
    object has been changed and needs to be updated in Synapse."""

    def __post_init__(self):
        """Post initialization of the Table object. This is used to set the columns
        attribute to an OrderedDict if it is a list or dict."""
        self.columns = self._convert_columns_to_ordered_dict(columns=self.columns)

    @property
    def has_changed(self) -> bool:
        """Determines if the object has been changed and needs to be updated in Synapse."""
        return (
            not self._last_persistent_instance or self._last_persistent_instance != self
        )

    def _set_last_persistent_instance(self) -> None:
        """Stash the last time this object interacted with Synapse. This is used to
        determine if the object has been changed and needs to be updated in Synapse."""
        del self._last_persistent_instance
        self._last_persistent_instance = dataclasses.replace(self)
        self._last_persistent_instance.activity = (
            dataclasses.replace(self.activity) if self.activity else None
        )
        self._last_persistent_instance.columns = (
            OrderedDict(
                (key, dataclasses.replace(column))
                for key, column in self.columns.items()
            )
            if self.columns
            else OrderedDict()
        )
        self._last_persistent_instance.annotations = (
            deepcopy(self.annotations) if self.annotations else {}
        )

    def fill_from_dict(
        self, entity: Synapse_Table, set_annotations: bool = True
    ) -> "Table":
        """
        Converts the data coming from the Synapse API into this datamodel.

        Arguments:
            entity: The data coming from the Synapse API

        Returns:
            The Table object instance.
        """
        self.id = entity.get("id", None)
        self.name = entity.get("name", None)
        self.description = entity.get("description", None)
        self.parent_id = entity.get("parentId", None)
        self.etag = entity.get("etag", None)
        self.created_on = entity.get("createdOn", None)
        self.created_by = entity.get("createdBy", None)
        self.modified_on = entity.get("modifiedOn", None)
        self.modified_by = entity.get("modifiedBy", None)
        self.version_number = entity.get("versionNumber", None)
        self.version_label = entity.get("versionLabel", None)
        self.version_comment = entity.get("versionComment", None)
        self.is_latest_version = entity.get("isLatestVersion", None)
        self.is_search_enabled = entity.get("isSearchEnabled", False)

        if set_annotations:
            self.annotations = Annotations.from_dict(entity.get("annotations", {}))
        return self

    def to_synapse_request(self):
        """Converts the request to a request expected of the Synapse REST API."""
        entity = {
            "name": self.name,
            "description": self.description,
            "id": self.id,
            "etag": self.etag,
            "parentId": self.parent_id,
            "concreteType": concrete_types.TABLE_ENTITY,
            "versionNumber": self.version_number,
            "versionLabel": self.version_label,
            "versionComment": self.version_comment,
            "isSearchEnabled": self.is_search_enabled,
            # When saving other (non-column) fields to Synapse we still need to pass
            # in the list of columns, otherwise Synapse will wipe out the columns. We
            # are using the last known columns to ensure that we are not losing any
            "columnIds": (
                [
                    column.id
                    for column in self._last_persistent_instance.columns.values()
                ]
                if self._last_persistent_instance
                and self._last_persistent_instance.columns
                else []
            ),
        }
        delete_none_keys(entity)
        result = {
            "entity": entity,
        }
        delete_none_keys(result)
        return result

    async def snapshot_async(
        self,
        comment: str = None,
        label: str = None,
        include_activity: bool = True,
        associate_activity_to_new_version: bool = True,
        *,
        synapse_client: Optional[Synapse] = None,
    ) -> Dict[str, Any]:
        """
        Request to create a new snapshot of a table. The provided comment, label, and
        activity will be applied to the current version thereby creating a snapshot
        and locking the current version. After the snapshot is created a new version
        will be started with an 'in-progress' label.

        Arguments:
            comment: Comment to add to this snapshot of the table.
            label: Label to add to this snapshot of the table. The label must be unique,
                if a label is not provided a unique label will be generated.
            include_activity: If True the activity will be included in snapshot if it
                exists. In order to include the activity, the activity must have already
                been stored in Synapse by using the `activity` attribute on the Table
                and calling the `store()` method on the Table instance. Adding an
                activity to a snapshot of a table is meant to capture the provenance of
                the data at the time of the snapshot.
            associate_activity_to_new_version: If True the activity will be associated
                with the new version of the table. If False the activity will not be
                associated with the new version of the table.
            synapse_client: If not passed in and caching was not disabled by
                `Synapse.allow_client_caching(False)` this will use the last created
                instance from the Synapse class constructor.

        Example: Creating a snapshot of a table
            Comment and label are optional, but filled in for this example.

            ```python
            import asyncio
            from synapseclient.models import Table
            from synapseclient import Synapse

            syn = Synapse()
            syn.login()


            async def main():
                my_table = Table(id="syn1234")
                await my_table.snapshot_async(
                    comment="This is a new snapshot comment",
                    label="3This is a unique label"
                )

            asyncio.run(main())
            ```

        Example: Including the activity (Provenance) in the snapshot and not pulling it forward to the new `in-progress` version of the table.
            By default this method is set up to include the activity in the snapshot and
            then pull the activity forward to the new version. If you do not want to
            include the activity in the snapshot you can set `include_activity` to
            False. If you do not want to pull the activity forward to the new version
            you can set `associate_activity_to_new_version` to False.

            See the [activity][synapseclient.models.Activity] attribute on the Table
            class for more information on how to interact with the activity.

            ```python
            import asyncio
            from synapseclient.models import Table
            from synapseclient import Synapse

            syn = Synapse()
            syn.login()


            async def main():
                my_table = Table(id="syn1234")
                await my_table.snapshot_async(
                    comment="This is a new snapshot comment",
                    label="This is a unique label",
                    include_activity=True,
                    associate_activity_to_new_version=False
                )

            asyncio.run(main())
            ```

        Returns:
            A dictionary that matches: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/SnapshotResponse.html>
        """
        client = Synapse.get_client(synapse_client=synapse_client)
        # Ensure that we have seeded the table with the latest data
        await self.get_async(include_activity=True, synapse_client=client)
        client.logger.info(
            f"[{self.id}:{self.name}]: Creating a snapshot of the table."
        )

        snapshot_response = await create_table_snapshot(
            table_id=self.id,
            comment=comment,
            label=label,
            activity_id=(
                self.activity.id if self.activity and include_activity else None
            ),
            synapse_client=synapse_client,
        )

        if associate_activity_to_new_version and self.activity:
            self._last_persistent_instance.activity = None
            await self.store_async(synapse_client=synapse_client)
        else:
            await self.get_async(include_activity=True, synapse_client=synapse_client)

        return snapshot_response

Functions

get_async async

get_async(include_columns: bool = True, include_activity: bool = False, *, synapse_client: Optional[Synapse] = None) -> Self

Get the metadata about the table from synapse.

PARAMETER DESCRIPTION
include_columns

If True, will include fully filled column objects in the .columns attribute. Defaults to True.

TYPE: bool DEFAULT: True

include_activity

If True the activity will be included in the table if it exists. Defaults to False.

TYPE: bool DEFAULT: False

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
Self

The Table instance stored in synapse.

Getting metadata about a table using id

Get a table by ID and print out the columns and activity. include_columns defaults to True and include_activity defaults to False. When you need to update existing columns or activity, set these to True during the get_async call, make your changes, and finally call the .store_async() method.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

async def main():
    table = await Table(id="syn4567").get_async(include_activity=True)
    print(table)

    # Columns are retrieved by default
    print(table.columns)
    print(table.activity)

asyncio.run(main())
Getting metadata about a table using name and parent_id

Get a table by name/parent_id and print out the columns and activity. include_columns defaults to True and include_activity defaults to False. When you need to update existing columns or activity, set these to True during the get_async call, make your changes, and finally call the .store_async() method.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

async def main():
    table = await Table(name="my_table", parent_id="syn1234").get_async(include_columns=True, include_activity=True)
    print(table)
    print(table.columns)
    print(table.activity)

asyncio.run(main())
Source code in synapseclient/models/mixins/table_components.py
@otel_trace_method(
    method_to_trace_name=lambda self, **kwargs: f"{self.__class__}_Get: {self.name}"
)
async def get_async(
    self,
    include_columns: bool = True,
    include_activity: bool = False,
    *,
    synapse_client: Optional[Synapse] = None,
) -> "Self":
    """Get the metadata about the table from synapse.

    Arguments:
        include_columns: If True, will include fully filled column objects in the
            `.columns` attribute. Defaults to True.
        include_activity: If True the activity will be included in the table
            if it exists. Defaults to False.

        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        The Table instance stored in synapse.

    Example: Getting metadata about a table using id
        Get a table by ID and print out the columns and activity. `include_columns`
        defaults to True and `include_activity` defaults to False. When you need to
        update existing columns or activity these need to be set to True during the
        `get_async` call, then you'll make the changes, and finally call the
        `.store_async()` method.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table

        syn = Synapse()
        syn.login()

        async def main():
            table = await Table(id="syn4567").get_async(include_activity=True)
            print(table)

            # Columns are retrieved by default
            print(table.columns)
            print(table.activity)

        asyncio.run(main())
        ```

    Example: Getting metadata about a table using name and parent_id
        Get a table by name/parent_id and print out the columns and activity.
        `include_columns` defaults to True and `include_activity` defaults to
        False. When you need to update existing columns or activity these need to
        be set to True during the `get_async` call, then you'll make the changes,
        and finally call the `.store_async()` method.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table

        syn = Synapse()
        syn.login()

        async def main():
            table = await Table(name="my_table", parent_id="syn1234").get_async(include_columns=True, include_activity=True)
            print(table)
            print(table.columns)
            print(table.activity)

        asyncio.run(main())
        ```
    """
    if not (self.id or (self.name and self.parent_id)):
        raise ValueError(
            "The table must have an id or a (name and `parent_id`) set."
        )

    entity_id = await get_id(entity=self, synapse_client=synapse_client)

    await get_from_entity_factory(
        entity_to_update=self,
        version=self.version_number,
        synapse_id_or_path=entity_id,
        synapse_client=synapse_client,
    )

    if include_columns:
        column_instances = await get_columns(
            table_id=self.id, synapse_client=synapse_client
        )
        for column in column_instances:
            if column.name not in self.columns:
                self.columns[column.name] = column

    if include_activity:
        self.activity = await Activity.from_parent_async(
            parent=self, synapse_client=synapse_client
        )

    self._set_last_persistent_instance()
    return self

store_async async

store_async(dry_run: bool = False, *, job_timeout: int = 600, synapse_client: Optional[Synapse] = None) -> Self

Store non-row information about a table including the columns and annotations.

Note the following behavior for the order of columns:

  • If a column is added via the add_column method it will be added at the index you specify, or at the end of the columns list.
  • If column(s) are added during the construction of your Table instance, i.e. Table(columns=[Column(name="foo")]), they will be added at the beginning of the columns list.
  • If you use the store_rows method and the schema_storage_strategy is set to INFER_FROM_DATA the columns will be added at the end of the columns list.
PARAMETER DESCRIPTION
dry_run

If True, will not actually store the table but will log to the console what would have been stored.

TYPE: bool DEFAULT: False

job_timeout

The maximum amount of time to wait for a job to complete. This is used when updating the table schema. If the timeout is reached a SynapseTimeoutError will be raised. The default is 600 seconds

TYPE: int DEFAULT: 600

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
Self

The Table instance stored in synapse.
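
Previewing changes with dry_run

A minimal sketch of using dry_run to log the expected changes before storing them; the Synapse ID and description below are placeholders.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

async def main():
    table = await Table(id="syn4567").get_async()
    table.description = "An updated description"

    # Logs what would change without storing anything in Synapse
    await table.store_async(dry_run=True)

    # Store the changes once you are happy with the preview
    await table.store_async()

asyncio.run(main())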

Source code in synapseclient/models/mixins/table_components.py
@otel_trace_method(
    method_to_trace_name=lambda self, **kwargs: f"{self.__class__}_Store: {self.name}"
)
async def store_async(
    self,
    dry_run: bool = False,
    *,
    job_timeout: int = 600,
    synapse_client: Optional[Synapse] = None,
) -> "Self":
    """Store non-row information about a table including the columns and annotations.

    Note the following behavior for the order of columns:

    - If a column is added via the `add_column` method it will be added at the
        index you specify, or at the end of the columns list.
    - If column(s) are added during the construction of your `Table` instance, i.e.
        `Table(columns=[Column(name="foo")])`, they will be added at the beginning
        of the columns list.
    - If you use the `store_rows` method and the `schema_storage_strategy` is set to
        `INFER_FROM_DATA` the columns will be added at the end of the columns list.

    Arguments:
        dry_run: If True, will not actually store the table but will log to
            the console what would have been stored.

        job_timeout: The maximum amount of time to wait for a job to complete.
            This is used when updating the table schema. If the timeout
            is reached a `SynapseTimeoutError` will be raised.
            The default is 600 seconds

        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        The Table instance stored in synapse.
    """
    client = Synapse.get_client(synapse_client=synapse_client)

    if (
        (not self._last_persistent_instance)
        and (
            existing_id := await get_id(
                entity=self, synapse_client=synapse_client, failure_strategy=None
            )
        )
        and (
            existing_entity := await self.__class__(id=existing_id).get_async(
                include_columns=True, synapse_client=synapse_client
            )
        )
    ):
        merge_dataclass_entities(
            source=existing_entity,
            destination=self,
        )

    if (not self._last_persistent_instance) and (
        hasattr(self, "_append_default_columns")
        and hasattr(self, "include_default_columns")
        and self.include_default_columns
    ):
        await self._append_default_columns(synapse_client=synapse_client)

    if (
        self.__class__.__name__ not in CLASSES_WITH_READ_ONLY_SCHEMA
        and self.columns
    ):
        # check that column names match this regex "^[a-zA-Z0-9 _\-\.\+\(\)']+$"
        for _, column in self.columns.items():
            if not re.match(r"^[a-zA-Z0-9 _\-\.\+\(\)']+$", column.name):
                raise ValueError(
                    f"Column name '{column.name}' contains invalid characters. "
                    "Names may only contain: letters, numbers, spaces, underscores, "
                    "hyphens, periods, plus signs, apostrophes, and parentheses."
                )

    if dry_run:
        client.logger.info(
            f"[{self.id}:{self.name}]: Dry run enabled. No changes will be made."
        )

    if self.has_changed:
        if self.id:
            if dry_run:
                client.logger.info(
                    f"[{self.id}:{self.name}]: Dry run {self.__class__} update, expected changes:"
                )
                log_dataclass_diff(
                    logger=client.logger,
                    prefix=f"[{self.id}:{self.name}]: ",
                    obj1=self._last_persistent_instance,
                    obj2=self,
                    fields_to_ignore=["columns", "_last_persistent_instance"],
                )
            else:
                entity = await put_entity_id_bundle2(
                    entity_id=self.id,
                    request=self.to_synapse_request(),
                    synapse_client=synapse_client,
                )
                self.fill_from_dict(entity=entity["entity"], set_annotations=False)
        else:
            if dry_run:
                client.logger.info(
                    f"[{self.id}:{self.name}]: Dry run {self.__class__} update, expected changes:"
                )
                log_dataclass_diff(
                    logger=client.logger,
                    prefix=f"[{self.name}]: ",
                    obj1=self.__class__(),
                    obj2=self,
                    fields_to_ignore=["columns", "_last_persistent_instance"],
                )
            else:
                entity = await post_entity_bundle2_create(
                    request=self.to_synapse_request(), synapse_client=synapse_client
                )
                self.fill_from_dict(entity=entity["entity"], set_annotations=False)

    schema_change_request = await self._generate_schema_change_request(
        dry_run=dry_run, synapse_client=synapse_client
    )

    if dry_run:
        return self

    if schema_change_request:
        await TableUpdateTransaction(
            entity_id=self.id, changes=[schema_change_request]
        ).send_job_and_wait_async(synapse_client=client, timeout=job_timeout)

        # Replace the columns after a schema change in case any column names were updated
        updated_columns = OrderedDict()
        for column in self.columns.values():
            updated_columns[column.name] = column
        self.columns = updated_columns
        await self.get_async(
            include_columns=False,
            synapse_client=synapse_client,
        )

    re_read_required = await store_entity_components(
        root_resource=self,
        synapse_client=synapse_client,
        failure_strategy=FailureStrategy.RAISE_EXCEPTION,
    )
    if re_read_required:
        await self.get_async(
            include_columns=False,
            synapse_client=synapse_client,
        )
    self._set_last_persistent_instance()

    return self

delete_async async

delete_async(*, synapse_client: Optional[Synapse] = None) -> None

Delete the entity from synapse. This is not version specific. If you'd like to delete a specific version of the entity you must use the synapseclient.api.delete_entity function directly.

PARAMETER DESCRIPTION
synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
None

None

Deleting a table

Deleting a table is only supported by the ID of the table.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

async def main():
    await Table(id="syn4567").delete_async()

asyncio.run(main())
Source code in synapseclient/models/mixins/table_components.py
@otel_trace_method(
    method_to_trace_name=lambda self, **kwargs: f"{self.__class__}_Delete: {self.name}"
)
async def delete_async(self, *, synapse_client: Optional[Synapse] = None) -> None:
    """Delete the entity from synapse. This is not version specific. If you'd like
    to delete a specific version of the entity you must use the
    [synapseclient.api.delete_entity][] function directly.

    Arguments:
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        None

    Example: Deleting a table
        Deleting a table is only supported by the ID of the table.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table

        syn = Synapse()
        syn.login()

        async def main():
            await Table(id="syn4567").delete_async()

        asyncio.run(main())
        ```
    """
    if not (self.id or (self.name and self.parent_id)):
        raise ValueError(
            "The table must have an id or a (name and `parent_id`) set."
        )

    entity_id = await get_id(entity=self, synapse_client=synapse_client)

    await delete_entity(
        entity_id=entity_id,
        synapse_client=synapse_client,
    )

query_async async staticmethod

query_async(query: str, include_row_id_and_row_version: bool = True, convert_to_datetime: bool = False, download_location=None, quote_character='"', escape_character='\\', line_end=str(linesep), separator=',', header=True, *, synapse_client: Optional[Synapse] = None, **kwargs) -> Union[DATA_FRAME_TYPE, str]

Query for data on a table stored in Synapse. The results will always be returned as a Pandas DataFrame unless you specify a download_location in which case the results will be downloaded to that location. There are a number of arguments that you may pass to this function depending on if you are getting the results back as a DataFrame or downloading the results to a file.

PARAMETER DESCRIPTION
query

The query to run. The query must be valid syntax that Synapse can understand. See this document that describes the expected syntax of the query: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/web/controller/TableExamples.html

TYPE: str

include_row_id_and_row_version

If True the ROW_ID and ROW_VERSION columns will be returned in the DataFrame. These columns are required if using the query results to update rows in the table. These columns are the primary keys used by Synapse to uniquely identify rows in the table.

TYPE: bool DEFAULT: True

convert_to_datetime

(DataFrame only) If set to True, will convert all Synapse DATE columns from UNIX timestamp integers into UTC datetime objects

TYPE: bool DEFAULT: False

download_location

(CSV Only) If set to a path the results will be downloaded to that directory. The results will be downloaded as a CSV file. A path to the downloaded file will be returned instead of a DataFrame.

DEFAULT: None

quote_character

(CSV Only) The character to use to quote fields. The default is a double quote.

DEFAULT: '"'

escape_character

(CSV Only) The character to use to escape special characters. The default is a backslash.

DEFAULT: '\\'

line_end

(CSV Only) The character to use to end a line. The default is the system's line separator.

DEFAULT: str(linesep)

separator

(CSV Only) The character to use to separate fields. The default is a comma.

DEFAULT: ','

header

(CSV Only) If set to True the first row will be used as the header row. The default is True.

DEFAULT: True

**kwargs

(DataFrame only) Additional keyword arguments to pass to pandas.read_csv. See https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html for a complete list of supported arguments. This is exposed because, internally, the query downloads a CSV from Synapse and then loads it into a DataFrame.

DEFAULT: {}

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
Union[DATA_FRAME_TYPE, str]

The results of the query as a Pandas DataFrame, or a path to the downloaded query results if download_location is set.

Querying for data

This example shows how you may query for data in a table and print out the results.

import asyncio
from synapseclient import Synapse
from synapseclient.models import query_async

syn = Synapse()
syn.login()

async def main():
    results = await query_async(query="SELECT * FROM syn1234")
    print(results)

asyncio.run(main())
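
Downloading query results to a CSV file

If you would rather write the results to disk, you may pass a download_location; the call then returns the path to the downloaded CSV instead of a DataFrame. This is a minimal sketch and the directory used is illustrative.

import asyncio
from synapseclient import Synapse
from synapseclient.models import query_async

syn = Synapse()
syn.login()

async def main():
    # The download directory below is illustrative; use any existing directory
    csv_path = await query_async(
        query="SELECT * FROM syn1234",
        download_location="/tmp/synapse_query_results",
    )
    print(csv_path)

asyncio.run(main())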
Source code in synapseclient/models/mixins/table_components.py
@staticmethod
async def query_async(
    query: str,
    include_row_id_and_row_version: bool = True,
    convert_to_datetime: bool = False,
    download_location=None,
    quote_character='"',
    escape_character="\\",
    line_end=str(os.linesep),
    separator=",",
    header=True,
    *,
    synapse_client: Optional[Synapse] = None,
    **kwargs,
) -> Union["DATA_FRAME_TYPE", str]:
    """Query for data on a table stored in Synapse. The results will always be
    returned as a Pandas DataFrame unless you specify a `download_location` in which
    case the results will be downloaded to that location. There are a number of
    arguments that you may pass to this function depending on if you are getting
    the results back as a DataFrame or downloading the results to a file.

    Arguments:
        query: The query to run. The query must be valid syntax that Synapse can
            understand. See this document that describes the expected syntax of the
            query:
            <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/web/controller/TableExamples.html>
        include_row_id_and_row_version: If True the `ROW_ID` and `ROW_VERSION`
            columns will be returned in the DataFrame. These columns are required
            if using the query results to update rows in the table. These columns
            are the primary keys used by Synapse to uniquely identify rows in the
            table.
        convert_to_datetime: (DataFrame only) If set to True, will convert all
            Synapse DATE columns from UNIX timestamp integers into UTC datetime
            objects

        download_location: (CSV Only) If set to a path the results will be
            downloaded to that directory. The results will be downloaded as a CSV
            file. A path to the downloaded file will be returned instead of a
            DataFrame.

        quote_character: (CSV Only) The character to use to quote fields. The
            default is a double quote.

        escape_character: (CSV Only) The character to use to escape special
            characters. The default is a backslash.

        line_end: (CSV Only) The character to use to end a line. The default is
            the system's line separator.

        separator: (CSV Only) The character to use to separate fields. The default
            is a comma.

        header: (CSV Only) If set to True the first row will be used as the header
            row. The default is True.

        **kwargs: (DataFrame only) Additional keyword arguments to pass to
            pandas.read_csv. See
            <https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html>
            for complete list of supported arguments. This is exposed as
            internally the query downloads a CSV from Synapse and then loads
            it into a dataframe.
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        The results of the query as a Pandas DataFrame or a path to the downloaded
        query results if `download_location` is set.

    Example: Querying for data
        This example shows how you may query for data in a table and print out the
        results.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import query_async

        syn = Synapse()
        syn.login()

        async def main():
            results = await query_async(query="SELECT * FROM syn1234")
            print(results)

        asyncio.run(main())
        ```
    """

    client = Synapse.get_client(synapse_client=synapse_client)

    if client.logger.isEnabledFor(logging.DEBUG):
        client.logger.debug(f"Running query: {query}")

    # TODO: Implementation should not download the CSV to disk; instead the ideal
    # solution will load the result into BytesIO and then pass that to
    # pandas.read_csv. During implementation a determination of how large a CSV
    # can be loaded from memory will be needed. When that limit is reached we
    # should continue to force the download of those results to disk.
    result, csv_path = await _table_query(
        query=query,
        include_row_id_and_row_version=include_row_id_and_row_version,
        quote_char=quote_character,
        escape_char=escape_character,
        line_end=line_end,
        separator=separator,
        header=header,
        download_location=download_location,
    )

    if download_location:
        return csv_path

    date_columns = []
    list_columns = []
    dtype = {}

    if result.headers is not None:
        for column in result.headers:
            if column.column_type == "STRING":
                # we want to identify string columns so that pandas doesn't try to
                # automatically parse strings in a string column to other data types
                dtype[column.name] = str
            elif column.column_type in LIST_COLUMN_TYPES:
                list_columns.append(column.name)
            elif column.column_type == "DATE" and convert_to_datetime:
                date_columns.append(column.name)

    return csv_to_pandas_df(
        filepath=csv_path,
        separator=separator or DEFAULT_SEPARATOR,
        quote_char=quote_character or DEFAULT_QUOTE_CHARACTER,
        escape_char=escape_character or DEFAULT_ESCAPSE_CHAR,
        row_id_and_version_in_index=False,
        date_columns=date_columns if date_columns else None,
        list_columns=list_columns if list_columns else None,
        **kwargs,
    )

query_part_mask_async async staticmethod

query_part_mask_async(query: str, part_mask: int, *, synapse_client: Optional[Synapse] = None, **kwargs) -> QueryResultOutput

Query for data on a table stored in Synapse. This is a more advanced use case of the query function that allows you to determine what additional metadata about the table or query should also be returned. If you do not need this additional information then you are better off using the query function.

The query for this method uses this REST API: https://rest-docs.synapse.org/rest/POST/entity/id/table/query/async/start.html

PARAMETER DESCRIPTION
query

The query to run. The query must be valid syntax that Synapse can understand. See this document that describes the expected syntax of the query: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/web/controller/TableExamples.html

TYPE: str

part_mask

The bitwise OR of the part mask values you want to return in the results. The following list of part masks are implemented to be returned in the results:

  • Query Results (queryResults) = 0x1
  • Query Count (queryCount) = 0x2
  • The sum of the file sizes (sumFileSizesBytes) = 0x40
  • The last updated on date of the table (lastUpdatedOn) = 0x80

TYPE: int

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
QueryResultOutput

The results of the query as a QueryResultOutput object.

Querying for data with a part mask

This example shows how to use the bitwise OR of Python to combine the part mask values and then use that to query for data in a table and print out the results.

In this case we are getting the results of the query, the count of rows, and the last updated on date of the table.

import asyncio
from synapseclient import Synapse
from synapseclient.models import query_part_mask_async

syn = Synapse()
syn.login()

QUERY_RESULTS = 0x1
QUERY_COUNT = 0x2
LAST_UPDATED_ON = 0x80

# Combine the part mask values using bitwise OR
part_mask = QUERY_RESULTS | QUERY_COUNT | LAST_UPDATED_ON


async def main():
    result = await query_part_mask_async(query="SELECT * FROM syn1234", part_mask=part_mask)
    print(result)

asyncio.run(main())
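
Reading the metadata returned with a part mask

As a follow-up sketch, the fields of the returned QueryResultOutput can be inspected individually. The attribute names used below (result, count, last_updated_on) are assumptions based on the metadata requested by the part mask and may differ from the actual field names on the dataclass.

import asyncio
from synapseclient import Synapse
from synapseclient.models import query_part_mask_async

syn = Synapse()
syn.login()

part_mask = 0x1 | 0x2 | 0x80  # queryResults | queryCount | lastUpdatedOn

async def main():
    output = await query_part_mask_async(query="SELECT * FROM syn1234", part_mask=part_mask)
    # Attribute names below are assumed for illustration
    print(output.result)           # query results as a Pandas DataFrame
    print(output.count)            # number of rows matching the query
    print(output.last_updated_on)  # when the table was last updated

asyncio.run(main())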
Source code in synapseclient/models/mixins/table_components.py
@staticmethod
async def query_part_mask_async(
    query: str,
    part_mask: int,
    *,
    synapse_client: Optional[Synapse] = None,
    **kwargs,
) -> "QueryResultOutput":
    """Query for data on a table stored in Synapse. This is a more advanced use case
    of the `query` function that allows you to determine what additional metadata
    about the table or query should also be returned. If you do not need this
    additional information then you are better off using the `query` function.

    The query for this method uses this Rest API:
    <https://rest-docs.synapse.org/rest/POST/entity/id/table/query/async/start.html>

    Arguments:
        query: The query to run. The query must be valid syntax that Synapse can
            understand. See this document that describes the expected syntax of the
            query:
            <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/web/controller/TableExamples.html>
        part_mask: The bitwise OR of the part mask values you want to return in the
            results. The following list of part masks are implemented to be returned
            in the results:

            - Query Results (queryResults) = 0x1
            - Query Count (queryCount) = 0x2
            - The sum of the file sizes (sumFileSizesBytes) = 0x40
            - The last updated on date of the table (lastUpdatedOn) = 0x80

        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        The results of the query as a QueryResultOutput object.

    Example: Querying for data with a part mask
        This example shows how to use the bitwise `OR` of Python to combine the
        part mask values and then use that to query for data in a table and print
        out the results.

        In this case we are getting the results of the query, the count of rows, and
        the last updated on date of the table.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import query_part_mask_async

        syn = Synapse()
        syn.login()

        QUERY_RESULTS = 0x1
        QUERY_COUNT = 0x2
        LAST_UPDATED_ON = 0x80

        # Combine the part mask values using bitwise OR
        part_mask = QUERY_RESULTS | QUERY_COUNT | LAST_UPDATED_ON


        async def main():
            result = await query_part_mask_async(query="SELECT * FROM syn1234", part_mask=part_mask)
            print(result)

        asyncio.run(main())
        ```
    """
    loop = asyncio.get_event_loop()

    client = Synapse.get_client(synapse_client=synapse_client)
    client.logger.info(f"Running query: {query}")
    limit = kwargs.get("limit", None)
    offset = kwargs.get("offset", None)

    results = await _table_query(
        query=query,
        results_as="rowset",
        part_mask=part_mask,
        limit=limit,
        offset=offset,
    )

    as_df = await loop.run_in_executor(
        None,
        lambda: _rowset_to_pandas_df(
            query_result_bundle=results,
            synapse=client,
            row_id_and_version_in_index=False,
        ),
    )
    return QueryResultOutput.fill_from_dict(
        result=as_df,
        data={
            "count": results.query_count,
            "last_updated_on": results.last_updated_on,
            "sum_file_sizes": results.sum_file_sizes,
        },
    )

store_rows_async async

store_rows_async(values: Union[str, Dict[str, Any], DATA_FRAME_TYPE], schema_storage_strategy: SchemaStorageStrategy = None, column_expansion_strategy: ColumnExpansionStrategy = None, dry_run: bool = False, additional_changes: List[Union[TableSchemaChangeRequest, UploadToTableRequest, AppendableRowSetRequest]] = None, *, insert_size_bytes: int = 900 * MB, csv_table_descriptor: Optional[CsvTableDescriptor] = None, read_csv_kwargs: Optional[Dict[str, Any]] = None, to_csv_kwargs: Optional[Dict[str, Any]] = None, job_timeout: int = 600, synapse_client: Optional[Synapse] = None) -> None

Add or update rows in Synapse from the sources defined below. In most cases the result of this function call will append rows to the table. In the case of an update this method works on a full row replacement. What this means is that you may not do a partial update of a row. If you want to update a row you must pass in all the data for that row, or the data for the columns not provided will be set to null.

If you'd like to update a row see the example Updating rows in a table below.

If you'd like to perform an upsert or partial update of a row you may use the .upsert_rows() method. See that method for more information.

Note the following behavior for the order of columns:

  • If a column is added via the add_column method it will be added at the index you specify, or at the end of the columns list.
  • If column(s) are added during the construction of your Table instance, i.e. Table(columns=[Column(name="foo")]), they will be added at the beginning of the columns list.
  • If you use the store_rows method and the schema_storage_strategy is set to INFER_FROM_DATA the columns will be added at the end of the columns list.

Limitations:

  • Synapse limits a single upload request to a CSV file of 1GB. If you are storing a CSV file that is larger than this limit the data will be chunked into smaller requests. This process is done by reading the file once to determine the row and byte boundaries and calculating the MD5 hash of each portion, then reading the file again to send the data to Synapse. This ensures that the data is not corrupted during the upload process; in addition, Synapse requires the MD5 hash of the data to be sent in the request along with the number of bytes being sent.
  • The limit of 1GB is also enforced when storing a dictionary or a DataFrame. The data will be converted to CSV format using the .to_csv() pandas function. If you are storing more than 1GB of data it is recommended that you store it as a CSV file and use the file path to upload the data, because the DataFrame chunking process is slower than reading portions of a file on disk and calculating the MD5 hash of each portion.

The following is a Sequence Diagram that describes the process noted in the limitation above. It shows how the data is chunked into smaller requests when the data exceeds the limit of 1GB, and how portions of the data are read from the CSV file on disk while being uploaded to Synapse.

sequenceDiagram
    participant User
    participant Table
    participant FileSystem
    participant Synapse

    User->>Table: store_rows(values)

    alt CSV size > 1GB
        Table->>Synapse: Apply schema changes before uploading
        note over Table, FileSystem: Read CSV twice
        Table->>FileSystem: Read entire CSV (First Pass)
        FileSystem-->>Table: Compute chunk sizes & MD5 hashes

        loop Read and Upload CSV chunks (Second Pass)
            Table->>FileSystem: Read next chunk from CSV
            FileSystem-->>Table: Return bytes
            Table->>Synapse: Upload CSV chunk
            Synapse-->>Table: Return `file_handle_id`
            Table->>Synapse: Send 'TableUpdateTransaction' to append/update rows
            Synapse-->>Table: Transaction result
        end
    else
        Table->>Synapse: Upload CSV without splitting & Any additional schema changes
        Synapse-->>Table: Return `file_handle_id`
        Table->>Synapse: Send `TableUpdateTransaction' to append/update rows
        Synapse-->>Table: Transaction result
    end

    Table-->>User: Upload complete

The following is a Sequence Diagram that describes the process noted in the limitation above for DataFrames. It shows how the data is chunked into smaller requests when the data exceeds the limit of 1GB, and how portions of the data are read from the DataFrame while being uploaded to Synapse.

sequenceDiagram
    participant User
    participant Table
    participant MemoryBuffer
    participant Synapse

    User->>Table: store_rows(DataFrame)

    loop For all rows in DataFrame in 100 row increments
        Table->>MemoryBuffer: Convert DataFrame rows to CSV in-memory
        MemoryBuffer-->>Table: Compute chunk sizes & MD5 hashes
    end


    alt Multiple chunks detected
        Table->>Synapse: Apply schema changes before uploading
    end

    loop For all chunks found in first loop
        loop for all parts in chunk byte boundry
            Table->>MemoryBuffer: Read small (< 8MB) part of the chunk
            MemoryBuffer-->>Table: Return bytes (with correct offset)
            Table->>Synapse: Upload part
            Synapse-->>Table: Upload response
        end
        Table->>Synapse: Complete upload
        Synapse-->>Table: Return `file_handle_id`
        Table->>Synapse: Send 'TableUpdateTransaction' to append/update rows
        Synapse-->>Table: Transaction result
    end

    Table-->>User: Upload complete
PARAMETER DESCRIPTION
values

Supports storing data from the following sources:

  • A string holding the path to a CSV file. If the schema_storage_strategy is set to None the data will be uploaded as is. If schema_storage_strategy is set to INFER_FROM_DATA the data will be read into a Pandas DataFrame. The code makes assumptions about the format of the columns in the CSV as detailed in the csv_to_pandas_df function. You may pass in additional arguments to the csv_to_pandas_df function by passing them in as keyword arguments to this function.
  • A dictionary where the key is the column name and the value is one or more values. The values will be wrapped into a Pandas DataFrame. You may pass in additional arguments to the pd.DataFrame function by passing them in as keyword arguments to this function. Read about the available arguments in the Pandas DataFrame documentation.
  • A Pandas DataFrame

TYPE: Union[str, Dict[str, Any], DATA_FRAME_TYPE]

schema_storage_strategy

Determines how to automate the creation of columns based on the data that is being stored. If you want to have full control over the schema you may set this to None and create the columns manually.

The limitation with this behavior is that the columns created may only be of the following types:

  • STRING
  • LARGETEXT
  • INTEGER
  • DOUBLE
  • BOOLEAN
  • DATE

The determination is based on how this pandas function infers the data type: infer_dtype

This may also only set the name, column_type, and maximum_size of the column when the column is created. If this is used to update the column the maximum_size will only be updated depending on the value of column_expansion_strategy. The other attributes of the column will be set to the default values on create, or remain the same if the column already exists.

The usage of this feature will never delete a column, shrink a column, or change the type of a column that already exists. If you need to change any of these attributes you must do so after getting the table via a .get() call, updating the columns as needed, then calling .store() on the table.

TYPE: SchemaStorageStrategy DEFAULT: None

column_expansion_strategy

Determines how to automate the expansion of columns based on the data that is being stored. The options given allow cells with a limit on the length of content (Such as strings) to be expanded to a larger size if the data being stored exceeds the limit. If you want to have full control over the schema you may set this to None and create the columns manually. String type columns are the only ones that support this feature.

TYPE: ColumnExpansionStrategy DEFAULT: None

dry_run

Log the actions that would be taken, but do not actually perform the actions. This will not print out the data that would be stored or modified as a result of this action. It will print out the actions that would be taken, such as creating a new column, updating a column, or updating table metadata. This is useful for debugging and understanding what actions would be taken without actually performing them.

TYPE: bool DEFAULT: False

additional_changes

Additional changes to the table that should execute within the same transaction as appending or updating rows. This is used as a part of the upsert_rows method call to allow for the updating of rows and the updating of the table schema in the same transaction. In most cases you will not need to use this argument.

TYPE: List[Union[TableSchemaChangeRequest, UploadToTableRequest, AppendableRowSetRequest]] DEFAULT: None

insert_size_bytes

The maximum size of data that will be stored to Synapse within a single transaction. The API has a limit of 1GB, but the default is set to 900 MB to allow for some overhead in the request. The implication of this limit is that when you are storing a CSV that is larger than this limit the data will be chunked into smaller requests by reading the file once to determine the row and byte boundaries and calculating the MD5 hash of each portion, then reading the file again to send the data to Synapse. This is done to ensure that the data is not corrupted during the upload process; in addition, Synapse requires the MD5 hash of the data to be sent in the request along with the number of bytes being sent. This argument is also used when storing a dictionary or a DataFrame. The data will be converted to CSV format using the .to_csv() pandas function. When storing data as a DataFrame the minimum that it will be chunked to is 100 rows of data, regardless of whether the data is larger than the limit.

TYPE: int DEFAULT: 900 * MB

csv_table_descriptor

When passing in a CSV file this will allow you to specify the format of the CSV file. This is only used when the values argument is a string holding the path to a CSV file. See CsvTableDescriptor for more information.

TYPE: Optional[CsvTableDescriptor] DEFAULT: None

read_csv_kwargs

Additional arguments to pass to the pd.read_csv function when reading in a CSV file. This is only used when the values argument is a string holding the path to a CSV file and you have set the schema_storage_strategy to INFER_FROM_DATA. See https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html for complete list of supported arguments.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

to_csv_kwargs

Additional arguments to pass to the pd.DataFrame.to_csv function when writing the data to a CSV file. This is only used when the values argument is a Pandas DataFrame. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html for complete list of supported arguments.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

job_timeout

The maximum amount of time to wait for a job to complete. This is used when inserting and updating rows of data. Each individual request to Synapse will be sent as an independent job. If the timeout is reached a SynapseTimeoutError will be raised. The default is 600 seconds.

TYPE: int DEFAULT: 600

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
None

None

Inserting rows into a table that already has columns

This example shows how you may insert rows into a table.

Suppose we have a table with the following columns:

| col1 | col2 | col3 |
|------|------|------|

The following code will insert rows into the table:

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table # Also works with `Dataset`

syn = Synapse()
syn.login()

async def main():
    data_to_insert = {
        'col1': ['A', 'B', 'C'],
        'col2': [1, 2, 3],
        'col3': [1, 2, 3],
    }

    await Table(id="syn1234").store_rows_async(values=data_to_insert)

asyncio.run(main())

The resulting table will look like this:

| col1 | col2 | col3 |
|------|------|------|
| A    | 1    | 1    |
| B    | 2    | 2    |
| C    | 3    | 3    |

Inserting rows into a table that does not have columns

This example shows how you may insert rows into a table that does not have columns. The columns will be inferred from the data that is being stored.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table, SchemaStorageStrategy # Also works with `Dataset`

syn = Synapse()
syn.login()

async def main():
    data_to_insert = {
        'col1': ['A', 'B', 'C'],
        'col2': [1, 2, 3],
        'col3': [1, 2, 3],
    }

    await Table(id="syn1234").store_rows_async(
        values=data_to_insert,
        schema_storage_strategy=SchemaStorageStrategy.INFER_FROM_DATA
    )

asyncio.run(main())

The resulting table will look like this:

| col1 | col2 | col3 |
|------|------|------|
| A    | 1    | 1    |
| B    | 2    | 2    |
| C    | 3    | 3    |

Using the dry_run option with a SchemaStorageStrategy of INFER_FROM_DATA

This example shows how you may use the dry_run option with the SchemaStorageStrategy set to INFER_FROM_DATA. This will show you the actions that would be taken, but not actually perform the actions.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table, SchemaStorageStrategy # Also works with `Dataset`

syn = Synapse()
syn.login()

async def main():
    data_to_insert = {
        'col1': ['A', 'B', 'C'],
        'col2': [1, 2, 3],
        'col3': [1, 2, 3],
    }

    await Table(id="syn1234").store_rows_async(
        values=data_to_insert,
        dry_run=True,
        schema_storage_strategy=SchemaStorageStrategy.INFER_FROM_DATA
    )

asyncio.run(main())

The result of running this action will print to the console the actions that would be taken, but not actually perform the actions.

Updating rows in a table

This example shows how you may query for data in a table, update the data, and then store the updated rows back in Synapse.

Suppose we have a table that has the following data:

| col1 | col2 | col3 |
|------|------|------|
| A    | 1    | 1    |
| B    | 2    | 2    |
| C    | 3    | 3    |

Behind the scenes the table also has ROW_ID and ROW_VERSION columns which are used to identify the row that is being updated. These columns are not shown in the table above, but are included in the data that is returned when querying the table. If you add data that does not have these columns the data will be treated as new rows to be inserted.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table, query_async # Also works with `Dataset`

syn = Synapse()
syn.login()

async def main():
    query_results = await query_async(query="select * from syn1234 where col1 in ('A', 'B')")

    # Update `col2` of the row where `col1` is `A` to `22`
    query_results.loc[query_results['col1'] == 'A', 'col2'] = 22

    # Update `col3` of the row where `col1` is `B` to `33`
    query_results.loc[query_results['col1'] == 'B', 'col3'] = 33

    await Table(id="syn1234").store_rows_async(values=query_results)

asyncio.run(main())

The resulting table will look like this:

| col1 | col2 | col3 |
|------|------|------|
| A    | 22   | 1    |
| B    | 2    | 33   |
| C    | 3    | 3    |
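
Inserting rows from a CSV file on disk

This is a minimal sketch showing that values may also be a path to a CSV file; the path used here is illustrative. With schema_storage_strategy left as None the CSV is uploaded as-is, so its header is expected to align with the columns already defined on the table.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

async def main():
    # Illustrative path; the CSV header should match the table's existing columns
    await Table(id="syn1234").store_rows_async(values="/path/to/data.csv")

asyncio.run(main())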
Source code in synapseclient/models/mixins/table_components.py
async def store_rows_async(
    self,
    values: Union[str, Dict[str, Any], DATA_FRAME_TYPE],
    schema_storage_strategy: SchemaStorageStrategy = None,
    column_expansion_strategy: ColumnExpansionStrategy = None,
    dry_run: bool = False,
    additional_changes: List[
        Union[
            "TableSchemaChangeRequest",
            "UploadToTableRequest",
            "AppendableRowSetRequest",
        ]
    ] = None,
    *,
    insert_size_bytes: int = 900 * MB,
    csv_table_descriptor: Optional[CsvTableDescriptor] = None,
    read_csv_kwargs: Optional[Dict[str, Any]] = None,
    to_csv_kwargs: Optional[Dict[str, Any]] = None,
    job_timeout: int = 600,
    synapse_client: Optional[Synapse] = None,
) -> None:
    """
    Add or update rows in Synapse from the sources defined below. In most cases the
    result of this function call will append rows to the table. In the case of an
    update this method works on a full row replacement. What this means is
    that you may not do a partial update of a row. If you want to update a row
    you must pass in all the data for that row, or the data for the columns not
    provided will be set to null.

    If you'd like to update a row see the example `Updating rows in a table` below.

    If you'd like to perform an `upsert` or partial update of a row you may use
    the `.upsert_rows()` method. See that method for more information.


    Note the following behavior for the order of columns:

    - If a column is added via the `add_column` method it will be added at the
        index you specify, or at the end of the columns list.
    - If column(s) are added during the construction of your `Table` instance, i.e.
        `Table(columns=[Column(name="foo")])`, they will be added at the beginning
        of the columns list.
    - If you use the `store_rows` method and the `schema_storage_strategy` is set to
        `INFER_FROM_DATA` the columns will be added at the end of the columns list.


    **Limitations:**

    - Synapse limits the number of rows that may be stored in a single request to
        a CSV file that is 1GB. If you are storing a CSV file that is larger than
        this limit the data will be chunked into smaller requests. This process is
        done by reading the file once to determine what the row and byte boundaries
        are and calculating the MD5 hash of that portion, then reading the file
        again to send the data to Synapse. This process is done to ensure that the
        data is not corrupted during the upload process, in addition Synapse
        requires the MD5 hash of the data to be sent in the request along with the
        number of bytes that are being sent.
    - The limit of 1GB is also enforced when storing a dictionary or a DataFrame.
        The data will be converted to a CSV format using the `.to_csv()` pandas
        function. If you are storing more than a 1GB file it is recommended that
        you store the data as a CSV and use the file path to upload the data. This
        is due to the fact that the DataFrame chunking process is slower than
        reading portions of a file on disk and calculating the MD5 hash of that
        portion.

    The following is a Sequence Diagram that describes the process noted in the
    limitation above. It shows how the data is chunked into smaller requests when
    the data exceeds the limit of 1GB, and how portions of the data are read from
    the CSV file on disk while being uploaded to Synapse.

    ```mermaid
    sequenceDiagram
        participant User
        participant Table
        participant FileSystem
        participant Synapse

        User->>Table: store_rows(values)

        alt CSV size > 1GB
            Table->>Synapse: Apply schema changes before uploading
            note over Table, FileSystem: Read CSV twice
            Table->>FileSystem: Read entire CSV (First Pass)
            FileSystem-->>Table: Compute chunk sizes & MD5 hashes

            loop Read and Upload CSV chunks (Second Pass)
                Table->>FileSystem: Read next chunk from CSV
                FileSystem-->>Table: Return bytes
                Table->>Synapse: Upload CSV chunk
                Synapse-->>Table: Return `file_handle_id`
                Table->>Synapse: Send 'TableUpdateTransaction' to append/update rows
                Synapse-->>Table: Transaction result
            end
        else
            Table->>Synapse: Upload CSV without splitting & Any additional schema changes
            Synapse-->>Table: Return `file_handle_id`
            Table->>Synapse: Send `TableUpdateTransaction' to append/update rows
            Synapse-->>Table: Transaction result
        end

        Table-->>User: Upload complete
    ```

    The following is a Sequence Diagram that describes the process noted in the
    limitation above for DataFrames. It shows how the data is chunked into smaller
    requests when the data exceeds the limit of 1GB, and how portions of the data
    are read from the DataFrame while being uploaded to Synapse.

    ```mermaid
    sequenceDiagram
        participant User
        participant Table
        participant MemoryBuffer
        participant Synapse

        User->>Table: store_rows(DataFrame)

        loop For all rows in DataFrame in 100 row increments
            Table->>MemoryBuffer: Convert DataFrame rows to CSV in-memory
            MemoryBuffer-->>Table: Compute chunk sizes & MD5 hashes
        end


        alt Multiple chunks detected
            Table->>Synapse: Apply schema changes before uploading
        end

        loop For all chunks found in first loop
            loop for all parts in chunk byte boundry
                Table->>MemoryBuffer: Read small (< 8MB) part of the chunk
                MemoryBuffer-->>Table: Return bytes (with correct offset)
                Table->>Synapse: Upload part
                Synapse-->>Table: Upload response
            end
            Table->>Synapse: Complete upload
            Synapse-->>Table: Return `file_handle_id`
            Table->>Synapse: Send 'TableUpdateTransaction' to append/update rows
            Synapse-->>Table: Transaction result
        end

        Table-->>User: Upload complete
    ```

    Arguments:
        values: Supports storing data from the following sources:

            - A string holding the path to a CSV file. If the `schema_storage_strategy` is set to `None` the data will be uploaded as is. If `schema_storage_strategy` is set to `INFER_FROM_DATA` the data will be read into a [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe). The code makes assumptions about the format of the columns in the CSV as detailed in the [csv_to_pandas_df][synapseclient.models.mixins.table_components.csv_to_pandas_df] function. You may pass in additional arguments to the `csv_to_pandas_df` function by passing them in as keyword arguments to this function.
            - A dictionary where the key is the column name and the value is one or more values. The values will be wrapped into a [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe). You may pass in additional arguments to the `pd.DataFrame` function by passing them in as keyword arguments to this function. Read about the available arguments in the [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) documentation.
            - A [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe)

        schema_storage_strategy: Determines how to automate the creation of columns
            based on the data that is being stored. If you want to have full
            control over the schema you may set this to `None` and create
            the columns manually.

            The limitation with this behavior is that the columns created may only
            be of the following types:

            - STRING
            - LARGETEXT
            - INTEGER
            - DOUBLE
            - BOOLEAN
            - DATE

            The determination is based on how this pandas function infers the
            data type: [infer_dtype](https://pandas.pydata.org/docs/reference/api/pandas.api.types.infer_dtype.html)

            This may also only set the `name`, `column_type`, and `maximum_size` of
            the column when the column is created. If this is used to update the
            column the `maximum_size` will only be updated depending on the
            value of `column_expansion_strategy`. The other attributes of the
            column will be set to the default values on create, or remain the same
            if the column already exists.


            The usage of this feature will never delete a column, shrink a column,
            or change the type of a column that already exists. If you need to
            change any of these attributes you must do so after getting the table
            via a `.get()` call, updating the columns as needed, then calling
            `.store()` on the table.

        column_expansion_strategy: Determines how to automate the expansion of
            columns based on the data that is being stored. The options given allow
            cells with a limit on the length of content (Such as strings) to be
            expanded to a larger size if the data being stored exceeds the limit.
            If you want to have full control over the schema you may set this to
            `None` and create the columns manually. String type columns are the only
            ones that support this feature.

        dry_run: Log the actions that would be taken, but do not actually perform
            the actions. This will not print out the data that would be stored or
            modified as a result of this action. It will print out the actions that
            would be taken, such as creating a new column, updating a column, or
            updating table metadata. This is useful for debugging and understanding
            what actions would be taken without actually performing them.

        additional_changes: Additional changes to the table that should execute
            within the same transaction as appending or updating rows. This is used
            as a part of the `upsert_rows` method call to allow for the updating of
            rows and the updating of the table schema in the same transaction. In
            most cases you will not need to use this argument.

        insert_size_bytes: The maximum size of data that will be stored to Synapse
            within a single transaction. The API has a limit of 1GB, but the
            default is set to 900 MB to allow for some overhead in the request. The
            implication of this limit is that when you are storing a CSV that is
            larger than this limit the data will be chunked into smaller requests
            by reading the file once to determine what the row and byte boundaries
            are and calculating the MD5 hash of that portion, then reading the file
            again to send the data to Synapse. This process is done to ensure that
            the data is not corrupted during the upload process, in addition Synapse
            requires the MD5 hash of the data to be sent in the request along with
            the number of bytes that are being sent. This argument is also used
            when storing a dictionary or a DataFrame. The data will be converted to
            a CSV format using the `.to_csv()` pandas function. When storing data
            as a DataFrame the minimum that it will be chunked to is 100 rows of
            data, regardless of if the data is larger than the limit.

        csv_table_descriptor: When passing in a CSV file this will allow you to
            specify the format of the CSV file. This is only used when the `values`
            argument is a string holding the path to a CSV file. See
            [CsvTableDescriptor][synapseclient.models.CsvTableDescriptor]
            for more information.

        read_csv_kwargs: Additional arguments to pass to the `pd.read_csv` function
            when reading in a CSV file. This is only used when the `values` argument
            is a string holding the path to a CSV file and you have set the
            `schema_storage_strategy` to `INFER_FROM_DATA`. See
            <https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html>
            for complete list of supported arguments.

        to_csv_kwargs: Additional arguments to pass to the `pd.DataFrame.to_csv`
            function when writing the data to a CSV file. This is only used when
            the `values` argument is a Pandas DataFrame. See
            <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html>
            for complete list of supported arguments.

        job_timeout: The maximum amount of time to wait for a job to complete.
            This is used when inserting, and updating rows of data. Each individual
            request to Synapse will be sent as an independent job. If the timeout
            is reached a `SynapseTimeoutError` will be raised.
            The default is 600 seconds

        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        None

    Example: Inserting rows into a table that already has columns
        This example shows how you may insert rows into a table.

        Suppose we have a table with the following columns:

        | col1 | col2 | col3 |
        |------|------| -----|

        The following code will insert rows into the table:

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table # Also works with `Dataset`

        syn = Synapse()
        syn.login()

        async def main():
            data_to_insert = {
                'col1': ['A', 'B', 'C'],
                'col2': [1, 2, 3],
                'col3': [1, 2, 3],
            }

            await Table(id="syn1234").store_rows_async(values=data_to_insert)

        asyncio.run(main())
        ```

        The resulting table will look like this:

        | col1 | col2 | col3 |
        |------|------| -----|
        | A    | 1    | 1    |
        | B    | 2    | 2    |
        | C    | 3    | 3    |

    Example: Inserting rows into a table that does not have columns
        This example shows how you may insert rows into a table that does not have
        columns. The columns will be inferred from the data that is being stored.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table, SchemaStorageStrategy # Also works with `Dataset`

        syn = Synapse()
        syn.login()

        async def main():
            data_to_insert = {
                'col1': ['A', 'B', 'C'],
                'col2': [1, 2, 3],
                'col3': [1, 2, 3],
            }

            await Table(id="syn1234").store_rows_async(
                values=data_to_insert,
                schema_storage_strategy=SchemaStorageStrategy.INFER_FROM_DATA
            )

        asyncio.run(main())
        ```

        The resulting table will look like this:

        | col1 | col2 | col3 |
        |------|------| -----|
        | A    | 1    | 1    |
        | B    | 2    | 2    |
        | C    | 3    | 3    |

    Example: Using the dry_run option with a SchemaStorageStrategy of INFER_FROM_DATA
        This example shows how you may use the `dry_run` option with the
        `SchemaStorageStrategy` set to `INFER_FROM_DATA`. This will show you the
        actions that would be taken, but not actually perform the actions.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table, SchemaStorageStrategy # Also works with `Dataset`

        syn = Synapse()
        syn.login()

        async def main():
            data_to_insert = {
                'col1': ['A', 'B', 'C'],
                'col2': [1, 2, 3],
                'col3': [1, 2, 3],
            }

            await Table(id="syn1234").store_rows_async(
                values=data_to_insert,
                dry_run=True,
                schema_storage_strategy=SchemaStorageStrategy.INFER_FROM_DATA
            )

        asyncio.run(main())
        ```

        The result of running this action will print to the console the actions that
        would be taken, but not actually perform the actions.

    Example: Updating rows in a table
        This example shows how you may query for data in a table, update the data,
        and then store the updated rows back in Synapse.

        Suppose we have a table that has the following data:


        | col1 | col2 | col3 |
        |------|------| -----|
        | A    | 1    | 1    |
        | B    | 2    | 2    |
        | C    | 3    | 3    |

        Behind the scenes the table also has `ROW_ID` and `ROW_VERSION` columns
        which are used to identify the row that is being updated. These columns
        are not shown in the table above, but are included in the data that is
        returned when querying the table. If you add data that does not have these
        columns the data will be treated as new rows to be inserted.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table, query_async # Also works with `Dataset`

        syn = Synapse()
        syn.login()

        async def main():
            query_results = await query_async(query="select * from syn1234 where col1 in ('A', 'B')")

            # Update `col2` of the row where `col1` is `A` to `22`
            query_results.loc[query_results['col1'] == 'A', 'col2'] = 22

            # Update `col3` of the row where `col1` is `B` to `33`
            query_results.loc[query_results['col1'] == 'B', 'col3'] = 33

            await Table(id="syn1234").store_rows_async(values=query_results)

        asyncio.run(main())
        ```

        The resulting table will look like this:

        | col1 | col2 | col3 |
        |------|------| -----|
        | A    | 22   | 1    |
        | B    | 2    | 33   |
        | C    | 3    | 3    |

    """
    test_import_pandas()
    from pandas import DataFrame

    original_values = values
    if isinstance(values, dict):
        values = DataFrame(values)
    elif (
        isinstance(values, str)
        and schema_storage_strategy == SchemaStorageStrategy.INFER_FROM_DATA
    ):
        values = csv_to_pandas_df(filepath=values, **(read_csv_kwargs or {}))
    elif isinstance(values, DataFrame) or isinstance(values, str):
        # We don't need to convert a DF, and CSVs will be uploaded as is
        pass
    else:
        raise ValueError(
            "Don't know how to make tables from values of type %s." % type(values)
        )

    client = Synapse.get_client(synapse_client=synapse_client)

    if (
        (not self._last_persistent_instance)
        and (
            existing_id := await get_id(
                entity=self, synapse_client=synapse_client, failure_strategy=None
            )
        )
        and (
            existing_entity := await self.__class__(id=existing_id).get_async(
                include_columns=True, synapse_client=synapse_client
            )
        )
    ):
        merge_dataclass_entities(
            source=existing_entity,
            destination=self,
        )

    if dry_run:
        client.logger.info(
            f"[{self.id}:{self.name}]: Dry run enabled. No changes will be made."
        )

    schema_change_request = None

    if schema_storage_strategy == SchemaStorageStrategy.INFER_FROM_DATA:
        self._infer_columns_from_data(
            values=values, column_expansion_strategy=column_expansion_strategy
        )

        schema_change_request = await self._generate_schema_change_request(
            dry_run=dry_run, synapse_client=synapse_client
        )

    if dry_run:
        return

    if not self.id:
        raise ValueError(
            "The table must have an ID to store rows, or the table could not be found from the given name/parent_id."
        )

    if isinstance(original_values, str):
        with logging_redirect_tqdm(loggers=[client.logger]):
            await self._chunk_and_upload_csv(
                path_to_csv=original_values,
                insert_size_bytes=insert_size_bytes,
                csv_table_descriptor=csv_table_descriptor,
                schema_change_request=schema_change_request,
                client=client,
                additional_changes=additional_changes,
                job_timeout=job_timeout,
            )
    elif isinstance(values, DataFrame):
        with logging_redirect_tqdm(loggers=[client.logger]):
            await self._chunk_and_upload_df(
                df=values,
                insert_size_bytes=insert_size_bytes,
                csv_table_descriptor=csv_table_descriptor,
                schema_change_request=schema_change_request,
                client=client,
                additional_changes=additional_changes,
                job_timeout=job_timeout,
                to_csv_kwargs=to_csv_kwargs,
            )

    else:
        raise ValueError(
            "Don't know how to make tables from values of type %s." % type(values)
        )

upsert_rows_async async

upsert_rows_async(values: Union[str, Dict[str, Any], DATA_FRAME_TYPE], primary_keys: List[str], dry_run: bool = False, *, rows_per_query: int = 50000, update_size_bytes: int = 1.9 * MB, insert_size_bytes: int = 900 * MB, job_timeout: int = 600, synapse_client: Optional[Synapse] = None, **kwargs) -> None

This method allows you to perform an upsert (Update and Insert) for row(s). This means that you may update a row with only the data that you want to change. When supplied with a row that does not match the given primary_keys a new row will be inserted.

Using the primary_keys argument you may specify which columns to use to determine if a row already exists. If a row exists with the same values in the columns specified in this list the row will be updated. If a row does not exist it will be inserted.

Limitations:

  • The request to update, and the request to insert data does not occur in a single transaction. This means that the update of data may succeed, but the insert of data may fail. Additionally, as noted in the limitation below, if data is chunked up into multiple requests you may find that a portion of your data is updated, but another portion is not.
  • The number of rows that may be upserted in a single call should be kept to a minimum (< 50,000). There is significant overhead in the request to Synapse for each row that is upserted. If you are upserting a large number of rows a better approach may be to query for the data you want to update, update the data, then use the store_rows_async method to update the data in Synapse. Any rows you want to insert may be added to the DataFrame that is passed to the store_rows_async method.
  • When upserting mnay rows the requests to Synapse will be chunked into smaller requests. The limit is 2MB per request. This chunking will happen automatically and should not be a concern for most users. If you are having issues with the request being too large you may lower the number of rows you are trying to upsert, or note the above limitation.
  • The primary_keys argument must contain at least one column.
  • The primary_keys argument cannot contain columns that are a LIST type.
  • The primary_keys argument cannot contain columns that are a JSON type.
  • The values used as the primary_keys must be unique in the table. If there are multiple rows with the same values in the primary_keys the behavior is that an exception will be raised.
  • The columns used in primary_keys cannot contain updated values. Since the values in these columns are used to determine if a row exists, they cannot be updated in the same transaction.
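
For the large-update case called out above, a minimal sketch of the query-then-store approach is shown below. It assumes a table syn123 whose col2 column needs a bulk change, and that the query result keeps the ROW_ID and ROW_VERSION columns Synapse needs to update rows in place; adjust the query and column names to your own schema.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()


async def main():
    table = await Table(id="syn123").get_async(include_columns=True)

    # Pull only the rows that need to change.
    rows_to_update = await table.query_async(
        query=f"SELECT * FROM {table.id} WHERE col2 IS NULL"
    )

    # Modify the data locally with ordinary Pandas operations.
    rows_to_update["col2"] = "default value"

    # A single store_rows_async call replaces many per-row upsert requests.
    # New rows to insert could be appended to this DataFrame as well.
    await table.store_rows_async(values=rows_to_update)

asyncio.run(main())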

The following is a Sequence Diagram that describes the upsert process at a high level:

sequenceDiagram
    participant User
    participant Table
    participant Synapse

    User->>Table: upsert_rows()

    loop Query and Process Updates in Chunks (rows_per_query)
        Table->>Synapse: Query existing rows using primary keys
        Synapse-->>Table: Return matching rows
        Note over Table: Create partial row updates

        loop For results from query
            Note over Table: Sum row/chunk size
            alt Chunk size exceeds update_size_bytes
                Table->>Synapse: Push update chunk
                Synapse-->>Table: Acknowledge update
            end
            Table->>Table: Add row to chunk
        end

        alt Remaining updates exist
            Table->>Synapse: Push final update chunk
            Synapse-->>Table: Acknowledge update
        end
    end

    alt New rows exist
        Table->>Table: Identify new rows for insertion
        Table->>Table: Call `store_rows()` function
    end

    Table-->>User: Upsert complete
PARAMETER DESCRIPTION
values

Supports storing data from the following sources:

  • A string holding the path to a CSV file. The data will be read into a Pandas DataFrame. The code makes assumptions about the format of the columns in the CSV as detailed in the csv_to_pandas_df function. You may pass in additional arguments to the csv_to_pandas_df function by passing them in as keyword arguments to this function.
  • A dictionary where the key is the column name and the value is one or more values. The values will be wrapped into a Pandas DataFrame. You may pass in additional arguments to the pd.DataFrame function by passing them in as keyword arguments to this function. Read about the available arguments in the Pandas DataFrame documentation.
  • A Pandas DataFrame

TYPE: Union[str, Dict[str, Any], DATA_FRAME_TYPE]

primary_keys

The columns to use to determine if a row already exists. If a row exists with the same values in the columns specified in this list the row will be updated. If a row does not exist it will be inserted.

TYPE: List[str]

dry_run

If set to True the data will not be updated in Synapse. A message will be printed to the console with the number of rows that would have been updated and inserted. If you would like to see the data that would be updated and inserted you may set the dry_run argument to True and set the log level to DEBUG by setting the debug flag when creating your Synapse class instance like: syn = Synapse(debug=True).

TYPE: bool DEFAULT: False

rows_per_query

The number of rows that will be queried from Synapse per request. Since we need to query for the data that is being updated this will determine the number of rows that are queried at a time. The default is 50,000 rows.

TYPE: int DEFAULT: 50000

update_size_bytes

The maximum size of the request that will be sent to Synapse when updating rows of data. The default is 1.9MB.

TYPE: int DEFAULT: 1.9 * MB

insert_size_bytes

The maximum size of the request that will be sent to Synapse when inserting rows of data. The default is 900MB.

TYPE: int DEFAULT: 900 * MB

job_timeout

The maximum amount of time to wait for a job to complete. This is used when inserting and updating rows of data. Each individual request to Synapse will be sent as an independent job. If the timeout is reached a SynapseTimeoutError will be raised. The default is 600 seconds.

TYPE: int DEFAULT: 600

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor

TYPE: Optional[Synapse] DEFAULT: None

**kwargs

Additional arguments that are passed to the pd.DataFrame function when the values argument is a path to a csv file.

DEFAULT: {}

Updating 2 rows and inserting 1 row

In this given example we have a table with the following data:

| col1 | col2 | col3 |
|------|------|------|
| A    | 1    | 1    |
| B    | 2    | 2    |

The following code will update the first row's col2 to 22, update the second row's col3 to 33, and insert a new row:

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table # Also works with `Dataset`
import pandas as pd

syn = Synapse()
syn.login()


async def main():
    table = await Table(id="syn123").get_async(include_columns=True)

    df = pd.DataFrame({
        'col1': ['A', 'B', 'C'],
        'col2': [22, 2, 3],
        'col3': [1, 33, 3],
    })

    await table.upsert_rows_async(values=df, primary_keys=["col1"])

asyncio.run(main())

The resulting table will look like this:

| col1 | col2 | col3 |
|------|------|------|
| A    | 22   | 1    |
| B    | 2    | 33   |
| C    | 3    | 3    |

Deleting data from a specific cell

In this given example we have a table with the following data:

| col1 | col2 | col3 |
|------|------|------|
| A    | 1    | 1    |
| B    | 2    | 2    |

The following code will remove the value from the first row's col2 and from the second row's col3 by upserting None for those cells:

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table # Also works with `Dataset`

syn = Synapse()
syn.login()


async def main():
    table = await Table(id="syn123").get_async(include_columns=True)

    dictionary_of_data = {
        'col1': ['A', 'B'],
        'col2': [None, 2],
        'col3': [1, None],
    }

    await table.upsert_rows_async(values=dictionary_of_data, primary_keys=["col1"])

asyncio.run(main())

The resulting table will look like this:

| col1 | col2 | col3 |
|------|------|------|
| A    |      | 1    |
| B    | 2    |      |

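Previewing an upsert with dry_run

Before performing a real upsert you may preview the pending work by passing dry_run=True. The sketch below is illustrative only; syn123 and the column names are placeholders, and creating the client with Synapse(debug=True) raises the log level so the DEBUG output lists the rows that would be updated and inserted.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table
import pandas as pd

syn = Synapse(debug=True)
syn.login()


async def main():
    table = await Table(id="syn123").get_async(include_columns=True)

    df = pd.DataFrame({
        'col1': ['A', 'C'],
        'col2': [99, 100],
    })

    # Nothing is written to Synapse; a summary of the rows that would be
    # updated and inserted is logged instead.
    await table.upsert_rows_async(values=df, primary_keys=["col1"], dry_run=True)

asyncio.run(main())
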
Source code in synapseclient/models/mixins/table_components.py
async def upsert_rows_async(
    self,
    values: Union[str, Dict[str, Any], DATA_FRAME_TYPE],
    primary_keys: List[str],
    dry_run: bool = False,
    *,
    rows_per_query: int = 50000,
    update_size_bytes: int = 1.9 * MB,
    insert_size_bytes: int = 900 * MB,
    job_timeout: int = 600,
    synapse_client: Optional[Synapse] = None,
    **kwargs,
) -> None:
    """
    This method allows you to perform an `upsert` (Update and Insert) for row(s).
    This means that you may update a row with only the data that you want to change.
    When supplied with a row that does not match the given `primary_keys` a new
    row will be inserted.


    Using the `primary_keys` argument you may specify which columns to use to
    determine if a row already exists. If a row exists with the same values in the
    columns specified in this list the row will be updated. If a row does not exist
    it will be inserted.


    Limitations:

    - The request to update, and the request to insert data does not occur in a
        single transaction. This means that the update of data may succeed, but the
        insert of data may fail. Additionally, as noted in the limitation below, if
        data is chunked up into multiple requests you may find that a portion of
        your data is updated, but another portion is not.
    - The number of rows that may be upserted in a single call should be
        kept to a minimum (< 50,000). There is significant overhead in the request
        to Synapse for each row that is upserted. If you are upserting a large
        number of rows a better approach may be to query for the data you want
        to update, update the data, then use the [store_rows_async][synapseclient.models.mixins.table_components.TableStoreRowMixin.store_rows_async] method to
        update the data in Synapse. Any rows you want to insert may be added
        to the DataFrame that is passed to the [store_rows_async][synapseclient.models.mixins.table_components.TableStoreRowMixin.store_rows_async] method.
    - When upserting many rows the requests to Synapse will be chunked into smaller
        requests. The limit is 2MB per request. This chunking will happen
        automatically and should not be a concern for most users. If you are
        having issues with the request being too large you may lower the
        number of rows you are trying to upsert, or note the above limitation.
    - The `primary_keys` argument must contain at least one column.
    - The `primary_keys` argument cannot contain columns that are a LIST type.
    - The `primary_keys` argument cannot contain columns that are a JSON type.
    - The values used as the `primary_keys` must be unique in the table. If there
        are multiple rows with the same values in the `primary_keys` the behavior
        is that an exception will be raised.
    - The columns used in `primary_keys` cannot contain updated values. Since
        the values in these columns are used to determine if a row exists, they
        cannot be updated in the same transaction.

    The following is a Sequence Diagram that describes the upsert process at a
    high level:

    ```mermaid
    sequenceDiagram
        participant User
        participant Table
        participant Synapse

        User->>Table: upsert_rows()

        loop Query and Process Updates in Chunks (rows_per_query)
            Table->>Synapse: Query existing rows using primary keys
            Synapse-->>Table: Return matching rows
            Note over Table: Create partial row updates

            loop For results from query
                Note over Table: Sum row/chunk size
                alt Chunk size exceeds update_size_bytes
                    Table->>Synapse: Push update chunk
                    Synapse-->>Table: Acknowledge update
                end
                Table->>Table: Add row to chunk
            end

            alt Remaining updates exist
                Table->>Synapse: Push final update chunk
                Synapse-->>Table: Acknowledge update
            end
        end

        alt New rows exist
            Table->>Table: Identify new rows for insertion
            Table->>Table: Call `store_rows()` function
        end

        Table-->>User: Upsert complete
    ```

    Arguments:
        values: Supports storing data from the following sources:

            - A string holding the path to a CSV file. The data will be read into a
                [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe).
                The code makes assumptions about the format of the columns in the
                CSV as detailed in the [csv_to_pandas_df][synapseclient.models.mixins.table_components.csv_to_pandas_df]
                function. You may pass in additional arguments to the `csv_to_pandas_df`
                function by passing them in as keyword arguments to this function.
            - A dictionary where the key is the column name and the value is one or
                more values. The values will be wrapped into a [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe). You may pass in additional arguments to the `pd.DataFrame` function by passing them in as keyword arguments to this function. Read about the available arguments in the [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) documentation.
            - A [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe)

        primary_keys: The columns to use to determine if a row already exists. If
            a row exists with the same values in the columns specified in this list
            the row will be updated. If a row does not exist it will be inserted.

        dry_run: If set to True the data will not be updated in Synapse. A message
            will be printed to the console with the number of rows that would have
            been updated and inserted. If you would like to see the data that would
            be updated and inserted you may set the `dry_run` argument to True and
            set the log level to DEBUG by setting the debug flag when creating
            your Synapse class instance like: `syn = Synapse(debug=True)`.

        rows_per_query: The number of rows that will be queried from Synapse per
            request. Since we need to query for the data that is being updated
            this will determine the number of rows that are queried at a time.
            The default is 50,000 rows.

        update_size_bytes: The maximum size of the request that will be sent to Synapse
            when updating rows of data. The default is 1.9MB.

        insert_size_bytes: The maximum size of the request that will be sent to Synapse
            when inserting rows of data. The default is 900MB.

        job_timeout: The maximum amount of time to wait for a job to complete.
            This is used when inserting, and updating rows of data. Each individual
            request to Synapse will be sent as an independent job. If the timeout
            is reached a `SynapseTimeoutError` will be raised.
            The default is 600 seconds

        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor

        **kwargs: Additional arguments that are passed to the `pd.DataFrame`
            function when the `values` argument is a path to a csv file.


    Example: Updating 2 rows and inserting 1 row
        In this given example we have a table with the following data:

        | col1 | col2 | col3 |
        |------|------| -----|
        | A    | 1    | 1    |
        | B    | 2    | 2    |

        The following code will update the first row's `col2` to `22`, update the
        second row's `col3` to `33`, and insert a new row:

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table # Also works with `Dataset`
        import pandas as pd

        syn = Synapse()
        syn.login()


        async def main():
            table = await Table(id="syn123").get_async(include_columns=True)

            df = pd.DataFrame({
                'col1': ['A', 'B', 'C'],
                'col2': [22, 2, 3],
                'col3': [1, 33, 3],
            })

            await table.upsert_rows_async(values=df, primary_keys=["col1"])

        asyncio.run(main())
        ```

        The resulting table will look like this:

        | col1 | col2 | col3 |
        |------|------| -----|
        | A    | 22   | 1    |
        | B    | 2    | 33   |
        | C    | 3    | 3    |

    Example: Deleting data from a specific cell
        In this given example we have a table with the following data:

        | col1 | col2 | col3 |
        |------|------| -----|
        | A    | 1    | 1    |
        | B    | 2    | 2    |

        The following code will remove the value from the first row's `col2` and
        from the second row's `col3` by upserting `None` for those cells:

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table # Also works with `Dataset`

        syn = Synapse()
        syn.login()


        async def main():
            table = await Table(id="syn123").get_async(include_columns=True)

            dictionary_of_data = {
                'col1': ['A', 'B'],
                'col2': [None, 2],
                'col3': [1, None],
            }

            await table.upsert_rows_async(values=dictionary_of_data, primary_keys=["col1"])

        asyncio.run(main())
        ```


        The resulting table will look like this:

        | col1 | col2 | col3 |
        |------|------| -----|
        | A    |      | 1    |
        | B    | 2    |      |

    """
    return await _upsert_rows_async(
        entity=self,
        values=values,
        primary_keys=primary_keys,
        dry_run=dry_run,
        rows_per_query=rows_per_query,
        update_size_bytes=update_size_bytes,
        insert_size_bytes=insert_size_bytes,
        job_timeout=job_timeout,
        synapse_client=synapse_client,
        **kwargs,
    )

delete_rows_async async

delete_rows_async(query: str, *, job_timeout: int = 600, synapse_client: Optional[Synapse] = None) -> DATA_FRAME_TYPE

Delete rows from a table given a query to select rows. The query at a minimum must select the ROW_ID and ROW_VERSION columns. If you want to inspect the data that will be deleted ahead of time you may use the .query method to get the data.

PARAMETER DESCRIPTION
query

The query to select the rows to delete. The query at a minimum must select the ROW_ID and ROW_VERSION columns. See this document that describes the expected syntax of the query: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/web/controller/TableExamples.html

TYPE: str

job_timeout

The amount of time to wait for table updates to complete before a SynapseTimeoutError is thrown. The default is 600 seconds.

TYPE: int DEFAULT: 600

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
DATA_FRAME_TYPE

The results of your query for the rows that were deleted from the table.

Selecting a row to delete

This example shows how you may select a row to delete from a table.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table # Also works with `Dataset`

syn = Synapse()
syn.login()

async def main():
    await Table(id="syn1234").delete_rows_async(query="SELECT ROW_ID, ROW_VERSION FROM syn1234 WHERE foo = 'asdf'")

asyncio.run(main())
Selecting all rows that contain a null value

This example shows how you may select a row to delete from a table where a column has a null value.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table # Also works with `Dataset`

syn = Synapse()
syn.login()

async def main():
    await Table(id="syn1234").delete_rows_async(query="SELECT ROW_ID, ROW_VERSION FROM syn1234 WHERE foo is null")

asyncio.run(main())
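
Keeping a record of deleted rows

Because the method returns the rows it removed, you can preview the deletion with a query first and keep the returned DataFrame for your records. This is a sketch only; syn1234 and the foo column are placeholders.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table # Also works with `Dataset`

syn = Synapse()
syn.login()

async def main():
    table = Table(id="syn1234")
    query = "SELECT ROW_ID, ROW_VERSION FROM syn1234 WHERE foo = 'asdf'"

    # Optional preview of what will be removed.
    preview = await table.query_async(query=query)
    print(f"About to delete {len(preview)} rows")

    # The deleted rows are returned, so they can be saved for record keeping.
    deleted_rows = await table.delete_rows_async(query=query)
    deleted_rows.to_csv("deleted_rows_backup.csv", index=False)

asyncio.run(main())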
Source code in synapseclient/models/mixins/table_components.py
async def delete_rows_async(
    self,
    query: str,
    *,
    job_timeout: int = 600,
    synapse_client: Optional[Synapse] = None,
) -> DATA_FRAME_TYPE:
    """
    Delete rows from a table given a query to select rows. The query at a
    minimum must select the `ROW_ID` and `ROW_VERSION` columns. If you want to
    inspect the data that will be deleted ahead of time you may use the
    `.query` method to get the data.


    Arguments:
        query: The query to select the rows to delete. The query at a minimum
            must select the `ROW_ID` and `ROW_VERSION` columns. See this document
            that describes the expected syntax of the query:
            <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/web/controller/TableExamples.html>
        job_timeout: The amount of time to wait for table updates to complete
            before a `SynapseTimeoutError` is thrown. The default is 600 seconds.
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        The results of your query for the rows that were deleted from the table.

    Example: Selecting a row to delete
        This example shows how you may select a row to delete from a table.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table # Also works with `Dataset`

        syn = Synapse()
        syn.login()

        async def main():
            await Table(id="syn1234").delete_rows_async(query="SELECT ROW_ID, ROW_VERSION FROM syn1234 WHERE foo = 'asdf'")

        asyncio.run(main())
        ```

    Example: Selecting all rows that contain a null value
        This example shows how you may select a row to delete from a table where
        a column has a null value.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table # Also works with `Dataset`

        syn = Synapse()
        syn.login()

        async def main():
            await Table(id="syn1234").delete_rows_async(query="SELECT ROW_ID, ROW_VERSION FROM syn1234 WHERE foo is null")

        asyncio.run(main())
        ```
    """
    client = Synapse.get_client(synapse_client=synapse_client)
    results_from_query = await self.query_async(query=query, synapse_client=client)
    client.logger.info(
        f"Found {len(results_from_query)} rows to delete for given query: {query}"
    )

    if self.__class__.__name__ in CLASSES_THAT_CONTAIN_ROW_ETAG:
        filtered_columns = results_from_query[["ROW_ID", "ROW_VERSION", "ROW_ETAG"]]
    else:
        filtered_columns = results_from_query[["ROW_ID", "ROW_VERSION"]]

    filepath = f"{tempfile.mkdtemp()}/{self.id}_upload_{uuid.uuid4()}.csv"
    try:
        filtered_columns.to_csv(filepath, index=False)
        file_handle_id = await multipart_upload_file_async(
            syn=client, file_path=filepath, content_type="text/csv"
        )
    finally:
        os.remove(filepath)

    upload_request = UploadToTableRequest(
        table_id=self.id, upload_file_handle_id=file_handle_id, update_etag=None
    )

    await TableUpdateTransaction(
        entity_id=self.id, changes=[upload_request]
    ).send_job_and_wait_async(synapse_client=client, timeout=job_timeout)

    return results_from_query

snapshot_async async

snapshot_async(comment: str = None, label: str = None, include_activity: bool = True, associate_activity_to_new_version: bool = True, *, synapse_client: Optional[Synapse] = None) -> Dict[str, Any]

Request to create a new snapshot of a table. The provided comment, label, and activity will be applied to the current version thereby creating a snapshot and locking the current version. After the snapshot is created a new version will be started with an 'in-progress' label.

PARAMETER DESCRIPTION
comment

Comment to add to this snapshot to the table.

TYPE: str DEFAULT: None

label

Label to add to this snapshot to the table. The label must be unique, if a label is not provided a unique label will be generated.

TYPE: str DEFAULT: None

include_activity

If True the activity will be included in snapshot if it exists. In order to include the activity, the activity must have already been stored in Synapse by using the activity attribute on the Table and calling the store() method on the Table instance. Adding an activity to a snapshot of a table is meant to capture the provenance of the data at the time of the snapshot.

TYPE: bool DEFAULT: True

associate_activity_to_new_version

If True the activity will be associated with the new version of the table. If False the activity will not be associated with the new version of the table.

TYPE: bool DEFAULT: True

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

Creating a snapshot of a table

Comment and label are optional, but filled in for this example.

import asyncio
from synapseclient.models import Table
from synapseclient import Synapse

syn = Synapse()
syn.login()


async def main():
    my_table = Table(id="syn1234")
    await my_table.snapshot_async(
        comment="This is a new snapshot comment",
        label="3This is a unique label"
    )

asyncio.run(main())
Including the activity (Provenance) in the snapshot and not pulling it forward to the new in-progress version of the table.

By default this method is set up to include the activity in the snapshot and then pull the activity forward to the new version. If you do not want to include the activity in the snapshot you can set include_activity to False. If you do not want to pull the activity forward to the new version you can set associate_activity_to_new_version to False.

See the activity attribute on the Table class for more information on how to interact with the activity.

import asyncio
from synapseclient.models import Table
from synapseclient import Synapse

syn = Synapse()
syn.login()


async def main():
    my_table = Table(id="syn1234")
    await my_table.snapshot_async(
        comment="This is a new snapshot comment",
        label="This is a unique label",
        include_activity=True,
        associate_activity_to_new_version=False
    )

asyncio.run(main())
RETURNS DESCRIPTION
Dict[str, Any]

A dictionary that matches: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/SnapshotResponse.html
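
The example below sketches how the returned dictionary might be used; it assumes the response contains a snapshotVersionNumber field as described by the SnapshotResponse model linked above.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()


async def main():
    snapshot = await Table(id="syn1234").snapshot_async(
        comment="Monthly data freeze",
        label="2024-06 release",
    )
    # The response is a plain dictionary; snapshotVersionNumber is assumed
    # to be present per the SnapshotResponse model.
    print(f"Created snapshot version {snapshot.get('snapshotVersionNumber')}")

asyncio.run(main())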
Source code in synapseclient/models/table.py
async def snapshot_async(
    self,
    comment: str = None,
    label: str = None,
    include_activity: bool = True,
    associate_activity_to_new_version: bool = True,
    *,
    synapse_client: Optional[Synapse] = None,
) -> Dict[str, Any]:
    """
    Request to create a new snapshot of a table. The provided comment, label, and
    activity will be applied to the current version thereby creating a snapshot
    and locking the current version. After the snapshot is created a new version
    will be started with an 'in-progress' label.

    Arguments:
        comment: Comment to add to this snapshot to the table.
        label: Label to add to this snapshot to the table. The label must be unique,
            if a label is not provided a unique label will be generated.
        include_activity: If True the activity will be included in snapshot if it
            exists. In order to include the activity, the activity must have already
            been stored in Synapse by using the `activity` attribute on the Table
            and calling the `store()` method on the Table instance. Adding an
            activity to a snapshot of a table is meant to capture the provenance of
            the data at the time of the snapshot.
        associate_activity_to_new_version: If True the activity will be associated
            with the new version of the table. If False the activity will not be
            associated with the new version of the table.
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Example: Creating a snapshot of a table
        Comment and label are optional, but filled in for this example.

        ```python
        import asyncio
        from synapseclient.models import Table
        from synapseclient import Synapse

        syn = Synapse()
        syn.login()


        async def main():
            my_table = Table(id="syn1234")
            await my_table.snapshot_async(
                comment="This is a new snapshot comment",
                label="3This is a unique label"
            )

        asyncio.run(main())
        ```

    Example: Including the activity (Provenance) in the snapshot and not pulling it forward to the new `in-progress` version of the table.
        By default this method is set up to include the activity in the snapshot and
        then pull the activity forward to the new version. If you do not want to
        include the activity in the snapshot you can set `include_activity` to
        False. If you do not want to pull the activity forward to the new version
        you can set `associate_activity_to_new_version` to False.

        See the [activity][synapseclient.models.Activity] attribute on the Table
        class for more information on how to interact with the activity.

        ```python
        import asyncio
        from synapseclient.models import Table
        from synapseclient import Synapse

        syn = Synapse()
        syn.login()


        async def main():
            my_table = Table(id="syn1234")
            await my_table.snapshot_async(
                comment="This is a new snapshot comment",
                label="This is a unique label",
                include_activity=True,
                associate_activity_to_new_version=False
            )

        asyncio.run(main())
        ```

    Returns:
        A dictionary that matches: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/SnapshotResponse.html>
    """
    client = Synapse.get_client(synapse_client=synapse_client)
    # Ensure that we have seeded the table with the latest data
    await self.get_async(include_activity=True, synapse_client=client)
    client.logger.info(
        f"[{self.id}:{self.name}]: Creating a snapshot of the table."
    )

    snapshot_response = await create_table_snapshot(
        table_id=self.id,
        comment=comment,
        label=label,
        activity_id=(
            self.activity.id if self.activity and include_activity else None
        ),
        synapse_client=synapse_client,
    )

    if associate_activity_to_new_version and self.activity:
        self._last_persistent_instance.activity = None
        await self.store_async(synapse_client=synapse_client)
    else:
        await self.get_async(include_activity=True, synapse_client=synapse_client)

    return snapshot_response

delete_column

delete_column(name: str) -> None

Mark a column for deletion. Note that this does not delete the column from Synapse. You must call the .store() function on this table class instance to delete the column from Synapse. This is a convenience function to eliminate the need to manually delete the column from the dictionary and add it to the ._columns_to_delete attribute.

PARAMETER DESCRIPTION
name

The name of the column to delete.

TYPE: str

RETURNS DESCRIPTION
None

None

Deleting a column

This example shows how you may delete a column from a table and then store the change back in Synapse.

from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

table = Table(
    id="syn1234"
).get(include_columns=True)

table.delete_column(name="my_column")
table.store()
Deleting a column (async)

This example shows how you may delete a column from a table and then store the change back in Synapse.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

async def main():
    table = await Table(
        id="syn1234"
    ).get_async(include_columns=True)

    table.delete_column(name="my_column")
    await table.store_async()

asyncio.run(main())
Source code in synapseclient/models/mixins/table_components.py
def delete_column(self, name: str) -> None:
    """
    Mark a column for deletion. Note that this does not delete the column from
    Synapse. You must call the `.store()` function on this table class instance to
    delete the column from Synapse. This is a convenience function to eliminate
    the need to manually delete the column from the dictionary and add it to the
    `._columns_to_delete` attribute.

    Arguments:
        name: The name of the column to delete.

    Returns:
        None

    Example: Deleting a column
        This example shows how you may delete a column from a table and then store
        the change back in Synapse.

        ```python
        from synapseclient import Synapse
        from synapseclient.models import Table

        syn = Synapse()
        syn.login()

        table = Table(
            id="syn1234"
        ).get(include_columns=True)

        table.delete_column(name="my_column")
        table.store()
        ```

    Example: Deleting a column (async)
        This example shows how you may delete a column from a table and then store
        the change back in Synapse.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Table

        syn = Synapse()
        syn.login()

        async def main():
            table = await Table(
                id="syn1234"
            ).get_async(include_columns=True)

            table.delete_column(name="my_column")
            await table.store_async()

        asyncio.run(main())
        ```
    """
    if not self._last_persistent_instance:
        raise ValueError(
            "This method is only supported after interacting with Synapse via a `.get()` or `.store()` operation"
        )
    if not self.columns:
        raise ValueError(
            "There are no columns. Make sure you use the `include_columns` parameter in the `.get()` method."
        )

    column_to_delete = self.columns.get(name, None)
    if not column_to_delete:
        raise ValueError(f"Column with name {name} does not exist in the table.")

    self._columns_to_delete[column_to_delete.id] = column_to_delete
    self.columns.pop(column_to_delete.name, None)

add_column

add_column(column: Union[Column, List[Column]], index: int = None) -> None

Add column(s) to the table. Note that this does not store the column(s) in Synapse. You must call the .store() function on this table class instance to store the column(s) in Synapse. This is a convenience function to eliminate the need to manually add the column(s) to the dictionary.

This function will add an item to the .columns attribute of this class instance. .columns is a dictionary where the key is the name of the column and the value is the Column object.

PARAMETER DESCRIPTION
column

The column(s) to add, may be a single Column object or a list of Column objects.

TYPE: Union[Column, List[Column]]

index

The index to insert the column at. If not passed in the column will be added to the end of the list.

TYPE: int DEFAULT: None

RETURNS DESCRIPTION
None

None

Adding a single column

This example shows how you may add a single column to a table and then store the change back in Synapse.

from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

table = Table(
    id="syn1234"
).get(include_columns=True)

table.add_column(
    Column(name="my_column", column_type=ColumnType.STRING)
)
table.store()
Adding multiple columns

This example shows how you may add multiple columns to a table and then store the change back in Synapse.

from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

table = Table(
    id="syn1234"
).get(include_columns=True)

table.add_column([
    Column(name="my_column", column_type=ColumnType.STRING),
    Column(name="my_column2", column_type=ColumnType.INTEGER),
])
table.store()
Adding a column at a specific index

This example shows how you may add a column at a specific index to a table and then store the change back in Synapse. If the index is out of bounds the column will be added to the end of the list.

from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

table = Table(
    id="syn1234"
).get(include_columns=True)

table.add_column(
    Column(name="my_column", column_type=ColumnType.STRING),
    # Add the column at the beginning of the list
    index=0
)
table.store()
Adding a single column (async)

This example shows how you may add a single column to a table and then store the change back in Synapse.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

async def main():
    table = await Table(
        id="syn1234"
    ).get_async(include_columns=True)

    table.add_column(
        Column(name="my_column", column_type=ColumnType.STRING)
    )
    await table.store_async()

asyncio.run(main())
Adding multiple columns (async)

This example shows how you may add multiple columns to a table and then store the change back in Synapse.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

async def main():
    table = await Table(
        id="syn1234"
    ).get_async(include_columns=True)

    table.add_column([
        Column(name="my_column", column_type=ColumnType.STRING),
        Column(name="my_column2", column_type=ColumnType.INTEGER),
    ])
    await table.store_async()

asyncio.run(main())
Adding a column at a specific index (async)

This example shows how you may add a column at a specific index to a table and then store the change back in Synapse. If the index is out of bounds the column will be added to the end of the list.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

async def main():
    table = await Table(
        id="syn1234"
    ).get_async(include_columns=True)

    table.add_column(
        Column(name="my_column", column_type=ColumnType.STRING),
        # Add the column at the beginning of the list
        index=0
    )
    await table.store_async()

asyncio.run(main())
Source code in synapseclient/models/mixins/table_components.py
def add_column(
    self, column: Union["Column", List["Column"]], index: int = None
) -> None:
    """Add column(s) to the table. Note that this does not store the column(s) in
    Synapse. You must call the `.store()` function on this table class instance to
    store the column(s) in Synapse. This is a convenience function to eliminate
    the need to manually add the column(s) to the dictionary.


    This function will add an item to the `.columns` attribute of this class
    instance. `.columns` is a dictionary where the key is the name of the column
    and the value is the Column object.

    Arguments:
        column: The column(s) to add, may be a single Column object or a list of
            Column objects.
        index: The index to insert the column at. If not passed in the column will
            be added to the end of the list.

    Returns:
        None

    Example: Adding a single column
        This example shows how you may add a single column to a table and then store
        the change back in Synapse.

        ```python
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        table = Table(
            id="syn1234"
        ).get(include_columns=True)

        table.add_column(
            Column(name="my_column", column_type=ColumnType.STRING)
        )
        table.store()
        ```


    Example: Adding multiple columns
        This example shows how you may add multiple columns to a table and then store
        the change back in Synapse.

        ```python
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        table = Table(
            id="syn1234"
        ).get(include_columns=True)

        table.add_column([
            Column(name="my_column", column_type=ColumnType.STRING),
            Column(name="my_column2", column_type=ColumnType.INTEGER),
        ])
        table.store()
        ```

    Example: Adding a column at a specific index
        This example shows how you may add a column at a specific index to a table
        and then store the change back in Synapse. If the index is out of bounds the
        column will be added to the end of the list.

        ```python
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        table = Table(
            id="syn1234"
        ).get(include_columns=True)

        table.add_column(
            Column(name="my_column", column_type=ColumnType.STRING),
            # Add the column at the beginning of the list
            index=0
        )
        table.store()
        ```

    Example: Adding a single column (async)
        This example shows how you may add a single column to a table and then store
        the change back in Synapse.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        async def main():
            table = await Table(
                id="syn1234"
            ).get_async(include_columns=True)

            table.add_column(
                Column(name="my_column", column_type=ColumnType.STRING)
            )
            await table.store_async()

        asyncio.run(main())
        ```

    Example: Adding multiple columns (async)
        This example shows how you may add multiple columns to a table and then store
        the change back in Synapse.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        async def main():
            table = await Table(
                id="syn1234"
            ).get_async(include_columns=True)

            table.add_column([
                Column(name="my_column", column_type=ColumnType.STRING),
                Column(name="my_column2", column_type=ColumnType.INTEGER),
            ])
            await table.store_async()

        asyncio.run(main())
        ```

    Example: Adding a column at a specific index (async)
        This example shows how you may add a column at a specific index to a table
        and then store the change back in Synapse. If the index is out of bounds the
        column will be added to the end of the list.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        async def main():
            table = await Table(
                id="syn1234"
            ).get_async(include_columns=True)

            table.add_column(
                Column(name="my_column", column_type=ColumnType.STRING),
                # Add the column at the beginning of the list
                index=0
            )
            await table.store_async()

        asyncio.run(main())
        ```
    """
    if not self._last_persistent_instance:
        raise ValueError(
            "This method is only supported after interacting with Synapse via a `.get()` or `.store()` operation"
        )

    if index is not None:
        if isinstance(column, list):
            columns_to_insert = []
            for i, col in enumerate(column):
                if col.name in self.columns:
                    raise ValueError(f"Duplicate column name: {col.name}")
                columns_to_insert.append((col.name, col))
            insert_index = min(index, len(self.columns))
            self.columns = OrderedDict(
                list(self.columns.items())[:insert_index]
                + columns_to_insert
                + list(self.columns.items())[insert_index:]
            )
        else:
            if column.name in self.columns:
                raise ValueError(f"Duplicate column name: {column.name}")
            insert_index = min(index, len(self.columns))
            self.columns = OrderedDict(
                list(self.columns.items())[:insert_index]
                + [(column.name, column)]
                + list(self.columns.items())[insert_index:]
            )

    else:
        if isinstance(column, list):
            for col in column:
                if col.name in self.columns:
                    raise ValueError(f"Duplicate column name: {col.name}")
                self.columns[col.name] = col
        else:
            if column.name in self.columns:
                raise ValueError(f"Duplicate column name: {column.name}")
            self.columns[column.name] = column

reorder_column

reorder_column(name: str, index: int) -> None

Reorder a column in the table. Note that this does not store the column in Synapse. You must call the .store() function on this table class instance to store the column in Synapse. This is a convenience function to eliminate the need to manually reorder the .columns attribute dictionary.

You must ensure that the index is within the bounds of the number of columns in the table. If you pass in an index that is out of bounds the column will be added to the end of the list.

PARAMETER DESCRIPTION
name

The name of the column to reorder.

TYPE: str

index

The index to move the column to starting with 0.

TYPE: int

RETURNS DESCRIPTION
None

None

Reordering a column

This example shows how you may reorder a column in a table and then store the change back in Synapse.

from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

table = Table(
    id="syn1234"
).get(include_columns=True)

# Move the column to the beginning of the list
table.reorder_column(name="my_column", index=0)
table.store()
Reordering a column (async)

This example shows how you may reorder a column in a table and then store the change back in Synapse.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, Table

syn = Synapse()
syn.login()

async def main():
    table = await Table(
        id="syn1234"
    ).get_async(include_columns=True)

    # Move the column to the beginning of the list
    table.reorder_column(name="my_column", index=0)
    await table.store_async()

asyncio.run(main())
Source code in synapseclient/models/mixins/table_components.py
def reorder_column(self, name: str, index: int) -> None:
    """Reorder a column in the table. Note that this does not store the column in
    Synapse. You must call the `.store()` function on this table class instance to
    store the column in Synapse. This is a convenience function to eliminate
    the need to manually reorder the `.columns` attribute dictionary.

    You must ensure that the index is within the bounds of the number of columns in
    the table. If you pass in an index that is out of bounds the column will be
    added to the end of the list.

    Arguments:
        name: The name of the column to reorder.
        index: The index to move the column to starting with 0.

    Returns:
        None

    Example: Reordering a column
        This example shows how you may reorder a column in a table and then store
        the change back in Synapse.

        ```python
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        table = Table(
            id="syn1234"
        ).get(include_columns=True)

        # Move the column to the beginning of the list
        table.reorder_column(name="my_column", index=0)
        table.store()
        ```


    Example: Reordering a column (async)
        This example shows how you may reorder a column in a table and then store
        the change back in Synapse.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Column, ColumnType, Table

        syn = Synapse()
        syn.login()

        async def main():
            table = await Table(
                id="syn1234"
            ).get_async(include_columns=True)

            # Move the column to the beginning of the list
            table.reorder_column(name="my_column", index=0)
            await table.store_async()

        asyncio.run(main())
        ```
    """
    if not self._last_persistent_instance:
        raise ValueError(
            "This method is only supported after interacting with Synapse via a `.get()` or `.store()` operation"
        )

    column_to_reorder = self.columns.pop(name, None)
    if index >= len(self.columns):
        self.columns[name] = column_to_reorder
        return self

    self.columns = OrderedDict(
        list(self.columns.items())[:index]
        + [(name, column_to_reorder)]
        + list(self.columns.items())[index:]
    )

get_permissions_async async

get_permissions_async(*, synapse_client: Optional[Synapse] = None) -> Permissions

Get the permissions that the caller has on an Entity.

PARAMETER DESCRIPTION
synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
Permissions

A Permissions object

Using this function:

Getting permissions for a Synapse Entity

import asyncio
from synapseclient import Synapse
from synapseclient.models import File

syn = Synapse()
syn.login()

async def main():
    permissions = await File(id="syn123").get_permissions_async()

asyncio.run(main())

Getting access types list from the Permissions object

permissions.access_types
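
For example, to branch on whether the caller can download the entity (a sketch; syn123 is a placeholder id):

import asyncio
from synapseclient import Synapse
from synapseclient.models import File

syn = Synapse()
syn.login()

async def main():
    permissions = await File(id="syn123").get_permissions_async()
    if "DOWNLOAD" in permissions.access_types:
        print("Caller may download this entity")
    else:
        print("Caller does not have DOWNLOAD access")

asyncio.run(main())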
Source code in synapseclient/models/mixins/access_control.py
async def get_permissions_async(
    self,
    *,
    synapse_client: Optional[Synapse] = None,
) -> "Permissions":
    """
    Get the [permissions][synapseclient.core.models.permission.Permissions]
    that the caller has on an Entity.

    Arguments:
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        A Permissions object


    Example: Using this function:
        Getting permissions for a Synapse Entity

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import File

        syn = Synapse()
        syn.login()

        async def main():
            permissions = await File(id="syn123").get_permissions_async()

        asyncio.run(main())
        ```

        Getting access types list from the Permissions object

        ```
        permissions.access_types
        ```
    """
    from synapseclient.core.models.permission import Permissions

    permissions_dict = await get_entity_permissions(
        entity_id=self.id,
        synapse_client=synapse_client,
    )
    return Permissions.from_dict(data=permissions_dict)

get_acl_async async

get_acl_async(principal_id: int = None, check_benefactor: bool = True, *, synapse_client: Optional[Synapse] = None) -> List[str]

Get the ACL that a user or group has on an Entity.

Note: If the entity does not have local sharing settings, or ACL set directly on it, this will look up the ACL on the benefactor of the entity. The benefactor is the entity that the current entity inherits its permissions from. The benefactor is usually the parent entity, but it can be any ancestor in the hierarchy. For example, a newly created Project will be its own benefactor, while a new FileEntity's benefactor will start off as its containing Project or Folder. If the entity already has local sharing settings, the benefactor would be itself.

PARAMETER DESCRIPTION
principal_id

Identifier of a user or group (defaults to PUBLIC users)

TYPE: int DEFAULT: None

check_benefactor

If True (default), check the benefactor for the entity to get the ACL. If False, only check the entity itself. This is useful for checking the ACL of an entity that has local sharing settings, but you want to check the ACL of the entity itself and not the benefactor it may inherit from.

TYPE: bool DEFAULT: True

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
List[str]

An array containing some combination of ['READ', 'UPDATE', 'CREATE', 'DELETE', 'DOWNLOAD', 'MODERATE', 'CHANGE_PERMISSIONS', 'CHANGE_SETTINGS'] or an empty array
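
Checking the ACL on a table

A minimal sketch of checking the ACL a user or group holds on a table; syn1234 and the principal id are placeholders.

import asyncio
from synapseclient import Synapse
from synapseclient.models import Table

syn = Synapse()
syn.login()

async def main():
    table = Table(id="syn1234")

    # ACL granted to PUBLIC users (principal_id defaults to the public group).
    public_acl = await table.get_acl_async()

    # ACL granted to a specific user, ignoring anything inherited from the
    # benefactor by passing check_benefactor=False.
    user_acl = await table.get_acl_async(principal_id=1234567, check_benefactor=False)

    print(public_acl, user_acl)

asyncio.run(main())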

Source code in synapseclient/models/mixins/access_control.py
async def get_acl_async(
    self,
    principal_id: int = None,
    check_benefactor: bool = True,
    *,
    synapse_client: Optional[Synapse] = None,
) -> List[str]:
    """
    Get the [ACL][synapseclient.core.models.permission.Permissions.access_types]
    that a user or group has on an Entity.

    Note: If the entity does not have local sharing settings, or ACL set directly
    on it, this will look up the ACL on the benefactor of the entity. The
    benefactor is the entity that the current entity inherits its permissions from.
    The benefactor is usually the parent entity, but it can be any ancestor in the
    hierarchy. For example, a newly created Project will be its own benefactor,
    while a new FileEntity's benefactor will start off as its containing Project or
    Folder. If the entity already has local sharing settings, the benefactor would
    be itself.

    Arguments:
        principal_id: Identifier of a user or group (defaults to PUBLIC users)
        check_benefactor: If True (default), check the benefactor for the entity
            to get the ACL. If False, only check the entity itself.
            This is useful for checking the ACL of an entity that has local sharing
            settings, but you want to check the ACL of the entity itself and not
            the benefactor it may inherit from.
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        An array containing some combination of
            ['READ', 'UPDATE', 'CREATE', 'DELETE', 'DOWNLOAD', 'MODERATE',
            'CHANGE_PERMISSIONS', 'CHANGE_SETTINGS']
            or an empty array
    """
    return await get_entity_acl_list(
        entity_id=self.id,
        principal_id=str(principal_id) if principal_id is not None else None,
        check_benefactor=check_benefactor,
        synapse_client=synapse_client,
    )

set_permissions_async async

set_permissions_async(principal_id: int = None, access_type: List[str] = None, modify_benefactor: bool = False, warn_if_inherits: bool = True, overwrite: bool = True, *, synapse_client: Optional[Synapse] = None) -> Dict[str, Union[str, list]]

Sets the permissions that a user or group has on an Entity. An Entity may have its own ACL or inherit its ACL from a benefactor.

PARAMETER DESCRIPTION
principal_id

Identifier of a user or group. 273948 is for all registered Synapse users and 273949 is for public access. None implies public access.

TYPE: int DEFAULT: None

access_type

Type of permission to be granted. One or more of CREATE, READ, DOWNLOAD, UPDATE, DELETE, CHANGE_PERMISSIONS.

Defaults to ['READ', 'DOWNLOAD']

TYPE: List[str] DEFAULT: None

modify_benefactor

Set as True when modifying a benefactor's ACL. The term 'benefactor' is used to indicate which Entity an Entity inherits its ACL from. For example, a newly created Project will be its own benefactor, while a new FileEntity's benefactor will start off as its containing Project. If the entity already has local sharing settings the benefactor would be itself. It may also be the immediate parent, somewhere in the parent tree, or the project itself.

TYPE: bool DEFAULT: False

warn_if_inherits

When modify_benefactor is True, this does not have any effect. When modify_benefactor is False, and warn_if_inherits is True, a warning log message is produced if the benefactor for the entity you passed into the function is not itself, i.e., it's the parent folder, or another entity in the parent tree.

TYPE: bool DEFAULT: True

overwrite

By default this function overwrites existing permissions for the specified user. Set this flag to False to add new permissions non-destructively.

TYPE: bool DEFAULT: True

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
Dict[str, Union[str, list]]

An Access Control List object matching <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/AccessControlList.html>.

Setting permissions

Grant all registered users download access

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import File

syn = Synapse()
syn.login()

async def main():
    await File(id="syn123").set_permissions_async(principal_id=273948, access_type=['READ','DOWNLOAD'])

asyncio.run(main())
```

Grant the public view access

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import File

syn = Synapse()
syn.login()

async def main():
    await File(id="syn123").set_permissions_async(principal_id=273949, access_type=['READ'])

asyncio.run(main())
```
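Add permissions without replacing what a principal already has (a brief sketch of the `overwrite=False` option described above; the ids are placeholders):

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import File

syn = Synapse()
syn.login()

async def main():
    # overwrite=False adds the listed access types for this principal
    # instead of replacing the permissions that were granted previously
    await File(id="syn123").set_permissions_async(
        principal_id=273948,
        access_type=['READ', 'DOWNLOAD'],
        overwrite=False,
    )

asyncio.run(main())
```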
Source code in synapseclient/models/mixins/access_control.py
async def set_permissions_async(
    self,
    principal_id: int = None,
    access_type: List[str] = None,
    modify_benefactor: bool = False,
    warn_if_inherits: bool = True,
    overwrite: bool = True,
    *,
    synapse_client: Optional[Synapse] = None,
) -> Dict[str, Union[str, list]]:
    """
    Sets permission that a user or group has on an Entity.
    An Entity may have its own ACL or inherit its ACL from a benefactor.

    Arguments:
        principal_id: Identifier of a user or group. `273948` is for all
            registered Synapse users and `273949` is for public access.
            None implies public access.
        access_type: Type of permission to be granted. One or more of CREATE,
            READ, DOWNLOAD, UPDATE, DELETE, CHANGE_PERMISSIONS.

            **Defaults to ['READ', 'DOWNLOAD']**
        modify_benefactor: Set as True when modifying a benefactor's ACL. The term
            'benefactor' is used to indicate which Entity an Entity inherits its
            ACL from. For example, a newly created Project will be its own
            benefactor, while a new FileEntity's benefactor will start off as its
            containing Project. If the entity already has local sharing settings
            the benefactor would be itself. It may also be the immediate parent,
            somewhere in the parent tree, or the project itself.
        warn_if_inherits: When `modify_benefactor` is True, this does not have any
            effect. When `modify_benefactor` is False, and `warn_if_inherits` is
            True, a warning log message is produced if the benefactor for the
            entity you passed into the function is not itself, i.e., it's the
            parent folder, or another entity in the parent tree.
        overwrite: By default this function overwrites existing permissions for
            the specified user. Set this flag to False to add new permissions
            non-destructively.
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        An Access Control List object matching <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/AccessControlList.html>.

    Example: Setting permissions
        Grant all registered users download access

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import File

        syn = Synapse()
        syn.login()

        async def main():
            await File(id="syn123").set_permissions_async(principal_id=273948, access_type=['READ','DOWNLOAD'])

        asyncio.run(main())
        ```

        Grant the public view access

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import File

        syn = Synapse()
        syn.login()

        async def main():
            await File(id="syn123").set_permissions_async(principal_id=273949, access_type=['READ'])

        asyncio.run(main())
        ```
    """
    if access_type is None:
        access_type = ["READ", "DOWNLOAD"]

    return await set_entity_permissions(
        entity_id=self.id,
        principal_id=str(principal_id) if principal_id is not None else None,
        access_type=access_type,
        modify_benefactor=modify_benefactor,
        warn_if_inherits=warn_if_inherits,
        overwrite=overwrite,
        synapse_client=synapse_client,
    )

delete_permissions_async async

delete_permissions_async(include_self: bool = True, include_container_content: bool = False, recursive: bool = False, target_entity_types: Optional[List[str]] = None, dry_run: bool = False, show_acl_details: bool = True, show_files_in_containers: bool = True, *, synapse_client: Optional[Synapse] = None, _benefactor_tracker: Optional[BenefactorTracker] = None) -> None

Delete the entire Access Control List (ACL) for a given Entity. This is not scoped to a specific user or group, but rather removes all permissions associated with the Entity. After this operation, the Entity will inherit permissions from its benefactor, which is typically its parent entity or the Project it belongs to.

In order to remove permissions for a specific user or group, you should use the set_permissions_async method with the access_type set to an empty list.

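For example, a brief sketch of revoking a single principal's permissions this way (the ids are placeholders):

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import File

syn = Synapse()
syn.login()

async def main():
    # An empty access_type list removes this principal's entry from the ACL
    await File(id="syn123").set_permissions_async(
        principal_id=273948,
        access_type=[],
    )

asyncio.run(main())
```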
By default, Entities such as FileEntity and Folder inherit their permissions from their containing Project. For such Entities the Project is the Entity's 'benefactor'. This permission inheritance can be overridden by creating an ACL for the Entity. When this occurs the Entity becomes its own benefactor and all permissions are determined by its own ACL.

If the ACL of an Entity is deleted, then its benefactor will automatically be set to its parent's benefactor.

Special notice for Projects: The ACL for a Project cannot be deleted, you must individually update or revoke the permissions for each user or group.

PARAMETER DESCRIPTION
include_self

If True (default), delete the ACL of the current entity. If False, skip deleting the ACL of the current entity.

TYPE: bool DEFAULT: True

include_container_content

If True, delete ACLs from contents directly within containers (files and folders inside self). This must be set to True for recursive to have any effect. Defaults to False.

TYPE: bool DEFAULT: False

recursive

If True and the entity is a container (e.g., Project or Folder), recursively process child containers. Note that this must be used with include_container_content=True to have any effect. Setting recursive=True with include_container_content=False will raise a ValueError. Only works on classes that support the sync_from_synapse_async method.

TYPE: bool DEFAULT: False

target_entity_types

Specify which entity types to process when deleting ACLs. Allowed values are "folder", "file", "project", "table", "entityview", "materializedview", "virtualtable", "dataset", "datasetcollection", "submissionview" (case-insensitive). If None, defaults to ["folder", "file"]. This does not affect the entity type of the current entity, which is always processed if include_self=True.

TYPE: Optional[List[str]] DEFAULT: None

dry_run

If True, log the changes that would be made instead of actually performing the deletions. When enabled, all ACL deletion operations are simulated and logged at info level. Defaults to False.

TYPE: bool DEFAULT: False

show_acl_details

When dry_run=True, controls whether current ACL details are displayed for entities that will have their permissions changed. If True (default), shows detailed ACL information. If False, hides ACL details for cleaner output. Has no effect when dry_run=False.

TYPE: bool DEFAULT: True

show_files_in_containers

When dry_run=True, controls whether files within containers are displayed in the preview. If True (default), shows all files. If False, hides files when their only change is benefactor inheritance (but still shows files with local ACLs being deleted). Has no effect when dry_run=False.

TYPE: bool DEFAULT: True

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

_benefactor_tracker

Internal use tracker for managing benefactor relationships. Used for recursive functionality to track which entities will be affected

TYPE: Optional[BenefactorTracker] DEFAULT: None

RETURNS DESCRIPTION
None

None

RAISES DESCRIPTION
ValueError

If the entity does not have an ID or if an invalid entity type is provided.

SynapseHTTPError

If there are permission issues or if the entity already inherits permissions.

Exception

For any other errors that may occur during the process.

Note: The caller must be granted ACCESS_TYPE.CHANGE_PERMISSIONS on the Entity to call this method.

Delete permissions for a single entity

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import File

syn = Synapse()
syn.login()

async def main():
    await File(id="syn123").delete_permissions_async()

asyncio.run(main())
```

Delete permissions recursively for a folder and all its children

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import Folder

syn = Synapse()
syn.login()

async def main():
    # Delete permissions for this folder only (does not affect children)
    await Folder(id="syn123").delete_permissions_async()

    # Delete permissions for all files and folders directly within this folder,
    # but not the folder itself
    await Folder(id="syn123").delete_permissions_async(
        include_self=False,
        include_container_content=True
    )

    # Delete permissions for all items in the entire hierarchy (folders and their files)
    # Both recursive and include_container_content must be True
    await Folder(id="syn123").delete_permissions_async(
        recursive=True,
        include_container_content=True
    )

    # Delete permissions only for folder entities within this folder recursively
    # and their contents
    await Folder(id="syn123").delete_permissions_async(
        recursive=True,
        include_container_content=True,
        target_entity_types=["folder"]
    )

    # Delete permissions only for files within this folder and all subfolders
    await Folder(id="syn123").delete_permissions_async(
        include_self=False,
        recursive=True,
        include_container_content=True,
        target_entity_types=["file"]
    )

    # Delete permissions for specific entity types (e.g., tables and views)
    await Folder(id="syn123").delete_permissions_async(
        recursive=True,
        include_container_content=True,
        target_entity_types=["table", "entityview", "materializedview"]
    )

    # Dry run example: Log what would be deleted without making changes
    await Folder(id="syn123").delete_permissions_async(
        recursive=True,
        include_container_content=True,
        dry_run=True
    )
asyncio.run(main())
```
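Preview a dry run with the display flags described above (a brief sketch; `syn123` is a placeholder):

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import Folder

syn = Synapse()
syn.login()

async def main():
    # Log what would be deleted, hiding current ACL details and hiding files
    # whose only change would be inheriting permissions from a new benefactor
    await Folder(id="syn123").delete_permissions_async(
        recursive=True,
        include_container_content=True,
        dry_run=True,
        show_acl_details=False,
        show_files_in_containers=False,
    )

asyncio.run(main())
```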
Source code in synapseclient/models/mixins/access_control.py
async def delete_permissions_async(
    self,
    include_self: bool = True,
    include_container_content: bool = False,
    recursive: bool = False,
    target_entity_types: Optional[List[str]] = None,
    dry_run: bool = False,
    show_acl_details: bool = True,
    show_files_in_containers: bool = True,
    *,
    synapse_client: Optional[Synapse] = None,
    _benefactor_tracker: Optional[BenefactorTracker] = None,
) -> None:
    """
    Delete the entire Access Control List (ACL) for a given Entity. This is not
    scoped to a specific user or group, but rather removes all permissions
    associated with the Entity. After this operation, the Entity will inherit
    permissions from its benefactor, which is typically its parent entity or
    the Project it belongs to.

    In order to remove permissions for a specific user or group, you
    should use the `set_permissions_async` method with the `access_type` set to
    an empty list.

    By default, Entities such as FileEntity and Folder inherit their permission from
    their containing Project. For such Entities the Project is the Entity's 'benefactor'.
    This permission inheritance can be overridden by creating an ACL for the Entity.
    When this occurs the Entity becomes its own benefactor and all permission are
    determined by its own ACL.

    If the ACL of an Entity is deleted, then its benefactor will automatically be set
    to its parent's benefactor.

    **Special notice for Projects:** The ACL for a Project cannot be deleted, you
    must individually update or revoke the permissions for each user or group.

    Arguments:
        include_self: If True (default), delete the ACL of the current entity.
            If False, skip deleting the ACL of the current entity.
        include_container_content: If True, delete ACLs from contents directly within
            containers (files and folders inside self). This must be set to
            True for recursive to have any effect. Defaults to False.
        recursive: If True and the entity is a container (e.g., Project or Folder),
            recursively process child containers. Note that this must be used with
            include_container_content=True to have any effect. Setting recursive=True
            with include_container_content=False will raise a ValueError.
            Only works on classes that support the `sync_from_synapse_async` method.
        target_entity_types: Specify which entity types to process when deleting ACLs.
            Allowed values are "folder", "file", "project", "table", "entityview",
            "materializedview", "virtualtable", "dataset", "datasetcollection",
            "submissionview" (case-insensitive). If None, defaults to ["folder", "file"].
            This does not affect the entity type of the current entity, which is always
            processed if `include_self=True`.
        dry_run: If True, log the changes that would be made instead of actually
            performing the deletions. When enabled, all ACL deletion operations are
            simulated and logged at info level. Defaults to False.
        show_acl_details: When dry_run=True, controls whether current ACL details are
            displayed for entities that will have their permissions changed. If True (default),
            shows detailed ACL information. If False, hides ACL details for cleaner output.
            Has no effect when dry_run=False.
        show_files_in_containers: When dry_run=True, controls whether files within containers
            are displayed in the preview. If True (default), shows all files. If False, hides
            files when their only change is benefactor inheritance (but still shows files with
            local ACLs being deleted). Has no effect when dry_run=False.
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.
        _benefactor_tracker: Internal use tracker for managing benefactor relationships.
            Used for recursive functionality to track which entities will be affected

    Returns:
        None

    Raises:
        ValueError: If the entity does not have an ID or if an invalid entity type is provided.
        SynapseHTTPError: If there are permission issues or if the entity already inherits permissions.
        Exception: For any other errors that may occur during the process.

    Note: The caller must be granted ACCESS_TYPE.CHANGE_PERMISSIONS on the Entity to
    call this method.

    Example: Delete permissions for a single entity
        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import File

        syn = Synapse()
        syn.login()

        async def main():
            await File(id="syn123").delete_permissions_async()

        asyncio.run(main())
        ```

    Example: Delete permissions recursively for a folder and all its children
        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Folder

        syn = Synapse()
        syn.login()

        async def main():
            # Delete permissions for this folder only (does not affect children)
            await Folder(id="syn123").delete_permissions_async()

            # Delete permissions for all files and folders directly within this folder,
            # but not the folder itself
            await Folder(id="syn123").delete_permissions_async(
                include_self=False,
                include_container_content=True
            )

            # Delete permissions for all items in the entire hierarchy (folders and their files)
            # Both recursive and include_container_content must be True
            await Folder(id="syn123").delete_permissions_async(
                recursive=True,
                include_container_content=True
            )

            # Delete permissions only for folder entities within this folder recursively
            # and their contents
            await Folder(id="syn123").delete_permissions_async(
                recursive=True,
                include_container_content=True,
                target_entity_types=["folder"]
            )

            # Delete permissions only for files within this folder and all subfolders
            await Folder(id="syn123").delete_permissions_async(
                include_self=False,
                recursive=True,
                include_container_content=True,
                target_entity_types=["file"]
            )

            # Delete permissions for specific entity types (e.g., tables and views)
            await Folder(id="syn123").delete_permissions_async(
                recursive=True,
                include_container_content=True,
                target_entity_types=["table", "entityview", "materializedview"]
            )

            # Dry run example: Log what would be deleted without making changes
            await Folder(id="syn123").delete_permissions_async(
                recursive=True,
                include_container_content=True,
                dry_run=True
            )
        asyncio.run(main())
        ```
    """
    if not self.id:
        raise ValueError("The entity must have an ID to delete permissions.")

    client = Synapse.get_client(synapse_client=synapse_client)

    if include_self and self.__class__.__name__.lower() == "project":
        client.logger.warning(
            "The ACL for a Project cannot be deleted, you must individually update or "
            "revoke the permissions for each user or group. Continuing without deleting "
            "the Project's ACL."
        )
        include_self = False

    normalized_types = self._normalize_target_entity_types(target_entity_types)

    is_top_level = not _benefactor_tracker
    benefactor_tracker = _benefactor_tracker or BenefactorTracker()

    should_process_children = (recursive or include_container_content) and hasattr(
        self, "sync_from_synapse_async"
    )
    all_entities = [self] if include_self else []

    custom_message = "Deleting ACLs [Dry Run]..." if dry_run else "Deleting ACLs..."
    with shared_download_progress_bar(
        file_size=1, synapse_client=client, custom_message=custom_message, unit=None
    ) as progress_bar:
        if progress_bar:
            progress_bar.update(1)  # Initial setup complete

        if should_process_children:
            if recursive and not include_container_content:
                raise ValueError(
                    "When recursive=True, include_container_content must also be True. "
                    "Setting recursive=True with include_container_content=False has no effect."
                )

            if progress_bar:
                progress_bar.total += 1
                progress_bar.refresh()

            all_entities = await self._collect_entities(
                client=client,
                target_entity_types=normalized_types,
                include_container_content=include_container_content,
                recursive=recursive,
                progress_bar=progress_bar,
            )
            if progress_bar:
                progress_bar.update(1)

            entity_ids = [entity.id for entity in all_entities if entity.id]
            if entity_ids:
                if progress_bar:
                    progress_bar.total += 1
                    progress_bar.refresh()
                await benefactor_tracker.track_entity_benefactor(
                    entity_ids=entity_ids,
                    synapse_client=client,
                    progress_bar=progress_bar,
                )
            else:
                if progress_bar:
                    progress_bar.total += 1
                    progress_bar.refresh()
                    progress_bar.update(1)

        if is_top_level:
            if progress_bar:
                progress_bar.total += 1
                progress_bar.refresh()
            await self._build_and_log_run_tree(
                client=client,
                benefactor_tracker=benefactor_tracker,
                collected_entities=all_entities,
                include_self=include_self,
                show_acl_details=show_acl_details,
                show_files_in_containers=show_files_in_containers,
                progress_bar=progress_bar,
                dry_run=dry_run,
            )

        if dry_run:
            return

        if include_self:
            if progress_bar:
                progress_bar.total += 1
                progress_bar.refresh()
            await self._delete_current_entity_acl(
                client=client,
                benefactor_tracker=benefactor_tracker,
                progress_bar=progress_bar,
            )

        if should_process_children:
            if include_container_content:
                if progress_bar:
                    progress_bar.total += 1
                    progress_bar.refresh()
                await self._process_container_contents(
                    client=client,
                    target_entity_types=normalized_types,
                    benefactor_tracker=benefactor_tracker,
                    progress_bar=progress_bar,
                    recursive=recursive,
                    include_container_content=include_container_content,
                )
                if progress_bar:
                    progress_bar.update(1)  # Process container contents complete

list_acl_async async

list_acl_async(recursive: bool = False, include_container_content: bool = False, target_entity_types: Optional[List[str]] = None, log_tree: bool = False, *, synapse_client: Optional[Synapse] = None, _progress_bar: Optional[tqdm] = None) -> AclListResult

List the Access Control Lists (ACLs) for this entity and optionally its children.

This function returns the local sharing settings for the entity and optionally its children. It provides a mapping of all ACLs for the given container/entity.

Important Note: This function returns the LOCAL sharing settings only, not the effective permissions that each Synapse User ID/Team has on the entities. More permissive permissions could be granted via a Team that the user is a member of and that has permissions on the entity, or through inheritance from parent entities.

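To compare the local ACL entries returned here with the permissions the calling user actually holds on the entity, `get_permissions_async` can be used alongside this method (a brief sketch; `syn123` is a placeholder):

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import File

syn = Synapse()
syn.login()

async def main():
    entity = File(id="syn123")

    # Local sharing settings only; may be empty if the entity inherits its ACL
    local_acls = await entity.list_acl_async()
    print(local_acls)

    # Permissions the authenticated user holds on the entity
    permissions = await entity.get_permissions_async()
    print(permissions.access_types)

asyncio.run(main())
```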
PARAMETER DESCRIPTION
recursive

If True and the entity is a container (e.g., Project or Folder), recursively process child containers. Note that this must be used with include_container_content=True to have any effect. Setting recursive=True with include_container_content=False will raise a ValueError. Only works on classes that support the sync_from_synapse_async method.

TYPE: bool DEFAULT: False

include_container_content

If True, include ACLs from contents directly within containers (files and folders inside self). This must be set to True for recursive to have any effect. Defaults to False.

TYPE: bool DEFAULT: False

target_entity_types

Specify which entity types to process when listing ACLs. Allowed values are "folder", "file", "project", "table", "entityview", "materializedview", "virtualtable", "dataset", "datasetcollection", "submissionview" (case-insensitive). If None, defaults to ["folder", "file"].

TYPE: Optional[List[str]] DEFAULT: None

log_tree

If True, logs the ACL results to console in ASCII tree format showing entity hierarchies and their ACL permissions in a tree-like structure. Defaults to False.

TYPE: bool DEFAULT: False

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

_progress_bar

Internal parameter. Progress bar instance to use for updates when called recursively. Should not be used by external callers.

TYPE: Optional[tqdm] DEFAULT: None

RETURNS DESCRIPTION
AclListResult

An AclListResult object containing a structured representation of ACLs where:

- entity_acls: A list of EntityAcl objects, each representing one entity's ACL
- Each EntityAcl contains acl_entries (a list of AclEntry objects)
- Each AclEntry contains the principal_id and their list of permissions

RAISES DESCRIPTION
ValueError

If the entity does not have an ID or if an invalid entity type is provided.

SynapseHTTPError

If there are permission issues accessing ACLs.

Exception

For any other errors that may occur during the process.

List ACLs for a single entity

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import File

syn = Synapse()
syn.login()

async def main():
    acl_result = await File(id="syn123").list_acl_async()
    print(acl_result)

    # Access entity ACLs (entity_acls is a list, not a dict)
    for entity_acl in acl_result.all_entity_acls:
        if entity_acl.entity_id == "syn123":
            # Access individual ACL entries
            for acl_entry in entity_acl.acl_entries:
                if acl_entry.principal_id == "273948":
                    print(f"Principal 273948 has permissions: {acl_entry.permissions}")

    # I can also access the ACL for the file itself
    print(acl_result.entity_acl)

    print(acl_result)

asyncio.run(main())
```

List ACLs recursively for a folder and all its children

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import Folder

syn = Synapse()
syn.login()

async def main():
    acl_result = await Folder(id="syn123").list_acl_async(
        recursive=True,
        include_container_content=True
    )

    # Access each entity's ACL (entity_acls is a list)
    for entity_acl in acl_result.all_entity_acls:
        print(f"Entity {entity_acl.entity_id} has ACL with {len(entity_acl.acl_entries)} principals")

    # I can also access the ACL for the folder itself
    print(acl_result.entity_acl)

    # List ACLs for only folder entities
    folder_acl_result = await Folder(id="syn123").list_acl_async(
        recursive=True,
        include_container_content=True,
        target_entity_types=["folder"]
    )

    # List ACLs for specific entity types (e.g., tables and views)
    table_view_acl_result = await Folder(id="syn123").list_acl_async(
        recursive=True,
        include_container_content=True,
        target_entity_types=["table", "entityview", "materializedview"]
    )

asyncio.run(main())
```

List ACLs with ASCII tree visualization

When log_tree=True, the ACLs will be logged in a tree format. Additionally, the ascii_tree attribute of the AclListResult will contain the ASCII tree representation of the ACLs.

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import Folder

syn = Synapse()
syn.login()

async def main():
    acl_result = await Folder(id="syn123").list_acl_async(
        recursive=True,
        include_container_content=True,
        log_tree=True, # Enable ASCII tree logging
    )

    # The ASCII tree representation of the ACLs will also be available
    # in acl_result.ascii_tree
    print(acl_result.ascii_tree)

asyncio.run(main())
```
Source code in synapseclient/models/mixins/access_control.py
async def list_acl_async(
    self,
    recursive: bool = False,
    include_container_content: bool = False,
    target_entity_types: Optional[List[str]] = None,
    log_tree: bool = False,
    *,
    synapse_client: Optional[Synapse] = None,
    _progress_bar: Optional[tqdm] = None,  # Internal parameter for recursive calls
) -> AclListResult:
    """
    List the Access Control Lists (ACLs) for this entity and optionally its children.

    This function returns the local sharing settings for the entity and optionally
    its children. It provides a mapping of all ACLs for the given container/entity.

    **Important Note:** This function returns the LOCAL sharing settings only, not
    the effective permissions that each Synapse User ID/Team has on the entities.
    More permissive permissions could be granted via a Team that the user has access
    to that has permissions on the entity, or through inheritance from parent entities.

    Arguments:
        recursive: If True and the entity is a container (e.g., Project or Folder),
            recursively process child containers. Note that this must be used with
            include_container_content=True to have any effect. Setting recursive=True
            with include_container_content=False will raise a ValueError.
            Only works on classes that support the `sync_from_synapse_async` method.
        include_container_content: If True, include ACLs from contents directly within
            containers (files and folders inside self). This must be set to
            True for recursive to have any effect. Defaults to False.
        target_entity_types: Specify which entity types to process when listing ACLs.
            Allowed values are "folder", "file", "project", "table", "entityview",
            "materializedview", "virtualtable", "dataset", "datasetcollection",
            "submissionview" (case-insensitive). If None, defaults to ["folder", "file"].
        log_tree: If True, logs the ACL results to console in ASCII tree format showing
            entity hierarchies and their ACL permissions in a tree-like structure.
            Defaults to False.
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.
        _progress_bar: Internal parameter. Progress bar instance to use for updates
            when called recursively. Should not be used by external callers.

    Returns:
        An AclListResult object containing a structured representation of ACLs where:
        - entity_acls: A list of EntityAcl objects, each representing one entity's ACL
        - Each EntityAcl contains acl_entries (a list of AclEntry objects)
        - Each AclEntry contains the principal_id and their list of permissions

    Raises:
        ValueError: If the entity does not have an ID or if an invalid entity type is provided.
        SynapseHTTPError: If there are permission issues accessing ACLs.
        Exception: For any other errors that may occur during the process.

    Example: List ACLs for a single entity
        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import File

        syn = Synapse()
        syn.login()

        async def main():
            acl_result = await File(id="syn123").list_acl_async()
            print(acl_result)

            # Access entity ACLs (entity_acls is a list, not a dict)
            for entity_acl in acl_result.all_entity_acls:
                if entity_acl.entity_id == "syn123":
                    # Access individual ACL entries
                    for acl_entry in entity_acl.acl_entries:
                        if acl_entry.principal_id == "273948":
                            print(f"Principal 273948 has permissions: {acl_entry.permissions}")

            # I can also access the ACL for the file itself
            print(acl_result.entity_acl)

            print(acl_result)

        asyncio.run(main())
        ```

    Example: List ACLs recursively for a folder and all its children
        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Folder

        syn = Synapse()
        syn.login()

        async def main():
            acl_result = await Folder(id="syn123").list_acl_async(
                recursive=True,
                include_container_content=True
            )

            # Access each entity's ACL (entity_acls is a list)
            for entity_acl in acl_result.all_entity_acls:
                print(f"Entity {entity_acl.entity_id} has ACL with {len(entity_acl.acl_entries)} principals")

            # I can also access the ACL for the folder itself
            print(acl_result.entity_acl)

            # List ACLs for only folder entities
            folder_acl_result = await Folder(id="syn123").list_acl_async(
                recursive=True,
                include_container_content=True,
                target_entity_types=["folder"]
            )

            # List ACLs for specific entity types (e.g., tables and views)
            table_view_acl_result = await Folder(id="syn123").list_acl_async(
                recursive=True,
                include_container_content=True,
                target_entity_types=["table", "entityview", "materializedview"]
            )

        asyncio.run(main())
        ```

    Example: List ACLs with ASCII tree visualization
        When `log_tree=True`, the ACLs will be logged in a tree format. Additionally,
        the `ascii_tree` attribute of the AclListResult will contain the ASCII tree
        representation of the ACLs.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import Folder

        syn = Synapse()
        syn.login()

        async def main():
            acl_result = await Folder(id="syn123").list_acl_async(
                recursive=True,
                include_container_content=True,
                log_tree=True, # Enable ASCII tree logging
            )

            # The ASCII tree representation of the ACLs will also be available
            # in acl_result.ascii_tree
            print(acl_result.ascii_tree)

        asyncio.run(main())
        ```
    """
    if not self.id:
        raise ValueError("The entity must have an ID to list ACLs.")

    normalized_types = self._normalize_target_entity_types(target_entity_types)
    client = Synapse.get_client(synapse_client=synapse_client)

    all_acls: Dict[str, Dict[str, List[str]]] = {}
    all_entities = []

    # Only update progress bar for self ACL if we're the top-level call (not recursive)
    # When _progress_bar is passed, it means this is a recursive call and the parent
    # is managing progress updates
    update_progress_for_self = _progress_bar is None
    acl = await self._get_current_entity_acl(
        client=client,
        progress_bar=_progress_bar if update_progress_for_self else None,
    )
    if acl is not None:
        all_acls[self.id] = acl
    all_entities.append(self)

    should_process_children = (recursive or include_container_content) and hasattr(
        self, "sync_from_synapse_async"
    )

    if should_process_children and (recursive and not include_container_content):
        raise ValueError(
            "When recursive=True, include_container_content must also be True. "
            "Setting recursive=True with include_container_content=False has no effect."
        )

    if should_process_children and _progress_bar is None:
        with shared_download_progress_bar(
            file_size=1,
            synapse_client=client,
            custom_message="Collecting ACLs...",
            unit=None,
        ) as progress_bar:
            await self._process_children_with_progress(
                client=client,
                normalized_types=normalized_types,
                include_container_content=include_container_content,
                recursive=recursive,
                all_entities=all_entities,
                all_acls=all_acls,
                progress_bar=progress_bar,
            )
            # Ensure progress bar reaches 100% completion
            if progress_bar:
                remaining = (
                    progress_bar.total - progress_bar.n
                    if progress_bar.total > progress_bar.n
                    else 0
                )
                if remaining > 0:
                    progress_bar.update(remaining)
    elif should_process_children:
        await self._process_children_with_progress(
            client=client,
            normalized_types=normalized_types,
            include_container_content=include_container_content,
            recursive=recursive,
            all_entities=all_entities,
            all_acls=all_acls,
            progress_bar=_progress_bar,
        )
    current_acl = all_acls.get(self.id)
    acl_result = AclListResult.from_dict(
        all_acl_dict=all_acls, current_acl_dict=current_acl
    )

    if log_tree:
        logged_tree = await self._log_acl_tree(acl_result, all_entities, client)
        acl_result.ascii_tree = logged_tree

    return acl_result

bind_schema_async async

bind_schema_async(json_schema_uri: str, *, enable_derived_annotations: bool = False, synapse_client: Optional[Synapse] = None) -> JSONSchemaBinding

Bind a JSON schema to the entity.

PARAMETER DESCRIPTION
json_schema_uri

The URI of the JSON schema to bind to the entity.

TYPE: str

enable_derived_annotations

If true, enable derived annotations. Defaults to False.

TYPE: bool DEFAULT: False

synapse_client

The Synapse client instance. If not provided, the last created instance from the Synapse class constructor will be used.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
JSONSchemaBinding

An object containing details about the JSON schema binding.

Using this function

Binding JSON schema to a folder or a file. This example expects that you have a Synapse project to use, and a file to upload. Set the PROJECT_NAME and FILE_PATH variables to your project name and file path respectively.

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import File, Folder

syn = Synapse()
syn.login()

# Define Project and JSON schema info
PROJECT_NAME = "test_json_schema_project"  # replace with your project name
FILE_PATH = "~/Sample.txt"  # replace with your test file path

PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
ORG_NAME = "UniqueOrg"  # replace with your organization name
SCHEMA_NAME = "myTestSchema"  # replace with your schema name
FOLDER_NAME = "test_script_folder"
VERSION = "0.0.1"
SCHEMA_URI = f"{ORG_NAME}-{SCHEMA_NAME}-{VERSION}"

# Create organization (if not already created)
js = syn.service("json_schema")
all_orgs = js.list_organizations()
for org in all_orgs:
    if org["name"] == ORG_NAME:
        print(f"Organization {ORG_NAME} already exists: {org}")
        break
else:
    print(f"Creating organization {ORG_NAME}.")
    created_organization = js.create_organization(ORG_NAME)
    print(f"Created organization: {created_organization}")

my_test_org = js.JsonSchemaOrganization(ORG_NAME)
test_schema = my_test_org.get_json_schema(SCHEMA_NAME)

if not test_schema:
    # Create the schema (if not already created)
    schema_definition = {
        "$id": "mySchema",
        "type": "object",
        "properties": {
            "foo": {"type": "string"},
            "bar": {"type": "integer"},
        },
        "required": ["foo"]
    }
    test_schema = my_test_org.create_json_schema(schema_definition, SCHEMA_NAME, VERSION)
    print(f"Created new schema: {SCHEMA_NAME}")

async def main():
    # Create a test folder
    test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
    await test_folder.store_async()
    print(f"Created test folder: {FOLDER_NAME}")

    # Bind JSON schema to the folder
    bound_schema = await test_folder.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Result from binding schema to folder: {bound_schema}")

    # Create and bind schema to a file
    example_file = File(
        path=FILE_PATH,  # Replace with your test file path
        parent_id=test_folder.id,
    )
    await example_file.store_async()

    bound_schema_file = await example_file.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Result from binding schema to file: {bound_schema_file}")

asyncio.run(main())
```
Source code in synapseclient/models/mixins/json_schema.py
async def bind_schema_async(
    self,
    json_schema_uri: str,
    *,
    enable_derived_annotations: bool = False,
    synapse_client: Optional["Synapse"] = None,
) -> JSONSchemaBinding:
    """
    Bind a JSON schema to the entity.

    Arguments:
        json_schema_uri: The URI of the JSON schema to bind to the entity.
        enable_derived_annotations: If true, enable derived annotations. Defaults to False.
        synapse_client: The Synapse client instance. If not provided,
            the last created instance from the Synapse class constructor will be used.

    Returns:
        An object containing details about the JSON schema binding.

    Example: Using this function
        Binding JSON schema to a folder or a file. This example expects that you
        have a Synapse project to use, and a file to upload. Set the `PROJECT_NAME`
        and `FILE_PATH` variables to your project name and file path respectively.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import File, Folder

        syn = Synapse()
        syn.login()

        # Define Project and JSON schema info
        PROJECT_NAME = "test_json_schema_project"  # replace with your project name
        FILE_PATH = "~/Sample.txt"  # replace with your test file path

        PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
        ORG_NAME = "UniqueOrg"  # replace with your organization name
        SCHEMA_NAME = "myTestSchema"  # replace with your schema name
        FOLDER_NAME = "test_script_folder"
        VERSION = "0.0.1"
        SCHEMA_URI = f"{ORG_NAME}-{SCHEMA_NAME}-{VERSION}"

        # Create organization (if not already created)
        js = syn.service("json_schema")
        all_orgs = js.list_organizations()
        for org in all_orgs:
            if org["name"] == ORG_NAME:
                print(f"Organization {ORG_NAME} already exists: {org}")
                break
        else:
            print(f"Creating organization {ORG_NAME}.")
            created_organization = js.create_organization(ORG_NAME)
            print(f"Created organization: {created_organization}")

        my_test_org = js.JsonSchemaOrganization(ORG_NAME)
        test_schema = my_test_org.get_json_schema(SCHEMA_NAME)

        if not test_schema:
            # Create the schema (if not already created)
            schema_definition = {
                "$id": "mySchema",
                "type": "object",
                "properties": {
                    "foo": {"type": "string"},
                    "bar": {"type": "integer"},
                },
                "required": ["foo"]
            }
            test_schema = my_test_org.create_json_schema(schema_definition, SCHEMA_NAME, VERSION)
            print(f"Created new schema: {SCHEMA_NAME}")

        async def main():
            # Create a test folder
            test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
            await test_folder.store_async()
            print(f"Created test folder: {FOLDER_NAME}")

            # Bind JSON schema to the folder
            bound_schema = await test_folder.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Result from binding schema to folder: {bound_schema}")

            # Create and bind schema to a file
            example_file = File(
                path=FILE_PATH,  # Replace with your test file path
                parent_id=test_folder.id,
            )
            await example_file.store_async()

            bound_schema_file = await example_file.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Result from binding schema to file: {bound_schema_file}")

        asyncio.run(main())
        ```
    """
    response = await bind_json_schema_to_entity(
        synapse_id=self.id,
        json_schema_uri=json_schema_uri,
        enable_derived_annotations=enable_derived_annotations,
        synapse_client=synapse_client,
    )
    json_schema_version = response.get("jsonSchemaVersionInfo", {})
    return JSONSchemaBinding(
        json_schema_version_info=JSONSchemaVersionInfo(
            organization_id=json_schema_version.get("organizationId", None),
            organization_name=json_schema_version.get("organizationName", None),
            schema_id=json_schema_version.get("schemaId", None),
            id=json_schema_version.get("$id", None),
            schema_name=json_schema_version.get("schemaName", None),
            version_id=json_schema_version.get("versionId", None),
            semantic_version=json_schema_version.get("semanticVersion", None),
            json_sha256_hex=json_schema_version.get("jsonSHA256Hex", None),
            created_on=json_schema_version.get("createdOn", None),
            created_by=json_schema_version.get("createdBy", None),
        ),
        object_id=response.get("objectId", None),
        object_type=response.get("objectType", None),
        created_on=response.get("createdOn", None),
        created_by=response.get("createdBy", None),
        enable_derived_annotations=response.get("enableDerivedAnnotations", None),
    )

get_schema_async async

get_schema_async(*, synapse_client: Optional[Synapse] = None) -> JSONSchemaBinding

Get the JSON schema bound to the entity.

PARAMETER DESCRIPTION
synapse_client

The Synapse client instance. If not provided, the last created instance from the Synapse class constructor will be used.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
JSONSchemaBinding

An object containing details about the bound JSON schema.

Using this function

Retrieving the bound JSON schema from a folder or file. This example demonstrates how to get existing schema bindings from entities that already have schemas bound. Set the PROJECT_NAME and FILE_PATH variables to your project name and file path respectively.

```python
import asyncio
from synapseclient import Synapse
from synapseclient.models import File, Folder

syn = Synapse()
syn.login()

# Define Project and JSON schema info
PROJECT_NAME = "test_json_schema_project"  # replace with your project name
FILE_PATH = "~/Sample.txt"  # replace with your test file path

PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
ORG_NAME = "UniqueOrg"  # replace with your organization name
SCHEMA_NAME = "myTestSchema"  # replace with your schema name
FOLDER_NAME = "test_script_folder"
VERSION = "0.0.1"
SCHEMA_URI = f"{ORG_NAME}-{SCHEMA_NAME}-{VERSION}"

# Create organization (if not already created)
js = syn.service("json_schema")
all_orgs = js.list_organizations()
for org in all_orgs:
    if org["name"] == ORG_NAME:
        print(f"Organization {ORG_NAME} already exists: {org}")
        break
else:
    print(f"Creating organization {ORG_NAME}.")
    created_organization = js.create_organization(ORG_NAME)
    print(f"Created organization: {created_organization}")

my_test_org = js.JsonSchemaOrganization(ORG_NAME)
test_schema = my_test_org.get_json_schema(SCHEMA_NAME)

if not test_schema:
    # Create the schema (if not already created)
    schema_definition = {
        "$id": "mySchema",
        "type": "object",
        "properties": {
            "foo": {"type": "string"},
            "bar": {"type": "integer"},
        },
        "required": ["foo"]
    }
    test_schema = my_test_org.create_json_schema(schema_definition, SCHEMA_NAME, VERSION)
    print(f"Created new schema: {SCHEMA_NAME}")

async def main():
    # Create a test folder
    test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
    await test_folder.store_async()
    print(f"Created test folder: {FOLDER_NAME}")

    # Bind JSON schema to the folder first
    bound_schema = await test_folder.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Bound schema to folder: {bound_schema}")

    # Create and bind schema to a file
    example_file = File(
        path=FILE_PATH,  # Replace with your test file path
        parent_id=test_folder.id,
    )
    await example_file.store_async()

    bound_schema_file = await example_file.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Bound schema to file: {bound_schema_file}")

    # Retrieve the bound schema from the folder
    bound_schema = await test_folder.get_schema_async()
    print(f"Retrieved schema from folder: {bound_schema}")

    # Retrieve the bound schema from the file
    bound_schema_file = await example_file.get_schema_async()
    print(f"Retrieved schema from file: {bound_schema_file}")

asyncio.run(main())
```
Source code in synapseclient/models/mixins/json_schema.py
async def get_schema_async(
    self, *, synapse_client: Optional["Synapse"] = None
) -> JSONSchemaBinding:
    """
    Get the JSON schema bound to the entity.

    Arguments:
        synapse_client: The Synapse client instance. If not provided,
            the last created instance from the Synapse class constructor will be used.

    Returns:
        An object containing details about the bound JSON schema.

    Example: Using this function
        Retrieving the bound JSON schema from a folder or file. This example demonstrates
        how to get existing schema bindings from entities that already have schemas bound.
        Set the `PROJECT_NAME` and `FILE_PATH` variables to your project name
        and file path respectively.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import File, Folder

        syn = Synapse()
        syn.login()

        # Define Project and JSON schema info
        PROJECT_NAME = "test_json_schema_project"  # replace with your project name
        FILE_PATH = "~/Sample.txt"  # replace with your test file path

        PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
        ORG_NAME = "UniqueOrg"  # replace with your organization name
        SCHEMA_NAME = "myTestSchema"  # replace with your schema name
        FOLDER_NAME = "test_script_folder"
        VERSION = "0.0.1"
        SCHEMA_URI = f"{ORG_NAME}-{SCHEMA_NAME}-{VERSION}"

        # Create organization (if not already created)
        js = syn.service("json_schema")
        all_orgs = js.list_organizations()
        for org in all_orgs:
            if org["name"] == ORG_NAME:
                print(f"Organization {ORG_NAME} already exists: {org}")
                break
        else:
            print(f"Creating organization {ORG_NAME}.")
            created_organization = js.create_organization(ORG_NAME)
            print(f"Created organization: {created_organization}")

        my_test_org = js.JsonSchemaOrganization(ORG_NAME)
        test_schema = my_test_org.get_json_schema(SCHEMA_NAME)

        if not test_schema:
            # Create the schema (if not already created)
            schema_definition = {
                "$id": "mySchema",
                "type": "object",
                "properties": {
                    "foo": {"type": "string"},
                    "bar": {"type": "integer"},
                },
                "required": ["foo"]
            }
            test_schema = my_test_org.create_json_schema(schema_definition, SCHEMA_NAME, VERSION)
            print(f"Created new schema: {SCHEMA_NAME}")

        async def main():
            # Create a test folder
            test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
            await test_folder.store_async()
            print(f"Created test folder: {FOLDER_NAME}")

            # Bind JSON schema to the folder first
            bound_schema = await test_folder.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Bound schema to folder: {bound_schema}")

            # Create and bind schema to a file
            example_file = File(
                path=FILE_PATH,  # Replace with your test file path
                parent_id=test_folder.id,
            )
            await example_file.store_async()

            bound_schema_file = await example_file.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Bound schema to file: {bound_schema_file}")

            # Retrieve the bound schema from the folder
            bound_schema = await test_folder.get_schema_async()
            print(f"Retrieved schema from folder: {bound_schema}")

            # Retrieve the bound schema from the file
            bound_schema_file = await example_file.get_schema_async()
            print(f"Retrieved schema from file: {bound_schema_file}")

        asyncio.run(main())
        ```
    """
    response = await get_json_schema_from_entity(
        synapse_id=self.id, synapse_client=synapse_client
    )
    json_schema_version_info = response.get("jsonSchemaVersionInfo", {})
    return JSONSchemaBinding(
        json_schema_version_info=JSONSchemaVersionInfo(
            organization_id=json_schema_version_info.get("organizationId", None),
            organization_name=json_schema_version_info.get(
                "organizationName", None
            ),
            schema_id=json_schema_version_info.get("schemaId", None),
            id=json_schema_version_info.get("$id", None),
            schema_name=json_schema_version_info.get("schemaName", None),
            version_id=json_schema_version_info.get("versionId", None),
            semantic_version=json_schema_version_info.get("semanticVersion", None),
            json_sha256_hex=json_schema_version_info.get("jsonSHA256Hex", None),
            created_on=json_schema_version_info.get("createdOn", None),
            created_by=json_schema_version_info.get("createdBy", None),
        ),
        object_id=response.get("objectId", None),
        object_type=response.get("objectType", None),
        created_on=response.get("createdOn", None),
        created_by=response.get("createdBy", None),
        enable_derived_annotations=response.get("enableDerivedAnnotations", None),
    )

unbind_schema_async async

unbind_schema_async(*, synapse_client: Optional[Synapse] = None) -> None

Unbind the JSON schema bound to the entity.

PARAMETER DESCRIPTION
synapse_client

The Synapse client instance. If not provided, the last created instance from the Synapse class constructor will be used.

TYPE: Optional[Synapse] DEFAULT: None

Using this function

Unbinding a JSON schema from a folder or file. This example demonstrates how to remove schema bindings from entities. Assumes entities already have schemas bound. Set the PROJECT_NAME and FILE_PATH variables to your project name and file path respectively.

import asyncio
from synapseclient import Synapse
from synapseclient.models import File, Folder

syn = Synapse()
syn.login()

# Define Project and JSON schema info
PROJECT_NAME = "test_json_schema_project"  # replace with your project name
FILE_PATH = "~/Sample.txt"  # replace with your test file path

PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
ORG_NAME = "UniqueOrg"  # replace with your organization name
SCHEMA_NAME = "myTestSchema"  # replace with your schema name
FOLDER_NAME = "test_script_folder"
VERSION = "0.0.1"
SCHEMA_URI = f"{ORG_NAME}-{SCHEMA_NAME}-{VERSION}"

# Create organization (if not already created)
js = syn.service("json_schema")
all_orgs = js.list_organizations()
for org in all_orgs:
    if org["name"] == ORG_NAME:
        print(f"Organization {ORG_NAME} already exists: {org}")
        break
else:
    print(f"Creating organization {ORG_NAME}.")
    created_organization = js.create_organization(ORG_NAME)
    print(f"Created organization: {created_organization}")

my_test_org = js.JsonSchemaOrganization(ORG_NAME)
test_schema = my_test_org.get_json_schema(SCHEMA_NAME)

if not test_schema:
    # Create the schema (if not already created)
    schema_definition = {
        "$id": "mySchema",
        "type": "object",
        "properties": {
            "foo": {"type": "string"},
            "bar": {"type": "integer"},
        },
        "required": ["foo"]
    }
    test_schema = my_test_org.create_json_schema(schema_definition, SCHEMA_NAME, VERSION)
    print(f"Created new schema: {SCHEMA_NAME}")

async def main():
    # Create a test folder
    test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
    await test_folder.store_async()
    print(f"Created test folder: {FOLDER_NAME}")

    # Bind JSON schema to the folder first
    bound_schema = await test_folder.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Bound schema to folder: {bound_schema}")

    # Create and bind schema to a file
    example_file = File(
        path=FILE_PATH,  # Replace with your test file path
        parent_id=test_folder.id,
    )
    await example_file.store_async()
    print(f"Created test file: {FILE_PATH}")

    bound_schema_file = await example_file.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Bound schema to file: {bound_schema_file}")

    # Unbind the schema from the folder
    await test_folder.unbind_schema_async()
    print("Successfully unbound schema from folder")

    # Unbind the schema from the file
    await example_file.unbind_schema_async()
    print("Successfully unbound schema from file")

asyncio.run(main())
Source code in synapseclient/models/mixins/json_schema.py
async def unbind_schema_async(
    self, *, synapse_client: Optional["Synapse"] = None
) -> None:
    """
    Unbind the JSON schema bound to the entity.

    Arguments:
        synapse_client: The Synapse client instance. If not provided,
            the last created instance from the Synapse class constructor will be used.

    Example: Using this function
        Unbinding a JSON schema from a folder or file. This example demonstrates
        how to remove schema bindings from entities. Assumes entities already have
        schemas bound. Set the `PROJECT_NAME` and `FILE_PATH` variables to your
        project name and file path respectively.


        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import File, Folder

        syn = Synapse()
        syn.login()

        # Define Project and JSON schema info
        PROJECT_NAME = "test_json_schema_project"  # replace with your project name
        FILE_PATH = "~/Sample.txt"  # replace with your test file path

        PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
        ORG_NAME = "UniqueOrg"  # replace with your organization name
        SCHEMA_NAME = "myTestSchema"  # replace with your schema name
        FOLDER_NAME = "test_script_folder"
        VERSION = "0.0.1"
        SCHEMA_URI = f"{ORG_NAME}-{SCHEMA_NAME}-{VERSION}"

        # Create organization (if not already created)
        js = syn.service("json_schema")
        all_orgs = js.list_organizations()
        for org in all_orgs:
            if org["name"] == ORG_NAME:
                print(f"Organization {ORG_NAME} already exists: {org}")
                break
        else:
            print(f"Creating organization {ORG_NAME}.")
            created_organization = js.create_organization(ORG_NAME)
            print(f"Created organization: {created_organization}")

        my_test_org = js.JsonSchemaOrganization(ORG_NAME)
        test_schema = my_test_org.get_json_schema(SCHEMA_NAME)

        if not test_schema:
            # Create the schema (if not already created)
            schema_definition = {
                "$id": "mySchema",
                "type": "object",
                "properties": {
                    "foo": {"type": "string"},
                    "bar": {"type": "integer"},
                },
                "required": ["foo"]
            }
            test_schema = my_test_org.create_json_schema(schema_definition, SCHEMA_NAME, VERSION)
            print(f"Created new schema: {SCHEMA_NAME}")

        async def main():
            # Create a test folder
            test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
            await test_folder.store_async()
            print(f"Created test folder: {FOLDER_NAME}")

            # Bind JSON schema to the folder first
            bound_schema = await test_folder.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Bound schema to folder: {bound_schema}")

            # Create and bind schema to a file
            example_file = File(
                path=FILE_PATH,  # Replace with your test file path
                parent_id=test_folder.id,
            )
            await example_file.store_async()
            print(f"Created test file: {FILE_PATH}")

            bound_schema_file = await example_file.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Bound schema to file: {bound_schema_file}")

            # Unbind the schema from the folder
            await test_folder.unbind_schema_async()
            print("Successfully unbound schema from folder")

            # Unbind the schema from the file
            await example_file.unbind_schema_async()
            print("Successfully unbound schema from file")

        asyncio.run(main())
        ```
    """
    return await delete_json_schema_from_entity(
        synapse_id=self.id, synapse_client=synapse_client
    )

validate_schema_async async

validate_schema_async(*, synapse_client: Optional[Synapse] = None) -> Union[JSONSchemaValidation, InvalidJSONSchemaValidation]

Validate the entity against the bound JSON schema.

PARAMETER DESCRIPTION
synapse_client

The Synapse client instance. If not provided, the last created instance from the Synapse class constructor will be used.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
Union[JSONSchemaValidation, InvalidJSONSchemaValidation]

The validation results.

Using this function

Validating a folder or file against the bound JSON schema. This example demonstrates how to validate entities with annotations against their bound schemas. Requires entities to have schemas already bound. Set the PROJECT_NAME and FILE_PATH variables to your project name and file path respectively.

import asyncio
import time
from synapseclient import Synapse
from synapseclient.models import File, Folder

syn = Synapse()
syn.login()

# Define Project and JSON schema info
PROJECT_NAME = "test_json_schema_project"  # replace with your project name
FILE_PATH = "~/Sample.txt"  # replace with your test file path

PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
ORG_NAME = "UniqueOrg"  # replace with your organization name
SCHEMA_NAME = "myTestSchema"  # replace with your schema name
FOLDER_NAME = "test_script_folder"
VERSION = "0.0.1"
SCHEMA_URI = f"{ORG_NAME}-{SCHEMA_NAME}-{VERSION}"

# Create organization (if not already created)
js = syn.service("json_schema")
all_orgs = js.list_organizations()
for org in all_orgs:
    if org["name"] == ORG_NAME:
        print(f"Organization {ORG_NAME} already exists: {org}")
        break
else:
    print(f"Creating organization {ORG_NAME}.")
    created_organization = js.create_organization(ORG_NAME)
    print(f"Created organization: {created_organization}")

my_test_org = js.JsonSchemaOrganization(ORG_NAME)
test_schema = my_test_org.get_json_schema(SCHEMA_NAME)

if not test_schema:
    # Create the schema (if not already created)
    schema_definition = {
        "$id": "mySchema",
        "type": "object",
        "properties": {
            "foo": {"type": "string"},
            "bar": {"type": "integer"},
        },
        "required": ["foo"]
    }
    test_schema = my_test_org.create_json_schema(schema_definition, SCHEMA_NAME, VERSION)
    print(f"Created new schema: {SCHEMA_NAME}")

async def main():
    # Create a test folder
    test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
    await test_folder.store_async()
    print(f"Created test folder: {FOLDER_NAME}")

    # Bind JSON schema to the folder
    bound_schema = await test_folder.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Bound schema to folder: {bound_schema}")

    # Create and bind schema to a file
    example_file = File(
        path=FILE_PATH,  # Replace with your test file path
        parent_id=test_folder.id,
    )
    await example_file.store_async()

    bound_schema_file = await example_file.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Bound schema to file: {bound_schema_file}")

    # Validate the folder entity against the bound schema
    test_folder.annotations = {"foo": "test_value", "bar": 42}  # Example annotations
    await test_folder.store_async()
    print("Added annotations to folder and stored")
    time.sleep(2)  # Allow time for processing

    validation_response = await test_folder.validate_schema_async()
    print(f"Folder validation response: {validation_response}")

    # Validate the file entity against the bound schema
    example_file.annotations = {"foo": "test_value", "bar": 43}  # Example annotations
    await example_file.store_async()
    print("Added annotations to file and stored")
    time.sleep(2)  # Allow time for processing

    validation_response_file = await example_file.validate_schema_async()
    print(f"File validation response: {validation_response_file}")

asyncio.run(main())
Source code in synapseclient/models/mixins/json_schema.py
async def validate_schema_async(
    self, *, synapse_client: Optional["Synapse"] = None
) -> Union[JSONSchemaValidation, InvalidJSONSchemaValidation]:
    """
    Validate the entity against the bound JSON schema.

    Arguments:
        synapse_client (Optional[Synapse], optional): The Synapse client instance. If not provided,
            the last created instance from the Synapse class constructor will be used.

    Returns:
        The validation results.

    Example: Using this function
        Validating a folder or file against the bound JSON schema. This example demonstrates
        how to validate entities with annotations against their bound schemas. Requires entities
        to have schemas already bound. Set the `PROJECT_NAME` and `FILE_PATH` variables to your project name
        and file path respectively.

        ```python
        import asyncio
        import time
        from synapseclient import Synapse
        from synapseclient.models import File, Folder

        syn = Synapse()
        syn.login()

        # Define Project and JSON schema info
        PROJECT_NAME = "test_json_schema_project"  # replace with your project name
        FILE_PATH = "~/Sample.txt"  # replace with your test file path

        PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
        ORG_NAME = "UniqueOrg"  # replace with your organization name
        SCHEMA_NAME = "myTestSchema"  # replace with your schema name
        FOLDER_NAME = "test_script_folder"
        VERSION = "0.0.1"
        SCHEMA_URI = f"{ORG_NAME}-{SCHEMA_NAME}-{VERSION}"

        # Create organization (if not already created)
        js = syn.service("json_schema")
        all_orgs = js.list_organizations()
        for org in all_orgs:
            if org["name"] == ORG_NAME:
                print(f"Organization {ORG_NAME} already exists: {org}")
                break
        else:
            print(f"Creating organization {ORG_NAME}.")
            created_organization = js.create_organization(ORG_NAME)
            print(f"Created organization: {created_organization}")

        my_test_org = js.JsonSchemaOrganization(ORG_NAME)
        test_schema = my_test_org.get_json_schema(SCHEMA_NAME)

        if not test_schema:
            # Create the schema (if not already created)
            schema_definition = {
                "$id": "mySchema",
                "type": "object",
                "properties": {
                    "foo": {"type": "string"},
                    "bar": {"type": "integer"},
                },
                "required": ["foo"]
            }
            test_schema = my_test_org.create_json_schema(schema_definition, SCHEMA_NAME, VERSION)
            print(f"Created new schema: {SCHEMA_NAME}")

        async def main():
            # Create a test folder
            test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
            await test_folder.store_async()
            print(f"Created test folder: {FOLDER_NAME}")

            # Bind JSON schema to the folder
            bound_schema = await test_folder.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Bound schema to folder: {bound_schema}")

            # Create and bind schema to a file
            example_file = File(
                path=FILE_PATH,  # Replace with your test file path
                parent_id=test_folder.id,
            )
            await example_file.store_async()

            bound_schema_file = await example_file.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Bound schema to file: {bound_schema_file}")

            # Validate the folder entity against the bound schema
            test_folder.annotations = {"foo": "test_value", "bar": 42}  # Example annotations
            await test_folder.store_async()
            print("Added annotations to folder and stored")
            time.sleep(2)  # Allow time for processing

            validation_response = await test_folder.validate_schema_async()
            print(f"Folder validation response: {validation_response}")

            # Validate the file entity against the bound schema
            example_file.annotations = {"foo": "test_value", "bar": 43}  # Example annotations
            await example_file.store_async()
            print("Added annotations to file and stored")
            time.sleep(2)  # Allow time for processing

            validation_response_file = await example_file.validate_schema_async()
            print(f"File validation response: {validation_response_file}")

        asyncio.run(main())
        ```
    """
    response = await validate_entity_with_json_schema(
        synapse_id=self.id, synapse_client=synapse_client
    )
    if "validationException" in response:
        return InvalidJSONSchemaValidation(
            validation_response=JSONSchemaValidation(
                object_id=response.get("objectId", None),
                object_type=response.get("objectType", None),
                object_etag=response.get("objectEtag", None),
                id=response.get("schema$id", None),
                is_valid=response.get("isValid", None),
                validated_on=response.get("validatedOn", None),
            ),
            validation_error_message=response.get("validationErrorMessage", None),
            all_validation_messages=response.get("allValidationMessages", []),
            validation_exception=ValidationException(
                pointer_to_violation=response.get("validationException", {}).get(
                    "pointerToViolation", None
                ),
                message=response.get("validationException", {}).get(
                    "message", None
                ),
                schema_location=response.get("validationException", {}).get(
                    "schemaLocation", None
                ),
                causing_exceptions=[
                    CausingException(
                        keyword=ce.get("keyword", None),
                        pointer_to_violation=ce.get("pointerToViolation", None),
                        message=ce.get("message", None),
                        schema_location=ce.get("schemaLocation", None),
                        causing_exceptions=[
                            CausingException(
                                keyword=nce.get("keyword", None),
                                pointer_to_violation=nce.get(
                                    "pointerToViolation", None
                                ),
                                message=nce.get("message", None),
                                schema_location=nce.get("schemaLocation", None),
                            )
                            for nce in ce.get("causingExceptions", [])
                        ],
                    )
                    for ce in response.get("validationException", {}).get(
                        "causingExceptions", []
                    )
                ],
            ),
        )
    return JSONSchemaValidation(
        object_id=response.get("objectId", None),
        object_type=response.get("objectType", None),
        object_etag=response.get("objectEtag", None),
        id=response.get("schema$id", None),
        is_valid=response.get("isValid", None),
        validated_on=response.get("validatedOn", None),
    )

get_schema_derived_keys_async async

get_schema_derived_keys_async(*, synapse_client: Optional[Synapse] = None) -> JSONSchemaDerivedKeys

Retrieve derived JSON schema keys for the entity.

PARAMETER DESCRIPTION
synapse_client

The Synapse client instance. If not provided, the last created instance from the Synapse class constructor will be used.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
JSONSchemaDerivedKeys

An object containing the derived keys for the entity.

Using this function

Retrieving derived keys from a folder or file. This example demonstrates how to get derived annotation keys from schemas with constant values. Set the PROJECT_NAME variable to your project name.

import asyncio
from synapseclient import Synapse
from synapseclient.models import File, Folder

syn = Synapse()
syn.login()

# Define Project and JSON schema info
PROJECT_NAME = "test_json_schema_project"  # replace with your project name
FILE_PATH = "~/Sample.txt"  # replace with your test file path

PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
ORG_NAME = "UniqueOrg"  # replace with your organization name
DERIVED_TEST_SCHEMA_NAME = "myTestDerivedSchema"  # replace with your derived schema name
FOLDER_NAME = "test_script_folder"
VERSION = "0.0.1"
SCHEMA_URI = f"{ORG_NAME}-{DERIVED_TEST_SCHEMA_NAME}-{VERSION}"

# Create organization (if not already created)
js = syn.service("json_schema")
all_orgs = js.list_organizations()
for org in all_orgs:
    if org["name"] == ORG_NAME:
        print(f"Organization {ORG_NAME} already exists: {org}")
        break
else:
    print(f"Creating organization {ORG_NAME}.")
    created_organization = js.create_organization(ORG_NAME)
    print(f"Created organization: {created_organization}")

my_test_org = js.JsonSchemaOrganization(ORG_NAME)
test_schema = my_test_org.get_json_schema(DERIVED_TEST_SCHEMA_NAME)

if not test_schema:
    # Create the schema (if not already created)
    schema_definition = {
        "$id": "mySchema",
        "type": "object",
        "properties": {
            "foo": {"type": "string"},
            "baz": {"type": "string", "const": "example_value"},  # Example constant for derived annotation
            "bar": {"type": "integer"},
        },
        "required": ["foo"]
    }
    test_schema = my_test_org.create_json_schema(schema_definition, DERIVED_TEST_SCHEMA_NAME, VERSION)
    print(f"Created new derived schema: {DERIVED_TEST_SCHEMA_NAME}")

async def main():
    # Create a test folder
    test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
    await test_folder.store_async()
    print(f"Created test folder: {FOLDER_NAME}")

    # Bind JSON schema to the folder
    bound_schema = await test_folder.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Bound schema to folder with derived annotations: {bound_schema}")

    # Create and bind schema to a file
    example_file = File(
        path=FILE_PATH,  # Replace with your test file path
        parent_id=test_folder.id,
    )
    await example_file.store_async()

    bound_schema_file = await example_file.bind_schema_async(
        json_schema_uri=SCHEMA_URI,
        enable_derived_annotations=True
    )
    print(f"Bound schema to file with derived annotations: {bound_schema_file}")

    # Get the derived keys from the bound schema of the folder
    test_folder.annotations = {"foo": "test_value_new", "bar": 42}  # Example annotations
    await test_folder.store_async()
    print("Added annotations to folder and stored")

    derived_keys = await test_folder.get_schema_derived_keys_async()
    print(f"Derived keys from folder: {derived_keys}")

    # Get the derived keys from the bound schema of the file
    example_file.annotations = {"foo": "test_value_new", "bar": 43}  # Example annotations
    await example_file.store_async()
    print("Added annotations to file and stored")

    derived_keys_file = await example_file.get_schema_derived_keys_async()
    print(f"Derived keys from file: {derived_keys_file}")

asyncio.run(main())
Source code in synapseclient/models/mixins/json_schema.py
async def get_schema_derived_keys_async(
    self, *, synapse_client: Optional["Synapse"] = None
) -> JSONSchemaDerivedKeys:
    """
    Retrieve derived JSON schema keys for the entity.

    Arguments:
        synapse_client (Optional[Synapse], optional): The Synapse client instance. If not provided,
            the last created instance from the Synapse class constructor will be used.

    Returns:
        An object containing the derived keys for the entity.

    Example: Using this function
        Retrieving derived keys from a folder or file. This example demonstrates
        how to get derived annotation keys from schemas with constant values.
        Set the `PROJECT_NAME` variable to your project name.

        ```python
        import asyncio
        from synapseclient import Synapse
        from synapseclient.models import File, Folder

        syn = Synapse()
        syn.login()

        # Define Project and JSON schema info
        PROJECT_NAME = "test_json_schema_project"  # replace with your project name
        FILE_PATH = "~/Sample.txt"  # replace with your test file path

        PROJECT_ID = syn.findEntityId(name=PROJECT_NAME)
        ORG_NAME = "UniqueOrg"  # replace with your organization name
        DERIVED_TEST_SCHEMA_NAME = "myTestDerivedSchema"  # replace with your derived schema name
        FOLDER_NAME = "test_script_folder"
        VERSION = "0.0.1"
        SCHEMA_URI = f"{ORG_NAME}-{DERIVED_TEST_SCHEMA_NAME}-{VERSION}"

        # Create organization (if not already created)
        js = syn.service("json_schema")
        all_orgs = js.list_organizations()
        for org in all_orgs:
            if org["name"] == ORG_NAME:
                print(f"Organization {ORG_NAME} already exists: {org}")
                break
        else:
            print(f"Creating organization {ORG_NAME}.")
            created_organization = js.create_organization(ORG_NAME)
            print(f"Created organization: {created_organization}")

        my_test_org = js.JsonSchemaOrganization(ORG_NAME)
        test_schema = my_test_org.get_json_schema(DERIVED_TEST_SCHEMA_NAME)

        if not test_schema:
            # Create the schema (if not already created)
            schema_definition = {
                "$id": "mySchema",
                "type": "object",
                "properties": {
                    "foo": {"type": "string"},
                    "baz": {"type": "string", "const": "example_value"},  # Example constant for derived annotation
                    "bar": {"type": "integer"},
                },
                "required": ["foo"]
            }
            test_schema = my_test_org.create_json_schema(schema_definition, DERIVED_TEST_SCHEMA_NAME, VERSION)
            print(f"Created new derived schema: {DERIVED_TEST_SCHEMA_NAME}")

        async def main():
            # Create a test folder
            test_folder = Folder(name=FOLDER_NAME, parent_id=PROJECT_ID)
            await test_folder.store_async()
            print(f"Created test folder: {FOLDER_NAME}")

            # Bind JSON schema to the folder
            bound_schema = await test_folder.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Bound schema to folder with derived annotations: {bound_schema}")

            # Create and bind schema to a file
            example_file = File(
                path=FILE_PATH,  # Replace with your test file path
                parent_id=test_folder.id,
            )
            await example_file.store_async()

            bound_schema_file = await example_file.bind_schema_async(
                json_schema_uri=SCHEMA_URI,
                enable_derived_annotations=True
            )
            print(f"Bound schema to file with derived annotations: {bound_schema_file}")

            # Get the derived keys from the bound schema of the folder
            test_folder.annotations = {"foo": "test_value_new", "bar": 42}  # Example annotations
            await test_folder.store_async()
            print("Added annotations to folder and stored")

            derived_keys = await test_folder.get_schema_derived_keys_async()
            print(f"Derived keys from folder: {derived_keys}")

            # Get the derived keys from the bound schema of the file
            example_file.annotations = {"foo": "test_value_new", "bar": 43}  # Example annotations
            await example_file.store_async()
            print("Added annotations to file and stored")

            derived_keys_file = await example_file.get_schema_derived_keys_async()
            print(f"Derived keys from file: {derived_keys_file}")

        asyncio.run(main())
        ```
    """
    response = await get_json_schema_derived_keys(
        synapse_id=self.id, synapse_client=synapse_client
    )
    return JSONSchemaDerivedKeys(keys=response["keys"])

synapseclient.models.Column dataclass

Bases: ColumnSynchronousProtocol

A column model contains the metadata of a single column of a table or view.

Source code in synapseclient/models/table_components.py
@dataclass
@async_to_sync
class Column(ColumnSynchronousProtocol):
    """A column model contains the metadata of a single column of a table or view."""

    id: Optional[str] = None
    """The immutable ID issued to new columns"""

    name: Optional[str] = None
    """The display name of the column"""

    column_type: Optional[ColumnType] = None
    """The column type determines the type of data that can be stored in a column.
    Switching between types (using a transaction with TableUpdateTransaction
    in the "changes" list) is generally allowed except for switching to "_LIST"
    suffixed types. In such cases, a new column must be created and data must be
    copied over manually"""

    facet_type: Optional[FacetType] = None
    """Set to one of the enumerated values to indicate a column should be
    treated as a facet"""

    default_value: Optional[str] = None
    """The default value for this column. Columns of type ENTITYID, FILEHANDLEID,
    USERID, and LARGETEXT are not allowed to have default values."""

    maximum_size: Optional[int] = None
    """A parameter for columnTypes with a maximum size. For example, ColumnType.STRINGs
    have a default maximum size of 50 characters, but can be set to a maximumSize
    of 1 to 1000 characters. For columnType of STRING_LIST, this limits the size
    of individual string elements in the list"""

    maximum_list_length: Optional[int] = None
    """Required if using a columnType with a "_LIST" suffix. Describes the maximum number
    of values that will appear in that list. Value range 1-100 inclusive. Default 100"""

    enum_values: Optional[List[str]] = None
    """Columns of type STRING can be constrained to an enumeration values set on this
    list. The maximum number of entries for an enum is 100"""

    json_sub_columns: Optional[List[JsonSubColumn]] = None
    """For column of type JSON that represents the combination of multiple sub-columns,
    this property is used to define each sub-column."""

    _last_persistent_instance: Optional["Column"] = field(
        default=None, repr=False, compare=False
    )
    """The last persistent instance of this object. This is used to determine if the
    object has been changed and needs to be updated in Synapse."""

    async def get_async(
        self, *, synapse_client: Optional["Synapse"] = None
    ) -> "Column":
        """
        Get a column by its ID.

        Arguments:
            synapse_client: If not passed in and caching was not disabled by
                `Synapse.allow_client_caching(False)` this will use the last created
                instance from the Synapse class constructor.

        Returns:
            The Column instance.

        Example: Getting a column by ID
            Getting a column by ID

                import asyncio
                from synapseclient import Synapse
                from synapseclient.models import Column

                syn = Synapse()
                syn.login()

                async def get_column():
                    column = await Column(id="123").get_async()
                    print(column.name)

                asyncio.run(get_column())
        """
        from synapseclient.api import get_column

        if not self.id:
            raise ValueError("Column ID is required to get a column")

        result = await get_column(
            column_id=self.id,
            synapse_client=synapse_client,
        )

        self.fill_from_dict(result)
        return self

    @skip_async_to_sync
    @staticmethod
    async def list_async(
        prefix: Optional[str] = None,
        limit: int = 100,
        offset: int = 0,
        *,
        synapse_client: Optional["Synapse"] = None,
    ) -> AsyncGenerator["Column", None]:
        """
        List columns with optional prefix filtering.

        Arguments:
            prefix: Optional prefix to filter columns by name.
            limit: Number of columns to retrieve per request to Synapse (pagination parameter).
                The function will continue retrieving results until all matching columns are returned.
            offset: The index of the first column to return (pagination parameter).
            synapse_client: If not passed in and caching was not disabled by
                `Synapse.allow_client_caching(False)` this will use the last created
                instance from the Synapse class constructor.

        Yields:
            Column instances.

        Example: Getting all columns
            Getting all columns

                import asyncio
                from synapseclient import Synapse
                from synapseclient.models import Column

                syn = Synapse()
                syn.login()

                async def get_columns():
                    async for column in Column.list_async():
                        print(column.name)

                asyncio.run(get_columns())

        Example: Getting columns with a prefix
            Getting columns with a prefix

                import asyncio
                from synapseclient import Synapse
                from synapseclient.models import Column

                syn = Synapse()
                syn.login()

                async def get_columns():
                    async for column in Column.list_async(prefix="my_prefix"):
                        print(column.name)

                asyncio.run(get_columns())
        """
        from synapseclient.api import list_columns

        async for column in list_columns(
            prefix=prefix,
            limit=limit,
            offset=offset,
            synapse_client=synapse_client,
        ):
            yield column

    def fill_from_dict(
        self, synapse_column: Union[Synapse_Column, Dict[str, Any]]
    ) -> "Column":
        """Converts a response from the synapseclient into this dataclass."""
        self.id = synapse_column.get("id", None)
        self.name = synapse_column.get("name", None)
        self.column_type = (
            ColumnType(synapse_column.get("columnType", None))
            if synapse_column.get("columnType", None)
            else None
        )
        self.facet_type = (
            FacetType(synapse_column.get("facetType", None))
            if synapse_column.get("facetType", None)
            else None
        )
        self.default_value = synapse_column.get("defaultValue", None)
        self.maximum_size = synapse_column.get("maximumSize", None)
        self.maximum_list_length = synapse_column.get("maximumListLength", None)
        self.enum_values = synapse_column.get("enumValues", None)

        json_sub_columns_data = synapse_column.get("jsonSubColumns", None)
        if json_sub_columns_data:
            self.json_sub_columns = [
                JsonSubColumn.fill_from_dict(sub_column_data)
                for sub_column_data in json_sub_columns_data
            ]
        else:
            self.json_sub_columns = None

        self._set_last_persistent_instance()
        return self

    @property
    def has_changed(self) -> bool:
        """Determines if the object has been changed and needs to be updated in Synapse."""
        return (
            not self._last_persistent_instance or self._last_persistent_instance != self
        )

    def _set_last_persistent_instance(self) -> None:
        """Stash the last time this object interacted with Synapse. This is used to
        determine if the object has been changed and needs to be updated in Synapse."""
        del self._last_persistent_instance
        self._last_persistent_instance = replace(self)
        self._last_persistent_instance.json_sub_columns = (
            [replace(sub_col) for sub_col in self.json_sub_columns]
            if self.json_sub_columns
            else None
        )

    def to_synapse_request(self) -> Dict[str, Any]:
        """Converts the Column object into a dictionary that can be passed into the
        REST API."""
        if self.column_type and isinstance(self.column_type, str):
            self.column_type = ColumnType(self.column_type)

        if self.facet_type and isinstance(self.facet_type, str):
            self.facet_type = FacetType(self.facet_type)
        result = {
            "concreteType": concrete_types.COLUMN_MODEL,
            "name": self.name,
            "columnType": self.column_type.value if self.column_type else None,
            "facetType": self.facet_type.value if self.facet_type else None,
            "defaultValue": self.default_value,
            "maximumSize": self.maximum_size,
            "maximumListLength": self.maximum_list_length,
            "enumValues": self.enum_values,
            "jsonSubColumns": (
                [
                    sub_column.to_synapse_request()
                    for sub_column in self.json_sub_columns
                ]
                if self.json_sub_columns
                else None
            ),
        }
        delete_none_keys(result)
        return result

Functions

get_async async

get_async(*, synapse_client: Optional[Synapse] = None) -> Column

Get a column by its ID.

PARAMETER DESCRIPTION
synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

RETURNS DESCRIPTION
Column

The Column instance.

Getting a column by ID

Getting a column by ID

import asyncio
from synapseclient import Synapse
from synapseclient.models import Column

syn = Synapse()
syn.login()

async def get_column():
    column = await Column(id="123").get_async()
    print(column.name)

asyncio.run(get_column())
Source code in synapseclient/models/table_components.py
async def get_async(
    self, *, synapse_client: Optional["Synapse"] = None
) -> "Column":
    """
    Get a column by its ID.

    Arguments:
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Returns:
        The Column instance.

    Example: Getting a column by ID
        Getting a column by ID

            import asyncio
            from synapseclient import Synapse
            from synapseclient.models import Column

            syn = Synapse()
            syn.login()

            async def get_column():
                column = await Column(id="123").get_async()
                print(column.name)

            asyncio.run(get_column())
    """
    from synapseclient.api import get_column

    if not self.id:
        raise ValueError("Column ID is required to get a column")

    result = await get_column(
        column_id=self.id,
        synapse_client=synapse_client,
    )

    self.fill_from_dict(result)
    return self

list_async async staticmethod

list_async(prefix: Optional[str] = None, limit: int = 100, offset: int = 0, *, synapse_client: Optional[Synapse] = None) -> AsyncGenerator[Column, None]

List columns with optional prefix filtering.

PARAMETER DESCRIPTION
prefix

Optional prefix to filter columns by name.

TYPE: Optional[str] DEFAULT: None

limit

Number of columns to retrieve per request to Synapse (pagination parameter). The function will continue retrieving results until all matching columns are returned.

TYPE: int DEFAULT: 100

offset

The index of the first column to return (pagination parameter).

TYPE: int DEFAULT: 0

synapse_client

If not passed in and caching was not disabled by Synapse.allow_client_caching(False) this will use the last created instance from the Synapse class constructor.

TYPE: Optional[Synapse] DEFAULT: None

YIELDS DESCRIPTION
AsyncGenerator[Column, None]

Column instances.

Getting all columns

Getting all columns

import asyncio
from synapseclient import Synapse
from synapseclient.models import Column

syn = Synapse()
syn.login()

async def get_columns():
    async for column in Column.list_async():
        print(column.name)

asyncio.run(get_columns())
Getting columns with a prefix

Getting columns with a prefix

import asyncio
from synapseclient import Synapse
from synapseclient.models import Column

syn = Synapse()
syn.login()

async def get_columns():
    async for column in Column.list_async(prefix="my_prefix"):
        print(column.name)

asyncio.run(get_columns())
Source code in synapseclient/models/table_components.py
@skip_async_to_sync
@staticmethod
async def list_async(
    prefix: Optional[str] = None,
    limit: int = 100,
    offset: int = 0,
    *,
    synapse_client: Optional["Synapse"] = None,
) -> AsyncGenerator["Column", None]:
    """
    List columns with optional prefix filtering.

    Arguments:
        prefix: Optional prefix to filter columns by name.
        limit: Number of columns to retrieve per request to Synapse (pagination parameter).
            The function will continue retrieving results until all matching columns are returned.
        offset: The index of the first column to return (pagination parameter).
        synapse_client: If not passed in and caching was not disabled by
            `Synapse.allow_client_caching(False)` this will use the last created
            instance from the Synapse class constructor.

    Yields:
        Column instances.

    Example: Getting all columns
        Getting all columns

            import asyncio
            from synapseclient import Synapse
            from synapseclient.models import Column

            syn = Synapse()
            syn.login()

            async def get_columns():
                async for column in Column.list_async():
                    print(column.name)

            asyncio.run(get_columns())

    Example: Getting columns with a prefix
        Getting columns with a prefix

            import asyncio
            from synapseclient import Synapse
            from synapseclient.models import Column

            syn = Synapse()
            syn.login()

            async def get_columns():
                async for column in Column.list_async(prefix="my_prefix"):
                    print(column.name)

            asyncio.run(get_columns())
    """
    from synapseclient.api import list_columns

    async for column in list_columns(
        prefix=prefix,
        limit=limit,
        offset=offset,
        synapse_client=synapse_client,
    ):
        yield column

synapseclient.models.SchemaStorageStrategy

Bases: str, Enum

Enum used to determine how to store the schema of a table in Synapse.

Source code in synapseclient/models/table_components.py
class SchemaStorageStrategy(str, Enum):
    """Enum used to determine how to store the schema of a table in Synapse."""

    INFER_FROM_DATA = "INFER_FROM_DATA"
    """
    (Default)
    Allow the data to define which columns are created on the Synapse table
    automatically. The limitation with this behavior is that the columns created may
    only be of the following types:

    - STRING
    - LARGETEXT
    - INTEGER
    - DOUBLE
    - BOOLEAN
    - DATE

    The determination of the column type is based on the data that is passed in
    using the pandas function
    [infer_dtype](https://pandas.pydata.org/docs/reference/api/pandas.api.types.infer_dtype.html).
    If you need a more specific column type, or need to add options to the columns,
    follow the examples shown in the [Table][synapseclient.models.Table] class.


    The columns created as a result of this strategy will be appended to the end of the
    existing columns if the table already exists.
    """

Attributes

INFER_FROM_DATA class-attribute instance-attribute

INFER_FROM_DATA = 'INFER_FROM_DATA'

(Default) Allow the data to define which columns are created on the Synapse table automatically. The limitation with this behavior is that the columns created may only be of the following types:

  • STRING
  • LARGETEXT
  • INTEGER
  • DOUBLE
  • BOOLEAN
  • DATE

The determination of the column type is based on the data that is passed in using the pandas function infer_dtype. If you need a more specific column type, or need to add options to the columns, follow the examples shown in the Table class.

The columns created as a result of this strategy will be appended to the end of the existing columns if the table already exists.
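
As a brief illustration, the sketch below stores a pandas DataFrame and lets Synapse infer the column schema from the data. It assumes the Table model's store_rows method accepts values and schema_storage_strategy keyword arguments as described here; the project ID and data are placeholders.

```python
import pandas as pd

from synapseclient import Synapse
from synapseclient.models import SchemaStorageStrategy, Table

syn = Synapse()
syn.login()

# Hypothetical data: STRING, INTEGER, and DOUBLE columns are inferred from
# the DataFrame dtypes rather than declared up front.
data = pd.DataFrame(
    {
        "sample_id": ["s1", "s2"],
        "read_count": [1000, 2500],
        "quality": [0.98, 0.87],
    }
)

table = Table(name="my_inferred_table", parent_id="syn1234")  # replace parent_id
table.store_rows(
    values=data,
    schema_storage_strategy=SchemaStorageStrategy.INFER_FROM_DATA,
)
```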

synapseclient.models.ColumnExpansionStrategy

Bases: str, Enum

Determines how to automate the expansion of columns based on the data that is being stored. The options given allow cells with a limit on the length of content (such as strings) to be expanded to a larger size if the data being stored exceeds the limit. A limit to list length is also enforced in Synapse, but automatic expansion for lists is not yet supported through this interface.

Source code in synapseclient/models/table_components.py
class ColumnExpansionStrategy(str, Enum):
    """
    Determines how to automate the expansion of columns based on the data
    that is being stored. The options given allow cells with a limit on the length of
    content (such as strings) to be expanded to a larger size if the data being stored
    exceeds the limit. A limit to list length is also enforced in Synapse, but automatic
    expansion for lists is not yet supported through this interface.
    """

    # To be supported at a later time
    # AUTO_EXPAND_CONTENT_AND_LIST_LENGTH = "AUTO_EXPAND_CONTENT_AND_LIST_LENGTH"
    # """
    # (Default)
    # Automatically expand both the content length and list length of columns if the data
    # being stored exceeds the limit.
    # """

    AUTO_EXPAND_CONTENT_LENGTH = "AUTO_EXPAND_CONTENT_LENGTH"
    """
    (Default)
    Automatically expand the content length of columns if the data being stored exceeds
    the limit.
    """

Attributes

AUTO_EXPAND_CONTENT_LENGTH class-attribute instance-attribute

AUTO_EXPAND_CONTENT_LENGTH = 'AUTO_EXPAND_CONTENT_LENGTH'

(Default) Automatically expand the content length of columns if the data being stored exceeds the limit.
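
Building on the sketch under SchemaStorageStrategy (and with the same hedged assumption that store_rows accepts these keyword arguments), passing this strategy lets a later write widen an existing STRING column instead of failing when a new value exceeds its current maximum size:

```python
import pandas as pd

from synapseclient import Synapse
from synapseclient.models import ColumnExpansionStrategy, SchemaStorageStrategy, Table

syn = Synapse()
syn.login()

# Hypothetical follow-up write: the "description" values are longer than the
# STRING size previously inferred for that column.
longer_rows = pd.DataFrame(
    {
        "sample_id": ["s3"],
        "description": ["a much longer free-text description than any stored before"],
    }
)

table = Table(id="syn1234").get()  # replace with your table's ID
table.store_rows(
    values=longer_rows,
    schema_storage_strategy=SchemaStorageStrategy.INFER_FROM_DATA,
    column_expansion_strategy=ColumnExpansionStrategy.AUTO_EXPAND_CONTENT_LENGTH,
)
```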

synapseclient.models.FacetType

Bases: str, Enum

Set to one of the enumerated values to indicate a column should be treated as a facet.

Source code in synapseclient/models/table_components.py
class FacetType(str, Enum):
    """Set to one of the enumerated values to indicate a column should be treated as
    a facet."""

    ENUMERATION = "enumeration"
    """Returns the most frequently seen values and their respective frequency counts;
    selecting these returned values will cause the table results to be filtered such
    that only rows with the selected values are returned."""

    RANGE = "range"
    """Allows the column to be filtered by a chosen lower and upper bound; these bounds
    are inclusive."""

Attributes

ENUMERATION class-attribute instance-attribute

ENUMERATION = 'enumeration'

Returns the most frequently seen values and their respective frequency counts; selecting these returned values will cause the table results to be filtered such that only rows with the selected values are returned.

RANGE class-attribute instance-attribute

RANGE = 'range'

Allows the column to be filtered by a chosen lower and upper bound; these bounds are inclusive.
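
For illustration, here is a hedged sketch of declaring facets when defining a table's columns; the project ID and column names are placeholders:

```python
from synapseclient import Synapse
from synapseclient.models import Column, ColumnType, FacetType, Table

syn = Synapse()
syn.login()

table = Table(
    name="faceted_table_example",
    parent_id="syn1234",  # replace with your project ID
    columns=[
        # Faceted as an enumeration: filterable on the distinct values seen.
        Column(name="tissue", column_type=ColumnType.STRING,
               facet_type=FacetType.ENUMERATION),
        # Faceted as a range: filterable by an inclusive lower and upper bound.
        Column(name="age", column_type=ColumnType.INTEGER,
               facet_type=FacetType.RANGE),
    ],
)
table.store()
```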

synapseclient.models.ColumnType

Bases: str, Enum

The column type determines the type of data that can be stored in a column. Switching between types (using a transaction with TableUpdateTransaction in the "changes" list) is generally allowed except for switching to "_LIST" suffixed types. In such cases, a new column must be created and data must be copied over manually

Source code in synapseclient/models/table_components.py
class ColumnType(str, Enum):
    """The column type determines the type of data that can be stored in a column.
    Switching between types (using a transaction with TableUpdateTransaction
    in the "changes" list) is generally allowed except for switching to "_LIST"
    suffixed types. In such cases, a new column must be created and data must be
    copied over manually"""

    STRING = "STRING"
    """The STRING data type is a small text strings with between 1 and 1,000 characters.
    Each STRING column will have a declared maximum size between 1 and 1,000 characters
    (with 50 characters as the default when maximumSize = null). The maximum STRING size
    is applied to the budget of the maximum table width, therefore it is best to use the
    smallest possible maximum size for the data. For strings larger than 250 characters,
    consider using the LARGETEXT column type for improved performance. Each STRING column
    counts as maxSize*4 (4 bytes per character) towards the total width of a table."""

    DOUBLE = "DOUBLE"
    """The DOUBLE data type is a double-precision 64-bit IEEE 754 floating point. Its
    range of values is approximately +/-1.79769313486231570E+308 (15 significant decimal
    digits). Each DOUBLE column counts as 23 bytes towards the total width of a table."""

    INTEGER = "INTEGER"
    """The INTEGER data type is a 64-bit two's complement integer. The signed integer has
    a minimum value of -2^63 and a maximum value of 2^63-1. Each INTEGER column counts as
    20 bytes towards the total width of a table."""

    BOOLEAN = "BOOLEAN"
    """The BOOLEAN data type has only two possible values: 'true' and 'false'. Each
    BOOLEAN column counts as 5 bytes towards the total width of a table."""

    DATE = "DATE"
    """The DATE data type represent the specified number of milliseconds since the
    standard base time known as 'the epoch', namely January 1, 1970, 00:00:00 GM.
    Each DATE column counts as 20 bytes towards the total width of a table."""

    FILEHANDLEID = "FILEHANDLEID"
    """The FILEHANDLEID data type represents a file stored within a table. To store a
    file in a table, first use the 'File Services' to upload a file to generate a new
    FileHandle, then apply the fileHandle.id as the value for this column. Note: This
    column type works best for files that are binary (non-text) or text files that are 1
    MB or larger. For text files that are smaller than 1 MB consider using the LARGETEXT
    column type to improve download performance. Each FILEHANDLEID column counts as 20
    bytes towards the total width of a table."""

    ENTITYID = "ENTITYID"
    """The ENTITYID type represents a reference to a Synapse Entity. Values will include
    the 'syn' prefix, such as 'syn123'. Each ENTITYID column counts as 44 bytes towards
    the total width of a table."""

    SUBMISSIONID = "SUBMISSIONID"
    """The SUBMISSIONID type represents a reference to an evaluation submission. The
    value should be the ID of the referenced submission. Each SUBMISSIONID column counts
    as 20 bytes towards the total width of a table."""

    EVALUATIONID = "EVALUATIONID"
    """The EVALUATIONID type represents a reference to an evaluation. The value should be
    the ID of the referenced evaluation. Each EVALUATIONID column counts as 20 bytes
    towards the total width of a table."""

    LINK = "LINK"
    """The LINK data type represents any URL with 1,000 characters or less. Each LINK
    column counts as maxSize*4 (4 bytes per character) towards the total width of a
    table."""

    MEDIUMTEXT = "MEDIUMTEXT"
    """The MEDIUMTEXT data type represents a string that is between 1 and 2,000
    characters without the need to specify a maximum size. For smaller strings where the
    maximum size is known consider using the STRING column type. For larger strings,
    consider using the LARGETEXT or FILEHANDLEID column types. Each MEDIUMTEXT column
    counts as 421 bytes towards the total width of a table."""

    LARGETEXT = "LARGETEXT"
    """The LARGETEXT data type represents a string that is greater than 250 characters
    but less than 524,288 characters (2 MB of UTF-8 4 byte chars). For smaller strings
    consider using the STRING or MEDIUMTEXT column types. For larger strings, consider
    using the FILEHANDELID column type. Each LARGE_TEXT column counts as 2133 bytes
    towards the total width of a table."""

    USERID = "USERID"
    """The USERID data type represents a reference to a Synapse User. The value should
    be the ID of the referenced User. Each USERID column counts as 20 bytes towards the
    total width of a table."""

    STRING_LIST = "STRING_LIST"
    """Multiple values of STRING."""

    INTEGER_LIST = "INTEGER_LIST"
    """Multiple values of INTEGER."""

    BOOLEAN_LIST = "BOOLEAN_LIST"
    """Multiple values of BOOLEAN."""

    DATE_LIST = "DATE_LIST"
    """Multiple values of DATE."""

    ENTITYID_LIST = "ENTITYID_LIST"
    """Multiple values of ENTITYID."""

    USERID_LIST = "USERID_LIST"
    """Multiple values of USERID."""

    JSON = "JSON"
    """A flexible type that allows to store JSON data. Each JSON column counts as 2133
    bytes towards the total width of a table. A JSON value string should be less than
    524,288 characters (2 MB of UTF-8 4 byte chars)."""

    def __repr__(self) -> str:
        """Print out the string value of self"""
        return self.value

Attributes

STRING class-attribute instance-attribute

STRING = 'STRING'

The STRING data type is a small text string of between 1 and 1,000 characters. Each STRING column will have a declared maximum size between 1 and 1,000 characters (with 50 characters as the default when maximumSize = null). The maximum STRING size is applied to the budget of the maximum table width, therefore it is best to use the smallest possible maximum size for the data. For strings larger than 250 characters, consider using the LARGETEXT column type for improved performance. Each STRING column counts as maxSize*4 (4 bytes per character) towards the total width of a table.

DOUBLE class-attribute instance-attribute

DOUBLE = 'DOUBLE'

The DOUBLE data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is approximately +/-1.79769313486231570E+308 (15 significant decimal digits). Each DOUBLE column counts as 23 bytes towards the total width of a table.

INTEGER class-attribute instance-attribute

INTEGER = 'INTEGER'

The INTEGER data type is a 64-bit two's complement integer. The signed integer has a minimum value of -2^63 and a maximum value of 2^63-1. Each INTEGER column counts as 20 bytes towards the total width of a table.

BOOLEAN class-attribute instance-attribute

BOOLEAN = 'BOOLEAN'

The BOOLEAN data type has only two possible values: 'true' and 'false'. Each BOOLEAN column counts as 5 bytes towards the total width of a table.

DATE class-attribute instance-attribute

DATE = 'DATE'

The DATE data type represents the specified number of milliseconds since the standard base time known as 'the epoch', namely January 1, 1970, 00:00:00 GMT. Each DATE column counts as 20 bytes towards the total width of a table.

FILEHANDLEID class-attribute instance-attribute

FILEHANDLEID = 'FILEHANDLEID'

The FILEHANDLEID data type represents a file stored within a table. To store a file in a table, first use the 'File Services' to upload a file to generate a new FileHandle, then apply the fileHandle.id as the value for this column. Note: This column type works best for files that are binary (non-text) or text files that are 1 MB or larger. For text files that are smaller than 1 MB consider using the LARGETEXT column type to improve download performance. Each FILEHANDLEID column counts as 20 bytes towards the total width of a table.

ENTITYID class-attribute instance-attribute

ENTITYID = 'ENTITYID'

The ENTITYID type represents a reference to a Synapse Entity. Values will include the 'syn' prefix, such as 'syn123'. Each ENTITYID column counts as 44 bytes towards the total width of a table.

SUBMISSIONID class-attribute instance-attribute

SUBMISSIONID = 'SUBMISSIONID'

The SUBMISSIONID type represents a reference to an evaluation submission. The value should be the ID of the referenced submission. Each SUBMISSIONID column counts as 20 bytes towards the total width of a table.

EVALUATIONID class-attribute instance-attribute

EVALUATIONID = 'EVALUATIONID'

The EVALUATIONID type represents a reference to an evaluation. The value should be the ID of the referenced evaluation. Each EVALUATIONID column counts as 20 bytes towards the total width of a table.

LINK class-attribute instance-attribute

LINK = 'LINK'

The LINK data type represents any URL with 1,000 characters or less. Each LINK column counts as maxSize*4 (4 bytes per character) towards the total width of a table.

MEDIUMTEXT class-attribute instance-attribute

MEDIUMTEXT = 'MEDIUMTEXT'

The MEDIUMTEXT data type represents a string that is between 1 and 2,000 characters without the need to specify a maximum size. For smaller strings where the maximum size is known consider using the STRING column type. For larger strings, consider using the LARGETEXT or FILEHANDLEID column types. Each MEDIUMTEXT column counts as 421 bytes towards the total width of a table.

LARGETEXT class-attribute instance-attribute

LARGETEXT = 'LARGETEXT'

The LARGETEXT data type represents a string that is greater than 250 characters but less than 524,288 characters (2 MB of UTF-8 4 byte chars). For smaller strings consider using the STRING or MEDIUMTEXT column types. For larger strings, consider using the FILEHANDLEID column type. Each LARGETEXT column counts as 2133 bytes towards the total width of a table.

USERID class-attribute instance-attribute

USERID = 'USERID'

The USERID data type represents a reference to a Synapse User. The value should be the ID of the referenced User. Each USERID column counts as 20 bytes towards the total width of a table.

STRING_LIST class-attribute instance-attribute

STRING_LIST = 'STRING_LIST'

Multiple values of STRING.

INTEGER_LIST class-attribute instance-attribute

INTEGER_LIST = 'INTEGER_LIST'

Multiple values of INTEGER.

BOOLEAN_LIST class-attribute instance-attribute

BOOLEAN_LIST = 'BOOLEAN_LIST'

Multiple values of BOOLEAN.

DATE_LIST class-attribute instance-attribute

DATE_LIST = 'DATE_LIST'

Multiple values of DATE.

ENTITYID_LIST class-attribute instance-attribute

ENTITYID_LIST = 'ENTITYID_LIST'

Multiple values of ENTITYID.

USERID_LIST class-attribute instance-attribute

USERID_LIST = 'USERID_LIST'

Multiple values of USERID.

JSON class-attribute instance-attribute

JSON = 'JSON'

A flexible type that allows storing JSON data. Each JSON column counts as 2133 bytes towards the total width of a table. A JSON value string should be less than 524,288 characters (2 MB of UTF-8 4 byte chars).
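
To make the width accounting above concrete, a short sketch contrasting a sized STRING column with MEDIUMTEXT and a list type; it assumes Column exposes a maximum_size attribute for sized types, and the column names are illustrative.

from synapseclient.models import Column, ColumnType

columns = [
    # A sized STRING: counts as 20 * 4 = 80 bytes toward the table width budget.
    Column(name="sample_id", column_type=ColumnType.STRING, maximum_size=20),
    # Up to 2,000 characters without declaring a size; fixed 421-byte footprint.
    Column(name="notes", column_type=ColumnType.MEDIUMTEXT),
    # Multiple INTEGER values per cell.
    Column(name="replicate_counts", column_type=ColumnType.INTEGER_LIST),
]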

synapseclient.models.JsonSubColumn dataclass

For a column of type JSON that represents the combination of multiple sub-columns, this property is used to define each sub-column.

Source code in synapseclient/models/table_components.py
@dataclass
class JsonSubColumn:
    """For column of type JSON that represents the combination of multiple
    sub-columns, this property is used to define each sub-column."""

    name: str
    """The display name of the column."""

    column_type: ColumnType
    """The column type determines the type of data that can be stored in a column.
    Switching between types (using a transaction with TableUpdateTransaction
    in the "changes" list) is generally allowed except for switching to "_LIST" suffixed
    types. In such cases, a new column must be created and data must be copied
    over manually"""

    json_path: str
    """Defines the JSON path of the sub column. Use the '$' char to represent the root
    of JSON object. If the JSON key of a sub column is 'a', then the jsonPath for that
    column would be: '$.a'."""

    facet_type: Optional[FacetType] = None
    """Set to one of the enumerated values to indicate a column should be
    treated as a facet"""

    @classmethod
    def fill_from_dict(cls, synapse_sub_column: Dict[str, Any]) -> "JsonSubColumn":
        """Converts a response from the synapseclient into this dataclass."""
        return cls(
            name=synapse_sub_column.get("name", ""),
            column_type=(
                ColumnType(synapse_sub_column.get("columnType", None))
                if synapse_sub_column.get("columnType", None)
                else ColumnType.STRING
            ),
            json_path=synapse_sub_column.get("jsonPath", ""),
            facet_type=(
                FacetType(synapse_sub_column.get("facetType", None))
                if synapse_sub_column.get("facetType", None)
                else None
            ),
        )

    def to_synapse_request(self) -> Dict[str, Any]:
        """Converts the Column object into a dictionary that can be passed into the
        REST API."""
        if self.column_type and isinstance(self.column_type, str):
            self.column_type = ColumnType(self.column_type)

        if self.facet_type and isinstance(self.facet_type, str):
            self.facet_type = FacetType(self.facet_type)

        result = {
            "name": self.name,
            "columnType": self.column_type.value if self.column_type else None,
            "jsonPath": self.json_path,
            "facetType": self.facet_type.value if self.facet_type else None,
        }
        delete_none_keys(result)
        return result

Attributes

name instance-attribute

name: str

The display name of the column.

column_type instance-attribute

column_type: ColumnType

The column type determines the type of data that can be stored in a column. Switching between types (using a transaction with TableUpdateTransaction in the "changes" list) is generally allowed except for switching to "_LIST" suffixed types. In such cases, a new column must be created and data must be copied over manually.

json_path instance-attribute

json_path: str

Defines the JSON path of the sub column. Use the '$' char to represent the root of JSON object. If the JSON key of a sub column is 'a', then the jsonPath for that column would be: '$.a'.

facet_type class-attribute instance-attribute

facet_type: Optional[FacetType] = None

Set to one of the enumerated values to indicate a column should be treated as a facet

Functions

fill_from_dict classmethod

fill_from_dict(synapse_sub_column: Dict[str, Any]) -> JsonSubColumn

Converts a response from the synapseclient into this dataclass.

Source code in synapseclient/models/table_components.py
@classmethod
def fill_from_dict(cls, synapse_sub_column: Dict[str, Any]) -> "JsonSubColumn":
    """Converts a response from the synapseclient into this dataclass."""
    return cls(
        name=synapse_sub_column.get("name", ""),
        column_type=(
            ColumnType(synapse_sub_column.get("columnType", None))
            if synapse_sub_column.get("columnType", None)
            else ColumnType.STRING
        ),
        json_path=synapse_sub_column.get("jsonPath", ""),
        facet_type=(
            FacetType(synapse_sub_column.get("facetType", None))
            if synapse_sub_column.get("facetType", None)
            else None
        ),
    )

to_synapse_request

to_synapse_request() -> Dict[str, Any]

Converts the Column object into a dictionary that can be passed into the REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self) -> Dict[str, Any]:
    """Converts the Column object into a dictionary that can be passed into the
    REST API."""
    if self.column_type and isinstance(self.column_type, str):
        self.column_type = ColumnType(self.column_type)

    if self.facet_type and isinstance(self.facet_type, str):
        self.facet_type = FacetType(self.facet_type)

    result = {
        "name": self.name,
        "columnType": self.column_type.value if self.column_type else None,
        "jsonPath": self.json_path,
        "facetType": self.facet_type.value if self.facet_type else None,
    }
    delete_none_keys(result)
    return result
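
As a quick illustration of the request shape this dataclass produces (the field values are made up):

from synapseclient.models import ColumnType, FacetType, JsonSubColumn

sub_column = JsonSubColumn(
    name="score",
    column_type=ColumnType.DOUBLE,
    json_path="$.score",
    facet_type=FacetType.RANGE,
)

# {'name': 'score', 'columnType': 'DOUBLE', 'jsonPath': '$.score', 'facetType': 'range'}
print(sub_column.to_synapse_request())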

synapseclient.models.SumFileSizes dataclass

A model for the sum of file sizes in a query result bundle.

This result is modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/SumFileSizes.html

Source code in synapseclient/models/table_components.py
@dataclass
class SumFileSizes:
    """
    A model for the sum of file sizes in a query result bundle.

    This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/SumFileSizes.html>
    """

    sum_file_size_bytes: int = None
    """The sum of the file size in bytes."""
    greater_than: bool = None
    """When true, the actual sum of the files sizes is greater than the value provided with 'sumFileSizesBytes'. When false, the actual sum of the files sizes is equals the value provided with 'sumFileSizesBytes'"""

Attributes

sum_file_size_bytes class-attribute instance-attribute

sum_file_size_bytes: int = None

The sum of the file size in bytes.

greater_than class-attribute instance-attribute

greater_than: bool = None

When true, the actual sum of the file sizes is greater than the value provided with 'sumFileSizesBytes'. When false, the actual sum of the file sizes equals the value provided with 'sumFileSizesBytes'.

synapseclient.models.Query dataclass

Represents a SQL query with optional parameters.

This result is modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/Query.html

Source code in synapseclient/models/table_components.py
@dataclass
class Query:
    """
    Represents a SQL query with optional parameters.

    This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/Query.html>
    """

    sql: str
    """The SQL query string"""

    additional_filters: Optional[List[Dict[str, Any]]] = None
    """Appends additional filters to the SQL query. These are applied before facets.
    Filters within the list have an AND relationship. If a WHERE clause already exists
    on the SQL query or facets are selected, it will also be ANDed with the query
    generated by these additional filters."""
    """TODO: create QueryFilter dataclass: https://sagebionetworks.jira.com/browse/SYNPY-1651"""

    selected_facets: Optional[List[Dict[str, Any]]] = None
    """The selected facet filters"""
    """TODO: create FacetColumnRequest dataclass: https://sagebionetworks.jira.com/browse/SYNPY-1651"""

    include_entity_etag: Optional[bool] = False
    """Optional, default false. When true, a query results against views will include
    the Etag of each entity in the results. Note: The etag is necessary to update
    Entities in the view."""

    select_file_column: Optional[int] = None
    """The id of the column used to select file entities (e.g. to fetch the action
    required for download). The column needs to be an ENTITYID type column and be
    part of the schema of the underlying table/view."""

    select_file_version_column: Optional[int] = None
    """The id of the column used as the version for selecting file entities when required
    (e.g. to add a materialized view query to the download cart with version enabled).
    The column needs to be an INTEGER type column and be part of the schema of the
    underlying table/view."""

    offset: Optional[int] = None
    """The optional offset into the results"""

    limit: Optional[int] = None
    """The optional limit to the results"""

    sort: Optional[List[Dict[str, Any]]] = None
    """The sort order for the query results (ARRAY<SortItem>)"""
    """TODO: Add SortItem dataclass: https://sagebionetworks.jira.com/browse/SYNPY-1651 """

    def to_synapse_request(self) -> Dict[str, Any]:
        """Converts the Query object into a dictionary that can be passed into the REST API."""
        result = {
            "sql": self.sql,
            "additionalFilters": self.additional_filters,
            "selectedFacets": self.selected_facets,
            "includeEntityEtag": self.include_entity_etag,
            "selectFileColumn": self.select_file_column,
            "selectFileVersionColumn": self.select_file_version_column,
            "offset": self.offset,
            "limit": self.limit,
            "sort": self.sort,
        }
        delete_none_keys(result)
        return result

Attributes

sql instance-attribute

sql: str

The SQL query string

additional_filters class-attribute instance-attribute

additional_filters: Optional[List[Dict[str, Any]]] = None

Appends additional filters to the SQL query. These are applied before facets. Filters within the list have an AND relationship. If a WHERE clause already exists on the SQL query or facets are selected, it will also be ANDed with the query generated by these additional filters.

selected_facets class-attribute instance-attribute

selected_facets: Optional[List[Dict[str, Any]]] = None

The selected facet filters

include_entity_etag class-attribute instance-attribute

include_entity_etag: Optional[bool] = False

Optional, default false. When true, query results against views will include the Etag of each entity in the results. Note: The etag is necessary to update Entities in the view.

select_file_column class-attribute instance-attribute

select_file_column: Optional[int] = None

The id of the column used to select file entities (e.g. to fetch the action required for download). The column needs to be an ENTITYID type column and be part of the schema of the underlying table/view.

select_file_version_column class-attribute instance-attribute

select_file_version_column: Optional[int] = None

The id of the column used as the version for selecting file entities when required (e.g. to add a materialized view query to the download cart with version enabled). The column needs to be an INTEGER type column and be part of the schema of the underlying table/view.

offset class-attribute instance-attribute

offset: Optional[int] = None

The optional offset into the results

limit class-attribute instance-attribute

limit: Optional[int] = None

The optional limit to the results

sort class-attribute instance-attribute

sort: Optional[List[Dict[str, Any]]] = None

The sort order for the query results (ARRAY<SortItem>)

Functions

to_synapse_request

to_synapse_request() -> Dict[str, Any]

Converts the Query object into a dictionary that can be passed into the REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self) -> Dict[str, Any]:
    """Converts the Query object into a dictionary that can be passed into the REST API."""
    result = {
        "sql": self.sql,
        "additionalFilters": self.additional_filters,
        "selectedFacets": self.selected_facets,
        "includeEntityEtag": self.include_entity_etag,
        "selectFileColumn": self.select_file_column,
        "selectFileVersionColumn": self.select_file_version_column,
        "offset": self.offset,
        "limit": self.limit,
        "sort": self.sort,
    }
    delete_none_keys(result)
    return result
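
For example, a minimal Query and the request body it produces (the table ID and SQL are placeholders):

from synapseclient.models import Query

query = Query(
    sql="SELECT * FROM syn1234 WHERE species = 'mouse'",
    include_entity_etag=True,
    limit=100,
)

# None-valued fields are dropped, so only sql, includeEntityEtag, and limit remain.
print(query.to_synapse_request())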

synapseclient.models.QueryBundleRequest dataclass

Bases: AsynchronousCommunicator

A query bundle request that can be submitted to Synapse to retrieve query results with metadata.

This class combines query request parameters with the ability to receive a QueryResultBundle through the AsynchronousCommunicator pattern.

The partMask determines which parts of the result bundle are included:

- Query Results (queryResults) = 0x1
- Query Count (queryCount) = 0x2
- Select Columns (selectColumns) = 0x4
- Max Rows Per Page (maxRowsPerPage) = 0x8
- The Table Columns (columnModels) = 0x10
- Facet statistics for each faceted column (facetStatistics) = 0x20
- The sum of the file sizes (sumFileSizesBytes) = 0x40
- The last updated on date (lastUpdatedOn) = 0x80
- The combined SQL query including additional filters (combinedSql) = 0x100
- The list of actions required for any file in the query (actionsRequired) = 0x200

This result is modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/QueryBundleRequest.html

Source code in synapseclient/models/table_components.py
@dataclass
class QueryBundleRequest(AsynchronousCommunicator):
    """
    A query bundle request that can be submitted to Synapse to retrieve query results with metadata.

    This class combines query request parameters with the ability to receive
    a QueryResultBundle through the AsynchronousCommunicator pattern.

    The partMask determines which parts of the result bundle are included:
    - Query Results (queryResults) = 0x1
    - Query Count (queryCount) = 0x2
    - Select Columns (selectColumns) = 0x4
    - Max Rows Per Page (maxRowsPerPage) = 0x8
    - The Table Columns (columnModels) = 0x10
    - Facet statistics for each faceted column (facetStatistics) = 0x20
    - The sum of the file sizes (sumFileSizesBytes) = 0x40
    - The last updated on date (lastUpdatedOn) = 0x80
    - The combined SQL query including additional filters (combinedSql) = 0x100
    - The list of actions required for any file in the query (actionsRequired) = 0x200

    This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/QueryBundleRequest.html>
    """

    # Request parameters
    entity_id: str
    """The ID of the entity (table/view) being queried"""

    query: Query
    """The SQL query with parameters"""

    concrete_type: str = QUERY_BUNDLE_REQUEST
    """The concrete type of this request"""

    part_mask: Optional[int] = None
    """Optional integer mask to request specific parts. Default includes all parts if not specified."""

    # Response attributes (filled after job completion from QueryResultBundle)
    query_result: Optional[QueryResult] = None
    """A page of query result"""

    query_count: Optional[int] = None
    """The total number of rows that match the query"""

    select_columns: Optional[List[SelectColumn]] = None
    """The list of SelectColumns from the select clause"""

    max_rows_per_page: Optional[int] = None
    """The maximum number of rows that can be retrieved in a single call"""

    column_models: Optional[List[Dict[str, Any]]] = None
    """The list of ColumnModels for the table"""

    facets: Optional[List[Dict[str, Any]]] = None
    """The list of facets for the search results"""

    sum_file_sizes: Optional[SumFileSizes] = None
    """The sum of the file size for all files in the given view query"""

    last_updated_on: Optional[str] = None
    """The date-time when this table/view was last updated"""

    combined_sql: Optional[str] = None
    """The SQL that is combination of a the input SQL, FacetRequests, AdditionalFilters, Sorting, and Pagination"""

    actions_required: Optional[List[ActionRequiredCount]] = None
    """The first 50 actions required to download the files that are part of the query"""

    def to_synapse_request(self) -> Dict[str, Any]:
        """Convert to QueryBundleRequest format for async job submission."""
        result = {
            "concreteType": self.concrete_type,
            "entityId": self.entity_id,
            "query": self.query,
        }

        if self.part_mask is not None:
            result["partMask"] = self.part_mask

        delete_none_keys(result)
        return result

    def fill_from_dict(self, synapse_response: Dict[str, Any]) -> "Self":
        """Fill the request results from Synapse response (QueryResultBundle)."""
        # Use QueryResultBundle's fill_from_dict logic to populate response fields
        bundle = QueryResultBundle.fill_from_dict(synapse_response)

        # Copy all the result fields from the bundle
        self.query_result = bundle.query_result
        self.query_count = bundle.query_count
        self.select_columns = bundle.select_columns
        self.max_rows_per_page = bundle.max_rows_per_page
        self.column_models = bundle.column_models
        self.facets = bundle.facets
        self.sum_file_sizes = bundle.sum_file_sizes
        self.last_updated_on = bundle.last_updated_on
        self.combined_sql = bundle.combined_sql
        self.actions_required = bundle.actions_required

        return self

Attributes

entity_id instance-attribute

entity_id: str

The ID of the entity (table/view) being queried

query instance-attribute

query: Query

The SQL query with parameters

concrete_type class-attribute instance-attribute

concrete_type: str = QUERY_BUNDLE_REQUEST

The concrete type of this request

part_mask class-attribute instance-attribute

part_mask: Optional[int] = None

Optional integer mask to request specific parts. Default includes all parts if not specified.

query_result class-attribute instance-attribute

query_result: Optional[QueryResult] = None

A page of query result

query_count class-attribute instance-attribute

query_count: Optional[int] = None

The total number of rows that match the query

select_columns class-attribute instance-attribute

select_columns: Optional[List[SelectColumn]] = None

The list of SelectColumns from the select clause

max_rows_per_page class-attribute instance-attribute

max_rows_per_page: Optional[int] = None

The maximum number of rows that can be retrieved in a single call

column_models class-attribute instance-attribute

column_models: Optional[List[Dict[str, Any]]] = None

The list of ColumnModels for the table

facets class-attribute instance-attribute

facets: Optional[List[Dict[str, Any]]] = None

The list of facets for the search results

sum_file_sizes class-attribute instance-attribute

sum_file_sizes: Optional[SumFileSizes] = None

The sum of the file size for all files in the given view query

last_updated_on class-attribute instance-attribute

last_updated_on: Optional[str] = None

The date-time when this table/view was last updated

combined_sql class-attribute instance-attribute

combined_sql: Optional[str] = None

The SQL that is a combination of the input SQL, FacetRequests, AdditionalFilters, Sorting, and Pagination

actions_required class-attribute instance-attribute

actions_required: Optional[List[ActionRequiredCount]] = None

The first 50 actions required to download the files that are part of the query
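
A small sketch of composing a part mask from the bit flags listed above and building the request; the entity ID and SQL are placeholders, and the request itself is submitted through the asynchronous job machinery rather than called directly.

from synapseclient.models import Query, QueryBundleRequest

# Request the query results, the total row count, and the table's column models.
part_mask = 0x1 | 0x2 | 0x10

request = QueryBundleRequest(
    entity_id="syn1234",
    query=Query(sql="SELECT * FROM syn1234"),
    part_mask=part_mask,
)

# Dictionary form used when submitting the asynchronous job.
print(request.to_synapse_request())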

Functions

to_synapse_request

to_synapse_request() -> Dict[str, Any]

Convert to QueryBundleRequest format for async job submission.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self) -> Dict[str, Any]:
    """Convert to QueryBundleRequest format for async job submission."""
    result = {
        "concreteType": self.concrete_type,
        "entityId": self.entity_id,
        "query": self.query,
    }

    if self.part_mask is not None:
        result["partMask"] = self.part_mask

    delete_none_keys(result)
    return result

fill_from_dict

fill_from_dict(synapse_response: Dict[str, Any]) -> Self

Fill the request results from Synapse response (QueryResultBundle).

Source code in synapseclient/models/table_components.py
def fill_from_dict(self, synapse_response: Dict[str, Any]) -> "Self":
    """Fill the request results from Synapse response (QueryResultBundle)."""
    # Use QueryResultBundle's fill_from_dict logic to populate response fields
    bundle = QueryResultBundle.fill_from_dict(synapse_response)

    # Copy all the result fields from the bundle
    self.query_result = bundle.query_result
    self.query_count = bundle.query_count
    self.select_columns = bundle.select_columns
    self.max_rows_per_page = bundle.max_rows_per_page
    self.column_models = bundle.column_models
    self.facets = bundle.facets
    self.sum_file_sizes = bundle.sum_file_sizes
    self.last_updated_on = bundle.last_updated_on
    self.combined_sql = bundle.combined_sql
    self.actions_required = bundle.actions_required

    return self

synapseclient.models.QueryJob dataclass

Bases: AsynchronousCommunicator

A query job that can be submitted to Synapse and return a DownloadFromTableResult.

This class combines query request parameters with the ability to receive query results through the AsynchronousCommunicator pattern.

Request modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/DownloadFromTableRequest.html

Response modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/DownloadFromTableResult.html

Source code in synapseclient/models/table_components.py
@dataclass
class QueryJob(AsynchronousCommunicator):
    """
    A query job that can be submitted to Synapse and return a DownloadFromTableResult.

    This class combines query request parameters with the ability to receive
    query results through the AsynchronousCommunicator pattern.

    Request modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/DownloadFromTableRequest.html>

    Response modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/DownloadFromTableResult.html>
    """

    # Request parameters
    entity_id: str
    """The ID of the entity (table/view) being queried"""

    concrete_type: str = QUERY_TABLE_CSV_REQUEST
    "The concrete type of the request (usually DownloadFromTableRequest)"

    write_header: Optional[bool] = True
    """Should the first line contain the columns names as a header in the resulting file? Set to 'true' to include the headers else, 'false'. The default value is 'true'."""

    include_row_id_and_row_version: Optional[bool] = True
    """Should the first two columns contain the row ID and row version? The default value is 'true'."""

    csv_table_descriptor: Optional[CsvTableDescriptor] = None
    """The description of a csv for upload or download."""

    file_name: Optional[str] = None
    """The optional name for the downloaded table."""

    sql: Optional[str] = None
    """The SQL query to execute"""

    additional_filters: Optional[List[Dict[str, Any]]] = None
    """Appends additional filters to the SQL query. These are applied before facets. Filters within the list have an AND relationship. If a WHERE clause already exists on the SQL query or facets are selected, it will also be ANDed with the query generated by these additional filters."""
    """TODO: create QueryFilter dataclass: https://sagebionetworks.jira.com/browse/SYNPY-1651"""

    selected_facets: Optional[List[Dict[str, Any]]] = None
    """The selected facet filters."""
    """TODO: create FacetColumnRequest dataclass: https://sagebionetworks.jira.com/browse/SYNPY-1651"""

    include_entity_etag: Optional[bool] = False
    """"Optional, default false. When true, a query results against views will include the Etag of each entity in the results. Note: The etag is necessary to update Entities in the view."""

    select_file_column: Optional[int] = None
    """The id of the column used to select file entities (e.g. to fetch the action required for download). The column needs to be an ENTITYID type column and be part of the schema of the underlying table/view."""

    select_file_version_column: Optional[int] = None
    """The id of the column used as the version for selecting file entities when required (e.g. to add a materialized view query to the download cart with version enabled). The column needs to be an INTEGER type column and be part of the schema of the underlying table/view."""

    offset: Optional[int] = None
    """The optional offset into the results"""

    limit: Optional[int] = None
    """The optional limit to the results"""

    sort: Optional[List[Dict[str, Any]]] = None
    """The sort order for the query results (ARRAY<SortItem>)"""
    """TODO: Add SortItem dataclass: https://sagebionetworks.jira.com/browse/SYNPY-1651"""

    # Response attributes (filled after job completion)
    job_id: Optional[str] = None
    """The job ID returned from the async job"""

    results_file_handle_id: Optional[str] = None
    """The file handle ID of the results CSV file"""

    table_id: Optional[str] = None
    """The ID of the table that was queried"""

    etag: Optional[str] = None
    """The etag of the table"""

    headers: Optional[List[SelectColumn]] = None
    """The column headers from the query result"""

    response_concrete_type: Optional[str] = QUERY_TABLE_CSV_RESULT
    """The concrete type of the response (usually DownloadFromTableResult)"""

    def to_synapse_request(self) -> Dict[str, Any]:
        """Convert to DownloadFromTableRequest format for async job submission."""

        csv_table_descriptor = None
        if self.csv_table_descriptor:
            csv_table_descriptor = self.csv_table_descriptor.to_synapse_request()

        synapse_request = {
            "concreteType": QUERY_TABLE_CSV_REQUEST,
            "entityId": self.entity_id,
            "csvTableDescriptor": csv_table_descriptor,
            "sql": self.sql,
            "writeHeader": self.write_header,
            "includeRowIdAndRowVersion": self.include_row_id_and_row_version,
            "includeEntityEtag": self.include_entity_etag,
            "fileName": self.file_name,
            "additionalFilters": self.additional_filters,
            "selectedFacet": self.selected_facets,
            "selectFileColumns": self.select_file_column,
            "selectFileVersionColumns": self.select_file_version_column,
            "offset": self.offset,
            "sort": self.sort,
        }
        delete_none_keys(synapse_request)
        return synapse_request

    def fill_from_dict(self, synapse_response: Dict[str, Any]) -> "Self":
        """Fill the job results from Synapse response."""
        # Fill response attributes from DownloadFromTableResult
        headers = None
        headers_data = synapse_response.get("headers")
        if headers_data and isinstance(headers_data, list):
            headers = [SelectColumn.fill_from_dict(header) for header in headers_data]

        self.job_id = synapse_response.get("jobId")
        self.response_concrete_type = synapse_response.get("concreteType")
        self.results_file_handle_id = synapse_response.get("resultsFileHandleId")
        self.table_id = synapse_response.get("tableId")
        self.etag = synapse_response.get("etag")
        self.headers = headers

        return self

Attributes

entity_id instance-attribute

entity_id: str

The ID of the entity (table/view) being queried

concrete_type class-attribute instance-attribute

concrete_type: str = QUERY_TABLE_CSV_REQUEST

The concrete type of the request (usually DownloadFromTableRequest)

write_header class-attribute instance-attribute

write_header: Optional[bool] = True

Should the first line contain the column names as a header in the resulting file? Set to 'true' to include the headers, otherwise 'false'. The default value is 'true'.

include_row_id_and_row_version class-attribute instance-attribute

include_row_id_and_row_version: Optional[bool] = True

Should the first two columns contain the row ID and row version? The default value is 'true'.

csv_table_descriptor class-attribute instance-attribute

csv_table_descriptor: Optional[CsvTableDescriptor] = None

The description of a csv for upload or download.

file_name class-attribute instance-attribute

file_name: Optional[str] = None

The optional name for the downloaded table.

sql class-attribute instance-attribute

sql: Optional[str] = None

The SQL query to execute

additional_filters class-attribute instance-attribute

additional_filters: Optional[List[Dict[str, Any]]] = None

Appends additional filters to the SQL query. These are applied before facets. Filters within the list have an AND relationship. If a WHERE clause already exists on the SQL query or facets are selected, it will also be ANDed with the query generated by these additional filters.

selected_facets class-attribute instance-attribute

selected_facets: Optional[List[Dict[str, Any]]] = None

The selected facet filters.

include_entity_etag class-attribute instance-attribute

include_entity_etag: Optional[bool] = False

"Optional, default false. When true, a query results against views will include the Etag of each entity in the results. Note: The etag is necessary to update Entities in the view.

select_file_column class-attribute instance-attribute

select_file_column: Optional[int] = None

The id of the column used to select file entities (e.g. to fetch the action required for download). The column needs to be an ENTITYID type column and be part of the schema of the underlying table/view.

select_file_version_column class-attribute instance-attribute

select_file_version_column: Optional[int] = None

The id of the column used as the version for selecting file entities when required (e.g. to add a materialized view query to the download cart with version enabled). The column needs to be an INTEGER type column and be part of the schema of the underlying table/view.

offset class-attribute instance-attribute

offset: Optional[int] = None

The optional offset into the results

limit class-attribute instance-attribute

limit: Optional[int] = None

The optional limit to the results

sort class-attribute instance-attribute

sort: Optional[List[Dict[str, Any]]] = None

The sort order for the query results (ARRAY<SortItem>)

job_id class-attribute instance-attribute

job_id: Optional[str] = None

The job ID returned from the async job

results_file_handle_id class-attribute instance-attribute

results_file_handle_id: Optional[str] = None

The file handle ID of the results CSV file

table_id class-attribute instance-attribute

table_id: Optional[str] = None

The ID of the table that was queried

etag class-attribute instance-attribute

etag: Optional[str] = None

The etag of the table

headers class-attribute instance-attribute

headers: Optional[List[SelectColumn]] = None

The column headers from the query result

response_concrete_type class-attribute instance-attribute

response_concrete_type: Optional[str] = QUERY_TABLE_CSV_RESULT

The concrete type of the response (usually DownloadFromTableResult)
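
As an illustration, a minimal CSV-download job and its request body (the entity ID and SQL are placeholders):

from synapseclient.models import QueryJob

job = QueryJob(
    entity_id="syn1234",
    sql="SELECT id, name FROM syn1234",
    write_header=True,
    include_row_id_and_row_version=False,
)

# Request body for the asynchronous table-to-CSV download job.
print(job.to_synapse_request())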

Functions

to_synapse_request

to_synapse_request() -> Dict[str, Any]

Convert to DownloadFromTableRequest format for async job submission.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self) -> Dict[str, Any]:
    """Convert to DownloadFromTableRequest format for async job submission."""

    csv_table_descriptor = None
    if self.csv_table_descriptor:
        csv_table_descriptor = self.csv_table_descriptor.to_synapse_request()

    synapse_request = {
        "concreteType": QUERY_TABLE_CSV_REQUEST,
        "entityId": self.entity_id,
        "csvTableDescriptor": csv_table_descriptor,
        "sql": self.sql,
        "writeHeader": self.write_header,
        "includeRowIdAndRowVersion": self.include_row_id_and_row_version,
        "includeEntityEtag": self.include_entity_etag,
        "fileName": self.file_name,
        "additionalFilters": self.additional_filters,
        "selectedFacet": self.selected_facets,
        "selectFileColumns": self.select_file_column,
        "selectFileVersionColumns": self.select_file_version_column,
        "offset": self.offset,
        "sort": self.sort,
    }
    delete_none_keys(synapse_request)
    return synapse_request

fill_from_dict

fill_from_dict(synapse_response: Dict[str, Any]) -> Self

Fill the job results from Synapse response.

Source code in synapseclient/models/table_components.py
def fill_from_dict(self, synapse_response: Dict[str, Any]) -> "Self":
    """Fill the job results from Synapse response."""
    # Fill response attributes from DownloadFromTableResult
    headers = None
    headers_data = synapse_response.get("headers")
    if headers_data and isinstance(headers_data, list):
        headers = [SelectColumn.fill_from_dict(header) for header in headers_data]

    self.job_id = synapse_response.get("jobId")
    self.response_concrete_type = synapse_response.get("concreteType")
    self.results_file_handle_id = synapse_response.get("resultsFileHandleId")
    self.table_id = synapse_response.get("tableId")
    self.etag = synapse_response.get("etag")
    self.headers = headers

    return self

synapseclient.models.QueryNextPageToken dataclass

Token for retrieving the next page of query results.

This result is modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/QueryNextPageToken.html

Source code in synapseclient/models/table_components.py
@dataclass
class QueryNextPageToken:
    """
    Token for retrieving the next page of query results.

    This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/QueryNextPageToken.html>
    """

    concrete_type: Optional[str] = None
    """The concrete type of this object"""

    entity_id: Optional[str] = None
    """The ID of the entity (table/view) being queried"""

    token: Optional[str] = None
    """The token for the next page."""

    @classmethod
    def fill_from_dict(cls, data: Dict[str, Any]) -> "QueryNextPageToken":
        """Create a QueryNextPageToken from a dictionary response."""
        return cls(
            concrete_type=data.get("concreteType"),
            entity_id=data.get("entityId"),
            token=data.get("token"),
        )

Attributes

concrete_type class-attribute instance-attribute

concrete_type: Optional[str] = None

The concrete type of this object

entity_id class-attribute instance-attribute

entity_id: Optional[str] = None

The ID of the entity (table/view) being queried

token class-attribute instance-attribute

token: Optional[str] = None

The token for the next page.
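
For reference, filling the token from a raw response dictionary (the token value is illustrative):

from synapseclient.models import QueryNextPageToken

next_page = QueryNextPageToken.fill_from_dict(
    {
        "concreteType": "org.sagebionetworks.repo.model.table.QueryNextPageToken",
        "entityId": "syn1234",
        "token": "an-opaque-token",
    }
)
print(next_page.token)  # "an-opaque-token"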

Functions

fill_from_dict classmethod

fill_from_dict(data: Dict[str, Any]) -> QueryNextPageToken

Create a QueryNextPageToken from a dictionary response.

Source code in synapseclient/models/table_components.py
@classmethod
def fill_from_dict(cls, data: Dict[str, Any]) -> "QueryNextPageToken":
    """Create a QueryNextPageToken from a dictionary response."""
    return cls(
        concrete_type=data.get("concreteType"),
        entity_id=data.get("entityId"),
        token=data.get("token"),
    )

synapseclient.models.QueryResult dataclass

A page of query result.

This result is modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/QueryResult.html

Source code in synapseclient/models/table_components.py
@dataclass
class QueryResult:
    """
    A page of query result.

    This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/QueryResult.html>
    """

    query_results: RowSet
    """Represents a set of row of a TableEntity (RowSet)"""

    concrete_type: str = QUERY_RESULT
    """The concrete type of this object"""

    next_page_token: Optional[QueryNextPageToken] = None
    """Token for retrieving the next page of results, if available"""

    @classmethod
    def fill_from_dict(cls, data: Dict[str, Any]) -> "QueryResult":
        """Create a QueryResult from a dictionary response."""
        next_page_token = None
        query_results = data.get("queryResults", None)

        if data.get("nextPageToken", None):
            next_page_token = QueryNextPageToken.fill_from_dict(data["nextPageToken"])

        if data.get("queryResults", None):
            query_results = RowSet.fill_from_dict(data["queryResults"])

        return cls(
            concrete_type=data.get("concreteType"),
            query_results=query_results,
            next_page_token=next_page_token,
        )

Attributes

query_results instance-attribute

query_results: RowSet

Represents a set of rows of a TableEntity (RowSet)

concrete_type class-attribute instance-attribute

concrete_type: str = QUERY_RESULT

The concrete type of this object

next_page_token class-attribute instance-attribute

next_page_token: Optional[QueryNextPageToken] = None

Token for retrieving the next page of results, if available

Functions

fill_from_dict classmethod

fill_from_dict(data: Dict[str, Any]) -> QueryResult

Create a QueryResult from a dictionary response.

Source code in synapseclient/models/table_components.py
@classmethod
def fill_from_dict(cls, data: Dict[str, Any]) -> "QueryResult":
    """Create a QueryResult from a dictionary response."""
    next_page_token = None
    query_results = data.get("queryResults", None)

    if data.get("nextPageToken", None):
        next_page_token = QueryNextPageToken.fill_from_dict(data["nextPageToken"])

    if data.get("queryResults", None):
        query_results = RowSet.fill_from_dict(data["queryResults"])

    return cls(
        concrete_type=data.get("concreteType"),
        query_results=query_results,
        next_page_token=next_page_token,
    )

synapseclient.models.QueryResultBundle dataclass

A bundle of information about a query result.

This result is modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/QueryResultBundle.html

Source code in synapseclient/models/table_components.py
@dataclass
class QueryResultBundle:
    """
    A bundle of information about a query result.

    This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/QueryResultBundle.html>
    """

    concrete_type: str = QUERY_TABLE_CSV_REQUEST
    """The concrete type of this object"""

    query_result: QueryResult = None
    """A page of query result"""

    query_count: Optional[int] = None
    """The total number of rows that match the query. Use mask = 0x2 to include in the
    bundle."""

    select_columns: Optional[List[SelectColumn]] = None
    """The list of SelectColumns from the select clause. Use mask = 0x4 to include in
    the bundle."""

    max_rows_per_page: Optional[int] = None
    """The maximum number of rows that can be retrieved in a single call. This is a
    function of the columns that are selected in the query. Use mask = 0x8 to include
    in the bundle."""

    column_models: Optional[List[Column]] = None
    """The list of ColumnModels for the table. Use mask = 0x10 to include in the bundle."""

    facets: Optional[List[Dict[str, Any]]] = None
    """TODO: create facets dataclass"""
    """The list of facets for the search results. Use mask = 0x20 to include in the bundle."""

    sum_file_sizes: Optional[SumFileSizes] = None
    """The sum of the file size for all files in the given view query. Use mask = 0x40
    to include in the bundle."""

    last_updated_on: Optional[str] = None
    """The date-time when this table/view was last updated. Note: Since views are
    eventually consistent a view might still be out-of-date even if it was recently
    updated. Use mask = 0x80 to include in the bundle. This is returned in the
    ISO8601 format like `2000-01-01T00:00:00.000Z`."""

    combined_sql: Optional[str] = None
    """The SQL that is combination of a the input SQL, FacetRequests, AdditionalFilters,
    Sorting, and Pagination. Use mask = 0x100 to include in the bundle."""

    actions_required: Optional[List[ActionRequiredCount]] = None
    """The first 50 actions required to download the files that are part of the query.
    Use mask = 0x200 to include them in the bundle."""

    @classmethod
    def fill_from_dict(cls, data: Dict[str, Any]) -> "QueryResultBundle":
        """Create a QueryResultBundle from a dictionary response."""
        # Handle sum_file_sizes
        sum_file_sizes = None
        sum_file_sizes_data = data.get("sumFileSizes")
        if sum_file_sizes_data:
            sum_file_sizes = SumFileSizes(
                sum_file_size_bytes=sum_file_sizes_data.get("sumFileSizesBytes"),
                greater_than=sum_file_sizes_data.get("greaterThan"),
            )

        # Handle query_result
        query_result = None
        query_result_data = data.get("queryResult")
        if query_result_data:
            query_result = QueryResult.fill_from_dict(query_result_data)

        # Handle select_columns
        select_columns = None
        select_columns_data = data.get("selectColumns")
        if select_columns_data and isinstance(select_columns_data, list):
            select_columns = [
                SelectColumn.fill_from_dict(col) for col in select_columns_data
            ]

        # Handle actions_required
        actions_required = None
        actions_required_data = data.get("actionsRequired")
        if actions_required_data and isinstance(actions_required_data, list):
            actions_required = [
                ActionRequiredCount.fill_from_dict(action)
                for action in actions_required_data
            ]

        # Handle column_models
        column_models = None
        column_models_data = data.get("columnModels")
        if column_models_data and isinstance(column_models_data, list):
            column_models = [Column().fill_from_dict(col) for col in column_models_data]

        return cls(
            concrete_type=data.get("concreteType"),
            query_result=query_result,
            query_count=data.get("queryCount"),
            select_columns=select_columns,
            max_rows_per_page=data.get("maxRowsPerPage"),
            column_models=column_models,
            facets=data.get("facets"),
            sum_file_sizes=sum_file_sizes,
            last_updated_on=data.get("lastUpdatedOn"),
            combined_sql=data.get("combinedSql"),
            actions_required=actions_required,
        )

Attributes

concrete_type class-attribute instance-attribute

concrete_type: str = QUERY_TABLE_CSV_REQUEST

The concrete type of this object

query_result class-attribute instance-attribute

query_result: QueryResult = None

A page of query result

query_count class-attribute instance-attribute

query_count: Optional[int] = None

The total number of rows that match the query. Use mask = 0x2 to include in the bundle.

select_columns class-attribute instance-attribute

select_columns: Optional[List[SelectColumn]] = None

The list of SelectColumns from the select clause. Use mask = 0x4 to include in the bundle.

max_rows_per_page class-attribute instance-attribute

max_rows_per_page: Optional[int] = None

The maximum number of rows that can be retrieved in a single call. This is a function of the columns that are selected in the query. Use mask = 0x8 to include in the bundle.

column_models class-attribute instance-attribute

column_models: Optional[List[Column]] = None

The list of ColumnModels for the table. Use mask = 0x10 to include in the bundle.

facets class-attribute instance-attribute

facets: Optional[List[Dict[str, Any]]] = None

The list of facets for the search results. Use mask = 0x20 to include in the bundle. (TODO: create a facets dataclass.)

sum_file_sizes class-attribute instance-attribute

sum_file_sizes: Optional[SumFileSizes] = None

The sum of the file size for all files in the given view query. Use mask = 0x40 to include in the bundle.

last_updated_on class-attribute instance-attribute

last_updated_on: Optional[str] = None

The date-time when this table/view was last updated. Note: Since views are eventually consistent a view might still be out-of-date even if it was recently updated. Use mask = 0x80 to include in the bundle. This is returned in the ISO8601 format like 2000-01-01T00:00:00.000Z.

combined_sql class-attribute instance-attribute

combined_sql: Optional[str] = None

The SQL that is a combination of the input SQL, FacetRequests, AdditionalFilters, Sorting, and Pagination. Use mask = 0x100 to include in the bundle.

actions_required class-attribute instance-attribute

actions_required: Optional[List[ActionRequiredCount]] = None

The first 50 actions required to download the files that are part of the query. Use mask = 0x200 to include them in the bundle.

Functions

fill_from_dict classmethod

fill_from_dict(data: Dict[str, Any]) -> QueryResultBundle

Create a QueryResultBundle from a dictionary response.

Source code in synapseclient/models/table_components.py
@classmethod
def fill_from_dict(cls, data: Dict[str, Any]) -> "QueryResultBundle":
    """Create a QueryResultBundle from a dictionary response."""
    # Handle sum_file_sizes
    sum_file_sizes = None
    sum_file_sizes_data = data.get("sumFileSizes")
    if sum_file_sizes_data:
        sum_file_sizes = SumFileSizes(
            sum_file_size_bytes=sum_file_sizes_data.get("sumFileSizesBytes"),
            greater_than=sum_file_sizes_data.get("greaterThan"),
        )

    # Handle query_result
    query_result = None
    query_result_data = data.get("queryResult")
    if query_result_data:
        query_result = QueryResult.fill_from_dict(query_result_data)

    # Handle select_columns
    select_columns = None
    select_columns_data = data.get("selectColumns")
    if select_columns_data and isinstance(select_columns_data, list):
        select_columns = [
            SelectColumn.fill_from_dict(col) for col in select_columns_data
        ]

    # Handle actions_required
    actions_required = None
    actions_required_data = data.get("actionsRequired")
    if actions_required_data and isinstance(actions_required_data, list):
        actions_required = [
            ActionRequiredCount.fill_from_dict(action)
            for action in actions_required_data
        ]

    # Handle column_models
    column_models = None
    column_models_data = data.get("columnModels")
    if column_models_data and isinstance(column_models_data, list):
        column_models = [Column().fill_from_dict(col) for col in column_models_data]

    return cls(
        concrete_type=data.get("concreteType"),
        query_result=query_result,
        query_count=data.get("queryCount"),
        select_columns=select_columns,
        max_rows_per_page=data.get("maxRowsPerPage"),
        column_models=column_models,
        facets=data.get("facets"),
        sum_file_sizes=sum_file_sizes,
        last_updated_on=data.get("lastUpdatedOn"),
        combined_sql=data.get("combinedSql"),
        actions_required=actions_required,
    )
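
A minimal sketch of building a QueryResultBundle from a raw response dictionary. The keys mirror the REST fields handled by fill_from_dict above; the values are placeholders, and the import assumes QueryResultBundle is exposed from synapseclient.models like the other dataclasses on this page.

from synapseclient.models import QueryResultBundle

# Hypothetical query-bundle response; only keys read by fill_from_dict are shown.
response = {
    "concreteType": "org.sagebionetworks.repo.model.table.QueryResultBundle",
    "queryCount": 42,
    "maxRowsPerPage": 5000,
    "lastUpdatedOn": "2000-01-01T00:00:00.000Z",
    "sumFileSizes": {"sumFileSizesBytes": 1024, "greaterThan": False},
}

bundle = QueryResultBundle.fill_from_dict(response)
print(bundle.query_count)                          # 42
print(bundle.sum_file_sizes.sum_file_size_bytes)   # 1024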

synapseclient.models.QueryResultOutput dataclass

The result of querying Synapse with an included part_mask. This class contains a subset of the available items that may be returned by specifying a part_mask.

Source code in synapseclient/models/table_components.py
@dataclass
class QueryResultOutput:
    """
    The result of querying Synapse with an included `part_mask`. This class contains a
    subset of the available items that may be returned by specifying a `part_mask`.
    """

    result: "DATA_FRAME_TYPE"
    """The result of the query"""

    count: Optional[int] = None
    """The total number of rows that match the query. Use mask = 0x2 to include in the
    bundle."""

    sum_file_sizes: Optional[SumFileSizes] = None
    """The sum of the file size for all files in the given view query. Use mask = 0x40
    to include in the bundle."""

    last_updated_on: Optional[str] = None
    """The date-time when this table/view was last updated. Note: Since views are
    eventually consistent a view might still be out-of-date even if it was recently
    updated. Use mask = 0x80 to include in the bundle. This is returned in the
    ISO8601 format like `2000-01-01T00:00:00.000Z`."""

    @classmethod
    def fill_from_dict(
        cls, result: "DATA_FRAME_TYPE", data: Dict[str, Any]
    ) -> "QueryResultOutput":
        """
        Create a QueryResultOutput from a result DataFrame and dictionary response.

        Arguments:
            result: The pandas DataFrame result from the query.
            data: The dictionary response from the REST API containing metadata.

        Returns:
            A QueryResultOutput instance.
        """
        sum_file_sizes = (
            SumFileSizes(
                sum_file_size_bytes=data["sum_file_sizes"].sum_file_size_bytes,
                greater_than=data["sum_file_sizes"].greater_than,
            )
            if data.get("sum_file_sizes")
            else None
        )

        return cls(
            result=result,
            count=data.get("count", None),
            sum_file_sizes=sum_file_sizes,
            last_updated_on=data.get("last_updated_on", None),
        )

Attributes

result instance-attribute

result: DATA_FRAME_TYPE

The result of the query

count class-attribute instance-attribute

count: Optional[int] = None

The total number of rows that match the query. Use mask = 0x2 to include in the bundle.

sum_file_sizes class-attribute instance-attribute

sum_file_sizes: Optional[SumFileSizes] = None

The sum of the file size for all files in the given view query. Use mask = 0x40 to include in the bundle.

last_updated_on class-attribute instance-attribute

last_updated_on: Optional[str] = None

The date-time when this table/view was last updated. Note: Since views are eventually consistent a view might still be out-of-date even if it was recently updated. Use mask = 0x80 to include in the bundle. This is returned in the ISO8601 format like 2000-01-01T00:00:00.000Z.

Functions

fill_from_dict classmethod

fill_from_dict(result: DATA_FRAME_TYPE, data: Dict[str, Any]) -> QueryResultOutput

Create a QueryResultOutput from a result DataFrame and dictionary response.

PARAMETER DESCRIPTION
result

The pandas DataFrame result from the query.

TYPE: DATA_FRAME_TYPE

data

The dictionary response from the REST API containing metadata.

TYPE: Dict[str, Any]

RETURNS DESCRIPTION
QueryResultOutput

A QueryResultOutput instance.

Source code in synapseclient/models/table_components.py
@classmethod
def fill_from_dict(
    cls, result: "DATA_FRAME_TYPE", data: Dict[str, Any]
) -> "QueryResultOutput":
    """
    Create a QueryResultOutput from a result DataFrame and dictionary response.

    Arguments:
        result: The pandas DataFrame result from the query.
        data: The dictionary response from the REST API containing metadata.

    Returns:
        A QueryResultOutput instance.
    """
    sum_file_sizes = (
        SumFileSizes(
            sum_file_size_bytes=data["sum_file_sizes"].sum_file_size_bytes,
            greater_than=data["sum_file_sizes"].greater_than,
        )
        if data.get("sum_file_sizes")
        else None
    )

    return cls(
        result=result,
        count=data.get("count", None),
        sum_file_sizes=sum_file_sizes,
        last_updated_on=data.get("last_updated_on", None),
    )
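
A minimal sketch of wrapping a pandas DataFrame and query metadata into a QueryResultOutput. The DataFrame content and metadata values are placeholders; fill_from_dict expects snake_case keys and an already-constructed SumFileSizes object, and the import of SumFileSizes from synapseclient.models is an assumption based on how QueryResultBundle constructs it above.

import pandas as pd

from synapseclient.models import QueryResultOutput, SumFileSizes

# Placeholder query result.
df = pd.DataFrame({"id": ["syn123", "syn456"], "score": [1.5, 2.5]})

# Metadata keyed the way fill_from_dict reads it.
metadata = {
    "count": 2,
    "sum_file_sizes": SumFileSizes(sum_file_size_bytes=2048, greater_than=False),
    "last_updated_on": "2000-01-01T00:00:00.000Z",
}

output = QueryResultOutput.fill_from_dict(result=df, data=metadata)
print(output.count)  # 2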

synapseclient.models.Row dataclass

Represents a single row of a TableEntity.

This result is modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/Row.html

Source code in synapseclient/models/table_components.py
@dataclass
class Row:
    """
    Represents a single row of a TableEntity.

    This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/Row.html>
    """

    row_id: Optional[int] = None
    """The immutable ID issued to a new row."""

    version_number: Optional[int] = None
    """The version number of this row. Each row version is immutable, so when a row
    is updated a new version is created."""

    etag: Optional[str] = None
    """For queries against EntityViews with query.includeEntityEtag=true, this field
    will contain the etag of the entity. Will be null for all other cases."""

    values: Optional[List[str]] = None
    """The values for each column of this row. To delete a row, set this to an empty list: []"""

    def to_boolean(value):
        """
        Convert a string to boolean, case insensitively,
        where true values are: true, t, and 1 and false values are: false, f, 0.
        Raise a ValueError for all other values.
        """
        if value is None:
            raise ValueError("Can't convert None to boolean.")

        if isinstance(value, bool):
            return value

        if isinstance(value, str):
            lower_value = value.lower()
            if lower_value in ["true", "t", "1"]:
                return True
            if lower_value in ["false", "f", "0"]:
                return False

        raise ValueError(f"Can't convert {value} to boolean.")

    @staticmethod
    def cast_values(values, headers):
        """
        Convert a row of table query results from strings to the correct column type.

        See: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/ColumnType.html>
        """
        if len(values) != len(headers):
            raise ValueError(
                f"The number of columns in the csv file does not match the given headers. {len(values)} fields, {len(headers)} headers"
            )

        result = []
        for header, field in zip(headers, values):  # noqa: F402
            columnType = header.get("columnType", "STRING")

            # convert field to column type
            if field is None or field == "":
                result.append(None)
            elif columnType in {
                "STRING",
                "ENTITYID",
                "FILEHANDLEID",
                "LARGETEXT",
                "USERID",
                "LINK",
            }:
                result.append(field)
            elif columnType == "DOUBLE":
                result.append(float(field))
            elif columnType == "INTEGER":
                result.append(int(field))
            elif columnType == "BOOLEAN":
                result.append(Row.to_boolean(field))
            elif columnType == "DATE":
                result.append(from_unix_epoch_time(field))
            elif columnType in {
                "STRING_LIST",
                "INTEGER_LIST",
                "BOOLEAN_LIST",
                "ENTITYID_LIST",
                "USERID_LIST",
            }:
                result.append(json.loads(field))
            elif columnType == "DATE_LIST":
                result.append(json.loads(field, parse_int=from_unix_epoch_time))
            else:
                # default to string for unknown column type
                result.append(field)

        return result

    @classmethod
    def fill_from_dict(cls, data: Dict[str, Any]) -> "Row":
        """Create a Row from a dictionary response."""
        return cls(
            row_id=data.get("rowId"),
            version_number=data.get("versionNumber"),
            etag=data.get("etag"),
            values=data.get("values"),
        )

Attributes

row_id class-attribute instance-attribute

row_id: Optional[int] = None

The immutable ID issued to a new row.

version_number class-attribute instance-attribute

version_number: Optional[int] = None

The version number of this row. Each row version is immutable, so when a row is updated a new version is created.

etag class-attribute instance-attribute

etag: Optional[str] = None

For queries against EntityViews with query.includeEntityEtag=true, this field will contain the etag of the entity. Will be null for all other cases.

values class-attribute instance-attribute

values: Optional[List[str]] = None

The values for each column of this row. To delete a row, set this to an empty list: []

Functions

to_boolean

to_boolean(value)

Convert a string to boolean, case insensitively, where true values are: true, t, and 1 and false values are: false, f, 0. Raise a ValueError for all other values.

Source code in synapseclient/models/table_components.py
def to_boolean(value):
    """
    Convert a string to boolean, case insensitively,
    where true values are: true, t, and 1 and false values are: false, f, 0.
    Raise a ValueError for all other values.
    """
    if value is None:
        raise ValueError("Can't convert None to boolean.")

    if isinstance(value, bool):
        return value

    if isinstance(value, str):
        lower_value = value.lower()
        if lower_value in ["true", "t", "1"]:
            return True
        if lower_value in ["false", "f", "0"]:
            return False

    raise ValueError(f"Can't convert {value} to boolean.")
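
For example, the conversion rules above behave as follows (to_boolean is called on the class, matching how cast_values uses it):

from synapseclient.models import Row

print(Row.to_boolean("TRUE"))  # True  (case-insensitive)
print(Row.to_boolean("0"))     # False
print(Row.to_boolean(True))    # True  (booleans pass through unchanged)

try:
    Row.to_boolean("maybe")
except ValueError as error:
    print(error)               # Can't convert maybe to boolean.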

cast_values staticmethod

cast_values(values, headers)

Convert a row of table query results from strings to the correct column type.

See: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/ColumnType.html

Source code in synapseclient/models/table_components.py
@staticmethod
def cast_values(values, headers):
    """
    Convert a row of table query results from strings to the correct column type.

    See: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/ColumnType.html>
    """
    if len(values) != len(headers):
        raise ValueError(
            f"The number of columns in the csv file does not match the given headers. {len(values)} fields, {len(headers)} headers"
        )

    result = []
    for header, field in zip(headers, values):  # noqa: F402
        columnType = header.get("columnType", "STRING")

        # convert field to column type
        if field is None or field == "":
            result.append(None)
        elif columnType in {
            "STRING",
            "ENTITYID",
            "FILEHANDLEID",
            "LARGETEXT",
            "USERID",
            "LINK",
        }:
            result.append(field)
        elif columnType == "DOUBLE":
            result.append(float(field))
        elif columnType == "INTEGER":
            result.append(int(field))
        elif columnType == "BOOLEAN":
            result.append(Row.to_boolean(field))
        elif columnType == "DATE":
            result.append(from_unix_epoch_time(field))
        elif columnType in {
            "STRING_LIST",
            "INTEGER_LIST",
            "BOOLEAN_LIST",
            "ENTITYID_LIST",
            "USERID_LIST",
        }:
            result.append(json.loads(field))
        elif columnType == "DATE_LIST":
            result.append(json.loads(field, parse_int=from_unix_epoch_time))
        else:
            # default to string for unknown column type
            result.append(field)

    return result
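
A short illustration of the casting rules above; the header dictionaries are shaped like SelectColumn REST objects and the values are placeholders arriving as strings from a query:

from synapseclient.models import Row

headers = [
    {"name": "name", "columnType": "STRING"},
    {"name": "age", "columnType": "INTEGER"},
    {"name": "active", "columnType": "BOOLEAN"},
    {"name": "tags", "columnType": "STRING_LIST"},
]
values = ["alice", "30", "true", '["a", "b"]']

print(Row.cast_values(values, headers))
# ['alice', 30, True, ['a', 'b']]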

fill_from_dict classmethod

fill_from_dict(data: Dict[str, Any]) -> Row

Create a Row from a dictionary response.

Source code in synapseclient/models/table_components.py
@classmethod
def fill_from_dict(cls, data: Dict[str, Any]) -> "Row":
    """Create a Row from a dictionary response."""
    return cls(
        row_id=data.get("rowId"),
        version_number=data.get("versionNumber"),
        etag=data.get("etag"),
        values=data.get("values"),
    )

synapseclient.models.RowSet dataclass

Represents a set of rows of a TableEntity.

This result is modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/RowSet.html

Source code in synapseclient/models/table_components.py
@dataclass
class RowSet:
    """
    Represents a set of rows of a TableEntity.

    This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/RowSet.html>
    """

    concrete_type: Optional[str] = None
    """The concrete type of this object"""

    table_id: Optional[str] = None
    """The ID of the TableEntity than owns these rows"""

    etag: Optional[str] = None
    """Any RowSet returned from Synapse will contain the current etag of the change set.
    To update any rows from a RowSet the etag must be provided with the POST."""

    headers: Optional[List[SelectColumn]] = None
    """The list of SelectColumns that describes the rows of this set."""

    rows: Optional[List[Row]] = field(default_factory=list)
    """The Rows of this set. The index of each row value aligns with the index of each header."""

    @classmethod
    def cast_row(
        cls, row: Dict[str, Any], headers: List[Dict[str, Any]]
    ) -> Dict[str, Any]:
        """
        Cast the values in a single row to their appropriate column types.

        This method takes a row dictionary containing string values from a table query
        response and converts them to the correct Python types based on the column
        headers. For example, converts string "123" to integer 123 for INTEGER columns,
        or string "true" to boolean True for BOOLEAN columns.

        Arguments:
            row: A dictionary representing a single table row whose 'values' entries need to be cast to proper types.
            headers: A list of header dictionaries, each containing column metadata
                including 'columnType' which determines how to cast the corresponding
                value in the row.

        Returns:
            The same row dictionary with the 'values' field updated to contain
            properly typed values instead of strings.
        """
        row["values"] = Row.cast_values(row["values"], headers)
        return row

    @classmethod
    def cast_row_set(cls, rows: List[Row], headers: List[Dict[str, Any]]) -> List[Row]:
        """
        Cast the values in multiple rows to their appropriate column types.

        This method takes a list of row dictionaries containing string values from a table query
        response and converts them to the correct Python types based on the column headers.
        It applies the same type casting logic as `cast_row` to each row in the collection.

        Arguments:
            rows: A list of row dictionaries, each representing a single table row whose
                'values' field contains a list of string values that need to be cast to proper types.
            headers: A list of header dictionaries, each containing column metadata
                including 'columnType' which determines how to cast the corresponding
                values in each row.

        Returns:
            A list of row dictionaries with the 'values' field in each row updated to
            contain properly typed values instead of strings.
        """
        rows = [cls.cast_row(row, headers) for row in rows]
        return rows

    @classmethod
    def fill_from_dict(cls, data: Dict[str, Any]) -> "RowSet":
        """Create a RowSet from a dictionary response."""
        headers_data = data.get("headers")
        rows_data = data.get("rows")

        # Handle headers - convert to SelectColumn objects
        headers = None
        if headers_data and isinstance(headers_data, list):
            headers = [SelectColumn.fill_from_dict(header) for header in headers_data]

        # Handle rows - cast values and convert to Row objects
        rows = None
        if rows_data and isinstance(rows_data, list):
            # Cast row values based on header types if headers are available
            if headers_data and isinstance(headers_data, list):
                rows_data = cls.cast_row_set(rows_data, headers_data)
            # Convert to Row objects
            rows = [Row.fill_from_dict(row) for row in rows_data]

        return cls(
            concrete_type=data.get("concreteType"),
            table_id=data.get("tableId"),
            etag=data.get("etag"),
            headers=headers,
            rows=rows,
        )

Attributes

concrete_type class-attribute instance-attribute

concrete_type: Optional[str] = None

The concrete type of this object

table_id class-attribute instance-attribute

table_id: Optional[str] = None

The ID of the TableEntity that owns these rows

etag class-attribute instance-attribute

etag: Optional[str] = None

Any RowSet returned from Synapse will contain the current etag of the change set. To update any rows from a RowSet the etag must be provided with the POST.

headers class-attribute instance-attribute

headers: Optional[List[SelectColumn]] = None

The list of SelectColumns that describes the rows of this set.

rows class-attribute instance-attribute

rows: Optional[List[Row]] = field(default_factory=list)

The Rows of this set. The index of each row value aligns with the index of each header.

Functions

cast_row classmethod

cast_row(row: Dict[str, Any], headers: List[Dict[str, Any]]) -> Dict[str, Any]

Cast the values in a single row to their appropriate column types.

This method takes a row dictionary containing string values from a table query response and converts them to the correct Python types based on the column headers. For example, converts string "123" to integer 123 for INTEGER columns, or string "true" to boolean True for BOOLEAN columns.

PARAMETER DESCRIPTION
row

A dictionary representing a single table row whose 'values' entries need to be cast to proper types.

TYPE: Dict[str, Any]

headers

A list of header dictionaries, each containing column metadata including 'columnType' which determines how to cast the corresponding value in the row.

TYPE: List[Dict[str, Any]]

RETURNS DESCRIPTION
Dict[str, Any]

The same row dictionary with the 'values' field updated to contain properly typed values instead of strings.

Source code in synapseclient/models/table_components.py
@classmethod
def cast_row(
    cls, row: Dict[str, Any], headers: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """
    Cast the values in a single row to their appropriate column types.

    This method takes a row dictionary containing string values from a table query
    response and converts them to the correct Python types based on the column
    headers. For example, converts string "123" to integer 123 for INTEGER columns,
    or string "true" to boolean True for BOOLEAN columns.

    Arguments:
        row: A dictionary representing a single table row whose 'values' entries need to be cast to proper types.
        headers: A list of header dictionaries, each containing column metadata
            including 'columnType' which determines how to cast the corresponding
            value in the row.

    Returns:
        The same row dictionary with the 'values' field updated to contain
        properly typed values instead of strings.
    """
    row["values"] = Row.cast_values(row["values"], headers)
    return row

cast_row_set classmethod

cast_row_set(rows: List[Row], headers: List[Dict[str, Any]]) -> List[Row]

Cast the values in multiple rows to their appropriate column types.

This method takes a list of row dictionaries containing string values from a table query response and converts them to the correct Python types based on the column headers. It applies the same type casting logic as cast_row to each row in the collection.

PARAMETER DESCRIPTION
rows

A list of row dictionaries, each representing a single table row whose 'values' field contains a list of string values that need to be cast to proper types.

TYPE: List[Row]

headers

A list of header dictionaries, each containing column metadata including 'columnType' which determines how to cast the corresponding values in each row.

TYPE: List[Dict[str, Any]]

RETURNS DESCRIPTION
List[Row]

A list of row dictionaries with the 'values' field in each row updated to contain properly typed values instead of strings.

Source code in synapseclient/models/table_components.py
@classmethod
def cast_row_set(cls, rows: List[Row], headers: List[Dict[str, Any]]) -> List[Row]:
    """
    Cast the values in multiple rows to their appropriate column types.

    This method takes a list of row dictionaries containing string values from a table query
    response and converts them to the correct Python types based on the column headers.
    It applies the same type casting logic as `cast_row` to each row in the collection.

    Arguments:
        rows: A list of row dictionaries, each representing a single table row whose
            'values' field contains a list of string values that need to be cast to proper types.
        headers: A list of header dictionaries, each containing column metadata
            including 'columnType' which determines how to cast the corresponding
            values in each row.

    Returns:
        A list of row dictionaries with the 'values' field in each row updated to
        contain properly typed values instead of strings.
    """
    rows = [cls.cast_row(row, headers) for row in rows]
    return rows

fill_from_dict classmethod

fill_from_dict(data: Dict[str, Any]) -> RowSet

Create a RowSet from a dictionary response.

Source code in synapseclient/models/table_components.py
@classmethod
def fill_from_dict(cls, data: Dict[str, Any]) -> "RowSet":
    """Create a RowSet from a dictionary response."""
    headers_data = data.get("headers")
    rows_data = data.get("rows")

    # Handle headers - convert to SelectColumn objects
    headers = None
    if headers_data and isinstance(headers_data, list):
        headers = [SelectColumn.fill_from_dict(header) for header in headers_data]

    # Handle rows - cast values and convert to Row objects
    rows = None
    if rows_data and isinstance(rows_data, list):
        # Cast row values based on header types if headers are available
        if headers_data and isinstance(headers_data, list):
            rows_data = cls.cast_row_set(rows_data, headers_data)
        # Convert to Row objects
        rows = [Row.fill_from_dict(row) for row in rows_data]

    return cls(
        concrete_type=data.get("concreteType"),
        table_id=data.get("tableId"),
        etag=data.get("etag"),
        headers=headers,
        rows=rows,
    )
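
A minimal sketch of converting a rowSet-style response into a RowSet; the table ID, etag, and row values are placeholders. The header columnType entries drive the casting applied to each row's string values before Row objects are built:

from synapseclient.models import RowSet

response = {
    "tableId": "syn1234",
    "etag": "00000000-0000-0000-0000-000000000000",
    "headers": [
        {"name": "name", "columnType": "STRING"},
        {"name": "age", "columnType": "INTEGER"},
    ],
    "rows": [
        {"rowId": 1, "versionNumber": 1, "values": ["alice", "30"]},
        {"rowId": 2, "versionNumber": 1, "values": ["bob", "41"]},
    ],
}

row_set = RowSet.fill_from_dict(response)
print(row_set.rows[0].values)          # ['alice', 30]
print(row_set.headers[1].column_type)  # ColumnType.INTEGER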

synapseclient.models.SelectColumn dataclass

A column model contains the metadata of a single column of a TableEntity.

This result is modeled from: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/SelectColumn.html

Source code in synapseclient/models/table_components.py
@dataclass
class SelectColumn:
    """
    A column model contains the metadata of a single column of a TableEntity.

    This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/SelectColumn.html>
    """

    name: Optional[str] = None
    """The required display name of the column"""

    column_type: Optional[ColumnType] = None
    """The column type determines the type of data that can be stored in a column.
    Switching between types (using a transaction with TableUpdateTransactionRequest
    in the "changes" list) is generally allowed except for switching to "_LIST"
    suffixed types. In such cases, a new column must be created and data must be
    copied over manually"""

    id: Optional[str] = None
    """The optional ID of the select column, if this is a direct column selected"""

    @classmethod
    def fill_from_dict(cls, data: Dict[str, Any]) -> "SelectColumn":
        """Create a SelectColumn from a dictionary response."""
        column_type = None
        column_type_value = data.get("columnType")
        if column_type_value:
            try:
                column_type = ColumnType(column_type_value)
            except ValueError:
                column_type = None
        return cls(
            name=data.get("name"),
            column_type=column_type,
            id=data.get("id"),
        )

Attributes

name class-attribute instance-attribute

name: Optional[str] = None

The required display name of the column

column_type class-attribute instance-attribute

column_type: Optional[ColumnType] = None

The column type determines the type of data that can be stored in a column. Switching between types (using a transaction with TableUpdateTransactionRequest in the "changes" list) is generally allowed except for switching to "_LIST" suffixed types. In such cases, a new column must be created and data must be copied over manually

id class-attribute instance-attribute

id: Optional[str] = None

The optional ID of the select column, if this is a direct column selected

Functions

fill_from_dict classmethod

fill_from_dict(data: Dict[str, Any]) -> SelectColumn

Create a SelectColumn from a dictionary response.

Source code in synapseclient/models/table_components.py
@classmethod
def fill_from_dict(cls, data: Dict[str, Any]) -> "SelectColumn":
    """Create a SelectColumn from a dictionary response."""
    column_type = None
    column_type_value = data.get("columnType")
    if column_type_value:
        try:
            column_type = ColumnType(column_type_value)
        except ValueError:
            column_type = None
    return cls(
        name=data.get("name"),
        column_type=column_type,
        id=data.get("id"),
    )
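
For example, a select-column dictionary from a query response can be converted as follows; unknown columnType strings are tolerated and simply left as None:

from synapseclient.models import SelectColumn

column = SelectColumn.fill_from_dict(
    {"name": "age", "columnType": "INTEGER", "id": "1234"}
)
print(column.column_type)  # ColumnType.INTEGER

unknown = SelectColumn.fill_from_dict({"name": "x", "columnType": "NOT_A_TYPE"})
print(unknown.column_type)  # None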

synapseclient.models.ColumnChange dataclass

A change to a column in a table. This is used in the TableSchemaChangeRequest to indicate what changes should be made to the columns in the table.

Source code in synapseclient/models/table_components.py
@dataclass
class ColumnChange:
    """
    A change to a column in a table. This is used in the `TableSchemaChangeRequest` to
    indicate what changes should be made to the columns in the table.
    """

    concrete_type: str = concrete_types.COLUMN_CHANGE

    old_column_id: Optional[str] = None
    """The ID of the old ColumnModel to be replaced with the new. Set to null to indicate a new column should be added without replacing an old column."""

    new_column_id: Optional[str] = None
    """The ID of the new ColumnModel to replace the old. Set to null to indicate the old column should be removed without being replaced."""

    def to_synapse_request(self):
        """Converts the request to a request expected of the Synapse REST API."""

        return {
            "concreteType": self.concrete_type,
            "oldColumnId": self.old_column_id,
            "newColumnId": self.new_column_id,
        }

Attributes

old_column_id class-attribute instance-attribute

old_column_id: Optional[str] = None

The ID of the old ColumnModel to be replaced with the new. Set to null to indicate a new column should be added without replacing an old column.

new_column_id class-attribute instance-attribute

new_column_id: Optional[str] = None

The ID of the new ColumnModel to replace the old. Set to null to indicate the old column should be removed without being replaced.

Functions

to_synapse_request

to_synapse_request()

Converts the request to a request expected of the Synapse REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self):
    """Converts the request to a request expected of the Synapse REST API."""

    return {
        "concreteType": self.concrete_type,
        "oldColumnId": self.old_column_id,
        "newColumnId": self.new_column_id,
    }
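
A minimal sketch of the three kinds of change a ColumnChange can express; the column IDs are placeholders, and the exact concreteType string in the request comes from concrete_types.COLUMN_CHANGE:

from synapseclient.models import ColumnChange

replace = ColumnChange(old_column_id="111", new_column_id="222")  # replace column 111 with 222
add_new = ColumnChange(old_column_id=None, new_column_id="333")   # add column 333
remove = ColumnChange(old_column_id="444", new_column_id=None)    # remove column 444

print(replace.to_synapse_request())
# {'concreteType': <concrete_types.COLUMN_CHANGE>, 'oldColumnId': '111', 'newColumnId': '222'}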

synapseclient.models.PartialRow dataclass

A partial row to be added to a table. This is used in the PartialRowSet to indicate what rows should be updated in a table during the upsert operation.

Source code in synapseclient/models/table_components.py
@dataclass
class PartialRow:
    """
    A partial row to be added to a table. This is used in the `PartialRowSet` to
    indicate what rows should be updated in a table during the upsert operation.
    """

    row_id: str
    values: List[Dict[str, Any]]
    etag: Optional[str] = None

    def to_synapse_request(self):
        """Converts the request to a request expected of the Synapse REST API."""
        result = {
            "etag": self.etag,
            "rowId": self.row_id,
            "values": self.values,
        }
        delete_none_keys(result)
        return result

    def size(self) -> int:
        """
        Returns the size of the PartialRow in bytes. This is not an exact size but
        follows the calculation as used in the Rest API:

        <https://github.com/Sage-Bionetworks/Synapse-Repository-Services/blob/8bf7f60c46b76625c0d4be33fafc5cf896e50b36/lib/lib-table-cluster/src/main/java/org/sagebionetworks/table/cluster/utils/TableModelUtils.java#L952-L965>
        """
        char_count = 0
        if self.values:
            for value in self.values:
                char_count += len(value["key"])
                if value["value"] is not None:
                    char_count += len(str(value["value"]))
        return 4 * char_count

Functions

to_synapse_request

to_synapse_request()

Converts the request to a request expected of the Synapse REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self):
    """Converts the request to a request expected of the Synapse REST API."""
    result = {
        "etag": self.etag,
        "rowId": self.row_id,
        "values": self.values,
    }
    delete_none_keys(result)
    return result

size

size() -> int

Returns the size of the PartialRow in bytes. This is not an exact size but follows the calculation as used in the REST API:

https://github.com/Sage-Bionetworks/Synapse-Repository-Services/blob/8bf7f60c46b76625c0d4be33fafc5cf896e50b36/lib/lib-table-cluster/src/main/java/org/sagebionetworks/table/cluster/utils/TableModelUtils.java#L952-L965

Source code in synapseclient/models/table_components.py
def size(self) -> int:
    """
    Returns the size of the PartialRow in bytes. This is not an exact size but
    follows the calculation as used in the Rest API:

    <https://github.com/Sage-Bionetworks/Synapse-Repository-Services/blob/8bf7f60c46b76625c0d4be33fafc5cf896e50b36/lib/lib-table-cluster/src/main/java/org/sagebionetworks/table/cluster/utils/TableModelUtils.java#L952-L965>
    """
    char_count = 0
    if self.values:
        for value in self.values:
            char_count += len(value["key"])
            if value["value"] is not None:
                char_count += len(str(value["value"]))
    return 4 * char_count
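
A sketch of building a PartialRow for an upsert; each values entry pairs a column identifier ("key", shown here with a placeholder ID) with the new cell value, matching how size() walks the list above:

from synapseclient.models import PartialRow

row = PartialRow(
    row_id="1",
    values=[
        {"key": "111", "value": "alice"},
        {"key": "222", "value": 30},
    ],
)
print(row.to_synapse_request())  # "etag" is dropped because it is None
print(row.size())                # approximate byte size used when batching requests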

synapseclient.models.PartialRowSet dataclass

A set of partial rows to be added to a table. This is used in the AppendableRowSetRequest to indicate what rows should be updated in a table during the upsert operation.

Source code in synapseclient/models/table_components.py
@dataclass
class PartialRowSet:
    """
    A set of partial rows to be added to a table. This is used in the
    `AppendableRowSetRequest` to indicate what rows should be updated in a table
    during the upsert operation.
    """

    table_id: str
    rows: List[PartialRow]
    concrete_type: str = concrete_types.PARTIAL_ROW_SET

    def to_synapse_request(self):
        """Converts the request to a request expected of the Synapse REST API."""
        return {
            "concreteType": self.concrete_type,
            "tableId": self.table_id,
            "rows": [row.to_synapse_request() for row in self.rows],
        }

Functions

to_synapse_request

to_synapse_request()

Converts the request to a request expected of the Synapse REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self):
    """Converts the request to a request expected of the Synapse REST API."""
    return {
        "concreteType": self.concrete_type,
        "tableId": self.table_id,
        "rows": [row.to_synapse_request() for row in self.rows],
    }

synapseclient.models.TableSchemaChangeRequest dataclass

A request to change the schema of a table. This is used to change the columns in a table. This request is used in the TableUpdateTransaction to indicate what changes should be made to the columns in the table.

Source code in synapseclient/models/table_components.py
@dataclass
class TableSchemaChangeRequest:
    """
    A request to change the schema of a table. This is used to change the columns in a
    table. This request is used in the `TableUpdateTransaction` to indicate what
    changes should be made to the columns in the table.
    """

    entity_id: str
    changes: List[ColumnChange]
    ordered_column_ids: List[str]
    concrete_type: str = concrete_types.TABLE_SCHEMA_CHANGE_REQUEST

    def to_synapse_request(self):
        """Converts the request to a request expected of the Synapse REST API."""
        return {
            "concreteType": self.concrete_type,
            "entityId": self.entity_id,
            "changes": [change.to_synapse_request() for change in self.changes],
            "orderedColumnIds": self.ordered_column_ids,
        }

Functions

to_synapse_request

to_synapse_request()

Converts the request to a request expected of the Synapse REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self):
    """Converts the request to a request expected of the Synapse REST API."""
    return {
        "concreteType": self.concrete_type,
        "entityId": self.entity_id,
        "changes": [change.to_synapse_request() for change in self.changes],
        "orderedColumnIds": self.ordered_column_ids,
    }
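
A minimal sketch that adds one new column and states the final column order; the Synapse ID and column IDs are placeholders:

from synapseclient.models import ColumnChange, TableSchemaChangeRequest

request = TableSchemaChangeRequest(
    entity_id="syn1234",
    changes=[ColumnChange(old_column_id=None, new_column_id="222")],
    ordered_column_ids=["111", "222"],
)
print(request.to_synapse_request())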

synapseclient.models.AppendableRowSetRequest dataclass

A request to append rows to a table. This request is used in the TableUpdateTransaction to indicate what rows should be upserted in the table.

Source code in synapseclient/models/table_components.py
@dataclass
class AppendableRowSetRequest:
    """
    A request to append rows to a table. This is used to append rows to a table. This
    request is used in the `TableUpdateTransaction` to indicate what rows should
    be upserted in the table.
    """

    entity_id: str
    to_append: PartialRowSet
    concrete_type: str = concrete_types.APPENDABLE_ROWSET_REQUEST

    def to_synapse_request(self):
        """Converts the request to a request expected of the Synapse REST API."""
        return {
            "concreteType": self.concrete_type,
            "entityId": self.entity_id,
            "toAppend": self.to_append.to_synapse_request(),
        }

Functions

to_synapse_request

to_synapse_request()

Converts the request to a request expected of the Synapse REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self):
    """Converts the request to a request expected of the Synapse REST API."""
    return {
        "concreteType": self.concrete_type,
        "entityId": self.entity_id,
        "toAppend": self.to_append.to_synapse_request(),
    }
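
A minimal sketch of wrapping a PartialRowSet into an AppendableRowSetRequest; the entity ID, row ID, and column identifier are placeholders:

from synapseclient.models import AppendableRowSetRequest, PartialRow, PartialRowSet

partial_rows = PartialRowSet(
    table_id="syn1234",
    rows=[PartialRow(row_id="1", values=[{"key": "111", "value": "alice"}])],
)
request = AppendableRowSetRequest(entity_id="syn1234", to_append=partial_rows)
print(request.to_synapse_request())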

synapseclient.models.UploadToTableRequest dataclass

A request to upload a file to a table. This is used to insert any rows via a CSV file into a table. This request is used in the TableUpdateTransaction.

Source code in synapseclient/models/table_components.py
@dataclass
class UploadToTableRequest:
    """
    A request to upload a file to a table. This is used to insert any rows via a CSV
    file into a table. This request is used in the `TableUpdateTransaction`.
    """

    table_id: str
    upload_file_handle_id: str
    update_etag: str
    lines_to_skip: int = 0
    csv_table_descriptor: CsvTableDescriptor = field(default_factory=CsvTableDescriptor)
    concrete_type: str = concrete_types.UPLOAD_TO_TABLE_REQUEST

    def to_synapse_request(self):
        """Converts the request to a request expected of the Synapse REST API."""
        request = {
            "concreteType": self.concrete_type,
            "tableId": self.table_id,
            "uploadFileHandleId": self.upload_file_handle_id,
            "updateEtag": self.update_etag,
            "linesToSkip": self.lines_to_skip,
            "csvTableDescriptor": self.csv_table_descriptor.to_synapse_request(),
        }

        delete_none_keys(request)
        return request

Functions

to_synapse_request

to_synapse_request()

Converts the request to a request expected of the Synapse REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self):
    """Converts the request to a request expected of the Synapse REST API."""
    request = {
        "concreteType": self.concrete_type,
        "tableId": self.table_id,
        "uploadFileHandleId": self.upload_file_handle_id,
        "updateEtag": self.update_etag,
        "linesToSkip": self.lines_to_skip,
        "csvTableDescriptor": self.csv_table_descriptor.to_synapse_request(),
    }

    delete_none_keys(request)
    return request
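
A minimal sketch of describing a previously uploaded CSV for insertion into a table; the table ID and file handle ID are placeholders that would normally come from an earlier upload step. None-valued keys such as updateEtag are removed from the request body:

from synapseclient.models import CsvTableDescriptor, UploadToTableRequest

request = UploadToTableRequest(
    table_id="syn1234",
    upload_file_handle_id="99999",
    update_etag=None,
    csv_table_descriptor=CsvTableDescriptor(separator=","),
)
print(request.to_synapse_request())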

synapseclient.models.TableUpdateTransaction dataclass

Bases: AsynchronousCommunicator

A request to update a table. This is used to update a table with a set of changes.

After calling the send_job_and_wait_async method the results attribute will be filled in based off https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/TableUpdateTransactionResponse.html.

Source code in synapseclient/models/table_components.py
@dataclass
class TableUpdateTransaction(AsynchronousCommunicator):
    """
    A request to update a table. This is used to update a table with a set of changes.

    After calling the `send_job_and_wait_async` method the `results` attribute will be
    filled in based off <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/TableUpdateTransactionResponse.html>.
    """

    entity_id: str
    concrete_type: str = concrete_types.TABLE_UPDATE_TRANSACTION_REQUEST
    create_snapshot: bool = False
    changes: Optional[
        List[
            Union[
                TableSchemaChangeRequest, UploadToTableRequest, AppendableRowSetRequest
            ]
        ]
    ] = None
    snapshot_options: Optional[SnapshotRequest] = None
    results: Optional[List[Dict[str, Any]]] = None
    snapshot_version_number: Optional[int] = None
    entities_with_changes_applied: Optional[List[str]] = None

    """This will be an array of
    <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/TableUpdateResponse.html>."""

    def to_synapse_request(self):
        """Converts the request to a request expected of the Synapse REST API."""
        request = {
            "concreteType": self.concrete_type,
            "entityId": self.entity_id,
            "createSnapshot": self.create_snapshot,
        }

        if self.changes:
            request["changes"] = [
                change.to_synapse_request() for change in self.changes
            ]
        if self.snapshot_options:
            request["snapshotOptions"] = self.snapshot_options.to_synapse_request()

        return request

    def fill_from_dict(self, synapse_response: Dict[str, str]) -> "Self":
        """
        Converts a response from the REST API into this dataclass.

        Arguments:
            synapse_response: The response from the REST API that matches <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/TableUpdateTransactionResponse.html>

        Returns:
            An instance of this class.
        """
        self.results = synapse_response.get("results", None)
        self.snapshot_version_number = synapse_response.get(
            "snapshotVersionNumber", None
        )

        if "results" in synapse_response:
            successful_entities = []
            for result in synapse_response["results"]:
                if "updateResults" in result:
                    for update_result in result["updateResults"]:
                        failure_code = update_result.get("failureCode", None)
                        failure_message = update_result.get("failureMessage", None)
                        entity_id = update_result.get("entityId", None)
                        if not failure_code and not failure_message and entity_id:
                            successful_entities.append(entity_id)
            if successful_entities:
                self.entities_with_changes_applied = successful_entities
        return self

Attributes

entities_with_changes_applied class-attribute instance-attribute

entities_with_changes_applied: Optional[List[str]] = None

The IDs of the entities whose changes were applied without a failure code or message, as determined from the job response by fill_from_dict.

Functions

to_synapse_request

to_synapse_request()

Converts the request to a request expected of the Synapse REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self):
    """Converts the request to a request expected of the Synapse REST API."""
    request = {
        "concreteType": self.concrete_type,
        "entityId": self.entity_id,
        "createSnapshot": self.create_snapshot,
    }

    if self.changes:
        request["changes"] = [
            change.to_synapse_request() for change in self.changes
        ]
    if self.snapshot_options:
        request["snapshotOptions"] = self.snapshot_options.to_synapse_request()

    return request

fill_from_dict

fill_from_dict(synapse_response: Dict[str, str]) -> Self

Converts a response from the REST API into this dataclass.

PARAMETER DESCRIPTION
synapse_response

The response from the REST API that matches https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/TableUpdateTransactionResponse.html.

TYPE: Dict[str, str]

RETURNS DESCRIPTION
Self

An instance of this class.

Source code in synapseclient/models/table_components.py
def fill_from_dict(self, synapse_response: Dict[str, str]) -> "Self":
    """
    Converts a response from the REST API into this dataclass.

    Arguments:
        synapse_response: The response from the REST API that matches <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/TableUpdateTransactionResponse.html>

    Returns:
        An instance of this class.
    """
    self.results = synapse_response.get("results", None)
    self.snapshot_version_number = synapse_response.get(
        "snapshotVersionNumber", None
    )

    if "results" in synapse_response:
        successful_entities = []
        for result in synapse_response["results"]:
            if "updateResults" in result:
                for update_result in result["updateResults"]:
                    failure_code = update_result.get("failureCode", None)
                    failure_message = update_result.get("failureMessage", None)
                    entity_id = update_result.get("entityId", None)
                    if not failure_code and not failure_message and entity_id:
                        successful_entities.append(entity_id)
        if successful_entities:
            self.entities_with_changes_applied = successful_entities
    return self
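
A minimal sketch of bundling a schema change into a transaction; the IDs are placeholders. Building the request is synchronous, while submitting it goes through the asynchronous job machinery (for example the send_job_and_wait_async method noted above), which is what fills in the results attribute:

from synapseclient.models import (
    ColumnChange,
    TableSchemaChangeRequest,
    TableUpdateTransaction,
)

schema_change = TableSchemaChangeRequest(
    entity_id="syn1234",
    changes=[ColumnChange(old_column_id="111", new_column_id="222")],
    ordered_column_ids=["222"],
)
transaction = TableUpdateTransaction(entity_id="syn1234", changes=[schema_change])
print(transaction.to_synapse_request())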

synapseclient.models.CsvTableDescriptor dataclass

Derived from https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/CsvTableDescriptor.html

Source code in synapseclient/models/table_components.py
@dataclass
class CsvTableDescriptor:
    """Derived from <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/CsvTableDescriptor.html>"""

    separator: str = ","
    """The delimiter to be used for separating entries in the resulting file. The default character ',' will be used if this is not provided by the caller. For tab-separated values use '\t'"""

    quote_character: str = '"'
    """The character to be used for quoted elements in the resulting file. The default character '"' will be used if this is not provided by the caller."""

    escape_character: str = "\\"
    """The escape character to be used for escaping a separator or quote in the resulting file. The default character '\\' will be used if this is not provided by the caller."""

    line_end: str = os.linesep
    """The line feed terminator to be used for the resulting file. The default value of '\n' will be used if this is not provided by the caller."""

    is_first_line_header: bool = True
    """Is the first line a header? The default value of 'true' will be used if this is not provided by the caller."""

    def to_synapse_request(self):
        """Converts the request to a request expected of the Synapse REST API."""
        request = {
            "separator": self.separator,
            "quoteCharacter": self.quote_character,
            "escapeCharacter": self.escape_character,
            "lineEnd": self.line_end,
            "isFirstLineHeader": self.is_first_line_header,
        }
        delete_none_keys(request)
        return request

Attributes

separator class-attribute instance-attribute

separator: str = ','

The delimiter to be used for separating entries in the resulting file. The default character ',' will be used if this is not provided by the caller. For tab-separated values use '\t'

quote_character class-attribute instance-attribute

quote_character: str = '"'

The character to be used for quoted elements in the resulting file. The default character '"' will be used if this is not provided by the caller.

escape_character class-attribute instance-attribute

escape_character: str = '\\'

The escape character to be used for escaping a separator or quote in the resulting file. The default character '\' will be used if this is not provided by the caller.

line_end class-attribute instance-attribute

line_end: str = linesep

The line feed terminator to be used for the resulting file. The default value of '\n' will be used if this is not provided by the caller.

is_first_line_header class-attribute instance-attribute

is_first_line_header: bool = True

Is the first line a header? The default value of 'true' will be used if this is not provided by the caller.

Functions

to_synapse_request

to_synapse_request()

Converts the request to a request expected of the Synapse REST API.

Source code in synapseclient/models/table_components.py
def to_synapse_request(self):
    """Converts the request to a request expected of the Synapse REST API."""
    request = {
        "separator": self.separator,
        "quoteCharacter": self.quote_character,
        "escapeCharacter": self.escape_character,
        "lineEnd": self.line_end,
        "isFirstLineHeader": self.is_first_line_header,
    }
    delete_none_keys(request)
    return request
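
For example, a tab-separated file without a header row can be described as follows; lineEnd defaults to os.linesep for the current platform:

from synapseclient.models import CsvTableDescriptor

descriptor = CsvTableDescriptor(separator="\t", is_first_line_header=False)
print(descriptor.to_synapse_request())
# {'separator': '\t', 'quoteCharacter': '"', 'escapeCharacter': '\\',
#  'lineEnd': <os.linesep>, 'isFirstLineHeader': False}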

synapseclient.models.mixins.table_components.csv_to_pandas_df

csv_to_pandas_df(filepath: Union[str, BytesIO], separator: str = DEFAULT_SEPARATOR, quote_char: str = DEFAULT_QUOTE_CHARACTER, escape_char: str = DEFAULT_ESCAPSE_CHAR, contain_headers: bool = True, lines_to_skip: int = 0, date_columns: Optional[List[str]] = None, list_columns: Optional[List[str]] = None, row_id_and_version_in_index: bool = True, dtype: Optional[Dict[str, Any]] = None, **kwargs) -> DATA_FRAME_TYPE

Convert a csv file to a pandas dataframe

PARAMETER DESCRIPTION
filepath

The path to the file.

TYPE: Union[str, BytesIO]

separator

The separator for the file, Defaults to DEFAULT_SEPARATOR. Passed as sep to pandas. If sep is supplied as a kwarg it will be used instead of this separator argument.

TYPE: str DEFAULT: DEFAULT_SEPARATOR

quote_char

The quote character for the file, Defaults to DEFAULT_QUOTE_CHARACTER. Passed as quotechar to pandas. If quotechar is supplied as a kwarg it will be used instead of this quote_char argument.

TYPE: str DEFAULT: DEFAULT_QUOTE_CHARACTER

escape_char

The escape character for the file, Defaults to DEFAULT_ESCAPSE_CHAR.

TYPE: str DEFAULT: DEFAULT_ESCAPSE_CHAR

contain_headers

Whether the file contains headers, Defaults to True.

TYPE: bool DEFAULT: True

lines_to_skip

The number of lines to skip at the beginning of the file, Defaults to 0. Passed as skiprows to pandas. If skiprows is supplied as a kwarg it will be used instead of this lines_to_skip argument.

TYPE: int DEFAULT: 0

date_columns

The names of the date columns in the file

TYPE: Optional[List[str]] DEFAULT: None

list_columns

The names of the list columns in the file

TYPE: Optional[List[str]] DEFAULT: None

row_id_and_version_in_index

Whether the file contains rowId and version in the index, Defaults to True.

TYPE: bool DEFAULT: True

dtype

The data type for the file, Defaults to None.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

**kwargs

Additional keyword arguments to pass to pandas.read_csv. See https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html for complete list of supported arguments.

DEFAULT: {}

RETURNS DESCRIPTION
DATA_FRAME_TYPE

A pandas dataframe

Source code in synapseclient/models/mixins/table_components.py
def csv_to_pandas_df(
    filepath: Union[str, BytesIO],
    separator: str = DEFAULT_SEPARATOR,
    quote_char: str = DEFAULT_QUOTE_CHARACTER,
    escape_char: str = DEFAULT_ESCAPSE_CHAR,
    contain_headers: bool = True,
    lines_to_skip: int = 0,
    date_columns: Optional[List[str]] = None,
    list_columns: Optional[List[str]] = None,
    row_id_and_version_in_index: bool = True,
    dtype: Optional[Dict[str, Any]] = None,
    **kwargs,
) -> DATA_FRAME_TYPE:
    """
    Convert a csv file to a pandas dataframe

    Arguments:
        filepath: The path to the file.
        separator: The separator for the file, Defaults to `DEFAULT_SEPARATOR`.
                    Passed as `sep` to pandas. If `sep` is supplied as a `kwarg`
                    it will be used instead of this `separator` argument.
        quote_char: The quote character for the file,
                    Defaults to `DEFAULT_QUOTE_CHARACTER`.
                    Passed as `quotechar` to pandas. If `quotechar` is supplied as a `kwarg`
                    it will be used instead of this `quote_char` argument.
        escape_char: The escape character for the file,
                    Defaults to `DEFAULT_ESCAPSE_CHAR`.
        contain_headers: Whether the file contains headers,
                    Defaults to `True`.
        lines_to_skip: The number of lines to skip at the beginning of the file,
                        Defaults to `0`. Passed as `skiprows` to pandas.
                        If `skiprows` is supplied as a `kwarg`
                        it will be used instead of this `lines_to_skip` argument.
        date_columns: The names of the date columns in the file
        list_columns: The names of the list columns in the file
        row_id_and_version_in_index: Whether the file contains rowId and
                                version in the index, Defaults to `True`.
        dtype: The data type for the file, Defaults to `None`.
        **kwargs: Additional keyword arguments to pass to pandas.read_csv. See
                    https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
                    for complete list of supported arguments.

    Returns:
        A pandas dataframe
    """
    test_import_pandas()
    from pandas import read_csv

    line_terminator = str(os.linesep)

    pandas_args = {
        "dtype": dtype,
        "sep": separator,
        "quotechar": quote_char,
        "escapechar": escape_char,
        "header": 0 if contain_headers else None,
        "skiprows": lines_to_skip,
    }
    pandas_args.update(kwargs)

    # assign line terminator only for single character
    # line terminators (e.g. not '\r\n') because pandas doesn't
    # support longer line terminators. See: <https://github.com/pydata/pandas/issues/3501>
    # "ValueError: Only length-1 line terminators supported"
    df = read_csv(
        filepath,
        lineterminator=line_terminator if len(line_terminator) == 1 else None,
        **pandas_args,
    )

    # parse date columns if exists
    if date_columns:
        df = _convert_df_date_cols_to_datetime(df, date_columns)
    # Turn list columns into lists
    if list_columns:
        for col in list_columns:
            # Fill NA values with empty lists, it must be a string for json.loads to work
            df.fillna({col: "[]"}, inplace=True)
            df[col] = df[col].apply(json.loads)

    if (
        row_id_and_version_in_index
        and "ROW_ID" in df.columns
        and "ROW_VERSION" in df.columns
    ):
        # combine row-ids (in index) and row-versions (in column 0) to
        # make new row labels consisting of the row id and version
        # separated by a dash.
        zip_args = [df["ROW_ID"], df["ROW_VERSION"]]
        if "ROW_ETAG" in df.columns:
            zip_args.append(df["ROW_ETAG"])

        df.index = _row_labels_from_id_and_version(zip(*zip_args))
        del df["ROW_ID"]
        del df["ROW_VERSION"]
        if "ROW_ETAG" in df.columns:
            del df["ROW_ETAG"]

    return df
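
As a usage sketch (the file path and column names below are hypothetical placeholders), csv_to_pandas_df can be called directly on a CSV that was downloaded from a Synapse table query:

from synapseclient.models.mixins.table_components import csv_to_pandas_df

# Hypothetical CSV produced by a table query; the column names passed to
# date_columns, list_columns, and dtype are placeholders for illustration.
df = csv_to_pandas_df(
    filepath="/tmp/query_results.csv",
    date_columns=["collected_on"],    # converted to datetime values
    list_columns=["tissue_types"],    # JSON-decoded into Python lists
    dtype={"participant_id": str},    # forwarded to pandas.read_csv
)

# Any pandas.read_csv keyword argument can be passed through **kwargs; for
# example, supplying sep here overrides the separator argument entirely.
df_tab = csv_to_pandas_df("/tmp/query_results.tsv", sep="\t")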