
Data sources

import groundwork.core.datasources

Datasource

Abstract interface for reading from an external resource.

For most REST APIs, unless you are wrapping an existing client library, you probably want to use the subclass RestDatasource instead of this class.

Constructor:

Datasource(**kwargs)

Class variables

identifer

str

An attribute of ResourceT whose value, when passed to get(), re-fetches the resource.

This is usually id, which is the default.

resource_type

Type[~ResourceT]

Class that API responses should be deserialized into.
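The two class variables work together: resource_type is what get() and list() return, and the value of the identifer attribute on a returned resource can be passed back to get() to re-fetch it. The sketch below illustrates that contract with a hypothetical dictionary-backed class (DictDatasource) and a hypothetical Widget resource; it is not the library's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, Iterator, Type, TypeVar

ResourceT = TypeVar("ResourceT")


@dataclass
class Widget:
    # Hypothetical resource type.
    id: str
    name: str


class DictDatasource(Generic[ResourceT]):
    """Hypothetical stand-in that follows the Datasource contract."""

    identifer = "id"  # attribute whose value get() accepts (same default as Datasource)

    def __init__(self, resource_type: Type[ResourceT], rows: Dict[str, Dict[str, Any]]) -> None:
        self.resource_type = resource_type
        self.rows = rows

    def get(self, id: str, **kwargs: Any) -> ResourceT:
        # Build an instance of resource_type from the stored raw row.
        return self.resource_type(**self.rows[id])

    def list(self, **kwargs: Any) -> Iterator[ResourceT]:
        for row in self.rows.values():
            yield self.resource_type(**row)


widgets = DictDatasource(Widget, {"w1": {"id": "w1", "name": "Sprocket"}})
first = next(widgets.list())
# Re-fetch the same resource via the value of its identifer attribute.
refetched = widgets.get(getattr(first, widgets.identifer))
```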

MockDatasource

Simple in-memory datasource useful for stubbing out remote APIs in tests.

Constructor:

MockDatasource(data: List[~ResourceT], identifer: str = 'id', **kwargs: Any)
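To show how an in-memory datasource can stand in for a remote API in tests, here is a hypothetical InMemoryDatasource that follows the documented constructor signature MockDatasource(data, identifer='id', **kwargs). It is not the library's code: the lookup in get() and the "ignore query params" behaviour of list() are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Generic, Iterator, List, TypeVar

ResourceT = TypeVar("ResourceT")


class InMemoryDatasource(Generic[ResourceT]):
    """Hypothetical stand-in mirroring MockDatasource's constructor."""

    def __init__(self, data: List[ResourceT], identifer: str = "id", **kwargs: Any) -> None:
        self.data = data
        self.identifer = identifer

    def get(self, id: str, **kwargs: Any) -> ResourceT:
        # Assumed behaviour: return the first item whose identifer attribute matches.
        for item in self.data:
            if getattr(item, self.identifer) == id:
                return item
        raise KeyError(id)

    def list(self, **kwargs: Any) -> Iterator[ResourceT]:
        # Assumed behaviour: ignore query params and return every stored item.
        return iter(self.data)


@dataclass
class User:
    # Hypothetical resource keyed by "email" rather than the default "id".
    email: str
    name: str


users = InMemoryDatasource(
    [User("ada@example.com", "Ada"), User("alan@example.com", "Alan")],
    identifer="email",
)
```

Passing identifer="email" lets a test look resources up by a natural key without giving the stub resources an id field.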

RestDatasource

Base class for implementing REST API clients and converting their responses to resource objects.

Responses are validated using a django-rest Serializer to ensure that the returned data matches the types declared on the resource type.

You are encouraged to use Python's standard-library @dataclass decorator and to declare type hints when defining these classes, as this allows type-safe serializers to be auto-generated and reduces the amount of boilerplate code you need to write.

Provides reasonable default behaviour for get and list operations. You will likely want to subclass this for each external service to accommodate differing behaviours around things like pagination.

Class variables can all either be provided as keyword-args to the constructor, or overridden in subclasses.

Conforms to the Datasource interface, so instances of RestDatasource can be provided to SyncedModels as their datasource.
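Letting class variables be set either as subclass overrides or as constructor keyword arguments is a common Python pattern. The sketch below shows it with a hypothetical ConfigurableClient and WidgetClient; how RestDatasource itself treats unknown keyword arguments is not documented here, so raising TypeError for them is an assumption.

```python
from typing import Any


class ConfigurableClient:
    """Hypothetical stand-in for the 'class variable or constructor kwarg' pattern."""

    base_url: str = ""
    path: str = ""

    def __init__(self, **kwargs: Any) -> None:
        for key, value in kwargs.items():
            # Assumption: only accept names that are declared on the class.
            if not hasattr(type(self), key):
                raise TypeError(f"unexpected keyword argument {key!r}")
            setattr(self, key, value)


class WidgetClient(ConfigurableClient):
    # Defaults overridden in a subclass...
    base_url = "https://api.example.com"
    path = "/widgets"


default_client = WidgetClient()
# ...or per instance via keyword arguments, leaving the class default untouched.
gadget_client = WidgetClient(path="/gadgets")
```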

Constructor:

RestDatasource(**kwargs: Dict[str, Any])

Class variables

base_url

str

Base API url prepended to path to produce the full endpoint url.

Can be overridden in subclasses or provided as a kwarg to the initializer.

filter

Optional[Callable[[~ResourceT], bool]]

Filter returned resources to those matching this predicate.

Can be overridden in subclasses or provided as a kwarg to the initializer.
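A filter is a predicate over deserialized resources: for example, passing filter=lambda p: not p.archived would keep only unarchived projects. The hypothetical apply_filter helper and Project resource below illustrate the assumed semantics (a resource is kept when the predicate returns True, and everything is kept when no filter is set).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, TypeVar

ResourceT = TypeVar("ResourceT")


@dataclass
class Project:
    # Hypothetical resource type.
    id: str
    archived: bool


def apply_filter(
    resources: Iterable[ResourceT],
    predicate: Optional[Callable[[ResourceT], bool]],
) -> Iterator[ResourceT]:
    # With no predicate every resource is kept; otherwise only matching ones.
    for resource in resources:
        if predicate is None or predicate(resource):
            yield resource


projects = [Project("p1", archived=False), Project("p2", archived=True)]
active = list(apply_filter(projects, lambda p: not p.archived))
```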

parser_class

Type[rest_framework.parsers.BaseParser]

A django-rest parser used to parse API responses for processing by the serializer.

Can be overridden in subclasses or provided as a kwarg to the initializer.

If not provided, assumes you are dealing with JSON API responses that use the same 'snake_case' conventions as Python attribute names.
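When an API returns camelCase keys, the default assumption does not hold and the keys need renaming before the serializer sees them. The framework-independent sketch below shows one way to do that conversion; the helper names are hypothetical, and in practice you would wrap logic like this in a django-rest BaseParser subclass (whose parse() method receives the response stream) and pass it as parser_class.

```python
import json
import re
from typing import Any

# Zero-width position between a lowercase letter/digit and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name: str) -> str:
    # Insert an underscore at each camelCase boundary, then lowercase.
    return _BOUNDARY.sub("_", name).lower()


def snake_case_keys(value: Any) -> Any:
    # Recursively rename dictionary keys; lists are walked, scalars returned unchanged.
    if isinstance(value, dict):
        return {to_snake_case(k): snake_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(v) for v in value]
    return value


raw = b'{"userId": 7, "createdAt": "2024-01-01", "tags": [{"tagName": "x"}]}'
parsed = snake_case_keys(json.loads(raw))
```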

path

str

Appended to base_url to produce the full endpoint url.

Can be overridden in subclasses or provided as a kwarg to the initializer.
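Since base_url is prepended to path, the endpoint url is the two strings joined. The exact joining rule is not documented here; the hypothetical endpoint_url helper below assumes plain concatenation while normalizing the slash between the two parts.

```python
def endpoint_url(base_url: str, path: str) -> str:
    # Hypothetical helper: join base_url and path with exactly one slash between them.
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

urllib.parse.urljoin is deliberately not used here: it replaces the last path segment of base_url unless that url ends in a slash, and a path starting with "/" replaces the whole base path.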

serializer_class

Type[rest_framework.serializers.Serializer]

A django-rest serializer used to deserialize API responses into instances of resource_type.

Can be overridden in subclasses or provided as a kwarg to the initializer.

If not provided, a serializer is generated from the class provided in resource_type. You only need to provide a serializer if the resource type is not decorated with the @dataclass decorator, or you have custom serialization requirements.

Instance variables

url

str

Methods

deserialize

deserialize(self, data: Any) -> ~ResourceT

Deserialize raw data representation returned by the API into an instance of resource_type.

Override this for advanced customization of resource deserialization. You will rarely need to do this, as it is generally easier to provide a custom serializer_class.

The default implementation validates and returns a deserialized instance by calling through to serializer_class.

Parameters

data
Raw (parsed but still serialized) data representation of the remote resource.

Raises

TypeError
If validating the returned data fails.
Returns
An instance of this resource's resource_type type.
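The validate-then-construct flow (with TypeError on failure) can be sketched without django-rest by checking raw data against a dataclass's type hints. The Invoice type and deserialize function below are hypothetical, and the plain isinstance checks are a simplification: a django-rest serializer may coerce some inputs (for example numeric strings into integers) that this strict sketch rejects.

```python
import dataclasses
import typing
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class Invoice:
    # Hypothetical resource type; its type hints drive the validation below.
    id: str
    amount: int
    paid: bool


def deserialize(resource_type: Type[T], data: Any) -> T:
    """Hypothetical stand-in: validate raw data against the dataclass's hints."""
    if not isinstance(data, dict):
        raise TypeError(f"expected a mapping, got {type(data).__name__}")
    hints = typing.get_type_hints(resource_type)
    values: Dict[str, Any] = {}
    for field in dataclasses.fields(resource_type):
        if field.name not in data:
            raise TypeError(f"missing field {field.name!r}")
        value = data[field.name]
        expected = hints[field.name]
        # Only plain classes (str, int, bool, ...) are handled; generics such as
        # Optional[int] would need extra logic.
        if not isinstance(value, expected):
            raise TypeError(f"{field.name!r} should be {expected.__name__}")
        values[field.name] = value
    return resource_type(**values)
```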

fetch_url

fetch_url(self, url: str, query: Dict[str, Any]) -> Any

Get a resource by URL and return its raw (parsed but not deserialized) response data.

Override this to customize how HTTP GET requests are made. The list() method will

The default implementation validates that the request is successful then parses the response data using parser_class.

Parameters

url
URL of the fetched resource.
query
Query params passed to the GET request.

Raises

OSError
If the server response does not have a 2xx status code.
Returns
Raw (parsed but still serialized) data representation of the remote resource identified by url.
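A standard-library sketch of the documented contract (GET with query params, OSError on a non-2xx status, then parse) might look like the following. The function names are hypothetical, the headers argument stands in for get_headers(), and json.loads stands in for parser_class; the library's actual HTTP client is not documented here.

```python
import json
from typing import Any, Dict
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def with_query(url: str, query: Dict[str, Any]) -> str:
    # Append query params, using "&" if the url already has a query string.
    if not query:
        return url
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(query)


def ensure_success(status: int, url: str) -> None:
    # Mirror the documented contract: non-2xx responses raise OSError.
    if not 200 <= status < 300:
        raise OSError(f"GET {url} returned HTTP {status}")


def fetch_url(url: str, query: Dict[str, Any], headers: Dict[str, str]) -> Any:
    """Hypothetical stand-in for RestDatasource.fetch_url using only the stdlib."""
    full_url = with_query(url, query)
    request = Request(full_url, headers=headers, method="GET")
    # urlopen itself raises HTTPError (an OSError subclass) for 4xx/5xx responses.
    with urlopen(request) as response:
        ensure_success(response.status, full_url)
        # Stands in for parser_class: parse the body as JSON.
        return json.loads(response.read())
```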

get

get(self, id: str, **kwargs: Dict[str, Any]) -> ~ResourceT

Get a resource by id, deserialize to the resource_type and return.

The default implementation creates the resource url by appending the id to the endpoint url.

Parameters

id
External identifier for the fetched resource.
**kwargs
Query params passed to the API call.
Returns
An instance of resource_type representing the remote resource.
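Appending the id to the endpoint url could be done as in the hypothetical resource_url helper below. Whether the library percent-encodes the id or normalizes trailing slashes is not stated, so both behaviours here are assumptions.

```python
from urllib.parse import quote


def resource_url(endpoint_url: str, resource_id: str) -> str:
    # Hypothetical helper: append the percent-encoded id as the final path segment.
    return endpoint_url.rstrip("/") + "/" + quote(str(resource_id), safe="")
```

Encoding with safe="" keeps an id containing "/" from being read as extra path segments.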

get_headers

get_headers(self) -> Dict[str, str]

Headers to add to requests. The default implementation returns no headers.

Returns
Dictionary of headers
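A common reason to override get_headers() is authentication. BaseClient and TokenAuthClient below are hypothetical stand-ins (in real code you would override the method on your RestDatasource subclass), and the Bearer scheme is an assumption about the remote API.

```python
from typing import Dict


class BaseClient:
    """Hypothetical stand-in for the get_headers() hook."""

    def get_headers(self) -> Dict[str, str]:
        # Default: no extra headers.
        return {}


class TokenAuthClient(BaseClient):
    def __init__(self, token: str) -> None:
        self.token = token

    def get_headers(self) -> Dict[str, str]:
        # Start from the parent's headers so other overrides are preserved.
        headers = dict(super().get_headers())
        headers["Authorization"] = f"Bearer {self.token}"
        return headers
```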

list

list(self, **kwargs: Dict[str, Any]) -> Iterable[~ResourceT]

List or search resources.

The default implementation passes the query params to paginate(), deserializes each raw item and yields the resources that match filter (when one is set).

Parameters

**kwargs
Query params passed to the API call.

Yields

Resource instances representing the remote resources.
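Putting paginate(), deserialize() and filter together, a list() built from those pieces could look like the hypothetical ListingSketch below. The composition order is an assumption inferred from the method descriptions, not a copy of the library's code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass
class Item:
    # Hypothetical resource type.
    id: str
    price: int


class ListingSketch:
    """Hypothetical stand-in: list() = paginate() -> deserialize() -> filter."""

    filter: Optional[Callable[[Item], bool]] = None

    def __init__(self, pages: List[List[Dict[str, Any]]]) -> None:
        self.pages = pages

    def paginate(self, **query: Any) -> Iterator[Dict[str, Any]]:
        # Yield raw items page by page.
        for page in self.pages:
            yield from page

    def deserialize(self, data: Dict[str, Any]) -> Item:
        return Item(**data)

    def list(self, **kwargs: Any) -> Iterator[Item]:
        # Lazily deserialize each raw item and drop those rejected by filter.
        for raw in self.paginate(**kwargs):
            resource = self.deserialize(raw)
            if self.filter is None or self.filter(resource):
                yield resource
```

Because list() is a generator, callers can stop early without fetching every page.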

paginate

paginate(self, **query: Dict[str, Any]) -> Iterable[~ResourceT]

List this resource and return an iterable of raw representations.

Override this to customize how list() calls are paginated or how the url is constructed.

If you override this to support pagination, you should yield items as they are fetched rather than returning a list.

The default implementation does not perform pagination – it expects the response data to be a simple list of resources.

Parameters

query
Query params passed to the GET request.

Yields

Raw (parsed but still serialized) resource objects.
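For an API that wraps each page in an object linking to the next page, an override would loop until there is no next page and yield raw items as it goes. The hypothetical paginate function below takes the fetcher as an argument so it can run without a network. The {"results": [...], "next": ...} response shape is an assumption, though it matches the page format of django-rest's own PageNumberPagination.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical fetcher signature, mirroring fetch_url(url, query).
FetchFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def paginate(fetch_url: FetchFn, url: str, **query: Any) -> Iterator[Dict[str, Any]]:
    """Follow "next" links, yielding raw items one page at a time."""
    next_url = url
    params: Dict[str, Any] = dict(query)
    while next_url:
        page = fetch_url(next_url, params)
        yield from page["results"]
        next_url = page.get("next")
        # Assumption: the "next" link already embeds the query string.
        params = {}
```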

SyncConfig

Config object defining how subclasses of SyncedModel sync with an external datasource.

Properties

All properties are valid as keyword-args to the constructor. They are required unless marked optional below.

datasource

Datasource[typing.Any]

External resource to periodically sync this model with.

external_id

str

Field on both the external resource and this model that is used to map values returned from the external service onto instances of the model.

field_map

Optional[Dict[str, str]]

Map from fields in the model to fields in the external resource.
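Because field_map goes from model field names to resource field names, applying it means looking each model field up under its mapped name on the resource. The hypothetical model_values helper and RemoteContact resource below sketch this; falling back to the same name for unmapped fields is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class RemoteContact:
    # Hypothetical external resource.
    uuid: str
    full_name: str
    email: str


def model_values(
    resource: Any,
    model_fields: Iterable[str],
    field_map: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: read model field values off an external resource."""
    field_map = field_map or {}
    values: Dict[str, Any] = {}
    for model_field in model_fields:
        # field_map goes model field -> resource field; unmapped names are used as-is.
        source = field_map.get(model_field, model_field)
        if hasattr(resource, source):
            values[model_field] = getattr(resource, source)
    return values


contact = RemoteContact(uuid="c-1", full_name="Ada Lovelace", email="ada@example.com")
```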

sync_interval

Optional[datetime.timedelta]

Frequency with which the model should be synced from the external source.

Defaults to one day. If set to None, this model will never refresh itself from the external source and only populate when referenced by another synced model, or sync() is explicitly called.
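The interval semantics can be expressed as a small check. The hypothetical is_due_for_sync function below is not the library's scheduler (which is not documented here); it only shows how sync_interval, including the None case, relates to a model's last_sync_time.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_due_for_sync(
    last_sync_time: Optional[datetime],
    sync_interval: Optional[timedelta],
    now: datetime,
) -> bool:
    """Hypothetical check of whether a scheduled sync should run."""
    if sync_interval is None:
        # Never refreshed on a schedule; only explicit sync() or references populate it.
        return False
    if last_sync_time is None:
        # Never synced before.
        return True
    return now - last_sync_time >= sync_interval
```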

SyncedModel

Base class for models that are fetched on a schedule from a remote data source.

Models that subclass this class must declare a sync_config attribute, which configures the remote resource to pull from and how to merge it into the database.

Constructor:

SyncedModel(*args, **kwargs)

Class variables

sync_config

SyncConfig

Configuration object defining the datasource and how to sync it. Required for all non-abstract subclasses.

Instance variables

id

SyncedModels need a UUID primary key to handle recursive references when syncing.

last_sync_time

Last time this resource was updated from the datasource.

Static methods

sync

sync()

Synchronizes the class immediately.