Data sources
import groundwork.core.datasources
Datasource
Abstract interface for reading from an external resource.
For most REST APIs, unless you are wrapping an existing client library, you probably want to use the subclass ApiClient instead of this class.
Subclasses:
Constructor:
Datasource(**kwargs)
Class variables
identifer
str
An attribute of ResourceT whose value, when passed to get(), re-fetches the resource.
This is usually id, which is the default.
resource_type
Type[~ResourceT]
Class that API responses should be deserialized into.
MockDatasource
Simple in-memory datasource useful for stubbing out remote APIs in tests.
Inherits:
Constructor:
MockDatasource(data: List[~ResourceT], identifer: str = 'id', **kwargs: Any)
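As a rough illustration of what an in-memory datasource of this shape does, the sketch below uses a stand-in class rather than groundwork's own MockDatasource. The `InMemoryDatasource` and `Author` names are hypothetical, and the behaviour on a missing id (raising `KeyError`) is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Any, Generic, Iterable, List, TypeVar

ResourceT = TypeVar("ResourceT")


@dataclass
class Author:
    # Hypothetical resource type used only for this example.
    id: str
    name: str


class InMemoryDatasource(Generic[ResourceT]):
    """Stand-in with the same constructor shape as MockDatasource."""

    def __init__(self, data: List[ResourceT], identifer: str = "id", **kwargs: Any) -> None:
        self.data = data
        # Spelled as in the documented constructor signature.
        self.identifer = identifer

    def list(self, **kwargs: Any) -> Iterable[ResourceT]:
        # Return every stored resource; query params are ignored here.
        return iter(self.data)

    def get(self, id: str, **kwargs: Any) -> ResourceT:
        # Look the resource up by the configured identifier attribute.
        for item in self.data:
            if getattr(item, self.identifer) == id:
                return item
        raise KeyError(id)  # Assumption: the real class may behave differently.


authors = InMemoryDatasource([Author(id="1", name="Ada"), Author(id="2", name="Grace")])
```

In a test, an object like `authors` could stand in wherever a datasource is expected, so no network calls are made.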
RestDatasource
Base class for implementing REST API clients and converting their responses to resource objects.
Responses are validated using a Django REST framework Serializer to ensure that the returned data matches the types declared on the resource type.
You are encouraged to use Python's built-in @dataclass decorator and define type hints when defining these classes, as this allows type-safe serializers to be auto-generated and reduces the amount of boilerplate code that you need to write.
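A resource type written this way might look like the sketch below. The `Event` class and its fields are hypothetical. The final lines only show that a dataclass exposes its declared field types at runtime, which is the kind of information a serializer generator can work from.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Event:
    # Hypothetical resource type; field names are illustrative only.
    id: str
    title: str
    capacity: Optional[int] = None


# The declared type hints are available for introspection.
declared = {field.name: field.type for field in fields(Event)}
```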
Provides reasonable default behaviour for get and list operations. You will likely want to subclass this for each external service to accommodate differing behaviours around things like pagination.
Class variables can either be provided as keyword arguments to the constructor or overridden in subclasses.
Conforms to the Datasource interface, so instances of APIClient can be provided to SyncedModels as their datasource.
Inherits:
Subclasses:
Constructor:
RestDatasource(**kwargs: Dict[str, Any])
Class variables
base_url
str
Base API url prepended to path to produce the full endpoint url.
Can be overridden in subclasses or provided as a kwarg to the initializer.
filter
Optional[Callable[[~ResourceT], bool]]
Filter returned resources to those matching this predicate.
Can be overridden in subclasses or provided as a kwarg to the initializer.
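A predicate of this kind takes a resource and returns a bool. The sketch below shows the idea on plain Python objects. `Event` and `apply_filter` are hypothetical names, and exactly where the library applies the predicate is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Event:
    # Hypothetical resource type used only for this example.
    id: str
    cancelled: bool


def apply_filter(
    resources: Iterable[Event],
    predicate: Optional[Callable[[Event], bool]],
) -> Iterator[Event]:
    # With no predicate, every resource passes through unchanged.
    for resource in resources:
        if predicate is None or predicate(resource):
            yield resource


events = [Event("1", False), Event("2", True)]
active = list(apply_filter(events, lambda event: not event.cancelled))
```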
parser_class
Type[rest_framework.parsers.BaseParser]
A Django REST framework parser used to parse API responses for processing by the serializer.
Can be overridden in subclasses or provided as a kwarg to the initializer.
If not provided, assumes you are dealing with JSON API responses using the same 'snake_case' conventions as Python attribute names.
path
str
Appended to base_url to produce the full endpoint url.
Can be overridden in subclasses or provided as a kwarg to the initializer.
serializer_class
Type[rest_framework.serializers.Serializer]
A Django REST framework serializer used to deserialize API responses into instances of the dataclass.
Can be overridden in subclasses or provided as a kwarg to the initializer.
If not provided, a serializer is generated from the class provided in resource_type. You only need to provide a serializer if the resource type is not decorated with the @dataclass decorator, or you have custom serialization requirements.
Instance variables
url
str
Methods
deserialize
deserialize(self, data: Any) -> ~ResourceT
Deserialize raw data representation returned by the API into an instance of resource_type.
Override this for advanced customization of resource deserialization. You will rarely need to do this, as it is generally easier to provide a custom serializer_class.
The default implementation validates and returns a deserialized instance by calling through to serializer_class.
Parameters
data
- Raw (parsed but still serialized) data representation of the remote resource.
Raises
TypeError
- If validating the returned data fails.
Returns
- An instance of resource_type.
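To make the contract concrete (raw data in, a typed instance out, TypeError on invalid data), here is a minimal sketch that validates a dict against a dataclass by hand. It is a stand-in for illustration only: it does not use a serializer, and `Event` and `deserialize_event` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class Event:
    # Hypothetical resource type used only for this example.
    id: str
    title: str


def deserialize_event(data: Dict[str, Any]) -> Event:
    """Validate raw data against the declared field types, then build an Event."""
    values = {}
    for field in fields(Event):
        if field.name not in data:
            raise TypeError(f"missing field: {field.name}")
        value = data[field.name]
        # field.type is a plain class (str) for every field declared above.
        if not isinstance(value, field.type):
            raise TypeError(f"{field.name} must be {field.type.__name__}")
        values[field.name] = value
    return Event(**values)
```

A hand-written check like this only handles simple field types, which is one reason providing a serializer is usually the easier route.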
fetch_url
fetch_url(self, url: str, query: Dict[str, Any]) -> Any
Get a resource by URL and return its raw (parsed but not deserialized) response data.
Override this to customize how HTTP GET requests are made. The list() method will
The default implementation validates that the request is successful, then parses the response data using parser_class.
Parameters
url
- URL of the fetched resource
query
- Query params passed to the GET request.
Raises
OSError
- If the server response does not have a 2xx status code.
Returns
- Raw (parsed but still serialized) data representation of the remote resource identified by url.
get
get(self, id: str, **kwargs: Dict[str, Any]) -> ~ResourceT
Get a resource by id, deserialize to the resource_type and return.
The default implementation creates the resource url by appending the id to the endpoint url.
Parameters
id
- External identifier for the fetched resource
**kwargs
- Query params passed to the API call.
Returns
- A resource instance representing the remote datasource.
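The url construction described for get() can be sketched as follows. The `resource_url` helper is hypothetical, and the slash handling is an assumption; the library may join the parts differently (for example, with a trailing slash).

```python
def resource_url(base_url: str, path: str, id: str) -> str:
    """Join base_url and path into the endpoint url, then append the id."""
    # Assumption: exactly one slash between each part, no trailing slash.
    endpoint = base_url.rstrip("/") + "/" + path.strip("/")
    return f"{endpoint}/{id}"


# Hypothetical service and endpoint names.
url = resource_url("https://api.example.com/v1/", "/events/", "42")
```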
get_headers
get_headers(self) -> Dict[str, str]
Headers to add to requests. The default implementation returns no headers.
Returns
- Dictionary of headers
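A common reason to override get_headers() is authentication. The sketch below shows the override pattern against a stand-in base class so that it runs without the library. `BaseClient`, `TokenClient`, and the bearer-token scheme are assumptions for illustration.

```python
from typing import Dict


class BaseClient:
    """Stand-in for the documented base class; only get_headers is modelled."""

    def get_headers(self) -> Dict[str, str]:
        # Assumption: the default contributes no headers.
        return {}


class TokenClient(BaseClient):
    def __init__(self, token: str) -> None:
        self.token = token

    def get_headers(self) -> Dict[str, str]:
        # Extend whatever the parent provides rather than replacing it.
        headers = super().get_headers()
        headers["Authorization"] = f"Bearer {self.token}"
        return headers


client = TokenClient(token="example-token")
```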
list
list(self, **kwargs: Dict[str, Any]) -> Iterable[~ResourceT]
List or search resources.
The default implementation creates the resource url by appending the id to the endpoint url.
Parameters
**kwargs
- Query params passed to the API call.
Yields
Resource instances representing the remote datasource.
paginate
paginate(self, **query: Dict[str, Any]) -> Iterable[~ResourceT]
List this resource and return an iterable of raw representations.
Override to customize how list() calls are paginated or how the url is constructed.
If you override this to support pagination, you should yield instances rather than returning a list.
The default implementation does not perform pagination – it expects the response data to be a simple list of resources.
Parameters
query
- Query params passed to the GET request.
Yields
Raw (parsed but still serialized) resource objects.
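A paginating override would typically be a generator that keeps requesting pages and yields raw items as they arrive. The sketch below shows that pattern as a standalone function against a fake paged API. The `results` and `next` keys, the `page` query parameter, and `fake_fetch` itself are all assumptions about a hypothetical service.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical paged API: three records served two per page.
RECORDS = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
PAGE_SIZE = 2


def fake_fetch(query: Dict[str, Any]) -> Dict[str, Any]:
    """Pretend HTTP call returning one page plus the next page number."""
    page = query.get("page", 1)
    start = (page - 1) * PAGE_SIZE
    has_next = start + PAGE_SIZE < len(RECORDS)
    return {
        "results": RECORDS[start:start + PAGE_SIZE],
        "next": page + 1 if has_next else None,
    }


def paginate(
    fetch: Callable[[Dict[str, Any]], Dict[str, Any]],
    **query: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield raw items page by page until the API reports no next page."""
    page = 1
    while page is not None:
        body = fetch({**query, "page": page})
        yield from body["results"]
        page = body["next"]
```

Yielding items rather than building a list means a caller that stops iterating early never triggers the remaining page requests.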
SyncConfig
Config object defining how subclasses of SyncedModel sync with an external datasource.
Properties
All properties are valid as keyword-args to the constructor. They are required unless marked optional below.
datasource
Datasource[typing.Any]
External resource to periodically sync this model with
external_id
str
Field on both the external resource and this model that is used to map values returned from the external service onto instances of the model.
field_map
Optional[Dict[str, str]]
Map from fields in the model to fields in the external resource.
sync_interval
Optional[datetime.timedelta]
Frequency with which the model should be synced from the external source.
Defaults to one day. If set to None, this model will never refresh itself from the external source and will only populate when referenced by another synced model, or when sync() is explicitly called.
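The meaning of field_map and sync_interval can be illustrated with two small helper functions. These are hypothetical sketches of the documented semantics, not the library's sync logic. In particular, treating a model that has never synced as due is an assumption.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, Optional


def map_fields(
    resource: Dict[str, Any],
    model_fields: Iterable[str],
    field_map: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Read each model field from the resource field that field_map names for it."""
    field_map = field_map or {}
    # Unmapped model fields fall back to the same name on the resource.
    return {name: resource[field_map.get(name, name)] for name in model_fields}


def is_due(
    last_sync_time: Optional[datetime],
    sync_interval: Optional[timedelta],
    now: datetime,
) -> bool:
    """Decide whether a scheduled refresh should run."""
    if sync_interval is None:
        return False  # None disables scheduled refreshes.
    if last_sync_time is None:
        return True  # Assumption: never-synced models are due immediately.
    return now - last_sync_time >= sync_interval


mapped = map_fields(
    {"id": "1", "full_name": "Ada"},
    model_fields=["id", "name"],
    field_map={"name": "full_name"},  # model field -> external field
)
```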
SyncedModel
Base class for models that are fetched on a schedule from a remote data source.
Models that subclass this class must declare a sync_config attribute, which configures the remote resource to pull from and how to merge it into the database.
Constructor:
SyncedModel(*args, **kwargs)
Class variables
sync_config
SyncConfig
Configuration object defining the datasource and how to sync it. Required for all non-abstract subclasses.
Instance variables
id
SyncedModels need to have a uuid primary key to handle recursive references when syncing.
last_sync_time
Last time this resource was updated from the datasource.
Static methods
sync
sync()
Synchronizes the class immediately.