2. Data Exchange Services Specifications
2.1 Document Purpose
The purpose of this document is to specify the Data Exchange Services, including the high-level architecture and the key requirements for data value, trust and compliance.
2.2 Definitions and Vocabulary
Data
: Any digital representation of acts, facts or information and any compilation of such acts, facts or information.

Data Catalog
: A Data Catalog presents a set of available Data and Data Products that can be queried.

Data Consumer
: A participant that receives data in the form of a Data Product. The data is used for query, analysis, reporting or any other data processing.

Data Exchange Services
: A set of services that provides features enabling a Data Exchange, such as, but not limited to: policy negotiation for access control and usage control, exchange traceability, service protocol negotiation, data access, data tiering, access enforcement, usage enforcement. Note: Data Connector and Data Exchange Platform are two different architecture implementations of potentially similar Data Exchange service features.

Data License
: A contract template which contains the constraints (terms and conditions) associated with the data included in the Data Product. All the terms and conditions of the Data Usage Consent must be subsumed in the Data License for all data included in the product.

Data Licensor
: A natural or legal participant (not necessarily a Gaia-X participant) who owns usage rights for some data. It can be a data subject as per GDPR for personal data or a primary owner of non-personal data (i.e. not liable to GDPR).

Data Producer
: A natural or legal participant who furnishes data to a Data Provider.

Data Product
: A collection of one or more data that are packaged by the Data Provider and made ready for Data Exchange.

Data Product Usage Contract
: A legally binding agreement made by and between a Data Provider and a Data Consumer specifying the terms and conditions of a data exchange.

Data Product Self-Description
: The Self-Description (as per Gaia-X TAD) of a Data Product. A Data Product Self-Description contains a Data License.

Data Provider
: A participant that acquires the right to access and use some data and that makes Data Products available.

Data Transmission
: A single unit of data transmission that is logged and treated in a coherent and reliable way, independently of other transmissions. It is materialized by a Signed Data Product Self-Description. The term “transmission” is preferred to “exchange” since the data asset goes from Provider to Consumer and there is no exchange of data as such. Note: there are several technical modes of data transmission, such as: Pull (direct download from an endpoint defined by the Data Provider), Stream (continuous download from an endpoint defined by the Data Provider), Push (delivery by the Data Provider to some endpoint defined by the Data Consumer), Publish/Subscribe (delivery of new versions of the data to some endpoint defined by the Data Consumer – usually for frequently updated data).

Data Usage Consent
: A legally binding agreement made by and between a Data Producer and a Data Provider specifying the terms and conditions for the usage of the data. A Data Usage Consent constitutes a legal usage consent for data which are subject to GDPR.

Metadata
: Data about other data, documents, or sets of data that describes their content, context, structure, data format, provenance, and/or the rights attached to them.

Term of Usage
: A specific instantiation of a Data License included in a Signed Data Product Self-Description, listing all the constraints associated with a data exchange.
2.3 Data Exchange Services
Data exchange in Gaia-X is enabled by a set of Data Exchange Services that are realized by each Participant and can be supported by the Federation.
Not all Data Exchange Services are mandatory.
- Authentication: Identities and the Trust Framework are essential; without them, two Participants cannot connect. Identities provide general information on the Participant, and the Trust Framework appends additional claims, such as a verified location or the verified application of other standards or regulations.
- Policy negotiation and contracting: the ability to negotiate access and usage policies between two parties. This should be a sequence between the parties, but a contracting service can provide support when one or more parties lack the technical abilities for it.
  a. These policies focus on interoperability. All parties must be able to understand the policies in order to enforce them later on.
  b. ODRL is a good candidate language to support the negotiation and contracting service.
  c. Such policies may be translated into executable policies during the transaction.
- Catalog (or metadata broker): provides mechanisms to publish metadata on a service or data as Self-Descriptions and supports search or query of the Self-Descriptions. A catalog may be realized as a centralized or decentralized service, but the capability can also be realized as a distributed functionality.
- Vocabularies: provide additional metadata to the Self-Descriptions. The Self-Description (SD) should contain a limited amount of information as a common denominator but must be extensible with vocabularies from different (business or technical) domains.
- Observability (logging and audit data): required to provide an auditable framework for transactions. (Logging and audit data are to be described in more detail later.)
- Apps: the general ability to bring code to data.
- Data Exchange protocols: required to exchange data between Participants in a distributed manner. Data exchange should be realized peer to peer and must include the required metadata, e.g. for identification, authentication and authorization, but also the data contract or license.
A Participant in the data exchange should realize the interfaces to the ‘services’ or functionality mentioned above, but also the following functionalities internally:
- User Management
- (Trusted) Configuration Management
- Data and Metadata Management
- Monitoring
- Policy Management including the Policy Enforcement (in terms of Access and Usage Policies)
- Data App Management and Execution, i.e. the ability to execute remote code (code to data) or the ability to make use of standard software components in a data processing pipeline
Service Name | Mandatory or Optional | Communication Partners |
---|---|---|
Authentication | Mandatory | Based on the Gaia-X Trust Framework OR Participant 2 Participant |
Policy negotiation and contracting | Mandatory | Participant 2 Participant |
Catalog | Optional | Participant 2 Someone* providing a catalog (to be explained) |
Vocabularies | Optional | Participant 2 Someone* providing a Vocabulary hub (to be explained) |
Observability | Optional | Participant 2 Someone* providing an “Observer Facility” (to be explained) |
Apps | Optional | To be better understood before this is defined |
Data Exchange protocols | One is required, but no mandatory protocol *2 | Participant 2 Participant |
User Management (internal) | Optional | Internal only |
Data and Metadata Management (internal) | Optional | Internal only |
Monitoring (internal) | Optional | Internal only |
Policy Management including the Policy Enforcement (internal) | Optional | Internal only |
Data App Management (internal) | Optional | Internal only |
* “Someone” still has to be specified: it has to be an entity that the Federation trusts, i.e. a Trust Anchor.
*2 Specified by the negotiated contract.
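As an illustration only, the table above can be encoded as data so that an implementation can report which mandatory services it still lacks. This is a minimal sketch: the `Service` class and the `missing_mandatory` helper are hypothetical names, and "Data Exchange protocols" is treated as mandatory on the assumption that "one is required" means at least one protocol must be supported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    mandatory: bool
    internal: bool = False


# Rows transcribed from the table above. "Data Exchange protocols" is marked
# mandatory because at least one protocol is required (assumption).
SERVICES = [
    Service("Authentication", True),
    Service("Policy negotiation and contracting", True),
    Service("Catalog", False),
    Service("Vocabularies", False),
    Service("Observability", False),
    Service("Apps", False),
    Service("Data Exchange protocols", True),
    Service("User Management", False, internal=True),
    Service("Data and Metadata Management", False, internal=True),
    Service("Monitoring", False, internal=True),
    Service("Policy Management including the Policy Enforcement", False, internal=True),
    Service("Data App Management", False, internal=True),
]


def missing_mandatory(implemented):
    """Return the mandatory services absent from `implemented`, in table order."""
    return [s.name for s in SERVICES if s.mandatory and s.name not in implemented]


print(missing_mandatory({"Authentication", "Catalog"}))
```

With only Authentication and a Catalog in place, the helper reports that policy negotiation and a data exchange protocol are still missing.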
2.4 Conceptual Model
Data are at the core of Data Exchange Services. Data are furnished by Data Producers (who are either data owners or data controllers in the GDPR sense) to Data Providers, who compose these data into a Data Product to be used by Data Consumers.
A Data Usage Consent, including the usage terms and conditions associated with these data, is signed by both the Data Producer and the Data Provider, and gives the Data Provider the legal authorization to use these data in accordance with the specified constraints. If a specific Data License is attached to the data (for instance when the data is liable to GDPR), then this co-signed Data Usage Consent constitutes a legal usage consent and must refer to the explicit license rights from the Data Licensor (data subject as per GDPR or data owner). Signed Data Usage Consents are notarized by a Trusted Consent Authority.
Like all Gaia-X entities, Data Products are described by a Self-Description. This Self-Description is stored in a (searchable) Federated Data Catalog. Each Data Product Self-Description contains a Data License defining the usage policy for all data in this Data Product; it also contains other information related to billing, technical means, service level, etc. Hence a Data Product Self-Description constitutes a data usage contract template.
Before using a Data Product, the Data Consumer negotiates and co-signs a Data Product Usage Contract (DPUC) with the Data Provider. This Data Product Usage Contract is based on the Data Product Self-Description but may differ from the original one: the Data License of the Data Product Self-Description is sub-licensed, possibly after modification during the negotiations, by enforceable Terms of Usage contained in the Data Product Usage Contract. For each licensed data included in the Data Product, the Data Product Usage Contract must include an explicit Data Usage Consent signed by the corresponding Data Licensor (in the case of data liable to GDPR, the signed Data Usage Consent must contain all information required by the Regulation).
The Data Product Usage Contract is a Ricardian contract: a contract at law that is both human-readable and machine-readable, cryptographically signed and rendered tamper-proof, verifiable in a decentralized fashion, and electronically linked to the subject of the contract, i.e., the data. The parties can (optionally) request this contract to be notarized in a federated Data Product Usage Contract Store.
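The tamper-evidence and data-linking properties of such a contract can be sketched with a plain content digest. This is only an illustration under stated assumptions: the field names `humanReadable`, `machineReadable` and `dataDigest` are hypothetical, and a real Data Product Usage Contract would additionally carry asymmetric signatures from both parties, which are not shown here.

```python
import hashlib
import json


def canonical_digest(document):
    """SHA-256 over a canonical JSON form (sorted keys, no extra whitespace)."""
    payload = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_contract(human_text, terms, data):
    """Bundle the prose, the machine-readable terms and a digest of the data."""
    return {
        "humanReadable": human_text,
        "machineReadable": terms,
        "dataDigest": hashlib.sha256(data).hexdigest(),
    }


contract = build_contract(
    "The Data Consumer may use the data for reporting only.",
    {"action": "use", "purpose": "reporting"},
    b"id,value\n1,42\n",
)
reference = canonical_digest(contract)

# Any later change to the prose, the terms or the data digest changes the digest.
tampered = dict(contract, humanReadable="The Data Consumer may resell the data.")
print(canonical_digest(tampered) == reference)  # False
```

Hashing a canonical serialization means both parties obtain the same digest regardless of key order, so each can verify independently that the text they signed is the text being enforced.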
After such a contract has been agreed upon and signed by both parties, a Data Transmission from the Data Provider to the Data Consumer can start, realizing the Data Product Usage Contract. The contract negotiation can lead to both parties agreeing on a Data Transmission Logging Service, which is then used by both sides to log data transmission details. The logs might also include information needed for billing (incl. service level details) even if billing is outside the Gaia-X perimeter.
The signature of an agreement or contract involving a Natural Person shall contain the certificate proving an interaction with a Natural Person by a means described in the Trust Framework.
Several entities of the Data Exchange conceptual model are specializations of entities of the general Gaia-X conceptual model and inherit the corresponding attributes and properties:
- a Data Producer is a Resource Owner,
- a Data Licensor is a Licensor,
- a Data Provider is a Provider,
- a Data Consumer is a Consumer,
- a Data Product is a composed Service Offering (i.e. a Service Composition),
- a Data License is a License Right,
- a Term of Usage is a subset of Terms and Conditions,
- a Data Transmission is a Service Instance,
- a Federated Data Catalog is a Federated Catalog.
2.5 Operational Model
The generic basic operational model for data exchanges is quite simple:
1. The Data Consumer queries the Federated Data Catalog and reviews the Data Product Self-Descriptions to select a Data Product that corresponds to its needs.
2. The Data Consumer configures the Data Product in terms of data scope and operational characteristics and starts negotiating with the Data Provider.
3. When both reach an agreement, they sign the configured Data Product Self-Description to create the Data Product Usage Contract (DPUC) and can optionally notarize it in a Federated Data Product Usage Contract Store.
4. The instantiated Data Product can then be activated, resulting in the actual Data Transmission.
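The sequence above can be read as a small state machine in which each phase is only reachable from the previous one. The following is a minimal sketch; the `Phase` names and the `DataExchange` class are hypothetical and not part of this specification.

```python
from enum import Enum


class Phase(Enum):
    DISCOVERY = "discovery"        # query the catalog, select a Data Product
    NEGOTIATION = "negotiation"    # configure the product and negotiate
    CONTRACTED = "contracted"      # DPUC signed by both parties
    TRANSMITTING = "transmitting"  # Data Transmission active


# Allowed transitions: the phases must be walked through in order.
NEXT = {
    Phase.DISCOVERY: {Phase.NEGOTIATION},
    Phase.NEGOTIATION: {Phase.CONTRACTED},
    Phase.CONTRACTED: {Phase.TRANSMITTING},
    Phase.TRANSMITTING: set(),
}


class DataExchange:
    def __init__(self):
        self.phase = Phase.DISCOVERY
        self.notarized = False

    def advance(self, target):
        if target not in NEXT[self.phase]:
            raise ValueError(f"cannot go from {self.phase.value} to {target.value}")
        self.phase = target

    def notarize(self):
        """Optional step: only a signed contract can be notarized."""
        if self.phase not in (Phase.CONTRACTED, Phase.TRANSMITTING):
            raise ValueError("no signed contract to notarize")
        self.notarized = True


exchange = DataExchange()
for phase in (Phase.NEGOTIATION, Phase.CONTRACTED, Phase.TRANSMITTING):
    exchange.advance(phase)
print(exchange.phase.value)  # transmitting
```

Keeping notarization as a separate, optional call mirrors the text: the contract is valid once signed, whether or not it is stored in a Federated Data Product Usage Contract Store.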
When the Data Product includes personal data, the GDPR requires that the data subject explicitly gives her/his consent for the data usage and that she/he can revoke this consent at any time. Accordingly, a Data Usage Consent has to be signed by the data subject, who acts as a Data Licensor, before the first data transmission, and the validity of the Data Usage Consent has to be checked before each transmission. It is recommended to establish and sign the Data Usage Consent during the negotiation phase between the Data Provider and the Data Consumer because, without this signed Data Usage Consent, the agreement between the parties (i.e. the Data Product Usage Contract) would be legally void. Note that the Consent may be signed by a guardian (for minors) or by a third party acting under a specific or generic power of attorney or under specific legislation (for instance in the case of sick persons in a hospital).
The process is similar for a Data Product for which the Data Owner did not give an unconditional usage right to the Data Provider and wants to know precisely who will use her/his data and for which purpose.
If the Data Consumer has access to the Data Licensor, then it can directly request the Data Usage Consent – this is usually the case when the Data Consumer is using the data to provide a service to the Data Licensor (for instance when a sport training app gets historical monitoring data from a sport watch provider). This is the simplest and most convenient case, and it provides better privacy (the Data Provider does not know what the data will be used for). Otherwise, the Data Usage Consent has to be collected by the Data Product Provider through the Data Producer.
In the first sub-case, the operational model is:
1. The Data Consumer queries the Catalog and reviews the Data Product Self-Descriptions to select a Data Product that corresponds to its needs.
2. The Data Consumer configures the Data Product in terms of data scope and operational characteristics and starts negotiating with the Data Provider.
3. The Data Consumer extracts the Data Usage Consent from the Data Product Self-Description, fills it in and adds its specific information (how the data will be used), usually in a separate section that will not be communicated to the Data Provider. This Data Usage Consent is sent to the Data Licensor, who signs it through a Trusted Consent Authority and sends it back to the Data Consumer.
4. The Data Provider and the Data Consumer close the negotiation and sign the configured Data Product Self-Description to create the Data Product Usage Contract (DPUC), which includes the appropriate part of the Data Usage Consent. They can notarize this Data Product Usage Contract in a Federated DPUC Store.
5. The instantiated Data Product can then be activated, and the actual Data Transmission can be requested. Before each Data Transmission, the Data Provider has to check the validity of the Data Usage Consent through the Trusted Consent Authority.
The operational model for the second sub-case is the same except for step 3, where the Data Usage Consent is requested by the Data Product Provider through the Data Producer. Note that the Data Consumer will have to provide the purpose of the data transmission, and this will be included in the Data Usage Consent sent to the Data Licensor – this is part of the service configuration phase. Note also that the Data Producer has to counter-sign the Data Usage Consent to guarantee that the person who signed the consent is really the Data Licensor of the data.
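The requirement that consent validity is checked before each Data Transmission, and that consent can be revoked at any time, can be sketched as follows. The `ConsentAuthority` class is a hypothetical in-memory stand-in for a Trusted Consent Authority; its method names are assumptions and do not describe any real interface.

```python
class ConsentAuthority:
    """Hypothetical in-memory stand-in for a Trusted Consent Authority."""

    def __init__(self):
        self._valid = set()

    def notarize(self, consent_id):
        self._valid.add(consent_id)

    def revoke(self, consent_id):
        self._valid.discard(consent_id)

    def is_valid(self, consent_id):
        return consent_id in self._valid


def transmit(authority, consent_id, payload):
    """Release the payload only while the Data Usage Consent is still valid."""
    if not authority.is_valid(consent_id):
        raise PermissionError(f"consent {consent_id!r} is missing or revoked")
    return payload


authority = ConsentAuthority()
authority.notarize("consent-001")
first = transmit(authority, "consent-001", b"transactions")

authority.revoke("consent-001")  # the Data Licensor withdraws consent
try:
    transmit(authority, "consent-001", b"transactions")
except PermissionError as exc:
    print("blocked:", exc)
```

The check sits inside `transmit` rather than at contract signature time, so a revocation takes effect on the very next transmission.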
Note: This version of the Data Exchange Services does not provide specific mechanisms for cascading data deletion. For data liable to GDPR, the data subject has to contact each Data Provider and each Data Consumer to whom she/he gave explicit Data Usage Consent. In later versions, specific support for this may be offered by the Trusted Consent Authority.
Note: The above operational model minimizes the functionalities of the Trusted Consent Authority. It is also possible to imagine a slightly more complex Trusted Consent Authority that receives parts of the Consent from the various parties and communicates only the relevant parts to the appropriate actors. Hence, in the second sub-case, the Data Provider will not know the purpose of the Data Consumer but only that the Data Subject authorizes transmission of the data, while the Data Subject will have access to the Data Consumer's purpose. We might also imagine a more complex Trusted Consent Authority able to compare predefined Usage Clauses and to advise the Data Licensor, or even to grant consent automatically on behalf of the Data Licensor. For instance, the Data Licensor might specify that she/he allows transmission of some medical data to non-profit research laboratories which hold appropriate certificates in terms of data security and data privacy. The Data Licensor would just receive a notification and would always be able to revoke the consent. That would enable a more agile data economy.
2.6 Example
Note: This is not the way consent management is currently put in place for DSP2 services. The current approach is considered to provide a sub-optimal (in fact bad) user experience. Hence, we propose a mechanism that is more generic and provides a better user experience.
Building on top of the Personal Finance Management (PFM) example in the TAD: Jane is using a financial dashboard provided by a supplier that we will call PFM and which aggregates the financial transactions from Jane’s accounts in several banks that we will call Bank(i).
In that case, each Bank(i) is both Data Producer and Data Provider for Jane’s financial transactions. The Data Consumer is PFM.
Jane, as an EndUser, wants to use the PFM service myFinanceDashboard. Because sensitive personal information is involved, she has to sign a specific “contract” stating precisely how PFM is authorized to use Jane’s data, for instance: authorized for establishing her financial dashboard; transmission to anybody else is prohibited, except for the association between a payee and an expense category (i.e. payee X is a grocery, payee Y is a gas station, …). The DataUsageConsent is provided by PFM to Jane, who will sign it through a digital identity and digital signature provider trusted by both Jane and PFM (which does not have to be a Gaia-X participant).
After that, Jane will communicate her various bank accounts (IBAN) to PFM. PFM will use the Gaia-X catalog to find the appropriate services from each bank – we will suppose for simplicity's sake that they are all named GetTransactionFromIBAN. PFM will review each DataProductSelfDescription to ensure that it is compatible with its needs. As sensitive personal information is involved, Bank(i) will require a signed consent from Jane. The consent template is included in the DataProductSelfDescription. (Note: DataSpace will have to predefine standard consent templates in order to enable automatic and agile processing.) PFM will fill in the template and send it to Jane for her to sign it and send it back.
PFM and Bank(i) will then configure the GetTransactionFromIBAN service for Jane, include Jane’s signed Data Consent in the service contract (i.e. the updated DataProductSelfDescription) and co-sign it.
PFM can then get Jane’s transaction data, process this data, and provide the financial dashboard to Jane.
Let us now suppose that PFM also delivers a loan brokering service. For that, PFM provides some financial data, a credit profile, to credit institutions in order to receive credit proposals that it can rank and forward to Jane. In this case PFM is both a DataConsumer from Bank(i) and a DataProvider to credit institutions.
If Jane wants to use this service, she will first have to sign a new data consent to authorize PFM to communicate her credit profile (total income, loan capacity, purpose of the loan, …) to some credit institutions – normally the credit profile is anonymous and should not make it possible to identify Jane.
Then PFM will query the Federated Catalog to find credit institutions providing such online loan services and will review the corresponding Service Self-Descriptions to check compliance with Jane’s consent and with PFM policy. With each selected Lender(i), PFM will negotiate, configure, and sign the service contract. The terms of usage will not include Jane’s consent because the data transmitted to Lender(i) is anonymized at that stage. They will include conditions specific to the business between PFM and Lender(i), for instance: PFM is not authorized to communicate the credit proposal to other credit institutions for a given duration, PFM guarantees that to its knowledge Jane is a real customer and not a data aggregator, Lender(i) commits to prepare a proposal within x hours, Lender(i) will pay PFM some money if its offer is selected, etc.
PFM will then call the getLoanProposal service from Lender(i). At that stage no data is transmitted except an anonymous request identifier. In order to prepare a credit proposal, Lender(i) will have to get the credit profile associated with that identifier. For that, Lender(i) will get, from the Federated Catalog, the description of the getLoanRequestData service provided by PFM, configure it and co-sign it. The terms of usage will still not involve Jane’s consent but should include clauses at least as strong as those in Jane’s consent, for instance that the data shall be used only for the purpose of establishing a credit proposal and shall be deleted within 30 days if the proposal is not activated. Lender(i) will then get the data from PFM by activating the getLoanRequestData service. At that stage, PFM acts as a DataProvider and Lender(i) as a DataConsumer. Lender(i) will then prepare the loan proposal and make it available to PFM.
PFM will then collect the credit proposals, review and rank them to prepare a recommendation for Jane.
2.7 Policies for Data Exchange
Policies for data exchange shall reflect different aspects in order to specify terms and conditions for the data and for the exchange of the data. Such policies therefore differ in scope and concern:
1. Contract Policies are interoperable, clear and unambiguous, and form the basis for a contract between the participants. A contract policy should be both machine-readable and human-readable, and it must be able to contain access and usage policies. [ODRL](https://www.w3.org/TR/odrl-model/) is a good candidate for a Policy Definition Language for this purpose.
2. Runtime Policies are derived from the Contract Policies and are used to execute the contract policy in the participants' systems. Options for Policy Definition Languages for execution are Rego or XACML.
For the Data Exchange, the focus is on (1) the Contract Policies. The contract is negotiated between the participants of the data exchange using a contract negotiation sequence such as those specified by IDSA or GXFS-DE. The result is a signed contract between the two parties, which is a Self-Description of the data asset and the contract as a verifiable credential.
The contract policies contain at least:
1. A general description of the data asset, the involved parties and the general terms
2. Access policies describing the requirements and rules for access to the data on the data provider side
3. Usage policies as obligations for the data consumer side
4. Signatures
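To make the four parts concrete, the sketch below wraps an ODRL Agreement in a contract structure and checks that every part is present. The ODRL terms used (`Agreement`, `permission`, `prohibition`, `constraint`, `leftOperand`, `operator`, `rightOperand`) come from the ODRL vocabulary; the wrapper keys (`description`, `accessPolicies`, `usagePolicies`, `signatures`) and all `example.org` identifiers are assumptions made for this example only.

```python
import json

# ODRL Agreement; every IRI under example.org is a placeholder.
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Agreement",
    "uid": "https://example.org/policy/1",
    "assigner": "https://example.org/participant/provider",
    "assignee": "https://example.org/participant/consumer",
    "permission": [{
        "target": "https://example.org/data-product/1",
        "action": "use",
        "constraint": [{
            "leftOperand": "dateTime",
            "operator": "lt",
            "rightOperand": {"@value": "2026-01-01", "@type": "xsd:date"},
        }],
    }],
    "prohibition": [{
        "target": "https://example.org/data-product/1",
        "action": "distribute",
    }],
}

# Hypothetical wrapper: these four keys are not defined by ODRL or by this document.
contract = {
    "description": {"title": "Example Data Product", "generalTerms": "placeholder"},
    "accessPolicies": [],    # rules evaluated on the Data Provider side
    "usagePolicies": [policy],
    "signatures": [],        # filled in once both parties have signed
}

REQUIRED = ("description", "accessPolicies", "usagePolicies", "signatures")


def missing_parts(candidate):
    """List the required contract parts that `candidate` does not contain."""
    return [key for key in REQUIRED if key not in candidate]


print(missing_parts(contract))  # []
print(json.dumps(policy["permission"][0]["constraint"][0]))
```

The permission reads as "the consumer may use the data product before 1 January 2026", and the prohibition forbids redistribution; both remain plain JSON and are therefore readable by machines and, with modest effort, by humans.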
Usage control is an extension to traditional access control. It is about the specification and enforcement of restrictions regulating what must (not) happen to data.
Thus, usage control is concerned with requirements that pertain to data processing (obligations), rather than data access (provisions). Usage control is relevant in the context of intellectual property protection, compliance with regulations, and, more generally, digital rights management.
Access control restricts access to resources. Authorization is the process of granting permission to access resources.
Resource owners define attribute-based access control policies for their endpoints and define the attribute values a subject must attest in order to be granted access to the resource.
In contrast to access control, the overall goal of usage control is to enforce usage restrictions for data after access has been granted. Therefore, the purpose of usage control is to bind policies to the data being exchanged. The following policy classes (extracted from the IDSA Position Paper about Usage Control) are examples:
- Allow the Usage of the Data (provides data usage without any restrictions)
- Interval-restricted Data Usage (provides data usage within a specified time interval)
- Duration-restricted Data Usage (allows data usage for a specified time period)
- Location Restricted Policy
- Perpetual Data Sale (Payment once)
- Data Rental (Payment frequently)
- Role-restricted Data Usage
- Purpose-restricted Data Usage Policy
- Restricted Number of Usages (allow data usage for n times)
- Security Level Restricted Policy (allow data access with a specified security level)
- Use Data and Delete it After (allows data usage within a specified time interval with the restriction to delete it at a specified time stamp)
- Attach Policy when Distribute to a Third-party
- Distribute only if Encrypted
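A few of these classes (interval-restricted usage, restricted number of usages, purpose-restricted usage) can be sketched as a simple runtime check. This is an illustration only: the `UsagePolicy` class and its field names are hypothetical, and a production system would more likely express such rules in a policy language such as Rego or XACML.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


@dataclass
class UsagePolicy:
    not_before: Optional[datetime] = None       # interval-restricted usage
    not_after: Optional[datetime] = None
    max_uses: Optional[int] = None              # restricted number of usages
    purposes: Optional[FrozenSet[str]] = None   # purpose-restricted usage
    uses: int = 0

    def permits(self, when, purpose):
        """Return True only if every configured restriction is satisfied."""
        if self.not_before is not None and when < self.not_before:
            return False
        if self.not_after is not None and when > self.not_after:
            return False
        if self.max_uses is not None and self.uses >= self.max_uses:
            return False
        if self.purposes is not None and purpose not in self.purposes:
            return False
        return True

    def use(self, when, purpose):
        """Record one usage, or raise if the policy forbids it."""
        if not self.permits(when, purpose):
            raise PermissionError("usage not permitted by policy")
        self.uses += 1


policy = UsagePolicy(
    not_after=datetime(2030, 1, 1, tzinfo=timezone.utc),
    max_uses=2,
    purposes=frozenset({"research"}),
)
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
policy.use(now, "research")
policy.use(now, "research")
print(policy.permits(now, "research"))  # False: both allowed usages are consumed
```

Restrictions left as `None` are simply not enforced, so an empty `UsagePolicy()` corresponds to the unrestricted "Allow the Usage of the Data" class.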
Examples of the realization of the mentioned usage policy classes can be found here
To express and execute the Contract Policies, several pieces of information are required at runtime to evaluate the policies. To this end, at least three different information models are required:
- The generic Federation/Gaia-X policy data models for basic discovery and trust negotiation policies
- The per Federation/Industry specific data model which needs to be understood by all participants of the federation
- The per data contract/data asset specific data model which might be irrelevant for someone who does not receive the data but crucial for someone who has to understand the usage restrictions of a specific contract.
2.8 Ontologies for Data Exchange
- Self Description for Data Provider / Data Consumer (based on Trust Framework):
Attribute | Mandatory | Comment |
---|---|---|
parentOrganisation[] | No | A list of direct participants that this entity is a subOrganization of, if any. |
name:String | Yes | Name of Participant |
registrationNumber:String | Yes | Country’s registration number which identifies one specific company |
LEI Code:String | No | Unique LEI number as defined by https://www.gleif.org |
headquarterAddress:String | Yes | Physical location of the headquarters in ISO 3166-1 alpha-2, alpha-3 or numeric format. |
headquarterAddress.street-address:String | No | Street Address |
headquarterAddress.postal-code:String | No | Postal Code |
headquarterAddress.region:String | No | Region |
headquarterAddress.locality:String | No | Locality |
headquarterAddress.country-name:String | Yes | Country Name |
legalAddress:String | Yes | Physical location of the legal address in ISO 3166-1 alpha-2, alpha-3 or numeric format. |
legalAddress.street-address:String | No | Street Address |
legalAddress.postal-code:String | No | Postal Code |
legalAddress.region:String | No | Region |
legalAddress.locality:String | No | Locality |
legalAddress.country:String | Yes | Country |
- Self Description for Data Product (based on Trust Framework), extending the DCAT-3 Dataset class:
A Data Product consists of the characterisation of the actual data as well as a description of the contractual part. At minimum, this self-description needs to contain all the information a consumer needs to initiate a contract negotiation. All other attributes that are used to describe the Data are optional. However, the provider has an interest in describing the data precisely so that it can be found and consumed. If the data resource is published in a catalogue, the Data Provider might precisely describe the Data Product so that it can be found and consumed by Data Consumers.
Attribute | Mandatory | Comment |
---|---|---|
providedBy:URI | Yes | A resolvable link to the participant self-description providing the service. |
termsAndConditions:URI | Yes | A resolvable link to the Terms and Conditions applying to that service. |
title:String | Yes | Title of the Data Product |
description:String | No | Description of the Data Product |
issuedDateTime:Datetime | No | Publication date in ISO 8601 format |
obsoleteDateTime:Datetime | No | Date time in ISO 8601 format after which data is obsolete. |
expirationDateTime:Datetime | No | Date time in ISO 8601 format after which data is expired and shall be deleted. |
dataDomains:String[] | No | List of Tags or Keywords (Unicode) for data domains |
exposedThrough[] | Yes | A resolvable link to the data exchange component that exposes the data resource. |
policies[] | No | A list of policies expressed using a DSL (e.g., Rego or ODRL) |
dataController | No | Data controller Participant as defined in GDPR. |
consent[] | No | List of consents from the data subjects as Natural Persons when the dataset contains PII, as defined by the Trust Framework |
aggregationOf:String[] | Yes | DataSet Content |
copyrightOwnedBy:String[] | Yes | A list of copyright owners, each either as a free-form string or as a participant self-description. |
license:String[] | Yes | A list of URIs to license documents. |
identifier:String | Yes | Unique uuid4 |
distribution:String[] | Yes | List of distribution formats of the dataset |
distribution.title:String | Yes | Filename of the dataset distribution |
distribution.mediaType:String | Yes | Format of the dataset distribution (pdf, csv, …) |
distribution.byteSize:String | Yes (for file based data product) | Size of the dataset distribution |
distribution.location[]:String | Yes (for file based data product) | List of dataset storage locations |
distribution.hash:String | Yes (for file based data product) | To uniquely identify the data contained in the dataset distribution |
distribution.hashAlgorithm:String | Yes (for file based data product) | Hash Algorithm |
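For a file-based Data Product, the `distribution` attributes can be derived directly from the file content. The sketch below is an assumption-laden illustration: the helper name `describe_distribution`, the choice of SHA-256, the `"sha256"` label and the `example.org` location are not prescribed by this document.

```python
import hashlib
import uuid


def describe_distribution(title, media_type, content, locations):
    """Build the file-based `distribution` attributes from the raw bytes."""
    return {
        "title": title,
        "mediaType": media_type,
        "byteSize": str(len(content)),   # the table types byteSize as String
        "location": list(locations),
        "hash": hashlib.sha256(content).hexdigest(),
        "hashAlgorithm": "sha256",       # assumed label for the algorithm
    }


content = b"a,b\n1,2\n"
data_product = {
    "identifier": str(uuid.uuid4()),     # "Unique uuid4" as required by the table
    "title": "Example dataset",
    "distribution": [
        describe_distribution(
            "example.csv", "csv", content, ["https://example.org/example.csv"]
        ),
    ],
}
print(data_product["distribution"][0]["byteSize"])  # 8
```

A Data Consumer can recompute the hash over the bytes it receives and compare it with the value in the Self-Description to confirm that it obtained exactly the described distribution.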
Consistency rules
- The keypair used to sign the Data Resource claims must be traceable to the `producedBy` participant of the Data Resource.
- If the data are about data subjects as one or more Natural Persons, or are sensitive data as defined in GDPR Article 9, then `dataController` and `consent` are mandatory. To avoid data re-identification, this rule applies regardless of whether the data is raw, pseudonymized or anonymized. (Note: This is on purpose beyond GDPR requirements.)
- If `dataController` is specified, the keypair used to sign at least the Data Resource `consent` claims must be traceable to the `dataController`.
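The rule on `dataController` and `consent` lends itself to an automated check. The following is a minimal sketch: the function name and the `about_natural_persons` flag are hypothetical, and the keypair traceability rules are deliberately left out because they require verifying signatures against the Trust Framework.

```python
def consistency_errors(self_description, about_natural_persons):
    """Check the dataController/consent rule; keypair traceability is not covered."""
    errors = []
    if about_natural_persons:
        # Applies whether the data is raw, pseudonymized or anonymized.
        if not self_description.get("dataController"):
            errors.append("dataController is mandatory for data about Natural Persons")
        if not self_description.get("consent"):
            errors.append("consent is mandatory for data about Natural Persons")
    return errors


incomplete = {"title": "Patient cohort", "dataController": "https://example.org/hospital"}
complete = dict(incomplete, consent=["https://example.org/consent/1"])

print(consistency_errors(incomplete, about_natural_persons=True))
print(consistency_errors(complete, about_natural_persons=True))  # []
```

An empty `consent` list is treated as missing, since a list without any consent would not satisfy the rule.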