A so-called Active Integration is a software component that consumes the differences between the Actual State and the Target State and acts to close the gap between them. For example, it could initiate the creation of a stream behind a Data Offer when no stream is linked to the offer.
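The stream example can be sketched as a small reconciliation loop. This is only an illustration under assumptions: the `DataOffer` class, the `reconcile` function and the `create_stream` callback are hypothetical names, not part of the SDM API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataOffer:
    """Hypothetical Target State entry: a Data Offer and its linked stream."""
    name: str
    stream: Optional[str] = None  # None means no stream is linked yet


def offers_missing_stream(target_state, actual_streams):
    """Return the offers whose stream is absent from the observed Actual State."""
    return [
        offer for offer in target_state
        if offer.stream is None or offer.stream not in actual_streams
    ]


def reconcile(target_state, actual_streams, create_stream):
    """Call create_stream for every gap and return the names of the handled offers."""
    handled = []
    for offer in offers_missing_stream(target_state, actual_streams):
        create_stream(offer)  # integration-specific action (assumed callback)
        handled.append(offer.name)
    return handled
```

A real integration would pass a callback that talks to the streaming platform; here any callable that accepts an offer will do.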
The Actual State data collection describes the observed state of the Streaming System, as instrumented by the State Agent or by other parties that observe the system and send data to the SDM Cloud API. The Actual State data is enriched with runtime information such as metrics. Together with textual documentation, this is the central entry point for new data analysis or innovation on existing streaming data.
In general, agents run close to the Streaming System in order to have direct access. They work as a co-located runtime that either observes the streaming platform and reports metadata to the SaaS SDM, or acts on behalf of the platform according to its understanding of the differences between the Actual State and the Target State (see Active Integrations and Hooks).
The Application Programming Interface is a set of definitions and methods to access a system. The API is a form of communication and standardization between developers and manages expectations between loosely coupled teams and systems.
Verification of the identity of an Authorization Subject.
Synonyms: Access Control
Denying or allowing a certain action of an Authorization Subject. Authorization is performed by a decision function that takes a set of Authorization Policies and the corresponding context into account.
Rules for Authorization decisions like: user1 is allowed to create a new Data Offer.
A specific action to test against an Authorization Policy, like create a new Data Offer.
A user, or a system acting on behalf of a user, that acts on a Resource.
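A decision function over Authorization Policies could look roughly like the sketch below. The `Policy` fields and the deny-by-default behavior are assumptions made for illustration; the actual SDM policy model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical Authorization Policy: a subject may perform an action on a resource type."""
    subject: str
    action: str
    resource_type: str


def is_allowed(policies, subject, action, resource_type):
    """Allow only if some policy matches the request; deny by default (assumed)."""
    return any(
        p.subject == subject
        and p.action == action
        and p.resource_type == resource_type
        for p in policies
    )
```

With the policy "user1 is allowed to create a new Data Offer", `is_allowed([Policy("user1", "create", "DataOffer")], "user1", "create", "DataOffer")` evaluates to `True`, and any request without a matching policy is denied.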
The CLI (Command Line Interface) is a low-level user interface for interacting with services. The SDM CLI is a low-level user interface for all Logistics-related actions.
The implementation component which consumes data streams.
The Container Node is a Node in a Topology that can cluster Data Source Nodes, Data Sink Nodes or other Container Nodes (as a composite). A Container Node is used for building Topology hierarchies, whether the hierarchy is architectural or of any other kind.
The Contract is not yet available as an entity.
Synonyms: offer, DO, Datenangebot, Angebot
A Data Offer is a Target State representation of a Messaging or Streaming API that a Data Subscription can subscribe to and consume data from. Every Data Offer has an Owner who is responsible for the offering.
It is an exposed DataPort with additional metadata, such as the owner. The owner of the DataOffer is responsible for guaranteeing the quality of the DataPort for any DataSubscription.
Formerly Data Offer State.
Technical representation of the data endpoint (how to access the data in a meaningful manner). Very dependent on the transport.
A Data Profile summarizes the characteristics of data flowing through a transport channel. In contrast to a Data Schema, a Data Profile is based on a snapshot of the real data. The snapshot is analyzed to determine which fields the data items may have and which value domains those fields use. Based on the profile, which covers only a small sample of the data, the Data Profiler provides statistics, for example whether a field has entropy, or whether and how a field correlates with another.
The profiler is responsible for gathering samples (snapshots) from a stream and analyzing them. The resulting report is called a Data Profile. The profiler gives an impression of what the data in a stream looks like, which fields it contains and which value domains these fields may have. It shows statistical distributions and indicates which fields correlate. Stream inspection is not always possible or desirable, so profiling is optional; see Stream Inspection.
The data schema describes the structure of data flowing through a Transport Channel. A schema contains things like the names of data fields, their value domains, and whether a certain field has to be set or not.
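As a rough illustration of the kind of statistics a profile can hold, the sketch below computes the Shannon entropy and the observed value types per field over a sample. The function names and the shape of the report are assumptions, not the Data Profiler's real output format, and field values are assumed to be hashable.

```python
import math
from collections import Counter


def field_entropy(samples, field):
    """Shannon entropy (in bits) of one field's values across the sampled records."""
    values = [record[field] for record in samples if field in record]
    if not values:
        return 0.0
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def profile(samples):
    """Build a tiny profile: per field, the observed value types and the entropy."""
    fields = sorted({name for record in samples for name in record})
    return {
        name: {
            "types": sorted({type(r[name]).__name__ for r in samples if name in r}),
            "entropy_bits": field_entropy(samples, name),
        }
        for name in fields
    }
```

A field that always carries the same value has an entropy of zero bits, which hints that it adds no information to the stream.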
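To make the three ingredients of a schema concrete (field names, value domains, and whether a field must be set), here is a minimal sketch. The dictionary layout with `type`, `required` and `allowed` keys is invented for this example and does not correspond to any particular schema language.

```python
def validate(record, schema):
    """Return a list of human-readable schema violations (empty if the record is valid)."""
    errors = []
    for name, rule in schema.items():
        if name not in record:
            if rule.get("required", False):
                errors.append(f"missing required field: {name}")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"wrong type for field: {name}")
        elif "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"value outside domain for field: {name}")
    return errors


# Hypothetical schema: 'id' and 'currency' must be set, 'note' is optional.
ORDER_SCHEMA = {
    "id": {"type": int, "required": True},
    "currency": {"type": str, "required": True, "allowed": {"CHF", "EUR"}},
    "note": {"type": str},
}
```

Calling `validate({"id": 1, "currency": "CHF"}, ORDER_SCHEMA)` returns an empty list, while a record without `id` or with an unknown currency yields one message per violation.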
Data Sink Node
The Data Sink Node is a Node in a Topology where the Data is going to (directed edge). The counterpart is a Data Source Node where the Data is coming from. A Data Sink Node might wrap any Resource with the ability to consume data. Most prominent example of a wrapped resource in a Data Sink Node is a Data Subscription State.
Data Source Node
The Data Source Node is a Node in a Topology where the Data is coming from (directed edge). The counterpart is a Data Sink Node where the Data is going to. The Data Source Node might wrap any Resource with the ability to produce data. Most prominent example of a wrapped resource in a Data Source Node is a Data Port.
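The composite relationship between Container Nodes, Data Source Nodes and Data Sink Nodes can be sketched as follows. The `Node` class and its `kind` labels are hypothetical and only illustrate how a hierarchy can be flattened into its source and sink nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A Topology node; kind is 'source', 'sink' or 'container' (hypothetical labels)."""
    name: str
    kind: str
    children: list = field(default_factory=list)  # only used by Container Nodes


def leaves(node):
    """Flatten a Container Node hierarchy into its source and sink nodes."""
    if node.kind != "container":
        return [node]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result
```

Because a Container Node may contain other Container Nodes, `leaves` recurses until it reaches nodes that wrap an actual Resource.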
Synonyms: subscription, DS
Data Subscription State
The possibility to act on certain changes in SDM with customer-specific or integration-specific implementations, such as sending notifications, triggering other actions or implementing your own streaming pipeline Orchestration.
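Conceptually, reacting to changes amounts to registering handlers for change types. The following in-process registry is only a sketch of that idea; `HookRegistry` and the change type strings are hypothetical, and how SDM actually delivers changes is not described here.

```python
from collections import defaultdict


class HookRegistry:
    """Hypothetical in-process registry mapping change types to handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, change_type, handler):
        """Register a handler for one change type."""
        self._handlers[change_type].append(handler)

    def emit(self, change_type, payload):
        """Call every handler registered for change_type; return how many ran."""
        handlers = self._handlers.get(change_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

A notification hook would register a handler that sends a message, while an orchestration hook would register one that changes the streaming pipeline.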
In Identity and Access Management we define who is authorized to access what. This has to be applied for all interactions with all the systems so that we know which user is allowed to do what but also which service is allowed to act how.
Integrations are ways to interact with systems outside of the SDM. This can be a trigger, a Hook or a link in the Web UI. Integrations can also use the SDM Cloud API to push data into the state from an unknown external (streaming) system. We distinguish between Active Integrations, which involve a software component acting to achieve a goal, and Passive Integrations, where we link applications or call web-hooks without the need for an implementation.
Messages emitted by one or several services or components that help humans or machines understand the system's behavior. Mostly human-readable, unstructured text with additional information such as system, severity and timestamp.
SDM Looker is responsible for data and transport observability. It synchronizes Data Profile Jobs and stores the data profiles.
The system-defined tag no-data indicates that, for a specified Data Port, we have not yet received any data for creating a profile or schema.
No Valid Data
The system-defined tag no-valid-data indicates that, for a specified Data Port, we have not received any data that we could parse.
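The difference between the two system-defined tags can be illustrated with a short sketch. It assumes, purely for the example, that messages are JSON payloads; the `system_tag` function is hypothetical and the real parsing logic depends on the data format of the Data Port.

```python
import json


def system_tag(raw_messages):
    """Return 'no-data', 'no-valid-data', or None for a Data Port's sampled messages."""
    if not raw_messages:
        return "no-data"  # nothing received yet
    for raw in raw_messages:
        try:
            json.loads(raw)
            return None  # at least one message could be parsed
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
    return "no-valid-data"  # data arrived, but none of it could be parsed
```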
Measures for quantitative assessments, mostly a timestamp/number pair. Examples might be:
- bytes received per hour
- number of messages sent per hour
- current speed
- height of a person
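A metric such as bytes received per hour can be derived from raw timestamp/number pairs, as in the sketch below. The `bytes_per_hour` function and the input shape are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def bytes_per_hour(points):
    """Sum (timestamp, byte_count) pairs into buckets keyed by the start of each hour."""
    buckets = defaultdict(int)
    for timestamp, byte_count in points:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour] += byte_count
    return dict(buckets)
```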
Software that is installed and runs on computers on the premises of the person or organization using it, rather than at a remote facility.
This describes the automatic configuration, coordination and management of computer systems. In our context it could mean that Active Integrations make sure that components follow the Target State. In this way the SDM would be the source of truth for orchestrating these components. If we allow the user to change the Target State and we have orchestration, we are able to provide Self-Service.
The owner represents a person or group of people (Team) responsible for the entity. For example on a Data Offer this means the responsible people behind the data and Data Offering. The owner should be contacted in order to get more information, promises and contracts on top of the defined entity.
The way SDM integrates with existing systems from within a user's browser session. This means we can link to other systems, such as monitoring and management systems or a schema registry, to provide insights on a certain context for the user. A passive integration does not require an active software component in order to integrate. The SDM Web UI passes the required parameters to the target system so that it opens with the right context.
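Passing context to a target system typically means building a link with query parameters. The sketch below shows the idea; the `context_link` helper, the URL and the parameter names are hypothetical and not taken from the SDM Web UI.

```python
from urllib.parse import urlencode


def context_link(base_url, **context):
    """Build a link to an external system that carries the context as query parameters."""
    query = urlencode(sorted(context.items()))  # sorted for a stable parameter order
    return f"{base_url}?{query}" if query else base_url
```

For instance, `context_link("https://monitoring.example.com/dashboard", topic="orders", env="prod")` yields `https://monitoring.example.com/dashboard?env=prod&topic=orders`.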
The implementation component which produces a data flow into the streaming application.
Every resource is assigned to a Resource Group. A Resource Group is distinct and can be nested. Resource Groups are a place to group resources together for defining defaults, permissions and accounting.
A match against Resources.
A Resource Path is a tree (a folder structure, e.g. /org1/finance/). Every Resource belongs to one Resource Path.
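Since a Resource Path forms a tree, checking whether a Resource lies below a given path is a prefix comparison on path segments. The helper names below are hypothetical, and this sketch does not claim to reproduce how SDM matches Resources.

```python
def path_segments(path):
    """Split a Resource Path like '/org1/finance/' into ['org1', 'finance']."""
    return [segment for segment in path.split("/") if segment]


def is_within(resource_path, group_path):
    """True if resource_path equals group_path or lies below it in the tree."""
    resource, group = path_segments(resource_path), path_segments(group_path)
    return resource[:len(group)] == group
```

Comparing whole segments rather than raw strings avoids treating `/org1/fin/` as a parent of `/org1/finance/`.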
Resource type is the name of the resource kind, like DataOffer for Data Offer.
An ecosystem of tools and services around streaming applications. SDM simplifies the interaction with complex streaming systems with its tools.
Synonyms: State agent
The SDM Agent represents several services which are running next to a Transport and other integrated services (like IAM, Self-Service, ERP, ...) to act and react towards all integration points through the SDM Cloud API.
Synonyms: SaaS SDM
The SDM Cloud is the heart of managing Actual States and Target States, expectations and promises. Therefore it provides APIs for integrations through SDM Agents and a Web User Interface for managing Data Offers and other entities.
SDM Cloud API
The self service allows users to manage Data Subscriptions on their own behalf. With self service we support fast innovation, since no complex, long-running processes are needed to access data.
Inspect the data flowing through a stream. This is useful to understand the data with samples. The Data Profiler uses stream inspection in order to analyze data samples. Stream inspection needs full access to the stream and for this reason is not applicable to all streams.
An API to consume or produce data in a streaming (continuous) manner.
System which provides streaming data and supports application and data operation with streams.
Tags are buckets represented by short, descriptive texts. There are user-defined tags, which can be set on any Data Offer and Data Subscription. In addition, there are system-defined tags. no-data and no-valid-data are currently the only two system-defined tags.
Synonyms: Planned State, Plan
The Target State data collection defines expectations for the domain items the user wants to observe. With these expectations defined, the system can hand out difference information between the Actual State and the Target State in order to trigger actions towards the system (Self-Service) or notifications for operational staff. Different implementations of Integration Agents can process the differences individually and act according to the user's wishes.
A team represents a group of people who share the Ownership of entities.
A Topology can serve multiple business needs, like:
- Lineage for Governance/Audit purposes (e.g. detect P2 issues or classification mismatches)
- Notification of upstream changes to, and collaboration with, the owners of downstream applications
- Cost calculation and risk assessment, both downstream and upstream
- Bird's-eye view: clustering of Nodes or Links to form hierarchical views or information layers
Analyzing the data or process flow. Nowadays used also in distributed or event driven systems, especially to prove a functionality or to debug a data flow.
A framework or technology for transporting data in a streaming or event-driven manner. The best-known ones are Kafka and RabbitMQ.
Web User Interface
The web user interface provides a human-readable, intuitive and nicely presented access point for users.