Glossary¶
Active Integration¶
An Active Integration is a software component that consumes the differences between the Actual State and the Target State and acts to close the gap between them. For example, it could initiate the creation of a stream behind a Data Product when no stream is linked to the offer.
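The stream example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Target State is reduced to a list of Data Product names, the Actual State to a list of existing stream names, and `missing_streams`, `close_gap` and the `create_stream` callback are hypothetical names, not part of any SDM API.

```python
def missing_streams(target_products, actual_streams):
    """Return Data Product names from the Target State that have no stream in the Actual State."""
    existing = set(actual_streams)
    return [name for name in target_products if name not in existing]


def close_gap(target_products, actual_streams, create_stream):
    """Call create_stream for every Data Product without a stream; return the names acted on."""
    created = []
    for name in missing_streams(target_products, actual_streams):
        create_stream(name)  # hypothetical callback that starts the stream creation process
        created.append(name)
    return created
```

Passing the action in as a callback keeps the difference calculation separate from the side effect, so the same comparison could also drive a notification instead of a creation.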
Actual State¶
The Actual State data collection describes the observed state of the Streaming System, as instrumented by the State Agent or by other parties that observe it and send data to the SDM Cloud API. The Actual State data is enriched with runtime information such as metrics. Together with textual documentation, it is the central entry point for new data analysis or innovation on existing streaming data.
Agents¶
In general, agents run close to the Streaming System so that they have direct access to it. They work as an adjacent runtime that either observes the streaming platform and reports metadata to the SaaS SDM, or acts on behalf of the platform according to its understanding of the differences between the Actual State and the Target State (see Active Integrations and Hooks).
API¶
The Application Programming Interface is a set of definitions and methods to access a system. The API is a form of communication and standardization between developers and manages expectations between loosely coupled teams and systems.
Auth¶
Abbreviation for Authentication (proving the identity of a user) and Authorization (allowing or denying certain actions of a user).
Authentication¶
Verification of the identity of an Authorization Subject.
Authorization¶
Synonyms: Access Control
Allowing or denying certain actions of an Authorization Subject. Authorization is performed by a decision function that takes a set of Authorization Policies and the corresponding context into account.
Authorization Policy¶
Rules for Authorization decisions like: user1 is allowed to create a new Data Product.
Policies are defined through Authorization Subject -> Authorization Privilege -> Resource Match.
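A minimal sketch of such a decision function, assuming details the glossary does not specify: the Resource Match is modelled here as a glob pattern over a hypothetical `ResourceType/name` string, access is denied unless some policy matches, and the names `Policy` and `is_allowed` are illustrative only.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    subject: str         # Authorization Subject, e.g. "user1"
    privilege: str       # Authorization Privilege, e.g. "create"
    resource_match: str  # Resource Match, assumed here to be a glob pattern


def is_allowed(policies, subject, privilege, resource):
    """Allow the action if at least one policy matches; deny otherwise (assumed default)."""
    return any(
        p.subject == subject
        and p.privilege == privilege
        and fnmatchcase(resource, p.resource_match)
        for p in policies
    )
```

With a single policy `Policy("user1", "create", "DataOffer/*")`, the call `is_allowed(policies, "user1", "create", "DataOffer/new")` is allowed, while the same call for another subject or privilege is denied.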
Authorization Privilege¶
Synonyms: Action
A specific action to test against an Authorization Policy, like create a new Data Product.
Authorization Subject¶
Synonyms: User
A user, or a system acting on behalf of a user, that acts towards a Resource.
CLI¶
The CLI (Command Line Interface) is a low level User Interface interacting with some services. The SDM CLI is a low level user interface for all Logistics related actions.
Consumer¶
The implementation component which consumes data streams.
Container Node¶
The Container Node is a Node in a Topology which is able to cluster Data Source Nodes, Data Sink Nodes or other Container Nodes (as a composite) together. A Container Node is used for building Topology hierarchies, regardless of whether the hierarchy is architectural or of any other kind.
Contract¶
The result of a formally valid agreement formed from expectations and promises between a Data Product and a Data Subscription.
The Contract is not yet available as an entity.
Data Dictionary¶
The documentation and context around the Data Schema of data flowing through a Transport Channel.
Data Offer¶
Original name for a Data Product.
The designation Data Offer is still used for the technical implementation.
Data Product¶
Synonyms: Data Offer, offer, DO, Datenangebot, Angebot
A Data Product is a Target State representation of a Messaging or Streaming API that a Data Subscription can subscribe to and consume data from. Every Data Product has an Owner who is responsible for the offering.
It is an exposed DataPort with additional metadata, such as the owner. The owner of the Data Product (technically called DataOffer) is responsible for guaranteeing the quality of the DataPort to any DataSubscription.
Formerly Data Offer
Data Port¶
Formerly Data Offer State.
Technical representation of the data endpoint (how to access the data in a meaningful manner). Very dependent on the transport.
Data Profile¶
A Data Profile summarizes the characteristics of data flowing through a Transport Channel. In contrast to a Data Schema, a Data Profile is based on a snapshot taken of the real data. The snapshot is analyzed to find out which fields the data items may have and which value domains they belong to. On this small sample of the data, the Data Profiler provides statistics, for example whether a field has entropy or whether and how a field correlates with another.
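To illustrate the entropy statistic mentioned above, the following sketch computes the Shannon entropy of one field over a sample. It assumes that the sample is a list of dictionaries with hashable values; `field_entropy` is a hypothetical helper and does not describe how the Data Profiler is actually implemented.

```python
import math
from collections import Counter


def field_entropy(sample, field):
    """Shannon entropy (in bits) of one field's values across the sampled records."""
    values = [record[field] for record in sample if field in record]
    if not values:
        return 0.0
    counts = Counter(values)
    total = len(values)
    # Sum of p * log2(1 / p) over the observed relative frequencies p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

A field that always carries the same value has an entropy of zero, while a field whose values are spread evenly over many distinct values has a high entropy.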
Data Profiler¶
The Data Profiler is responsible for gathering samples (snapshots) from a stream and analyzing them. The resulting report is called a Data Profile. The profiler gives an impression of what the data in a stream looks like, which fields it contains and which value domains these fields may have. It shows statistical distributions and hints at correlating fields. Stream inspection is not always possible or desirable, so profiling is optional; see Stream Inspection.
Data Schema¶
The Data Schema describes the structure of data flowing through a Transport Channel. A schema contains things like the names of data fields, their value domains, and whether a certain field has to be set or not.
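As a rough illustration of these three ingredients, the sketch below checks a record against a schema. The schema representation (a dictionary mapping each field name to a Python type and a `required` flag) and the `validate` helper are assumptions made for this example; real schema formats depend on the transport and serialization in use.

```python
def validate(record, schema):
    """Return a list of problems found when checking record against schema."""
    problems = []
    for name, rule in schema.items():
        if name not in record:
            if rule["required"]:
                problems.append(f"missing required field: {name}")
            continue
        # The value domain is simplified to a Python type in this sketch.
        if not isinstance(record[name], rule["type"]):
            problems.append(f"wrong type for field: {name}")
    return problems
```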
Data Sink Node¶
The Data Sink Node is a Node in a Topology where the data is going to (directed edge). The counterpart is a Data Source Node, where the data is coming from. A Data Sink Node might wrap any Resource with the ability to consume data. The most prominent example of a wrapped resource in a Data Sink Node is a Data Subscription State.
Data Source Node¶
The Data Source Node is a Node in a Topology where the data is coming from (directed edge). The counterpart is a Data Sink Node, where the data is going to. A Data Source Node might wrap any Resource with the ability to produce data. The most prominent example of a wrapped resource in a Data Source Node is a Data Port.
Data Subscription¶
Synonyms: subscription, DS
A Data Subscription defines the Target State subscription to a Messaging or Streaming API, in our terms to a Data Product. It defines the Consumer side flow of the data.
Data Subscription State¶
Represents the Actual State of a Data Subscription.
Hook¶
The possibility to act on certain changes in SDM with customer or integration specific implementations like notifications, triggering other actions or implementing your own streaming pipeline Orchestration.
IAM¶
In Identity and Access Management we define who is authorized to access what. This has to be applied for all interactions with all the systems so that we know which user is allowed to do what but also which service is allowed to act how.
Integration Agent¶
Implements an integration according to the differences between Target State and Actual State.
Integrations¶
Integrations are ways to interact with systems outside of the SDM. This can be a trigger, a Hook or a link in the Web UI. We can even have integrations that use the SDM Cloud API to push data to the state from an unknown external (streaming) system. We distinguish between Active Integrations, which involve software acting in order to achieve the goal, and Passive Integrations, where we link applications or call web-hooks without the need for an implementation.
Logging¶
Messages emitted by one or several services or components that help humans or machines understand their behavior. Mostly human-readable unstructured text with additional information like system, severity and timestamp.
Logistics¶
Central service for storing the Target State (plan) and the Actual State (current). All core entities are managed in logistics.
Looker¶
SDM Looker is responsible for data and transport observability. It synchronizes Data Profile Jobs and stores the data profiles.
No Data¶
The system defined tag no-data indicates that for a specified Data Port, we did not yet receive any data for creating a profile or schema.
No Valid Data¶
The system defined tag no-valid-data indicates that for a specified Data Port, we did not receive data that we could parse.
Metrics¶
Measures for quantitative assessments. Mostly a Timestamp / Number pair. Examples might be bytes received per hour, amount of messages sent per hour, current speed, or height of a person.
Observability¶
Observability includes the ability to search and detect patterns you want to act or react on. It often includes Logging, Metrics and Tracing.
On-Premise¶
Is installed and runs on computers on the premises of the person or organization using the software, rather than at a remote facility.
Orchestration¶
This describes the automatic configuration, coordination and management of computer systems. In our context it could mean that Active Integrations make sure that components follow the Target State. In this way the SDM would be the source of truth for orchestrating these components. If we allow the user to change the Target State and we have orchestration, we are able to provide Self-Service.
Owner¶
The owner represents a person or group of people (Team) responsible for the entity. For example on a Data Product this means the responsible people behind the data and Data Product. The owner should be contacted in order to get more information, promises and contracts on top of the defined entity.
Passive Integration¶
The way SDM integrates with existing systems from within the browser session of a user. This means we can link other systems, like monitoring and management systems or a schema registry, to provide insights on a certain context for the user. A Passive Integration does not require an active software component in order to integrate. The SDM Web UI passes the required parameters to the target system so that it opens in the right context.
Producer¶
The implementation component which produces a data flow into the streaming application.
Resource¶
Every core entity is a resource. This includes Data Product and Data Port, Data Subscription and Data Subscription State, and Transport. Resources can be grouped together (distinct) by a Resource Group.
Resource Group¶
Every resource is assigned to a Resource Group. A Resource Group is distinct and can be nested. Resource Groups are a place to group resources together for defining defaults, permissions and accounting.
Resource ID¶
Every Resource has a unique identifier within an SDM Cloud deployment. It's usually a UUID, but can be also an email address.
Resource Match¶
A match against Resources.
Resource Path¶
A Resource Path is a tree (folder structure, e.g. /org1/finance/). Every Resource belongs to one Resource Path.
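Because Resource Paths form a tree, a common question is whether one path lies below another. The following sketch answers it by comparing path segments rather than raw string prefixes; `is_within` is a hypothetical helper and the glossary does not state how SDM evaluates paths internally.

```python
from pathlib import PurePosixPath


def is_within(resource_path, ancestor):
    """True if resource_path equals ancestor or lies below it in the tree."""
    child = PurePosixPath(resource_path).parts
    parent = PurePosixPath(ancestor).parts
    return child[:len(parent)] == parent
```

Comparing whole segments avoids treating /org10/ as part of /org1/, which a plain string prefix check would do.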
Resource Type¶
Resource type is the name of the resource kind, like DataPort for Data Port. Note that for Data Products, the resource type is DataOffer.
SDM¶
An ecosystem of tools and services around streaming applications. SDM simplifies the interaction with complex streaming systems with its tools.
SDM Agent¶
Synonyms: State agent
The SDM Agent represents several services which are running next to a Transport and other integrated services (like IAM, Self-Service, ERP, ...) to act and react towards all integration points through the SDM Cloud API.
SDM Cloud¶
Synonyms: SaaS SDM
The SDM Cloud is the heart of managing Actual States and Target States, expectations and promises. Therefore it provides APIs for integrations through SDM Agents and a Web User Interface for managing Data Products and other entities.
SDM Cloud API¶
The main API to work with from the perspective of an SDM Agent or a Hook. The SDM Cloud API provides a gRPC and a GraphQL API.
Self-Service¶
Self-Service allows users to manage Data Subscriptions on their own behalf. With Self-Service we support fast innovation, since no complex, long-running processes are needed to access data.
Stream Inspection¶
Inspect the data flowing through a stream. This is useful to understand the data with samples. The Data Profiler uses stream inspection in order to analyze data samples. Stream inspection needs full access to the stream and for this reason is not applicable to all streams.
Streaming API¶
An API to consume or produce data in a streaming (continuous) manner.
Streaming System¶
System which provides streaming data and supports application and data operation with streams.
Tags¶
Tags are buckets represented by short, descriptive, informational texts. There are user defined tags, which can be set on any Data Product and Data Subscription. In addition to those, there are system defined tags. no-data and no-valid-data are currently the only two system defined tags.
Target State¶
Synonyms: Planned State, Plan
The Target State data collection defines expectations for several domain items the user wants to observe. With these expectations defined, the system can hand out difference information between the Actual State and the Target State in order to trigger actions towards the system (Self-Service) or notifications for operational staff. Different implementations of Integration Agents can process the differences individually and act on behalf of the user's desire.
Team¶
A team represents a group of people who share the Ownership of entities.
Topology¶
A Topology is defined through Nodes and Edges (Graph) with additional information. Nodes are e.g. Data Source Node, Data Sink Node or Container Node.
A Topology can serve multiple business needs, like:
- Lineage for Governance/Audit purposes (e.g. detect P2 issues or classification mismatches)
- Notification/Collaboration of upstream changes to downstream applications owners
- Cost calculation and risk assessment down- and upstream.
- Bird's-eye view: clustering of Nodes or Links to form hierarchical views or information layers
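Use cases such as notifying downstream owners or assessing downstream risk come down to following the directed edges of the graph. A minimal sketch, assuming that a Topology is given as a list of `(source, sink)` pairs with node names as plain strings; `downstream` is a hypothetical helper, not an SDM function.

```python
from collections import deque


def downstream(edges, start):
    """Return all nodes reachable from start by following directed edges."""
    adjacency = {}
    for source, sink in edges:
        adjacency.setdefault(source, []).append(sink)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:  # also guards against cycles in the graph
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Reversing each pair before calling the function yields the upstream view with the same traversal.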
Tracing¶
Analyzing the data or process flow. Nowadays used also in distributed or event driven systems, especially to prove a functionality or to debug a data flow.
Transport¶
A framework or technology for transporting data in a streaming or event driven manner. The best-known ones are Kafka and RabbitMQ.
Web User Interface¶
The web user interface provides a human readable, intuitive and nicely presented access point for users.