Data Handling

Any data-driven approach depends on clarity about the data and how it is handled. This page answers several data-related questions.

Supported Data Types and Sources

The Skill Engine can process both structured and unstructured data, which makes it compatible with a wide range of data sources. The table below lists possible data source types, with some examples, and the data that can typically be used from each. The list is non-exhaustive, so get in touch if you have questions about other options!

| Source | Example | Data Types |
| --- | --- | --- |
| HR Information System | SAP SuccessFactors, Oracle HCM Cloud, Workday… | Employee resume, Employee working history, Employee education, Employee location, Team & division information |
| Learning Management System or Learning Experience Platform | Degreed, Docebo, Cornerstone… | Available courses, Employee course history, Employee learning goals |
| Knowledge Base or Project Management Tool | Confluence, Jira, Microsoft Teams… | Issue descriptions, Articles written by employees |
| Communication Tool | Yammer, email… | Messages exchanged |

The accepted format is detailed in the API Specification. Typically, a minimal connector layer (as described on the architecture page) bridges the gap between your systems and the Skill Engine API.
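As a rough illustration of what such a connector layer might do, the sketch below renames fields from a source HR record and drops everything that is not explicitly mapped. The source column names, the target field names, and the `to_payload` helper are all hypothetical; the actual payload format is the one defined in the API Specification.

```python
import json

# Hypothetical mapping from source-system column names to target field names.
# The real field names are defined by the API Specification, not by this sketch.
FIELD_MAP = {
    "EmployeeNumber": "id",
    "CVText": "resume",
    "OfficeCity": "location",
}


def to_payload(record: dict) -> dict:
    """Rename mapped fields and drop everything that is not explicitly mapped."""
    return {
        target: record[source]
        for source, target in FIELD_MAP.items()
        if source in record
    }


# Made-up HR record; "Salary" is not mapped, so it never leaves the connector.
hr_record = {
    "EmployeeNumber": "1042",
    "CVText": "Data analyst, 5 years of SQL.",
    "OfficeCity": "Ghent",
    "Salary": 50000,
}

payload = to_payload(hr_record)
body = json.dumps(payload)  # JSON body a connector could send onwards
```

Using an explicit allow-list of fields keeps the connector small and ensures that only the data you intend to share reaches the API.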

In addition to this fixed format, the Skill Engine API also allows you to leverage your metadata through the custom properties system. This way, you can filter and analyze Employees, Vacancies, and Courses based on any variable you want. For example, you can use this system to compare the skills of entire teams, or to see the distribution of a specific Skill Cluster over years of service or age. Metadata is allowed as a string (for categorical types) or as a number (for continuous types and quantities).
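The idea behind custom properties can be sketched in a few lines of Python. This is an illustration only: the dictionary layout, the property names `team` and `years_of_service`, and both helper functions are assumptions made for this example and are not part of the Skill Engine API.

```python
from collections import defaultdict


def validate_properties(props: dict) -> dict:
    """Accept only string (categorical) or numeric (continuous) values."""
    for name, value in props.items():
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise TypeError(f"custom property {name!r} must be a string or a number")
    return props


def group_skills_by(employees: list, prop: str) -> dict:
    """Collect the set of skills for each value of a categorical property."""
    groups = defaultdict(set)
    for employee in employees:
        key = employee["custom_properties"].get(prop)
        if key is not None:
            groups[key].update(employee["skills"])
    return dict(groups)


# Made-up sample data for illustration.
employees = [
    {"id": "e1", "skills": ["python", "sql"],
     "custom_properties": validate_properties({"team": "data", "years_of_service": 4})},
    {"id": "e2", "skills": ["sql", "tableau"],
     "custom_properties": validate_properties({"team": "data", "years_of_service": 9})},
    {"id": "e3", "skills": ["negotiation"],
     "custom_properties": validate_properties({"team": "sales", "years_of_service": 2})},
]

# Categorical property: compare the skills of entire teams.
skills_per_team = group_skills_by(employees, "team")

# Numeric property: filter on a continuous quantity.
senior = [e["id"] for e in employees if e["custom_properties"]["years_of_service"] >= 5]
```

String values lend themselves to grouping, as with `team`, while numeric values support range filters and distributions, as with `years_of_service`.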

Data Processing

The Skill Engine handles documents through a process called skill extraction (visualized below). In this process, the Skill Engine reads through both structured and unstructured data, interprets the skills connected to it, and aggregates them into structured skill profiles. If skills are the atoms, you could see skill profiles as the molecules, with the bonds between the atoms representing the interaction between skills. Skill extraction is carried out by artificial intelligence that relies on state-of-the-art language models. While accurately representing the skillset of people, jobs, and courses, skill profiles are anonymous.

[Figure: schematic representation of skill extraction]

Stored Data

The Skill Engine stores your data in a logically separated data store.

Data retention follows one key principle: only the information that is needed to provide results downstream is retained. For example, resume files, which typically contain a wide range of sensitive personal information, are dropped immediately after their skill profile has been extracted, leaving only a pseudonymized profile. The same holds for any other unstructured data (e.g. documents), which is never stored inside the Skill Engine.
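The retention principle can be illustrated with a minimal sketch: the raw resume text is used to build a profile and is never written to the store. Everything here is an assumption made for illustration, including the keyed-hash pseudonym scheme, the keyword-matching `extract_skills` stand-in (the Skill Engine itself uses language models), and the in-memory `store`; none of it describes the actual implementation.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillProfile:
    pseudonym: str  # identifier that does not reveal the original employee ID
    skills: tuple


def pseudonymize(employee_id: str, key: bytes) -> str:
    """Derive a pseudonym with a keyed SHA-256 hash (one possible scheme)."""
    digest = hmac.new(key, employee_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def extract_skills(text: str, vocabulary: set) -> tuple:
    """Toy stand-in for skill extraction: plain keyword matching."""
    words = {word.strip(".,;:").lower() for word in text.split()}
    return tuple(sorted(words & vocabulary))


def ingest_resume(employee_id: str, resume_text: str, store: dict,
                  key: bytes, vocabulary: set) -> SkillProfile:
    """Build a profile from the resume and persist only the profile."""
    profile = SkillProfile(
        pseudonym=pseudonymize(employee_id, key),
        skills=extract_skills(resume_text, vocabulary),
    )
    store[profile.pseudonym] = profile  # the resume text is not stored
    return profile


store = {}
profile = ingest_resume(
    "emp-1", "Experienced in Python, SQL and sailing.",
    store, key=b"example-key", vocabulary={"python", "sql"},
)
```

After `ingest_resume` returns, the store holds a single `SkillProfile` keyed by its pseudonym, and neither the resume text nor the original employee ID appears in it.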

The metadata of an entity is retained until either the entity or the custom property containing the metadata is removed, since the metadata must be available for flexible queries.