
Data Handling

Any data-driven approach depends on clarity about the data and how it is handled. This page answers several data-related questions.

Supported Data Types and Sources

The Skill Engine can process both structured and unstructured data, which makes it compatible with a wide range of data sources. The table below lists possible data source types, with some examples, and the data that can typically be used from each. The list is non-exhaustive, so get in touch if you have questions about other options!

| Source | Examples | Data Types |
| --- | --- | --- |
| HR Information System | SAP SuccessFactors, Oracle HCM Cloud, Workday… | Employee resume, Employee working history, Employee education, Employee location, Team & division information |
| Learning Management System or Learning Experience Platform | Degreed, Docebo, Cornerstone… | Available courses, Employee course history, Employee learning goals |
| Knowledge Base or Project Management Tool | Confluence, Jira, Microsoft Teams… | Issue descriptions, Articles written by employees |
| Communication Tool | Yammer, email… | Messages exchanged |

The accepted formats are detailed in the API Specification. Typically, a minimal connector layer (as described on the architecture page) bridges the gap between your systems and the Skill Engine API.

In addition to this data, the Skill Engine API lets you leverage your metadata through the custom properties system, so you can filter and analyse Employees, Vacancies and Courses based on any variable you want. For example, you can use this system to compare the skills of different teams, or to see how a competency is distributed over years of service or age. Metadata is accepted as a string (for categorical types) or as a number (for continuous types and quantities).
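As a rough illustration of how custom properties might be used, the sketch below checks that a value is a string or a number and then groups skills by a categorical property. The `Employee` class, the property names (`team`, `years_of_service`) and both helper functions are hypothetical and are not taken from the Skill Engine API; only the string-or-number rule comes from the text above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Union

PropertyValue = Union[str, int, float]


def validate_property(value: object) -> PropertyValue:
    """Accept only strings (categorical) or numbers (continuous, quantities)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (str, int, float)):
        raise TypeError(f"unsupported custom property value: {value!r}")
    return value


@dataclass
class Employee:  # hypothetical stand-in, not the API's entity model
    employee_id: str
    skills: set
    custom_properties: dict = field(default_factory=dict)

    def set_property(self, name: str, value: object) -> None:
        self.custom_properties[name] = validate_property(value)


def skills_by_category(employees, property_name):
    """Collect the union of skills for each value of a string property."""
    grouped = defaultdict(set)
    for employee in employees:
        if property_name in employee.custom_properties:
            grouped[employee.custom_properties[property_name]] |= employee.skills
    return dict(grouped)
```

Calling `skills_by_category(employees, "team")` on such a list would return one skill set per team, which is the kind of team comparison mentioned above.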

Data Processing

The Skill Engine processes documents through skill extraction (visualised below). In this process, the Skill Engine reads through both structured and unstructured data, interprets the skills connected to it and aggregates them into structured skill profiles. If skills are the atoms, you could see skill profiles as the molecules, with the bonds between the atoms representing the interaction between skills. Skill extraction is carried out by artificial intelligence that relies on state-of-the-art language models. While accurately representing the skillset of people, jobs and courses, skill profiles are anonymous, with no personal information included in them.
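To make the atoms-and-molecules picture concrete, here is a minimal sketch of the aggregation step only. It assumes the extraction step has already produced a list of skills per document, since the language-model part is not reproduced here. Using co-occurrence counts as the "bond" strength is an illustrative assumption, not the Skill Engine's actual method, and `build_skill_profile` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def build_skill_profile(extracted_skills_per_document):
    """Aggregate per-document skill lists into one profile.

    Returns (skills, bonds): `skills` counts how many documents mention each
    skill, `bonds` counts how often two skills appear in the same document.
    """
    skills = Counter()
    bonds = Counter()
    for document_skills in extracted_skills_per_document:
        # De-duplicate and sort so each pair has one canonical ordering.
        unique = sorted(set(document_skills))
        skills.update(unique)
        bonds.update(combinations(unique, 2))
    return skills, bonds
```

The resulting profile holds only skill names and counts, so nothing from the source documents beyond the extracted skills is carried along.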

Stored Data

The Skill Engine API stores your data in a logically separated data store.

Data storage follows one key principle: only what is needed to provide results downstream is retained. For example, resume files, which typically contain a wide range of sensitive personal information, are dropped immediately after their skill profile has been extracted, leaving only this pseudonymised profile. The same holds for any other unstructured data (e.g. documents), which is never stored inside the system.
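The retention principle above can be sketched as an ingestion step that persists only the extracted profile. Everything in this example is hypothetical: `ProfileStore`, `SkillProfile` and `toy_extract` are illustrative names, and the keyword lookup merely stands in for the real extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillProfile:
    profile_id: str  # pseudonymous identifier, no name or contact data
    skills: tuple


def toy_extract(raw_document: bytes) -> list:
    """Placeholder for skill extraction: keyword lookup, NOT the real model."""
    known = ("python", "sql", "negotiation")
    text = raw_document.decode("utf-8").lower()
    return [skill for skill in known if skill in text]


class ProfileStore:
    """Hypothetical store that only ever holds extracted profiles."""

    def __init__(self):
        self._profiles = {}

    def ingest(self, profile_id, raw_document, extract):
        # The raw document is only read inside this call; nothing below
        # writes it to the store, so it is gone once the call returns.
        profile = SkillProfile(profile_id, tuple(extract(raw_document)))
        self._profiles[profile_id] = profile
        return profile

    def get(self, profile_id):
        return self._profiles[profile_id]
```

After `ingest` returns, the store contains the skill tuple and the pseudonymous identifier, but none of the text of the original file.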

Metadata of an entity is retained until either the entity or the custom property containing the metadata is removed, since the metadata needs to be available for flexible queries.
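The two removal paths described above can be modelled with a small in-memory store. This is a sketch under assumptions: `MetadataStore` and its method names are hypothetical and do not mirror the Skill Engine API.

```python
class MetadataStore:
    """Hypothetical in-memory model of custom property retention."""

    def __init__(self):
        self._properties = {}  # entity_id -> {property_name: value}

    def set(self, entity_id, property_name, value):
        self._properties.setdefault(entity_id, {})[property_name] = value

    def remove_entity(self, entity_id):
        # Removing the entity removes all of its metadata.
        self._properties.pop(entity_id, None)

    def remove_property(self, property_name):
        # Removing a custom property removes its values on every entity.
        for values in self._properties.values():
            values.pop(property_name, None)

    def find(self, property_name, value):
        """Return the sorted ids of entities whose property equals `value`."""
        return sorted(
            entity_id
            for entity_id, values in self._properties.items()
            if values.get(property_name) == value
        )
```

Until one of the two removal methods is called, `find` can still answer queries on the stored values, which is why the metadata has to be retained.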