AI Lifecycle
We take a holistic approach to AI development, ensuring a lifecycle that helps us align each process and outcome with our AI principles. We govern our AI lifecycle explicitly through all its stages: inception, development, verification and deployment. New AI models go through this process, and improvements to existing features get scoped and prioritised based on their performance measured during deployment. TechWolf does not apply continuous online learning, which means that any change to the AI models runs through the process as pictured below.
Inception
TechWolf defines the scope and intended use of an AI model during the inception and design phase. This process ensures that both internal and external requirements are explicitly listed and that potential risks are identified. TechWolf specifies the requirements for the system design at inception to ensure the right level of transparency. Success criteria and testing requirements are also defined during this phase.
Development
We place strong emphasis on data quality, representativeness and validity in the development of AI models. We adhere to standard data-cleaning practices and log all data artefacts for each model version. Data annotation is performed in-house, using carefully designed tooling and guidelines to ensure high quality.
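As a minimal sketch of what this kind of artefact logging can look like (the file names, registry format and model version below are hypothetical, not TechWolf's actual tooling), a dataset snapshot can be fingerprinted and recorded alongside the model version it feeds:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def log_data_artefact(dataset_path: str, model_version: str,
                      registry_path: str = "artefact_log.jsonl") -> dict:
    """Fingerprint a dataset file and append it to a simple artefact registry.

    Illustrative sketch only: real pipelines typically rely on a dedicated
    experiment-tracking or data-versioning tool rather than a JSONL file.
    """
    data = Path(dataset_path).read_bytes()
    entry = {
        "model_version": model_version,
        "dataset": dataset_path,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(registry_path, "a", encoding="utf-8") as registry:
        registry.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Hypothetical example: write a tiny sample dataset, then log it for model v2.4.1.
    Path("training_skills_sample.csv").write_text("text,skill\nbuilt REST APIs,API design\n")
    record = log_data_artefact("training_skills_sample.csv", model_version="2.4.1")
    print(record["sha256"])
```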
Verification
Before models are released, they are tested in several ways during the verification phase. This step includes technical testing (unit, integration and end-to-end tests) as well as evaluation against different datasets. This way, we can measure the overall accuracy (precision, recall and level of detail) of the model, probe for harmful bias, and assess the performance of the model in specific test cases.
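A minimal sketch of what such an evaluation can look like for skill extraction, assuming gold-standard annotations are available (the data and function names below are illustrative, not TechWolf's internal test suite):

```python
from typing import Dict, List, Set, Tuple


def precision_recall(predicted: Set[str], expected: Set[str]) -> Tuple[float, float]:
    """Compute precision and recall for one document's predicted skill set."""
    if not predicted:
        precision = 1.0 if not expected else 0.0
    else:
        precision = len(predicted & expected) / len(predicted)
    recall = 1.0 if not expected else len(predicted & expected) / len(expected)
    return precision, recall


def evaluate(predictions: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> Dict[str, float]:
    """Average per-document precision and recall over an evaluation set."""
    scores: List[Tuple[float, float]] = [
        precision_recall(predictions.get(doc_id, set()), expected)
        for doc_id, expected in gold.items()
    ]
    precisions, recalls = zip(*scores)
    return {
        "precision": sum(precisions) / len(precisions),
        "recall": sum(recalls) / len(recalls),
    }


if __name__ == "__main__":
    # Illustrative evaluation set: gold skills per document vs. model predictions.
    gold = {"doc-1": {"python", "sql"}, "doc-2": {"project management"}}
    predictions = {"doc-1": {"python", "sql", "excel"}, "doc-2": {"project management"}}
    print(evaluate(predictions, gold))  # {'precision': 0.83..., 'recall': 1.0}
```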
TechWolf considers two types of updates:
- Minor updates - these are limited changes, such as adding a handful of skills to the model or tweaking local properties, which keep the rest of the model stable. As the difference between versions is only noticeable in the targeted areas of improvement, these updates are verified by the TechWolf team using automated tests, evaluation datasets and manual verification.
- Major updates - these are larger upgrades that can cause substantial shifts in outputs and results, for example an entirely new model architecture or algorithm. These bigger updates are grouped into milestone releases, which are communicated to the customer in advance. The customer gets the chance to test changes in development and acceptance environments before being guided through the actual deployment.
By distinguishing between these two types of updates, we make sure that we can innovate continuously while keeping disruptive changes to a controlled minimum.
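To illustrate how such a distinction could be operationalised, the sketch below compares the outputs of two model versions on a reference set and flags the release as major when the share of changed outputs exceeds a threshold. The threshold value and function names are assumptions for illustration, not TechWolf's actual release criteria.

```python
from typing import Dict, Set


def classify_update(
    old_outputs: Dict[str, Set[str]],
    new_outputs: Dict[str, Set[str]],
    major_threshold: float = 0.10,
) -> str:
    """Label a model update 'minor' or 'major' by the fraction of documents
    whose extracted skill sets changed between versions.

    Illustrative only: the 10% threshold is an arbitrary assumption.
    """
    doc_ids = set(old_outputs) | set(new_outputs)
    changed = sum(
        1 for doc_id in doc_ids
        if old_outputs.get(doc_id, set()) != new_outputs.get(doc_id, set())
    )
    change_rate = changed / len(doc_ids) if doc_ids else 0.0
    return "major" if change_rate > major_threshold else "minor"


if __name__ == "__main__":
    old = {"doc-1": {"python"}, "doc-2": {"sql"}, "doc-3": {"excel"}}
    new = {"doc-1": {"python"}, "doc-2": {"sql", "data modelling"}, "doc-3": {"excel"}}
    print(classify_update(old, new))  # one of three documents changed -> 'major'
```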
Deployment
The skill timeline allows customers to configure the weight of each data type in their deployment, maximising the accuracy of skill inference. In addition, the accuracy of the full system is monitored continuously. Human feedback automatically feeds into our KPIs on skill profile completeness and accuracy, and negative feedback on skills is traced back to the types of data on which they were identified. This is aggregated into an overview of live model performance, which signals the risk of model drift. The fairness of the system can be monitored live using our bias testing toolbox.
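As a rough sketch of how per-data-type weights could feed into skill inference (the weights, data types and scoring rule below are hypothetical, not the actual skill timeline implementation), evidence for a skill can be combined as a weighted sum over the data types in which it was observed:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical customer-configured weights per data type (not actual product defaults).
DATA_TYPE_WEIGHTS: Dict[str, float] = {
    "job_description": 1.0,
    "project": 0.8,
    "course": 0.5,
}


def score_skills(evidence: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Combine skill evidence (skill, data_type, confidence) into weighted scores.

    Each observation contributes its model confidence multiplied by the
    configured weight of the data type it came from.
    """
    scores: Dict[str, float] = defaultdict(float)
    for skill, data_type, confidence in evidence:
        scores[skill] += DATA_TYPE_WEIGHTS.get(data_type, 0.0) * confidence
    return dict(scores)


if __name__ == "__main__":
    observations = [
        ("python", "project", 0.9),
        ("python", "course", 0.7),
        ("stakeholder management", "job_description", 0.6),
    ]
    print(score_skills(observations))
    # e.g. {'python': 1.07, 'stakeholder management': 0.6}
```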
Evaluation
TechWolf’s AI team internally maintains an overarching AI health map, which provides a detailed view of key aspects including the quality, fairness, explainability and maintainability of more than 30 AI components in the product. This framework is informed by the measures taken in the verification step and by the continuous monitoring in deployment. Our infrastructure gives us insight into the accuracy of outputs as well as into sudden or gradual changes in AI behaviour, ensuring that action can be taken when needed.
The AI health map is used to identify risks and opportunities, which drive the priorities for investment of our AI resources.
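A simplified sketch of how such a health map could be represented and queried for risks (the component names, dimensions and the 0.7 threshold are illustrative assumptions, not TechWolf's internal framework):

```python
from dataclasses import dataclass
from typing import Dict, List

DIMENSIONS = ("quality", "fairness", "explainability", "maintainability")


@dataclass
class ComponentHealth:
    """Health scores (0-1) for a single AI component along the tracked dimensions."""
    name: str
    scores: Dict[str, float]

    def risks(self, threshold: float = 0.7) -> List[str]:
        """Return the dimensions scoring below the (illustrative) risk threshold."""
        return [dim for dim in DIMENSIONS if self.scores.get(dim, 0.0) < threshold]


def health_map_report(components: List[ComponentHealth]) -> Dict[str, List[str]]:
    """Map each component to its at-risk dimensions, omitting healthy components."""
    return {c.name: c.risks() for c in components if c.risks()}


if __name__ == "__main__":
    components = [
        ComponentHealth("skill-extraction", {"quality": 0.92, "fairness": 0.88,
                                             "explainability": 0.81, "maintainability": 0.76}),
        ComponentHealth("profile-aggregation", {"quality": 0.84, "fairness": 0.65,
                                                "explainability": 0.72, "maintainability": 0.9}),
    ]
    print(health_map_report(components))  # {'profile-aggregation': ['fairness']}
```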
Customer data
TechWolf never trains on customer data without explicit permission. Customers can contribute data to improve and deepen the model’s understanding of their business area. Each customer environment is monitored for anomalies and performance gaps, which, when detected, can trigger a deeper investigation into improving aspects of our product. In such cases, customers may be asked to voluntarily provide data to help drive improvements.