CogStack Services Tutorial
This is a step-by-step walkthrough that shows how to call two CogStack services over HTTP: MedCAT (entity extraction) and AnonCAT (de-identification).
Overview
Who it is for:
This is for developers, data engineers, and analysts who want a quick, practical example of how to integrate MedCAT/AnonCAT into a Python workflow (and later into a notebook-based analysis).
What it will do:
- Define a sample clinical sentence and the service URLs.
- Extract entities by calling the medcat-service API.
- Print the extracted entity annotations from the MedCAT response.
- De-identify text by calling the anoncat-service API.
- Print the de-identified text (and show the full JSON response for inspection).
Prerequisites
The best way to run this notebook interactively is to run the CogStack Community Edition with Helm. See https://docs.cogstack.org/ to get started.
Initialisation: Define the inputs and services
We pick a single sample sentence and the two HTTP endpoints we will call.
The sample sentence contains concepts that the demo model packs used by the MedCAT service have been trained to recognise.
If you are using the CogStack Community Edition Helm chart, the service URLs should be set up for you automatically through Kubernetes services and the MEDCAT_SERVICE_URL and ANONCAT_SERVICE_URL environment variables. Otherwise, set those environment variables or change the default URLs accordingly.
import json
import os
import requests
sample_text = "John was diagnosed with Kidney Failure"
medcat_base_url = os.getenv(
    "MEDCAT_SERVICE_URL", "http://cogstack-medcat-service:5000"
).rstrip("/")
anoncat_base_url = os.getenv(
    "ANONCAT_SERVICE_URL", "http://cogstack-ce-anoncat-service:5000"
).rstrip("/")
medcat_url = medcat_base_url + "/api/process"
anoncat_url = anoncat_base_url + "/api/process"
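The URL handling above can be factored into a small helper. This makes the environment-variable override easier to reuse and test. The following is a minimal sketch; `service_endpoint` is a hypothetical name, not part of any CogStack library.

```python
import os


def service_endpoint(env_var: str, default_base: str, path: str = "/api/process") -> str:
    """Read a base URL from env_var (falling back to default_base) and append path.

    Hypothetical helper mirroring the os.getenv(...).rstrip("/") pattern above.
    """
    base = os.getenv(env_var, default_base).rstrip("/")
    return base + path


# Same defaults as the notebook; environment variables take precedence when set.
medcat_url = service_endpoint("MEDCAT_SERVICE_URL", "http://cogstack-medcat-service:5000")
anoncat_url = service_endpoint("ANONCAT_SERVICE_URL", "http://cogstack-ce-anoncat-service:5000")
print(medcat_url)
print(anoncat_url)
```

Stripping the trailing slash before appending the path avoids URLs such as `http://host:5000//api/process` when the variable is set with a trailing slash.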
Perform Named Entity Recognition and Linking by calling the MedCAT service
We can now use the MedCAT service to extract entities from our note.
We send sample_text to MedCAT’s /api/process route, with the payload shaped as {"content": {"text": sample_text}}.
We then parse the JSON response and pull out the annotations with medcat_result.get("result").get("annotations"). As the output below shows, the annotations come back as a list of objects keyed by entity index (here a single entity, "0").
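medcat_payload = {"content": {"text": sample_text}}
medcat_response = requests.post(medcat_url, json=medcat_payload, timeout=30)
medcat_response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error page
medcat_result = medcat_response.json()
medcat_annotations = medcat_result.get("result").get("annotations")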
print("=== MedCAT: entities ===")
print(json.dumps(medcat_annotations, indent=2))
=== MedCAT: entities ===
[
{
"0": {
"pretty_name": "Kidney Failure",
"cui": "1",
"type_ids": [
"T047"
],
"source_value": "Kidney Failure",
"detected_name": "kidney~failure",
"acc": 1,
"context_similarity": 1,
"start": 24,
"end": 38,
"id": 0,
"meta_anns": {},
"context_left": [],
"context_center": [],
"context_right": []
}
}
]
The output shows that the service detected "Kidney Failure" at characters 24 to 38 of the text and linked it to the concept with CUI "1".
To inspect the full JSON response from MedCAT, including the service metadata under medcat_info, print the whole result:
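The nested annotation shape is easier to work with as a flat list. The character offsets can also be checked against the input text. The following is a minimal sketch that assumes the `[{"0": {...}}]` shape shown in the output above. `flatten_annotations` and `check_offsets` are hypothetical helpers, and the data is a trimmed copy of the printed output so the cell runs without a live service.

```python
from typing import Any


def flatten_annotations(annotations: list[dict[str, dict[str, Any]]]) -> list[dict[str, Any]]:
    """Turn the [{"0": {...}}, ...] shape into a flat list of entity dicts."""
    return [entity for item in annotations for entity in item.values()]


def check_offsets(text: str, entities: list[dict[str, Any]]) -> None:
    """Print each entity and confirm that text[start:end] equals its source_value."""
    for ent in entities:
        span = text[ent["start"]:ent["end"]]
        if span != ent["source_value"]:
            raise ValueError(f"offset mismatch: {span!r} != {ent['source_value']!r}")
        print(f"{ent['pretty_name']!r} cui={ent['cui']} chars {ent['start']}-{ent['end']}")


# Trimmed copy of the MedCAT annotations printed above (assumed shape).
example_text = "John was diagnosed with Kidney Failure"
example_annotations = [
    {"0": {"pretty_name": "Kidney Failure", "cui": "1",
           "source_value": "Kidney Failure", "start": 24, "end": 38}}
]
check_offsets(example_text, flatten_annotations(example_annotations))
# prints: 'Kidney Failure' cui=1 chars 24-38
```

The `start`/`end` values behave as Python slice bounds, so `example_text[24:38]` yields `"Kidney Failure"`.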
print("=== MedCAT Service: Raw results ===")
print(json.dumps(medcat_result, indent=2))
=== MedCAT Service: Raw results ===
{
"medcat_info": {
"service_app_name": "MedCAT",
"service_language": "en",
"service_version": "2.4.0.dev0",
"service_model": "unknown",
"model_card_info": {
"ontologies": "None",
"meta_cat_model_names": [],
"rel_cat_model_names": [],
"model_last_modified_on": "2025-07-14T12:36:10.286051"
}
},
"result": {
"text": "John was diagnosed with Kidney Failure",
"annotations": [
{
"0": {
"pretty_name": "Kidney Failure",
"cui": "1",
"type_ids": [
"T047"
],
"source_value": "Kidney Failure",
"detected_name": "kidney~failure",
"acc": 1,
"context_similarity": 1,
"start": 24,
"end": 38,
"id": 0,
"meta_anns": {},
"context_left": [],
"context_center": [],
"context_right": []
}
}
],
"success": true,
"timestamp": "2026-03-18T16:08:44.595+00:00",
"elapsed_time": 0.003085773,
"footer": null
}
}
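Before using the annotations in a pipeline, it can help to check the `success` flag and record which service version produced them. The following is a minimal sketch that assumes the response layout shown above. `summarise_response` is a hypothetical helper, and `example_response` is a trimmed copy of the printed output.

```python
from typing import Any


def summarise_response(response: dict[str, Any]) -> dict[str, Any]:
    """Pull a few fields out of a medcat-service style JSON response.

    Raises RuntimeError if the result does not report success == true.
    """
    result = response.get("result") or {}
    if not result.get("success", False):
        raise RuntimeError(f"service reported failure: {result!r}")
    info = response.get("medcat_info") or {}
    return {
        "app": info.get("service_app_name"),
        "version": info.get("service_version"),
        # Each list item is a dict keyed by entity index, so count its keys.
        "n_annotations": sum(len(item) for item in result.get("annotations", [])),
        "elapsed_time": result.get("elapsed_time"),
    }


# Trimmed copy of the raw MedCAT response printed above.
example_response = {
    "medcat_info": {"service_app_name": "MedCAT", "service_version": "2.4.0.dev0"},
    "result": {
        "text": "John was diagnosed with Kidney Failure",
        "annotations": [{"0": {"cui": "1"}}],
        "success": True,
        "elapsed_time": 0.003085773,
    },
}
print(summarise_response(example_response))
# prints: {'app': 'MedCAT', 'version': '2.4.0.dev0', 'n_annotations': 1, 'elapsed_time': 0.003085773}
```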
Perform de-identification by using the AnonCAT service
We can also use the AnonCAT service to de-identify our notes.
The process is the same as for MedCAT; the only difference is that we call a different endpoint. We send sample_text to AnonCAT’s /api/process route with the same payload shape: {"content": {"text": sample_text}}.
We then parse the JSON response and pull out result.text, which should contain the anonymised text.
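anoncat_payload = {"content": {"text": sample_text}}
anoncat_response = requests.post(anoncat_url, json=anoncat_payload, timeout=30)
anoncat_response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error page
anoncat_result = anoncat_response.json()
deidentified_text = anoncat_result.get("result").get("text")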
print("=== AnonCAT: Deidentification result ===")
print(f"The input was '{sample_text}'. The output was '{deidentified_text}'")
=== AnonCAT: Deidentification result ===
The input was 'John was diagnosed with Kidney Failure'. The output was '[PATIENT] diagnosed with Kidney Failure'
The result shows that the service detected the patient name and replaced it with the placeholder [PATIENT], anonymising the note. The raw response below shows that the detected span was "John was" (characters 0 to 8), so the word "was" was replaced along with the name. Alternatively, the service can be configured to redact detected spans as [***]; this is done by changing the service values and redeploying.
We can print the raw JSON response from AnonCAT to inspect it. It has the same format as the MedCAT response, but instead of medical concepts, the annotations contain the detected patient name.
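To make the placeholder behaviour concrete, the following sketch rebuilds the de-identified text on the client side from span offsets. This is for illustration only: the AnonCAT service already returns the replaced text. `apply_placeholders` is a hypothetical helper. Using the entity's `cui` as the placeholder label is an assumption, inferred from the output here, where both the CUI and the placeholder are `PATIENT`. The sketch also assumes the spans do not overlap.

```python
from typing import Any


def apply_placeholders(text: str, entities: list[dict[str, Any]], redact: bool = False) -> str:
    """Replace each [start, end) span with "[CUI]", or "[***]" when redact=True.

    Assumes non-overlapping spans; they are replaced right-to-left so that
    earlier offsets remain valid after each substitution.
    """
    out = text
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = "***" if redact else ent["cui"]
        out = out[: ent["start"]] + f"[{label}]" + out[ent["end"]:]
    return out


example_text = "John was diagnosed with Kidney Failure"
phi_entities = [{"cui": "PATIENT", "start": 0, "end": 8}]  # span "John was", as in the response
print(apply_placeholders(example_text, phi_entities))               # [PATIENT] diagnosed with Kidney Failure
print(apply_placeholders(example_text, phi_entities, redact=True))  # [***] diagnosed with Kidney Failure
```

Because the detected span covers characters 0 to 8, the word "was" disappears along with the name, matching the service output above.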
print("=== AnonCAT: Raw results ===")
print(json.dumps(anoncat_result, indent=2))
=== AnonCAT: Raw results ===
{
"medcat_info": {
"service_app_name": "MedCAT",
"service_language": "en",
"service_version": "2.4.0.dev0",
"service_model": "unknown",
"model_card_info": {
"ontologies": [],
"meta_cat_model_names": [],
"rel_cat_model_names": [],
"model_last_modified_on": "2025-08-15T15:14:34.047031"
}
},
"result": {
"text": "[PATIENT] diagnosed with Kidney Failure",
"annotations": [
{
"0": {
"pretty_name": "PATIENT",
"cui": "PATIENT",
"type_ids": [],
"source_value": "John was",
"detected_name": "",
"acc": 0.9922866225242615,
"context_similarity": 0.9922866225242615,
"start": 0,
"end": 8,
"id": 0,
"meta_anns": {},
"context_left": [],
"context_center": [],
"context_right": []
}
}
],
"success": true,
"timestamp": "2026-03-18T16:09:41.266+00:00",
"elapsed_time": 0.011446122,
"footer": null
}
}
Summary
This is the end of this tutorial.
By calling the model services, we extracted entities and de-identified text using just two HTTP APIs.
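Both services take the same request body, so the two calls can share one function. The following is a minimal sketch; `process_text` and `HttpPoster` are hypothetical names. The client is passed in, so any object with a requests-style `post(url, json=..., timeout=...)` method works. For example, you could pass `requests.Session()` in the notebook or a stub in tests.

```python
from typing import Any, Protocol


class HttpPoster(Protocol):
    """Anything with a requests-style post(url, json=..., timeout=...) method."""

    def post(self, url: str, *, json: Any, timeout: float) -> Any: ...


def process_text(client: HttpPoster, url: str, text: str, timeout: float = 30.0) -> dict[str, Any]:
    """POST text to an /api/process endpoint and return the parsed JSON."""
    response = client.post(url, json={"content": {"text": text}}, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors before parsing
    return response.json()


# Usage in the notebook (requires the services to be reachable):
#   with requests.Session() as session:
#       medcat_result = process_text(session, medcat_url, sample_text)
#       anoncat_result = process_text(session, anoncat_url, sample_text)
```

Reusing one session across both calls lets `requests` keep connections alive. Injecting the client keeps the helper testable without a running cluster.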
What next?
There are two options for where to go next:
- Set up a data pipeline that calls these services and writes the results into OpenSearch.
- Use MedCAT Trainer to set up an MLOps flow for training a model and redeploying the services with the new model.
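For the first option, one step in such a pipeline is turning annotations into a request body for the OpenSearch `_bulk` API. That body is newline-delimited JSON: an action line followed by a document line, with a trailing newline. The following is a minimal sketch only. The index name `medcat-annotations`, the `_id` scheme, and the document field names are hypothetical choices. Sending the body (a `POST` to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json
from typing import Any


def to_bulk_ndjson(index: str, doc_id: str, text: str, entities: list[dict[str, Any]]) -> str:
    """Build an OpenSearch _bulk body with one indexed document per entity.

    Field names and the "<doc_id>-<n>" _id scheme are hypothetical.
    """
    lines = []
    for i, ent in enumerate(entities):
        lines.append(json.dumps({"index": {"_index": index, "_id": f"{doc_id}-{i}"}}))
        lines.append(json.dumps({
            "doc_id": doc_id,
            "cui": ent["cui"],
            "pretty_name": ent["pretty_name"],
            "start": ent["start"],
            "end": ent["end"],
            "source_value": text[ent["start"]:ent["end"]],
        }))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


body = to_bulk_ndjson(
    "medcat-annotations", "note-1", "John was diagnosed with Kidney Failure",
    [{"cui": "1", "pretty_name": "Kidney Failure", "start": 24, "end": 38}],
)
print(body, end="")
```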