MedCAT Service Helm Chart
A Helm chart to deploy CogStack medcat-service
Installation
```sh
helm install medcat-service-helm oci://registry-1.docker.io/cogstacksystems/medcat-service-helm
```
Usage
For local testing, you can port-forward to the service (a ClusterIP service by default) with this command:
```sh
kubectl port-forward svc/medcat-service-helm 5000:5000
```
Then open http://localhost:5000 to try the service, or http://localhost:5000/docs to view the REST API documentation.
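For access beyond local testing, the chart can also create an Ingress using the `ingress.*` values listed in the Values table. The sketch below assumes an NGINX ingress controller is installed; the class name `nginx` and the host `medcat.example.com` are hypothetical placeholders for your own cluster's values.

```yaml
ingress:
  enabled: true
  className: "nginx"              # hypothetical: use your cluster's IngressClass
  hosts:
    - host: medcat.example.com    # hypothetical hostname
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls: []                         # add TLS secrets here if you terminate HTTPS at the Ingress
```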
Configuration
To configure MedCAT Service, create a values.yaml file and pass it to helm install with the -f flag.
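As a sketch, a small values.yaml overriding a few keys from the Values table might look like this. The resource requests follow the chart's recommendation for a default production model (1 CPU, 4Gi memory); turning off the demo UI is only an illustrative choice.

```yaml
# values.yaml: Helm merges these overrides on top of the chart defaults
replicaCount: 1

env:
  # Illustrative: disable the demo UI (the chart default is true)
  APP_ENABLE_DEMO_UI: false

resources:
  requests:
    cpu: 1
    memory: 4Gi
```

Install with `helm install medcat-service-helm oci://registry-1.docker.io/cogstacksystems/medcat-service-helm -f values.yaml`. Because Helm deep-merges maps, the other `env` defaults (for example the model pack path) are kept.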
Model Pack
You should specify the model pack the service uses. By default it uses a small bundled model that is suitable for testing.
Default: Use the demo model pack
A model pack is already bundled into the medcat-service image and is the default in this chart.
It is intended for testing only and contains just a few concepts.
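The default corresponds to the `env.APP_MEDCAT_MODEL_PACK` value shown in the Values table, so pinning it explicitly in a values file would look like this:

```yaml
env:
  # Bundled demo model pack (chart default); suitable for testing only
  APP_MEDCAT_MODEL_PACK: "/cat/models/examples/example-medcat-v2-model-pack.zip"
```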
Recommended: Download Model on Startup
Enable MedCAT to download the model from a remote URL on container startup.
Create a values file like values-model-download.yaml and set these values:
```yaml
model:
  downloadUrl: "http://localhost:9000/models/my-model.zip"
```
Use this if you prefer to load models dynamically at runtime.
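Downloading a large model pack can take longer than the default startup probe allows. With the chart defaults (`initialDelaySeconds: 2`, `periodSeconds: 10`, `failureThreshold: 30`), Kubernetes gives the container roughly five minutes to become ready before restarting it. A sketch that roughly doubles that window, keeping the example download URL; the value 60 is an illustrative assumption, so size it to your model and network:

```yaml
model:
  downloadUrl: "http://localhost:9000/models/my-model.zip"

startupProbe:
  # 10 s between probes x 60 allowed failures: roughly 10 minutes to start
  periodSeconds: 10
  failureThreshold: 60
```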
Advanced: Create a custom volume and load a model into it
If you want to set up your own download flow, the service can load a model pack from a volume that you manage. For example, use an init container that downloads the model into a volume, then mount that volume into the service container yourself.
- Create a PersistentVolume and PersistentVolumeClaim in Kubernetes, following the official documentation. Alternatively, specify them in values.extraManifests and they will be created.
- Create a values file like the following, which mounts the volume and defines a custom init container.
```yaml
env:
  APP_MEDCAT_MODEL_PACK: "/my/models/custom-model.zip"

volumeMounts:
  - name: model-volume
    mountPath: /my/models

volumes:
  - name: model-volume
    persistentVolumeClaim:
      claimName: my-custom-pvc

extraInitContainers:
  - name: model-downloader
    image: busybox:1.28
    # In this command, you can write custom code required to download a file, for example to configure authentication.
    command: ["sh", "-c", "wget -O /my/models/custom-model.zip http://example.com"]
    volumeMounts:
      - name: model-volume
        mountPath: /my/models
```
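For the first step, a PersistentVolumeClaim can be supplied through `extraManifests`. This sketch assumes the chart renders each list entry as a separate Kubernetes manifest; the 5Gi size and `ReadWriteOnce` access mode are illustrative assumptions, and omitting `storageClassName` uses the cluster's default StorageClass if one is configured.

```yaml
extraManifests:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-custom-pvc    # must match volumes[].persistentVolumeClaim.claimName
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi       # illustrative; size it to fit your model pack
```

If you run several replicas spread across nodes, a `ReadWriteOnce` claim may not be mountable by all of them, and storage supporting `ReadWriteMany` may be needed instead.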
DeID Mode
The service can perform de-identification (DeID) of EHRs when configured with the following values:
```yaml
env:
  APP_MEDCAT_MODEL_PACK: "/cat/models/examples/example-deid-model-pack.zip"
  DEID_MODE: "true"
  DEID_REDACT: "true"
```
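Because DeID mode processes patient records, you may also want to block outbound traffic from the pod with the chart's `networkPolicy.egress.enabled` switch. This sketch assumes your cluster's network plugin enforces NetworkPolicy and that the model pack is bundled rather than downloaded at startup; a startup download would need a matching entry in `networkPolicy.egress.egressRules`.

```yaml
env:
  APP_MEDCAT_MODEL_PACK: "/cat/models/examples/example-deid-model-pack.zip"
  DEID_MODE: "true"
  DEID_REDACT: "true"

networkPolicy:
  enabled: true
  egress:
    # Block egress from the pod; list any required exceptions under egressRules
    enabled: true
    egressRules: []
```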
GPU Support
To run MedCAT Service with GPU acceleration, use the GPU-enabled image and set the pod runtime class accordingly.
Note: GPU acceleration is only used for de-identification (DeID mode).
Create a values file like values-gpu.yaml with the following content:
```yaml
image:
  repository: ghcr.io/cogstack/medcat-service-gpu

runtimeClassName: nvidia

resources:
  limits:
    nvidia.com/gpu: 1

env:
  APP_CUDA_DEVICE_COUNT: 1
  APP_TORCH_THREADS: -1
  DEID_MODE: true
```
To use GPU acceleration, your Kubernetes cluster should be configured with the NVIDIA GPU Operator, or with the following components:
- NVIDIA device plugin for Kubernetes
- NVIDIA GPU Feature Discovery
- The NVIDIA Container Toolkit
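On clusters that mix GPU and CPU-only nodes, or that taint their GPU nodes, you may also need to steer the pod onto a GPU node with the chart's `nodeSelector` and `tolerations` values. This sketch assumes GPU Feature Discovery has applied its `nvidia.com/gpu.present` label and that GPU nodes carry an `nvidia.com/gpu` taint with effect `NoSchedule`; check the actual labels and taints on your nodes before relying on it.

```yaml
nodeSelector:
  # Label applied by NVIDIA GPU Feature Discovery (assumption: verify on your nodes)
  nvidia.com/gpu.present: "true"

tolerations:
  # Tolerate a GPU taint if your node pool uses one (assumed key and effect)
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
```

You can keep this in its own file and combine it with values-gpu.yaml by passing several `-f` flags to `helm install`; when the same key appears in more than one file, the rightmost file wins.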
Test GPU support
You can verify that the MedCAT Service pod has access to the GPU by executing nvidia-smi inside the pod.
```sh
kubectl exec -it <POD_NAME> -- nvidia-smi
```
You should see the NVIDIA GPU device listing if the GPU is properly accessible.
Values
| Key | Type | Default | Description |
|---|---|---|---|
| affinity | object | `{}` |  |
| autoscaling.enabled | bool | `false` |  |
| autoscaling.maxReplicas | int | `100` |  |
| autoscaling.minReplicas | int | `1` |  |
| autoscaling.targetCPUUtilizationPercentage | int | `80` |  |
| env.APP_ENABLE_DEMO_UI | bool | `true` |  |
| env.APP_ENABLE_METRICS | bool | `true` | Observability Env Vars |
| env.APP_ENABLE_TRACING | bool | `false` |  |
| env.APP_MEDCAT_MODEL_PACK | string | `"/cat/models/examples/example-medcat-v2-model-pack.zip"` | Defines the model pack used by the medcat service. Example (download on startup): uncomment `ENABLE_MODEL_DOWNLOAD` and the `MODEL_*` URLs in the chart's values.yaml. Example (DeID mode): uncomment `DEID_MODE`/`DEID_REDACT` in values.yaml and use the DeID model pack referenced there. |
| env.OTEL_EXPERIMENTAL_RESOURCE_DETECTORS | string | `"containerid,os"` |  |
| env.OTEL_EXPORTER_OTLP_ENDPOINT | string | `"http://<unused>:4317"` |  |
| env.OTEL_EXPORTER_OTLP_PROTOCOL | string | `"grpc"` |  |
| env.OTEL_LOGS_EXPORTER | string | `"none"` |  |
| env.OTEL_METRICS_EXPORTER | string | `"none"` |  |
| env.OTEL_PYTHON_FASTAPI_EXCLUDED_URLS | string | `"/api/health,/metrics"` |  |
| env.OTEL_RESOURCE_ATTRIBUTES | string | `"k8s.pod.uid=$(K8S_POD_UID),k8s.pod.name=$(K8S_POD_NAME),k8s.namespace.name=$(K8S_POD_NAMESPACE),k8s.node.name=$(K8S_NODE_NAME)"` |  |
| env.OTEL_SERVICE_NAME | string | `"medcat-service"` |  |
| env.OTEL_TRACES_EXPORTER | string | `"otlp"` |  |
| env.SERVER_GUNICORN_MAX_REQUESTS | string | `"100000"` | Set to a high number instead of the default of 1000, and rely on Kubernetes to restart the pod when needed. Example (tuning): see the commented `SERVER_GUNICORN_EXTRA_ARGS` setting in the chart's values.yaml. |
| envValueFrom | object | `{"K8S_NODE_NAME":{"fieldRef":{"fieldPath":"spec.nodeName"}},"K8S_POD_NAME":{"fieldRef":{"fieldPath":"metadata.name"}},"K8S_POD_NAMESPACE":{"fieldRef":{"fieldPath":"metadata.namespace"}},"K8S_POD_UID":{"fieldRef":{"fieldPath":"metadata.uid"}}}` | Allow setting env values from field/configmap/secret references. Defaults to include k8s details for observability. |
| extraInitContainers | list | `[]` | Additional init containers to run before the main container. Can be templated |
| extraManifests | list | `[]` | Additional manifests to deploy to kubernetes. Can be templated |
| fullnameOverride | string | `""` |  |
| hostAliases | list | `[]` | Host aliases for the pod |
| image | object | `{"pullPolicy":"IfNotPresent","repository":"cogstacksystems/medcat-service"}` | Sets the container image. More information: https://kubernetes.io/docs/concepts/containers/images/ |
| image.pullPolicy | string | `"IfNotPresent"` | Sets the image pull policy. |
| image.repository | string | `"cogstacksystems/medcat-service"` | Image repository for the MedCAT service container |
| imagePullSecrets | list | `[]` | Secrets for pulling an image from a private registry. More information: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ |
| ingress.annotations | object | `{}` |  |
| ingress.className | string | `""` |  |
| ingress.enabled | bool | `false` |  |
| ingress.hosts[0].host | string | `"chart-example.local"` |  |
| ingress.hosts[0].paths[0].path | string | `"/"` |  |
| ingress.hosts[0].paths[0].pathType | string | `"ImplementationSpecific"` |  |
| ingress.tls | list | `[]` |  |
| livenessProbe.httpGet.path | string | `"/api/health/live"` |  |
| livenessProbe.httpGet.port | string | `"http"` |  |
| model | object | `{}` | Enables downloading a public model pack using wget on container startup. The model is downloaded under /models/. Set model.downloadUrl and model.name to fetch a model pack at startup. |
| nameOverride | string | `""` | Overrides the chart name. |
| networkPolicy.egress.egressRules | list | `[]` | Append any custom egress rules following the standard NetworkPolicy format. |
| networkPolicy.egress.enabled | bool | `false` | Set to true to block egress in the network policy. |
| networkPolicy.enabled | bool | `true` | Creates a default network policy blocking all ingress other than to the service port. |
| nodeSelector | object | `{}` |  |
| podAnnotations | object | `{}` | Kubernetes annotations to add to the pod. More information: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/ |
| podLabels | object | `{}` | Kubernetes labels to add to the pod. More information: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ |
| podSecurityContext | object | `{}` |  |
| readinessProbe.httpGet.path | string | `"/api/health/ready"` |  |
| readinessProbe.httpGet.port | string | `"http"` |  |
| replicaCount | int | `1` | Number of pod replicas. More information: https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/ |
| resources | object | `{}` | Configure resources for the pod. More information can be found here: https://kubernetes.io/docs/concepts/containers/ Recommendation for a default production model is { requests: { cpu: 1, memory: 4Gi }, limits: { cpu: null |
| runtimeClassName | string | `""` | Runtime class name for the pod (e.g., "nvidia" for GPU workloads). More information: https://kubernetes.io/docs/concepts/containers/runtime-class/ |
| securityContext | object | `{}` |  |
| service.port | int | `5000` | Sets the service port. More information: https://kubernetes.io/docs/concepts/services-networking/service/#field-spec-ports |
| service.type | string | `"ClusterIP"` | Sets the service type. More information: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types |
| serviceAccount.annotations | object | `{}` | Annotations to add to the service account |
| serviceAccount.automount | bool | `true` | Automatically mount a ServiceAccount's API credentials? |
| serviceAccount.create | bool | `true` | Specifies whether a service account should be created |
| serviceAccount.name | string | `""` | The name of the service account to use. If not set and create is true, a name is generated using the fullname template |
| serviceMonitor | object | `{"enabled":false,"interval":"10s","labels":{},"path":"/metrics","scheme":"http","tlsConfig":{}}` | Create a Prometheus ServiceMonitor for the medcat service. Requires the Prometheus Operator to be installed. Ensure APP_ENABLE_METRICS is set to true to expose the /metrics endpoint. |
| serviceMonitor.enabled | bool | `false` | Set to true to enable creation of a ServiceMonitor resource |
| serviceMonitor.interval | string | `"10s"` | Frequency at which Prometheus will scrape metrics. |
| serviceMonitor.labels | object | `{}` | Additional labels to be added to the ServiceMonitor |
| serviceMonitor.path | string | `"/metrics"` | HTTP path where metrics are exposed. |
| serviceMonitor.scheme | string | `"http"` | Scheme to use for scraping. |
| startupProbe.failureThreshold | int | `30` |  |
| startupProbe.httpGet.path | string | `"/api/health/ready"` |  |
| startupProbe.httpGet.port | string | `"http"` |  |
| startupProbe.initialDelaySeconds | int | `2` |  |
| startupProbe.periodSeconds | int | `10` |  |
| tolerations | list | `[]` |  |
| updateStrategy.type | string | `"RollingUpdate"` | Used for Kubernetes deployment .spec.strategy.type. Allowed values are "Recreate" or "RollingUpdate". |
| volumeMounts | list | `[]` | Additional volumeMounts on the output Deployment definition. |
| volumes | list | `[]` | Additional volumes on the output Deployment definition. |
Autogenerated from chart metadata using helm-docs v1.14.2
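The observability values above can be combined in one values file. This sketch enables the Prometheus ServiceMonitor and OpenTelemetry tracing. It assumes the Prometheus Operator is installed and selects ServiceMonitors by a `release: prometheus` label, and that an OTLP gRPC collector is reachable at `otel-collector.observability:4317`; both the label and the collector address are hypothetical and depend on your cluster.

```yaml
env:
  APP_ENABLE_METRICS: true     # exposes /metrics, which the ServiceMonitor scrapes
  APP_ENABLE_TRACING: true
  # Hypothetical collector address, replacing the placeholder default
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.observability:4317"

serviceMonitor:
  enabled: true
  interval: "30s"              # illustrative; the chart default is 10s
  labels:
    release: prometheus        # hypothetical label your Prometheus selects on
```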