MedCAT Service Helm Chart

A Helm chart to deploy CogStack medcat-service

Installation

helm install medcat-service-helm oci://registry-1.docker.io/cogstacksystems/medcat-service-helm

Usage

For local testing, you can port-forward the service with this command:

kubectl port-forward svc/medcat-service-helm 5000:5000

Then navigate to http://localhost:5000 to try the service. The REST API documentation is available at http://localhost:5000/docs.

Configuration

To configure MedCAT Service, create a values.yaml file and pass it to helm when installing.
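As a rough illustration, a values.yaml might look like the sketch below. The keys come from the Values table in this README; the specific values (two replicas, demo UI turned off) are illustrative assumptions, not recommendations from the chart authors. The resource sizing copies the recommendation given in the `resources` description.

```yaml
# values.yaml: a minimal override file (illustrative values, not chart defaults)
replicaCount: 2

# Sizing taken from the recommendation in the chart's `resources` description.
resources:
  requests:
    cpu: 1
    memory: 4Gi
  limits:
    memory: 4Gi

env:
  # Assumption for illustration: turn off the bundled demo UI.
  APP_ENABLE_DEMO_UI: false

# Install with the overrides applied, using helm's standard -f flag:
#   helm install medcat-service-helm \
#     oci://registry-1.docker.io/cogstacksystems/medcat-service-helm \
#     -f values.yaml
```

Keys that are not listed in the file keep the chart defaults.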

Model Pack

You should specify a model pack for the service to use. By default it uses a small bundled model, which is suitable for testing.

Default: Use the demo model pack

A model pack is already bundled into MedCAT Service, and this chart uses it by default.

This pack is intended only for testing and contains just a few concepts.

Recommended: Download Model on Startup

Enable MedCAT to download the model from a remote URL on container startup.

Create a values file like values-model-download.yaml and set these values:

model:
  downloadUrl: "http://localhost:9000/models/my-model.zip"

Use this if you prefer dynamic loading of models at runtime.

Advanced: Create a custom volume and load a model into it

If you want to set up your own download flow, the service can load a model pack from a volume that you manage. For example, use an init container that downloads the model to a volume, then mount that volume into the service.

1. Create a persistent volume and PVC in Kubernetes following the official documentation. Alternatively, specify it in values.extraManifests and it will be created.
2. Create a values file like the following, which mounts the volume and defines a custom init container.

env:
  APP_MEDCAT_MODEL_PACK: "/my/models/custom-model.zip"

# Mount the model volume into the MedCAT Service container.
volumeMounts:
  - name: model-volume
    mountPath: /my/models

volumes:
  - name: model-volume
    persistentVolumeClaim:
      claimName: my-custom-pvc

extraInitContainers:
  - name: model-downloader
    image: busybox:1.28
    # Put whatever is needed to download the file in this command, for example authentication.
    command: ["sh", "-c", "wget -O /my/models/custom-model.zip http://example.com"]
    volumeMounts:
      - name: model-volume
        mountPath: /my/models
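For the extraManifests alternative mentioned in step 1, the PVC could be declared inside the values file itself. The following is a sketch only: it assumes each extraManifests list item is a complete Kubernetes manifest written as a YAML object, and the access mode and storage size are hypothetical. Check the chart templates for the exact item format it expects.

```yaml
extraManifests:
  # Assumption: each list item is a full Kubernetes manifest.
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      # Must match the claimName referenced under `volumes`.
      name: my-custom-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          # Hypothetical size; choose one that fits your model pack.
          storage: 5Gi
```

Because no storageClassName is set, the claim would use the cluster's default storage class, if one is configured.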

DeID Mode

The service can perform de-identification (DeID) of EHRs when configured with the following values:

env:
  APP_MEDCAT_MODEL_PACK: "/cat/models/examples/example-deid-model-pack.zip"
  DEID_MODE: "true"
  DEID_REDACT: "true"

GPU Support

To run MedCAT Service with GPU acceleration, use the GPU-enabled image and set the pod runtime class accordingly.

Note: GPU support is only used for de-identification.

Create a values file like values-gpu.yaml with the following content:

image:
  repository: ghcr.io/cogstack/medcat-service-gpu

runtimeClassName: nvidia

resources:
  limits:
    nvidia.com/gpu: 1
env:
  APP_CUDA_DEVICE_COUNT: 1
  APP_TORCH_THREADS: -1
  DEID_MODE: true

To use GPU acceleration, your Kubernetes cluster should be configured with the NVIDIA GPU Operator or the following components:

- NVIDIA device plugin for Kubernetes
- NVIDIA GPU Feature Discovery
- The NVIDIA Container Toolkit

Test GPU support

You can verify that the MedCAT Service pod has access to the GPU by executing nvidia-smi inside the pod.

kubectl exec -it <POD_NAME> -- nvidia-smi

You should see the NVIDIA GPU device listing if the GPU is properly accessible.

Values

Key	Type	Default	Description
affinity	object	`{}`
autoscaling.enabled	bool	`false`
autoscaling.maxReplicas	int	`100`
autoscaling.minReplicas	int	`1`
autoscaling.targetCPUUtilizationPercentage	int	`80`
env.APP_ENABLE_DEMO_UI	bool	`true`
env.APP_ENABLE_METRICS	bool	`true`	Observability Env Vars
env.APP_ENABLE_TRACING	bool	`false`
env.APP_MEDCAT_MODEL_PACK	string	`"/cat/models/examples/example-medcat-v2-model-pack.zip"`	Defines the model pack used by the MedCAT service. Example (download on startup): uncomment `ENABLE_MODEL_DOWNLOAD` and the `MODEL_*` URLs below. Example (DeID mode): uncomment `DEID_MODE`/`DEID_REDACT` and use the DeID model pack referenced below.
env.OTEL_EXPERIMENTAL_RESOURCE_DETECTORS	string	`"containerid,os"`
env.OTEL_EXPORTER_OTLP_ENDPOINT	string	`"http://<unused>:4317"`
env.OTEL_EXPORTER_OTLP_PROTOCOL	string	`"grpc"`
env.OTEL_LOGS_EXPORTER	string	`"none"`
env.OTEL_METRICS_EXPORTER	string	`"none"`
env.OTEL_PYTHON_FASTAPI_EXCLUDED_URLS	string	`"/api/health,/metrics"`
env.OTEL_RESOURCE_ATTRIBUTES	string	`"k8s.pod.uid=$(K8S_POD_UID),k8s.pod.name=$(K8S_POD_NAME),k8s.namespace.name=$(K8S_POD_NAMESPACE),k8s.node.name=$(K8S_NODE_NAME)"`
env.OTEL_SERVICE_NAME	string	`"medcat-service"`
env.OTEL_TRACES_EXPORTER	string	`"otlp"`
env.SERVER_GUNICORN_MAX_REQUESTS	string	`"100000"`	Set SERVER_GUNICORN_MAX_REQUESTS to a high number instead of the default 1000. Trust k8s instead to restart pod when needed. Example (tuning): see the commented `SERVER_GUNICORN_EXTRA_ARGS` setting below.
envValueFrom	object	`{"K8S_NODE_NAME":{"fieldRef":{"fieldPath":"spec.nodeName"}},"K8S_POD_NAME":{"fieldRef":{"fieldPath":"metadata.name"}},"K8S_POD_NAMESPACE":{"fieldRef":{"fieldPath":"metadata.namespace"}},"K8S_POD_UID":{"fieldRef":{"fieldPath":"metadata.uid"}}}`	Allow setting env values from field/configmap/secret references. Defaults to include k8s details for observability.
extraInitContainers	list	`[]`	Additional init containers to run before the main container. Can be templated
extraManifests	list	`[]`	Additional manifests to deploy to kubernetes. Can be templated
fullnameOverride	string	`""`
hostAliases	list	`[]`	Host aliases for the pod
image	object	`{"pullPolicy":"IfNotPresent","repository":"cogstacksystems/medcat-service"}`	Sets the container image. More information: https://kubernetes.io/docs/concepts/containers/images/
image.pullPolicy	string	`"IfNotPresent"`	This sets the pull policy for images.
image.repository	string	`"cogstacksystems/medcat-service"`	Image repository for the MedCAT service container
imagePullSecrets	list	`[]`	Secrets for pulling an image from a private repository. More information: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
ingress.annotations	object	`{}`
ingress.className	string	`""`
ingress.enabled	bool	`false`
ingress.hosts[0].host	string	`"chart-example.local"`
ingress.hosts[0].paths[0].path	string	`"/"`
ingress.hosts[0].paths[0].pathType	string	`"ImplementationSpecific"`
ingress.tls	list	`[]`
livenessProbe.httpGet.path	string	`"/api/health/live"`
livenessProbe.httpGet.port	string	`"http"`
model	object	`{}`	Enable downloading of public models using wget on startup. The model is downloaded to /models/ and used for APP_MEDCAT_MODEL_PACK. Example: uncomment `model.downloadUrl` and `model.name` below to fetch a model pack at startup.
nameOverride	string	`""`	This is to override the chart name.
networkPolicy.egress.egressRules	list	`[]`	Append any custom egress rules following the standard format
networkPolicy.egress.enabled	bool	`false`	Choose to block egress by enabling it in the network policy
networkPolicy.enabled	bool	`true`	Choose to create a default network policy blocking all ingress other than to the service port.
nodeSelector	object	`{}`
podAnnotations	object	`{}`	Sets Kubernetes annotations on the pod. More information: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
podLabels	object	`{}`	Sets Kubernetes labels on the pod. More information: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
podSecurityContext	object	`{}`
readinessProbe.httpGet.path	string	`"/api/health/ready"`
readinessProbe.httpGet.port	string	`"http"`
replicaCount	int	`1`	Sets the number of replicas. More information: https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/
resources	object	`{}`	Configure resources for the pod. Recommendation for a default production model: { requests: { cpu: 1, memory: 4Gi }, limits: { cpu: null , memory: 4Gi } }. More information: https://kubernetes.io/docs/concepts/containers/
runtimeClassName	string	`""`	Runtime class name for the pod (e.g., "nvidia" for GPU workloads). More information: https://kubernetes.io/docs/concepts/containers/runtime-class/
securityContext	object	`{}`
service.port	int	`5000`	Sets the service port. More information: https://kubernetes.io/docs/concepts/services-networking/service/#field-spec-ports
service.type	string	`"ClusterIP"`	Sets the service type. More information: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
serviceAccount.annotations	object	`{}`	Annotations to add to the service account
serviceAccount.automount	bool	`true`	Automatically mount a ServiceAccount's API credentials?
serviceAccount.create	bool	`true`	Specifies whether a service account should be created
serviceAccount.name	string	`""`	The name of the service account to use. If not set and create is true, a name is generated using the fullname template
serviceMonitor	object	`{"enabled":false,"interval":"10s","labels":{},"path":"/metrics","scheme":"http","tlsConfig":{}}`	Create a Prometheus ServiceMonitor for the MedCAT service. Requires the Prometheus Operator to be installed. Ensure APP_ENABLE_METRICS is set to true to expose the /metrics endpoint.
serviceMonitor.enabled	bool	`false`	Set to true to enable creation of a ServiceMonitor resource
serviceMonitor.interval	string	`"10s"`	Frequency at which Prometheus will scrape metrics.
serviceMonitor.labels	object	`{}`	Additional labels to be added to the ServiceMonitor
serviceMonitor.path	string	`"/metrics"`	HTTP path where metrics are exposed.
serviceMonitor.scheme	string	`"http"`	Scheme to use for scraping.
startupProbe.failureThreshold	int	`30`
startupProbe.httpGet.path	string	`"/api/health/ready"`
startupProbe.httpGet.port	string	`"http"`
startupProbe.initialDelaySeconds	int	`2`
startupProbe.periodSeconds	int	`10`
tolerations	list	`[]`
updateStrategy.type	string	`"RollingUpdate"`	Used for Kubernetes deployment .spec.strategy.type. Allowed values are "Recreate" or "RollingUpdate".
volumeMounts	list	`[]`	Additional volumeMounts on the output Deployment definition.
volumes	list	`[]`	Additional volumes on the output Deployment definition.
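
To show how several of the values above combine, here is a hedged sketch of a values file that exposes the service through an ingress, enables autoscaling, and turns on metrics scraping and tracing. The keys are taken from the table above. The hostname, ingress class, ServiceMonitor label, scrape interval, replica limits, and collector endpoint are hypothetical placeholders to replace with values from your own cluster.

```yaml
# Expose the service through an ingress controller.
ingress:
  enabled: true
  className: nginx                 # hypothetical ingress class
  hosts:
    - host: medcat.example.com     # hypothetical hostname
      paths:
        - path: /
          pathType: Prefix

# Scale between 1 and 5 replicas based on CPU utilisation.
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80

# Let the Prometheus Operator scrape the /metrics endpoint.
serviceMonitor:
  enabled: true
  interval: 30s
  labels:
    release: prometheus            # hypothetical label your Prometheus selects on

env:
  APP_ENABLE_METRICS: true
  APP_ENABLE_TRACING: true
  # Hypothetical OTLP collector address; replaces the unused default endpoint.
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.observability:4317"
```

CPU-utilisation autoscaling in Kubernetes is computed relative to the container's CPU request, so a setup like this would normally also set resources.requests.cpu.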

Autogenerated from chart metadata using helm-docs v1.14.2