Serverless Limitations

Controller Limitations

Function Controller does not serve time-critical requests from users. It reconciles Function custom resources (CRs) stored in the Kubernetes API server and has no persistent state of its own.

Function Controller does not use its own allocated runtime resources to serve Functions. It delegates this work to dedicated Kubernetes workloads. Refer to the architecture diagram for more details.

For these reasons, Function Controller does not require horizontal scaling. It scales vertically up to 1Gi of memory and 500m of CPU time.

Namespace Setup Limitations

If you apply LimitRanges in the namespace where you create Functions, the limits also apply to the Function workloads and may prevent Functions from running. In such cases, ensure that the resources requested in the Function configuration do not exceed the limits applied in the namespace.
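The check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of Kyma: the helper names `parse_quantity` and `violations` are hypothetical, the resource values are made up, and the comparison assumes that the relevant LimitRange constraint is its `max` field. Only the common quantity suffixes are handled (exponent notation such as `1e3` is not).

```python
from decimal import Decimal

# Common suffixes of Kubernetes resource quantities (a subset, for illustration).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"), "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '500m' or '1Gi' to a plain number."""
    # Try two-letter suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def violations(requested: dict, limit_max: dict) -> list:
    """Return the resource names whose requested value exceeds the namespace max."""
    return [
        name
        for name, value in requested.items()
        if name in limit_max
        and parse_quantity(value) > parse_quantity(limit_max[name])
    ]


# Hypothetical values: what the Function asks for vs. the LimitRange max.
function_resources = {"cpu": "200m", "memory": "256Mi"}
namespace_max = {"cpu": "100m", "memory": "512Mi"}
print(violations(function_resources, namespace_max))  # ['cpu']
```

An empty list means the Function configuration fits within the namespace limits under these assumptions.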

Limitation for the Number of Functions

There is no upper limit on the number of Functions that you can run on Kyma. Once you define a Function, Function Controller always requests its Pods. It is up to Kubernetes to schedule them based on the memory and CPU time available on the Kubernetes worker nodes. This capacity is determined mainly by the number of worker nodes (including any node auto-scaling) and their computational capacity.
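As a rough illustration of how node capacity bounds the number of schedulable Function Pods, the following sketch computes an upper bound from resource requests alone. The function name `pods_that_fit` and all numbers are hypothetical, and the estimate deliberately ignores system reservations, other workloads, and per-node Pod count limits, so real clusters fit fewer Pods.

```python
def pods_that_fit(nodes: int, node_cpu_m: int, node_mem_mi: int,
                  pod_cpu_m: int, pod_mem_mi: int) -> int:
    """Upper bound on schedulable Pods, based on CPU and memory requests only."""
    # Each node is limited by whichever resource runs out first.
    per_node = min(node_cpu_m // pod_cpu_m, node_mem_mi // pod_mem_mi)
    return nodes * per_node


# Hypothetical cluster: 3 nodes with 2000m CPU and 8192Mi memory each.
# Hypothetical Pod requests: 50m CPU and 64Mi memory.
print(pods_that_fit(3, 2000, 8192, 50, 64))  # 120
```

In this example CPU is the limiting resource (40 Pods per node), even though memory alone would allow 128 Pods per node.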

Runtime Phase Limitations

NOTE

All measurements were taken on Kubernetes with three Azure worker nodes of type Standard_D2s_v5 (two vCPU amd64 cores, ~8 GiB memory), distributed across availability zones westeurope-1, westeurope-2, and westeurope-3, running Garden Linux 1877.10 with kernel 6.12.66-cloud-amd64 and Kubernetes v1.34.3.

The values in the tables below are averages from three test runs. Last updated: 2026-04-02.

Functions serve user-provided logic wrapped in a web framework: Express for Node.js and Bottle for Python. User logic aside, these frameworks have their own limitations, and their performance depends on the selected runtime profile and the specification of the Kubernetes nodes.

The following tables present the response times of the selected runtime profiles for a "Hello World" Function under the load scenarios described below. These values describe the overhead of the serving framework itself. Any user logic added on top adds extra milliseconds and must be profiled separately.
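For orientation, a "Hello World" Function for the Python runtime might look like the sketch below. The `main(event, context)` handler signature and the response body are assumptions here; the exact Function used for the measurements is not given in this document.

```python
def main(event, context):
    # No user logic: any measured response time comes from the serving layer.
    return "Hello World"
```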

Tests are implemented using k6 and consist of the following scenarios:

  • Constant load: 50 virtual users send one request per second each (with a 1-second sleep between calls) for 2 minutes. Represents a steady, moderate traffic baseline.
  • Max load: 100 virtual users send requests as fast as possible (no sleep) for 2 minutes. Represents sustained high concurrency.
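The statistics reported in the tables below (median, 95th and 99th percentile, averaged over three runs) can be reproduced from raw response times along the following lines. This is a sketch using Python's standard library, not the tooling used for the measurements: the helper names `summarize` and `average_runs` are hypothetical, and the interpolation method k6 uses for percentiles may differ from the `inclusive` method chosen here.

```python
import statistics


def summarize(samples_ms):
    """Return the median, 95th and 99th percentile of response times in ms."""
    # quantiles(n=100) returns 99 cut points; index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "median": statistics.median(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


def average_runs(runs):
    """Average each statistic across several test runs."""
    summaries = [summarize(run) for run in runs]
    return {
        key: statistics.mean(summary[key] for summary in summaries)
        for key in summaries[0]
    }
```

Calling `average_runs` with three lists of per-request response times, one list per test run, yields one row of a table for a single runtime profile.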

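Constant load

Node.js 22

| Response time [ms] | XS | S | M | L | XL |
|---|---|---|---|---|---|
| median | 1.7 | 1.4 | 1.3 | 0.9 | 1.2 |
| 95th percentile | 4.9 | 4.4 | 4.4 | 3.7 | 4.8 |
| 99th percentile | 93 | 61 | 22 | 12 | 12 |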

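Node.js 24

| Response time [ms] | XS | S | M | L | XL |
|---|---|---|---|---|---|
| median | 1.5 | 1.6 | 1.3 | 1.3 | 1.8 |
| 95th percentile | 3.7 | 4.1 | 3.4 | 3.2 | 4.1 |
| 99th percentile | 13 | 12 | 7.2 | 5.4 | 7.7 |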

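Python 3.12

| Response time [ms] | XS | S | M | L | XL |
|---|---|---|---|---|---|
| median | 2.5 | 2.4 | 3.7 | 3.4 | 3.7 |
| 95th percentile | 16 | 7.6 | 9.8 | 9.2 | 9.4 |
| 99th percentile | 147 | 47 | 21 | 30 | 20 |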

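Max load

Node.js 22

| Response time [ms] | XS | S | M | L | XL |
|---|---|---|---|---|---|
| median | 104 | 97 | 50 | 18 | 15 |
| 95th percentile | 300 | 204 | 104 | 57 | 28 |
| 99th percentile | 390 | 293 | 156 | 69 | 40 |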

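Node.js 24

| Response time [ms] | XS | S | M | L | XL |
|---|---|---|---|---|---|
| median | 86 | 12 | 8.2 | 6.8 | 6.3 |
| 95th percentile | 100 | 93 | 65 | 23 | 17 |
| 99th percentile | 191 | 102 | 73 | 31 | 27 |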

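Python 3.12

| Response time [ms] | XS | S | M | L | XL |
|---|---|---|---|---|---|
| median | 902 | 699 | 384 | 137 | 119 |
| 95th percentile | 1157 | 846 | 397 | 213 | 163 |
| 99th percentile | 1540 | 1100 | 585 | 300 | 228 |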

In general, the bigger the runtime profile, the more resources are available and the quicker the response is served. Treat these values as a baseline for the serving layer only, because they do not take your Function logic into account.

Scaling

Function runtime Pods can be scaled horizontally from zero up to the limits of the resources available on the Kubernetes worker nodes. See the Use External Scalers tutorial for more information.