What Is Runtime Security | Palo Alto Networks

Runtime security refers to the continuous, end-to-end monitoring and validation of all activity within containers, hosts, and serverless functions. It works by leveraging application control and allowlisting to establish a baseline of normal behavior for each host, container, serverless function, and other objects within a cloud-native environment. Through real-time observation of file systems, processes, and network activity, runtime security tools detect suspicious or anomalous activity and alert teams as needed.

Real-time security monitoring, of course, isn’t new. For almost two decades, security information and event management (SIEM) platforms have been monitoring application environments for anomalies. So what’s different about cloud-native runtime security?

The difference centers on the environment.

Runtime Security for Modern Applications

By automating security for fast-moving, dynamic applications like those that run in containers, runtime security addresses the unique security and compliance needs of cloud-native environments.

Cloud-native runtime security operates in environments that move so fast that baselines, in the traditional sense, don’t exist. When clusters change as nodes go offline, containers spin up and down, or load balancers redirect traffic between instances, conventional data sources like logs and network traffic are not enough on their own to detect anomalies.

Runtime security in the cloud-native environment works on a deeper level, establishing a dynamic baseline by interpreting how behavioral trends vary over time. From here, runtime security tools can detect changes in internal container processes, file system activity, and so on, that deviate from the norm — even within environments that rapidly scale.

Put another way, runtime defense is the set of features that provide predictive and threat-based active protection for rapidly changing environments.

  • Predictive protection includes capabilities like determining when a container creates an unexpected network socket or runs a process not included in the origin image.
  • Threat-based protection includes capabilities like detecting when malware is added to a container or when a container connects to a botnet.

Models and Rules: Understanding Runtime Security

Runtime security focuses on safeguarding containers during execution, when they’re active and most vulnerable to malicious activity. Traditional security tools weren’t designed to monitor running containers.

  • The cloud-native runtime environment is unique.
  • Runtime security is the only way to secure cloud-native applications at scale.

Using AI and machine learning, runtime security automates the process of modeling healthy activity.

Modeling refers to the process of creating a representation of normal, safe behavior for applications and services running in a cloud-native environment. This representation, or model, serves as a baseline to identify and detect deviations or anomalies that might indicate security threats.

By continuously monitoring and comparing the runtime activities of applications and services against the established model, teams can identify and respond to unauthorized actions, privilege escalations, and other potential incidents.
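
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product builds its models: the `BehaviorModel` class, its fields, and the sample process names and ports are all hypothetical, and a real solution would learn from far richer signals than two sets.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class BehaviorModel:
    """Baseline of activity recorded while a workload is assumed healthy."""

    processes: Set[str] = field(default_factory=set)
    listening_ports: Set[int] = field(default_factory=set)

    def learn(self, processes: Iterable[str], ports: Iterable[int]) -> None:
        """Fold one observation window into the baseline."""
        self.processes.update(processes)
        self.listening_ports.update(ports)

    def deviations(self, processes: Iterable[str], ports: Iterable[int]) -> List[str]:
        """Return a finding for every observed item missing from the baseline."""
        findings = [f"unexpected process: {name}"
                    for name in sorted(set(processes) - self.processes)]
        findings += [f"unexpected listening port: {port}"
                     for port in sorted(set(ports) - self.listening_ports)]
        return findings


model = BehaviorModel()
model.learn(processes=["httpd"], ports=[80, 443])

# A later observation window: a shell and a new listener were not in the baseline.
alerts = model.deviations(processes=["httpd", "sh"], ports=[443, 4444])
for alert in alerts:
    print(alert)
```

Anything outside the learned sets is reported, which is why the quality of the learning window matters: behavior that never occurred while the baseline was built will surface later as an anomaly.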

A runtime security solution like Prisma Cloud implements individual sensors for file system, network, and process activity, each with a unique set of rules and alerting. The unified runtime defense architecture simplifies the administrator experience and provides detail about what the solution learns from each image. Within this framework, runtime defense consists of two main object types — models and rules.

Container Models

Models are generated from the autonomous learning of a container runtime security solution and represent the allowed activities for a given container image across all runtime sensors. They offer administrators an overview of what the system has learned about their images. An Apache image model, for example, would specify the processes that should run within the container and the exposed network sockets.

Models are built from static analysis, like hashing process maps based on Dockerfile ENTRYPOINT scripts, and dynamic behavioral analysis, like observing actual process activity during early container runtime. Models can be in active, archived, or learning mode.

Modeling Capabilities

Some containers, like Jenkins containers, are difficult to model due to their dynamic nature. A container runtime security solution can automatically detect known containers and enhance the model with capabilities, tuning runtime behaviors for specific apps and configurations without changing the learned model.

Learning Mode

Learning mode is when the container runtime security solution performs static or dynamic analysis. Images stay in learning mode for one hour, followed by a 24-hour "dry run" period to ensure model completeness. If behavioral changes are observed during the dry run, the model returns to learning mode for an additional 24 hours. During learning mode, only threat-based runtime events are logged.

Active Mode

Active mode is when the container runtime security solution enforces the model and looks for anomalies that violate it. Active mode begins after the learning mode's 1-hour period. The system monitors for variances against the model, such as unexpected processes.

Archived Mode

Archived mode occurs when a container no longer actively runs a model. Models persist in archived mode for 24 hours before removal. Archived mode serves as a recycle bin for models, ensuring that frequently starting and stopping images don't need to re-enter learning mode.
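
One way to picture the three modes is as a small state machine. The Python sketch below is an interpretation under stated assumptions: it treats the dry run as its own phase between learning and active, uses the durations quoted above (1 hour, 24 hours, 24 hours), and introduces the hypothetical names `Mode`, `ModelState`, and `step`. The text does not spell out how a real product sequences or overlaps these phases.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LEARNING = "learning"
    DRY_RUN = "dry run"
    ACTIVE = "active"
    ARCHIVED = "archived"
    REMOVED = "removed"


INITIAL_LEARNING_HOURS = 1    # first learning window
DRY_RUN_HOURS = 24            # model completeness check
EXTENDED_LEARNING_HOURS = 24  # relearning after a change during the dry run
ARCHIVE_RETENTION_HOURS = 24  # how long an unused model is kept


@dataclass(frozen=True)
class ModelState:
    mode: Mode
    hours_in_mode: float = 0.0
    learning_window: float = INITIAL_LEARNING_HOURS


def step(state: ModelState, *, behavior_changed: bool = False,
         container_running: bool = True) -> ModelState:
    """Return the next state, or the same state if no transition applies."""
    if state.mode is Mode.LEARNING and state.hours_in_mode >= state.learning_window:
        return ModelState(Mode.DRY_RUN)
    if state.mode is Mode.DRY_RUN:
        if behavior_changed:
            # New behavior seen: go back and learn for a longer window.
            return ModelState(Mode.LEARNING, learning_window=EXTENDED_LEARNING_HOURS)
        if state.hours_in_mode >= DRY_RUN_HOURS:
            return ModelState(Mode.ACTIVE)
    if state.mode is Mode.ACTIVE and not container_running:
        return ModelState(Mode.ARCHIVED)
    if state.mode is Mode.ARCHIVED:
        if container_running:
            # Restarted within the retention window: reuse the model, skip relearning.
            return ModelState(Mode.ACTIVE)
        if state.hours_in_mode >= ARCHIVE_RETENTION_HOURS:
            return ModelState(Mode.REMOVED)
    return state


restarted = step(ModelState(Mode.ARCHIVED, hours_in_mode=2.0), container_running=True)
print(restarted.mode.value)
```

The archived-to-active transition is the "recycle bin" behavior: an image that restarts before its model is removed picks the model back up instead of re-entering learning mode.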

Rules

Rules control how a container runtime security solution uses autonomously generated models to protect an environment. They allow or block activities by sensor and are evaluated together with models to create a resultant policy:

model + allowed activity from rule(s) - blocked activity from rule(s) = resultant policy

For example, if a model allows the httpd process and you want to ensure the bar process is allowed while the foo process is blocked, you can create a rule for all httpd images, add bar to the allowed process list, and add foo to the blocked process list.
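
The formula maps directly onto set operations. The following Python sketch works through the httpd example; the function name `resultant_policy` and the variable names are illustrative choices, not part of any product API.

```python
from typing import Set


def resultant_policy(model: Set[str], allowed: Set[str], blocked: Set[str]) -> Set[str]:
    """model + allowed activity from rules - blocked activity from rules."""
    return (model | allowed) - blocked


learned_model = {"httpd"}  # processes the model learned for the image
rule_allowed = {"bar"}     # explicitly allowed by a rule
rule_blocked = {"foo"}     # explicitly blocked by a rule

policy = resultant_policy(learned_model, rule_allowed, rule_blocked)
print(sorted(policy))  # ['bar', 'httpd']
```

Because blocked activity is subtracted last, a process that appears in both the allowed and blocked lists ends up blocked under this reading of the formula.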

Via models and rules, a runtime protection solution automatically learns how applications behave under different conditions. Users can then distinguish normal shifts in application behavior from those that reflect a security problem.

Components of Container Runtime Security

Identifying new vulnerabilities in running containers relies on knowing what normal looks like — even in dynamic environments. With dozens of microservices to manage and hundreds of containers, serverless functions, and VMs hosting them, teams don’t have time to manually collect behavioral data and configure behavior models. Organizations must leverage enhanced runtime protection capable of identifying and investigating suspicious activities potentially indicating zero-day attacks.

Control Application Behavior

In addition to modeling safe behavior, runtime defenses should automatically define and enforce allowed and disallowed actions for each container, serverless function, or other object in the environment. This includes determining which other containers a given container can communicate with and the type of communication allowed, as well as specifying which data storage volumes it can access. Enforcing these rules is essential for limiting the impact of a potential security breach.

Send Meaningful Alerts

Runtime security tools need to automate defenses and alert your team when manual intervention is required. To achieve this, they should monitor and send alerts for suspicious changes in processes, network connections, or file system read/writes within cloud-native infrastructure. They must also be able to decide whether to send an alert based on dynamic alert rules. Static alerting rules are insufficient for addressing the evolving nature of cloud-native threats, given that activity appearing threatening at one moment may prove benign at another.

Figure 1: Anatomy of the container attack surface

Integrate with Other Security Solutions

Runtime security represents only one layer of defense that should exist within your organization’s cloud-native security tech stack. Particularly when working with highly distributed, containerized microservices, you’ll want your runtime protection to integrate with security solutions addressing the additional layers of your ecosystem.

Automated data security protections, access control, auditing tools, container image scanners, and so on, are equally important. Your runtime security solution must be able to integrate with other security tools to provide full depth and context for incidents, as well as an understanding of how a threat at one layer of your tech stack (like the runtime environment) impacts another (like data at rest).

Detect Incidents in Real Time

Although runtime security is capable of mitigating the impact of a breach after it occurs, your runtime solution will ideally allow you to find and remediate threats in real time, before they have an opportunity to escalate.

Limit the Blast Radius, Prevent the Breach

By delivering control over file systems, processes, and network activity for each container and serverless function, your runtime security solution should mitigate damage that could result if a security breach occurred within the environment. It should automatically model application-safe behavior and enforce rules that prevent dangerous activity on the container or host, ultimately preventing situations such as a compromised container executing processes that spread to other containers or the host.

Enable Incident Response

Incident response hinges on the data collected by your runtime security solution. By capturing and storing audit data for cloud-native applications, it provides teams with the information needed to understand what went wrong in the wake of an incident, even if the cloud-native environment no longer exists in its earlier form when the investigation occurs.

Best Practices for Optimal Runtime Security

Runtime security best practices serve to safeguard applications and infrastructure from runtime threats. By implementing proactive measures, organizations can minimize vulnerabilities, detect malicious activities, and limit the impact of security breaches.

End-to-End Runtime Coverage

Monitoring only part of your environment or focusing on only key services or infrastructure isn’t enough to detect all security threats. For optimum results, apply runtime security to all layers of your environment and use it to protect both development and production workloads.

Unique Resource Treatment

Because every host, container instance, and serverless function in your cloud-native environment has a unique configuration and behavior, you should model each object separately. Don’t assume all containers will behave the same, not even those based on a common container image. Operating from a sweeping assumption will lead to a sampled approach that limits visibility into security incidents.

System Call Monitoring and Filtering Techniques

At the core of container runtime security is the monitoring and filtering of system calls made by processes within containers. System calls act as an interface between applications and the operating system kernel, allowing applications to request resources or services. By monitoring and controlling these calls, organizations can detect and prevent unauthorized actions, privilege escalations, and other malicious activities.

Falco is an open-source runtime security tool that monitors system calls and network activity, detecting and alerting on suspicious behavior. Seccomp, a Linux kernel feature, filters and restricts system calls, providing granular control over the actions of processes in containers.
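
As a rough illustration of system call filtering, the Python sketch below generates an allowlist profile in the JSON layout that Docker accepts for seccomp profiles: deny by default, then allow a named set of calls. The helper name `build_seccomp_profile` and the four-call allowlist are assumptions made for the example, and a real workload needs a much longer list.

```python
import json
from typing import Dict, Iterable


def build_seccomp_profile(allowed_syscalls: Iterable[str]) -> Dict:
    """Build a deny-by-default seccomp profile that allows only the given calls."""
    return {
        # Any system call not listed below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }


# Illustrative allowlist only; far too small for a real application.
profile = build_seccomp_profile(["read", "write", "exit_group", "futex"])
print(json.dumps(profile, indent=2))
```

Saved to a file, a profile like this can be passed to Docker with `--security-opt seccomp=<path>`. In practice, teams usually start from the runtime's default profile and remove calls rather than build an allowlist from scratch.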

Comprehensive Vulnerability and Malware Scanning

Regularly scanning containers for known vulnerabilities and malware during runtime is essential in identifying and addressing security risks. Continuous scanning ensures that organizations can detect newly discovered vulnerabilities and take appropriate action to secure their container environments.

Employ a runtime scanning solution that can detect unknown vulnerabilities and malicious code execution. Additionally, consider integrating threat intelligence feeds to stay updated on the latest threats and vulnerabilities affecting container environments.

Advanced Network Segmentation and Traffic Monitoring

Incorporate advanced network segmentation and traffic monitoring techniques by utilizing tools like Cilium or Calico to enforce network policies and enable microsegmentation. Leverage service mesh technologies, such as Istio or Linkerd, to encrypt container-to-container communication and implement fine-grained access controls. Use network monitoring and analysis tools to capture and analyze container traffic, facilitating the detection of anomalies and potential security threats.
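
To make microsegmentation concrete, the Python sketch below builds a Kubernetes NetworkPolicy manifest that lets only pods with one label reach pods with another label on a single TCP port. The function name, policy name, namespace, and labels are hypothetical, and the policy only takes effect when the cluster's network plugin enforces NetworkPolicy, as Calico and Cilium do.

```python
import json
from typing import Dict


def allow_ingress_from(name: str, namespace: str, target: Dict[str, str],
                       source: Dict[str, str], port: int) -> Dict:
    """Build a NetworkPolicy allowing ingress to `target` pods only from `source` pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": target},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only these pods may connect, and only on this port.
                    "from": [{"podSelector": {"matchLabels": source}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


policy = allow_ingress_from(
    name="api-from-frontend",
    namespace="shop",
    target={"app": "api"},
    source={"app": "frontend"},
    port=8080,
)
print(json.dumps(policy, indent=2))
```

Once a pod is selected by an ingress policy, traffic that no rule allows is dropped, so a compromised pod elsewhere in the namespace cannot reach the API pods directly.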

Figure 2: Web application and API security (WAAS)

Compliance and Auditing

Implement and maintain compliance with the CIS Benchmarks for Docker, Kubernetes, and Linux, as well as external compliance regulations and custom requirements. Keep in mind that, by default, Kubernetes APIs offer various easy privilege escalation routes. In a multitenant cluster, using certain features can introduce instability, so proceed cautiously when deploying them.

Policy Engines

Policy engine management solutions like Kyverno and Open Policy Agent (OPA), or a CSPM like Prisma Cloud, help ensure that containers adhere to policies aligned with standards like PCI DSS, HIPAA, GDPR, ISO 27001:2013, and NIST. Custom policies can also be created to enforce organizational standards.

Policies can detect a wide range of activities, including account hijacking attempts, backdoor activity, network data exfiltration, unusual protocol use, and DDoS activity. Once a threat is detected, an alert is generated, notifying administrators of the issue so that they can respond quickly. Many policies map to the MITRE ATT&CK Enterprise IaaS Matrix, providing a comprehensive roadmap for securing your cloud assets.

Audit Checks

Implement a regular auditing process that scans all layers of your Kubernetes cluster and configurations to ensure they align with industry standards and best practices. Audits won’t necessarily detect threats in real time, but they will help you stay ahead of security problems or misconfigurations you may be overlooking that could give attackers an entry point to your cluster or applications.

Monitoring and Logging

Implementing monitoring and logging solutions for container activities enables organizations to detect and respond to security incidents in real time, mitigating potential threats and facilitating incident response. Tools like Grafana, Jaeger, Prisma Cloud, and Prometheus provide visibility into container performance and health, enabling proactive management. Key metrics include cluster state, node status, pod availability, memory, disk, and CPU utilization. Monitoring helps identify configuration issues and ensures that containers meet business needs.

| Monitoring Level | Metrics | Description |
| --- | --- | --- |
| Cluster | Cluster Nodes | Measure how many nodes are available, which helps determine the cloud resources required to run the cluster. |
| Cluster | Cluster Pods | Measure how many pods are running to help determine if you have sufficient nodes available to handle your overall workload in the event of a node failure. |
| Cluster | Resource Utilization | Measure the computing resources utilized by your nodes, including memory, CPU, bandwidth, and disk utilization. |
| Pod | Container Metrics | Monitor network utilization, CPU, and memory usage. These metrics, held up to DevOps-prescribed maximum values, determine if pods are running as designed. |
| Pod | Application Metrics | These metrics are application-specific and based on business use cases, for example, the number of concurrent users accessing the application, number of entries published or purged, user experience, etc. |
| Pod | Kubernetes Scaling and Availability Metrics | By monitoring the orchestration tool and how it handles a specific pod, you can see the number of pod instances at a given moment (compared to the expected number). These metrics will provide health checks of pods and applications, network data, and in-progress deployments. |

Table 6: Strategic runtime metrics

With metrics, teams can understand whether microservices or individual container-based applications are running as expected and meeting desired business needs through scale-out or scale-in automation and analytics based on expected traffic.

Reviewing metrics also proves beneficial when considering horizontal scale-out approaches for container-based applications, microservices, and security-based products like Palo Alto Networks CN-Series firewalls. Having an effective monitoring strategy in place ensures higher uptime for services with minimal degradation and performance issues.

Additionally, understanding resource consumption, service configurations, and usage helps reduce operational and development costs. This insight can assist in daily operations efforts and gauging CI/CD pipeline health.

When selecting a monitoring and logging solution, keep in mind the metrics you’d like to observe. Many tools have the capacity to address a range of reporting for a multitude of applications and integrations.

| Monitoring Area | Metric | Questions to Answer |
| --- | --- | --- |
| Monitoring Kubernetes Clusters and Nodes | Cluster resource usage | Is the cluster infrastructure underutilized? Is the cluster infrastructure over capacity? |
| Monitoring Kubernetes Clusters and Nodes | Project and team chargeback | |
| Monitoring Kubernetes Clusters and Nodes | Node availability and health | Do we have enough nodes available to replicate the applications? Will we run out of resources? |
| Monitoring Kubernetes Deployments and Pods | Missing and failed pods | Are all the necessary pods running for each of the applications or microservices? How many pods are dead or crashing? |
| Monitoring Kubernetes Deployments and Pods | Running vs. desired instances | How many instances of each microservice are actually ready? What is the expected number of microservices meant to be ready? |
| Monitoring Kubernetes Deployments and Pods | Pod resource usage against requests and limits | Is the pod's resource usage within the configured CPU and memory requests and limits? |
| Monitoring Kubernetes Applications | Application availability | Is the application responding? |
| Monitoring Kubernetes Applications | Application health and performance | How many requests are we seeing? What is the responsiveness or latency for this application? Do we have any errors? |

Table 7: Additional metrics to consider, depending on use cases aligning with your organizational needs.
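
Two of the questions above, running versus desired instances and resource usage against limits, reduce to simple comparisons once the numbers are collected. The Python sketch below assumes the metrics have already been gathered by a monitoring tool; the `Deployment` and `PodUsage` types, the `check` function, the 90 percent threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Deployment:
    name: str
    desired: int  # replicas requested
    ready: int    # replicas currently ready


@dataclass
class PodUsage:
    name: str
    cpu_millicores: int
    cpu_limit_millicores: int
    memory_mib: int
    memory_limit_mib: int


def check(deployments: Iterable[Deployment], pods: Iterable[PodUsage],
          threshold: float = 0.9) -> List[str]:
    """Flag deployments short of replicas and pods close to their limits."""
    findings = []
    for d in deployments:
        if d.ready < d.desired:
            findings.append(f"{d.name}: {d.ready}/{d.desired} instances ready")
    for p in pods:
        if p.cpu_millicores >= threshold * p.cpu_limit_millicores:
            findings.append(f"{p.name}: CPU near limit")
        if p.memory_mib >= threshold * p.memory_limit_mib:
            findings.append(f"{p.name}: memory near limit")
    return findings


report = check(
    deployments=[Deployment("checkout", desired=3, ready=2),
                 Deployment("catalog", desired=2, ready=2)],
    pods=[PodUsage("checkout-1", cpu_millicores=480, cpu_limit_millicores=500,
                   memory_mib=100, memory_limit_mib=256)],
)
for line in report:
    print(line)
```

In a real deployment, checks like these typically live in the alerting rules of the monitoring stack rather than in application code. The sketch only shows the logic behind them.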

Incident Response and Forensics

In the event of a security incident, container runtime security tools can provide valuable data for investigation and remediation. This includes logs, system calls, and other forensic evidence that can help to identify the source of the attack and prevent future occurrences.

Container Escape Prevention

Container escape is a significant threat during runtime. It occurs when an attacker breaches a container's isolation, accessing the host system. Preventing this requires minimizing container privileges and avoiding critical mount points. Following best practices like CIS benchmarks for Docker and Kubernetes is essential.
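
A few of these settings can be checked mechanically before a workload is admitted. The Python sketch below inspects a Kubernetes pod spec, represented as a plain dictionary, for host namespace sharing, sensitive hostPath mounts, and privileged containers. The function name `audit_pod_spec` and the list of sensitive paths are assumptions for illustration and do not amount to a complete benchmark.

```python
from typing import Dict, List

# Illustrative host paths only; tune this list for your environment.
SENSITIVE_HOST_PATHS = {"/", "/etc", "/proc", "/var/run/docker.sock"}


def audit_pod_spec(spec: Dict) -> List[str]:
    """Return findings for settings that weaken container isolation."""
    findings = []
    for flag in ("hostPID", "hostIPC", "hostNetwork"):
        if spec.get(flag):
            findings.append(f"pod shares a host namespace ({flag})")
    for volume in spec.get("volumes", []):
        path = (volume.get("hostPath") or {}).get("path")
        if path in SENSITIVE_HOST_PATHS:
            findings.append(f"volume {volume['name']} mounts host path {path}")
    for container in spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        name = container["name"]
        if ctx.get("privileged"):
            findings.append(f"container {name} runs privileged")
        # Flag unless the field is explicitly set to False.
        if ctx.get("allowPrivilegeEscalation") is not False:
            findings.append(f"container {name} does not disable privilege escalation")
    return findings


risky_spec = {
    "hostPID": True,
    "volumes": [{"name": "docker-sock",
                 "hostPath": {"path": "/var/run/docker.sock"}}],
    "containers": [{"name": "build", "securityContext": {"privileged": True}}],
}
issues = audit_pod_spec(risky_spec)
for issue in issues:
    print(issue)
```

Policy engines and admission controllers apply the same kind of checks at deploy time, so a spec like `risky_spec` can be rejected before it ever runs.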

Adopt a Defense-in-Depth Strategy

Employing a defense-in-depth approach to container security, with multiple layers of protection that include runtime security, image scanning, network segmentation, and host security, helps organizations build a resilient security posture.

For instance, in addition to container network security via containerized next-generation firewalls, container runtime protection can serve as another layer of security to block malware. Runtime protection can also incorporate web application and API security to prevent HTTP-based Layer 7 attacks, such as the OWASP Top 10, denial of service (DoS), or bots.

Adopt a Holistic Approach

Container security must be addressed as part of a holistic enterprise cloud security strategy. While it’s tempting to add yet another security tool to the arsenal, addressing container and cloud security separately tends to leave organizations blind to risks that an otherwise integrated strategy would address. Mature organizations see containers as an essential component of their cloud infrastructure and address them with a centralized platform approach, typically leveraging a CNAPP.

If your security team is reactively focused on securing your applications during runtime, take a step back and consider the entire development and deployment process. While it's crucial to ensure the end state (runtime) is secure, concentrating solely on runtime security may cause you to overlook vulnerabilities or early-stage security issues that will likely repeat with a narrow approach.

By working backward, you can evaluate and address security concerns throughout the entire development lifecycle, from design and coding to testing and deployment. The holistic strategy will help you identify and fix issues before they become problems in the runtime environment, reducing the chances of repeating the same security issues.

At-a-Glance Runtime Security Checklist

  • Continuously scan applications for real-time threat detection to prevent or arrest attacks.
  • Utilize a container runtime security solution to scan containers for known vulnerabilities and provide remediation recommendations.
  • Monitor container behavior for abnormal activity.
  • Run containers with low-privilege users, following the principle of least privilege.
  • Study data from logs, system calls, and other forensics to identify the source of an attack and prevent future occurrences.
  • Integrate container security solutions into CI/CD pipelines.
  • Stay up to date with the latest threats and vulnerabilities via continuous monitoring and integration of threat intelligence feeds.

Runtime Security FAQs

What is process isolation?

Process isolation is a security mechanism that keeps processes separate and independent from each other to prevent them from interfering or compromising one another. In cloud environments, it's crucial for maintaining the integrity of multiple applications running on the same physical hardware. Process isolation ensures that a process in one container can’t access or modify the processes in another container, providing a fundamental layer of security against malicious activities and vulnerabilities in multitenant environments.

What is system call filtering?

System call filtering is a security technique used to restrict the set of system calls a process can execute. It acts as a control mechanism to limit the kernel-level operations that applications can perform, reducing the risk of kernel exploits. By filtering system calls, you can prevent potentially dangerous behaviors, ensuring that applications only perform actions they're explicitly allowed to.

What is seccomp?

Seccomp (Secure Computing Mode) is a Linux kernel feature that enables a process to make a one-way transition into a "secure" state where it cannot make any system calls except those explicitly allowed. It's used to restrict the capabilities of a process, reducing the risk of kernel-level security breaches. Seccomp is essential in container environments, as it limits the impact of a compromised container by restricting its access to the host kernel.

What is AppArmor?

AppArmor (Application Armor) is a Linux kernel security module that confines programs to a limited set of resources, based on per-program profiles. It restricts programs' capabilities with file system access control and other restrictions, enhancing system security. AppArmor is used to prevent applications from performing unauthorized actions, effectively containing potential damage from exploits and limiting the scope of access and execution capabilities of applications.

What is SELinux?

SELinux (Security-Enhanced Linux) is a security architecture for Linux systems that provides a mechanism for supporting access control security policies. It includes mandatory access controls (MAC) that enforce the separation of information based on confidentiality and integrity requirements. SELinux allows administrators to define how each process and system user can interact with each other and with files, devices, and other resources.

What is capability-based security?

Capability-based security is a concept where access and privileges within a system are assigned based on capabilities. A capability is a communicable, unforgeable token of authority that a process or user possesses, granting them the ability to perform specific actions. By focusing on the explicit rights that an entity has, rather than on the entity's identity, capability-based security is effective in environments requiring fine-grained access control.

What is the Container Runtime Interface (CRI)?

The Container Runtime Interface (CRI) is a standard API in Kubernetes that enables integration of container runtimes with the kubelet, the primary node agent in Kubernetes. CRI is defined using protocol buffers and exposed as a gRPC API, allowing various container runtimes to be plugged into Kubernetes. This standardization ensures that Kubernetes can work with a range of container runtimes, including containerd and CRI-O (and Docker Engine through the cri-dockerd adapter), without needing custom integration for each.

What is microsegmentation?

Microsegmentation is a network security technique that involves dividing a network into distinct security segments down to the individual workload level. Each segment has its own set of access controls and security policies, isolating workloads from one another. Key to preventing lateral movement within a network, microsegmentation provides granular security controls and reduces the attack surface within cloud environments.

What are intrusion detection and prevention systems (IDPS)?

Intrusion detection and prevention systems (IDPS) are security solutions that monitor network and system activity for malicious behavior or policy violations. An IDPS typically records information about observed events, notifies security administrators of important observed events, and produces reports. Many IDPSs can also respond to a detected threat by attempting to prevent it from succeeding. They use various methods to detect and prevent threats, including signature-based detection, anomaly-based detection, and stateful protocol analysis.

What is anomaly detection?

Anomaly detection in cybersecurity is the process of identifying patterns in data that do not conform to expected behavior. These anomalies often indicate potential security incidents such as breaches, exploits, or system malfunctions. Anomaly detection systems use various statistical techniques, machine learning algorithms, and behavioral analytics to detect unusual patterns that may signify a security threat.

What is threat intelligence?

Threat intelligence involves collecting, analyzing, and interpreting information about existing or emerging threats to cybersecurity. This intelligence is used to understand the capabilities, intentions, and activities of potential adversaries, enabling organizations to prepare, prevent, and identify cyberthreats looking to take advantage of valuable resources. Effective threat intelligence informs security strategies, helps prioritize responses to threats, and enhances an organization's overall security posture by providing actionable insights into the evolving threat landscape.

What is network traffic analysis?

Network traffic analysis involves scrutinizing the flow of data within a network to identify patterns, detect anomalies, and monitor for security threats. It encompasses the collection, monitoring, and analysis of network signals to understand network behavior, performance issues, and security incidents. Network traffic analysis is crucial for detecting unauthorized access, data exfiltration, and other malicious activities often missed by traditional security measures. It provides insights into network usage and vulnerabilities, enabling informed decision-making for network management.

What is file integrity monitoring (FIM)?

File integrity monitoring (FIM) is a security process that detects and alerts on unauthorized changes to software files, configurations, and critical system files. FIM continuously scans, analyzes, and reports changes to ensure that files have not been altered by unauthorized users, malware, or other harmful processes. It's a key component in compliance and security strategies, helping to protect against data breaches, ensure compliance with regulations, and maintain the integrity of the IT environment.

What is the Zero Trust security model?

The Zero Trust security model is a security concept centered on the belief that organizations should not automatically trust anything inside or outside their perimeters and instead must verify anything and everything trying to connect to their systems before granting access. This model advocates for rigorous identity verification, microsegmentation, and the least-privileged access principle to minimize the attack surface and prevent lateral movement within a network. Zero Trust is particularly relevant in modern, distributed environments where traditional security perimeters are insufficient.

What is a container runtime sandbox?

A container runtime sandbox is an isolated environment where containerized applications run. This sandboxing technology enhances security by isolating the container's runtime environment from the host system, preventing the containerized application from affecting the host or other containers. Sandboxes use various methods like namespaces, cgroups, and virtualization techniques to create a controlled and restricted execution environment.

What is gVisor?

gVisor is an open-source container runtime sandbox developed by Google, designed to provide an additional layer of isolation between running containers and the host operating system. It implements a lightweight, user-space kernel, intercepting and managing system calls made by containerized applications. gVisor enhances security by limiting the direct interaction of containers with the host kernel, thereby reducing the risk of kernel vulnerabilities being exploited. It's particularly useful in environments where strong isolation is required but full virtualization is too resource-intensive.

What are user namespaces?

User namespaces in Linux allow for the separation of user IDs between the host and containers while providing a way to map user and group IDs inside a container to IDs outside the container. Namespaces enhance security by ensuring that a process running as root inside a container doesn’t have root privileges on the host system. User namespaces limit the impact of a container breakout, where a malicious process escapes the container environment to access the host.

What are control groups (cgroups)?

Control groups (cgroups) are a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes. In the context of containerization, cgroups ensure that each container gets its fair share of resources and that no single container can exhaust resources on the host. They are essential for maintaining the stability and efficiency of systems by preventing resource starvation and ensuring predictable application performance.

What is container escape detection?

Container escape detection refers to the process of identifying attempts by a containerized process to break out of its isolation boundaries and gain unauthorized access to the host system or other containers. This involves monitoring for anomalous behavior or exploitation of vulnerabilities that could lead to a container escape. Effective container escape detection is crucial for maintaining the security integrity of containerized environments, as it helps in promptly identifying and mitigating potential breaches that could lead to wider system compromise.
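
Of the techniques covered in these FAQs, file integrity monitoring is one of the simplest to sketch. The Python example below hashes every file under a directory, then compares a later snapshot against the baseline to report added, removed, and modified files. The `snapshot` and `diff` helpers and the sample files are hypothetical, and production FIM tools typically watch file system events rather than rescanning.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two snapshots and group the differences by kind."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(p for p in baseline.keys() & current.keys()
                           if baseline[p] != current[p]),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.conf").write_text("port=80\n")
    (root / "motd").write_text("hello\n")
    baseline = snapshot(root)

    (root / "app.conf").write_text("port=8080\n")    # tampered config
    (root / "motd").unlink()                         # deleted file
    (root / "dropper.sh").write_text("#!/bin/sh\n")  # unexpected new file
    report = diff(baseline, snapshot(root))

print(report)
```

Comparing digests rather than timestamps means a change is detected even when an attacker resets a file's modification time.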