Overview of Hardened Developer and Operational Processes at Solace

At Solace, our products are designed for security and reliability. As security threats are discovered, the security in our products is continuously improved. To ensure that this occurs, security threat checks and considerations are tightly coupled into our development and release processes, as well as into the way we conduct and operate our services. Solace conducts regular internal security audits and performs company-wide quarterly audits to identify new threats, which is integral to our continuous improvement processes.

To give you a better understanding of, and a higher degree of confidence in, how PubSub+ Cloud security continuously improves to protect your data, here's an overview of some of the processes and procedures that Solace has in place:

Operational Procedures and Policies

At Solace, we have established operational procedures to ensure a secure PubSub+ Cloud infrastructure. This includes:

  • highly restricted access to PubSub+ Cloud production environments
  • well-defined responses for operational incidents
  • security incident responses

Restricted Access Production Environments for PubSub+ Cloud

Solace personnel have highly-restricted, authorized access to the PubSub+ Cloud production environment. These are some of the measures we have in place for Solace personnel:

  • all devices used for access (laptops, computers, cellphones, etc.) are secured with encrypted storage
  • access to the production environments is restricted to select staff members with a clear hierarchical access and chain of responsibilities
  • access to and actions on production PubSub+ Cloud are logged and tracked
  • malware prevention, encryption, and anti-virus are installed on all devices as per our company security policies

In addition to the above policies, Solace has strict policies that cover operations of personnel that include:

  • encryption and key management
  • identity and access management
  • employee on-boarding and off-boarding to fulfill internal security policies
  • controlled change management

All Solace policies are reviewed periodically during the year as part of regular audits and our business practices.

Operational Incident Response

Solace has dedicated personnel to monitor Solace-controlled infrastructure 24/7. All event broker services are centrally managed and monitored by Solace, including customer-controlled deployments.

Solace has operational incident response processes in place that include:

  • rapid responses to address issues
  • focus on reducing recovery time and costs
  • close collaboration with customers to debug issues in customer-controlled environments as required; this is often a joint activity
  • root cause analysis, always performed as part of our processes, to identify the cause and take measures to prevent recurrence of an incident

Security Incident Response

There are two important aspects to our security incident responses: the portion that notifies us if an issue occurs, which includes both the logging and alerting aspects of the response, and the response to the alert itself.

Logging and Alerting

In addition to using PubSub+ Cloud logging and notifications, we closely monitor the ecosystem of alerts and logs, which includes not only PubSub+ Cloud logs but also infrastructure logs.

As part of this, we utilize security information and event management (SIEM) tools that monitor and analyze application, system, and security logs in one central location. SIEM tools are an important part of the data security ecosystem because they aggregate data from multiple systems and analyze that data to catch abnormal behavior or potential security threats and cyberattacks.

Security logs are retained for 30 days (an optional 90 days is available upon request for logs from event broker services if the customer is subscribed to PubSub+ Insights).
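
As a minimal sketch of how the retention windows above could be applied, the following Python checks whether a log entry is still inside its retention window. The function name `is_retained` and the `extended` flag are hypothetical and used only for illustration; they are not part of PubSub+ Cloud.

```python
from datetime import datetime, timedelta, timezone

# Retention windows taken from the text above.
DEFAULT_RETENTION = timedelta(days=30)
EXTENDED_RETENTION = timedelta(days=90)  # optional, on request with PubSub+ Insights


def is_retained(logged_at: datetime, now: datetime, extended: bool = False) -> bool:
    """Return True if a log entry written at `logged_at` is still within retention."""
    window = EXTENDED_RETENTION if extended else DEFAULT_RETENTION
    return now - logged_at <= window


# Hypothetical example: an entry written 46 days before "now".
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
entry = datetime(2024, 5, 15, tzinfo=timezone.utc)
print(is_retained(entry, now))                 # False
print(is_retained(entry, now, extended=True))  # True
```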

At Solace, we use two systems for PubSub+ Cloud:

Datadog
This is the third-party central monitoring service where event broker service metadata, log data, and event logs are sent. The data stored does not contain information that identifies customers or location. We use the Datadog service to configure rules that detect abnormalities and alert Solace staff for action.
AWS GuardDuty
We use Amazon GuardDuty, which sends data to Amazon CloudWatch to alert and perform actions as required. In some cases, these actions are automated to ensure immediate measures are taken. This applies only to deployments on Amazon Web Services (AWS).

Response to Alerts

All responses follow the internal PubSub+ Cloud Incident Management Process. All SIEM events generate alerts that are sent to production teams at Solace. All security incidents are followed with a root cause analysis.

Design and Deployment Processes

Solace has well-defined controls in place for the development of new features and enhancements to maintain security. Security is the principal objective in all architectural design decisions and a main consideration throughout the feature development process.

Our agile processes leverage the best aspects of the Microsoft SDL process, but we've improved this process in areas that are applicable for the more stringent security requirements demanded by our customers to create a Security Development Lifecycle.

Diagram summarizing the stages in the Security Development Lifecycle described in the following text.

Design and Definition
In our design, security considerations and research are included in our epics. Threat modeling is done as part of the research and knowledge acquisition stage of the process. Any security considerations are raised as major stories during this research. This process greatly reduces the potential for security vulnerabilities and fosters a security-first mind-set.
Development
During development, we use a continuous integration system to ensure that any code changes pass testing before submission to the code base. In addition, static analysis checks are performed on all code before it is submitted to the code base. Prior to submission, peer reviews are performed by subject-matter experts to ensure that the approved code changes do only what is required. We use the OWASP Top Ten as part of the definition of done for any development work.
Deployment
After development is done, we perform additional testing as part of deployment to our staging environment. The code undergoes building, testing, and scanning and can be blocked from progressing in the deployment pipeline if any failures occur.
The testing and scanning include static code analysis, dynamic code analysis, vulnerability scanning (using numerous tactics), and docker image scanning.
After progressing through the deployment pipeline, the code undergoes a production readiness review. If approved, the code progresses to rollout.
Rollout
As part of rollout, there is significant, automated vulnerability scanning done in our production environment. This scanning is heavily integrated with our processes. Our scanning occurs nightly, weekly, and quarterly with increasing stress levels. For more information, see Internal Audits and Internal Testing.
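
The gating behavior described for the Deployment stage, where code can be blocked from progressing if any failure occurs, can be sketched as follows. This is an illustrative Python sketch only, not Solace's pipeline: the `Check` class, the `first_failure` function, and the lambda stand-ins for the build and scan steps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # returns True when the check passes


def first_failure(checks: List[Check]) -> Optional[str]:
    """Run checks in order; return the name of the first failing check, or None."""
    for check in checks:
        if not check.run():
            return check.name
    return None


# Hypothetical stand-ins for the real build, test, and scan steps.
checks = [
    Check("build", lambda: True),
    Check("static code analysis", lambda: True),
    Check("vulnerability scan", lambda: False),  # simulated failure
    Check("docker image scan", lambda: True),
]

blocked_by = first_failure(checks)
if blocked_by is None:
    print("eligible for production readiness review")
else:
    print(f"blocked by: {blocked_by}")
```

Stopping at the first failure mirrors the idea that a failed stage blocks progression; a real pipeline might instead run every check and report all failures together.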

Policies and Internal Access Controls

The production systems at Solace have strict policies for access. Access is restricted to a production team with a clear hierarchical access and chain-of-command. All access to production systems for PubSub+ Cloud requires two-factor authentication.

Intrusion Detection Systems are in place for all Solace-controlled infrastructure and assets. These include all compute instances running in Solace-controlled Kubernetes clusters and Virtual Private Clouds and Virtual Networks (VPC/VNet).

Internal Audits and Internal Testing

At Solace, regular audits and internal testing are performed as part of our business practices. Security is the top priority because our PubSub+ platforms are trusted and used by highly-regulated lines of business. These audits and internal tests include (but are not limited to):

Vulnerability Scanning

Solace runs vulnerability scans at regular intervals and whenever there are system changes. Here's an overview of the type of testing that we perform:

  • Nightly internal vulnerability scans that include static and dynamic code analysis
  • Weekly Dynamic Application Security Testing (DAST) scans of all internal systems
  • Quarterly penetration testing by an external third-party
  • Whitesource (Open Source) vulnerability scanning and analysis

Penetration and Vulnerability Testing

All PubSub+ event broker releases are scanned and validated before being released into production.

Penetration testing of the service is scheduled annually. Any vulnerabilities identified are triaged and remediated.

Internal Testing in All Stages of Pipeline

In all stages of our pipeline (development, deployment, and production), there is a significant amount of effort focused on testing the code for vulnerabilities. As the code progresses from development on individual developer branches to integration into the code base, the testing and effort increase and become more involved.

Solace runs a number of tests in each stage of our continuous deployment pipeline, as summarized below.

Stage and Tests Performed

Step 1: Branch
  • Component/unit tests

Step 2: Pull Request
  • Component/unit tests
  • System tests
  • Static code analysis
  • Peer code review

Step 3: Code Merge
  • Static code analysis
  • System tests
  • Whitesource analysis

Step 4: Nightly
  • Stability tests
  • Upgrade tests
  • Security tests (Zap, Tenable)

Step 5: Staging
  • System tests
  • Manual verification

Production
  • Update tests
  • Automated tests
  • Manual verification

Though this process seems onerous, it is very efficient and permits us to deploy changes, such as patches for security vulnerabilities, with a high degree of confidence.

Security Activities

At Solace, there are a number of activities that we perform regularly to ensure the highest reliability in the event of a security issue. This is an overview of the intervals and activities that Solace performs on Solace-controlled infrastructure:

Annually
Disaster-recovery exercises test procedures and systems, and ensure that Solace staff react as trained.
Quarterly
An independent third-party organization performs penetration testing on Solace systems.
Weekly
On a weekly basis, vulnerability analysis is performed (Tenable).
Daily
Daily, dynamic analysis is performed using Zap.
Every Four Hours
Critical platform information is backed up to ensure minimal time lost if a disaster occurs.
Live/Real-time
Active and passive data replication is performed. Intrusion and anomaly detection is performed. Any vulnerabilities are immediately actioned.

Disaster Recovery Procedures for PubSub+ Home Cloud

In addition to the many security measures we have in place, as well as the reliability measures, we have a well-defined framework that includes processes and procedures for disaster recovery for PubSub+ Home Cloud.

  • We have 99.95% uptime for the PubSub+ Home Cloud and the PubSub+ Cloud Console. This is the control plane portion of the architecture.
  • The event broker services for all deployments are monitored for system-level events 24/7 and exist in the data plane. Event broker services remain running if the control plane is unavailable.
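
To put the 99.95% uptime figure above in concrete terms, the following Python sketch converts it into a maximum downtime budget. The measurement periods (a 30-day month and a 365-day year) are assumptions chosen for illustration; the text does not state the period over which uptime is measured, and the function name is hypothetical.

```python
UPTIME_PERCENT = 99.95  # figure from the text above


def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Maximum downtime, in minutes, that still meets the uptime figure over the period."""
    return (1 - uptime_percent / 100) * period_hours * 60


per_month = allowed_downtime_minutes(UPTIME_PERCENT, 30 * 24)   # 30-day month (assumed)
per_year = allowed_downtime_minutes(UPTIME_PERCENT, 365 * 24)   # 365-day year (assumed)
print(f"{per_month:.1f} minutes per 30-day month")    # 21.6
print(f"{per_year / 60:.2f} hours per 365-day year")  # 4.38
```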

The following measures are also in place to ensure that PubSub+ Home Cloud is always available:

  • Data backups for the PubSub+ Home Cloud and PubSub+ Cloud are taken every four hours and the data is spread across multiple availability zones (AZ).

  • PubSub+ Home Cloud critical data is replicated to a separate, secure region to allow for timely recovery. Solace has a two-hour recovery time objective (RTO) and a four-hour recovery point objective (RPO) for configuration and control plane actions.

    If the PubSub+ Home Cloud is not available, applications and event broker service availability is not impacted. This means that your applications and event broker services continue to run and be available.

  • The disaster-recovery mechanisms Solace has in place are tested and exercised annually to ensure that they operate within these objectives.
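
The relationship between the four-hour backup interval, the four-hour RPO, and the two-hour RTO stated above can be illustrated with a short Python sketch. The incident timestamps and the `meets_objectives` function are hypothetical; this shows only the arithmetic, not how Solace measures or enforces these objectives.

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(hours=4)  # from the text above
RPO = timedelta(hours=4)              # recovery point objective
RTO = timedelta(hours=2)              # recovery time objective


def meets_objectives(last_backup: datetime, failure: datetime, restored: datetime) -> bool:
    """Check one hypothetical incident against the RPO and RTO."""
    data_loss_window = failure - last_backup  # changes made since the last backup
    recovery_time = restored - failure        # time taken to restore service
    return data_loss_window <= RPO and recovery_time <= RTO


# Hypothetical incident: failure 3h30m after the last backup, restored 90 minutes later.
last_backup = datetime(2024, 1, 1, 8, 0)
failure = datetime(2024, 1, 1, 11, 30)
restored = datetime(2024, 1, 1, 13, 0)
print(meets_objectives(last_backup, failure, restored))  # True

# With backups alone, the worst-case data-loss window equals the backup interval,
# so the interval must not exceed the RPO.
print(BACKUP_INTERVAL <= RPO)  # True
```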

Physical and Environmental Security Controls

Solace-controlled infrastructure utilizes highly reliable, best-in-class cloud providers, including Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP). Solace leverages the existing security of these established cloud providers to protect against attacks and to provide features such as Basic Denial-of-Service Protection.

Solace selects vendors that are best-in-class and who have physical and environmental security controls such as:

  • badge readers and physical access logs
  • video surveillance
  • electrical and power redundancy controls
  • climate and temperature controls
  • fire detection & suppression
  • media destruction safeguards