Overview of Hardened Developer and Operational Processes at Solace
At Solace, our products are designed for security and reliability. As security threats are discovered, the security of our products is continuously improved. To ensure that this occurs, security threat checks and considerations are tightly coupled into our development and release processes, as well as the way in which we conduct and operate our services. Solace conducts regular internal security audits and performs company-wide quarterly audits to identify new threats, which is integral to our continuous improvement processes.
To give you a better understanding of, and a higher degree of confidence in, how PubSub+ Cloud security continuously improves to protect your data, here's an overview of some of the processes and procedures that Solace has in place:
- operational procedures
- development and production processes
- policies and internal access controls
- internal audits, fire drills, and internal testing
- disaster recovery procedures
- physical and environmental security
At Solace, we have established operational procedures to ensure a secure PubSub+ Cloud infrastructure. This includes:
- highly restricted access to PubSub+ Cloud production environments
- well-defined responses for operational incidents
- security incident responses
Solace personnel have highly restricted, authorized access to the PubSub+ Cloud production environment. These are some of the measures we have in place for Solace personnel:
- any devices used for access (laptops, computers, cellphones, etc.) are secured with encrypted storage
- access to the production environments is restricted to select staff members with a clear hierarchical access and chain of responsibilities
- access to and actions on the production PubSub+ Cloud are logged and tracked
- malware prevention, encryption, and anti-virus are installed on all devices as per our company security policies
- the customer messaging plane and message data are never accessible to Solace personnel
In addition to the above policies, Solace has strict policies covering the operations of personnel, including:
- encryption and key management
- identity and access management
- employee on-boarding and off-boarding to fulfill internal security policies
- controlled change management
All Solace policies are reviewed periodically during the year as part of regular audits and our business practices.
Solace has dedicated personnel to monitor Solace-controlled infrastructure 24/7. All event broker services are centrally managed and monitored by Solace, including customer-controlled deployments.
Solace has operational incident response processes in place that include:
- rapid responses to address issues
- focus on reducing recovery time and costs
- Solace works closely with customers to debug customer-controlled environmental issues as required – this is often a joint activity with customers.
- root cause analysis (RCA) is always done as part of our processes to identify the cause and take measures to prevent recurrence of an incident
There are two important aspects to our security incident responses: the portion that notifies us if an issue occurs, which includes both the logging and alerting aspects of the response, and the response to the alert itself.
In addition to using PubSub+ Cloud logging and notifications, we closely monitor the ecosystem of alerts and logs, which includes not only PubSub+ Cloud logs, but also infrastructure logs.
As part of this, we utilize security information and event management (SIEM) tools that monitor and analyze application, system, and security logs in one central location. SIEM tools are an important part of the data security ecosystem as they aggregate data from multiple systems and analyze that data to catch abnormal behavior or potential security threats and cyberattacks.
Security logs are retained for 30 days (an optional 90 days is available upon request for logs from event broker services if the customer is subscribed to PubSub+ Insights).
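The retention rule above can be expressed as a small check. This is only a sketch under stated assumptions: the function names and the two boolean inputs are hypothetical and do not correspond to any Solace API; only the 30-day and 90-day figures come from the text.

```python
from datetime import datetime, timedelta, timezone

# Retention periods taken from the text above.
DEFAULT_RETENTION = timedelta(days=30)
EXTENDED_RETENTION = timedelta(days=90)  # event broker service logs, on request, with PubSub+ Insights


def retention_for(has_insights: bool, extended_requested: bool) -> timedelta:
    """Return the retention window (hypothetical helper, not a Solace API)."""
    if has_insights and extended_requested:
        return EXTENDED_RETENTION
    return DEFAULT_RETENTION


def is_retained(log_time: datetime, now: datetime, retention: timedelta) -> bool:
    """True if a log entry written at log_time is still inside the window."""
    return now - log_time <= retention


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
forty_five_days_ago = now - timedelta(days=45)

print(is_retained(forty_five_days_ago, now, retention_for(False, False)))  # False
print(is_retained(forty_five_days_ago, now, retention_for(True, True)))    # True
```

A 45-day-old entry falls outside the default window but inside the extended one, which is the practical difference between the two options.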
At Solace, we utilize two systems for PubSub+ Cloud:
- Datadog
- This is the third-party central monitoring service where event broker service data is sent. The data consists of metadata, log data, and event logs. The stored data does not contain information that identifies customers or location. We use the Datadog service to configure rules that detect abnormalities and alert Solace staff for action.
- Amazon GuardDuty
- We use Amazon GuardDuty, which sends data to Amazon CloudWatch to alert and perform actions as required. In some cases, these actions are automated to ensure immediate measures are taken. This is limited to deployments on Amazon Web Services (AWS).
All responses follow the internal Solace PubSub+ Cloud Incident Management Process. All SIEM events generate alerts that are sent to production teams at Solace. All security incidents are followed by a root-cause analysis.
Solace has well-defined controls in place for the development of new features and enhancements to maintain security. Security is the principal objective in all of Solace's architectural design decisions and a main consideration throughout the feature development process.
Our Agile processes leverage the best aspects of the Microsoft SDL process, but we've improved this process in areas that are applicable for the more stringent security requirements demanded by our customers to create a Security Development Lifecycle.
- Design and Definition
- In our design, security considerations and research are included in our epics. Threat modeling is done as part of the research and knowledge acquisition stage of the process. Any security considerations are raised as major stories during this research. This process greatly reduces the potential for security vulnerabilities and fosters a security-first mind-set.
- During development, we use a continuous integration system to ensure that any code changes pass testing before submission to the code base. In addition, static analysis checks are performed on all code before it is submitted to the code base. Prior to submission, peer reviews are performed by subject-matter experts to ensure that the approved code changes only do what is required. As part of the definition of done for any development work, we use the OWASP Top Ten.
- After development is done, we perform additional testing as part of deployment to our staging environment. The code undergoes building, testing, and scanning and can be blocked from progressing in the deployment pipeline if any failures occur.
- Included in the testing and scanning are static code analysis, dynamic code analysis, vulnerability scanning (using numerous tactics), and Docker image scanning.
- After progressing through the deployment pipeline, the code undergoes a production readiness review. If approved, the code progresses to rollout.
- As part of rollout, there is significant, automated vulnerability scanning done in our production environment. This scanning is heavily integrated with our processes. Our scanning occurs nightly, weekly, and quarterly with increasing stress levels. For more information, see Internal Audits and Internal Testing.
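The gating behavior described in the steps above (code "can be blocked from progressing in the deployment pipeline if any failures occur") can be sketched as follows. This is a minimal illustration, assuming each build, test, or scan step reports a simple pass/fail result; the `CheckResult` type, the function names, and the check labels are hypothetical and are not part of any Solace tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckResult:
    """Outcome of one pipeline step (illustrative type)."""
    name: str
    passed: bool


def first_failure(results: List[CheckResult]) -> Optional[str]:
    """Return the name of the first failed check, or None if all passed."""
    for result in results:
        if not result.passed:
            return result.name
    return None


def may_progress(results: List[CheckResult]) -> bool:
    """Code moves to the next stage only when every check passed."""
    return first_failure(results) is None


# Example run: one scan fails, so the change is blocked.
results = [
    CheckResult("build", True),
    CheckResult("static code analysis", True),
    CheckResult("dynamic code analysis", True),
    CheckResult("vulnerability scanning", False),
    CheckResult("Docker image scanning", True),
]

print(may_progress(results))   # False
print(first_failure(results))  # vulnerability scanning
```

Treating any single failure as blocking keeps the rule simple to reason about: a change either clears every check or it does not advance.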
The production systems at Solace have strict access policies. Access is restricted to a production team with a clear access hierarchy and chain of command. All access to production systems for PubSub+ Cloud requires Two-Factor Authentication (2FA).
Intrusion Detection Systems are in place for all Solace-controlled infrastructure and assets. These include all compute instances running in Solace-controlled Kubernetes clusters and Virtual Private Clouds and Virtual Networks (VPC/VNet).
At Solace, regular audits and internal testing are performed as part of our business practices. Security is the top priority because our PubSub+ platforms are trusted and used by highly regulated lines of business. These audits and internal tests include (but are not limited to):
- vulnerability scanning
- internal testing during all stages of our pipeline
- many activities to ensure a secure and reliable environment
Solace runs vulnerability scans at regular intervals and whenever there are system changes. Here's an overview of the type of testing that we perform:
- Nightly internal vulnerability scans that include static and dynamic code analysis
- Weekly Dynamic Security Application Test (DSAT) scans of all internal systems
- Quarterly penetration testing by an external third party
- Whitesource (Open Source) vulnerability scanning and analysis
Penetration and Vulnerability Testing
All PubSub+ event broker releases are scanned and validated before being released into production.
Penetration testing of the service is scheduled annually. Any vulnerabilities identified are triaged and remediated.
In all stages of our pipeline (development, deployment, and production), there is a significant amount of effort focused on testing the code for vulnerabilities. As the code progresses from development on individual developer branches to integration to the code base, the testing and effort increases and becomes more involved.
Solace has a number of tests in each stage of our continuous deployment pipeline process. The following diagram shows these tests in the context of the pipeline.
Though this process seems onerous, it is very efficient and permits us to deploy changes, such as patches for security vulnerabilities, with a high degree of confidence.
At Solace, there are a number of activities that we perform regularly to ensure the highest reliability in the event of a security issue. This is an overview of the intervals and activities that Solace performs on Solace-controlled infrastructure:
- Disaster-recovery exercises are held to test procedures and systems, and to ensure that Solace staff react as trained.
- An independent third-party organization performs penetration testing on Solace systems.
- On a weekly basis, vulnerability analysis is performed (Tenable).
- Daily, dynamic analysis is performed using Zap.
- Every four hours, critical platform information is backed up to ensure minimal time lost if a disaster occurs.
- Active and passive data replication is performed. Intrusion and anomaly detection is performed. Any vulnerabilities are immediately actioned.
In addition to the many security and reliability measures we have in place, we have a well-defined framework that includes processes and procedures for disaster recovery for Solace Home Cloud.
- We have 99.95% uptime for the Solace Home Cloud and the PubSub+ Cloud Console. This is the control plane portion of the architecture.
- The event broker services for all deployments exist in the messaging plane and are monitored for system-level events 24/7. Event broker services in both Solace-controlled and customer-controlled environments have High-Availability (HA) built into enterprise services and remain running if the control plane is unavailable.
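To put the 99.95% uptime figure in concrete terms, the arithmetic below converts it into a downtime allowance. This is a worked illustration only: the source does not state the measurement window, so the 30-day month and 365-day year used here are assumptions, and the function name is hypothetical.

```python
UPTIME_TARGET = 0.9995  # 99.95%, from the text above


def allowed_downtime_minutes(period_hours: float, uptime: float = UPTIME_TARGET) -> float:
    """Minutes of downtime that still satisfy the uptime target over a period."""
    return period_hours * 60 * (1 - uptime)


# Assumed measurement windows (not stated in the source).
per_30_day_month = allowed_downtime_minutes(30 * 24)
per_365_day_year = allowed_downtime_minutes(365 * 24)

print(round(per_30_day_month, 1))       # 21.6 (minutes)
print(round(per_365_day_year / 60, 2))  # 4.38 (hours)
```

Under these assumptions, 99.95% uptime allows roughly 21.6 minutes of control plane downtime in a 30-day month, or about 4.38 hours over a year.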
The following measures are also in place to ensure that Solace Home Cloud and event broker services are available:
Data backups for the Solace Home Cloud and PubSub+ Cloud are taken every four hours and the data is spread across multiple availability zones (AZ).
Solace Home Cloud critical data is replicated to a separate, secure region to allow for timely recovery. Solace has a 24-hour recovery time objective (RTO) and a 4-hour recovery point objective (RPO) for configuration and control plane actions.
If the Solace Home Cloud is not available, applications and event broker service availability is not impacted. This means that your applications and event broker services continue to run and be available.
The disaster-recovery mechanisms Solace has in place are tested and exercised annually to ensure that they operate within these objectives.
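The relationship between the four-hour backup interval, the 4-hour RPO, and the 24-hour RTO described above can be checked with a short sketch. The function names and the example timestamps are hypothetical; only the three durations come from the text.

```python
from datetime import datetime, timedelta, timezone

# Values from the text above.
BACKUP_INTERVAL = timedelta(hours=4)
RPO = timedelta(hours=4)    # recovery point objective
RTO = timedelta(hours=24)   # recovery time objective


def data_at_risk(last_backup: datetime, failure: datetime) -> timedelta:
    """Changes made after the last backup are the ones that could be lost."""
    return failure - last_backup


def meets_rpo(last_backup: datetime, failure: datetime) -> bool:
    return data_at_risk(last_backup, failure) <= RPO


def meets_rto(failure: datetime, restored: datetime) -> bool:
    return restored - failure <= RTO


# Hypothetical incident timeline.
last_backup = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
failure = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)
restored = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)

print(data_at_risk(last_backup, failure))  # 3:30:00
print(meets_rpo(last_backup, failure))     # True
print(meets_rto(failure, restored))        # True
```

Because the backup interval is no longer than the RPO, the worst-case window of unsaved changes stays within the objective, assuming backups complete on schedule.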
Solace-controlled infrastructure utilizes highly reliable, best-in-class cloud providers, including Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP). Solace leverages the existing security of these established cloud providers to protect against attacks and to provide features such as basic denial-of-service protection.
Solace selects best-in-class vendors that have physical, environmental, and security controls such as:
- badge readers and physical access logs
- video surveillance
- electrical and power redundancy controls
- climate and temperature controls
- fire detection & suppression
- media destruction safeguards