Quality Attributes
Connect architecture decisions and cross-cutting concerns to quality expectations
Quality as Enabling Foundation
What is often overlooked or misunderstood is that software projects carry significant risks. While each developed feature provides some form of user or business value, this value is at risk when software quality is neglected. The best features have no value if users cannot use them because of system outages, frustrating performance, poor usability or missing platform support. Development itself can also become unsustainable when the codebase becomes too complex or fragile to change.
Quality is often misunderstood as something that competes against feature quantity and can easily be traded away (the Tradable Quality Hypothesis). In reality, quality is less a tradeoff with feature development and more its enabling foundation: we need quality to ensure that developing and delivering features is even possible.
Properly managing software quality is a shared responsibility of developers and business stakeholders. They should have a shared interest in understanding quality as the connection between architecture decisions and potential business risks. Business risks should not remain hidden but instead be properly managed and disclosed. Architecture decisions should be made with the business context in mind and with the support of business stakeholders.
Quality Frameworks
Software quality is usually described in abstract terms that capture its different dimensions. There is no universal definition, but there are standards and frameworks that roughly agree on terminology:
- ISO 25010: Systems and software Quality Requirements and Evaluation
- Functional Suitability
- Performance Efficiency
- Compatibility
- Usability
- Reliability
- Security
- Maintainability
- Portability
- AWS Well-Architected Framework
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
- Sustainability
- Google Cloud Architecture Framework
- System design
- Operational excellence
- Security, privacy, and compliance
- Reliability
- Cost optimization
- Performance optimization
- Microsoft Azure Well-Architected Framework
- Reliability
- Security
- Cost optimization
- Operational excellence
- Performance efficiency
Quality Attributes
It is difficult to connect the often very abstract terminology of quality frameworks with real technical concepts that matter in everyday coding. Here is an attempt to pick a set of quality attributes and define them from a practice-oriented perspective:
📊 Modularity
Create a modular design by breaking down the system into smaller, independent components. This keeps the codebase maintainable and prevents it from becoming too complex to understand and too fragile to change.
- Loosely-Coupled Architecture: Hide complexity by designing interconnected but loosely-coupled components (see the sketch after this list).
- Infrastructure as Code Provisioning: Integrate infrastructure as code components to automatically provision the required cloud infrastructure for operations.
- Data Storage: Define data storage components and the types of data stored in them.
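A minimal TypeScript sketch of such loose coupling (the names `DocumentStore`, `InMemoryDocumentStore` and `ReportService` are hypothetical): consumers depend only on an interface, so the concrete implementation can be swapped without touching them.
```typescript
// Consumers depend only on this interface, not on a concrete storage technology.
export interface DocumentStore {
  save(id: string, content: string): Promise<void>;
  load(id: string): Promise<string | undefined>;
}

// One possible implementation; it can be replaced (e.g. by a cloud storage
// service) without changing the components that consume DocumentStore.
export class InMemoryDocumentStore implements DocumentStore {
  private documents = new Map<string, string>();

  async save(id: string, content: string): Promise<void> {
    this.documents.set(id, content);
  }

  async load(id: string): Promise<string | undefined> {
    return this.documents.get(id);
  }
}

// A consuming component only knows the interface (loose coupling).
export class ReportService {
  constructor(private readonly store: DocumentStore) {}

  async publishReport(id: string, body: string): Promise<void> {
    await this.store.save(id, body);
  }
}
```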
📊 Portability
Make components more independent from specific platforms, versions and infrastructure environments. This prevents the system from becoming overly dependent on specific platforms or losing the ability to run in multiple, independent environments, which is required for continuous delivery.
- Supported Platform Versions: Define which components support which platform versions.
- Environment Variables: Use environment variables to decouple application code from infrastructure configuration so that the system can operate in multiple environments such as staging and production (see the sketch after this list).
- Environment Parity: Keep different environments such as staging and production as similar as possible.
- Configuration Management: Store environment-specific infrastructure configuration and retrieve it during infrastructure provisioning and application deployment.
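A minimal sketch of configuration via environment variables in TypeScript (Node.js assumed; the variable names `DATABASE_URL`, `LOG_LEVEL` and `PORT` are illustrative):
```typescript
// All environment-specific values are read from environment variables,
// so the same build can run unchanged in staging and production.
interface AppConfig {
  databaseUrl: string;
  logLevel: string;
  port: number;
}

function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

export function loadConfig(): AppConfig {
  return {
    databaseUrl: requireEnv('DATABASE_URL'),
    logLevel: process.env.LOG_LEVEL ?? 'info',
    port: Number(process.env.PORT ?? 3000),
  };
}
```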
📊 Security
Protect the system's data and functionality by implementing security measures such as encryption, access controls and authentication. Security is essential to preserve the users' trust in the system and to prevent financial losses and legal consequences through malicious activities.
- Authentication: Verify the identity of a user or a client component (e.g. a backend service). This can involve various types of authentication mechanisms:
- Username + Password
- A client secret / API key
- Biometry
- Two-Factor / Multi-Factor Authentication
- Social Login (e.g. via Google Account)
- Authorization: Grant or deny access to a resource or API based on an authenticated user or client identity. Both authentication and authorization are often standardized processes that follow the OAuth2 standard. The granted access is often encoded in access tokens based on the JWT standard.
- Role-Based Access Control: Define user roles that determine what permissions are granted to different user types during authorization.
- Identity Management: Manage user accounts and their authentication credentials and/or link accounts to user identities of external systems (e.g. Google Account). Identity management often follows the OpenID Connect standard.
- Access Token Verification: APIs need to verify that the digital signatures of access tokens were issued by a valid authority, which requires obtaining that authority's public keys. JWT-based tokens are verified by fetching the public keys from a JWKS endpoint (see the sketch after this list).
- Input Validation: APIs need to validate their input and enforce restrictions to protect the system from malicious intent (e.g. SQL injection).
- Rate Limiting: Control the rate at which an API will process requests within a given time frame to prevent abuse or overload and to ensure fair usage of resources.
- Vulnerability Fixing: Regularly update code dependencies to fix known vulnerabilities.
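A sketch of JWKS-based access token verification in TypeScript, using the `jose` library as one possible option (the issuer URL, audience and endpoint path are placeholders):
```typescript
import { createRemoteJWKSet, jwtVerify } from 'jose';

// The JWKS endpoint of the token issuer (placeholder URL); public keys are
// fetched and cached from here to check token signatures.
const jwks = createRemoteJWKSet(
  new URL('https://auth.example.com/.well-known/jwks.json')
);

// Verify signature, issuer and audience before trusting any token content.
export async function verifyAccessToken(token: string) {
  const { payload } = await jwtVerify(token, jwks, {
    issuer: 'https://auth.example.com',
    audience: 'example-api',
  });
  return payload; // e.g. subject and scopes/roles used for authorization
}
```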
📊 Observability
Collect and analyze telemetry data from components to monitor the performance of the system and to diagnose and fix runtime problems. Observability is required to gain insights into performance, reliability and business intelligence.
- Instrumentation: Enhance components so that they emit telemetry data. This is often done by using frameworks and tools that implement the OpenTelemetry standard to ensure compatibility across vendors and platforms (see the sketch after this list).
- Telemetry: Collect and transfer telemetry such as metrics, logs and traces from components.
- Visual Dashboards: Set up dashboards that visualize metrics and connect them to log output and traces.
- Analytics: Telemetry can be used to provide business intelligence and product management insights.
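A minimal instrumentation sketch in TypeScript using the OpenTelemetry API (it assumes an OpenTelemetry SDK and exporter are configured elsewhere; the tracer, span and attribute names are illustrative):
```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

// Tracer name and version identify this component in the emitted telemetry.
const tracer = trace.getTracer('order-service', '1.0.0');

// Wrap a unit of work in a span so its latency and errors appear in traces.
export async function processOrder(orderId: string): Promise<void> {
  await tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      // ... actual business logic would go here ...
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```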
📊 Reliability
Minimize downtimes, prevent data losses and guarantee a consistent and dependable performance. Users need a level of reliability to trust the system. Reliability can be improved through various techniques that make the system more resilient against errors or partial outages. Downtimes can be minimized by organizing an incident response process with on-call responders.
- Service Level Objectives: Specify a target level for the reliability of the system within a given time window.
- Alerts: Define thresholds on metrics that indicate an abnormal behavior that requires human intervention.
- Runbooks: Documented checklists and procedures for diagnosing or fixing operational problems. Alerts often link to runbooks to provide helpful first steps for incident response.
- Error Handling: Define how to handle, report and potentially recover from errors through strategies like retries, graceful degradation, transaction rollbacks or circuit breakers (a retry sketch follows this list).
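As one possible resilience technique, a small retry helper with exponential backoff in TypeScript (attempt counts and delays are illustrative):
```typescript
// Generic retry helper with exponential backoff.
export async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Exponential backoff: 200ms, 400ms, 800ms, ...
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Usage: retry a flaky downstream call instead of failing immediately.
// const data = await withRetry(() => fetchInventory('sku-123'));
```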
📊 Performance
Ensure that the system can handle user requests and process data in a timely manner without frustrating users. This is done by measuring and optimizing various performance metrics such as latency, throughput, processing power or storage capacity. Performance may also involve structural changes such as load balancing or caching.
- Capacity: Define capacity constraints for components such as:
- Latency
- Throughput
- Network Bandwidth
- Processing Power
- Storage Quota
- Timeliness: Define how quickly data changes must become visible and consistent across the system:
- Eventual Consistency
- Transactions
- Load Balancing: Implement load balancing strategies to distribute load across horizontally scaled infrastructure.
- Caching: Improve performance by reusing previously retrieved or computed data (see the sketch after this list).
- Profiling: Use tools to diagnose the performance of components:
- Load Testing
- Soak Testing
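A minimal caching sketch in TypeScript: an in-memory cache with a time-to-live, shown as one simple way to reuse data (real systems may instead use a CDN or an external cache service):
```typescript
// Minimal in-memory cache where entries expire after a fixed time-to-live.
export class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();

  constructor(private readonly ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // expired, must be fetched again
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Usage: cache an expensive lookup for 30 seconds.
// const cache = new TtlCache<Product>(30_000);
```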
📊 Usability
Design a positive, intuitive and straightforward user experience that does not frustrate users. This involves building the user interface layout and design on top of familiar conventions and considering potential user impairments to ensure accessibility. Users may also access the system through various third-party platforms such as search engines, social media platforms or AI assistants.
- URL Structure: Store certain aspects of the application state in the URL (see the sketch after this list) to support multiple concepts:
- Deeplinks
- Bookmarks
- Back Button Support
- Accessibility: Ensure that users with diverse abilities and disabilities can interact with the system.
- Search Engine Optimization: Ensure that search engines can properly crawl a web app.
- Social Preview: Ensure that web pages are associated with a small preview image that can be used by social media platforms.
- Responsive Design: Design for various screen sizes.
- Design System: Build on top of existing, well-known user interface and usability concepts.
- Internationalization: Support multiple languages and regional differences.
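A small TypeScript sketch of keeping UI state in the URL so that deeplinks, bookmarks and the back button keep working (the query parameters `q` and `page` are illustrative):
```typescript
// Read UI state (search query and page number) back from the URL.
export function readSearchState(): { query: string; page: number } {
  const params = new URLSearchParams(window.location.search);
  return {
    query: params.get('q') ?? '',
    page: Number(params.get('page') ?? 1),
  };
}

// Write UI state into the URL whenever it changes.
export function writeSearchState(query: string, page: number): void {
  const params = new URLSearchParams();
  if (query) params.set('q', query);
  if (page > 1) params.set('page', String(page));
  // pushState adds a history entry, which enables back-button support;
  // the resulting URL also works as a deeplink or bookmark.
  history.pushState(null, '', `${window.location.pathname}?${params}`);
}
```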
📊 Testability
Make it easy for developers to manually and automatically test the system. Tests are essential to check the functional correctness of code and to prevent new accidental defects through code changes. Testing requires tooling, frameworks, test data and techniques for testing various types of code elements.
- Unit Testing Techniques for Code Elements: Essentially, every code element such as a utility function, API endpoint or frontend component needs a corresponding unit testing technique.
- Test Isolation and Parallel Execution: Ensuring that tests can run in isolation and in parallel.
- Test Data: Provide sample data that is used by tests.
- Debugging Support: Integrate debugging tools for manual testing.
- Mocks: Be able to replace certain components or code elements with mocks (see the sketch after this list).
- Fixtures: Run tests within a predefined system state or context.
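A minimal unit test sketch in TypeScript using Node's built-in test runner (the `DocumentStore` interface and `renderDocument` function are hypothetical); the dependency is replaced with a hand-rolled mock so the test stays isolated:
```typescript
import { test } from 'node:test';
import assert from 'node:assert/strict';

// System under test (illustrative): depends on a store that we can mock.
interface DocumentStore {
  load(id: string): Promise<string | undefined>;
}

async function renderDocument(store: DocumentStore, id: string): Promise<string> {
  const content = await store.load(id);
  return content ?? 'Not found';
}

test('renders stored content', async () => {
  // Mock: a hand-rolled fake standing in for the real storage component.
  const store: DocumentStore = {
    load: async (id) => (id === 'doc-1' ? 'Hello' : undefined),
  };

  // The predefined state lives entirely in the mock, so the test runs in
  // isolation and can execute in parallel with other tests.
  assert.equal(await renderDocument(store, 'doc-1'), 'Hello');
  assert.equal(await renderDocument(store, 'missing'), 'Not found');
});
```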
📊 Privacy
Protect user privacy and comply with privacy regulations. This involves disclosing how personal data is collected, stored and processed by components and implementing mechanisms that ensure user privacy rights such as the right to correct or erase personal data.
- Data Processing Records: Companies may be legally obligated to keep written records of all data processing activities. These records are supposed to describe what data is processed, the purpose of processing, the legal basis of the processing, the retention period and the recipients to whom the data is disclosed.
- Privacy Rights: Guarantee users' privacy rights by implementing mechanisms that allow users to access, rectify or erase data.
- Consent Management: If data is processed on the legal basis of consent, that consent must be obtained through the user interface and stored somewhere (e.g. in a cookie or a user account; see the sketch after this list).
- Location Choice: Cloud infrastructure may offer choosing the country of data processing and storage so that different privacy legislation applies.
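A hypothetical sketch of storing consent in TypeScript (the record fields and the `ConsentStore` interface are assumptions, not a prescribed schema):
```typescript
// Which processing purposes the user agreed to, and when.
interface ConsentRecord {
  userId: string;
  purposes: string[];    // e.g. ['analytics', 'marketing']
  grantedAt: string;     // ISO timestamp, useful as evidence
  policyVersion: string; // which privacy policy text was shown
}

// Consent stored against the user account (a cookie would be an alternative
// for anonymous visitors); the storage backend is left abstract here.
interface ConsentStore {
  save(record: ConsentRecord): Promise<void>;
  find(userId: string): Promise<ConsentRecord | undefined>;
}

// Check consent before running any processing that legally requires it.
export async function hasConsent(
  store: ConsentStore,
  userId: string,
  purpose: string
): Promise<boolean> {
  const record = await store.find(userId);
  return record?.purposes.includes(purpose) ?? false;
}
```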
📊 Cost Optimization
Minimize unnecessary operating expenses and reliably predict costs. This involves keeping track of plans, subscriptions and the capacity utilization of cloud services.
- Cost Budget: Create a predictable budget for the operating costs of the system. This is typically a combination of fixed and variable costs:
- Plans/Subscriptions for Cloud Services
- Autoscaling Cloud Infrastructure
- Licensing Costs
- Environmental Impact: Track environmental impact such as carbon emissions that are indirectly caused by operating the system on cloud infrastructure.