
Non-Functional Requirements (Quality Attributes)

Terminology: Quality Attributes vs. Non-Functional Requirements

The Carnegie Mellon University Software Engineering Institute (CMU SEI), in their seminal work "Software Architecture in Practice" by Bass, Clements, and Kazman, argues for using the term "quality attributes" rather than "non-functional requirements." Their reasoning centers on the observation that the term "non-functional" is misleading because all requirements, by definition, serve some function within the system context.

Key arguments from CMU SEI research:

  • All requirements have function: Every requirement, whether it specifies performance, security, or usability criteria, serves a functional purpose in meeting user and business needs
  • Precision in terminology: "Quality attributes" more accurately describes these requirements as measurable properties that determine system quality
  • Architectural significance: Quality attributes directly influence architectural decisions and trade-offs, making them central to system design
  • Stakeholder communication: The term "quality attributes" better communicates the value and importance of these requirements to non-technical stakeholders

Industry adoption: While both terms are used interchangeably in practice, leading software architecture practitioners and frameworks increasingly favor "quality attributes" for the reasons outlined above.

Note: This document uses both terms interchangeably to align with common industry usage while recognizing the CMU SEI perspective on preferred terminology.


Overview

Non-functional requirements (NFRs) define how a system performs rather than what it does. They are critical for ensuring systems meet business expectations for performance, reliability, security, and user experience.

Core NFR Categories

1. Performance Requirements

Response Time
  • API Response Time: < 200ms for 95th percentile (see the measurement sketch at the end of this section)
  • Page Load Time: < 3 seconds for web applications
  • Database Query Time: < 100ms for simple queries
  • Batch Processing: Define acceptable processing windows
Throughput
  • Transactions per Second (TPS): Define peak and sustained rates
  • Concurrent Users: Maximum simultaneous active users
  • Data Processing Rate: Records/messages processed per hour
  • API Rate Limits: Requests per minute/hour per client
Resource Utilization
  • CPU Usage: < 70% under normal load
  • Memory Usage: < 80% of allocated memory
  • Network Bandwidth: Define limits for data transfer
  • Storage I/O: Define IOPS requirements
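
Percentile targets like the P95 figure above are computed from the distribution of observed latencies, not from averages, since a small tail of slow requests can degrade user experience without moving the mean. A minimal measurement sketch (the sample values are illustrative):

```python
# Compute latency percentiles from observed samples (sample data is illustrative).
# P95 means 95% of requests completed at or below this latency.
import statistics

latencies_ms = [112, 87, 203, 154, 98, 176, 341, 129, 95, 188]

# statistics.quantiles with n=100 returns the 1st through 99th percentiles.
cuts = statistics.quantiles(latencies_ms, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50={p50:.0f}ms  P95={p95:.0f}ms  P99={p99:.0f}ms")
```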

2. Reliability and Availability

Availability Targets
| Service Tier | Uptime SLA | Downtime per Year | Downtime per Month | Downtime per Week |
| ------------ | ---------- | ----------------- | ------------------ | ----------------- |
| Five Nines   | 99.999%    | 5.26 minutes      | 25.9 seconds       | 6 seconds         |
| Critical     | 99.99%     | 52.56 minutes     | 4.32 minutes       | 1.01 minutes      |
| High         | 99.9%      | 8.77 hours        | 43.2 minutes       | 10.1 minutes      |
| Standard     | 99.5%      | 1.83 days         | 3.6 hours          | 50.4 minutes      |
| Basic        | 99%        | 3.65 days         | 7.2 hours          | 1.68 hours        |
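
The table values can be reproduced with a short helper. A sketch, assuming a 365-day year, 30-day month, and 7-day week (the same conventions the table rounds from):

```python
# Convert an uptime SLA into allowed downtime per period.
# Assumes a 365-day year, 30-day month, and 7-day week.
MINUTES = {"year": 365 * 24 * 60, "month": 30 * 24 * 60, "week": 7 * 24 * 60}

def downtime_minutes(sla_percent: float, period: str) -> float:
    """Allowed downtime in minutes for the given uptime SLA and period."""
    return MINUTES[period] * (1 - sla_percent / 100)

for sla in (99.999, 99.99, 99.9, 99.5, 99.0):
    print(f"{sla}%: {downtime_minutes(sla, 'year'):8.2f} min/year, "
          f"{downtime_minutes(sla, 'month'):7.2f} min/month")
```
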
System Availability Calculations

Understanding how to calculate total system availability is critical for designing reliable architectures. The methodology varies based on system topology:

1. Series Configuration (Single Critical Path)

When components are arranged in series, all must function for the system to work. The system is only as reliable as its weakest component.

Formula: A_system = A₁ × A₂ × A₃ × ... × Aₙ

Example: Web application with App Service (99.99%), SQL Database (99.95%), and Redis Cache (99.9%)

A_system = 0.9999 × 0.9995 × 0.999 = 0.9984 = 99.84%

2. Parallel Configuration (Independent Paths)

When components provide redundant functionality, the system remains available if at least one component functions.

Formula: A_system = 1 - (1 - A₁) × (1 - A₂) × ... × (1 - Aₙ)

Example: Load balancer with two web servers (99.9% each)

A_system = 1 - (1 - 0.999) × (1 - 0.999) = 1 - 0.001² = 99.9999%

3. Mixed Configuration (Series + Parallel)

Real systems often combine both patterns.

Formula: A_system = A_series × [1 - (1 - A_parallel1) × (1 - A_parallel2)]

Example: Gateway (99.95%) with two redundant databases (99.9% each)

A_system = 0.9995 × [1 - (1 - 0.999)²] = 0.9995 × 0.999999 = 0.999499 = 99.95%

4. Multi-Region Availability

For systems deployed across multiple regions. The formula assumes that regions fail independently and that failover between them is effectively instantaneous.

Formula: A_multi = 1 - (1 - A_single)^R

Where R = number of regions

Example: Single-region availability of 99.95% across 2 regions

A_multi = 1 - (1 - 0.9995)² = 1 - 0.0005² = 99.999975%
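
All four formulas translate directly into code. A minimal sketch, with availabilities expressed as fractions (0.999 for 99.9%):

```python
# A minimal sketch of the four availability formulas above.
# Inputs are availabilities as fractions (0.999 == 99.9%).
from math import prod

def series(*components: float) -> float:
    """All components must be up: A = A1 * A2 * ... * An."""
    return prod(components)

def parallel(*components: float) -> float:
    """At least one component must be up: A = 1 - (1-A1)(1-A2)...(1-An)."""
    return 1 - prod(1 - a for a in components)

def multi_region(single_region: float, regions: int) -> float:
    """Independent regions with instantaneous failover: A = 1 - (1-A)^R."""
    return 1 - (1 - single_region) ** regions

# The worked examples above:
print(f"{series(0.9999, 0.9995, 0.999):.4%}")          # ~99.84% (series)
print(f"{parallel(0.999, 0.999):.4%}")                  # 99.9999% (parallel)
print(f"{series(0.9995, parallel(0.999, 0.999)):.4%}")  # ~99.95% (mixed)
print(f"{multi_region(0.9995, 2):.6%}")                 # 99.999975% (multi-region)
```
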
Composite SLO Calculation Guidelines
  1. Identify Critical Path: Only include services that could cause total system failure
  2. Consider Dependencies: Account for all components in the user request flow
  3. Factor External Dependencies: Include third-party services, APIs, and infrastructure
  4. Account for Planned Maintenance: Budget for scheduled downtime (see the error-budget sketch after this list)
  5. Include Human Factors: Account for operational errors and deployment risks
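
Guideline 4 is easiest to apply as an error budget: the complement of the SLO is a fixed downtime allowance that planned maintenance and unplanned incidents must share. A sketch, assuming a 30-day month and a hypothetical 20-minute maintenance window:

```python
# Error-budget view of an SLO: the complement of the target is the
# total downtime budget, shared by planned maintenance and incidents.
def monthly_error_budget_minutes(slo_percent: float) -> float:
    return 30 * 24 * 60 * (1 - slo_percent / 100)  # 30-day month assumed

budget = monthly_error_budget_minutes(99.9)  # 43.2 minutes
maintenance = 20.0                           # hypothetical planned window
print(f"Budget: {budget:.1f} min; left for incidents: {budget - maintenance:.1f} min")
```
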
Practical Calculation Example

Scenario: E-commerce application with the following components:

  • Azure Front Door: 99.99%
  • App Service (2 instances): 99.95% each
  • SQL Database (with failover): 99.99%
  • Azure Storage: 99.99%
  • External payment API: 99.9%

Calculation:

App Service HA = 1 - (1 - 0.9995)² = 99.999975%
System = 0.9999 × 0.99999975 × 0.9999 × 0.9999 × 0.999
System = 0.9987 = 99.87% availability
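
The same figure can be checked with the series() and parallel() helpers from the earlier sketch:

```python
# Composite availability for the e-commerce example,
# reusing series()/parallel() from the sketch above.
app_service_ha = parallel(0.9995, 0.9995)   # two App Service instances
system = series(
    0.9999,          # Azure Front Door
    app_service_ha,  # App Service pair
    0.9999,          # SQL Database with failover
    0.9999,          # Azure Storage
    0.999,           # external payment API
)
print(f"{system:.2%}")  # ~99.87%
```
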
Recovery Requirements
  • Recovery Time Objective (RTO): Maximum acceptable downtime
      • Mission Critical: < 1 hour
      • Business Critical: < 4 hours
      • Important: < 24 hours
  • Recovery Point Objective (RPO): Maximum acceptable data loss
      • Mission Critical: < 15 minutes
      • Business Critical: < 1 hour
      • Important: < 24 hours
  • Mean Time to Recovery (MTTR): Average time to restore service
  • Mean Time Between Failures (MTBF): Average operational time between failures (together with MTTR this determines steady-state availability; see the formula after this list)
  • Backup Frequency: Daily, weekly, or real-time replication
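
MTTR and MTBF determine steady-state availability via the standard repairable-systems formula (general reliability theory, not specific to any platform):

Formula: Availability = MTBF / (MTBF + MTTR)

Example: An MTBF of 720 hours and an MTTR of 1 hour gives 720 / 721 = 99.86% availability.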
Error Handling
  • Error Rate: < 0.1% of requests should result in errors
  • Graceful Degradation: Define fallback behaviors for component failures
  • Circuit Breaker Patterns: Prevent cascade failures and allow recovery
  • Retry Logic: Exponential backoff strategies with jitter (see the sketch after this list)
  • Bulkhead Isolation: Isolate critical resources to prevent resource exhaustion
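
A minimal sketch of retry with exponential backoff and full jitter; the wrapped operation, base delay, and cap are illustrative assumptions:

```python
# Retry with exponential backoff and full jitter (illustrative parameters).
import random
import time

def retry_with_backoff(operation, max_attempts=5, base=0.1, cap=10.0):
    """Call operation(); on failure, wait up to base * 2^attempt seconds
    (capped), chosen uniformly at random, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the failure
            # Full jitter: a random delay in [0, min(cap, base * 2^attempt)].
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Full jitter spreads retries uniformly across the backoff window, which helps prevent synchronized retry storms when many clients fail at the same moment.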

3. Scalability Requirements

Horizontal Scaling
  • Auto-scaling Triggers: CPU, memory, or queue depth thresholds
  • Scaling Speed: Time to add/remove instances
  • Load Distribution: Even distribution across instances
  • State Management: Stateless application design
Vertical Scaling
  • Resource Limits: Maximum CPU, memory, storage per instance
  • Scaling Constraints: Hardware or platform limitations
  • Cost Considerations: Performance vs. cost optimization
Data Scaling
  • Database Sharding: Strategy for horizontal data distribution
  • Read Replicas: Number and geographic distribution
  • Caching Strategy: Redis, CDN, application-level caching
  • Data Archiving: Long-term storage and retrieval strategies

4. Security Requirements

Authentication & Authorization
  • Multi-Factor Authentication: Required for admin access
  • Session Management: Timeout periods, secure tokens
  • Role-Based Access Control: Principle of least privilege
  • API Security: OAuth 2.0, rate limiting, API keys
Data Protection
  • Encryption Standards: AES-256 for data at rest, TLS 1.3 for transit
  • Key Management: Azure Key Vault or similar HSM solutions
  • Data Classification: Public, internal, confidential, restricted
  • Data Retention: Compliance with GDPR, CCPA requirements
Monitoring & Auditing
  • Audit Logging: All security events logged and retained
  • Real-time Monitoring: Security incident detection and alerting
  • Compliance Reporting: Automated compliance status reporting
  • Vulnerability Management: Regular scanning and remediation

5. Usability and User Experience

User Interface
  • Accessibility: WCAG 2.1 AA compliance
  • Cross-browser Support: Chrome, Firefox, Safari, Edge
  • Mobile Responsiveness: Support for tablets and smartphones
  • Internationalization: Multi-language and locale support
User Experience
  • Task Completion Rate: > 95% for primary user workflows
  • User Error Rate: < 5% of user actions result in errors
  • Learning Curve: New users productive within defined timeframe
  • Help and Documentation: Context-sensitive help available

6. Maintainability and Operability

Code Quality
  • Code Coverage: > 80% test coverage for critical paths
  • Cyclomatic Complexity: Keep methods under complexity threshold
  • Technical Debt: Regular refactoring and cleanup schedules
  • Documentation: Up-to-date API documentation and runbooks
Deployment and Operations
  • Deployment Frequency: Support for frequent, automated deployments
  • Rollback Time: < 5 minutes to rollback failed deployments
  • Configuration Management: Environment-specific configurations
  • Monitoring and Alerting: Comprehensive observability stack

NFR Definition Process

1. Stakeholder Requirements Gathering

  • Business Stakeholders: Performance expectations, SLA requirements
  • End Users: User experience and accessibility needs
  • Operations Team: Monitoring, maintenance, and support requirements
  • Security Team: Compliance and security control requirements

2. Requirements Analysis and Prioritization

Use the MoSCoW method for prioritization:

  • Must Have: Critical for business operation
  • Should Have: Important for user satisfaction
  • Could Have: Nice to have if resources allow
  • Won't Have: Explicitly out of scope

3. Quantitative Requirements Definition

Transform qualitative requirements into measurable criteria:

  • "Fast response times" → "API responses < 200ms for 95th percentile"
  • "Highly available" → "99.9% uptime SLA with < 4 hours monthly downtime"
  • "Secure system" → "Zero tolerance for data breaches, PCI DSS compliance"

4. NFR Testing Strategy

Performance Testing
  • Load Testing: Normal expected load using JMeter or Azure Load Testing (see the sketch after this list)
  • Stress Testing: Beyond normal capacity to find breaking points
  • Spike Testing: Sudden load increases and system recovery
  • Volume Testing: Large amounts of data processing
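
Dedicated tools are the right choice for real load tests, but the core loop is easy to sketch with only the standard library; the target URL, concurrency, and request count below are illustrative:

```python
# A bare-bones concurrent load sketch using only the standard library.
# TARGET_URL and the request counts are illustrative; real load tests
# belong in a dedicated tool (JMeter, k6, Azure Load Testing).
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "https://example.com/health"  # hypothetical endpoint

def timed_request(_):
    start = time.perf_counter()
    with urllib.request.urlopen(TARGET_URL, timeout=10) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000  # latency in ms

with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = list(pool.map(timed_request, range(200)))

print(f"P95 = {statistics.quantiles(latencies, n=100)[94]:.0f} ms")
```
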
Security Testing
  • Penetration Testing: Simulated attacks by security professionals
  • Vulnerability Scanning: Automated scanning for known vulnerabilities
  • Authentication Testing: Multi-factor authentication and session management
  • Data Protection Testing: Encryption and access control validation
Reliability Testing
  • Chaos Engineering: Deliberate failure injection (Netflix Chaos Monkey)
  • Disaster Recovery Testing: Full system recovery procedures
  • Backup and Restore Testing: Data recovery verification
  • Failover Testing: High availability configuration validation

5. Monitoring and Measurement

Key Performance Indicators (KPIs)
  • Response Time Percentiles: P50, P95, P99 response times
  • Error Rates: 4xx and 5xx HTTP error percentages
  • Availability Metrics: Uptime percentage over time periods
  • Resource Utilization: CPU, memory, disk, network usage
Alerting Thresholds
  • Warning Levels: 80% of NFR thresholds (see the sketch after this list)
  • Critical Levels: 95% of NFR thresholds
  • Escalation Procedures: Automated escalation paths
  • On-call Rotation: 24/7 support for critical systems
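
The 80%/95% rule translates mechanically into alert thresholds. A sketch, assuming a "higher is worse" metric such as latency or error rate:

```python
# Derive warning/critical alert thresholds from an NFR target.
# Assumes a "higher is worse" metric such as latency or error rate.
def alert_thresholds(nfr_target: float) -> dict:
    return {
        "warning": 0.80 * nfr_target,   # alert at 80% of the NFR budget
        "critical": 0.95 * nfr_target,  # page at 95% of the NFR budget
    }

print(alert_thresholds(200.0))  # P95 latency NFR of 200ms -> warn 160, crit 190
```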

NFR Documentation Template

# Non-Functional Requirements: [System Name]

## Performance Requirements

| Requirement       | Target             | Measurement Method     | Priority  |
| ----------------- | ------------------ | ---------------------- | --------- |
| API Response Time | < 200ms (P95)      | Application monitoring | Must Have |
| Concurrent Users  | 1,000 simultaneous | Load testing           | Must Have |

## Availability Requirements

| Requirement   | Target   | Measurement Method        | Priority  |
| ------------- | -------- | ------------------------- | --------- |
| System Uptime | 99.9%    | Infrastructure monitoring | Must Have |
| RTO           | < 1 hour | Disaster recovery testing | Must Have |

## Security Requirements

| Requirement     | Target                              | Measurement Method | Priority  |
| --------------- | ----------------------------------- | ------------------ | --------- |
| Data Encryption | AES-256 at rest, TLS 1.3 in transit | Security scanning  | Must Have |
| Authentication  | MFA for admin access                | Security audit     | Must Have |

## Testing Approach

- Performance: JMeter load tests, Azure Load Testing
- Security: OWASP ZAP scanning, penetration testing
- Reliability: Chaos engineering, disaster recovery drills

## Monitoring Strategy

- APM: Application Performance Monitoring (New Relic, Datadog)
- Infrastructure: Azure Monitor, CloudWatch
- Security: SIEM integration, security event correlation

## Acceptance Criteria

[Define specific criteria that must be met before release]

Integration with Development Process

Planning Phase

  • Define NFRs during epic and feature planning
  • Include NFR acceptance criteria in user stories
  • Estimate effort for NFR implementation and testing

Development Phase

  • Implement monitoring and instrumentation code
  • Include NFR-focused unit and integration tests
  • Conduct regular performance profiling during development

Testing Phase

  • Execute comprehensive NFR test suites
  • Establish performance baselines and run regression tests
  • Perform security scanning and penetration testing

Release Phase

  • NFR sign-off required before production deployment
  • Production monitoring setup and validation
  • Post-deployment NFR verification and tuning

Tools and Resources

Performance Testing

  • Azure Load Testing: Cloud-based load testing service
  • JMeter: Open-source performance testing tool
  • k6: Developer-centric performance testing tool
  • LoadRunner: Enterprise performance testing platform

Monitoring and Observability

  • Azure Application Insights: Application performance monitoring
  • Datadog: Full-stack monitoring platform
  • New Relic: Application performance monitoring
  • Prometheus + Grafana: Open-source monitoring stack

Security Testing

  • OWASP ZAP: Web application security scanner
  • Burp Suite: Web vulnerability scanner
  • Nessus: Vulnerability assessment tool
  • Qualys: Cloud security and compliance platform

Common Pitfalls and Best Practices

Pitfalls to Avoid

  • Vague Requirements: "System should be fast" instead of specific metrics
  • Late Definition: Defining NFRs after development begins
  • No Testing Strategy: Assuming NFRs will be met without validation
  • Ignoring Trade-offs: Not considering cost vs. performance implications

Best Practices

  • Start Early: Define NFRs during initial requirements gathering
  • Be Specific: Use quantifiable, measurable criteria
  • Test Continuously: Include NFR testing in CI/CD pipelines
  • Monitor Production: Continuously validate NFRs in production
  • Review Regularly: Update NFRs as business needs evolve
