# Non-Functional Requirements (Quality Attributes)
## Terminology: Quality Attributes vs. Non-Functional Requirements
The Carnegie Mellon University Software Engineering Institute (CMU SEI), in their seminal work "Software Architecture in Practice" by Bass, Clements, and Kazman, argues for using the term "quality attributes" rather than "non-functional requirements." Their reasoning centers on the observation that the term "non-functional" is misleading because all requirements, by definition, serve some function within the system context.
Key arguments from CMU SEI research:
- All requirements have function: Every requirement, whether it specifies performance, security, or usability criteria, serves a functional purpose in meeting user and business needs
- Precision in terminology: "Quality attributes" more accurately describes these requirements as measurable properties that determine system quality
- Architectural significance: Quality attributes directly influence architectural decisions and trade-offs, making them central to system design
- Stakeholder communication: The term "quality attributes" better communicates the value and importance of these requirements to non-technical stakeholders
Industry adoption: While both terms are used interchangeably in practice, leading software architecture practitioners and frameworks increasingly favor "quality attributes" for the reasons outlined above.
Note: This document uses both terms interchangeably to align with common industry usage while recognizing the CMU SEI perspective on preferred terminology.
## Overview
Non-functional requirements (NFRs) define how a system performs rather than what it does. They are critical for ensuring systems meet business expectations for performance, reliability, security, and user experience.
## Core NFR Categories
### 1. Performance Requirements
#### Response Time
- API Response Time: < 200ms for 95th percentile
- Page Load Time: < 3 seconds for web applications
- Database Query Time: < 100ms for simple queries
- Batch Processing: Define acceptable processing windows
#### Throughput
- Transactions per Second (TPS): Define peak and sustained rates
- Concurrent Users: Maximum simultaneous active users
- Data Processing Rate: Records/messages processed per hour
- API Rate Limits: Requests per minute/hour per client
#### Resource Utilization
- CPU Usage: < 70% under normal load
- Memory Usage: < 80% of allocated memory
- Network Bandwidth: Define limits for data transfer
- Storage I/O: Define IOPS requirements
### 2. Reliability and Availability
#### Availability Targets
| Service Tier | Uptime SLA | Downtime per Year | Downtime per Month | Downtime per Week |
|---|---|---|---|---|
| Mission Critical (Five Nines) | 99.999% | 5.26 minutes | 25.9 seconds | 6 seconds |
| Critical | 99.99% | 52.56 minutes | 4.32 minutes | 1.01 minutes |
| High | 99.9% | 8.77 hours | 43.2 minutes | 10.1 minutes |
| Standard | 99.5% | 1.83 days | 3.6 hours | 50.4 minutes |
| Basic | 99% | 3.65 days | 7.2 hours | 1.68 hours |
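The downtime budgets above follow directly from the SLA percentage. A minimal sketch (Python, assuming a 365-day year and a 30-day month, which matches the table's rounding) that converts an uptime target into an allowed downtime budget:

```python
def downtime_budget(uptime_pct: float) -> dict:
    """Allowed downtime in minutes per year/month/week for a given uptime SLA."""
    unavailability = 1 - uptime_pct / 100
    minutes = {"year": 365 * 24 * 60, "month": 30 * 24 * 60, "week": 7 * 24 * 60}
    return {period: total * unavailability for period, total in minutes.items()}

for sla in (99.999, 99.99, 99.9, 99.5, 99.0):
    budget = downtime_budget(sla)
    print(f"{sla}% -> {budget['year']:.2f} min/yr, "
          f"{budget['month']:.2f} min/mo, {budget['week']:.2f} min/wk")
```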
#### System Availability Calculations
Understanding how to calculate total system availability is critical for designing reliable architectures. The methodology varies based on system topology:
**1. Series Configuration (Single Critical Path):** When components are arranged in series, all must function for the system to work. The system is only as reliable as its weakest component.
Formula: A_system = A₁ × A₂ × A₃ × ... × Aₙ
Example: Web application with App Service (99.99%), SQL Database (99.95%), and Redis Cache (99.9%)
A_system = 0.9999 × 0.9995 × 0.999 = 0.9984 = 99.84%
**2. Parallel Configuration (Independent Paths):** When components provide redundant functionality, the system remains available if at least one component functions.
Formula: A_system = 1 - (1 - A₁) × (1 - A₂) × ... × (1 - Aₙ)
Example: Load balancer with two web servers (99.9% each)
A_system = 1 - (1 - 0.999) × (1 - 0.999) = 1 - 0.001² = 99.9999%
**3. Mixed Configuration (Series + Parallel):** Real systems often combine both patterns.
Formula: A_system = A_series × [1 - (1 - A_parallel1) × (1 - A_parallel2)]
Example: Gateway (99.95%) with two redundant databases (99.9% each)
A_system = 0.9995 × [1 - (1 - 0.999)²] = 0.9995 × 0.999999 ≈ 0.99950 = 99.95%
**4. Multi-Region Availability:** For systems deployed across multiple regions, assuming regions fail independently.
Formula: A_multi = 1 - (1 - A_single)^R
Where R = number of regions
Example: Single-region availability of 99.95% across 2 regions
A_multi = 1 - (1 - 0.9995)² = 1 - 0.0005² = 99.999975%
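The four formulas can be captured as small helpers and used to reproduce the examples above. A minimal sketch in Python (function names are illustrative, not from any particular library):

```python
from math import prod

def series(*availabilities: float) -> float:
    """All components are on the critical path; every one must be up."""
    return prod(availabilities)

def parallel(*availabilities: float) -> float:
    """Redundant components; the system is up if at least one is up."""
    return 1 - prod(1 - a for a in availabilities)

def multi_region(single_region: float, regions: int) -> float:
    """Identical, independently failing deployments across several regions."""
    return 1 - (1 - single_region) ** regions

print(series(0.9999, 0.9995, 0.999))           # ~0.9984     (series example)
print(parallel(0.999, 0.999))                  # ~0.999999   (parallel example)
print(series(0.9995, parallel(0.999, 0.999)))  # ~0.9995     (mixed example)
print(multi_region(0.9995, 2))                 # ~0.99999975 (multi-region example)
```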
#### Composite SLO Calculation Guidelines
- Identify Critical Path: Only include services that could cause total system failure
- Consider Dependencies: Account for all components in the user request flow
- Factor in External Dependencies: Include third-party services, APIs, and infrastructure
- Account for Planned Maintenance: Budget for scheduled downtime
- Include Human Factors: Account for operational errors and deployment risks
#### Practical Calculation Example
Scenario: E-commerce application with the following components:
- Azure Front Door: 99.99%
- App Service (2 instances): 99.95% each
- SQL Database (with failover): 99.99%
- Azure Storage: 99.99%
- External payment API: 99.9%
Calculation:
App Service HA = 1 - (1 - 0.9995)² = 99.999975%
System = 0.9999 × 0.99999975 × 0.9999 × 0.9999 × 0.999
System ≈ 0.9987 = 99.87% availability
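The same composite can be cross-checked with a short standalone script (component SLAs taken from the scenario above); note how the external payment API, the weakest link, dominates the result:

```python
# Availability of each component in the user request path (from the scenario above).
front_door   = 0.9999
app_service  = 1 - (1 - 0.9995) ** 2   # two redundant instances in parallel
sql_database = 0.9999
storage      = 0.9999
payment_api  = 0.999                   # external dependency with the lowest SLA

system = front_door * app_service * sql_database * storage * payment_api
print(f"Composite availability: {system:.4%}")  # ~99.87%, dominated by the payment API
```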
#### Recovery Requirements
- Recovery Time Objective (RTO): Maximum acceptable downtime
    - Mission Critical: < 1 hour
    - Business Critical: < 4 hours
    - Important: < 24 hours
- Recovery Point Objective (RPO): Maximum acceptable data loss
    - Mission Critical: < 15 minutes
    - Business Critical: < 1 hour
    - Important: < 24 hours
- Mean Time to Recovery (MTTR): Average time to restore service
- Mean Time Between Failures (MTBF): Average operational time between failures (see the sketch after this list)
- Backup Frequency: Daily, weekly, or real-time replication
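MTBF and MTTR also determine steady-state availability through the standard relationship A = MTBF / (MTBF + MTTR), as sketched below with illustrative numbers:

```python
def availability_from_mtbf_mttr(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Example: a failure roughly every 30 days (720 h) with a 1-hour average recovery.
print(f"{availability_from_mtbf_mttr(720, 1):.4%}")  # ~99.86%
```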
#### Error Handling
- Error Rate: < 0.1% of requests should result in errors
- Graceful Degradation: Define fallback behaviors for component failures
- Circuit Breaker Patterns: Prevent cascade failures and allow recovery
- Retry Logic: Exponential backoff strategies with jitter (sketched after this list)
- Bulkhead Isolation: Isolate critical resources to prevent resource exhaustion
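As a concrete illustration of the retry guidance above, here is a minimal exponential-backoff-with-full-jitter wrapper (not tied to any specific retry library; the operation being retried is a placeholder):

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Retry a callable with exponential backoff and full jitter between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # retry budget exhausted; surface the failure to the caller
            # Exponential backoff capped at max_delay, with full jitter to avoid
            # synchronized retry storms across clients.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            time.sleep(delay)

# Usage (hypothetical operation):
# result = retry_with_backoff(lambda: call_payment_api(order_id))
```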
### 3. Scalability Requirements
#### Horizontal Scaling
- Auto-scaling Triggers: CPU, memory, or queue depth thresholds (see the example after this list)
- Scaling Speed: Time to add/remove instances
- Load Distribution: Even distribution across instances
- State Management: Stateless application design
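One common way to turn a utilization threshold into a concrete scaling decision is the proportional rule popularized by the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current_instances × current_metric / target_metric). A minimal sketch with illustrative limits:

```python
import math

def desired_instances(current: int, current_cpu_pct: float, target_cpu_pct: float = 70.0,
                      min_instances: int = 2, max_instances: int = 20) -> int:
    """Proportional scale-out/in decision from average CPU utilization."""
    desired = math.ceil(current * current_cpu_pct / target_cpu_pct)
    return max(min_instances, min(max_instances, desired))

print(desired_instances(current=4, current_cpu_pct=90))  # -> 6 instances
```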
#### Vertical Scaling
- Resource Limits: Maximum CPU, memory, storage per instance
- Scaling Constraints: Hardware or platform limitations
- Cost Considerations: Performance vs. cost optimization
#### Data Scaling
- Database Sharding: Strategy for horizontal data distribution
- Read Replicas: Number and geographic distribution
- Caching Strategy: Redis, CDN, application-level caching
- Data Archiving: Long-term storage and retrieval strategies
### 4. Security Requirements
#### Authentication & Authorization
- Multi-Factor Authentication: Required for admin access
- Session Management: Timeout periods, secure tokens
- Role-Based Access Control: Principle of least privilege
- API Security: OAuth 2.0, rate limiting, API keys
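API rate limits are frequently enforced with a token-bucket algorithm: tokens refill at the permitted request rate and a request is rejected when the bucket is empty. A minimal single-process sketch (a production limiter would usually keep its state in a shared store such as Redis):

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate=100 / 60, capacity=10)  # ~100 requests/minute, burst of 10
print(limiter.allow())  # True while tokens remain
```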
#### Data Protection
- Encryption Standards: AES-256 for data at rest, TLS 1.3 for data in transit (see the sketch after this list)
- Key Management: Azure Key Vault or similar HSM solutions
- Data Classification: Public, internal, confidential, restricted
- Data Retention: Compliance with GDPR, CCPA requirements
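For application-level encryption at rest, an authenticated mode such as AES-256-GCM is the usual choice. A minimal sketch using the `cryptography` package (assumed to be available; in practice the key would come from Azure Key Vault or another HSM-backed store rather than being generated inline):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # in practice, fetched from Key Vault/HSM
aesgcm = AESGCM(key)

nonce = os.urandom(12)                      # 96-bit nonce, unique per encryption
ciphertext = aesgcm.encrypt(nonce, b"sensitive payload", b"record-id-123")
plaintext = aesgcm.decrypt(nonce, ciphertext, b"record-id-123")
assert plaintext == b"sensitive payload"
```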
#### Monitoring & Auditing
- Audit Logging: All security events logged and retained
- Real-time Monitoring: Security incident detection and alerting
- Compliance Reporting: Automated compliance status reporting
- Vulnerability Management: Regular scanning and remediation
### 5. Usability and User Experience
#### User Interface
- Accessibility: WCAG 2.1 AA compliance
- Cross-browser Support: Chrome, Firefox, Safari, Edge
- Mobile Responsiveness: Support for tablets and smartphones
- Internationalization: Multi-language and locale support
#### User Experience
- Task Completion Rate: > 95% for primary user workflows
- User Error Rate: < 5% of user actions result in errors
- Learning Curve: New users productive within defined timeframe
- Help and Documentation: Context-sensitive help available
### 6. Maintainability and Operability
#### Code Quality
- Code Coverage: > 80% test coverage for critical paths
- Cyclomatic Complexity: Keep methods under complexity threshold
- Technical Debt: Regular refactoring and cleanup schedules
- Documentation: Up-to-date API documentation and runbooks
#### Deployment and Operations
- Deployment Frequency: Support for frequent, automated deployments
- Rollback Time: < 5 minutes to rollback failed deployments
- Configuration Management: Environment-specific configurations
- Monitoring and Alerting: Comprehensive observability stack
## NFR Definition Process
### 1. Stakeholder Requirements Gathering
- Business Stakeholders: Performance expectations, SLA requirements
- End Users: User experience and accessibility needs
- Operations Team: Monitoring, maintenance, and support requirements
- Security Team: Compliance and security control requirements
### 2. Requirements Analysis and Prioritization
Use the MoSCoW method for prioritization:
- Must Have: Critical for business operation
- Should Have: Important for user satisfaction
- Could Have: Nice to have if resources allow
- Won't Have: Explicitly out of scope
### 3. Quantitative Requirements Definition
Transform qualitative requirements into measurable criteria:
- "Fast response times" → "API responses < 200ms for 95th percentile"
- "Highly available" → "99.9% uptime SLA with < 4 hours monthly downtime"
- "Secure system" → "Zero tolerance for data breaches, PCI DSS compliance"
### 4. NFR Testing Strategy
#### Performance Testing
- Load Testing: Normal expected load using JMeter or Azure Load Testing
- Stress Testing: Beyond normal capacity to find breaking points
- Spike Testing: Sudden load increases and system recovery
- Volume Testing: Large amounts of data processing
#### Security Testing
- Penetration Testing: Simulated attacks by security professionals
- Vulnerability Scanning: Automated scanning for known vulnerabilities
- Authentication Testing: Multi-factor authentication and session management
- Data Protection Testing: Encryption and access control validation
#### Reliability Testing
- Chaos Engineering: Deliberate failure injection (Netflix Chaos Monkey)
- Disaster Recovery Testing: Full system recovery procedures
- Backup and Restore Testing: Data recovery verification
- Failover Testing: High availability configuration validation
### 5. Monitoring and Measurement
#### Key Performance Indicators (KPIs)
- Response Time Percentiles: P50, P95, P99 response times (see the sketch after this list)
- Error Rates: 4xx and 5xx HTTP error percentages
- Availability Metrics: Uptime percentage over time periods
- Resource Utilization: CPU, memory, disk, network usage
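Percentile KPIs can be derived from raw latency samples with nothing more than the standard library; in practice the APM tool reports them directly. A minimal sketch with illustrative data:

```python
import statistics

# Example latency samples in milliseconds (in practice, exported from the APM tool).
latencies_ms = [120, 135, 150, 180, 210, 95, 160, 300, 145, 170]

# n=100 yields the 99 percentile cut points P1..P99; 'inclusive' interpolates
# within the observed samples rather than extrapolating past them.
cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50={p50:.0f} ms  P95={p95:.0f} ms  P99={p99:.0f} ms")
print("Meets the < 200 ms P95 target:", p95 < 200)
```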
#### Alerting Thresholds
- Warning Levels: 80% of NFR thresholds (see the sketch after this list)
- Critical Levels: 95% of NFR thresholds
- Escalation Procedures: Automated escalation paths
- On-call Rotation: 24/7 support for critical systems
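Applied mechanically, the 80%/95% rule derives alert levels from each upper-bound NFR limit (latency ceilings, CPU/memory caps). A small illustrative sketch:

```python
def alert_thresholds(nfr_limit: float, warning_fraction=0.80, critical_fraction=0.95):
    """Derive warning/critical alert levels for an upper-bound NFR (e.g. latency, CPU %)."""
    return {"warning": nfr_limit * warning_fraction, "critical": nfr_limit * critical_fraction}

print(alert_thresholds(200))  # 200 ms P95 latency NFR -> warn at 160 ms, critical at 190 ms
print(alert_thresholds(70))   # 70% CPU NFR -> warn at 56%, critical at 66.5%
```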
## NFR Documentation Template
```markdown
# Non-Functional Requirements: [System Name]
## Performance Requirements
| Requirement | Target | Measurement Method | Priority |
| ----------------- | ------------------ | ---------------------- | --------- |
| API Response Time | < 200ms (P95) | Application monitoring | Must Have |
| Concurrent Users | 1,000 simultaneous | Load testing | Must Have |
## Availability Requirements
| Requirement | Target | Measurement Method | Priority |
| ------------- | -------- | ------------------------- | --------- |
| System Uptime | 99.9% | Infrastructure monitoring | Must Have |
| RTO | < 1 hour | Disaster recovery testing | Must Have |
## Security Requirements
| Requirement | Target | Measurement Method | Priority |
| --------------- | ----------------------------------- | ------------------ | --------- |
| Data Encryption | AES-256 at rest, TLS 1.3 in transit | Security scanning | Must Have |
| Authentication | MFA for admin access | Security audit | Must Have |
## Testing Approach
- Performance: JMeter load tests, Azure Load Testing
- Security: OWASP ZAP scanning, penetration testing
- Reliability: Chaos engineering, disaster recovery drills
## Monitoring Strategy
- APM: Application Performance Monitoring (New Relic, Datadog)
- Infrastructure: Azure Monitor, CloudWatch
- Security: SIEM integration, security event correlation
## Acceptance Criteria
[Define specific criteria that must be met before release]
```
## Integration with Development Process
### Planning Phase
- Define NFRs during epic and feature planning
- Include NFR acceptance criteria in user stories
- Estimate effort for NFR implementation and testing
### Development Phase
- Implement monitoring and instrumentation code
- Include NFR-focused unit and integration tests
- Conduct regular performance profiling during development
### Testing Phase
- Execute comprehensive NFR test suites
- Performance baseline establishment and regression testing
- Security scanning and penetration testing
### Release Phase
- NFR sign-off required before production deployment
- Production monitoring setup and validation
- Post-deployment NFR verification and tuning
## Tools and Resources
### Performance Testing
- Azure Load Testing: Cloud-based load testing service
- JMeter: Open-source performance testing tool
- k6: Developer-centric performance testing tool
- LoadRunner: Enterprise performance testing platform
### Monitoring and Observability
- Azure Application Insights: Application performance monitoring
- Datadog: Full-stack monitoring platform
- New Relic: Application performance monitoring
- Prometheus + Grafana: Open-source monitoring stack
### Security Testing
- OWASP ZAP: Web application security scanner
- Burp Suite: Web vulnerability scanner
- Nessus: Vulnerability assessment tool
- Qualys: Cloud security and compliance platform
## Common Pitfalls and Best Practices
### Pitfalls to Avoid
- Vague Requirements: "System should be fast" instead of specific metrics
- Late Definition: Defining NFRs after development begins
- No Testing Strategy: Assuming NFRs will be met without validation
- Ignoring Trade-offs: Not considering cost vs. performance implications
### Best Practices
- Start Early: Define NFRs during initial requirements gathering
- Be Specific: Use quantifiable, measurable criteria
- Test Continuously: Include NFR testing in CI/CD pipelines
- Monitor Production: Continuously validate NFRs in production
- Review Regularly: Update NFRs as business needs evolve