SAST Fundamentals
Understand how static analysis works, different analysis techniques, and the limitations of SAST tools.
Before diving into specific tools, you need to understand how static analysis works and what it can (and cannot) do. This knowledge helps you configure tools effectively, interpret results accurately, and set realistic expectations.
How Static Analysis Works
Static analysis examines source code without executing it. The analyzer builds a representation of your code—typically an Abstract Syntax Tree (AST)—and then applies rules to detect patterns that indicate vulnerabilities.
The Analysis Pipeline
Source Code → Parsing → AST → Analysis Engine → Rules → Findings
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
.java, Lexical Tree Data flow, Security Vulnerabilities
.py, .js tokens structure taint, patterns with severity
control flow
Step by step:
- Parsing — Convert source code into tokens, then build an AST
- Semantic Analysis — Resolve types, scopes, and symbol references
- Control Flow Analysis — Map possible execution paths
- Data Flow Analysis — Track how data moves through the program
- Pattern Matching — Apply security rules to find vulnerabilities
- Reporting — Generate findings with severity, location, and remediation
Types of Static Analysis
Different tools use different analysis techniques. Understanding these helps you choose the right tool.
Pattern Matching (Syntactic Analysis)
The simplest approach: look for code patterns that match known-bad signatures.
# Rule: Detect hardcoded passwords
# Pattern: variable named "password" assigned a string literal
password = "secretpassword123" # FLAGGED: Hardcoded credential
api_key = "AKIA1234567890ABCDEF" # FLAGGED: AWS key pattern
Pros:
- Fast execution (simple string/regex matching)
- Easy to write custom rules
- Low false positive rate for simple patterns
Cons:
- Misses complex vulnerabilities
- Easy to bypass (variable renaming, string concatenation)
- No understanding of data flow
Tools: Semgrep (primarily), grep-based custom scripts
Data Flow Analysis
Tracks how data moves from sources (user input) to sinks (dangerous functions).
# Data flow analysis tracks this:
def process_user(request):
username = request.get('username') # SOURCE: User input
# Data flows through transformations...
processed = username.strip()
lower_name = processed.lower()
# Eventually reaches a dangerous sink
query = f"SELECT * FROM users WHERE name = '{lower_name}'" # SINK: SQL
cursor.execute(query) # FLAGGED: Tainted data reaches SQL sink
The analyzer traces the "taint" from request.get() through all transformations to cursor.execute(). If no sanitization occurs, it reports a SQL injection.
Pros:
- Catches complex vulnerabilities across function boundaries
- Understands when data is sanitized
- Fewer false negatives than pattern matching
Cons:
- More false positives (sanitizers not recognized)
- Slower analysis (must trace all paths)
- Requires language-specific implementation
Tools: CodeQL, Fortify, Checkmarx
Taint Tracking
A specialized form of data flow analysis that specifically tracks "tainted" (untrusted) data.
// Taint tracking example
// Configuration defines:
// - Sources: req.body, req.query, req.params
// - Sinks: eval(), exec(), innerHTML
// - Sanitizers: DOMPurify.sanitize(), escapeHtml()
function handleRequest(req, res) {
const userInput = req.body.content; // TAINTED
// Option A: No sanitization - VULNERABLE
document.innerHTML = userInput; // FLAGGED: XSS
// Option B: With sanitization - SAFE
const clean = DOMPurify.sanitize(userInput); // Taint removed
document.innerHTML = clean; // OK: Input was sanitized
}
Pros:
- Precise tracking of untrusted data
- Understands sanitization functions
- Can trace across file boundaries
Cons:
- Must configure sources, sinks, and sanitizers
- Cannot track through reflection or dynamic code
- Complex interprocedural analysis is slow
Tools: CodeQL, SonarQube (advanced rules)
Control Flow Analysis
Examines the possible execution paths through your code.
def process_payment(user, amount):
# Control flow analysis builds this graph:
#
# ┌──────────────────┐
# │ Start │
# └────────┬─────────┘
# │
# ┌────────▼─────────┐
# │ user.is_admin? │
# └───────┬──┬───────┘
# yes │ │ no
# ┌───────▼┐ │
# │ bypass │ │
# │ check │ │
# └───┬────┘ │
# │ ┌───▼────────┐
# │ │ verify() │
# │ └───┬────────┘
# │ │
# ┌───▼──────▼────────┐
# │ execute_payment() │
# └───────────────────┘
if user.is_admin:
pass # Admin bypass - potential issue!
else:
if not verify_payment(amount):
raise PaymentError("Verification failed")
execute_payment(user, amount) # Reached on both paths
Control flow analysis can detect:
- Dead code (unreachable paths)
- Missing authorization checks on certain paths
- Exception handling gaps
Understanding SAST Limitations
SAST tools are powerful but not perfect. Understanding their limitations helps you use them effectively.
False Positives
A false positive is when the tool reports a vulnerability that is not actually exploitable.
# Common false positive: Framework-provided sanitization
from django.db import connection
def search_users(request):
name = request.GET.get('name')
# Django's parameterized queries are safe, but some tools still flag this
with connection.cursor() as cursor:
cursor.execute(
"SELECT * FROM users WHERE name = %s", # Safe: parameterized
[name]
)
Managing false positives:
- Tune your rules — Adjust sensitivity settings
- Add suppression comments — Mark known false positives
- Configure sanitizers — Tell the tool about your framework
- Review regularly — Some "false positives" are actually true issues
False Negatives
A false negative is when the tool misses a real vulnerability.
# SAST tools struggle with dynamic code
def execute_dynamic(request):
func_name = request.GET.get('action')
# Dynamic function lookup - hard to analyze statically
func = getattr(my_module, func_name)
func() # Potential command injection, but SAST may miss it
# Also hard to detect:
# - Vulnerabilities in third-party libraries (SAST only sees your code)
# - Business logic flaws (tool doesn't understand your requirements)
# - Configuration issues (SAST analyzes code, not config files)
Analysis Boundaries
SAST tools have limits on what they can analyze:
| Can Analyze | Cannot Analyze |
|---|---|
| Source code you provide | Binary dependencies |
| Known frameworks | Custom frameworks |
| Direct function calls | Reflection/dynamic dispatch |
| Compile-time values | Runtime configurations |
| Code paths | Business logic correctness |
Choosing the Right Analysis Depth
Different tools offer different tradeoffs:
| Tool Type | Speed | Accuracy | Setup Effort |
|---|---|---|---|
| Pattern matching (Semgrep) | Fast | Medium | Low |
| Data flow (SonarQube) | Medium | High | Medium |
| Semantic analysis (CodeQL) | Slow | Highest | High |
For most teams, a layered approach works best:
- IDE/Pre-commit — Fast pattern matching (Semgrep)
- CI Pipeline — Comprehensive analysis (SonarQube)
- Weekly/Deep Scans — Semantic analysis (CodeQL) for critical repos
SAST vs. Other Security Testing
SAST is one tool in your security arsenal:
| Type | When | What It Finds | Limitations |
|---|---|---|---|
| SAST | Compile-time | Code vulnerabilities | Misses runtime/config issues |
| DAST | Runtime | Deployment vulnerabilities | Needs running app, slow |
| SCA | Build-time | Vulnerable dependencies | Only known CVEs |
| IAST | Runtime | Runtime code paths | Requires instrumentation |
| Secrets Scanning | Commit-time | Leaked credentials | Only known patterns |
DAST (Dynamic Application Security Testing) tests the running application from the outside—like an attacker would. It catches issues SAST misses (misconfigured servers, deployment issues) but requires a deployed environment.
SCA (Software Composition Analysis) scans your dependencies for known vulnerabilities. SAST only analyzes your code, not the libraries you use.
Integrating SAST Into Your Workflow
For maximum effectiveness, integrate SAST at multiple points:
1. Developer IDE
Catch issues as code is written:
Developer writes code → IDE plugin highlights issue → Immediate fix
Benefits: Fastest feedback, easiest to fix
Challenge: Must keep IDE plugins updated
2. Pre-commit Hooks
Prevent vulnerable code from being committed:
#!/bin/bash
# .git/hooks/pre-commit
semgrep --config=p/security-audit --error .
if [ \$? -ne 0 ]; then
echo "Security issues detected. Please fix before committing."
exit 1
fi
Benefits: Prevents issues from entering codebase
Challenge: Can slow down commits
3. CI/CD Pipeline
Run comprehensive scans on every pull request:
# GitHub Actions example
security-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run SAST
run: |
sonar-scanner \\
-Dsonar.projectKey=my-project \\
-Dsonar.sources=. \\
-Dsonar.host.url=\${{ secrets.SONAR_URL }}
Benefits: Catches issues before merge, detailed reports
Challenge: Adds time to CI pipeline
4. Scheduled Scans
Run deep analysis periodically:
# Weekly CodeQL scan
on:
schedule:
- cron: '0 3 * * 0' # Every Sunday at 3 AM
Benefits: Catches issues introduced by dependency updates
Challenge: Delayed feedback
Key Takeaways
Before proceeding to specific tools, remember:
- No single tool catches everything — Use SAST alongside DAST, SCA, and manual review
- Tune for your stack — Configure tools to understand your frameworks and sanitizers
- Start with high-confidence rules — Reduce false positives to build developer trust
- Shift left gradually — Add IDE plugins and pre-commit hooks once CI integration is stable
- Treat findings as tech debt — Track, prioritize, and fix systematically
Now let's get hands-on with SonarQube, Semgrep, and CodeQL.
Found an issue?