SAST Fundamentals

Understand how static analysis works, different analysis techniques, and the limitations of SAST tools.

Before diving into specific tools, you need to understand how static analysis works and what it can (and cannot) do. This knowledge helps you configure tools effectively, interpret results accurately, and set realistic expectations.

How Static Analysis Works

Static analysis examines source code without executing it. The analyzer builds a representation of your code—typically an Abstract Syntax Tree (AST)—and then applies rules to detect patterns that indicate vulnerabilities.

The Analysis Pipeline

Source Code → Parsing → AST → Analysis Engine → Rules → Findings
     │           │         │          │            │         │
     ▼           ▼         ▼          ▼            ▼         ▼
  .java,     Lexical    Tree      Data flow,   Security   Vulnerabilities
  .py, .js   tokens     structure  taint,      patterns   with severity
                                   control flow

Step by step:

Parsing — Convert source code into tokens, then build an AST
Semantic Analysis — Resolve types, scopes, and symbol references
Control Flow Analysis — Map possible execution paths
Data Flow Analysis — Track how data moves through the program
Pattern Matching — Apply security rules to find vulnerabilities
Reporting — Generate findings with severity, location, and remediation
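
The steps above can be sketched in a few lines using Python's standard `ast` module. This is a minimal illustration, not how any particular SAST product is built: the `DANGEROUS_CALLS` rule set, the `scan` function, and the finding fields are assumptions made up for this example, and it skips semantic, control flow, and data flow analysis entirely.

```python
import ast

SOURCE = '''
def run(request):
    expr = request.get("expr")
    return eval(expr)
'''

# Hypothetical rule set: builtins we want to report when called directly.
DANGEROUS_CALLS = {"eval", "exec"}


def scan(source: str) -> list[dict]:
    """Parse source, walk the AST, apply one rule, and report findings."""
    tree = ast.parse(source)  # Parsing: text -> tokens -> AST
    findings = []
    for node in ast.walk(tree):  # Analysis: visit every node in the tree
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS_CALLS  # Rule: call to a listed builtin
        ):
            findings.append(  # Reporting: location, rule id, severity
                {
                    "line": node.lineno,
                    "rule": f"dangerous-call-{node.func.id}",
                    "severity": "high",
                }
            )
    return findings


findings = scan(SOURCE)
print(findings)
# [{'line': 4, 'rule': 'dangerous-call-eval', 'severity': 'high'}]
```

Note that `request.get(...)` is not reported: it is an attribute call, and the rule only matches direct calls by name.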

Types of Static Analysis

Different tools use different analysis techniques. Understanding these helps you choose the right tool.

Pattern Matching (Syntactic Analysis)

The simplest approach: look for code patterns that match known-bad signatures.

# Rule: Detect hardcoded passwords
# Pattern: variable named "password" assigned a string literal

password = "secretpassword123"  # FLAGGED: Hardcoded credential

api_key = "AKIA1234567890ABCDEF"  # FLAGGED: AWS key pattern

Pros:

Fast execution (simple string/regex matching)
Easy to write custom rules
Low false positive rate for simple patterns

Cons:

Misses complex vulnerabilities
Easy to bypass (variable renaming, string concatenation)
No understanding of data flow

Tools: Semgrep (primarily pattern-based, though its patterns match parsed code rather than raw text), grep-based custom scripts
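
To make the tradeoffs concrete, here is a rough sketch of a grep-style rule in Python. The regular expression and the `scan_lines` helper are assumptions written for this example, not a rule taken from any real tool. It catches the obvious cases and shows how renaming the variable or building the string indirectly slips past it.

```python
import re

# Hypothetical rule: a name containing "password" assigned a string literal.
HARDCODED_PASSWORD = re.compile(
    r"""\b\w*password\w*\s*=\s*["'][^"']+["']""", re.IGNORECASE
)


def scan_lines(source: str) -> list[int]:
    """Return the 1-based line numbers that match the rule."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if HARDCODED_PASSWORD.search(line)
    ]


SAMPLE = '''password = "secretpassword123"
db_password = 'hunter2'
pw = "secretpassword123"
password = "".join(["secret", "password123"])
password = os.environ["DB_PASSWORD"]
'''

flagged = scan_lines(SAMPLE)
print(flagged)  # [1, 2]
```

Lines 3 and 4 hold the same secret as line 1 but are missed (renamed variable, string assembled at runtime). Line 5 is correctly ignored because the value is not a literal.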

Data Flow Analysis

Tracks how data moves from sources (user input) to sinks (dangerous functions).

# Data flow analysis tracks this:

def process_user(request):
    username = request.get('username')     # SOURCE: User input
    
    # Data flows through transformations...
    processed = username.strip()
    lower_name = processed.lower()
    
    # Eventually reaches a dangerous sink
    query = f"SELECT * FROM users WHERE name = '{lower_name}'"  # Tainted value interpolated into SQL
    cursor.execute(query)  # SINK, FLAGGED: Tainted data reaches SQL execution

The analyzer traces the "taint" from request.get() through all transformations to cursor.execute(). If no sanitization occurs, it reports a SQL injection.

Pros:

Catches complex vulnerabilities across function boundaries
Understands when data is sanitized
Fewer false negatives than pattern matching

Cons:

More false positives (sanitizers not recognized)
Slower analysis (must trace all paths)
Requires language-specific implementation

Tools: CodeQL, Fortify, Checkmarx

Taint Tracking

A specialized form of data flow analysis that tracks "tainted" (untrusted) data from its sources to dangerous sinks, clearing the taint when the data passes through a recognized sanitizer.

// Taint tracking example

// Configuration defines:
// - Sources: req.body, req.query, req.params
// - Sinks: eval(), exec(), innerHTML
// - Sanitizers: DOMPurify.sanitize(), escapeHtml()

function handleRequest(req, res) {
    const userInput = req.body.content;    // TAINTED

    // Illustrative only: in real code the innerHTML sink lives in browser-side code
    // Option A: No sanitization - VULNERABLE
    document.body.innerHTML = userInput;    // FLAGGED: XSS

    // Option B: With sanitization - SAFE
    const clean = DOMPurify.sanitize(userInput);  // Taint removed
    document.body.innerHTML = clean;        // OK: Input was sanitized
}

Pros:

Precise tracking of untrusted data
Understands sanitization functions
Can trace across file boundaries

Cons:

Must configure sources, sinks, and sanitizers
Cannot track through reflection or dynamic code
Complex interprocedural analysis is slow

Tools: CodeQL, SonarQube (advanced rules)
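
The role of the source, sink, and sanitizer configuration can be sketched as plain data plus a small checker. The `TaintConfig` class, the three-field step format, and the `check` function are hypothetical constructs for this illustration. Real tools use their own rule languages and work on a much richer program representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintConfig:
    sources: frozenset[str]
    sinks: frozenset[str]
    sanitizers: frozenset[str]


# Mirrors the configuration comment in the example above.
CONFIG = TaintConfig(
    sources=frozenset({"req.body", "req.query", "req.params"}),
    sinks=frozenset({"eval", "exec", "innerHTML"}),
    sanitizers=frozenset({"DOMPurify.sanitize", "escapeHtml"}),
)

# Hypothetical simplified program form: (target, operation, argument).
Step = tuple[str, str, str]


def check(steps: list[Step], config: TaintConfig) -> list[int]:
    """Return the indexes of steps where tainted data reaches a sink."""
    tainted: set[str] = set()
    flagged = []
    for index, (target, op, arg) in enumerate(steps):
        if op in config.sinks:
            if arg in tainted:
                flagged.append(index)
        elif op in config.sources:
            tainted.add(target)       # value read from an untrusted source
        elif op in config.sanitizers:
            tainted.discard(target)   # sanitizer output is treated as trusted
        elif arg in tainted:
            tainted.add(target)       # any other operation propagates taint
        else:
            tainted.discard(target)   # overwritten with untainted data
    return flagged


STEPS = [
    ("userInput", "req.body", "content"),
    ("_", "innerHTML", "userInput"),               # Option A
    ("clean", "DOMPurify.sanitize", "userInput"),
    ("_", "innerHTML", "clean"),                   # Option B
]

flagged = check(STEPS, CONFIG)
print(flagged)  # [1]
```

If a sanitizer your codebase relies on is missing from the configuration, its output stays tainted and the safe path is reported too, which is one common origin of false positives.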

Control Flow Analysis

Examines the possible execution paths through your code.

def process_payment(user, amount):
    # Control flow analysis builds this graph:
    #
    #     ┌──────────────────┐
    #     │   Start          │
    #     └────────┬─────────┘
    #              │
    #     ┌────────▼─────────┐
    #     │ user.is_admin?   │
    #     └───────┬──┬───────┘
    #         yes │  │ no
    #     ┌───────▼┐ │
    #     │ bypass │ │
    #     │ check  │ │
    #     └───┬────┘ │
    #         │  ┌───▼────────┐
    #         │  │ verify()   │
    #         │  └───┬────────┘
    #         │      │
    #     ┌───▼──────▼────────┐
    #     │ execute_payment() │
    #     └───────────────────┘
    
    if user.is_admin:
        pass  # Admin bypass - potential issue!
    else:
        if not verify_payment(amount):
            raise PaymentError("Verification failed")
    
    execute_payment(user, amount)  # Reached on both paths

Control flow analysis can detect:

Dead code (unreachable paths)
Missing authorization checks on certain paths
Exception handling gaps
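
As a rough sketch of the "missing check on certain paths" case, the graph from the diagram can be written as an adjacency map and searched for paths that reach `execute_payment` without passing through `verify`. The `CFG` dictionary is hand-written for this example (a real analyzer derives it from the code), and the helper names are assumptions.

```python
# Hand-written control flow graph for process_payment: node -> successors.
CFG = {
    "start": ["is_admin?"],
    "is_admin?": ["bypass", "verify"],
    "bypass": ["execute_payment"],
    "verify": ["execute_payment", "raise PaymentError"],
    "execute_payment": [],
    "raise PaymentError": [],
}


def paths(cfg, node, target, prefix=()):
    """Yield every acyclic path from node to target, depth-first."""
    prefix = prefix + (node,)
    if node == target:
        yield prefix
        return
    for successor in cfg[node]:
        if successor not in prefix:  # skip back edges so loops terminate
            yield from paths(cfg, successor, target, prefix)


def paths_missing(cfg, start, target, required):
    """Paths that reach target without passing through the required node."""
    return [p for p in paths(cfg, start, target) if required not in p]


unverified = paths_missing(CFG, "start", "execute_payment", "verify")
print(unverified)
# [('start', 'is_admin?', 'bypass', 'execute_payment')]
```

Enumerating paths is fine for a graph this small. Real analyzers typically use dominator or data flow computations instead, since the number of paths grows quickly with each branch.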

Understanding SAST Limitations

SAST tools are powerful but not perfect. Understanding their limitations helps you use them effectively.

False Positives

A false positive is when the tool reports a vulnerability that is not actually exploitable.

# Common false positive: Framework-provided sanitization

from django.db import connection

def search_users(request):
    name = request.GET.get('name')
    
    # Django's parameterized queries are safe, but some tools still flag this
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT * FROM users WHERE name = %s",  # Safe: parameterized
            [name]
        )

Managing false positives:

Tune your rules — Adjust sensitivity settings
Add suppression comments — Mark known false positives
Configure sanitizers — Tell the tool about your framework
Review regularly — Some "false positives" are actually true issues
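
Suppression comments work roughly like the sketch below. The `# sast-ignore:` marker, the `Finding` class, and the `triage` function are all hypothetical. Each real tool defines its own suppression syntax, so check its documentation. The sketch also requires a written justification, which keeps suppressions reviewable later.

```python
from dataclasses import dataclass

# Hypothetical marker; real tools define their own suppression syntax.
SUPPRESSION_MARKER = "# sast-ignore:"


@dataclass(frozen=True)
class Finding:
    rule: str
    line: int  # 1-based line number in the scanned file


def triage(
    findings: list[Finding], source: str
) -> tuple[list[Finding], list[tuple[Finding, str]]]:
    """Split findings into (active, suppressed-with-reason)."""
    lines = source.splitlines()
    active, suppressed = [], []
    for finding in findings:
        text = lines[finding.line - 1]
        _, marker, reason = text.partition(SUPPRESSION_MARKER)
        if marker and reason.strip():
            suppressed.append((finding, reason.strip()))
        else:
            # No marker, or a marker without a justification: keep reporting.
            active.append(finding)
    return active, suppressed


SOURCE = '''cursor.execute("SELECT * FROM users WHERE name = %s", [name])  # sast-ignore: parameterized query
cursor.execute(query)  # sast-ignore:
cursor.execute(other_query)
'''

FINDINGS = [
    Finding("sql-injection", 1),
    Finding("sql-injection", 2),
    Finding("sql-injection", 3),
]

active, suppressed = triage(FINDINGS, SOURCE)
print([f.line for f in active])                        # [2, 3]
print([(f.line, reason) for f, reason in suppressed])  # [(1, 'parameterized query')]
```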

False Negatives

A false negative is when the tool misses a real vulnerability.

# SAST tools struggle with dynamic code

def execute_dynamic(request):
    func_name = request.GET.get('action')
    
    # Dynamic function lookup - hard to analyze statically
    func = getattr(my_module, func_name)
    func()  # Attacker chooses which function runs, but SAST may miss it

# Also hard to detect:
# - Vulnerabilities in third-party libraries (SAST only sees your code)
# - Business logic flaws (tool doesn't understand your requirements)
# - Configuration issues (SAST analyzes code, not config files)
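
One way to narrow this particular gap is to replace the dynamic lookup with an explicit dispatch table, so every callable target is visible in the source and can be followed by an analyzer. The sketch below is only an illustration: `export_report`, `refresh_cache`, and the `ACTIONS` table are made-up names, not part of any framework.

```python
def export_report() -> str:
    """Hypothetical action, standing in for real application code."""
    return "report exported"


def refresh_cache() -> str:
    """Hypothetical action, standing in for real application code."""
    return "cache refreshed"


# Explicit allow-list: the only functions a request can trigger.
ACTIONS = {
    "export": export_report,
    "refresh": refresh_cache,
}


def execute_action(action_name: str) -> str:
    """Look up the handler in the allow-list instead of using getattr()."""
    handler = ACTIONS.get(action_name)
    if handler is None:
        raise ValueError(f"unknown action: {action_name!r}")
    return handler()


print(execute_action("export"))  # report exported
```

An unexpected name such as `"__import__"` now raises `ValueError` instead of resolving to whatever attribute happens to exist on the module.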

Analysis Boundaries

SAST tools have limits on what they can analyze:

Can Analyze	Cannot Analyze
Source code you provide	Binary dependencies
Known frameworks	Custom frameworks
Direct function calls	Reflection/dynamic dispatch
Compile-time values	Runtime configurations
Code paths	Business logic correctness

Choosing the Right Analysis Depth

Different tools offer different tradeoffs:

Tool Type	Speed	Accuracy	Setup Effort
Pattern matching (Semgrep)	Fast	Medium	Low
Data flow (SonarQube)	Medium	High	Medium
Semantic analysis (CodeQL)	Slow	Highest	High

For most teams, a layered approach works best:

IDE/Pre-commit — Fast pattern matching (Semgrep)
CI Pipeline — Comprehensive analysis (SonarQube)
Weekly/Deep Scans — Semantic analysis (CodeQL) for critical repos

SAST vs. Other Security Testing

SAST is one tool in your security arsenal:

Type	When	What It Finds	Limitations
SAST	Compile-time	Code vulnerabilities	Misses runtime/config issues
DAST	Runtime	Deployment vulnerabilities	Needs running app, slow
SCA	Build-time	Vulnerable dependencies	Only known CVEs
IAST	Runtime	Runtime code paths	Requires instrumentation
Secrets Scanning	Commit-time	Leaked credentials	Only known patterns

DAST (Dynamic Application Security Testing) tests the running application from the outside—like an attacker would. It catches issues SAST misses (misconfigured servers, deployment issues) but requires a deployed environment.

SCA (Software Composition Analysis) scans your dependencies for known vulnerabilities. SAST only analyzes your code, not the libraries you use.

Integrating SAST Into Your Workflow

For maximum effectiveness, integrate SAST at multiple points:

1. Developer IDE

Catch issues as code is written:

Developer writes code → IDE plugin highlights issue → Immediate fix

Benefits: Fastest feedback, easiest to fix
Challenge: Must keep IDE plugins updated

2. Pre-commit Hooks

Prevent vulnerable code from being committed:

#!/bin/bash
# .git/hooks/pre-commit

# --error makes semgrep exit non-zero when it reports findings
if ! semgrep --config=p/security-audit --error .; then
    echo "Security issues detected. Please fix before committing." >&2
    exit 1
fi

Benefits: Prevents issues from entering codebase
Challenge: Can slow down commits

3. CI/CD Pipeline

Run comprehensive scans on every pull request:

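# GitHub Actions example (a job definition; it belongs under the workflow's top-level jobs: key)
security-scan:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Run SAST
      # Assumes sonar-scanner is already installed on the runner
      run: |
        sonar-scanner \
          -Dsonar.projectKey=my-project \
          -Dsonar.sources=. \
          -Dsonar.host.url=${{ secrets.SONAR_URL }}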

Benefits: Catches issues before merge, detailed reports
Challenge: Adds time to CI pipeline

4. Scheduled Scans

Run deep analysis periodically:

# Weekly CodeQL scan
on:
  schedule:
    - cron: '0 3 * * 0'  # Every Sunday at 03:00 UTC

Benefits: Runs slow, deep analysis without blocking pull requests, and picks up findings from newly updated rules
Challenge: Delayed feedback

Key Takeaways

Before proceeding to specific tools, remember:

No single tool catches everything — Use SAST alongside DAST, SCA, and manual review
Tune for your stack — Configure tools to understand your frameworks and sanitizers
Start with high-confidence rules — Reduce false positives to build developer trust
Shift left gradually — Add IDE plugins and pre-commit hooks once CI integration is stable
Treat findings as tech debt — Track, prioritize, and fix systematically

Now let's get hands-on with SonarQube, Semgrep, and CodeQL.

Overview

Part of: Static Application Security Testing (SAST)

Updated: 1/24/2025
