Private keys accidentally (or intentionally) left lying around in production are one of the most common and dangerous mistakes engineering and development teams can make. This post walks through realistic ways developers and engineers might hide or store private keys and, more importantly, practical detection techniques you can use to find them. Use this as a defensive reference to build host scans, EDR rules, and repo checks.
How people typically hide private keys (high level)
- Plaintext files with nonstandard names/paths: keys saved with innocuous names like config.bin, data.dat, backup.bak, .env.local, or log.old, or hidden inside dotfolders (.cache/, .local/) and odd application directories.
- Renamed / extension-changed files: keys renamed to .jpg, .dat, .log, or other benign-looking extensions so naïve filename checks miss them.
- Embedded in source code / configuration files: keys placed inline inside JSON, YAML, TOML, JS, Python, or shell files, often split across multiple lines or broken into variables to avoid simple pattern matches.
- Base64 / alternative encodings: key material encoded as base64 (or otherwise transformed) and stored without the usual BEGIN/END headers so simple string searches don't flag it.
- Stored inside archives or package files: keys placed inside zip or tar archives or vendor packages, sometimes added to release artifacts or build outputs (see the archive-scanning sketch after this list).
- Packed into binaries / compiled resources: key material compiled or embedded into executables and resource sections so it never appears as a standalone text file.
- Hidden in Git history: keys committed and then "deleted" still exist in the repository's object history, tags, or reflogs.
- Stored in environment variables / process memory: keys placed into environment variables, systemd unit files, or command-line arguments are visible in /proc/*/environ, process listings, or memory scans.
- Alternate filesystems / slack space / ADS (platform-specific): keys stored in obscure filesystem locations, slack space, alternate data streams (Windows), or on unexpected mounts to avoid casual inspection.
- Split across many files: a single key is split into multiple small fragments (e.g., part.aa, part.ab) so short-file scans don't trigger.
Detection artifacts & heuristics: what to look for
Below are robust indicators you can code into scanners, EDR rules, and host checks. Combine multiple signals (headers, entropy, file paths) to reduce false positives.
1. Literal PEM / OpenSSH headers
Look for the canonical markers:
-----BEGIN OPENSSH PRIVATE KEY-----
-----BEGIN RSA PRIVATE KEY-----
-----BEGIN EC PRIVATE KEY-----
-----BEGIN PRIVATE KEY-----
Also look for the ASCII marker openssh-key-v1, which may appear in binaries or hex dumps.
2. Long base64-like blocks
Detect long continuous base64 strings, which often indicate encoded keys or key blobs. Example pattern:
[A-Za-z0-9+/]{120,}={0,2}
(Adjust the threshold to your environment; 120–200 characters is a common starting point.)
3. High-entropy blobs
Private keys (or large encrypted blobs) exhibit high entropy. Measuring Shannon entropy on candidate strings/blobs is a useful second-stage filter once you find base64-like runs.
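To make the base64 and entropy checks concrete, here is a minimal sketch in plain Python; the length and entropy thresholds are assumptions to tune, and candidate.conf is a hypothetical input file. As an extra signal, it also decodes each candidate run: decoded OpenSSH keys start with the openssh-key-v1 magic, and larger DER-encoded keys (e.g. RSA in PKCS#1/PKCS#8 form) typically begin with an ASN.1 SEQUENCE header (0x30 0x82).

import base64
import math
import re

# Length threshold is an assumption; tune it for your environment.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{120,}={0,2}")

def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character."""
    if not s:
        return 0.0
    return -sum((n / len(s)) * math.log2(n / len(s))
                for n in (s.count(c) for c in set(s)))

def classify(run: str, entropy_threshold: float = 4.5):
    """Label a base64-like run, or return None if it looks benign."""
    ent = shannon_entropy(run)
    if ent < entropy_threshold:
        return None
    stripped = run.rstrip("=")
    try:
        decoded = base64.b64decode(stripped + "=" * (-len(stripped) % 4))
    except Exception:
        return f"high-entropy run (entropy {ent:.2f}), not decodable"
    # Decoded OpenSSH keys start with the 'openssh-key-v1' magic;
    # larger DER key bodies typically start with SEQUENCE + 2-byte length (0x30 0x82).
    if decoded.startswith(b"openssh-key-v1"):
        return "decodes to an OpenSSH private-key structure"
    if decoded[:2] == b"\x30\x82":
        return f"high-entropy run decoding to DER-like data (entropy {ent:.2f})"
    return f"high-entropy base64 run (entropy {ent:.2f})"

# Example: scan a single file (path is hypothetical)
text = open("candidate.conf", errors="ignore").read()
for m in BASE64_RUN.finditer(text):
    label = classify(m.group(0))
    if label:
        print(f"offset {m.start()}: {label}")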
4. Suspicious filenames & locations (heuristic)
Common names and locations worth scanning (a small path-heuristic sketch follows this list):
- Filenames: id_rsa, id_ed25519, private, server.key, key.pem, and .ppk files
- Paths: ~/.ssh/, /etc/ssh/, /root/.ssh/, /tmp/, /var/tmp/, application config directories, build contexts
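A minimal sketch of that heuristic in Python, using only the names and locations listed above; the sets are illustrative and should be extended (and allowlisted) for your environment.

from pathlib import Path

# Names, extensions, and locations mirroring the heuristics above; extend as needed.
SUSPICIOUS_NAMES = {"id_rsa", "id_ed25519", "private", "server.key", "key.pem"}
SUSPICIOUS_SUFFIXES = {".pem", ".key", ".ppk"}
SCAN_ROOTS = [Path.home() / ".ssh", Path("/etc/ssh"), Path("/root/.ssh"),
              Path("/tmp"), Path("/var/tmp")]

def suspicious_paths(roots):
    """Yield paths whose name or extension matches the heuristics."""
    for root in roots:
        try:
            if not root.is_dir():
                continue
            for p in root.rglob("*"):
                if p.name in SUSPICIOUS_NAMES or p.suffix.lower() in SUSPICIOUS_SUFFIXES:
                    yield p
        except OSError:
            continue  # unreadable root (e.g. /root/.ssh without privileges)

if __name__ == "__main__":
    for path in suspicious_paths(SCAN_ROOTS):
        print(f"[candidate] {path}")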
5. Git history indicators
Even if a file is removed in the latest commit, it might still exist in .git/objects. Scan history with specialized tools to find committed secrets.
6. Process / environment indicators
Look for environment variables or process arguments containing KEY, PRIVATE, SSH, TOKEN, or long base64 strings. Scan /proc/*/environ and unit files for suspicious content (requires appropriate privileges).
Common detection commands & scripts
Drop these into host scans, scheduled cron jobs, or EDR playbooks. Tweak paths, thresholds, and allowlists for your environment.
- Find literal PEM / OpenSSH headers:
grep -R --line-number -E "-----BEGIN .*PRIVATE KEY-----|openssh-key-v1" /path/to/scan 2>/dev/null
- Find long base64-like strings (requires rg / ripgrep; fall back to grep -P if needed):
rg --no-messages --hidden -U -n '[A-Za-z0-9+/]{200,}={0,2}' /path/to/scan
- Detect openssh-key-v1 in binaries or text via strings:
strings /path/to/file | rg -i "openssh-key-v1" && echo "/path/to/file: OpenSSH signature found"
- Hex-magic check with xxd (fast binary sniff; -p emits a continuous hex stream so the pattern is not split by xxd's column grouping):
xxd -p -l 32 file | grep -i '6f70656e7373682d6b65792d7631' && echo "openssh-key-v1 (hex) found in file"
# or simply: head -c 64 file | strings | rg -i "BEGIN OPENSSH"
- Check running processes for environment variables that look like keys:
for pid in /proc/[0-9]*; do
  envf="$pid/environ"
  [ -r "$envf" ] || continue
  tr '\0' '\n' < "$envf" | rg -n "PRIVATE|SSH|KEY|TOKEN|SECRET" && echo "pid $pid has suspicious env"
done
Git-history scanning
Use purpose-built tools (recommended):
gitleaks detect -s /path/to/repo --report-format json --report-path gitleaks-report.json
trufflehog --regex --entropy=True /path/to/repo
These tools will search both current tree and history for secrets.
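If you cannot install those tools, a rough fallback is to grep the tree of every commit reachable from any ref using stock git commands. A hedged sketch via Python's subprocess follows; it is slow on large histories and only catches plaintext markers, not encoded or split keys.

import subprocess

# Pattern kept simple on purpose; align it with the heuristics above.
PATTERN = r"-----BEGIN .*PRIVATE KEY-----|openssh-key-v1"

def scan_history(repo_path: str):
    """Run git grep over every commit reachable from any ref."""
    revs = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    for rev in revs:
        # git grep exits 1 when nothing matches, so no check=True here.
        hit = subprocess.run(
            ["git", "-C", repo_path, "grep", "-E", "-e", PATTERN, rev],
            capture_output=True, text=True,
        )
        if hit.returncode == 0:
            print(hit.stdout, end="")

if __name__ == "__main__":
    scan_history(".")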
YARA rule (host/EDR)
A compact YARA rule to flag PEM headers, OpenSSH signature, or long base64 runs:
rule PrivateKeyCandidates
{
    meta:
        author = "sec-team"
        purpose = "Detect potential private key material"
    strings:
        $pem1 = "-----BEGIN OPENSSH PRIVATE KEY-----" ascii
        $pem2 = "-----BEGIN RSA PRIVATE KEY-----" ascii
        $pem3 = "-----BEGIN PRIVATE KEY-----" ascii
        $openssh = "openssh-key-v1" ascii
        $long_base64 = /[A-Za-z0-9+\/]{120,}={0,2}/
    condition:
        any of ($pem*) or $openssh or $long_base64
}
Tune the $long_base64 length and combine with entropy checks where your YARA engine supports it.
Python program example
Finally, here is a standalone Python detector. It scans a directory (or single file), looks for PEM/OpenSSH headers, the openssh-key-v1 signature, and long base64-like runs, and applies a Shannon-entropy check to reduce false positives. It outputs findings as JSON lines for easy ingestion by a SIEM, EDR, or CI pipeline. Part of the script below has been generated using AI; review it carefully before using it in your environment.
#!/usr/bin/env python3
"""
detect_keys.py

Detect likely private-key material on disk.

Scans files under a given path for:
- PEM headers (BEGIN ... PRIVATE KEY)
- OpenSSH native signature: 'openssh-key-v1'
- Long base64-like runs (configurable threshold)
- Entropy on candidate base64 runs (configurable threshold)

Outputs JSON lines, each with: path, type(s), evidence, sample, score (entropy or -1).

Usage:
  python3 detect_keys.py /path/to/scan --base64_min 150 --entropy 4.5 --max_size 10485760

Notes:
- Tune --base64_min and --entropy for your environment.
- Default: skip files > 10 MB to avoid scanning large binaries; adjust with --max_size.
- Requires Python 3.8+ (uses pathlib).
"""

import argparse
import json
import math
import re
import sys
from pathlib import Path

# Regex patterns
PEM_HEADER_RE = re.compile(rb"-----BEGIN [A-Z0-9 ]*PRIVATE KEY-----")
OPENSSH_SIG_RE = re.compile(rb"openssh-key-v1")
# base64-like run (no newlines required); length threshold set via CLI.
# Braces are doubled so str.format() only substitutes {min_len}.
BASE64_RUN_RE_TEMPLATE = r"([A-Za-z0-9+/]{{{min_len},}}={{0,2}})"


# Utility functions
def shannon_entropy_bytes(bs: bytes) -> float:
    if not bs:
        return 0.0
    counts = {}
    for b in bs:
        counts[b] = counts.get(b, 0) + 1
    length = len(bs)
    ent = 0.0
    for cnt in counts.values():
        p = cnt / length
        ent -= p * math.log2(p)
    return ent


def find_base64_runs(bs: bytes, min_len: int):
    """Return list of (match_bytes, start, end) for base64-like runs."""
    pattern = re.compile(BASE64_RUN_RE_TEMPLATE.format(min_len=min_len).encode("ascii"))
    return [(m.group(1), m.start(1), m.end(1)) for m in pattern.finditer(bs)]


def scan_file(path: Path, cfg):
    """Scan a single file and return findings (list)."""
    findings = []
    try:
        size = path.stat().st_size
        if cfg.max_size and size > cfg.max_size:
            return findings  # skip very large files
        # Read up to cfg.read_limit bytes (to avoid OOM on huge files),
        # but ensure we cover headers/base64 occurrences near start of file.
        with path.open("rb") as fh:
            raw = fh.read(cfg.read_limit)
    except Exception:
        # unreadable files (permission, device, etc.) - ignore or log externally
        return findings

    # 1) PEM header
    pem_match = PEM_HEADER_RE.search(raw)
    if pem_match:
        findings.append({
            "type": "PEM_HEADER",
            "desc": "PEM header found (BEGIN ... PRIVATE KEY)",
            "sample": pem_match.group(0).decode(errors="ignore"),
            "score": -1,
        })

    # 2) OpenSSH signature
    if OPENSSH_SIG_RE.search(raw):
        findings.append({
            "type": "OPENSSH_SIGNATURE",
            "desc": "OpenSSH native signature 'openssh-key-v1' found",
            "sample": "openssh-key-v1",
            "score": -1,
        })

    # 3) Long base64 runs (candidate)
    base64_runs = find_base64_runs(raw, cfg.base64_min)
    for match_bytes, s, e in base64_runs:
        # compute entropy on the matched bytes (approx: treat match_bytes as raw ASCII)
        ent = shannon_entropy_bytes(match_bytes)
        sample = (match_bytes[:200].decode("ascii", errors="ignore")
                  + ("..." if len(match_bytes) > 200 else ""))
        # Entropy threshold often ~4.0-4.8 for meaningful data; CLI controls it
        if ent >= cfg.entropy_threshold:
            findings.append({
                "type": "BASE64_HIGH_ENTROPY",
                "desc": f"Long base64-like run ({len(match_bytes)} chars) with entropy {ent:.2f}",
                "sample": sample,
                "score": ent,
            })
        elif cfg.report_low_entropy:
            # optional: report long base64s that don't meet entropy threshold as low-priority
            findings.append({
                "type": "BASE64_LOW_ENTROPY",
                "desc": f"Long base64-like run ({len(match_bytes)} chars) but low entropy {ent:.2f}",
                "sample": sample,
                "score": ent,
            })

    return findings


def walk_and_scan(root: Path, cfg):
    """Walk filesystem and scan files. Yields (path, findings) for matches."""
    # optional path whitelist/blacklist logic can be added here
    files_checked = 0
    for p in root.rglob("*"):
        if p.is_symlink() and not cfg.follow_symlinks:
            continue
        if p.is_dir():
            continue
        files_checked += 1
        if cfg.verbose and files_checked % 1000 == 0:
            print(f"[+] scanned {files_checked} files...", file=sys.stderr)
        # skip some likely-noise filetypes by extension (images, videos) optionally
        if cfg.skip_exts and p.suffix.lower() in cfg.skip_exts:
            continue
        findings = scan_file(p, cfg)
        if findings:
            yield p, findings
    if cfg.verbose:
        print(f"[+] done scanning {files_checked} files", file=sys.stderr)


def parse_args():
    ap = argparse.ArgumentParser(description="Detect likely private key material on disk.")
    ap.add_argument("path", type=str, help="Path (file or directory) to scan.")
    ap.add_argument("--base64_min", type=int, default=150,
                    help="Minimum length for base64-like runs to consider (default: 150).")
    ap.add_argument("--entropy", type=float, dest="entropy_threshold", default=4.5,
                    help="Entropy threshold for base64 runs (default: 4.5).")
    ap.add_argument("--read_limit", type=int, default=262144,  # 256 KiB
                    help="How many bytes to read from each file (default: 262144).")
    ap.add_argument("--max_size", type=int, default=10 * 1024 * 1024,
                    help="Skip files larger than this (bytes). Set 0 for no limit. Default 10MB.")
    ap.add_argument("--report_low_entropy", action="store_true",
                    help="Also report long base64 runs that fail entropy threshold (lower priority).")
    ap.add_argument("--follow_symlinks", action="store_true",
                    help="Follow symlinks while walking.")
    ap.add_argument("--skip_exts", nargs="*",
                    default=[".jpg", ".jpeg", ".png", ".gif", ".mp4", ".mkv", ".iso", ".exe", ".dll"],
                    help="Skip files with these extensions (default list). Use an empty string to disable.")
    ap.add_argument("--verbose", action="store_true",
                    help="Verbose progress to stderr.")
    ap.add_argument("--jsonl", action="store_true",
                    help="Output JSON lines instead of the human-friendly format.")
    return ap.parse_args()


def main():
    cfg = parse_args()
    # normalise skip_exts so "--skip_exts ''" really disables the filter
    cfg.skip_exts = {e.lower() for e in (cfg.skip_exts or []) if e}
    root = Path(cfg.path)
    if not root.exists():
        print(f"Path not found: {root}", file=sys.stderr)
        sys.exit(2)

    # support scanning a single file as well as a directory tree
    if root.is_file():
        file_findings = scan_file(root, cfg)
        results = [(root, file_findings)] if file_findings else []
    else:
        results = walk_and_scan(root, cfg)

    for path, findings in results:
        out = {
            "path": str(path),
            "size": path.stat().st_size,
            "findings": findings,
        }
        if cfg.jsonl:
            print(json.dumps(out, ensure_ascii=False))
        else:
            # human-friendly
            print("=" * 80)
            print(f"FILE: {out['path']} (size={out['size']})")
            for f in findings:
                print(f"- {f['type']}: {f['desc']}")
                sample = f.get("sample")
                if sample:
                    print("  sample:", sample)
                score = f.get("score")
                if score is not None:
                    print("  score:", score)
            print()


if __name__ == "__main__":
    main()
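As a small usage follow-up, here is a hypothetical triage helper (the filenames findings.jsonl and triage.py are illustrative) that reads the JSON-lines report produced with --jsonl and prints flagged files with the highest-scoring findings first; header/signature hits (score -1) are ranked above entropy-based hits.

import json
import sys

# Usage (filenames are hypothetical):
#   python3 detect_keys.py /path/to/scan --jsonl > findings.jsonl
#   python3 triage.py findings.jsonl

def load_records(report_path):
    """Yield one dict per JSON line in the detect_keys.py report."""
    with open(report_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

def main():
    rows = []
    for record in load_records(sys.argv[1]):
        for finding in record["findings"]:
            score = finding.get("score", -1)
            # PEM/OpenSSH hits carry score -1; rank them above entropy-based hits.
            priority = float("inf") if score == -1 else float(score)
            rows.append((priority, record["path"], finding["type"], finding["desc"]))
    for priority, path, ftype, desc in sorted(rows, reverse=True):
        print(f"{path}\t{ftype}\t{desc}")

if __name__ == "__main__":
    main()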