## What Is Text Diffing?
Text diffing (differencing) is the process of comparing two versions of text to identify what has changed between them. The term comes from the Unix diff command, which has been a fundamental developer tool since 1974.
Text diffing is used in version control systems (git diff), code review tools, wiki page histories, document collaboration, and any other context where tracking changes over time matters.
## How Diff Algorithms Work
### The Longest Common Subsequence (LCS)
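The foundation of most diff algorithms is the Longest Common Subsequence (LCS): the longest sequence of elements that appear in both texts in the same relative order, though not necessarily contiguously. Everything outside the LCS is reported either as a deletion (present only in the old text) or as an insertion (present only in the new text).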
For the two strings "ABCDE" and "ACBDE":
- One LCS is "ABDE" (length 4); "ACDE" is equally long, so the LCS is not always unique
- Using "ABDE", the diff reads: A unchanged, C inserted, B unchanged, C deleted, D and E unchanged
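A minimal sketch of the classic dynamic-programming approach to LCS, written in Python for illustration (the function names `lcs_table` and `lcs_diff` are hypothetical, not taken from any particular tool). It fills a table of LCS lengths for every pair of suffixes, then walks the table once to emit one operation per character: `=` for kept, `-` for deleted, `+` for inserted. It needs O(nm) time and memory, which is why production tools prefer faster algorithms such as Myers'.

```python
def lcs_table(a: str, b: str) -> list[list[int]]:
    """Return dp where dp[i][j] is the LCS length of the suffixes a[i:] and b[j:]."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def lcs_diff(a: str, b: str) -> list[tuple[str, str]]:
    """Turn an LCS into a diff: ('=', ch) kept, ('-', ch) deleted, ('+', ch) inserted."""
    dp = lcs_table(a, b)
    i = j = 0
    ops: list[tuple[str, str]] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("=", a[i]))  # matching characters belong to the LCS
            i += 1
            j += 1
        elif dp[i][j + 1] >= dp[i + 1][j]:
            ops.append(("+", b[j]))  # skipping b[j] keeps at least as long an LCS: insert it
            j += 1
        else:
            ops.append(("-", a[i]))  # skipping a[i] keeps a longer LCS: delete it
            i += 1
    ops.extend(("-", ch) for ch in a[i:])  # leftovers in a were deleted
    ops.extend(("+", ch) for ch in b[j:])  # leftovers in b were inserted
    return ops


ops = lcs_diff("ABCDE", "ACBDE")
print(" ".join(op + ch for op, ch in ops))         # =A +C =B -C =D =E
print("".join(ch for op, ch in ops if op == "="))  # ABDE
```

The tie-break in the `elif` decides which of several equally long subsequences is chosen: preferring deletion on ties instead selects "ACDE" for the same pair of strings.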
### Myers Diff Algorithm
The Myers diff algorithm (1986), the default in Git and used by many other tools, finds the shortest edit script (the minimum number of insertions and deletions) that transforms one text into another. It runs in O(ND) time, where N is the combined length of the two inputs and D is the size of that shortest edit script, so it is fastest when the two versions are similar.
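The core idea can be sketched in Python as the greedy forward search from Myers' paper. This is a simplified illustration (`myers_distance` is a hypothetical name): it returns only D, not the edit script itself, and it is not Git's actual implementation, whose xdiff code is considerably more elaborate. For each edit count d, the array `v` records the furthest point reached on every diagonal k = x - y of the edit graph; runs of matching elements ("snakes") are followed for free, so the search only pays for insertions and deletions.

```python
from typing import Sequence


def myers_distance(a: Sequence, b: Sequence) -> int:
    """Return D, the minimum number of insertions plus deletions turning a into b."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d  # diagonal k may be negative, so it is stored at v[k + offset]
    # v[k + offset] holds the furthest x reached so far on diagonal k = x - y.
    v = [0] * (2 * max_d + 2)
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1 + offset] < v[k + 1 + offset]):
                x = v[k + 1 + offset]  # step down from diagonal k+1: an insertion
            else:
                x = v[k - 1 + offset] + 1  # step right from diagonal k-1: a deletion
            y = x - k
            # Follow the snake: diagonal moves over matching elements cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k + offset] = x
            if x >= n and y >= m:
                return d  # reached the end of both inputs using d edits
    return max_d  # not reached in practice: d = n + m always suffices


print(myers_distance("ABCDE", "ACBDE"))     # 2
print(myers_distance("ABCABBA", "CBABAC"))  # 5
```

Because elements are only compared with `==`, the same function works on lists of lines, which is how a line-level diff would use it.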
### Word-Level vs. Line-Level vs. Character-Level
Different granularities serve different purposes:
- **Line-level diff:** Standard for source code; shows entire modified lines. Best for code review.
- **Word-level diff:** Shows which specific words changed within a line. Better for prose editing.
- **Character-level diff:** Shows the exact characters that changed. Best for detecting typos or small modifications.
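To see how granularity changes the result, the sketch below splits the same pair of strings into lines, words, or characters and reports only the changed spans. It relies on Python's standard `difflib.SequenceMatcher` for the matching step; note that difflib uses its own longest-matching-block heuristic rather than LCS or Myers, so on larger inputs its output can differ from git diff. The helper names `tokenize` and `changed_spans` are hypothetical.

```python
import difflib
import re


def tokenize(text: str, mode: str) -> list[str]:
    """Split text into comparison units for the chosen granularity."""
    if mode == "line":
        return text.splitlines(keepends=True)
    if mode == "word":
        # Whitespace runs are kept as tokens so "".join(tokens) rebuilds the text.
        return re.findall(r"\s+|\S+", text)
    if mode == "char":
        return list(text)
    raise ValueError(f"unknown mode: {mode!r}")


def changed_spans(old: str, new: str, mode: str) -> list[tuple[str, str, str]]:
    """Return (tag, old_text, new_text) for every non-equal region."""
    a, b = tokenize(old, mode), tokenize(new, mode)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, "".join(a[i1:i2]), "".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old, new = "The quick brown fox", "The quick brwn fox"
for mode in ("line", "word", "char"):
    print(mode, changed_spans(old, new, mode))
# line [('replace', 'The quick brown fox', 'The quick brwn fox')]
# word [('replace', 'brown', 'brwn')]
# char [('delete', 'o', '')]
```

The same one-letter typo is reported as a whole changed line, a single changed word, or a single deleted character, depending only on the tokenizer.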
## Unified Diff Format
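The unified diff format (the format produced by git diff and diff -u) looks like this:

```diff
--- a/original.txt
+++ b/modified.txt
@@ -10,7 +10,8 @@
 Context line (unchanged)
 Context line (unchanged)
 Context line (unchanged)
-Deleted line
+New replacement line
+Another added line
 Context line (unchanged)
 Context line (unchanged)
 Context line (unchanged)
```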
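Line indicators:
- Space: unchanged context line
- Minus (-): line removed in the new version
- Plus (+): line added in the new version
- @@: hunk header; -10,7 means the hunk starts at line 10 of the old file and covers 7 lines, and +10,8 means it starts at line 10 of the new file and covers 8 lines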
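Python's standard `difflib.unified_diff` emits this format, which makes it easy to experiment with. The sketch below generates a small unified diff and then parses a hunk header with a regular expression. `parse_hunk_header` and `HUNK_RE` are hypothetical helpers written for illustration; the pattern ignores any text after the closing @@, where Git may append context such as an enclosing function name.

```python
import difflib
import re

old = ["alpha\n", "beta\n", "gamma\n", "delta\n"]
new = ["alpha\n", "beta\n", "GAMMA\n", "gamma2\n", "delta\n"]

# n=3 lines of context, which is also git diff's default.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="a/original.txt", tofile="b/modified.txt", n=3)
)
print("".join(diff_lines), end="")
# --- a/original.txt
# +++ b/modified.txt
# @@ -1,4 +1,5 @@
#  alpha
#  beta
# -gamma
# +GAMMA
# +gamma2
#  delta

# "@@ -start,count +start,count @@"; a count of 1 may be omitted.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count) from a hunk header."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (int(old_start), int(old_count or 1), int(new_start), int(new_count or 1))


print(parse_hunk_header("@@ -10,7 +10,8 @@"))  # (10, 7, 10, 8)
```

The counts always agree with the hunk body: the old count equals the number of context and "-" lines (2 + 1 + 1 = 4 here), and the new count equals the number of context and "+" lines (2 + 2 + 1 = 5).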
## Three-Way Merge
When merging changes from two branches, a three-way merge compares:
- The original base version
- Version A (your changes)
- Version B (their changes)
Changes are classified as:
- Non-conflicting: Only one side modified a given region (or both made the identical change), so it is merged automatically
- Conflicting: Both sides modified the same region in different ways, so it requires manual resolution
Git marks conflicts like this:

```
<<<<<<< HEAD
Your version of the code
=======
Their version of the code
>>>>>>> feature-branch
```
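The classification rules for three-way merges can be sketched per line in Python. This is a deliberately simplified model: it assumes the base, ours, and theirs versions have the same number of lines and that line i corresponds across all three, so nothing is inserted or deleted. Real merge tools, Git included, first diff the base against each side to align regions of different sizes. `merge_line` and `merge_aligned` are hypothetical names, and the markers only imitate Git's conflict layout.

```python
from typing import Optional


def merge_line(base: str, ours: str, theirs: str) -> Optional[str]:
    """Merge one aligned line; return None when both sides changed it differently."""
    if ours == theirs:
        return ours    # unchanged on both sides, or the same edit on both
    if ours == base:
        return theirs  # only their side changed it
    if theirs == base:
        return ours    # only our side changed it
    return None        # conflict: needs manual resolution


def merge_aligned(base: list[str], ours: list[str], theirs: list[str],
                  their_label: str = "feature-branch") -> tuple[list[str], int]:
    """Line-by-line three-way merge; assumes equal-length, line-aligned inputs."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles line-aligned inputs of equal length")
    merged: list[str] = []
    conflicts = 0
    for b, o, t in zip(base, ours, theirs):
        line = merge_line(b, o, t)
        if line is None:
            conflicts += 1
            merged += ["<<<<<<< HEAD", o, "=======", t, f">>>>>>> {their_label}"]
        else:
            merged.append(line)
    return merged, conflicts


base = ["x = 1", "y = 2", "z = 3"]
ours = ["x = 10", "y = 2", "z = 30"]
theirs = ["x = 1", "y = 20", "z = 300"]
result, n_conflicts = merge_aligned(base, ours, theirs)
print("\n".join(result))
print(n_conflicts)  # 1
```

Here the x and y lines merge cleanly because only one side touched each, while the z line becomes a conflict block. Unlike Git, this sketch emits a separate block for every conflicting line instead of grouping adjacent ones.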
## Practical Applications
### Code Review
Before merging a pull request, developers review the diff to understand what changed, why, and whether there are any issues.
### Documentation
Wikis and document management systems (Confluence, Google Docs revision history) show diffs between versions.
### Configuration Management
Infrastructure teams diff configuration files before deploying changes to production.
### Legal and Academic
Lawyers compare contract amendments, and academics compare paper revisions, to see exactly what changed between versions.
## Using Diff Tools
**Command line:**
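```bash
diff original.txt modified.txt      # Default ("normal") output format
diff -u original.txt modified.txt   # Unified format
git diff HEAD~1                     # Working tree vs. the commit before HEAD (last commit plus uncommitted changes)
git diff main feature-branch        # Between the tips of two branches
```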
**Code editors:** VS Code, Sublime Text, and IntelliJ IDEA all have built-in diff viewers.

**Online tools:** For quick comparisons without installing software.
## Using This Tool
Paste two versions of text into the left and right panels. The tool highlights additions in green and deletions in red, shows unchanged text alongside them, and lets you choose the comparison mode: characters, words, or lines.