Is an agentic approach necessary in this context?
Fig-Priv vs. current SOTA models
Original Image: Raw photo, no processing applied.
Gemini 2.5 Output: Masking is completely misaligned.
GPT-4o Output: Masks only a few small regions; the masks are misaligned and not all text is identified.
GPTo3 Output: Thought for 2m 28s. Returned a text-only answer and failed to identify the text.
MistralOCR Output: Full OCR transcript returned as plain text, without spatial localization.
Full Fine-Grained (ours): Our fine-grained method masks *every* field with tight boxes.
⭐ Fig-Priv (ours): Fig-Priv removes only the highest-risk sections, preserving non-identifiable content.
Original Image: Raw photo, no processing applied.
Gemini 2.5 Output: Masking is completely misaligned.
GPT-4o Output: Unable to identify the text.
GPTo3 Output: Thought for 4m 52s. Results in blanket masking, yet some text is still visible.
MistralOCR Output: Unable to identify the text.
Full Fine-Grained (ours): Our fine-grained method masks *every* field with tight boxes.
⭐ Fig-Priv (ours): Fig-Priv removes only the highest-risk sections, preserving non-identifiable content.
Original Image: Raw photo, no processing applied.
Gemini 2.5 Output: Masking is completely misaligned.
GPT-4o Output: Unable to identify the text.
GPTo3 Output: Thought for 3m 58s. Results in fine-grained masking, but with larger masks.
MistralOCR Output: Unable to identify the text.
Full Fine-Grained (ours): Our fine-grained method masks *every* field with tight boxes.
⭐ Fig-Priv (ours): Fig-Priv preserves the entire image, as no PII content is present.
Original Image: Raw photo, no processing applied.
Gemini 2.5 Output: Masking is completely misaligned.
GPT-4o Output: Unable to identify the text.
GPTo3 Output: Thought for 7m 22s. Results in fine-grained masks, but some PII text is still visible.
MistralOCR Output: Unable to identify the text.
Full Fine-Grained (ours): Our fine-grained method masks *every* field with tight boxes.
⭐ Fig-Priv (ours): Fig-Priv removes only the highest-risk sections, preserving non-identifiable content.
Original Image: Raw photo, no processing applied.
Gemini 2.5 Output: Masking is completely misaligned.
GPT-4o Output: Unable to identify the text.
GPTo3 Output: Thought for 3m 48s. Results in blanket masking.
MistralOCR Output: Able to identify some text, but most of the text extraction is inaccurate.
Full Fine-Grained (ours): Our fine-grained method masks *every* field with tight boxes.
⭐ Fig-Priv (ours): Fig-Priv removes only the highest-risk sections, preserving non-identifiable content.
Original Image: Raw photo, no processing applied.
Gemini 2.5 Output: Masking is completely misaligned.
GPT-4o Output: Unable to identify the text.
GPTo3 Output: Thought for 3m 3s. Results in blanket masking.
MistralOCR Output: Text extraction is inaccurate.
Full Fine-Grained (ours): Our fine-grained method masks *every* field with tight boxes.
⭐ Fig-Priv (ours): Fig-Priv removes only the highest-risk sections, preserving non-identifiable content.