
Supported Formats

SAR Portal supports a wide range of document formats for upload, AI analysis, and redaction: 28 file extensions across 6 categories, plus upload and storage of legacy .doc and .xls files.

Fully Supported Formats

These formats have full support for upload, AI analysis, and visual redaction:

PDF Documents

| Aspect | Support |
|---|---|
| Extension | .pdf |
| Text extraction | Full |
| OCR (scanned PDFs) | Yes |
| Visual redaction | Yes |
| Searchable output | Yes |
| Max size | 50 MB |
| Max pages | ~500 |

Best for: Reports, contracts, correspondence, exported records

Microsoft Word

| Aspect | Support |
|---|---|
| Formats | .docx, .dotx, .docm, .dotm |
| Text extraction | Full |
| Visual redaction | Yes |
| Formatting preserved | Yes |
| Headers/footers | Redacted |
| Tables & comments | Redacted |
| Track changes | Accepted and redacted |
| Metadata | Stripped on redaction |
| Max size | 50 MB |

Best for: Letters, policies, contracts, memos, templates

Legacy .doc format
Legacy .doc files can be uploaded and stored but require conversion to .docx for full redaction support. We recommend converting to .docx before uploading.

Microsoft Excel

| Aspect | Support |
|---|---|
| Formats | .xlsx, .xlsm, .xltx, .xltm |
| Text extraction | Full |
| Cell-level redaction | Yes |
| Formulas preserved | Yes (in unredacted cells) |
| Multiple worksheets | Yes |
| Cell comments | Redacted |
| Metadata | Stripped on redaction |
| Max size | 50 MB |

Best for: Data exports, spreadsheets, reports, logs, templates

Legacy .xls format
Legacy .xls files can be uploaded and stored but require conversion to .xlsx for full redaction support. We recommend converting to .xlsx before uploading.
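If you have many legacy files, the conversion can be scripted before upload. The sketch below assumes LibreOffice is installed and its `soffice` binary is on your PATH; LibreOffice is not part of SAR Portal, and any converter that produces .docx/.xlsx would do. The function names `build_convert_command` and `convert_legacy` are illustrative.

```python
import subprocess
from pathlib import Path

# Legacy extension -> modern format passed to LibreOffice's --convert-to option.
TARGETS = {".doc": "docx", ".xls": "xlsx"}


def build_convert_command(src: Path, outdir: Path) -> list[str]:
    """Build the headless LibreOffice command that converts one legacy file."""
    target = TARGETS.get(src.suffix.lower())
    if target is None:
        raise ValueError(f"not a legacy .doc/.xls file: {src}")
    return ["soffice", "--headless", "--convert-to", target,
            "--outdir", str(outdir), str(src)]


def convert_legacy(folder: Path, outdir: Path) -> list[Path]:
    """Convert every .doc/.xls file in `folder`; return the source files converted."""
    outdir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(folder.iterdir()):
        if src.is_file() and src.suffix.lower() in TARGETS:
            # check=True raises CalledProcessError if soffice exits with an error.
            subprocess.run(build_convert_command(src, outdir), check=True)
            converted.append(src)
    return converted
```

For example, `convert_legacy(Path("legacy"), Path("converted"))` writes .docx and .xlsx copies into `converted/`, which you can then upload in place of the originals.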

Images

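| Format | Extension | OCR Support | Redaction |
|---|---|---|---|
| PNG | .png | Yes | Yes |
| JPEG | .jpg, .jpeg | Yes | Yes |
| GIF | .gif | Yes | Yes |
| BMP | .bmp | Yes | Yes |
| TIFF | .tiff, .tif | Yes | Yes |
| WebP | .webp | Yes | Yes |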

Best for: Scanned documents, screenshots, photographs, ID verification

Email Files

| Aspect | Support |
|---|---|
| Formats | .eml, .msg |
| Text extraction | Full (headers + body) |
| Header redaction | Yes (To, From, CC, Subject) |
| Body redaction | Yes |
| Attachment handling | Extracted and processed separately |
| Output format | Redacted PDF |
| Max size | 50 MB |

Best for: Correspondence, DSAR communications, email trails, Outlook exports

MSG auto-conversion
Outlook .msg files are automatically converted to .eml format during upload for processing. No manual conversion needed.
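To preview which headers an email carries and which attachments would be split out for separate processing, you can inspect an .eml file locally with Python's standard `email` package. This is a sketch for local inspection only: it does not reproduce SAR Portal's own parser, it reads .eml only (the portal converts .msg for you on upload), and the header list simply mirrors the table above. The function name `summarize_eml` is illustrative.

```python
from email import policy
from email.parser import BytesParser

# Headers the Email Files table lists as redactable.
REDACTABLE_HEADERS = ("To", "From", "Cc", "Subject")


def summarize_eml(raw: bytes) -> dict:
    """Return the redactable headers, plain-text body, and attachment file names."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Header lookup is case-insensitive; absent headers return None and are skipped.
    headers = {h: str(msg[h]) for h in REDACTABLE_HEADERS if msg[h] is not None}
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    # iter_attachments skips the main body part and yields the remaining parts.
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {"headers": headers, "body": body, "attachments": attachments}
```

Usage: `summarize_eml(Path("reply.eml").read_bytes())` returns a dictionary you can review before uploading.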

Text-Based Formats

These formats support text extraction and text-based redaction. The redacted output is generated as a formatted PDF:

| Format | Extension | Structure Preserved | Redaction |
|---|---|---|---|
| Plain text | .txt | N/A | Text replacement |
| CSV | .csv | Row/column structure | Value replacement |
| Log files | .log | N/A | Text replacement |
| Markdown | .md | N/A | Text replacement |
| JSON | .json | JSON structure | Value replacement |
| XML | .xml | XML structure | Value replacement |
| HTML | .html | N/A | Text replacement |
| CSS | .css | N/A | Text replacement |
| JavaScript | .js | N/A | Text replacement |

Best for: System exports, configuration files, data dumps, log files, API responses

Structured format handling
JSON, XML, and CSV files receive format-aware redaction that preserves document structure while replacing PII values. Other text formats use standard text replacement.
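The difference between value replacement and plain text replacement can be illustrated with Python's standard `json` and `csv` modules. This is a minimal sketch, not SAR Portal's implementation: it assumes you already know which JSON keys and CSV columns hold personal data (the `PII_KEYS` set and the `[REDACTED]` marker are hypothetical), whereas the portal identifies PII through AI analysis.

```python
import csv
import io
import json

MARKER = "[REDACTED]"                  # hypothetical replacement token
PII_KEYS = {"name", "email", "phone"}  # hypothetical PII field names


def redact_json(value, pii_keys=PII_KEYS):
    """Recursively replace values of PII keys, keeping keys, nesting, and order."""
    if isinstance(value, dict):
        return {k: MARKER if k in pii_keys else redact_json(v, pii_keys)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact_json(item, pii_keys) for item in value]
    return value


def redact_json_text(text: str) -> str:
    """Parse, redact, and re-serialize a JSON document."""
    return json.dumps(redact_json(json.loads(text)), indent=2)


def redact_csv(text: str, pii_columns=PII_KEYS) -> str:
    """Replace values in PII columns; the header row and row count are unchanged."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({k: MARKER if k in pii_columns else v for k, v in row.items()})
    return out.getvalue()
```

Because only values change, the redacted JSON still parses and the redacted CSV keeps its columns, which is the property the portal's format-aware redaction is described as preserving.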

Complete Format Reference

| Category | Extensions | Count | AI Analysis | Visual Redaction |
|---|---|---|---|---|
| PDF | .pdf | 1 | Yes | Yes |
| Word | .docx, .dotx, .docm, .dotm | 4 | Yes | Yes |
| Excel | .xlsx, .xlsm, .xltx, .xltm | 4 | Yes | Yes |
| Images | .png, .jpg, .jpeg, .gif, .bmp, .tiff, .tif, .webp | 8 | Yes (OCR) | Yes |
| Email | .eml, .msg | 2 | Yes | Yes (PDF output) |
| Text | .txt, .csv, .log, .md, .json, .xml, .html, .css, .js | 9 | Yes | Text replacement |
| Total | | 28+ | | |

Format-Specific Considerations

PDF Files

Searchable PDFs (text-based)

Scanned PDFs (image-based)

Tips for best results:

Word Documents

Compatibility

What’s Extracted and Redacted

Excel Files

Processing

Redaction

Images

OCR Processing

Best Practices

Email Files

EML Processing

MSG Processing

Text-Based Formats

Format-Aware Redaction

Encoding

Unsupported Formats

The following formats are not currently supported for redaction:

| Format | Reason | Workaround |
|---|---|---|
| Password-protected files | Cannot access content | Remove password, then upload |
| Encrypted documents | Cannot decrypt | Decrypt first, then upload |
| .pst (Outlook archives) | Archive format | Export individual emails as .msg or .eml |
| Database files (.mdb, .sqlite) | Binary format | Export to CSV or Excel |
| Compressed archives (.zip, .rar) | Container format | Extract contents, upload individual files |
| Video/audio files | Not applicable | Extract transcripts as text |
| PowerPoint (.pptx) | Not yet supported | Export to PDF |
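For the compressed-archive workaround, a short script can extract a .zip and sort its contents into files you can upload directly and files that need another workaround from the table above. This sketch uses Python's standard `zipfile` module, so it handles .zip only, not .rar; the extension set is copied from the Complete Format Reference table, and the function name `triage_zip` is illustrative.

```python
import zipfile
from pathlib import PurePosixPath

# The 28 extensions from the Complete Format Reference table.
SUPPORTED = {
    ".pdf", ".docx", ".dotx", ".docm", ".dotm",
    ".xlsx", ".xlsm", ".xltx", ".xltm",
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".tif", ".webp",
    ".eml", ".msg",
    ".txt", ".csv", ".log", ".md", ".json", ".xml", ".html", ".css", ".js",
}


def triage_zip(source, dest=None) -> tuple[list[str], list[str]]:
    """Return (uploadable, needs_attention) member names; optionally extract to dest."""
    uploadable, needs_attention = [], []
    with zipfile.ZipFile(source) as archive:
        if dest is not None:
            # zipfile strips absolute paths and ".." components when extracting.
            archive.extractall(dest)
        for info in archive.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower()
            (uploadable if suffix in SUPPORTED else needs_attention).append(info.filename)
    return uploadable, needs_attention
```

Files in the second list (for example a nested .sqlite export) can then be handled with the matching workaround before upload.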

File Size Limits

| Plan | Per-File Limit | Total Storage |
|---|---|---|
| Basic | 50 MB | 5 GB |
| Starter | 50 MB | 50 GB |
| Pro | 50 MB | 200 GB |

Handling Large Files

If a file exceeds limits:

  1. Split into smaller sections
  2. Compress images (maintain quality)
  3. Remove unnecessary content
  4. Contact support for special cases
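For text exports such as CSV, the "split into smaller sections" step can be scripted so that each part stays under the per-file limit and repeats the header row. A minimal standard-library sketch; the 45 MB default budget (headroom below the 50 MB limit), the function name `split_csv`, and the assumption that no single row exceeds the budget are all illustrative choices, not portal requirements.

```python
import csv
import io

LIMIT_BYTES = 45 * 1024 * 1024  # assumed safety margin below the 50 MB per-file limit


def _encode(rows: list[list[str]]) -> bytes:
    """Serialize rows as UTF-8 CSV bytes with \\n line endings."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")


def split_csv(text: str, limit: int = LIMIT_BYTES) -> list[bytes]:
    """Split CSV text into parts of at most `limit` bytes, each starting with the header."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    header_bytes = _encode([header])
    parts, current, size = [], [], len(header_bytes)
    for row in body:
        row_size = len(_encode([row]))
        # Start a new part when adding this row would exceed the budget.
        if current and size + row_size > limit:
            parts.append(header_bytes + _encode(current))
            current, size = [], len(header_bytes)
        current.append(row)
        size += row_size
    if current:
        parts.append(header_bytes + _encode(current))
    return parts
```

Each returned part is a complete CSV file that can be written to disk and uploaded on its own.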

Upload Validation

All uploads are validated:

Security Checks

Rejection Reasons

“Invalid file type”

“File too large”
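A client-side pre-check can catch both rejection reasons before a file is sent. This sketch is an approximation under stated assumptions: it checks only the file extension and the 50 MB per-file limit from the tables above (taking 1 MB as 1,048,576 bytes), it treats legacy .doc/.xls as accepted for storage with a warning, and the function name `precheck` and its messages are illustrative. The portal's own validation and security checks may be stricter, for example by inspecting file contents rather than the extension.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # 50 MB per-file limit (same on all plans); MB assumed binary

# The 28 extensions from the Complete Format Reference table.
SUPPORTED = {
    ".pdf", ".docx", ".dotx", ".docm", ".dotm",
    ".xlsx", ".xlsm", ".xltx", ".xltm",
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".tif", ".webp",
    ".eml", ".msg",
    ".txt", ".csv", ".log", ".md", ".json", ".xml", ".html", ".css", ".js",
}
# Legacy formats: stored on upload, but need conversion for full redaction.
LEGACY = {".doc": ".docx", ".xls": ".xlsx"}


def precheck(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, message), mirroring the two documented rejection reasons."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED and suffix not in LEGACY:
        return False, "Invalid file type"
    if size_bytes > MAX_BYTES:
        return False, "File too large"
    if suffix in LEGACY:
        return True, f"Accepted for storage; convert to {LEGACY[suffix]} for full redaction"
    return True, "OK"
```

Usage: `precheck(p.name, p.stat().st_size)` for a local `Path` `p` flags problem files before you start an upload batch.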

Recommendations by Use Case

Access Requests (Article 15)

Erasure Requests (Article 17)

Redaction Priority