You reviewed the document. You removed the sensitive paragraph. You sent it.
What you didn’t remove was the 47 minutes of editing time logged by Microsoft Word, the name of the colleague who wrote the first draft, or the network path showing which shared drive it originally came from.
That information is still there. It’s called metadata, and most files you create carry it.
This isn’t a theoretical risk. In 2019, lawyers representing Paul Manafort, Donald Trump’s former campaign chairman, filed a court document in response to Special Counsel Robert Mueller’s team. Sensitive passages were covered with thick black bars. But the underlying text had never been deleted from the file. A reporter highlighted the blacked-out sections, copied them, and pasted them into a blank document. The hidden content, including details of Manafort’s meetings with a Russian operative, appeared instantly. The story was front-page news within hours.
This guide covers what’s hiding in each major file type, what it can reveal, and how to remove it before you share.
What is document metadata?
Metadata is data about data: information embedded in a file that describes how it was created, who created it, and what happened to it over time. It’s generated automatically by your operating system and the software you use, usually without any visible indication that it’s being recorded.
There are two categories relevant to most people:
System metadata (recorded by your OS and application): author name, creation date, last modified date, file size, software version used.
Embedded metadata (recorded by the application itself): revision history, comments, tracked changes, GPS coordinates (in photos), editing time, internal network paths.
Most of it is invisible in normal use. You’d have to open specific panels or use dedicated tools to see it. But anyone who knows where to look can often extract it from files you send them, and many people do.
Word documents (.docx)
Word is the most metadata-rich format in common use. A typical .docx file may contain:
- Author name and company: pulled from your Windows or Microsoft 365 account settings at the time the document was created
- Last modified by: the name of the most recent editor
- Revision count: how many times the document has been saved
- Total editing time: the cumulative time Word was open with this document active, in minutes
- Creation and modification timestamps
- Track changes: all edits made while tracking was active, including deleted text, insertions, and who made each change, unless every change was accepted or rejected before sending
- Comments: including resolved comments, which remain in the file
- Template path: the internal file path of the document template, which can reveal network share names or folder structures
- Previous author names: if a document was repurposed from an existing file, the original author’s name may persist
Why it matters: Track changes and comments are the most common source of accidental disclosure. Lawyers, consultants, and contract negotiators frequently exchange Word documents during drafting, and hiding the markup (for example by switching the view to “No Markup”) does not remove it from the file. Clicking “Accept All” does remove the tracked revisions. However, other metadata, including comments, document properties, and editing time, remains unless the file is explicitly cleaned with Document Inspector.
In one documented case, a law firm representing a client in a dispute sent a Word document to opposing counsel with tracked changes visible, and those changes contradicted the client’s stated position. The error was caught, but the damage to the client’s negotiating position was immediate.
How to check: In Word, go to File → Info → Check for Issues → Inspect Document. The Document Inspector will scan for all categories of hidden data and let you remove them selectively.
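For readers who prefer a scriptable check, here is a minimal sketch of reading the same properties without opening Word. It relies on the fact that a .docx file is a ZIP archive whose document properties are stored in the parts docProps/core.xml and docProps/app.xml. The function name `read_office_properties` is an illustrative choice, not part of any library, and the sketch only reads properties; it does not look for tracked changes or comments.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_office_properties(path_or_file):
    """Return {property name: value} from an Office Open XML file.

    Reads docProps/core.xml (author, last editor, revision, timestamps)
    and docProps/app.xml (editing time, template, company, application).
    Accepts a file path or a binary file-like object.
    """
    props = {}
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part not in names:
                continue  # a cleaned file may lack one or both parts
            root = ET.fromstring(archive.read(part))
            for element in root.iter():
                text = (element.text or "").strip()
                if text:
                    # ElementTree prefixes tags with "{namespace}"; keep the local name.
                    props[element.tag.rsplit("}", 1)[-1]] = text
    return props
```

Calling `read_office_properties("contract.docx")` (a hypothetical file name) returns a dictionary in which keys such as `creator`, `lastModifiedBy`, `revision`, `TotalTime`, `Template`, and `Company` appear when the file carries them. Because .xlsx and .pptx files use the same packaging, the same function should work on them too.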
PDFs
PDF is widely assumed to be a “clean” format: a final output that strips the messiness of Word. This is partially true, but PDFs still carry meaningful metadata:
- Author, title, subject, keywords: often auto-populated from the Word document used to create the PDF
- Creator application: the software used to create or convert the file (e.g., “Microsoft Word 16.0” or the name of a free online PDF converter)
- Creation and modification dates
- XMP metadata: an extended metadata standard embedded in many PDFs containing more detailed document history
- Annotations and comments: including those hidden or marked as resolved
- Improperly redacted text: black boxes drawn over text in PDF editors often leave the underlying text readable in the file’s structure
The redaction problem deserves special attention. In January 2019, lawyers for Paul Manafort filed a response to Special Counsel Robert Mueller’s team in a federal court. The document contained thick black bars over sensitive passages: descriptions of Manafort’s contacts with a Russian operative and details about the Trump campaign. But the legal team had only drawn black boxes over the text; they hadn’t deleted the underlying content from the file. Any reader could highlight the blacked-out sections, copy them, and paste them into a new document to read everything. Within hours of the filing becoming public, reporters had extracted and published the hidden text.
This type of error is surprisingly common. Drawing a black rectangle over text or changing its background color in a word processor does not reliably delete the text from the final PDF output; it often only covers it visually. Proper redaction requires dedicated tools that permanently scrub the underlying content from the file structure, not just obscure it. If you collect patient or client data and want to avoid PDF metadata risks entirely, converting your PDF into an online form keeps submissions in a structured, controlled environment from the start.
How to remove: In Adobe Acrobat, go to Tools → Redact → Sanitize Document to permanently remove all metadata. For a free option, printing to PDF (File → Print → Save as PDF) strips most metadata but not all, so verify with a metadata viewer before sharing sensitive documents.
Excel spreadsheets (.xlsx)
Excel spreadsheets carry metadata similar to Word, but with additional risks specific to the format:
- Author and company name
- Revision history and editing time
- Comments and notes: including those not visible in the current view
- Hidden rows and columns: data that’s been hidden using Excel’s hide function is still present in the file and fully accessible to anyone who unhides it
- Hidden sheets: entire worksheets can be hidden but remain in the file
- Named ranges and formulas: can expose internal data structures or calculation logic not intended to be shared
- External links: references to other files that can reveal internal network paths or cloud storage structures
Hidden rows and sheets are a particularly common source of accidental disclosure. A consultant preparing a client-facing pricing model might hide the cost and margin rows before sending, but those rows are still in the file. Unhiding them takes two clicks.
In competitive procurement processes, suppliers occasionally receive Excel-based RFP templates that, when inspected, contain hidden sheets with the buyer’s internal scoring criteria or target price ranges: information that was never meant to leave the buyer’s organization.
How to check: In Excel, go to File → Info → Check for Issues → Inspect Workbook. Pay particular attention to hidden rows, columns, and sheets.
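Hidden sheets can also be listed programmatically. The sketch below assumes a standard (transitional) .xlsx file, where the workbook part xl/workbook.xml lists every sheet and marks hidden ones with a `state` attribute of `hidden` or `veryHidden`. The function name `list_hidden_sheets` is hypothetical, and the sketch does not detect hidden rows or columns, which are recorded inside each individual worksheet part.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML schema (transitional flavor; an assumption
# about the input file, since "strict" files use a different namespace).
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_hidden_sheets(path_or_file):
    """Return a list of (sheet name, state) for sheets that are not visible."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    hidden = []
    for sheet in root.iter(MAIN_NS + "sheet"):
        # A missing "state" attribute means the sheet is visible.
        state = sheet.get("state", "visible")
        if state != "visible":
            hidden.append((sheet.get("name"), state))
    return hidden
```

An empty list means no sheet is flagged as hidden in the workbook part. It does not mean the file is clean, so Inspect Workbook is still worth running before sending.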
Images (JPG, PNG, HEIC)
Photos carry EXIF metadata, a standardized format for recording technical and contextual information about an image:
- GPS coordinates: latitude, longitude, and sometimes altitude, accurate to within a few meters on modern smartphones
- Timestamp: the exact date and time the photo was taken
- Device information: make, model, and sometimes serial number of the camera or phone
- Camera settings: aperture, shutter speed, ISO, focal length
- Software: editing software used and version number
GPS coordinates are the most consequential for most people. A photo of a document taken at home and sent via email carries the coordinates of your home embedded in the file. A photo taken at a confidential client meeting reveals where that meeting occurred. In healthcare settings, photos of patient documents are subject to HIPAA; see what HIPAA actually requires from the tools you use to collect patient data.
The risk is documented and real. In 2012, John McAfee, then a fugitive from Belizean authorities, was located in Guatemala after a photo published online by a journalist accompanying him retained GPS coordinates in its EXIF data. The coordinates pinpointed his location to within meters. (McAfee himself later claimed the GPS data had been deliberately falsified to mislead authorities, though he was nonetheless detained and subsequently deported.)
Consumer Reports researchers have documented the same risk in a more everyday context: sellers on resale platforms who photograph items at home routinely embed their home location in the listing photos, visible to any buyer who extracts the EXIF data.
How to check: On Windows, right-click any image → Properties → Details tab. On Mac, open in Preview → Tools → Show Inspector → GPS tab. Some smartphones and apps strip GPS data when photos are shared, but not all, and not consistently.
How to remove: On Windows, right-click → Properties → Details → “Remove Properties and Personal Information.” On Mac, open the image in Preview, go to Tools → Show Inspector → GPS tab, and click Remove Location Info. For bulk processing, tools like ExifTool (command line) or a client-side browser tool can strip EXIF from multiple files at once.
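To make the removal step concrete, here is a rough sketch of stripping EXIF from a JPEG locally using only the Python standard library. It is not how ExifTool or the operating system tools work internally. It relies on the JPEG layout in which metadata sits in marker segments before the image data, with EXIF (and XMP) stored in APP1 segments, and it simply drops those segments. The function name `strip_app1_segments` is illustrative. The sketch leaves other metadata segments (such as IPTC data in APP13) untouched and does not handle PNG or HEIC.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG file


def strip_app1_segments(jpeg_bytes):
    """Return a copy of a JPEG byte string with all APP1 segments removed."""
    if jpeg_bytes[:2] != SOI:
        raise ValueError("not a JPEG: missing start-of-image marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 2 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:
            pos += 1  # fill byte before a marker; skip it
            continue
        if marker in (0xDA, 0xD9):
            # Start of scan (or end of image): the rest is image data, copy as-is.
            out += jpeg_bytes[pos:]
            return bytes(out)
        if pos + 4 > len(jpeg_bytes):
            break
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker != 0xE1:  # 0xE1 is APP1, where EXIF and XMP live
            out += jpeg_bytes[pos:end]
        pos = end
    raise ValueError("truncated JPEG: no image data found")
```

One side effect to be aware of: the EXIF block also carries the orientation tag, so a photo cleaned this way may display rotated in some viewers. Run it on a copy and check the result with a metadata viewer.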
PowerPoint presentations (.pptx)
PowerPoint presentations carry similar metadata to Word and Excel, with a few format-specific additions:
- Author and company name
- Revision history and editing time
- Comments: including those added during review and marked as resolved
- Hidden slides: slides set to hidden are still present in the file and can be unhidden
- Speaker notes: notes added to slides for presenter use, which may contain internal talking points, objections anticipated, or pricing guidance not intended for the audience
- Embedded files and objects: PowerPoint files can contain embedded Excel spreadsheets, Word documents, or other files that carry their own metadata
Speaker notes are the most commonly overlooked disclosure risk in presentations. A sales deck sent to a prospect as a PDF may strip most metadata, but a .pptx file sent directly retains every note added by every presenter, including strategic talking points and information about the prospect that was gathered during sales research.
How to check: In PowerPoint, go to File → Info → Check for Issues → Inspect Presentation.
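Speaker notes are easy to pull out of a .pptx file, which is exactly why they are a risk. As a minimal sketch, assuming the usual package layout in which each notes page is stored as an XML part under ppt/notesSlides/ and visible text sits in DrawingML `a:t` elements, the following lists whatever note text a file still contains. The function name `extract_speaker_notes` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML text-run element, which holds the visible text of a shape.
A_T = "{http://schemas.openxmlformats.org/drawingml/2006/main}t"


def extract_speaker_notes(path_or_file):
    """Return {notes part name: note text} for every notes page with text."""
    notes = {}
    with zipfile.ZipFile(path_or_file) as archive:
        for name in sorted(archive.namelist()):
            # Relationship files end in ".rels", so this test skips them.
            if name.startswith("ppt/notesSlides/") and name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                text = " ".join(t.text for t in root.iter(A_T) if t.text)
                if text:
                    notes[name] = text
    return notes
```

Two caveats: the part names (notesSlide1.xml, notesSlide2.xml, and so on) do not necessarily follow slide order, and the extracted text can include placeholder content such as the slide number. An empty result after running Inspect Presentation is a reasonable sign the notes were removed.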
How to check and remove metadata before sharing
The quickest method for Office documents (Word, Excel, PowerPoint)
Microsoft’s built-in Document Inspector covers all major metadata categories:
- Open the file
- Go to File → Info → Check for Issues → Inspect Document (or Inspect Workbook / Inspect Presentation)
- Select the categories you want to scan
- Click Inspect, then Remove All for any categories you want to clear
Run the inspector on a copy of your file: some removals can’t be undone, and you’ll want to keep the original with its full history for internal records.
For PDFs
- Adobe Acrobat Pro: Tools → Redact → Sanitize Document, which permanently removes all metadata and hidden content
- Free alternative: Print to PDF (File → Print → Microsoft Print to PDF or macOS PDF) strips most metadata, but verify with a viewer before sharing
For images
- Windows: Right-click → Properties → Details → “Remove Properties and Personal Information”
- Mac: Photos app → File → Export → uncheck location information
- Any platform: A client-side browser tool that strips EXIF data locally, with no upload required
Using a metadata viewer to verify
Before sending any sensitive document, it’s worth checking what metadata remains after cleaning. Several client-side tools can read metadata directly in your browser without uploading your file:
- For Office documents: open a metadata viewer that processes the file locally in your browser
- For PDFs: PDF metadata viewers that run client-side
- For images: EXIF viewers that work offline
The key criterion: verify that the tool processes your file locally, not on a remote server. If you’re checking a sensitive document for metadata, you don’t want to upload it to an unknown service to do so. That would introduce a new risk while trying to eliminate an existing one.
The upload problem
Here’s the irony of metadata removal: many people search for “remove metadata from PDF online,” upload their document to a free web tool, and get back a clean file. The metadata is gone, but the document just traveled to a server run by a company they’ve never heard of.
For documents where metadata is a genuine privacy concern (legal contracts, financial models, medical records, internal presentations), the act of uploading to an unvetted tool may be a larger risk than the metadata itself.
The same logic applies to any file processing: the safest tool is one that never receives your file in the first place. Client-side tools that run entirely in your browser, processing files in local memory without any server upload, eliminate this tradeoff entirely. And if the goal is to collect data rather than process a document, online forms with password protection keep submissions in a controlled environment without any file changing hands at all.
If you regularly handle sensitive documents and want a browser-based tool that processes files locally, PlatoForms PDF Toolbox handles core PDF operations (merge, split, compress, reorder, password protect and remove) without files ever leaving your device. For organizations that also collect sensitive data through online forms, our Trust Center covers the full security architecture, including encryption standards and compliance certifications.
Summary: What’s hiding where
| File type | Most common hidden data | Highest risk |
|---|---|---|
| Word (.docx) | Track changes, author name, editing time, comments | Deleted text still recoverable |
| PDF | Author, creator app, improperly redacted text | Black-box “redaction” leaves text intact |
| Excel (.xlsx) | Hidden rows/sheets, comments, external links | Hidden pricing or margin data |
| Images (JPG, HEIC) | GPS coordinates, device model, timestamp | Home address in listing photos |
| PowerPoint (.pptx) | Speaker notes, hidden slides, embedded files | Internal talking points in sales decks |
The pattern across all formats is the same: metadata is generated automatically, invisibly, and continuously. The burden of removing it is entirely on the person sharing the file, and most tools make it easy enough that there’s no reason not to check before you send.
Before you send any document: assume it contains more than you can see, and verify before you share.
References
- Kaspersky, How ephemeral metadata may cause real problems, kaspersky.com/blog/office-documents-metadata/14215/
- CPO Magazine, Over Half of Fortune 500 Companies Are Leaving Sensitive Information Open to Reconnaissance via Document Metadata, cpomagazine.com
- Microsoft Support, Remove hidden data and personal information by inspecting documents, support.microsoft.com
- BigHand, The Importance of Metadata in the Legal Industry, bighand.com
- Columbia Journalism Review, Thank you to everyone who can’t redact documents properly, cjr.org/analysis/manafort-mueller-redacted-document-ukraine.php
- Wikipedia, Exif, en.wikipedia.org/wiki/Exif
- ISACA, What to Know About EXIF Data: A More Subtle Cybersecurity Risk, isaca.org, 2025
- Consumer Reports, How a Photo’s Hidden ‘Exif’ Data Exposes Your Personal Information, consumerreports.org
Regularly handling sensitive documents? PlatoForms PDF Toolbox processes files entirely in your browser: no uploads, no server, no account required.
If you collect sensitive data through online forms rather than documents, read 5 Types of Files You Should Never Process Online, and see how our Trust Center covers the security architecture behind PlatoForms’ form platform.