[GH-ISSUE #14482] 0.17.2 can't read PDFs on macOS #55906

Closed
opened 2026-04-29 09:55:32 -05:00 by GiteaMirror · 2 comments

Originally created by @xmddmx on GitHub (Feb 27, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14482

What is the issue?

Tried to read the attached PDF file using several models, and all claim the PDF is damaged.

gemma3:27b: Invalid File Format The system tried to open the file as a PDF but failed because: It lacks the required end-of-file marker (%%EOF), which is a standard component of PDFs.

qwen3.5:35b: 1. File Status: Unreadable
Error Message: failed to create PDF reader: not a PDF file: missing %%EOF

gpt-oss:20b : It looks like the file you attached isn’t being recognized as a valid PDF—there’s no proper PDF header/footer (the missing %%EOF marker). Because of that I can’t extract any text or metadata from it.

Onkyo.pdf (https://github.com/user-attachments/files/25590784/Onkyo.pdf)

Relevant log output

nothing relevant showing up in server.log or app.log

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.17.2

GiteaMirror added the bug label 2026-04-29 09:55:32 -05:00

@rick-github commented on GitHub (Feb 27, 2026):

Models don't understand PDF files. When you upload a PDF to a model, the client extracts the text from the file and sends that text to the model. In this case, the client's PDF extractor is throwing an error because the PDF is badly formed, and the error message is being sent to the model instead of the document text.

Looking at the contents of the file, we see that there is what looks like a piece of HTML appended to the file after the %%EOF marker:

trailer
<</Size 9719/Root 4935 0 R/Info 9717 0 R/ID[<0527547B0E58A8B1F9640678CBC18361><867A241D7C81DB44888F7C5E6EFF6A8C>]>>
startxref
4102427
%%EOF
<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
   <HEAD>
      <TITLE>
         A Small Hello
      </TITLE>
   </HEAD>
<BODY>
   <H1>Hi, I'm super! Thanks for asking!</H1>
   <P>Experiencing Errors, Please Hold On!</P>
</BODY>
</HTML>
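
A quick way to confirm this kind of damage is to measure how much data follows the last %%EOF marker. The sketch below is only an illustration: the function name `trailing_junk_size` and the sample bytes are made up here, and this is not how the Ollama client's extractor works.

```python
EOF_MARKER = b"%%EOF"


def trailing_junk_size(data: bytes) -> int:
    """Return how many bytes follow the last %%EOF marker.

    Returns 0 when only whitespace (such as a final newline) follows the
    marker, and -1 when the marker is missing altogether.
    """
    idx = data.rfind(EOF_MARKER)
    if idx == -1:
        return -1
    tail = data[idx + len(EOF_MARKER):]
    # A line ending after %%EOF is normal, so ignore whitespace-only tails.
    return len(tail) if tail.strip() else 0


# Made-up stand-in for a PDF with an HTML page appended after %%EOF.
sample = b"%PDF-1.4\nstartxref\n0\n%%EOF\n<HTML><BODY>oops</BODY></HTML>\n"
print(trailing_junk_size(sample))
```

To check a real file, pass its contents instead, for example `trailing_junk_size(open("Onkyo.pdf", "rb").read())`. A non-zero result means something was appended after the end of the PDF.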

So while the PDF extractor in the Ollama client could be more resilient, this PDF is malformed. You can fix it by removing the trailing junk, either by loading the file into a PDF editor and re-saving it, or by truncating the file to drop the last 261 bytes:

cp Onkyo.pdf Onkyo-fixed.pdf 
truncate -s -261 Onkyo-fixed.pdf
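
If you would rather not hardcode the byte count, the same repair can be sketched in Python by cutting everything after the last %%EOF marker. This is a minimal sketch under one assumption: the appended junk does not itself contain the string %%EOF. The helper names `strip_after_last_eof` and `write_fixed_copy` are hypothetical and not part of Ollama.

```python
from pathlib import Path

EOF_MARKER = b"%%EOF"


def strip_after_last_eof(data: bytes) -> bytes:
    """Return data cut off right after the last %%EOF, plus one newline."""
    idx = data.rfind(EOF_MARKER)
    if idx == -1:
        # Without a marker there is nothing safe to cut, so refuse to guess.
        raise ValueError("no %%EOF marker found; not attempting a repair")
    return data[: idx + len(EOF_MARKER)] + b"\n"


def write_fixed_copy(src: Path, dst: Path) -> int:
    """Write a cleaned copy of src to dst, leaving src untouched.

    Returns the size difference in bytes (original size minus fixed size).
    """
    original = src.read_bytes()
    fixed = strip_after_last_eof(original)
    dst.write_bytes(fixed)
    return len(original) - len(fixed)


# Example call, mirroring the file names used above:
# write_fixed_copy(Path("Onkyo.pdf"), Path("Onkyo-fixed.pdf"))
```

Using the last marker rather than the first matters because a PDF that has been saved incrementally can legitimately contain several %%EOF markers, and only the data after the final one is junk.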

@xmddmx commented on GitHub (Feb 27, 2026):

Oh! I had no idea that was a nonstandard PDF file. Consider this a non-issue.

Reference: github-starred/ollama#55906