[GH-ISSUE #7211] How to get forward method of model #66635

Closed
opened 2026-05-04 07:39:58 -05:00 by GiteaMirror · 7 comments

Originally created by @VijayRajIITP on GitHub (Oct 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7211

What is the issue?

I am using Ollama:

ollama pull llama3

llm = Ollama(model='llama3')
base_url = "http://localhost:11434/v1"

I want to work with the forward method of the model. How can I get that? Is it possible?

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

No response

GiteaMirror added the question label 2026-05-04 07:39:58 -05:00

@rick-github commented on GitHub (Oct 15, 2024):

What is `forward method`?


@VijayRajIITP commented on GitHub (Oct 15, 2024):

The function that computes the output of the model given an input. It encapsulates the entire forward pass through the model's layers, processing the input data and producing predictions or embeddings (encoding and decoding).
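
For reference, a minimal sketch of what such a forward method looks like in a PyTorch-style model (the class, layer sizes, and inputs below are hypothetical, purely for illustration):

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self, vocab_size: int = 32000, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)  # "encoding": token ids -> vectors
        self.proj = nn.Linear(hidden, vocab_size)      # "decoding": vectors -> logits

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # The forward pass: run the input through the layers and return predictions.
        hidden = self.embed(token_ids)
        return self.proj(hidden)

model = TinyModel()
logits = model(torch.tensor([[1, 2, 3]]))  # calling the model invokes forward()
```

Calling `model(x)` dispatches to `forward()`; it is this per-layer, in-process access that an HTTP completion API does not expose directly.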


@rick-github commented on GitHub (Oct 15, 2024):

Are you referring to logprobs? https://github.com/ollama/ollama/issues/2415


@VijayRajIITP commented on GitHub (Oct 18, 2024):

from langchain.llms import Ollama
llm = Ollama(model='llama3')

The response goes through the REST API as a POST request. Since the server is localhost, can we access the server-side code? I want to see the internal workings: how my input gets to the llama model, and all the internal details of the model, in the .py itself.
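
One way to see what actually goes over the wire is to bypass LangChain and POST to the local Ollama server directly; a minimal sketch (the endpoint is Ollama's standard /api/generate, the prompt is arbitrary):

```python
import json
import requests

# Same local server that LangChain's Ollama wrapper talks to.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Hello", "stream": False},
    timeout=120,
)
print(json.dumps(resp.json(), indent=2))  # full JSON response, including token counts and timings
```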


@rick-github commented on GitHub (Oct 18, 2024):

You can use `strace` to see the internal details:

```
strace python3 my_ollama_script.py
```

@VijayRajIITP commented on GitHub (Oct 21, 2024):

I’m trying to understand how Ollama manages tokenization and detokenization when interacting with APIs. I used debug tools to inspect the process, but the responses received from the API are still in text form. I’m interested in the internal workings that occur after a response is processed through the API, such as how tokenization and detokenization are handled.

I also explored the Ollama installation directory, which I set up using the command:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

However, I couldn’t find any files related to the model there. This raises the question of how the model is loaded, how it processes the input, and how the output is post-processed.
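
The model weights usually live in the Ollama data directory rather than the install directory: commonly /usr/share/ollama/.ollama/models for the script-based Linux install, or ~/.ollama/models for a per-user setup. A small sketch to check, with both paths treated as assumptions to adjust for your machine:

```python
from pathlib import Path

# Likely locations of the Ollama model store (assumptions; adjust as needed).
candidates = [
    Path("/usr/share/ollama/.ollama/models"),
    Path.home() / ".ollama" / "models",
]

for root in candidates:
    blobs = root / "blobs"
    if blobs.is_dir():
        print(f"Model store: {root}")
        for blob in sorted(blobs.iterdir()):
            print(f"  {blob.name}  {blob.stat().st_size / 1e9:.2f} GB")
```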


@rick-github commented on GitHub (Oct 21, 2024):

ollama is a wrapper for [llama.cpp](https://github.com/ggerganov/llama.cpp); you can find the details about the internal workings there.
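
For direct, scriptable access to the forward pass (logits and log probabilities), one option is to load the same GGUF weights with the llama-cpp-python bindings; a sketch under the assumption that the package is installed, with a hypothetical model path:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# logits_all=True keeps per-token logits so logprobs can be returned.
llm = Llama(model_path="/path/to/llama3.gguf", logits_all=True)

out = llm("The capital of France is", max_tokens=8, logprobs=5)
choice = out["choices"][0]
print(choice["text"])
print(choice["logprobs"]["top_logprobs"][0])  # top candidates for the first generated token
```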

Reference: github-starred/ollama#66635