Mirror of https://github.com/imputnet/cobalt.git (synced 2026-03-10 15:52:58 -05:00)
[Pinterest] Downloaded pin image has low resolution #838
Originally created by @iambtshft on GitHub (May 28, 2025).
Brief
When I try to download an image from Pinterest, the returned result sometimes has low resolution. Example: https://www.pinterest.com/pin/70437489341156/
Technical analysis
Here's the parser code:
4b9644ebdf/api/src/processing/services/pinterest.js (L34-L38)
The service takes the first picture with a matching image extension found by the regex. However, for the example pin above, that first picture is not the best-quality version (see output).
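To illustrate the first-match behavior described above, here is a minimal sketch. It assumes a simplified regex (not the exact pattern in pinterest.js) and a hypothetical HTML fragment in which a 236x thumbnail appears before the larger 736x version of the same image.

```javascript
// Hypothetical page fragment: the low-resolution thumbnail comes first in the markup.
const html = `
<img src="https://i.pinimg.com/236x/7c/0a/1c/7c0a1c5f1c999a4a67f3c5b847da093c.jpg">
<img src="https://i.pinimg.com/736x/7c/0a/1c/7c0a1c5f1c999a4a67f3c5b847da093c.jpg">
`;

// Simplified stand-in for "first URL with an image extension";
// this is NOT the exact regex used by pinterest.js.
const imageUrlRegex = /src="(https:\/\/i\.pinimg\.com\/[^"]+\.(?:jpg|jpeg|png|gif|webp))"/;

// A non-global match returns only the first occurrence, whatever its size.
function firstImageUrl(page) {
  const match = page.match(imageUrlRegex);
  return match ? match[1] : null;
}

console.log(firstImageUrl(html));
// → https://i.pinimg.com/236x/7c/0a/1c/7c0a1c5f1c999a4a67f3c5b847da093c.jpg
```

Because the regex stops at the first hit, the result depends purely on markup order, not on image size.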
Potential solution
Option 1: look up a better resolution
I'm not an expert in how Pinterest structures its data, but judging by the file names, it looks possible to take the image identifier part from the first image,
7c/0a/1c/7c0a1c5f1c999a4a67f3c5b847da093c.jpg, and look up an image with the same ID but a higher resolution ({vvv}x).
Option 2: parse images from JSON
While investigating the page content, I found that besides the images provided as src=<something>, there's JSON-structured pin data. It contains much more information, such as the original image URL (which does not appear in any src=<> attribute). Again, I'm not sure this data is available for every pin, but it looks like a more robust solution, with src parsing kept as a fallback.
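The two options could be combined roughly as follows: prefer the original URL from the JSON pin data (Option 2), and if that is missing, take the first src URL and try other size segments for the same image ID (Option 1). This is a sketch under several assumptions: the CDN path layout `i.pinimg.com/<size>/<aa>/<bb>/<cc>/<hash>.<ext>`, the size list in `SIZE_PREFERENCE`, the `orig.url` field, and the script `id` passed to the hypothetical `readJsonScript` helper are all guesses from observed pages, not a documented Pinterest API.

```javascript
// Assumed CDN layout: https://i.pinimg.com/<size>/<aa>/<bb>/<cc>/<hash>.<ext>
const PINIMG_RE =
  /^https:\/\/i\.pinimg\.com\/[^/]+\/((?:[0-9a-f]{2}\/){3}[0-9a-f]{32}\.[a-z0-9]+)$/i;

// Size segments to try, best first. Assumed from observed URLs, not documented.
const SIZE_PREFERENCE = ["originals", "736x", "564x", "474x", "236x"];

// Option 1: keep the image ID path and swap in each candidate size segment.
function candidateUrls(url) {
  const match = url.match(PINIMG_RE);
  if (!match) return [url]; // unknown layout: leave the URL as-is
  return SIZE_PREFERENCE.map((size) => `https://i.pinimg.com/${size}/${match[1]}`);
}

// Hypothetical helper: parse the JSON text of <script id="..."> (null on failure).
function readJsonScript(html, id) {
  const pattern = new RegExp(`<script[^>]*\\bid="${id}"[^>]*>([\\s\\S]*?)</script>`);
  const match = html.match(pattern);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null;
  }
}

// Option 2: depth-first search for an object shaped like { orig: { url } }.
// The "orig" key is an assumption about the pin data, not a confirmed field.
function findOriginalUrl(node) {
  if (node === null || typeof node !== "object") return null;
  if (node.orig && typeof node.orig.url === "string") return node.orig.url;
  for (const value of Object.values(node)) {
    const found = findOriginalUrl(value);
    if (found) return found;
  }
  return null;
}

// Ordered list of URLs to try: JSON original first, then src-derived sizes.
function resolveCandidates(html, scriptId) {
  const list = [];
  const data = readJsonScript(html, scriptId);
  const fromJson = data && findOriginalUrl(data);
  if (fromJson) list.push(fromJson);
  const src = html.match(/src="(https:\/\/i\.pinimg\.com\/[^"]+\.(?:jpe?g|png|gif|webp))"/i);
  if (src) {
    for (const url of candidateUrls(src[1])) {
      if (!list.includes(url)) list.push(url);
    }
  }
  return list;
}

// Return the first candidate the CDN answers for (needs global fetch, e.g. Node 18+).
async function firstReachable(urls) {
  for (const url of urls) {
    try {
      const res = await fetch(url, { method: "HEAD" });
      if (res.ok) return url;
    } catch {
      // network error: fall through to the next candidate
    }
  }
  return null;
}
```

Probing each candidate with a HEAD request matters because a rewritten size segment may not exist for every pin (originals, for example, might use a different file extension), so the src URL itself stays in the list as the last resort.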
reproduction steps
Download the pin at https://www.pinterest.com/pin/70437489341156/.
Actual result: the image has low quality.
Expected result: the image has the same quality as on the Pinterest page.
@agvantibo-again commented on GitHub (Jul 10, 2025):
+1, reproduced accidentally with https://pinterest.com/pin/333618284916219545
The downloaded image was 236x236, while the original is 736x736.
@potatolover68 commented on GitHub (Aug 12, 2025):
After further digging (testing on this), it seems that every image page has a script tag named "PWS_INITIAL_PROPS" containing a list of image sizes, including the original.
https://regex101.com/r/IAmYqE/1
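Once that size list is extracted from PWS_INITIAL_PROPS, choosing the best entry could look like the sketch below. The shape of each entry (`url`, `width`, `height`, keyed by names such as "236x" and "orig") is an assumption, not verified against the real payload, and the sample data is illustrative only.

```javascript
// Assumed shape of one pin's size list (not verified against the real payload):
// { "236x": { url, width, height }, ..., "orig": { url, width, height } }
function pickLargest(images) {
  let best = null;
  for (const [key, info] of Object.entries(images || {})) {
    if (!info || typeof info.url !== "string") continue; // skip malformed entries
    const area = (info.width || 0) * (info.height || 0);
    // Prefer the larger pixel area; on a tie, prefer the "orig" entry.
    if (!best || area > best.area || (area === best.area && key === "orig")) {
      best = { key, url: info.url, area };
    }
  }
  return best;
}

// Illustrative data only, echoing the 236x236 vs 736x736 report above.
const sample = {
  "236x": { url: "https://i.pinimg.com/236x/example.jpg", width: 236, height: 236 },
  "736x": { url: "https://i.pinimg.com/736x/example.jpg", width: 736, height: 736 },
  orig: { url: "https://i.pinimg.com/originals/example.jpg", width: 736, height: 736 },
};

console.log(pickLargest(sample).url);
// → https://i.pinimg.com/originals/example.jpg
```

Comparing by pixel area rather than by key name avoids hard-coding which size label is largest.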
@potatolover68 commented on GitHub (Aug 13, 2025):
After digging even further (testing on this), I found that when you're not signed in, you can use the following regex to match all the image URLs:
pitfalls
This is time-sensitive, so it's best to run it right after the page loads; otherwise, it can't distinguish the infinite-scroll content from the main content.
Note that the first image in the list is always the main content; this could perhaps be used to filter the list.
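The filtering idea from the pitfalls above (treat the first image as the main content and ignore the infinite-scroll images) might be sketched like this. It uses its own hypothetical pattern rather than the commenter's regex, and assumes the same `i.pinimg.com/<size>/<aa>/<bb>/<cc>/<hash>.<ext>` layout and a size ranking where "originals" beats any numeric width; neither is confirmed by Pinterest.

```javascript
// Hypothetical pattern (not the commenter's regex): group 1 is the size
// segment, group 2 the ID path, group 3 the 32-character hex hash.
const PINIMG_GLOBAL =
  /https:\/\/i\.pinimg\.com\/([^/"\s]+)\/((?:[0-9a-f]{2}\/){3}([0-9a-f]{32})\.[a-z0-9]+)/gi;

// Assumed ranking: "originals" is best, otherwise compare the numeric width.
function sizeRank(size) {
  if (size === "originals") return Infinity;
  const n = parseInt(size, 10); // "736x" -> 736
  return Number.isNaN(n) ? -1 : n;
}

function mainImageUrl(html) {
  const matches = [...html.matchAll(PINIMG_GLOBAL)];
  if (matches.length === 0) return null;
  // The first image in document order is taken as the main content.
  const mainHash = matches[0][3].toLowerCase();
  let best = null;
  for (const m of matches) {
    if (m[3].toLowerCase() !== mainHash) continue; // skip infinite-scroll images
    if (!best || sizeRank(m[1]) > sizeRank(best[1])) best = m;
  }
  return best[0];
}
```

Grouping by hash keeps every size variant of the main image while discarding unrelated pins, which only works if the page is scanned before later scroll content changes which image comes first.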