Scrapy does not save images in Django during the crawl

Project ID: 499333f9-ca0b-4491-b644-2c6eeb389b5c

I have a Scrapy implementation in my Django app that collects job post data from across the internet. While testing the crawl in production, I noticed the spider was not saving the images to the volume. When I upload an image manually via the admin, it works fine. Running the crawl locally also works fine. So I have some questions:

1. Is the volume available during the deployment process?
2. Can Scrapy download images from the service during the deploy?

scrapper/pipeline.py
class RecolectorPipeline:
    @sync_to_async
    def save_image(self, object_field, url):
        if not object_field:
            name = f"{urlparse(url).path.split('/')[-1]}.jpg"
            content = requests.get(url).content

            img_temp = NamedTemporaryFile(delete=True)
            img_temp.write(content)
            img_temp.flush()

            object_field.save(name, File(img_temp), save=True)
    ...
Note that it calls object_field.save(...) (here, object_field is an ImageField from the model), which saves the images using Django's FileSystemStorage.

settings.py
MEDIA_ROOT = str(BASE_DIR / "media")
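To make it easier to see where saved files end up, here is a minimal sketch of how this setting resolves. It assumes BASE_DIR is /app in the deployed container (an assumption, consistent with the /app/media mount point suggested in the replies); lands_on_volume and the file name logo.jpg are hypothetical, for illustration only.

```python
from pathlib import PurePosixPath

# Assumption: the app is deployed under /app, so BASE_DIR is /app.
BASE_DIR = PurePosixPath("/app")
MEDIA_ROOT = str(BASE_DIR / "media")


def lands_on_volume(file_path: str, mount_point: str) -> bool:
    """Return True if file_path is located inside mount_point."""
    try:
        PurePosixPath(file_path).relative_to(PurePosixPath(mount_point))
        return True
    except ValueError:
        return False


# Hypothetical file name, as an ImageField might store it under MEDIA_ROOT.
saved = f"{MEDIA_ROOT}/logo.jpg"

print(MEDIA_ROOT)
print(lands_on_volume(saved, "/app/media"))
print(lands_on_volume(saved, "/data"))
```

A file written under MEDIA_ROOT only persists on the volume if the volume's mount point is that directory or one of its parents.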
nixpacks.toml
...
[phases.build]
cmds = [
'npm --prefix frontend/ run generate',
'python manage.py collectstatic --no-input',
'python manage.py migrate',
'python manage.py crawl'
]

[start]
cmd = 'gunicorn backend.config.wsgi'
Please see the attached images for what happened in each case.
11 Replies
Brody (11mo ago)
"Is the volume available during the deployment process?"
do you mean to ask "Is the volume available during the build stage?"
wafflefitoi (11mo ago)
Yeah
Brody (11mo ago)
what is the mount point of your volume?
Brody (11mo ago)
also, no, the volume is not available during build
Brody (11mo ago)
if it were, there would be a /data mount, since that is where I mounted my volume
wafflefitoi (11mo ago)
Solution
Brody (11mo ago)
1. That is not the same path you set in MEDIA_ROOT; the correct mount point would be /app/media.
2. If the crawl script downloads media to the volume, you need to run the crawl script during deployment, not during build.
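Because the volume is only attached at runtime, one way to catch this early is to check the media directory before the crawl starts. The following is a minimal sketch, assuming MEDIA_ROOT resolves to /app/media in the container; volume_ready is a hypothetical helper, not part of Django or Scrapy.

```python
import os
import tempfile


def volume_ready(media_root: str) -> bool:
    """Return True if media_root exists and a file can be written into it."""
    if not os.path.isdir(media_root):
        return False
    try:
        # Create and immediately delete a probe file to prove write access.
        with tempfile.NamedTemporaryFile(dir=media_root):
            pass
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: MEDIA_ROOT resolves to /app/media in the container.
    media_root = "/app/media"
    print(f"{media_root} mounted: {os.path.ismount(media_root)}")
    print(f"{media_root} writable: {volume_ready(media_root)}")
```

A directory can be writable during build without being the volume, so the os.path.ismount result is likely the more telling of the two checks. In terms of the config above, running the crawl "during deployment" would mean moving 'python manage.py crawl' out of [phases.build] so it runs as part of the start command instead.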
wafflefitoi (11mo ago)
ok, I will try these changes
wafflefitoi (11mo ago)
It works, thanks
Brody (11mo ago)
awesome!