Railway•15mo ago

Scrapy does not save images in Django during the crawl

Project ID: 499333f9-ca0b-4491-b644-2c6eeb389b5c

I have a Scrapy implementation in my Django app that collects job post data from across the internet. While testing the crawl in production, I noticed the spider was not saving the images to the volume. Uploading an image manually through the admin works fine, and everything also works locally. So, I have some questions:

1. Is the volume available during the deployment process?
2. Can Scrapy download images from the service during the deploy?

scrapper/pipeline.py

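from tempfile import NamedTemporaryFile
from urllib.parse import urlparse

import requests
from asgiref.sync import sync_to_async
from django.core.files import File


class RecolectorPipeline:
    @sync_to_async
    def save_image(self, object_field, url):
        # Download only when the ImageField is still empty.
        if not object_field:
            name = f"{urlparse(url).path.split('/')[-1]}.jpg"
            content = requests.get(url).content

            # Buffer the download in a temporary file for Django to copy.
            img_temp = NamedTemporaryFile(delete=True)
            img_temp.write(content)
            img_temp.flush()

            # Writes through the field's storage, i.e. under MEDIA_ROOT.
            object_field.save(name, File(img_temp), save=True)
    ...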
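
For illustration, here is a minimal sketch of how the file name in save_image is derived from the URL. The helper name image_name and the example.com URLs are hypothetical; only the naming expression comes from the pipeline above.

```python
from urllib.parse import urlparse


def image_name(url):
    """Mirror the naming expression used in save_image (hypothetical helper)."""
    # Take the last path segment of the URL and append ".jpg" unconditionally.
    return f"{urlparse(url).path.split('/')[-1]}.jpg"


print(image_name("https://example.com/logos/acme"))          # acme.jpg
print(image_name("https://example.com/logos/acme.png"))      # acme.png.jpg
print(image_name("https://example.com/logos/acme.png?v=2"))  # acme.png.jpg
```

The query string is not part of the parsed path, so it never ends up in the name, while an existing extension is kept and ".jpg" is appended after it.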

Note that it calls object_field.save(...) (here, object_field is an ImageField from the model) to save the images using Django's FileSystemStorage.

settings.py

MEDIA_ROOT = str(BASE_DIR / "media")

nixpacks.toml

...
[phases.build]
cmds = [
    'npm --prefix frontend/ run generate',
    'python manage.py collectstatic --no-input',
    'python manage.py migrate',
    'python manage.py crawl'
]

[start]
cmd = 'gunicorn backend.config.wsgi'

Please see the attached images for what happened in each case.

Solution:

1. That is not the same path you set in MEDIA_ROOT; the correct mount point would be /app/media.
2. If the crawl script downloads media to the volume, you would need to run the crawl script during deployment and not during build...
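
The two points above can be sketched as a small start-time entrypoint: check that the media directory is actually a mount point, run the crawl, then hand over to the web server. This is only a sketch under assumptions. It assumes the app lives in /app (so that BASE_DIR / "media" is /app/media, as the answer suggests), and the function names and the idea of a separate entrypoint script are hypothetical, not something from the thread.

```python
import os
import subprocess
import sys
from pathlib import Path

# Assumption: the project root inside the container is /app, so that
# MEDIA_ROOT matches the /app/media mount point named in the answer.
BASE_DIR = Path("/app")
MEDIA_ROOT = BASE_DIR / "media"


def media_root_is_mounted(media_root, ismount=os.path.ismount):
    """Return True when media_root is itself a mount point (the volume)."""
    return ismount(str(media_root))


def startup_commands():
    """Commands to run at start time, in order; the last one is the server."""
    return [
        [sys.executable, "manage.py", "crawl"],
        ["gunicorn", "backend.config.wsgi"],
    ]


def main(run=subprocess.check_call, exec_=os.execvp, ismount=os.path.ismount):
    # Warn early if the volume is missing: files saved now would not persist.
    if not media_root_is_mounted(MEDIA_ROOT, ismount):
        print(f"warning: {MEDIA_ROOT} is not a mount point", file=sys.stderr)

    *setup, server = startup_commands()
    for cmd in setup:
        run(cmd)  # raises CalledProcessError if the crawl fails

    # Replace this process with gunicorn so it receives signals directly.
    exec_(server[0], server)
```

Calling main() from a script used as the start command (instead of listing the crawl under [phases.build]) would run the crawl inside the running container rather than in the build image. The run, exec_, and ismount parameters exist only so the flow can be exercised without starting real processes.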
