New upload schema
- Secure direct-to-S3 file uploads
- Resumable failed uploads
All files are now uploaded directly to S3 into a temporary bucket, under a unique path and file name for each user's item. We split each file into smaller parts before upload and send the parts to S3 in parallel threads; each part is signed by our server, so it cannot be intercepted and replaced by someone else.
Now when a user selects a file to upload to S3, the following occurs:
- Evaporate JS uploads the file directly into a temporary bucket.
- During the upload, it calls our server only to sign each part of the file. This prevents anyone from intercepting a part of the upload and replacing it.
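The per-part signing step above can be sketched as a small server-side helper. This is a minimal illustration, not our production code: it assumes Evaporate JS sends the server a string-to-sign for each part, which the server signs with the AWS secret key using HMAC-SHA1 (AWS signature v2 style). The key name below is a placeholder.

```python
import base64
import hashlib
import hmac

# Placeholder: in practice this is the AWS secret access key,
# kept server-side so the client never sees it.
AWS_SECRET_KEY = "example-secret"

def sign_part(string_to_sign: str) -> str:
    """Sign the string-to-sign the client sends for each file part.

    Returns a base64-encoded HMAC-SHA1 signature. Because only the
    server knows the secret, a third party cannot forge a valid
    signature for a replaced part.
    """
    digest = hmac.new(
        AWS_SECRET_KEY.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")
```

The signature is deterministic for a given input, so S3 can verify it against the same secret, while any tampering with the part's metadata changes the string-to-sign and invalidates the signature.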
After the files land in the temporary bucket, we process them in the following order:
- Check for viruses:
If a virus is detected, we stop processing the file, move it into a quarantine bucket (cleared automatically after 4 days) and notify the user.
- Generate preview files
- Collect information about the file (size, checksum, type of file, etc.)
During this process, the user sees a live upload progress bar instead of the progress bar frozen at 100%, as happened with the old upload system.
- The user fills in the required text fields and clicks the submit button.
- Push a job that funnels the item into a processing queue:
When the user wants to create several items, we organize all the processing into a queue. Once the first item is submitted for creation or update, the user can select the next one without waiting for the first item to finish. Once an item is created or updated, we display a message to inform the user.
- In the background, a worker process takes jobs from the queue and performs them in parallel threads:
- Save item details into database
- Move the uploaded files from the temporary bucket into the regular one.
In the regular S3 bucket, we use the MD5 hash of the file content as the file name. This lets us detect when a file has changed and store every version on S3 along with the file's upload history. Users get a new feature: the ability to download and edit older versions of a file.
- If the background saving process fails (network issue, Amazon issue, etc.), the queue automatically retries the failed task a couple of times.
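The MD5-based naming scheme described above amounts to content-addressed storage: identical content always maps to the same key, so a changed file gets a new key and the old version remains available as history. A minimal sketch (function name and extension handling are assumptions):

```python
import hashlib

def content_key(data: bytes, original_name: str) -> str:
    """Build the permanent S3 key from the MD5 of the file content.

    The same bytes always produce the same key; edited content
    produces a new key, leaving the previous version intact.
    """
    digest = hashlib.md5(data).hexdigest()
    # Keep the original extension for content-type hints; fall back
    # to a generic one if the name has no extension.
    ext = original_name.rsplit(".", 1)[-1] if "." in original_name else "bin"
    return f"{digest}.{ext}"
```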
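The retry behavior in the last step can be sketched as a small wrapper the worker runs each job through. The attempt limit, delay, and function names below are illustrative assumptions, not the actual queue implementation:

```python
import time

MAX_ATTEMPTS = 3  # hypothetical limit for "a couple of times"

def run_with_retries(job, max_attempts=MAX_ATTEMPTS, delay=0.0):
    """Run a queued job, retrying on transient failures
    (network issues, S3 hiccups, etc.)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # give up; the job stays marked as failed
            time.sleep(delay)  # back off before the next attempt
```

A real queue backend (Sidekiq, Celery, SQS, and so on) typically provides this retry loop, usually with exponential backoff, so in practice this logic is configuration rather than hand-written code.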