PROBLEM
Users could not upload all of their products to the marketplace: most of their files were either larger than 200 MB or were video files, and neither was supported. Digital files were capped at 200 MB, and video files could not be uploaded at all.
HOW WISE ENGINEERING HELPED
Direct upload of files to Amazon S3 was designed and successfully rolled out in the existing marketplace service in half a month, compared with the two months the previous upload system had taken to build. It also improved security for downloaded files.
BACKGROUND
The existing service allowed users to upload files and attach them to products, but it put a hard limit on the size of files selected for upload: it was not possible to upload a file larger than 200 MB. In addition, when an upload failed (for example, due to a network error), the user had to restart the entire process from the beginning. Many users have files larger than 200 MB, so we received many requests to support them. On top of that, the NGINX module we used for file upload was very old and no longer maintained.
IMPLEMENTATION PROCESS
Old upload schema
Previously, when a user uploaded a file, we used an old NGINX module. It simply read the entire content of the file and stored it on our server in a temporary directory under a temporary file name. This led to the following problems:
Our upload progress bar reflected only the network transfer.
We could not add a progress bar for the virus check, so users saw a stalled progress bar while we scanned the file. We also generated preview files, reduced versions of the original user files, at the same stage; because of this, users could not monitor the real progress of the overall process.
After upload, we stored all files on our own data servers.
Maintaining those servers demanded extra attention and resources from us. It also made our codebase more complicated: we had to keep an extra field in the item properties table, which we called the mask of servers, recording which data server stored the file for each item. That added complexity whenever we needed to select all items whose files were stored on specific data servers.
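To illustrate the kind of bookkeeping this required, here is a minimal sketch of the mask-of-servers idea; the field name and the items collection are assumptions for illustration, not the actual schema:

    // Bit N of the mask is set when the item's file lives on data server N.
    // "serverMask" and "items" are hypothetical names for illustration.
    const DATA_SERVER_3 = 1 << 3;
    const itemsOnServer3 = items.filter(
      (item) => (item.serverMask & DATA_SERVER_3) !== 0
    );

Every query that cared about file location had to repeat this kind of bit arithmetic.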
SOLUTION
New upload schema
We decided to upload files directly to S3 and no longer store them on our own servers. We also chose a third-party JavaScript library for direct uploads to S3 instead of writing our own, and selected https://github.com/TTLabs/EvaporateJS, which already implements and tests this functionality across all major web browsers.
Now all files are uploaded to S3 into a temporary bucket, one per user's item, using unique paths and file names. We split each file into smaller parts before upload and send the parts to S3 in parallel; each part is signed by our server so it cannot be intercepted and replaced by someone else.
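A minimal client-side sketch of this setup, assuming EvaporateJS v1's constructor API; the bucket name, signing path, and UI hooks are illustrative assumptions:

    // EvaporateJS splits the file into parts and uploads them to S3 in
    // parallel; each part is signed through signerUrl on our server.
    const evaporate = new Evaporate({
      signerUrl: '/sign-upload',           // hypothetical signing endpoint
      aws_key: 'AKIA...',                  // public AWS access key id
      bucket: 'marketplace-tmp-uploads',   // assumed temporary bucket name
      maxConcurrentParts: 5                // number of parallel part uploads
    });

    evaporate.add({
      name: 'tmp/' + userId + '/' + itemId + '/' + file.name, // unique per-item path
      file: fileInput.files[0],
      progress: function (fraction) {      // hypothetical UI progress hook
        updateProgressBar(fraction);
      }
    });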
Now when a user selects a file to upload to S3, the following occurs:
EvaporateJS uploads the file directly into the temporary bucket.
During the upload process, it calls our servers only to sign each part of the file. This is done so that nobody can intercept a part of the upload and replace it (see the signing sketch after this list).
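A sketch of such a signing endpoint in Node/Express, assuming EvaporateJS's AWS Signature Version 2 flow, in which the library sends the string to sign as the to_sign query parameter and expects the base64-encoded HMAC-SHA1 back; the route path and the omitted authentication are assumptions:

    const crypto = require('crypto');
    const express = require('express');
    const app = express();

    app.get('/sign-upload', (req, res) => {
      // A real endpoint would first verify the user's session here.
      const signature = crypto
        .createHmac('sha1', process.env.AWS_SECRET_KEY) // AWS secret key
        .update(req.query.to_sign)                      // string built by EvaporateJS
        .digest('base64');
      res.send(signature);
    });

    app.listen(3000);

Because the secret key never leaves the server, a stolen signature is only valid for the exact part it was issued for.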
After a file lands in the temporary bucket, our servers process it: they run the virus check, generate the preview files, and move the file out of the temporary bucket to its permanent location.
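A minimal sketch of that post-processing step using the AWS SDK for JavaScript (v2); scanForViruses and generatePreview are hypothetical helper names standing in for the virus check and preview generation described above, and both bucket names are assumptions:

    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    async function processUploadedFile(tempKey) {
      await scanForViruses(tempKey);   // hypothetical virus-scan helper
      await generatePreview(tempKey);  // hypothetical preview-generation helper

      // Move the file from the temporary bucket to its permanent location.
      await s3.copyObject({
        Bucket: 'marketplace-files',                       // assumed permanent bucket
        CopySource: 'marketplace-tmp-uploads/' + tempKey,  // assumed temporary bucket
        Key: tempKey
      }).promise();
      await s3.deleteObject({
        Bucket: 'marketplace-tmp-uploads',
        Key: tempKey
      }).promise();
    }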
RESULTS
The new upload process was designed and successfully applied. Users can now upload files up to 500 MB, and premium users got a new feature: uploads up to 1 GB. We also decided to allow video files as a new item type. The new process was rolled out in the existing marketplace service in half a month, compared with the two months the previous upload system had taken. Storing files on S3 also takes far less time and effort to maintain than the hardware servers used before, and it reinforced security for downloaded files.