Document Caching Mechanism
Filez Document Platform supports online preview and online editing of files held in business systems. The files themselves are stored in the business system, so the Document Platform needs to obtain a file from the business system for each online preview or editing session. Each time it obtains a file from the third party, the platform converts the file into internal data structures that are used for preview or editing. These data structures, stored on the Document Platform server, act as a cache for the document. If the file is previewed or edited again, the Document Platform uses the cached data directly, which makes the document open faster.
How does the Document Platform decide whether to use internal data (cache) during online editing and preview?
The Document Platform decides based on two attributes of the file. The first is the file's docId, which is the docId in the request URL for online editing or preview. The second is the file's last modification time in the business system, which is the modified_at value in the meta information that the business system passes to the Document Platform during online editing or preview. If the platform already holds cached data for the same docId and modified_at combination when a file is edited or previewed online, it uses that cache.
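The lookup described above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the `DocumentCache` class, the `cacheKey` helper, the `convert` callback, and the key layout are all hypothetical.

```javascript
// Hypothetical sketch: cache converted documents under docId + modified_at.
// None of these names come from the Document Platform itself.
function cacheKey(docId, modifiedAt) {
  // JSON.stringify avoids the ambiguity a plain separator could introduce.
  return JSON.stringify([docId, modifiedAt]);
}

class DocumentCache {
  constructor() {
    this.entries = new Map();
  }

  // Returns cached data when docId + modified_at match; otherwise runs the
  // expensive conversion once and stores the result.
  getOrConvert(docId, meta, convert) {
    const key = cacheKey(docId, meta.modified_at);
    if (!this.entries.has(key)) {
      this.entries.set(key, convert());
    }
    return this.entries.get(key);
  }
}

const docCache = new DocumentCache();
let conversions = 0;
const convert = () => ({ id: ++conversions });

docCache.getOrConvert("doc-1", { modified_at: "2023-03-22T10:31:38.000Z" }, convert);
docCache.getOrConvert("doc-1", { modified_at: "2023-03-22T10:31:38.000Z" }, convert); // cache hit
docCache.getOrConvert("doc-1", { modified_at: "2023-03-22T10:40:00.000Z" }, convert); // new version
console.log(conversions); // 2
```

An unchanged modified_at reuses the stored data, while any change in modified_at forces a new conversion.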
Is there a mechanism to clear the cache?
For files that are only previewed online, the platform server periodically deletes this internal data. Files that are edited online generate some online-only data (collaboration records, per-person authorization, etc.), so the internal data for those files is not cleared automatically.
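As a rough illustration of this policy, the sketch below removes only preview-only entries during a cleanup pass. The entry fields (`editedOnline`, `lastUsedMs`) and the age threshold are assumptions made for the example; the source does not describe how the platform schedules or selects deletions.

```javascript
// Hypothetical sketch of a periodic cleanup pass.
// Entries for files that were edited online are always kept; preview-only
// entries are kept only while they are newer than the assumed maximum age.
function sweepCache(entries, nowMs, maxAgeMs) {
  return entries.filter(
    (entry) => entry.editedOnline || nowMs - entry.lastUsedMs <= maxAgeMs
  );
}

const DAY_MS = 24 * 60 * 60 * 1000;
const sweepTime = Date.UTC(2023, 2, 22);
const remaining = sweepCache(
  [
    { docId: "preview-old", editedOnline: false, lastUsedMs: sweepTime - 10 * DAY_MS },
    { docId: "preview-new", editedOnline: false, lastUsedMs: sweepTime - 1 * DAY_MS },
    { docId: "edited-old", editedOnline: true, lastUsedMs: sweepTime - 10 * DAY_MS },
  ],
  sweepTime,
  7 * DAY_MS
);
console.log(remaining.map((entry) => entry.docId)); // [ 'preview-new', 'edited-old' ]
```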
The modified_at field returned by the meta API is very important. An incorrect value not only reduces the efficiency of the whole system but also causes confusing file conflict issues. modified_at appears both in the JSON returned by the meta API and in the JSON returned by the post content API (file save).
modified_at represents the last modification time of the file in the third-party system. For example, if a user uploads a file at 10:00, its modified_at should be 10:00. If the user uploads a new version of the file at 10:30, its modified_at should be 10:30. If the user then edits online and saves at 10:40, the platform server sends the saved file back to the third-party system, and its modified_at should become 10:40.
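On the business-system side, the timeline above could be modeled roughly as follows. The `FileRecord` class and its method names are hypothetical, and the calendar date and UTC time zone are assumed only to make the timestamps concrete.

```javascript
// Hypothetical business-system file record. modified_at changes only when
// the stored content changes (upload or save), never when it is merely read.
class FileRecord {
  constructor(content, uploadedAt) {
    this.content = content;
    this.modifiedAt = uploadedAt;
  }

  // Called for a newly uploaded version and for a save from the editor.
  replaceContent(content, savedAt) {
    this.content = content;
    this.modifiedAt = savedAt;
    return this.meta(); // the save response reports the same value as meta
  }

  // Reading metadata must not touch modifiedAt.
  meta() {
    return { modified_at: this.modifiedAt.toISOString() };
  }
}

const record = new FileRecord("v1", new Date("2023-03-22T10:00:00.000Z"));
record.replaceContent("v2", new Date("2023-03-22T10:30:00.000Z")); // new upload
const saveResponse = record.replaceContent("v3", new Date("2023-03-22T10:40:00.000Z")); // online save

console.log(saveResponse.modified_at); // 2023-03-22T10:40:00.000Z
console.log(record.meta().modified_at === saveResponse.modified_at); // true
```

Because the timestamp is stored with the content rather than generated per request, repeated meta calls keep returning the same value until the file actually changes.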
Each time the platform server obtains a file from the third party, it converts the file into an internal data structure. This conversion is a CPU-intensive task. To improve efficiency, if the platform server finds that no new version of the file has been uploaded to the third-party system since the last online editing session, it does not fetch the file from the third-party system again.
The platform server judges whether the file has changed based on the modified_at field returned by the third-party system's meta API. If the modified_at value is the same as the value returned last time, it means the file has not changed, and the file content will not be fetched again.
- If the business system uses a database to manage files, you can record the last save time of this file in the database. If the business system directly uses the file system to manage this file, you can get the last modification time of this file from the file system.
- Sometimes, for convenience of implementation, the business system uses the current time as the value of the modified_at field. This is wrong. It not only reduces the efficiency of the whole system but also causes confusing file conflict issues. A file content conflict dialog is meant to appear in the following situation: during online editing, the edited content has not yet been saved to the third-party system, and meanwhile the same user or another user uploads a new version of the file to the third-party system. The online edits are then unsaved and potentially conflict with the newly uploaded file. If the modified_at value is always the current time, however, a false conflict can occur. When there are unsaved changes on the page and the user refreshes it, the zOffice editor saves asynchronously while it responds to the new editing request, so the user enters editing again before the save has completed. If the modified_at in the meta obtained from the third party for this editing request is the current time, the platform server concludes that the third-party system has a new version, and because the current save has not completed, it decides there is a conflict.
- When the user saves the document, the platform server calls the post content API to send the saved file back to the third party. The JSON that the third party returns for this API should have the same format as the JSON returned by the meta API. The modified_at value in the JSON returned by the post content API should match the modified_at value that the third party returns from the meta API the next time the file is edited. For example, a user saves the file at 10:31, and the modified_at in the third-party system's response to the post content API is "2023-03-22T10:31:38.000Z". Two minutes later, at 10:33, another user edits the file online (no new version was uploaded in the meantime), and the modified_at in the third-party system's response to the meta API is also "2023-03-22T10:31:38.000Z". As long as the third-party system correctly records the file's last modification time, it can return modified_at correctly.
- The modified_at field returned by the meta API must be in a time format that JavaScript recognizes (for example, a format that conforms to the ISO-8601 standard). To check a value, open Chrome's dev tools, go to the Console, and enter:
var a = new Date("2020-03-25T02:57:38.000Z");
console.log(a);
The formats "2020-03-25T02:57:38.000Z" and "2020-03-25T02:57:38-08:00" are both correct. "2006-01-02T15:04:05Z07:00" and "2020-3-25T02:57:38.000Z" are incorrect: the first is a layout template rather than an actual timestamp, and the second does not zero-pad the month.
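The manual console check can also be automated before a value is returned from the meta API. The helper below is a sketch under stated assumptions: `isValidModifiedAt` is a hypothetical name, and the pattern is deliberately stricter than `new Date()` alone, because some JavaScript engines accept non-standard date strings.

```javascript
// Hypothetical helper: accepts only zero-padded ISO-8601 date-times with an
// explicit "Z" or a numeric UTC offset, then confirms the Date is valid.
const ISO_DATE_TIME =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?(Z|[+-]\d{2}:\d{2})$/;

function isValidModifiedAt(value) {
  return (
    typeof value === "string" &&
    ISO_DATE_TIME.test(value) &&
    !Number.isNaN(new Date(value).getTime())
  );
}

console.log(isValidModifiedAt("2020-03-25T02:57:38.000Z"));  // true
console.log(isValidModifiedAt("2020-03-25T02:57:38-08:00")); // true
console.log(isValidModifiedAt("2006-01-02T15:04:05Z07:00")); // false
console.log(isValidModifiedAt("2020-3-25T02:57:38.000Z"));   // false
```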