Document
What is a Document?
A Document is the primary abstraction within the Platform used to represent a file that has been uploaded for processing, either via the web interface or through API requests. Documents are the outcome of the Platform's internal processing pipeline, containing the results of various processing operations, such as detected text, layout information, and extracted data of interest.
Document Pages
A Document consists of one or more Pages, depending on the type and number of pages in the original file. For instance, a Document uploaded as an image or a single-page PDF will contain a single Page. On the other hand, a Document originating from a multi-page PDF will contain one Page for each individual page in the file.
In the myBiros Platform, inference is performed at the Page level. Our system processes the uploaded file, and for each page, it generates a Document Page containing the following information:
- General information, including the page index relative to the original file.
- Layout information, such as the page's width, height, and orientation.
- The text detected on the page through our Optical Character Recognition (OCR) pipeline.
- Extracted data of interest, identified by our document understanding Deep Learning models.
Note
To learn more about how the data of interest extracted from a Document Page is represented, see Entities.
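As an illustration, the per-page information described in this section could be modelled on the client side roughly as follows. This is a minimal sketch, not the Platform's actual schema: the class names, field names, and the choice of degrees for the orientation are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DocumentPage:
    """Hypothetical container for the results produced for a single page."""
    index: int        # page index relative to the original file
    width: int        # layout: page width
    height: int       # layout: page height
    orientation: int  # layout: page orientation (assumed to be in degrees)
    text: str         # text detected by the OCR pipeline
    # Extracted data of interest; the real representation is described in Entities.
    entities: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class Document:
    """Hypothetical container: one Page per page of the uploaded file."""
    pages: list[DocumentPage]

    def full_text(self) -> str:
        # Join the OCR text of every page, following the original page order.
        ordered = sorted(self.pages, key=lambda page: page.index)
        return "\n".join(page.text for page in ordered)
```

A single-page image would yield a `Document` with one `DocumentPage`, while a multi-page PDF would yield one `DocumentPage` per page.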
Document Status
A Document also records how users have interacted with it, along with details about the execution of the processing operation itself.
Processing Status
The processing status provides information about the progress of the Document's processing operation. In other words, it indicates whether the operation is ongoing, has been completed successfully, or encountered an issue during execution.
In particular, the Document's processing status can assume one of three distinct values:
- RUNNING: the document is awaiting processing or is currently being processed.
- ACTIVE: the processing operation is complete and the document has been processed correctly.
- ERROR: the processing operation has ended, but an error occurred while processing the document.
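Because RUNNING is the only non-terminal value, a client can wait for a Document by polling until the status changes. The sketch below shows one way to do that. The `fetch_status` callable is a hypothetical stand-in for whatever call returns the current status string; no Platform endpoint or client library is assumed, and the interval and attempt limit are arbitrary.

```python
import time
from enum import Enum
from typing import Callable


class ProcessingStatus(str, Enum):
    RUNNING = "RUNNING"
    ACTIVE = "ACTIVE"
    ERROR = "ERROR"


def wait_until_processed(
    fetch_status: Callable[[], str],  # hypothetical: returns the current status string
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> ProcessingStatus:
    """Poll until the status leaves RUNNING, or give up after max_attempts."""
    for _ in range(max_attempts):
        status = ProcessingStatus(fetch_status())
        if status is not ProcessingStatus.RUNNING:
            return status  # ACTIVE or ERROR: the processing operation has ended
        time.sleep(interval_s)
    raise TimeoutError("document is still RUNNING")
```

The caller would then branch on the returned value, reading the results when it is `ACTIVE` and handling the failure when it is `ERROR`.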
Revision Status
The revision status, by contrast, reflects the outcome of a user-initiated revision operation, carried out either through the web interface or through the dedicated API endpoint.
A Document's revision status can assume one of six distinct values, depending on whether operations were performed on it and which ones:
- not_ready: the document is currently undergoing processing, so the user has not yet been able to interact with it.
- pending: the document is ready but the user has not yet performed any action on it.
- to_review: the user has marked the document for review.
- review_required: the document has been marked for review through an API call.
- approved: the user has approved the results.
- not_approved: the user has marked the document for discard.
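On the client side, the revision status values listed above map naturally onto an enumeration. The following is a minimal sketch: the string values come from the list above, while the class name and the `needs_review` helper are assumptions introduced for illustration.

```python
from enum import Enum


class RevisionStatus(str, Enum):
    NOT_READY = "not_ready"
    PENDING = "pending"
    TO_REVIEW = "to_review"
    REVIEW_REQUIRED = "review_required"
    APPROVED = "approved"
    NOT_APPROVED = "not_approved"


def needs_review(status: RevisionStatus) -> bool:
    """True when the document was marked for review, by a user or through an API call."""
    return status in (RevisionStatus.TO_REVIEW, RevisionStatus.REVIEW_REQUIRED)
```

Parsing the raw string with `RevisionStatus(value)` raises a `ValueError` on an unknown value, which makes an unexpected status visible early instead of letting it pass silently.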
About the usage of the review_required status
The review_required revision status can be set only through API calls. Prefer it when you need to distinguish a document marked for review by an automated step (for example, because it was rejected by a downstream process operating on the results of the inference APIs) from a document marked for review by a human operator using the web interface.
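To make the distinction concrete, here is a small sketch that routes documents awaiting review into separate queues according to who asked for the review. The dictionary keys `id` and `revision_status`, and the labels `automated` and `operator`, are hypothetical names chosen for this example and are not Platform terminology.

```python
from collections import defaultdict

# Assumed labels for who requested the review.
REVIEW_ORIGIN = {
    "review_required": "automated",  # set through an API call, e.g. by a downstream check
    "to_review": "operator",         # set by a user in the web interface
}


def split_review_queue(documents):
    """Group document ids by the origin of the review request."""
    queues = defaultdict(list)
    for doc in documents:
        origin = REVIEW_ORIGIN.get(doc["revision_status"])
        if origin is not None:  # skip documents that are not marked for review
            queues[origin].append(doc["id"])
    return dict(queues)
```

With this split, documents rejected by an automated check can be handled separately from those flagged manually, which is the use case the review_required status is meant to support.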