emop package
Subpackages
- emop.lib package
- Subpackages
- emop.lib.models package
- emop.lib.processes package
- Submodules
- emop.lib.processes.denoise module
- emop.lib.processes.juxta_compare module
- emop.lib.processes.multi_column_skew module
- emop.lib.processes.page_corrector module
- emop.lib.processes.page_evaluator module
- emop.lib.processes.processes_base module
- emop.lib.processes.retas_compare module
- emop.lib.processes.tesseract module
- emop.lib.processes.xml_to_text module
- Module contents
- emop.lib.schedulers package
- emop.lib.transfer package
- Submodules
- emop.lib.emop_api module
- emop.lib.emop_base module
- emop.lib.emop_job module
- emop.lib.emop_payload module
- emop.lib.emop_scheduler module
- emop.lib.emop_settings module
- emop.lib.emop_stdlib module
- emop.lib.utilities module
- Module contents
- Subpackages
Submodules
emop.emop_query module
- class emop.emop_query.EmopQuery(config_path)
Bases: emop.lib.emop_base.EmopBase
- pending_pages(q_filter, r_filter=None)
Query pending pages
The q_filter takes a form such as ‘{“batch_id”: 6}’.
The r_filter takes a form such as ‘page.pg_image_path,pg_ground_truth_file’, where each period marks one level of nesting in the returned data. In this example, the key page containing the key pg_image_path would be returned.
Currently r_filter supports at most 2 levels of nesting.
- Args:
- q_filter (dict): Query filter passed to the EmopDashboard API.
- r_filter (str): Results filter applied to the returned results.
- Returns:
- list: List of pending pages. Each element is a dict.
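The filter formats described for pending_pages can be illustrated with a small standalone sketch. It does not call EmopQuery or the EmopDashboard API; `apply_r_filter` and the sample record are hypothetical, and the way comma-separated entries and dotted paths are interpreted here is an assumption based only on the description above.

```python
import json


def apply_r_filter(record, r_filter):
    """Keep only the keys named in r_filter (hypothetical helper).

    Assumes entries are comma-separated and that a period separates an
    outer key from an inner key, with at most 2 levels.
    """
    filtered = {}
    for path in r_filter.split(","):
        keys = path.strip().split(".")
        if len(keys) > 2:
            raise ValueError("r_filter supports at most 2 levels: %s" % path)
        if len(keys) == 1:
            # Top-level key: copy it over if present.
            if keys[0] in record:
                filtered[keys[0]] = record[keys[0]]
        else:
            # Two-level path: copy only the named inner key.
            outer, inner = keys
            nested = record.get(outer)
            if isinstance(nested, dict) and inner in nested:
                filtered.setdefault(outer, {})[inner] = nested[inner]
    return filtered


# The q_filter string from the text, parsed into a dict.
q_filter = json.loads('{"batch_id": 6}')

# Made-up record standing in for one element of the returned list.
sample = {
    "id": 1,
    "pg_ground_truth_file": "/gt/1.txt",
    "page": {"pg_image_path": "/img/1.tif", "pg_id": 10},
}
result = apply_r_filter(sample, "page.pg_image_path,pg_ground_truth_file")
```

In this sketch a third level such as `a.b.c` raises an error, mirroring the stated 2-level limit.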
emop.emop_run module
- class emop.emop_run.EmopRun(config_path, proc_id)
Bases: emop.lib.emop_base.EmopBase
- append_result(job, results, failed=False)
Append a page’s results to the job’s results payload
The results are saved to the output JSON file so that the status of each page is saved upon failure or success.
- Args:
- job (EmopJob): EmopJob object
- results (str): The error output of a particular process
- failed (bool, optional): Sets if the result is a failure
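The idea of recording each page's outcome and writing the payload to a JSON file after every page can be sketched as follows. This is a minimal illustration, not the EmopRun implementation: the payload layout, the `completed`/`failed` keys, and the helper names are all assumptions.

```python
import json
import os
import tempfile


def append_page_result(payload, page_id, output, failed=False):
    """Record one page's outcome in the payload dict (hypothetical layout)."""
    key = "failed" if failed else "completed"
    payload.setdefault(key, []).append({"page_id": page_id, "output": output})
    return payload


def save_payload(payload, path):
    """Rewrite the output file so the status of every page so far is on disk."""
    with open(path, "w") as handle:
        json.dump(payload, handle, indent=2)


# Demo with made-up page IDs and a temporary output file.
payload = {}
out_path = os.path.join(tempfile.mkdtemp(), "output.json")
append_page_result(payload, 101, "")
save_payload(payload, out_path)
append_page_result(payload, 102, "process exited with an error", failed=True)
save_payload(payload, out_path)

with open(out_path) as handle:
    saved = json.load(handle)
```

Saving after each append means a crash partway through a job still leaves a readable record of the pages already handled.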
- do_postprocesses(job)
Run the post processes
Each post process class is called from here.
- Currently the steps are executed in the following order:
- Denoise
- MultiColumnSkew
- XML_To_Text
- PageEvaluator
- PageCorrector
- JuxtaCompare (postprocess)
- JuxtaCompare - COMMENTED OUT
- RetasCompare (postprocess)
- RetasCompare - COMMENTED OUT
If any step fails, the function terminates and returns False.
- Args:
- job (EmopJob): EmopJob object
- Returns:
- bool: True if successful, False otherwise.
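The control flow described for do_postprocesses, where steps run in a fixed order and the first failure ends the run with False, can be sketched as below. The step functions are placeholders that only borrow the names from the list above; the real process classes and their interfaces are not shown in this page, so everything here is an assumption.

```python
def run_steps(steps, job):
    """Run (name, callable) pairs in order; stop at the first failure."""
    for name, step in steps:
        if not step(job):
            # A failing step ends the run; later steps are never called.
            return False
    return True


def make_step(name, succeeds, log):
    """Build a placeholder step that records its name and returns a fixed result."""
    def step(job):
        log.append(name)
        return succeeds
    return step


# Demo: PageEvaluator is made to fail, so PageCorrector never runs.
calls = []
steps = [
    ("Denoise", make_step("Denoise", True, calls)),
    ("MultiColumnSkew", make_step("MultiColumnSkew", True, calls)),
    ("XML_To_Text", make_step("XML_To_Text", True, calls)),
    ("PageEvaluator", make_step("PageEvaluator", False, calls)),
    ("PageCorrector", make_step("PageCorrector", True, calls)),
]
ok = run_steps(steps, job={"id": 1})
```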
emop.emop_submit module
- class emop.emop_submit.EmopSubmit(config_path)
Bases: emop.lib.emop_base.EmopBase
- optimize_submit(page_count, running_job_count, sim=False)
Determine optimal job submission
This function attempts to determine the best number of jobs and how many pages per job should be submitted to the scheduler.
This function does not return a value but sets the num_jobs and pages_per_job attributes.
- Args:
- page_count (int): Number of pages needing to be processed
- running_job_count (int): Number of active jobs
- Returns:
- list: First value is number of jobs and second value is number of pages per job.
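One way to pick a job count and a pages-per-job value from a page count and the number of running jobs is sketched below. This is an illustrative heuristic only, not the algorithm EmopSubmit uses: the function name `plan_submission` and the limits `max_jobs`, `min_pages`, and `max_pages` are hypothetical and would presumably come from configuration in practice.

```python
import math


def plan_submission(page_count, running_job_count,
                    max_jobs=10, min_pages=5, max_pages=50):
    """Return [num_jobs, pages_per_job] under assumed scheduler limits."""
    # Slots still free once the already-running jobs are accounted for.
    available = max(max_jobs - running_job_count, 0)
    if page_count <= 0 or available == 0:
        return [0, 0]
    # Spread pages evenly over the free slots, clamped to the per-job bounds.
    pages_per_job = math.ceil(page_count / available)
    pages_per_job = min(max(pages_per_job, min_pages), max_pages)
    # Never submit more jobs than there are free slots.
    num_jobs = min(math.ceil(page_count / pages_per_job), available)
    return [num_jobs, pages_per_job]


plan_even = plan_submission(100, 0)
plan_small = plan_submission(3, 0)
plan_capped = plan_submission(1000, 0)
```

When the clamp to `max_pages` applies, as in the last call, the plan covers only part of the backlog and the remaining pages would wait for a later submission.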