emop package

Submodules

emop.emop_query module

class emop.emop_query.EmopQuery(config_path)[source]

Bases: emop.lib.emop_base.EmopBase

get_runtimes()[source]
pending_pages(q_filter, r_filter=None)[source]

Query pending pages

The q_filter takes a form such as ‘{“batch_id”: 6}’.

The r_filter takes a form such as ‘page.pg_image_path,pg_ground_truth_file’, where each period descends one level into the returned data. In this example, the key pg_image_path nested under the key page would be returned.

Currently r_filter only supports 2 levels deep.

Args:
q_filter (dict): Query filter passed to EmopDashboard API
r_filter (str): Results filter used to filter returned results.
Returns:
list: List of pending pages. Each element is a dict.
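
The two-level results filter described above can be illustrated with a minimal sketch. The helper apply_r_filter and the sample record are hypothetical and are not part of emop. The sketch also assumes that the comma-separated names after the period are all keys nested under the first key, which the text above does not confirm.

```python
def apply_r_filter(results, r_filter):
    """Keep only the requested keys from each result dict (hypothetical helper)."""
    # Split once on the first period: outer key, then inner key list.
    outer, _, inner = r_filter.partition(".")
    # Assumption: comma-separated names are sibling keys under the outer key.
    inner_keys = inner.split(",") if inner else []
    filtered = []
    for item in results:
        value = item.get(outer)
        if inner_keys and isinstance(value, dict):
            value = {k: value[k] for k in inner_keys if k in value}
        filtered.append({outer: value})
    return filtered


# Made-up sample record, shaped only for illustration.
sample = [
    {
        "id": 1,
        "page": {
            "pg_image_path": "/img/1.tif",
            "pg_ground_truth_file": None,
            "pg_ref_number": 1,
        },
    }
]
filtered_sample = apply_r_filter(sample, "page.pg_image_path,pg_ground_truth_file")
```

Only one level of nesting below the outer key is handled, matching the two-level limit noted above.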
pending_pages_count(q_filter)[source]

Query number of pending pages

The q_filter takes a form such as ‘{“batch_id”: 6}’.

Args:
q_filter (dict): Query filter passed to EmopDashboard API
Returns:
int: Number of pending pages

emop.emop_run module

class emop.emop_run.EmopRun(config_path, proc_id)[source]

Bases: emop.lib.emop_base.EmopBase

append_result(job, results, failed=False)[source]

Append a page’s results to job’s results payload

The results are saved to the output JSON file so that the status of each page is saved upon failure or success.

Args:
job (EmopJob): EmopJob object
results (str): The error output of a particular process
failed (bool, optional): Sets if the result is a failure
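
The idea of saving the payload after every page, so that progress survives a later failure, might be sketched as follows. The function record_page_result, the "completed"/"failed" payload keys, and the file name are assumptions made for illustration. They do not reflect the actual EmopRun payload layout.

```python
import json
import os
import tempfile


def record_page_result(payload, page_result, output_path, failed=False):
    """Append one page result and rewrite the JSON file (hypothetical helper)."""
    # Assumed layout: one list per outcome.
    key = "failed" if failed else "completed"
    payload.setdefault(key, []).append(page_result)
    # Rewrite the whole file each time so it always reflects the latest state.
    with open(output_path, "w") as fh:
        json.dump(payload, fh)
    return payload


# Usage with a temporary directory and made-up page records.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "results.json")
demo_payload = {}
record_page_result(demo_payload, {"page_id": 1}, demo_path)
record_page_result(demo_payload, {"page_id": 2, "error": "ocr failed"}, demo_path, failed=True)
```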
do_job(*args, **kwargs)[source]
do_ocr(*args, **kwargs)[source]
do_postprocesses(job)[source]

Run the post processes

Each post process class is called from here.

Currently the steps are executed in the following order:
  • Denoise
  • MultiColumnSkew
  • XML_To_Text
  • PageEvaluator
  • PageCorrector
  • JuxtaCompare (postprocess)
  • JuxtaCompare - COMMENTED OUT
  • RetasCompare (postprocess)
  • RetasCompare - COMMENTED OUT

If any step fails, the function terminates and returns False.

Args:
job (EmopJob): EmopJob object
Returns:
bool: True if successful, False otherwise.
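
The stop-on-first-failure behaviour described above can be sketched with a generic helper. The name run_steps and the stand-in step functions are hypothetical. The real post process classes and their call signatures are not shown in this reference.

```python
def run_steps(steps, job):
    """Run each (name, callable) pair in order; stop at the first failure."""
    for name, step in steps:
        if not step(job):
            print(f"post process {name} failed; stopping")
            return False
    return True


# Stand-in steps that record their names so the order is visible.
executed = []


def make_step(name, ok=True):
    def step(job):
        executed.append(name)
        return ok
    return step


steps = [
    ("Denoise", make_step("Denoise")),
    ("MultiColumnSkew", make_step("MultiColumnSkew", ok=False)),
    ("XML_To_Text", make_step("XML_To_Text")),
]
pipeline_ok = run_steps(steps, job={"id": 1})
```

In this run the second step reports failure, so the third step is never called.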
do_process(*args, **kwargs)[source]
do_training(*args, **kwargs)[source]
get_results()[source]

Get this object’s results

Returns:
dict: Results to be used as payload to API
run(*args, **kwargs)[source]
emop.emop_run.signal_exit(signum, frame)[source]

Signal handler

This function will mark all non-completed jobs as failed and exit. This is intended to catch SIGUSR1 signals that indicate a job is nearing its time limit.
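
A handler of this kind could look roughly like the sketch below, which uses only the standard signal and sys modules. The job records, the handler name on_time_limit, and the exit status of 1 are assumptions for illustration, not the behaviour of emop.emop_run.signal_exit.

```python
import signal
import sys

# Made-up job records; the real code tracks EmopJob objects.
jobs = [
    {"id": 1, "done": True, "failed": False},
    {"id": 2, "done": False, "failed": False},
]


def on_time_limit(signum, frame):
    """Mark every unfinished job as failed, then exit."""
    for job in jobs:
        if not job["done"]:
            job["failed"] = True
    sys.exit(1)  # Exit status is an assumption.


def install_handler():
    # SIGUSR1 is not available on every platform (for example Windows).
    if hasattr(signal, "SIGUSR1"):
        signal.signal(signal.SIGUSR1, on_time_limit)
```

Some schedulers can be configured to send a signal shortly before a job's time limit. Calling install_handler() at startup would then let the process record its unfinished work before it is killed.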

emop.emop_submit module

class emop.emop_submit.EmopSubmit(config_path)[source]

Bases: emop.lib.emop_base.EmopBase

optimize_submit(page_count, running_job_count, sim=False)[source]

Determine optimal job submission

This function attempts to determine the best number of jobs and how many pages per job should be submitted to the scheduler.

This function does not return a value but sets the num_jobs and pages_per_job attributes.

Args:
page_count (int): Number of pages needing to be processed
running_job_count (int): Number of active jobs
Returns:
list: First value is number of jobs and second value is number of pages per job.
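
The reference does not say how the split is computed. One simple heuristic is sketched below purely as an illustration. The function plan_submission and the limits max_jobs and max_pages_per_job are hypothetical and are not taken from EmopSubmit.

```python
import math


def plan_submission(page_count, running_job_count, max_jobs=10, max_pages_per_job=100):
    """Return [num_jobs, pages_per_job] under assumed scheduler limits."""
    # Only submit into job slots that are not already in use.
    free_slots = max(max_jobs - running_job_count, 0)
    if page_count <= 0 or free_slots == 0:
        return [0, 0]
    # Never start more jobs than there are pages.
    num_jobs = min(free_slots, page_count)
    # Spread pages evenly, capped at the per-job maximum.
    pages_per_job = min(math.ceil(page_count / num_jobs), max_pages_per_job)
    return [num_jobs, pages_per_job]


plan = plan_submission(page_count=250, running_job_count=4)
```

With the assumed limits, 250 pages and 4 running jobs leave 6 free slots, so the plan is 6 jobs of 42 pages each. When the per-job cap applies, the pages left over would wait for a later submission.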
reserve(num_pages, r_filter)[source]

Reserve pages for a job

Reserve page(s) for work by sending PUT request to dashboard API.

Returns:
str: The reserved work’s proc_id.
set_job_id(proc_id, job_id)[source]

Sends JobID back to dashboard

emop.emop_upload module

class emop.emop_upload.EmopUpload(config_path)[source]

Bases: emop.lib.emop_base.EmopBase

upload(data)[source]
upload_dir(dirname)[source]
upload_file(filename)[source]
upload_proc_id(proc_id)[source]

Module contents