emop.lib package

Submodules

emop.lib.emop_api module

class emop.lib.emop_api.EmopAPI(url_base, api_headers)

Bases: object

get_request(url_path, params={})

Sends a GET request

Send a GET request with optional params to the emop-dashboard.

Args:
url_path (str): The API URL path, excluding the hostname.
params (dict, optional): Params to send with the GET request.
Returns:
dict: The request’s JSON response data.
put_request(url_path, data={})

Sends a PUT request

Send a PUT request to the emop-dashboard.

Args:
url_path (str): The API URL path, excluding the hostname.
data (dict, optional): The data to send with the PUT request.
Returns:
dict: The request’s JSON response data.
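A minimal standard-library sketch of how the two requests could be assembled and sent. The helper names (build_get_request, build_put_request, send_request) are hypothetical, and sending the PUT body as JSON is an assumption; the real class may use a different HTTP library. Building the Request objects separately from sending them keeps the sketch checkable without a running emop-dashboard.

```python
import json
import urllib.parse
import urllib.request


def _join(url_base, url_path):
    # Hypothetical helper: join base URL and API path with exactly one slash.
    return url_base.rstrip("/") + "/" + url_path.lstrip("/")


def build_get_request(url_base, api_headers, url_path, params=None):
    # Encode optional query params onto the URL.
    url = _join(url_base, url_path)
    query = urllib.parse.urlencode(params or {})
    if query:
        url += "?" + query
    return urllib.request.Request(url, headers=api_headers, method="GET")


def build_put_request(url_base, api_headers, url_path, data=None):
    # Assumption: the PUT body is sent as JSON.
    headers = {**api_headers, "Content-Type": "application/json"}
    body = json.dumps(data or {}).encode("utf-8")
    return urllib.request.Request(_join(url_base, url_path), data=body,
                                  headers=headers, method="PUT")


def send_request(request):
    # Perform the HTTP call and decode the JSON response body.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The sketch uses None instead of the {} defaults shown in the signatures, because a mutable default value is shared across calls in Python.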

emop.lib.emop_base module

class emop.lib.emop_base.EmopBase(config_path)

Bases: object

static add_prefix(prefix, path)

Add prefix to a path

Args:
prefix (str): Path prefix
path (str): Path to add the prefix to
Returns:
str: The prefix + path. None is returned if either prefix or path is not present.
static remove_prefix(prefix, path)

Remove prefix from a path

Args:
prefix (str): Path prefix
path (str): Path to remove the prefix from
Returns:
str: The path with the prefix removed. None is returned if either prefix or path is not present.
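Read literally, the two helpers could be sketched as follows. Treating an empty string like a missing value, joining by plain string concatenation (rather than os.path.join), and returning the path unchanged when it does not start with the prefix are all assumptions.

```python
def add_prefix(prefix, path):
    # Return None when either argument is missing (None or empty).
    if not prefix or not path:
        return None
    # Assumption: plain string concatenation, not os.path.join.
    return prefix + path


def remove_prefix(prefix, path):
    # Return None when either argument is missing (None or empty).
    if not prefix or not path:
        return None
    # Assumption: only a leading occurrence of prefix is stripped.
    if path.startswith(prefix):
        return path[len(prefix):]
    return path
```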
static run_timing(func)

Decorator used to time a function

This decorator is intended to be used by functions in EmopRun. The functions should return only bool values.

The purpose of this decorator is to print the time it takes a function to run.
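A minimal sketch of a timing decorator of this kind, written as a module-level function rather than a static method. The message format and the use of time.perf_counter are assumptions. do_work is a hypothetical stand-in for an EmopRun step that returns a bool.

```python
import functools
import time


def run_timing(func):
    # Wrap func so each call prints its wall-clock duration.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("%s took %.3f seconds" % (func.__name__, elapsed))
        # Pass the wrapped function's bool result through unchanged.
        return result
    return wrapper


@run_timing
def do_work():
    # Hypothetical EmopRun-style step.
    return True
```

functools.wraps keeps the decorated function's name, so the printed message identifies the right step.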

emop.lib.emop_job module

class emop.lib.emop_job.EmopJob(job_data, settings, scheduler)

Bases: object

add_filename_suffix(file, suffix)

Add filename suffix

This function adds a suffix to a filename, placed before the extension.

Example:
add_filename_suffix('5.xml', 'IDHMC')  # 5.xml -> 5_IDHMC.xml
Args:
file (str): File name to add the suffix to
suffix (str): The suffix to add
Returns:
str: The filename with suffix added before extension
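One plausible implementation uses os.path.splitext. This is an assumption about how the method works. The parameter is renamed filename here to avoid shadowing the Python 2 built-in file. A name without an extension simply gets the suffix appended.

```python
import os


def add_filename_suffix(filename, suffix):
    # Split "5.xml" into ("5", ".xml") and insert "_<suffix>" between them.
    root, ext = os.path.splitext(filename)
    return "%s_%s%s" % (root, suffix, ext)
```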
get_output_dir(batch_id, work_id)

Provide the job output directory

Format is the following:
/<config.ini output_path_prefix><config.ini ocr_root>/<batch ID>/<work ID>
Example:
/dh/data/shared/text-xml/IDHMC-OCR/<batch.id>/<work.id>
Returns:
str: Output directory path
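A sketch that follows the documented format literally. Unlike the real method, it takes the config.ini values output_path_prefix and ocr_root as arguments instead of reading them from settings. The values in any example call are hypothetical.

```python
import posixpath


def get_output_dir(output_path_prefix, ocr_root, batch_id, work_id):
    # "/<output_path_prefix><ocr_root>" is built by plain concatenation,
    # exactly as in the documented format.
    root = "/" + output_path_prefix + ocr_root
    # Batch and work IDs become two nested directory levels.
    return posixpath.join(root, str(batch_id), str(work_id))
```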
output_file(fmt)

Provide the job output file name

Format is the following:
<output_dir>/<page.number>.<fmt>
Example:
<output_dir>/5.xml (page number 5, fmt 'xml')
Args:
fmt (str): File format (extension) for file path
Returns:
str: Output file path
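A sketch of the same naming rule. Here the output directory and page number are passed in explicitly; the real method takes only fmt.

```python
import posixpath


def output_file(output_dir, page_number, fmt):
    # <output_dir>/<page.number>.<fmt>, e.g. page 5 as XML -> ".../5.xml"
    return posixpath.join(output_dir, "%s.%s" % (page_number, fmt))
```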
parse_data(data)

emop.lib.emop_payload module

class emop.lib.emop_payload.EmopPayload(settings, proc_id)

Bases: object

completed_output_exists()
file_exists(filename)
input_exists()
load(filename)
load_input()
output_exists()
save(data, dirname, filename, overwrite=False)
save_completed_output(data, overwrite=False)
save_input(data)
save_output(data, overwrite=False)
save_uploaded_output(data)
uploaded_output_exists()

emop.lib.emop_scheduler module

class emop.lib.emop_scheduler.EmopScheduler(settings)

Bases: object

current_job_count()
get_job_id()

Get the scheduler’s job ID

Loops over the class attribute jobid_env_vars to find the current job ID.

Returns:
int: The job ID of the current scheduler job.
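A minimal sketch of the lookup, using a hypothetical subclass-style attribute. SLURM_JOB_ID and SLURM_JOBID are variables that SLURM sets in job environments, but the list that EmopSLURM actually defines is not shown here. Returning None when no variable is set is also an assumption.

```python
import os


class SchedulerSketch(object):
    # Hypothetical list; a real child class defines its own.
    jobid_env_vars = ["SLURM_JOB_ID", "SLURM_JOBID"]

    def get_job_id(self, environ=None):
        env = os.environ if environ is None else environ
        # Return the first job ID variable that is set, as an int.
        for var in self.jobid_env_vars:
            value = env.get(var)
            if value:
                return int(value)
        # Assumption: no job ID found.
        return None
```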
get_name()

Get the scheduler’s name

The return value is taken from the class attribute name.

Returns:
str: The scheduler’s name
classmethod get_scheduler_instance(name, settings)

Get the scheduler instance

Based on the value of the name argument, an instance of that scheduler class is returned.

The lookup is dynamic, so adding support for another scheduler only requires updating the supported_schedulers dict in EmopScheduler.

Args:
name (str): Name of the scheduler to get an instance of.
settings (EmopSettings): Instance of EmopSettings.
Returns:
object: Instance of an EmopScheduler sub-class.
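A sketch of the dynamic lookup with importlib. Here the supported_schedulers dict is passed as a parameter, whereas in the real class it is a class attribute. This lets the sketch be exercised with collections.OrderedDict standing in for a scheduler class. Raising ValueError for an unknown name is an assumption.

```python
import importlib


def get_scheduler_instance(name, settings, supported_schedulers):
    # Find the module path and class name registered for this scheduler.
    if name not in supported_schedulers:
        # Assumption: unsupported names raise ValueError.
        raise ValueError("Unsupported scheduler: %s" % name)
    entry = supported_schedulers[name]
    module = importlib.import_module(entry["module"])
    scheduler_class = getattr(module, entry["class"])
    # Instantiate the scheduler class with the settings object.
    return scheduler_class(settings)


# Usage with a stdlib class standing in for EmopSLURM (hypothetical entry).
demo_schedulers = {"ordered": {"class": "OrderedDict", "module": "collections"}}
instance = get_scheduler_instance("ordered", {"a": 1}, demo_schedulers)
```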
is_job_environment()

Test if currently in a valid scheduler job environment.

The class attribute jobid_env_vars contains a list of the valid job ID environment variables for a scheduler.

The current environment is tested to see if it contains a valid job ID environment variable.

Returns:
bool: True if the current environment contains a variable from the class attribute jobid_env_vars list. False is returned if none are found.
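The same test can be expressed as a small function. This sketch passes the variable list and an optional environment mapping as arguments; that is a choice made for illustration, not the class's actual signature.

```python
import os


def is_job_environment(jobid_env_vars, environ=None):
    env = os.environ if environ is None else environ
    # True as soon as any known job ID variable is present.
    return any(var in env for var in jobid_env_vars)
```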
jobid_env_vars = []

The supported job ID environment variables must be defined in a child class.

name = ''

The name of the scheduler must be defined in a child class.

submit_job(proc_id, num_pages)
supported_schedulers = {'slurm': {'class': 'EmopSLURM', 'module': 'emop.lib.schedulers.emop_slurm'}}

Define attributes of supported schedulers. To add a new scheduler this dict must be updated.

walltime(num_pages)

Determine walltime used for submitting job

This function determines the appropriate walltime to use when submitting a job.

The optimal walltime is computed as avg_page_runtime * N * num_pages, where N is tried in order as 400%, 200% and 150%. The first value that is less than max_job_runtime is used.

Args:
num_pages (int): Number of pages to be run
Returns:
int: A walltime value in seconds.
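A sketch of the selection loop. avg_page_runtime and max_job_runtime are passed as arguments here. Falling back to max_job_runtime when none of the three candidates fits is an assumption, since the docstring does not say what happens in that case.

```python
def walltime(num_pages, avg_page_runtime, max_job_runtime):
    # Try the most generous multiplier first: 400%, then 200%, then 150%.
    for multiplier in (4.0, 2.0, 1.5):
        candidate = avg_page_runtime * multiplier * num_pages
        if candidate < max_job_runtime:
            return int(candidate)
    # Assumption: fall back to the maximum when no candidate fits.
    return int(max_job_runtime)
```

For example, with 5 pages at 10 seconds each, the candidates are 200, 100 and 75 seconds, so a max_job_runtime of 150 selects 100.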

emop.lib.emop_settings module

class emop.lib.emop_settings.EmopSettings(config_path)

Bases: object

get_bool_value(section, option, default=None)

Get settings bool value

This function is a wrapper for RawConfigParser.getboolean() that handles missing values by substituting defaults defined in a dict within the global space of EmopSettings.

Args:
section (str): INI file section
option (str): INI file option name
default (str): Default value if one is not found. Defaults to None.
Returns:
bool: The config value
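A Python 3 sketch using configparser; the docstring refers to Python 2's RawConfigParser. The DEFAULTS dict and its use_qos entry are hypothetical stand-ins for the global defaults dict. Checking that dict before falling back to the default argument is an assumed precedence. String defaults are converted with the parser's own BOOLEAN_STATES table so that a default such as 'off' is not treated as truthy.

```python
import configparser

# Hypothetical stand-in for the global defaults dict described above.
DEFAULTS = {"scheduler": {"use_qos": False}}


def get_bool_value(parser, section, option, default=None):
    # Use the INI value when the option is present.
    if parser.has_option(section, option):
        return parser.getboolean(section, option)
    # Assumption: global defaults take precedence over the default argument.
    value = DEFAULTS.get(section, {}).get(option, default)
    if isinstance(value, str):
        # Accept the spellings getboolean() accepts: yes/no, true/false, on/off, 1/0.
        return parser.BOOLEAN_STATES[value.lower()]
    return value
```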
get_value(section, option, default=None)

Get settings value

This function is a wrapper for ConfigParser.get() that handles missing values by substituting defaults defined in a dict within the global space of EmopSettings.

Interpolation is performed on the following items when they are referenced with %() syntax in the INI file:

  • home - HOME environment variable
  • emop_home - The emop_home setting value

Args:
section (str): INI file section
option (str): INI file option name
default (str): Default value if one is not found. Defaults to None.
Returns:
str: The config value
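A Python 3 sketch of the interpolation step. home and emop_home are supplied through the vars argument of ConfigParser.get(), so %(home)s and %(emop_home)s resolve inside INI values. Passing emop_home as a parameter is an assumption. The global defaults lookup is left out to keep the focus on interpolation.

```python
import configparser
import os


def get_value(parser, section, option, default=None, emop_home=""):
    # Values made available to %(home)s and %(emop_home)s in the INI file.
    interpolation_vars = {
        "home": os.environ.get("HOME", ""),
        "emop_home": emop_home,
    }
    if parser.has_option(section, option):
        return parser.get(section, option, vars=interpolation_vars)
    return default
```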

emop.lib.emop_stdlib module

class emop.lib.emop_stdlib.EmopEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)

Bases: json.encoder.JSONEncoder

default(obj)
class emop.lib.emop_stdlib.EmopStdlib

Bases: object

static to_JSON(obj)

emop.lib.utilities module

emop.lib.utilities.exec_cmd(cmd, log_level='info', timeout=-1, realtime=False)

Executes a command

This is the method used by this application to execute shell commands.

The cmd argument can be a 2D list, but only one level deep.

The command’s stdout, stderr and exit code are returned as a namedtuple.

Args:
cmd (str or list): Command to execute
log_level (str, optional): Log level used when printing information about the command being executed.
timeout (int, optional): The time in seconds the command is allowed to run before timing out.
Returns:
tuple: (stdout, stderr, exitcode)
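A Python 3 sketch built on subprocess.run. CmdResult is a hypothetical name for the namedtuple. The following are all assumptions: splitting a string command with shlex instead of running it through a shell, treating a negative timeout as no timeout, and omitting realtime output and nested command lists. When the timeout expires, subprocess.run raises subprocess.TimeoutExpired.

```python
import collections
import logging
import shlex
import subprocess
import sys

CmdResult = collections.namedtuple("CmdResult", ["stdout", "stderr", "exitcode"])


def exec_cmd(cmd, log_level="info", timeout=-1):
    # Assumption: a string command is split into arguments, not run via a shell.
    args = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    logging.getLogger(__name__).log(
        getattr(logging, log_level.upper()), "Executing: %s", " ".join(args))
    # Assumption: a negative timeout means "no timeout".
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True,
                          timeout=None if timeout < 0 else timeout)
    return CmdResult(proc.stdout, proc.stderr, proc.returncode)


# Usage: run the current Python interpreter as a portable test command.
result = exec_cmd([sys.executable, "-c", "print('hello')"])
```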
emop.lib.utilities.get_max_rss()

Returns max RSS soft ulimit in bytes
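A sketch using the standard resource module, which is Unix only. getrlimit returns a (soft, hard) pair, and the soft value may equal resource.RLIM_INFINITY when no limit is set. Whether the real function handles that case is not documented.

```python
import resource


def get_max_rss():
    # getrlimit returns (soft, hard); only the soft limit is reported.
    soft, _hard = resource.getrlimit(resource.RLIMIT_RSS)
    return soft
```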

emop.lib.utilities.get_temp_dir()

Gets the temp directory based on environment variables

Currently searched environment variables:
  • TMPDIR
Returns:
str: Path to temp directory.
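A sketch of the search, with the environment passed in for testing. Falling back to tempfile.gettempdir() when TMPDIR is unset is an assumption, since the docstring does not name a fallback.

```python
import os
import tempfile


def get_temp_dir(environ=None):
    env = os.environ if environ is None else environ
    # Environment variables searched, in order (only TMPDIR is documented).
    for var in ("TMPDIR",):
        path = env.get(var)
        if path:
            return path
    # Assumption: fall back to the platform default temp directory.
    return tempfile.gettempdir()
```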
emop.lib.utilities.mkdirs_exists_ok(path)

Wrapper for os.makedirs

This function avoids a race condition in which the directory already exists (for example, because another process created it) when os.makedirs is called. It emulates the exist_ok argument of os.makedirs in Python 3.x.

Args:
path (str): Path of directory to create
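The classic Python 2 pattern is to attempt the creation and ignore the "already exists" error only when the path is in fact a directory. This sketch assumes the wrapper works roughly that way. On Python 3 the equivalent is os.makedirs(path, exist_ok=True).

```python
import errno
import os
import tempfile


def mkdirs_exists_ok(path):
    try:
        os.makedirs(path)
    except OSError as err:
        # Ignore the error only if the directory now exists (for example,
        # created concurrently by another process); re-raise anything else.
        if err.errno != errno.EEXIST or not os.path.isdir(path):
            raise


# Usage: calling twice on the same path is safe.
target = os.path.join(tempfile.mkdtemp(), "a", "b")
mkdirs_exists_ok(target)
mkdirs_exists_ok(target)
```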
emop.lib.utilities.recursive_copy(src, dest, ignore=None, exclude=[])

Recursive file copy

This method is currently not used, but is kept in case it is needed in the future.

Module contents