emop.lib package¶
Subpackages¶
- emop.lib.models package
- emop.lib.processes package
- Submodules
- emop.lib.processes.denoise module
- emop.lib.processes.juxta_compare module
- emop.lib.processes.multi_column_skew module
- emop.lib.processes.page_corrector module
- emop.lib.processes.page_evaluator module
- emop.lib.processes.processes_base module
- emop.lib.processes.retas_compare module
- emop.lib.processes.tesseract module
- emop.lib.processes.xml_to_text module
- Module contents
- emop.lib.schedulers package
- emop.lib.transfer package
Submodules¶
emop.lib.emop_api module¶
- class emop.lib.emop_api.EmopAPI(url_base, api_headers)[source]¶
Bases: object
emop.lib.emop_base module¶
- class emop.lib.emop_base.EmopBase(config_path)[source]¶
Bases: object
- static add_prefix(prefix, path)[source]¶
Add prefix to a path
- Args:
- prefix (str): Path prefix
- path (str): Path to add the prefix to
- Returns:
- str: The prefix + path. None is returned if prefix or path are not present.
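The documented contract can be sketched as below. This is only an illustration, not the package's implementation: joining with os.path.join after stripping the leading separator from path is an assumption, since the docstring only says "prefix + path".

```python
import os


def add_prefix(prefix, path):
    """Return prefix joined to path, or None if either one is missing."""
    if not prefix or not path:
        return None
    # Assumption: strip the leading separator so that os.path.join
    # does not discard the prefix when path is absolute.
    return os.path.join(prefix, path.lstrip(os.sep))
```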
emop.lib.emop_job module¶
- class emop.lib.emop_job.EmopJob(job_data, settings, scheduler)[source]¶
Bases: object
- add_filename_suffix(file, suffix)[source]¶
Add filename suffix
This function adds a suffix to a filename before the extension
- Example:
- add_filename_suffix('5.xml', 'IDHMC') converts 5.xml -> 5_IDHMC.xml
- Args:
- file (str): File name to add the suffix to
- suffix (str): The suffix to add
- Returns:
- str: The filename with suffix added before extension
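A minimal standalone sketch of the behaviour described above, written as a plain function rather than a method. The underscore separator is taken from the documented example; the use of os.path.splitext is an assumption about how the extension is located.

```python
import os


def add_filename_suffix(file, suffix):
    """Insert "_<suffix>" between the file's base name and its extension."""
    root, ext = os.path.splitext(file)
    return "{0}_{1}{2}".format(root, suffix, ext)
```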
- get_output_dir(batch_id, work_id)[source]¶
Provide the job output directory
- Format is the following:
- /<config.ini output_path_prefix><config.ini ocr_root>/<batch ID>/<work ID>
- Example:
- /dh/data/shared/text-xml/IDHMC-OCR/<batch.id>/<work.id>
- Returns:
- str: Output directory path
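The path format above could be assembled roughly as follows. This is a sketch under stated assumptions: the real method takes only batch_id and work_id and reads the other values from config.ini, whereas this standalone function receives them as parameters, and the values used in the test are hypothetical placeholders.

```python
import posixpath


def get_output_dir(output_path_prefix, ocr_root, batch_id, work_id):
    """Build <output_path_prefix><ocr_root>/<batch ID>/<work ID>."""
    # ocr_root is assumed to start with "/", so it is appended directly
    # to the prefix, matching the documented format string.
    base = output_path_prefix + ocr_root
    return posixpath.join(base, str(batch_id), str(work_id))
```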
emop.lib.emop_payload module¶
emop.lib.emop_scheduler module¶
- class emop.lib.emop_scheduler.EmopScheduler(settings)[source]¶
Bases: object
- get_job_id()[source]¶
Get the scheduler’s job ID
Loops over the class’ jobid_env_vars attribute to find the current job ID.
- Returns:
- int: The job ID of the current scheduler job.
- get_name()[source]¶
Get the scheduler’s name
The return value is pulled from the class name attribute.
- Returns:
- str: The scheduler’s name
- classmethod get_scheduler_instance(name, settings)[source]¶
Get the scheduler instance
Based on the value of the name argument, an instance of that scheduler class is returned.
The logic in the function is dynamic so that only the supported_schedulers dict in EmopScheduler needs to be updated to add additional scheduler support.
- Args:
- name (str): Name of scheduler to get instance of.
- settings (EmopSettings): Instance of EmopSettings.
- Returns:
- object: Instance of an EmopScheduler sub-class.
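The dictionary-driven lookup described above might look like the sketch below. It is not the package's code: the registry here is a stand-in that points at a standard-library class so the example runs without emop installed, and the names SUPPORTED and get_instance are hypothetical.

```python
import importlib

# Stand-in registry. The documented dict maps 'slurm' to the class
# EmopSLURM in emop.lib.schedulers.emop_slurm; a stdlib class is used
# here only so that the sketch is runnable on its own.
SUPPORTED = {
    "demo": {"module": "collections", "class": "OrderedDict"},
}


def get_instance(name, *args):
    """Import the module registered for name and instantiate its class."""
    entry = SUPPORTED.get(name)
    if entry is None:
        raise ValueError("Unsupported scheduler: {0}".format(name))
    module = importlib.import_module(entry["module"])
    cls = getattr(module, entry["class"])
    return cls(*args)
```

With this pattern, supporting another scheduler only requires a new entry in the registry, which matches the intent stated in the description.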
- is_job_environment()[source]¶
Test if currently in a valid scheduler job environment.
The class attribute jobid_env_vars contains a list of the valid job ID environment variables for a scheduler.
The current environment is tested to see if it contains a valid job ID environment variable.
- Returns:
- bool: True if the current environment contains an environment variable from the class attribute jobid_env_vars list. False is returned if none are found.
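The environment checks used by get_job_id() and is_job_environment() can be illustrated with a small stand-in class. Everything here is an assumption for illustration: the class name, the DEMO_JOB_ID variable, the optional environ parameter (added so the logic can be exercised without touching os.environ), and the None return when no variable is set.

```python
import os


class DemoScheduler(object):
    # Hypothetical variable name; a real child class defines its own list.
    jobid_env_vars = ["DEMO_JOB_ID"]

    def get_job_id(self, environ=None):
        """Return the first job ID found in jobid_env_vars as an int."""
        environ = os.environ if environ is None else environ
        for var in self.jobid_env_vars:
            value = environ.get(var)
            if value:
                return int(value)
        # Assumption: None when no job ID variable is set.
        return None

    def is_job_environment(self, environ=None):
        """Return True if any job ID variable is present."""
        environ = os.environ if environ is None else environ
        return any(var in environ for var in self.jobid_env_vars)
```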
- jobid_env_vars = []¶
The supported job ID environment variables must be defined in a child class.
- name = ''¶
The name of the scheduler must be defined in a child class.
- supported_schedulers = {'slurm': {'class': 'EmopSLURM', 'module': 'emop.lib.schedulers.emop_slurm'}}¶
Define attributes of supported schedulers. To add a new scheduler this dict must be updated.
- walltime(num_pages)[source]¶
Determine walltime used for submitting job
This function determines the appropriate walltime to use when submitting a job.
A candidate walltime is computed as avg_page_runtime * N * num_pages, where N is 400%, 200% or 150%, tried in that order. The first candidate that is less than max_job_runtime is used.
- Args:
- num_pages (int): Number of pages to be run
- Returns:
- int: A walltime value in seconds.
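The selection rule can be sketched as a standalone function. Two points are assumptions: avg_page_runtime and max_job_runtime are passed in as parameters here (the real method presumably reads them from settings), and the fallback to max_job_runtime when no candidate fits is not stated in the documentation.

```python
def walltime(num_pages, avg_page_runtime, max_job_runtime):
    """Pick the first candidate walltime below max_job_runtime (seconds)."""
    for factor in (4.0, 2.0, 1.5):  # 400%, 200%, 150%
        candidate = int(avg_page_runtime * factor * num_pages)
        if candidate < max_job_runtime:
            return candidate
    # Assumption: if every candidate is too large, cap at the maximum.
    return int(max_job_runtime)
```

For example, with 10 pages averaging 60 seconds each and a 2000 second limit, the 400% candidate (2400) is rejected and the 200% candidate (1200) is used.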
emop.lib.emop_settings module¶
- class emop.lib.emop_settings.EmopSettings(config_path)[source]¶
Bases: object
- get_bool_value(section, option, default=None)[source]¶
Get settings bool value
This function is a wrapper for RawConfigParser.getboolean() that handles missing values by substituting defaults set in a dict in the global space of EmopSettings.
- Args:
- section (str): INI file section
- option (str): INI file option name
- default (str): Default value if one is not found. Defaults to None.
- Returns:
- bool: The config value
- get_value(section, option, default=None)[source]¶
Get settings value
This function is a wrapper for ConfigParser.get() that handles missing values by substituting defaults set in a dict in the global space of EmopSettings.
Interpolation is performed on specific items found in %() within the INI file.
- home - HOME environment variable
- emop_home - The emop_home setting value
- Args:
- section (str): INI file section
- option (str): INI file option name
- default (str): Default value if one is not found. Defaults to None.
- Returns:
- str: The config value
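The combination of defaults and %() interpolation might be sketched as below, using Python 3's configparser module (the documentation refers to the Python 2 class names). The DEFAULTS contents, the section and option names, and the emop_home parameter are hypothetical; only the two interpolation keys, home and emop_home, come from the description.

```python
import configparser
import os

# Hypothetical module-level defaults, keyed by section then option.
DEFAULTS = {"dirs": {"output_path_prefix": "/tmp"}}


def get_value(parser, section, option, default=None, emop_home="/opt/emop"):
    """Read an option, interpolating %(home)s and %(emop_home)s."""
    interpolation_vars = {
        "home": os.environ.get("HOME", ""),
        "emop_home": emop_home,
    }
    try:
        return parser.get(section, option, vars=interpolation_vars)
    except (configparser.NoSectionError, configparser.NoOptionError):
        # Missing value: fall back to the defaults dict, then to default.
        return DEFAULTS.get(section, {}).get(option, default)
```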
emop.lib.emop_stdlib module¶
emop.lib.utilities module¶
- emop.lib.utilities.exec_cmd(cmd, log_level='info', timeout=-1, realtime=False)[source]¶
Executes a command
This is the method used by this application to execute shell commands.
The cmd argument can be a list that contains nested lists, but only one level deep.
The command's stdout, stderr and exitcode are returned as a namedtuple.
- Args:
- cmd (str or list): Command to execute
- log_level (str, optional): Log level when printing information about the command being executed.
- timeout (int, optional): The time in seconds the command should be allowed to run before timing out.
- Returns:
- tuple: (stdout, stderr, exitcode)
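A reduced sketch of this pattern, omitting the log_level and realtime arguments. Several details are assumptions rather than the package's behaviour: string commands are split with shlex instead of being passed to a shell, a non-positive timeout is treated as "no timeout", and the names CmdResult and flatten_once are hypothetical.

```python
import collections
import shlex
import subprocess
import sys

CmdResult = collections.namedtuple("CmdResult", ["stdout", "stderr", "exitcode"])


def flatten_once(cmd):
    """Flatten a list that may contain lists, one level deep only."""
    flat = []
    for item in cmd:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


def exec_cmd(cmd, timeout=-1):
    """Run cmd and return its stdout, stderr and exit code."""
    if isinstance(cmd, (list, tuple)):
        args = [str(a) for a in flatten_once(cmd)]
    else:
        args = shlex.split(cmd)
    # Assumption: a timeout of zero or less means no time limit.
    limit = timeout if timeout and timeout > 0 else None
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=limit,
    )
    return CmdResult(proc.stdout, proc.stderr, proc.returncode)


if __name__ == "__main__":
    result = exec_cmd([sys.executable, ["-c", "print('hi')"]])
    print(result.exitcode, result.stdout.strip())
```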
- emop.lib.utilities.get_temp_dir()[source]¶
Gets the temp directory based on environment variables
- Currently searched environment variables:
- TMPDIR
- Returns:
- str: Path to temp directory.
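The lookup could be written along these lines. The fallback to tempfile.gettempdir() when TMPDIR is unset and the optional environ parameter are assumptions added for this sketch; the documentation only names TMPDIR as the searched variable.

```python
import os
import tempfile


def get_temp_dir(environ=None):
    """Return the first temp directory found in the searched variables."""
    environ = os.environ if environ is None else environ
    for var in ("TMPDIR",):
        value = environ.get(var)
        if value:
            return value
    # Assumption: fall back to the platform default temp directory.
    return tempfile.gettempdir()
```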
- emop.lib.utilities.mkdirs_exists_ok(path)[source]¶
Wrapper for os.makedirs
This function is needed to avoid a race condition in which the directory already exists by the time os.makedirs is called. It emulates the behavior of the exist_ok argument of os.makedirs in Python 3.x.
- Args:
- path (str): Path of directory to create
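One common way to write such a wrapper is shown below as a sketch; the package's own implementation may differ. The call to os.makedirs is attempted unconditionally, and an "already exists" error is ignored only when the path is in fact a directory.

```python
import errno
import os


def mkdirs_exists_ok(path):
    """Create path and any parents; do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Re-raise anything other than "directory already exists".
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

Attempting the creation and handling the error avoids the gap between a separate existence check and the call, which is where another process could create the directory first.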