watchme.watchers package

Submodules

watchme.watchers.data module

Copyright (C) 2019 Vanessa Sochat.

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

watchme.watchers.data.export_dict(self, task, filename, name=None, export_json=False, from_commit=None, to_commit=None, base=None)[source]

Export a data frame of changes for a filename over time.

Parameters
  • task (the task folder for the watcher to look in)

  • name (the name of the watcher, defaults to the client’s)

  • base (the base of watchme to look for the task folder)

  • from_commit (the commit to start at)

  • to_commit (the commit to go to)

  • grep (the expression to match (not used if None))

  • filename (the filename to filter to. Includes all files if not specified.)

watchme.watchers.schedule module

watchme.watchers.schedule.clear_schedule(self)[source]

clear all cron jobs associated with the watcher. To remove jobs associated with a single watcher, use remove_schedule.

watchme.watchers.schedule.get_crontab(self)[source]

get an instance of the user’s crontab. We use the running user.

watchme.watchers.schedule.get_job(self, must_exist=True)[source]

return the watcher's cron job to the user, or None

watchme.watchers.schedule.has_schedule(self, must_exist=False)[source]

determine if a watcher already has a schedule, as a warning to the user.

watchme.watchers.schedule.remove_schedule(self, name=None, quiet=False)[source]

remove a scheduled item from the crontab, based on the watcher name. By default, we use the watcher instance name; however, you can specify a custom name if desired.

watchme.watchers.schedule.schedule(self, minute=12, hour=0, month='*', day='*', weekday='*', job=None, force=False)[source]

schedule the watcher to run at some frequency to update the record of pages. By default, the task will run at 12 minutes past midnight, daily. Change the parameters below to adjust the frequency. See https://crontab.guru/ to get a setting that works for you.

Common settings, in crontab field order (minute hour day month weekday):
  • Hourly: 0 * * * *

  • Daily: 0 0 * * * (midnight)

  • Weekly: 0 0 * * 0

  • Monthly: 0 0 1 * *

  • Yearly: 0 0 1 1 *

Parameters
  • minute (must be between 0 and 59, or set to * for every minute)

  • hour (must be between 0 and 23, or set to *)

  • month (must be between 1 and 12, or *)

  • day (must be between 1 and 31, or *)

  • weekday (must be between 0 and 6, or *)

  • job (if provided, assumes we are updating an existing entry.)
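The schedule parameters map onto the five fields of a crontab line. Below is a minimal sketch of that mapping, assuming standard cron ranges (minute 0 to 59, hour 0 to 23, day of month 1 to 31, month 1 to 12, weekday 0 to 6); cron_line is a hypothetical helper, not part of watchme. Note that crontab order puts day of month before month, while the function signature lists month first.

```python
def cron_line(minute=12, hour=0, month="*", day="*", weekday="*"):
    """Hypothetical helper: build a five-field cron expression.

    Fields are emitted in crontab order (minute, hour, day of month,
    month, day of week), even though the arguments list month before day.
    Each field is either "*" or an integer within the standard cron range.
    """
    fields = [
        ("minute", minute, 0, 59),
        ("hour", hour, 0, 23),
        ("day", day, 1, 31),
        ("month", month, 1, 12),
        ("weekday", weekday, 0, 6),
    ]
    parts = []
    for name, value, low, high in fields:
        if value == "*":
            parts.append("*")
            continue
        number = int(value)
        if not low <= number <= high:
            raise ValueError(f"{name} must be between {low} and {high}, or *")
        parts.append(str(number))
    return " ".join(parts)


print(cron_line())                             # 12 0 * * *  (the default)
print(cron_line(minute=0, hour="*"))           # 0 * * * *  (hourly)
print(cron_line(minute=0, hour=0, weekday=0))  # 0 0 * * 0  (weekly)
```

Validating each field before joining means an out-of-range value fails loudly instead of producing a crontab entry that cron would reject.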

watchme.watchers.schedule.update_schedule(self, minute=12, hour='*', month='*', day='*')[source]

update a scheduled item in the crontab with a new entry. This first looks for the entry (and removes it) and then calls the schedule function to write a new one. This function is intended to be used by a client from within Python, and isn’t exposed from the command line.
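The remove-then-write pattern described for update_schedule can be sketched on plain crontab text. This assumes, hypothetically, that each watcher owns a single crontab line tagged with a trailing "# <watcher name>" comment; watchme itself goes through the user's crontab (see get_crontab), and replace_job and comment_of are illustrative helpers only.

```python
def comment_of(line):
    """Return the trailing "# comment" of a crontab line, or None."""
    head, sep, tail = line.rpartition("#")
    return tail.strip() if sep else None


def replace_job(lines, name, schedule, command):
    """Hypothetical sketch: drop the job tagged `name`, then append a new one.

    Assumes each watcher's job is a single crontab line whose trailing
    comment is the watcher name (e.g. "... # my-watcher").
    """
    kept = [line for line in lines if comment_of(line) != name]
    kept.append(f"{schedule} {command} # {name}")
    return kept


crontab = [
    "0 * * * * /usr/local/bin/backup.sh",
    "12 0 * * * echo old-entry # my-watcher",
]
for line in replace_job(crontab, "my-watcher", "0 0 * * 0", "echo new-entry"):
    print(line)
```

Matching on the exact comment text (rather than a substring) keeps jobs for similarly named watchers, such as "my-watcher-2", untouched.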

watchme.watchers.settings module

watchme.watchers.settings.get_section(self, name)[source]

get a section from the config, if it exists

watchme.watchers.settings.get_setting(self, section, name, default=None)[source]

return a setting from the config, if defined. Otherwise return default (None or set by user)

Parameters
  • section (the section in the config, defaults to self.name)

  • name (the key (index) of the setting to look up)

  • default ((optional) if not found, return default instead.)

watchme.watchers.settings.has_section(self, section)[source]

return a boolean indicating whether the config has a section (e.g., a task or exporter)

Parameters

section (the section in the config)

watchme.watchers.settings.has_setting(self, section, name)[source]

return a boolean indicating whether the config has a setting

Parameters
  • section (the section in the config, defaults to self.name)

  • name (the key (index) of the setting to look up)
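The read-side settings helpers behave like lookups in an INI-style file. A minimal sketch, assuming the watcher configuration can be treated as a standard-library configparser.ConfigParser; the free functions here are hypothetical stand-ins for the methods, taking the config explicitly instead of self.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[watcher]
active = true

[task-reddit-hpc]
url = https://www.reddit.com/r/hpc
active = true
type = urls
""")


def has_section(config, section):
    # True if the section (e.g., a task) exists
    return config.has_section(section)


def has_setting(config, section, name):
    # has_option returns False (rather than raising) for a missing section
    return config.has_option(section, name)


def get_setting(config, section, name, default=None):
    # Return the value if both section and key exist, otherwise the default
    if config.has_option(section, name):
        return config.get(section, name)
    return default


print(get_setting(config, "task-reddit-hpc", "type"))            # urls
print(get_setting(config, "task-reddit-hpc", "missing", "n/a"))  # n/a
print(has_setting(config, "task-nope", "url"))                   # False
```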

watchme.watchers.settings.print_add_task(self, task)[source]

assemble a task section into a command that can create/add it.

Parameters

task (the name of the task to inspect)

watchme.watchers.settings.print_section(self, section)[source]

print a section (usually a task) from a configuration file, if it exists.

Parameters

section (the name of the section (task))

watchme.watchers.settings.remove_section(self, section, save=True)[source]

remove a section from the configuration file

Parameters
  • section (the name of the section (task))

  • save (save the configuration file (default is True))

watchme.watchers.settings.remove_setting(self, section, name, save=True)[source]

remove a setting from the configuration file

Parameters
  • section (the name of the section (task))

  • name (the name of the variable to remove)

  • save (save the configuration file (default is True))

watchme.watchers.settings.set_setting(self, section, key, value)[source]

set a key-value pair in a section, if the section exists. Returns a boolean (True or False) to indicate if the value was added.

Parameters
  • section (the section in the config, defaults to self.name)

  • key (the key (index) of the setting to set)

  • value (the value to set.)
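The write-side helpers (set_setting, remove_setting, remove_section) can be sketched the same way, again assuming a configparser.ConfigParser. The save() stand-in serializes to a string instead of writing the watcher's file, and all function names here are hypothetical.

```python
import configparser
import io

config = configparser.ConfigParser()
config.read_string("""
[task-reddit-hpc]
url = https://www.reddit.com/r/hpc
active = true
""")


def set_setting(config, section, key, value):
    # Only set the value if the section already exists; report success
    if not config.has_section(section):
        return False
    config.set(section, key, str(value))
    return True


def remove_setting(config, section, name):
    # remove_option returns True if the option existed (and raises on a
    # missing section, hence the has_section guard)
    return config.has_section(section) and config.remove_option(section, name)


def remove_section(config, section):
    # remove_section returns True if the section existed
    return config.remove_section(section)


def save(config):
    # Stand-in for writing the file: serialize to a string instead
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(set_setting(config, "task-reddit-hpc", "active", "false"))  # True
print(set_setting(config, "task-missing", "active", "false"))     # False
print(remove_setting(config, "task-reddit-hpc", "url"))           # True
print(save(config))
```

Returning a boolean from set_setting lets a caller distinguish a missing section from success without catching configparser.NoSectionError.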

Module contents

class watchme.watchers.Watcher(name=None, base=None, create=False, **kwargs)[source]

Bases: object

activate(task=None)[source]

set the active status of a watcher to True

add_task(task, task_type, params, force=False, active='true')[source]

add a task, meaning ensuring that the type is valid, and that the parameters are valid for the task.

Parameters
  • task (the Task object to add; it should have a name and params and be a child of watchme.tasks.TaskBase)

  • task_type (must be in WATCHME_TASK_TYPES, meaning a client exists)

  • params (list of parameters to be validated (key@value))

  • force (if task already exists, overwrite)

  • active (add the task as active (default “true”))
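Parameters given as key@value strings need to be split before they can be validated. A minimal sketch, assuming only the first @ separates key from value so that values such as e-mail addresses or URLs may contain @ themselves; parse_params and the "email" key are hypothetical, and watchme's own validation may differ.

```python
def parse_params(params):
    """Hypothetical helper: turn ["key@value", ...] into {"key": "value", ...}."""
    parsed = {}
    for param in params:
        key, sep, value = param.partition("@")  # split on the first @ only
        if not sep or not key:
            raise ValueError(f"{param!r} is not in key@value format")
        parsed[key] = value
    return parsed


print(parse_params(["url@https://www.reddit.com/r/hpc", "email@me@example.com"]))
# {'url': 'https://www.reddit.com/r/hpc', 'email': 'me@example.com'}
```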

clear_schedule()

clear all cron jobs associated with the watcher. To remove jobs associated with a single watcher, use remove_schedule.

configfile = None
deactivate(task=None)[source]

set the active status of a watcher to false. If a task is provided, update the config value for the task to be false.

delete()[source]

delete the entire watcher, only if not protected. Cannot be undone.

edit_task(name, action, key, value=None)[source]

edit a task, meaning an addition (add), update (update), or removal (remove) of a parameter. All actions other than remove require a value.

Parameters
  • name (the name of the task to update)

  • action (the action to take (update, add, remove) a parameter)

  • key (the key to update)

  • value (the value to update)
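A minimal sketch of the three edit_task actions on a plain dictionary standing in for one task section. The rule that only remove may omit a value comes from the description above; the checks that add refuses to overwrite an existing key and update refuses to create a new one are assumptions of this sketch, not documented watchme behavior. The key "example_key" is hypothetical.

```python
def edit_task(params, action, key, value=None):
    """Hypothetical sketch of add/update/remove on a task's parameters.

    `params` is a plain dict standing in for one task section of the config.
    """
    if action not in ("add", "update", "remove"):
        raise ValueError("action must be add, update, or remove")
    if action != "remove" and value is None:
        raise ValueError(f"action {action!r} requires a value")
    if action == "add":
        if key in params:  # assumption: add does not overwrite
            raise KeyError(f"{key} already exists; use update")
        params[key] = value
    elif action == "update":
        if key not in params:  # assumption: update does not create
            raise KeyError(f"{key} does not exist; use add")
        params[key] = value
    else:
        params.pop(key, None)  # removing a missing key is a no-op here
    return params


task = {"url": "https://www.reddit.com/r/hpc", "active": "true"}
edit_task(task, "update", "active", "false")
edit_task(task, "add", "example_key", "example_value")
edit_task(task, "remove", "example_key")
print(task)  # {'url': 'https://www.reddit.com/r/hpc', 'active': 'false'}
```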

export_dict(task, filename, name=None, export_json=False, from_commit=None, to_commit=None, base=None)

Export a data frame of changes for a filename over time.

Parameters
  • task (the task folder for the watcher to look in)

  • name (the name of the watcher, defaults to the client’s)

  • base (the base of watchme to look for the task folder)

  • from_commit (the commit to start at)

  • to_commit (the commit to go to)

  • grep (the expression to match (not used if None))

  • filename (the filename to filter to. Includes all files if not specified.)

finish_runs(results)[source]

finish_runs takes a dictionary of results, with keys as the folder name, and for each, depending on the result type, writes the result to file (or updates the file) and then commits to git.

Parameters

results (a dictionary of tasks, with keys as the task name and values as the result)
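The "depending on the result type, write the result to file" step can be sketched as below, assuming (hypothetically) that dictionaries and lists are stored as JSON and everything else as text, in per-task folders under a base directory. The file names result.json and result.txt are invented for illustration, and the git commit that finish_runs performs afterwards is omitted.

```python
import json
import os
import tempfile


def finish_runs(results, base):
    """Hypothetical sketch: write each task's result into a folder named after the task.

    Dictionaries and lists are saved as JSON; anything else is saved as text.
    Committing the written files to git is not shown.
    """
    written = []
    for task_name, result in results.items():
        if result is None:  # nothing to record for this task
            continue
        folder = os.path.join(base, task_name)
        os.makedirs(folder, exist_ok=True)
        if isinstance(result, (dict, list)):
            path = os.path.join(folder, "result.json")
            with open(path, "w") as handle:
                json.dump(result, handle, indent=2)
        else:
            path = os.path.join(folder, "result.txt")
            with open(path, "w") as handle:
                handle.write(str(result))
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as base:
    paths = finish_runs({"task-a": {"status": 200}, "task-b": "<html></html>", "task-c": None}, base)
    print(sorted(os.path.relpath(p, base) for p in paths))
```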

freeze()[source]

freeze a watcher, meaning that it along with its tasks cannot be deleted. This does not prevent the user from manual editing.

get_crontab()

get an instance of the user’s crontab. We use the running user.

get_decorator(name)[source]

instantiate a task object for a decorator. Decorators must start with “decorator-” and since they are run on the fly, we don’t find them in the config.

Parameters

name (the name of the task to load)

get_job(must_exist=True)

return the watcher's cron job to the user, or None

get_section(name)

get a section from the config, if it exists

get_setting(section, name, default=None)

return a setting from the config, if defined. Otherwise return default (None or set by user)

Parameters
  • section (the section in the config, defaults to self.name)

  • name (the key (index) of the setting to look up)

  • default ((optional) if not found, return default instead.)

get_task(name)[source]

get a particular task, based on the name. This is where each type of class should check the “type” parameter from the config, and import the correct Task class.

Parameters

name (the name of the task to load)

get_tasks(regexp=None, quiet=False, active=True)[source]

get the tasks for a watcher, possibly matching a regular expression. A list of dictionaries is returned, each holding the parameters for a task. “uri” will hold the task (folder) name, active

Parameters
  • regexp (if supplied, the user wants to run only tasks that match a particular pattern)

  • quiet (If quiet, don’t print the number of tasks found)

  • active (only return active tasks (default True))
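Task selection by name pattern and active status can be sketched as follows. This assumes, based on the task-reddit-hpc naming in the run_tasks Examples, that task sections are named with a task- prefix, and that a task without an active setting counts as active; get_tasks here is a hypothetical free function operating on a plain dictionary of sections, and it copies the section name into a "uri" key as described above.

```python
import re


def get_tasks(sections, regexp=None, active=True):
    """Hypothetical sketch: select task sections by name pattern and active flag.

    `sections` maps section names to their settings, as a config file would.
    Only sections whose names start with "task-" are treated as tasks.
    """
    tasks = []
    for name, params in sections.items():
        if not name.startswith("task-"):
            continue  # e.g., the watcher's own section
        if regexp is not None and not re.search(regexp, name):
            continue
        if active and params.get("active", "true") != "true":
            continue
        tasks.append({"uri": name, **params})
    return tasks


sections = {
    "watcher": {"active": "true"},
    "task-reddit-hpc": {"url": "https://www.reddit.com/r/hpc", "active": "true", "type": "urls"},
    "task-reddit-linux": {"url": "https://www.reddit.com/r/linux", "active": "false", "type": "urls"},
}
print([t["uri"] for t in get_tasks(sections)])                                # ['task-reddit-hpc']
print([t["uri"] for t in get_tasks(sections, regexp="linux", active=False)])  # ['task-reddit-linux']
```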

has_schedule(must_exist=False)

determine if a watcher already has a schedule, as a warning to the user.

has_section(section)

return a boolean indicating whether the config has a section (e.g., a task or exporter)

Parameters

section (the section in the config)

has_setting(section, name)

return a boolean indicating whether the config has a setting

Parameters
  • section (the section in the config, defaults to self.name)

  • name (the key (index) of the setting to look up)

has_task(name)[source]

returns True or False to indicate if the watcher has a specified task.

inspect(tasks=None, create_command=False)[source]

inspect a watcher, or one or more tasks belonging to it. This means printing the configuration for the entire watcher (if tasks is None) or just for one or more tasks.

Parameters
  • tasks (one or more tasks to inspect (None will show entire file))

  • create_command (if True, given one or more tasks, print the command to create them)

is_active(task=None)[source]

determine if the watcher is active by reading from the config directly. If a task name is provided, check the active status of the task instead.

is_frozen()[source]

return a boolean to indicate if the watcher is frozen. A protected watcher cannot be deleted, though its tasks can be; a frozen watcher cannot be changed at all.

is_protected()[source]

return a boolean to indicate if the watcher is protected or frozen. A protected watcher cannot be deleted, though its tasks can be; a frozen watcher cannot be changed at all.
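The protected and frozen rules in is_frozen and is_protected amount to a small truth table. WatcherStatus is a hypothetical stand-in that encodes only what the two descriptions state: a protected watcher cannot be deleted but its tasks can be removed, a frozen watcher allows neither, and is_protected is true for either flag.

```python
from dataclasses import dataclass


@dataclass
class WatcherStatus:
    """Hypothetical stand-in for a watcher's protected and frozen flags."""

    protected: bool = False
    frozen: bool = False

    def is_protected(self):
        # Mirrors the description: protected OR frozen counts as protected
        return self.protected or self.frozen

    def can_delete_watcher(self):
        # Neither a protected nor a frozen watcher may be deleted
        return not self.is_protected()

    def can_remove_task(self):
        # Protection still allows removing tasks; freezing does not
        return not self.frozen


for status in (WatcherStatus(), WatcherStatus(protected=True), WatcherStatus(frozen=True)):
    print(status, status.can_delete_watcher(), status.can_remove_task())
```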

list(quiet=False)[source]

list the watchers. If quiet is True, don’t print to the screen.

load_config()[source]

load a configuration file and set the active setting for the watcher. If the file doesn’t exist, the function will exit and prompt the user to create the watcher first. If the watcher section isn’t yet defined, it will be written with a default active status of false.

print_add_task(task)

assemble a task section into a command that can create/add it.

Parameters

task (the name of the task to inspect)

print_section(section)

print a section (usually a task) from a configuration file, if it exists.

Parameters

section (the name of the section (task))

protect(status='on')[source]

protect a watcher, meaning that it cannot be deleted. This does not influence removing a task. To freeze the entire watcher, use the freeze() function.

remove_schedule(name=None, quiet=False)

remove a scheduled item from the crontab, based on the watcher name. By default, we use the watcher instance name; however, you can specify a custom name if desired.

remove_section(section, save=True)

remove a section from the configuration file

Parameters
  • section (the name of the section (task))

  • save (save the configuration file (default is True))

remove_setting(section, name, save=True)

remove a setting from the configuration file

Parameters
  • section (the name of the section (task))

  • name (the name of the variable to remove)

  • save (save the configuration file (default is True))

remove_task(task)[source]

remove a task from the watcher repo, if it exists, and the watcher is not frozen.

Parameters

task (the name of the task to remove)

repo = None
run(regexp=None, parallel=True, test=False, show_progress=True)[source]

run the watcher, which should be done via the crontab, including:

  • checks: the instantiation of the client already ensures that the watcher folder exists, has a configuration, and loads.

  • parse: parse the tasks to be run

  • start: run the tasks that are defined for the watcher.

  • finish: after completion, commit changed files to the repository

Parameters
  • regexp (if supplied, the user wants to run only tasks that match a particular pattern)

  • parallel (if True, use multiprocessing to run tasks (default True); each watcher should have this set up and ready to go)

  • test (run in test mode (no saving of results))

  • show_progress (if True, show a progress bar instead of task information (default True))

run_tasks(queue, parallel=True, show_progress=True)[source]

this run_tasks function takes a list of Task objects, each potentially a different kind of task, and extracts the parameters with task.export_params() and the running function with task.export_func(), then hands these over to the multiprocessing worker. It’s up to the Task to return the correct function from its set of task functions that corresponds with the variables.

Examples

funcs {'task-reddit-hpc': <function watchme.watchers.urls.tasks.get_task>}

tasks {'task-reddit-hpc': [('url', 'https://www.reddit.com/r/hpc'), ('active', 'true'), ('type', 'urls')]}
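The Examples show two dictionaries keyed by task name: one mapping to the function to run, one mapping to that task's list of (key, value) parameters. A minimal sketch of dispatching them, assuming each function accepts the parameters as keyword arguments. It uses concurrent.futures.ThreadPoolExecutor in place of the multiprocessing worker described above, so the sketch stays self-contained; fake_get_task is a hypothetical stand-in for a real task function and performs no network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_get_task(url, **kwargs):
    # Hypothetical task function: a real one would fetch `url`
    return {"url": url, "length": len(url)}


def run_tasks(funcs, tasks, parallel=True):
    """Run funcs[name](**params) for every task; return {name: result}."""
    # Convert each task's list of (key, value) pairs into keyword arguments
    kwargs = {name: dict(pairs) for name, pairs in tasks.items()}
    if not parallel:
        return {name: funcs[name](**kwargs[name]) for name in tasks}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(funcs[name], **kwargs[name]) for name in tasks}
        return {name: future.result() for name, future in futures.items()}


funcs = {"task-reddit-hpc": fake_get_task}
tasks = {"task-reddit-hpc": [("url", "https://www.reddit.com/r/hpc"), ("active", "true"), ("type", "urls")]}
print(run_tasks(funcs, tasks))
```

Keying both the submitted futures and the returned results by task name keeps the output in the same shape that finish_runs expects as input.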

save()[source]

save the configuration to file.

schedule(minute=12, hour=0, month='*', day='*', weekday='*', job=None, force=False)

schedule the watcher to run at some frequency to update the record of pages. By default, the task will run at 12 minutes past midnight, daily. Change the parameters below to adjust the frequency. See https://crontab.guru/ to get a setting that works for you.

Common settings, in crontab field order (minute hour day month weekday):
  • Hourly: 0 * * * *

  • Daily: 0 0 * * * (midnight)

  • Weekly: 0 0 * * 0

  • Monthly: 0 0 1 * *

  • Yearly: 0 0 1 1 *

Parameters
  • minute (must be between 0 and 59, or set to * for every minute)

  • hour (must be between 0 and 23, or set to *)

  • month (must be between 1 and 12, or *)

  • day (must be between 1 and 31, or *)

  • weekday (must be between 0 and 6, or *)

  • job (if provided, assumes we are updating an existing entry.)

set_setting(section, key, value)

set a key-value pair in a section, if the section exists. Returns a boolean (True or False) to indicate if the value was added.

Parameters
  • section (the section in the config, defaults to self.name)

  • key (the key (index) of the setting to set)

  • value (the value to set.)

unfreeze()[source]

unfreeze a watcher, reversing freeze() so that it, along with its tasks, is no longer frozen.

update_schedule(minute=12, hour='*', month='*', day='*')

update a scheduled item in the crontab with a new entry. This first looks for the entry (and removes it) and then calls the schedule function to write a new one. This function is intended to be used by a client from within Python, and isn’t exposed from the command line.