lookout.core.lib

Various utilities for analyzers to work with UASTs and plain texts.

Module Contents

lookout.core.lib.find_new_lines(before:str, after:str)

Return the new line numbers from the pair of “before” and “after” file contents.

Parameters:
  • before – The previous contents of the file.
  • after – The new contents of the file.
Returns:

List of line numbers new to after.
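
To illustrate the idea, here is a minimal sketch built on the standard difflib module. It is not the library's implementation: the name new_lines_sketch is hypothetical, and the 1-based numbering is an assumption.

```python
from difflib import SequenceMatcher
from typing import List


def new_lines_sketch(before: str, after: str) -> List[int]:
    """Return 1-based numbers of the lines in `after` that are absent from `before`."""
    before_lines = before.splitlines()
    after_lines = after.splitlines()
    matcher = SequenceMatcher(a=before_lines, b=after_lines, autojunk=False)
    result = []
    for tag, _, _, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" blocks cover lines that exist only in `after`.
        if tag in ("insert", "replace"):
            result.extend(range(j1 + 1, j2 + 1))
    return result


before = "a\nb\nc\n"
after = "a\nx\nb\nc\ny\n"
print(new_lines_sketch(before, after))  # [2, 5]
```

Lines that were modified in place show up as "replace" blocks, so they are reported as new alongside pure insertions.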

lookout.core.lib.find_deleted_lines(before:str, after:str)

Return line numbers next to deleted lines in the new file content.

Parameters:
  • before – The previous contents of the file.
  • after – The new contents of the file.
Returns:

List of line numbers next to deleted lines.
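
Deleted lines no longer exist in the new content, so they can only be located through their surviving neighbors. The sketch below shows one way to do that with difflib. The name deleted_neighbors_sketch is hypothetical; the 1-based numbering, the choice to report both neighbors, and the deduplication are assumptions rather than documented behavior.

```python
from difflib import SequenceMatcher
from typing import List


def deleted_neighbors_sketch(before: str, after: str) -> List[int]:
    """Return 1-based numbers of the lines in `after` that border a deletion."""
    before_lines = before.splitlines()
    after_lines = after.splitlines()
    matcher = SequenceMatcher(a=before_lines, b=after_lines, autojunk=False)
    result = set()
    for tag, _, _, j1, _ in matcher.get_opcodes():
        if tag != "delete":
            continue
        # A pure deletion leaves a gap between lines j1 and j1 + 1 of `after`.
        if j1 > 0:
            result.add(j1)  # the line just above the gap
        if j1 < len(after_lines):
            result.add(j1 + 1)  # the line just below the gap
    return sorted(result)


print(deleted_neighbors_sketch("a\nb\nc\n", "a\nc\n"))  # [1, 2]
```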

lookout.core.lib.extract_changed_nodes(root:Node, lines:Sequence[int])

Collect the list of UAST nodes which lie on the changed lines.

Parameters:
  • root – UAST root node.
  • lines – Changed lines, typically obtained via find_new_lines(). Empty list means all the lines.
Returns:

List of UAST nodes which are suspected to have been changed.
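
The traversal can be pictured with a simplified tree. In the sketch below, SketchNode is a hypothetical stand-in for a UAST node, and treating a node as lying on a line when it starts there is an assumption; the real function works on Babelfish nodes and their position data.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class SketchNode:
    """Hypothetical stand-in for a UAST node with a single start line."""
    name: str
    line: int
    children: List["SketchNode"] = field(default_factory=list)


def changed_nodes_sketch(root: SketchNode, lines: Sequence[int]) -> List[SketchNode]:
    """Collect the nodes that start on one of the given lines."""
    wanted = set(lines)
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        # An empty `lines` selects every node, mirroring the documented behavior.
        if not wanted or node.line in wanted:
            result.append(node)
    return result


tree = SketchNode("root", 1, [
    SketchNode("f", 2, [SketchNode("call", 3)]),
    SketchNode("g", 5),
])
print(sorted(node.name for node in changed_nodes_sketch(tree, [3, 5])))  # ['call', 'g']
```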

lookout.core.lib.files_by_language(files:Iterable[File])

Sort files by programming language and path.

Parameters:
  • files – Iterable of File-s.
Returns:

Dictionary with languages as keys and files mapped to paths as values.

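
The shape of the returned dictionary is easier to see in code. This sketch uses a hypothetical SketchFile tuple with only path and language fields in place of the real File type, and does not claim to reproduce any normalization the library may apply to language names.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class SketchFile(NamedTuple):
    """Hypothetical stand-in for File with only the fields this sketch needs."""
    path: str
    language: str


def group_by_language_sketch(
        files: Iterable[SketchFile]) -> Dict[str, Dict[str, SketchFile]]:
    """Build {language: {path: file}} from a flat iterable of files."""
    grouped = defaultdict(dict)
    for file in files:
        grouped[file.language][file.path] = file
    return dict(grouped)


files = [
    SketchFile("src/app.js", "javascript"),
    SketchFile("src/util.js", "javascript"),
    SketchFile("tools/build.py", "python"),
]
grouped = group_by_language_sketch(files)
print(sorted(grouped))                # ['javascript', 'python']
print(sorted(grouped["javascript"]))  # ['src/app.js', 'src/util.js']
```
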
lookout.core.lib.filter_files_by_path(filepaths:Iterable[str], exclude_pattern:Optional[str]=None)

Filter out files by specific patterns in their path.

Parameters:
  • filepaths – Iterable of file paths to examine.
  • exclude_pattern – Regular expression to search in file paths. The matched files are excluded from the result. If it is None, we use the “garbage” pattern defined in lookout.core.langs. If it is an empty string, filtering is disabled.
Returns:

List of paths, filtered.
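
The three cases for exclude_pattern (None, empty string, custom expression) can be sketched as follows. SKETCH_GARBAGE_PATTERN is a made-up placeholder, not the pattern shipped in lookout.core.langs, and filter_paths_sketch is a hypothetical name.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical default; the real "garbage" pattern lives in lookout.core.langs.
SKETCH_GARBAGE_PATTERN = r"(^|/)(vendor|node_modules)/|\.min\.js$"


def filter_paths_sketch(filepaths: Iterable[str],
                        exclude_pattern: Optional[str] = None) -> List[str]:
    """Drop every path in which the exclusion pattern is found."""
    if exclude_pattern is None:
        exclude_pattern = SKETCH_GARBAGE_PATTERN
    if exclude_pattern == "":
        # An empty pattern disables filtering altogether.
        return list(filepaths)
    regex = re.compile(exclude_pattern)
    # search() matches anywhere in the path, not only at its start.
    return [path for path in filepaths if not regex.search(path)]


paths = ["src/app.js", "node_modules/lib/index.js", "dist/app.min.js"]
print(filter_paths_sketch(paths))            # ['src/app.js']
print(filter_paths_sketch(paths, ""))        # all three paths
print(filter_paths_sketch(paths, r"^src/"))  # the two paths outside src/
```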

lookout.core.lib.filter_files_by_line_length(filepaths:Iterable[str], content_getter:callable, line_length_limit:int)

Filter out files that have lines longer than line_length_limit.

Parameters:
  • filepaths – Paths to the files to filter.
  • content_getter – Function which returns the file byte content by its path.
  • line_length_limit – Maximum line length to accept a file. We measure the length in bytes, not in Unicode characters.
Returns:

Files passed through the maximum line length filter.
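
Measuring in bytes matters for non-ASCII text, as this minimal sketch shows. The function name is hypothetical, and keeping lines that sit exactly at the limit is an assumption drawn from the phrase "longer than".

```python
from typing import Callable, Iterable, List


def filter_by_line_length_sketch(filepaths: Iterable[str],
                                 content_getter: Callable[[str], bytes],
                                 line_length_limit: int) -> List[str]:
    """Keep the files in which no line exceeds the limit, measured in bytes."""
    passed = []
    for path in filepaths:
        content = content_getter(path)
        # Lengths are taken on raw bytes, so a multi-byte character counts more than once.
        if all(len(line) <= line_length_limit for line in content.splitlines()):
            passed.append(path)
    return passed


contents = {
    "ascii.txt": "abcd\nef\n".encode("utf-8"),
    "accents.txt": "éééé\n".encode("utf-8"),  # 4 characters, 8 bytes
}
print(filter_by_line_length_sketch(contents, contents.__getitem__, 4))  # ['ascii.txt']
```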

lookout.core.lib.filter_files_by_overall_size(filepaths:Iterable[str], content_getter:callable, overall_size_limit:int, random_state:int=7)

Filter out files once the overall passed size is greater than the specified limit.

The files are randomly shuffled before filtering.

Parameters:
  • filepaths – Paths to the files to filter.
  • content_getter – Function which returns the file byte content by its path.
  • overall_size_limit – Maximum cumulative file size in bytes. The files are discarded after reaching this limit.
  • random_state – Random generator state for shuffling the files.
Returns:

Files passed through the overall size filter.
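
A rough sketch of the shuffle-then-accumulate logic follows. It uses the standard random module, so the resulting order will not match the library's for the same random_state. Stopping before the file that would cross the limit is also an assumption, and the function name is hypothetical.

```python
import random
from typing import Callable, Iterable, List


def filter_by_overall_size_sketch(filepaths: Iterable[str],
                                  content_getter: Callable[[str], bytes],
                                  overall_size_limit: int,
                                  random_state: int = 7) -> List[str]:
    """Shuffle the paths, then keep them until the byte budget would be exceeded."""
    paths = list(filepaths)
    # A seeded generator makes the shuffle reproducible between runs.
    random.Random(random_state).shuffle(paths)
    passed = []
    total = 0
    for path in paths:
        size = len(content_getter(path))
        if total + size > overall_size_limit:
            break  # this file and all that follow are discarded
        total += size
        passed.append(path)
    return passed


contents = {name: b"x" * 10 for name in ("a.py", "b.py", "c.py", "d.py")}
kept = filter_by_overall_size_sketch(contents, contents.__getitem__, 25)
print(len(kept))  # 2
```

Because the shuffle happens first, which files survive depends on random_state, not on the order of filepaths.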

lookout.core.lib.parse_files(filepaths:Sequence[str], line_length_limit:int, overall_size_limit:int, client:BblfshClient, language:str, random_state:int=7, progress_tracker:Callable=lambda x: x, log:Optional[logging.Logger]=None)

Parse files with Babelfish.

If a file has lines longer than line_length_limit, it is skipped. If the summed size of the parsed files exceeds overall_size_limit, the rest of the files are skipped. File paths are filtered with filter_files_by_path(). The files are parsed in random order, which generally differs from the order of filepaths.

Parameters:
  • filepaths – File paths to filter.
  • line_length_limit – Maximum line length to accept a file.
  • overall_size_limit – Maximum cumulative file size in bytes. The files are discarded after reaching this limit.
  • client – Babelfish client instance. The Babelfish server should be running.
  • language – Language to consider. Will discard the other languages.
  • random_state – Random generator state for shuffling the files.
  • progress_tracker – Optional progress metric when iterating over the input files.
  • log – Logger to use to report the number of excluded files.
Returns:

File-s with parsed UASTs and which passed through the filters.
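
The default progress_tracker is the identity function, which suggests that the tracker is called on an iterable and must return something iterable. Under that assumption, a custom tracker could look like the sketch below; logging_progress_tracker and the logger name are hypothetical.

```python
import logging
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def logging_progress_tracker(items: Iterable[T], every: int = 2) -> Iterator[T]:
    """Yield the items unchanged and log a message after every `every` items."""
    log = logging.getLogger("progress-sketch")
    for count, item in enumerate(items, start=1):
        yield item
        if count % every == 0:
            log.info("processed %d files", count)


logging.basicConfig(level=logging.INFO)
# The tracker is transparent: the wrapped items come out in the same order.
print(list(logging_progress_tracker(["a.py", "b.py", "c.py"])))  # ['a.py', 'b.py', 'c.py']
```

Any callable of the same shape, such as a progress bar that wraps an iterable, would presumably fit as well.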

lookout.core.lib.filter_files(files:Dict[str, File], line_length_limit:int, overall_size_limit:int, random_state:int=7, log:Optional[logging.Logger]=None)

Filter files based on their maximum line length and overall size.

Parameters:
  • files – Files to filter.
  • line_length_limit – Maximum line length to accept a file.
  • overall_size_limit – Maximum cumulative file size in bytes. The files are discarded after reaching this limit.
  • random_state – Random generator state for shuffling the files.
  • log – Logger to use to report the number of excluded files.
Returns:

Files passed through the filter and the number of files which were excluded.