:mod:`lookout.core.lib`
=======================

.. py:module:: lookout.core.lib

.. autoapi-nested-parse::

   Various utilities for analyzers to work with UASTs and plain texts.


Module Contents
---------------

.. function:: find_new_lines(before:str, after:str)

   Return the new line numbers from the pair of "before" and "after" file contents.

   :param before: The previous contents of the file.
   :param after: The new contents of the file.
   :return: List of line numbers new to `after`.


.. function:: find_deleted_lines(before:str, after:str)

   Return line numbers next to deleted lines in the new file content.

   :param before: The previous contents of the file.
   :param after: The new contents of the file.
   :return: List of line numbers next to deleted lines.


.. function:: extract_changed_nodes(root:Node, lines:Sequence[int])

   Collect the list of UAST nodes which lie on the changed lines.

   :param root: UAST root node.
   :param lines: Changed lines, typically obtained via `find_new_lines()`. An empty list means all the lines.
   :return: List of UAST nodes which are suspected to have been changed.


.. function:: files_by_language(files:Iterable[File])

   Sort files by programming language and path.

   :param files: Iterable of `File`-s.
   :return: Dictionary with languages as keys and files mapped to paths as values.


.. function:: filter_files_by_path(filepaths:Iterable[str], exclude_pattern:Optional[str]=None)

   Filter out files by specific patterns in their path.

   :param filepaths: Iterable of file paths to examine.
   :param exclude_pattern: Regular expression to search in file paths. The matched files are excluded from the result. If it is None, we use the "garbage" pattern defined in lookout.core.langs. If it is an empty string, filtering is disabled.
   :return: List of paths, filtered.


.. function:: filter_files_by_line_length(filepaths:Iterable[str], content_getter:callable, line_length_limit:int)

   Filter out files that have lines longer than `line_length_limit`.

   :param filepaths: Paths to the files to filter.
   :param content_getter: Function which returns the file byte content by its path.
   :param line_length_limit: Maximum line length to accept a file. We measure the length in bytes, not in Unicode characters.
   :return: Files passed through the maximum line length filter.


.. function:: filter_files_by_overall_size(filepaths:Iterable[str], content_getter:callable, overall_size_limit:int, random_state:int=7)

   Filter out files once the overall passed size is greater than the specified limit.

   The files are randomly shuffled before filtering.

   :param filepaths: Paths to the files to filter.
   :param content_getter: Function which returns the file byte content by its path.
   :param overall_size_limit: Maximum cumulative file size in bytes. The files are discarded after reaching this limit.
   :param random_state: Random generator state for shuffling the files.
   :return: Files passed through the overall size filter.


.. function:: parse_files(filepaths:Sequence[str], line_length_limit:int, overall_size_limit:int, client:BblfshClient, language:str, random_state:int=7, progress_tracker:Callable=lambda x: x, log:Optional[logging.Logger]=None)

   Parse files with Babelfish.

   If a file has lines longer than `line_length_limit`, it is skipped. If the summed size of parsed files exceeds `overall_size_limit`, the rest of the files are skipped. File paths are filtered with `filter_files_by_path()`. The order in which the files are parsed is random, and hence different from the order of `filepaths`.

   :param filepaths: File paths to filter.
   :param line_length_limit: Maximum line length to accept a file.
   :param overall_size_limit: Maximum cumulative file size in bytes. The files are discarded after reaching this limit.
   :param client: Babelfish client instance. The Babelfish server should be running.
   :param language: Language to consider. Will discard the other languages.
   :param random_state: Random generator state for shuffling the files.
   :param progress_tracker: Optional progress metric when iterating over the input files.
   :param log: Logger to use to report the number of excluded files.
   :return: `File`-s with parsed UASTs and which passed through the filters.


.. function:: filter_files(files:Dict[str, File], line_length_limit:int, overall_size_limit:int, random_state:int=7, log:Optional[logging.Logger]=None)

   Filter files based on their maximum line length and overall size.

   :param files: files to filter.
   :param line_length_limit: maximum line length to accept a file.
   :param overall_size_limit: maximum cumulative file size in bytes. The files are discarded after reaching this limit.
   :param random_state: random generator state for shuffling the files.
   :param log: logger to use to report the number of excluded files.
   :return: files passed through the filter and the number of files which were excluded.
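
The descriptions of the line-diff helpers and of the two file filters can be illustrated with a small standalone sketch. This is not the code in `lookout.core.lib`: the `sketch_*` names are hypothetical and only the standard library (`difflib`, `random`) is used. It assumes that line numbers are 1-based, and that the size filter stops at the first file which would push the running total past the limit. Either assumption may differ from the real functions.

```python
import difflib
import random
from typing import Callable, Iterable, List


def sketch_new_lines(before: str, after: str) -> List[int]:
    """Return 1-based numbers of lines in `after` that were inserted or replaced."""
    matcher = difflib.SequenceMatcher(
        a=before.splitlines(), b=after.splitlines(), autojunk=False)
    new_lines = []
    for tag, _, _, j1, j2 in matcher.get_opcodes():
        # j1:j2 is the half-open, 0-based range of affected lines in `after`.
        if tag in ("insert", "replace"):
            new_lines.extend(range(j1 + 1, j2 + 1))
    return new_lines


def sketch_filter_by_line_length(filepaths: Iterable[str],
                                 content_getter: Callable[[str], bytes],
                                 line_length_limit: int) -> List[str]:
    """Keep the paths whose longest line, measured in bytes, fits the limit."""
    passed = []
    for path in filepaths:
        lines = content_getter(path).splitlines()
        if all(len(line) <= line_length_limit for line in lines):
            passed.append(path)
    return passed


def sketch_filter_by_overall_size(filepaths: Iterable[str],
                                  content_getter: Callable[[str], bytes],
                                  overall_size_limit: int,
                                  random_state: int = 7) -> List[str]:
    """Shuffle the paths reproducibly, then keep them until the byte budget runs out."""
    shuffled = sorted(filepaths)
    random.Random(random_state).shuffle(shuffled)
    passed, total = [], 0
    for path in shuffled:
        total += len(content_getter(path))
        if total > overall_size_limit:
            break  # assumption: this file and all the remaining ones are discarded
        passed.append(path)
    return passed


if __name__ == "__main__":
    print(sketch_new_lines("a\nb\nc\n", "a\nx\nc\nd\n"))  # prints [2, 4]
    contents = {"short.py": b"x = 1\n", "long.py": b"y = '" + b"z" * 200 + b"'\n"}
    print(sketch_filter_by_line_length(contents, contents.__getitem__, 100))
```

Passing `content_getter` as a callable, as the documented signatures do, keeps the filters independent of where the bytes come from; in the sketch a plain dictionary lookup stands in for it.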