almanack
Package API#
Base#
Module for interacting with Software Gardening Almanack book content through a Python package.
- almanack.book.read(chapter_name: str | None = None)[source]#
A function for reading almanack content through a package interface.
- Parameters:
chapter_name (Optional[str]) – A string which indicates the short-hand name of a chapter from the book. Short-hand names are lower-case title names read from src/book/_toc.yml. Defaults to None.
- Returns:
- None
The outcome of this function involves printing the content of the selected chapter from the short-hand name or showing available chapter names through an exception.
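The lookup behavior described above can be pictured with a small stand-alone sketch. The CHAPTERS mapping, the read_chapter name, and the LookupError exception type are all hypothetical stand-ins for illustration; the real package reads its names from src/book/_toc.yml, which is not reproduced here.

```python
# Hypothetical chapter titles and content (the real names come from src/book/_toc.yml).
CHAPTERS = {
    "Introduction": "Welcome to the book.",
    "Garden Lattice": "Understanding, connectedness, practicality.",
}


def read_chapter(chapter_name=None):
    """Print a chapter by its lower-case short-hand name, or raise listing valid names."""
    # Short-hand names are assumed to be the lower-cased chapter titles.
    shorthand = {title.lower(): body for title, body in CHAPTERS.items()}
    if chapter_name is None or chapter_name not in shorthand:
        # Assumption: the available names are surfaced through the exception message.
        raise LookupError(f"Available chapters: {sorted(shorthand)}")
    print(shorthand[chapter_name])
```

Calling read_chapter("introduction") prints that chapter's text, while calling it with no argument raises an exception whose message lists the available short-hand names.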
Set up the almanack CLI through python-fire.
- class almanack.cli.AlmanackCLI[source]#
Bases:
object
Almanack CLI class for Google Fire
The following CLI-based commands are available (and in alignment with the methods below based on their name):
- almanack table <repo path>: Provides a JSON data structure which includes Almanack metric data. Always returns a 0 exit code.
- almanack check <repo path>: Provides a report of failing boolean metrics that have a non-zero sustainability direction (“checks”), informing the user which checks do not pass. Returns a non-zero exit code (1) if any checks are failing, otherwise 0.
- check(repo_path: str, ignore: List[str] | None = None) None [source]#
Used through the CLI to check the table of metrics for failing boolean values that have a non-zero sustainability direction.
This enables the use of CLI commands such as: almanack check <repo path>
- Parameters:
repo_path (str) – The path to the repository to analyze.
ignore (List[str]) – A list of metric IDs to ignore when running the checks. Defaults to None.
- table(repo_path: str, ignore: List[str] | None = None) None [source]#
Used through the CLI to generate a table of metrics.
This enables the use of CLI commands such as: almanack table <repo path>
- Parameters:
repo_path (str) – The path to the repository to analyze.
ignore (List[str]) – A list of metric IDs to ignore when running the checks. Defaults to None.
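The exit-code rule described for almanack check can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CLI's actual code: the function name check_exit_code is hypothetical, the “result” and “sustainability_correlation” keys are borrowed from the compute_almanack_score description, and the pass rule for a direction of -1 is a guess.

```python
from typing import Any, Dict, List


def check_exit_code(table: List[Dict[str, Any]]) -> int:
    """Return 1 if any boolean check fails, otherwise 0."""
    for metric in table:
        result = metric.get("result")
        direction = metric.get("sustainability_correlation", 0)
        # Only boolean metrics with a non-zero direction count as checks.
        if not isinstance(result, bool) or direction == 0:
            continue
        # Assumption: direction 1 expects True, direction -1 expects False.
        expected = direction > 0
        if result is not expected:
            return 1
    return 0
```

A caller could pass the returned value to sys.exit so that continuous integration jobs fail when a check fails.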
This module performs git operations.
- almanack.git.clone_repository(repo_url: str) Path [source]#
Clones the GitHub repository to a temporary directory.
- Parameters:
repo_url (str) – The URL of the GitHub repository.
- Returns:
Path to the cloned repository.
- Return type:
pathlib.Path
- almanack.git.count_files(tree: Tree | Blob) int [source]#
Counts all files (Blobs) within a Git tree, including files in subdirectories.
This function recursively traverses the provided tree object to count each file, represented as a pygit2.Blob, within the tree and any nested subdirectories.
- Parameters:
tree (Union[pygit2.Tree, pygit2.Blob]) – The Git tree object (of type pygit2.Tree) to traverse and count files. The initial call should be made with the root tree of a commit.
- Returns:
The total count of files (Blobs) within the tree, including nested files in subdirectories.
- Return type:
int
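The recursive traversal can be illustrated without pygit2 by using nested dictionaries as a stand-in for Git trees. In this sketch, which assumes nothing about the library's internals, a dict plays the role of a tree and any other value plays the role of a blob; the function name is hypothetical.

```python
def count_files_in_nested_dict(node) -> int:
    """Count leaf entries in a nested dict, treating dicts as directories."""
    # Anything that is not a dict stands in for a file (a blob).
    if not isinstance(node, dict):
        return 1
    # Recurse into each child and add up the file counts.
    return sum(count_files_in_nested_dict(child) for child in node.values())


# A toy "root tree" with one top-level file and two nested files.
tree = {"README.md": b"", "src": {"a.py": b"", "pkg": {"b.py": b""}}}
```

Here count_files_in_nested_dict(tree) visits README.md, src/a.py, and src/pkg/b.py, so directories themselves are never counted.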
- almanack.git.detect_encoding(blob_data: bytes) str [source]#
Detect the encoding of the given blob data using charset-normalizer.
- Parameters:
blob_data (bytes) – The raw bytes of the blob to analyze.
- Returns:
The best detected encoding of the blob data.
- Return type:
str
- Raises:
ValueError – If no encoding could be detected.
- almanack.git.file_exists_in_repo(repo: Repository, expected_file_name: str, check_extension: bool = False, extensions: list[str] = ['.md', '.txt', '.rtf', '']) bool [source]#
Check if a file (case-insensitive and with optional extensions) exists in the latest commit of the repository.
- Parameters:
repo (pygit2.Repository) – The repository object to search in.
expected_file_name (str) – The base file name to check (e.g., “readme”).
check_extension (bool) – Whether to check the extension of the file or not.
extensions (list[str]) – List of possible file extensions to check (e.g., [“.md”, “”]).
- Returns:
True if the file exists, False otherwise.
- Return type:
bool
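The matching rule might look something like the sketch below, which works on a plain list of file names rather than a pygit2 repository. The function name and the exact semantics of check_extension are assumptions made for illustration: here, enabling it restricts matches to the listed extensions, and disabling it accepts any suffix.

```python
from pathlib import PurePosixPath
from typing import Iterable, Sequence


def name_matches(
    file_names: Iterable[str],
    expected_file_name: str,
    check_extension: bool = False,
    extensions: Sequence[str] = (".md", ".txt", ".rtf", ""),
) -> bool:
    """Case-insensitive match of a base name against a list of file names."""
    expected = expected_file_name.lower()
    for name in file_names:
        path = PurePosixPath(name.lower())
        if path.stem != expected:
            continue
        # Assumption: with check_extension, the suffix must be an allowed one.
        if not check_extension or path.suffix in extensions:
            return True
    return False
```

Under these assumptions, "README.rst" matches "readme" only when check_extension is False, because ".rst" is not in the default extension list.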
- almanack.git.find_file(repo: Repository, filepath: str, case_insensitive: bool = False, extensions: list[str] = ['.md', '.txt', '.rtf', '.rst', '']) Object | None [source]#
Locate a file in the repository by its path.
- Parameters:
repo (pygit2.Repository) – The repository object.
filepath (str) – The path to the file within the repository.
case_insensitive (bool) – If True, perform case-insensitive comparison.
extensions (list[str]) – List of possible file extensions to check (e.g., [“.md”, “”]).
- Returns:
The entry of the found file, or None if no matching file is found.
- Return type:
Optional[pygit2.Object]
- almanack.git.get_commits(repo: Repository) List[Commit] [source]#
Retrieves the list of commits from the main branch.
- Parameters:
repo (pygit2.Repository) – The Git repository.
- Returns:
List of commits in the repository.
- Return type:
List[pygit2.Commit]
- almanack.git.get_edited_files(repo: Repository, source_commit: Commit, target_commit: Commit) List[str] [source]#
Finds all files that have been edited, added, or deleted between two specific commits.
- Parameters:
repo (pygit2.Repository) – The Git repository.
source_commit (pygit2.Commit) – The source commit.
target_commit (pygit2.Commit) – The target commit.
- Returns:
List of file names that have been edited, added, or deleted between the two commits.
- Return type:
List[str]
- almanack.git.get_loc_changed(repo_path: Path, source: str, target: str, file_names: List[str]) Dict[str, int] [source]#
Finds the total number of code lines changed for each specified file between two commits.
- Parameters:
repo_path (pathlib.Path) – The path to the git repository.
source (str) – The source commit hash.
target (str) – The target commit hash.
file_names (List[str]) – List of file names to calculate changes for.
- Returns:
A dictionary where the key is the filename, and the value is the lines changed (added and removed).
- Return type:
Dict[str, int]
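To make "lines changed (added and removed)" concrete, the sketch below counts changed lines between two in-memory versions of a file using the standard library's difflib. It is an illustration only: the real function works from commit hashes in a git repository, and the lines_changed name is hypothetical.

```python
import difflib


def lines_changed(old_text: str, new_text: str) -> int:
    """Count removed plus added lines between two versions of a file."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # (i2 - i1) lines were removed and (j2 - j1) lines were added.
            changed += (i2 - i1) + (j2 - j1)
    return changed
```

A per-file dictionary like the one described above could then be built with a comprehension over file names, calling lines_changed on each file's old and new content.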
- almanack.git.get_most_recent_commits(repo_path: Path) tuple[str, str] [source]#
Retrieves the two most recent commit hashes in the test repositories.
- Parameters:
repo_path (pathlib.Path) – The path to the git repository.
- Returns:
Tuple containing the source and target commit hashes.
- Return type:
tuple[str, str]
- almanack.git.get_remote_url(repo: Repository) str | None [source]#
Determines the remote URL of a git repository, if available.
- Parameters:
repo (pygit2.Repository) – The pygit2 repository object.
- Returns:
The remote URL if found, otherwise None.
- Return type:
Optional[str]
- almanack.git.read_file(repo: Repository, entry: Object | None = None, filepath: str | None = None, case_insensitive: bool = False) str | None [source]#
Read the content of a file from the repository.
- Parameters:
repo (pygit2.Repository) – The repository object.
entry (Optional[pygit2.Object]) – The entry of the file to read. If not provided, filepath must be specified.
filepath (Optional[str]) – The path to the file within the repository. Used if entry is not provided.
case_insensitive (bool) – If True, perform case-insensitive comparison when using filepath.
- Returns:
The content of the file as a string, or None if the file is not found or reading fails.
- Return type:
Optional[str]
Reporting#
This module creates entropy reports.
Metrics#
Data#
This module computes data for GitHub repositories.
- almanack.metrics.data._get_almanack_version() str [source]#
Seeks the current version of almanack using either pkg_resources or dunamai to determine the current version being used.
- Returns:
- str
A string representing the version of almanack currently being used.
- almanack.metrics.data.compute_almanack_score(almanack_table: List[Dict[str, int | float | bool]]) Dict[str, int | float] [source]#
Computes an Almanack score by counting boolean Almanack table metrics to provide a quick summary of software sustainability.
- Parameters:
almanack_table (List[Dict[str, Union[int, float, bool]]]) – A list of dictionaries containing metrics. Each dictionary must have a “result” key with a value that is an int, float, or bool. A “sustainability_correlation” key is included to specify the relationship to sustainability:
- 1 (positive correlation)
- 0 (no correlation)
- -1 (negative correlation)
- Returns:
Dictionary of length three, including the following: 1) the number of Almanack boolean metrics that passed (numerator), 2) the total number of Almanack boolean metrics considered (denominator), and 3) a score that represents how likely the repository is to be maintained over time (numerator / denominator).
- Return type:
Dict[str, Union[int, float]]
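The numerator / denominator score can be sketched as below. This is a hedged illustration rather than the package's implementation: the function name and the returned key names ("numerator", "denominator", "score") are hypothetical, and the rule that a direction of -1 passes on False is an assumption.

```python
from typing import Any, Dict, List, Union


def almanack_score_sketch(table: List[Dict[str, Any]]) -> Dict[str, Union[int, float]]:
    """Summarize boolean metrics as passed / total."""
    passed = 0
    total = 0
    for metric in table:
        result = metric.get("result")
        direction = metric.get("sustainability_correlation", 0)
        # Skip non-boolean results and metrics with no stated direction.
        if not isinstance(result, bool) or direction == 0:
            continue
        total += 1
        # Assumption: direction 1 passes on True, direction -1 passes on False.
        if result == (direction > 0):
            passed += 1
    # Guard against dividing by zero when no boolean metrics are present.
    score = passed / total if total else 0.0
    return {"numerator": passed, "denominator": total, "score": score}
```

Non-boolean metrics, such as counts or dates, are left out of both the numerator and the denominator in this sketch.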
- almanack.metrics.data.compute_pr_data(repo_path: str, pr_branch: str, main_branch: str) Dict[str, Any] [source]#
Computes entropy data for a PR compared to the main branch.
- Parameters:
repo_path (str) – The local path to the Git repository.
pr_branch (str) – The branch name for the PR.
main_branch (str) – The branch name for the main branch.
- Returns:
- A dictionary containing the following key-value pairs:
“pr_branch”: The PR branch being analyzed.
“main_branch”: The main branch being compared.
“total_entropy_introduced”: The total entropy introduced by the PR.
“number_of_files_changed”: The number of files changed in the PR.
“entropy_per_file”: A dictionary of entropy values for each changed file.
“commits”: A tuple containing the most recent commits on the PR and main branches.
- Return type:
dict
- almanack.metrics.data.compute_repo_data(repo_path: str) None [source]#
Computes comprehensive data for a GitHub repository.
- Parameters:
repo_path (str) – The local path to the Git repository.
- Returns:
A dictionary containing data key-pairs.
- Return type:
dict
- almanack.metrics.data.days_of_development(repo: Repository) float [source]#
- Parameters:
repo (pygit2.Repository) – The pygit2 repository object to analyze.
- Returns:
The average number of commits per day over the period of time.
- Return type:
float
- almanack.metrics.data.gather_failed_almanack_metric_checks(repo_path: str, ignore: List[str] | None = None) List[Dict[str, Any]] [source]#
Gathers checks on the repository metrics and returns a list of failed checks, to help others understand the failed checks and rectify them.
- Parameters:
repo_path (str) – The file path to the repository which will have metrics calculated and includes boolean checks.
ignore (Optional[List[str]]) – A list of metric IDs to ignore when running the checks. Defaults to None.
- Returns:
A list of dictionaries containing the metrics and their associated results. Each dictionary includes the name, id, and guidance on how to fix each failed check. The dictionary also includes data about the almanack score for use in summarizing the results.
- Return type:
List[Dict[str, Any]]
- almanack.metrics.data.get_github_build_metrics(repo_url: str, branch: str = 'main', max_runs: int = 100, github_api_endpoint: str = 'https://api.github.com/repos') dict [source]#
Fetches the success ratio of the latest GitHub Actions build runs for a specified branch.
- Parameters:
repo_url (str) – The full URL of the repository (e.g., ‘software-gardening/almanack’).
branch (str) – The branch to filter for the workflow runs (default: “main”).
max_runs (int) – The maximum number of latest workflow runs to analyze.
github_api_endpoint (str) – Base API endpoint for GitHub repositories.
- Returns:
The success ratio and details of the analyzed workflow runs.
- Return type:
dict
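A success ratio of this kind could be computed along the following lines. The sketch works on hand-written sample data shaped like GitHub workflow-run records with a "conclusion" field, so no network request is made; the function name, the returned key names, and the newest-first ordering are assumptions.

```python
from typing import Any, Dict, List


def build_success_ratio(runs: List[Dict[str, Any]], max_runs: int = 100) -> Dict[str, Any]:
    """Share of the latest runs whose conclusion is "success"."""
    # Assumption: runs are ordered newest first, so slicing keeps the latest ones.
    latest = runs[:max_runs]
    successes = sum(1 for run in latest if run.get("conclusion") == "success")
    ratio = successes / len(latest) if latest else 0.0
    return {"success_ratio": ratio, "total_runs": len(latest)}


sample_runs = [
    {"conclusion": "success"},
    {"conclusion": "failure"},
    {"conclusion": "success"},
    {"conclusion": "cancelled"},
]
```

With the sample data, two of the four runs succeeded, and lowering max_runs to 1 would consider only the first record.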
- almanack.metrics.data.get_table(repo_path: str, ignore: List[str] | None = None) List[Dict[str, Any]] [source]#
Gather metrics on a repository and return the results in a structured format.
This function reads a metrics table from a predefined YAML file, computes relevant data from the specified repository, and associates the computed results with the metrics defined in the metrics table. If an error occurs during data computation, an exception is raised.
- Parameters:
repo_path (str) – The file path to the repository for which the Almanack runs metrics.
ignore (Optional[List[str]]) – A list of metric IDs to ignore when running the checks. Defaults to None.
- Returns:
A list of dictionaries containing the metrics and their associated results. Each dictionary includes the original metrics data along with the computed result under the key “result”.
- Return type:
List[Dict[str, Any]]
- Raises:
ReferenceError – If there is an error encountered while processing the data, providing context in the error message.
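The step of associating computed results with metric definitions, while honoring the ignore list, can be pictured as follows. This is a sketch under assumptions: the build_table name, the "name" and "id" keys used for joining, and the sample IDs are hypothetical, and the real function loads its definitions from a YAML file.

```python
from typing import Any, Dict, List, Optional


def build_table(
    metric_definitions: List[Dict[str, Any]],
    computed_data: Dict[str, Any],
    ignore: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Attach computed values to metric definitions under the "result" key."""
    ignored = set(ignore or [])
    table = []
    for metric in metric_definitions:
        # Skip any metric whose ID appears in the ignore list.
        if metric["id"] in ignored:
            continue
        # Copy the definition and add the computed value (None if missing).
        table.append({**metric, "result": computed_data.get(metric["name"])})
    return table


definitions = [
    {"name": "repo-path", "id": "M-1"},
    {"name": "includes-readme", "id": "M-2"},
]
data = {"repo-path": "/tmp/repo", "includes-readme": True}
```

Calling build_table(definitions, data, ignore=["M-1"]) keeps only the second metric and leaves the original definition dictionaries unmodified.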
- almanack.metrics.data.measure_coverage(repo: Repository, primary_language: str | None) Dict[str, Any] | None [source]#
Measures code coverage for a given repository.
- Parameters:
repo (pygit2.Repository) – The pygit2 repository object to analyze.
primary_language (Optional[str]) – The primary programming language of the repository.
- Returns:
Code coverage data or an empty dictionary if unable to find code coverage data.
- Return type:
Optional[dict[str,Any]]
- almanack.metrics.data.parse_python_coverage_data(repo: Repository) Dict[str, Any] | None [source]#
Parses coverage.py data from recognized formats such as JSON, XML, or LCOV. See here for more information: https://coverage.readthedocs.io/en/latest/cmd.html#cmd-report
- Parameters:
repo (pygit2.Repository) – The pygit2 repository object containing code.
- Returns:
A dictionary with standardized code coverage data or an empty dict if no data is found.
- Return type:
Optional[Dict[str, Any]]
- almanack.metrics.data.process_repo_for_analysis(repo_url: str) Tuple[float | None, str | None, str | None, int | None] [source]#
Processes GitHub repository URLs to calculate entropy and other metadata. This is used to prepare data for analysis, particularly for the seedbank notebook that processes PubMed repositories.
- Parameters:
repo_url (str) – The URL of the GitHub repository.
- Returns:
A tuple containing the normalized total entropy, the date of the first commit, the date of the most recent commit, and the total time of existence in days.
- Return type:
tuple
Remote#
This module focuses on remote API requests and related aspects.
- almanack.metrics.remote.get_api_data(api_endpoint: str = 'https://repos.ecosyste.ms/api/v1/repositories/lookup', params: Dict[str, str] | None = None) dict [source]#
Get data from an API based on the remote URL, with retry logic for GitHub rate limiting.
- Parameters:
api_endpoint (str) – The HTTP API endpoint to use for the request.
params (Optional[Dict[str, str]]) – Additional query parameters to include in the GET request.
- Returns:
The JSON response from the API as a dictionary.
- Return type:
dict
- Raises:
requests.RequestException – If the API call fails for reasons other than rate limiting.
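Retry logic for rate limiting often follows a pattern like the one sketched below. To keep the example free of network access, the HTTP call is injected as a callable returning a (status, payload) pair. The function name, the choice of 403 and 429 as rate-limit signals, the exponential backoff, and the RuntimeError are all assumptions and may differ from what the module does.

```python
import time
from typing import Any, Callable, Dict, Tuple


def get_with_retry(
    fetch: Callable[[], Tuple[int, Dict[str, Any]]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until it stops reporting a rate limit or retries run out."""
    for attempt in range(max_retries + 1):
        status, payload = fetch()
        # Assumption: HTTP 403 and 429 signal rate limiting.
        if status in (403, 429) and attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
            continue
        if status != 200:
            # Any other failure, or an exhausted retry budget, is an error.
            raise RuntimeError(f"request failed with status {status}")
        return payload
    raise RuntimeError("unreachable: loop always returns or raises")
```

Passing sleep as a parameter makes the waiting behavior easy to replace in tests, for example with a list's append method to record the delays.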
Entropy#
This module calculates the amount of software information entropy.
- almanack.metrics.entropy.calculate_entropy.calculate_aggregate_entropy(repo_path: Path, source_commit: Commit, target_commit: Commit, file_names: List[str]) float [source]#
Computes the aggregated normalized entropy score from the output of calculate_normalized_entropy for a specified Git repository. Inspired by Shannon’s information theory entropy formula. We follow an approach described by Hassan (2009) (see references).
- Parameters:
repo_path (pathlib.Path) – The file path to the git repository.
source_commit (pygit2.Commit) – The git hash of the source commit.
target_commit (pygit2.Commit) – The git hash of the target commit.
file_names (list[str]) – List of file names to calculate entropy for.
- Returns:
Normalized entropy calculation.
- Return type:
float
References
- Hassan, A. E. (2009). Predicting faults using the complexity of code changes. 2009 IEEE 31st International Conference on Software Engineering, 78-88. https://doi.org/10.1109/ICSE.2009.5070510
- almanack.metrics.entropy.calculate_entropy.calculate_normalized_entropy(repo_path: Path, source_commit: Commit, target_commit: Commit, file_names: list[str]) dict[str, float] [source]#
Calculates the entropy of changes in specified files between two commits, inspired by Shannon’s information theory entropy formula. Normalized relative to the total lines of code changes across specified files. We follow an approach described by Hassan (2009) (see references).
Application of Entropy Calculation: Entropy measures the uncertainty in a given system. Calculating the entropy of lines of code (LoC) changed reveals the variability and complexity of modifications in each file. Higher entropy values indicate more unpredictable changes, helping identify potentially unstable code areas.
- Parameters:
repo_path (pathlib.Path) – The file path to the git repository.
source_commit (pygit2.Commit) – The git hash of the source commit.
target_commit (pygit2.Commit) – The git hash of the target commit.
file_names (list[str]) – List of file names to calculate entropy for.
- Returns:
A dictionary mapping file names to their calculated entropy.
- Return type:
dict[str, float]
References
- Hassan, A. E. (2009). Predicting faults using the complexity of code changes. 2009 IEEE 31st International Conference on Software Engineering, 78-88. https://doi.org/10.1109/ICSE.2009.5070510
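A Shannon-style calculation over lines changed can be sketched as follows. For each file, p is that file's share of all changed lines, and its contribution is -p * log2(p), written here in the equivalent form p * log2(1 / p). The function name and input shape are hypothetical, and the package's exact normalization may differ from this illustration.

```python
import math
from typing import Dict


def entropy_per_file(loc_changed: Dict[str, int]) -> Dict[str, float]:
    """Per-file contribution p * log2(1 / p), where p is the file's share of changes."""
    total = sum(loc_changed.values())
    result = {}
    for name, lines in loc_changed.items():
        if total == 0 or lines == 0:
            # A file with no changed lines contributes no entropy.
            result[name] = 0.0
            continue
        p = lines / total  # share of all changed lines that fall in this file
        result[name] = p * math.log2(1 / p)
    return result
```

When changes are split evenly between two files, each contributes 0.5 and the sum is 1.0 bit; when all changes land in a single file, the result is 0.0, reflecting no uncertainty about where the change occurred.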
This module processes GitHub data.
- almanack.metrics.entropy.processing_repositories.process_pr_entropy(repo_path: str, pr_branch: str, main_branch: str) str [source]#
Processes GitHub PR data to calculate a report comparing the PR branch to the main branch.
- Parameters:
repo_path (str) – The local path to the Git repository.
pr_branch (str) – The branch name of the PR.
main_branch (str) – The branch name for the main branch.
- Returns:
A JSON string containing report data.
- Return type:
str
- Raises:
FileNotFoundError – If the specified directory does not contain a valid Git repository.
- almanack.metrics.entropy.processing_repositories.process_repo_entropy(repo_path: str) str [source]#
Processes GitHub repository data to calculate a report.
- Parameters:
repo_path (str) – The local path to the Git repository.
- Returns:
A JSON string containing the repository data and entropy metrics.
- Return type:
str
- Raises:
FileNotFoundError – If the specified directory does not contain a valid Git repository.
Garden Lattice#
Understanding#
This module focuses on the Almanack’s Garden Lattice materials which encompass aspects of human understanding.
- almanack.metrics.garden_lattice.understanding.includes_common_docs(repo: Repository) bool [source]#
Check whether the repo includes common documentation files and directories associated with building docsites.
- Parameters:
repo (pygit2.Repository) – The repository object.
- Returns:
True if any common documentation files are found, False otherwise.
- Return type:
bool
Connectedness#
Module for Almanack metrics covering human connection and engagement in software projects, such as contributor activity and collaboration frequency.
- almanack.metrics.garden_lattice.connectedness.count_unique_contributors(repo: Repository, since: datetime | None = None) int [source]#
Counts the number of unique contributors to a repository.
If a since datetime is provided, counts contributors who made commits after the specified datetime. Otherwise, counts all contributors.
- Parameters:
repo (pygit2.Repository) – The repository to analyze.
since (Optional[datetime]) – The cutoff datetime. Only contributions after this datetime are counted. If None, all contributions are considered.
- Returns:
The number of unique contributors.
- Return type:
int
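The effect of the since cutoff can be shown with plain tuples in place of pygit2 commits. In this sketch the function name is hypothetical, and identifying a contributor by email address is an assumption; the library may use a different identity field.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def count_unique_authors(
    commits: Iterable[Tuple[str, datetime]],
    since: Optional[datetime] = None,
) -> int:
    """Count distinct authors, optionally only for commits made after `since`."""
    authors = {
        author
        for author, committed_at in commits
        # Keep every commit when no cutoff is given, else only later ones.
        if since is None or committed_at > since
    }
    return len(authors)


commits = [
    ("a@example.org", datetime(2023, 1, 1)),
    ("b@example.org", datetime(2024, 1, 1)),
    ("a@example.org", datetime(2024, 6, 1)),
]
```

Using a set means an author with several commits is counted once, regardless of how many of those commits fall after the cutoff.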
- almanack.metrics.garden_lattice.connectedness.default_branch_is_not_master(repo: Repository) bool [source]#
Checks that the default branch of the specified repository is not “master”.
- Parameters:
repo (Repository) – A pygit2.Repository object representing the Git repository.
- Returns:
True if the default branch is not “master”, False otherwise.
- Return type:
bool
- almanack.metrics.garden_lattice.connectedness.detect_social_media_links(content: str) Dict[str, List[str]] [source]#
Analyzes README.md content to identify social media links.
- Parameters:
content (str) – The content of the README.md file as a string.
- Returns:
A dictionary containing social media details discovered from the README.md content.
- Return type:
Dict[str, List[str]]
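One plausible way to produce a Dict[str, List[str]] of this shape is to scan the text for URLs and group them by platform. The sketch below is illustrative only: the function name, the platform list, and the domain map are hypothetical, and the dictionary returned by the library may be structured differently.

```python
import re
from typing import Dict, List

# Hypothetical platform-to-domain map chosen for illustration only.
PLATFORM_DOMAINS = {
    "twitter": ("twitter.com", "x.com"),
    "linkedin": ("linkedin.com",),
    "youtube": ("youtube.com", "youtu.be"),
}

# Match http(s) URLs up to whitespace or a closing bracket or parenthesis.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def find_social_links(content: str) -> Dict[str, List[str]]:
    """Group URLs found in the text by social media platform."""
    found: Dict[str, List[str]] = {}
    for url in URL_PATTERN.findall(content):
        # The host is the third piece of "scheme://host/path" split on "/".
        host = url.split("/")[2].lower()
        host = host[4:] if host.startswith("www.") else host
        for platform, domains in PLATFORM_DOMAINS.items():
            if host in domains:
                found.setdefault(platform, []).append(url)
    return found
```

Links to other sites, such as a documentation page, are ignored because their host does not appear in the domain map.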
- almanack.metrics.garden_lattice.connectedness.find_doi_citation_data(repo: Repository) Dict[str, Any] [source]#
Find and validate DOI information from a CITATION.cff file in a repository.
This function searches for a CITATION.cff file in the provided repository, extracts the DOI (if available), validates its format, checks its resolvability via HTTP, and performs an exact DOI lookup on the OpenAlex API.
- Parameters:
repo (pygit2.Repository) – The repository object to search for the CITATION.cff file.
- Returns:
A dictionary containing DOI-related information and metadata.
- Return type:
Dict[str, Any]
- almanack.metrics.garden_lattice.connectedness.is_citable(repo: Repository) bool [source]#
Check if the given repository is citable.
A repository is considered citable if it contains a CITATION.cff or CITATION.bib file, or if the README.md file contains a citation section indicated by “## Citation” or “## Citing”.
- Parameters:
repo (pygit2.Repository) – The repository to check for citation files.
- Returns:
True if the repository is citable, False otherwise.
- Return type:
bool
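The citability rule above translates fairly directly into code. The following sketch operates on a list of file names and README text instead of a pygit2 repository; the function name is hypothetical, and treating both the file names and the headings case-insensitively is an assumption.

```python
from typing import Iterable


def looks_citable(file_names: Iterable[str], readme_text: str = "") -> bool:
    """True if a citation file exists or the README has a citation heading."""
    names = {name.lower() for name in file_names}
    # A CITATION.cff or CITATION.bib file is sufficient on its own.
    if names & {"citation.cff", "citation.bib"}:
        return True
    # Otherwise look for a heading that starts a citation section.
    return any(
        line.strip().lower().startswith(("## citation", "## citing"))
        for line in readme_text.splitlines()
    )
```

A README containing a line such as "## Citing this work" would satisfy the heading check in this sketch because the line starts with "## Citing".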
Practicality#
This module focuses on the Almanack’s Garden Lattice materials which involve how people can apply software in practice.
- almanack.metrics.garden_lattice.practicality.count_repo_tags(repo: Repository, since: datetime | None = None) int [source]#
Counts the number of tags in a pygit2 repository.
If a since datetime is provided, counts only tags associated with commits made after the specified datetime. Otherwise, counts all tags in the repository.
- Parameters:
repo (pygit2.Repository) – The repository to analyze.
since (Optional[datetime]) – The cutoff datetime. Only tags for commits after this datetime are counted. If None, all tags are counted.
- Returns:
The number of tags in the repository that meet the criteria.
- Return type:
int
- almanack.metrics.garden_lattice.practicality.get_ecosystems_package_metrics(repo_url: str) Dict[str, Any] [source]#
Fetches package data from the ecosyste.ms API and calculates metrics about the number of unique ecosystems, total version counts, and the list of ecosystem names.
- Parameters:
repo_url (str) – The repository URL of the package to query.
- Returns:
A dictionary containing information about packages related to the repository.
- Return type:
Dict[str, Any]
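The three metrics named above (unique ecosystems, total versions, and ecosystem names) could be derived from a list of package records roughly as follows. The record fields "ecosystem" and "versions_count", the returned key names, and the function name are assumptions for illustration; the sample data is hand-written and no API request is made.

```python
from typing import Any, Dict, List


def summarize_packages(packages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize package records by ecosystem and total version count."""
    # Collect each ecosystem once and sort for a stable result.
    ecosystems = sorted({package["ecosystem"] for package in packages})
    return {
        "ecosystems_count": len(ecosystems),
        "versions_count": sum(p.get("versions_count", 0) for p in packages),
        "ecosystems_names": ecosystems,
    }


sample_packages = [
    {"ecosystem": "pypi", "versions_count": 12},
    {"ecosystem": "conda", "versions_count": 5},
    {"ecosystem": "pypi", "versions_count": 3},
]
```

In the sample data, two packages share the "pypi" ecosystem, so the unique ecosystem count is 2 while the version total sums across all three records.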