Contributing

pybear is not actively seeking contributions. At the same time, pybear does not want to turn away good work that enhances pybear and the Python data analytics ecosystem. Below is the development framework that pybear has followed since its inception; any future contributions must follow it as well. In addition to the guidelines below, use existing pybear source code as a reference for what is expected in contributions. Code should be internally consistent with the design and conventions of existing pybear modules.

This project adheres to a Code of Conduct. When you contribute to pybear, or engage with the pybear community, you are expected to adhere to these rules.


The Original pybear Mission Statement

pybear seeks to add to and/or enhance existing functionality in the Python data analytics ecosystem.

pybear runs on all actively supported versions of Python.

pybear must seamlessly integrate into conventional Python data analytics workflows. It does this by following the scikit-learn API.

Every pybear module seeks to fulfill 4 objectives:

handle missing data: robust handling of all nan-like values (numpy.nan, pandas.NA, etc.)
fast processing with parallelism: use joblib when there is benefit
bigger-than-memory data: every module has a partial_fit method for incremental learning
accept all common containers: numpy, pandas, polars, and scipy sparse containers

Going forward, contributions must uphold the original mission statement.

Functional Code

All modules must follow the appropriate scikit-learn API for their type (e.g. transformer, estimator).
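For orientation, here is a minimal sketch of the transformer conventions this implies: __init__ only stores hyperparameters, get_params/set_params expose them, fit and partial_fit return self, learned attributes end in an underscore, and transform refuses to run before fitting. The class ColumnMaxAbsScaler is hypothetical, not a pybear module, and it uses only numpy (no scikit-learn base classes or mixins).

```python
import numpy as np


class ColumnMaxAbsScaler:
    """Hypothetical transformer: divide each column by its max abs value."""

    def __init__(self, copy: bool = True):
        # store hyperparameters unchanged; no validation or learning here
        self.copy = copy

    def get_params(self, deep: bool = True) -> dict:
        return {"copy": self.copy}

    def set_params(self, **params) -> "ColumnMaxAbsScaler":
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"invalid parameter '{name}'")
            setattr(self, name, value)
        return self

    def _reset(self) -> None:
        # drop learned state so that fit() always starts fresh
        for attr in ("max_abs_", "n_features_in_"):
            self.__dict__.pop(attr, None)

    def partial_fit(self, X, y=None) -> "ColumnMaxAbsScaler":
        X = np.asarray(X, dtype=np.float64)
        if X.ndim != 2:
            raise ValueError("X must be 2D")
        batch_max = np.nanmax(np.abs(X), axis=0)
        if hasattr(self, "max_abs_"):
            if X.shape[1] != self.n_features_in_:
                raise ValueError("number of features changed between batches")
            # fmax ignores nan, so a nan in one batch does not erase history
            self.max_abs_ = np.fmax(self.max_abs_, batch_max)
        else:
            self.n_features_in_ = X.shape[1]
            self.max_abs_ = batch_max
        return self

    def fit(self, X, y=None) -> "ColumnMaxAbsScaler":
        self._reset()
        return self.partial_fit(X, y)

    def transform(self, X) -> np.ndarray:
        if not hasattr(self, "max_abs_"):
            raise AttributeError("instance is not fitted yet; call fit first")
        X = np.asarray(X, dtype=np.float64)
        if self.copy:
            X = X.copy()
        # all-zero (or all-nan) columns are left unscaled
        bad = np.isnan(self.max_abs_) | (self.max_abs_ == 0)
        X /= np.where(bad, 1.0, self.max_abs_)
        return X

    def fit_transform(self, X, y=None) -> np.ndarray:
        return self.fit(X, y).transform(X)
```

Writing fit as "reset, then partial_fit" is one way to keep the two learning paths from drifting apart.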

Do not use any third-party package validation functions or mixins to build pybear modules, even if they are part of that package’s public API. This applies especially to scikit-learn. pybear has gone to great lengths to insulate itself from disruption caused by changes in third-party packages. When using tools from a third-party package, pybear deliberately uses only the most popular (and least likely to change) functionality in that package’s public API. Everything you need to build the non-public API of pybear modules is in the pybear ‘base’ toolkit. If some new functionality is needed, do not borrow it; build it for pybear.

All major modules must always accept numpy ndarrays, pandas DataFrames, and polars DataFrames. When the data is strictly numeric, they must also accept all scipy sparse matrix/array formats (at the time of this writing there are seven). Some modules may also accept Python lists, tuples, and sets if there is a good reason; for example, pybear text analytics uses Python built-ins to hold ragged arrays. pybear generally encourages memory-optimized containers over Python built-ins, except in text analytics. Avoid containers beyond the ones listed here, especially if they require importing a new package, unless there is a compelling case for the addition. Avoid lazy containers, such as dask.
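A hedged sketch of one way to enforce this container policy without importing pandas, polars, or scipy: inspect the top-level package that defines the object's type. The function container_family and its labels are hypothetical (not pybear API), and a real module would still need container-specific handling after classification.

```python
# Map the top-level package that defines a container's type to a label.
# The labels and the mapping are assumptions made for this sketch.
_ACCEPTED_PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "polars": "polars",
    "scipy": "scipy_sparse",
}


def container_family(X, allow_builtins: bool = False) -> str:
    """Classify X by the package that defines its type (hypothetical)."""
    if isinstance(X, (list, tuple, set)):
        # built-ins only where a module has a good reason (e.g. ragged text)
        if allow_builtins:
            return "python_builtin"
        raise TypeError("Python built-in containers are not accepted here")

    package = type(X).__module__.split(".")[0]
    if package not in _ACCEPTED_PACKAGES:
        # lazy containers such as dask, and anything unlisted, end up here
        raise TypeError(f"unsupported container: {type(X).__name__}")
    if package == "scipy" and not hasattr(X, "tocsr"):
        # from scipy, only sparse matrices/arrays are accepted
        raise TypeError("only scipy sparse containers are accepted")
    return _ACCEPTED_PACKAGES[package]
```

With numpy installed, container_family(numpy.zeros((2, 2))) returns "numpy", because numpy.ndarray.__module__ is "numpy"; pandas and polars DataFrames report module paths under "pandas" and "polars" respectively.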

All modules must robustly handle any nan-like values that could be found in the containers listed above (e.g., numpy.nan, pandas.NA). pybear recommends using the pybear.utilities.nan_mask module and/or its variants. See the documentation in these ‘nan_mask’ modules for the full discussion of which nan-like values pybear handles.
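pybear's own tool for this is pybear.utilities.nan_mask. The sketch below is not that implementation; it is a hypothetical, simplified illustration of the problem it solves: nan-like values come in several incompatible forms (float NaN, None, pandas.NA, string spellings such as 'nan'), and a plain X != X test catches only some of them. The set of string spellings treated as missing is an assumption.

```python
import numpy as np

# string spellings treated as missing in this sketch (an assumption; see
# pybear's nan_mask documentation for what pybear actually handles)
_NAN_STRINGS = {"nan", "none", "<na>", "nat"}


def is_nan_like(value) -> bool:
    """Return True if a single value looks missing (hypothetical helper)."""
    if value is None:
        return True
    if isinstance(value, (float, np.floating)):
        return bool(np.isnan(value))
    if isinstance(value, str):
        return value.strip().lower() in _NAN_STRINGS
    # pandas.NA and pandas.NaT render as '<NA>' and 'NaT'; checking the
    # string form avoids importing pandas and avoids bool(pandas.NA),
    # which raises TypeError
    return str(value).lower() in {"<na>", "nat"}


def nan_mask_sketch(X) -> np.ndarray:
    """Boolean mask, same shape as X, True where the value is nan-like."""
    arr = np.asarray(X, dtype=object)
    flags = [is_nan_like(v) for v in arr.ravel()]
    return np.array(flags, dtype=bool).reshape(arr.shape)
```

Looping in Python over an object array is slow; a production module would typically take faster paths for purely numeric dtypes.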

If a module can be written so that joblib demonstrably improves its speed over sequential code, then the module must be written that way and use joblib.
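A minimal sketch of the usual shape of this pattern, using joblib's public Parallel and delayed API: independent per-column work is dispatched in parallel, with a plain loop kept for n_jobs=1. The function names are hypothetical, and the threading backend is chosen here only to keep the sketch light.

```python
from joblib import Parallel, delayed


def _count_unique(column: list) -> int:
    # per-column work; each call is independent of the others
    return len(set(column))


def unique_counts(columns: list, n_jobs: int = 1) -> list:
    """Count unique values per column, optionally in parallel."""
    if n_jobs == 1:
        # plain sequential loop; no joblib overhead
        return [_count_unique(c) for c in columns]
    # prefer="threads" avoids process start-up cost in this small sketch;
    # CPU-bound pure-Python work would usually need processes instead
    return Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(_count_unique)(c) for c in columns
    )
```

Before committing to the joblib path, time both branches (for example with time.perf_counter) on realistic data; keep joblib only if it is demonstrably faster.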

All modules must have a partial_fit method if it is technically possible. If it can have it, it must have it.
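The point of partial_fit is that fitting on all the data at once and calling partial_fit repeatedly on chunks should produce the same learned state, so data that does not fit in memory can be streamed. A minimal sketch, assuming a nan-aware per-column mean as the learned quantity; RunningColumnMean and the batches generator are hypothetical.

```python
import numpy as np


class RunningColumnMean:
    """Hypothetical estimator: nan-aware per-column mean, learned in chunks."""

    def partial_fit(self, X, y=None) -> "RunningColumnMean":
        X = np.asarray(X, dtype=np.float64)
        if not hasattr(self, "n_samples_seen_"):
            # first batch: initialize the sufficient statistics
            self.n_samples_seen_ = 0
            self._sum = np.zeros(X.shape[1])
            self._count = np.zeros(X.shape[1], dtype=np.int64)
        # keep only running totals, never the rows themselves
        self._sum += np.nansum(X, axis=0)
        self._count += (~np.isnan(X)).sum(axis=0)
        self.n_samples_seen_ += X.shape[0]
        with np.errstate(invalid="ignore", divide="ignore"):
            # a column that is all-nan so far gives 0 / 0 -> nan
            self.mean_ = self._sum / self._count
        return self

    def fit(self, X, y=None) -> "RunningColumnMean":
        # fit() forgets any previous state, then learns from all of X at once
        for attr in ("n_samples_seen_", "_sum", "_count", "mean_"):
            self.__dict__.pop(attr, None)
        return self.partial_fit(X, y)


def batches(X, batch_size: int):
    """Yield consecutive row blocks (stand-in for chunks read from disk)."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size]
```

Usage: feed batches(data, batch_size=3) into partial_fit one chunk at a time; the resulting mean_ should match fit(data).mean_.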

Code Formatting

pybear follows PEP8 conventions. While no specific linter (e.g., flake8) is required, contributions should follow the basic PEP8 guidelines. Use 4 spaces for indentation. Use a right margin of 72 characters for docstrings and in-line comments. For code, use a right margin of 79 characters, with allowance for overflow up to 88 characters when it “makes sense.” Surround top-level function and class definitions with 2 blank lines. These are just a few of the PEP8 formatting recommendations; see the PEP8 specification for full details. PEP8 affords some latitude, but if unsure, use existing pybear code as a formatting reference. The formatting of new code should not be conspicuously different from existing pybear code.
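pybear does not require a linter, but the margin rules can be self-checked with the standard library. A rough sketch; margin_report and its messages are hypothetical, and it approximates "docstrings and in-line comments" as any line that contains a comment or is part of a triple-quoted string.

```python
import io
import tokenize

COMMENT_LIMIT = 72   # docstrings and in-line comments
CODE_LIMIT = 79      # preferred margin for code
CODE_OVERFLOW = 88   # ceiling for code when overflow "makes sense"


def margin_report(source: str) -> list[tuple[int, int, str]]:
    """List (line number, width, reason) for lines past the margins."""
    doc_rows = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_triple = tok.string.endswith(('"""', "'''"))
        if tok.type == tokenize.COMMENT:
            doc_rows.add(tok.start[0])
        elif tok.type == tokenize.STRING and is_triple:
            # approximation: treat every triple-quoted string as a docstring
            doc_rows.update(range(tok.start[0], tok.end[0] + 1))

    report = []
    for row, line in enumerate(source.splitlines(), start=1):
        width = len(line)
        if row in doc_rows and width > COMMENT_LIMIT:
            report.append((row, width, "comment/docstring over 72"))
        elif width > CODE_OVERFLOW:
            report.append((row, width, "code over 88"))
        elif width > CODE_LIMIT:
            report.append((row, width, "code over 79 (soft)"))
    return report
```

Lines flagged "code over 79 (soft)" are not necessarily wrong; they only mark where the 79-character preference was exceeded.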

Docs

pybear uses the numpydoc standard for docstrings. Please refer to the numpydoc Style Guide. All docstrings must thoroughly document the purpose and functionality of their respective modules. Functions must have at least the “Parameters”, “Returns”, and “Examples” sections. Classes must have at least the “Parameters”, “Attributes”, “Returns”, and “Examples” sections. Sections like “Notes”, “References”, “Raises”, and “See Also” are optional, but encouraged if they add clarity. Type hints are expected and must be accurate and consistent with the source code. See the Type Hints section for further docstring guidance.
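A short illustration of the required function sections, with a type-hinted signature whose docstring types agree with the hints. The function clip_columns is hypothetical and exists only to carry the docstring; existing pybear docstrings remain the authoritative reference for details such as how defaults are written.

```python
from typing import Optional

import numpy as np


def clip_columns(
    X: np.ndarray,
    upper: float,
    lower: Optional[float] = None,
) -> np.ndarray:
    """Clip every value in a 2D array to a closed interval.

    Parameters
    ----------
    X : numpy.ndarray of shape (n_samples, n_features)
        The data to clip. nan values are left unchanged.
    upper : float
        The largest value allowed in the output.
    lower : Optional[float], default=None
        The smallest value allowed in the output. If None, values are
        only clipped from above.

    Returns
    -------
    X_clipped : numpy.ndarray of shape (n_samples, n_features)
        A clipped copy of X.

    Examples
    --------
    >>> import numpy as np
    >>> clip_columns(np.array([[1.0, 5.0], [-3.0, 2.0]]), upper=4.0)
    array([[ 1.,  4.],
           [-3.,  2.]])

    """
    return np.clip(X, lower, upper)
```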

There is a dual mandate: docs must render accurately and aesthetically both in PyCharm tooltips and on the pybear website. pybear uses sphinx with the numpydoc extension to automatically render docstrings and publish them to Read the Docs. Unfortunately, there is not a one-to-one correspondence between PyCharm's tooltip rendering and sphinx, so formatting that displays perfectly with sphinx may not display perfectly in PyCharm, and vice versa. pybear optimizes for sphinx first (i.e., follows the numpydoc standard exactly), but tries to reasonably accommodate PyCharm when possible.

These guidelines must be followed for every module and submodule, whether public or private.

Type Hints

Type hints are required in the code body and docstring in every module, public or private. In public modules, they greatly improve clarity for the user. In all cases, they greatly improve clarity during the development and maintenance of the source code. Type hints are expected to be accurate and consistent. Type hinting can be subjective. When in doubt, refer to existing pybear source code for examples of usage.

Test

pybear uses pytest. If you want to contribute to the public API, you must submit tests with your contribution. Every module must be tested comprehensively for typical cases and edge cases. If a module accepts various types of data containers, all of them must be tested. Test for correct handling of invalid inputs, not just valid inputs. pybear uses pytest-cov to calculate coverage and does not use # pragma: no cover. At first release, test coverage was 93%; going forward, it should stay above 90%. pybear is tested on Linux, Windows, and macOS for all Python versions that pybear supports. All tests must pass on each operating system and Python version.
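A compact sketch of the testing pattern described above: one test parametrized over containers, one edge case, and one invalid-input test. The function column_max is a hypothetical function under test, and the container list is shortened to built-ins and numpy; a real pybear test would also cover pandas, polars, and scipy sparse inputs.

```python
import numpy as np
import pytest


def column_max(X) -> np.ndarray:
    """Hypothetical function under test: nan-aware per-column maximum."""
    X = np.asarray(X, dtype=np.float64)
    if X.ndim != 2:
        raise ValueError("X must be 2D")
    return np.nanmax(X, axis=0)


@pytest.mark.parametrize("container", (list, tuple, np.array))
def test_accepts_containers(container):
    X = container([[1.0, 2.0], [3.0, np.nan]])
    assert np.array_equal(column_max(X), [3.0, 2.0])


def test_edge_case_single_row():
    assert np.array_equal(column_max([[5.0, -1.0]]), [5.0, -1.0])


@pytest.mark.parametrize("bad_X", ([1.0, 2.0], "junk"))
def test_rejects_invalid_input(bad_X):
    with pytest.raises((ValueError, TypeError)):
        column_max(bad_X)
```

With pytest-cov installed, running pytest --cov=pybear reports coverage for the package alongside the test results.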

How To Get The Source Code

pybear uses GitHub to manage versions. To contribute, you will need to use GitHub to get the source code and submit your contribution. See the pybear homepage on GitHub.

Fork the pybear repo to your own GitHub account then clone your fork to your local device. Use internet resources (such as GitHub’s documentation or community tutorials) for more help with this process.

How To Manage Project Dependencies

Once you have the source files in a local project folder, you can use poetry to manage the project dependencies. To use poetry, you need the pyproject.toml file and a local install of poetry. The pyproject.toml file should be included in the clone of the online repo. To install poetry, pip install it into any of your local Python environments and be sure to use that Python version when you run the following command. (See elsewhere in the pybear documentation for the Python versions that are currently supported by pybear.) From the root of the project folder, install all the dependencies, including dev and test:

poetry install --with dev,test

This will install all the dependencies you need to develop pybear.

How To Submit

Get the source code, follow the guidelines above, and make your changes. Keep your fork up to date with the main branch of the pybear repository. Create a pull request from your fork, explain in detail what changes you have made, and ask for a review.