Contributing Guide
We welcome contributions to the toiro project! Whether it's bug reports, feature requests, or pull requests, all forms of contribution are appreciated.
Code of Conduct and Guidelines
- Code of Conduct: CODE_OF_CONDUCT.md
- Contributing Guide: CONTRIBUTING.md
Development Setup
1. Clone the repository
git clone https://github.com/taishi-i/toiro.git
cd toiro
2. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
3. Install development packages
pip install -U pip
pip install -e .
pip install -e ".[all_tokenizers]" # Install all tokenizers (optional)
4. Run tests
pytest -q # or: python -m pytest
Test files are located in the test/ directory:
- test_tokenizers.py - Tokenizer tests
- test_datadownloader.py - Data downloader tests
- test_classifiers.py - Classifier tests
How to Contribute
Bug Reports
Report bugs via GitHub Issues. Please include:
- Python version
- toiro version
- Steps to reproduce
- Error messages
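The version details requested for a bug report can be gathered with a few lines of standard-library Python. The following is a minimal sketch, not part of toiro itself: the function name collect_environment and the dictionary keys are illustrative assumptions, and it assumes the package is installed under the distribution name "toiro".

```python
import platform
from importlib import metadata


def collect_environment(package: str = "toiro") -> dict:
    """Gather the version details a bug report asks for.

    `collect_environment` is a hypothetical helper for illustration only.
    """
    try:
        # Look up the installed version from the package metadata.
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        package_version = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "package": package,
        "package_version": package_version,
    }


if __name__ == "__main__":
    # Print one "key: value" line per entry, ready to paste into an issue.
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue, together with the steps to reproduce and the full error message, covers the items listed above.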
Feature Requests
If you have ideas for new features, propose them via GitHub Issues.
Pull Requests
- Fork the repository
- Create a new branch (git checkout -b feature/your-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feature/your-feature)
- Create a pull request
Adding a Tokenizer
To add a new tokenizer:
- Create a new file in toiro/tokenizers/ (e.g., tokenizer_newone.py)
- Implement the tokenize() function
- Add an availability check function in tokenizer_utils.py
- Update __init__.py to import the new tokenizer
- Add tests in test/test_tokenizers.py
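As a rough illustration of the steps above, the sketch below shows one way a tokenizer module and its availability check might fit together. It is a sketch under stated assumptions, not toiro's actual code: the backend name newone_backend, the function name is_newone_available, and the backend's split() call are all hypothetical, and the real project may organize these pieces differently (for example, by placing the check in tokenizer_utils.py as described above).

```python
import importlib
import importlib.util
from typing import List

# Hypothetical optional dependency; replace with the real backend package.
BACKEND_MODULE = "newone_backend"


def is_newone_available() -> bool:
    """Return True if the optional backend package can be imported."""
    return importlib.util.find_spec(BACKEND_MODULE) is not None


def tokenize(text: str) -> List[str]:
    """Split text into tokens using the backend, if it is installed."""
    if not is_newone_available():
        raise ImportError(
            f"{BACKEND_MODULE} is not installed; install it to use this tokenizer."
        )
    # Import lazily so that importing this module never fails on its own.
    backend = importlib.import_module(BACKEND_MODULE)
    # Assumption: the hypothetical backend exposes split(text) -> iterable of str.
    return list(backend.split(text))
```

Keeping the availability check separate from tokenize() lets tests skip cleanly when the optional backend is missing, rather than failing at import time.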
Updating Documentation
Documentation is in the docs/ directory. You can preview it locally using MkDocs:
pip install mkdocs-material mkdocs-static-i18n "mkdocstrings[python]" # Quotes keep shells such as zsh from expanding the brackets
mkdocs serve
Open http://127.0.0.1:8000 in your browser to view the preview.
Questions and Support
If you have questions, feel free to ask via GitHub Discussions or Issues.