
Installation

Requirements

  • Python 3.10 or later
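To confirm an environment meets this requirement before installing, a minimal sketch using only the standard library might look like this. The helper name `meets_requirement` is hypothetical and not part of toiro.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MIN_PYTHON = (3, 10)


def meets_requirement(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (defaults to the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info[:2]
    # Tuples compare element by element, so (3, 9) < (3, 10) < (3, 12).
    return tuple(version) >= minimum


if __name__ == "__main__":
    current = sys.version.split()[0]
    if meets_requirement():
        print(f"Python {current} meets the 3.10+ requirement.")
    else:
        print(f"Python {current} is too old; 3.10 or later is required.")
```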

Basic install

pip install toiro

The basic install keeps dependencies minimal: Janome is the only tokenizer available by default.
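One way to check that the basic install worked is to query the installed distribution metadata. This is a sketch that relies only on `importlib.metadata` from the standard library; the helper `installed_version` is hypothetical, and the distribution names `toiro` and `janome` are assumed to match their PyPI names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as published on PyPI (assumed).
    for dist in ("toiro", "janome"):
        found = installed_version(dist)
        print(f"{dist}: {found if found else 'not installed'}")
```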

Add extra tokenizers

Each additional tokenizer is installed separately. For example, to add SudachiPy and nagisa:

pip install sudachipy sudachidict_core
pip install nagisa

Other tokenizers

# MeCab (mecab-python3)
pip install mecab-python3

# spaCy Japanese / GiNZA
pip install spacy ginza
pip install "spacy[ja]"

# KyTea (requires system install)
# After installing KyTea by its official instructions:
pip install kytea

# Juman++ v2 (requires system install)
# After installing Juman++ v2 by its official instructions:
pip install pyknp

# SentencePiece
pip install sentencepiece

# fugashi with a dictionary: IPADIC or UniDic (unidic-lite)
pip install fugashi ipadic
pip install fugashi unidic-lite

# TinySegmenter
pip install tinysegmenter3

# tiktoken (BPE for GPT-4o / GPT-5)
pip install tiktoken
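After installing some of the packages above, a quick way to see which tokenizers are importable is to look up their top-level modules without importing them. This sketch assumes the module names listed in `TOKENIZER_MODULES` (for example, mecab-python3 provides `MeCab`, kytea provides `Mykytea`, and tinysegmenter3 provides `tinysegmenter`); adjust the mapping if a package exposes a different name. It is a stand-alone check, not a toiro API.

```python
from importlib.util import find_spec

# Assumed mapping from each tokenizer to the top-level module its pip package installs.
TOKENIZER_MODULES = {
    "Janome": "janome",
    "SudachiPy": "sudachipy",
    "nagisa": "nagisa",
    "MeCab": "MeCab",
    "spaCy": "spacy",
    "GiNZA": "ginza",
    "KyTea": "Mykytea",
    "Juman++ (pyknp)": "pyknp",
    "SentencePiece": "sentencepiece",
    "fugashi": "fugashi",
    "TinySegmenter": "tinysegmenter",
    "tiktoken": "tiktoken",
}


def importable(module_name):
    """True if a top-level module can be located (it is not actually imported)."""
    return find_spec(module_name) is not None


def availability(modules=TOKENIZER_MODULES):
    """Map each tokenizer label to whether its module can be located."""
    return {label: importable(mod) for label, mod in modules.items()}


if __name__ == "__main__":
    for label, ok in availability().items():
        print(f"{label:<18} {'yes' if ok else 'no'}")
```

Note that `find_spec` only tells you a module can be found. A compiled wrapper such as `Mykytea` may still fail at import time if the underlying system library is missing.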

Install all at once

To install all supported tokenizer packages in one step:

pip install "toiro[all_tokenizers]"
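When installing from a script or notebook, running pip through the current interpreter (`python -m pip`) helps ensure the packages land in the same environment you will import them from. The sketch below builds that command and only executes it when `dry_run=False`; the helper names are hypothetical. Because `subprocess.run` receives a list and does not go through a shell, the brackets need no quoting here. In an interactive shell the quotes matter, since shells such as zsh would otherwise treat `[...]` as a glob pattern.

```python
import subprocess
import sys


def pip_install_command(*requirements):
    """Build a pip command bound to the running interpreter's environment."""
    return [sys.executable, "-m", "pip", "install", *requirements]


def pip_install(*requirements, dry_run=True):
    """Print the pip command; run it only when dry_run is False."""
    cmd = pip_install_command(*requirements)
    print(" ".join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if pip exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    pip_install("toiro[all_tokenizers]")  # pass dry_run=False to actually install
```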

System-level installation required

KyTea and Juman++ must be installed at the system level before their Python packages (kytea and pyknp) are installed. Refer to their official documentation for instructions.
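Before running `pip install kytea` or `pip install pyknp`, you could check whether the system tools are on your `PATH`. This sketch assumes the command names `kytea` (KyTea) and `jumanpp` (Juman++ v2). It only looks for executables, so it cannot confirm that development headers or shared libraries needed to build a wrapper are present.

```python
import shutil

# Assumed command names for the system-level tools.
SYSTEM_COMMANDS = {"KyTea": "kytea", "Juman++ v2": "jumanpp"}


def find_system_tools(commands=SYSTEM_COMMANDS):
    """Map each tool to the path of its executable on PATH, or None if missing."""
    return {name: shutil.which(cmd) for name, cmd in commands.items()}


if __name__ == "__main__":
    for name, path in find_system_tools().items():
        status = path if path else "not found on PATH; install it before the Python package"
        print(f"{name}: {status}")
```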

Using Docker

A Docker image with all tokenizers pre-installed is available:

docker run --rm -it taishii/toiro /bin/bash

See the taishii/toiro page on Docker Hub for details.

Add classifiers (optional)

To use BERT-based text classifiers:

pip install "toiro[all_classifiers]"

Or install the dependencies individually:

pip install torch transformers catalyst
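If only some of these are already present, a small check can list the missing ones so you install just those. This is a sketch, not part of toiro; it assumes each pip name matches the top-level module it provides (true for `torch`, `transformers`, and `catalyst` as far as the mapping below is concerned) and that a locatable module is a usable install.

```python
from importlib.util import find_spec

# pip distribution name -> top-level module it provides (assumed).
CLASSIFIER_DEPS = {"torch": "torch", "transformers": "transformers", "catalyst": "catalyst"}


def missing_packages(deps=CLASSIFIER_DEPS):
    """Return the pip names whose modules cannot be located, in mapping order."""
    return [dist for dist, module in deps.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All classifier dependencies are installed.")
```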