A curated list of libraries (Python and other languages), pretrained models, dictionaries, and corpora for Japanese natural language processing.
Python
Updated on Nov 04, 2025
Libraries that split Japanese text into words or morphemes and annotate them with parts of speech and base forms
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| SudachiPy | |||
| Janome | |||
| mecab-python3 | |||
| mecab | |||
| fugashi | |||
| nagisa | |||
| pyknp | |||
| Mykytea-python | |||
| konoha | |||
| natto-py | |||
| rakutenma-python | |||
| python-vaporetto | |||
| dango | |||
| rhoknp | |||
| python-vibrato | |||
| jagger-python | |||
| Mecari | - | - | |
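
Dictionary-based tokenizers such as MeCab, Sudachi, and Janome build a lattice of every dictionary word that matches the input and choose the lowest-cost path with the Viterbi algorithm, adding a cost for each word and a connection cost between adjacent words based on their grammatical categories. The sketch below illustrates that idea in plain Python. The lexicon, the costs, and the `tokenize` function are hypothetical toy values, not taken from any real dictionary; real analyzers use far larger connection tables and more elaborate unknown-word handling.

```python
# Toy lattice tokenizer: Viterbi search over dictionary words.
# LEXICON, CONNECTION, and UNKNOWN are hypothetical values for illustration only.
LEXICON = {
    "すもも": [("名詞", 3)],  # surface -> list of (part of speech, word cost)
    "もも": [("名詞", 3)],
    "も": [("助詞", 1)],
    "の": [("助詞", 1)],
    "うち": [("名詞", 3)],
}
# Cost of placing one part of speech right after another; unlisted pairs cost 0
CONNECTION = {("名詞", "名詞"): 10, ("助詞", "助詞"): 10}
UNKNOWN = ("未知語", 100)  # fallback: any single character as an unknown word


def tokenize(text):
    n = len(text)
    # best[i][pos] = (lowest cost of a path ending at i with a word of `pos`, backpointer)
    best = [{} for _ in range(n + 1)]
    best[0]["BOS"] = (0, None)
    for i in range(n):
        # Every dictionary word starting at i, plus the unknown-word fallback
        edges = [(i + 1, text[i], *UNKNOWN)]
        for j in range(i + 1, n + 1):
            for pos, cost in LEXICON.get(text[i:j], []):
                edges.append((j, text[i:j], pos, cost))
        for prev_pos, (prev_cost, _) in best[i].items():
            for j, surface, pos, cost in edges:
                total = prev_cost + CONNECTION.get((prev_pos, pos), 0) + cost
                if pos not in best[j] or total < best[j][pos][0]:
                    best[j][pos] = (total, (i, prev_pos, surface))
    if n == 0:
        return []
    # Start from the cheapest state at the end of the text and follow backpointers
    pos = min(best[n], key=lambda p: best[n][p][0])
    tokens = []
    i = n
    while i > 0:
        _, (start, prev_pos, surface) = best[i][pos]
        tokens.append((surface, pos))
        i, pos = start, prev_pos
    return tokens[::-1]


print(" / ".join(s for s, _ in tokenize("すもももももももものうち")))
# すもも / も / もも / も / もも / の / うち
```

With these toy costs, splitting もも into も + も would save one unit of word cost but add a particle-particle connection cost of 10, so the path that alternates nouns and particles wins.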

Libraries that parse sentence structure and dependencies to reveal grammatical relations
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| ginza | |||
| cabocha | |||
| UniDic2UD | |||
| camphr | |||
| SuPar-UniDic | |||
| depccg | |||
| bertknp | - | - | |
| esupar | |||
| yomikata | |||
| jdepp-python | |||
| lightblue | - | - | |
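| natsume-simple | - | - | |

Libraries that convert characters and notation, such as kana to romaji or full-width to half-width
A Python implementation of NormalizeNumexp, which extracts and normalizes numerical and temporal expressions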
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| pykakasi | |||
| cutlet | |||
| alphabet2kana | |||
| Convert-Numbers-to-Japanese | - | - | |
| mozcpy | |||
| jamorasep | |||
| text2phoneme | - | - | |
| jntajis-python | |||
| wiredify | |||
| mecab-text-cleaner | |||
| pynormalizenumexp | |||
| Jusho | |||
| yurenizer | |||
| e2k | |||
| alkana.py | - | - | |
| englishtokanaconverter | - | - | |
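
Kana conversion of the kind these libraries perform often relies on the fact that Unicode places hiragana (U+3041 to U+3096) exactly 0x60 code points below the corresponding katakana (U+30A1 to U+30F6). A minimal standard-library sketch of hiragana/katakana conversion follows; the function names are hypothetical and are not the API of any library listed above.

```python
# Unicode hiragana U+3041..U+3096 sit exactly 0x60 code points below katakana U+30A1..U+30F6
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096
KANA_OFFSET = 0x60


def hira_to_kata(text):
    """Convert hiragana to katakana; all other characters are left unchanged."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST else ch
        for ch in text
    )


def kata_to_hira(text):
    """Convert katakana in U+30A1..U+30F6 back to hiragana."""
    first, last = HIRAGANA_FIRST + KANA_OFFSET, HIRAGANA_LAST + KANA_OFFSET
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if first <= ord(ch) <= last else ch
        for ch in text
    )


print(hira_to_kata("ひらがなとカタカナ"))  # ヒラガナトカタカナ
print(kata_to_hira("カタカナー"))  # かたかなー
```

Characters outside these ranges, such as the prolonged sound mark ー, kanji, and ASCII, pass through unchanged.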

Libraries that normalize text into a form suitable for analysis
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| neologdn | |||
| jaconv | |||
| mojimoji | |||
| text-cleaning | - | - | |
| HojiChar | |||
| utsuho | |||
| python-habachen | |||
| kairyou | | | |
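
A common first step, before any library-specific rules, is Unicode NFKC normalization, which folds full-width ASCII to half-width and half-width katakana (including separate voicing marks) to full-width. The sketch below assumes that NFKC plus whitespace collapsing is enough for the input at hand; dedicated normalizers such as neologdn apply further Japanese-specific rules on top. `normalize` is a hypothetical helper name.

```python
import re
import unicodedata


def normalize(text):
    """Hypothetical minimal normalizer: NFKC, then collapse whitespace."""
    # NFKC folds full-width ASCII (ＡＢＣ１２３) to half-width and
    # half-width katakana, including separate voicing marks (ｶﾞ), to full-width (ガ)
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace (NFKC has already turned U+3000 into a normal space)
    return re.sub(r"\s+", " ", text).strip()


print(normalize("ＡＢＣ　１２３ ｶﾞｷﾞ"))  # ABC 123 ガギ
```

NFKC is lossy (for example, ① becomes 1), so check that this is acceptable for the downstream task before applying it.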

Libraries that automatically split text into sentences
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| bunkai | |||
| japanese-sentence-breaker | |||
| sengiri | |||
| budoux | |||
| ja_sentence_segmenter | |||
| hasami | |||
| kuzukiri | |||
| ja-senter-benchmark | - | - | |
| fast-bunkai | | | |
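
Japanese sentences usually end with 。, ！, or ？, but a terminator inside 「」 quotation brackets should not end the surrounding sentence. The rule-based sketch below encodes only those two rules plus keeping runs such as ！？ together; it is a hypothetical illustration, not the algorithm used by bunkai or the other splitters listed, which handle many more cases.

```python
TERMINATORS = "。！？!?"
OPENERS = "「『（("
CLOSERS = "」』）)"


def split_sentences(text):
    """Split after sentence terminators that are not inside brackets."""
    sentences = []
    buf = []
    depth = 0  # bracket nesting level; terminators inside brackets do not split
    pending = False  # a terminator at depth 0 was just seen
    for ch in text:
        # A run of terminators such as ！？ stays together;
        # the next non-terminator character starts a new sentence
        if pending and ch not in TERMINATORS:
            sentences.append("".join(buf).strip())
            buf = []
            pending = False
        buf.append(ch)
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS and depth > 0:
            depth -= 1
        elif ch in TERMINATORS and depth == 0:
            pending = True
    sentences.append("".join(buf).strip())
    return [s for s in sentences if s]


print(split_sentences("今日は晴れ。「本当？」と彼は言った。明日は雨"))
# ['今日は晴れ。', '「本当？」と彼は言った。', '明日は雨']
```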

Libraries that determine the sentiment or evaluation expressed in a sentence
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| oseti | |||
| negapoji | - | - | |
| pymlask | |||
| asari | | | |
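
Dictionary-based sentiment analysis looks up each word in a polarity lexicon and aggregates the scores, flipping a word's polarity when it is negated. The sketch below assumes the text has already been split into base forms by a morphological analyzer; the tiny lexicon, the negation rule, and the averaging formula are hypothetical choices, not those of any library listed above.

```python
# Hypothetical polarity lexicon (base form -> +1 or -1) and negation words
POLARITY = {"良い": 1, "楽しい": 1, "美味しい": 1, "悪い": -1, "退屈": -1, "まずい": -1}
NEGATIONS = {"ない", "ぬ", "ん"}


def sentiment_score(tokens):
    """Average polarity in [-1, 1] over polar words in base-form tokens; 0.0 if none."""
    polarities = []
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        polarity = POLARITY[token]
        # Flip the polarity when a negation follows, e.g. 楽しく+ない -> 楽しい, ない
        if i + 1 < len(tokens) and tokens[i + 1] in NEGATIONS:
            polarity = -polarity
        polarities.append(polarity)
    return sum(polarities) / len(polarities) if polarities else 0.0


# Base forms as a morphological analyzer might return them for 映画は楽しくなかった
print(sentiment_score(["映画", "は", "楽しい", "ない", "た"]))  # -1.0
```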

Libraries for automatically translating text between languages
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| jparacrawl-finetune | - | - | |
| JASS | - | - | |
| PheMT | - | - | |
| VISA | - | - | |

Libraries that extract named entities such as person, place, and organization names from text
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| namaco | - | - | |
| entitypedia | - | - | |
| noyaki | |||
| bert-japanese-ner-finetuning | - | - | |
| joint-information-extraction-hs | - | - | |
| pygeonlp | |||
| bert-ner-japanese | - | - | |
| huggingface-finetune-japanese | - | - | |
| novelanalysisbyner | - | - | |
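
The simplest baseline for named entity recognition is dictionary (gazetteer) matching: scan the text and tag any span that appears in a list of known entities, preferring the longest match. Most tools above use trained models instead; the gazetteer, labels, and `tag_entities` function below are hypothetical and only illustrate the baseline.

```python
# Hypothetical gazetteer mapping known surface strings to entity labels
GAZETTEER = {"夏目漱石": "PERSON", "東京大学": "ORG", "東京": "LOC", "京都": "LOC"}
MAX_LEN = max(len(name) for name in GAZETTEER)


def tag_entities(text):
    """Return (surface, label, start, end) for greedy longest dictionary matches."""
    entities = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so that 東京大学 wins over 東京
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            span = text[i:i + length]
            if span in GAZETTEER:
                entities.append((span, GAZETTEER[span], i, i + length))
                i += length
                break
        else:
            i += 1  # no entity starts here; move one character forward
    return entities


print(tag_entities("夏目漱石は東京大学で教えた"))
# [('夏目漱石', 'PERSON', 0, 4), ('東京大学', 'ORG', 5, 9)]
```

Dictionary matching cannot use context: with this gazetteer, 京都大学 is tagged as the location 京都 because the university is not listed. Learned models decide such cases from the surrounding words.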

Libraries that read characters in images and convert them to text
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| manga-ocr | |||
| mokuro | |||
| handwritten-japanese-ocr | - | - | |
| OCR_Japanease | - | - | |
| ndlocr_cli | - | - | |
| donut | |||
| JMTrans | - | - | |
| Kindai-OCR | - | - | |
| text_recognition | - | - | |
| Poricom | - | - | |
| owocr | - | - | |
| yomitoku | |||
| findtextcenternet | - | - | |
| simple-ocr-for-manga | - | - | |
| jp-ocr-evaluation | - | - | |

Libraries that use pretrained models to improve accuracy
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| JGLUE | - | - | |
| ginza-transformers | |||
| t5_japanese_dialogue_generation | - | - | |
| japanese_text_classification | - | - | |
| Japanese-BERT-Sentiment-Analyzer | - | - | |
| jmlm_scoring | - | - | |
| allennlp-shiba-model | |||
| evaluate_japanese_w2v | - | - | |
| gector-ja | - | - | |
| Japanese-BPEEncoder | - | - | |
| Japanese-BPEEncoder_V2 | - | - | |
| transformer-copy | - | - | |
| japanese-stable-diffusion | - | - | |
| nagisa_bert | |||
| prefix-tuning-gpt | - | - | |
| JGLUE-benchmark | - | - | |
| jptranstokenizer | |||
| jp-stable | - | - | |
| compare-ja-tokenizer | - | - | |
| lm-evaluation-harness-jp-stable | - | - | |
| llm-lora-classification | - | - | |
| jp-stable | - | - | |
| rinna_gpt-neox_ggml-lora | - | - | |
| japanese-llm-roleplay-benchmark | - | - | |
| japanese-llm-ranking | - | - | |
| llm-jp-eval | - | - | |
| llm-jp-sft | - | - | |
| llm-jp-tokenizer | - | - | |
| japanese-lm-fin-harness | - | - | |
| ja-vicuna-qa-benchmark | - | - | |
| swallow-evaluation | - | - | |
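| swallow-evaluation-instruct | - | - | |

Other general-purpose libraries that support Japanese text processing
Tried out semantic structure search that uses Elasticsearch, GiNZA, and a patient expression dictionary to absorb variation in how patient expressions are written.
Libraries for fast Japanese morphological analysis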
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| mecab | - | - | |
| jumanpp | - | - | |
| kytea | - | - | |

Libraries that analyze Japanese grammatical structure and dependencies
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| cabocha | - | - | |
| knp | - | - | |

Other Japanese NLP libraries
trimatch: a string matching library supporting exact, prefix, and approximate matching
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| jsc | - | - | |
| aquaskk | - | - | |
| mozc | - | - | |
| trimatch | - | - | |
| resembla | - | - | |
| corvusskk | - | - | |
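
trimatch, listed above, distinguishes exact, prefix, and approximate string matching. A linear-scan sketch of all three modes follows, using Levenshtein edit distance for the approximate mode. It is a hypothetical illustration of the matching modes only; a production library would use an index structure rather than comparing the query against every entry.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,  # delete ca
                curr[j - 1] + 1,  # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def match(query, words, mode="exact", max_distance=1):
    """Hypothetical helper covering the three matching modes with a linear scan."""
    if mode == "exact":
        return [w for w in words if w == query]
    if mode == "prefix":
        return [w for w in words if w.startswith(query)]
    if mode == "approximate":
        return [w for w in words if levenshtein(query, w) <= max_distance]
    raise ValueError(f"unknown mode: {mode}")


words = ["東京", "東京都", "東京府", "京都", "大阪"]
print(match("東京", words, "prefix"))  # ['東京', '東京都', '東京府']
print(match("東京都", words, "approximate"))  # ['東京', '東京都', '東京府', '京都']
```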

Japanese morphological analysis libraries implemented in Rust
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| lindera | - | ||
| vaporetto | - | ||
| goya | - | ||
| vibrato | - | ||
| yoin | - | ||
| mecab-rs | - | ||
| awabi | - | ||
| kanpyo | - | | |

Libraries that convert Japanese characters and kana
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| wana_kana_rust | - | ||
| unicode-jp-rs | - | ||
| kana | - | - | |
| kanaria | - | - | |
| japanese-address-parser | - | - | |
| yosina | - | - | |

Libraries for Japanese full-text search
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| lindera-tantivy | - | ||
| tantivy-vibrato | - | | |
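
Because Japanese text has no spaces between words, a full-text index needs some way to produce index terms. lindera-tantivy and tantivy-vibrato plug morphological analyzers into the tantivy search engine; a common alternative is to index overlapping character n-grams. The sketch below builds an in-memory character-bigram inverted index; the `BigramIndex` class and its methods are hypothetical.

```python
from collections import defaultdict


def bigrams(text):
    """Overlapping character bigrams: 東京都 -> 東京, 京都."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class BigramIndex:
    """Hypothetical in-memory inverted index keyed by character bigrams."""

    def __init__(self):
        self.postings = defaultdict(set)  # bigram -> ids of documents containing it
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in bigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, query):
        grams = bigrams(query)
        if not grams:
            # Queries shorter than two characters have no bigram; fall back to a scan
            return sorted(d for d, t in self.docs.items() if query and query in t)
        candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Sharing every bigram does not guarantee the query occurs contiguously
        # (e.g. 東京 and 京都 in 東京と京都), so confirm with a substring check
        return sorted(d for d in candidates if query in self.docs[d])


index = BigramIndex()
index.add(1, "東京都庁")
index.add(2, "京都駅")
index.add(3, "東京と京都")
print(index.search("東京都"))  # [1]
```

Document 3 contains both bigrams of 東京都 but not the string itself, which is why the final substring check is needed.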

Auxiliary libraries for Japanese text processing and IMEs
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| daachorse | - | ||
| find-simdoc | - | ||
| crawdad | - | ||
| tokenizer-speed-bench | - | - | |
| stringmatch-bench | - | - | |
| vime | - | - | |
| voicevox_core | - | - | |
| akaza | - | - | |
| Jotoba | - | - | |
| dvorakjp-romantable | - | - | |
| niinii | - | - | |
| cskk | - | - | |
| japanki | - | - | |
| jpreprocess | - | - | |
| listup_precedent | - | - | |
| jisho | - | - | |
| kanalizer | - | - | |
| koharu | - | - | |

Libraries for Japanese morphological analysis in the browser or Node.js
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| kuromoji.js | |||
| rakutenma | |||
| node-mecab-ya | |||
| juman-bin | |||
| node-mecab-async | | | |

Libraries that convert Japanese writing and pronunciation
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| kuroshiro | |||
| kuroshiro-analyzer-kuromoji | |||
| hepburn | |||
| japanese-numerals-to-number | |||
| jslingua | |||
| WanaKana | |||
| node-romaji-name | |||
| kyujitai.js | |||
| normalize-japanese-addresses | - | - | |
| jaconv | - | - | |
| romaji-conv | - | - | |
| japanese-addresses-v2 | - | - | |
| jptext-to-emoji | - | - | |
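
japanese-numerals-to-number, listed above, converts kanji numerals to numbers. The Python sketch below shows one way to do this and is not that library's implementation: digits multiply the small units 十, 百, and 千 within a group, and each large unit 万, 億, or 兆 closes a group. As assumptions of this sketch, a bare 十, 百, 千, or 万 counts as one of that unit, and runs of digits such as 二〇二五 are read positionally.

```python
DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}


def kanji_to_int(text):
    """Convert a kanji numeral such as 一億二千万 or 二〇二五 to an int."""
    total = 0  # value of completed 万/億/兆 groups
    group = 0  # value accumulated below the next large unit
    digits = None  # digits read since the last unit, e.g. 2025 for 二〇二五
    for ch in text:
        if ch in DIGITS:
            digits = DIGITS[ch] if digits is None else digits * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十, 百, or 千 means one of that unit
            group += (1 if digits is None else digits) * SMALL_UNITS[ch]
            digits = None
        elif ch in LARGE_UNITS:
            if digits is not None:
                group += digits
            elif group == 0:
                group = 1  # a bare 万 means 一万
            total += group * LARGE_UNITS[ch]
            group, digits = 0, None
        else:
            raise ValueError(f"not a kanji numeral character: {ch!r}")
    return total + group + (0 if digits is None else digits)


print(kanji_to_int("一億二千万"))  # 120000000
```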

Other JavaScript libraries for Japanese NLP
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| bangumi-data | |||
| yomichan | - | - | |
| proofreading-tool | - | - | |
| kanjigrid | - | - | |
| japanese-toolkit | - | - | |
| analyze-desumasu-dearu | |||
| hatsuon | |||
| sentiment_ja_js | - | - | |
| mecab-ipadic-seed | |||
| Japanese-Word-Of-The-Day | |||
| oskim | - | - | |
| tweetMapping | - | - | |
| pitch-accent | |||
| kana2ipa | - | - | |
| voicevox | - | - | |
| kamiya-codec | - | - | |
| closewords | - | - | |
| japanese-analyzer | - | - | |

Lightweight libraries for Japanese morphological analysis in Go
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| kagome | - | - | |

Additional libraries that support Japanese text processing
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| ojosama | - | - | |
| nihongo | - | - | |
| yomichan-import | - | - | |
| imas-ime-dic | - | - | |
| go-kakasi | - | - | |
| go-moji | - | - | |
| ojichat | - | - | |
| name | - | - | |

Libraries for Japanese morphological analysis and dictionary management
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| kuromoji | - | - | |
| Sudachi | - | - | |
| SudachiDict | - | - | |
| meval | - | - | |

Java libraries that support natural language processing and OCR
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| kanjitomo-ocr | - | - | |
| jakaroma | - | - | |
| kakasi-java | - | - | |
| Kamite | - | - | |
| react-native-japanese-tokenizer | - | - | |
| elasticsearch-analysis-japanese | - | - | |
| moji4j | - | - | |
| neologdn-java | - | - | |
| elasticsearch-sudachi | - | - | |

Models that convert words into numeric vectors to learn semantic relationships
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| japanese-words-to-vectors | - | - | |
| chiVe | - | - | |
| elmo-japanese | - | - | |
| embedrank | - | - | |
| aovec | |||
| dependency-based-japanese-word-embeddings | - | - | |
| jawikivec | - | - | |
| jawiki_word_vector_updater | - | - | |
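
Word embedding models map each word to a vector so that semantically related words end up close together, usually measured by cosine similarity. The sketch below uses hypothetical three-dimensional vectors in place of a trained model from the table above; `most_similar` is an illustrative helper, not the API of any library listed here.

```python
import math

# Hypothetical 3-dimensional vectors; trained embeddings typically have hundreds of dimensions
VECTORS = {
    "犬": [0.9, 0.1, 0.0],
    "猫": [0.8, 0.2, 0.1],
    "車": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine of the angle between u and v: 1 for the same direction, 0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors):
    """Return (neighbor, similarity) for the closest other word."""
    target = vectors[word]
    return max(
        ((w, cosine(target, v)) for w, v in vectors.items() if w != word),
        key=lambda pair: pair[1],
    )


print(most_similar("犬", VECTORS)[0])  # 猫
```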

Models that use self-attention to understand context for advanced language processing
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| bert-japanese | - | - | |
| japanese-pretrained-models | - | - | |
| bert-japanese | - | - | |
| SudachiTra | |||
| japanese-dialog-transformers | - | - | |
| shiba | |||
| Dialog | - | - | |
| language-pretraining | - | - | |
| medbertjp | - | - | |
| ILYS-aoba-chatbot | - | - | |
| t5-japanese | - | - | |
| pytorch_bert_japanese | - | - | |
| Laboro-BERT-Japanese | - | - | |
| RoBERTa-japanese | - | - | |
| aMLP-japanese | - | - | |
| bert-japanese-aozora | - | - | |
| sbert-ja | - | - | |
| BERT-Japan-vaccination | - | - | |
| gpt2-japanese | - | - | |
| text2text-japanese | - | - | |
| gpt-ja | - | - | |
| friendly_JA-Model | - | - | |
| albert-japanese | - | - | |
| ja_text_bert | - | - | |
| DistilBERT-base-jp | - | - | |
| bert | - | - | |
| Laboro-DistilBERT-Japanese | - | - | |
| luke | - | - | |
| GPTSAN | - | - | |
| japanese-clip | - | - | |
| AcademicBART | - | - | |
| AcademicRoBERTa | - | - | |
| LINE-DistilBERT-Japanese | - | - | |
| Japanese-Alpaca-LoRA | - | - | |
| albert-japanese-tinysegmenter | - | - | |
| japanese-llama-experiment | - | - | |
| easylightchatassistant | - | - | |
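
Most models in this table are Transformers built on scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`: each position compares its query vector with every key vector and takes a softmax-weighted average of the value vectors. The pure-Python sketch below shows a single attention head only; real models add learned query/key/value projection matrices, multiple heads, and masking, none of which appear here.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention softmax(Q K^T / sqrt(d_k)) V on lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Self-attention without learned projections: Q = K = V = the token vectors
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(x, x, x))
```

When all keys are identical the weights are uniform and the output is the mean of the values; a query that matches one key much more strongly than the others copies that key's value almost exactly.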

Resources for Japanese dialogue and text generation using ChatGPT and other APIs
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| VRChatGPT | - | - | |
| AITuberDegikkoMirii | - | - | |
| wanna | |||
| ChatdollKit | - | - | |
| ChuanhuChatGPTJapanese | - | - | |
| AISisterAIChan | - | - | |
| vrchatbot | - | - | |
| gptuber-by-langchain | - | - | |
| openai-chatfriend | - | - | |
| chrome-ext-translate-to-hiragana-with-chatgpt | - | - | |
| azure-search-openai-demo | - | - | |
| chatvrm | - | - | |
| sftly-replace | - | - | |
| summarize_arxv | - | - | |
| aiavatarkit | - | - | |
| pva-aoai-integration-solution | - | - | |
| jp-azureopenai-samples | - | - | |
| character_chat | - | - | |
| chatgpt-slackbot | - | - | |
| chatgpt-prompt-sample-japanese | - | - | |
| kanji-flashcard-app-gpt4 | - | - | |
| IgakuQA | - | - | |
| japagen | - | - | |
| generativeai-prompt-sample-japanese | - | - | |

Resources on Japanese dictionaries and input method editors (IMEs)
Japanese corpora annotated with part-of-speech and named-entity labels
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| ner-wikipedia-dataset | - | - | |
| IOB2Corpus | - | - | |
| TwitterCorpus | - | - | |
| UD_Japanese-PUD | - | - | |
| UD_Japanese-GSD | - | - | |
| KWDLC | - | - | |
| AnnotatedFKCCorpus | - | - | |
| anthy | - | - | |
| UD_Japanese-GSDLUW | - | - | |

Translation datasets of aligned sentences across multiple languages
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| small_parallel_enja | - | - | |
| Web-Crawled-Corpus-for-Japanese-Chinese-NMT | - | - | |
| CourseraParallelCorpusMining | - | - | |
| JESC | - | - | |
| AMI-Meeting-Parallel-Corpus | - | - | |
| giant_ja-en_parallel_corpus | - | - | |
| jesc_small | - | - | |
| graded-enja-corpus | - | - | |
| cjk-compsci-terms | - | - | |
| Laboro-ParaCorpus | - | - | |
| google-vs-deepl-je | - | - | |
| matcha | - | - | |
| en-ja-el | - | - | |

Corpora of collected conversations for training dialogue models
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| JMRD | - | - | |
| open2ch-dialogue-corpus | - | - | |
| BSD | - | - | |
| asdc | - | - | |
| japanese-corpus | - | - | |
| BPersona-chat | - | - | |
| japanese-daily-dialogue | - | - | |
| llm-japanese-dataset | - | - | |
| kokorochat | - | - | |

Japanese datasets for specific tasks such as question answering and textual entailment
Tutorials for learning Japanese NLP tools and techniques
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| spacy_tutorial | - | - | |
| fastTextJapaneseTutorial | - | - | |
| allennlp-NER-ja | - | - | |
| chariot-PyTorch-Japanese-text-classification | - | - | |
| ginza-examples | - | - | |
| DocumentClassificationUsingBERT-Japanese | - | - | |
| BERT_Japanese_Google_Colaboratory | - | - | |
| bert-book | - | - | |
| janome-tutorial | - | - | |
| handson-language-models | - | - | |
| JapaneseNLI | - | - | |
| deep-learning-with-pytorch-ja | - | - | |
| bert-classification-tutorial | - | - | |
| python-nlp-book | - | - | |
| llm-book | - | - | |
| nlp2024-tutorial-3 | - | - | |
| japanese-ir-tutorial | - | - | |
| nlpbook | - | - | |
| kantan-regex-book | - | - | |
| bert-classification-tutorial-2024 | - | - | |
| Gemma2_2b_Japanese_finetuning_colab.ipynb | - | - | |
| nlp100v2020 | - | - | |
| textmining-ja | - | - | |
| nlp2025-tutorial-2 | - | - | |
| nlp100v2025 | - | - | |
| public-annotations | - | - | |
| topic-models-ao | - | - | |
| slp2025 | - | - | |
| book_impress_it-basic-education-ai | - | - | |
| genai-agent-advanced-book | - | - | |
| course2024-nlp | - | - | |
| support-genai-book | - | - | |

Materials summarizing research and papers on Japanese NLP
| Name | downloads/week | total downloads | stars | 
|---|---|---|---|
| awesome-bert-japanese | - | - | |
| GEC-Info-ja | - | - | |
| dataset-list | - | - | |
| tuning_playbook_ja | - | - | |
| japanese-pitch-accent-resources | - | - | |
| awesome-japanese-llm | - | - | |