Introduction to Japanese Natural Language Processing
Masato Hagiwara and Paul O'Leary McCann
Completion: Winter 2021 (expected).
Available in both English and Japanese
About This Book
A thorough guide for programmers working with Japanese text, covering fundamental issues like tokenization and recent research topics like generating natural language texts.
Working examples are accompanied by extensive reference material, so you can solve problems even without a background in Japanese or machine learning.
Basics of Japanese Linguistics
All the background knowledge required for processing Japanese language texts on computers — characters, words, grammar, as well as encodings and emoji.
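As a small taste of what "characters and encodings" means in practice, here is a minimal sketch that uses only the Python standard library. It is an illustration written for this page rather than an excerpt from the book; the helper name `script_of` and the sample sentence are assumptions made up for the example. It labels each character by script and compares the byte length of the same text in two encodings.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Roughly classify one character by the prefix of its Unicode name.

    Hypothetical helper for illustration only. It ignores edge cases such as
    half-width katakana, which this simple prefix check reports as "other".
    """
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    return "other"


# Sample sentence (an assumption for this sketch): "I bought a camera."
text = "私はカメラを買った"

# Pair each character with its script label.
scripts = [(ch, script_of(ch)) for ch in text]

# The same nine characters occupy a different number of bytes per encoding.
utf8_len = len(text.encode("utf-8"))
sjis_len = len(text.encode("shift_jis"))

for ch, script in scripts:
    print(ch, script)
print("UTF-8 bytes:", utf8_len)
print("Shift_JIS bytes:", sjis_len)
```

A single short sentence mixes kanji, hiragana, and katakana, and its size on disk depends on the encoding, which is why these topics come before tokenization.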
Open-source Tools
Use open-source tools to analyze Japanese texts, including word tokenization with MeCab and PoS tagging and parsing with spaCy.
Dictionaries & Datasets
A thorough overview of dictionaries, corpora, and other datasets commonly used for Japanese language processing.
Word Embeddings
Use word and sentence embeddings to represent, visualize, and retrieve Japanese texts.
Language Generation and Conversion
Use neural networks to generate Japanese texts and convert between Kana and Kanji.
Natural Language Understanding
Use transfer learning to understand Japanese texts through sentiment analysis and named entity recognition.
Who This Book Is For
This book is written for anyone who's interested in dealing with Japanese texts, including software developers, AI researchers and engineers, and language experts.
No Math Required
You don't need to know math to understand the book. We focus on how to use tools to get things done, rather than explaining the theory behind their implementation.
No Japanese Required
Knowledge of Japanese is helpful, but you don't need it to read the book: example texts are thoroughly annotated.
Basic Python
The only prerequisite for this book is basic Python skills. Extensive code examples are used to show how to approach and solve problems.
Table of Contents
- Chapter 1: Basics of Japanese linguistics
- 1.1 Japanese language overview
- 1.2 Orthography: What kinds of letters are there?
- 1.3 Morphology: What kinds of words are there?
- 1.4 Syntax: How are sentences structured?
- 1.5 Technical Notes: How are texts represented?
- Chapter 2: Morphological analysis and open-source tools
- 2.1 Tokenizers and morphological analyzers: overview and basic use
- 2.2 Advanced tokenization
- 2.3 Dependency parsers
- Chapter 3: Datasets
- 3.1 Overview
- 3.2 Dictionaries
- 3.3 General Corpora
- 3.4 Specialized Corpora
- Chapter 4: Word and sentence embeddings
- 4.1 Word embeddings
- 4.2 Sentence embeddings
- 4.3 Multilingual embeddings
- Chapter 5: Natural language generation and conversion with Transformer
- 5.1 Introduction to Transformer
- 5.2 Text generation
- 5.3 Kana-Kanji conversion / transliteration
- Chapter 6: Natural language understanding via transfer learning
- 6.1 Introduction to transfer learning
- 6.2 Sentiment / document classification
- 6.3 Named entity recognition