Welcome to Jihanzi - a Chinese vocabulary extraction tool

Extract vocabulary from epub (must be DRM-free), pdf and plaintext (.txt or .csv) files and download it as a CSV file. When supplied with a list of known vocabulary (one word per line), only unknown words are extracted. In addition, a minimum number of occurrences can be specified to filter out words that appear only rarely in the source.
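
The filtering described above can be sketched in a few lines of Python. This is only an illustration, not Jihanzi's actual code: the function names `load_known` and `extract_unknown` are hypothetical, and the sketch assumes the source text has already been segmented into a list of words (word segmentation itself is not shown).

```python
from collections import Counter


def load_known(text):
    """Parse a known-vocabulary list: one word per line, blank lines ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def extract_unknown(words, known, min_count=1):
    """Count each word and keep those that are unknown and frequent enough.

    `words` is assumed to be an already segmented list of words.
    """
    counts = Counter(words)
    return {w: n for w, n in counts.items() if w not in known and n >= min_count}


known = load_known("我\n喜欢\n")
words = ["我", "喜欢", "学习", "中文", "学习", "书"]
print(extract_unknown(words, known, min_count=2))
```

With a minimum of 2 occurrences, only words that are both missing from the known list and seen at least twice remain in the result.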

Extra information, such as the frequency and the location of the first occurrence within the source document, is extracted for each word. For an epub the location is the chapter, for a pdf the page, and for a plaintext document the relative position.
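
As a rough sketch of how frequency and first location could be collected in one pass, assume the document is available as a sequence of (location, word) pairs in reading order, where the location is a chapter, a page number or a relative position. The function name `word_stats` and the shape of its result are assumptions made for this example, not the tool's real output format.

```python
def word_stats(located_words):
    """Return {word: {"frequency": n, "first_location": loc}}.

    `located_words` is an iterable of (location, word) pairs in reading
    order, so the first time a word is seen gives its first location.
    """
    stats = {}
    for location, word in located_words:
        if word in stats:
            stats[word]["frequency"] += 1
        else:
            stats[word] = {"frequency": 1, "first_location": location}
    return stats


# Hypothetical pdf input: the location is the page number.
pairs = [(1, "你好"), (1, "世界"), (2, "你好"), (3, "学习")]
for word, info in word_stats(pairs).items():
    print(word, info["frequency"], info["first_location"])
```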

Get book recommendations based on your known vocabulary. You can match your vocabulary against a list of currently 138 books and download a CSV file with, for each book, the number of unknown (unique) words and the total number of (unique) words. The books are sorted by number of unknown words, from fewest to most.
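
The ranking can be pictured with the following minimal sketch. It assumes each book is already available as a list of segmented words and that both counts are taken over unique words; the function name `rank_books` and the sample titles are made up for the example.

```python
def rank_books(books, known):
    """Return (title, unknown_unique, total_unique) rows, fewest unknown first.

    `books` maps a title to its list of words; `known` is a set of words.
    """
    rows = []
    for title, words in books.items():
        unique = set(words)
        unknown = unique - known
        rows.append((title, len(unknown), len(unique)))
    # Sort by the number of unknown unique words, ascending.
    rows.sort(key=lambda row: row[1])
    return rows


# Hypothetical sample data.
books = {
    "Book A": ["我", "你", "他", "我"],
    "Book B": ["我", "你"],
}
for row in rank_books(books, {"我"}):
    print(row)
```

Books near the top of the list contain the fewest words missing from the known vocabulary, which makes them the most approachable candidates.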