How to find alternate naming in text (Python tutorial)

Find Misspellings and Alternate Naming in Large Text Datasets (Tutorial)

Below is the Python fastText script I use in the tutorial, which shows how to find alternate names and misspellings in a large text corpus.

#install pandas and fasttext if you haven't already
#pip install pandas
#pip install fasttext

#view info about our data
import pandas as pd
df = pd.read_csv(r"C:\folder\file.csv")
df.info()
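fastText's unsupervised training reads a plain text file, so when the CSV is passed in as-is, every column, comma and quote ends up in the training text. If that adds noise, one option is to write a single text column to its own file first and train on that instead. The snippet below is a minimal sketch of that idea using only the standard library; the helper name `csv_column_to_text`, the column name `text` and the output path are assumptions for illustration, not part of the original script.

```python
import csv
import re

def csv_column_to_text(csv_path, txt_path, column="text"):
    """Write one cleaned line per CSV row and return how many lines were written.

    `column` is a hypothetical column name; change it to match your own data.
    """
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(txt_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            value = row.get(column) or ""
            # Lowercase, then replace anything that is not a word character
            # or whitespace (i.e. punctuation) with a space
            cleaned = re.sub(r"[^\w\s]", " ", value.lower())
            # Collapse runs of whitespace into single spaces
            cleaned = " ".join(cleaned.split())
            if cleaned:  # skip rows where the column is empty
                dst.write(cleaned + "\n")
                written += 1
    return written

# Example call (paths are placeholders):
# csv_column_to_text(r"C:\folder\file.csv", r"C:\folder\file.txt", column="text")
```

The resulting .txt file can then be passed to fasttext.train_unsupervised in place of the CSV. Lowercasing is a design choice: it merges 'Paracetamol' and 'paracetamol' into one token, which is usually what you want when hunting for spelling variants.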

#start fasttext magic ...
import fasttext

#train and save model - view parameters/options at https://fasttext.cc/
model = fasttext.train_unsupervised(r"C:\folder\file.csv", model='skipgram', epoch=2)

model.save_model(r"C:\folder\file.bin")

#load model and see an overview of words
model = fasttext.load_model(r"C:\folder\file.bin")

#model.words is the list of vocabulary words; print the first 20
print(model.words[:20])

#view words related to 'paracetamol' - returns a list of (similarity, word) pairs
print(model.get_nearest_neighbors('paracetamol'))
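get_nearest_neighbors returns a list of (similarity, word) pairs, and that list mixes misspellings with genuinely different names for the same thing. One rough way to separate them is to check how similar each neighbour's spelling is to the query word. The sketch below does that with difflib from the standard library; the function name `split_neighbors` and both thresholds are assumptions chosen for illustration, so tune them on your own data. The neighbour list in the example is made up to show the input format and is not real model output.

```python
from difflib import SequenceMatcher

def split_neighbors(query, neighbors, min_score=0.6, spelling_ratio=0.8):
    """Split (similarity, word) pairs into likely misspellings and alternate names.

    min_score and spelling_ratio are illustrative thresholds, not fastText defaults.
    """
    misspellings, alternates = [], []
    for score, word in neighbors:
        # Ignore weakly related words and the query word itself
        if score < min_score or word == query:
            continue
        # Character-level similarity between the two spellings (0.0 to 1.0)
        ratio = SequenceMatcher(None, query, word).ratio()
        if ratio >= spelling_ratio:
            misspellings.append(word)   # spelled almost the same: likely a typo
        else:
            alternates.append(word)     # related but spelled differently
    return misspellings, alternates

# Made-up neighbours in the same shape that get_nearest_neighbors returns
example = [(0.93, "paracetemol"), (0.88, "acetaminophen"), (0.41, "tablet")]
misspellings, alternates = split_neighbors("paracetamol", example)
print("Likely misspellings:", misspellings)
print("Likely alternate names:", alternates)
```

With a trained model, pass model.get_nearest_neighbors('paracetamol') in place of the example list. Treat the result as a shortlist to review by hand rather than a final answer, since short words and abbreviations can land on the wrong side of the spelling threshold.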

Tags:

Data Science, Python
