LanguageTool
LanguageTool is an open-source grammar tool, also known as the spell checker for OpenOffice. The library lets you detect grammar and spelling mistakes from a Python script or from a command-line interface. We will work with the language_tool_python Python package, which can be installed with the pip install language-tool-python command. By default, language_tool_python downloads a LanguageTool server .jar and runs it in the background to detect grammar errors locally. However, LanguageTool also offers a Public HTTP Proofreading API, which is supported as well, but there is a limit on the number of calls.
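The choice between the local server and the public API could be wrapped in a small helper. This is only a sketch: make_tool is a hypothetical name, not part of the package, and it assumes the LanguageTool and LanguageToolPublicAPI classes exposed by language_tool_python. The import sits inside the function so that nothing is downloaded or started until the helper is actually called.

```python
def make_tool(language="en-US", use_public_api=False):
    """Return a LanguageTool checker (hypothetical helper, not part of the package).

    use_public_api=False: local LanguageTool server run in the background.
    use_public_api=True:  LanguageTool's public HTTP API (limited number of calls).
    """
    # Imported lazily so that defining this helper has no side effects.
    import language_tool_python

    if use_public_api:
        return language_tool_python.LanguageToolPublicAPI(language)
    return language_tool_python.LanguageTool(language)
```

Calling make_tool() would give the default local setup, while make_tool(use_public_api=True) would send the text to the public service instead.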
LanguageTool in Python
We will work through a practical example of how to detect and correct grammar mistakes. We will use the following text:
“LanguageTool offers spell and grammar checking. Just paste your text here and click the "Check Text" button. Click the colored phrases for details on potential errors. or use this text too see an few of of the problems that LanguageTool can detecd. What do you thinks of grammar checkers? Please not that they are not perfect. Style issues get a blue marker: It's 5 P.M. in the afternoon. The weather was nice on Thursday, 27 June 2017”
The text contains several deliberate grammar and spelling problems. Let's see how we can detect them using Python:
import language_tool_python

tool = language_tool_python.LanguageTool('en-US')

text = """LanguageTool offers spell and grammar checking. Just paste your text here and click the "Check Text" button. Click the colored phrases for details on potential errors. or use this text too see an few of of the problems that LanguageTool can detecd. What do you thinks of grammar checkers? Please not that they are not perfect. Style issues get a blue marker: It's 5 P.M. in the afternoon. The weather was nice on Thursday, 27 June 2017"""

# get the matches
matches = tool.check(text)
matches
[Match({'ruleId': 'UPPERCASE_SENTENCE_START', 'message': 'This sentence does not start with an uppercase letter', 'replacements': ['Or'], 'context': '...phrases for details on potential errors. or use this text too see an few of of the...', 'offset': 168, 'errorLength': 2, 'category': 'CASING', 'ruleIssueType': 'typographical'}),
 Match({'ruleId': 'TOO_TO', 'message': 'Did you mean "to see"?', 'replacements': ['to see'], 'context': '...s on potential errors. or use this text too see an few of of the problems that Lan...', 'offset': 185, 'errorLength': 7, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling'}),
 Match({'ruleId': 'EN_A_VS_AN', 'message': 'Use "a" instead of \'an\' if the following word doesn\'t start with a vowel sound, e.g. \'a sentence\', \'a university\'', 'replacements': ['a'], 'context': '...ential errors. or use this text too see an few of of the problems that LanguageToo...', 'offset': 193, 'errorLength': 2, 'category': 'MISC', 'ruleIssueType': 'misspelling'}),
 Match({'ruleId': 'ENGLISH_WORD_REPEAT_RULE', 'message': 'Possible typo: you repeated a word', 'replacements': ['of'], 'context': '...errors. or use this text too see an few of of the problems that LanguageTool can detecd...', 'offset': 200, 'errorLength': 5, 'category': 'MISC', 'ruleIssueType': 'duplication'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['detect'], 'context': '...few of of the problems that LanguageTool can detecd. What do you thinks of grammar checker...', 'offset': 241, 'errorLength': 6, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'}),
 Match({'ruleId': 'DO_VBZ', 'message': 'After the auxiliary verb \'do\', use the base form of a verb. Did you mean "think"?', 'replacements': ['think'], 'context': '...that LanguageTool can detecd. What do you thinks of grammar checkers? Please not...', 'offset': 261, 'errorLength': 6, 'category': 'GRAMMAR', 'ruleIssueType': 'grammar'}),
 Match({'ruleId': 'PLEASE_NOT_THAT', 'message': 'Did you mean "note"?', 'replacements': ['note'], 'context': '...you thinks of grammar checkers? Please not that they are not perfect. Style issues...', 'offset': 296, 'errorLength': 3, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'}),
 Match({'ruleId': 'PM_IN_THE_EVENING', 'message': 'This is redundant. Consider using "P.M."', 'replacements': ['P.M.'], 'context': "...issues get a blue marker: It's 5 P.M. in the afternoon. The weather was nice on Thursday, 27 J...", 'offset': 366, 'errorLength': 22, 'category': 'REDUNDANCY', 'ruleIssueType': 'style'}),
 Match({'ruleId': 'DATE_WEEKDAY', 'message': 'The date 27 June 2017 is not a Thursday, but a Tuesday.', 'replacements': [], 'context': '...afternoon. The weather was nice on Thursday, 27 June 2017', 'offset': 413, 'errorLength': 22, 'category': 'SEMANTICS', 'ruleIssueType': 'inconsistency'})]
As we can see, we get a detailed dictionary for each match that contains the ruleId, the message, and so on. A detailed explanation of each rule ID can be found at the LanguageTool Community. The error about the date is worth noting; it returns the message: The date 27 June 2017 is not a Thursday, but a Tuesday.
In this case, however, there is no suggested replacement, because the tool cannot guess which date the author meant 🙂
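The weekday in that message is easy to double-check with Python's standard datetime module. A minimal sketch, independent of LanguageTool (the WEEKDAYS list is just a local helper for this example):

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# The date mentioned in the sample text
d = date(2017, 6, 27)
weekday = WEEKDAYS[d.weekday()]
print(weekday)  # Tuesday
```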
Now that we've spotted the errors, we can correct them.
my_mistakes = []
my_corrections = []
start_positions = []
end_positions = []

# Keep only the matches that come with at least one suggested replacement
for match in matches:
    if len(match.replacements) > 0:
        start_positions.append(match.offset)
        end_positions.append(match.offset + match.errorLength)
        my_mistakes.append(text[match.offset:match.offset + match.errorLength])
        my_corrections.append(match.replacements[0])

# Work on a list of characters so that the original offsets stay valid
my_new_text = list(text)

for m in range(len(start_positions)):
    # Put the whole correction in the first slot of the span and blank out the rest
    my_new_text[start_positions[m]] = my_corrections[m]
    for i in range(start_positions[m] + 1, end_positions[m]):
        my_new_text[i] = ""

my_new_text = "".join(my_new_text)
my_new_text
And we get:
“LanguageTool offers spell and grammar checking. Just paste your text here and click the "Check Text" button. Click the colored phrases for details on potential errors. Or use this text to see a few of the problems that LanguageTool can detect. What do you think of grammar checkers? Please note that they are not perfect. Style issues get a blue marker: It's 5 P.M. The weather was nice on Thursday, 27 June 2017”
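An alternative to blanking out characters is to apply the replacements from right to left, so that an edit never shifts the offsets of the fixes that are still pending. The sketch below is not the approach used above: apply_fixes is a hypothetical helper, and the fixes are written as plain (offset, error_length, replacement) tuples, taken from the offset and errorLength values of the matches and paired with the first suggested replacement, so that the example runs without a LanguageTool server.

```python
def apply_fixes(text, fixes):
    """Apply (offset, error_length, replacement) fixes to text."""
    # Go from the last fix to the first so earlier offsets stay valid.
    for offset, error_length, replacement in sorted(fixes, reverse=True):
        text = text[:offset] + replacement + text[offset + error_length:]
    return text


text = (
    "LanguageTool offers spell and grammar checking. "
    "Just paste your text here and click the \"Check Text\" button. "
    "Click the colored phrases for details on potential errors. "
    "or use this text too see an few of of the problems that LanguageTool can detecd. "
    "What do you thinks of grammar checkers? Please not that they are not perfect. "
    "Style issues get a blue marker: It's 5 P.M. in the afternoon. "
    "The weather was nice on Thursday, 27 June 2017"
)

# Stand-ins for the matches that have a suggested replacement
fixes = [
    (168, 2, "Or"),
    (185, 7, "to see"),
    (193, 2, "a"),
    (200, 5, "of"),
    (241, 6, "detect"),
    (261, 6, "think"),
    (296, 3, "note"),
    (366, 22, "P.M."),
]

fixed = apply_fixes(text, fixes)
print(fixed)
```

This version assumes that the flagged spans do not overlap, which holds for the matches in this example.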
Spelling and grammatical errors
Let's take a look at the mistakes we detected and the corresponding corrections.
list(zip(my_mistakes, my_corrections))
[('or', 'Or'), ('too see', 'to see'), ('an', 'a'), ('of of', 'of'), ('detecd', 'detect'), ('thinks', 'think'), ('not', 'note'), ('P.M. in the afternoon.', 'P.M.')]
Detailed example
We will now go through a detailed example with a simple one-sentence text and look at the output we get from LanguageTool. Our sentence:
“Your the best but their are allso  good !”
text = "Your the best but their are allso  good !"
matches = tool.check(text)
len(matches)
# 4
LanguageTool found 4 issues. We can look at each of them in turn. Let's start with the first one.
matches[0]
And we get:
Match({'ruleId': 'YOUR_YOU_RE', 'message': 'Did you mean "You\'re"?', 'replacements': ["You're"], 'context': 'Your the best but their are allso  good !', 'offset': 0, 'errorLength': 4, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'})
As we can see, it mentions the ruleId; the message for the end user, which is Did you mean "You're"?; the suggested replacements; the context, which is the input; the offset, which is the position where the issue starts; the errorLength, which is the number of characters in the issue (4 in our case); the category of the error, which is "TYPOS" in our case; and the ruleIssueType, which is "misspelling".
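Together, offset and errorLength identify the flagged span: slicing the input from offset to offset + errorLength gives back the text that triggered the rule. A small sketch, using plain dictionaries as stand-ins for the four Match objects of this example (only the fields needed here are copied from the outputs in this section), so that it runs without a LanguageTool server:

```python
text = "Your the best but their are allso  good !"

# Stand-ins for the Match objects: ruleId, offset and errorLength only
found = [
    {"ruleId": "YOUR_YOU_RE", "offset": 0, "errorLength": 4},
    {"ruleId": "THEIR_IS", "offset": 18, "errorLength": 5},
    {"ruleId": "MORFOLOGIK_RULE_EN_US", "offset": 28, "errorLength": 5},
    {"ruleId": "WHITESPACE_RULE", "offset": 33, "errorLength": 2},
]

# The flagged text is the slice [offset, offset + errorLength)
flagged = [text[m["offset"]:m["offset"] + m["errorLength"]] for m in found]
print(flagged)  # ['Your', 'their', 'allso', '  ']
```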
Each element of the language_tool_python.match.Match object can be accessed by its name, written after a dot. Let's say we want to get the replacements.
matches[0].replacements
# ["You're"]
Let's take a look at the other issues that LanguageTool detected. The second issue was "their", which is corrected to "there".
matches[1]
And we get:
Match({'ruleId': 'THEIR_IS', 'message': 'Did you mean "there"?', 'replacements': ['there'], 'context': 'Your the best but their are allso  good !', 'offset': 18, 'errorLength': 5, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling'})
The third issue was "allso", which is corrected to "also".
matches[2]
And we get:
Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['also', 'all so'], 'context': 'Your the best but their are allso  good !', 'offset': 28, 'errorLength': 5, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'})
Finally, the last issue detected was the double space, which is corrected to a single space.
matches[3]
And we get:
Match({'ruleId': 'WHITESPACE_RULE', 'message': 'Possible typo: you repeated a whitespace', 'replacements': [' '], 'context': 'Your the best but their are allso  good !', 'offset': 33, 'errorLength': 2, 'category': 'TYPOGRAPHY', 'ruleIssueType': 'whitespace'})
Automatically apply suggestions to text
We can automatically apply suggestions to text like this:
import language_tool_python

tool = language_tool_python.LanguageTool('en-US')
text = 'A sentence with a error in the Hitchhiker’s Guide tot he Galaxy'
tool.correct(text)
'A sentence with an error in the Hitchhiker’s Guide to the Galaxy'
Discussion
If we want a free Python tool for grammar checking that supports more than 20 languages, then LanguageTool is a good choice. Of course, no tool is perfect, and we cannot rely on grammar and spell checkers alone, but it is certainly something we can use in NLP projects and tasks.