
Using re.finditer won't return all matches


I've been making a simple colorizer in Python. It uses re.finditer to find the index of every run of text between quotation marks and colors that text in a tkinter Text widget. For some reason, when the window opens, not all of the quoted text has been colored. Here is my code:

import tkinter as tk
import re

def htmlbasiccolorer(self):
        def find2(self, color, warning):
                    string = (str(self.get("1.0",tk.END)))
                    lines=string.split("\n")
                    for i,line in enumerate(lines):
                                y=(i+1)
                                for e in re.finditer(r'"(.*?)"', line):
                                    startindex= e.start()
                                    endindex= e.end()
                                    startindex=(str(y)+'.'+(str(startindex)))
                                    endindex=(str(y)+'.'+(str(endindex)))
                                    startindex=float(startindex)
                                    endindex=float(endindex)
                                    startindex=(round(float(startindex), 2))
                                    endindex=(round(float(endindex), 2))
                                    self.tag_configure(warning, background="white", foreground=color)
                                    self.tag_add(warning, startindex, endindex)
        find2(self, "purple", "id-6")
s=tk.Tk()
s.geometry('1000x600')
t=tk.Text(s)
t.insert(tk.END, """
<!DOCTYPE html>
<html
  xmlns="http://www.w3.org/1999/xhtml"
  xml:lang="en-US"
  lang="en-US"
  dir="ltr"
  xmlns:fb="http://ogp.me/ns/fb#" xmlns:og="http://ogp.me/ns#"
  class=" user-logged-out">

<head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# game: http://ogp.me/ns/game#">
        <meta charset="utf-8" />
    <meta name="ROBOTS" content="NOODP" />
    <meta name="ROBOTS" content="NOYDIR" />
        <meta name="verify-v1" content="TgxixMKtxcQ+9NUdD7grKbzw3tAl3iJWlTPSPKt9t0I=" />
    <meta name="p:domain_verify" content="314c7ba9469cc171a12a46b43e0e2aed" />
    <meta name="google-site-verification" content="n7BdKb0xn1E9tRJXvmMxE3Ynr-QajBOi1yA1srT4Nrc" />
    <meta name="apple-itunes-app" content="app-id=329218549">

              <meta name="description" content="Play chess on Chess.com - the #1 chess community with +20 million members around the world. Play online with friends, challenge the computer, join a club, solve puzzles, analyze your games, and learn from hundreds of video lessons. You can also watch top players and compete for prizes." />

        <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1">

    <link rel="preconnect" href="//betacssjs.chesscomfiles.com">
  <link rel="preconnect" href="//images.chesscomfiles.com">

    <link rel="dns-prefetch" href="//betacssjs.chesscomfiles.com">
  <link rel="dns-prefetch" href="//images.chesscomfiles.com">

    <link
    as="font"
    crossorigin="crossorigin"
    href="/bundles/web/fonts/chessglyph-regular.d4a95b80.woff2"
    rel="preload"
    type="font/woff2">

    <link rel="publisher" href="https://plus.google.com/+chess"/>

    <link rel="apple-touch-icon" href="https://betacssjs.chesscomfiles.com
    <div id="challenge-popover"></div>
    <div id="message-popover"></div>
    <div id="modal-video"></div>
    <div id="trophy-popover"></div>
    <div id="user-popover"></div>
    </body>
</html>

""")
t.pack(expand=1, fill=tk.BOTH)
htmlbasiccolorer(t)
s.mainloop()

When the window opens, the purple text has been found and the black text hasn't. Some of the text between two quotation marks is still black. I'm using Python 3.6 on Windows 10. Any help would be hugely appreciated.


Solution

  • You are using a float for the text widget indexes. The indexes are not floats; they are strings of the form line.column. You are then making the odd choice of rounding the index up or down.

    Let's take a look at "NOYDIR" as an example, since that's one you claim it isn't finding. With just a single print statement you'll see that it's finding NOYDIR, but the indexes it is computing are 14.32 as the start, and 14.4 as the end. Because the end index is before the start index (character 4 is before character 32), tkinter won't highlight that word.

    Why is the second index 14.4? It is because e.end() returns 40. You build the index by appending a "." and that value to the line number, yielding "14.40". You then convert it to a float, which turns "14.40" into 14.4. This is exactly why you should not treat text widget indexes as floats. The index is a string of the form line.column. As a float, the value "14.40" is identical to "14.4", but to the text widget "14.40" (line 14, character 40) and "14.4" (line 14, character 4) are very different things.
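
One possible way to apply this is sketched below: build each index as a "line.column" string and pass it to the widget unchanged, with no float() or round() step. The helper names quoted_spans and highlight_quoted are made up for this illustration and are not part of tkinter; the regular expression is the one from the question.

```python
import re


def quoted_spans(text):
    """Return (start, end) index strings for every double-quoted run in text.

    Each index has the Text widget's "line.column" form and stays a string.
    Hypothetical helper for illustration only.
    """
    spans = []
    # Text widget lines are numbered from 1, columns from 0.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in re.finditer(r'"(.*?)"', line):
            start = f"{line_no}.{match.start()}"
            end = f"{line_no}.{match.end()}"
            spans.append((start, end))
    return spans


def highlight_quoted(widget, color, tag):
    """Tag every quoted run in a tkinter Text widget with the given color.

    Hypothetical helper; `widget` is assumed to be a tkinter.Text instance.
    """
    widget.tag_configure(tag, background="white", foreground=color)
    # "end-1c" leaves out the newline the widget appends after the last line.
    for start, end in quoted_spans(widget.get("1.0", "end-1c")):
        widget.tag_add(tag, start, end)
```

With the question's script, calling highlight_quoted(t, "purple", "id-6") in place of htmlbasiccolorer(t) should tag "NOYDIR" from "14.32" to "14.40", because the end index is never collapsed to 14.4.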