Hello!
I'm using chess.pgn to convert a chess database into a DataFrame. To read the nth game from the database, do I have to read all the previous games first, or can I jump directly to game n? For example, if I want to distribute the processing of a database with 10^8 games, can I start reading at game 9e7?
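import pandas as pd
import chess.pgn
from datetime import datetime as dt
import os
import glob

nome_arquivo = "Analises_01.pgn"
inicio = 0
numero_jogos = int(1.47e8)  # range() needs an int, not a float

ratings = []
with open(nome_arquivo, encoding="utf8") as arquivo:
    for j in range(numero_jogos):
        # Every game is parsed here, even the ones before `inicio`
        first_game = chess.pgn.read_game(arquivo)
        if first_game is None:  # end of file reached
            break
        if j >= inicio:
            try:
                Brancas = int(first_game.headers["WhiteElo"])
                Pretas = int(first_game.headers["BlackElo"])
                ratings.append([Brancas, Pretas])
            except (KeyError, ValueError):
                # Skip games with missing or non-numeric ratings
                pass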
I hope this code helps. I didn't use pandas or a DataFrame, sorry. It just builds a list that indexes all the games in the PGN file, so game_index[n] returns the text of game number n+1.
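with open('your_pgn_path_here.pgn') as PGN:
    text_PGN = PGN.read()

game_index = []
actual_game = ''
for string in text_PGN:
    # A newline that follows two consecutive newlines closes the current game
    # (this assumes games are separated by two blank lines)
    if string == '\n' and actual_game.endswith('\n\n'):
        actual_game += string
        game_index.append(actual_game)
        actual_game = ''
    else:
        actual_game += string

# Keep the last game if the file does not end with the separator
if actual_game.strip():
    game_index.append(actual_game)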
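If the goal is to jump straight to game n without holding the whole file in memory, one option is to record the byte offset where each game starts and seek to it later. The sketch below uses only the standard library. The helper names `build_game_offsets` and `read_game_text` are made up for illustration, and it assumes every game begins with an `[Event ` tag at the start of a line and that the file is UTF-8. It still needs one sequential pass to build the offsets, but that pass only scans lines and does not parse moves.

```python
import os


def build_game_offsets(path):
    """Return the byte offset of every line that starts with b'[Event '.

    Assumption: each game starts with an [Event ...] tag at the start of a line.
    """
    offsets = []
    with open(path, "rb") as handle:
        position = 0
        for line in handle:
            if line.startswith(b"[Event "):
                offsets.append(position)
            position += len(line)  # byte offset of the next line
    return offsets


def read_game_text(path, offsets, n):
    """Return the raw text of game n (0-based) without reading earlier games."""
    start = offsets[n]
    # The game ends where the next one starts, or at the end of the file
    end = offsets[n + 1] if n + 1 < len(offsets) else os.path.getsize(path)
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(end - start).decode("utf-8")
```

The offsets list can be saved once and split into ranges, one per worker. Each worker can then wrap the text it gets back in io.StringIO and pass that to chess.pgn.read_game, so only the games in its own range are parsed.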