
Line Separator Python \r\n


We are given an English-Latin dictionary: a list of English words with their Latin translations (a word may have several), stored in a file with the following contents:

apple - malum, pomum, popula
fruit - baca, bacca, popum
punishment - malum, multa

The task is to write a script that reads such dictionaries from the files whose paths are passed as command-line arguments and builds a Latin-English dictionary from them. The result should be printed to the screen.

So for the file shown above, the screen should display:

baca - fruit
bacca - fruit
malum - apple, punishment
multa - punishment
pomum - apple
popula - apple
popum - fruit

If different files contain different translations for the same words, multiple translations must be combined, leaving only unique words.
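The merging requirement above can be sketched as follows. This is only an illustration, assuming every line has the form `word - t1, t2, ...`; the function name `invert_dictionaries` is hypothetical and not part of the task. A set per Latin word keeps each English translation once, even when several files repeat it.

```python
from collections import defaultdict


def invert_dictionaries(texts):
    """Build sorted Latin-English lines from several English-Latin texts."""
    latin_to_english = defaultdict(set)  # a set keeps each English word once
    for text in texts:
        for line in text.splitlines():  # splitlines() also copes with '\r\n'
            if " - " not in line:
                continue  # skip blank or malformed lines
            english, latin_part = line.split(" - ", 1)
            for latin in latin_part.split(","):
                latin_to_english[latin.strip()].add(english.strip())
    # Sort the Latin keys and the English translations for stable output
    return [
        f"{latin} - {', '.join(sorted(words))}"
        for latin, words in sorted(latin_to_english.items())
    ]


sample = (
    "apple - malum, pomum, popula\n"
    "fruit - baca, bacca, popum\n"
    "punishment - malum, multa\n"
)
# Passing the same text twice imitates two files with overlapping entries
print("\n".join(invert_dictionaries([sample, sample])))
```

Collecting everything in one mapping before printing is what lets translations from different files end up on the same output line.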

There are also autotests, which contain this line: student_output_lines = result.stdout.decode().strip(). I finished the script itself long ago, but for two days now I have been unable to get rid of \r.

When the two outputs are compared (mine == expected), I get an error: E AssertionError: assert 'baca - fruit\r\nbacca - fruit\r\nmalum - apple, punishment\r\nmulta - punishment\r\npomum - apple\r\npopula - apple\r\npopum - fruit' == 'baca - fruit\nbacca - fruit\nmalum - apple, punishment\nmulta - punishment\npomum - apple\npopula - apple\npopum - fruit'

My code:

import sys
for filename in sys.argv[1:]:
    with open(filename, 'r', encoding='utf-8') as f:
        res_dict = {}
        for s in f.readlines():
            cur_word = s.split()[0]
            translations = s.strip().replace(',', '').split()[2:]
            for i in translations:
                if i in res_dict:
                    res_dict[i].append(cur_word)
                else:
                    res_dict.setdefault(i, [cur_word])
    res = []
    for k, v in sorted(res_dict.items()):
        res.append(k + ' - ' + ', '.join(v))
    print('\n'.join(res).replace('\r\n', ''))

The input files are plain-text .txt files:

apple - malum, pomum, popula
fruit - baca, bacca, popum
punishment - malum, multa

That is, I always end up with this annoying \r, no matter what I replace or try. In the PyCharm settings I also changed the line separator to \n, both for the project (bottom right) and in Settings -> Code Style -> Line separator. The problem appears as soon as I add \n anywhere in the code. Help please!

Code from autotest:

def test_from_file(test_input_file, expected_output_file):
    """
    The test verifies the correctness of the script output.
    Files from the 'test/resources/task3' folder are submitted for input:
    - test_input_1.txt
    - test_input_2.txt
    """
    result = subprocess.run(
        ["python", os.path.join(SOLUTION_FOLDER_PATH, "task3.py"), test_input_file],
        stdout=subprocess.PIPE,
    )
    student_output = result.stdout.decode().strip()

    with open(expected_output_file, "r") as expected_output_file:
        expected_output_content = expected_output_file.read().strip()

    assert student_output == expected_output_content

Minimal reproducible example:

import sys
for filename in sys.argv[1:]:
    words = ['Hello, ', 'World!']
    print('\n'.join(words))

Three variants of the string passed to join, and what the test reports for each:

  1. '' in join: AssertionError: assert 'Hello, World!'
  2. '\r' in join: AssertionError: assert 'Hello, \rWorld!'
  3. '\n' in join: AssertionError: assert 'Hello, \r\nWorld!'

I add \n, and \r comes out again.


Solution

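  • The test itself is not OS-portable. On Windows, a text-mode sys.stdout translates every '\n' you write into '\r\n', and the test captures the raw bytes and decodes them without translating the newlines back. It should pass text=True to subprocess.run: the captured output is then read in text mode with universal newlines, so '\r\n' comes back as '\n'. The .decode() call on the student output is then no longer needed: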

        result = subprocess.run(
            ["python", os.path.join(SOLUTION_FOLDER_PATH, "task3.py"), test_input_file],
            stdout=subprocess.PIPE, text=True  # add text=True
        )
        student_output = result.stdout.strip()  # remove .decode()
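    The difference between the two ways of reading the captured output can be reproduced on any OS without a subprocess. The sketch below is only an approximation of what text=True does: it assumes the child process wrote Windows-style line endings and reads the same bytes once with a plain decode() and once through a text wrapper with universal newlines.

```python
import io

# Bytes as a Windows child process would emit them: '\n' was written as b'\r\n'
raw = b"baca - fruit\r\nbacca - fruit\r\n"

# What the original test does: decode the bytes and leave newlines untouched
decoded = raw.decode().strip()

# Roughly what text=True does: read through a text wrapper with universal
# newlines (newline=None), which maps '\r\n' back to '\n'
text_mode = io.TextIOWrapper(
    io.BytesIO(raw), encoding="utf-8", newline=None
).read().strip()

print(repr(decoded))    # still contains '\r\n'
print(repr(text_mode))  # contains only '\n'
```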
    

    If you can't change the test (or convince the test writer to fix it), you may, depending on your environment, replace the default TextIOWrapper behind sys.stdout with one that does not translate newlines. Add this line to your script (it needs import io in addition to import sys):

    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, newline='')
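    The effect of the newline argument on writing can be checked without touching the real sys.stdout. In this sketch the helper `written_bytes` is hypothetical, and newline='\r\n' merely stands in for the Windows default, where '\n' is written as os.linesep.

```python
import io


def written_bytes(newline):
    """Return the bytes a text wrapper emits for two short lines."""
    buffer = io.BytesIO()
    wrapper = io.TextIOWrapper(buffer, encoding="utf-8", newline=newline)
    wrapper.write("a\nb\n")
    wrapper.flush()
    data = buffer.getvalue()
    wrapper.detach()  # keep the BytesIO from being closed with the wrapper
    return data


# Stand-in for the Windows default: every '\n' is written as '\r\n'
print(written_bytes("\r\n"))
# newline='' writes '\n' unchanged, which is what the workaround relies on
print(written_bytes(""))
```

    On Python 3.7 and later, sys.stdout.reconfigure(newline='') should have the same effect, provided sys.stdout really is an io.TextIOWrapper.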
    

    Some IDEs will redirect sys.stdout and don't implement sys.stdout.buffer or wrap sys.stdout in a different manner than io.TextIOWrapper. You can test for this from a REPL:

    Command line Python example (redirect line above works):

    Python 3.13.3 (tags/v3.13.3:6280bb5, Apr  8 2025, 14:47:33) [MSC v.1943 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.stdout
    <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
    >>> sys.stdout.buffer
    <_io.BufferedWriter name='<stdout>'>
    

    PythonWin (IDE from pywin32 module, redirect line doesn't work):

    PythonWin 3.13.3 (tags/v3.13.3:6280bb5, Apr  8 2025, 14:47:33) [MSC v.1943 64 bit (AMD64)] on win32.
    Portions Copyright 1994-2018 Mark Hammond - see 'Help/About PythonWin' for further copyright information.
    >>> import sys
    >>> sys.stdout
    <pywin.framework.interact.DockedInteractiveView object at 0x0000017658511550>
    >>> sys.stdout.buffer
    Traceback (most recent call last):
      File "<interactive input>", line 1, in <module>
      File "C:\dev\Python313\Lib\site-packages\pythonwin\pywin\mfc\object.py", line 24, in __getattr__
        return getattr(o, attr)
    AttributeError: 'PyCCtrlView' object has no attribute 'buffer'
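
    A script that has to run in both kinds of environment can guard the redirect instead of assuming sys.stdout.buffer exists. This is a minimal sketch; the function name `stdout_without_newline_translation` and the class `FakeIdeStdout` are hypothetical and exist only for illustration.

```python
import io


def stdout_without_newline_translation(stream):
    """Return a wrapper that writes '\\n' as-is, or stream itself if it has no buffer."""
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        return stream  # IDE-style replacement without a binary buffer
    encoding = getattr(stream, "encoding", None) or "utf-8"
    return io.TextIOWrapper(buffer, encoding=encoding, newline="", line_buffering=True)


class FakeIdeStdout:
    """Hypothetical stand-in for an IDE stream that lacks .buffer."""

    def write(self, text):
        return len(text)


fake = FakeIdeStdout()
print(stdout_without_newline_translation(fake) is fake)

# A stream that translates '\n' to '\r\n', imitating a Windows console stdout
real_like = io.TextIOWrapper(io.BytesIO(), encoding="utf-8", newline="\r\n")
wrapped = stdout_without_newline_translation(real_like)
wrapped.write("a\nb\n")
wrapped.flush()
print(real_like.buffer.getvalue())
```

    In a real script the call would be sys.stdout = stdout_without_newline_translation(sys.stdout), placed before the first print.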