python string function split control-statements

Write a function that reads a file and returns all of the numbers in it as a list of floats


I have a large text file containing many thousands of lines, but a short example that covers the same idea is:

vapor dust -2C pb 
x14 71 hello! 42.42
100,000 lover baby: -2

There is a mixture of integers, alphanumerics, and floats.

Attempt at a solution: I've done this to create a single list of strings, but I am unable to pick out each item based on whether it is numeric or alphanumeric.

with open('file.txt', 'r') as f:
    data = f.read().split()
#dirty = [x for x in data if x.isnumeric()]
print(data)

The line #dirty fails.
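
A likely reason, assuming the file looks like the sample above: str.isnumeric() only returns True when every character in the string is a numeric character, so any token containing '.', '-' or ',' is rejected. A quick check on a few of the sample tokens (the tokens list here is just an illustration):

```python
# A few tokens taken from the sample text above.
tokens = ['71', '42.42', '-2', '100,000', 'x14']

# Show what str.isnumeric() reports for each token.
for token in tokens:
    print(token, token.isnumeric())

# Only tokens made up entirely of numeric characters survive the filter.
numeric_only = [t for t in tokens if t.isnumeric()]
print(numeric_only)
```

Only '71' passes, which is why the floats and negative numbers go missing.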

I have had luck constructing a list-of-lists containing almost all required values using the code as follows:

with open ('benzene_SDS.txt','r') as f:  
    for word in f:
        data= word.split()
        clean = [ x for x in data if x.isnumeric()]            
        res = list(set(data).difference(clean))
        print(clean)

But it doesn't return a single list; it is a list of lists, most of which are blank [].

A hint was given that the "try" control statement is useful for solving the problem, but I don't see how to use it.

Any help would be greatly appreciated! Thanks.


Solution

  • If you're mainly asking how one would use try to check for validity, this is what you're after:

    values = []
    with open('benzene_SDS.txt', 'r') as f:
        for word in f.read().split():
            try:
                values.append(float(word))
            except ValueError:
                pass
    print(values)
    

    Output:

    [71.0, 42.42, -2.0]
    

    However, note that this does not parse '100,000' as either 100 or 100000.

    This code would do that:

    import locale
    
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
    
    values = []
    with open('benzene_SDS.txt', 'r') as f:
        for word in f.read().split():
            try:
                values.append(locale.atof(word))
            except ValueError:
                pass
    
    print(values)
    

    Result:

    [71.0, 42.42, 100000.0, -2.0]
    

    Note that running the same code with this:

    locale.setlocale(locale.LC_ALL, 'nl_NL.UTF-8')
    

    Yields a different result:

    [71.0, 4242.0, 100.0, -2.0]
    

    This is because the Netherlands uses , as a decimal separator and . as a thousands separator (which is simply dropped from 42.42, giving 4242.0).
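
Since the question asks for a function, the same try/except idea could be wrapped up roughly as below. This is only a sketch: the name read_floats and the thousands_sep parameter are made up for illustration, and it assumes ',' is always a grouping character in the file. It removes that character by hand instead of using locale, so it does not depend on a particular locale being installed (locale.setlocale raises locale.Error for a locale the system does not support).

```python
def read_floats(path, thousands_sep=','):
    """Return every whitespace-separated token in the file that parses as a float."""
    values = []
    with open(path, 'r') as f:
        for word in f.read().split():
            # Drop the grouping character first, so '100,000' becomes '100000'.
            # Pass thousands_sep='' to skip this step.
            cleaned = word.replace(thousands_sep, '') if thousands_sep else word
            try:
                values.append(float(cleaned))
            except ValueError:
                # Not a number (e.g. 'x14', '-2C', 'hello!'): skip it.
                pass
    return values
```

Called as read_floats('benzene_SDS.txt') on the sample text from the question, this should return [71.0, 42.42, 100000.0, -2.0]. Two caveats: removing every comma is lenient, so a token like '1,2,3' would be read as 123.0, and float() also accepts strings such as 'nan' and 'inf', so filter those out separately if they can appear in the file.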