I have a CSV file that contains null values in the first thousand rows of some columns. The file has more than 200 columns. ML Studio is inferring the schema of those columns as strings, but in reality they are all decimals. How can I change all of them at once instead of going one by one?
Below is the table I have, with the first 1000 rows being null and the remaining rows containing decimal data.
| Column_1 | Column_2 | Column_3 | Column_4 | Column_5 | Column_6 | Column_7 | Column_8 | Column_9 | Column_10 |
|----------|----------|----------|----------|----------|----------|----------|----------|----------|-----------|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3.0734454878622133 | 7.273655602933219 | 6.422759017554442 | 1.2375730442349464 | 8.300633975042667 | 9.416728718037973 | 7.078674510546692 | 4.9788718688395415 | 5.423851215824898 | 0.28440799036452913 |
| 5.405178011983812 | 3.6230982150244015 | 9.440586438993723 | 9.394297741890705 | 9.855264171698582 | 7.132510908305499 | 3.4916472430982948 | 8.34951022213561 | 3.51424491309551 | 8.15558468010132 |
Inside the raw file the data looks the same: empty fields for the first thousand rows, then decimal values.
While creating the data asset in ML Studio, adjust the parsing settings. By default, ML Studio reads only a subset of your rows to infer the schema. Because the first thousand rows contain nothing but nulls, that subset holds only empty values, so every column is automatically recognized as a string.

Use the skip-rows setting so that the sampled subset includes real data: skip enough of the leading null rows, based on how many your file actually has, and all of the column types change to decimal at once instead of having to be edited one by one.
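The same idea applies if you load the file programmatically, for example in a notebook. Below is a minimal pandas sketch; the file name `data.csv` is a placeholder, and it assumes every column should be numeric. It converts all 200+ columns in a single pass rather than one by one.

```python
import pandas as pd

# Minimal sketch, assuming a placeholder file name "data.csv"
# and that every column is meant to hold decimal values.
df = pd.read_csv("data.csv")

# Convert every column to a numeric dtype in one pass;
# the empty leading cells (and any unparseable values) become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

print(df.dtypes)  # every column should now be float64
```

This mirrors what the skip-rows setting achieves in ML Studio: the types come out right once the parser is forced to treat the values as numbers instead of inferring strings from the all-null sample.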