I encountered an issue when using the read_excel() function from the readxl package in R to read an .xlsx file that was originally created on a Windows PC. When I open this file on my Mac, the date format in the resulting data frame is incorrect.
I can manually convert these numeric values to dates using something like as.Date(, origin = "1899-12-30"), but this only works for the rows where the dates are incorrect. Unfortunately, for the rows where the dates are already correct (like the first two rows), this approach leads to incorrect conversion.
Here’s a simplified example of the code I’m using:
> date_data
`Company` Date
**1 CHINA PETROLEUM & CHEMICAL CORPORATION 1998
2 INDUSTRIAL AND COMMERCIAL BANK OF CHINA LIM… 2018 **
3 CHINA MERCHANTS BANK CO., LTD 39911
4 LEGEND HOLDINGS CORPORATION 42096
5 CHINA MERCHANTS BANK CO., LTD. 42354
6 CRRC CORPORATION LIMITED 39559
7 CHINA PACIFIC INSURANCE (GROUP) CO., LTD. 40154
8 SHANGHAI PHARMACEUTICALS HOLDING CO., LTD. 40666
9 CHINA CITIC BANK CORPORATION LIMITED 39176
10 CHINA NATIONAL BUILDING MATERIAL COMPANY LI… 38755
correct read the data even in a character format like the picture attached
It looks like you have two types of data there -- some rows like 1+2 where you have a plain number which refers to the year, and other rows which are dates.
Excel saves dates as days since X (1904-01-01 on pre-2011 mac excel, or 1899-12-30 on windows excel), but formats them to show the date format you want, like "08/04/2009." So under the hood, those rows hold numbers like 39911, which is 109 years of days, so implies a 2009 date on windows excel.
To convert all the rows to plain years, you might do something like:
df <- data.frame(Company = letters[1:4],
Date = c(1998, 2018, 39911, 42096))
df$Date2 = ifelse(df$Date < 2050, df$Date,
lubridate::year(as.Date(df$Date,
origin = "1899-12-30")))
Then we get:
> df
Company Date Date2
1 a 1998 1998
2 b 2018 2018
3 c 39911 2009
4 d 42096 2015