c++unicodemfctchar

Memory allocation error when simply assigning values


I have the following code in order to obtain a parent directory from a given path. Note: size_t is a typedef for unsigned int.

/****************************************************
This function takes a full path to a file, and returns
the directory path by returning the string up to the last backslash.

Author: Aashish Bharadwaj
*****************************************************/
_TCHAR* GetDirectoryFromPath(const _TCHAR* path)
{
   size_t size = _tcslen(path);
   size_t lastBackslash = 0;
   for (size_t i = 0; i < size; i++)
   {
      if (path[i] == '\\')
      {
         lastBackslash = i;
      }
   }

   _TCHAR* dirPath = new _TCHAR();
   size_t i;
   for (i = 0; i <= lastBackslash; i++)
   {
      dirPath[i] = path[i];
   }
   dirPath[i + 1] = '\0';  //THIS IS VERY NECESSARY! Otherwise, a bunch of garbage is appended to the character array sometimes.

   return dirPath;
}

The issue is that sometimes it appends a weird "@" looking symbol to the end of the string that it returns.enter image description here

I was wondering if anyone knew what this was and why it was doing so.


Solution

  • The problem is that you are allocating only 1 TCHAR and then you are writing past the end of the memory block that you have allocated. Your code has undefined behavior.

    You need to use new _TCHAR[...] instead of new _TCHAR().

    You are also not handling the case where no backslashes are found. In that case, lastBackslash is 0 even though the 1st character is not a backslash. You are not checking for that possibility. And because your loop is using <= instead of <, it will end up copying that 1st character when it shouldn't be.

    Try something more like this instead:

    const size_t c_invalid_index = (size_t) -1;
    
    _TCHAR* GetDirectoryFromPath(const _TCHAR* path)
    {
        size_t lastBackslash = c_invalid_index;
    
        size_t size = _tcslen(path);
        for (size_t i = 0; i < size; ++i)
        {
            if (path[i] == _T('\\'))
            {
                lastBackslash = i;
            }
        }
    
        if (lastBackslash == c_invalid_index)
            return NULL;
    
        _TCHAR* dirPath = new _TCHAR[lastBackslash + 2];
        for (size_t i = 0; i <= lastBackslash; ++i)
        {
            dirPath[i] = path[i];
        }
        dirPath[lastBackslash + 1] = _T('\0');
    
        return dirPath;
    }
    

    Alternatively:

    _TCHAR* GetDirectoryFromPath(const _TCHAR* path)
    {
        const _TCHAR *lastBackslash = NULL;
    
        size_t size = _tcslen(path);
        for (size_t i = 0; i < size; ++i)
        {
            if (path[i] == _T('\\'))
            {
                lastBackslash = &path[i];
            }
        }
    
        if (!lastBackslash)
            return NULL;
    
        size = (lastBackslash - path) + 1;
    
        _TCHAR* dirPath = new _TCHAR[size + 1];
        for (size_t i = 0; i < size; ++i)
        {
            dirPath[i] = path[i];
        }
        dirPath[size] = _T('\0');
    
        return dirPath;
    }
    

    That being said, you really should not be using raw string pointers like this. It would be much safer and cleaner to use std::basic_string<_TCHAR> instead (if not std::string or std::wstring, or std::u16string or std::u32string in C++11 and later), eg:

    #include <string>
    
    typedef std::basic_string<_TCHAR> tstring;
    
    ...
    
    tstring GetDirectoryFromPath(const tstring &path)
    {
        tstring::size_type pos = path.find_last_of(_T('\\'));
        if (pos == tstring::npos)
            return tstring();
        return path.substr(0, pos+1);
    }