I am using the snakebite client from
https://github.com/spotify/snakebite
and I notice a strange behavior when I try to make a directory or move files around in HDFS. Here is my code. All it does is move the contents of the source directory to the destination directory and then display the contents of the destination directory.
```python
def purge_pending(self,source_dir,dest_dir):
    if(self.hdfs_serpent.test(path=self.root_dir+"/"+source_dir, exists=True, directory=True)):
        print "Source exists ",self.root_dir+source_dir
        for x in self.hdfs_serpent.ls([self.root_dir+source_dir]):
            print x['path']
    else:
        print "Source does not exist ",self.root_dir+"/"+source_dir
        return
    if(self.hdfs_serpent.test(path=self.root_dir+"/"+dest_dir, exists=True, directory=True)):
        print "Destination exists ",self.root_dir+dest_dir
    else:
        print "Destination does not exist ",self.root_dir+dest_dir
        print "Will be created"
        for y in self.hdfs_serpent.mkdir([self.root_dir+dest_dir],create_parent=True):
            print y
    for src in self.hdfs_serpent.ls([self.root_dir+source_dir]):
        print src['path'].split("/")[-1]
        for y in self.hdfs_serpent.rename([src['path']],self.root_dir+dest_dir+"/"+src['path'].split("/")[-1]):
            print y
    for x in self.hdfs_serpent.ls([self.root_dir+dest_dir]):
        print x['path']
```
Here is sample output from a run where the destination did not exist:
```
Source exists /root/source
/root/source/208560.json
/root/source/208571.json
/root/source/208574.json
/root/source/208581.json
/root/source/208707.json
Destination does not exist /root/dest
Will be created
{'path':'/research/dest/'}
208560.json
{'path':'/research/dest/208560.json'}
208571.json
{'path':'/research/dest/208571.json'}
208574.json
{'path':'/research/dest/208574.json'}
208581.json
{'path':'/research/dest/208581.json'}
208707.json
{'path':'/research/dest/208707.json'}
```
The weird part is that I have to put those print loops in; otherwise nothing happens. So

```python
self.hdfs_serpent.mkdir([self.root_dir+dest_dir],create_parent=True)
```

does not work, but

```python
for y in self.hdfs_serpent.mkdir([self.root_dir+dest_dir],create_parent=True):
    print y
```

does! The same goes for rename:

```python
self.hdfs_serpent.rename([src['path']],self.root_dir+dest_dir+"/"+src['path'].split("/")[-1])
```

does not work, but the following does:

```python
for y in self.hdfs_serpent.rename([src['path']],self.root_dir+dest_dir+"/"+src['path'].split("/")[-1]):
    print y
```
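The symptom can be reproduced without a cluster. The sketch below uses a hypothetical `FakeClient` whose `mkdir` is written as a generator function; this is an assumption meant to mimic a generator-returning client and is not snakebite's actual code. Calling the method only creates a generator object, so the body (the part that would talk to HDFS) never runs until something iterates over it.

```python
class FakeClient:
    """Hypothetical stand-in for a generator-based client (not snakebite)."""

    def __init__(self):
        self.created = []  # paths the "server" has actually created

    def mkdir(self, paths, create_parent=False):
        # Generator function: none of this body runs until iteration starts.
        for path in paths:
            self.created.append(path)
            yield {"path": path, "result": True}


client = FakeClient()

client.mkdir(["/root/dest"], create_parent=True)  # builds a generator, does no work
print(client.created)  # still empty

for result in client.mkdir(["/root/dest"], create_parent=True):
    print(result)  # iterating runs the body one step at a time
print(client.created)  # now contains "/root/dest"
```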
Is this a bug, or am I doing something wrong?
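This looks to be by design: the snakebite documentation states that most of the client's methods return generators. A generator does no work until its values are consumed, either explicitly with next() or implicitly by a for loop. That is why the bare calls appear to do nothing while the loops that print each result work. If you do not need the results, you can still force execution by consuming the generator, for example with list(...).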
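A minimal sketch of forcing a lazy operation to run without a print loop, again using a hypothetical generator-based `FakeClient` rather than the real snakebite client. `list()` consumes the generator and keeps the results for inspection; `collections.deque(gen, maxlen=0)` is a common idiom for consuming a generator when the results are not needed. The `run` helper name is hypothetical.

```python
from collections import deque


class FakeClient:
    """Hypothetical generator-based client used only for illustration."""

    def __init__(self):
        self.renamed = {}  # source path -> destination path, once actually renamed

    def rename(self, paths, dst):
        # Lazily performs one rename per source path as the caller iterates.
        for path in paths:
            self.renamed[path] = dst
            yield {"path": dst, "result": True}


def run(gen):
    """Hypothetical helper: force a lazy operation to execute, return its results."""
    return list(gen)


client = FakeClient()

# Consume and keep the results.
results = run(client.rename(["/root/source/208560.json"], "/root/dest/208560.json"))
print(results)

# Consume and discard the results: a zero-length deque stores nothing.
deque(client.rename(["/root/source/208571.json"], "/root/dest/208571.json"), maxlen=0)
print(client.renamed)
```

Applied to the question's code, wrapping the call as `list(self.hdfs_serpent.mkdir([...], create_parent=True))` should have the same effect as the print loop, assuming the method returns a generator as the documentation describes.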