In my app for iOS (DLNA media player), I'm seeing a hang I don't understand...I'm hoping someone can shed light on it.
My app is built in Objective C sitting on top of a C++ library, part of which is libupnp. The compilation flag SO_NOSIGPIPE is set, for the record, when looking at the code below.
Broadly speaking, the app works fairly well, at least on an iPod and my iPad, both running iOS 6. It does all the media player sorts of things.
EDIT: I was wrong about the OS on the iPhone 4. I thought it was 6.x, but it's 5.1.1, for what it's worth.
The problem happens when I step up and start testing my app on the iPhone 4 (iOS 5.1.1) and iPhone 5 (iOS 6)...which says to me that there's a timing issue in my code.
The user selects an item of media to play/display on the remote Digital Media Receiver (DMR).
My code calls into libupnp, creating the SOAP command to make this happen. Then it calls http_RequestAndResponse(), which creates the socket, connect()s to the host, and calls http_SendMessage(), which in turn calls sock_read_write() (I'll include this function later in the message) to send the request I've built (the POST command to play media on the DMR). Then, using the same socket, it calls http_RecvMessage() (which calls sock_read_write() again to recv the bytes). At this point, it is sitting in select(), waiting for the DMR to respond to the Play command.
On a different thread, the web server of libupnp gets a request for the bits of the media file we just said to play. So on a different thread, I'm calling http_SendMessage with the bytes to respond to the request, which calls sock_read_write() to write the bytes to the client.
This send() in sock_read_write hangs. Not only does it hang libupnp, but it means that there are no more communications on the socket on any thread.
These hung sockets don't seem to time out, die, or otherwise terminate. Of course, it's a DLNA media player I'm building, and many of the commands and reports about the state of the world go via these sockets, so my app effectively turns into a zombie: it responds to taps and whatnot, but you cannot do anything meaningful.
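One way to keep a blocked send() from waiting forever is a per-socket send timeout. The sketch below is only an illustration: set_send_timeout() is a hypothetical helper, not part of libupnp, and it assumes a plain BSD socket descriptor. With SO_SNDTIMEO set, a blocking send() that cannot make progress within the interval fails with EAGAIN/EWOULDBLOCK instead of hanging.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper (not part of libupnp): bound how long a blocking
 * send() on fd may wait for buffer space.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_send_timeout(int fd, int seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}
```

This would be called once, right after the socket is created or accepted. It limits how long a hang can last, but it does not explain why the send() stalls in the first place.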
I've tried making the send() non-blocking: just before calling send(), I called fcntl(sock, F_SETFL, O_NONBLOCK), returning if that failed for any reason.
I've tried passing flags like MSG_DONTWAIT to send() (which had no effect on iOS).
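For reference, here is a minimal sketch of the two non-blocking approaches mentioned above. Both function names are hypothetical and not part of libupnp. The first keeps the existing file status flags when adding O_NONBLOCK, since F_SETFL with a bare O_NONBLOCK overwrites them. The second uses the per-call flag, which BSD-derived systems spell MSG_DONTWAIT. Either way the caller has to cope with EAGAIN/EWOULDBLOCK, which the send loop in sock_read_write() would treat as a hard error.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: put fd into non-blocking mode while preserving
 * its other file status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    if (flags & O_NONBLOCK)
        return 0; /* already non-blocking */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: a single send attempt that never blocks.
 * len must be greater than 0.
 * Returns the number of bytes written, 0 if the socket buffer is full
 * (the caller should wait for writability and retry), or -1 on error. */
ssize_t try_send_nowait(int fd, const char *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_DONTWAIT);

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}
```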
It would seem that it's a timing issue. On the iPad and iPod, I can play music till the cows come home. On iPhone 4 and iPhone 5, I get hangs.
Any suggestions? (Suggestions to RTFM, read man pages, read books, etc. are cheerfully accepted if you tell me which ones specifically answer this...)
Oh, the code for sock_read_write() (from libupnp 1.6.18):
/*!
 * \brief Receives or sends data. Also returns the time taken to receive or
 * send data.
 *
 * \return
 *   \li \c numBytes - On Success, no of bytes received or sent or
 *   \li \c UPNP_E_TIMEDOUT - Timeout
 *   \li \c UPNP_E_SOCKET_ERROR - Error on socket calls
 */
static int sock_read_write(
    /*! [in] Socket Information Object. */
    SOCKINFO *info,
    /*! [out] Buffer to get data to or send data from. */
    char *buffer,
    /*! [in] Size of the buffer. */
    size_t bufsize,
    /*! [in] timeout value. */
    int *timeoutSecs,
    /*! [in] Boolean value specifying read or write option. */
    int bRead)
{
    int retCode;
    fd_set readSet;
    fd_set writeSet;
    struct timeval timeout;
    long numBytes;
    time_t start_time = time(NULL);
    SOCKET sockfd = info->socket;
    long bytes_sent = 0;
    size_t byte_left = (size_t)0;
    ssize_t num_written;

    if (*timeoutSecs < 0)
        return UPNP_E_TIMEDOUT;
    FD_ZERO(&readSet);
    FD_ZERO(&writeSet);
    if (bRead)
        FD_SET(sockfd, &readSet);
    else
        FD_SET(sockfd, &writeSet);
    timeout.tv_sec = *timeoutSecs;
    timeout.tv_usec = 0;
    while (TRUE) {
        if (*timeoutSecs == 0)
            retCode = select(sockfd + 1, &readSet, &writeSet,
                NULL, NULL);
        else
            retCode = select(sockfd + 1, &readSet, &writeSet,
                NULL, &timeout);
        if (retCode == 0)
            return UPNP_E_TIMEDOUT;
        if (retCode == -1) {
            if (errno == EINTR)
                continue;
            return UPNP_E_SOCKET_ERROR;
        } else
            /* read or write. */
            break;
    }
#ifdef SO_NOSIGPIPE
    {
        int old;
        int set = 1;
        socklen_t olen = sizeof(old);
        getsockopt(sockfd, SOL_SOCKET, SO_NOSIGPIPE, &old, &olen);
        setsockopt(sockfd, SOL_SOCKET, SO_NOSIGPIPE, &set, sizeof(set));
#endif
        if (bRead) {
            /* read data. */
            numBytes = (long)recv(sockfd, buffer, bufsize, MSG_NOSIGNAL);
        } else {
            byte_left = bufsize;
            bytes_sent = 0;
            while (byte_left != (size_t)0) {
                /* write data. */
                num_written = send(sockfd,
                    buffer + bytes_sent, byte_left,
                    MSG_DONTROUTE | MSG_NOSIGNAL);
                if (num_written == -1) {
#ifdef SO_NOSIGPIPE
                    setsockopt(sockfd, SOL_SOCKET,
                        SO_NOSIGPIPE, &old, olen);
#endif
                    return (int)num_written;
                }
                byte_left -= (size_t)num_written;
                bytes_sent += num_written;
            }
            numBytes = bytes_sent;
        }
#ifdef SO_NOSIGPIPE
        setsockopt(sockfd, SOL_SOCKET, SO_NOSIGPIPE, &old, olen);
    }
#endif
    if (numBytes < 0)
        return UPNP_E_SOCKET_ERROR;
    /* subtract time used for reading/writing. */
    if (*timeoutSecs != 0)
        *timeoutSecs -= (int)(time(NULL) - start_time);

    return (int)numBytes;
}
Thanks!
-Ken
Well, now isn't this interesting...
There were two things wrong with my code:
1) Someone had changed the configuration file, and conveniently removed the -DSO_NOSIGPIPE from the compilation. Always worth checking the details.
2) It seems that there's a bug in sock_read_write() in libupnp.
If -DSO_NOSIGPIPE is defined, then each time a send or recv is attempted, a select is done, and only /then/ is the SO_NOSIGPIPE option applied to the socket. Once the operation is done, the original state of the socket is set again.
When I was first testing -DSO_NOSIGPIPE, I would, sometimes, still get SIGPIPEs. I eventually got around it by also doing something like this in my main.m:
void sighandler(int signum)
{
    NSLog(@"Caught signal %d", signum);
}

int main(int argc, char *argv[])
{
    signal(SIGPIPE, sighandler);
    ...
The sharper-witted-than-I are thinking "Moron, you're handling some SIGPIPEs one place, and some another place!"
Turns out a SIGPIPE can also show up around the select() call, not just during the send() or recv().
I removed the sighandler above, and then moved the "#ifdef SO_NOSIGPIPE" section where the SO_NOSIGPIPE property is applied above the select, and the problem completely goes away.
If the select() fails due to an EPIPE, it returns -1. That is caught in the next few lines and the function exits with UPNP_E_SOCKET_ERROR, so it can be handled properly, rather than simply being ignored.
It's thoroughly possible I've completely misunderstood what is going on here, in which case I definitely look forward to being schooled.
However, I'm certainly enjoying reliable network communications again.
Hope this helps someone.
-Ken