Tags: telethon, privacy, anonymity

To what extent am I anonymous with Telethon?


Consider the following model situation. Don't be surprised by the carelessness of Alice and Bob; maybe they're my alter egos.

Both Alice and Bob have Telegram accounts. Using my.telegram.org on her mobile phone, Alice creates a new app, retrieves the pair (app api_id, app api_hash), writes them on a piece of paper and for some strange reason gives this paper to me. Then, on my PC, I run the app's code, which contains the line client = TelegramClient('bob', api_id, api_hash). The prompt Please enter your phone (or bot token) appears on my stdout and I type in Bob's phone number. Bob receives a verification code and for some strange reason shows it to me. Now I'm logged in to Bob's Telegram account with Alice's api_id and api_hash.

Then, using my PC, I run the app's code: I call some Telethon functions such as GetFullChannelRequest and GetFullUserRequest, fetch some messages, and so on. My code does not send any messages on Bob's behalf; it only does massive data scraping.

Questions:

  1. When I use Telethon for scraping, what happens at the IP packet level? Does it look like a series of GET requests from my PC directly to one of the Telegram servers, or are there some intermediary Telethon servers?

  2. Who can log what channels and users are scraped by my code? Is it correct that Telegram can log every scraping request I make together with my IP address, Bob's Telegram ID and Alice's app ID; that my ISP can only log TLS-encrypted traffic between me and a Telegram server; that Bob can only notice that the app session becomes active from time to time; and that Alice gets no information about my behavior at all?


Solution

  • it only does massive data scraping

    Note that Telegram is free to ban any account for API abuse. You should be sure to read through their Terms of Service.

    When I use Telethon for scraping, what happens at the IP packet level?

    Telethon creates a new socket and connects via TCP to one of Telegram's datacenters. The IP of the first datacenter is hardcoded, since otherwise no initial connection would be possible. The library will switch datacenters if Telegram says so.
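
    The "start at a hardcoded datacenter, then switch when told to" flow can be sketched roughly as follows. This is not Telethon's actual code: the datacenter table uses placeholder addresses from the reserved documentation range, and INITIAL_DC, fake_server, next_dc and connect_with_migration are hypothetical names made up for illustration. The only part taken from Telegram itself is the shape of the migrate errors (for example PHONE_MIGRATE_4).

```python
import re

# Hypothetical datacenter table. The addresses come from the reserved
# documentation range (192.0.2.0/24) and are NOT Telegram's real IPs.
DATACENTERS = {
    1: ("192.0.2.1", 443),
    2: ("192.0.2.2", 443),
    4: ("192.0.2.4", 443),
}
INITIAL_DC = 2  # assumption: the one datacenter the client knows up front

# Telegram signals a required switch with errors such as PHONE_MIGRATE_4.
MIGRATE_RE = re.compile(r"^(?:PHONE|NETWORK|USER|FILE)_MIGRATE_(\d+)$")


def next_dc(current_dc, server_reply):
    """Return the datacenter ID to use after seeing server_reply."""
    match = MIGRATE_RE.match(server_reply)
    if match is None:
        return current_dc  # no migration requested, stay where we are
    return int(match.group(1))


def fake_server(dc_id, home_dc=4):
    """Stand-in for Telegram: accepts only requests sent to home_dc."""
    return "OK" if dc_id == home_dc else f"PHONE_MIGRATE_{home_dc}"


def connect_with_migration(max_hops=3):
    """Start at the hardcoded datacenter and follow migrate replies."""
    dc_id = INITIAL_DC
    visited = [dc_id]
    for _ in range(max_hops):
        reply = fake_server(dc_id)
        if reply == "OK":
            break
        dc_id = next_dc(dc_id, reply)
        visited.append(dc_id)
    return visited, DATACENTERS[dc_id]


if __name__ == "__main__":
    hops, address = connect_with_migration()
    print("datacenters tried:", hops)
    print("final address:", address)
```

    In this sketch every hop goes straight from the client to a datacenter; no third-party server sits in between.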

    are there some intermediary Telethon servers

    There is no such thing. In older versions, when RPC errors occurred, the library made requests to another server so that we could provide a better experience and document all the errors users found. This has long been removed, and the library now only talks to Telegram. There is an open bug because it may still make a request to other servers when downloading remote media, though not when downloading files uploaded to Telegram. Note, however, that Telegram may use CDNs.

    Who can log what channels and users are scraped by my code?

    Telegram obviously knows the API ID used, what account it belongs to, what account is logged in, and your IP address, simply by virtue of having a direct TCP connection to you. It also knows what requests you make, as it would not be able to process them otherwise. Whether they log or do anything with this information we cannot know, as we have no access to the server source code.
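
    To illustrate why the server end of a direct TCP connection learns the client's address and the request without any extra effort, here is a minimal local sketch. It is only an analogy: a loopback server stands in for Telegram, and run_server, demo and the payload are hypothetical. It says nothing about what Telegram actually stores.

```python
import socket
import threading


def run_server(listener, log):
    """Accept one client and record what the server side can observe."""
    conn, peer = listener.accept()
    with conn:
        payload = conn.recv(1024)
        # The peer address is known from the TCP connection itself,
        # regardless of what the client chooses to send.
        log.append({"peer_ip": peer[0], "peer_port": peer[1], "request": payload})
        conn.sendall(b"ok")


def demo():
    """Run one client/server exchange on loopback and return the server log."""
    log = []
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=run_server, args=(listener, log))
    server.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hypothetical-request")  # made-up payload
        client.recv(1024)  # wait for the server's reply
        client_port = client.getsockname()[1]

    server.join()
    listener.close()
    return log[0], client_port


if __name__ == "__main__":
    entry, client_port = demo()
    print("server saw:", entry)
    print("client's own port:", client_port)
```

    The client never states its address, yet the server's log entry contains it. The same holds for any service you connect to directly.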

    What happens over the wire is another story, as there are likely intermediate peers between you and Telegram. The protocol used to exchange messages with Telegram is known as MTProto. Whether you trust it, or whether it is safe, is not something I can speak to, as I am not an expert in that field.

    Likewise, I've developed the library and I trust it enough to use it. Whether others trust it enough is up to them, and this is true for any other library too (or even the official applications).