Tags: python, parsing, python-requests

Can't grab "filters" from a website with my parser


I am parsing a website that lets you choose among several filters.

For instance, when filtering by building, the link ends with a number:

https://etalongroup.ru/choose/?group=false&object%5B%5D=16

Here 16 identifies a particular building with flats, and the same goes for the other buildings. Grabbing all 900+ flats this way works fine.

But when I try to filter by other parameters (e.g. "has a balcony"), where the link looks like

https://etalongroup.ru/choose/?group=false&option%5B%5D=23

It always returns an error. This happens not only with the balcony filter but with the link for any filter other than the building one.
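Decoding the two query strings shows what actually differs between the links: the building link carries an `object[]` parameter, while the balcony link carries a separate `option[]` parameter. A quick check with the standard library (note these are the page URLs, which are not necessarily the same fields the AJAX endpoint expects):

```python
from urllib.parse import urlsplit, parse_qs

building_url = "https://etalongroup.ru/choose/?group=false&object%5B%5D=16"
balcony_url = "https://etalongroup.ru/choose/?group=false&option%5B%5D=23"

for url in (building_url, balcony_url):
    # parse_qs percent-decodes the keys, so "%5B%5D" turns back into "[]"
    print(parse_qs(urlsplit(url).query))
```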

Here is the code that successfully grabs the apartments in a building (16):

import requests
import pandas as pd

base_link_url = "https://etalongroup.ru"  # Base URL for apartment links

def fetch_data(action: str, object_id: str, offset: int, limit: int) -> dict:
    url = f"https://etalongroup.ru/bitrix/services/main/ajax.php?action={action}"
    payload = {
        'filter[object]': object_id,  # Use the passed object identifier
        'haveItem': 'true',
        'limit': limit,
        'offset': offset
    }
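    response = requests.post(url, data=payload)
    response.raise_for_status()  # Raise on HTTP errors (4xx/5xx) instead of trying to parse them as JSON
    return response.json()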

def get_data(action: str, object_id: str, max_offset: int, limit: int = 50) -> list:
    result = []
    offset = 0
    while offset < max_offset:
        response_json = fetch_data(action, object_id, offset, limit)
        # .get() also stops cleanly when an error response has 'data': null or an empty list
        if response_json.get('data') and response_json['data'][0].get('itemList'):
            for item in response_json['data'][0]['itemList']:
                # Extract additional data from each item
                flat_info = {
                    'id': item.get('id'),
                    'img': base_link_url + (item.get('img') or ''),  # Avoid TypeError if an item has no image
                    'price': item.get('price'),
                    'priceTotal': item.get('priceTotal'),
                    'area': item.get('area'),
                    'floor': item.get('floor'),
                    'title': item.get('title'),
                    'link': base_link_url + (item.get('link') or ''),  # Full link
                    'deliveryName': item.get('deliveryName'),
                    'isBooked': item.get('isBooked'),
                    'object_id': object_id  # Add object_id field
                }
                result.append(flat_info)
            offset += limit
        else:
            break
    return result

# Collect data only for the object with ID 16
all_data = []
object_id = 16
print(f"Fetching data for object_id: {object_id}")
data = get_data(action='etalongroup:filter.FlatFilter.getFlatList', object_id=str(object_id), max_offset=700)  # Adjust max_offset as needed
if data:  # Check if data is available
    all_data.extend(data)

# Create a DataFrame from all collected data
df_all = pd.DataFrame(all_data)

# Count the number of rows in the DataFrame
num_rows = len(df_all)
print(f"Total number of objects: {num_rows}")

# Print the DataFrame
print(df_all.to_string(index=False))

# Export the DataFrame to a CSV file
#df_all.to_csv('exported_data.csv', index=False)

However, if I try to grab all the flats with a balcony (object_id = 23), I get:

Fetching data for object_id: 23
Total number of objects: 0
Empty DataFrame
Columns: []
Index: []

Again, all the other numbers that correspond to a building filter (15, 16, 18, etc.) work.

My question: how can I implement filtering by other parameters?


Solution

  • Your code always sends object_id as 'filter[object]'. That works for buildings, but not for options: the API doesn't treat filter[object]=23 as the balcony feature. It reads 23 as a building ID, just like 15 or 16. Features have to be sent as an array of options rather than as a plain object filter.

    Your payload for options should follow a pattern more like filter[option][]=23.

    Options are nested inside the filter alongside the buildings, so you select the buildings and then add the option ID for the balcony.

    Make sense?
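A minimal sketch of the payload change, assuming the endpoint accepts options as `filter[option][]` as suggested above (not verified against the site; confirm the exact field name in the browser's DevTools Network tab while ticking the balcony filter). `build_payload` is a hypothetical helper meant to replace the hard-coded dict in `fetch_data`: it keeps the `filter[object]` field that the question shows working and adds the option array alongside it.

```python
from urllib.parse import urlencode


def build_payload(object_id=None, option_ids=(), offset=0, limit=50):
    """Hypothetical helper: build the getFlatList POST body with optional filters."""
    payload = {
        'haveItem': 'true',
        'limit': limit,
        'offset': offset,
    }
    if object_id is not None:
        # Building filter: the same field the question's working code sends
        payload['filter[object]'] = str(object_id)
    if option_ids:
        # Option filter as an array (ASSUMED field name, per the answer's suggestion)
        payload['filter[option][]'] = [str(o) for o in option_ids]
    return payload


# Flats with a balcony (option 23) in any building
print(urlencode(build_payload(option_ids=[23]), doseq=True))
# Flats with a balcony in building 16
print(urlencode(build_payload(object_id=16, option_ids=[23]), doseq=True))
```

Pass the dict as `data=` to the same `requests.post(url, data=payload)` call. requests sends a list value as repeated keys, so `option_ids=[23, 24]` becomes `filter[option][]=23&filter[option][]=24`; the `urlencode(..., doseq=True)` prints only preview that form body.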