Facebook page posts checker
apify · SOCIAL_MEDIA · by apify

| Metric | Value |
|---|---|
| Original Uptime | N/A |
| Original p95 | N/A |
| Best p95 | 3.0ms |
| Subscribers | 769 |
| Research Kept | 0/50 |
Benchmark History
| Type | p50 | p95 | p99 | Correctness | Errors | Result | Date |
|---|---|---|---|---|---|---|---|
| post_dev | 3.0ms | 3.0ms | 3.0ms | 0.0% | 0.00% | pass | 2026-04-09T13:59:07.037725+00:00 |
Research Iterations
#49
## Hypothesis
The `RedisCache.connect()` method creates the Redis client with default connection pool settings, meaning every burst of concurrent requests may queue waiting for a single connection. Increasing the connection pool size (via `max_connections`) will reduce p95 latency under load by allowing multiple Redis operations to execute truly in parallel rather than serializing behind a single pooled connection.
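A minimal sketch of the proposed change, assuming the cache is built on `redis.asyncio` and connects through `from_url` (the pool size, attribute names, and `decode_responses` flag are illustrative, not the service's actual code):

```python
import redis.asyncio as redis

class RedisCache:
    def __init__(self, url: str, max_connections: int = 50) -> None:
        self._url = url
        self._max_connections = max_connections
        self._client: redis.Redis | None = None

    async def connect(self) -> None:
        # max_connections is forwarded to the underlying ConnectionPool,
        # so concurrent commands no longer queue behind one connection.
        self._client = redis.Redis.from_url(
            self._url,
            max_connections=self._max_connections,
            decode_responses=True,
        )
```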
### Expected Impact
Under concurrent load, multiple coroutines calling `cache.
reverted
#48
## Hypothesis
The `ApifyClient` creates a new `httpx.AsyncClient` (with full SSL context initialization, connection pool setup, and event loop registration) on every single API call. Reusing a single persistent `httpx.AsyncClient` instance across all requests eliminates this per-call overhead and enables TCP/TLS connection reuse, which should reduce p95 latency on the upstream calls by avoiding repeated handshake costs.
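A sketch of the shared-client pattern this describes, wired through a FastAPI lifespan hook; the base URL, timeout, and pool limits are illustrative values:

```python
from contextlib import asynccontextmanager

import httpx
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # One client for the whole process: the SSL context, connection pool,
    # and event-loop hooks are built once, and keep-alive connections to
    # api.apify.com are reused across requests.
    app.state.http_client = httpx.AsyncClient(
        base_url="https://api.apify.com",
        timeout=30.0,
        limits=httpx.Limits(max_keepalive_connections=20, max_connections=100),
    )
    try:
        yield
    finally:
        await app.state.http_client.aclose()

app = FastAPI(lifespan=lifespan)
```

`ApifyClient` would then borrow `app.state.http_client` rather than constructing its own client per call.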
### Expected Impact
Each upstream HTTP call currently pays: object allocat
reverted
#47
## Hypothesis
The `RedisCache` methods perform separate `json.dumps`/`json.loads` calls using the standard library's `json` module for every cache read and write. Replacing these with `orjson` (which is 2-10x faster for serialization/deserialization) will reduce the CPU overhead on every cache hit path, directly lowering p95 latency for the high-frequency `get_run_status` and `get_dataset_items` endpoints that serve most traffic from cache.
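A minimal sketch of the swap inside `RedisCache`, assuming values are stored as JSON strings today; note `orjson.dumps` returns `bytes`, which redis-py stores as-is:

```python
import orjson
import redis.asyncio as redis

class RedisCache:
    def __init__(self, client: redis.Redis) -> None:
        self._client = client

    async def get(self, key: str):
        raw = await self._client.get(key)
        # orjson.loads accepts bytes or str, so this works whether or not
        # the client was created with decode_responses=True.
        return orjson.loads(raw) if raw is not None else None

    async def set(self, key: str, value, ttl: int) -> None:
        # orjson.dumps returns bytes; redis-py writes them unchanged.
        await self._client.set(key, orjson.dumps(value), ex=ttl)
```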
### Expected Impact
Cache hits dominate steady-state
reverted
#46
## Hypothesis
The `get_dataset_items` route reconstructs `PostItem` Pydantic models one-by-one in a list comprehension (`[PostItem(**item) for item in items]`) even when items come from cache, adding unnecessary validation overhead for large datasets. Replacing this with direct dict passthrough and using `model_construct` (which skips validation) will reduce CPU time and p95 latency for dataset-heavy responses.
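A sketch of the bypass with an illustrative `PostItem` shape (the real model's fields aren't shown in this log). `model_construct` skips validators entirely, so it is only safe if items were validated before being cached:

```python
from pydantic import BaseModel

class PostItem(BaseModel):
    post_id: str
    text: str | None = None

def items_from_cache(items: list[dict]) -> list[PostItem]:
    # model_construct performs no validation or type coercion, so malformed
    # cached data would propagate silently; this relies on the cache only
    # ever holding payloads that passed validation on first write.
    return [PostItem.model_construct(**item) for item in items]
```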
### Expected Impact
For runs returning many posts, the per-item Pydantic construct
reverted
#45
## Hypothesis
**Pre-build and reuse a single `httpx.AsyncClient` with a connection pool at the application level (stored in `app.state`), rather than creating and destroying a new client on every request in `ApifyClient`.**
The current code calls `async with httpx.AsyncClient(...) as client:` for every upstream call, which pays TCP connection establishment + TLS handshake costs on each request. By creating one `AsyncClient` with `limits=httpx.Limits(max_keepalive_connections=20, max_connection
reverted
#44
## Hypothesis
**Replace the per-request `httpx.AsyncClient` instantiation in `ApifyClient` with a single shared persistent client created at application startup and injected via `app.state`**, eliminating TCP connection setup, TLS handshake, and object allocation overhead on every upstream call.
Every call to `start_run`, `get_run_status`, or `get_dataset_items` currently creates a new `httpx.AsyncClient`, opens a fresh TCP+TLS connection to `api.apify.com`, completes the request, then tears i
reverted
#43
## Hypothesis
**Pipeline the Redis `GET` (cache check) and lock-acquisition `SET NX` into a single round-trip using a Lua script or Redis pipeline in `get_dataset_items` and `start_run`**, eliminating one full Redis network round-trip on every cache-miss path.
Currently, each cache-miss makes two sequential Redis calls: `GET` (cache check) → `SET NX` (acquire lock). Each call incurs a separate network RTT to Redis (~0.5–2 ms each). By combining them into a single pipelined or atomic Lua operat
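A sketch of the combined step as a Lua script registered through redis-py; the key arguments, TTL, and state labels are illustrative. Because the script executes atomically server-side, the GET and the `SET NX` cost one round-trip instead of two:

```python
import redis.asyncio as redis

# Returns the cached value when present; otherwise attempts to take the
# coalesce lock (SET NX EX) in the same atomic server-side step.
GET_OR_LOCK = """
local cached = redis.call('GET', KEYS[1])
if cached then
    return {'HIT', cached}
end
if redis.call('SET', KEYS[2], '1', 'NX', 'EX', ARGV[1]) then
    return {'LOCKED', ''}
end
return {'WAIT', ''}
"""

async def get_or_acquire_lock(client: redis.Redis, cache_key: str, lock_key: str, ttl: int = 30):
    # register_script falls back from EVALSHA to EVAL automatically; in
    # real code the Script object would be created once and reused.
    script = client.register_script(GET_OR_LOCK)
    state, value = await script(keys=[cache_key, lock_key], args=[ttl])
    return state, value or None
```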
reverted
#42
## Hypothesis
**Use `ujson` (or `orjson`) in place of the stdlib `json` for all Redis serialization/deserialization in `cache.py`, reducing the CPU overhead of `json.dumps`/`json.loads` on the hot cache path.**
Every cache `get` and `set` call performs a full JSON serialize/deserialize round-trip using Python's stdlib `json` module. Replacing this with `orjson` — which is 2–10× faster for typical dict/list payloads — will reduce the latency added by the cache layer itself, particularly for the `get_d
reverted
#41
## Hypothesis
**Eliminate the per-request `httpx.AsyncClient` construction overhead by creating a single shared `AsyncClient` at application startup and reusing it across all requests via `app.state`.**
Each call to `start_run`, `get_run_status`, or `get_dataset_items` currently creates and tears down a new `httpx.AsyncClient` (including SSL context initialization, connection pool setup, etc.). A persistent shared client with a connection pool allows TCP connections to be reused across request
reverted
#40
## Hypothesis
**Create the `httpx.AsyncClient` once per `ApifyClient` instance (stored at application startup) rather than instantiating a new client on every request.** Each `async with httpx.AsyncClient(...)` call currently pays TCP connection setup cost (and TLS handshake overhead) on every upstream call; a persistent client with a connection pool will reuse existing connections, eliminating that latency from the p95 path.
### Expected Impact
The dominant latency source for cache-miss path
reverted
#39
## Hypothesis
**Pre-compile the Pydantic `RunInfo`, `PostItem`, and response models using `model_construct()` instead of full validation in the route handlers**, bypassing redundant field validation on data that has already been validated/shaped by the service layer.
In the route handlers, data returned from `FacebookService` is already a plain dict from a trusted upstream (Apify) that has passed through the service layer — re-running Pydantic's full validator chain (coercion, constraint check
reverted
#38
## Hypothesis
**Eliminate the redundant `json.dumps`/`json.loads` round-trip in `RedisCache.set` by using the already-serialized string directly, and skip the intermediate re-serialization in `get` by returning the raw string to callers that immediately re-serialize it. Most impactfully, replace the `json.loads(raw)` call in the hot-path `get` method with `orjson.loads`, which is 2-3× faster for deserialization.**
The Redis `get` call is on the critical path for every cac
reverted
#37
## Hypothesis
The `wait_for_coalesced_result` method in `cache.py` uses a fixed 0.5-second polling interval, meaning coalesced (non-lock-winning) requests for `start_run` and `get_dataset_items` wait at least 500ms before receiving a result. Reducing this poll interval to 50ms for the first few attempts (exponential back-off starting at 50ms) will dramatically cut the p95 latency for requests that hit the coalesce path, since the lock winner typically completes the upstream call within 1-2 seco
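A sketch of the backoff loop, assuming the method polls a cache key until the lock winner writes a result (the 50ms floor, 500ms ceiling, and timeout are illustrative):

```python
import asyncio

async def wait_for_coalesced_result(cache, key: str, timeout: float = 10.0):
    # Start at 50ms and double up to a 500ms ceiling: fast winners are
    # picked up almost immediately, and slow ones cost no more polling
    # traffic than the current fixed 500ms interval.
    delay, waited = 0.05, 0.0
    while waited < timeout:
        result = await cache.get(key)
        if result is not None:
            return result
        await asyncio.sleep(delay)
        waited += delay
        delay = min(delay * 2, 0.5)
    return None
```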
reverted
#36
## Hypothesis
**Persist a single `httpx.AsyncClient` instance at application startup (stored on `app.state`) and reuse it across all requests, rather than creating and destroying a new client per API call.**
Every call to `start_run`, `get_run_status`, or `get_dataset_items` currently executes `async with httpx.AsyncClient(...) as client:`, which allocates a new client, establishes a new TCP connection (including TLS handshake to `api.apify.com`), and tears it down — adding 50–200 ms of connec
reverted
#35
## Hypothesis
**Use `response_class=ORJSONResponse` for all route handlers to replace Pydantic's default JSON serialization with the faster `orjson` encoder, eliminating the double-serialization overhead (Pydantic model → dict → `json.dumps`) on every successful response path.**
The default `JSONResponse` uses Python's `json.dumps`, and Pydantic's `.model_dump()` + re-serialization adds measurable CPU overhead on every request. `ORJSONResponse` (backed by `orjson`) is 3-10x faster at serializa
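A sketch of the change; `ORJSONResponse` ships with FastAPI but requires the `orjson` package to be installed, and the route shown is illustrative:

```python
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse

# default_response_class applies to every route, so individual
# handlers don't need to be touched.
app = FastAPI(default_response_class=ORJSONResponse)

@app.get("/runs/{run_id}/status")
async def get_run_status(run_id: str) -> dict:
    # The returned dict is encoded by orjson instead of json.dumps.
    return {"run_id": run_id, "status": "SUCCEEDED"}
```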
reverted
#34
## Hypothesis
**Skip the Redis cache lookup for `start_run` when there is no pre-existing cached result to read — the current code always calls `acquire_coalesce_lock` (a Redis SETNX round-trip) even on the hot path where the job simply needs to be started, adding ~1ms of Redis RTT before every upstream call. Instead, first check if a cached result already exists for the payload hash, and only enter the coalescing lock path on a cache miss.**
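A sketch of the reordered method, reusing the lock helpers this log already names (`acquire_coalesce_lock`, `release_coalesce_lock`, `wait_for_coalesced_result`); the `run_key` helper and TTL are assumptions:

```python
async def start_run(self, payload_hash: str, payload: dict) -> dict:
    cache_key = self._cache.run_key(payload_hash)  # hypothetical key helper

    # Fast path: a cached result means no SET NX round-trip at all.
    cached = await self._cache.get(cache_key)
    if cached is not None:
        return cached

    # Only a genuine miss pays for the coalescing lock.
    if await self._cache.acquire_coalesce_lock(cache_key):
        try:
            result = await self._client.start_run(payload)
            await self._cache.set(cache_key, result, ttl=60)  # illustrative TTL
            return result
        finally:
            await self._cache.release_coalesce_lock(cache_key)
    return await self._cache.wait_for_coalesced_result(cache_key)
```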
### Expected Impact
The `start_run` path currently
reverted
#33
## Hypothesis
**Parallelize the two sequential Redis operations in `get_dataset_items` — the dataset cache lookup and the run-status cache lookup — using `asyncio.gather`, and similarly parallelize the cache lookup + circuit-breaker check pattern.**
In `get_dataset_items`, the code first does `await self._cache.get(cache_key)`, then later (after the upstream call) does a second `await self._cache.get(self._cache.run_status_key(run_id))` sequentially. More impactfully, we can parallelize the in
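A sketch of the concurrent read (key formats are illustrative). With `asyncio.gather` the two GETs overlap, so the combined wait is roughly the slower of the two RTTs rather than their sum, provided the Redis pool allows two in-flight commands:

```python
import asyncio

async def read_caches(cache, run_id: str, dataset_id: str):
    # Fire both independent GETs at once instead of awaiting them
    # back-to-back.
    items, status = await asyncio.gather(
        cache.get(f"dataset_items:{dataset_id}"),  # illustrative key formats
        cache.get(f"run_status:{run_id}"),
    )
    return items, status
```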
reverted
#32
## Hypothesis
**Reuse a single persistent `httpx.AsyncClient` instance (with a connection pool) at the application level instead of creating and tearing down a new client on every HTTP call in `ApifyClient`.**
Each call to `start_run`, `get_run_status`, or `get_dataset_items` currently does `async with httpx.AsyncClient(...) as client:` which allocates a new client, establishes a fresh TCP/TLS connection to `api.apify.com`, and then tears it all down — adding 50–200 ms of TLS handshake overhea
reverted
#31
## Hypothesis
**Use `ujson` for all JSON serialization/deserialization in the Redis cache layer instead of the standard `json` module, and also apply it to the httpx response parsing in the Apify client.**
The standard `json` module is significantly slower than `ujson` for both encoding and decoding. Since every cache hit/miss involves `json.loads`/`json.dumps`, and every upstream API response involves JSON parsing, switching to `ujson` (which is a C extension) should reduce CPU time on the ho
reverted
#30
## Hypothesis
**Cache the parsed `Authorization` token and inline the extraction logic to avoid repeated string splitting; more importantly, add an `asyncio.Lock`-based in-process coalescing layer in front of Redis to eliminate the Redis round-trips for coalesce-lock acquire/release on `start_run` and `get_dataset_items`, replacing the two Redis calls (`SETNX` + `DEL`) with a single in-memory check.**
The `acquire_coalesce_lock` + `release_co
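A sketch of an in-process coalescer, written with a per-key `Future` map rather than a bare `asyncio.Lock` so that losers receive the winner's result directly; names are illustrative, and this only deduplicates within one worker process:

```python
import asyncio

class LocalCoalescer:
    """Deduplicates concurrent identical calls within a single process."""

    def __init__(self) -> None:
        self._inflight: dict[str, asyncio.Future] = {}

    async def run(self, key: str, fn):
        # Losers await the winner's Future: no Redis SETNX/DEL round-trips.
        if key in self._inflight:
            return await self._inflight[key]
        fut = asyncio.get_running_loop().create_future()
        self._inflight[key] = fut
        try:
            result = await fn()
            fut.set_result(result)
            return result
        except Exception as exc:
            fut.set_exception(exc)
            raise
        finally:
            del self._inflight[key]
```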
reverted
#29
## Hypothesis
**Replace per-request `httpx.AsyncClient` instantiation (which incurs TCP connection setup overhead on every call) with a single module-level persistent client whose connection pool is configured via `limits=httpx.Limits(...)`, initialized once at application startup and injected into `ApifyClient`.**
Every call to `start_run`, `get_run_status`, or `get_dataset_items` currently creates a new `httpx.AsyncClient` via `async with httpx.AsyncClient(...) as client`, which tears down and re-establish
reverted
#28
## Hypothesis
**Pre-compute the `Authorization` header string once per request in `ApifyClient` rather than rebuilding it on every call, and eliminate the per-request `httpx.AsyncClient` construction overhead by using a module-level shared client with connection pooling configured at import time.**
The current code constructs a new `httpx.AsyncClient` (which allocates a new connection pool, SSL context, etc.) on every single API call inside an `async with` block. This means every upstream requ
reverted
#27
## Hypothesis
**Use a module-level persistent `httpx.AsyncClient` instance with a connection pool (configured via `limits`) that is created once at startup and reused across all requests, rather than creating and tearing down a new `AsyncClient` on every API call.**
Each call to `start_run`, `get_run_status`, and `get_dataset_items` currently does `async with httpx.AsyncClient(...) as client:` which incurs TCP connection establishment overhead (and TLS handshake to `api.apify.com`) on every request. By
reverted
#26
## Hypothesis
**Replace `json.loads` / `json.dumps` in `RedisCache.get` and `RedisCache.set` with `orjson` for faster serialization, and simultaneously eliminate the per-call `httpx.AsyncClient` construction overhead in `ApifyClient` by instantiating a single shared `AsyncClient` at module level (created once, reused across all requests).**
The dominant latency contributors on the hot path are: (1) Python's stdlib `json` is significantly slower than `orjson` for both serialization and deserial
reverted
#25
## Hypothesis
**Disable Pydantic's response model serialization overhead by removing `response_model` from route decorators and returning pre-built `JSONResponse` objects directly**, eliminating the double-serialization pass (Pydantic validation + JSON encoding) that occurs on every response.
When FastAPI uses `response_model`, it validates and re-serializes the returned object through Pydantic even if you already have a dict/model. By returning `JSONResponse` directly and dropping `response_m
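A sketch of one converted route (handler and payload are illustrative). Annotating the return type as `JSONResponse` and omitting `response_model` makes FastAPI pass the response through untouched:

```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

# No response_model here, so FastAPI returns the JSONResponse as-is
# instead of re-validating and re-serializing through Pydantic.
@app.get("/runs/{run_id}/status")
async def get_run_status(run_id: str) -> JSONResponse:
    status = {"run_id": run_id, "status": "SUCCEEDED"}  # already-shaped dict
    return JSONResponse(content=status)
```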
reverted
#24
## Hypothesis
**Replace the polling-based `wait_for_coalesced_result` (which uses `asyncio.sleep(0.5)` intervals adding up to 500ms+ of wasted latency) with a Redis pub/sub notification so waiting coroutines are woken immediately when the winner writes the result.**
When multiple concurrent requests coalesce on the same cache key, losers currently poll every 500ms — meaning p95 latency for coalesced requests includes at least one full 500ms sleep cycle. By publishing a notification on the cach
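A sketch of the waiter side using redis-py's asyncio pub/sub; the channel naming, fallback timeout, and the winner's matching `PUBLISH` after writing the result are all assumptions. The cache re-check after subscribing closes the race where the winner publishes before the waiter is listening:

```python
import asyncio
import redis.asyncio as redis

async def wait_for_result(client: redis.Redis, cache_key: str, timeout: float = 10.0):
    pubsub = client.pubsub()
    await pubsub.subscribe(f"done:{cache_key}")  # channel name is illustrative
    try:
        # Re-check the cache after subscribing: the winner may have
        # published before this coroutine started listening.
        cached = await client.get(cache_key)
        if cached is not None:
            return cached
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while loop.time() < deadline:
            # Wakes as soon as the winner publishes, instead of sleeping
            # in fixed 500ms increments.
            msg = await pubsub.get_message(ignore_subscribe_messages=True, timeout=1.0)
            if msg is not None:
                return await client.get(cache_key)
        return None
    finally:
        await pubsub.unsubscribe()
```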
reverted
#23
## Hypothesis
**Use a module-level persistent `httpx.AsyncClient` with a connection pool (initialized at app startup, shared across all requests) instead of creating and tearing down a new client on every API call.**
Every request to `apify_client.py` currently does `async with httpx.AsyncClient(...) as client:` which pays TCP connection establishment + TLS handshake overhead on each call. A persistent client with `limits=httpx.Limits(max_keepalive_connections=10, max_connections=20)` will reu
reverted
#22
## Hypothesis
**Reduce the Redis round-trips in `get_run_status` and `get_dataset_items` by using `asyncio.gather` to fire the circuit-breaker check and cache lookup concurrently — but more impactfully, eliminate the second Redis `GET` call inside `get_dataset_items` (the run-status cache lookup) by folding it into a single pipelined request.**
Specifically, the `get_dataset_items` method makes two sequential Redis `GET` calls (one for the dataset cache key, one for the run-status key to decid
reverted
#21
## Hypothesis
**Skip Redis cache lookup for the `start_run` path when no coalesce lock exists, and eliminate the double Redis round-trip (lock acquire + cache get) by checking the cache *before* attempting to acquire the coalesce lock.**
Currently in `get_dataset_items` and `start_run`, the code acquires a coalesce lock and *then* waits, but for `get_run_status` the cache is checked first (correct pattern). The `start_run` path does zero cache read before the lock — meaning every request pays
reverted
#20
## Hypothesis
**Maintain a single, long-lived `httpx.AsyncClient` with a connection pool at the application level (stored on `app.state`), shared across all requests, instead of creating and tearing down a new client per API call.**
Every request to `ApifyClient` currently executes `async with httpx.AsyncClient(...) as client:`, which creates a new client, establishes a fresh TCP connection (or waits for one from the OS), completes the TLS handshake, and then closes everything — paying this ov
reverted
#19
## Hypothesis
**Use `orjson` for all JSON serialization/deserialization in the Redis cache layer**, replacing the standard `json` module with `orjson`, a Rust-based implementation that is typically 2-10x faster for both `dumps` and `loads` operations.
The cache `get` and `set` operations are on the critical path for every request. Faster JSON encoding/decoding reduces the CPU time spent serializing cached payloads, directly cutting latency for cache-hit paths (which are the fast path)
reverted
#18
## Hypothesis
**Eliminate the redundant `await self._cache.release_coalesce_lock(cache_key)` call in the `finally` block of `start_run` and `get_dataset_items` by replacing it with a single atomic `SET ... EX ... GET` pipeline that combines the lock acquisition, result storage, and lock release into fewer round-trips — but more practically, skip the coalescing lock entirely for `get_dataset_items` when the cache already missed, since the lock+poll adds at least one extra Redis RTT on every cach
reverted
#17
## Hypothesis
**Pre-compute and reuse `httpx` connection pools by creating a single `httpx.AsyncClient` at application startup (stored in `app.state`) rather than instantiating a new client (and its underlying connection pool) on every single API call.**
Each call to `start_run`, `get_run_status`, and `get_dataset_items` currently does `async with httpx.AsyncClient(...) as client:`, which creates a new client, establishes a fresh TCP connection (including TLS handshake) to `api.apify.com`, and
reverted
#16
## Hypothesis
**Disable Pydantic response model validation on the route handlers by setting `response_model=None` and returning pre-built dicts directly as `JSONResponse`, eliminating the per-request Pydantic serialization/validation overhead on the hot response path.**
The current routes construct Pydantic model instances (`RunJobResponse`, `RunStatusResponse`, `DatasetItemsResponse`) on every request, which triggers field validation, type coercion, and JSON serialization through FastAPI's re
reverted
#15
## Hypothesis
**Add `response_model=None` and return raw `JSONResponse` objects directly in the route handlers to bypass FastAPI's Pydantic response serialization/validation overhead on the hot path.**
FastAPI's default behavior validates and re-serializes the return value through the `response_model` on every response, which involves Pydantic model instantiation, field validation, and a second `json.dumps` pass. By constructing the `JSONResponse` directly from already-validated dicts (skippin
reverted
#14
## Hypothesis
**Pipeline the Redis `GET` (cache check) and `SET` operations using `asyncio.gather` where independent lookups occur together, and more importantly: in `get_dataset_items`, eliminate the sequential Redis round-trip to check run status by storing the terminal-status flag inline with the dataset cache key — replacing the extra `await self._cache.get(self._cache.run_status_key(run_id))` call that currently happens *after* the expensive upstream fetch.**
The extra Redis lookup in `ge
reverted
#13
## Hypothesis
**Replace per-request `httpx.AsyncClient` instantiation with a module-level persistent client that is initialized once at application startup and reused across all requests, avoiding the TCP connection setup overhead on every API call.**
Every method in `ApifyClient` currently creates a new `httpx.AsyncClient` via `async with httpx.AsyncClient(...) as client:`, which tears down and recreates the underlying connection pool on each call. Even with HTTP keep-alive, the client object
reverted
#12
## Hypothesis
**Use `ujson` (or `orjson`) instead of the standard `json` module for all serialization/deserialization in the cache layer, and avoid double-encoding by storing raw bytes directly.**
The cache layer currently calls `json.dumps`/`json.loads` on every get/set operation using Python's stdlib `json`, which, even with its C accelerator, is significantly slower than alternatives like `orjson`. Since every request path touches the Redis cache (at minimum a GET, often a SET),
reverted
#11
## Hypothesis
**Reuse a single persistent `httpx.AsyncClient` instance (with connection pooling) across all requests by creating it once at application startup and storing it on `app.state`, rather than instantiating a new client per request.**
Every call to `start_run`, `get_run_status`, or `get_dataset_items` currently creates a new `httpx.AsyncClient` via `async with httpx.AsyncClient(...) as client:`, which means a fresh TCP connection (and TLS handshake) to `api.apify.com` is established
reverted
#10
## Hypothesis
**Skip the Redis cache lookup for `start_run` when no coalescing lock exists, and avoid the double Redis round-trips (lock acquire + cache get) by checking the cache first before attempting to acquire the coalesce lock.**
Currently in `start_run`, the code immediately tries to acquire a coalesce lock without first checking if a cached result already exists for that payload hash. This means every cache-hit scenario still pays for a `SET NX` Redis round-trip before it can return. B
reverted
#9
## Hypothesis
**Parallelize the Redis cache lookup and circuit-breaker check with the actual upstream call preparation by overlapping the Redis `get` for `run_status` with a concurrent Redis `get` for the dataset cache key in `get_dataset_items`, and more importantly: pipeline the two Redis operations (`get dataset_key` + `get run_status_key`) in `get_dataset_items` into a single round-trip using `asyncio.gather`.**
Specifically, in `get_dataset_items`, the code currently does a sequential `ca
reverted
#8
## Hypothesis
**Replace the blocking `json.dumps`/`json.loads` calls in the cache layer with `orjson`, which is a Rust-backed JSON library that is 2-10x faster for serialization/deserialization, reducing the CPU-bound overhead on every cache read and write path.**
Every cache hit (the fast path) still pays the cost of `json.loads` on the raw Redis string, and every cache miss pays `json.dumps` before writing. With `orjson`, these operations become significantly cheaper, directly reducing laten
reverted
#7
## Hypothesis
**Avoid creating a new `httpx.AsyncClient` on every request by using a module-level persistent client with connection pooling in `apify_client.py`.**
Every call to `start_run`, `get_run_status`, or `get_dataset_items` currently creates a new `httpx.AsyncClient` via `async with httpx.AsyncClient(...) as client:`, which forces a fresh TCP/TLS handshake to `api.apify.com` on each request. A persistent client reuses existing connections from its pool, eliminating the ~50-150ms TLS ne
reverted
#6
## Hypothesis
**Pre-serialize JSON once in `cache.set` and skip double-serialization by storing the raw string, while also avoiding redundant `json.loads`/`json.dumps` round-trips in the hot path by using `json.loads` only once on cache hit.**
The current code calls `json.dumps(value)` in `cache.set` and `json.loads(raw)` in `cache.get` on every cache interaction — but in `facebook_service.py` the values being stored are already plain Python dicts/lists that came from `httpx`'s `resp.json()` (
reverted
#5
## Hypothesis
**Replace the 0.5-second fixed polling interval in `wait_for_coalesced_result` with exponential backoff starting at 50ms, and reduce the coalescing lock TTL mismatch overhead by using `asyncio.gather` to parallelize the Redis cache read and circuit-breaker-safe path in `get_run_status` and `get_dataset_items`.**
More specifically: the `wait_for_coalesced_result` method currently sleeps 500ms between polls, meaning coalesced requests always wait at least 500ms even if the winner f
reverted
#4
## Hypothesis
**Reuse a persistent `httpx.AsyncClient` with a connection pool at the application level (initialized in lifespan) instead of creating and destroying a new client on every request.**
Each call to `ApifyClient.start_run/get_run_status/get_dataset_items` currently does `async with httpx.AsyncClient(...) as client:` which creates a new client, establishes a new TCP+TLS connection, and tears it down — adding ~50-200ms of overhead per request (TLS handshake alone). A shared client wit
reverted
#3
## Hypothesis
**Pipeline Redis cache reads and writes using `asyncio.gather` where multiple independent Redis operations occur sequentially**, specifically in `get_dataset_items` where a cache check for dataset items is followed by a separate cache check for run status — replacing the sequential `await` calls with concurrent execution.
### Expected Impact
In `get_dataset_items`, after fetching from upstream, the code does a sequential `await self._cache.get(self._cache.run_status_key(run_id))
reverted
#2
## Hypothesis
**Replace per-request `httpx.AsyncClient` instantiation with a module-level persistent client that uses connection pooling and keepalive**, fixing the root cause that caused iter 0/1 to be reverted — by initializing the client during app lifespan (so it's properly managed) and injecting it via `app.state` rather than creating a new TCP connection on every request.
The previous attempts likely failed because the client was created as a module-level global without proper lifecycle
reverted
#1
## Hypothesis
**Use a persistent (singleton) `httpx.AsyncClient` with connection pooling instead of creating a new client per request.**
Currently, every call to `start_run`, `get_run_status`, or `get_dataset_items` creates a new `httpx.AsyncClient` inside an `async with` block, which incurs TCP handshake + TLS negotiation overhead on every upstream call. By instantiating a single `AsyncClient` at app startup and reusing it across all requests, connections to `api.apify.com` will be pooled and
reverted
#0
## Hypothesis
**Reuse a persistent `httpx.AsyncClient` instance across requests instead of creating and tearing down a new client per API call.** Each call to `start_run`, `get_run_status`, and `get_dataset_items` currently opens a new TCP connection (including TLS handshake to `api.apify.com`), which adds 50–300ms of overhead to every uncached request and dominates p95 latency.
### Why This Matters
In `services/apify_client.py`, every method does:
```python
async with httpx.AsyncClient(timeout=...) as client:
    ...
```
reverted