Research Log
Autoresearcher iterations — Karpathy-style optimization loop
Total Iterations: 200
Kept: 0 (0% acceptance rate)
Valid Patches: 0 (0% patch success)
Avg Improvement: 0.0% (p95 latency reduction, kept)
Iteration Performance (chart; legend: Kept, Reverted)
| API | Iteration | Hypothesis | Patch | p95 Before | p95 After | Delta | Result | Date |
|---|---|---|---|---|---|---|---|---|
| houzz | 49 | ## Hypothesis The `UsageCapMiddleware` calls `_current_month()` on every request, which invokes `datetime.now(timezone.utc).strftime(...)` — but more importantly, the Redis key is constructed via string concatenation on every request rather than being cached. The real opportunity is that `_get_redis()` uses a global singleton but re-checks `if _redis_client is None` without any fast-path optimization, and the `incr` + conditional `expire` are two sequential round-trips when the key already exis | invalid | ms | ms | % | skip | 2026-06-11T11:21:20.990122+00:00 |
| houzz | 48 | ## Hypothesis The `ApifyClient._fetch_dataset()` method fetches all dataset items in a single blocking HTTP call with no pagination or streaming, but more critically, it creates a new `httpx.AsyncClient` on each request (via the `http` property check) because the property only guards against `None` or closed clients — yet the real bottleneck is that the Apify polling loop fires a status-check HTTP request on every iteration even after the run has clearly just started, burning unnecessary round- | invalid | ms | ms | % | skip | 2026-06-11T11:20:28.541112+00:00 |
| houzz | 47 | ## Hypothesis The `_coalesced_fetch` function stores results in `_in_flight_results` and then immediately deletes them in the `finally` block — before waiting callers have had a chance to read the result after `event.wait()` — creating a race where late waiters get `None` instead of the actual result. By keeping result storage separate from the cleanup (only cleaning up after the event is set and a brief window passes), or better yet, passing results through the event mechanism via a local copy | invalid | ms | ms | % | skip | 2026-06-11T11:19:43.860730+00:00 |
| houzz | 46 | ## Hypothesis The `CircuitBreaker.call()` method acquires `self._lock` twice — once before the upstream call and once after — but the actual upstream call (which can take hundreds of milliseconds) runs **outside** the lock. However, the post-call lock acquisitions add two extra asyncio lock contention points per request. More importantly, `_should_attempt()` is a synchronous method that mutates `self._state` (OPEN→HALF_OPEN transition) **without holding the lock**, meaning the state transition | invalid | ms | ms | % | skip | 2026-06-11T11:19:07.248030+00:00 |
| houzz | 45 | ## Hypothesis The `ApifyClient.run_actor()` method uses a fixed `POLL_INTERVAL` (default 3.0s) sleep before every status check, meaning a run that completes in 3.1 seconds will wait a full 6 seconds before being detected. Using exponential backoff with a short initial interval (0.5s) that caps at `POLL_INTERVAL` will detect fast completions much sooner while not over-polling slow runs. ## Expected Impact For actor runs completing in under 10 seconds, p95 latency could drop by 2–8 seconds depe | invalid | ms | ms | % | skip | 2026-06-11T11:18:43.435701+00:00 |
| houzz | 44 | ## Hypothesis The `_coalesced_fetch` function cleans up `_in_flight` and `_in_flight_results` inside a `finally` block that re-acquires `_in_flight_lock` **before** calling `event.set()`, meaning all waiting coroutines are unblocked only after the lock is released — but more critically, the result is deleted from `_in_flight_results` immediately after the event fires, creating a race where waiting coroutines may read `None` instead of the actual result. More importantly, the cleanup sequence ac | invalid | ms | ms | % | skip | 2026-06-11T11:18:04.715568+00:00 |
| houzz | 43 | ## Hypothesis The `ApifyClient` recreates the `httpx.AsyncClient` on every request if it has been closed (checking `self._http.is_closed` on the `http` property), but more critically, it never reuses TCP connections across requests because the client is created fresh per-process-start and the connection pool is never warmed. The real latency win is to pre-connect the `httpx.AsyncClient` during `lifespan` startup and configure explicit connection limits/keepalive so that the TLS handshake and TC | invalid | ms | ms | % | skip | 2026-06-11T11:17:09.025800+00:00 |
| houzz | 42 | ## Hypothesis The `UsageCapMiddleware` creates a new Redis connection check on every request by calling `_get_redis()` which tests `if _redis_client is None` without any connection pooling validation, but more importantly the middleware performs an `incr` followed by a conditional `expire` as two separate round-trips — these can be collapsed into a single Lua script execution, eliminating one full Redis RTT (~1-2ms) on every non-cached request. **Expected impact:** Every request that passes th | invalid | ms | ms | % | skip | 2026-06-11T11:16:50.424250+00:00 |
| houzz | 41 | ## Hypothesis The `CircuitBreaker.call()` method holds `self._lock` only during the guard check and the post-result state update, but **not** during the actual upstream call — yet it still acquires the lock **twice per request** (once before, once after). For the common `CLOSED` state path, these lock acquisitions serialize all concurrent requests through an `asyncio.Lock` unnecessarily. Since the state machine only needs atomic reads/writes of a few integers (no I/O), we can replace the `async | invalid | ms | ms | % | skip | 2026-06-11T11:16:04.586527+00:00 |
| houzz | 40 | ## Hypothesis The `_coalesced_fetch` function acquires `_in_flight_lock` three times per request (once to check/register, once to store results, once to clean up), and the cleanup path holds the lock while calling `event.set()`, serializing all waiting coroutines through the lock before they can proceed. By restructuring so the lock is released before `event.set()` and results/cleanup are batched into a single lock acquisition, we eliminate unnecessary lock contention that adds latency spikes u | invalid | ms | ms | % | skip | 2026-06-11T11:15:18.836050+00:00 |
| houzz | 39 | ## Hypothesis The `UsageCapMiddleware` performs two sequential Redis round-trips per request (`INCR` then `EXPIRE`) when a new key is created, adding latency on the first request of each month. Using a Lua script (or Redis pipeline) to atomically combine the `INCR` and conditional `EXPIRE` into a single network round-trip will halve Redis overhead on the middleware hot path. **Expected impact:** For the common case where the key already exists (mid-month), only one round-trip occurs anyway. Bu | invalid | ms | ms | % | skip | 2026-06-11T11:15:01.996874+00:00 |
| houzz | 38 | ## Hypothesis The `ApifyClient` uses a polling loop that always waits a full `POLL_INTERVAL` (3 seconds) before checking status for the first time, and then re-authenticates/rebuilds params on every poll by calling `self._auth_params()` and constructing a new URL string each iteration. More critically, the actor run URL could be fetched with `waitForFinish` query parameter to let Apify's server hold the connection until completion (long-polling), eliminating the entire client-side polling loop | invalid | ms | ms | % | skip | 2026-06-11T11:14:10.692045+00:00 |
| houzz | 37 | ## Hypothesis The `CacheService` uses two separate Redis round-trips for every cache hit path — one `GET` in the service layer and potentially one `PING` in `health_check` — but more critically, the `UsageCapMiddleware` always does an `INCR` + conditional `EXPIRE` as two sequential Redis commands on every non-cached request, adding ~1-2ms of serial network latency. Replacing the `INCR`+`EXPIRE` pair with a single Lua script (atomic `INCR`+`EXPIRE` in one round-trip) eliminates one Redis RTT fro | invalid | ms | ms | % | skip | 2026-06-11T11:13:46.323361+00:00 |
| houzz | 36 | ## Hypothesis The `_coalesced_fetch` function stores exceptions in `_in_flight_results` but then cleans up **both** the event and the result dict entries inside the `finally` block *before* calling `event.set()` — meaning waiting coroutines that wake up immediately after `event.set()` will find no result in `_in_flight_results` and silently return `[]` instead of re-raising the exception or returning the actual data. More critically for latency: the global `_in_flight_lock` is acquired **three | invalid | ms | ms | % | skip | 2026-06-11T11:13:03.096666+00:00 |
| houzz | 35 | ## Hypothesis The `ApifyClient.run_actor()` method waits a full `POLL_INTERVAL` (3 seconds) before making its *first* status check, unconditionally sleeping before any polling attempt. Using an exponential backoff starting at a short interval (e.g., 0.5s) and capping at `POLL_INTERVAL` would reduce the idle wait for fast actor runs while preserving poll rate for long-running ones. ## Expected Impact For actor runs that complete quickly (e.g., small `maxItems` or cache-warm upstream), the firs | invalid | ms | ms | % | skip | 2026-06-11T11:12:37.848598+00:00 |
| houzz | 34 | ## Hypothesis The `ApifyClient` instantiates a new `httpx.AsyncClient` on every request (or after it's closed) rather than reusing a persistent connection pool. By pre-creating the client at module load time with a connection pool sized for concurrency, we eliminate TCP/TLS handshake overhead and connection establishment latency on every upstream call, which is the dominant source of p95 latency for non-cached requests. The current `http` property recreates the client whenever `self._http is N | invalid | ms | ms | % | skip | 2026-06-11T11:12:03.944466+00:00 |
| houzz | 33 | ## Hypothesis The `_coalesced_fetch` function acquires `_in_flight_lock` (a global asyncio lock) **three separate times** per request — once to check/register, once to store the result, and once in the `finally` block to clean up — and the cleanup path does two separate `async with _in_flight_lock` acquisitions (store result + cleanup are separate). For the "first caller" path, reducing these to **two lock acquisitions** (one for register, one combined result-store+cleanup) eliminates a redunda | invalid | ms | ms | % | skip | 2026-06-11T11:11:37.623330+00:00 |
| houzz | 32 | ## Hypothesis The `CircuitBreaker.call()` method acquires `self._lock` twice per successful request (once to check state, once to record success), but more critically it holds the lock during the *entire upstream call* in the check phase — actually no, looking carefully, it only holds it briefly. The real issue is that `_coalesced_fetch` acquires `_in_flight_lock` (a single global asyncio.Lock) **three separate times** per request, and the second acquisition (by waiting callers reading `_in_fli | invalid | ms | ms | % | skip | 2026-06-11T11:11:06.183079+00:00 |
| houzz | 31 | ## Hypothesis The `UsageCapMiddleware` creates a new Redis connection lazily on the first request via `_get_redis()`, and uses a module-level global `_redis_client` that is initialized outside the async event loop context, potentially causing connection pool contention or re-creation on each worker. By pre-initializing the Redis client at app startup using the same connection pool as `CacheService` (reusing `CacheService._client`), we eliminate the redundant second Redis connection and its asso | invalid | ms | ms | % | skip | 2026-06-11T11:10:44.817449+00:00 |
| houzz | 30 | ## Hypothesis The `ApifyClient.run_actor()` method starts a run and only begins polling after a full `POLL_INTERVAL` sleep (default 3.0s), even though Apify itself supports a `waitForFinish` query parameter that causes the run-start POST to block server-side for up to 60 seconds and return the finished run status directly — eliminating all polling round-trips for fast actor runs and reducing the first-poll overhead for slower ones. **Expected impact:** For actor runs that complete in under 60 | invalid | ms | ms | % | skip | 2026-06-11T11:10:01.343249+00:00 |
| houzz | 29 | ## Hypothesis The `_coalesced_fetch` function acquires `_in_flight_lock` three separate times — once to check/register the event, once to store the result, and once in the `finally` block to clean up — causing repeated lock contention on every upstream fetch. Consolidating the result storage and cleanup into a single `finally` block acquisition eliminates one redundant lock round-trip on the hot path, reducing p95 latency for cache-miss requests under concurrent load. ### Expected Impact Each | invalid | ms | ms | % | skip | 2026-06-11T11:09:36.416532+00:00 |
| houzz | 28 | ## Hypothesis The `ApifyClient` polling loop uses a fixed `POLL_INTERVAL` (default 3.0s) between status checks, meaning every request incurs at least one full 3-second sleep before even the first status check. Switching to an **exponential backoff with a short initial delay** (e.g., starting at 0.5s, capping at the configured `POLL_INTERVAL`) will dramatically reduce p95 latency for fast actor runs that complete in under 10 seconds, without meaningfully increasing poll frequency for slow runs. | invalid | ms | ms | % | skip | 2026-06-11T11:09:10.955129+00:00 |
| houzz | 27 | ## Hypothesis The `CacheService._build_key()` method calls `json.dumps()` with `sort_keys=True` followed by SHA-256 hashing on every request, even for cache hits. By pre-computing a faster key using a simpler deterministic string construction with `hashlib.md5` (sufficient for non-cryptographic cache keys) and avoiding repeated object serialization for the common case, we can reduce the CPU overhead on the hot path where most requests are cache hits. ## Expected Impact SHA-256 is ~2-3x slower | invalid | ms | ms | % | skip | 2026-06-11T11:08:48.961304+00:00 |
| houzz | 26 | ## Hypothesis The `CircuitBreaker.call()` method holds `self._lock` only for the state-check preamble, but then re-acquires it twice more (once on success, once on failure) — each acquisition is a full `async with` on an `asyncio.Lock`, which suspends the coroutine and reschedules it through the event loop. Since every upstream call goes through this path, replacing the three separate lock acquisitions with a single lightweight approach using a non-async lock (`threading.Lock`) via `asyncio.Loc | invalid | ms | ms | % | skip | 2026-06-11T11:08:27.118828+00:00 |
| houzz | 25 | ## Hypothesis The `UsageCapMiddleware` performs two sequential Redis round-trips (`INCR` then `EXPIRE`) on every non-cached request, but these can be replaced with a single atomic Lua script call, eliminating one full Redis RTT (~0.5–2ms) per request on the hot path. **Expected impact:** For cached responses where Redis is the only I/O, this cuts the middleware overhead nearly in half. For uncached requests the absolute saving is smaller relative to the Apify poll time, but at p95 where Redis | invalid | ms | ms | % | skip | 2026-06-11T11:08:09.152092+00:00 |
| houzz | 24 | ## Hypothesis The `_coalesced_fetch` function stores results in `_in_flight_results` and then immediately removes them in the `finally` block *before* waiting callers have a chance to read the result, creating a race condition where coalesced waiters get `None` instead of the actual data — but more critically for p95 latency, the `finally` block pops the result dict entry and then calls `event.set()`, meaning every coalesced waiter does a redundant `_in_flight_results.get(cache_key)` that retur | invalid | ms | ms | % | skip | 2026-06-11T11:07:28.762277+00:00 |
| houzz | 23 | ## Hypothesis The `ApifyClient` creates a new `httpx.AsyncClient` instance lazily (and recreates it if closed) without connection pooling persistence across requests, meaning each actor status-poll HTTP request may incur TCP connection establishment overhead. By pre-initializing a single persistent `httpx.AsyncClient` at startup with explicit connection pool limits and keepalive settings, we eliminate repeated TCP handshake latency during the polling loop, reducing p95 latency for uncached requ | invalid | ms | ms | % | skip | 2026-06-11T11:06:47.203003+00:00 |
| houzz | 22 | ## Hypothesis The `CacheService.get()` method deserializes JSON on every cache hit using `json.loads()`, but the result is then immediately re-serialized back to JSON by `JSONResponse` in the route handler — this round-trip parse+re-serialize adds unnecessary CPU overhead on the hot path. By storing and returning the raw JSON string from Redis and passing it directly to a `Response(content=..., media_type="application/json")`, we eliminate one full JSON parse cycle on every cached request. **E | invalid | ms | ms | % | skip | 2026-06-11T11:05:56.404493+00:00 |
| houzz | 21 | ## Hypothesis The `_coalesced_fetch` function acquires `_in_flight_lock` (an `asyncio.Lock`) **twice** on the success path — once to store the result and once in the `finally` block to clean up — each acquisition requiring a coroutine suspension. These redundant lock acquisitions on the hot path add unnecessary overhead; consolidating the result storage and cleanup into a single `finally`-block lock acquisition eliminates one lock round-trip per upstream call, reducing p95 latency for the first | invalid | ms | ms | % | skip | 2026-06-11T11:05:03.460580+00:00 |
| houzz | 20 | ## Hypothesis The `ApifyClient` uses a polling loop with a fixed `POLL_INTERVAL` (default 3.0s) that always waits the full interval before the first status check, and uses `asyncio.sleep` with a constant delay even when the actor completes quickly. By implementing **exponential backoff with a short initial delay** (e.g., starting at 0.5s, doubling up to the configured `POLL_INTERVAL` ceiling), fast actor runs (which complete in <5s for cached/simple queries) will be detected much sooner, dramat | invalid | ms | ms | % | skip | 2026-06-11T11:04:29.482252+00:00 |
| houzz | 19 | ## Hypothesis The `CircuitBreaker.call()` method acquires `self._lock` twice per successful request (once for the pre-check and once for the success-count update), and each acquisition involves an `async with` that yields to the event loop. Since the common path (CLOSED state, no failures) only needs to *read* the state for the pre-check and then do a trivial reset of `_failure_count = 0` on success, we can eliminate the second lock acquisition entirely for the fast path by checking state witho | invalid | ms | ms | % | skip | 2026-06-11T11:04:04.211251+00:00 |
| houzz | 18 | ## Hypothesis The `ApifyClient._fetch_dataset()` method fetches all dataset items in a single synchronous request, but the Apify API supports a `limit` parameter and returns paginated results — more importantly, the HTTP response body for large datasets must be fully buffered before `resp.json()` parses it, causing a serialization bottleneck. By passing `stream=False` with `httpx` but using `resp.json()` directly is fine; however the real issue is that `httpx.AsyncClient` is being recreated on | invalid | ms | ms | % | skip | 2026-06-11T11:03:43.425838+00:00 |
| houzz | 17 | ## Hypothesis The `UsageCapMiddleware` calls `_current_month()` on every request, which invokes `datetime.now(timezone.utc).strftime(...)` — but since the month string only changes once per month, this computation (and its associated timezone lookup) can be cached with a simple module-level variable that is invalidated only when the day changes, reducing redundant work on every request's hot path. However, more impactful: the middleware always performs `await r.incr(key)` followed by a conditi | invalid | ms | ms | % | skip | 2026-06-11T11:03:21.591714+00:00 |
| houzz | 16 | ## Hypothesis The `CacheService._build_key()` method calls `json.dumps()` with `sort_keys=True` followed by SHA-256 hashing on every request, even for cache hits. By pre-computing and caching the serialized key string for repeated parameter dictionaries using an LRU cache on a stable frozenset representation, we avoid redundant serialization and hashing on the hot path. However, looking more carefully at the code and iteration history, the most impactful untried optimization is: **the `Circuit | invalid | ms | ms | % | skip | 2026-06-11T11:03:02.381270+00:00 |
| houzz | 15 | ## Hypothesis The `_coalesced_fetch` function acquires `_in_flight_lock` **three times** per upstream call (once to check/register, once to store the result, once to clean up), and the cleanup lock acquisition happens *after* `event.set()` is called in the `finally` block — meaning waiting coroutines can wake up and attempt to read `_in_flight_results` while the primary coroutine hasn't finished cleaning up yet, creating a race. More impactfully, the triple lock acquisitions on the hot path add | invalid | ms | ms | % | skip | 2026-06-11T11:02:39.439951+00:00 |
| houzz | 14 | ## Hypothesis The `UsageCapMiddleware.dispatch()` method calls `_current_month()` on every request, which invokes `datetime.now(timezone.utc).strftime(...)` — but more critically, it performs two sequential Redis round-trips (`INCR` then `EXPIRE`) when a key is new, adding latency on the first request of each month per user. These can be replaced with a single Lua script (atomic `INCR`+`EXPIRE` in one round-trip), and the month string can be cached with a short-lived module-level variable to av | invalid | ms | ms | % | skip | 2026-06-11T11:01:52.116745+00:00 |
| houzz | 13 | ## Hypothesis The `CircuitBreaker.call()` method holds `self._lock` only during the pre-flight check and post-flight state update, but the lock is an `asyncio.Lock` acquired via `async with` — which is correct. However, the **actual bottleneck** is that `_coalesced_fetch` acquires `_in_flight_lock` (a single global lock) **three separate times** for every request: once to check/register, once to store the result, and once to clean up. For the "first caller" path, these three sequential lock acq | invalid | ms | ms | % | skip | 2026-06-11T11:01:25.812270+00:00 |
| houzz | 12 | ## Hypothesis The `ApifyClient` creates a new `httpx.AsyncClient` instance lazily on each request (checking `is_closed` every time), but never reuses connection pooling across the actor start, status poll, and dataset fetch calls within a single `run_actor` invocation. Using a single persistent `httpx.AsyncClient` initialized at startup with explicit connection pool limits will eliminate per-request TLS handshake and TCP connection overhead, reducing p95 latency on the 3–5 sequential HTTP calls | invalid | ms | ms | % | skip | 2026-06-11T11:01:00.539756+00:00 |
| houzz | 11 | ## Hypothesis The `_coalesced_fetch` function stores results in `_in_flight_results` and then immediately removes them in the `finally` block *before* waiting callers have had a chance to read them — but more critically, the lock is acquired **three separate times** (check, store result, cleanup), and the `finally` block removes the result from `_in_flight_results` unconditionally even though waiting callers retrieve it *after* `event.set()`. However, the deeper latency issue is that `_in_fligh | invalid | ms | ms | % | skip | 2026-06-11T11:00:13.750564+00:00 |
| houzz | 10 | ## Hypothesis The `CacheService.get()` method deserializes JSON on every cache hit using `json.loads()`, but the result is then immediately re-serialized by `JSONResponse` in the route handler. By storing and returning the raw JSON string from Redis and passing it directly to a `Response(media_type="application/json")`, we eliminate one redundant serialize/deserialize round-trip on the hot (cached) path, which dominates p95 latency for repeated queries. ## Expected Impact Cache hits currently | invalid | ms | ms | % | skip | 2026-06-11T10:59:10.591433+00:00 |
| houzz | 9 | ## Hypothesis The `ApifyClient.run_actor` method uses a fixed `POLL_INTERVAL` (default 3.0s) throughout the entire wait loop, but actor runs typically take tens of seconds — meaning the **first status check waits a full 3 seconds even though the actor may have already finished** (e.g., for cached/fast runs). Replacing the constant sleep with an **exponential backoff starting at 0.5s** (capped at `POLL_INTERVAL`) will reduce wasted wait time on fast runs and lower p95 latency without changing co | invalid | ms | ms | % | skip | 2026-06-11T10:58:44.974014+00:00 |
| houzz | 8 | ## Hypothesis The `apify_client.run_actor` method polls for actor completion using fixed `POLL_INTERVAL` (default 3.0s) sleeps, but the **first sleep always waits the full interval before even checking status**, wasting up to 3 seconds on fast runs. Adding an initial short-delay poll (0.5s) before falling into the regular polling cadence will reduce p95 latency for runs that complete quickly without changing behavior for slow runs. ## Expected Impact For actor runs that finish in under 3 seco | invalid | ms | ms | % | skip | 2026-06-11T10:58:19.638499+00:00 |
| houzz | 7 | ## Hypothesis The `UsageCapMiddleware` performs two sequential Redis round-trips on every non-cached request — `INCR` followed by a conditional `EXPIRE` — where the `EXPIRE` is only needed on the very first call of the month. Replacing these two calls with a single atomic Lua script (or a Redis pipeline) eliminates one network round-trip per request on the hot path, directly reducing p95 latency for all requests that pass through the usage guard. ### Expected Impact Each Redis round-trip over | invalid | ms | ms | % | skip | 2026-06-11T10:58:00.518809+00:00 |
| houzz | 6 | ## Hypothesis The `CircuitBreaker.call()` method acquires `self._lock` **twice per successful request** (once to check `_should_attempt`, then again to reset the failure count), serializing all concurrent requests through an asyncio lock even though the actual upstream call happens outside it. By using a non-locking fast-path check with a single lock acquisition only when state needs to change, we eliminate unnecessary lock contention under concurrent load. **Expected impact:** Under concurren | invalid | ms | ms | % | skip | 2026-06-11T10:57:35.382222+00:00 |
| houzz | 5 | ## Hypothesis The `_fetch_dataset` call in `apify_client.py` fetches all dataset items in a single synchronous HTTP request without any pagination or streaming, but more critically, the `httpx.AsyncClient` is recreated on every request (the `is_closed` check and re-instantiation path in the `http` property) because the singleton `apify_client` instance loses its client between requests due to connection pool exhaustion or closure — causing a new TCP+TLS handshake penalty on every upstream call. | invalid | ms | ms | % | skip | 2026-06-11T10:57:10.136637+00:00 |
| houzz | 4 | ## Hypothesis The `_coalesced_fetch` function acquires `_in_flight_lock` a **third** time in the `finally` block to clean up `_in_flight` and `_in_flight_results` **before** calling `event.set()`, which means every waiting coroutine wakes up, re-enters `_in_flight_lock` to read `_in_flight_results.get(cache_key)`, and finds the result already deleted — returning an empty list instead of the real data. This forces all coalesced waiters to independently fire their own upstream calls, destroying t | invalid | ms | ms | % | skip | 2026-06-11T10:56:45.676765+00:00 |
| houzz | 3 | ## Hypothesis The `_coalesced_fetch` function acquires `_in_flight_lock` a second time after the upstream call completes to store results, then releases the event **outside** that lock in a `finally` block — but crucially, it also **pops both the event and the result** from the dictionaries in the `finally` block *before* waiting callers can read the result, creating a race where late-joining waiters get `None` instead of the real data. More impactful for p95: every coalesced waiter re-acquires | invalid | ms | ms | % | skip | 2026-06-11T10:56:00.768107+00:00 |
| houzz | 2 | ## Hypothesis The `APIFY_POLL_INTERVAL` defaults to 3.0 seconds, meaning the actor completion is detected up to 3 seconds late on average (1.5s average, up to 3s worst-case added to p95). Switching from fixed-interval polling to **exponential backoff with a short initial interval** (e.g., 0.5s → 1s → 2s → 3s → capped at 3s) will detect fast actor completions much sooner while not hammering the API for long-running jobs. ### Expected Impact For actor runs that complete in under 10 seconds, the | invalid | ms | ms | % | skip | 2026-06-11T10:55:38.176990+00:00 |
| houzz | 1 | ## Hypothesis The `apify_client` uses a lazily-recreated `httpx.AsyncClient` that is checked and potentially reconstructed on every call (`if self._http is None or self._http.is_closed`), and more critically it creates a **new client per request** when the old one is closed — but the real latency killer is that the HTTP/1.1 connection pool is not being reused across the polling loop iterations because each `await asyncio.sleep` yields control and the client may be shared unsafely. More importan | invalid | ms | ms | % | skip | 2026-06-11T10:54:38.971782+00:00 |
| houzz | 0 | ## Hypothesis The `_coalesced_fetch` function acquires `_in_flight_lock` twice per coalesced waiter (once to check/register, once to read the result after the event fires), and the "first caller" path holds no lock during the actual upstream call — but the **result cleanup happens inside the lock before `event.set()`**, meaning waiters that wake up immediately try to read from `_in_flight_results` *after* the key has already been deleted, silently returning `[]` instead of the real data. This c | invalid | ms | ms | % | skip | 2026-06-11T10:53:47.832620+00:00 |
| Twitter Timeline Scraper | 49 | ## Hypothesis The `RedisCache.get_or_set()` method performs a `cache.get()` call, and then the route handlers in `routes/twitter.py` perform an **additional** redundant `cache.get()` call (to set `was_cached`) before calling `fetch_user_timeline()`/`fetch_user_profile()` which internally calls `cache.get_or_set()` — resulting in **2 Redis round-trips on every cache hit** instead of 1. Eliminating the redundant pre-check by having `get_or_set()` return a `(result, was_cached)` tuple will halve R | invalid | ms | ms | % | skip | 2026-06-11T10:52:48.805126+00:00 |
| Twitter Timeline Scraper | 48 | ## Hypothesis The `routes/twitter.py` handlers perform a redundant `cache.get(cache_key)` call before `fetch_user_timeline`/`fetch_user_profile`, which internally also calls `cache.get` inside `get_or_set`. This doubles the Redis round-trips on every cache hit (the common hot path), adding ~1-2ms per request. Removing the pre-check and instead inspecting the result's origin by comparing before/after (or using a wrapper that signals cache provenance) eliminates one network round-trip on every ca | invalid | ms | ms | % | skip | 2026-06-11T10:52:14.061873+00:00 |
| Twitter Timeline Scraper | 47 | ## Hypothesis The `CircuitBreaker.call()` method holds `self._lock` while awaiting the upstream `fn(*args, **kwargs)` call (the lock is acquired, state is checked, and only released *after* the try block completes via the second `async with self._lock` for `_on_success`/`_on_failure`). This means every concurrent request serializes through the lock waiting for the upstream Apify call to complete, turning what should be parallel in-flight requests into a queue — directly inflating p95 latency un | invalid | ms | ms | % | skip | 2026-06-11T10:51:46.777468+00:00 |
| Twitter Timeline Scraper | 46 | ## Hypothesis The `RedisCache` uses `json.dumps` with no options for serialization and `json.loads` for deserialization on every cache read/write, but the larger overhead is that `hashlib.sha256(...).hexdigest()[:16]` is called to build cache keys — this is fine — however the real issue is that the `httpx.AsyncClient` in `ApifyClient._get_client()` is re-created lazily per-call check (`if self._http is None or self._http.is_closed`) without connection pooling being warmed up at startup, meaning | invalid | ms | ms | % | skip | 2026-06-11T10:51:00.850881+00:00 |
| Twitter Timeline Scraper | 45 | ## Hypothesis The `routes/twitter.py` handlers perform a redundant `cache.get(cache_key)` call *before* calling `fetch_user_timeline`/`fetch_user_profile`, which internally calls `cache.get_or_set()` that also does a `cache.get()` as its first step. This means every cache-hit request pays for **two** sequential Redis round-trips instead of one; eliminating the pre-check and deriving `was_cached` by comparing whether the result was already in Redis (via a sentinel in the returned data or by chec | invalid | ms | ms | % | skip | 2026-06-11T10:50:17.097212+00:00 |
| Twitter Timeline Scraper | 44 | ## Hypothesis The `RedisCache` client is created with `socket_connect_timeout=5` and `socket_timeout=5` in `cache.py`, but the `UsageCapMiddleware` creates a **separate** Redis client in `usage_guard.py` with `socket_connect_timeout=2` and `socket_timeout=2`. More critically, `usage_guard._get_redis()` uses a module-level lazy singleton that is never connection-pooled with the main cache client — every middleware invocation may be reusing a single unoptimized connection. By sharing the same `re | invalid | ms | ms | % | skip | 2026-06-11T10:49:41.872340+00:00 |
| Twitter Timeline Scraper | 43 | ## Hypothesis The `routes/twitter.py` handlers call `cache.get(cache_key)` explicitly to detect `was_cached`, and then `fetch_user_timeline` (via `cache.get_or_set`) calls `cache.get(cache_key)` **again** — doubling the Redis round-trips on every cache-hit path. Eliminating the redundant pre-check by reading `was_cached` from whether the result was already in `_in_flight_results` or by passing a flag through `get_or_set` would halve Redis calls on the hot path, directly reducing p95 latency for | invalid | ms | ms | % | skip | 2026-06-11T10:48:52.161075+00:00 |
| Twitter Timeline Scraper | 42 | ## Hypothesis The `CircuitBreaker.call()` method acquires `self._lock` a second time after the upstream call completes (for `_on_success` / `_on_failure`), which means every concurrent request serializes on lock acquisition twice per call. Replacing the post-call lock acquisitions with a single non-blocking atomic update using a simple `asyncio.Lock` held only for the brief state-mutation step — and crucially, **not** re-acquiring the lock during the upstream I/O — will eliminate lock contentio | invalid | ms | ms | % | skip | 2026-06-11T10:48:22.663578+00:00 |
| Twitter Timeline Scraper | 41 | ## Hypothesis The `routes/twitter.py` handlers perform an explicit `cache.get(cache_key)` call purely to determine the `was_cached` flag, and then `fetch_user_timeline` calls `cache.get_or_set()` which does **another** `cache.get()` internally — resulting in two Redis round-trips on every cache hit. By eliminating the pre-check and instead threading the cache-hit signal through the return value of `get_or_set`, we can halve the Redis calls on cached requests, directly reducing p95 latency for t | invalid | ms | ms | % | skip | 2026-06-11T10:47:25.621947+00:00 |
| Twitter Timeline Scraper | 40 | ## Hypothesis The `CircuitBreaker` uses a single `asyncio.Lock` that serializes all concurrent callers even in the normal `CLOSED` state — each request must acquire and release the lock just to check state and then again after the call completes. Replacing this with a `threading`-style read/write pattern using an atomic state check (without locking) for the hot `CLOSED` path, and only using the lock for state *transitions*, will eliminate lock contention under concurrent load and reduce p95 lat | invalid | ms | ms | % | skip | 2026-06-11T10:46:21.813733+00:00 |
| Twitter Timeline Scraper | 39 | ## Hypothesis The `CircuitBreaker._lock` is held across the entire upstream call (`await fn(*args, **kwargs)`) in the HALF_OPEN path, serializing all concurrent requests behind a single long-running I/O operation. Replacing the coarse lock with fine-grained atomic state transitions (only locking for state reads/writes, not during the actual upstream call) will eliminate lock contention for the common CLOSED-state path and prevent one slow upstream call from blocking all others. ## Expected Imp | invalid | ms | ms | % | skip | 2026-06-11T10:45:46.097922+00:00 |
| Twitter Timeline Scraper | 38 | ## Hypothesis The `routes/twitter.py` handlers perform a redundant `cache.get(cache_key)` call before `fetch_user_timeline`/`fetch_user_profile`, which itself calls `cache.get_or_set()` (another Redis GET). This doubles the Redis round-trips on the hot cache-hit path. Eliminating the pre-check and instead passing a flag through the result (or checking after the fact using a wrapper) can halve Redis latency on cache hits. The fix: wrap `cache.get_or_set` to return a `(result, was_cached)` tuple | invalid | ms | ms | % | skip | 2026-06-11T10:44:58.735201+00:00 |
| Twitter Timeline Scraper | 37 | ## Hypothesis The `cache.get_or_set()` method calls `await self.get(key)` (a Redis round-trip) at the start, and then if a cache miss occurs and the request becomes the "leader," it calls `await self.set(key, result, ttl)` after fetching — but the route handlers **also** call `await cache.get(cache_key)` explicitly before calling `fetch_user_timeline`, which itself calls `cache.get_or_set()` that does another `await self.get(key)`. This means every cache-miss request path hits Redis **twice** f | invalid | ms | ms | % | skip | 2026-06-11T10:44:12.048729+00:00 |
| Twitter Timeline Scraper | 36 | ## Hypothesis The `CircuitBreaker.call()` method holds `self._lock` during the entire pre-call state check, then re-acquires it after the upstream call completes. This double-locking pattern serializes concurrent requests through the breaker unnecessarily — but the real overhead is that `_transition_to()` always does an async Redis write (`cache.set_circuit_state()`), even on every success in the common CLOSED→CLOSED path where no state change occurs. **Expected impact:** Eliminating the uncon | invalid | ms | ms | % | skip | 2026-06-11T10:43:53.979867+00:00 |
| Twitter Timeline Scraper | 35 | ## Hypothesis The `routes/twitter.py` handlers call `cache.get(cache_key)` explicitly to detect cache hits for the `was_cached` flag, and then `fetch_user_timeline` calls `cache.get_or_set()` which performs a **second** Redis GET for the same key on a cache miss. On cache hits, this double-GET still serializes two round-trips to Redis before returning. Eliminating the pre-check by instead passing the cache key into `fetch_user_timeline` and returning a `(result, was_cached)` tuple would halve R | invalid | ms | ms | % | skip | 2026-06-11T10:42:58.923658+00:00 |
| Twitter Timeline Scraper | 34 | ## Hypothesis The `ApifyClient._get_client()` method checks `self._http.is_closed` on every call, but more importantly the `httpx.AsyncClient` is created without an explicit `http2=False` flag and without connection pooling being pre-warmed — however the real bottleneck is that `run_actor` creates a new `httpx.AsyncClient` instance check on every retry attempt due to the `is_closed` guard potentially triggering re-creation mid-flight. More significantly: the `APIFY_MAX_RETRIES=2` default means | invalid | ms | ms | % | skip | 2026-06-11T10:42:03.961854+00:00 |
| Twitter Timeline Scraper | 33 | ## Hypothesis The `routes/twitter.py` handlers perform a redundant `cache.get()` call before calling `fetch_user_timeline()`/`fetch_user_profile()` (which internally calls `cache.get_or_set()`, which also calls `cache.get()`). Eliminating this duplicate Redis round-trip by tracking whether the result was served from cache inside `cache.get_or_set()` — returning a sentinel tuple `(result, was_cached)` — will save one Redis RTT (~0.5–2ms) on every request, reducing p95 latency for both cache-hit | invalid | ms | ms | % | skip | 2026-06-11T10:41:06.759226+00:00 |
| Twitter Timeline Scraper | 32 | ## Hypothesis The `CircuitBreaker.call()` method acquires `self._lock` twice per successful call — once before the upstream call and once after — creating unnecessary serialization overhead. By using a lightweight check without locking for the common "CLOSED, no contention" path and only locking for state transitions, we can eliminate the second lock acquisition on every successful request. **Expected Impact:** In the common case (circuit CLOSED, request succeeds), the current code does: `lock | invalid | ms | ms | % | skip | 2026-06-11T10:40:32.465386+00:00 |
| Twitter Timeline Scraper | 31 | ## Hypothesis The `cache.get_or_set()` method performs a redundant Redis GET call after coalesced waiters wake up (`await self.get(key)` fallback), when the result is already available in `self._in_flight_results`. Additionally, the primary waiter path unnecessarily falls through to a Redis GET even when `_in_flight_results[key]` is populated. Eliminating this redundant Redis round-trip for coalesced requests will reduce p95 latency for burst traffic patterns where multiple concurrent requests | invalid | ms | ms | % | skip | 2026-06-11T10:39:40.229202+00:00 |
| Twitter Timeline Scraper | 30 | ## Hypothesis The `routes/twitter.py` handlers perform a redundant `cache.get(cache_key)` call *before* calling `fetch_user_timeline`/`fetch_user_profile`, which itself calls `cache.get_or_set()` that also does a `cache.get()` internally — resulting in **two sequential Redis round-trips on every cache hit** (the most common hot path). Eliminating the pre-check in the route handlers and instead inferring `was_cached` by comparing the result against a sentinel (or by passing a flag through the se | invalid | ms | ms | % | skip | 2026-06-11T10:38:49.417127+00:00 |
| Twitter Timeline Scraper | 29 | ## Hypothesis The `CircuitBreaker` uses an `asyncio.Lock` that serializes all concurrent requests at the check/update points, and the lock is acquired **twice** per successful call (once before the upstream call and once after). Replacing the lock-based state checks with lock-free reads (using Python's GIL-safe attribute reads) and only locking during actual state transitions will eliminate unnecessary serialization and reduce p95 latency under concurrent load. ## Expected Impact Under concur | invalid | ms | ms | % | skip | 2026-06-11T10:38:07.250614+00:00 |
| Twitter Timeline Scraper | 28 | ## Hypothesis The `routes/twitter.py` handlers perform a redundant `cache.get(cache_key)` call **before** calling `fetch_user_timeline`/`fetch_user_profile`, which internally calls `cache.get_or_set()` that also calls `cache.get()` first. This means every request pays for two Redis round-trips on the hot path (cache hit or miss) — eliminating the redundant pre-check and instead inferring `was_cached` from whether the result was already in Redis before the `get_or_set` completes will halve the R | invalid | ms | ms | % | skip | 2026-06-11T10:37:27.483021+00:00 |
| Twitter Timeline Scraper | 27 | ## Hypothesis The `UsageCapMiddleware` creates a new Redis connection lazily on every process start and then reuses it, but uses a module-level `_redis_client` global that is initialized inside an `async` function without any lock, meaning under concurrent startup the client could be initialized multiple times. More impactfully, the middleware performs two sequential Redis round-trips (`INCR` then `EXPIRE`) for new keys, where these could be replaced with a single atomic Lua script call, elimin | invalid | ms | ms | % | skip | 2026-06-11T10:36:55.201593+00:00 |
| Twitter Timeline Scraper | 26 | ## Hypothesis The `cache.get_or_set()` method performs a redundant `await self.get(key)` call at the start, even though the route handlers in `routes/twitter.py` **already** called `await cache.get(cache_key)` just before invoking `fetch_user_timeline`/`fetch_user_profile`. This means every cache miss incurs **two** sequential Redis round-trips before any upstream work begins — eliminating the duplicate `get` inside `get_or_set` when the caller already knows the cache is cold would save one ful | invalid | ms | ms | % | skip | 2026-06-11T10:36:03.822921+00:00 |
| Twitter Timeline Scraper | 25 | ## Hypothesis The `CircuitBreaker.call()` method holds `self._lock` across two separate critical sections per call (once for state checking, once for success/failure recording), and since Python's `asyncio.Lock` is non-reentrant, this serializes all concurrent upstream calls through the lock — creating a queue of coroutines waiting to acquire it after each upstream response completes. Replacing the lock-based state check with a lockless read for the common CLOSED-state fast path, using the lock | invalid | ms | ms | % | skip | 2026-06-11T10:34:56.773165+00:00 |
| Twitter Timeline Scraper | 24 | ## Hypothesis The `routes/twitter.py` handlers perform a redundant `cache.get(cache_key)` call before `fetch_user_timeline()`/`fetch_user_profile()`, which itself calls `cache.get_or_set()` that also does a `cache.get()`. This doubles the Redis round-trips on every cache-hit path (the common case). Eliminating the pre-check and instead inferring `was_cached` from whether the result round-tripped through `get_or_set()` without calling the upstream `fetch_fn` will halve Redis reads on cache hits, | invalid | ms | ms | % | skip | 2026-06-11T10:34:08.545930+00:00 |
| Twitter Timeline Scraper | 23 | ## Hypothesis The `ApifyClient.run_actor()` method creates a new `httpx.AsyncClient` inside `_get_client()` on every call if the previous client was closed, but more critically, it logs a full `f-string` info message including the entire `input_data` dict on every upstream call. More importantly, the retry loop calls `_get_client()` on each attempt, which involves an `is_closed` attribute check — but the real issue is that the `httpx.AsyncClient` is constructed without HTTP/2 or connection reus | invalid | ms | ms | % | skip | 2026-06-11T10:33:46.010408+00:00 |
| Twitter Timeline Scraper | 22 | ## Hypothesis The `cache.get_or_set()` method performs two separate Redis round-trips on a cache miss (one `GET` in step 1, then a `SET` after fetching), but more critically, **every call to `get_or_set()` is preceded by a redundant `cache.get()` call in the route handler** (to detect `was_cached`). This means every request makes **2 Redis round-trips before even checking in-flight coalescing**, and the route handler's explicit `cache.get()` result is then thrown away since `get_or_set()` calls | invalid | ms | ms | % | skip | 2026-06-11T10:33:05.931817+00:00 |
| Twitter Timeline Scraper | 21 | ## Hypothesis The `_current_month()` function in `usage_guard.py` calls `datetime.now(timezone.utc).strftime(...)` on every single request, which involves timezone localization and string formatting overhead. Caching the month string and only recomputing it when the month actually changes will reduce per-request CPU overhead in the hot middleware path. ## Expected Impact `UsageCapMiddleware.dispatch()` runs on every request before the route handler. While `datetime.now()` is individually chea | invalid | ms | ms | % | skip | 2026-06-11T10:32:48.576834+00:00 |
| Twitter Timeline Scraper | 20 | ## Hypothesis The `routes/twitter.py` handlers call `cache.get(cache_key)` explicitly to check `was_cached`, then `fetch_user_timeline`/`fetch_user_profile` calls `cache.get_or_set()` which immediately calls `cache.get(key)` again — resulting in two sequential Redis round-trips on every cache hit. Eliminating the redundant pre-check by passing a flag through the result or checking the in-flight dict state would save one Redis RTT (~0.5–2ms) on every cached request, directly reducing p95 latency | invalid | ms | ms | % | skip | 2026-06-11T10:31:59.618907+00:00 |
| Twitter Timeline Scraper | 19 | ## Hypothesis The `CircuitBreaker._lock` is held for the entire duration of the upstream `fn(*args, **kwargs)` call in `HALF_OPEN` state but released before the call in `CLOSED` state — yet the lock is re-acquired after the call completes for `_on_success`/`_on_failure`. This means in `CLOSED` state every request acquires the lock *twice* sequentially (pre-call check + post-call update), adding lock contention overhead. More importantly, the `cache.set_circuit_state()` Redis write inside `_on_s | invalid | ms | ms | % | skip | 2026-06-11T10:31:18.905260+00:00 |
| Twitter Timeline Scraper | 18 | ## Hypothesis The `CircuitBreaker.call()` method acquires `self._lock` twice per request (once before the upstream call to check/transition state, and once after to call `_on_success`), and `_on_success`/`_on_failure` are themselves `async` methods that `await` a Redis write (`set_circuit_state`) while holding the lock. For cached-hit requests that still go through `cb.call()`, this Redis write blocks the lock and serializes all concurrent requests through it. The fix is to make `_on_success` s | invalid | ms | ms | % | skip | 2026-06-11T10:31:01.559182+00:00 |
| Twitter Timeline Scraper | 17 | ## Hypothesis The `routes/twitter.py` handlers perform a redundant Redis `GET` call (`cached_result = await cache.get(cache_key)`) solely to determine the `was_cached` flag, then `fetch_user_timeline` calls `cache.get_or_set()` which performs *another* Redis `GET` internally. This doubles the Redis round-trips on every cache-hit path, adding ~1-2ms per request. By eliminating the pre-fetch `GET` and instead inferring `was_cached` from whether `get_or_set` returned without calling the upstream f | invalid | ms | ms | % | skip | 2026-06-11T10:30:13.377102+00:00 |
| Twitter Timeline Scraper | 16 | ## Hypothesis The `cache.get_or_set()` method calls `await self.get(key)` which performs a full Redis round-trip, and then in the route handlers there is **another** `await cache.get(cache_key)` call before `fetch_user_timeline`/`fetch_user_profile` is invoked — meaning every cached request pays **two** Redis round-trips instead of one. Eliminating the redundant pre-check in the route handlers and returning the `was_cached` flag based on whether `get_or_set` hit the cache (by having the service | invalid | ms | ms | % | skip | 2026-06-11T10:29:24.225268+00:00 |
| Twitter Timeline Scraper | 15 | ## Hypothesis The `ApifyClient._get_client()` method checks `self._http.is_closed` on every single request, which requires accessing the httpx client's internal state. More importantly, the `httpx.AsyncClient` is being recreated as a new instance with fresh connection pools whenever it's found closed, discarding any warm TCP connections — but the real latency hit is that `_get_client()` is called inside `run_actor()` on every attempt, and the client is instantiated lazily rather than at module | invalid | ms | ms | % | skip | 2026-06-11T10:28:54.528807+00:00 |
| Twitter Timeline Scraper | 14 | ## Hypothesis The `CircuitBreaker._lock` is an `asyncio.Lock` that is acquired **twice per call** — once before the upstream call and once after — and the post-call acquisition (`_on_success` / `_on_failure`) blocks all other concurrent requests even though it only updates simple integer counters and calls `cache.set_circuit_state` (a Redis write). Replacing the lock with a simple non-blocking `asyncio.Lock` that is held only for the brief in-memory state mutation (skipping the Redis write insi | invalid | ms | ms | % | skip | 2026-06-11T10:28:14.510526+00:00 |
| Twitter Timeline Scraper | 13 | ## Hypothesis The `routes/twitter.py` handlers perform a redundant Redis `GET` call (to detect cache hits for the `was_cached` flag) immediately before `fetch_user_timeline`/`fetch_user_profile`, which internally calls `cache.get_or_set()` that **also starts with a Redis `GET`**. Eliminating the pre-flight `GET` in the route handlers and instead propagating the cache-hit signal through the return value of `get_or_set` (by checking whether the result was already in Redis before the fetch functio | invalid | ms | ms | % | skip | 2026-06-11T10:27:31.439897+00:00 |
| Twitter Timeline Scraper | 12 | ## Hypothesis The `httpx.AsyncClient` in `ApifyClient._get_client()` checks `self._http.is_closed` on every call, but more importantly the client is **not initialized at startup** — it's created lazily on the first request, adding client-creation overhead to that request's latency. More critically, when the client *is* reused, `httpx` still performs **DNS resolution on every request** because no `transport` with connection pooling persistence is configured with `keepalive` at the TCP level. How | invalid | ms | ms | % | skip | 2026-06-11T10:26:36.825932+00:00 |
| Twitter Timeline Scraper | 11 | ## Hypothesis The `UsageCapMiddleware.dispatch()` creates a new `datetime.now(timezone.utc)` object and formats it on every single request to compute the month key, and also re-reads `PER_USER_MONTHLY_CAP` (a module-level int, fine) — but more importantly, it calls `_get_redis()` which checks a global variable every time. The real latency issue is that **`incr` and `expire` are two sequential Redis round-trips on every new month's first call**, but on all subsequent calls within the month only | invalid | ms | ms | % | skip | 2026-06-11T10:26:18.125166+00:00 |
| Twitter Timeline Scraper | 10 | ## Hypothesis The `cache.get_or_set()` method always performs a Redis `GET` call on cache miss even when the result will immediately be fetched upstream, and then the route handler performs an **additional redundant Redis `GET`** call just before calling `fetch_user_timeline()` to check `was_cached`. This means every cache-miss request pays for **two Redis round-trips** (one in the route handler + one inside `get_or_set`) before the upstream call even begins — eliminating the route-level pre-ch | invalid | ms | ms | % | skip | 2026-06-11T10:25:34.163698+00:00 |
| Twitter Timeline Scraper | 9 | ## Hypothesis The `UsageCapMiddleware` performs two sequential Redis operations (`INCR` then `EXPIRE`) on every request, but these can be replaced with a single atomic `EXPIRE`-on-first-set pattern using a Lua script or by using `SET ... EX ... NX` combined with `INCR`, eliminating one round-trip to Redis on the hot path. However, since all prior attempts touching lock/cache/client patterns were reverted, I'll focus on a different angle: the `_current_month()` function calls `datetime.now(timez | invalid | ms | ms | % | skip | 2026-06-11T10:24:55.891007+00:00 |
| Twitter Timeline Scraper | 8 | ## Hypothesis The `circuit_breaker.call()` method holds the `asyncio.Lock` during the entire upstream HTTP call (via `await fn(*args, **kwargs)` executing while lock is held in the half-open branch), and then re-acquires it a second time for `_on_success`/`_on_failure`. The lock contention on the single shared `asyncio.Lock` serializes concurrent requests unnecessarily — replacing it with a lightweight `asyncio.Event`-based approach for state transitions, or more practically, using `asyncio.Loc | invalid | ms | ms | % | skip | 2026-06-11T10:24:33.953302+00:00 |
| Twitter Timeline Scraper | 7 | ## Hypothesis The `cache.get_or_set()` method performs a **second Redis `GET` call** after a coalesced waiter's event fires (as a fallback), but the result is already stored in `_in_flight_results` — the redundant Redis round-trip adds latency on every coalesced cache miss. Additionally, the route handlers duplicate the Redis `GET` before calling `fetch_user_timeline`/`fetch_user_profile`, meaning **every cache hit costs two Redis round-trips** instead of one. **Expected impact:** Eliminating | invalid | ms | ms | % | skip | 2026-06-11T10:23:43.429712+00:00 |
| Twitter Timeline Scraper | 6 | ## Hypothesis The route handlers perform a **redundant Redis `GET` call** before calling `fetch_user_timeline`/`fetch_user_profile`, which internally calls `cache.get_or_set()` that also does a Redis `GET`. This doubles the Redis round-trips on every cache-hit path (the hot path), adding ~1-2ms of unnecessary latency per request at p95. The fix is to remove the pre-check `cached_result = await cache.get(cache_key)` from the route handlers and instead have `get_or_set` signal whether the result | invalid | ms | ms | % | skip | 2026-06-11T10:23:08.747917+00:00 |
| Twitter Timeline Scraper | 5 | ## Hypothesis The `CircuitBreaker.call()` method holds the `asyncio.Lock` for the **entire duration of the upstream Apify HTTP call** (because `_lock` is acquired before `fn(*args, **kwargs)` is awaited and only released after). This serializes all concurrent requests through a single lock, turning what should be parallel upstream calls into a queue — dramatically inflating p95 latency under any concurrency. The fix: release the lock after the state-gate check, then re-acquire it only briefly | invalid | ms | ms | % | skip | 2026-06-11T10:22:41.730665+00:00 |
| Twitter Timeline Scraper | 4 | ## Hypothesis The `httpx.AsyncClient` in `ApifyClient` is created lazily on first use and checked on every call with `is_closed`, but more critically it is **not shared as a persistent connection pool at module load time** — instead it's recreated whenever `_http` is `None` or closed, losing keep-alive connections and paying TCP+TLS handshake cost on each cold path. By initializing the `httpx.AsyncClient` eagerly at module level (singleton, created once) and removing the per-call `is_closed` re | invalid | ms | ms | % | skip | 2026-06-11T10:22:12.676985+00:00 |
| Twitter Timeline Scraper | 3 | ## Hypothesis The `CircuitBreaker.call()` method acquires `asyncio.Lock` **twice per successful request** (once before the upstream call to check/transition state, and once after to call `_on_success`), serializing all concurrent requests through a coroutine-level mutex unnecessarily. Replacing the lock-based state checks with lockless reads for the hot path (CLOSED state with no failures) and only locking on actual state transitions eliminates this serialization bottleneck, reducing p95 latenc | invalid | ms | ms | % | skip | 2026-06-11T10:21:30.209462+00:00 |
| Twitter Timeline Scraper | 2 | ## Hypothesis The circuit breaker's `asyncio.Lock` is acquired **twice per successful upstream call** (once to check state, once in `_on_success`), and both acquisitions are coarse — holding the lock across the entire pre-call state check means concurrent requests for *different* cache keys are serialized through a single lock. Replacing the double-lock pattern with a single lock acquisition that covers the full call lifecycle, and using a per-service lock (already the case) but releasing it be | invalid | ms | ms | % | skip | 2026-06-11T10:21:03.766123+00:00 |
| Twitter Timeline Scraper | 1 | ## Hypothesis The route handlers perform a **redundant Redis `GET` call** before calling `fetch_user_timeline`/`fetch_user_profile`, which themselves call `cache.get_or_set()` (which also does a Redis `GET` first). This means every cache-hit request pays for **two** Redis round-trips instead of one, adding ~1-3ms of unnecessary latency at p95. **Fix:** Remove the explicit `await cache.get(cache_key)` / `was_cached` pre-check from the route handlers. Instead, have `get_or_set()` return a tuple | invalid | ms | ms | % | skip | 2026-06-11T10:20:22.239207+00:00 |
| Twitter Timeline Scraper | 0 | ## Hypothesis The route handlers perform a redundant Redis `GET` call (to set `was_cached`) immediately before calling `fetch_user_timeline` / `fetch_user_profile`, which internally calls `cache.get_or_set` — resulting in **two sequential Redis round-trips on every cache hit** instead of one. Eliminating the pre-check and instead inferring `was_cached` from whether the result was already in Redis (by checking after the fact, or by having `get_or_set` signal cache provenance) will cut p95 latenc | invalid | ms | ms | % | skip | 2026-06-11T10:19:36.753280+00:00 |
| Instagram Posts Scraper - Pay per Results | 49 | ## Hypothesis **Pre-build and reuse a single persistent `httpx.AsyncClient` instance at module level in `apify_client.py` instead of creating and tearing down a new client on every actor call.** Every request to `/api/v1/scrape` that misses cache currently pays the overhead of `httpx.AsyncClient.__aenter__`/`__aexit__` — which allocates connection pool state, SSL context setup, and ultimately closes the connection rather than keeping it alive for reuse. A persistent module-level client maintai | invalid | ms | ms | % | skip | 2026-06-11T10:19:02.557873+00:00 |
| Instagram Posts Scraper - Pay per Results | 48 | ## Hypothesis **Avoid re-creating the `aioredis` client object on every `UsageCapMiddleware` dispatch by initializing it eagerly at module import time (or at first use with a module-level lock) and reusing the connection pool across requests.** Currently `_get_redis()` in `usage_guard.py` uses a bare `global` check with no lock, meaning under concurrent requests at startup multiple coroutines can race past the `if _redis_client is None` guard simultaneously and each attempt to create a new cli | invalid | ms | ms | % | skip | 2026-06-11T10:18:22.536537+00:00 |
| Instagram Posts Scraper - Pay per Results | 47 | ## Hypothesis **Use `redis.asyncio` pipeline to combine the `INCR` and `EXPIRE` commands in `UsageCapMiddleware` into a single round-trip**, eliminating one Redis network call per request on the hot path. Currently, `usage_guard.py` performs two sequential async Redis calls (`r.incr(key)` then `r.expire(key, ...)`) when `count == 1`, and even for subsequent requests still pays the `incr` RTT before the route handler runs. By using a Lua script (or `pipeline`) to atomically execute both command | invalid | ms | ms | % | skip | 2026-06-11T10:17:37.801732+00:00 |
| Instagram Posts Scraper - Pay per Results | 46 | ## Hypothesis **Pre-compile the `_parse_post` and `_parse_author` field lookups by replacing the repeated `raw.get("field") or raw.get("altField")` chains with a single-pass dictionary lookup using a pre-built field-alias map**, eliminating the redundant attribute lookups and short-circuit evaluation overhead on every post object in the response. The current `_parse_post` function performs up to ~20 sequential `.get()` calls with `or` chaining per post. For responses with many posts, this per- | invalid | ms | ms | % | skip | 2026-06-11T10:17:02.167807+00:00 |
| Instagram Posts Scraper - Pay per Results | 45 | ## Hypothesis **Cache the `json.dumps` + `hashlib.sha256` cache key computation by sorting and serializing the payload fields in a fixed order using direct string concatenation instead of `json.dumps(sort_keys=True)`, and pre-compute a lookup-friendly key without the full JSON roundtrip.** The `make_cache_key` function calls `json.dumps` with `sort_keys=True` on every request before any cache check — this is on the hot path for every single request including cache hits. By replacing the generi | invalid | ms | ms | % | skip | 2026-06-11T10:16:39.302009+00:00 |
| Instagram Posts Scraper - Pay per Results | 44 | ## Hypothesis **Avoid rebuilding a new `httpx.AsyncClient` on every Apify call by maintaining a single module-level persistent client instance, eliminating TCP connection setup and TLS handshake overhead on each upstream request.** The current `apify_client.py` uses `async with httpx.AsyncClient(...) as client:` inside `run_actor`, which creates and tears down a new client (and therefore a new TCP/TLS connection) on every call. A persistent module-level client with connection pooling will reus | invalid | ms | ms | % | skip | 2026-06-11T10:16:12.888665+00:00 |
| Instagram Posts Scraper - Pay per Results | 43 | ## Hypothesis **Pre-compute the `_current_month()` string once per minute rather than calling `datetime.now(timezone.utc).strftime(...)` on every request in `UsageCapMiddleware.dispatch`**, and also cache the Redis key string by combining the pre-computed month with a per-user LRU to avoid repeated string concatenation on the hot path. The more impactful change here is eliminating the repeated `datetime.now` + `strftime` syscall on every request by caching the result with a simple time-based i | invalid | ms | ms | % | skip | 2026-06-11T10:15:31.308240+00:00 |
| Instagram Posts Scraper - Pay per Results | 42 | ## Hypothesis **Skip Pydantic model instantiation for the 400-validation fast-path and replace the `_parse_post`/`_parse_author` per-field dict lookups with a single `model_validate` call using a pre-built alias-normalised dict, avoiding the repeated `.get()` chain overhead.** Specifically: pre-build a lightweight normalisation step that renames camelCase keys to snake_case once (a single dict comprehension over a small static mapping), then call `InstagramPost.model_validate(normalized)` — th | invalid | ms | ms | % | skip | 2026-06-11T10:15:07.893255+00:00 |
| Instagram Posts Scraper - Pay per Results | 41 | ## Hypothesis **Reuse a single persistent `httpx.AsyncClient` instance (module-level singleton) in `apify_client.py` instead of creating and tearing down a new client per request.** The current code creates a new `httpx.AsyncClient` on every call to `run_actor`, which incurs overhead for TCP connection establishment, TLS handshake, and client object allocation/cleanup on every upstream call. A persistent client with connection pooling will reuse existing TCP/TLS connections, eliminating that o | invalid | ms | ms | % | skip | 2026-06-11T10:14:44.983999+00:00 |
| Instagram Posts Scraper - Pay per Results | 40 | ## Hypothesis **Replace `request.model_dump(exclude_none=True)` with a manually constructed dict using direct attribute access in the route handler**, avoiding the overhead of Pydantic's introspective serialization path which walks the model's field metadata on every request. The `model_dump()` call traverses Pydantic's internal field registry, applies exclusion logic, and copies values — all for a small, known-shape model. Directly building `{"usernames": ..., "post_urls": ..., ...}` skips th | invalid | ms | ms | % | skip | 2026-06-11T10:14:33.762467+00:00 |
| Instagram Posts Scraper - Pay per Results | 39 | ## Hypothesis **Pre-instantiate a single persistent `httpx.AsyncClient` at module load time in `apify_client.py` with connection pooling limits tuned for high concurrency, rather than creating and destroying a new client (and its underlying TCP connection pool) on every request.** A new `httpx.AsyncClient` per request incurs TCP connection setup overhead (and potentially TLS handshake overhead) on every upstream call, since the connection pool is discarded when the context manager exits. By re | invalid | ms | ms | % | skip | 2026-06-11T10:14:04.948871+00:00 |
| Instagram Posts Scraper - Pay per Results | 38 | ## Hypothesis **Replace the synchronous `hashlib.sha256` + `json.dumps` cache key computation in `make_cache_key` with a faster approach using `xxhash` (if available) or by avoiding the double-serialization that occurs because `json.dumps` is called once in `make_cache_key` and again in `cache_set`.** Specifically: pre-sort and serialize the payload once in `coalesced_fetch`, reuse that serialized bytes for both the cache key hash and the `cache_set` call, eliminating a redundant `json.dumps` | invalid | ms | ms | % | skip | 2026-06-11T10:13:29.529941+00:00 |
| Instagram Posts Scraper - Pay per Results | 37 | ## Hypothesis **Replace the two-round-trip Redis pattern in `coalesced_fetch` (separate `GET` then `SET`) with a single `GET` that also pipelines the existence check, and more importantly eliminate the redundant `await get_redis_client()` call on every cache operation by caching the client reference at module import time after the lifespan startup.** The current `cache_get` and `cache_set` each call `await get_redis_client()` which checks `if _redis_client is None` on every invocation — this i | invalid | ms | ms | % | skip | 2026-06-11T10:12:48.577284+00:00 |
| Instagram Posts Scraper - Pay per Results | 36 | ## Hypothesis **Replace the per-request `asyncio.get_event_loop()` call in `coalesced_fetch` with `asyncio.get_running_loop()`, and eliminate the redundant second `cache_get` round-trip on coalesced-miss paths by checking the cache only once and storing the future in `_inflight` before awaiting.** The current `coalesced_fetch` calls `cache_get` (one Redis RTT), then potentially joins an in-flight future, but a waiter that falls through the `shield` exception path re-enters a new fetch without | invalid | ms | ms | % | skip | 2026-06-11T10:12:20.982285+00:00 |
| Instagram Posts Scraper - Pay per Results | 35 | ## Hypothesis **Pre-compute a reusable `httpx.AsyncClient` with `http2=True` and connection limits at module level in `apify_client.py`, avoiding the per-request TCP handshake and TLS negotiation overhead that dominates p95 latency on cache misses.** Every call to `run_actor` currently creates and destroys an `httpx.AsyncClient`, paying the full TCP+TLS setup cost (~50-200ms) on every upstream request. By reusing a persistent client with a connection pool (and optionally HTTP/2 multiplexing), | invalid | ms | ms | % | skip | 2026-06-11T10:11:53.522220+00:00 |
| Instagram Posts Scraper - Pay per Results | 34 | ## Hypothesis **Use `ujson` for JSON serialization/deserialization in `cache.py` instead of the standard `json` module**, replacing `json.dumps`/`json.loads` in `make_cache_key`, `cache_get`, and `cache_set`. `ujson` is 2–5× faster than the standard library `json` for typical payloads. Since every request that hits the cache path (both GET and SET) incurs at least two JSON operations (`json.loads` on cache hit, `json.dumps` in `make_cache_key` + `cache_set` on miss), switching to `ujson` reduc | invalid | ms | ms | % | skip | 2026-06-11T10:11:29.427205+00:00 |
| Instagram Posts Scraper - Pay per Results | 33 | ## Hypothesis **Inline the `_parse_author` and `_parse_post` field lookups into a single flat dict comprehension per post, eliminating repeated `dict.get()` call overhead and intermediate `AuthorModel`/`InstagramPost` construction overhead by pre-computing field aliases at module level.** The current `_parse_post`/`_parse_author` functions each perform 8–9 individual `.get()` calls with `or` chaining per field, plus two Pydantic model constructions with full validation per post. For responses | invalid | ms | ms | % | skip | 2026-06-11T10:10:54.332669+00:00 |
| Instagram Posts Scraper - Pay per Results | 32 | ## Hypothesis **Pre-allocate a single persistent `httpx.AsyncClient` at module level in `apify_client.py` with connection pooling (keepalive), so each request reuses an existing TCP/TLS connection to `api.apify.com` instead of establishing a new one per call.** Every call to `run_actor` currently creates a new `httpx.AsyncClient` inside an `async with` block, which incurs TCP handshake + TLS negotiation overhead (~50–200 ms) on every upstream call. A persistent client with `limits` tuned for c | invalid | ms | ms | % | skip | 2026-06-11T10:10:18.183022+00:00 |
| Instagram Posts Scraper - Pay per Results | 31 | ## Hypothesis **Replace `BaseHTTPMiddleware` with a lightweight ASGI middleware to eliminate the double-buffering and thread-synchronization overhead that Starlette's `BaseHTTPMiddleware` imposes on every request.** Starlette's `BaseHTTPMiddleware` wraps every request/response in an `anyio` task group and buffers the response body through an intermediary stream, adding measurable overhead (~1-3ms per request) even when the middleware is a near-pass-through (as `UsageCapMiddleware` is when `PER | invalid | ms | ms | % | skip | 2026-06-11T10:09:28.492678+00:00 |
| Instagram Posts Scraper - Pay per Results | 30 | ## Hypothesis **Pre-compute the Apify request URL string and params dict as module-level constants in `apify_client.py`, and eliminate the per-request `httpx.AsyncClient` construction overhead by using a persistent connection-pooled client initialized at module load time.** The current code reconstructs the URL f-string, the params dict, and a fresh `httpx.AsyncClient` on every single request; `AsyncClient.__init__` allocates connection pool structures and SSL context references each time, e | invalid | ms | ms | % | skip | 2026-06-11T10:09:00.155989+00:00 |
| Instagram Posts Scraper - Pay per Results | 29 | ## Hypothesis **Pre-build and cache the Apify `params` dict and URL as module-level constants in `apify_client.py`, and avoid reconstructing them on every request.** Currently, `run_actor` rebuilds the `url` f-string and `params` dict on every single call. While individually cheap, these allocations add up under load and contribute to p95 latency jitter. By hoisting them to module-level constants (computed once at import time), we eliminate repeated string formatting and dict allocation from t | invalid | ms | ms | % | skip | 2026-06-11T10:08:31.029471+00:00 |
| Instagram Posts Scraper - Pay per Results | 28 | ## Hypothesis **Eliminate the redundant `model_dump(exclude_none=True)` → `make_cache_key` → `_build_actor_input` double-traversal by computing the cache key directly from the already-serialized actor input dict**, avoiding a second `json.dumps` round-trip over the same data. Currently `scrape_posts` calls `make_cache_key("posts", request_payload)` (serializing `request_payload` to JSON for hashing) and then `_build_actor_input(request_payload)` (iterating it again to build `actor_input`). By | invalid | ms | ms | % | skip | 2026-06-11T10:08:14.368501+00:00 |
| Instagram Posts Scraper - Pay per Results | 27 | ## Hypothesis **Persist a single `httpx.AsyncClient` instance at module level in `apify_client.py` with connection pooling enabled, rather than creating and destroying a new client on every request.** Each call to `run_actor` currently instantiates a fresh `httpx.AsyncClient`, performs a full TCP+TLS handshake to `api.apify.com`, then tears down the connection. By keeping one long-lived client, subsequent requests reuse the existing TCP/TLS connection from the pool, eliminating the ~50–150 ms | invalid | ms | ms | % | skip | 2026-06-11T10:07:45.317186+00:00 |
| Instagram Posts Scraper - Pay per Results | 26 | ## Hypothesis **Eliminate redundant double JSON serialization in `cache_set` by storing the raw response bytes directly from Apify and returning them without re-serializing, using `client.set(key, value)` where value is already a JSON string from `response.text`.** The current path serializes data twice on cache writes (Apify returns JSON → `response.json()` parses it → `json.dumps()` re-serializes it in `cache_set`) and deserializes twice on cache reads (`json.loads` in `cache_get` → Pydantic | invalid | ms | ms | % | skip | 2026-06-11T10:07:18.453421+00:00 |
| Instagram Posts Scraper - Pay per Results | 25 | ## Hypothesis **Replace `asyncio.Lock` in `CircuitBreaker` with lock-free state checks using atomic operations, and eliminate the per-call `asyncio.Lock` acquisition overhead in the hot path by checking circuit state without locking in `is_open()`.** The `record_success()` and `record_failure()` methods each acquire an `asyncio.Lock`, which adds coroutine scheduling overhead on every successful upstream call. Since `is_open()` already reads state without a lock (acceptable for this use case), | invalid | ms | ms | % | skip | 2026-06-11T10:06:58.929990+00:00 |
| Instagram Posts Scraper - Pay per Results | 24 | ## Hypothesis **Pre-compute and cache the `make_cache_key` result by avoiding redundant `json.dumps` + `hashlib.sha256` work through storing the serialized payload string once, and eliminate the redundant `model_dump` → `_build_actor_input` double-traversal by merging them into a single pass.** The current hot path calls `request.model_dump(exclude_none=True)` in the route, then `json.dumps(payload, sort_keys=True)` + SHA-256 in `make_cache_key`, then iterates the dict again in `_build_actor_i | invalid | ms | ms | % | skip | 2026-06-11T10:06:22.797407+00:00 |
| Instagram Posts Scraper - Pay per Results | 23 | ## Hypothesis **Eliminate redundant `await get_redis_client()` calls in `coalesced_fetch` by storing the resolved client reference at module level after first connection, so cache-hit paths skip the async function call overhead and global-variable indirection on every request.** Currently, both `cache_get` and `cache_set` each call `await get_redis_client()`, which—even though the function returns immediately once `_redis_client` is set—still incurs two async function call frames, two `global` | invalid | ms | ms | % | skip | 2026-06-11T10:05:55.785330+00:00 |
| Instagram Posts Scraper - Pay per Results | 22 | ## Hypothesis **Pre-allocate and reuse a single `httpx.AsyncClient` instance at module level in `apify_client.py` (with connection pooling) instead of creating and tearing down a new client on every request.** Each call to `run_actor` currently executes `async with httpx.AsyncClient(...) as client:`, which creates a new client, establishes a new TCP/TLS connection to `api.apify.com`, and closes everything when done. By keeping a persistent module-level client, subsequent requests reuse the alr | invalid | ms | ms | % | skip | 2026-06-11T10:05:29.806503+00:00 |
| Instagram Posts Scraper - Pay per Results | 21 | ## Hypothesis **Avoid re-serializing already-cached JSON by storing and returning raw strings from Redis, skipping the redundant `json.dumps` on `cache_set` and `json.loads` on `cache_get` round-trip for the hot cached path.** Currently, `cache_get` deserializes the Redis string into a Python object (`json.loads`), then `coalesced_fetch` returns that object to the route, which then re-serializes it for the HTTP response. By storing the raw JSON string and returning it directly to FastAPI as a | invalid | ms | ms | % | skip | 2026-06-11T10:04:54.469514+00:00 |
| Instagram Posts Scraper - Pay per Results | 20 | ## Hypothesis **Replace the per-request `asyncio.get_event_loop()` call in `coalesced_fetch` with `asyncio.get_running_loop()`, and eliminate the redundant double cache lookup by merging the in-flight check into a single atomic flow using `asyncio.Event` instead of `asyncio.Future`.** The current code calls the deprecated `get_event_loop()` (which has overhead from deprecation-path checks in Python 3.10+) and performs a Redis cache lookup, then separately checks `_inflight`, meaning a cache mi | invalid | ms | ms | % | skip | 2026-06-11T10:04:28.132589+00:00 |
| Instagram Posts Scraper - Pay per Results | 19 | ## Hypothesis **Pre-parse and intern the fixed `params` dict in `apify_client.py` at module load time, and avoid recreating the `httpx.AsyncClient` context manager on every call by using a module-level persistent client with connection pooling.** The current code creates a new `httpx.AsyncClient` (and tears it down) on every single Apify call, discarding all TCP/TLS connection reuse. A persistent module-level client retains the connection pool across calls, eliminating the TCP handshake + TLS | invalid | ms | ms | % | skip | 2026-06-11T10:03:57.800149+00:00 |
| Instagram Posts Scraper - Pay per Results | 18 | ## Hypothesis **Pre-build and reuse a single `httpx.AsyncClient` with `http2=True` and connection pooling at module level in `apify_client.py`, rather than creating and tearing down a new client on every request.** The dominant latency cost for cache-miss paths is the Apify HTTP call. Creating a new `httpx.AsyncClient` per request incurs TCP connection establishment overhead (and TLS handshake) on every call because the client's connection pool is discarded immediately after the `async with` b | invalid | ms | ms | % | skip | 2026-06-11T10:03:30.047950+00:00 |
| Instagram Posts Scraper - Pay per Results | 17 | ## Hypothesis **Replace `BaseHTTPMiddleware` with a lightweight pure-ASGI middleware for `UsageCapMiddleware` to eliminate the double-buffering and extra task overhead that Starlette's `BaseHTTPMiddleware` imposes on every request.** `BaseHTTPMiddleware` wraps each request in an additional `asyncio.Task` and buffers the response body, adding measurable overhead (~1-3 ms) to every request's p95 latency. Converting to a raw ASGI callable removes that wrapper entirely while preserving identical b | invalid | ms | ms | % | skip | 2026-06-11T10:02:48.441705+00:00 |
| Instagram Posts Scraper - Pay per Results | 16 | ## Hypothesis **Use `asyncio.gather` to parallelize the Redis cache lookup and the circuit-breaker state check, since both are independent I/O-bound operations that are currently executed sequentially.** Currently, `scrape_posts` first checks the circuit breaker (synchronous but involves a lock), then calls `coalesced_fetch` which does the Redis GET. While the circuit breaker check itself is cheap, the real win is restructuring `coalesced_fetch` to issue the Redis `GET` concurrently with any o | invalid | ms | ms | % | skip | 2026-06-11T10:02:03.856392+00:00 |
| Instagram Posts Scraper - Pay per Results | 15 | ## Hypothesis **Reuse a single `redis.asyncio` client instance across both `cache.py` and `usage_guard.py` by having `usage_guard.py` import and reuse `get_redis_client()` from `cache.py` instead of maintaining its own separate lazy-initialized client.** Currently every request that hits `UsageCapMiddleware` may create and maintain a *second* Redis connection pool (in `usage_guard._redis_client`) in addition to the one in `cache._redis_client`. This wastes connection slots and adds connection- | invalid | ms | ms | % | skip | 2026-06-11T10:01:36.999511+00:00 |
| Instagram Posts Scraper - Pay per Results | 14 | ## Hypothesis **Persist a single `httpx.AsyncClient` instance at module level in `apify_client.py` instead of creating and tearing down a new client on every request.** Creating an `httpx.AsyncClient` involves allocating connection pool objects, SSL context setup, and teardown overhead on every call. A persistent client reuses TCP connections (keep-alive) and amortizes SSL handshake cost, directly reducing the latency of the dominant slow path (cache miss → Apify call) which dominates p95. `` | invalid | ms | ms | % | skip | 2026-06-11T10:01:10.253087+00:00 |
| Instagram Posts Scraper - Pay per Results | 13 | ## Hypothesis **Pre-compute and cache the `make_cache_key` result by using a faster hashing path: skip `json.dumps` + `hashlib.sha256` for the common single-username/single-hashtag case, and replace `sha256` with `xxhash` (or fall back to a direct string key) to reduce CPU overhead on every cache lookup.** Currently, every request — including cache hits — pays the cost of `json.dumps(payload, sort_keys=True)` followed by `sha256` digest computation before even touching Redis. For the dominant | invalid | ms | ms | % | skip | 2026-06-11T10:00:42.807389+00:00 |
| Instagram Posts Scraper - Pay per Results | 12 | ## Hypothesis **Compress the JSON payload stored in Redis using `zlib` to reduce serialization/deserialization time for large Instagram post datasets, and simultaneously pre-parse the stored bytes directly without a redundant `json.dumps` round-trip on cache hit.** Wait — looking at the iteration history more carefully, most attempts have been reverted. Let me focus on something simpler and more targeted. ## Hypothesis **The `coalesced_fetch` function calls `cache_get` (which calls `get_redi | invalid | ms | ms | % | skip | 2026-06-11T10:00:18.834815+00:00 |
| Instagram Posts Scraper - Pay per Results | 11 | ## Hypothesis **Skip Pydantic model instantiation for cached responses by returning the raw cached dict directly as a `JSONResponse`, bypassing the `_parse_post`/`_parse_author` object construction and Pydantic's `model_dump` serialization on the hot (cache-hit) path.** On cache hits — which should be the majority of p95 traffic after warm-up — the route currently deserializes JSON from Redis, constructs `InstagramPost` + `AuthorModel` Pydantic objects for every post, then re-serializes them. | invalid | ms | ms | % | skip | 2026-06-11T09:59:41.960217+00:00 |
| Instagram Posts Scraper - Pay per Results | 10 | ## Hypothesis **Eliminate redundant JSON serialization/deserialization overhead in `cache_get`/`cache_set` by checking the in-process `_inflight` dict *before* making the Redis network round-trip**, so cache misses with an in-flight request for the same key skip the Redis `GET` entirely. ### Explanation In `coalesced_fetch`, every request currently does: 1. `await cache_get(cache_key)` — a Redis network round-trip (~0.5–2ms) 2. *Then* checks `_inflight` For the common burst pattern (multiple | invalid | ms | ms | % | skip | 2026-06-11T09:59:17.962915+00:00 |
| Instagram Posts Scraper - Pay per Results | 9 | ## Hypothesis **Parallelize the cache lookup and in-flight check by eliminating the redundant second Redis round-trip on cache miss: perform the `cache_get` call concurrently with a lightweight in-process check of `_inflight`, so the hot-path (cache hit) only pays one Redis RTT, and avoid re-fetching from Redis inside `coalesced_fetch` when the caller in `scrape_posts` could pass an already-resolved key.** More concretely: the real win available without touching the upstream call is to add a s | invalid | ms | ms | % | skip | 2026-06-11T09:58:41.432016+00:00 |
| Instagram Posts Scraper - Pay per Results | 8 | ## Hypothesis **Avoid creating a new `httpx.AsyncClient` on every Apify call by maintaining a single module-level persistent client, eliminating the TCP connection setup and TLS handshake overhead on each request.** Every call to `run_actor()` currently executes `async with httpx.AsyncClient(...) as client:`, which creates a fresh client, establishes a new TCP connection, performs a full TLS handshake with `api.apify.com`, and tears everything down afterward. Since Apify calls dominate non-cac | invalid | ms | ms | % | skip | 2026-06-11T09:58:16.565571+00:00 |
| Instagram Posts Scraper - Pay per Results | 7 | ## Hypothesis **Add a small in-process TTL cache (using `cachetools.TTLCache`) as an L1 cache layer in front of Redis, so repeated cache lookups for the same key within a short window skip the Redis round-trip entirely.** When multiple requests for the same popular username/hashtag arrive within the 5-minute cache window, every request currently pays a Redis network round-trip (~1-5ms) even on a cache hit. By keeping a small in-memory TTL cache (e.g., 256 entries, 60-second TTL) in front of `c | invalid | ms | ms | % | skip | 2026-06-11T09:57:46.101320+00:00 |
| Instagram Posts Scraper - Pay per Results | 6 | ## Hypothesis **Replace the two sequential Redis calls (`INCR` then `EXPIRE`) in `usage_guard.py` with a single Lua script (or a Redis pipeline) that atomically increments and conditionally sets the TTL in one round-trip**, eliminating one full Redis network round-trip per request for every RapidAPI-proxied call. ### Expected Impact Currently, every metered request makes two sequential async Redis calls: `INCR` followed by `EXPIRE`. Each call is a separate network round-trip (~0.5–1 ms on loc | invalid | ms | ms | % | skip | 2026-06-11T09:57:27.427831+00:00 |
| Instagram Posts Scraper - Pay per Results | 5 | ## Hypothesis **Pipeline the Redis `INCR` + `EXPIRE` commands in `usage_guard.py` into a single atomic `EVAL` (Lua script) or replace the two-round-trip sequence with a single `SET ... EX ... NX` + `INCR` pipeline, and simultaneously eliminate the repeated `await _get_redis()` call overhead in `cache.py` by storing the client reference at module level after first successful connection.** The current `UsageCapMiddleware` performs two sequential Redis round trips per request (`INCR` then conditi | invalid | ms | ms | % | skip | 2026-06-11T09:56:59.632686+00:00 |
| Instagram Posts Scraper - Pay per Results | 4 | ## Hypothesis **Persist a single `httpx.AsyncClient` instance across requests instead of creating and tearing down a new client per Apify call.** Every call to `run_actor` currently executes `async with httpx.AsyncClient(...) as client:`, which creates a new connection pool, performs TCP handshake + TLS negotiation, and then immediately destroys the pool on exit. By initializing one `AsyncClient` at application startup (stored on `app.state`) and reusing it for all Apify calls, subsequent requ | invalid | ms | ms | % | skip | 2026-06-11T09:56:22.404646+00:00 |
| Instagram Posts Scraper - Pay per Results | 3 | ## Hypothesis **Add a local in-process LRU cache (using `functools.lru_cache` / a simple `dict` with TTL) as an L1 cache layer in front of Redis**, so that repeated identical requests within the same process skip the Redis round-trip entirely and return results from memory. The Redis GET adds a network round-trip (~0.5–2ms even locally) on every request, including cache hits. By keeping a small in-process dict keyed by `cache_key` with a short TTL (e.g., 30 seconds — safely within the 5-minute | invalid | ms | ms | % | skip | 2026-06-11T09:55:44.216795+00:00 |
| Instagram Posts Scraper - Pay per Results | 2 | ## Hypothesis **Pre-warm the Redis connection at startup and reuse a single persistent `httpx.AsyncClient` for Apify calls** — specifically, the `usage_guard.py` module lazily creates its own Redis client on the first request (adding ~2ms RTT on cold paths), while `apify_client.py` creates and tears down a new `httpx.AsyncClient` on every call (adding TLS handshake overhead of 50–200ms per request). By initializing the `usage_guard` Redis client during app lifespan startup (sharing the same cl | invalid | ms | ms | % | skip | 2026-06-11T09:55:04.717238+00:00 |
| Instagram Posts Scraper - Pay per Results | 1 | ## Hypothesis **Reuse a single persistent `httpx.AsyncClient` for all Apify calls instead of creating and tearing down a new client per request.** Each call to `run_actor` currently executes `async with httpx.AsyncClient(...) as client:`, which pays TCP connection setup + TLS handshake overhead on every request (~50–150 ms for a fresh TLS connection to `api.apify.com`). A module-level persistent client with connection pooling will reuse existing connections, eliminating that overhead for cache | invalid | ms | ms | % | skip | 2026-06-11T09:54:35.442854+00:00 |
| Instagram Posts Scraper - Pay per Results | 0 | ## Hypothesis **Reuse a single persistent `httpx.AsyncClient` for Apify calls instead of creating and tearing down a new client per request.** Creating an `httpx.AsyncClient` on every call establishes a new TCP connection (including TLS handshake) to Apify each time, adding 50–300 ms of overhead; a persistent client with connection pooling reuses the underlying TCP connection, eliminating that cost from the critical path and directly reducing p95 latency. ### Expected Impact - Eliminates TCP | invalid | ms | ms | % | skip | 2026-06-11T09:53:57.383148+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 49 | ## Hypothesis The `_poll_run` method uses a fixed `POLL_INTERVAL` of 3.0 seconds between every poll, but many actor runs complete within the first few seconds. By using **exponential backoff starting at 0.5s** (capped at `POLL_INTERVAL`), we reduce the expected wait time for fast-completing runs without increasing polling pressure on slow runs. **Expected impact:** For actor runs that complete in 2–4 seconds, the current implementation wastes 3.0s sleeping before the first status check after t | invalid | ms | ms | % | skip | 2026-06-11T09:53:32.159511+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 48 | ## Hypothesis The `httpx.AsyncClient` is recreated on every call to `_run_actor`, incurring TCP connection setup overhead (DNS resolution, TCP handshake, TLS negotiation) for each upstream request to Apify. Reusing a single module-level `httpx.AsyncClient` with connection pooling will eliminate this per-request overhead, reducing p95 latency by avoiding repeated connection establishment costs. ```python # services/instagram.py import asyncio import os import time import logging from typing im | invalid | ms | ms | % | skip | 2026-06-11T09:52:45.325855+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 47 | ## Hypothesis The `_poll_run` method calls `asyncio.sleep(POLL_INTERVAL)` (3.0 seconds) **after** each status check, meaning even if the actor completes immediately after the first poll, the code waits another 3 seconds before the next check. By checking status **before** sleeping — and using an exponential backoff starting at 0.5s capped at 3.0s — we reduce the average and p95 latency for runs that complete quickly. **Expected impact:** For fast actor runs (e.g., completing in 5–15 seconds), | invalid | ms | ms | % | skip | 2026-06-11T09:52:26.682803+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 46 | ## Hypothesis The `UsageCapMiddleware` uses a lazy-initialized global Redis client (`_redis_client`) that is separate from the already-initialized `app.state.redis` client created at startup. This means the middleware may incur a cold-connection penalty on its first request, and more importantly, every middleware dispatch makes a `_get_redis()` call that checks a global variable and potentially creates a new connection pool, adding unnecessary overhead per request compared to reusing the al | invalid | ms | ms | % | skip | 2026-06-11T09:52:05.574138+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 45 | ## Hypothesis The `_poll_run` method uses a fixed `POLL_INTERVAL` of 3.0 seconds between every status check, but Apify actor runs for small requests often complete in under 3 seconds — meaning the first poll happens *after* the run has already finished, adding up to 3 seconds of unnecessary waiting to every p95 request. **Fix:** Implement an exponential backoff starting at 0.5s (capped at `POLL_INTERVAL`) so short-running actors are detected quickly, while long-running ones still poll at the c | invalid | ms | ms | % | skip | 2026-06-11T09:51:48.065913+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 44 | ## Hypothesis The `httpx.AsyncClient` is recreated on every request in `_run_actor`, incurring TCP connection setup overhead for each of the 2–3 Apify API calls (start run, poll, fetch dataset). Replacing it with a module-level persistent `httpx.AsyncClient` with connection pooling will reuse existing TCP connections, eliminating the per-request handshake latency on cache-miss paths. ## Expected Impact Each `_run_actor` invocation makes at minimum 3 HTTP requests to `api.apify.com`. With a f | invalid | ms | ms | % | skip | 2026-06-11T09:50:58.455020+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 43 | ## Hypothesis The `_poll_run` method always waits the full `POLL_INTERVAL` (3.0s) **after** checking status, meaning even if the actor completes immediately after the first poll, the code sleeps 3 seconds before discovering it. Restructuring the loop to check status **after** the sleep (except for the first check, which should happen immediately with no sleep) will eliminate up to one full `POLL_INTERVAL` of unnecessary latency on the happy path. ## Expected Impact For fast actor runs (which | invalid | ms | ms | % | skip | 2026-06-11T09:50:39.061269+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 42 | ## Hypothesis The `_cleanup` coroutine inside `get_following` is defined as a nested `async def` and scheduled via `asyncio.create_task()` on every single request — even cache hits and coalesced requests that never populate `_in_flight_results`. This creates unnecessary task scheduling overhead on the hot path. More importantly, the `_in_flight_results` dictionary is never cleaned up for the **coalesced waiters** path (they read the result but never schedule a cleanup), meaning stale entries ac | invalid | ms | ms | % | skip | 2026-06-11T09:50:04.375779+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 41 | ## Hypothesis The `get_following` route accesses `request.app.state.redis` twice (once for cache read, once for cache write) while the `InstagramService` also holds a reference to the same Redis client — but the `Depends(get_instagram_service)` factory re-fetches it from `request.app.state` on every request. The real latency win is that the cache lookup calls `get_cache(request.app.state.redis, cache_key)` which performs a synchronous attribute lookup chain on every hot path; the bigger issue i | invalid | ms | ms | % | skip | 2026-06-11T09:49:33.798289+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 40 | ## Hypothesis The `_run_actor` method creates a new `httpx.AsyncClient` on every invocation with a very large timeout (`ACTOR_TIMEOUT_SECS + 30`), and within `_poll_run` it always sleeps `POLL_INTERVAL` (3.0s) **after** each status check — including right before the final successful check — adding up to 3 seconds of unnecessary wait even when the actor completes quickly. By using an adaptive backoff that starts with a short initial poll (0.5s) and grows toward `POLL_INTERVAL`, we reduce the lat | invalid | ms | ms | % | skip | 2026-06-11T09:49:10.977501+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 39 | ## Hypothesis The `_poll_run` method always sleeps `POLL_INTERVAL` (3.0s) **before** checking status again, even when the actor finishes quickly. By polling immediately at the start of the loop (no initial sleep) and using exponential backoff with a cap, fast-completing actors return sooner, cutting p95 latency by up to one full poll interval (3s). ## Expected Impact For actors that complete in under 10 seconds, eliminating the trailing sleep-before-first-check saves the initial 3s wait. Expo | invalid | ms | ms | % | skip | 2026-06-11T09:48:52.854715+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 38 | ## Hypothesis The `UsageCapMiddleware` performs two sequential Redis round-trips (`INCR` then `EXPIRE`) on every non-cached request, but these can be collapsed into a single atomic Lua script call, eliminating one Redis RTT (~0.5–2ms per request) on the hot path. **Expected impact:** For requests that hit the usage guard (any RapidAPI-proxied call), this saves one full Redis round-trip on every single request — before the route handler even runs. At p95, where Redis latency can spike, this is | invalid | ms | ms | % | skip | 2026-06-11T09:48:33.699672+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 37 | ## Hypothesis The `get_cache` call in `routes/instagram.py` accesses `request.app.state.redis` directly (bypassing the `service` object's `self.redis`), but more importantly, the cache key construction via `make_cache_key` with `max_results=None` produces a different key than a request with `max_results` omitted vs explicitly set — this is a minor correctness issue. The real latency win is that **when a cache miss occurs and coalesced waiters read `_in_flight_results`, the result is served with | invalid | ms | ms | % | skip | 2026-06-11T09:47:58.201509+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 36 | ## Hypothesis The `_poll_run` method sleeps for the full `POLL_INTERVAL` (default 3.0s) **before** checking the status after the first poll, meaning even a fast actor run that completes in <3s will always wait at least one full sleep cycle. By issuing the first status check immediately after starting the run (with no initial sleep) and only sleeping between subsequent polls, we can reduce p95 latency for fast runs by up to one full `POLL_INTERVAL` (3 seconds). **Expected impact:** For actor ru | invalid | ms | ms | % | skip | 2026-06-11T09:47:37.148761+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 35 | ## Hypothesis The `_fetch_dataset` call in `_run_actor` uses `format=json&clean=true` but doesn't pass a `limit` parameter even when `max_results` is specified, causing Apify to return the full dataset and making the client deserialize/transfer more data than needed. Adding `limit=max_results` to the dataset fetch params will reduce network transfer and JSON parsing time proportionally to how much smaller the requested slice is versus the full dataset. ## Expected Impact For requests with `ma | invalid | ms | ms | % | skip | 2026-06-11T09:47:21.045258+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 34 | ## Hypothesis The `_poll_run` method calls `asyncio.sleep(POLL_INTERVAL)` **before** checking if the run already completed on the first iteration, meaning every request waits at least one full `POLL_INTERVAL` (3 seconds) even if the actor finishes almost instantly. Adding an initial short-delay probe (e.g., 0.5s) before the first status check, then using exponential backoff capped at `POLL_INTERVAL`, will reduce p95 latency for fast actor completions by 2–5+ seconds. **Expected impact:** Actor | invalid | ms | ms | % | skip | 2026-06-11T09:47:04.524318+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 33 | ## Hypothesis The `_run_actor` method opens a new `httpx.AsyncClient` on every call, incurring TCP connection establishment and TLS handshake overhead for each request to Apify's API. By using a module-level persistent `httpx.AsyncClient` with connection pooling (keep-alive), subsequent requests to the same Apify host reuse existing connections, eliminating the per-call handshake latency. ## Expected Impact Each `_run_actor` call makes 3+ sequential HTTP requests to `api.apify.com` (start run | invalid | ms | ms | % | skip | 2026-06-11T09:46:21.478346+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 32 | ## Hypothesis The `_poll_run` method sleeps for the full `POLL_INTERVAL` (default 3.0s) before checking status, meaning every request waits at least 3 seconds even if the actor completes almost instantly. Using an **exponential backoff with a short initial delay** (e.g., starting at 0.5s, doubling up to the configured `POLL_INTERVAL` cap) will reduce median and p95 latency for fast actor runs while preserving the same ceiling for slow ones. ### Expected Impact For actor runs that complete in | invalid | ms | ms | % | skip | 2026-06-11T09:46:05.821189+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 31 | ## Hypothesis The `UsageCapMiddleware` performs `incr` and `expire` as two separate sequential Redis commands, adding an extra round-trip on every non-cached request. Using a Lua script (or Redis pipeline) to execute both atomically in a single round-trip will eliminate that extra RTT, reducing p95 latency by ~1-2ms per request when Redis is the bottleneck. ## Expected Impact Each request through `UsageCapMiddleware` (when `PER_USER_MONTHLY_CAP > 0`) currently does two sequential async Redis | invalid | ms | ms | % | skip | 2026-06-11T09:45:49.622887+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 30 | ## Hypothesis The `routes/instagram.py` handler accesses `request.app.state.redis` directly for the cache check (bypassing the `service` object that already holds a reference to the same redis client), creating an unnecessary attribute lookup chain on every request. More importantly, the `get_redis_client()` function in `cache.py` creates a **new** Redis connection pool on every call during the `lifespan` startup, but the `UsageCapMiddleware` in `usage_guard.py` lazily creates its **own separat | invalid | ms | ms | % | skip | 2026-06-11T09:45:12.091785+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 29 | ## Hypothesis The `_poll_run` method uses a fixed `POLL_INTERVAL` (default 3.0s) throughout the entire polling loop, but Apify actor runs typically have a startup phase (container cold start ~5-15s) followed by a relatively fast execution phase. By using exponential backoff that starts with a shorter initial interval (0.5s) and caps at `POLL_INTERVAL`, we reduce the delay between when the actor actually completes and when we detect it — cutting p95 latency by up to one full `POLL_INTERVAL` (3s) | invalid | ms | ms | % | skip | 2026-06-11T09:44:55.171037+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 28 | ## Hypothesis The `httpx.AsyncClient` is created fresh for every `_run_actor` call (including its internal TLS handshake and connection setup), adding latency on every upstream request. Reusing a single module-level `httpx.AsyncClient` with a connection pool eliminates repeated TCP/TLS setup overhead, reducing p95 latency especially for the actor-start and dataset-fetch calls. ### Expected Impact Each `_run_actor` invocation currently pays TCP connection establishment + TLS handshake costs at | invalid | ms | ms | % | skip | 2026-06-11T09:44:15.008786+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 27 | ## Hypothesis The `_run_actor` method in `services/instagram.py` uses Apify's asynchronous "fire and poll" pattern, but the initial `POLL_INTERVAL` of 3.0 seconds means the first status check is delayed by a full 3 seconds even if the actor completes quickly. By using an exponential backoff starting at 0.5s and capping at `POLL_INTERVAL`, we can detect fast completions much sooner while still being gentle on longer runs. **Expected impact:** For actors that complete in under 3 seconds (cached | invalid | ms | ms | % | skip | 2026-06-11T09:43:59.409089+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 26 | ## Hypothesis The `get_redis_client()` in `cache.py` creates a new Redis connection pool on every call that goes through `get_cache`/`set_cache` indirectly — but more critically, the `UsageCapMiddleware` in `usage_guard.py` creates its own separate Redis client via `_get_redis()` using a lazy singleton with a **2-second socket timeout**, while `cache.py` uses a **3-second timeout**. The real bottleneck is that `app.state.redis` is accessed via `getattr(request.app.state, "redis", None)` on ever | invalid | ms | ms | % | skip | 2026-06-11T09:43:40.952906+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 25 | ## Hypothesis The `_in_flight_results` cleanup uses `asyncio.create_task(_cleanup())` with a 2-second sleep after every request, even for cache hits and fast paths. More importantly, coalesced waiters that arrive after the event is set but before cleanup read a stale `None` result (race condition), causing unnecessary 503 retries that inflate p95. Storing the result directly on the `asyncio.Event` object (as an attribute) eliminates the separate dict, the race window, and the 2-second cleanup t | invalid | ms | ms | % | skip | 2026-06-11T09:43:03.667798+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 24 | ## Hypothesis The `routes/instagram.py` handler calls `get_cache` with `request.app.state.redis` directly, which involves an attribute lookup chain on every request, but more importantly the `getattr(request.app.state, "redis", None)` in `get_instagram_service` and the direct `request.app.state.redis` access in the route are duplicated and unoptimized — however, the real latency issue is that the `_cleanup` coroutine spawns an `asyncio.create_task` with a 2-second sleep on **every single reques | invalid | ms | ms | % | skip | 2026-06-11T09:42:35.576927+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 23 | ## Hypothesis The `_run_actor` method uses Apify's asynchronous "fire and poll" pattern, but Apify also supports a **synchronous run endpoint** (`/runs?waitForFinish=<seconds>`) that blocks server-side and returns the completed run in a single HTTP round-trip — eliminating the polling loop overhead (multiple network RTTs + `POLL_INTERVAL` sleep delays that add up to several seconds at p95). **Expected Impact:** Each poll cycle costs one network RTT (~50-200ms) plus `POLL_INTERVAL` sleep (3.0s | invalid | ms | ms | % | skip | 2026-06-11T09:42:03.644360+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 22 | ## Hypothesis The `UsageCapMiddleware` performs two sequential Redis round-trips per request (`INCR` then `EXPIRE`) when the key is new (count == 1). These can be collapsed into a single atomic Lua script or a pipeline, but more impactfully: the middleware currently uses its **own separate Redis client** (`_redis_client` in `usage_guard.py`) created lazily and independently from the app's shared Redis connection pool in `cache.py`. This means every request pays for two separate connection pools | invalid | ms | ms | % | skip | 2026-06-11T09:41:45.717093+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 21 | ## Hypothesis The `_poll_run` method uses a fixed `POLL_INTERVAL` (default 3.0s) sleep between status checks, which means every request pays at least one full 3-second wait even when the actor completes quickly. Implementing exponential backoff starting at 0.5s (capped at `POLL_INTERVAL`) will reduce p95 latency for fast actor runs by capturing early completions sooner while not hammering the API for slow runs. **Expected impact:** Actor runs that finish in under 3 seconds currently always wai | invalid | ms | ms | % | skip | 2026-06-11T09:41:26.455960+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 20 | ## Hypothesis The `cache.py` `get_cache` function calls `json.loads` on every cache hit, and `set_cache` calls `json.dumps` — both run synchronously on the event loop, blocking it for large following lists. Switching to `redis.asyncio`'s built-in `GET`/`SETEX` with pre-serialized bytes is equivalent, but more impactful: the real gain is replacing the per-field `json.dumps(value, default=str)` and `json.loads(raw)` with `orjson` (which is 2–10× faster for large nested structures) so that cache h | invalid | ms | ms | % | skip | 2026-06-11T09:41:00.724363+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 19 | ## Hypothesis The `_run_actor` method creates a new `httpx.AsyncClient` for every request with no connection pooling persistence, and closes it immediately after use. By maintaining a module-level persistent `httpx.AsyncClient` with a connection pool, TCP connection establishment and TLS handshake overhead to `api.apify.com` is eliminated on subsequent requests, reducing p95 latency by avoiding repeated connection setup costs. ## Expected Impact Each call to `_run_actor` currently pays TCP+TL | invalid | ms | ms | % | skip | 2026-06-11T09:40:13.973557+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 18 | ## Hypothesis The `_poll_run` method fetches the run status via a separate polling endpoint, but Apify supports a **synchronous run mode** via `waitForFinish` query parameter on the actor start endpoint, which blocks server-side until the run completes and returns the final status in one HTTP round-trip — eliminating all polling overhead (multiple `POLL_INTERVAL` sleeps + HTTP requests). By passing `waitForFinish=<ACTOR_TIMEOUT_SECS>` to the actor start URL, the single POST call blocks on Apif | invalid | ms | ms | % | skip | 2026-06-11T09:39:45.517287+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 17 | ## Hypothesis The `httpx.AsyncClient` is created with `timeout=httpx.Timeout(ACTOR_TIMEOUT_SECS + 30)` as a single scalar value, which applies the same timeout to connect, read, write, and pool acquisition. By splitting this into a short connect timeout (10s) and a long read timeout matching the actor's polling window, combined with reusing a single module-level `httpx.AsyncClient` instance across requests (avoiding per-request TLS handshake and connection setup overhead), we can eliminate the | invalid | ms | ms | % | skip | 2026-06-11T09:38:55.447295+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 16 | ## Hypothesis The `_in_flight` coalescing dictionary is a plain `dict` accessed from multiple concurrent coroutines without any lock, and waiters call `event.wait()` then immediately read `_in_flight_results` — but the `finally` block calls `event.set()` and then **pops the key from `_in_flight`** before popping the result, creating a race where a late-arriving request finds no in-flight event and launches a redundant upstream call instead of coalescing. More critically, the `_in_flight` dict i | invalid | ms | ms | % | skip | 2026-06-11T09:38:15.566116+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 15 | ## Hypothesis The `_poll_run` method uses a fixed `POLL_INTERVAL` sleep (default 3.0s) before the **first** status check, meaning every request pays at least one full 3-second delay even when the actor completes in under a second. Adding an immediate first poll with zero sleep — then falling back to the normal interval — will eliminate this unnecessary latency for fast actor runs and directly reduce p95. ### Expected Impact For requests where the Apify actor completes quickly (sub-second to ~ | invalid | ms | ms | % | skip | 2026-06-11T09:37:57.941342+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 14 | ## Hypothesis The `get_instagram_service` dependency calls `getattr(request.app.state, "redis", None)` on every single request, but more importantly the `InstagramService` object is instantiated fresh per-request, causing unnecessary object allocation overhead. However, the more impactful issue is that `get_cache` and `set_cache` in the route handler also call `request.app.state.redis` separately, meaning the Redis client reference is resolved multiple times per request path when it could be re | invalid | ms | ms | % | skip | 2026-06-11T09:37:25.078143+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 13 | ## Hypothesis The `get_redis_client()` function in `cache.py` creates a new Redis connection pool on every call (once per request via `request.app.state.redis`), but the `UsageCapMiddleware` maintains its own separate singleton `_redis_client` — meaning two separate connection pools exist. More critically, the `get_cache` call in `routes/instagram.py` passes `request.app.state.redis` directly (which was initialized once at startup and is fine), but `set_cache` also uses the same reference. The | invalid | ms | ms | % | skip | 2026-06-11T09:37:07.369279+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 12 | ## Hypothesis The `_in_flight` coalescing logic stores waiter results in a plain dict and has coalesced waiters return `result["cached"] = False` even though they could have been served the cached value — but the deeper inefficiency is that `get_following` accesses `request.app.state.redis` twice: once directly for the cache check and once through `InstagramService`, creating redundant attribute lookups and an inconsistent pattern. More critically, the `get_cache` call uses `request.app.state.r | invalid | ms | ms | % | skip | 2026-06-11T09:36:34.226826+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 11 | ## Hypothesis The `_poll_run` loop calls `client.get(poll_url, params=params)` on every iteration, building a new `params` dict lookup and URL each time, but more importantly the poll URL string is reconstructed with an f-string inside the method rather than being passed pre-built — minor overhead aside, the **real issue** is that `asyncio.sleep(POLL_INTERVAL)` sleeps *before* checking whether the run already finished on the very first poll, adding a mandatory `POLL_INTERVAL` (3 seconds) of unn | invalid | ms | ms | % | skip | 2026-06-11T09:36:18.082407+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 10 | ## Hypothesis The `httpx.AsyncClient` is created fresh for every `_run_actor` call, paying TCP+TLS handshake overhead on every upstream request. Using a module-level persistent `httpx.AsyncClient` with connection pooling will eliminate repeated handshake costs and reduce p95 latency for cache-miss paths. **Expected impact:** Each `_run_actor` invocation currently opens new TCP connections for the start, poll(s), and dataset fetch calls. A persistent client reuses pooled connections, saving 50– | invalid | ms | ms | % | skip | 2026-06-11T09:35:33.314482+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 9 | ## Hypothesis The `_poll_run` method always sleeps for the full `POLL_INTERVAL` (3.0 seconds) before checking status, even if the actor completes quickly. Using an **exponential backoff with a short initial interval** (e.g., starting at 0.5s, doubling up to `POLL_INTERVAL`) will reduce p95 latency for fast-completing actor runs by checking sooner after submission. ## Expected Impact For actor runs that complete in under 3 seconds (or between 3–6 seconds), the current implementation unnecessar | invalid | ms | ms | % | skip | 2026-06-11T09:35:11.072619+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 8 | ## Hypothesis The `UsageCapMiddleware` performs two sequential Redis round-trips per request (`INCR` then `EXPIRE`) when the key is new (count == 1). These can be collapsed into a single atomic Lua script or a pipeline, but more impactfully: the `expire` call on every new key adds a conditional branch and extra RTT. We can replace the `incr` + conditional `expire` with a single pipelined `incr`+`expire` (always sent together), eliminating the conditional and the second network round-trip on new | invalid | ms | ms | % | skip | 2026-06-11T09:34:38.171973+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 7 | ## Hypothesis The `UsageCapMiddleware` performs two sequential Redis round-trips per request (one `INCR` + one `EXPIRE`) when the key is new, and always at least one (`INCR`) on every subsequent request. These synchronous-style Redis calls on the hot path add latency even when the guard is disabled only at the logic level — the middleware still awaits `_get_redis()` and `r.incr(key)` before forwarding to the actual handler. Using a single atomic `INCR`+`EXPIRE` pipeline (or Redis `SET ... EX .. | invalid | ms | ms | % | skip | 2026-06-11T09:34:18.500926+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 6 | ## Hypothesis The `_in_flight_results` dictionary is read by coalesced waiters *after* `event.wait()` returns, but the cleanup task deletes the result after only 2 seconds. The real bottleneck is that coalesced waiters that finish `event.wait()` must still go through `get_cache` on the next request (cache miss window), and more critically, the `redis` attribute is accessed via `request.app.state.redis` on *every* request — a redundant attribute lookup chain — while `get_cache` is called twice ( | invalid | ms | ms | % | skip | 2026-06-11T09:33:41.075135+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 5 | ## Hypothesis The `httpx.AsyncClient` is created fresh for every call to `_run_actor`, incurring TCP/TLS handshake overhead on every upstream request. Replacing it with a module-level persistent `httpx.AsyncClient` (with connection pooling) will eliminate repeated handshake latency for the actor-start, polling, and dataset-fetch calls within a single run, and across concurrent/sequential runs. ### Expected Impact Each `_run_actor` call currently opens a new TCP+TLS connection to `api.apify.co | invalid | ms | ms | % | skip | 2026-06-11T09:32:54.203811+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 4 | ## Hypothesis The `_poll_run` loop uses a fixed `asyncio.sleep(POLL_INTERVAL)` of 3.0 seconds between status checks, meaning a job that completes in, say, 3.1 seconds will wait a full additional 3 seconds before detection — directly inflating p95 latency by up to one full poll interval. Switching to an **exponential backoff starting at 0.5s** (capped at `POLL_INTERVAL`) will detect fast completions much sooner while not hammering the Apify API for slow runs. ### Expected Impact Most Apify act | invalid | ms | ms | % | skip | 2026-06-11T09:32:34.329621+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 3 | ## Hypothesis The `get_cache` call in `routes/instagram.py` uses `request.app.state.redis` directly, but the `InstagramService` dependency also accepts a `redis` parameter — creating two separate Redis access paths. More critically, the coalesced waiters receive `result["cached"] = False` instead of `True`, but the real latency issue is that **`redis` attribute access via `request.app.state` is an unguarded attribute lookup that could return `None` and fall through to an unnecessary service cal | invalid | ms | ms | % | skip | 2026-06-11T09:32:04.583371+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 2 | ## Hypothesis The `get_redis_client()` in `cache.py` creates a new connection pool on every call (since the `from_url` call itself doesn't reuse anything at module level), and the route handler calls `get_cache` with `request.app.state.redis` while `get_redis_client()` is only called once at startup — but the `UsageCapMiddleware` in `usage_guard.py` creates its **own separate Redis client** via `_get_redis()` on first middleware invocation, doubling connection overhead and adding cold-start lat | invalid | ms | ms | % | skip | 2026-06-11T09:31:35.697032+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 1 | ## Hypothesis The `get_redis_client()` call in `cache.py` creates a new Redis connection (including a `ping()` round-trip) on every request that misses the module-level singleton check, and the `get_instagram_service` dependency also re-fetches `request.app.state.redis` on every request. The dominant p95 latency issue in the **cache-hit path** is that `get_cache` and `set_cache` each create implicit per-call connection overhead because the Redis client stored in `app.state.redis` is initialized | invalid | ms | ms | % | skip | 2026-06-11T09:30:54.743544+00:00 |
| Instagram Following Scraper - Instant, No-Login Required | 0 | ## Hypothesis The `_in_flight_results` cleanup uses `asyncio.create_task(asyncio.sleep(2))` which holds result memory unnecessarily, but more critically, the **coalesced waiters read the result *after* the event is set but the `_in_flight` dict entry is already popped** — meaning a third concurrent request arriving mid-cleanup will miss coalescing and hit upstream again. However, the primary p95 latency win is: **the `httpx.AsyncClient` is recreated on every request**, incurring TCP connection | invalid | ms | ms | % | skip | 2026-06-11T09:30:08.952872+00:00 |
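
The most frequently repeated hypothesis in the rows above (iterations 4, 9, 15, 21, 27, 29, 32, 34, 36, 39, 40) is to check the run status immediately and then back off from 0.5s up to `POLL_INTERVAL`. The following is a minimal sketch of that idea only, not the service's actual `_poll_run`: the names `backoff_delays`, `poll_until_done`, `check_status`, and `TERMINAL_STATUSES` are hypothetical, and the status strings are an assumption about what the upstream API returns.

```python
import asyncio
from typing import Awaitable, Callable, Iterator

POLL_INTERVAL = 3.0   # cap; the 3.0s default quoted in the log
INITIAL_DELAY = 0.5   # starting delay proposed in several hypotheses

# Assumption: status strings that mean "stop polling"; adjust to the real API.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def backoff_delays(initial: float = INITIAL_DELAY,
                   cap: float = POLL_INTERVAL) -> Iterator[float]:
    """Yield sleep durations that double each time and never exceed cap."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


async def poll_until_done(
    check_status: Callable[[], Awaitable[str]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    max_polls: int = 100,
) -> str:
    """Check status first, sleep only between checks, with capped backoff."""
    delays = backoff_delays()
    for _ in range(max_polls):
        status = await check_status()  # the first check has no preceding sleep
        if status in TERMINAL_STATUSES:
            return status
        await sleep(next(delays))
    raise TimeoutError(f"run not finished after {max_polls} polls")
```

With these defaults the waits between checks are 0.5s, 1.0s, 2.0s, then 3.0s repeatedly, so a run that finishes quickly is detected after at most a short delay while slow runs are polled no more often than before. The `sleep` parameter is injectable so the loop can be exercised without real waiting.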