Did you find that some data model patterns were easier for some LLMs to detect? I am curious about how training might have made some agents better at graph navigation, for instance.
aaronsteers•May 5, 2026
AJ here, from Airbyte.
Yes, we've definitely found that some API data models are easier for models to navigate than others.
The largest sources of agent inefficiency we've identified so far are:
1. Many APIs lack robust-enough search, forcing agents to page through hundreds or thousands of paginated responses until they find the record they are looking for (our Context Store addresses this).
2. Many APIs have HUGE response sets. Our MCP helps handle this by letting the agent decide exactly which fields are returned.
3. With our SDK, you can build your own MCP server on top of any source we support (50+ right now, and growing). This is super powerful: it lets you build more ergonomic MCP servers and tools, even when the data models themselves are not intuitive or easy for the LLM to use directly.
Combining all three of these, we see that the vast majority of challenges can be addressed with a strong system prompt for guidance. Fine-tuning could get you further, but you'd still want your fine-tuned model to build on this same foundation, since the efficiencies will transfer across use cases and models.
@ecares - Does this answer your question? What do you think?
woeirua•May 5, 2026
Your point about search being a bottleneck is spot on. IMO, search APIs should return guidance to agents to help them winnow down the results faster. For example, if your query returns 1000 results, then it should tell the agent, "too many results, we recommend you filter on column X because of Y to improve your search. Here are the possible values in column X: ..."
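To make the idea concrete, here is a minimal Python sketch of a search tool that returns guidance instead of a huge result set. Everything in it is hypothetical (the `search_with_guidance` name, the response keys, and the heuristic for choosing a column); it is not how any particular vendor or Airbyte implements search.

```python
from collections import Counter


def search_with_guidance(records, predicate, max_results=50, max_values=10):
    """Return matches, or a hint on how to narrow the query when there are too many."""
    matches = [r for r in records if predicate(r)]
    if len(matches) <= max_results:
        return {"status": "ok", "count": len(matches), "results": matches}

    # Count how often each value appears in each column of the matches.
    values_by_column = {}
    for record in matches:
        for column, value in record.items():
            values_by_column.setdefault(column, Counter())[value] += 1

    # Assumed heuristic: a useful filter column has a small, enumerable set of values.
    candidates = {
        column: counts
        for column, counts in values_by_column.items()
        if 2 <= len(counts) <= max_values
    }
    if not candidates:
        return {
            "status": "too_many_results",
            "count": len(matches),
            "hint": "Too many results; add a more specific search term.",
        }

    # Prefer the column that splits the matches into the most groups.
    column = max(candidates, key=lambda c: len(candidates[c]))
    return {
        "status": "too_many_results",
        "count": len(matches),
        "hint": f"Too many results; filter on '{column}' to narrow the search.",
        "suggested_filter": column,
        "possible_values": dict(candidates[column].most_common()),
    }


# Hypothetical data: 1000 tickets with a low-cardinality status and priority.
tickets = [
    {"id": i, "status": ["open", "pending", "closed"][i % 3], "priority": ["low", "high"][i % 2]}
    for i in range(1000)
]
response = search_with_guidance(tickets, lambda r: True)
print(response["hint"])
print(response["possible_values"])
```

The agent spends a few dozen tokens on the hint and the value list instead of paging through a thousand records, and its next call can already include the suggested filter.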
carefulfungi•May 5, 2026
There are a lot of APIs like this that I really wish would expose downloading a parquet file instead of trying to implement server-side filtering and reporting query features.
aaronsteers•May 6, 2026
+1
Working with APIs is often frustrating, and the worst ones are terribly inefficient. Our Agent SDK and Agent Context Store insulate you and your agent from this headache, allowing you to query those synced datasets directly.
The feedback about wanting to download a parquet file is super interesting...
aaronsteers•May 6, 2026
Glad to hear this resonates with you also. We're aiming to give agents more control over their context, and easier access paths regardless of the source system.
mtricot•May 5, 2026
Just want to call out a couple of nuances in our methodology. In general, we tried our best to do apples-to-apples comparisons where we could, and gave ourselves a discount where we couldn’t. Unsurprisingly, it’s a challenge to find MCPs for various vendors (which is another reason we are trying to solve this). Here’s a video walkthrough of the benchmark harness: https://www.loom.com/share/9d96c8c64c1a4b7fad0356774fc54acc
Where the comparison wasn't valid or wasn't apples-to-apples:
Gong and Zendesk: no official native MCP exists, so we used the most popular community implementations we could find. We were only able to benchmark Gong Search as the Gong MCP does not have a Get tool call.
While our Search testing yielded the same number of records on either path, vendor-specific search implementations mean the results aren’t identical. Contents are similar in general, so the ratios remain directionally correct.
The general test set:
2 scenarios (Retrieval and Search) across 4 connectors isn’t a huge test set. While we hope to extend this over time, we’ve made the harness public so anyone can contribute in the meantime. Let us know if you find any MCP with better results!
Where the vendor MCP wins or ties:
Salesforce showed the smallest win at 16%. This is primarily because Salesforce, unlike many vendors, uniquely provides great search support out of the box with their SOQL.
We see identical records for Get. As noted, Search returns different record sets with identical counts. Airbyte uses fewer tokens because the Salesforce records contain mandatory metadata (type and url).
Where the vendor MCP is costly to context:
Zendesk is a great example of this. The extreme gap is because the Zendesk MCP (reminder - a community alternative) returns the entire API response in search results. This averages to 9KB per record against our production Zendesk account!
Airbyte’s implementation provides filtering, which allows agents to retrieve the minimal data needed to achieve the outcome, explaining the drastic gap.
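As a rough illustration of why field filtering shrinks the context cost, the Python sketch below compares the serialized size of a full record with a projected one. The record shape, the field names, and the `project` helper are all hypothetical; this is not the benchmark harness or Airbyte's actual API.

```python
import json


def project(record, fields):
    """Keep only the requested top-level fields of a record."""
    return {name: record[name] for name in fields if name in record}


# Hypothetical ticket record, padded to resemble a verbose API response.
full_record = {
    "id": 101,
    "subject": "Cannot log in",
    "status": "open",
    "description": "x" * 4000,
    "custom_fields": [{"id": n, "value": None} for n in range(100)],
}

# The agent asks only for the fields it needs to answer the question.
slim_record = project(full_record, ["id", "subject", "status"])

full_bytes = len(json.dumps(full_record).encode("utf-8"))
slim_bytes = len(json.dumps(slim_record).encode("utf-8"))
print(f"full: {full_bytes} bytes, projected: {slim_bytes} bytes")
```

Multiplied across every record in a search response, the difference between the two sizes is what ends up in the agent's context window.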
swyx•May 5, 2026
(former employee here) congrats Michel! so glad to see you guys adapting to the AI age so well (and using the crap out of Devin!)
i think my most sad/interesting observation about ai engineers is that many ai apps are super data hungry, but many dont have the necessary data engineering background to even know they need an airbyte or what tradeoffs to make in an etl pipeline. would love a "data engineering for ai engineers" type braindump session from someone from airbyte at AIE (https://ai.engineer/cfp )
jeanlaf•May 6, 2026
Thanks swyx! We'd love to do that "data engineering for ai engineers" session, and will introduce you to the right person on the team.
aaronsteers•May 6, 2026
Hey, swyx! Great seeing you here.
> airbyte agents could serve as a form of MCP gateway
Exactly! And a single set of tools for agents to access both realtime (direct reads/writes) as well as cached (Context Store), bringing hopefully the best access path for each different use case.
> would love a "data engineering for ai engineers" type braindump ... at AIE
Great idea - we have a booth at AIE, and we'll submit there for a talk. Mario will reach out to you about this. :)
jscheel•May 5, 2026
I feel like we've been working in parallel here :) We are using PyAirbyte (hi aaronsteers) for our users to connect their data sources to our agents. We originally wanted to use the airbyte white-label platform, but the team said that it was being deprecated. I think this really drives home just how crucial it is to have a clear model for accessing your data, and Airbyte has been great at that for quite a while.
aaronsteers•May 6, 2026
Hello, Jared! Small world! Yes, we did deprecate our old PbA (Powered by Airbyte) offering, but in many ways our new Agents and Embedded offering is a more robust and agent-friendly successor to that older offering.
I am happy to hear you are still getting value out of PyAirbyte! If you do try out Airbyte Agents, please let us know how it goes! We are always listening to feedback and would love to hear from you as you explore the new tools and capabilities.
ritonlajoie•May 5, 2026
Hi Michel, congrats and I have nice memories of working with you in lafayette street !!
Keep up the good work on airbyte ! :)
mtricot•May 6, 2026
Great to see you here!
pjm331•May 5, 2026
sounds very similar to what I ended up doing on my internal system - especially anything to do with search - much better to just sync everything to a DB and give the agent access to the DB
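A minimal Python sketch of that pattern, using the standard-library `sqlite3` module: records are loaded into a local table once, and the agent gets a single read-only query tool. The table layout, the sample records, and the `run_readonly_query` helper are hypothetical, and the `SELECT`-only check is deliberately crude.

```python
import sqlite3

# Hypothetical records, as if already fetched from a paginated API.
api_records = [
    {"id": 1, "customer": "Acme", "status": "open", "subject": "Login fails"},
    {"id": 2, "customer": "Globex", "status": "closed", "subject": "Billing question"},
    {"id": 3, "customer": "Acme", "status": "open", "subject": "Export is slow"},
]

# Sync step: copy everything into a local database once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, subject TEXT)"
)
conn.executemany(
    "INSERT INTO tickets VALUES (:id, :customer, :status, :subject)", api_records
)


def run_readonly_query(sql, params=()):
    """Tool the agent would call. Crude guard: only SELECT statements are accepted."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql, params).fetchall()


# One query replaces paging through the API and counting client-side.
open_by_customer = run_readonly_query(
    "SELECT customer, COUNT(*) FROM tickets WHERE status = ? "
    "GROUP BY customer ORDER BY customer",
    ("open",),
)
print(open_by_customer)  # [('Acme', 2)]
```

In a real setup the guard would be a read-only connection or database role rather than a string check, but the shape of the tool stays the same.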
aaronsteers•May 6, 2026
That's great to hear - great minds think alike!
> give the agent access to the DB
This is where Airbyte really can shine, I think, and the total can be more than the sum of its parts. Because Airbyte excels at data replication already, we can populate your Agent Context Store without users or agents ever needing to think about the words "ELT" or "ETL".
We're listening carefully to feedback so we hope you will give it a try and let us know how it goes! Thanks!
nerdright•May 5, 2026
This is such a great direction airbyte is taking, and congrats on the launch! I think you're better positioned for this opportunity than most people realize, given your reputable brand and your uncanny expertise in etl. It's honestly a natural progression of airbyte as far as the current AI landscape goes. Kudos to you and the team!
(We use airbyte at my company, although we self-host it.)
aaronsteers•May 6, 2026
Thanks! Really appreciate the kind words. Looking forward to seeing what our amazing community builds with these new tools.
jessewmc•May 5, 2026
Looks interesting!
If I'm reading correctly, the indexing (Context Store) is neutral/unopinionated? How does it select fields for indexing?
Have you done any testing on guided indexing, or metadata layers on top of the data? My experience so far on similar work is that getting data in front of an agent isn't enough context to get useful/reliable answers enough of the time. I.e. _what_ you index, and how you signpost for agents, becomes really important (unless your data is super clean I guess). This does look like a good foundation for that kind of tooling though!
aaronsteers•May 6, 2026
Hi, @jessewmc. Thanks for your reply. Regarding your points:
> If I'm reading correctly, the indexing (Context Store) is neutral/unopinionated? How does it select fields for indexing?
While we haven't yet published details on the backend implementation, I can say that our implementation performs very well without needing to prioritize specific fields for indexing. We aim for large text fields to perform decently and retrieval based on small/compressible fields like ints to be fast. (More to come on this in the coming months.)
> Have you done any testing on guided indexing, or metadata layers on top of the data?
We've been testing with different data scales and shapes. Nothing detailed to share yet, but performance has (so far) never itself become the bottleneck in our agent testing. (The LLM thinking itself is often the bottleneck.)
> My experience so far on similar work is that getting data in front of an agent isn't enough context to get useful/reliable answers enough of the time.
Airbyte has rich metadata on our upstream connectors' data models, which I think helps us a lot to deliver helpful context to the agent. Another option, when optimizing for specific use cases, is to build your own agent tools on top of our Agent SDK. This allows you to make the calls organic and build the tools in a way that makes natural sense to the agent, regardless of source shape or which system(s) that data is coming from.
> This does look like a good foundation for that kind of tooling though!
We agree! Thanks again for sharing your thoughts here.
slurpyb•May 5, 2026
Your billing support email forwards to a google group which rejects the email entirely. So I embedded my question inside the website's sales enquiry form and received multiple rounds of emails that couldn’t be further from human.
It’s not why we started using posthog but it definitely sealed the deal when you see how simple and reliable that experience is
mtricot•May 6, 2026
Let me see what's up and fix that!
tomrod•May 6, 2026
What actions does Airbyte Agents enable that weren't already available from Airbyte?
aaronsteers•May 6, 2026
The new Airbyte Agents offering brings a ton of new capabilities actually.
1. Programmatic Interfaces: Including a new REST API, SDK, and MCP Server.
2. New action verbs: Not just replication anymore. We have get/set/list/update/upload, and more!
3. New credentials passthrough: For all the above, you OAuth to Airbyte and we OAuth on your behalf to the systems your agent needs. No need to provide your agents dozens of different secrets in order to access the systems it needs.
4. Context Store. Like your agents' own data warehouse, but completely automatic and hands-free. For those use cases that just aren't possible when calling the REST API directly.
Shameless plug: I have written a paper about using the MCP server architecture to enable agents to overcome the knowledge cutoff, to work with software released after the training stop.
OpenClaw, Hermes and other agents have already made skill adoption mainstream?
Are you guys still seeing a future where people are dumping entire MCP tool defs into context?
aaronsteers•May 6, 2026
Great question, @Tsarp - Skills and tools work great together. What we've found is that agents generally need both to achieve great results. We're actually not trying to replace skills, but to give them new superpowers.
Are there any examples you've run into where skills were missing tools (or data) that they needed for a specific task?
ck_one•May 6, 2026
More and more SaaS companies like ServiceNow or Hubspot are creating new tollgates for agent api calls. How do you think this will impact Airbyte Agents? I guess that replicating data locally will be harder since the platforms will try to protect it or charge for it.
andai•May 6, 2026
The prompts you mentioned here sound like SQL. Is there any way to run actual SQL on these systems? Is "agents need to poke around endlessly" a symptom of the fact that there isn't a way to run an actual query?
(I'd guess there is actually SQL at the bottom layer, but there's no way to talk to it?)
sho•May 6, 2026
That's actually the approach we took with https://gentility.ai/ - we either provide almost-raw SQL query access to the DBs themselves or we synthesize from API into DuckDB via parquet and make that available to the agent to just directly query. It works well - my philosophy is to give agents the sharpest tools you can, and SQL is the best tool there is.
I understand the instinct to try to make a proprietary moat around it all but I think the pattern is useful and obvious enough that all big orgs will be doing something very similar within 5 years or so.
thecopy•May 6, 2026
Super interesting idea! Congrats on the launch. Context is definitely something that is lacking in my experience. I'm always frustrated when an agent cannot answer business-related questions, and I compare them to coding agents which seem to be able to answer everything. The difference is that a coding agent has the context right there at its fingertips, while for business it's gated behind a bunch of services and custom data models. Context is king :)
How do you handle encryption and confidentiality? I'm building in this space too (MCP gateway https://www.gatana.ai/) which already has semantic search for tool outputs, and ensuring encryption and confidentiality is not trivial.
hmm so airbyte agents could serve as a form of MCP gateway, or a key building block of an MCP gateway, which btw is how anthropic uses mcp themselves for all their internal apps https://www.youtube.com/watch?v=CD6R4Wf3jnY&t=1s&pp=0gcJCd4K...
Again - thanks for your comment and sorry for the longwinded response. More info here: https://docs.airbyte.com/ai-agents/
[https://zenodo.org/records/19925469]