Introducing NotebookLM as a solution for interactive code exploration
Tackling a large codebase is like wandering through a city with no map and street signs in a foreign language. Each module is a random alley, every dependency a tricky roundabout, and piecing together the big picture feels like assembling a jigsaw puzzle… in a blackout. For developers, this often means hours spent decoding cryptic functions, deciphering tangled dependencies, and trying to connect the dots — all while somehow staying focused on the end goal.
In this article, I will show you how I used NotebookLM to interactively explore the crewAI repo and get document-grounded insights about it (crewAI is a popular agentic framework). While crewAI is already well documented [link], this approach really shines when tackling more “mysterious” repos that lack proper documentation.
Making the Code Chat-Ready
The first step is transforming the code into a format that NotebookLM can efficiently process. NotebookLM has specific limitations: it allows a maximum of 50 files per notebook, each capped at 500,000 words. While this capacity is ample for many text-based projects, a large repository with multiple files and thousands of lines of code can quickly surpass these limits.
To work within these constraints, we convert the entire crewAI repo into NotebookLM-friendly text files. The script, available [here], consolidates the contents of the repo by merging files into larger text files.
python repo_to_text.py --repo_path repos/crewAI/ --output_dir repoTXTs/crewai
When the content of a file is appended to the text file, it follows this format:
[DELIMITER] path_to_file
[DELIMITER] file_content
[DELIMITER]
Below is a truncated example as found in a generated text file.
****************************************
tests/cli/test_plus_api.py
****************************************
import os
import unittest
from unittest.mock import MagicMock, patch

from crewai.cli.plus_api import PlusAPI


class TestPlusAPI(unittest.TestCase):
    def setUp(self):
        self.api_key = "test_api_key"
        self.api = PlusAPI(self.api_key)

    def test_init(self):
        self.assertEqual(self.api.api_key, self.api_key)
        self.assertEqual(self.api.headers["Authorization"], f"Bearer {self.api_key}")
        self.assertEqual(self.api.headers["Content-Type"], "application/json")
        self.assertTrue("CrewAI-CLI/" in self.api.headers["User-Agent"])
        self.assertTrue(self.api.headers["X-Crewai-Version"])

You can take a look at a text file example here:
For the entire crewAI repo, the script generated 7 text files of at most 400,000 words each.
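If you'd rather not rely on the linked script, the core idea is easy to reproduce. Below is a minimal sketch of such a consolidation script, assuming a 400,000-word budget per output file and .py sources only; the actual repo_to_text.py may work differently, and the output file names here are made up.

# Minimal sketch of a repo-to-text consolidation script (assumed behavior;
# the real repo_to_text.py may differ). It walks the repository, wraps each
# file's contents in delimiter blocks, and starts a new output file once a
# word budget is reached.
import argparse
from pathlib import Path

DELIMITER = "*" * 40
MAX_WORDS_PER_FILE = 400_000  # assumed budget per output text file


def consolidate(repo_path: Path, output_dir: Path) -> None:
    output_dir.mkdir(parents=True, exist_ok=True)
    chunk_idx, word_count, lines = 1, 0, []

    def flush() -> None:
        nonlocal chunk_idx, word_count, lines
        if lines:
            out_file = output_dir / f"part_{chunk_idx}.txt"  # hypothetical naming
            out_file.write_text("\n".join(lines), encoding="utf-8")
            chunk_idx += 1
            word_count, lines = 0, []

    for path in sorted(repo_path.rglob("*.py")):
        try:
            content = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 text
        block = [DELIMITER, str(path.relative_to(repo_path)), DELIMITER, content, DELIMITER]
        block_words = sum(len(part.split()) for part in block)
        if lines and word_count + block_words > MAX_WORDS_PER_FILE:
            flush()  # start a new output file once the word budget is exceeded
        lines.extend(block)
        word_count += block_words

    flush()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Consolidate a repo into NotebookLM-friendly text files.")
    parser.add_argument("--repo_path", type=Path, required=True)
    parser.add_argument("--output_dir", type=Path, required=True)
    args = parser.parse_args()
    consolidate(args.repo_path, args.output_dir)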
Next, we create a new notebook and upload our text files.
Check out this tutorial to get started with NotebookLM: https://www.youtube.com/watch?v=1A9o-MalN0k.
8 Ways to “Chat” with the Code
Now, let’s dive in and start the conversation with the repo! Here are 8 ways NotebookLM can help you navigate, optimize, and build upon complex codebases.
1. Summarize All the Things
We can start by requesting high-level summaries of entire modules or drill down to specific functions. These summaries will provide a clear map of what the repository offers without needing to manually review each part. It’s like getting a high-level overview combined with specific insights, helping you quickly assess the code’s capabilities and structure.
prompt: Generate a directory tree of the repo.
2. Onboarding and Learning Path Suggestions
We can use NotebookLM to guide us through the codebase with an onboarding path tailored to essential concepts and functions. It will identify critical modules, functions, and dependencies in sequence, helping us navigate the repository in manageable steps. By providing explanations of key classes and methods — often with relevant code snippets and usage examples — NotebookLM will enable us to understand each component’s role and how they connect within the codebase.
prompt: Can you suggest an onboarding path for beginners or new team members to understand and work with this repository? Focus on the essential modules, functions, and dependencies, and provide clear learning steps with goals for each stage. Include specific code examples, key methods to explore, and practical checkpoints where they can test their understanding. If possible, suggest a small project or exercise for hands-on practice using the core functionalities.
3. Generating Familiarization Code Snippets
NotebookLM can also help you get familiar with the repo’s functionalities by generating code examples for small, practical implementations. By referencing relevant parts of the codebase, NotebookLM can provide initial snippets that let you see functions in action and understand how they interact with other components — making it easier to start exploring and building on the repository’s capabilities.
Prompt: Implement a random example of a crew of agents.
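For illustration, here is the kind of snippet NotebookLM might produce for this prompt. It is sketched from crewAI’s public Agent/Task/Crew API; exact parameter names and defaults may vary between versions, so treat it as a starting point rather than verified output.

# Illustrative sketch of a small crew, based on crewAI's public API
# (parameter names may differ across versions).
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect key facts about a topic",
    backstory="An analyst who digs through sources and summarizes findings.",
)
writer = Agent(
    role="Writer",
    goal="Turn research notes into a short blog post",
    backstory="A technical writer who favors clear, concise prose.",
)

research_task = Task(
    description="Research the benefits of multi-agent systems.",
    expected_output="A bullet-point list of findings.",
    agent=researcher,
)
writing_task = Task(
    description="Write a 200-word post based on the research findings.",
    expected_output="A short blog post.",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task])
result = crew.kickoff()
print(result)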
4. Dependency Detective
Large codebases often involve intricate inter-dependencies, which can be challenging to untangle. We can use NotebookLM to map out these relationships, providing a clear view of how different modules interact. By understanding data flow and function dependencies, you gain a deeper insight into how changes in one area may impact others, allowing for more informed development decisions.
prompt: Can you provide a detailed breakdown of the dependencies between the agent and task modules in the CrewAI framework? Include specific function calls, methods, or data structures where agent relies on task for execution. Also, list any parameter exchanges or direct references in the code, and if possible, provide code snippets that illustrate these dependencies.
5. Identifying Vulnerabilities
One valuable use case for NotebookLM is performing a security review on a codebase to identify potential vulnerabilities before deployment. By guiding NotebookLM to examine areas prone to risks — such as code injection points, access control mechanisms, data handling practices, and dependencies — it can help flag sections of the code that may need tightening. For example, NotebookLM might pinpoint functions with unsanitized inputs, highlight dependency versions with known vulnerabilities, or recommend stronger access controls on sensitive modules. With these insights, developers can proactively address weaknesses, making the repository safer and more robust before it goes live.
prompt: Can you analyze this codebase for any security vulnerabilities that I should address before deploying? Specifically, look for risks related to code injection, inadequate access control, improper data handling, dependency vulnerabilities, and any other common security issues in repositories. Provide examples of code sections or functions that might need security improvements, along with recommendations for mitigating these risks.
6. Feature Exploration Guide
NotebookLM can act as a feature exploration guide, outlining optional features, hidden modules, or “nice-to-have” functionalities within the codebase that might be less apparent. This is helpful for exploring the repo’s full potential and discovering lesser-used but valuable components.
prompt: Can you provide a feature exploration guide for this codebase? Outline any optional features, hidden modules, or ‘nice-to-have’ functionalities that may not be immediately obvious but could be valuable. Include a brief description of each feature, its purpose, and examples of how it might enhance or extend the core functionalities of the codebase.
7. State and Data Flow
NotebookLM can be a powerful tool for visualizing the state and data flow within a complex codebase, offering insights into how data moves and transforms across functions and modules. This is especially valuable in data-intensive applications where understanding the journey of data — from input to processing to output — is crucial for effective usage.
prompt: Describe how data flows between functions or modules, providing an overview of the transformations data undergoes throughout the codebase. Understanding data flow can be crucial for learning how to use a repository effectively, especially in data-intensive applications.
8. New Functionalities: Accelerating Codebase Expansion
Incorporating new functionalities into an existing codebase often requires careful planning to ensure compatibility and maintain performance. With NotebookLM, you can streamline this process: the tool analyzes the current architecture, dependencies, and coding style to propose starter code for new features that seamlessly integrate into the existing framework. Since NotebookLM is limited to the documents you upload, there’s less risk of it “hallucinating” framework functionalities — like mistakenly referencing a non-existent function.
prompt: I want each agent within my multi-agent setup to log their individual token usage during interactions. How can I implement this functionality to monitor and print each agent’s token usage after every interaction? Additionally, where in the existing code should this feature be integrated for optimal tracking and minimal performance impact? Please provide guidance on which methods or modules would be most appropriate to modify or extend to capture token usage data effectively.
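For reference, here is a rough sketch of the kind of answer you might land on. It leans on the aggregated usage metrics that crewAI exposes on the Crew after kickoff (crew.usage_metrics); per-agent tracking, as the prompt asks for, would require deeper changes inside the agent execution path, so treat this as a crew-level approximation and verify the attribute names against your crewAI version.

# Crew-level approximation (assumed API: crewAI populates usage_metrics on the
# Crew after kickoff; per-agent breakdowns would need deeper integration).
from crewai import Agent, Task, Crew

analyst = Agent(
    role="Analyst",
    goal="Answer questions concisely",
    backstory="A focused assistant.",
)
task = Task(
    description="Summarize what a multi-agent crew is in two sentences.",
    expected_output="A two-sentence summary.",
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[task])
result = crew.kickoff()

# Print aggregated token usage for the whole run (crew-level, not per-agent).
print(crew.usage_metrics)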
I’m sure there are even more ways to interact with the repo, but these eight examples highlight what’s possible with NotebookLM when using a repository’s content as context.
Advantages and Limitations of NotebookLM for Code Exploration
Now, let’s discuss the advantages and the limitations of using NotebookLM.
Advantages
- Ready-to-use solution: NotebookLM offers the unique advantage of being ready to use right out of the box, unlike custom Retrieval-Augmented Generation (RAG) pipelines that require setup and fine-tuning.
- High Context Capacity: NotebookLM is powered by Google’s Gemini 1.5 Pro model, which can handle a context of up to 2 million tokens. This allows it to analyze entire repositories and large documents, supporting in-depth interactions.
- Contextually Grounded Responses: NotebookLM focuses exclusively on uploaded documents, keeping responses project-specific and minimizing the risk of hallucinations.
Limitations
- File and Word Limits: NotebookLM restricts users to 50 files per notebook, each with a maximum of 500,000 words/tokens [info]. For complex or extensive codebases, this limitation requires a thoughtful approach to selecting and uploading only the most critical parts of the repository.
- Contextual Grounding Constraints: NotebookLM works strictly within the context of the uploaded documents, meaning it does not access or pull information from external sources. While this keeps responses focused on the codebase, it can be a limitation if you need additional context, references, or industry-standard practices outside the repository itself.
- Processing Speed: Processing large documents can slow down response times, and handling numerous files adds to the delay. For instance, in the CrewAI example, which uses seven text files with around 200,000 words each, it takes about 2 to 3 seconds for NotebookLM to begin generating an answer.
- Lack of Direct Code Execution and Testing: A key drawback of NotebookLM, specifically for the use case of code exploration, is that it cannot directly execute or test code. This limits its utility as a hands-on development tool, as users cannot validate code snippets, run unit tests, or experiment with code execution directly within the chat window.
Conclusion
If you’re staring down a monster of a repository, give NotebookLM a shot! It’s a fantastic way to turn that daunting codebase into a friendly guide. Not only will it help you get acquainted with the repo or library, but it might even make the whole experience feel less like solving a giant jigsaw puzzle and more like following a well-marked treasure map.
Have you tried using NotebookLM for anything beyond code conversations? I’d love to hear about your use cases!
Thank you for reading this article; consider clapping 👏.
[repo] https://github.com/jmlb/repoClerk
[2024-11-13 UPDATE]🔥 https://www.youtube.com/playlist?list=PLqOeNtecSqR9_tNJVICLgON-m3UYV2QT0
It’s the link to the podcast generated from crewAI’s repository using NotebookLM. The content is surprisingly engaging, making it a great companion for drives, hikes, or walks.