Github Trending - Weekly
Shardeum is an EVM based autoscaling blockchain
An EVM-based autoscaling blockchain platform
Table of Contents
- Introduction
- Features
- Getting Started
- Running the Network Locally
- Testing with MetaMask
- Stopping and Cleaning Up
- Health Checks
- Contributing
- Community
- License
Introduction
Shardeum is an innovative EVM-compliant blockchain platform that leverages dynamic state sharding to achieve unprecedented scalability. By implementing a sharding model, Shardeum ensures faster processing times and lower transaction costs without compromising security or decentralization.
Features
- Scalability: Horizontal scalability through sharding
- High Performance: Low latency and high throughput
- Security: Advanced cryptographic techniques and robust consensus protocols
- Decentralization: Truly decentralized network with no single point of failure
- Interoperability: EVM compatibility for existing DApps and smart contracts
Getting Started
Prerequisites
- Node.js (v18.19.1)
- npm (v10.2.4)
- Rust (v1.74.1)
- Docker (optional, for containerized deployment)
Setting Up Your Environment
Shardeum requires specific versions of Node.js, Rust, and other build tools to run.
We have detailed setup instructions in this page
[!IMPORTANT] This is a crucial step. Ensure your local environment is correctly set up before proceeding with the next steps.
Installation
- Clone the repository:
git clone https://github.com/shardeum/shardeum.git
cd shardeum
- Install dependencies:
npm ci
- Network Configuration:
git apply debug-10-nodes.patch
Learn more about the different config options here
- Compile project
npm run prepare
- Install the Shardus CLI:
npm install -g shardus
npm update @shardus/archiver
Running the Network Locally
To start a local Shardeum network with 10 nodes, run:
shardus start 10
Running the JSON-RPC Server
- Clone the JSON-RPC server repository:
git clone https://github.com/shardeum/json-rpc-server.git
cd json-rpc-server
npm install
- Start the server:
npm run start
The default RPC URL is http://localhost:8080.
Testing with MetaMask
To test your local Shardeum network using MetaMask:
- Install the MetaMask extension.
- Add the Shardeum network to MetaMask:
- Network Name: Shardeum
- RPC URL: http://localhost:8080
- Chain ID: 8082
- Currency Symbol: SHM
- Block Explorer URL: http://localhost:6001/
- Obtaining Test Tokens and Configuring the Genesis File: To receive SHM tokens for testing on your local Shardeum network, you need to add your wallet address to the src/config/genesis.json file. Open this file in a text editor and add an entry for your wallet address with the desired SHM balance like this:
"YOUR-WALLET-ADDRESS": {
"wei": "200000000000000000000000000"
},
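As a rough sketch of the edit above, the same entry can be added with a short script. It assumes the genesis file is a flat JSON object keyed by wallet address (as the snippet suggests) and that 1 SHM equals 10^18 wei, following the usual EVM convention. The helper names and the example address are hypothetical.

```python
import json
from pathlib import Path

WEI_PER_SHM = 10**18  # assumption: 18 decimals, the usual EVM convention


def add_genesis_balance(genesis: dict, address: str, shm: int) -> dict:
    """Return a copy of the genesis mapping with `address` funded with `shm` tokens."""
    updated = dict(genesis)
    # Balances are stored as decimal strings, matching the entry format shown above.
    updated[address] = {"wei": str(shm * WEI_PER_SHM)}
    return updated


def fund_wallet(path: str, address: str, shm: int) -> None:
    """Read the genesis file, add the balance, and write it back."""
    file = Path(path)
    genesis = json.loads(file.read_text())
    file.write_text(json.dumps(add_genesis_balance(genesis, address, shm), indent=2))


# Example (hypothetical address):
# fund_wallet("src/config/genesis.json", "0xYourWalletAddress", 100)
```

Keeping the pure `add_genesis_balance` step separate from file I/O makes it easy to check the resulting entry before touching the real file.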
Stopping and Cleaning Up
To stop the network and clean up resources:
shardus stop && shardus clean && rm -rf instances
Health Checks
Diagnostic endpoints to check the health of the node
- GET /is-alive: this endpoint returns 200 if the server is running.
- GET /is-healthy: currently the same as /is-alive but will be expanded.
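A minimal sketch of how these endpoints might be polled from a script. The node's base URL and port are not stated here, so `base_url` is a placeholder you would supply. The `opener` parameter is a hypothetical hook added only to make the function testable without a running node.

```python
from urllib.request import urlopen


def is_up(base_url: str, path: str = "/is-alive", timeout: float = 2.0, opener=urlopen) -> bool:
    """Return True if GET <base_url><path> answers with HTTP 200."""
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, timeouts and refused connections all derive from OSError.
        return False


# Example (placeholder address):
# is_up("http://localhost:<node-port>")
# is_up("http://localhost:<node-port>", path="/is-healthy")
```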
Contributing
We welcome contributions! Please see our Contribution Guidelines for more information. All contributors are expected to adhere to our Code of Conduct.
Community
License
This project is licensed under the MIT License. See the LICENSE file for details.
> htmx - high power tools for HTML
high power tools for HTML
introduction
htmx allows you to access AJAX, CSS Transitions, WebSockets and Server Sent Events directly in HTML, using attributes, so you can build modern user interfaces with the simplicity and power of hypertext
htmx is small (~14k min.gz'd), dependency-free & extendable
motivation
- Why should only <a> and <form> be able to make HTTP requests?
- Why should only click & submit events trigger them?
- Why should only GET & POST be available?
- Why should you only be able to replace the entire screen?
By removing these arbitrary constraints htmx completes HTML as a hypertext
quick start
<script src="https://unpkg.com/[email protected]"></script>
<!-- have a button POST a click via AJAX -->
<button hx-post="/clicked" hx-swap="outerHTML">
Click Me
</button>
The hx-post and hx-swap attributes tell htmx:
"When a user clicks on this button, issue an AJAX request to /clicked, and replace the entire button with the response"
htmx is the successor to intercooler.js
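For the quick start above, the server only has to answer the POST with an HTML fragment. Here is a minimal sketch of such an endpoint using Python's standard library. htmx itself is server-agnostic, so the handler name, port, and returned markup are all assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_clicked() -> bytes:
    """HTML fragment that htmx swaps in place of the button (hx-swap="outerHTML")."""
    return b"<button disabled>Clicked!</button>"


class ClickHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Only the /clicked route from the quick start is handled.
        if self.path != "/clicked":
            self.send_error(404)
            return
        body = render_clicked()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), ClickHandler)
```

The response is plain HTML rather than JSON, which is the point of the approach: the server returns markup and htmx places it in the page.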
installing as a node package
To install using npm:
npm install htmx.org --save
Note there is an old broken package called htmx. This is htmx.org.
website & docs
contributing
Want to contribute? Check out our contribution guidelines
No time? Then become a sponsor
hacking guide
To develop htmx locally, you will need to install the development dependencies.
Run:
npm install
Then, run a web server in the root.
This is easiest with:
npx serve
You can then run the test suite by navigating to:
At this point you can modify /src/htmx.js to add features, and then add tests in the appropriate area under /test.
- /test/index.html - the root test page from which all other tests are included
- /test/attributes - attribute specific tests
- /test/core - core functionality tests
- /test/core/regressions.js - regression tests
- /test/ext - extension tests
- /test/manual - manual tests that cannot be automated
htmx uses the mocha testing framework, the chai assertion framework and sinon to mock out AJAX requests. They are all OK.
You can also run live tests and a demo of the WebSockets and Server Sent Events extensions with npm run ws-tests.
haiku
javascript fatigue:
longing for a hypertext
already in hand
Examples and guides for using the Gemini API
Welcome to the Gemini API Cookbook
This is a collection of guides and examples for the Gemini API, including quickstart tutorials for writing prompts and using different features of the API, and examples of things you can build.
New: Check out the latest Gemini 2.0 capabilities in the docs, Google AI Studio and here in the cookbook.
Get started with the Gemini API
The Gemini API gives you access to Gemini models created by Google DeepMind. Gemini models are built from the ground up to be multimodal, so you can reason seamlessly across text, images, code, and audio. You can use these to develop a range of applications.
Start developing
- Go to Google AI Studio.
- Log in with your Google account.
- Create an API key.
- Use a quickstart for Python, or call the REST API using curl.
What's New?
We're excited to show you the latest additions to the Gemini API, and new notebooks.
- Gemini 2.0: Explore the capabilities of the new Gemini 2.0 model, including multimodal Live API, audio streaming applications with tool use and Spatial understanding.
Table of contents
Learn about the capabilities of the Gemini API by checking out these quickstart tutorials.
- Authentication: Start here to learn how you can set up your API key so you can get access to the Gemini API.
- Counting Tokens: Tokens are the basic inputs to the Gemini models. Through this notebook, you will gain a better understanding of tokens through an interactive experience.
- Files: Use the Gemini API to upload files (text, code, images, audio, video) and write prompts using them.
- Audio: Learn how to use the Gemini API with audio files.
- JSON mode: Discover how to use JSON mode.
- Function Calling: The Gemini API works great with code. Use this quickstart to learn how to write prompts to understand and call functions. Then check out the function calling config tutorial to learn more.
- System Instructions: Give models additional context on how to respond by setting system instructions.
- Embeddings: Create high-quality and task-specific embeddings.
- Tuning: Learn how to improve model performance on a specific task through tuning.
- Code execution: Solve complex tasks by generating and running Python code based on plain-text instructions.
You can find lots more in the quickstarts folder, and check out the examples folder for fun examples. We're also maintaining an Awesome Gemini list of all the cool projects the community is building using Gemini.
Official SDKs
The Gemini API is a REST API. You can call the API using a command line tool like curl (and you can find REST examples here), or by using one of our official SDKs:
- Python: Google GenAI SDKs will eventually replace the older Developer SDK.
- Node.js
- Dart (Flutter)
- Android
- Swift
- Go
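Since the API is plain REST, a request can also be assembled by hand. The sketch below builds (but does not send) a `generateContent` request with the standard library. The model id is a placeholder assumption, so check the current model list before using it, and treat the helper name as hypothetical.

```python
import json
from urllib.request import Request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(api_key: str, prompt: str, model: str = "gemini-2.0-flash-exp") -> Request:
    """Build a POST request for the generateContent REST method."""
    url = f"{API_ROOT}/models/{model}:generateContent?key={api_key}"
    # A single user turn containing one text part.
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it (requires a real API key):
# from urllib.request import urlopen
# with urlopen(build_generate_request(key, "Hello")) as resp:
#     print(json.load(resp))
```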
Get help
Ask a question on the Google AI Developer Forum.
The Gemini API on Google Cloud Vertex AI
If you're an enterprise developer looking to build on a fully managed platform, you can also use the Gemini API on Google Cloud. Check out this repo for lots of cool examples.
Contributing
Contributions are welcome. See contributing to learn more.
Thank you for developing with the Gemini API! We're excited to see what you create.
Performant, batteries-included completion plugin for Neovim
[!WARNING] This plugin is beta quality. Expect breaking changes and many bugs
Blink Completion (blink.cmp)
blink.cmp is a completion plugin with support for LSPs and external sources that updates on every keystroke with minimal overhead (0.5-4ms async). It uses a custom SIMD fuzzy searcher to easily handle >20k items. It provides extensibility via hooks into the trigger, sources and rendering pipeline. Plenty of work has been put into making each stage of the pipeline as intelligent as possible, such as frecency and proximity bonus on fuzzy matching, and this work is ongoing.
https://github.com/user-attachments/assets/9849e57a-3c2c-49a8-959c-dbb7fef78c80
Features
- Works out of the box with no additional configuration
- Updates on every keystroke (0.5-4ms async, single core)
- Typo resistant fuzzy with frecency and proximity bonus
- Extensive LSP support (tracker)
- Native vim.snippet support (including friendly-snippets)
- External sources support (compatibility layer for nvim-cmp sources)
- Auto-bracket support based on semantic tokens
- Signature help (experimental, opt-in)
- Command line completion
- Comparison with nvim-cmp
Getting Started
Head over to the documentation website for installation instructions and configuration options.
Special Thanks
- @hrsh7th nvim-cmp used as inspiration and cmp-path/cmp-cmdline implementations modified for path/cmdline sources
- @garymjr nvim-snippets implementation modified for snippets source
- @redxtech Help with design and testing
- @aaditya-sahay Help with rust, design and testing
Contributors
- @stefanboca Author of blink.compat
- @lopi-py Contributes to the windowing code
- @scottmckendry Contributes to the CI and prebuilt binaries
- @balssh Manages nixpkg and nixvim
Build portable, modular & lightweight Fullstack Agents
If you would like to help spread the word about Rig, please consider starring the repo!
[!WARNING] Here be dragons! As we plan to ship a torrent of features in the following months, future updates will contain breaking changes. With Rig evolving, we'll annotate changes and highlight migration paths as we encounter them.
What is Rig?
Rig is a Rust library for building scalable, modular, and ergonomic LLM-powered applications.
More information about this crate can be found in the official documentation and the crate (API reference) documentation.
Help us improve Rig by contributing to our Feedback form.
Table of contents
High-level features
- Full support for LLM completion and embedding workflows
- Simple but powerful common abstractions over LLM providers (e.g. OpenAI, Cohere) and vector stores (e.g. MongoDB, in-memory)
- Integrate LLMs in your app with minimal boilerplate
Get Started
cargo add rig-core
Simple example:
use rig::{completion::Prompt, providers::openai};

#[tokio::main]
async fn main() {
    // Create OpenAI client and model
    // This requires the `OPENAI_API_KEY` environment variable to be set.
    let openai_client = openai::Client::from_env();
    let gpt4 = openai_client.agent("gpt-4").build();

    // Prompt the model and print its response
    let response = gpt4
        .prompt("Who are you?")
        .await
        .expect("Failed to prompt GPT-4");

    println!("GPT-4: {response}");
}
Note: using #[tokio::main] requires you to enable tokio's macros and rt-multi-thread features, or just full to enable all features (cargo add tokio --features macros,rt-multi-thread).
You can find more examples in each crate's examples directory (i.e. rig-core/examples). More detailed use case walkthroughs are regularly published on our Dev.to Blog and added to Rig's official documentation (docs.rig.rs).
Supported Integrations
Model Providers | Vector Stores |
---|---|
Vector stores are available as separate companion-crates:
- MongoDB vector store:
rig-mongodb
- LanceDB vector store:
rig-lancedb
- Neo4j vector store:
rig-neo4j
- Qdrant vector store:
rig-qdrant
Build multi-modal Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI.
phidata
Build multi-modal Agents with memory, knowledge, tools and reasoning.
What is phidata?
Phidata is a framework for building multi-modal agents. Use phidata to:
- Build multi-modal agents with memory, knowledge, tools and reasoning.
- Build teams of agents that can work together to solve problems.
- Chat with your agents using a beautiful Agent UI.
Install
pip install -U phidata
Key Features
- Simple & Elegant
- Powerful & Flexible
- Multi-Modal by default
- Multi-Agent orchestration
- A beautiful Agent UI to chat with your agents
- Agentic RAG built-in
- Structured Outputs
- Reasoning Agents
- Monitoring & Debugging built-in
- Demo Agents
Simple & Elegant
Phidata Agents are simple and elegant, resulting in minimal, beautiful code.
For example, you can create a web search agent in 10 lines of code. Create a file web_search.py:
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
web_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    show_tool_calls=True,
    markdown=True,
)
web_agent.print_response("Tell me about OpenAI Sora?", stream=True)
Install libraries, export your OPENAI_API_KEY and run the Agent:
pip install phidata openai duckduckgo-search
export OPENAI_API_KEY=sk-xxxx
python web_search.py
Powerful & Flexible
Phidata agents can use multiple tools and follow instructions to achieve complex tasks.
For example, you can create a finance agent with tools to query financial data. Create a file finance_agent.py:
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.yfinance import YFinanceTools
finance_agent = Agent(
    name="Finance Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True, company_news=True)],
    instructions=["Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)
finance_agent.print_response("Summarize analyst recommendations for NVDA", stream=True)
Install libraries and run the Agent:
pip install yfinance
python finance_agent.py
Multi-Modal by default
Phidata agents support text, images, audio and video.
For example, you can create an image agent that can understand images and make tool calls as needed. Create a file image_agent.py:
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    markdown=True,
)
agent.print_response(
    "Tell me about this image and give me the latest news about it.",
    images=["https://upload.wikimedia.org/wikipedia/commons/b/bf/Krakow_-_Kosciol_Mariacki.jpg"],
    stream=True,
)
Run the Agent:
python image_agent.py
Multi-Agent orchestration
Phidata agents can work together as a team to achieve complex tasks. Create a file agent_team.py:
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools
web_agent = Agent(
    name="Web Agent",
    role="Search the web for information",
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    show_tool_calls=True,
    markdown=True,
)
finance_agent = Agent(
    name="Finance Agent",
    role="Get financial data",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True)],
    instructions=["Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)
agent_team = Agent(
    team=[web_agent, finance_agent],
    model=OpenAIChat(id="gpt-4o"),
    instructions=["Always include sources", "Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)
agent_team.print_response("Summarize analyst recommendations and share the latest news for NVDA", stream=True)
Run the Agent team:
python agent_team.py
A beautiful Agent UI to chat with your agents
Phidata provides a beautiful UI for interacting with your agents. Let's take it for a spin. Create a file playground.py:
[!NOTE] Phidata does not store any data; all agent data is stored locally in a sqlite database.
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.storage.agent.sqlite import SqlAgentStorage
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools
from phi.playground import Playground, serve_playground_app
web_agent = Agent(
    name="Web Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    storage=SqlAgentStorage(table_name="web_agent", db_file="agents.db"),
    add_history_to_messages=True,
    markdown=True,
)
finance_agent = Agent(
    name="Finance Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True, company_news=True)],
    instructions=["Use tables to display data"],
    storage=SqlAgentStorage(table_name="finance_agent", db_file="agents.db"),
    add_history_to_messages=True,
    markdown=True,
)
app = Playground(agents=[finance_agent, web_agent]).get_app()
if __name__ == "__main__":
    serve_playground_app("playground:app", reload=True)
Authenticate with phidata by running the following command:
phi auth
or by exporting the PHI_API_KEY for your workspace from phidata.app:
export PHI_API_KEY=phi-***
Install dependencies and run the Agent Playground:
pip install 'fastapi[standard]' sqlalchemy
python playground.py
- Open the link provided or navigate to http://phidata.app/playground
- Select the localhost:7777 endpoint and start chatting with your agents!
Agentic RAG
We were the first to pioneer Agentic RAG using our Auto-RAG paradigm. With Agentic RAG (or auto-rag), the Agent can search its knowledge base (vector db) for the specific information it needs to achieve its task, instead of always inserting the "context" into the prompt.
This saves tokens and improves response quality. Create a file rag_agent.py
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.embedder.openai import OpenAIEmbedder
from phi.knowledge.pdf import PDFUrlKnowledgeBase
from phi.vectordb.lancedb import LanceDb, SearchType
# Create a knowledge base from a PDF
knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
    # Use LanceDB as the vector database
    vector_db=LanceDb(
        table_name="recipes",
        uri="tmp/lancedb",
        search_type=SearchType.vector,
        embedder=OpenAIEmbedder(model="text-embedding-3-small"),
    ),
)
# Comment out after first run as the knowledge base is loaded
knowledge_base.load()
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    # Add the knowledge base to the agent
    knowledge=knowledge_base,
    show_tool_calls=True,
    markdown=True,
)
agent.print_response("How do I make chicken and galangal in coconut milk soup", stream=True)
Install libraries and run the Agent:
pip install lancedb tantivy pypdf sqlalchemy
python rag_agent.py
Structured Outputs
Agents can return their output in a structured format as a Pydantic model.
Create a file structured_output.py
from typing import List
from pydantic import BaseModel, Field
from phi.agent import Agent
from phi.model.openai import OpenAIChat
# Define a Pydantic model to enforce the structure of the output
class MovieScript(BaseModel):
    setting: str = Field(..., description="Provide a nice setting for a blockbuster movie.")
    ending: str = Field(..., description="Ending of the movie. If not available, provide a happy ending.")
    genre: str = Field(..., description="Genre of the movie. If not available, select action, thriller or romantic comedy.")
    name: str = Field(..., description="Give a name to this movie")
    characters: List[str] = Field(..., description="Name of characters for this movie.")
    storyline: str = Field(..., description="3 sentence storyline for the movie. Make it exciting!")

# Agent that uses JSON mode
json_mode_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    description="You write movie scripts.",
    response_model=MovieScript,
)
# Agent that uses structured outputs
structured_output_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    description="You write movie scripts.",
    response_model=MovieScript,
    structured_outputs=True,
)
json_mode_agent.print_response("New York")
structured_output_agent.print_response("New York")
- Run the structured_output.py file
python structured_output.py
- The output is an object of the MovieScript class, here's how it looks:
MovieScript(
    setting='A bustling and vibrant New York City',
    ending='The protagonist saves the city and reconciles with their estranged family.',
    genre='action',
    name='City Pulse',
    characters=['Alex Mercer', 'Nina Castillo', 'Detective Mike Johnson'],
    storyline='In the heart of New York City, a former cop turned vigilante, Alex Mercer, teams up with a street-smart activist, Nina Castillo, to take down a corrupt political figure who threatens to destroy the city. As they navigate through the intricate web of power and deception, they uncover shocking truths that push them to the brink of their abilities. With time running out, they must race against the clock to save New York and confront their own demons.'
)
Reasoning Agents (experimental)
Reasoning helps agents work through a problem step-by-step, backtracking and correcting as needed. Create a file reasoning_agent.py.
from phi.agent import Agent
from phi.model.openai import OpenAIChat
task = (
"Three missionaries and three cannibals need to cross a river. "
"They have a boat that can carry up to two people at a time. "
"If, at any time, the cannibals outnumber the missionaries on either side of the river, the cannibals will eat the missionaries. "
"How can all six people get across the river safely? Provide a step-by-step solution and show the solutions as an ascii diagram"
)
reasoning_agent = Agent(model=OpenAIChat(id="gpt-4o"), reasoning=True, markdown=True, structured_outputs=True)
reasoning_agent.print_response(task, stream=True, show_full_reasoning=True)
Run the Reasoning Agent:
python reasoning_agent.py
[!WARNING] Reasoning is an experimental feature and will break ~20% of the time. It is not a replacement for o1.
It is an experiment fueled by curiosity, combining COT and tool use. Set your expectations very low for this initial release. For example: It will not be able to count 'r's in 'strawberry'.
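Because reasoning output can be wrong, it helps to have a reference answer for the river-crossing task used above. The following is a small breadth-first search sketch, independent of phidata, that finds a shortest safe sequence of crossings. The state encoding and function names are illustrative choices.

```python
from collections import deque

TOTAL = 3  # three missionaries and three cannibals
# (missionaries, cannibals) carried by the boat: at most two people, at least one.
MOVES = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]


def is_safe(m_left: int, c_left: int) -> bool:
    """A bank is safe if it has no missionaries or they are not outnumbered."""
    if not (0 <= m_left <= TOTAL and 0 <= c_left <= TOTAL):
        return False
    m_right, c_right = TOTAL - m_left, TOTAL - c_left
    left_ok = m_left == 0 or m_left >= c_left
    right_ok = m_right == 0 or m_right >= c_right
    return left_ok and right_ok


def solve():
    """Return the shortest list of states (m_left, c_left, boat_on_left), or None."""
    start, goal = (TOTAL, TOTAL, 1), (0, 0, 0)
    parents = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        m, c, boat = state
        sign = -1 if boat == 1 else 1  # leaving the left bank removes people from it
        for dm, dc in MOVES:
            nxt = (m + sign * dm, c + sign * dc, 1 - boat)
            if nxt not in parents and is_safe(nxt[0], nxt[1]):
                parents[nxt] = state
                queue.append(nxt)
    return None


path = solve()
print(f"{len(path) - 1} crossings")
```

Each element of `path` is the number of missionaries and cannibals on the starting bank plus the boat position, so the agent's step-by-step answer can be compared against it.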
Demo Agents
The Agent Playground includes a few demo agents that you can test with. If you have recommendations for other demo agents, please let us know in our community forum.
Monitoring & Debugging
Monitoring
Phidata comes with built-in monitoring. You can set monitoring=True on any agent to track sessions or set PHI_MONITORING=true in your environment.
[!NOTE] Run phi auth to authenticate your local account or export the PHI_API_KEY
from phi.agent import Agent
agent = Agent(markdown=True, monitoring=True)
agent.print_response("Share a 2 sentence horror story")
Run the agent and monitor the results on phidata.app/sessions
# You can also set the environment variable
# export PHI_MONITORING=true
python monitoring.py
View the agent session on phidata.app/sessions
Debugging
Phidata also includes a built-in debugger that will show debug logs in the terminal. You can set debug_mode=True on any agent or set PHI_DEBUG=true in your environment.
from phi.agent import Agent
agent = Agent(markdown=True, debug_mode=True)
agent.print_response("Share a 2 sentence horror story")
Getting help
- Read the docs at docs.phidata.com
- Post your questions on the community forum
- Chat with us on discord
More examples
Agent that can write and run python code
The PythonAgent can achieve tasks by writing and running python code.
- Create a file python_agent.py
from phi.agent.python import PythonAgent
from phi.model.openai import OpenAIChat
from phi.file.local.csv import CsvFile
python_agent = PythonAgent(
    model=OpenAIChat(id="gpt-4o"),
    files=[
        CsvFile(
            path="https://phidata-public.s3.amazonaws.com/demo_data/IMDB-Movie-Data.csv",
            description="Contains information about movies from IMDB.",
        )
    ],
    markdown=True,
    pip_install=True,
    show_tool_calls=True,
)
python_agent.print_response("What is the average rating of movies?")
- Run the python_agent.py file
python python_agent.py
Agent that can analyze data using SQL
The DuckDbAgent can perform data analysis using SQL.
- Create a file data_analyst.py
import json
from phi.model.openai import OpenAIChat
from phi.agent.duckdb import DuckDbAgent
data_analyst = DuckDbAgent(
    model=OpenAIChat(id="gpt-4o"),
    markdown=True,
    semantic_model=json.dumps(
        {
            "tables": [
                {
                    "name": "movies",
                    "description": "Contains information about movies from IMDB.",
                    "path": "https://phidata-public.s3.amazonaws.com/demo_data/IMDB-Movie-Data.csv",
                }
            ]
        },
        indent=2,
    ),
)
data_analyst.print_response(
    "Show me a histogram of ratings. "
    "Choose an appropriate bucket size but share how you chose it. "
    "Show me the result as a pretty ascii diagram",
    stream=True,
)
- Install duckdb and run the data_analyst.py file
pip install duckdb
python data_analyst.py
Check out the cookbook for more examples.
Contributions
We're an open-source project and welcome contributions. Please read the contributing guide for more information.
Request a feature
- If you have a feature request, please open an issue or make a pull request.
- If you have ideas on how we can improve, please create a discussion.
Telemetry
Phidata logs which model an agent used so we can prioritize features for the most popular models.
You can disable this by setting PHI_TELEMETRY=false in your environment.
Explain complex systems using visuals and simple terms. Help you prepare for system design interviews.
System Design 101
Explain complex systems using visuals and simple terms.
Whether you're preparing for a System Design Interview or you simply want to understand how systems work beneath the surface, we hope this repository will help you achieve that.
Table of Contents
- Communication protocols
- REST API vs. GraphQL
- How does gRPC work?
- What is a webhook?
- How to improve API performance?
- HTTP 1.0 -> HTTP 1.1 -> HTTP 2.0 -> HTTP 3.0 (QUIC)
- SOAP vs REST vs GraphQL vs RPC
- Code First vs. API First
- HTTP status codes
- What does API gateway do?
- How do we design effective and safe APIs?
- TCP/IP encapsulation
- Why is Nginx called a "reverse" proxy?
- What are the common load-balancing algorithms?
- URL, URI, URN - Do you know the differences?
- CI/CD
- Architecture patterns
- Database
- Cache
- Microservice architecture
- Payment systems
- How to learn payment systems?
- Why is the credit card called "the most profitable product in banks"? How does VISA/Mastercard make money?
- How does VISA work when we swipe a credit card at a merchant's shop?
- Payment Systems Around The World Series (Part 1): Unified Payments Interface (UPI) in India
- DevOps
- GIT
- Cloud Services
- Developer productivity tools
- Linux
- Security
- How does HTTPS work?
- OAuth 2.0 Explained With Simple Terms.
- Top 4 Forms of Authentication Mechanisms
- Session, cookie, JWT, token, SSO, and OAuth 2.0 - what are they?
- How to store passwords safely in the database and how to validate a password?
- Explaining JSON Web Token (JWT) to a 10 year old Kid
- How does Google Authenticator (or other types of 2-factor authenticators) work?
- Real World Case Studies
- Netflix's Tech Stack
- Twitter Architecture 2022
- Evolution of Airbnb's microservice architecture over the past 15 years
- Monorepo vs. Microrepo.
- How will you design the Stack Overflow website?
- Why did Amazon Prime Video monitoring move from serverless to monolithic? How can it save 90% cost?
- How does Disney Hotstar capture 5 Billion Emojis during a tournament?
- How Discord Stores Trillions Of Messages
- How do video live streamings work on YouTube, TikTok live, or Twitch?
Communication protocols
Architecture styles define how different components of an application programming interface (API) interact with one another. As a result, they ensure efficiency, reliability, and ease of integration with other systems by providing a standard approach to designing and building APIs. Here are the most used styles:
- SOAP: Mature, comprehensive, XML-based. Best for enterprise applications.
- RESTful: Popular, easy-to-implement, HTTP methods. Ideal for web services.
- GraphQL: Query language, request specific data. Reduces network overhead, faster responses.
- gRPC: Modern, high-performance, Protocol Buffers. Suitable for microservices architectures.
- WebSocket: Real-time, bidirectional, persistent connections. Perfect for low-latency data exchange.
- Webhook: Event-driven, HTTP callbacks, asynchronous. Notifies systems when events occur.
REST API vs. GraphQL
When it comes to API design, REST and GraphQL each have their own strengths and weaknesses.
The diagram below shows a quick comparison between REST and GraphQL.
REST
- Uses standard HTTP methods like GET, POST, PUT, DELETE for CRUD operations.
- Works well when you need simple, uniform interfaces between separate services/applications.
- Caching strategies are straightforward to implement.
- The downside is it may require multiple roundtrips to assemble related data from separate endpoints.
GraphQL
- Provides a single endpoint for clients to query for precisely the data they need.
- Clients specify the exact fields required in nested queries, and the server returns optimized payloads containing just those fields.
- Supports Mutations for modifying data and Subscriptions for real-time notifications.
- Great for aggregating data from multiple sources and works well with rapidly evolving frontend requirements.
- However, it shifts complexity to the client side and can allow abusive queries if not properly safeguarded
- Caching strategies can be more complicated than REST.
The best choice between REST and GraphQL depends on the specific requirements of the application and development team. GraphQL is a good fit for complex or frequently changing frontend needs, while REST suits applications where simple and consistent contracts are preferred.
Neither API approach is a silver bullet. Carefully evaluating requirements and tradeoffs is important to pick the right style. Both REST and GraphQL are valid options for exposing data and powering modern applications.
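To make the round-trip difference concrete, here is a toy in-memory sketch. It is not a real REST framework or GraphQL server: the data, paths, and function names are invented for illustration. It only contrasts "one resource per endpoint" with "one call that selects fields".

```python
USERS = {1: {"id": 1, "name": "Ada", "email": "ada@example.com"}}
POSTS = {1: [{"id": 10, "title": "Hello", "body": "First post"}]}


def rest_get(path: str):
    """REST style: one resource per endpoint, full representation returned."""
    parts = path.strip("/").split("/")
    user_id = int(parts[1])
    if len(parts) == 2:          # /users/<id>
        return USERS[user_id]
    if parts[2] == "posts":      # /users/<id>/posts
        return POSTS[user_id]
    raise KeyError(path)


def graphql_like(user_id: int, user_fields: list, post_fields: list) -> dict:
    """GraphQL style: one call, and the client names the fields it wants."""
    user = {field: USERS[user_id][field] for field in user_fields}
    user["posts"] = [{field: post[field] for field in post_fields} for post in POSTS[user_id]]
    return user


# REST: two round trips, each returning every field of the resource.
profile = rest_get("/users/1")
posts = rest_get("/users/1/posts")

# GraphQL-like: one round trip, only the requested fields.
page = graphql_like(1, ["name"], ["title"])
print(page)
```

The REST calls are trivially cacheable per path, while the field-selecting call varies with every query shape, which is one reason caching is harder on the GraphQL side.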
How does gRPC work?
RPC (Remote Procedure Call) is called "remote" because it enables communications between remote services when services are deployed to different servers under microservice architecture. From the user's point of view, it acts like a local function call.
The diagram below illustrates the overall data flow for gRPC.
Step 1: A REST call is made from the client. The request body is usually in JSON format.
Steps 2 - 4: The order service (gRPC client) receives the REST call, transforms it, and makes an RPC call to the payment service. gRPC encodes the client stub into a binary format and sends it to the low-level transport layer.
Step 5: gRPC sends the packets over the network via HTTP/2. Because of binary encoding and network optimizations, gRPC is said to be 5X faster than JSON.
Steps 6 - 8: The payment service (gRPC server) receives the packets from the network, decodes them, and invokes the server application.
Steps 9 - 11: The result is returned from the server application, and gets encoded and sent to the transport layer.
Steps 12 - 14: The order service receives the packets, decodes them, and sends the result to the client application.
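The effect of binary encoding can be illustrated with a rough sketch. gRPC normally uses Protocol Buffers; the standard-library `struct` module stands in for it here, and the `payment` message and its layout are assumptions made up for the example. The point is only that a layout agreed in advance removes field names and text digits from the wire.

```python
import json
import struct

payment = {"order_id": 1001, "amount_cents": 2599, "currency": "USD"}

# Text encoding: field names and digits travel as characters.
json_bytes = json.dumps(payment).encode("utf-8")

# Binary encoding: both sides agree on the layout in advance
# (unsigned 32-bit id, unsigned 32-bit amount, 3-byte currency code).
LAYOUT = struct.Struct("!II3s")
binary_bytes = LAYOUT.pack(
    payment["order_id"],
    payment["amount_cents"],
    payment["currency"].encode("ascii"),
)

# The receiver decodes with the same agreed layout.
order_id, amount_cents, currency = LAYOUT.unpack(binary_bytes)

print(len(json_bytes), len(binary_bytes))
```

The size difference in this toy case says nothing reliable about end-to-end speed; it only shows why a schema-based binary format tends to be more compact.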
What is a webhook?
The diagram below shows a comparison between polling and Webhook.
Assume we run an eCommerce website. The clients send orders to the order service via the API gateway, which goes to the payment service for payment transactions. The payment service then talks to an external payment service provider (PSP) to complete the transactions.
There are two ways to handle communications with the external PSP.
1. Short polling
After sending the payment request to the PSP, the payment service keeps asking the PSP about the payment status. After several rounds, the PSP finally returns with the status.
Short polling has two drawbacks:
- Constant polling of the status requires resources from the payment service.
- The external service communicates directly with the payment service, creating security vulnerabilities.
2. Webhook
We can register a webhook with the external service. It means: call me back at a certain URL when you have updates on the request. When the PSP has completed the processing, it will invoke the HTTP request to update the payment status.
In this way, the programming paradigm is changed, and the payment service doesn't need to waste resources to poll the payment status anymore.
What if the PSP never calls back? We can set up a housekeeping job to check payment status every hour.
Webhooks are often referred to as reverse APIs or push APIs because the server sends HTTP requests to the client. We need to pay attention to 3 things when using a webhook:
- We need to design a proper API for the external service to call.
- We need to set up proper rules in the API gateway for security reasons.
- We need to register the correct URL at the external service.
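A callback endpoint along these lines could look like the sketch below. The shared secret, the `handle_webhook` function, and the event fields are hypothetical. Verifying an HMAC signature is one common safeguard and is an assumption here, not something the text above prescribes.

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"example-secret"    # hypothetical; agreed with the PSP out of band
payment_status: dict[str, str] = {}  # stand-in for the payment service's database


def sign(body: bytes) -> str:
    """Compute the signature the sender is expected to attach to the request."""
    return hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()


def handle_webhook(body: bytes, signature: str) -> int:
    """Process a callback from the PSP and return an HTTP-style status code."""
    if not hmac.compare_digest(sign(body), signature):
        return 401  # reject callers that cannot prove they know the secret
    event = json.loads(body)
    payment_status[event["payment_id"]] = event["status"]
    return 200


body = json.dumps({"payment_id": "p-1", "status": "succeeded"}).encode()
print(handle_webhook(body, sign(body)))       # 200
print(handle_webhook(body, "bad-signature"))  # 401
```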
How to improve API performance?
The diagram below shows 5 common tricks to improve API performance.
Pagination
This is a common optimization when the size of the result is large. The results are streamed back to the client in pages to improve service responsiveness.
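A minimal sketch of cursor-style pagination, assuming an in-memory list in place of a real table; `get_page` and its parameters are illustrative names only.

```python
from typing import Optional

ORDERS = [f"order-{i}" for i in range(1, 8)]  # stand-in for a large table


def get_page(cursor: int = 0, limit: int = 3) -> tuple[list[str], Optional[int]]:
    """Return one page of results plus the cursor for the next page (None at the end)."""
    page = ORDERS[cursor:cursor + limit]
    next_cursor = cursor + limit if cursor + limit < len(ORDERS) else None
    return page, next_cursor


cursor: Optional[int] = 0
pages = []
while cursor is not None:
    page, cursor = get_page(cursor)
    pages.append(page)
print(pages)
```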
Asynchronous Logging
Synchronous logging deals with the disk for every call and can slow down the system. Asynchronous logging sends logs to a lock-free buffer first and immediately returns. The logs will be flushed to the disk periodically. This significantly reduces the I/O overhead.
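Python's standard library offers one way to approximate this with `QueueHandler` and `QueueListener`. In the sketch below, `ListHandler` is a hypothetical stand-in for a slow disk writer, and `queue.Queue` is a thread-safe queue rather than a lock-free buffer, so this illustrates the hand-off, not the exact data structure.

```python
import logging
import logging.handlers
import queue

log_queue: queue.Queue = queue.Queue()
records: list[str] = []


class ListHandler(logging.Handler):
    """Stand-in for a slow disk writer; collects messages in memory."""

    def emit(self, record: logging.LogRecord) -> None:
        records.append(record.getMessage())


logger = logging.getLogger("api")
logger.setLevel(logging.INFO)
logger.propagate = False
# The request path only enqueues the record and returns immediately.
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# A background thread drains the queue and does the slow work.
listener = logging.handlers.QueueListener(log_queue, ListHandler())
listener.start()
logger.info("request handled in %d ms", 12)
listener.stop()  # processes whatever is still queued before returning
print(records)
```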
Caching
We can store frequently accessed data into a cache. The client can query the cache first instead of visiting the database directly. If there is a cache miss, the client can query from the database. Caches like Redis store data in memory, so the data access is much faster than the database.
Payload Compression
The requests and responses can be compressed using gzip or a similar algorithm so that the transmitted data size is much smaller. This speeds up the upload and download.
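As a quick illustration with the standard `gzip` module, using a made-up, repetitive JSON payload (repetitive data compresses especially well, so real savings will vary):

```python
import gzip
import json

payload = json.dumps(
    [{"id": i, "status": "shipped"} for i in range(200)]
).encode("utf-8")
compressed = gzip.compress(payload)

# The receiver restores the original bytes.
restored = gzip.decompress(compressed)
print(len(payload), len(compressed), restored == payload)
```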
Connection Pool
When accessing resources, we often need to load data from the database. Opening and closing database connections adds significant overhead, so we should connect to the database via a pool of open connections. The connection pool is responsible for managing the connection lifecycle.
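A minimal pool can be sketched with a queue of pre-opened connections. `ConnectionPool` is a hypothetical class written for this example, and in-memory SQLite connections stand in for a real database; production pools also handle timeouts, health checks, and reconnection.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Keep a fixed number of connections open and lend them out."""

    def __init__(self, size: int) -> None:
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(sqlite3.connect(":memory:", check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._idle.get()   # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return to the pool instead of closing


pool = ConnectionPool(size=2)
with pool.connection() as conn:
    print(conn.execute("SELECT 1 + 1").fetchone())  # (2,)
```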
HTTP 1.0 -> HTTP 1.1 -> HTTP 2.0 -> HTTP 3.0 (QUIC)
What problem does each generation of HTTP solve?
The diagram below illustrates the key features.
- HTTP 1.0 was finalized and fully documented in 1996. Every request to the same server requires a separate TCP connection.
- HTTP 1.1 was published in 1997. A TCP connection can be left open for reuse (persistent connection), but it doesn't solve the HOL (head-of-line) blocking issue.
HOL blocking - when the number of allowed parallel requests in the browser is used up, subsequent requests need to wait for the former ones to complete.
- HTTP 2.0 was published in 2015. It addresses the HOL issue through request multiplexing, which eliminates HOL blocking at the application layer, but HOL still exists at the transport (TCP) layer.
As you can see in the diagram, HTTP 2.0 introduced the concept of HTTP "streams": an abstraction that allows multiplexing different HTTP exchanges onto the same TCP connection. Each stream doesn't need to be sent in order.
- HTTP 3.0 was standardized in 2022 (RFC 9114) after several years of drafts. It is the successor to HTTP 2.0. It uses QUIC instead of TCP for the underlying transport protocol, thus removing HOL blocking in the transport layer.
QUIC is based on UDP. It introduces streams as first-class citizens at the transport layer. QUIC streams share the same QUIC connection, so no additional handshakes and slow starts are required to create new ones, but QUIC streams are delivered independently such that in most cases packet loss affecting one stream doesn't affect others.
SOAP vs REST vs GraphQL vs RPC
The diagram below illustrates the API timeline and API styles comparison.
Over time, different API architectural styles are released. Each of them has its own patterns of standardizing data exchange.
You can check out the use cases of each style in the diagram.
Code First vs. API First
The diagram below shows the differences between code-first development and API-first development. Why do we want to consider API first design?
- Microservices increase system complexity and we have separate services to serve different functions of the system. While this kind of architecture facilitates decoupling and segregation of duty, we need to handle the various communications among services.
It is better to think through the system's complexity before writing the code and carefully defining the boundaries of the services.
- Separate functional teams need to speak the same language and the dedicated functional teams are only responsible for their own components and services. It is recommended that the organization speak the same language via API design.
We can mock requests and responses to validate the API design before writing code.
- Improve software quality and developer productivity. Since we have ironed out most of the uncertainties when the project starts, the overall development process is smoother, and the software quality is greatly improved.
Developers are happy about the process as well because they can focus on functional development instead of negotiating sudden changes.
The possibility of having surprises toward the end of the project lifecycle is reduced.
Because we have designed the API first, the tests can be designed while the code is being developed. In a way, we also have TDD (Test-Driven Development) when using API first development.
HTTP status codes
The response codes for HTTP are divided into five categories:
- Informational (100-199)
- Success (200-299)
- Redirection (300-399)
- Client Error (400-499)
- Server Error (500-599)
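Because the category is determined by the first digit, a client can classify any status code with integer division. `status_category` is an illustrative helper, not part of any library:

```python
def status_category(code: int) -> str:
    """Map an HTTP status code to one of the five categories."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }[code // 100]


for code in (200, 301, 404, 503):
    print(code, status_category(code))
```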
What does API gateway do?
The diagram below shows the details.
Step 1 - The client sends an HTTP request to the API gateway.
Step 2 - The API gateway parses and validates the attributes in the HTTP request.
Step 3 - The API gateway performs allow-list/deny-list checks.
Step 4 - The API gateway talks to an identity provider for authentication and authorization.
Step 5 - The rate limiting rules are applied to the request. If it is over the limit, the request is rejected.
Steps 6 and 7 - Now that the request has passed basic checks, the API gateway finds the relevant service to route to by path matching.
Step 8 - The API gateway transforms the request into the appropriate protocol and sends it to backend microservices.
Steps 9-12: The API gateway can handle errors properly, and deals with faults if the error takes a longer time to recover (circuit breaker). It can also leverage the ELK (Elasticsearch-Logstash-Kibana) stack for logging and monitoring. We sometimes cache data in the API gateway.
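A few of these steps (the deny-list check, rate limiting, and path-based routing) can be sketched as a single function. Everything here is a simplified assumption: the IP addresses, the route table, and the per-client counter that never resets are placeholders for what a real gateway would configure.

```python
DENY_LIST = {"10.0.0.66"}  # hypothetical blocked client
ROUTES = {"/orders": "order-service", "/payments": "payment-service"}
MAX_REQUESTS = 2           # per client, for this demo only
request_counts: dict[str, int] = {}


def handle(client_ip: str, path: str) -> tuple[int, str]:
    """Return (HTTP status, detail) for one request passing through the gateway."""
    if client_ip in DENY_LIST:                       # deny-list check
        return 403, "denied"
    request_counts[client_ip] = request_counts.get(client_ip, 0) + 1
    if request_counts[client_ip] > MAX_REQUESTS:     # rate limiting
        return 429, "rate limit exceeded"
    for prefix, service in ROUTES.items():           # route by path matching
        if path.startswith(prefix):
            return 200, f"routed to {service}"
    return 404, "no matching route"


print(handle("10.0.0.66", "/orders/1"))   # (403, 'denied')
print(handle("10.0.0.7", "/orders/1"))    # (200, 'routed to order-service')
print(handle("10.0.0.7", "/unknown"))     # (404, 'no matching route')
print(handle("10.0.0.7", "/payments/9"))  # (429, 'rate limit exceeded')
```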
How do we design effective and safe APIs?
The diagram below shows typical API designs with a shopping cart example.
Note that API design is not just URL path design. Most of the time, we need to choose the proper resource names, identifiers, and path patterns. It is equally important to design proper HTTP header fields or to design effective rate-limiting rules within the API gateway.
TCP/IP encapsulation
How is data sent over the network? Why do we need so many layers in the OSI model?
The diagram below shows how data is encapsulated and de-encapsulated when transmitting over the network.
Step 1: When Device A sends data to Device B over the network via the HTTP protocol, an HTTP header is first added at the application layer.
Step 2: Then a TCP or a UDP header is added to the data. It is encapsulated into TCP segments at the transport layer. The header contains the source port, destination port, and sequence number.
Step 3: The segments are then encapsulated with an IP header at the network layer. The IP header contains the source/destination IP addresses.
Step 4: A MAC header is added to the IP datagram at the data link layer, with source/destination MAC addresses.
Step 5: The encapsulated frames are sent to the physical layer and sent over the network in binary bits.
Steps 6-10: When Device B receives the bits from the network, it performs the de-encapsulation process, which is a reverse processing of the encapsulation process. The headers are removed layer by layer, and eventually, Device B can read the data.
We need layers in the network model because each layer focuses on its own responsibilities. Each layer can rely on the headers for processing instructions and does not need to know the meaning of the data from the last layer.
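The wrapping and unwrapping can be mimicked with a toy sketch. Real headers are fixed binary structures; the bracketed text headers, ports, and addresses below are invented purely to make the layering visible.

```python
def encapsulate(data: bytes) -> bytes:
    """Wrap application data in transport, network, and data link headers."""
    segment = b"[TCP src=5000 dst=80]" + data       # transport layer
    packet = b"[IP 10.0.0.1>10.0.0.2]" + segment    # network layer
    frame = b"[MAC aa:aa>bb:bb]" + packet           # data link layer
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip the MAC, IP, and TCP headers in turn, outermost first."""
    for _ in range(3):
        frame = frame[frame.index(b"]") + 1:]
    return frame


http_message = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
frame = encapsulate(http_message)
print(frame)
print(decapsulate(frame) == http_message)  # True
```

Each function only touches its own header, which mirrors the point that a layer does not need to understand the payload it carries.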
Why is Nginx called a "reverse" proxy?
The diagram below shows the differences between a forward proxy and a reverse proxy.
A forward proxy is a server that sits between user devices and the internet.
A forward proxy is commonly used for:
- Protecting clients
- Circumventing browsing restrictions
- Blocking access to certain content
A reverse proxy is a server that accepts a request from the client, forwards the request to web servers, and returns the results to the client as if the proxy server had processed the request.
A reverse proxy is good for:
- Protecting servers
- Load balancing
- Caching static contents
- Encrypting and decrypting SSL communications
What are the common load-balancing algorithms?
The diagram below shows 6 common algorithms.
- Static Algorithms
  - Round robin: The client requests are sent to different service instances in sequential order. The services are usually required to be stateless.
  - Sticky round-robin: This is an improvement of the round-robin algorithm. If Alice's first request goes to service A, the following requests go to service A as well.
  - Weighted round-robin: The admin can specify the weight for each service. The ones with a higher weight handle more requests than others.
  - Hash: This algorithm applies a hash function on the incoming requests' IP or URL. The requests are routed to relevant instances based on the hash function result.
- Dynamic Algorithms
  - Least connections: A new request is sent to the service instance with the least concurrent connections.
  - Least response time: A new request is sent to the service instance with the fastest response time.
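Four of these algorithms are sketched below in their simplest form. The server names, weights, and connection counts are made up, and the weighted variant just repeats instances in a cycle, which is one of several possible ways to honor weights.

```python
import hashlib
import itertools

SERVERS = ["A", "B", "C"]

# Round robin: cycle through instances in order.
round_robin = itertools.cycle(SERVERS)

# Weighted round robin: repeat each instance according to its weight.
WEIGHTS = {"A": 3, "B": 1, "C": 1}
weighted = itertools.cycle([s for s, w in WEIGHTS.items() for _ in range(w)])


def pick_by_hash(client_ip: str) -> str:
    """Hash: the same client IP always maps to the same instance."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


def pick_least_connections(active: dict[str, int]) -> str:
    """Least connections: choose the instance with the fewest open connections."""
    return min(active, key=active.get)


print([next(round_robin) for _ in range(4)])             # ['A', 'B', 'C', 'A']
print([next(weighted) for _ in range(5)])                # ['A', 'A', 'A', 'B', 'C']
print(pick_least_connections({"A": 5, "B": 2, "C": 7}))  # B
```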
URL, URI, URN - Do you know the differences?
The diagram below shows a comparison of URL, URI, and URN.
- URI
URI stands for Uniform Resource Identifier. It identifies a logical or physical resource on the web. URL and URN are subtypes of URI. URL locates a resource, while URN names a resource.
A URI is composed of the following parts: scheme:[//authority]path[?query][#fragment]
- URL
URL stands for Uniform Resource Locator, the key concept of HTTP. It is the address of a unique resource on the web. It can be used with other protocols like FTP and JDBC.
- URN
URN stands for Uniform Resource Name. It uses the urn scheme. URNs cannot be used to locate a resource. A simple example given in the diagram is composed of a namespace and a namespace-specific string.
If you would like to learn more detail on the subject, I would recommend W3C's clarification.
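The URI components listed above can be inspected with Python's standard `urllib.parse.urlsplit`. The example URL and the ISBN-based URN are sample values chosen for illustration.

```python
from urllib.parse import urlsplit

url = urlsplit("https://example.com:8080/docs/index.html?lang=en#intro")
print(url.scheme, url.netloc, url.path, url.query, url.fragment)
# https example.com:8080 /docs/index.html lang=en intro

# A URN has a scheme but no authority, so it names a resource without locating it.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme, repr(urn.netloc), urn.path)
# urn '' isbn:0451450523
```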
CI/CD
CI/CD Pipeline Explained in Simple Terms
Section 1 - SDLC with CI/CD
The software development life cycle (SDLC) consists of several key stages: development, testing, deployment, and maintenance. CI/CD automates and integrates these stages to enable faster and more reliable releases.
When code is pushed to a git repository, it triggers an automated build and test process. End-to-end (e2e) test cases are run to validate the code. If tests pass, the code can be automatically deployed to staging/production. If issues are found, the code is sent back to development for bug fixing. This automation provides fast feedback to developers and reduces the risk of bugs in production.
Section 2 - Difference between CI and CD
Continuous Integration (CI) automates the build, test, and merge process. It runs tests whenever code is committed to detect integration issues early. This encourages frequent code commits and rapid feedback.
Continuous Delivery (CD) automates release processes like infrastructure changes and deployment. It ensures software can be released reliably at any time through automated workflows. CD may also automate the manual testing and approval steps required before production deployment.
Section 3 - CI/CD Pipeline
A typical CI/CD pipeline has several connected stages:
- The developer commits code changes to the source control
- CI server detects changes and triggers the build
- Code is compiled, and tested (unit, integration tests)
- Test results reported to the developer
- On success, artifacts are deployed to staging environments
- Further testing may be done on staging before release
- CD system deploys approved changes to production
Netflix Tech Stack (CI/CD Pipeline)
Planning: Netflix Engineering uses JIRA for planning and Confluence for documentation.
Coding: Java is the primary programming language for the backend service, while other languages are used for different use cases.
Build: Gradle is mainly used for building, and Gradle plugins are built to support various use cases.
Packaging: Package and dependencies are packed into an Amazon Machine Image (AMI) for release.
Testing: Testing emphasizes the production culture's focus on building chaos tools.
Deployment: Netflix uses its self-built Spinnaker for canary rollout deployment.
Monitoring: The monitoring metrics are centralized in Atlas, and Kayenta is used to detect anomalies.
Incident report: Incidents are dispatched according to priority, and PagerDuty is used for incident handling.
Architecture patterns
MVC, MVP, MVVM, MVVM-C, and VIPER
These architecture patterns are among the most commonly used in app development, whether on iOS or Android platforms. Developers have introduced them to overcome the limitations of earlier patterns. So, how do they differ?
- MVC, the oldest pattern, dates back almost 50 years
- Every pattern has a "view" (V) responsible for displaying content and receiving user input
- Most patterns include a "model" (M) to manage business data
- "Controller," "presenter," and "view-model" are translators that mediate between the view and the model ("entity" in the VIPER pattern)
18 Key Design Patterns Every Developer Should Know
Patterns are reusable solutions to common design problems, resulting in a smoother, more efficient development process. They serve as blueprints for building better software structures. These are some of the most popular patterns:
- Abstract Factory: Family Creator - Makes groups of related items.
- Builder: Lego Master - Builds objects step by step, keeping creation and appearance separate.
- Prototype: Clone Maker - Creates copies of fully prepared examples.
- Singleton: One and Only - A special class with just one instance.
- Adapter: Universal Plug - Connects things with different interfaces.
- Bridge: Function Connector - Links how an object works to what it does.
- Composite: Tree Builder - Forms tree-like structures of simple and complex parts.
- Decorator: Customizer - Adds features to objects without changing their core.
- Facade: One-Stop-Shop - Represents a whole system with a single, simplified interface.
- Flyweight: Space Saver - Shares small, reusable items efficiently.
- Proxy: Stand-In Actor - Represents another object, controlling access or actions.
- Chain of Responsibility: Request Relay - Passes a request through a chain of objects until handled.
- Command: Task Wrapper - Turns a request into an object, ready for action.
- Iterator: Collection Explorer - Accesses elements in a collection one by one.
- Mediator: Communication Hub - Simplifies interactions between different classes.
- Memento: Time Capsule - Captures and restores an object's state.
- Observer: News Broadcaster - Notifies classes about changes in other objects.
- Visitor: Skillful Guest - Adds new operations to a class without altering it.
Database
A nice cheat sheet of different databases in cloud services
Choosing the right database for your project is a complex task. Many database options, each suited to distinct use cases, can quickly lead to decision fatigue.
We hope this cheat sheet provides high-level direction to pinpoint the right service that aligns with your project's needs and avoid potential pitfalls.
Note: Google has limited documentation for their database use cases. Even though we did our best to look at what was available and arrived at the best option, some of the entries may not be fully accurate.
8 Data Structures That Power Your Databases
The answer will vary depending on your use case. Data can be indexed in memory or on disk. Similarly, data formats vary, such as numbers, strings, geographic coordinates, etc. The system might be write-heavy or read-heavy. All of these factors affect your choice of database index format.
The following are some of the most popular data structures used for indexing data:
- Skiplist: a common in-memory index type. Used in Redis
- Hash index: a very common implementation of the "Map" data structure (or "Collection")
- SSTable: immutable on-disk "Map" implementation
- LSM tree: Skiplist + SSTable. High write throughput
- B-tree: disk-based solution. Consistent read/write performance
- Inverted index: used for document indexing. Used in Lucene
- Suffix tree: for string pattern search
- R-tree: multi-dimension search, such as finding the nearest neighbor
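As one example from this list, an inverted index maps each term to the set of documents that contain it. The sketch below is a bare-bones version with whitespace tokenization and made-up documents; search engines such as Lucene add analysis, ranking, and compressed on-disk storage on top of this idea.

```python
from collections import defaultdict

docs = {
    1: "redis is an in-memory store",
    2: "kafka is a message store",
    3: "lucene builds an inverted index",
}

# term -> ids of the documents containing that term
index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)


def search(*terms: str) -> set[int]:
    """Return ids of documents containing every term."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


print(sorted(search("store")))        # [1, 2]
print(sorted(search("is", "kafka")))  # [2]
```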
How is an SQL statement executed in the database?
The diagram below shows the process. Note that the architectures for different databases are different, the diagram demonstrates some common designs.
Step 1 - A SQL statement is sent to the database via a transport layer protocol (e.g., TCP).
Step 2 - The SQL statement is sent to the command parser, where it goes through syntactic and semantic analysis, and a query tree is generated afterward.
Step 3 - The query tree is sent to the optimizer. The optimizer creates an execution plan.
Step 4 - The execution plan is sent to the executor. The executor retrieves the data required by the plan.
Step 5 - Access methods provide the data fetching logic required for execution, retrieving data from the storage engine.
Step 6 - Access methods decide whether the SQL statement is read-only. If the query is read-only (SELECT statement), it is passed to the buffer manager for further processing. The buffer manager looks for the data in the cache or data files.
Step 7 - If the statement is an UPDATE or INSERT, it is passed to the transaction manager for further processing.
Step 8 - During a transaction, the data is in lock mode. This is guaranteed by the lock manager. It also ensures the transaction's ACID properties.
CAP theorem
The CAP theorem is one of the most famous terms in computer science, but I bet different developers have different understandings. Let's examine what it is and why it can be confusing.
CAP theorem states that a distributed system can't provide more than two of these three guarantees simultaneously.
Consistency: consistency means all clients see the same data at the same time no matter which node they connect to.
Availability: availability means any client that requests data gets a response even if some of the nodes are down.
Partition Tolerance: a partition indicates a communication break between two nodes. Partition tolerance means the system continues to operate despite network partitions.
The "2 of 3" formulation can be useful, but this simplification could be misleading.
- Picking a database is not easy. Justifying our choice purely based on the CAP theorem is not enough. For example, companies don't choose Cassandra for chat applications simply because it is an AP system. There is a list of good characteristics that make Cassandra a desirable option for storing chat messages. We need to dig deeper.
- "CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare". Quoted from the paper: CAP Twelve Years Later: How the "Rules" Have Changed.
- The theorem is about 100% availability and consistency. A more realistic discussion would be the trade-offs between latency and consistency when there is no network partition. See PACELC theorem for more details.
Is the CAP theorem actually useful?
I think it is still useful as it opens our minds to a set of tradeoff discussions, but it is only part of the story. We need to dig deeper when picking the right database.
Types of Memory and Storage
Visualizing a SQL query
SQL statements are executed by the database system in several steps, including:
- Parsing the SQL statement and checking its validity
- Transforming the SQL into an internal representation, such as relational algebra
- Optimizing the internal representation and creating an execution plan that utilizes index information
- Executing the plan and returning the results
The execution of SQL is highly complex and involves many considerations, such as:
- The use of indexes and caches
- The order of table joins
- Concurrency control
- Transaction management
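SQLite, which ships with Python, makes the planning step easy to observe through its `EXPLAIN QUERY PLAN` statement. The `orders` table and index below are invented for the example, and the wording of the plan differs between SQLite versions, so the sketch prints it without assuming a particular text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("CREATE INDEX idx_customer ON orders (customer)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.5), ("alice", 7.25)],
)

query = "SELECT SUM(total) FROM orders WHERE customer = ?"

# Ask the optimizer for its plan; the last column is a human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("alice",)).fetchall()
for row in plan:
    print(row[-1])

# Execute the plan and return the result.
result = conn.execute(query, ("alice",)).fetchone()
print(result)  # (17.25,)
```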
SQL language
In 1986, SQL (Structured Query Language) became a standard. Over the next 40 years, it became the dominant language for relational database management systems. Reading the latest standard (ANSI SQL 2016) can be time-consuming. How can I learn it?
There are 5 components of the SQL language:
- DDL: data definition language, such as CREATE, ALTER, DROP
- DQL: data query language, such as SELECT
- DML: data manipulation language, such as INSERT, UPDATE, DELETE
- DCL: data control language, such as GRANT, REVOKE
- TCL: transaction control language, such as COMMIT, ROLLBACK
For a backend engineer, you may need to know most of it. As a data analyst, you may need to have a good understanding of DQL. Select the topics that are most relevant to you.
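Four of the five components can be tried out with Python's built-in `sqlite3` module, as in the sketch below. DCL is left out because SQLite has no GRANT or REVOKE statements; the `accounts` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")

# DML: insert data, then make it permanent with TCL (COMMIT).
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

# DML again, this time undone with TCL (ROLLBACK).
conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
conn.rollback()

# DQL: query the data.
rows = conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall()
print(rows)  # [('alice', 100), ('bob', 50)]
```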
Cache
Data is cached everywhere
This diagram illustrates where we cache data in a typical architecture.
There are multiple layers along the flow.
- Client apps: HTTP responses can be cached by the browser. We request data over HTTP for the first time, and it is returned with an expiry policy in the HTTP header; we request data again, and the client app tries to retrieve the data from the browser cache first.
- CDN: CDN caches static web resources. The clients can retrieve data from a CDN node nearby.
- Load Balancer: The load balancer can cache resources as well.
- Messaging infra: Message brokers store messages on disk first, and then consumers retrieve them at their own pace. Depending on the retention policy, the data is cached in Kafka clusters for a period of time.
- Services: There are multiple layers of cache in a service. If the data is not cached in the CPU cache, the service will try to retrieve the data from memory. Sometimes the service has a second-level cache to store data on disk.
- Distributed Cache: Distributed cache like Redis holds key-value pairs for multiple services in memory. It provides much better read/write performance than the database.
- Full-text Search: we sometimes need to use full-text searches like Elastic Search for document search or log search. A copy of data is indexed in the search engine as well.
- Database: Even in the database, we have different levels of caches:
- WAL(Write-ahead Log): data is written to WAL first before building the B tree index
- Buffer pool: A memory area allocated to cache table and index data read from disk
- Materialized View: Pre-compute query results and store them in the database tables for better query performance
- Transaction log: record all the transactions and database updates
- Replication Log: used to record the replication state in a database cluster
Why is Redis so fast?
There are 3 main reasons as shown in the diagram below.
- Redis is a RAM-based data store. RAM access is at least 1000 times faster than random disk access.
- Redis leverages IO multiplexing and single-threaded execution loop for execution efficiency.
- Redis leverages several efficient lower-level data structures.
Question: Another popular in-memory store is Memcached. Do you know the differences between Redis and Memcached?
How can Redis be used?
There is more to Redis than just caching.
Redis can be used in a variety of scenarios as shown in the diagram.
- Session: We can use Redis to share user session data among different services.
- Cache: We can use Redis to cache objects or pages, especially for hotspot data.
- Distributed lock: We can use a Redis string to acquire locks among distributed services.
- Counter: We can count how many likes or how many reads for articles.
- Rate limiter: We can apply a rate limiter for certain user IPs.
- Global ID generator: We can use Redis Int for global ID.
- Shopping cart: We can use Redis Hash to represent key-value pairs in a shopping cart.
- Calculate user retention: We can use Bitmap to represent the user login daily and calculate user retention.
- Message queue: We can use List for a message queue.
- Ranking: We can use ZSet to sort the articles.
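To illustrate the rate limiter case without a running Redis server, the sketch below keeps the counters in a Python dictionary. With Redis, the per-client counter would typically be incremented with `INCR` and given a lifetime with `EXPIRE`; `FixedWindowLimiter` and its parameters are hypothetical names for this example.

```python
import time
from typing import Optional


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second window."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # client -> (window expiry time, request count); stand-in for Redis keys
        self._counters: dict[str, tuple[float, int]] = {}

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        expires_at, count = self._counters.get(client_ip, (0.0, 0))
        if now >= expires_at:  # window elapsed: start a new counter
            expires_at, count = now + self.window, 0
        count += 1             # the step INCR would perform
        self._counters[client_ip] = (expires_at, count)
        return count <= self.limit


limiter = FixedWindowLimiter(limit=2, window=60)
print([limiter.allow("1.2.3.4", now=t) for t in (0, 1, 2, 61)])
# [True, True, False, True]
```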
Top caching strategies
Designing large-scale systems usually requires careful consideration of caching. Below are five caching strategies that are frequently utilized.
Microservice architecture
What does a typical microservice architecture look like?
The diagram below shows a typical microservice architecture.
- Load Balancer: This distributes incoming traffic across multiple backend services.
- CDN (Content Delivery Network): CDN is a group of geographically distributed servers that hold static content for faster delivery. The clients look for content in CDN first, then progress to backend services.
- API Gateway: This handles incoming requests and routes them to the relevant services. It talks to the identity provider and service discovery.
- Identity Provider: This handles authentication and authorization for users.
- Service Registry & Discovery: Microservice registration and discovery happen in this component, and the API gateway looks for relevant services in this component to talk to.
- Management: This component is responsible for monitoring the services.
- Microservices: Microservices are designed and deployed in different domains. Each domain has its own database. The API gateway talks to the microservices via REST API or other protocols, and the microservices within the same domain talk to each other using RPC (Remote Procedure Call).
Benefits of microservices:
- They can be quickly designed, deployed, and horizontally scaled.
- Each domain can be independently maintained by a dedicated team.
- Business requirements can be customized in each domain and better supported, as a result.
Microservice Best Practices
A picture is worth a thousand words: 9 best practices for developing microservices.
When we develop microservices, we should follow these best practices:
- Use separate data storage for each microservice
- Keep code at a similar level of maturity
- Separate build for each microservice
- Assign each microservice with a single responsibility
- Deploy into containers
- Design stateless services
- Adopt domain-driven design
- Design micro frontend
- Orchestrating microservices
What tech stack is commonly used for microservices?
Below you will find a diagram showing the microservice tech stack, both for the development phase and for production.
▶️ Pre-Production
- Define API - This establishes a contract between frontend and backend. We can use Postman or OpenAPI for this.
- Development - Node.js or React is popular for frontend development, and Java/Python/Go for backend development. Also, we need to change the configurations in the API gateway according to API definitions.
- Continuous Integration - JUnit and Jenkins for automated testing. The code is packaged into a Docker image and deployed as microservices.
▶️ Production
- Nginx is a common choice for load balancers. Cloudflare provides CDN (Content Delivery Network).
- API Gateway - We can use Spring Boot for the gateway, and use Eureka/Zookeeper for service discovery.
- The microservices are deployed on clouds. We have options among AWS, Microsoft Azure, or Google GCP.
- Cache and Full-text Search - Redis is a common choice for caching key-value pairs. Elasticsearch is used for full-text search.
- Communications - For services to talk to each other, we can use messaging infra Kafka or RPC.
- Persistence - We can use MySQL or PostgreSQL for a relational database, and Amazon S3 for object store. We can also use Cassandra for the wide-column store if necessary.
- Management & Monitoring - To manage so many microservices, the common Ops tools include Prometheus, Elastic Stack, and Kubernetes.
Why is Kafka fast
There are many design decisions that contributed to Kafka's performance. In this post, we'll focus on two. We think these two carried the most weight.
- The first one is Kafka's reliance on Sequential I/O.
- The second design choice that gives Kafka its performance advantage is its focus on efficiency: zero copy principle.
The diagram illustrates how the data is transmitted between producer and consumer, and what zero-copy means.
- Step 1.1 - 1.3: Producer writes data to the disk
- Step 2: Consumer reads data without zero-copy
2.1 The data is loaded from disk to OS cache
2.2 The data is copied from OS cache to Kafka application
2.3 Kafka application copies the data into the socket buffer
2.4 The data is copied from socket buffer to network card
2.5 The network card sends data out to the consumer
- Step 3: Consumer reads data with zero-copy
3.1 The data is loaded from disk to OS cache
3.2 OS cache directly copies the data to the network card via the sendfile() system call
3.3 The network card sends data out to the consumer
Zero copy is a shortcut to save the multiple data copies between application context and kernel context.
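Python exposes the same mechanism through `socket.sendfile()`, which uses `os.sendfile()` where the platform supports it and falls back to ordinary sends otherwise. The sketch below is not Kafka code: a local socket pair stands in for the broker-to-consumer connection and a temporary file stands in for a log segment.

```python
import socket
import tempfile

# A connected socket pair stands in for the broker-to-consumer network link.
broker_side, consumer_side = socket.socketpair()

with tempfile.TemporaryFile() as log_segment:
    log_segment.write(b"message-1\nmessage-2\n")
    log_segment.flush()
    log_segment.seek(0)
    # The file contents are handed to the socket without being read into
    # a buffer in this program first (when os.sendfile is available).
    sent = broker_side.sendfile(log_segment)

received = consumer_side.recv(1024)
print(sent, received)

broker_side.close()
consumer_side.close()
```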
Payment systems
How to learn payment systems?
Why is the credit card called "the most profitable product in banks"? How does VISA/Mastercard make money?
The diagram below shows the economics of the credit card payment flow.
1. The cardholder pays a merchant $100 to buy a product.
2. The merchant benefits from the use of the credit card with higher sales volume and needs to compensate the issuer and the card network for providing the payment service. The acquiring bank sets a fee with the merchant, called the "merchant discount fee."
3 - 4. The acquiring bank keeps $0.25 as the acquiring markup, and $1.75 is paid to the issuing bank as the interchange fee. The merchant discount fee should cover the interchange fee.
The interchange fee is set by the card network because it is less efficient for each issuing bank to negotiate fees with each merchant.
5. The card network sets up the network assessments and fees with each bank, which pays the card network for its services every month. For example, VISA charges a 0.11% assessment, plus a $0.0195 usage fee, for every swipe.
6. The cardholder pays the issuing bank for its services.
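The $100 example above can be worked through in a few lines. This is a minimal sketch that assumes the merchant discount fee is exactly the interchange fee plus the acquiring markup ($1.75 + $0.25); the function names are hypothetical, and the default rates are simply the figures quoted in the text.

```python
from decimal import Decimal


def split_purchase(amount: Decimal,
                   interchange_fee: Decimal,
                   acquirer_markup: Decimal) -> dict:
    """Break a card purchase into what each party receives."""
    # Assumption: discount fee = interchange fee + acquiring markup.
    merchant_discount_fee = interchange_fee + acquirer_markup
    return {
        "merchant_discount_fee": merchant_discount_fee,
        "merchant_receives": amount - merchant_discount_fee,
        "issuer_receives": interchange_fee,
        "acquirer_keeps": acquirer_markup,
    }


def network_fee(amount: Decimal,
                assessment_rate: Decimal = Decimal("0.0011"),
                usage_fee: Decimal = Decimal("0.0195")) -> Decimal:
    """Card network charge for one swipe: 0.11% assessment plus usage fee."""
    return amount * assessment_rate + usage_fee


breakdown = split_purchase(Decimal("100"), Decimal("1.75"), Decimal("0.25"))
swipe_fee = network_fee(Decimal("100"))
```

Under these assumptions the merchant receives $98.00 of the $100 purchase, and the card network charges $0.1295 for the swipe.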
Why should the issuing bank be compensated?
- The issuer pays the merchant even if the cardholder fails to pay the issuer.
- The issuer pays the merchant before the cardholder pays the issuer.
- The issuer has other operating costs, including managing customer accounts, providing statements, fraud detection, risk management, clearing & settlement, etc.
How does VISA work when we swipe a credit card at a merchant's shop?
VISA, Mastercard, and American Express act as card networks for the clearing and settling of funds. The card acquiring bank and the card issuing bank can be, and often are, different. If banks were to settle transactions one by one without an intermediary, each bank would have to settle the transactions with all the other banks. This is quite inefficient.
The diagram below shows VISA's role in the credit card payment process. There are two flows involved. Authorization flow happens when the customer swipes the credit card. Capture and settlement flow happens when the merchant wants to get the money at the end of the day.
- Authorization Flow
Step 0: The card issuing bank issues credit cards to its customers.
Step 1: The cardholder wants to buy a product and swipes the credit card at the Point of Sale (POS) terminal in the merchant's shop.
Step 2: The POS terminal sends the transaction to the acquiring bank, which has provided the POS terminal.
Steps 3 and 4: The acquiring bank sends the transaction to the card network, also called the card scheme. The card network sends the transaction to the issuing bank for approval.
Steps 4.1, 4.2 and 4.3: The issuing bank freezes the money if the transaction is approved. The approval or rejection is sent back to the acquirer, as well as the POS terminal.
- Capture and Settlement Flow
Steps 1 and 2: The merchant wants to collect the money at the end of the day, so they hit "capture" on the POS terminal. The transactions are sent to the acquirer in batch. The acquirer sends the batch file with transactions to the card network.
Step 3: The card network performs clearing for the transactions collected from different acquirers, and sends the clearing files to different issuing banks.
Step 4: The issuing banks confirm the correctness of the clearing files, and transfer money to the relevant acquiring banks.
Step 5: The acquiring bank then transfers money to the merchant's bank.
Note on Step 3: The card network clears the transactions from different acquiring banks. Clearing is a process in which mutually offsetting transactions are netted, so the total number of transactions to settle is reduced.
In the process, the card network takes on the burden of talking to each bank and receives service fees in return.
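Netting during clearing can be illustrated with a small sketch. The bank names and amounts below are made up for the example, and `net_positions` is a hypothetical helper, not part of any card network's system. Amounts are in cents to avoid floating-point rounding.

```python
from collections import defaultdict


def net_positions(transfers):
    """Collapse (payer, payee, amount_in_cents) transfers into one net
    position per bank. Negative means the bank owes money overall."""
    positions = defaultdict(int)
    for payer, payee, amount in transfers:
        positions[payer] -= amount
        positions[payee] += amount
    return dict(positions)


# Four individual obligations between three banks (hypothetical data).
transfers = [
    ("Bank A", "Bank B", 500),
    ("Bank B", "Bank A", 300),
    ("Bank A", "Bank C", 200),
    ("Bank C", "Bank A", 200),
]
positions = net_positions(transfers)
```

Here four transfers reduce to a single settlement: Bank A owes 200, Bank B is owed 200, and Bank C is already square.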
Payment Systems Around The World Series (Part 1): Unified Payments Interface (UPI) in India
What's UPI? UPI is an instant real-time payment system developed by the National Payments Corporation of India.
It accounts for 60% of digital retail transactions in India today.
UPI = payment markup language + standard for interoperable payments
DevOps
DevOps vs. SRE vs. Platform Engineering. What is the difference?
The concepts of DevOps, SRE, and Platform Engineering have emerged at different times and have been developed by various individuals and organizations.
DevOps as a concept was introduced in 2009 by Patrick Debois and Andrew Shafer at the Agile conference. They sought to bridge the gap between software development and operations by promoting a collaborative culture and shared responsibility for the entire software development lifecycle.
SRE, or Site Reliability Engineering, was pioneered by Google in the early 2000s to address operational challenges in managing large-scale, complex systems. Google developed SRE practices and tools, such as the Borg cluster management system and the Monarch monitoring system, to improve the reliability and efficiency of their services.
Platform Engineering is a more recent concept, building on the foundation of SRE. Its precise origins are less clear, but it is generally understood to be an extension of DevOps and SRE practices, with a focus on delivering a comprehensive platform for product development that serves the business as a whole.
It's worth noting that while these concepts emerged at different times, they are all related to the broader trend of improving collaboration, automation, and efficiency in software development and operations.
What is k8s (Kubernetes)?
K8s is a container orchestration system. It is used for container deployment and management. Its design is heavily influenced by Google's internal system Borg.
A k8s cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node.
The worker node(s) host the Pods that are the components of the application workload. The control plane manages the worker nodes and the Pods in the cluster. In production environments, the control plane usually runs across multiple computers, and a cluster usually runs multiple nodes, providing fault tolerance and high availability.
- Control Plane Components
-
API Server
The API server talks to all the components in the k8s cluster. All the operations on pods are executed by talking to the API server.
-
Scheduler
The scheduler watches for newly created pods that have no node assigned and selects a node for each of them to run on.
-
Controller Manager
The controller manager runs the controllers, including Node Controller, Job Controller, EndpointSlice Controller, and ServiceAccount Controller.
-
Etcd
etcd is a key-value store used as Kubernetes' backing store for all cluster data.
- Nodes
-
Pods
A pod is a group of containers and is the smallest unit that k8s administers. Pods have a single IP address applied to every container within the pod.
-
Kubelet
An agent that runs on each node in the cluster. It ensures containers are running in a Pod.
-
Kube Proxy
Kube-proxy is a network proxy that runs on each node in your cluster. It routes traffic coming into a node from the service. It forwards requests for work to the correct containers.
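To make the scheduler's role more concrete, here is a toy sketch of the filter-then-score idea: drop the nodes that cannot fit a pod, then pick the best of the rest. It is not the Kubernetes scheduler's actual algorithm. The function names, the single CPU dimension (in millicores), and the "most free CPU wins" scoring rule are all assumptions made for illustration.

```python
from typing import Dict, Optional


def pick_node(pod_cpu_request: int,
              free_cpu_by_node: Dict[str, int]) -> Optional[str]:
    """Filter out nodes that cannot fit the pod, then score the rest."""
    candidates = {name: free for name, free in free_cpu_by_node.items()
                  if free >= pod_cpu_request}
    if not candidates:
        return None  # the pod stays pending
    # Toy scoring rule: the node with the most free CPU wins;
    # ties are broken alphabetically.
    return max(sorted(candidates), key=lambda name: candidates[name])


def schedule(pods: Dict[str, int],
             free_cpu_by_node: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Assign each pod to a node, reserving capacity as we go."""
    free = dict(free_cpu_by_node)  # do not mutate the caller's view
    assignments: Dict[str, Optional[str]] = {}
    for pod, request in pods.items():
        node = pick_node(request, free)
        assignments[pod] = node
        if node is not None:
            free[node] -= request
    return assignments


assignments = schedule(
    {"web": 500, "db": 1500, "batch": 4000},
    {"node-a": 1000, "node-b": 2000},
)
```

In this run "web" and "db" both land on "node-b", while "batch" fits nowhere and is left unassigned.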
Docker vs. Kubernetes. Which one should we use?
What is Docker ?
Docker is an open-source platform that allows you to package, distribute, and run applications in isolated containers. It focuses on containerization, providing lightweight environments that encapsulate applications and their dependencies.
What is Kubernetes ?
Kubernetes, often referred to as K8s, is an open-source container orchestration platform. It provides a framework for automating the deployment, scaling, and management of containerized applications across a cluster of nodes.
How are both different from each other ?
Docker: Docker operates at the individual container level on a single operating system host.
You must manually manage each host, and setting up networks, security policies, and storage for multiple related containers can be complex.
Kubernetes: Kubernetes operates at the cluster level. It manages multiple containerized applications across multiple hosts, providing automation for tasks like load balancing, scaling, and ensuring the desired state of applications.
In short, Docker focuses on containerization and running containers on individual hosts, while Kubernetes specializes in managing and orchestrating containers at scale across a cluster of hosts.
How does Docker work?
The diagram below shows the architecture of Docker and how it works when we run "docker build", "docker pull" and "docker run".
There are 3 components in Docker architecture:
-
Docker client
The docker client talks to the Docker daemon.
-
Docker host
The Docker daemon listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes.
-
Docker registry
A Docker registry stores Docker images. Docker Hub is a public registry that anyone can use.
Let's take the "docker run" command as an example.
- Docker pulls the image from the registry.
- Docker creates a new container.
- Docker allocates a read-write filesystem to the container.
- Docker creates a network interface to connect the container to the default network.
- Docker starts the container.
GIT
How Git Commands work
To begin with, it's essential to identify where our code is stored. The common assumption is that there are only two locations - one on a remote server like GitHub and the other on our local machine. However, this isn't entirely accurate. Git maintains three local storage areas on our machine, which means that our code can be found in four places:
- Working directory: where we edit files
- Staging area: a temporary location where files are kept for the next commit
- Local repository: contains the code that has been committed
- Remote repository: the remote server that stores the code
Most Git commands primarily move files between these four locations.
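The four locations can be modelled with a toy class to show which command moves files where. This is a deliberately simplified sketch and not how Git stores data: `ToyGit` and its methods are hypothetical, and the staging area here only holds pending changes.

```python
class ToyGit:
    """Toy model of the four places a file can live."""

    def __init__(self):
        self.working_dir = {}   # files as currently edited
        self.staging = {}       # changes queued for the next commit
        self.local_repo = []    # committed snapshots, oldest first
        self.remote_repo = []   # snapshots the remote server has

    def edit(self, path, content):
        """Editing only touches the working directory."""
        self.working_dir[path] = content

    def add(self, path):
        """Like `git add`: working directory -> staging area."""
        self.staging[path] = self.working_dir[path]

    def commit(self, message):
        """Like `git commit`: staging area -> local repository."""
        files = dict(self.local_repo[-1]["files"]) if self.local_repo else {}
        files.update(self.staging)
        self.local_repo.append({"message": message, "files": files})
        self.staging.clear()

    def push(self):
        """Like `git push`: local repository -> remote repository."""
        self.remote_repo = list(self.local_repo)


repo = ToyGit()
repo.edit("README.md", "hello")
repo.add("README.md")
repo.commit("first commit")
repo.push()
```

Until `push()` runs, the remote repository knows nothing about the commit, which is why committing needs no network access.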
How does Git Work?
The diagram below shows the Git workflow.
Git is a distributed version control system.
Every developer maintains a local copy of the main repository and edits and commits to the local copy.
The commit is very fast because the operation doesn't interact with the remote repository.
If the remote repository crashes, the files can be recovered from the local repositories.
Git merge vs. Git rebase
What are the differences?
When we merge changes from one Git branch to another, we can use "git merge" or "git rebase". The diagram below shows how the two commands work.
Git merge
This creates a new commit G' in the main branch. G' ties the histories of both main and feature branches.
Git merge is non-destructive. Neither the main nor the feature branch is changed.
Git rebase
Git rebase moves the feature branch histories to the head of the main branch. It creates new commits E', F', and G' for each commit in the feature branch.
The benefit of rebase is that it has a linear commit history.
Rebase can be dangerous if "the golden rule of git rebase" is not followed.
The Golden Rule of Git Rebase
Never use it on public branches!
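The difference between the two histories can be sketched with a toy commit graph, where each commit maps to its list of parents. The graph shape (main A-B-C-D, feature E-F-G branching from B) is an assumption for illustration, the merge commit is named M here to keep it distinct from the rebased G', and none of this is how Git is implemented.

```python
def merge(graph, main_head, feature_head, new_id):
    """Add one commit with two parents that ties both histories together."""
    merged = dict(graph)
    merged[new_id] = [main_head, feature_head]
    return merged, new_id


def rebase(graph, main_head, feature_commits):
    """Replay each feature commit as a new commit on top of main_head."""
    rebased = dict(graph)
    parent = main_head
    for commit in feature_commits:
        new_id = commit + "'"        # E becomes E', F becomes F', ...
        rebased[new_id] = [parent]
        parent = new_id
    return rebased, parent


def is_linear(graph, head):
    """True if every commit reachable from head has at most one parent."""
    while graph[head]:
        if len(graph[head]) != 1:
            return False
        head = graph[head][0]
    return True


history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"],
           "E": ["B"], "F": ["E"], "G": ["F"]}
merged, merge_head = merge(history, "D", "G", "M")
rebased, rebase_head = rebase(history, "D", ["E", "F", "G"])
```

The rebased commits are new objects with new identities, which is why rewriting a branch that others have already pulled causes trouble.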
Cloud Services
A nice cheat sheet of different cloud services (2023 edition)
What is cloud native?
Below is a diagram showing the evolution of architecture and processes since the 1980s.
Organizations can build and run scalable applications on public, private, and hybrid clouds using cloud native technologies.
This means the applications are designed to leverage cloud features, so they are resilient to load and easy to scale.
Cloud native includes 4 aspects:
-
Development process
This has progressed from waterfall to agile to DevOps.
-
Application Architecture
The architecture has gone from monolithic to microservices. Each service is designed to be small, adaptive to the limited resources in cloud containers.
-
Deployment & packaging
The applications used to be deployed on physical servers. Then around 2000, the applications that were not sensitive to latency were usually deployed on virtual servers. With cloud native applications, they are packaged into docker images and deployed in containers.
-
Application infrastructure
The applications are massively deployed on cloud infrastructure instead of self-hosted servers.
Developer productivity tools
Visualize JSON files
Nested JSON files are hard to read.
JsonCrack generates graph diagrams from JSON files and makes them easy to read.
Additionally, the generated diagrams can be downloaded as images.
Automatically turn code into architecture diagrams
What does it do?
- Draw the cloud system architecture in Python code.
- Diagrams can also be rendered directly inside the Jupyter Notebooks.
- No design tools are needed.
- Supports the following providers: AWS, Azure, GCP, Kubernetes, Alibaba Cloud, Oracle Cloud, etc.
Linux
Linux file system explained
The Linux file system used to resemble an unorganized town where individuals constructed their houses wherever they pleased. However, in 1994, the Filesystem Hierarchy Standard (FHS) was introduced to bring order to the Linux file system.
By implementing a standard like the FHS, software can ensure a consistent layout across various Linux distributions. Nonetheless, not all Linux distributions strictly adhere to this standard. They often incorporate their own unique elements or cater to specific requirements. To become proficient in this standard, you can begin by exploring. Utilize commands such as "cd" for navigation and "ls" for listing directory contents. Imagine the file system as a tree, starting from the root (/). With time, it will become second nature to you, transforming you into a skilled Linux administrator.
18 Most-used Linux Commands You Should Know
Linux commands are instructions for interacting with the operating system. They help manage files, directories, system processes, and many other aspects of the system. You need to become familiar with these commands in order to navigate and maintain Linux-based systems efficiently and effectively.
This diagram below shows popular Linux commands:
- ls - List files and directories
- cd - Change the current directory
- mkdir - Create a new directory
- rm - Remove files or directories
- cp - Copy files or directories
- mv - Move or rename files or directories
- chmod - Change file or directory permissions
- grep - Search for a pattern in files
- find - Search for files and directories
- tar - Manipulate tarball archive files
- vi - Edit files in a terminal text editor
- cat - Display the content of files
- top - Display processes and resource usage
- ps - Display processes information
- kill - Terminate a process by sending a signal
- du - Estimate file space usage
- ifconfig - Configure network interfaces
- ping - Test network connectivity between hosts
Security
How does HTTPS work?
Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). HTTPS transmits encrypted data using Transport Layer Security (TLS). If the data is intercepted in transit, all the interceptor gets is unreadable ciphertext.
How is the data encrypted and decrypted?
Step 1 - The client (browser) and the server establish a TCP connection.
Step 2 - The client sends a "client hello" to the server. The message contains the set of encryption algorithms (cipher suites) the client supports and the latest TLS version it can support. The server responds with a "server hello" so the browser knows which of the algorithms and which TLS version the server will use.
The server then sends the SSL certificate to the client. The certificate contains the public key, host name, expiry dates, etc. The client validates the certificate.
Step 3 - After validating the SSL certificate, the client generates a session key and encrypts it using the public key. The server receives the encrypted session key and decrypts it with the private key.
Step 4 - Now that both the client and the server hold the same session key (symmetric encryption), the encrypted data is transmitted in a secure bi-directional channel.
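On the client side, most of these steps are handled by a TLS library. As a small illustration, Python's standard `ssl` module can show what a default client offers and checks before any connection is made. The helper name `describe_client_tls_defaults` is hypothetical, and the exact cipher list depends on the local OpenSSL build.

```python
import ssl


def describe_client_tls_defaults() -> dict:
    """Inspect what a default TLS client context will offer and verify."""
    context = ssl.create_default_context()
    return {
        # Step 2: the client validates the server certificate...
        "verifies_certificate": context.verify_mode == ssl.CERT_REQUIRED,
        # ...including that it was issued for the requested host name.
        "checks_host_name": context.check_hostname,
        # Oldest protocol version this client is willing to negotiate.
        "minimum_version": context.minimum_version.name,
        # Cipher suites the client can offer in its "client hello".
        "offered_cipher_suites": [c["name"] for c in context.get_ciphers()],
    }


defaults = describe_client_tls_defaults()
```

Passing such a context to `context.wrap_socket(sock, server_hostname=...)` performs the handshake described above; no network call is made in this sketch.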
Why does HTTPS switch to symmetric encryption during data transmission? There are two main reasons:
-
Security: The asymmetric encryption goes only one way. This means that if the server tries to send the encrypted data back to the client, anyone can decrypt the data using the public key.
-
Server resources: The asymmetric encryption adds quite a lot of mathematical overhead. It is not suitable for data transmissions in long sessions.
OAuth 2.0 Explained With Simple Terms.
OAuth 2.0 is a powerful and secure framework that allows different applications to securely interact with each other on behalf of users without sharing sensitive credentials.
The entities involved in OAuth are the User, the Server, and the Identity Provider (IDP).
What Can an OAuth Token Do?
When you use OAuth, you get an OAuth token that represents your identity and permissions. This token can do a few important things:
Single Sign-On (SSO): With an OAuth token, you can log into multiple services or apps using just one login, making life easier and safer.
Authorization Across Systems: The OAuth token allows you to share your authorization or access rights across various systems, so you don't have to log in separately everywhere.
Accessing User Profile: Apps with an OAuth token can access certain parts of your user profile that you allow, but they won't see everything.
Remember, OAuth 2.0 is all about keeping you and your data safe while making your online experiences seamless and hassle-free across different applications and services.
Top 4 Forms of Authentication Mechanisms
-
SSH Keys:
Cryptographic keys are used to access remote systems and servers securely
-
OAuth Tokens:
Tokens that provide limited access to user data on third-party applications
-
SSL Certificates:
Digital certificates ensure secure and encrypted communication between servers and clients
-
Credentials:
User authentication information is used to verify and grant access to various systems and services
Session, cookie, JWT, token, SSO, and OAuth 2.0 - what are they?
These terms are all related to user identity management. When you log into a website, you declare who you are (identification). Your identity is verified (authentication), and you are granted the necessary permissions (authorization). Many solutions have been proposed in the past, and the list keeps growing.
From simple to complex, here is my understanding of user identity management:
-
WWW-Authenticate is the most basic method. You are asked for the username and password by the browser. As a result of the inability to control the login life cycle, it is seldom used today.
-
A finer control over the login life cycle is session-cookie. The server maintains session storage, and the browser keeps the ID of the session. A cookie usually only works with browsers and is not mobile app friendly.
-
To address the compatibility issue, the token can be used. The client sends the token to the server, and the server validates the token. The downside is that the token needs to be encrypted and decrypted, which may be time-consuming.
-
JWT is a standard way of representing tokens. This information can be verified and trusted because it is digitally signed. Since JWT contains the signature, there is no need to save session information on the server side.
-
By using SSO (single sign-on), you can sign on only once and log in to multiple websites. It uses CAS (central authentication service) to maintain cross-site information.
-
By using OAuth 2.0, you can authorize one website to access your information on another website.
How to store passwords safely in the database and how to validate a password?
Things NOT to do
-
Storing passwords in plain text is not a good idea because anyone with internal access can see them.
-
Storing password hashes directly is not sufficient because it is prone to precomputation attacks, such as rainbow tables.
-
To mitigate precomputation attacks, we salt the passwords.
What is salt?
According to OWASP guidelines, "a salt is a unique, randomly generated string that is added to each password as part of the hashing process".
How to store a password and salt?
- A salt is not meant to be secret and can be stored in plain text in the database. It ensures that the hash result is unique to each password.
- The password can be stored in the database using the following format: hash(password + salt).
How to validate a password?
To validate a password, it can go through the following process:
- A client enters the password.
- The system fetches the corresponding salt from the database.
- The system appends the salt to the password and hashes it. Let's call the hashed value H1.
- The system compares H1 and H2, where H2 is the hash stored in the database. If they are the same, the password is valid.
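The store-and-validate flow above can be sketched with Python's standard library. Note one deliberate swap: instead of a single hash(password + salt), this sketch uses PBKDF2-HMAC-SHA256, a deliberately slow salted hash, which is a common choice for password storage. The function names and the iteration count are illustrative assumptions, not recommendations from the source.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # illustrative value; tune for your hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); both are stored in the database."""
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute H1 with the stored salt and compare it with H2."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored_digest)


salt, stored = hash_password("correct horse battery staple")
```

Because each password gets its own salt, two users with the same password end up with different stored digests, which defeats precomputed tables.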
Explaining JSON Web Token (JWT) to a 10 year old Kid
Imagine you have a special box called a JWT. Inside this box, there are three parts: a header, a payload, and a signature.
The header is like the label on the outside of the box. It tells us what type of box it is and how it's secured. It's usually written in a format called JSON, which is just a way to organize information using curly braces { } and colons : .
The payload is like the actual message or information you want to send. It could be your name, age, or any other data you want to share. It's also written in JSON format, so it's easy to understand and work with.
Now, the signature is what makes the JWT secure. It's like a special seal that only the sender knows how to create. The signature is created using a secret code, kind of like a password. This signature ensures that nobody can tamper with the contents of the JWT without the sender knowing about it.
When you want to send the JWT to a server, you put the header, payload, and signature inside the box. Then you send it over to the server. The server can easily read the header and payload to understand who you are and what you want to do.
How does Google Authenticator (or other types of 2-factor authenticators) work?
Google Authenticator is commonly used for logging into our accounts when 2-factor authentication is enabled. How does it guarantee security?
Google Authenticator is a software-based authenticator that implements a two-step verification service. The diagram below provides detail.
There are two stages involved:
- Stage 1 - The user enables Google two-step verification.
- Stage 2 - The user uses the authenticator for logging in, etc.
Let's look at these stages.
Stage 1
Steps 1 and 2: Bob opens the web page to enable two-step verification. The front end requests a secret key. The authentication service generates the secret key for Bob and stores it in the database.
Step 3: The authentication service returns a URI to the front end. The URI is composed of a key issuer, username, and secret key. The URI is displayed in the form of a QR code on the web page.
Step 4: Bob then uses Google Authenticator to scan the generated QR code. The secret key is stored in the authenticator.
Stage 2
Steps 1 and 2: Bob wants to log into a website with Google two-step verification. For this, he needs the password. Every 30 seconds, Google Authenticator generates a 6-digit password using the TOTP (Time-based One-Time Password) algorithm. Bob uses the password to enter the website.
Steps 3 and 4: The frontend sends the password Bob enters to the backend for authentication. The authentication service reads the secret key from the database and generates a 6-digit password using the same TOTP algorithm as the client.
Step 5: The authentication service compares the two passwords generated by the client and the server, and returns the comparison result to the frontend. Bob can proceed with the login process only if the two passwords match.
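The 6-digit password in Stage 2 can be reproduced with a short TOTP sketch following RFC 6238 (HMAC-SHA1, 30-second time step). The function names are hypothetical, and the secret used below is the published RFC test key, not a real credential. Both the authenticator app and the server run the same computation over the shared secret key.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, unix_time=None, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password from a shared secret."""
    if unix_time is None:
        unix_time = time.time()
    counter = int(unix_time) // step            # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                     # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, unix_time=None) -> bool:
    """Server side: recompute the password and compare in constant time."""
    expected = totp(secret, unix_time)
    return hmac.compare_digest(expected.encode("utf-8"),
                               submitted.encode("utf-8"))


rfc_secret = b"12345678901234567890"  # test key from RFC 6238
code_at_59s = totp(rfc_secret, unix_time=59)
```

A production verifier would typically also accept the adjacent time step to tolerate clock drift, and would rate-limit attempts.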
Is this authentication mechanism safe?
-
Can the secret key be obtained by others?
We need to make sure the secret key is transmitted using HTTPS. The authenticator client and the database store the secret key, and we need to make sure the secret keys are encrypted.
-
Can the 6-digit password be guessed by hackers?
No. The password has 6 digits, so there are 1 million possible combinations, and the password changes every 30 seconds. To try every combination within one 30-second window, hackers would need to enter about 33,000 combinations per second.
Real World Case Studies
Netflix's Tech Stack
This post is based on research from many Netflix engineering blogs and open-source projects. If you come across any inaccuracies, please feel free to inform us.
Mobile and web: Netflix has adopted Swift and Kotlin to build native mobile apps. For its web application, it uses React.
Frontend/server communication: Netflix uses GraphQL.
Backend services: Netflix relies on Zuul, Eureka, the Spring Boot framework, and other technologies.
Databases: Netflix utilizes EVCache, Cassandra, CockroachDB, and other databases.
Messaging/streaming: Netflix employs Apache Kafka and Flink for messaging and streaming purposes.
Video storage: Netflix uses S3 and Open Connect for video storage.
Data processing: Netflix utilizes Flink and Spark for data processing, which is then visualized using Tableau. Redshift is used for processing structured data warehouse information.
CI/CD: Netflix employs various tools such as JIRA, Confluence, PagerDuty, Jenkins, Gradle, Chaos Monkey, Spinnaker, Atlas, and more for CI/CD processes.
Twitter Architecture 2022
Yes, this is the real Twitter architecture. It is posted by Elon Musk and redrawn by us for better readability.
Evolution of Airbnb's microservice architecture over the past 15 years
Airbnb's microservice architecture went through 3 main stages.
Monolith (2008 - 2017)
Airbnb began as a simple marketplace for hosts and guests. It was built as a Ruby on Rails application - the monolith.
What's the challenge?
- Confusing team ownership + unowned code
- Slow deployment
Microservices (2017 - 2020)
Microservice aims to solve those challenges. In the microservice architecture, key services include:
- Data fetching service
- Business logic data service
- Write workflow service
- UI aggregation service
- Each service had one owning team
What's the challenge?
Hundreds of services and dependencies were difficult for humans to manage.
Micro + macroservices (2020 - present)
This is what Airbnb is working on now. The micro and macroservice hybrid model focuses on the unification of APIs.
Monorepo vs. Microrepo.
Which is the best? Why do different companies choose different options?
Monorepo isn't new; Linux and Windows were both created using Monorepo. To improve scalability and build speed, Google developed an internal dedicated toolchain to scale its Monorepo, along with strict coding quality standards to keep it consistent.
Amazon and Netflix are major ambassadors of the Microservice philosophy. This approach naturally separates the service code into separate repositories. It scales faster but can lead to governance pain points later on.
Within Monorepo, each service is a folder, and every folder has a BUILD config and OWNERS permission control. Every service member is responsible for their own folder.
On the other hand, in Microrepo, each service is responsible for its repository, with the build config and permissions typically set for the entire repository.
In Monorepo, dependencies are shared across the entire codebase regardless of your business, so when there's a version upgrade, every codebase upgrades their version.
In Microrepo, dependencies are controlled within each repository. Businesses choose when to upgrade their versions based on their own schedules.
Monorepo has a standard for check-ins. Google's code review process is famously known for setting a high bar, ensuring a coherent quality standard for Monorepo, regardless of the business.
Microrepo can either set its own standard or adopt a shared standard by incorporating best practices. It can scale faster for the business, but code quality may vary between repositories.
For Monorepo tooling, Google engineers built Bazel, and Meta built Buck. There are other open-source tools available, including Nx, Lerna, and others.
Over the years, Microrepo has had more supported tools, including Maven and Gradle for Java, NPM for NodeJS, and CMake for C/C++, among others.
How will you design the Stack Overflow website?
If your answer is on-premise servers and monolith (on the bottom of the following image), you would likely fail the interview, but that's how it is built in reality!
What people think it should look like
The interviewer is probably expecting something like the top portion of the picture.
- Microservice is used to decompose the system into small components.
- Each service has its own database. Use cache heavily.
- The service is sharded.
- The services talk to each other asynchronously through message queues.
- The service is implemented using Event Sourcing with CQRS.
- Showing off knowledge in distributed systems such as eventual consistency, CAP theorem, etc.
What it actually is
Stack Overflow serves all the traffic with only 9 on-premise web servers, and it runs as a monolith! It has its own servers and does not run on the cloud.
This is contrary to all our popular beliefs these days.
Why did Amazon Prime Video monitoring move from serverless to monolithic? How can it save 90% cost?
The diagram below shows the architecture comparison before and after the migration.
What is Amazon Prime Video Monitoring Service?
Prime Video service needs to monitor the quality of thousands of live streams. The monitoring tool automatically analyzes the streams in real time and identifies quality issues like block corruption, video freeze, and sync problems. This is an important process for customer satisfaction.
There are 3 steps: media converter, defect detector, and real-time notification.
-
What is the problem with the old architecture?
The old architecture was based on AWS Lambda, which was good for building services quickly. However, it was not cost-effective when running the architecture at a high scale. The two most expensive operations are:
-
The orchestration workflow - AWS Step Functions charges users by state transitions, and the orchestration performs multiple state transitions every second.
-
Data passing between distributed components - the intermediate data is stored in Amazon S3 so that the next stage can download. The download can be costly when the volume is high.
-
Monolithic architecture saves 90% cost
A monolithic architecture is designed to address the cost issues. There are still 3 components, but the media converter and defect detector are deployed in the same process, saving the cost of passing data over the network. Surprisingly, this approach to deployment architecture change led to 90% cost savings!
This is an interesting and unique case study because microservices have become a go-to and fashionable choice in the tech industry. It's good to see that we are having more discussions about evolving the architecture and having more honest discussions about its pros and cons. Decomposing components into distributed microservices comes with a cost.
-
What did Amazon leaders say about this?
Amazon CTO Werner Vogels: "Building evolvable software systems is a strategy, not a religion. And revisiting your architecture with an open mind is a must."
Ex Amazon VP Sustainability Adrian Cockcroft: "The Prime Video team had followed a path I call Serverless First... I don't advocate Serverless Only".
How does Disney Hotstar capture 5 Billion Emojis during a tournament?
-
Clients send emojis through standard HTTP requests. You can think of Golang Service as a typical Web Server. Golang is chosen because it supports concurrency well: goroutines, Go's lightweight threads, are cheap to create.
-
Since the write volume is very high, Kafka (message queue) is used as a buffer.
-
Emoji data is aggregated by a stream processing service, Spark. It aggregates data every 2 seconds, and the interval is configurable. There is a trade-off to be made based on the interval. A shorter interval means emojis are delivered to other clients faster, but it also means more computing resources are needed.
-
Aggregated data is written to another Kafka.
-
The PubSub consumers pull aggregated emoji data from Kafka.
-
Emojis are delivered to other clients in real-time through the PubSub infrastructure. The PubSub infrastructure is interesting. Hotstar considered the following protocols: Socketio, NATS, MQTT, and gRPC, and settled with MQTT.
A similar design is adopted by LinkedIn which streams a million likes/sec.
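The aggregation step can be illustrated with a fixed-window counter. This is a plain-Python sketch of the idea, not Spark code and not Hotstar's implementation: `aggregate_emojis`, the event format, and the sample data are assumptions for the example.

```python
from collections import Counter, defaultdict


def aggregate_emojis(events, window_seconds=2):
    """Count emojis per fixed time window.

    events: iterable of (timestamp_in_seconds, emoji_name) pairs.
    Returns {window_start: {emoji_name: count}} ordered by window start.
    """
    windows = defaultdict(Counter)
    for timestamp, emoji in events:
        # Bucket each event by the start of the window it falls into.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][emoji] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


events = [
    (0.1, "clap"), (1.0, "fire"), (1.9, "clap"),
    (2.0, "clap"), (3.5, "fire"), (3.9, "fire"),
]
summary = aggregate_emojis(events)
```

Six raw events collapse into two small summaries, one per 2-second window. Shrinking `window_seconds` delivers results sooner at the cost of producing more, smaller batches.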
How Discord Stores Trillions Of Messages
The diagram below shows the evolution of message storage at Discord:
MongoDB -> Cassandra -> ScyllaDB
In 2015, the first version of Discord was built on top of a single MongoDB replica. Around Nov 2015, MongoDB stored 100 million messages and the RAM couldn't hold the data and index any longer. The latency became unpredictable. Message storage needed to be moved to another database. Cassandra was chosen.
In 2017, Discord had 12 Cassandra nodes and stored billions of messages.
At the beginning of 2022, it had 177 nodes with trillions of messages. At this point, latency was unpredictable, and maintenance operations became too expensive to run.
There are several reasons for the issue:
- Cassandra uses the LSM tree for the internal data structure. The reads are more expensive than the writes. There can be many concurrent reads on a server with hundreds of users, resulting in hotspots.
- Maintaining clusters, such as compacting SSTables, impacts performance.
- Garbage collection pauses would cause significant latency spikes
ScyllaDB is a Cassandra-compatible database written in C++. Discord redesigned its architecture to have a monolithic API, a data service written in Rust, and ScyllaDB-based storage.
The p99 read latency in ScyllaDB is 15ms compared to 40-125ms in Cassandra. The p99 write latency is 5ms compared to 5-70ms in Cassandra.
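To see why reads in an LSM tree can cost more than writes, here is a toy model. It is a deliberately simplified sketch, not Cassandra's or ScyllaDB's storage engine: the class `ToyLSM` and its methods are hypothetical, and real engines add bloom filters, indexes, and on-disk formats.

```python
class ToyLSM:
    """Toy LSM tree: writes go to an in-memory table, which is flushed to immutable tables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each is treated as immutable
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write touches only the memtable, so it stays cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(dict(self.memtable))  # flush
            self.memtable = {}

    def get(self, key):
        """Return (value, tables_checked); a read may have to visit several tables."""
        checked = 1
        if key in self.memtable:
            return self.memtable[key], checked
        for table in reversed(self.sstables):  # newest first
            checked += 1
            if key in table:
                return table[key], checked
        return None, checked

    def compact(self):
        """Merge all flushed tables into one; newer values win."""
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [merged] if merged else []


db = ToyLSM(memtable_limit=2)
for i in range(6):
    db.put(f"msg{i}", i)
print(db.get("msg0"))  # the oldest key is found only after checking every table
```

Compaction shortens the read path again, which is why it is necessary maintenance even though, as noted above, it competes with live traffic.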
How does live video streaming work on YouTube, TikTok Live, or Twitch?
Live streaming differs from regular streaming because the video content is sent via the internet in real-time, usually with a latency of just a few seconds.
The diagram below explains what happens behind the scenes to make this possible.
Step 1: The raw video data is captured by a microphone and camera. The data is sent to the server side.
Step 2: The video data is compressed and encoded. For example, the compressing algorithm separates the background and other video elements. After compression, the video is encoded to standards such as H.264. The size of the video data is much smaller after this step.
Step 3: The encoded data is divided into smaller segments, usually seconds in length, so it takes much less time to download or stream.
Step 4: The segmented data is sent to the streaming server. The streaming server needs to support different devices and network conditions. This is called "Adaptive Bitrate Streaming." This means we need to produce multiple files at different bitrates in steps 2 and 3.
Step 5: The live streaming data is pushed to edge servers supported by a CDN (Content Delivery Network). Millions of viewers can watch the video from an edge server nearby. The CDN significantly lowers data transmission latency.
Step 6: The viewers' devices decode and decompress the video data and play the video in a video player.
Steps 7 and 8: If the video needs to be stored for replay, the encoded data is sent to a storage server, and viewers can request a replay from it later.
Standard protocols for live streaming include:
- RTMP (Real-Time Messaging Protocol): This was originally developed by Macromedia to transmit data between a Flash player and a server. Now it is used for streaming video data over the internet. Note that video conferencing applications like Skype use the RTC (Real-Time Communication) protocol for lower latency.
- HLS (HTTP Live Streaming): It requires the H.264 or H.265 encoding. Apple devices accept only HLS format.
- DASH (Dynamic Adaptive Streaming over HTTP): DASH does not support Apple devices.
- Both HLS and DASH support adaptive bitrate streaming.
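On the player side, adaptive bitrate streaming comes down to picking one of the pre-encoded renditions for each segment. The sketch below shows one simple selection rule. The bitrate ladder, the `safety` margin, and the function name `pick_rendition` are assumptions for illustration, not values from HLS, DASH, or any particular player.

```python
# Hypothetical bitrate ladder in kilobits per second.
RENDITIONS_KBPS = {"1080p": 5000, "720p": 2800, "480p": 1400, "240p": 400}


def pick_rendition(measured_kbps, renditions=RENDITIONS_KBPS, safety=0.8):
    """Pick the highest-bitrate rendition that fits within a fraction of measured bandwidth."""
    budget = measured_kbps * safety
    fitting = [(rate, name) for name, rate in renditions.items() if rate <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest bitrate rather than stalling.
        return min(renditions, key=renditions.get)
    return max(fitting)[1]


for bandwidth in (10000, 4000, 2000, 300):
    print(bandwidth, "kbps ->", pick_rendition(bandwidth))
```

A player would typically re-run this decision before each segment request, which is why step 3's short segments matter: they bound how long a poor choice lasts.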
License
This work is licensed under CC BY-NC-ND 4.0
Windows system utilities to maximize productivity
Microsoft PowerToys
How to use PowerToys | Downloads & Release notes | Contributing to PowerToys | What's Happening | Roadmap
About
Microsoft PowerToys is a set of utilities for power users to tune and streamline their Windows experience for greater productivity. For more info on PowerToys overviews and how to use the utilities, or any other tools and resources for Windows development environments, head over to learn.microsoft.com!
⭐ PowerToys Advent calendar ⭐
We will be highlighting a cool utility each day for 24 days in December! To follow along, check out these threads:
- https://bsky.app/profile/kaylacinnamon.bsky.social/post/3lcb7iljxck2o
- https://x.com/cinnamon_msft/status/1863284610773246257
Installing and running Microsoft PowerToys
Requirements
- Windows 11 or Windows 10 version 2004 (code name 20H1 / build number 19041) or newer.
- x64 or ARM64 processor
- Our installer will install the following items:
- Microsoft Edge WebView2 Runtime bootstrapper. This will install the latest version.
Via GitHub with EXE [Recommended]
Go to the Microsoft PowerToys GitHub releases page and click on Assets at the bottom to show the files available in the release. Please use the appropriate PowerToys installer that matches your machine's architecture and install scope. For most, it is x64 and per-user.
Description | Filename | sha256 hash |
---|---|---|
Per user - x64 | PowerToysUserSetup-0.87.0-x64.exe | A6549B8D78985CC995F091624D1A2B70907CAC8954334C1CAF61D26EBCF8A449 |
Per user - ARM64 | PowerToysUserSetup-0.87.0-arm64.exe | 3557D4F35AA52571334712A48F51D116F389FA8C43C6B27FE321A7525067E7AE |
Machine wide - x64 | PowerToysSetup-0.87.0-x64.exe | 600CDC7F9AC296AA8B554CA34A0C7EA2D9B1E7E8E41BD096840851B416E63A3C |
Machine wide - ARM64 | PowerToysSetup-0.87.0-arm64.exe | 387B5BF1BD923BDA215D7DF1D82A197AE12CD91A71A73267768E26757F7A5FE6 |
This is our preferred method.
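The sha256 column in the table above lets you check a downloaded installer before running it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches` are made up for this example and are not part of PowerToys.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a published hash, ignoring case and whitespace."""
    return sha256_of(path).lower() == expected_hex.strip().lower()
```

For example, `matches("PowerToysUserSetup-0.87.0-x64.exe", "A6549B8D78985CC995F091624D1A2B70907CAC8954334C1CAF61D26EBCF8A449")` should return `True` for an intact download of the per-user x64 installer, assuming the file is in the current directory.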
Via Microsoft Store
Install from the Microsoft Store's PowerToys page. You must be using the new Microsoft Store which is available for both Windows 11 and Windows 10.
Via WinGet
Download PowerToys from WinGet. Updating PowerToys via winget will respect current PowerToys installation scope. To install PowerToys, run the following command from the command line / PowerShell:
User scope installer [default]
winget install Microsoft.PowerToys -s winget
Machine-wide scope installer
winget install --scope machine Microsoft.PowerToys -s winget
Other install methods
There are community driven install methods such as Chocolatey and Scoop. If these are your preferred install solutions, you can find the install instructions there.
Third-Party Run Plugins
There is a collection of third-party plugins created by the community that aren't distributed with PowerToys.
Contributing
This project welcomes contributions of all types. Besides coding features / bug fixes, other ways to assist include spec writing, design, documentation, and finding bugs. We are excited to work with the power user community to build a set of tools for helping you get the most out of Windows.
We ask that before you start work on a feature that you would like to contribute, please read our Contributor's Guide. We would be happy to work with you to figure out the best approach, provide guidance and mentorship throughout feature development, and help avoid any wasted or duplicate effort.
Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you grant us the rights to use your contribution and that you have permission to do so.
For guidance on developing for PowerToys, please read the developer docs for a detailed breakdown. This includes how to set up your computer to compile.
What's Happening
PowerToys Roadmap
Our prioritized roadmap of features and utilities that the core team is focusing on.
0.87 - December 2024 Update
In this release, we focused on new features, stability, and improvements.
Highlights
- Advanced Paste has a new feature called "Advanced AI" that uses Semantic Kernel to allow setting up the orchestration of sequential clipboard transformations.
- Workspaces supports Progressive Web Applications.
- Workspaces has a new feature to move existing windows instead of creating new ones.
- Mouse Jump added new settings to allow customization of screens pop-up. Thanks @mikeclayton!
- New+ now works on Windows 10. Thanks @cgaarden!
- Quick Accent allows selecting the character sets that should appear on the UI. Thanks @Sirozha1337!
Advanced Paste
- Added a new optional feature allowing using AI to set up the orchestration of sequential clipboard transformations.
Awake
- Initialization, logging and tray icon setup improvements. Thanks @dend!
File Explorer add-ons
- Preview Pane extensions now use the PerMonitorV2 DPI mode to fix errors on different scales. Thanks @davidegiacometti!
Keyboard Manager
- Added labels to the IME On, IME Off keys. Thanks @kit494way!
- Fixed an issue that caused the Shift key to remain stuck if a numpad key was mapped to the Shift key.
Monaco Preview
- Added support for .ahk files to be shown as a plaintext file in Peek and File Explorer add-ons. Thanks @daverayment!
- Added support for .ion files to be shown as a plaintext file in Peek and File Explorer add-ons. Thanks @octastylos-pseudodipteros!
- Added support for syntax highlighting for .srt files in Peek and File Explorer add-ons. Thanks @PesBandi!
Mouse Jump
- Allow customizing the appearance of the UI of the Mouse Jump pop-up. Thanks @mikeclayton!
New+
- Added support for Windows 10. Thanks @cgaarden!
- Fixed an issue causing the renaming of new files to not trigger sometimes. Thanks @cgaarden!
- Updated the New+ icons. Thanks @niels9001!
Peek
- Peek now checks local capabilities to decide what image formats Image Previewer is able to support. Thanks @daverayment!
- Fixed an issue causing the Code Files Previewer to not load correctly under certain conditions. Thanks @daverayment!
- Refactored, improved and fixed logging when loading the user settings file. Thanks @daverayment!
PowerToys Run
- Added a scoring function for proper ordering of the WindowWalker plugin results. Thanks @andbartol!
- Added UUIDv7 support to the ValueGenerator plugin. Thanks @frederik-hoeft!
- The calculator plugin now allows scientific notation numbers with a lowercase 'e'. Thanks @PesBandi!
- Ported the UI from WPF-UI to .NET 9 WPF, to fix "Desktop composition is disabled" crashes.
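For readers unfamiliar with the UUIDv7 format mentioned above, the sketch below builds one by hand following the bit layout in RFC 9562 (48-bit Unix millisecond timestamp, version, 12 random bits, variant, 62 random bits). It is an illustration only, not the ValueGenerator plugin's implementation, and the function name `uuid7` is an assumption.

```python
import os
import time
import uuid


def uuid7(unix_ms=None):
    """Build a version 7 UUID: a millisecond timestamp followed by random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 of them used
    rand_a = rand & 0xFFF                          # 12 bits
    rand_b = (rand >> 12) & ((1 << 62) - 1)        # 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: random
    value |= 0b10 << 62                            # bits 63..62: RFC variant
    value |= rand_b                                # bits 61..0: random
    return uuid.UUID(int=value)


print(uuid7())
```

Because the timestamp occupies the most significant bits, IDs created in later milliseconds sort after earlier ones.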
Quick Accent
- Added a setting to allow selecting which character sets to show. Thanks @Sirozha1337!
Screen Ruler
- Added a setting to also allow showing measurements in inches, centimeters or millimeters. Thanks @Sophanatprime!
Settings
- Fixed an issue causing all the links to milestones in the "What's new?" OOBE page to point to the same milestone.
- Removed extra space from the Welcome page. Thanks @agarwalishita!
- Updated left navigation bar icons. Thanks @niels9001!
- Fixed accessibility issues in the dashboard page. Thanks @davidegiacometti!
Workspaces
- Added support for Progressive Web Applications to Workspaces.
- Implemented a feature to move existing windows instead of creating new ones.
- Fixed a crash when opening the workspaces editor that was caused by passing incorrect encoder parameters when saving Bitmap files.
- Workspaces editor position is now saved so that we can start it at the same position when we open it again.
- Fixed an issue causing many instances of the same application to be put in the same position instead of the intended position due to timer issues.
- Fixed detection of exact application version when many versions of the same application are installed.
Documentation
- Improved language in CONTRIBUTE.md. Thanks @sanskaarz!
- Added Bilibili plugin mention to thirdPartyRunPlugins.md. Thanks @Whuihuan!
- Added CanIUse and TailwindCSS plugins mention to thirdPartyRunPlugins.md. Thanks @skttl!
- Added HttpStatusCodes plugin mention to thirdPartyRunPlugins.md. Thanks @grzhan!
- Updated COMMUNITY.md with more contributors.
Development
- Upgraded to .NET 9. Thanks @snickler!
- Fixed building on Visual Studio 17.12.
- Upgraded the System.IO.Abstractions dependency to 21.0.29. Thanks @davidegiacometti!
- Upgraded the WindowsAppSDK dependency to 1.6.241114003. Thanks @shuaiyuanxx!
- Upgraded the MSTest dependency to 3.6.3. Thanks @Youssef1313!
- Upgraded the check-spelling CI dependency to 0.0.24 and fixed related spell checking issues. Thanks @jsoref!
- Removed duplicate names from the spellcheck allowed names file. Thanks @htcfreek!
- Improved logging of asynchronous methods call stacks when logging an error.
- Created a MSBuild props file to be imported by other projects to enable AOT support.
- Made the Peek utility source code AOT compatible.
- Updated .editorconfig rules to relax squiggly IDE errors in Visual Studio 17.12. Thanks @snickler!
- Moved Xaml.Styler from the root to the src folder.
What is being planned for version 0.88
For v0.88, we'll work on the items below:
- Stability / bug fixes
- New module: File Actions Menu
- Integrate Sysinternals ZoomIt
PowerToys Community
The PowerToys team is extremely grateful to have the support of an amazing active community. The work you do is incredibly important. PowerToys wouldn't be nearly what it is today without your help filing bugs, updating documentation, guiding the design, or writing features. We want to say thank you and take time to recognize your work. Month by month, you directly help make PowerToys a better piece of software.
Code of Conduct
This project has adopted the Microsoft Open Source Code of Conduct.
Privacy Statement
The application logs basic diagnostic data (telemetry). For more information on privacy and what we collect, see our PowerToys Data and Privacy documentation.
Fullstack app framework for web, desktop, mobile, and more.
✨ Dioxus 0.6 is released - check it out here! ✨
Build for web, desktop, and mobile, and more with a single codebase. Zero-config setup, integrated hot-reloading, and signals-based state management. Add backend functionality with Server Functions and bundle with our CLI.
use dioxus::prelude::*;

fn app() -> Element {
    // A signal holds reactive state; updating it re-renders the component.
    let mut count = use_signal(|| 0);

    rsx! {
        h1 { "High-Five counter: {count}" }
        button { onclick: move |_| count += 1, "Up high!" }
        button { onclick: move |_| count -= 1, "Down low!" }
    }
}
⭐️ Unique features:
- Cross-platform apps in three lines of code (web, desktop, mobile, server, and more)
- Ergonomic state management combines the best of React, Solid, and Svelte
- Type-safe Routing and server functions to leverage Rust's powerful compile-time guarantees
- Integrated bundler for deploying to the web, macOS, Linux, and Windows
- And more! Take a tour of Dioxus.
Instant hot-reloading
Run one command, dx serve, and your app is running. Edit your markup and styles and see the results in real time.
First-class Android and iOS support
Dioxus is the fastest way to build native mobile apps with Rust. Simply run dx serve --platform android and your app is running in an emulator or on device in seconds. Call directly into JNI and Native APIs.
Bundle for web, desktop, and mobile
Simply run dx bundle and your app will be built and bundled with maximum optimization. On the web, take advantage of .avif generation, .wasm compression, minification, and more. Build web apps weighing less than 50kb and desktop/mobile apps less than 5mb.
Fantastic documentation
We've put a ton of effort into building clean, readable, and comprehensive documentation. All html elements and listeners are documented with MDN docs, and our Docs runs continuous integration with Dioxus itself to ensure that the docs are always up to date. Check out the Dioxus website for guides, references, recipes, and more. Fun fact: we use the Dioxus website as a testbed for new Dioxus features - check it out!
Community
Dioxus is a community-driven project, with a very active Discord and GitHub community. We're always looking for help, and we're happy to answer questions and help you get started. Our SDK is community-run and we even have a GitHub organization for the best Dioxus crates that receive free upgrades and support.
Full-time core team
Dioxus has grown from a side project to a small team of full-time engineers. Thanks to the generous support of FutureWei, Satellite.im, and the GitHub Accelerator program, we're able to work on Dioxus full-time. Our long-term goal is for Dioxus to become self-sustaining by providing paid high-quality enterprise tools. If your company is interested in adopting Dioxus and would like to work with us, please reach out!
Supported Platforms
- Web
- Desktop
- Mobile
- Server-side Rendering
Running the examples
The examples in the main branch of this repository target the git version of dioxus and the CLI. If you are looking for examples that work with the latest stable release of dioxus, check out the 0.6 branch.
The examples in the top level of this repository can be run with:
cargo run --example <example>
However, we encourage you to download the dioxus-cli. If you are running the git version of dioxus, you can install the matching version of the CLI with:
cargo install --git https://github.com/DioxusLabs/dioxus dioxus-cli --locked
With the CLI, you can also run examples with the web platform. You just need to disable the default desktop feature and enable the web feature with this command:
dx serve --example <example> --platform web -- --no-default-features
Dioxus vs other frameworks
We love all frameworks and enjoy watching innovation in the Rust ecosystem. In fact, many of our projects are shared with other frameworks. For example, our flex-box library Taffy is used by Bevy, Zed, Lapce, Iced, and many more.
Dioxus places an emphasis on a few key points that make it different from other frameworks:
- React-like: we rely on concepts like components, props, and hooks to build UIs, with our state management being closer to Svelte than to SolidJS.
- HTML and CSS: we lean completely into HTML and CSS, quirks and all.
- Renderer-agnostic: you can swap out the renderer for any platform you want thanks to our fast VirtualDOM.
- Collaborative: whenever possible, we spin out crates like Taffy, manganis, include_mdbook, and blitz so the ecosystem can grow together.
Dioxus vs Tauri
Tauri is a framework for building desktop and mobile apps where your frontend is written in a web-based framework like React, Vue, Svelte, etc. Whenever you need to do native work, you can write Rust functions and call them from your frontend.
- Natively Rust: Tauri's architecture limits your UI to either JavaScript or WebAssembly. With Dioxus, your Rust code is running natively on the user's machine, letting you do things like spawning threads, accessing the filesystem, without any IPC bridge. This drastically simplifies your app's architecture and makes it easier to build. You can build a Tauri app with Dioxus-Web as a frontend if you'd like.
- Different scopes: Tauri needs to support JavaScript and its complex build tooling, limiting the scope of what you can do with it. Since Dioxus is exclusively focused on Rust, we're able to provide extra utilities like Server Functions, advanced bundling, and a native renderer.
- Shared DNA: While Tauri and Dioxus are separate projects, they do share libraries like Tao and Wry: windowing and webview libraries maintained by the Tauri team.
Dioxus vs Leptos
Leptos is a library for building fullstack web-apps, similar to SolidJS and SolidStart. The two libraries share similar goals on the web, but have several key differences:
- Reactivity model: Leptos uses signals to drive both reactivity and rendering, while Dioxus uses signals just for reactivity. For managing re-renders, Dioxus uses a highly optimized VirtualDOM to support desktop and mobile architectures. Both Dioxus and Leptos are extremely fast.
- Different scopes: Dioxus provides renderers for web, desktop, mobile, LiveView, and more. We also maintain community libraries and a cross-platform SDK. Leptos has a tighter focus on the fullstack web with features that Dioxus doesn't have like islands, <Form /> components, and other web-specific utilities.
- Different DSLs: Dioxus uses its own custom Rust-like DSL for building UIs while Leptos uses an HTML-like syntax. We chose this to retain compatibility with IDE features like code-folding and syntax highlighting. Generally, Dioxus leans into more "magic" with its DSL including automatic formatting of strings and hot-reloading of simple Rust expressions.
// dioxus
rsx! {
    div {
        class: "my-class",
        enabled: true,
        "Hello, {name}"
    }
}
// leptos
view! {
    <div class="my-class" enabled={true}>
        "Hello "
        {name}
    </div>
}
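The reactivity-model point above (signals trigger a re-render, and a virtual tree diff decides what actually changes) can be sketched in a language-neutral way. The Python below is a toy model only: `Signal`, `View`, `render`, and `diff` are hypothetical names and do not correspond to Dioxus or Leptos APIs.

```python
class Signal:
    """Holds a value and notifies subscribers when it changes."""

    def __init__(self, value):
        self.value = value
        self.subscribers = []

    def set(self, value):
        self.value = value
        for notify in self.subscribers:
            notify()


def render(count):
    """Build a tiny 'virtual tree': a list of (tag, text) nodes."""
    return [
        ("h1", f"High-Five counter: {count.value}"),
        ("button", "Up high!"),
        ("button", "Down low!"),
    ]


def diff(old, new):
    """Return (index, new_node) pairs for the nodes that changed."""
    return [(i, n) for i, (o, n) in enumerate(zip(old, new)) if o != n]


class View:
    """Re-renders when the signal changes and records only the resulting patches."""

    def __init__(self, count):
        self.count = count
        self.tree = render(count)
        self.patches = []
        count.subscribers.append(self.rerender)

    def rerender(self):
        new_tree = render(self.count)
        self.patches.extend(diff(self.tree, new_tree))
        self.tree = new_tree


count = Signal(0)
view = View(count)
count.set(1)
print(view.patches)
```

Only the heading node appears in the patch list; the two buttons are re-rendered but produce no changes, which is the work a virtual tree diff saves the real renderer.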
Dioxus vs Yew
Yew is a framework for building single-page web apps and initially served as an inspiration for Dioxus. Unfortunately, the architecture of Yew didn't support the various features we wanted, and thus Dioxus was born.
- Single-page apps: Yew is designed exclusively for single-page web apps and is intrinsically tied to the web platform. Dioxus is fullstack and cross-platform, making it suitable for building web, desktop, mobile, and server apps.
- Developer Tooling: Dioxus provides a number of utilities like autoformatting, hot-reloading, and a bundler.
- Ongoing support: Dioxus is very actively maintained with new features and bug fixes being added on a daily basis.
Dioxus vs egui
egui is a cross-platform GUI library for Rust powering tools like Rerun.io.
- Immediate vs Retained: egui is designed to be re-rendered on every frame. This is suitable for games and other interactive applications, but it does not retain style and layout state between frames. Dioxus is a retained UI framework, meaning that the UI is built once and then modified between frames. This enables Dioxus to use native web technologies like HTML and CSS with better battery life and performance.
- Customizable: egui brings its own styling and layout solution while Dioxus expects you to use the built-in HTML and CSS. This enables Dioxus apps to use any CSS library like Tailwind or Material UI.
- State management: egui's state management is based on a single global state object. Dioxus encourages encapsulation of state by using components and props, making components more reusable.
Dioxus vs Iced
Iced is a cross-platform GUI library inspired by Elm. Iced renders natively with WGPU and supports the web using DOM nodes.
- Elm state management: Iced uses Elm's state management model, which is based on message passing and reducers. This is simply a different state management model than Dioxus and can be rather verbose at times.
- Native Feel: Since Dioxus uses a webview as its renderer, it automatically gets native text input, paste handling, and other native features like accessibility. Iced's renderer currently doesn't implement these features, making it feel less native.
- WGPU: Dioxus' WGPU renderer is currently quite immature and not yet ready for production use. Iced's WGPU renderer is much more mature and is being used in production. This enables certain types of apps that need GPU access to be built with Iced that can't currently be built with Dioxus.
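For readers who have not used the Elm model, the message-passing and reducer style mentioned above looks roughly like this. The sketch is plain Python, not Iced's API: `Message`, `Model`, `update`, and `view` are hypothetical names that mirror the usual Elm vocabulary.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Message(Enum):
    """Every possible user action is an explicit message."""
    INCREMENT = auto()
    DECREMENT = auto()


@dataclass(frozen=True)
class Model:
    """The whole application state, kept immutable."""
    count: int = 0


def update(model, message):
    """Reducer: take the old state and a message, return the new state."""
    if message is Message.INCREMENT:
        return Model(model.count + 1)
    if message is Message.DECREMENT:
        return Model(model.count - 1)
    return model


def view(model):
    """Describe the UI for a given state."""
    return f"High-Five counter: {model.count}"


model = Model()
for msg in (Message.INCREMENT, Message.INCREMENT, Message.DECREMENT):
    model = update(model, msg)
print(view(model))
```

The same counter in Dioxus mutates a signal directly inside the event handler, which is where the difference in verbosity comes from.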
Dioxus vs Electron
Dioxus and Electron are two entirely different projects with similar goals. Electron makes it possible for developers to build cross-platform desktop apps using web technologies like HTML, CSS, and JavaScript.
- Lightweight: Dioxus uses the system's native WebView - or optionally, a WGPU renderer - to render the UI. This makes a typical Dioxus app about 15mb on macOS in comparison to Electron's 100mb. Electron also ships an embedded Chromium instance which cannot share system resources with the host OS in the same way as Dioxus.
- Maturity: Electron is a mature project with a large community and a lot of tooling. Dioxus is still quite young in comparison to Electron. Expect to run into features like deep-linking that require extra work to implement.
Contributing
- Check out the website section on contributing.
- Report issues on our issue tracker.
- Join the discord and ask questions!
License
This project is licensed under either the MIT license or the Apache-2 License.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Dioxus by you, shall be licensed as MIT or Apache-2, without any additional terms or conditions.
Neovim config for the lazy
Install · Configure · Docs
LazyVim is a Neovim setup powered by 💤 lazy.nvim to make it easy to customize and extend your config. Rather than having to choose between starting from scratch or using a pre-made distro, LazyVim offers the best of both worlds - the flexibility to tweak your config as needed, along with the convenience of a pre-configured setup.
✨ Features
- 🔥 Transform your Neovim into a full-fledged IDE
- 💤 Easily customize and extend your config with lazy.nvim
- 🚀 Blazingly fast
- 🧹 Sane default settings for options, autocmds, and keymaps
- 📦 Comes with a wealth of plugins pre-configured and ready to use
⚡️ Requirements
- Neovim >= 0.9.0 (needs to be built with LuaJIT)
- Git >= 2.19.0 (for partial clones support)
- a Nerd Font (optional)
- a C compiler for nvim-treesitter. See here
🚀 Getting Started
You can find a starter template for LazyVim here
Try it with Docker
docker run -w /root -it --rm alpine:edge sh -uelic '
apk add git lazygit fzf curl neovim ripgrep alpine-sdk --update
git clone https://github.com/LazyVim/starter ~/.config/nvim
cd ~/.config/nvim
nvim
'
Install the LazyVim Starter
- Make a backup of your current Neovim files:
mv ~/.config/nvim ~/.config/nvim.bak
mv ~/.local/share/nvim ~/.local/share/nvim.bak
- Clone the starter:
git clone https://github.com/LazyVim/starter ~/.config/nvim
- Remove the .git folder, so you can add it to your own repo later:
rm -rf ~/.config/nvim/.git
- Start Neovim!
nvim
Refer to the comments in the files on how to customize LazyVim.
There's a great video created by @elijahmanor with a walkthrough to get started.
@dusty-phillips wrote a comprehensive book called LazyVim for Ambitious Developers available for free online.
📂 File Structure
The files under config will be automatically loaded at the appropriate time, so you don't need to require those files manually. LazyVim comes with a set of default config files that will be loaded before your own. See here
You can add your custom plugin specs under lua/plugins/. All files there will be automatically loaded by lazy.nvim
~/.config/nvim
├── lua
│   ├── config
│   │   ├── autocmds.lua
│   │   ├── keymaps.lua
│   │   ├── lazy.lua
│   │   └── options.lua
│   └── plugins
│       ├── spec1.lua
│       ├── **
│       └── spec2.lua
└── init.lua
⚙️ Configuration
Refer to the docs
Collection of awesome LLM apps with RAG using OpenAI, Anthropic, Gemini and opensource models.
Awesome LLM Apps
A curated collection of awesome LLM apps built with RAG and AI agents. This repository features LLM apps that use models from OpenAI, Anthropic, Google, and even open-source models like LLaMA that you can run locally on your computer.
Why Awesome LLM Apps?
- Discover practical and creative ways LLMs can be applied across different domains, from code repositories to email inboxes and more.
- Explore apps that combine LLMs from OpenAI, Anthropic, Gemini, and open-source alternatives with RAG and AI Agents.
- Learn from well-documented projects and contribute to the growing open-source ecosystem of LLM-powered applications.
Featured AI Projects
AI Agents
- AI Customer Support Agent
- AI Investment Agent
- AI Legal Agent Team
- AI Services Agency
- AI Health & Fitness Planner Agent
- AI Startup Trend Analysis Agent
- AI Journalist Agent
- AI Finance Agent Team
- AI Personal Finance Agent
- AI Travel Agent
- AI Movie Production Agent
- Multi-Agent AI Researcher
- AI Meeting Agent
- Local News Agent OpenAI Swarm
- AI Finance Agent with xAI Grok
- AI Reasoning Agent
- Multimodal AI Agent
RAG (Retrieval Augmented Generation)
- Autonomous RAG
- Agentic RAG
- Llama3.1 Local RAG
- RAG-as-a-Service
- Local RAG Agent
- RAG App with Hybrid Search
- Local RAG App with Hybrid Search
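As a rough picture of what the RAG projects listed above have in common, the sketch below retrieves the most relevant document for a question and places it in a prompt. It is a toy under stated assumptions: relevance is scored with word-count cosine similarity instead of embeddings, no LLM is called, and `retrieve`, `build_prompt`, and the sample documents are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Augment the question with retrieved context before it would be sent to a model."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base.
docs = [
    "Invoices are emailed on the first day of each month.",
    "Support is available by chat from 9am to 5pm.",
    "Refunds are processed within five business days.",
]
print(build_prompt("When are refunds processed?", docs))
```

The projects in this repository replace each piece with something stronger (embedding models, vector stores, hybrid search, and an actual LLM call), but the retrieve-then-generate shape stays the same.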
LLM Apps with Memory
- AI Arxiv Agent with Memory
- LLM App with Personalized Memory
- AI Travel Agent with Memory
- Local ChatGPT with Memory
Chat with X
- Chat with GitHub Repo
- Chat with Gmail
- Chat with PDF
- Chat with Research Papers
- Chat with Substack Newsletter
- Chat with YouTube Videos
LLM Finetuning
Advanced Tools and Frameworks
- Gemini Multimodal Chatbot
- Mixture of Agents
- MultiLLM Chat Playground
- LLM Router App
- Local ChatGPT Clone
- Web Scraping AI Agent
- Web Search AI Assistant
- Cursor AI Experiments
Getting Started
- Clone the repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
- Navigate to the desired project directory:
cd awesome-llm-apps/chat_with_gmail
- Install the required dependencies:
pip install -r requirements.txt
- Follow the project-specific instructions in each project's README.md file to set up and run the app.
Contributing to Open Source
Contributions are welcome! If you have any ideas, improvements, or new apps to add, please create a new GitHub Issue or submit a pull request. Make sure to follow the existing project structure and include a detailed README.md for each new app.
Thank You, Community, for the Support!
Don't miss out on future updates! Star the repo now and be the first to know about new and exciting LLM apps with RAG and AI Agents.
openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 275+ supported cars.
openpilot
openpilot is an operating system for robotics.
Currently, it upgrades the driver assistance system in 275+ supported cars.
Docs · Roadmap · Contribute · Community · Try it on a comma 3X
Quick start: bash <(curl -fsSL openpilot.comma.ai)
Using openpilot in a car
To use openpilot in a car, you need four things:
- Supported Device: a comma 3/3X, available at comma.ai/shop.
- Software: The setup procedure for the comma 3/3X allows users to enter a URL for custom software. Use the URL openpilot.comma.ai to install the release version.
- Supported Car: Ensure that you have one of the 275+ supported cars.
- Car Harness: You will also need a car harness to connect your comma 3/3X to your car.
We have detailed instructions for how to install the harness and device in a car. Note that it's possible to run openpilot on other hardware, although it's not plug-and-play.
Branches
branch | URL | description |
---|---|---|
release3 | openpilot.comma.ai | This is openpilot's release branch. |
release3-staging | openpilot-test.comma.ai | This is the staging branch for releases. Use it to get new releases slightly early. |
nightly | openpilot-nightly.comma.ai | This is the bleeding edge development branch. Do not expect this to be stable. |
nightly-dev | installer.comma.ai/commaai/nightly-dev | Same as nightly, but includes experimental development features for some cars. |
To start developing openpilot
openpilot is developed by comma and by users like you. We welcome both pull requests and issues on GitHub.
- Join the community Discord
- Check out the contributing docs
- Check out the openpilot tools
- Read about the development workflow
- Code documentation lives at https://docs.comma.ai
- Information about running openpilot lives on the community wiki
Want to get paid to work on openpilot? comma is hiring and offers lots of bounties for external contributors.
Safety and Testing
- openpilot observes ISO26262 guidelines, see SAFETY.md for more details.
- openpilot has software-in-the-loop tests that run on every commit.
- The code enforcing the safety model lives in panda and is written in C, see code rigor for more details.
- panda has software-in-the-loop safety tests.
- Internally, we have a hardware-in-the-loop Jenkins test suite that builds and unit tests the various processes.
- panda has additional hardware-in-the-loop tests.
- We run the latest openpilot in a testing closet containing 10 comma devices continuously replaying routes.
Licensing
openpilot is released under the MIT license. Some parts of the software are released under other licenses as specified.
Any user of this software shall indemnify and hold harmless Comma.ai, Inc. and its directors, officers, employees, agents, stockholders, affiliates, subcontractors and customers from and against all allegations, claims, actions, suits, demands, damages, liabilities, obligations, losses, settlements, judgments, costs and expenses (including without limitation attorneys' fees and costs) which arise out of, relate to or result from any use of this software by user.
THIS IS ALPHA QUALITY SOFTWARE FOR RESEARCH PURPOSES ONLY. THIS IS NOT A PRODUCT. YOU ARE RESPONSIBLE FOR COMPLYING WITH LOCAL LAWS AND REGULATIONS. NO WARRANTY EXPRESSED OR IMPLIED.
User Data and comma Account
By default, openpilot uploads the driving data to our servers. You can also access your data through comma connect. We use your data to train better models and improve openpilot for everyone.
openpilot is open source software: the user is free to disable data collection if they wish to do so.
openpilot logs the road-facing cameras, CAN, GPS, IMU, magnetometer, thermal sensors, crashes, and operating system logs. The driver-facing camera is only logged if you explicitly opt-in in settings. The microphone is not recorded.
By using openpilot, you agree to our Privacy Policy. You understand that use of this software or its related services will generate certain types of user data, which may be logged and stored at the sole discretion of comma. By accepting this agreement, you grant an irrevocable, perpetual, worldwide right to comma for the use of this data.
Limbo is a work-in-progress, in-process OLTP database management system, compatible with SQLite.
Limbo
Features
- In-process OLTP database engine library
- Asynchronous I/O support on Linux with io_uring
- SQLite compatibility (status)
- SQL dialect support
- File format support
- SQLite C API
- JavaScript/WebAssembly bindings (wip)
- Support for Linux, macOS, and Windows
Getting Started
CLI
Install limbo with:
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/penberg/limbo/releases/latest/download/limbo-installer.sh | sh
Then use the SQL shell to create and query a database:
$ limbo database.db
Limbo v0.0.6
Enter ".help" for usage hints.
limbo> CREATE TABLE users (id INT PRIMARY KEY, username TEXT);
limbo> INSERT INTO users VALUES (1, 'alice');
limbo> INSERT INTO users VALUES (2, 'bob');
limbo> SELECT * FROM users;
1|alice
2|bob
JavaScript (wip)
Installation:
npm i limbo-wasm
Example usage:
import { Database } from 'limbo-wasm';
const db = new Database('sqlite.db');
const stmt = db.prepare('SELECT * FROM users');
const users = stmt.all();
console.log(users);
Python (wip)
pip install pylimbo
Example usage:
import limbo
con = limbo.connect("sqlite.db")
cur = con.cursor()
res = cur.execute("SELECT * FROM users")
print(res.fetchone())
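The JavaScript and Python examples above query a users table in sqlite.db, so that file has to exist first. Below is a minimal sketch that creates it with Python's standard-library sqlite3 module. It assumes Limbo can open a database file written by SQLite, as the "File format support" feature suggests; since Limbo is a work in progress, this may not hold for every release. The function name create_users_db is made up for this illustration.

```python
import sqlite3


def create_users_db(path):
    """Create the users table used by the examples and insert two rows.

    Uses the standard-library sqlite3 module, not Limbo itself. Opening the
    resulting file with Limbo is an assumption based on its stated
    SQLite file format support.
    """
    con = sqlite3.connect(path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS users (id INT PRIMARY KEY, username TEXT)"
        )
        # INSERT OR REPLACE keeps the script safe to run more than once.
        con.executemany(
            "INSERT OR REPLACE INTO users VALUES (?, ?)",
            [(1, "alice"), (2, "bob")],
        )
        con.commit()
        return con.execute("SELECT * FROM users ORDER BY id").fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(create_users_db("sqlite.db"))
```

The schema and rows mirror the CLI session shown earlier, so the queries in the examples should return the same two users.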
Developing
Build and run limbo cli:
cargo run --package limbo --bin limbo database.db
Run tests:
cargo test
Test coverage report:
cargo tarpaulin -o html
Run benchmarks:
cargo bench
Run benchmarks and generate flamegraphs:
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
cargo bench --bench benchmark -- --profile-time=5
FAQ
How is Limbo different from libSQL?
Limbo is a research project to build a SQLite compatible in-process database in Rust with native async support. The libSQL project, on the other hand, is an open source, open contribution fork of SQLite, with focus on production features such as replication, backups, encryption, and so on. There is no hard dependency between the two projects. Of course, if Limbo becomes widely successful, we might consider merging with libSQL, but that is something that will be decided in the future.
Publications
- Pekka Enberg, Sasu Tarkoma, Jon Crowcroft, and Ashwin Rao (2024). Serverless Runtime / Database Co-Design With Asynchronous I/O. In EdgeSys '24. [PDF]
- Pekka Enberg, Sasu Tarkoma, and Ashwin Rao (2023). Towards Database and Serverless Runtime Co-Design. In CoNEXT-SW '23. [PDF] [Slides]
Contributing
We'd love to have you contribute to Limbo! Check out the contribution guide to get started.
License
This project is licensed under the MIT license.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Limbo by you, shall be licensed as MIT, without any additional terms or conditions.
TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatible with popular workflow platforms like Dify and Coze.
Gemini Multimodal Live API Extension with RTC
Try the Google Gemini Multimodal Live API with realtime vision and realtime screenshare detection capabilities. It is a ready-to-use extension, with tools like Weather Check and Web Search integrated into TEN Agent.
TEN Agent Usecases
Ready-to-use Extensions
TEN Agent Playground in Local Environment
Prerequisites
Category | Requirements |
---|---|
Keys | • Agora App ID and App Certificate (free minutes every month) • OpenAI API key • Deepgram ASR (free credits available with signup) • FishAudio TTS (free credits available with signup) |
Installation | • Docker / Docker Compose • Node.js (LTS) v18 |
Minimum System Requirements | • CPU >= 2 Core • RAM >= 4 GB |
macOS: Docker setting on Apple Silicon
For Apple Silicon Macs, uncheck "Use Rosetta for x86/amd64 emulation" in Docker settings. Note: This may result in slower build times on ARM, but performance will be normal when deployed to x64 servers.
Next step
1. Create .env file
cp ./.env.example ./.env
2. Setup Agora App ID and App Certificate in .env
AGORA_APP_ID=
AGORA_APP_CERTIFICATE=
3. Start agent development containers
docker compose up -d
4. Enter container
docker exec -it ten_agent_dev bash
5. Build agent
task use
6. Start the web server
task run
7. Edit playground settings
Open the playground at localhost:3000 to configure your agent.
- Select a graph type (e.g. Voice Agent, Realtime Agent)
- Choose a corresponding module
- Select an extension and configure its API key settings
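Before starting the containers in step 3, it can help to confirm that the two Agora values from step 2 are actually filled in. The sketch below is a hypothetical helper, not part of TEN Agent: it parses simple KEY=VALUE lines and reports which required keys are missing or empty. The function name missing_env_keys and the parsing rules are assumptions made for this illustration.

```python
from pathlib import Path


def missing_env_keys(env_text, required=("AGORA_APP_ID", "AGORA_APP_CERTIFICATE")):
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        missing = missing_env_keys(env_file.read_text())
        print("Missing values:", missing if missing else "none")
    else:
        print("No .env file found; run step 1 first.")
```

Run it from the repository root after editing .env; an empty result means both keys have a value, not that the values are valid.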
Running Gemini Realtime Extension
Open the playground at localhost:3000.
- Select voice_assistant_realtime graph
- Choose Gemini Realtime module
- Select v2v extension and enter Gemini API key
TEN Agent Components
Stay Tuned
Before we get started, be sure to star our repository and get instant notifications for all new releases!
Join Community
- Discord: Ideal for sharing your applications and engaging with the community.
- GitHub Discussion: Perfect for providing feedback and asking questions.
- GitHub Issues: Best for reporting bugs and proposing new features. Refer to our contribution guidelines for more details.
- X: Great for sharing your agents and interacting with the community.
Star History
Code Contributors
Contribution Guidelines
Contributions are welcome! Please read the contribution guidelines first.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
An open-source, cross-platform terminal for seamless workflows
Wave Terminal
Wave is an open-source terminal that combines traditional terminal features with graphical capabilities like file previews, web browsing, and AI assistance. It runs on macOS, Linux, and Windows.
Modern development involves constantly switching between terminals and browsers - checking documentation, previewing files, monitoring systems, and using AI tools. Wave brings these graphical tools directly into the terminal, letting you control them from the command line. This means you can stay in your terminal workflow while still having access to the visual interfaces you need.
Key Features
- Flexible drag & drop interface to organize terminal blocks, editors, web browsers, and AI assistants
- Built-in editor for seamlessly editing remote files with syntax highlighting and modern editor features
- Rich file preview system for remote files (markdown, images, video, PDFs, CSVs, directories)
- Integrated AI chat with support for multiple models (OpenAI, Claude, Azure, Perplexity, Ollama)
- Command Blocks for isolating and monitoring individual commands with auto-close options
- One-click remote connections with full terminal and file system access
- Rich customization including tab themes, terminal styles, and background images
- Powerful wsh command system for managing your workspace from the CLI and sharing data between terminal sessions
Installation
Wave Terminal works on macOS, Linux, and Windows.
Platform-specific installation instructions can be found here.
You can also install Wave Terminal directly from: www.waveterm.dev/download.
Minimum requirements
Wave Terminal runs on the following platforms:
- macOS 11 or later (arm64, x64)
- Windows 10 1809 or later (x64)
- Linux based on glibc-2.28 or later (Debian 10, RHEL 8, Ubuntu 20.04, etc.) (arm64, x64)
The WSH helper runs on the following platforms:
- macOS 11 or later (arm64, x64)
- Windows 10 or later (arm64, x64)
- Linux Kernel 2.6.32 or later (x64), Linux Kernel 3.1 or later (arm64)
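On Linux, the glibc requirement listed above can be checked before installing. The following is a small sketch using Python's standard platform.libc_ver(); the helper names parse_version and glibc_ok are made up for this illustration, and the detection may return an empty result on systems that do not use glibc.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.28' into a tuple of ints: (2, 28)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_ok(version, minimum="2.28"):
    """Return True if the given glibc version meets the minimum."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(f"glibc {version}: {'ok' if glibc_ok(version) else 'too old'}")
    else:
        print("glibc not detected on this system")
```

Comparing tuples of integers rather than strings avoids treating 2.4 as newer than 2.28.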
Links
- Homepage: https://www.waveterm.dev
- Download Page: https://www.waveterm.dev/download
- Documentation: https://docs.waveterm.dev
- Legacy Documentation: https://legacydocs.waveterm.dev
- Blog: https://blog.waveterm.dev
- X: https://x.com/wavetermdev
- Discord Community: https://discord.gg/XfvZ334gwU
Building from Source
Contributing
Wave uses GitHub Issues for issue tracking.
Find more information in our Contributions Guide.
Activity
License
Wave Terminal is licensed under the Apache-2.0 License. For more information on our dependencies, see here.
Open source Spotify client that doesn't require Premium nor uses Electron! Available for both desktop & mobile!
An open source, cross-platform Spotify client compatible across multiple platforms, utilizing Spotify's data API and YouTube, Piped.video or JioSaavn as an audio source, eliminating the need for Spotify Premium
Btw it's not just another Electron app
Features
- No ads, thanks to the use of public & free Spotify and YT Music APIs¹
- Freely downloadable tracks
- Cross-platform support
- Small size & less data usage
- Anonymous/guest login
- Time synced lyrics
- No telemetry, diagnostics or user data collection
- Native performance
- Open source/libre software
- Playback control is done locally, not on the server
¹ It is still recommended to support creators by engaging with their YouTube channels/Spotify tracks (or preferably by buying their merch/concert tickets/physical media).
Unsupported features
- Spotify Shows & Podcasts: Shows and Podcasts will never be supported because the audio tracks are only available on Spotify and accessing them would require Spotify Premium.
- Spotify Listen Along: Coming soon!
Installation guide
New versions usually release every 3-4 months.
This handy table lists all the methods you can use to install Spotube:
Platform | Package/Installation Method |
---|---|
Windows | |
macOS | |
Android | |
iOS | *iPA file only. Requires sideloading with AltStore or similar tools. |
Flatpak | |
AppImage | AppImage's lacking stability led to its temporary removal. More information at https://github.com/KRTirtho/spotube/issues/1082 |
Debian/Ubuntu | Then run: |
Arch/Manjaro | With pamac: With yay: |
Fedora/openSUSE | For Fedora: For openSUSE: |
Linux (tarball) | |
macOS - Homebrew | brew tap krtirtho/apps && brew install --cask spotube |
Windows - Chocolatey | |
Windows - Scoop | |
Windows - WinGet | |
Nightly Builds
Grab the latest nightly builds of Spotube from the GitHub Releases.
Building from source
You can compile Spotube's source code by following these instructions.
The Spotube team
- Kingkor Roy Tirtho - The Founder, Maintainer and Lead Developer
- RaptaG - The GitHub Moderator and Community Manager
- Owen Connor - The Cool Discord Moderator
- Meenbeese - The Android Developer
- Piotr Rogowski - The MacOS Developer
- Rusty Apple - The Mysterious Unknown Guy
License
Spotube is open source and licensed under the BSD-4-Clause License.
If you are concerned, you can read the reason for choosing this license.
Services/Package/Plugin Credits
Services
- Flutter - Flutter transforms the app development process. Build, test, and deploy beautiful mobile, web, desktop, and embedded apps from a single codebase
- Spotify API - The Spotify Web API is a RESTful API that provides access to Spotify data
- Piped - Piped is a privacy friendly alternative YouTube frontend, which is efficient and scalable by design.
- YouTube - YouTube is an American online video-sharing platform headquartered in San Bruno, California. Three former PayPal employeesโChad Hurley, Steve Chen, and Jawed Karimโcreated the service in February 2005
- JioSaavn - JioSaavn is an Indian online music streaming service and a digital distributor of Bollywood, English and other regional Indian music across the world. Since it was founded in 2007 as Saavn, the company has acquired rights to over 5 crore (50 million) music tracks in 15 languages
- SongLink - SongLink is a free smart link service that helps you share music with your audience. It's a one-stop-shop for creating smart links for music, podcasts, and other audio content
- LRCLib - A public synced lyric API
- Linux - Linux is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged in a Linux distribution
- AUR - AUR stands for Arch User Repository. It is a community-driven repository for Arch-based Linux distributions users
- Flatpak - Flatpak is a utility for software deployment and package management for Linux
- SponsorBlock - SponsorBlock is an open-source crowdsourced browser extension and open API for skipping sponsor segments in YouTube videos.
- Inno Setup - Inno Setup is a free installer for Windows programs by Jordan Russell and Martijn Laan
- F-Droid - F-Droid is an installable catalogue of FOSS (Free and Open Source Software) applications for the Android platform. The client makes it easy to browse, install, and keep track of updates on your device
- LastFM - Last.fm is a music streaming and discovery platform that helps users discover and share new music. It tracks users' music listening habits across many devices and platforms.
Dependencies
- app_links - Android App Links, Deep Links, iOS Universal Links and Custom URL schemes handler for Flutter (desktop included).
- args - Library for defining parsers for parsing raw command-line arguments into a set of options and values using GNU and POSIX style options.
- async - Utility functions and classes related to the 'dart:async' library.
- audio_service - Flutter plugin to play audio in the background while the screen is off.
- audio_service_mpris - audio_service platform interface supporting Media Player Remote Interfacing Specification.
- audio_session - Sets the iOS audio session category and Android audio attributes for your app, and manages your app's audio focus, mixing and ducking behaviour.
- auto_size_text - Flutter widget that automatically resizes text to fit perfectly within its bounds.
- bonsoir - A Zeroconf library that allows you to discover network services and to broadcast your own. Based on Apple Bonjour and Android NSD.
- buttons_tabbar - A Flutter package that implements a TabBar where each label is a toggle button.
- cached_network_image - Flutter library to load and cache network images. Can also be used with placeholder and error widgets.
- collection - Collections and utilities functions and classes related to collections.
- curved_navigation_bar - Stunning Animating Curved Shape Navigation Bar. Adjustable color, background color, animation curve, animation duration.
- device_info_plus - Flutter plugin providing detailed information about the device (make, model, etc.), and Android or iOS version the app is running on.
- dio - A powerful HTTP networking package, supports Interceptors, Aborting and canceling a request, Custom adapters, Transformers, etc.
- disable_battery_optimization - Flutter plugin to check and disable battery optimizations. Also shows custom steps to disable the optimizations in devices like mi, xiaomi, samsung, oppo, huawei, oneplus etc
- drift - Drift is a reactive library to store relational data in Dart and Flutter applications.
- duration - Utilities to make working with 'Duration's easier. Formats duration in human readable form and also parses duration in human readable form to Dart's Duration.
- encrypt - A set of high-level APIs over PointyCastle for two-way cryptography.
- envied - Explicitly reads environment variables into a dart file from a .env file for more security and faster start up times.
- file_picker - A package that allows you to use a native file explorer to pick single or multiple absolute file paths, with extension filtering support.
- file_selector - Flutter plugin for opening and saving files, or selecting directories, using native file selection UI.
- fluentui_system_icons - Fluent UI System Icons are a collection of familiar, friendly and modern icons from Microsoft.
- flutter_broadcasts - A plugin for sending and receiving broadcasts with Android intents and iOS notifications.
- flutter_cache_manager - Generic cache manager for flutter. Saves web files on the storages of the device and saves the cache info using sqflite.
- flutter_discord_rpc - Discord RPC support for Flutter desktop platforms
- flutter_displaymode - A Flutter plugin to set display mode (resolution, refresh rate) on Android platform. Allows to enable high refresh rate on supported devices.
- flutter_feather_icons - Feather is a collection of simply beautiful open source icons. Each icon is designed on a 24x24 grid with an emphasis on simplicity, consistency and usability.
- flutter_hooks - A flutter implementation of React hooks. It adds a new kind of widget with enhanced code reuse.
- flutter_inappwebview - A Flutter plugin that allows you to add an inline webview, to use a headless webview, and to open an in-app browser window.
- flutter_native_splash - Customize Flutter's default white native splash screen with background color and splash image. Supports dark mode, full screen, and more.
- flutter_riverpod - A reactive caching and data-binding framework. Riverpod makes working with asynchronous code a breeze.
- flutter_secure_storage - Flutter Secure Storage provides API to store data in secure storage. Keychain is used in iOS, KeyStore based solution is used in Android.
- flutter_sharing_intent - A Flutter plugin that allows Flutter apps to receive photos, videos, text, URLs or any other file types from another app.
- flutter_svg - An SVG rendering and widget library for Flutter, which allows painting and displaying Scalable Vector Graphics 1.1 files.
- form_validator - Simplest form validation library for flutter's form field widgets
- freezed_annotation - Annotations for the freezed code-generator. This package does nothing without freezed too.
- fuzzywuzzy - An implementation of the popular fuzzywuzzy package in Dart, to suit all your fuzzy string matching/searching needs!
- gap - Flutter widgets for easily adding gaps inside Flex widgets such as Columns and Rows or scrolling views.
- go_router - A declarative router for Flutter based on Navigation 2 supporting deep linking, data-driven routes and more
- google_fonts - A Flutter package to use fonts from fonts.google.com. Supports HTTP fetching, caching, and asset bundling.
- hive - Lightweight and blazing fast key-value database written in pure Dart. Strongly encrypted using AES-256.
- hive_flutter - Extension for Hive. Makes it easier to use Hive in Flutter apps.
- hooks_riverpod - A reactive caching and data-binding framework. Riverpod makes working with asynchronous code a breeze.
- html - APIs for parsing and manipulating HTML content outside the browser.
- html_unescape - A small library for un-escaping HTML. Supports all Named Character References, Decimal Character References and Hexadecimal Character References.
- http - A composable, multi-platform, Future-based API for HTTP requests.
- image_picker - Flutter plugin for selecting images from the Android and iOS image library, and taking new pictures with the camera.
- intl - Contains code to deal with internationalized/localized messages, date and number formatting and parsing, bi-directional text, and other internationalization issues.
- invidious - Invidious API client for Dart and Flutter.
- jiosaavn - Unofficial API client for jiosaavn.com
- json_annotation - Classes and helper functions that support JSON code generation via the json_serializable package.
- local_notifier - This plugin allows Flutter desktop apps to display local notifications.
- logger - Small, easy to use and extensible logger which prints beautiful logs.
- lrc - A Dart-only package that creates, parses, and handles LRC, which is a format that stores song lyrics.
- media_kit - A cross-platform video player & audio player for Flutter & Dart. Performant, stable, feature-proof & modular.
- media_kit_libs_audio - package:media_kit audio (only) playback native libraries for all platforms.
- metadata_god - Plugin for retrieving and writing audio tags/metadata from audio files
- mime - Utilities for handling media (MIME) types, including determining a type from a file extension and file contents.
- open_file - A plug-in that can call native APP to open files with string result in flutter, support iOS(UTI) / android(intent) / PC(ffi) / web(dart:html)
- package_info_plus - Flutter plugin for querying information about the application package, such as CFBundleVersion on iOS or versionCode on Android.
- palette_generator - Flutter package for generating palette colors from a source image.
- path - A string-based path manipulation library. All of the path operations you know and love, with solid support for Windows, POSIX (Linux and Mac OS X), and the web.
- path_provider - Flutter plugin for getting commonly used locations on host platform file systems, such as the temp and app data directories.
- permission_handler - Permission plugin for Flutter. This plugin provides a cross-platform (iOS, Android) API to request and check permissions.
- piped_client - API Client for piped.video
- popover - A popover is a transient view that appears above other content onscreen when you tap a control or in an area.
- riverpod - A reactive caching and data-binding framework. Riverpod makes working with asynchronous code a breeze.
- scroll_to_index - Scroll to a specific child of any scrollable widget in Flutter
- shared_preferences - Flutter plugin for reading and writing simple key-value pairs. Wraps NSUserDefaults on iOS and SharedPreferences on Android.
- shelf - A model for web server middleware that encourages composition and easy reuse.
- shelf_router - A convenient request router for the shelf web-framework, with support for URL-parameters, nested routers and routers generated from source annotations.
- shelf_web_socket - A shelf handler that wires up a listener for every connection.
- sidebarx - flutter multiplatform navigation sidebar / side navigationbar / drawer widget
- simple_icons - The Simple Icon pack available as Flutter Icons. Provides over 1500 Free SVG icons for popular brands.
- skeletonizer - Converts already built widgets into skeleton loaders with no extra effort.
- sliver_tools - A set of useful sliver tools that are missing from the flutter framework
- smtc_windows - Windows SystemMediaTransportControls implementation for Flutter giving access to Windows OS Media Control applet.
- spotify - An incomplete dart library for interfacing with the Spotify Web API.
- sqlite3 - Provides lightweight yet convenient bindings to SQLite by using dart:ffi
- sqlite3_flutter_libs - Flutter plugin to include native sqlite3 libraries with your app
- stroke_text - A Simple Flutter plugin for applying stroke (border) style to a text widget
- system_theme - A plugin to get the current system theme info. Supports Android, Web, Windows, Linux and macOS
- test - A full featured library for writing and running Dart tests across platforms.
- timezone - Time zone database and time zone aware DateTime.
- titlebar_buttons - A package which provides most of the titlebar buttons from windows, linux and macos.
- tray_manager - This plugin allows Flutter desktop apps to define a system tray.
- url_launcher - Flutter plugin for launching a URL. Supports web, phone, SMS, and email schemes.
- uuid - RFC4122 (v1, v4, v5, v6, v7, v8) UUID Generator and Parser for Dart
- version - Provides a simple class for parsing and comparing semantic versions as defined by http://semver.org/
- very_good_infinite_list - A library for easily displaying paginated data, created by Very Good Ventures. Great for activity feeds, news feeds, and more.
- visibility_detector - A widget that detects the visibility of its child and notifies a callback.
- web_socket_channel - StreamChannel wrappers for WebSockets. Provides a cross-platform WebSocketChannel API, a cross-platform implementation of that API that communicates over an underlying StreamChannel.
- wikipedia_api - Wikipedia API for dart and flutter
- win32_registry - A package that provides a friendly Dart API for accessing the Windows Registry.
- window_manager - This plugin allows Flutter desktop apps to resize and reposition the window.
- youtube_explode_dart - A port in dart of the youtube explode library. Supports several API functions without the need of Youtube API Key.
- build_runner - A build system for Dart code generation and modular compilation.
- crypto - Implementations of SHA, MD5, and HMAC cryptographic functions.
- envied_generator - Generator for the Envied package. See https://pub.dev/packages/envied.
- flutter_gen_runner - The Flutter code generator for your assets, fonts, colors, … Get rid of all String-based APIs.
- flutter_launcher_icons - A package which simplifies the task of updating your Flutter app's launcher icon.
- flutter_lints - Recommended lints for Flutter apps, packages, and plugins to encourage good coding practices.
- hive_generator - Extension for Hive. Automatically generates TypeAdapters to store any class.
- json_serializable - Automatically generate code for converting to and from JSON by annotating Dart classes.
- freezed - Code generation for immutable classes that has a simple syntax/API without compromising on the features.
- custom_lint - Lint rules are a powerful way to improve the maintainability of a project. Custom Lint allows package authors and developers to easily write custom lint rules.
- riverpod_lint - Riverpod_lint is a developer tool for users of Riverpod, designed to help stop common issues and simplify repetitive tasks.
- process_run - Process run helpers for Linux/Win/Mac and which like feature for finding executables.
- pubspec_parse - Simple package for parsing pubspec.yaml files with a type-safe API and rich error reporting.
- pub_api_client - An API Client for Pub to interact with public package information.
- xml - A lightweight library for parsing, traversing, querying, transforming and building XML documents.
- io - Utilities for the Dart VM Runtime including support for ANSI colors, file copying, and standard exit code values.
- drift_dev - Dev-dependency for users of drift. Contains the generator and development tools.
- desktop_webview_window - Show a webview window on your flutter desktop application.
- draggable_scrollbar - A scrollbar that can be dragged for quick navigation through a vertical list. An additional option shows a label next to the scroll thumb with information about the current item.
- scrobblenaut - A deadly simple LastFM API Wrapper for Dart. So deadly simple that it's gonna hit the mark.
© Copyright Spotube 2024
PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supporting Google/DeepL/Ollama/OpenAI and other services, with CLI/GUI/Docker
PDF scientific paper translation and bilingual comparison.
- Preserve formulas, charts, table of contents, and annotations (preview).
- Support multiple languages and diverse translation services.
- Provides a command-line tool, an interactive user interface, and Docker
Feel free to provide feedback in GitHub Issues, Telegram Group or QQ Group.
Updates
- [Dec. 19 2024] Non-PDF/A documents are now supported using -cp (by @reycn)
- [Dec. 13 2024] Additional support for backend by (by @YadominJinta)
- [Dec. 10 2024] The translator now supports OpenAI models on Azure (by @yidasanqian)
Preview
Online Service
You can try our application out using either of the following demos:
- Public free service online without installation (recommended).
- Demo hosted on HuggingFace
- Demo hosted on ModelScope without installation.
Note that the computing resources of the demo are limited, so please avoid abusing them.
Installation and Usage
Methods
For different use cases, we provide four distinct methods to use our program:
1. Commandline
- Python installed (3.8 <= version <= 3.12)
- Install our package:
pip install pdf2zh
- Execute translation, files generated in current working directory:
pdf2zh document.pdf
2. Portable (w/o Python installed)
- Download setup.bat
- Double-click to run.
3. Graphic user interface
- Python installed (3.8 <= version <= 3.12)
- Install our package:
pip install pdf2zh
- Start using in browser:
pdf2zh -i
- If your browser has not been started automatically, go to
http://localhost:7860/
See documentation for GUI for more details.
4. Docker
- Pull and run:
docker pull byaidu/pdf2zh
docker run -d -p 7860:7860 byaidu/pdf2zh
- Open in browser:
http://localhost:7860/
For docker deployment on cloud service:
Unable to install?
The program needs an AI model (wybxc/DocLayout-YOLO-DocStructBench-onnx) before it can work, and some users are unable to download it due to network issues. If you have a problem downloading this model, we provide a workaround using the following environment variable:
set HF_ENDPOINT=https://hf-mirror.com
If the solution does not work for you, or you encounter other issues, please refer to the frequently asked questions.
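The set command above is Windows cmd syntax. As a cross-platform alternative, the variable can be set for a single run from Python. This is a minimal sketch: the helper name mirror_env is made up here, and it assumes pdf2zh is already installed and on the PATH.

```python
import os


def mirror_env(endpoint="https://hf-mirror.com", base=None):
    """Return a copy of the environment with HF_ENDPOINT pointing at the mirror."""
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = endpoint
    return env


if __name__ == "__main__":
    # To translate with the mirror enabled (assumes pdf2zh is installed):
    #   import subprocess
    #   subprocess.run(["pdf2zh", "document.pdf"], env=mirror_env(), check=True)
    print(mirror_env()["HF_ENDPOINT"])
```

Because the function returns a copy, the variable applies only to the process it is passed to and leaves the current shell untouched.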
Advanced Options
Execute the translation command in the command line to generate the translated document example-mono.pdf and the bilingual document example-dual.pdf in the current working directory. Google is used as the default translation service.
In the following table, we list all advanced options for reference:
Option | Function | Example |
---|---|---|
files | Local files | pdf2zh ~/local.pdf |
links | Online files | pdf2zh http://arxiv.org/paper.pdf |
-i | Enter GUI | pdf2zh -i |
-p | Partial document translation | pdf2zh example.pdf -p 1 |
-li | Source language | pdf2zh example.pdf -li en |
-lo | Target language | pdf2zh example.pdf -lo zh |
-s | Translation service | pdf2zh example.pdf -s deepl |
-t | Multi-threads | pdf2zh example.pdf -t 1 |
-o | Output dir | pdf2zh example.pdf -o output |
-f, -c | Exceptions | pdf2zh example.pdf -f "(MS.*)" |
-cp | Compatibility Mode | pdf2zh example.pdf --compatible |
--share | Public link | pdf2zh -i --share |
--authorized | Authorization | pdf2zh -i --authorized users.txt [auth.html] |
--prompt | Custom Prompt | pdf2zh --prompt [prompt.txt] |
For detailed explanations of each option, please refer to our Advanced Usage document.
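When calling the CLI from a script, the options in the table can be assembled programmatically. The sketch below only builds the argument list from flags that appear in the table (-li, -lo, -s, -t, -o, -p); the function name build_pdf2zh_command and its parameter names are made up for this illustration and are not part of pdf2zh.

```python
def build_pdf2zh_command(pdf, source=None, target=None, service=None,
                         threads=None, output=None, pages=None):
    """Build a pdf2zh argument list using flags from the options table."""
    cmd = ["pdf2zh", pdf]
    options = (
        ("-li", source),   # source language
        ("-lo", target),   # target language
        ("-s", service),   # translation service
        ("-t", threads),   # number of threads
        ("-o", output),    # output directory
        ("-p", pages),     # partial document translation
    )
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    print(build_pdf2zh_command("example.pdf", source="en", target="zh", service="deepl"))
```

The resulting list can be passed to subprocess.run; building a list rather than a single string avoids shell quoting problems with file names that contain spaces.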
Secondary Development (APIs)
For downstream applications, please refer to our document about API Details for further information about:
- Python API, how to use the program in other Python programs
- HTTP API, how to communicate with a server with the program installed
TODOs
- Parse layout with DocLayNet based models, PaddleX, PaperMage, SAM2
- Fix page rotation, table of contents, format of lists
- Fix pixel formula in old papers
- Async retry except KeyboardInterrupt
- Knuth–Plass algorithm for western languages
- Support non-PDF/A files
Acknowledgements
- Document merging: PyMuPDF
- Document parsing: Pdfminer.six
- Document extraction: MinerU
- Document Preview: Gradio PDF
- Multi-threaded translation: MathTranslate
- Layout parsing: DocLayout-YOLO
- Document standard: PDF Explained, PDF Cheat Sheets
- Multilingual Font: Go Noto Universal
Contributors
Star History
From RAG chatbots to code assistants to complex agentic pipelines and beyond, build LLM systems that run better, faster, and cheaper with tracing, evaluations, and dashboards.
Open source LLM evaluation framework
Website • Slack community • Twitter • Documentation
What is Opik?
Opik is an open-source platform for evaluating, testing and monitoring LLM applications. Built by Comet.
You can use Opik for:
- Development:
  - Tracing: Track all LLM calls and traces during development and production (Quickstart, Integrations)
  - Annotations: Annotate your LLM calls by logging feedback scores using the Python SDK or the UI.
  - Playground: Try out different prompts and models in the prompt playground
- Evaluation: Automate the evaluation process of your LLM application:
  - Datasets and Experiments: Store test cases and run experiments (Datasets, Evaluate your LLM Application)
  - LLM as a judge metrics: Use Opik's LLM as a judge metric for complex issues like hallucination detection, moderation and RAG evaluation (Answer Relevance, Context Precision)
  - CI/CD integration: Run evaluations as part of your CI/CD pipeline using our PyTest integration
- Production Monitoring:
  - Log all your production traces: Opik has been designed to support high volumes of traces, making it easy to monitor your production applications.
  - Monitoring dashboards: Review your feedback scores, trace count and tokens over time in the Opik Dashboard.
[!TIP]
If you are looking for features that Opik doesn't have today, please raise a new Feature request.
Installation
Opik is available as a fully open-source local installation or as a hosted solution on Comet.com. The easiest way to get started with Opik is by creating a free Comet account at comet.com.
If you'd like to self-host Opik, you can do so by cloning the repository and starting the platform using Docker Compose:
# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git
# Navigate to the opik/deployment/docker-compose directory
cd opik/deployment/docker-compose
# Start the Opik platform
docker compose up --detach
# You can now visit http://localhost:5173 on your browser!
For more information about the different deployment options, please see our deployment guides:
Installation methods | Docs link |
---|---|
Local instance | |
Kubernetes |
Get Started
To get started, you will need to first install the Python SDK:
pip install opik
Once the SDK is installed, you can configure it by running the opik configure command:
opik configure
This will allow you to configure Opik locally by setting the correct local server address or, if you're using the Cloud platform, by setting the API key.
[!TIP]
You can also call the opik.configure(use_local=True) method from your Python code to configure the SDK to run on the local installation.
You are now ready to start logging traces using the Python SDK.
Logging Traces
The easiest way to get started is to use one of our integrations. Opik supports:
Integration | Description | Documentation | Try in Colab |
---|---|---|---|
OpenAI | Log traces for all OpenAI LLM calls | Documentation | |
LiteLLM | Call any LLM model using the OpenAI format | Documentation | |
LangChain | Log traces for all LangChain LLM calls | Documentation | |
Haystack | Log traces for all Haystack calls | Documentation | |
Bedrock | Log traces for all Bedrock LLM calls | Documentation | |
Anthropic | Log traces for all Anthropic LLM calls | Documentation | |
Gemini | Log traces for all Gemini LLM calls | Documentation | |
Groq | Log traces for all Groq LLM calls | Documentation | |
LangGraph | Log traces for all LangGraph executions | Documentation | |
LlamaIndex | Log traces for all LlamaIndex LLM calls | Documentation | |
Ollama | Log traces for all Ollama LLM calls | Documentation | |
Predibase | Fine-tune and serve open-source Large Language Models | Documentation | |
Ragas | Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines | Documentation | |
watsonx | Log traces for all watsonx LLM calls | Documentation |
[!TIP]
If the framework you are using is not listed above, feel free to open an issue or submit a PR with the integration.
If you are not using any of the frameworks above, you can also use the track function decorator to log traces:
import opik
opik.configure(use_local=True)  # Run locally
@opik.track
def my_llm_function(user_question: str) -> str:
    # Your LLM code here
    return "Hello"
[!TIP]
The track decorator can be used in conjunction with any of our integrations and can also be used to track nested function calls.
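To make the idea of tracking nested function calls concrete, here is a toy illustration that runs without Opik installed. It is not Opik's implementation: toy_track, RECORDED_SPANS and the two example functions are hypothetical names, and the sketch only shows how a tracing decorator can attribute an inner call to the outer call that triggered it.

```python
import contextvars
import functools
from typing import Any, Callable, Dict, List

# Stand-in for a tracing backend: finished calls are appended to a plain list.
RECORDED_SPANS: List[Dict[str, Any]] = []

# Name of the tracked function that is currently executing, if any.
_current_span = contextvars.ContextVar("current_span", default=None)


def toy_track(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record each call together with its parent call (illustrative only)."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        parent = _current_span.get()
        token = _current_span.set(func.__name__)
        try:
            result = func(*args, **kwargs)
        finally:
            # Restore the parent so later sibling calls are attributed correctly.
            _current_span.reset(token)
        RECORDED_SPANS.append(
            {"name": func.__name__, "parent": parent, "output": result}
        )
        return result

    return wrapper


@toy_track
def retrieve_context(question: str) -> str:
    return "France is a country in Europe."


@toy_track
def answer_question(question: str) -> str:
    context = retrieve_context(question)  # Nested tracked call
    return f"Answer based on: {context}"


answer_question("What is the capital of France?")
for span in RECORDED_SPANS:
    print(span["name"], "<- parent:", span["parent"])
```

The inner call finishes first, so it is recorded first with the outer function as its parent; the outer call is recorded last with no parent.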
LLM as a Judge metrics
The Python Opik SDK includes a number of LLM as a judge metrics to help you evaluate your LLM application. Learn more about it in the metrics documentation.
To use them, simply import the relevant metric and use the score function:
from opik.evaluation.metrics import Hallucination
metric = Hallucination()
score = metric.score(
    input="What is the capital of France?",
    output="Paris",
    context=["France is a country in Europe."],
)
print(score)
Opik also includes a number of pre-built heuristic metrics as well as the ability to create your own. Learn more about it in the metrics documentation.
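As a rough picture of what a heuristic metric is, the sketch below scores an output with a simple rule instead of calling an LLM. It is a standalone illustration and does not use Opik's base classes: KeywordCoverage and ToyScore are hypothetical names, and the metrics documentation describes the interface Opik actually expects for custom metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyScore:
    """Hypothetical result container: a metric name and a value in [0, 1]."""

    name: str
    value: float


class KeywordCoverage:
    """Heuristic metric: fraction of expected keywords found in the output."""

    def __init__(self, keywords: List[str], name: str = "keyword_coverage"):
        if not keywords:
            raise ValueError("keywords must not be empty")
        # Lower-case once so matching is case-insensitive.
        self.keywords = [keyword.lower() for keyword in keywords]
        self.name = name

    def score(self, output: str) -> ToyScore:
        text = output.lower()
        hits = sum(1 for keyword in self.keywords if keyword in text)
        return ToyScore(name=self.name, value=hits / len(self.keywords))


metric = KeywordCoverage(keywords=["Paris", "France"])
print(metric.score(output="Paris is the capital of France."))
print(metric.score(output="I am not sure."))
```

Because the rule is deterministic, a metric like this is cheap to run on every test case, which is the usual reason to pair heuristic metrics with LLM as a judge metrics.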
Evaluating your LLM Application
Opik allows you to evaluate your LLM application during development through Datasets and Experiments.
You can also run evaluations as part of your CI/CD pipeline using our PyTest integration.
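The general shape of a dataset-driven evaluation that can gate a CI pipeline is sketched below. This is a plain-Python illustration under stated assumptions, not Opik's Datasets, Experiments or PyTest integration: DATASET, toy_llm, run_experiment and the 0.5 threshold are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List

# A tiny stand-in dataset: each item has an input and the expected answer.
DATASET: List[Dict[str, str]] = [
    {"input": "What is the capital of France?", "expected": "Paris"},
    {"input": "What is the capital of Japan?", "expected": "Tokyo"},
]


def toy_llm(question: str) -> str:
    """Stand-in for a real LLM call; answers from a fixed lookup table."""
    answers = {
        "What is the capital of France?": "Paris",
        "What is the capital of Japan?": "Kyoto",  # Deliberately wrong
    }
    return answers.get(question, "")


def run_experiment(
    task: Callable[[str], str], dataset: List[Dict[str, str]]
) -> float:
    """Run the task on every item and return the mean exact-match score."""
    scores = [
        1.0 if task(item["input"]).strip() == item["expected"] else 0.0
        for item in dataset
    ]
    return mean(scores)


def test_accuracy_meets_threshold() -> None:
    """Pytest-style check: fails the build if accuracy drops below 0.5."""
    assert run_experiment(toy_llm, DATASET) >= 0.5


accuracy = run_experiment(toy_llm, DATASET)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Keeping the dataset, the task and the scoring rule separate means any one of them can be swapped (for example, replacing exact match with an LLM as a judge metric) without touching the others.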
Contributing
There are many ways to contribute to Opik:
- Submit bug reports and feature requests
- Review the documentation and submit Pull Requests to improve it
- Speak or write about Opik and let us know
- Upvote popular feature requests to show your support
To learn more about how to contribute to Opik, please see our contributing guidelines.