Speech Summary
- Acknowledges fears around AI replacing jobs and the lack of AI literacy.
- Highlights generative AI's tendency to tailor its answers to users' beliefs.
- Notes that Polish data makes up less than 1% of the training data for global models such as GPT, which motivated the creation of a Polish-focused dataset.
SpeakLeash and Bielik Project Overview
- SpeakLeash started before ChatGPT, aiming to prepare for the AI revolution.
- Developed a Polish data package to improve training quality and transparency.
- Aims to build a collaborative open-source movement involving academia and business.
- Supported by Krakow’s Supercomputer Center via grant funding.
- Timeline: data package release in 2023, the Azurro model (APT3) in January 2024, Bielik Video v2.0 between April and September 2024, and the upcoming v2.5 on June 6.
Key Project Features and Principles
- Open source and open science driven.
- Focus on small, energy-efficient language models optimized for Polish data.
- Compliance with regulations such as the AI Act.
- Emphasis on community contributions (approx. 3,000 Discord members, 100 active contributors).
- Models are multilingual, with capabilities that extend beyond Polish.
Community Engagement and Ecosystem Building
- Verified implementations across business sectors and public administration.
- Collaborations with educational institutions for piloting model use.
- Promotion of AI literacy by sharing knowledge, papers, and organizing workshops.
- Challenges identified include the choice of communication channels (Discord) and insufficient information exchange between communities, academia, and government.
Expansion and International Endeavors
- Efforts to build international relationships and learn from projects like the Scandinavian Viking model.
- Desire to maintain Polish independence in AI development and offer an alternative AI model.
New Initiatives: Citizen Data Sets
- Initiative to categorize and describe large sets of public media data, such as photos, through a crowdsourced, Pinterest-like system.
- Aims to preserve cultural heritage and improve accessibility.
- Supports the roadmap towards multimodal, multilingual Bielik models.
Roadmap and Future Plans
- Continuing development of the Bielik v3 model.
- Ongoing cooperation with Warsaw University for R&D.
- Business usability evaluations underway.
- Calls for community participation, testing, implementation, and support.
Actionable Items / Tasks
- Invitation for specialists, developers, and enthusiasts to join the SpeakLeash community and contribute.
- Test and implement the Bielik models where the necessary infrastructure and pipelines exist.
- Participate in data annotation (Citizen Data Sets initiative).
- Share knowledge, feedback, and ethical considerations to shape the ecosystem.
- Promote AI literacy and assist with adoption across sectors in Poland.
From Partners to Passengers? Keeping Humans Central in the AI Revolution
12:40 - 13:10, 28th of May (Wednesday) 2025 / DEV ARCHITECTURE STAGE
As AI systems grow more autonomous, community-driven projects like SpeakLeash show we can shape technology on our terms. These open-source initiatives—developed using public supercomputers and volunteer networks—prove we don't need tech giants to advance AI that respects our language and culture.
Three key insights:
- Open communities create alternatives to replacement narratives. Projects like LLMBielik (a whole family of models) and other initiatives demonstrate how volunteers can build systems that augment rather than replace human capabilities, preserving cultural context while advancing technology.
- AI literacy transforms fear into collaboration. When teams understand AI's capabilities and limitations, they identify opportunities rather than threats. This literacy—teaching critical evaluation and ethical awareness—lets humans remain the decision-makers while leveraging AI's efficiency.
- Public sector success benefits everyone. Government AI implementations show how automation can enhance services while maintaining human oversight, creating models where technology serves citizens rather than corporations.