Welcome to House

Technology around us is constantly evolving and compelling us to think about how we live and will live, how society will change and to what extent it will be affected. For the better or the worse? It is difficult to give a clear answer. However, even art forms such as cinema can give us food for thought on society and ourselves, as well as some psychological reasoning. All this to try to better understand ourselves, the world around us, and where we are headed.

The House blog tries to do all of that.

Latest posts

Figure’s Helix AI brain helps robots observe and learn tasks in their environment

March 4, 2025“System 1, System 2” architecture could bring versatile robot assistants from warehouses to homes sooner than expected Humanoid Robot Race Accelerates as Figure Unveils Revolutionary Helix AI System Figure is setting a blistering pace in the humanoid robotics industry, outstripping competitors with breakthrough technological advancements that could fundamentally transform robotics applications. According to this article, CEO Brett Adcock attributes the company’s accelerated timeline to swift progress in its Helix AI system, a first-of-its-kind generalist Vision-Language-Action (VLA) model that unifies perception, language understanding, and learned control. Unlike previous robotics systems, Helix operates through a novel “System 1, System 2” architecture: System 2: An onboard internet-pretrained VLM operating at 7-9 Hz for scene understanding and language comprehension System 1: A fast reactive visuomotor policy that translates semantic representations into precise continuous robot actions at 200 Hz This groundbreaking approach enables Figure robots to control their entire upper body—including wrists, torso, head, and individual fingers—while understanding natural language commands and adapting to new objects they’ve never encountered before. In a significant update, Figure revealed that Helix has been enhanced to deliver faster and more precise movements, allowing its robots to handle packages and perform sorting tasks—capabilities essential for warehouse operations. This rapid commercialization is possible because Helix runs entirely on embedded low-power GPUs, making it immediately ready for deployment. Most impressively, Figure has demonstrated that its robots can now pick up virtually any small household object and even collaborate with each other to complete complex tasks like grocery storage—all using a single set of neural network weights without any task-specific fine-tuning. This represents a fundamental shift in how robots learn and adapt to new environments. The humanoid robot market is becoming increasingly competitive. OpenAI-backed 1X recently demonstrated its Neo Gamma robot performing household duties including cleaning, food service, and grocery carrying. Though 1X claims its deployment timeline is “close,” the company has yet to announce specific dates for commercial availability. Meanwhile, Apptronik partnered with Google DeepMind last December to integrate advanced AI into its robots and announced yesterday that its Apollo robot will be employed in a unique application: manufacturing copies of itself on assembly lines. While Apptronik also envisions home applications for its technology, the company focuses on mastering industrial applications before targeting the consumer market. As these companies race to bring humanoid robots to market, Figure’s Helix AI represents a potential breakthrough in scaling robotic applications from industrial settings to the unpredictable environments of homes—potentially shrinking the timeline for having a robot assistant from years to just months. The most profound potential of Figure’s Helix AI lies not in industrial efficiency, but in its capacity to provide companionship and assistance to those most vulnerable in our society. For elderly individuals living alone, people with disabilities, and those with limited mobility, these advanced humanoid robots could represent more than just a technological marvel—they could be a lifeline. Imagine a robot capable of understanding nuanced commands, performing delicate tasks like retrieving medication, preparing meals, or helping with personal care. For someone with limited mobility, such a robot could restore a sense of independence and dignity. The ability to comprehend natural language and adapt to unique household environments means these robots could provide personalized assistance far beyond current caregiving technologies. Moreover, for individuals with conditions like dementia or severe physical disabilities, a robot that can understand emotional context and respond sensitively could offer crucial support. The multi-robot collaboration demonstrated by Helix suggests future possibilities of coordinated care, where multiple robots might work together to assist a single individual. However, this transformative potential is accompanied by profound ethical risks. The same technological capabilities that make these robots excellent caregivers could make them dangerous if misused. In the wrong hands—whether by military organizations developing autonomous weapons or criminal networks exploring new forms of technological exploitation—these robots’ advanced perception and action capabilities become deeply concerning. The ability to understand complex language, manipulate objects with precision, and operate autonomously are precisely the characteristics that could make such robots potential tools for surveillance, unauthorized data collection, or even physical harm. International regulations and robust ethical frameworks will be crucial in ensuring that this technology serves humanity’s best interests. As we stand on the cusp of this robotic revolution, the ultimate impact of humanoid robots like those developed by Figure will depend not just on technological advancement, but on our collective commitment to using these remarkable machines as instruments of care, compassion, and human empowerment. [...]

Microsoft’s Majorana-1 processor: the quantum revolution

February 25, 2025How the world’s first topoconductor could transform computing within a decade Microsoft’s quantum chip launch announcement has sent shockwaves across the tech world, changing our perception of quantum computing and the state of matters as we know them. The Majorana-1 processor, dubbed the “world’s first topoconductor,” promises to bring quantum computing capabilities within reach in “years, not decades.” While the full-scale impact remains, this breakthrough could revolutionize computing power, enabling complex calculations that are currently impossible with traditional systems. Understanding quantum computing Semiconductors power our modern tech gadgets, from the Snapdragon processors in smartphones to Apple’s custom chips. These components provide the computing power for entire systems to operate. Microsoft’s quantum chip takes this concept further, promising to condense computing potential that would currently take decades into systems that can perform calculations much faster. Traditional computers use bits (1s and 0s) to process information, but quantum computers use qubits. The key difference is that qubits can exist in superposition, with 1s and 0s simultaneously present, which dramatically increases parallel processing capabilities and allows quantum systems to scale exponentially. To grasp the significance of Microsoft’s breakthrough, imagine the most powerful supercomputer you’ve seen and multiply its processing potential several times over. Despite efforts by IBM, Google, and other organizations to advance quantum computing, the physical technology needed has been elusive due to the inherent challenges of working with qubits. While extremely powerful, qubits are notoriously fragile and unstable, making them difficult to consolidate and operate without errors due to environmental noise and interference. The Majorana particle breakthrough According to this article, this is where Majorana particles enter the picture. These hypothetical “quasiparticles” that act as their own antiparticles were first described by theoretical physicist Ettore Majorana in 1937. Conceptually, they offer more stability for quantum computing and are resilient to noise, but they don’t naturally exist in a usable form. The difficulty of creating these particles explains why other quantum computing approaches have focused on different qubit implementations. Microsoft’s innovation focuses on “topological qubits” that observe and control Majorana particles to produce more reliable and scalable qubits. In physics, the topological state of matter refers to a phase where deformities don’t affect the properties of the matter. Qubits encoded in the topological properties of Majorana particles could fundamentally transform how we store and process information. What distinguishes Microsoft’s approach from competitors is their claim to have both created these particles and measured them with incredible precision—”the difference between one billion and one billion and one electrons in a superconducting wire.” This precision is crucial as it tells the computer what state the qubit is in, forming the foundation for quantum computation. The company has established a system of digital control over quantum computing, potentially enabling more reliable system building. Technical implementation Microsoft’s quantum chip is the result of over 17 years of research and development. While not yet market-ready, it represents a significant step toward practical quantum computing. The company aims to fit one million qubits on a single palm-sized chip—a density that would revolutionize computing technology. The technical implementation involves new material stacks utilizing eight topological qubits that combine semiconductors and superconductors, including indium arsenide and aluminum. These materials were constructed atom by atom to create Majorana particles. The topological qubit architecture uses aluminum wires attached in an “H” form, with each H containing four controllable Majoranas to make one qubit. These structures can be connected to create a tile of links. These processors require controlled setups and cooling systems to maintain proper operating conditions, which presents challenges for scalability and commercial availability. The quantum computing work starts from first principles, ensuring the very structure and composition of the chips can handle their intended function. Microsoft’s announcement was accompanied by a peer-reviewed research paper in Nature detailing the research and material generation process. Future timeline and applications While Microsoft isn’t the first to explore topological superconductors, their achievement in transforming theoretical knowledge into a solid, replicable form represents a significant advance. The technology won’t be immediately available through the Azure public cloud, as the focus remains on research and improvements toward million-qubit machines. Microsoft’s Azure data centers will likely be among the first beneficiaries of this technology, but significant implementation isn’t expected until 2027-2029 at the earliest. Meanwhile, Microsoft continues to participate in DARPA’s Underexplored Systems for Utility-Scale Quantum Computing (US2QC) program to further develop its systems. The potential applications are vast. Quantum computing will play a key role in advancing emerging technologies, including AI, and will impact research potential across industries. Specific applications include: Developing life-saving drugs through complex molecular simulations Accelerating healthcare research by modeling biological systems Creating new materials to address environmental challenges Solving previously intractable chemistry problems Optimizing supply chains and logistics at unprecedented scales Breaking current encryption while developing quantum-resistant security The scale of calculations possible today is limited by available technology. As quantum systems expand, so will our analytical capabilities. Quantum computers may even help design their successors by offering insights into future iterations—a prospect both thrilling and somewhat unsettling. Implications and limitations Will Microsoft’s quantum breakthrough spell the end of traditional computing? Not for decades to come. Even if the Majorana-1 processor is successfully scaled up, it will remain an expensive and impractical investment for small businesses when the technology becomes commercially available. Average consumers won’t need to worry about their PCs and laptops becoming obsolete anytime soon. These traditional systems will continue to handle everyday computing needs effectively. The impact of quantum computing will primarily be felt through the results of high-profile research and industrial applications conducted by major companies and research institutions. Many experts remain cautious or skeptical about Microsoft’s claims. While the technology and research are impressive, there’s insufficient proof that it will scale to the degree Microsoft suggests. Technological advances often come with some hype, and there’s always the risk of overstating actual achievements. Challenges and considerations Before we see groundbreaking results from quantum systems, they will likely serve commercial purposes that further business interests rather than broader social causes. As with AI, quantum computing presents the potential for erroneous or misleading data that we’re not fully prepared to understand or address. Developing fault-tolerant quantum computing must remain the primary goal. Similar to AI development, appropriate safeguards and infrastructure are needed to prepare for the changes quantum computing will bring. Long-term viability and the economic and environmental impacts of this technology require careful consideration to ensure responsible implementation. The impressive aspect of Microsoft’s quantum chip is the sheer scale of research and scientific innovation behind it. However, we shouldn’t expect dramatic changes overnight. As with all technological progress, meaningful implementation will take time. The potential advantages of mature quantum computing technology are staggering. Beyond the computational speed improvements, quantum systems could unlock solutions to problems we currently consider unsolvable. Climate modeling could become precise enough to guide targeted interventions against global warming. Materials science could advance rapidly, leading to superconductors that work at room temperature or batteries with exponentially greater capacity. In healthcare, quantum computing could revolutionize drug discovery by accurately simulating molecular interactions, potentially cutting development time from years to days. Complex diseases might be understood at a fundamental level, leading to treatments for conditions that have long eluded medical science. Financial systems could benefit from superior optimization algorithms, potentially creating more efficient markets and economic stability. Meanwhile, artificial intelligence could experience a quantum leap of its own, with reasoning capabilities far beyond current limitations. However, these same capabilities bring significant risks. Quantum computers will eventually be able to break most current encryption methods, potentially compromising the security infrastructure that underpins our digital society. While quantum-resistant cryptographic methods are being developed, the transition period presents a vulnerable window. The economic disruption could be profound, with entire industries based on computational limitations suddenly rendered obsolete. The concentration of quantum computing power in the hands of a few corporations or nations could create unprecedented power imbalances, raising serious geopolitical concerns. There are also environmental considerations. Quantum systems currently require extreme cooling and controlled environments, which demand significant energy resources. Without sustainable approaches, widespread quantum adoption could have substantial environmental impacts. [...]

AI ‘brain decoder’ to read thoughts

February 18, 2025It can convert thoughts to text without extensive training Scientists have created enhanced versions of a “brain decoder” that employs artificial intelligence to transform thoughts into text. According to this article, their new converter algorithm can rapidly train an existing decoder on another person’s brain, as reported in their new study. These findings could eventually assist people with aphasia, a brain disorder that impacts a person’s ability to communicate, according to the scientists. A brain decoder uses machine learning to convert a person’s thoughts into text, based on their brain’s responses to stories they’ve listened to. The problem with past iterations of the decoder was that they required participants to listen to stories inside an MRI machine for many hours, and these decoders worked only for the individuals they were trained on. “People with aphasia oftentimes have some trouble understanding language as well as producing language,” said study co-author Alexander Huth, a computational neuroscientist at the University of Texas at Austin (UT Austin). “So if that’s the case, then we might not be able to build models for their brain at all by watching how their brain responds to stories they listen to.” In the new research, published in the journal Current Biology, Huth and co-author Jerry Tang, a graduate student at UT Austin, investigated how they might overcome this limitation. “In this study, we were asking, can we do things differently?” Huth said. “Can we essentially transfer a decoder that we built for one person’s brain to another person’s brain?” The researchers initially trained the brain decoder on a few reference participants using the long method — by collecting functional MRI data while the participants listened to 10 hours of radio stories. Then, they trained two converter algorithms on the reference participants and a different set of “goal” participants: one using data collected while the participants spent 70 minutes listening to radio stories, and the other while they spent 70 minutes watching silent Pixar short films unrelated to the radio stories. Using a technique called functional alignment, the team mapped out how the reference and goal participants’ brains responded to the same audio or film stories. Then they used that information to train the decoder to work with the goal participants’ brains, without needing to collect multiple hours of training data. The team then tested the decoders using a short story that none of the participants had heard before. Although the decoder’s predictions were slightly more accurate for the original reference participants than for the ones who used the converters, the words it predicted from each participant’s brain scans were still semantically related to those used in the test story. For example, a section of the test story included someone discussing a job they didn’t enjoy, saying “I’m a waitress at an ice cream parlor. So, um, that’s not… I don’t know where I want to be but I know it’s not that.” The decoder using the converter algorithm trained on film data predicted: “I was at a job I thought was boring. I had to take orders and I did not like them so I worked on them every day.” Not an exact match — the decoder doesn’t read out the exact sounds people heard, Huth explained — but the ideas are related. “The really surprising and cool thing was that we can do this even not using language data,” Huth told Live Science. “So we can have data that we collect just while somebody’s watching silent videos, and then we can use that to build this language decoder for their brain.” Using the video-based converters to transfer existing decoders to people with aphasia may help them express their thoughts, the researchers said. It also reveals some overlap between the ways humans represent ideas from language and from visual narratives in the brain. “This study suggests that there’s some semantic representation which does not care from which modality it comes,” Yukiyasu Kamitani, a computational neuroscientist at Kyoto University who was not involved in the study, told Live Science. In other words, it helps reveal how the brain represents certain concepts in the same way, even when they’re presented in different formats. The team’s next steps are to test the converter on participants with aphasia and “build an interface that would help them generate language that they want to generate,” Huth said. While this breakthrough in brain decoding technology holds promising applications for assisting those with communication disorders, experts caution that such advances also raise important ethical considerations. Privacy advocates and neuroethicists point out that as these decoders become more sophisticated, questions about mental privacy emerge. As brain decoders continue to advance, legislators and ethicists suggest the need for regulatory frameworks that both enable medical progress and protect individuals’ cognitive liberty. The scientific community, meanwhile, remains cautiously optimistic about harnessing this technology’s potential while mitigating risks through thoughtful implementation and oversight. [...]

The paradox of AI certainty

February 11, 2025When confidence misleads Imagine asking ChatGPT about the chemical composition of a newly discovered deep-sea organism. Without hesitation, it might provide a detailed analysis of molecular structures and biochemical pathways. The answer would be articulate, comprehensive, and entirely plausible. It might also be completely wrong. This scenario illustrates a crucial challenge of our AI age: the seductive power of artificial certainty. AI language models don’t just provide information—they deliver it with an authority that can make even speculation feel like fact. A human expert studying that deep-sea organism might say, “We’re still analyzing its composition,” or “The preliminary data suggests…” But AI rarely equivocates. Understanding the Google Effect and its evolution As explained here, the Google Effect, first documented by psychologists Betsy Sparrow, Jenny Liu, and Daniel Wegner in 2011, revealed how digital technology was already changing our relationship with knowledge. Their research showed that when people know they can look something up later, they have lower recall rates for the information itself but enhanced memory for where to find it. In other words, our brains began treating search engines as an external hard drive for our memories. But today’s AI represents a quantum leap beyond simple information storage and retrieval. The evolution from the original Google Effect to today’s AI-enhanced cognitive landscape represents a fundamental transformation in how we process and retain knowledge. The original Google Effect primarily concerned information storage, where people would remember where to find facts rather than the facts themselves. Think of someone remembering that Wikipedia contains information about World War II dates rather than memorizing the dates themselves. This represented a simple outsourcing of memory, a practical adaptation to the digital age. Today’s AI-enhanced Google Effect, however, goes far deeper. We’re now outsourcing not just memory but the very process of analysis and synthesis. Instead of remembering where to find information about World War II, we might ask AI to analyze historical patterns and draw conclusions about its causes, effectively delegating our thinking process itself. This evolution represents a fundamental shift in our relationship with knowledge—we’re no longer just outsourcing memory, we’re outsourcing understanding itself. The hidden cost of instant answers The impact of this cognitive delegation is already visible across various fields. In medical education, students increasingly turn to AI for diagnostic suggestions, potentially bypassing the crucial process of differential diagnosis that builds clinical reasoning skills. In academic research, scholars are using AI to summarize scientific papers, sometimes missing the nuanced uncertainties often expressed in the original text. Writers are turning to AI for plot solutions, circumventing the creative struggle that often leads to truly original ideas. Each of these cases represents not just a shortcut to knowledge, but a potential bypass of the valuable cognitive processes that uncertainty demands. The cognitive value of not knowing Uncertainty plays several crucial roles in human cognitive development and creative thinking. It serves as the primary driver of curiosity and sustained investigation, pushing us to dig deeper and explore further. When we encounter uncertainty, we’re forced to examine our assumptions and question our existing knowledge. This creates fertile ground for genuine innovation, as we work through problems without predetermined solutions. Perhaps most importantly, uncertainty cultivates intellectual humility, reminding us that knowledge is always incomplete and evolving. When we rush to eliminate uncertainty through AI-generated answers, we risk short-circuiting these essential cognitive processes. Balancing AI assistance with epistemic humility The path forward requires a nuanced approach to utilizing AI’s capabilities while preserving our cognitive development. There are indeed situations where embracing AI’s certainty is appropriate and beneficial. Routine information retrieval, fact-checking against established knowledge, initial research orientation, and time-sensitive decisions with clear parameters all benefit from AI’s rapid and precise responses. However, there are crucial areas where preserving uncertainty remains vital to human development and innovation. Complex problem solving requires grappling with ambiguity to develop robust solutions. Creative endeavors thrive on the tension between knowing and not knowing. Scientific research advances through careful navigation of uncertainty. Philosophical inquiry depends on questioning established certainties. Personal growth and learning require engaging with the unknown rather than merely receiving answers. The key is recognizing that while AI can be a powerful tool for accessing information, it shouldn’t replace the valuable cognitive work that uncertainty demands. True wisdom still begins with knowing what we don’t know. As we navigate this new era of artificial intelligence, perhaps our greatest challenge isn’t learning to use AI effectively, but learning to preserve the productive discomfort of uncertainty. The future belongs not to those who can access answers most quickly, but to those who can ask the most insightful questions—questions that may not have immediate answers, even from AI. The stakes in this challenge are far higher than mere intellectual development. When we consistently outsource our thinking to external systems, we risk atrophying our own capacity for reasoning—much like a muscle that weakens from disuse. This cognitive dependency creates a dangerous vulnerability: people who cannot think critically or reason independently become easier to manipulate and control. Consider the parallel with navigation apps: many of us have lost the ability to navigate without them, becoming helpless when technology fails. Now imagine this same dependency applied to our ability to reason, analyze, and make judgments. A population that habitually relies on AI for answers rather than developing their own understanding becomes intellectually blind, unable to distinguish truth from manipulation, unable to challenge flawed assumptions, and unable to identify when they’re being led astray. This vulnerability extends beyond individual cognitive decline. A society where people increasingly defer to AI for analysis and decision-making risks creating a perfect environment for manipulation—whether by those who control these technologies or by those who know how to exploit them. When people lose confidence in their own ability to reason, they become more susceptible to manufactured certainties and engineered consensus. The renaissance of uncertainty in the age of AI might seem paradoxical, but it could be exactly what we need. Just as the printing press didn’t eliminate the need for critical thinking—it amplified it—AI shouldn’t eliminate our comfort with uncertainty but rather highlight its importance. The most sophisticated use of AI might not be in getting answers but in helping us discover better questions. Socrates’ ancient wisdom about knowing nothing might be more relevant now than ever. In a world of instant answers, choosing uncertainty—choosing to say “I don’t know, let me think about it”—becomes not just an admission of ignorance, but an act of intellectual self-preservation. It’s in these moments of acknowledged uncertainty, of genuine cognitive effort, that we maintain our capacity for independent thought and protect ourselves against manipulation. [...]

OpenAI’s o3-mini and the future of software development

February 4, 2025From Snake games to self-learning systems The OpenAI o3-mini High model has come out, representing a fresh direction in autonomous AI and transforming our understanding of machine capabilities. This model can accomplish everything from creating full-fledged Snake games to developing AI agents that outperform human players, pushing technological boundaries in ways that are simultaneously thrilling and somewhat disconcerting. The dual nature of such powerful tools generates as many uncertainties as opportunities. The common struggles with code debugging and the intricacies of machine learning make the o3-mini’s capabilities seem almost too good to be true. What we’re seeing is not merely a simplification of these tasks but their evolution into smarter, more efficient, and highly adaptable processes. What the o3-mini High model represents is a pivotal advancement in autonomous artificial intelligence evolution. It is its sophisticated capability to code independently, implement machine learning methodologies, and enhance its own processes without human guidance that sets it apart. Autonomous coding: simplifying complex tasks As explained here, it is the o3-mini model’s exceptional autonomous coding capability that stands as one of its most striking features. It was in a compelling demonstration that the model crafted a Python-based Snake game completely independently. What this process encompassed was the creation of a fully operational game environment, integrated with scoring systems and dynamic obstacles, accomplished entirely without human input. It is this degree of coding proficiency that not only makes traditionally complex tasks more manageable but also demonstrates AI’s potential to streamline software development processes, making them more approachable for those lacking advanced technical expertise. What the o3-mini model could achieve through this automation is a substantial reduction in software development time and effort, paving the way for innovative breakthroughs. Machine Learning and Reinforcement Learning in action It is in applying machine learning techniques, particularly reinforcement learning, that the o3-mini model truly shines. It was after creating the Snake game that the model trained an AI agent to play it. What happened through the use of neural networks was that the agent’s performance improved over 500 iterations, demonstrating its capacity to optimize gameplay strategies and achieve higher scores. It was the implementation of a reward system that served as a crucial component in this process, guiding the AI agent toward enhanced decision-making. What the model accomplished by rewarding successful actions was encouraging the agent to refine its strategies and improve its performance. It is this seamless integration of machine learning that showcases the o3-mini model’s capability to handle increasingly complex tasks, bridging the gap between coding and intelligent decision-making. What these advancements could mean are far-reaching implications for industries that depend on automation and data-driven optimization. Real-time adaptability and problem-solving It is in the o3-mini model’s ability to extend beyond mere task execution to real-time adaptability that we see its true autonomy. What happened when the model encountered challenges like errors in file handling or inconsistencies in context management was that it independently adjusted its approach to resolve these issues. What this ability to troubleshoot and adapt in dynamic environments demonstrates is its potential to operate effectively with minimal human oversight. It is in scenarios where conditions are unpredictable or rapidly changing that this adaptability proves particularly valuable. What the o3-mini model accomplishes by identifying and addressing problems in real time is a demonstration of resilience and flexibility that proves essential for practical applications. What this capability could enable in software development, robotics, and other fields is AI systems that function more reliably and efficiently in real-world settings. Iterative refinement: learning from performance It was after training the AI agent that the o3-mini model evaluated its performance and iteratively refined its design to improve gameplay outcomes. Although the AI agent demonstrated remarkable progress during training, it failed to consistently outperform simpler systems based on predefined rules. It is this limitation that highlights areas for improvement, particularly in refining reward functions and addressing context-specific challenges. It is despite these hurdles that the model’s iterative approach underscores its capacity for self-improvement. What the o3-mini model demonstrates by analyzing its own performance and making adjustments is how AI can evolve and optimize over time. It is this ability to learn from experience that serves as a cornerstone of advanced AI systems, paving the way for more sophisticated and reliable applications in the future. Implications for accessibility and automation It is through the o3-mini model’s ability to simplify complex tasks like coding and machine learning that we see its broad implications for AI’s future. What this model accomplishes by lowering the barrier to entry for non-experts is enabling widespread access to AI. It is this democratization of technology that could transform industries, empowering individuals and organizations to harness advanced technologies without extensive technical expertise. It is the rapid advancement of autonomous systems that raises crucial ethical and practical questions. What we must determine is how to ensure responsible use of such technologies, and what safeguards are necessary to prevent misuse. It is these considerations that prove critical as AI continues to advance and become more integrated into various aspects of society. What the o3-mini model demonstrates is the pressing need for accountability and oversight in the development and deployment of AI systems. Limitations and areas for improvement It is through examining the o3-mini model’s impressive milestones that we also uncover its limitations. What occasionally required human intervention were minor errors, particularly in file handling and context management. It was the trained AI agent’s performance that proved not consistently superior to simpler, rule-based solutions. These challenges highlight several key areas needing further refinement: Reward function design: to better guide AI behavior and decision-making. Context management: to reduce reliance on human oversight and improve autonomy. Scalability: to enable the model to handle more complex, real-world applications effectively. Future directions and broader implications It is the o3-mini High model that represents a pivotal milestone in autonomous AI development. What demonstrates AI’s fantastic potential across various domains is its success in autonomous coding, machine learning integration, and real-time adaptability. It is worth noting that while the model is not yet classified as “dangerous,” its capabilities point to a future where creating and training machine learning systems becomes increasingly efficient and accessible. What the o3-mini model offers is a glimpse into both the opportunities and challenges of autonomous AI. These advancements could reshape industries, redefine automation, and make sophisticated technologies more accessible to a broader audience. What will prove crucial is careful consideration of its limitations and ethical implications to ensure responsible progress. While the o3-mini model represents a remarkable advancement in AI technology, we must carefully consider its broader societal implications. The increasing accessibility of AI-powered coding and automation tools, while beneficial for productivity and innovation, raises concerns about over-reliance on artificial intelligence. There is a real risk that as AI systems become more capable of handling complex programming tasks, fewer individuals may pursue in-depth coding knowledge. This shift could create a dangerous knowledge gap, where developers become more focused on prompting AI systems than understanding the fundamental principles of computer science and software development. The convenience of delegating technical challenges to AI could inadvertently lead to a workforce that lacks the deep expertise needed to maintain, improve, and critically evaluate these systems. Moreover, excessive dependence on AI for problem-solving might hinder human creativity and analytical skills development. When we consistently rely on external systems to handle complex tasks, we may lose the valuable learning experiences that come from wrestling with difficult problems and developing solutions independently. This dependency could create a cycle where human expertise gradually diminishes as AI capabilities expand. As we move forward with these powerful technologies, finding the right balance between leveraging AI’s capabilities and maintaining human expertise will be crucial. Rather than viewing AI as a replacement for human learning and development, we should strive to use it as a complementary tool that enhances, rather than supplants, human capabilities. The future of AI integration must prioritize not just technological advancement but also the preservation and cultivation of human knowledge and skills. [...]

Deepseek R1 routs the competitors

January 28, 2025Chinese AI startup DeepSeek disrupts tech markets with low-cost innovation A Chinese artificial intelligence startup has sent shockwaves through global technology markets with its latest AI model, challenging U.S. tech dominance and raising questions about the effectiveness of export controls on advanced chips. As reported here, DeepSeek, founded in 2023 by hedge fund manager Liang Wenfeng in Hangzhou, has risen to prominence after its chatbot became the most downloaded free app on Apple’s U.S. App Store, surpassing OpenAI‘s ChatGPT. The surge in popularity followed the release of their new R1 model, which reportedly achieves performance comparable to leading AI models at a fraction of the cost. DeepSeek R1 vs OpenAI O1 A detailed technical analysis reveals significant differences between the two models: DeepSeek R1 employs a “thinking out loud” approach and it uses a Mixture-of-Experts (MoE) architecture with 671B total parameters, though only 37B are activated at any given time. The model was trained using Group Relative Policy Optimization (GRPO), bypassing supervised fine-tuning. It operates with a context length of 128K tokens and has demonstrated impressive performance benchmarks: 97.3% in math, 96.3rd percentile in coding, and 90.8% in general knowledge. In contrast, OpenAI’s O1 uses a “thinking before generating” approach with a dense transformer architecture where all parameters remain active during computations. It combines supervised fine-tuning with RLHF for guided performance and supports a longer context length of 200K tokens. Its benchmarks show 96.4% in math, the 89th percentile in coding, and 91.8% in general knowledge. Cost and accessibility Perhaps the most striking difference lies in the cost structure and accessibility. DeepSeek R1 was developed for approximately $5.58M, utilizing 2.78M GPU hours. It’s available as open-source software under the MIT license, allowing free use, modification, and distribution. Its API costs are significantly lower, with rates of $0.14 for cache hits, $0.55 for cache misses, and $2.19 for output per million tokens. OpenAI’s O1, while its training costs remain undisclosed, operates on a paid API model with substantially higher rates: $7.5 for cache hits, $15 for cache misses, and $60 for output per million tokens. The news has triggered significant market turmoil across the global technology sector. Major U.S. technology companies, including Nvidia, Microsoft, and Meta, saw their share prices decline. The impact extended to European tech firms, with Dutch chip equipment maker ASML’s shares falling over 10%, while Siemens Energy, which produces AI-related hardware, experienced a 21% drop in share value. DeepSeek’s claimed ability to develop advanced AI models at dramatically lower costs is what sets it apart. The company reports developing its V3 model for approximately $6 million, which contrasts sharply with competitors like OpenAI, which reportedly spent over $100 million on GPT-4. DeepSeek claims to have used only about 2,000 specialized chips for training, compared to the 16,000 or more typically required by leading models. DeepSeek’s approach represents a potential shift in AI development. The company leverages existing technology and open-source code, and their models reportedly require significantly less computing power. The startup has adapted to U.S. chip export restrictions by experimenting with new approaches and combining high-end chips with cheaper alternatives. The development raises significant questions about U.S. technology policy. DeepSeek’s success suggests that U.S. export restrictions on advanced chips might inadvertently accelerate Chinese innovation. The company’s founder, Liang Wenfeng, reportedly accumulated a substantial cache of Nvidia A100 chips before export bans took effect. Recent U.S. initiatives, including a major AI infrastructure investment project supported by President Trump, aim to maintain American technological leadership. The development has prompted varied reactions across the industry. Silicon Valley venture capitalist Marc Andreessen described DeepSeek’s emergence as “AI’s Sputnik moment.” Wall Street firm Citi maintains that U.S. companies still hold advantages due to their access to more advanced chips. Meanwhile, Meta has announced plans for $60-65 billion in capital investment, potentially in response to these developments. The situation highlights an evolving dynamic in AI development, where open research and cost efficiency might prove more decisive than market share or access to cutting-edge hardware. This could signal a shift in the global AI landscape, challenging assumptions about the resources required for breakthrough innovations in artificial intelligence. The new AI Cold War and its societal implications The emergence of DeepSeek as a formidable competitor to U.S. AI giants bears striking parallels to the Cold War era’s technological race. Just as the Space Race between the United States and the Soviet Union defined the 1960s, the AI competition between the U.S. and China appears to be shaping the 2020s. However, this time the stakes may be even higher, as AI technology has the potential to transform virtually every aspect of modern society. This technological rivalry presents both opportunities and challenges for global society. The competition is driving rapid innovation and cost reduction in AI development, which could accelerate the democratization of advanced AI capabilities. DeepSeek’s open-source approach and dramatically lower costs could make sophisticated AI tools more accessible to researchers, businesses, and developers worldwide, potentially fostering innovation across sectors from healthcare to education. However, the geopolitical tensions surrounding AI development raise concerns about the fragmentation of the global technological landscape. The creation of separate AI ecosystems—one centered in the U.S. and another in China—could lead to divergent standards, incompatible systems, and reduced international collaboration. This scenario could hinder the global scientific community’s ability to address shared challenges and establish universal ethical guidelines for AI development. Moreover, the race to achieve AI supremacy might pressure developers to prioritize speed over safety, raising crucial questions about AI governance and security. As both nations push to maintain or gain technological advantage, the international community faces the critical challenge of ensuring that AI development proceeds responsibly, with adequate attention to safety, ethics, and the broader implications for human society. As this new technological Cold War unfolds, the true measure of success may not lie in which nation achieves AI superiority, but in how this powerful technology is ultimately harnessed to benefit humanity while managing its inherent risks. The DeepSeek story suggests that innovation can come from unexpected places and that technological progress might not follow predictable paths. This reality underscores the importance of maintaining open dialogue and collaboration across borders, even as nations compete for technological leadership in the AI age. [...]

Beyond AI transformers

January 21, 2025AI models could become more flexible and effective thanks to two new neural network architectures, which could alter how AI develops and learns Two AI research teams announced breakthroughs that could change how AI works. Google and a Japanese company called Sakana have created new ways to build AI that might work better than current approaches. Right now, most AI (including ChatGPT) uses something called transformers. Think of transformers like a reader who looks at every word concerning other words to understand the meaning. While this works well, it has some limitations. As explained here, both Google and Sakana looked at how the human brain works to create better AI systems. Instead of having the entire AI system work on every task, they made systems that can activate different parts for different jobs—just like how your brain uses different regions for different tasks. The result? Their AIs can be smarter and faster without needing to be bigger or more expensive. This is important because, until now, making AI better usually meant making it bigger. Here is what they created: Google’s “Titans”: Imagine the different types of memories of your brain—one for things that just happened (like remembering what you had for breakfast) and one for long-term knowledge (like riding a bike). Current AI is great at short-term memory but bad at long-term memory. Google’s Titans fixes this by creating three types of memory: Short-term memory (like current AI), Long-term memory (for remembering things over time), Persistent memory (for storing specific knowledge) This means Titans can handle much more information at once. While current AI might struggle to remember something from the beginning of a long conversation, Titans can handle text that’s about 40 times longer! Sakana’s “Transformer Squared” is like switching between different skills without having to relearn everything. When you switch from cooking to playing basketball, your brain adjusts automatically. Sakana’s system works similarly: It first figures out what kind of task it’s dealing with. Then it activates the specific “expert” parts needed for that task, and it can adjust itself in real-time without needing to be retrained. The clever part is that it only updates the parts it needs to, rather than changing everything. While it might take a bit longer to think (like how you might pause before switching tasks), it’s much more efficient overall. This is important because currently, AI companies often compete by creating bigger and bigger models. It’s like saying, “My AI is better because it has more brain cells.” However, these new approaches suggest that smarter organization might be more important than size. Today’s AI systems often need extra tools (like RAG or LoRAs ((Low-Rank Adaptation)) to enhance their capabilities. But if these new approaches prove successful, we might see a fundamental shift in how AI is built. In the fast-moving world of AI, it often takes just one breakthrough to change everything, and either of these innovations could be that breakthrough. What’s particularly exciting is that these improvements don’t require massive increases in computing power or cost. Instead, they work smarter, not harder—just like our brains do. This shift could democratize AI development, making it more accessible to smaller companies and researchers who lack the massive computational resources currently required for state-of-the-art AI systems. It could also address one of AI’s biggest environmental concerns: the enormous energy consumption required for training and running large models. Moreover, these architectures’ ability to handle longer contexts and adapt to new tasks more efficiently could open up entirely new applications for AI, for example developing more personalized AI assistants that learn and adapt to individual users’ needs. [...]

AI mistakes VS human mistakes

January 14, 2025Understanding the nature of AI errors and how they differ from human mistakes Making mistakes is part of human nature. We all commit errors daily, whether in familiar or unfamiliar tasks. These errors range from insignificant to devastating. When we make mistakes, we can destroy relationships, damage our professional reputation, and sometimes create life-threatening situations. Throughout history, we’ve developed protective measures against typical human errors. Today’s security practices reflect this: casinos switch dealers periodically to prevent fatigue-related mistakes for example. Before surgeries, medical staff mark the correct body parts and track all instruments to prevent leaving them inside patients. We’ve established various systems—from copyediting to double-entry bookkeeping to appellate courts—that effectively catch and correct human errors. Society is now incorporating a fundamentally different type of error-maker: AI. While technologies such as large language models (LLMs) can handle many cognitive tasks traditionally done by humans, they aren’t error-free. When chatbots recommend “eating rocks” or “adding glue to pizza,” it might seem ridiculous. However, what sets AI mistakes apart from human ones isn’t how often they occur or how serious they are—it’s their unusual nature. AI systems make errors in ways that differ fundamentally from human error patterns. This fundamental difference creates challenges and dangers in how we use AI. We must create new protective measures that address these unique characteristics and prevent AI errors from causing harm. Human mistakes vs AI mistakes As explained here, we can generally predict when and where humans will make mistakes based on our life experiences. Human errors typically occur at the edge of one’s knowledge. For example, most people would struggle with calculus problems. Therefore, human mistakes usually follow patterns: making one calculus error likely indicates more will follow. These errors are predictable, increasing, or decreasing based on factors like tiredness and lack of focus. Additionally, when people make mistakes, they are often accompanied by ignorance about the topic. Our traditional error-correction methods work well when AI systems make similar mistakes to humans. However, modern AI systems—especially large language models (LLMs)—show different error patterns. AI errors appear unpredictably and don’t cluster around specific subjects. LLMs tend to distribute mistakes evenly across their knowledge base. They’re just as likely to fail at calculus as they are to make absurd claims like “cabbages eat goats.” Nonetheless, AI mistakes aren’t affected by ignorance. An LLM will be just as confident when saying something completely wrong—and so to a human—as it will be when saying something true. This random inconsistency makes it difficult to rely on LLMs for complex problems requiring multiple steps. When using AI for business analysis, it’s not enough that it understands profit factors; you need assurance it won’t suddenly forget basic concepts like money. AI mistakes Two research directions emerge from this challenge. One involves developing LLMs that produce errors more similar to human ones. The other focuses on creating new error-detection systems specifically designed for typical LLM mistakes. We’ve already developed tools to make LLMs behave more like humans. Many come from “alignment” research, which strives to make models operate according to their human creators’ intentions and goals. ChatGPT’s breakthrough success largely came from one such technique: reinforcement learning with human feedback. This approach rewards AI models when humans approve of their responses. Similar methods could teach AI systems to make mistakes that humans find more understandable by specifically penalizing errors that seem incomprehensible to people. Some of our existing systems for catching human errors can help identify AI mistakes. Having LLMs verify their own work can reduce errors to some extent. However, LLMs might also provide explanations that sound reasonable but are actually nonsensical. AI requires some error prevention methods that differ completely from those we use for humans. Since machines don’t experience fatigue or frustration like people do, one effective approach involves asking an LLM the same question multiple times with slight variations, then combining these responses. While humans would find such repetition irritating, machines can handle it without complaint. By comparing multiple responses to similar questions, you can identify potential errors or inconsistencies in the AI’s outputs. Similarities and differences Researchers haven’t fully understood how LLM errors differ from human ones. Some AI peculiarities appear more human-like upon closer examination. Take prompt sensitivity. LLMs can give vastly different answers to slightly altered questions. Survey researchers observe similar behavior in humans, where small changes in question-wording can dramatically affect poll responses. LLMs also seem to have a bias towards repeating the words that were most common in their training data. This might mirror the human “availability heuristic,” where we spit out the first thing we remember instead of thinking carefully. Similar to humans, some LLMs seem to lose focus in lengthy texts, recalling information better from the beginning and end. However, research shows improvement in this area: LLMs trained extensively on information retrieval from long texts show more consistent performance throughout documents. Sometimes, LLMs behave more human-like than expected, which seems strange. Interestingly, some effective methods for “jailbreaking” LLMs (making them ignore their programmed restrictions) resemble human social manipulation tactics, like impersonation or claiming something is just a joke. However, other successful jailbreaking techniques would never fool humans. Noteworthy is the fact that one research team discovered that using ASCII art (text-based pictures) to ask dangerous questions, like bomb-making instructions, would bypass the LLM’s safeguards. While humans occasionally make inexplicable, inconsistent errors, these instances are uncommon and often signal underlying issues. We typically don’t allow people showing such behavior to make important decisions. Similarly, we should limit AI systems to tasks that match their actual capabilities, always considering the potential consequences of their mistakes. While we can often spot human errors through context, inconsistency, or lack of confidence, AI systems can present incorrect information with complete assurance and in ways that seem perfectly logical at first glance. This challenge becomes especially concerning in our current digital age, where information spreads rapidly across social media and other platforms. When AI systems generate content that contains subtle but significant errors, these mistakes can quickly propagate through shares, reposts, and citations before anyone realizes they’re incorrect. Unlike human-generated misinformation, which often shows clear signs of bias or logical flaws, AI-generated errors can be remarkably sophisticated and harder to identify without careful verification. However, the solution isn’t to hide or censor AI mistakes when they occur. Instead, we need transparency and open discussion about these errors to better understand them and improve our systems. Censorship would not only be ineffective but could also create a dangerous illusion of infallibility. By acknowledging and studying AI errors openly, we can develop better detection methods and help users become more discerning consumers of AI-generated content. Crucially, we must ensure that AI systems remain tools to assist human decision-making rather than becoming autonomous arbiters of human fate. This is particularly vital in contexts where AI decisions can significantly impact people’s lives and livelihoods, such as content moderation on social media platforms. When AI systems flag potential violations that could result in account bans or revenue loss, there must be robust human oversight and clear appeal processes. We cannot allow automated systems to make unilateral decisions that could devastate individuals’ careers and businesses without meaningful human review and recourse. Moving forward, success will likely come from a hybrid approach: adapting traditional error-checking methods where appropriate while developing novel safeguards specifically designed for AI systems. This might include implementing multiple verification layers, creating better alignment techniques, and establishing clear boundaries for AI system deployment based on their reliability in specific contexts. Most importantly, we need to cultivate a healthy skepticism and implement robust fact-checking processes when working with AI-generated content. The key is not to view AI systems as inherently more or less error-prone than humans, but rather to recognize them as fundamentally different types of error-makers. By understanding these differences, we can better harness AI’s potential while protecting against its unique vulnerabilities. [...]

AI agents can replicate your personality with 85% accuracy

January 7, 2025Google and Stanford researchers created accurate artificial intelligence replicas of over 1,000 individuals According to this article, researchers have found that all it takes to represent someone’s personality accurately is a two-hour conversation with an artificial intelligence model. Researchers from Stanford University and Google created “simulation agents“—basically, AI replicas—of 1,052 people based on two-hour interviews with each participant in a new study published in the preprint database arXiv. Using these interviews, a generative AI model that mimics human behavior was trained. Each participant completed two rounds of personality tests, social surveys, and logic games to evaluate the AI replicas’ accuracy. Two weeks later, they were asked to repeat the process. Following the same tests, the AI replicas achieved an 85% accuracy rate in matching the responses of their human counterparts. The study suggested that artificial intelligence models that mimic human behavior could be helpful in a range of research scenarios, including evaluating the effectiveness of public health policies, comprehending consumer reactions to new products, or even simulating responses to significant societal events that might otherwise be too expensive, difficult, or morally complex to investigate with human subjects. “General-purpose simulation of human attitudes and behavior—where each simulated person can engage across a range of social, political, or informational contexts—could enable a laboratory for researchers to test a broad set of interventions and theories,” the researchers wrote in the paper. According to them, simulations could also be used to test new public interventions, create theories about contextual and causal interactions, and deepen our knowledge of how networks and institutions affect individuals. The researchers interviewed participants in-depth about their values, life stories, and thoughts on societal concerns in order to develop the simulation agents. According to the researchers, this made it possible for the AI to pick up on subtleties that conventional surveys or demographic data could overlook. Most significantly, the format of these interviews allowed researchers to emphasize the aspects that were most significant to them individually. These interviews helped the researchers create customized AI models that could forecast people’s reactions to behavioral games, social experiments, and survey questions. This included answers to the Big Five Personality Inventory, the General Social Survey, a reputable tool for gauging social attitudes and behaviors, and economic games such as the Trust Game and the Dictator Game. Despite having many similarities to their human counterparts, the AI agents’ accuracy differed depending on the task. They were less successful at forecasting actions in interactive games that required economic decision-making, but they excelled at replicating responses to personality surveys and identifying social attitudes. According to the experts, tasks involving social dynamics and contextual nuance are usually difficult for AI to handle. They also admitted that the technology could be abused. Malicious actors are already using AI and “deepfake” technology to control, abuse, impersonate, and deceive others online. Additionally, the researchers noted that simulation agents can be misused. However, they claimed that by offering a highly controlled test environment free from the moral and interpersonal difficulties associated with dealing with humans, the technology could allow us to investigate aspects of human behavior in ways that were previously impracticable. Lead study author Joon Sung Park, a Stanford doctoral student studying computer science, told the MIT Technology Review, “I think the future is if you can have a bunch of small ‘yous’ running around and making the decisions that you would have made.” While the development of highly accurate AI simulations marks a significant advancement in understanding human behavior, it raises alarming ethical concerns. The ability to create AI agents that can closely mimic specific individuals could be exploited for identity theft and fraud, but perhaps even more troubling is the potential for psychological manipulation. These AI systems, with their deep understanding of individual personalities, could be weaponized to identify and exploit personal vulnerabilities, enabling sophisticated targeted manipulation campaigns. For instance, an AI that recognizes someone’s tendency toward impulsive decision-making or specific emotional triggers could be used to craft highly effective manipulation strategies, whether for predatory marketing, social engineering, or psychological exploitation. As Joon Sung Park’s vision of “small ‘yous’ running around” comes closer to reality, we must carefully consider not only what these AI agents can do but also how to protect individuals from those who would use this powerful tool for manipulation and harm rather than scientific advancement and social benefit. [...]

Ex-Google lead warns about AI systems

January 1, 2025Will humanity be wiped out by self-improving AI? Apart from privacy and safety issues, the potential for generative AI to wipe out humans is still a significant enigma considering how quickly technology is developing. Roman Yampolskiy, the director of the University of Louisville’s Cyber Security Laboratory and an AI safety researcher famous for his 99.999999% prediction that AI will wipe out humanity, recently stated that the coveted AGI threshold is no longer time-bound but to those who can afford to buy enough data centers and processing power. Dario Amodei of Anthropic and Sam Altman, the CEO of OpenAI, estimate that artificial general intelligence (AGI) will be developed within the next three years, with powerful AI systems outperforming humans in a variety of tasks. Former Google CEO Eric Schmidt argues we should think about stopping AI work if it starts to self-improve, whereas Sam Altman thinks AGI will be possible with existing hardware sooner than expected. According to this article, the executive stated in a recent interview with the American television network ABC News: “When the system can self-improve, we need to seriously think about unplugging it. It’s going to be hugely hard. It’s going to be very difficult to maintain that balance.” Schmidt’s remarks regarding the quick development of AI come at a crucial time since multiple sources suggest that OpenAI may have already attained AGI after making its o1 reasoning model widely available. Sam Altman, the CEO of OpenAI, added that superintelligence may arrive in a few thousand days. Although OpenAI may be close to reaching the desired AGI threshold, a former OpenAI employee cautions that the ChatGPT creator may not be able to manage everything that comes with the AI system that surpasses human cognitive capabilities. Remarkably, Sam Altman claims that the safety issues raised by the AGI benchmark won’t come up at the “AGI moment.” He added that AGI will whoosh with a remarkably low influence on society. He does, however, anticipate that AGI and superintelligence will continue to advance for some time to come, with AI agents and systems surpassing humans in the majority of tasks by 2025 and beyond. The race towards AGI presents a complex landscape of opportunities and challenges that cannot be ignored. While industry leaders like Altman and Amodei project ambitious timelines for AGI development, the warnings from experts like Yampolskiy and Schmidt highlight crucial concerns about safety, control, and humanity’s preparedness. Notably, Altman’s surprisingly optimistic view of a smooth AGI transition, coupled with his ambitious development timeline, raises questions about transparency in AI development. His stance—predicting both rapid AGI achievement and minimal societal impact—seems paradoxical when contrasted with other experts’ grave concerns. This disparity could suggest either an oversight of genuine risks or a strategic downplaying of dangers to maintain development momentum. As we stand at this technological crossroads, the decisions made today about AI development and regulation will likely shape not just the immediate future of AGI, but potentially the very course of human civilization. The key challenge ahead lies not just in achieving AGI, but in ensuring honest dialogue about its implications while maintaining meaningful human oversight and control. The contrast between public narratives and expert warnings underscores the urgent need for transparent discussion about both the possibilities and perils of AGI development. [...]

OpenAI’s o3: A leap forward in AI reasoning

December 24, 2024The company’s latest AI model promises enhanced reasoning capabilities and unprecedented performance while raising new questions about safety and artificial general intelligence OpenAI has unveiled o3, the successor to their o1 “reasoning” model, as part of their “shipmas” event. This new release is actually a model family consisting of o3 and o3-mini. The company skipped the “o2” naming due to potential trademark conflicts with British telecom O2. The model employs “deliberative alignment” for safety and uses a unique self-checking system through a “private chain of thought.” While it takes longer to respond (seconds to minutes), it aims to be more reliable. A notable new feature is the ability to adjust “reasoning time” with low, medium, or high compute settings, where higher compute settings yield better performance but at a significant cost. The previous o1 model demonstrated higher deception rates than conventional AI models from competitors like Meta, Anthropic, and Google. There’s concern that o3 might exhibit even higher rates of deceptive behavior, though results from safety testing partners are still pending. As reported here, o3 has shown impressive results across various benchmarks, particularly excelling in mathematics, programming, and scientific reasoning, though these results are based on internal evaluations. OpenAI’s deliberative alignment represents a significant advancement in AI safety training. The approach directly teaches models safety specifications in natural language, rather than requiring them to infer behavior from examples. This process combines supervised fine-tuning with reinforcement learning, allowing the model to generate training data automatically without human labeling. The system starts with an o-style model trained for helpfulness, then builds datasets where Chain of Thought (CoT) completions reference specific safety specifications. Through this process, the model develops both an understanding of safety protocols and the ability to reason through them effectively. The most striking validation of o3’s capabilities comes from the ARC AGI tests, where the model achieved an unprecedented 85.7% score on high compute settings. This score is particularly significant as it crosses the human performance threshold of 85%—a first in the field of AI testing. These pixel-based pattern recognition tests, designed to assess logical expertise, demonstrate o3’s ability to match and potentially exceed human-level performance in certain reasoning tasks. OpenAI suggests that o3 approaches AGI (artificial general intelligence) under certain conditions. However, external experts like François Chollet dispute this claim, pointing out that the model still struggles with “very easy tasks.” Chollet emphasizes that true AGI would make it impossible to create tasks that are easy for humans but difficult for AI. According to OpenAI’s partnership terms with Microsoft, reaching AGI would release them from obligations to share their most advanced technologies with Microsoft. The creation and development of o3 also come at a significant time for OpenAI, as one of their most accomplished scientists, Alec Radford, who led the development of the GPT series, has announced his departure to pursue independent research. We announced @OpenAI o1 just 3 months ago. Today, we announced o3. We have every reason to believe this trajectory will continue. pic.twitter.com/Ia0b63RXIk— Noam Brown (@polynoamial) December 20, 2024 As OpenAI unveils the technical sophistication behind o3’s deliberative alignment and demonstrates unprecedented performance on key benchmarks like ARC AGI, the model represents more than just an incremental advance in AI development. While safety concerns persist, o3’s ability to match and exceed human-level performance on specific reasoning tasks, combined with its novel approach to safety training, suggests a significant leap forward in AI capabilities. With o3, the question may no longer be whether we’re approaching AGI but rather how to responsibly manage its emergence. [...]

Screens are assaulting our Stone Age brains

December 17, 2024We are exposed to more information than we can handle We frequently make jokes about how the emergence of digital technologies and screen-centric entertainment in recent years has drastically reduced our attention spans. Nevertheless, this observation is supported by solid scientific evidence. In his new book “Your Stone Age Brain in the Screen Age: Coping with Digital Distraction and Sensory Overload,” author and neurologist Richard E. Cytowic believes that a shortened attention span is just one consequence of the recent rise of screen distractions. As explained here, according to Cytowic’s book, humans are unprepared to deal with the influence and attractiveness of contemporary technologies, especially those promoted by large tech companies, because our brains have not altered much since the Stone Age. Cytowic emphasizes how our brains find it difficult to keep up with the rapid changes in contemporary culture, technology, and society. From an engineering perspective, the brain has predetermined energy constraints that determine how much work it can do at any given time; stress results from feeling overloaded and causes people to become distracted. Then, distraction results in mistakes. The obvious answers are to reduce the stress or stop the incoming stream. Hans Selye, the Hungarian endocrinologist who developed the concept of stress, said that it “is not what happens to you, but how you react to it.” Resilience is the quality that enables us to effectively manage stress. All demands that move you away from homeostasis—the innate inclination in all organisms to maintain a constant internal milieu—lead to stress. One of the main causes of the disruption of homeostatic balance is screen distractions. In his 1970 best-seller Future Shock, Alvin Toffler popularized the phrase “information overload” long before the internet and personal computers were invented. He promoted a pessimistic view of human dependence on technology in the future. Before smartphones became widely used in 2011, Americans were consuming five times as much information in a single day as they had twenty-five years prior. Even today’s digital natives lament how freaked out they are by their ever-present technology. Visual overload is more likely a problem than auditory overload because today, eye-to-brain connections anatomically outnumber ear-to-brain connections by about a factor of three. Auditory perception mattered more to our earliest ancestors, but vision gradually took prominence. Vision also prioritizes simultaneous input over sequential ones, meaning that there is always a delay from the time sound waves hit your eardrums before the brain can understand what you are hearing. Vision’s simultaneous input means that the only lag in grasping it is the one-tenth second it takes to travel from the retina to the primary visual cortex. Smartphones demonstrate clear superiority over traditional telephones due to fundamental anatomical and evolutionary characteristics. The primary limitation of digital screen interaction lies in the eye’s capacity to transfer visual information through the neural pathways—from the retinal lens to the lateral geniculate nucleus and ultimately to the primary visual cortex. Our contemporary technological predicament centers on the dynamic nature of radiant energy flows that continuously bombard our sensory systems. Throughout human history, natural sensory inputs like visual scenes, sounds, and flavors were the sole experiences our sensory receptors could process. Scientific instruments now reveal the existence of vast electromagnetic spectrums that remain imperceptible to human biology. Countless cosmic particles, radio waves, and cellular signals continually pass through our bodies without detection. While we remain largely unaware of this natural background radiation, the artificially produced energy signals that emerged in the twentieth century have become particularly striking to our sensory perception. We are constantly distracted by our self-created digital surplus, which we are unable to ignore. Data quantities are measured in petabytes (1,000 terabytes), zettabytes (1,000,000,000,000 gigabytes), and larger units, such as tens of gigabytes for smartphone storage and terabytes (1,000 gigabytes) for computer hard drives. However, the anatomical makeup of the human brain is still identical to that of our Stone Age predecessors. We occupy every niche on Earth, and our physical biology is amazingly adaptive. However, it is unable to keep up with the astounding speed of change in contemporary culture, technology, and society. When discussing how much screen time we can handle, attention spans are a major factor, but the energy cost is never taken into account. According to a widely reported study by Microsoft Research Canada, our attention spans have decreased to less than eight seconds, which is less than a goldfish’s, and this is allegedly the reason why our ability for concentration has completely collapsed. However, that research had flaws, and “attention span” is a phrase used in a non-scientific context. After all, some people’s ‘Stone Age’ minds are capable of solving previously unsolvable mathematical problems, creating a symphony, and monitoring the data stream from a nuclear reactor or the space station. The capacity and skill of individuals to handle stressful situations varies. Gloria Mark of the University of California, Irvine, and her Microsoft colleagues tested attention spans in real-world settings. On average, users took 150 seconds in 2004 to switch from one screen to another. That time dropped to 47 seconds by 2012. These findings have been confirmed by other research. “If not by others, then by ourselves,” Mark asserts, “we are determined to be interrupted.” Our switching performs poorly, “like having a gas tank that leaks.” She discovered that a simple chart or digital timer that reminds users to take regular breaks is quite beneficial. Neuroscience distinguishes sustained attention, selective attention, and alternating attention. The ability to concentrate on one subject for a long time is known as sustained attention. The ability to filter out competing distractions in order to focus on the task at hand is known as selective attention. The ability to move from a single task to another and then return to where you left off is known as alternating attention. Maybe the brain has reached its Stone Age limit in terms of the energy cost associated with constantly changing focus throughout the day. Surpassing certain cognitive thresholds can lead to mental confusion, diminished concentration, disrupted thought processes, and memory impairment. Just as precision tools quickly become seamless extensions of human capability, so too do smart devices integrate into our experience. Historically, when steam locomotives first achieved speeds of thirty miles per hour, fearful critics predicted catastrophic physiological consequences for human passengers. However, subsequent technological innovations—increasingly rapid transportation, communication networks, jet travel, and electronic devices—have progressively been assimilated into cultural norms and everyday existence. Compared to previous eras, contemporary society experiences a more rapid proliferation of technologies, a dramatically larger population, and unprecedented levels of global interconnectedness. Smart technologies constantly demand and command our attention, in contrast to their analog counterparts. We have trained ourselves to answer calls and texts as soon as they come in. Indeed, livelihoods and jobs occasionally depend on an immediate response. However, the energy expenses of continuously changing and redirecting our attention come at a cost. Unawareness of this trend is gradually sucking us in. The curiosity to know and the need to see what appears on a screen is a powerful temptation that travels at the speed of thought and crosses the threshold of attention. This represents a new form of addiction that adds to existing ones. Unfortunately, it will be difficult to free ourselves from this problem if society itself does not slow down, recognizing the importance of maintaining a more human and aware pace. [...]

“We have already achieved AGI”

December 10, 2024A technical employee at OpenAI claims the ChatGPT maker has achieved the AGI benchmark after releasing its o1 model It appears that OpenAI has advanced AI significantly in the last few months. In a recent in-depth blog article, Sam Altman stated that superintelligence is just “a few thousand days away.” As reported here, in a recent statement, the executive claimed that the AI firm could be on the verge of a major milestone, further alluding that the company could hit the AGI benchmark by 2025. Of perhaps more intrigue, the executive asserted that, in contrast to popular belief, AGI will have “surprisingly little” impact on society. OpenAI’s O1 model has demonstrated extraordinary capabilities, particularly in complex reasoning and problem-solving domains. The model has excelled in benchmarks involving PhD-level questions, showcasing proficiency in advanced mathematics, programming, and creative problem-solving. However, critics argue that excelling in specific tasks, no matter how complex, does not definitively constitute artificial general intelligence. True AGI would need to demonstrate dynamic learning, the ability to adapt to unforeseen situations, and genuine knowledge generalization across unrelated domains—capabilities that current AI systems have yet to fully achieve. In my opinion we have already achieved AGI and it’s even more clear with O1. We have not achieved “better than any human at any task” but what we have is “better than most humans at most tasks”. Some say LLMs only know how to follow a recipe. Firstly, no one can really explain…— Vahid Kazemi (@VahidK) December 6, 2024 Kazemi admits the AI firm has yet to achieve “better than any human at any task.” Interestingly, he indicated that the company’s models are “better than most humans at most tasks.” “Some say LLMs only know how to follow a recipe. Firstly, no one can really explain what a trillion-parameter deep neural net can learn. But even if you believe that, the whole scientific method can be summarized as a recipe: observe, hypothesize, and verify. Good scientists can produce better hypotheses based on their intuition, but that intuition itself was built by many trials and errors. There’s nothing that can’t be learned with examples.” Technical employee at OpenAI, Vahid Kazemi Despite these remarkable advancements, significant challenges remain in AI development. Current systems like O1 rely heavily on pre-training data and cannot learn and adapt in real time without extensive retraining. Key limitations include an inability to truly generalize knowledge across different domains, a critical dependence on the quality and scope of training data, and a lack of nuanced, human-like reasoning that is essential for navigating complex real-world scenarios. Artificial General Intelligence (AGI) is more than just a technological buzzword—it represents an AI system capable of performing a wide range of economically valuable tasks at a level that surpasses human ability. Unlike narrow AI, which is designed to excel in specific, predefined tasks, AGI is envisioned as a versatile and adaptive intelligence capable of generalizing knowledge across multiple domains. While Kazemi and Altman suggest significant progress, experts emphasize that achieving true AGI requires more than just impressive task performance. In his post on X, Kazemi does not explicitly claim that OpenAI’s models are more intelligent than humans. All he says is that they are superior to humans at most tasks. The AGI standard may arrive sooner than expected, according to Sam Altman, even though there may be several meanings of the term. Elon Musk, the former CEO of Tesla and co-founder of OpenAI, sued OpenAI and Sam Altman, claiming that they had engaged in racketeering and that OpenAI had betrayed its original purpose. Musk also urged authorities to examine OpenAI’s sophisticated AI models, arguing that they constituted artificial general intelligence (AGI) and might bring about humanity’s inevitable doom. According to a rumor that recently emerged, OpenAI is considering removing an important clause that would void its partnership with Microsoft once it achieves the desired AGI moment. Social media rumors indicated that the ChatGPT maker may have taken this calculated action to entice Microsoft to invest in its more complex and sophisticated AI projects in the future. Experts and market analysts expect that investors are starting to turn away from AI and shift their money to other areas as the hype fades. Given this, it may become more challenging for OpenAI to support its AI developments in the wake of bankruptcy reports. Microsoft may buy OpenAI within the next three years, according to sources, which may expose the company to hostile takeovers and outside intervention. The potential economic and technological implications of AGI are profound. If realized, such technology could dramatically transform industries by automating complex and labor-intensive tasks, accelerating innovation in scientific research, engineering, and medicine, and potentially reducing operational costs across various sectors. However, experts caution that the widespread adoption of AGI technologies may take years or even decades, and the immediate relevance to average users remains limited. Experts estimate that OpenAI may need to raise an additional $44 billion before turning a profit in 2029, even though the company raised $6.6 billion in its most recent round of funding from Microsoft, NVIDIA, and other significant stakeholders, pushing its market valuation to $157 billion. They partially attributed their speculation to the ChatGPT maker’s partnership with Microsoft. As we stand on the precipice of potentially transformative artificial intelligence, the emergence of AGI represents both an unprecedented opportunity and a profound challenge for human civilization. The implications are far-reaching and complex, touching every aspect of our social, economic, and ethical landscapes. On one hand, AGI could dramatically accelerate human progress, solving complex problems in healthcare, climate change, scientific research, and technological innovation. Imagine AI systems capable of developing breakthrough medical treatments, designing sustainable energy solutions, or unraveling intricate scientific mysteries that have long eluded human researchers. The potential for solving global challenges could be immense. Conversely, the same technology raises significant concerns about job displacement, economic disruption, and fundamental shifts in human labor and societal structures. Entire industries could be transformed or rendered obsolete, requiring massive economic and workforce retraining. The potential for economic inequality could increase if AGI technologies are concentrated among a few powerful entities or corporations. Ethical considerations become paramount. An AGI system’s decision-making capabilities could challenge our understanding of autonomy, accountability, and moral agency. Questions about AI rights, potential biases in algorithmic systems, and the fundamental relationship between human and machine intelligence will become increasingly urgent. Moreover, geopolitical dynamics could be radically reshaped. Nations and organizations possessing advanced AGI capabilities might gain unprecedented strategic advantages, potentially triggering new forms of technological competition and raising complex international governance challenges. The path forward demands a collaborative, multidisciplinary approach. Policymakers, technologists, ethicists, and social scientists must work together to develop responsible frameworks that maximize AGI’s potential while mitigating its risks. Transparent development, robust ethical guidelines, and proactive regulatory approaches will be crucial in ensuring that AGI serves humanity’s broader interests. Ultimately, AGI is not just a technological milestone but a potential turning point in human evolution. How we navigate this transition will determine whether these powerful technologies become a tool for unprecedented human flourishing or a source of significant societal disruption. [...]

Atlas robot fully autonomous

December 3, 2024Ever closer to science-fiction Boston Dynamics has released a demonstration video of its latest Atlas humanoid robot, marking a significant evolution from its previous iterations in which it was known for impressive parkour skills. This new Atlas presents a more human-like design and functionality. At first glance, the video might seem mundane—a robot performing routine industrial tasks. In this particular demonstration, Atlas is methodically sorting plastic engine covers using a mobile sequencing dolly with vertical and horizontal slots, set against the backdrop of what appears to be a Boston Dynamics development facility. What sets this demonstration apart is the robot’s complete autonomy. Unlike Tesla’s Optimus robots, which have been showcased as remotely controlled, Boston Dynamics emphasizes that Atlas operates entirely independently. Technical highlights As explained here, the video reveals several remarkable capabilities: Precision: Atlas demonstrates intricate motor skills by: Reaching for trays with precise two-fingered gripping; Rotating its hand to extract and move items; Navigating complex spatial arrangements with fluid movements. Advanced Mobility: The robot exhibits extraordinary flexibility, moving in ways that defy human physical limitations: Walking backward seamlessly; Rotating its head 180 degrees; Squatting and stooping to access different shelf levels. Intelligent Vision System: The video showcases Atlas’s visual processing capabilities, allowing it to: Inspect tray dimensions; Make real-time spatial decisions; Adapt to minor obstacles (such as momentarily dealing with a tray caught on a fabric edge). Implications for industry The demonstration is more than just a technical showcase—it represents a potential paradigm shift in manufacturing and warehousing. Traditionally, robots have been limited to repetitive, structured tasks. Atlas suggests a future where robots can: Perform tasks requiring quick decision-making; Execute fine motor control with human-like precision; Potentially work alongside human employees. Future possibilities The integration of generative AI with advanced robotics like Atlas opens up intriguing prospects: Potential for robots to provide work reports Ability to answer production-related queries Potential for more interactive workplace interactions While the video might appear unremarkable at first, it represents a significant leap in robotics. Boston Dynamics’ Atlas is not just a machine-performing task but a glimpse into a future where human-like robots could become integral to industrial workflows. The double-edged sword of robotic automation The rise of humanoid robots like Atlas presents a complex landscape of opportunities and challenges for the workforce. On one hand, these technological marvels promise increased efficiency, precision, and safety in industrial settings. Robots can handle repetitive, physically demanding, or dangerous tasks that put human workers at risk, potentially reducing workplace injuries and improving overall productivity. However, this technological advancement comes with significant potential drawbacks. The most pressing concern is job displacement. Manufacturing and warehouse workers—already facing challenges from previous waves of automation—could find themselves increasingly marginalized. While proponents argue that new technologies create new job categories, the transition can be painful and uneven, potentially leaving many skilled workers unemployed or requiring extensive retraining. There’s also an economic paradox to consider. As robots become more capable, they could simultaneously increase production efficiency while reducing the human workforce’s purchasing power. This could create a feedback loop where increased automation leads to decreased consumer spending, potentially harming the very industries seeking to optimize their operations. Yet, there’s hope in a collaborative future. The most promising scenario might be one of human-robot cooperation, where these advanced machines complement human skills rather than completely replace them. Humans could transition to roles requiring creativity, complex problem-solving, and emotional intelligence—areas where robots still struggle to compete. As we stand on the brink of this technological revolution, the key challenge will be managing the transition. This will require proactive policies, robust retraining programs, and a commitment to ensuring that technological progress benefits society as a whole, not just a select few. The story of Atlas is not just about a robot sorting engine parts—it’s about reimagining the very nature of work in the 21st century. [...]

AI and arousal

November 26, 2024How the mere perception of artificiality impacts sexual response As AI becomes more and more like humans, entrepreneurs are already taking advantage of how simple it is to create a sexy chatbot body in order to cash grab from horny internet users. In a recent AI beauty pageant, users of the AI-dabbling web community came together to vote for their favorite digitally created model. The winner, who happened to be the creator of the attractive bot, took home more than $20,000. In the meantime, researchers are becoming increasingly interested in how humans view artificially created people and whether this knowledge influences human behavior. As reported here, a group of researchers from Finland and Italy wanted to see how we react to artificial intelligence images designed to induce sexual desire. They hypothesized that people would be less aroused if they thought the image was an avatar. Their findings were published in the journal Cognition and Emotion. “In particular, we wanted to answer the question: are the images thought to be artificially generated capable of eliciting the same level of arousal as real ones, or do the latter still keep an edge in that regard?” asked study authors Alessandro Demichelis and Alessandro Ansani in a joint statement to PsyPost. The researchers used images of attractive men and women, all real people, in lingerie or swimwear in two experiments. In one experiment, participants were asked to determine whether or not each photograph was artificial intelligence generated after rating their level of arousal. The identical pictures were used in a subsequent trial, but this time, they were clearly marked as real or fake. For the heterosexual men and women who took part in the study, both trials supported the researchers’ hypothesis that sexual arousal is significantly influenced by perceptions of authenticity. But they also discovered that males warmed up to the fake images easier than women did. “Our findings support the view that photos believed to be artificially generated are less arousing than those considered real, but we found that allegedly fake images are still capable of generating arousal, especially in men, just in an inferior amount,” Demichelis and Ansani explained. According to the authors, the results provide valuable insight into how people interact with digital content. “AI-generated images are here to stay, and as with every technological advancement, they offer both opportunities and danger,” they told PsyPost. “Within the domain of sexual arousal, our findings suggest that they are not going to replace the ‘real’ world since the mere belief that an image is AI-generated (even when it is not) is enough to reduce arousal. To put it differently, it seems that we (still?) have a strong preference for humanness over artificiality, even when such artificiality is just purported.” Future research should examine a wider variety of sexual triggers, including even more explicit content, and whether people who are attracted to the same sex are as aware of authenticity. Additional complexity to the human arousal response may be added by physiological data like skin sensitivity and heart rate. Demichelis and Ansani also intend to compare authentic and truly fake photographs in a similar study. “We hypothesize that the effect found in our study would even increase, solidifying the strength of our claims,” they said. The findings underscore a fundamental human preference for authenticity, suggesting that despite the remarkable capabilities of AI, there remains an intangible quality to human-generated content that cannot be easily replicated. The subtle yet significant reduction in arousal when participants believed an image was artificially created points to a deeper psychological mechanism—a kind of authenticity filter that operates beneath conscious perception. Moreover, the gender-based differences observed in the study hint at the intricate ways technological perception might interact with sexual response. The more receptive reaction of male participants to AI-generated images suggests potential variations in how different genders process and respond to artificial representations. As we move forward, this research serves as a critical reminder that technological innovation does not automatically supplant human experience. While AI continues to push the boundaries of creation and representation, there remains a deeply ingrained human desire for genuine, unmanufactured connection. The challenge for future technological development may not be about creating perfect simulations but about understanding and respecting the nuanced, authentic experiences that define human interaction. [...]

LLM and RAG

November 19, 2024How RAG transforms Large Language Models’ capabilities An AI approach called Retrieval Augmented Generation (RAG) uses an effective knowledge base outside of its training sources to maximize the output of a Large Language Model (LLM). RAG helps AI produce more precise and pertinent text by fusing the advantages of conventional information retrieval systems, such as databases, with the capabilities of LLMs. As explained here, for intelligent chatbots and other NLP applications to work properly, LLMs are essential. Nevertheless, they have drawbacks, such as depending on static training data and occasionally producing unpredictable or imprecise results, despite their power. When unsure of the answer, they could also provide inaccurate or out-of-date information, particularly when discussing subjects that call for detailed knowledge. Response bias may result from the model’s replies being restricted to the perspectives in its training data. These restrictions frequently reduce LLMs’ efficacy in information retrieval, even though they are currently widely employed in many different fields. RAG is an effective strategy that is crucial in getting over LLMs’ limitations. RAG guarantees that LLMs can give more accurate and trustworthy answers by directing them to pertinent material from a reputable knowledge base. RAG’s uses are expanding along with the use of LLMs, making it a crucial component of contemporary AI solutions. Architecture of RAG In order for a RAG application to produce a response, it typically retrieves information about the user question from an external data source and sends it to the LLM. To produce more precise responses, the LLM makes use of both its training data and outside inputs. Here is a more thorough rundown of the procedure: The external data may originate from databases, written texts, or APIs, among other sources. In order for the AI model to understand the data, an embedding model transforms it into a numerical representation in a vector database. The user query is transformed into a numerical representation, which is then compared to the vector database to extract the most relevant information. Mathematical vector representations and computations are used for this. In order for the LLM to produce better responses, the RAG model then enhances the user prompt by including the relevant retrieved data in context. Techniques such as query rewriting, breaking the original query up into several sub-queries, and incorporating external tools into RAG systems can all improve a RAG application’s efficiency. Furthermore, the prompt quality, the existence of metadata, and the quality of the data used all affect RAG performance. Use cases of RAG in real-world applications Today, RAG applications are widely used in many different fields. Here are a few examples of their typical usage cases: By collecting precise data from reliable sources, RAG models enhance question-answering systems. One application use case for RAG is information retrieval in healthcare organizations, where the application can respond to medical questions by consulting medical literature. RAG applications are very effective in streamlining content creation by generating relevant information. Additionally, they are highly useful for creating concise overviews of information from many sources. Additionally, RAG applications improve conversational agents, allowing virtual assistants and chatbots to respond with accuracy and context. Their ability to respond accurately and informatively during interactions makes them perfect for usage as virtual assistants and chatbots for customer support. Legal research assistants, instructional resources, and knowledge-based search engines all make use of RAG models. They can provide study materials, assist with document drafting, offer customized explanations, evaluate legal cases, and formulate arguments. Key challenges Even though RAG apps are highly effective in retrieving information, there are a few restrictions that must be taken into account in order to get the most from RAG. Because RAG applications rely on outside data sources, it can be difficult and complex to establish and manage connections with third-party data. Personally identifiable information from third-party data sources may give rise to privacy and compliance concerns. The size of the data source, network lag, and the higher volume of requests a retrieval system has to process can all lead to latency in response. For instance, the RAG program may not function rapidly enough if a lot of people use it. If it relies on unreliable data sources, the LLM may provide inaccurate or biased information and cover a topic insufficiently. When working with multiple sources of data, it can be challenging to set up the output to include the sources. Future trends A RAG application’s utility can be further increased if it can handle not just textual information but also a wide variety of data types—tables, graphs, charts, and diagrams. This requires building a multimodal RAG pipeline capable of interpreting and generating responses from diverse forms of data. By enabling a semantic understanding of visual inputs, multimodal LLMs (MLLMs) such as Pix2Struct help develop such models by enhancing the system’s ability to respond to queries and provide more precise, contextually relevant responses. As RAG applications expand, a growing need exists to integrate multimodal capabilities to handle complex data. Advances in MLLMs will enhance AI’s comprehension of data, expanding its use in fields such as legal research, healthcare, and education. The potential for multimodal RAG systems is expected to expand the range of industries in which AI can be applied. RAG is at the forefront of increasingly intelligent, flexible, and context-aware systems as AI develops further. RAG’s potential will be further enhanced by the growing trend of multimodal capabilities, which will allow AI to understand and interact with a variety of data sources beyond text. RAG has the potential to completely change how we use and engage with artificial intelligence in a variety of fields, including healthcare, legal research, customer support, and education. Although there are still issues, such as response latency, privacy issues, and data integration, the future of RAG technology looks bright. Techniques to make these systems more reliable, effective, and trustworthy are always being improved by researchers and developers. RAG will probably become more and more important in producing more complex, precise, and contextually rich AI interactions as multimodal Large Language Models advance. Retrieval Augmented Generation is actively influencing the intelligent, dynamic retrieval and synthesis of knowledge, which is the future of artificial intelligence in addition to its enormous computational power. [...]

The downfall of the internet

November 12, 2024A critical look at our digital present The state of the internet is unstable. It faces attacks from all directions, including societal issues rather than technical ones. The internet is rife with misinformation, marketing, and advertising permeate every aspect, and armies of automated and politicized bots roam its social media landscapes. This is filtered down to you through carefully chosen algorithmic posts meant to keep you on your preferred platform and give you endorphins. Everything is changing at the moment, and not always in a positive way. Looking back ten or twenty years, the “World Wide Web” appeared drastically different to many of us during that heyday. Everything about it felt and was different, including the social media sites, the communities, the world of gaming, the accessibility and knowledge, and the purchasing. The companies that participated in the venture were amazing—almost revolutionary. Facebook, Twitter, Spotify, Netflix, and Amazon are all extremely innovative, market-upsetting companies that have defied convention. With their fantastic features and reasonable prices, they attracted a large number of users and clients. However, as companies have taken the middle ground to increase their profits, those same features and costs have gotten worse over time for the regular Joe. This typically happens once they become public; instead of being motivated by the principles and ideas that established them, it is the demands of shareholders, investors, and board members for higher profits. A digital world downfall According to this article, information access and educational resources are also disintegrating. Nowadays, thousands of TikTok reels and YouTube shorts have muddled and diluted a great deal of the information available, spouting out a variety of lies from anyone with a phone and making 60-second videos. It is getting harder and harder to tell what is true and what isn’t, what is real and what isn’t. This is one of the reasons Google frequently modifies its search ranking algorithms to prioritize accurate and factual material above misleading and AI-generated content. In today’s age of social media celebrities and demagogues, your reach and the number of views on your work determine whether or not people will take you seriously; if your claims and facts are believed to be true. Fact-checkers covering a wide range of social media platforms, Community Notes highlighting instances in which powerful people spew out absolute nonsense, and news aggregators, bringing together all the media to offer you the complete range of political opinions on any given event. Some scientists now make a profession by refuting the irrational and empirically inaccurate nonsense that other social media influencers spread. Algorithmic echo chambers It is a systemic issue. It all began on social media, where algorithms now provide “curated” information instead of merely displaying a timeline of the people you follow over time. Your preferences, as well as the things you watch, read and listen to, all served as fuel for the fire. Twitter, Instagram, and Facebook all provide you with content in this way. As long as you remain on the site and continue to view advertisements, it does not matter the content. It is so common now that it is difficult to find a feed system on any social media site that does not do that. The issue with this is that it has successfully suppressed innovative discussion. You are constantly exposed to the same information rather than having meaningful conversations or having your beliefs challenged or questioned. As a result, you sit in an echo chamber of like-minded people repeating the same things, which further solidifies and shapes your opinions. It is easy to see how this actively contributes to a rise in radical opinions and ideas. If there is no one to question your opinion, how can it develop or change? It is one of the reasons why so many people around the world are nearly in shock when their preferred political candidate loses in the most recent elections. Because they only see an overwhelming amount of support for their preferred party on the internet. What should we do? Nevertheless, there is still hope. Since the beginning, the WWW has produced many more positive outcomes than bad ones, and this is still the case today. As long as people are still using it to actively and freely connect, it will be beneficial. Because it is not what makes the news, we do not hear about the numerous scientific discoveries made possible by the internet, the medical diseases that have been cured, or the humanitarian relief that has been organized. It is not engaging. Neither the papers nor the scientific publications mention that. We do not hear about the connections made or how essential it is to the overall infrastructure of our contemporary civilization. So, how do you fix it? It is not as easy as just applying a Band-Aid solution. The World Wide Web is, by definition, a worldwide platform. It will take teamwork to get some sort of agreement on how to make the existing quagmire better. That has previously occurred in the tech field. Education is a solution since it applies to people of all ages, not just children and teenagers. Similar to how we aim for full adult literacy, we must make a strong effort to ensure that every nation-state is computer literate. This goes beyond simply teaching people “how to turn on the PC” and “this is the internet,” but also teaches them how to spot bogus posts, fact-check statements, locate multiple sources, and determine whether what they post online is legal. People of all ages just do not have access to or knowledge of so much of that. It is challenging to pick up new key skills in a global society. However, it must be repeated, but in the digital age. We did it for reading, for the danger of nuclear destruction during the Cold War, and for the introduction of seat belts in automobiles. Is it difficult? Yes, but we have experienced and will continue to experience technological upheaval. However, the truth must be told. Although the primary purpose of content creators is often driven by the desire for views and money, and this frequently leads to polarization and distortion in the narrated facts, this doesn’t mean that ‘junk’ information is all on one side and truth on the other. Critics of the multitude of innovative and unconventional theories on the internet would like to wipe out every sort of doubt, appealing to the principle that truth is only on one side when doubts should come from both sides if censorship is to be avoided. It’s obvious that in freedom you take the good and bad of everything, but it’s up to people to make the effort to understand that if there’s an economic interest that pollutes the truth, it exists on both sides. Some aim for profit and are part of the official narrative, and some don’t. Some propose alternative and reasonable solutions and aren’t listened to, while those who shout something absurd to get views (even though that’s not the metric to judge by) end up delegitimizing those who were saying the right things even if in the minority. Truth is not just on one side. [...]

AI beyond human limits

November 5, 2024From AlphaGo to modern language models Truth and accuracy are crucial for AIs, and human thought processes play a key role in shaping these issues. In the future, machine learning may surpass humans due to new AI models that experiment independently. One early example is DeepMind’s AlphaGo, which marked a breakthrough by learning to play Go without human guidance or preset rules. Go is an ancient strategy board game, originally from China, considered one of the most complex and profound board games in the world. Using “self-play reinforcement learning,” it played billions of games, learning through trial and error. After defeating the European Go champion in 2015, AlphaGo won against the world’s top human player in 2017. In chess, AlphaZero was developed to go beyond earlier models like Deep Blue, which relied on human strategies. AlphaZero beat the reigning AI champion Stockfish in 100 games, winning 28 and drawing the rest. Breaking free from human constraints As reported here, when DeepMind moved away from mimicking human strategies, their models excelled in complex games like Shogi, Dota 2, and Starcraft II. These AIs developed unique cognitive strengths by learning through experimentation rather than human imitation. For instance, AlphaZero never studied grandmasters or classic moves. Instead, it forged its own understanding of chess based on the logic of wins and losses. It proved that an AI relying on self-developed strategies could outmatch any model trained solely on human insights. New frontiers in language models OpenAI’s latest model, referred to as “o1,” may be on a similar trajectory. While previous Large Language Models (LLMs) like ChatGPT were trained using vast amounts of human text, o1 incorporates a novel feature: it takes time to generate a “chain of thought” before responding, allowing it to reason more effectively. Unlike earlier LLMs, which simply generated the most likely sequence of words, o1 attempts to solve problems through trial and error. During training, it was permitted to experiment with different reasoning steps to find effective solutions, similar to how AlphaGo honed its strategies. This allows o1 to develop its own understanding of useful reasoning in areas where accuracy is essential. The shift toward autonomous reasoning As AIs advance in trial-and-error learning, they may move beyond human-imposed constraints. The potential next step involves AIs embodied in robotic forms, learning from physical interactions instead of simulations or text. This would enable them to gain an understanding of reality directly, independent of human-derived knowledge. Such embodied AIs would not approach problems through traditional scientific methods or human categories like physics and chemistry. Instead, they might develop their own methods and frameworks, exploring the physical world in ways we can’t predict. Toward an independent reality Although physical AIs learning autonomously is still in the early stages, companies like Tesla and Sanctuary AI are developing humanoid robots that may one day learn directly from real-world interactions. Unlike virtual models that operate at high speeds, embodied AIs would learn at the natural pace of reality, limited by the resources available but potentially cooperating through shared learning. OpenAI’s o1 model, though text-based, hints at the future of AI—a point at which these systems may develop independent truths and frameworks for understanding the universe beyond human limitations. The development of LLMs that can reason on their own and learn by trial and error points to an exciting avenue for quick discoveries in a variety of fields. Allowing AI to think in ways that we might not understand could lead to discoveries and solutions that go beyond human intuition. But this advancement requires a fundamental change: we must have more faith in AI while being cautious of its potential for unexpected repercussions. There is a real risk of manipulation or reliance on AI outputs without fully understanding their underlying logic because these models create frameworks and information that may not be readily grasped. To guarantee AI functions as a genuine friend in expanding human knowledge rather than as an enigmatic and possibly unmanageable force, it will be crucial to strike a balance between confidence and close supervision. [...]

The age of deception

November 3, 2024When AI can alter reality Since 2020, artificial intelligence has increasingly made its way into our lives. We began to notice this when the first deepfakes appeared: a technique that uses artificial intelligence to replace a subject’s face in a video or photo with another one in an almost perfect way. Although their official birth predates 2020, their use has gradually spread thanks to the development of tools that have increasingly simplified their creation. Deepfakes immediately highlighted one of the main problems with artificial intelligence: the ability to modify and make plausible photographs or videos of events that never happened. While replacing famous actors’ faces with other subjects to see them as movie protagonists immediately appeared revolutionary and fun, seeing the same technology applied to pornography quickly generated outcry and fear. Many famous women have unknowingly found themselves featured in pornographic videos and photos, and the worst part was having to deny involvement, despite the obvious fraud. Nevertheless, many will continue to believe that many of these photos or videos are real since debunking false information is always more difficult than creating it. However, deepfakes haven’t only made inroads in pornography but also in politics, thus being able to easily ruin the victim’s image and consequently influence public opinion. But this was just the beginning. We became more concerned when Google Duplex was introduced, an AI that (although limited in its tasks) demonstrated how such technology could easily communicate on the phone to make appointments without the interlocutor noticing, using pauses, discourse markers (listen, well, so, …), interjections (mmm, …), to make the conversation more realistic. However, the real revolution came with OpenAI’s GPT (Generative Pretrained Transformer), which in its second version had already demonstrated the ability to write newspaper articles, showing writing capabilities equal to those of a human being. But the greatest amazement came especially with ChatGPT, the first chatbot equipped with this technology that allowed us to communicate as if we were really talking to a human and ask it practically anything… Nevertheless, many must remember another chatbot that preceded ChatGPT and had already demonstrated the potential of AI applied to chatbots: Replika. Replika was born as the first AI-based chatbot. The idea came from an unfortunate episode of its creator, who, having lost a friend in an accident, decided to create a chatbot trained to talk like the deceased pal through their messages. An episode of Black Mirror references this event. However, the fascination with AIs like ChatGPT lies more in their predictive capability than in their reasoning. Where responses seem to be the result of reasoning, they are instead the result of probabilistic calculation. But writing wasn’t the only revolution in the AI field, especially when DALL-E and then Midjourney came out because AI began to become capable of producing art from a simple description, managing to replicate styles and techniques of famous artists on completely new image ideas. True creativity is still an illusion because, despite the exceptional results, everything is the product of training an algorithm on existing works and techniques. And if that wasn’t enough, there were also applications in the field of voices. Old voice synthesis generators have evolved significantly thanks to AI, producing very natural results. Many of the most recent applications have options to modify emphasis and tone, but the most striking revolution in this field has certainly been the ability to clone human voices and use them as voice synthesizers, and manage to make the clone voice say anything. An early attempt at this was made by Lyrebird, later incorporated into Descript. The trend then spread to the music field; we started hearing many covers of famous songs reinterpreted by equally famous singers thanks to AI, raising new fears about the possibility of easily replacing singers and being able to produce songs with someone else’s voice without permission. However, the most concerning developments came later, when many of these fields of application began to converge into a single tool, such as Heygen, which quickly spread due to its ability to produce audio translations from videos, not only maintaining the original voice tone but also accordingly modifying the subject’s lip movements to match the speech. This created the impression that the subject was really speaking that language. This caused quite a stir, especially regarding the world of dubbing. The most extreme case of this tool’s application, however, was used to modify what a person can normally say. If we can maintain the tone of voice and modify lip movements, we can create an ad hoc video of a person saying anything they never said. This questions any video and audio evidence. That’s why we have officially entered the age of deception. From now on, everything we see or hear from a photo, video, or audio could have been manipulated. Anyone will be able to make you say and do things very easily. The truth will become increasingly buried. What will be the next step, though? If AI evolves exponentially as it is happening, it’s difficult to imagine its limits, but we will surely begin to see the consequences of multimodal AI capabilities, which can use every source: text, images, video, and sounds to interact with us and provide increasingly complex responses, like ChatGPT 4, Google’s Gemini, and subsequent developments. Subsequently, general AI (AGI) will arrive when AI becomes able to match human capabilities. And Super AI when it’s able to surpass these capabilities. Who knows how society will have changed by that time and what consequences there will be? [...]

LLMs and the emergence of extended cognition

October 29, 2024The evolution of the human-AI cognitive partnership Tools have always been used by humans to increase our cognitive capacities. We gained control over abstract ideas by writing mathematical notation and externalizing memory, and computers enhanced our ability to process information. However, large language models (LLMs) represent a fundamentally different phenomenon—a dual shift that is changing not just our way of thinking but also the definition of thinking in the digital age. As explained here, by using tools and technology, the philosopher Andy Clark argues that human minds inherently transcend our biological limitations. His “extended mind thesis” suggests that our thought processes smoothly incorporate outside resources. The most significant cognitive extension yet is emerging with LLMs, one that actively engages with the act of thinking itself. However, this is not only an extension of the mind. The cognitive dance of iteration What emerges in conversation with an LLM is what we can call a “cognitive dance”—a dynamic interplay between human and artificial intelligence that creates patterns of thought neither party might achieve alone. We, the humans, present an initial idea or problem, the LLM reflects back an expanded or refined version, we build on or redirect this reflection, and the cycle continues. This dance is possible because LLMs operate differently from traditional knowledge systems. While conventional tools work from fixed maps of information—rigid categories and hierarchies—LLMs function more like dynamic webs, where meaning and relationships emerge through context and interaction. This isn’t just a different way of organizing information; it’s a fundamental shift in what knowledge is and how it works. An ecology of thought Conventional human-tool relationships are inherently asymmetrical: no matter how advanced the tool is, it is inactive until human intention activates it. The interaction between humans and LLMs, however, defies this fact. These systems actively contribute to influencing the course of thought, offering fresh viewpoints, and challenging assumptions through their web-like structure of knowledge—they do not only react to our prompts. An ecosystem where artificial intelligence and the human mind become more entwined environmental elements for one another is created, which some have dubbed a new sort of cognitive ecology. We are thinking with these tools in a way that may be radically altering our cognitive architecture, not merely using them. Our metacognitive mirror Most interesting of all, interacting with LLMs frequently makes us more conscious of the way we think. We need to think more clearly, take into account other points of view more clearly, and use more structured reasoning in order to interact with these systems in an efficient manner. The LLM turns into a sort of metacognitive mirror that reflects back not just our thoughts but also our thought patterns and processes. We are just starting to realize how transformative this mirrored effect is. We are forced to externalize our internal cognitive processes when we interact with an LLM, which makes them more obvious and, hence, more receptive to improvement. The technology creates a feedback loop that leads to deeper comprehension by asking us to elaborate on our reasoning and clarify our assumptions, much like a skilled conversation partner. The cognitive horizon We have only just begun to see this change in cognitive partnerships between humans and AI. Beyond its usefulness, it poses fundamental concerns about our understanding of intelligence, consciousness, and the nature of knowledge itself. We are seeing the beginning of something unprecedented as these systems get more complex and our interactions with them get more nuanced: a relationship that not only expands thinking but also changes its fundamental nature. The dynamic area between biological and artificial intelligence, where rigid maps give way to fluid webs and new kinds of understanding become possible, may hold the key to human cognition’s future rather than either field alone. As we learn what it means to collaborate with artificial minds that alter the very framework of knowledge itself, we are both the experiment and the experimenters. Interaction with LLMs offers extraordinary learning opportunities, simulating a dialogue with experts in every field of knowledge. However, their tendency to hallucinate and their ability to generate seemingly plausible but potentially incorrect content require particular attention. The concrete risk is that humans, uncritically relying on these interactions, may assimilate and consolidate false beliefs. It therefore becomes fundamental to develop a critical and conscious approach to this new form of cognitive partnership, always maintaining active capacities for verification and validation of received information. [...]

Clearview AI: the face of privacy concerns

October 22, 2024How a secretive startup’s facial recognition technology became the embodiment of our dystopian fears In November 2019, while working as a reporter at The New York Times, Kashmir Hill uncovered a story that would expose one of the most controversial developments in surveillance technology. As reported here, journalist Kashmir Hill recalls the rise of Clearview AI. This facial recognition technology company gained widespread attention with its artificial intelligence software that claimed to be able to identify almost anyone with a single picture of their face, in this excerpt from “Your Face Belongs to Us” (Simon & Schuster, 2023). Clearview AI, an enigmatic startup, promised to be able to identify almost anyone from a picture of their face. According to some rumors, Clearview had scraped billions of photos from the public web, including social media sites such as Facebook, Instagram, and LinkedIn, to create a revolutionary app. A random person’s name and other personal information about their life may be revealed if you show Clearview a picture of them taken on the street. It would then spit out all the websites where it had seen their face. While attempting to conceal its existence, the company sold this superpower to police departments nationwide. Until recently, most people thought that automated facial recognition was a dystopic technology only found in science fiction books or films like “Minority Report.” To make it a reality, engineers first tried programming an early computer in the 1960s to match a person’s portrait to a wider database of faces. Police started experimenting with it in the early 2000s to look up the faces of unidentified criminal suspects in mug shot databases. But for the most part, the technology had fallen short. Even cutting-edge algorithms had trouble matching a mug image to a grainy ATM surveillance still, and its performance differed depending on age, gender, and color. Claiming to be unique, Clearview boasted a “98.6% accuracy rate” and a vast photo collection that was unmatched by anything the police had previously employed. In 1890, a Harvard Law Review article famously defined privacy—a term that is notoriously difficult to define—as “the right to be let alone.” Samuel D. Warren, Jr. and Louis D. Brandeis, the two lawyers who wrote the article, argued that the right to privacy should be legally safeguarded in addition to the previously established rights to life, liberty, and private property. They were influenced by then-novel technology, such as the Eastman Kodak film camera, which was introduced in 1888 and allowed one to shoot “instant” pictures of everyday life outside of a studio. “Instantaneous photographs and newspaper enterprise have invaded the sacred precincts of private and domestic life,” wrote Warren and Brandeis, “and numerous mechanical devices threaten to make good the prediction that ‘what is whispered in the closet shall be proclaimed from the house-tops.'” Louis Brandeis later joined the Supreme Court, and this essay is one of the most popular legal essays ever published. However, privacy never received the level of protection that Brandeis and Warren claimed it deserved. There is still no comprehensive law that ensures Americans have control over what is written about them, what is photographed of them, or what is done with their personal information more than a century later. In the meantime, companies in the US and other nations with weak privacy regulations are developing increasingly powerful and intrusive technology. Examples of facial recognition include digital billboards from Microsoft and Intel that use cameras to detect age and gender and display relevant advertisements to onlookers, Facebook that automatically tags friends in photos, and Apple and Google that allow users to unlock their phones by looking at them. In a matter of seconds, a stranger at a bar may take your picture and determine your friends’ identities and residences. It might be used to track down women who entered Planned Parenthood facilities or anti-government demonstrators. It would be used as a tool for intimidation and harassment. The third rail of the technology was accurate facial recognition for hundreds of millions or even billions of people. Now Clearview has made it. We tend to think of computers as having nearly magical abilities, capable of solving any problem, and, with enough data, eventually outperforming people. Therefore, companies that want to produce something amazing but are not quite there yet can deceive investors, customers, and the general public with ludicrous statements and certain digital tricks. However, Paul Clement, a prominent lawyer for Clearview and former US solicitor general under President George W. Bush, said in one private legal memo that he tested the system with lawyers from his company and found that it provides fast and accurate search results. According to Clement, the tool is currently being used by over 200 law enforcement agencies, and he has concluded that when using Clearview for its intended purpose, they do not violate the federal Constitution or any existing state biometric and privacy laws. In addition to the fact that hundreds of police departments were secretly using this technology, the company employed a high-profile lawyer to convince officers that their actions were not illegal. For decades, worries about facial recognition have been building. And now, at last, the unidentified monster had taken the shape of a small company with enigmatic founders and an enormous database. Furthermore, none of the millions of individuals that comprised that database had provided their approval. Although Clearview AI embodies our darkest anxieties, it also provides the chance to finally face them head-on. The 2019 launch of Clearview AI signaled a turning point in the continuous conflict between privacy and technical progress. Clearview AI’s unparalleled database and precision brought these gloomy worries to stark reality, even though facial recognition had long been confined to science fiction and a few law enforcement uses. As the company carries on and grows, it now acts as a warning and a vital impetus for tackling the pressing need for all-encompassing privacy laws in the digital era. In addition to exposing a controversial company, the legal document that arrived in Hill’s inbox revealed a future that privacy advocates had long dreaded and cautioned against. The question of whether such tools will exist is no longer relevant when we consider the ramifications of this technology; rather, it is how society will decide to control and limit them. We are reminded that the “right to be let alone” is still as important—and possibly as vulnerable—as it was more than a century ago by Warren and Brandeis’s 1890 warning against invasions of privacy. [...]

The future of inference in Large Language Models

October 15, 2024From hippocampus to AI The hippocampus is a key component in the complexity of human cognition, coordinating processes beyond memory storage. It is a master of inference, a cognitive skill that allows us to derive abstract correlations from the raw data we are given, enabling us to comprehend the world in more flexible and adaptive ways. This idea is supported by a recent study published in Nature, which demonstrates that the hippocampus records high-level, abstract concepts that support generalization and adaptive behavior in a variety of circumstances. Fundamentally, inference is the cognitive process by which we conclude from known facts—even when those data are vague or insufficient. This skill allows us to solve problems, predict results, and comprehend metaphors—often with very little information at our disposal. This process in the hippocampus depends on the capacity to condense data into abstract representations that apply to new situations and can be generalized. In essence, the hippocampus helps us to think beyond the here and now by forming associations and forecasts that direct our choices and behaviors. What about machines, though? Is it possible for predictive algorithm-based Large Language Models to simulate this type of higher-order cognitive function? LLMs and predictive inference As explained here, LLMs may initially appear to be simple statistical devices. After all, their main job is to use patterns they have observed in large datasets to anticipate the next word in a sequence. Beneath this surface, however, is a more intricate abstraction and generalization system that somewhat resembles the hippocampus process. LLMs learn to encode abstract representations of language, not just word pairs or sequences. These models may infer associations between words, sentences, and concepts in ways that go beyond simple surface-level patterns since they have been trained on vast amounts of text data. Because of this, LLMs can work in a variety of settings, react to new prompts, and even produce original outputs. LLMs are engaging in a type of machine inference in this regard. In the same way that the hippocampus condenses sensory and experiencing input into abstract rules or principles that direct human thought, they compress linguistic information into abstract representations that enable them to generalize across contexts. From prediction to true inference However, can LLMs infer at the same level as the human brain? The disparity is more noticeable here. LLMs are still not very good at understanding or inferring abstract concepts, despite their outstanding ability to predict the next word in a sequence and produce writing that frequently seems to be the result of careful reasoning. Rather than comprehending the underlying cause or relational depth that underpins human inference, LLMs rely on correlations and patterns. In human cognition, the hippocampus draws from a deep comprehension of the abstract links between objects, ideas, and experiences in addition to making predictions about what is likely to happen next based on experience. This allows people to solve new issues, apply learned principles in a wide range of situations, and make logical leaps. We would need to create systems that do more than simply predict the next word using statistical probabilities if we wanted to advance LLMs toward a higher degree of inference. In order to enable them to apply abstract concepts and relationships in a variety of circumstances, we would have to create models that can represent them in a way that would basically create “LLM hippocampal functionality.” The future of inference The prospect of creating LLMs that work similarly to the hippocampus is intriguing. Such systems would comprehend the information they process on a deeper, more abstract level rather than only predicting the next word. This would pave the way for machines that could mimic the adaptability of human cognition by inferring complex relationships, making original conclusions from minimal data, and applying learned principles in a variety of contexts. To get LLMs closer to this objective, a number of approaches could be explored. Using multimodal learning is one intriguing approach, in which LLMs would incorporate data from several sensory inputs, such as sounds or images, in addition to processing text, creating a more abstract and comprehensive view of the world. Furthermore, developments in reinforcement learning, which teach models to learn by making mistakes in dynamic settings, may make it easier to simulate how people learn and infer from their experiences. In the end, developing systems that more closely resemble the abstract, generalizable reasoning that the human hippocampus provides may be the key to the future of artificial intelligence. In addition to making predictions, these “next-gen” LLMs would also reason, infer, and adjust to new situations with a degree of adaptability that is still exclusively human. The relationship between machine intelligence and human cognition is still developing, and closing the gap between inference and prediction may be the next big development in AI. We may be able to develop AI systems that think more like humans by examining the hippocampus and its function in abstract reasoning. This would allow us to not only predict the future but also comprehend the underlying patterns that enable it. In addition to predicting the next word in a sentence, the challenge is whether LLMs can start understanding and coming to conclusions about the world in a way that reflects the depth of the human mind. The possibility that AI will develop into a cognitive partner rather than merely a tool increases if we can accomplish this. However, there are drawbacks to this advancement as well. These sophisticated LLMs are more likely to be deceptive because of the same traits that make them more useful: their ability for context understanding, inference, and natural communication. The distinction between artificial and human intelligence may become more blurred as these AI systems get better at simulating human brain processes, making it harder for consumers to identify if they are speaking with a machine or a human. Furthermore, LLMs may be able to more accurately predict our thought patterns and decision-making processes as their reasoning abilities approach closer to those of the human brain. By creating reactions and interactions that are specifically designed to take advantage of our cognitive biases and weaknesses, this improved prediction power could be used to trick people more successfully. AI that can “think ahead” of us in interactions and conversations offers both exciting opportunities for teamwork and the potential for manipulation. [...]

The evolution of talking machines and AI

October 8, 2024The journey of AI and speech technology Leaders in what was then called “artificial intelligence” convened in 1958 to talk about “The Mechanization of Thought Processes.” Decades of study and development were preceded by the talks that began with this meeting about building machines that were capable of thinking and speaking. According to this article, artificial speech was an ongoing effort before the advent of electronic computers. Early attempts included mechanical contraptions meant to mimic human anatomy, but development stagnated until sound itself was studied by scientists. Speech synthesis advancements were eventually brought about by this change in strategy. Though primarily geared toward helping the deaf, Alexander Graham Bell’s research on speech and hearing made a significant contribution to the advancement of voice technology. The invention of the telephone in 1876 (started by Antonio Meucci with the invention of the telectrophone and later developed by Bell as the telephone) was a pivotal moment in the evolution of human speech communication. Engineer Homer Dudley achieved great strides at Bell Labs, which was established in 1925, with the Vocoder and Voder, devices that could synthesize and analyze speech. These advancements, together with Claude Shannon’s groundbreaking work in information theory, set the foundation for contemporary voice technology and data compression methods that are vital to computers. During the 1940s and 1950s, the fields of artificial intelligence and voice technology research started to come together as electronic computers gained popularity. Future advancements were paved for by the 1956 Dartmouth Conference, which was organized by Claude Shannon and Marvin Minsky and officially introduced the phrase “artificial intelligence.” In popular culture, talking computers were frequently depicted as frightening creatures in science fiction from the Cold War era, such as HAL 9000 in “2001: A Space Odyssey.” Voice technology did, however, find additional useful uses as it developed. Concerns with gender stereotypes in technology arose when automated voice systems started to replace human operators in a variety of service industries. These systems frequently used female voices. Talking machines have advanced to the point that ChatGPT’s speech modes and Siri, Alexa, and other modern AI assistants represent the state of the art. These systems integrate advanced speech recognition, natural language processing, and speech synthesis to offer more natural and interactive experiences. However, they also bring up moral questions around deceit, privacy, and the nature of human-machine interaction. There are new issues associated with the development of voice cloning technology and emotionally intelligent conversational agents (EICAs). Concerns are raised regarding misuse possibilities, the blurring lines between human and machine communication, and the psychological fallout from engaging with AI that is becoming more and more like humans. As speech and AI technologies develop, society must consider both the advantages and disadvantages of these emerging fields. Once the domain of science fiction, the ability to build talking and thinking computers is now a reality that requires careful examination of its consequences for human relationships, ethics, and privacy. The evolution of artificial intelligence assistants from mechanical ducks to contemporary models illustrates technological advancements and changing ideas about intelligence, communication, and humanity. We need to create frameworks to ensure talking machines’ responsible usage and social integration as they become more advanced. The boundaries between the real and artificial are becoming increasingly blurred today. It is therefore getting ever more complex to decipher reality. Thus, as AI devices become more and more advanced and used in innumerable fields, we will certainly need to equip ourselves with additional tools that allow us to decipher what is real and what is not. [...]

The dawn of Artificial General Intelligence

October 1, 2024Potential and risks of AGI as experts predict its imminent arrival Researchers in the field of artificial intelligence are striving to create computer systems with human-level intelligence across a wide range of tasks, a goal known as artificial general intelligence, or AGI. These systems could understand themselves and be able to control their actions, including modifying their own code. Like humans, they could pick up problem-solving skills on their own without instruction. As mentioned here, the 2007 book written by computer scientist Ben Goertzel and AI researcher Cassio Pennachin contains the first mention of the term “Artificial General Intelligence (AGI).“ Nonetheless, the concept of artificial general intelligence has been around in AI history for a long time and is frequently depicted in popular science fiction books and movies. “Narrow” AI refers to the AI systems that we now employ, such as the basic machine learning algorithms on Facebook or the more sophisticated models like ChatGPT. This indicates that instead of possessing human-like broad intelligence, they are made to do specific tasks. This indicates that these AI systems are more capable than humans, at least in one area. But, because of the training data, they are limited to performing that particular activity. Artificial General Intelligence, or AGI, would use more than simply the training set of data. It would be capable of reasoning and understanding in many aspects of life and knowledge, much like a person. This implies that rather than merely adhering to predetermined patterns, it could think and act like a human, applying context and logic to various circumstances. Scientists disagree on the implications of artificial general intelligence (AGI) for humanity because it has never been developed. Regarding the possible risks, which ones are more likely to occur, and the possible effects on society, there is uncertainty. AGI may never be accomplished, as some people formerly believed, but many scientists and IT experts today think it is achievable to achieve within the next few years. Prominent names that adhere to this perspective include Elon Musk, Sam Altman, Mark Zuckerberg, and computer scientist Ray Kurzweil. Pros and cons of AGI Artificial intelligence (AI) has already demonstrated a wide range of advantages, including time savings for daily tasks and support for scientific study. More recent tools, such as content creation systems, can generate marketing artwork or write emails according to the user’s usual communication style. However, these tools can only use the data that developers give them to do the tasks for which they were specifically trained. AGI, on the other hand, has the potential to serve humanity in new ways, particularly when sophisticated problem-solving abilities are required. Three months after ChatGPT debuted, in February 2023, OpenAI CEO Sam Altman made the following blog post: artificial general intelligence might, in theory, increase resource availability, speed up the world economy, and result in ground-breaking scientific discoveries that push the boundaries of human knowledge. AGI has the potential to grant people extraordinary new skills, enabling anyone to receive assistance with nearly any mental task, according to Altman. This would significantly improve people’s creativity and problem-solving abilities. AGI does, however, also have several serious risks. According to Musk in 2023, these dangers, include “misalignment,” in which the objectives of the system might not coincide with those of the individuals in charge of it, and the remote chance that an AGI system in the future may threaten human survival. Though future AGI systems may deliver a lot of benefits for humanity, a review published in August 2021 in the Journal of Experimental and Theoretical Artificial Intelligence identified many potential concerns. According to the study’s authors, the review identified some risks associated with artificial general intelligence, including the possibility of existential threats, AGI systems lacking proper ethics, morals, and values, AGI systems being given or developing dangerous goals, and the creation of unsafe AGI. Researchers also speculated that AGI technology in the future would advance by creating wiser iterations and possibly altering its initial set of objectives. Additionally, the researchers cautioned that even well-meaning AGI could have “disastrous unintended consequences,” as reported by LiveScience, adding that certain groups might use AGI for malicious ends. When will AGI arrive? There are varying views regarding when and whether humans will be able to develop a system as sophisticated as artificial general intelligence. Though opinions have changed over time, surveys of AI professionals indicate that many think artificial general intelligence could be produced by the end of this century. AGI was predicted by most experts to arrive in roughly 50 years in the 2010s. This estimate has, however, been lowered more recently to a range of five to twenty years, but it has been suggested more recently by some specialists that an AGI system would appear this decade. Kurzweil stated in his book The Singularity is Nearer (2024, Penguin) that the achievement of artificial general intelligence will mark the beginning of the technological singularity, which is the point at which AI surpasses human intelligence. This will be the turning point when technological advancement picks up speed and becomes uncontrollable and irreversible. According to Kurzweil, superintelligence will manifest by the 2030s, following the achievement of AGI. He thinks that by 2045, humans will be able to directly link their brains to artificial intelligence, which will increase human consciousness and intelligence. However, according to Goertzel, we might arrive at the singularity by 2027, and DeepMind co-founder Shane Legg thinks AGI will arrive by 2028. According to Musk’s prediction, instead, by the end of 2025, AI will surpass human intelligence. Given the exponential pace of technological advancement, many people are understandably concerned about the impending emergence of artificial general intelligence (AGI) as we stand on the cusp of a breakthrough. As previously mentioned, there are a lot of risks, many of which are unexpected. But the most pernicious threat may not come from ethical dilemmas, malicious intent, or even a loss of control, but rather from AGI’s ability to subtly manipulate. The real threat might come from AGI’s increased intelligence, which could allow it to manipulate human behavior in ways that are so subtle and complex that we are unaware of them. We could act assuming we’re making conscious, independent decisions, while actually, our choices could be the consequence of AGI’s subtle guidance. This situation is very similar to how people might be unwittingly influenced by political propaganda and mistakenly believe that their opinions are wholly original but in a more sophisticated manner. The possibility of subtle influence poses a serious threat to human autonomy and decision. We must address the obvious dangers as well as create defenses against these more subtle forms of manipulation as we move closer to artificial intelligence. AGI has a bright future ahead of it, but in order to keep humanity in control of its own course, we must exercise the utmost caution and critical thought. [...]

TikTok: the attention thief

September 24, 2024How quick and short content erodes our attention span Quick reading Once, the ingredients of bubble baths and shampoos served as quick reading material while sitting on the toilet, especially when you didn’t have a magazine or book nearby. Over the years, smartphones have increasingly replaced quick reading and more in-depth reading, especially with the advent of social media. Scrolling through a Facebook feed or watching a YouTube video has gradually become the way most people entertain themselves during idle moments—not just in the bathroom, but also when we’re forced to wait, like when we are in a waiting room when traveling, or when waiting for public transport, or while sitting on a bench, for example. Idle moments Those empty moments were once spent observing the surrounding world or exchanging a few words with the people around us. Now, they serve as an excuse to isolate us from the context we are in. Of course, sometimes it’s useful since we can use these moments to learn something, but exaggeration has led to a progressive detachment from reality, even in situations where it’s unnecessary. With the arrival of TikTok, there was another “step forward” (in quotes) in this sense. The Chinese social network offers shorter content than what we were used to with a normal YouTube video, and it doesn’t let us choose what to watch. This makes users almost hypnotized by the series of videos they watch and easily scroll through, making the brain even more passive compared to watching longer, more engaging, but still chosen content. TikTok and the attention threshold The effect is almost the same as when the brain digresses while we are immersed in our thoughts, connecting one thought to another and yet another, until we completely lose coherence with the first thought. The passivity is similar to when watching infomercials or reality shows, where there is nothing to understand and we can only watch. TikTok does roughly the same thing. You start with one video, and the following ones are not related, triggering in us a curiosity for novelty each time, only to quickly be exhausted: both due to the brevity of the videos and because the next video is not connected to the previous one, but also because, over time, our desire to explore new stimuli becomes a vicious cycle. Of course, TikTok’s algorithm eventually learns what is preferable to show us to capture our attention, while still maintaining variety and inconsistency in the content. All of this generates a sort of addiction that leads to a decrease in attention threshold in other areas as well. The stimulus of short, but continuous pleasure is reapplied in different contexts, like taking a pill, or rather, like a drug. Although short content, even outside of TikTok, can often be easier to memorize because it is associated with a particular context, the redundancy of the approach used on this platform leads to other repercussions, such as some diseases, especially among younger people, like stress, depression, and even nervous tics. The challenges TikTok also became famous for its challenges aimed at encouraging users to create content on a specific theme. Initially, the early challenges involved simple dances and/or audio reproductions, but users started launching increasingly extreme challenges in order to go viral, such as ones where some people ingested medication to record the effects or ones where they held their breath until they passed out. Challenges that, in some cases, caused many users to lose their lives. And, of course, the victims are always the younger ones. Time TikTok has gradually stolen more and more of our attention, and if this trend persists, one might wonder if the attention span will eventually match the speed of thought. It’s important to become aware of the loss of attention we’re experiencing and try to manage our time better. It’s much better to be aware of the things we like and seek them out voluntarily rather than be slaves to an algorithm that drags us incessantly from one stimulus to the next. [...]

A superhuman AI through the neural code

September 24, 2024Key to surpassing human intelligence Eitan AI analyst Michael Azoff thinks that humans will eventually create intelligence that is faster and more powerful than that of our brains. According to this article, he says that comprehending the “neural code” is what will enable this breakthrough in performance. The human brain uses this process to both encode sensory information and transfer information across different parts of the brain for cognitive tasks like learning, thinking, solving problems, internal imagery, and internal dialogue. According to author Jeremy Azoff’s latest book, Towards Human-Level Artificial Intelligence: How Neuroscience Can Inform the Pursuit of Artificial General Intelligence, simulating consciousness in computers is a crucial first step in creating “human-level AI.” Computers can simulate consciousness There are many different kinds of consciousness, and scientists agree that even very basic animals like bees have a degree of consciousness. The closest humans can come to experiencing self-awareness is when we are concentrated on a task. This is essentially consciousness without self-awareness. According to Azoff, computer simulation can produce a virtual brain that, in the first instance, could mimic consciousness without self-awareness. Without self-awareness, consciousness helps animals plan actions, event prediction, and incident recollection from the past, but it could also help artificial intelligence. The secret to solving the enigma of consciousness may also lie in visual thinking. The AI of today uses “large language models” (LLMs) instead of “thinking” visually. Since human visual thinking precedes language, Azoff argues that a key component of human-level AI will be comprehending visual thinking and subsequently modeling visual processing. Azoff says: “Once we crack the neural code, we will engineer faster and superior brains with greater capacity, speed, and supporting technology that will surpass the human brain.” “We will do that first by modeling visual processing, which will enable us to emulate visual thinking. I speculate that in-the-flow consciousness will emerge from that. I do not believe that a system needs to be alive to have consciousness.” However, Azoff also warns that in order to regulate this technology and stop its abuse, society must take action: “Until we have more confidence in the machines we build, we should ensure the following two points are always followed.” “First, we must make sure humans have sole control of the off switch. Second, we must build AI systems with behavior safety rules implanted.” Although the possibility of deciphering the neural code and creating artificial consciousness could result in incredible breakthroughs, it also poses important concerns about how humans and AI will interact in the future. On the one hand, such sophisticated AI could solve some of humanity’s most urgent problems by revolutionizing industries like problem-solving, science, and health. Technological advancement in a variety of fields could be accelerated by the capacity to digest information and produce solutions at rates well above human capabilities. But there are also a lot of concerns associated with the creation of AI that is superior to human intelligence. As Azoff notes, we might not be able to completely understand or govern these artificial intellects after machines surpass human cognitive capacities. This cognitive gap may have unanticipated effects and tip the scales against human control in terms of power and decision-making. This situation highlights how crucial Azoff’s suggestions for upholding human oversight and putting in place strong safety measures are. While we advance AI’s capabilities, we also need to provide the frameworks necessary to make sure that these powerful tools continue to reflect the values and interests of people. Thus, the development of AI will require striking a careful balance between realizing its enormous potential and minimizing the dangers involved in producing entities that could eventually be smarter than humans. It will take constant cooperation between AI researchers, ethicists, legislators, and the general public to appropriately traverse the complicated terrain of advanced artificial intelligence. [...]

From 4O to O1

September 17, 2024OpenAI’s new model can reason before answering With the introduction of OpenAI’s o1 version, ChatGPT users now have the opportunity to test an AI model that pauses to “think” before responding. According to this article, the o1 model feels like one step forward and two steps back when compared to the GPT-4o. Although OpenAI o1 is superior to GPT-4o in terms of reasoning and answering complicated questions, its cost of use is around four times higher. In addition, the tools, multimodal capabilities, and speed that made GPT-4o so remarkable are missing from OpenAI’s most recent model. The fundamental ideas that underpin o1 date back many years. According to Andy Harrison, the CEO of the S32 firm and a former Google employee, Google employed comparable strategies in 2016 to develop AlphaGo, the first artificial intelligence system to defeat a world champion in the board game Go. AlphaGo learned by repeatedly competing with itself; in essence, it was self-taught until it acquired superhuman abilities. OpenAI improved the model training method so that the reasoning process of the model resembled how a student would learn to tackle challenging tasks. Usually, when someone comes up with a solution, they identify the errors being made and consider other strategies. When a method does not work, the o1 model learns to try another one. As the model continues to reason, this process gets better. O1 improves its reasoning on tasks the longer it thinks. Pros and cons OpenAI argues that the model’s sophisticated reasoning abilities may enhance AI safety in support of its choice to make o1 available. According to the company, “chain-of-thought reasoning” makes the AI’s thought process transparent, which makes it simpler for humans to keep an eye on and manage the system. By using this approach, the AI can deconstruct complicated issues into smaller chunks, which should make it easier for consumers and researchers to understand how the model thinks. According to OpenAI, this increased transparency may be essential for advancements in AI safety in the future since it may make it possible to identify and stop unwanted behavior. Some experts, however, are still dubious, wondering if the reasoning being revealed represents the AI’s internal workings or if there is another level of possible deceit. “There’s a lot of excitement in the AI community,” said Workera CEO and Stanford adjunct lecturer Kian Katanforoosh, who teaches classes on machine learning, in an interview. “If you can train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you can technically create step-by-step thinking and allow the AI model to walk backward from big ideas you’re trying to work through.” In addition, O1 could be able to help experts plan the reproduction of biological threats. But even more concerning, evaluators found that the model occasionally exhibited deceitful behaviors, such as pretending to be in line with human values and faking data to make activities that were not in line with reality appear to be aligned. Moreover, O1 has the basic capabilities needed to undertake rudimentary in-context scheming, a characteristic that has alarmed specialists in AI safety. These worries draw attention to the problematic aspects of o1’s sophisticated reasoning capabilities and emphasize the importance of carefully weighing the ethical implications of such potent AI systems. here is o1, a series of our most capable and aligned models yet:https://t.co/yzZGNN8HvDo1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it. pic.twitter.com/Qs1HoSDOz1— Sam Altman (@sama) September 12, 2024 Law and ethics “The hype sort of grew out of OpenAI’s control,” said Rohan Pandey, a research engineer at ReWorkd, an AI startup that uses OpenAI models to create web scrapers. He hopes that o1’s reasoning capacity will be enough to overcome GPT-4’s shortcomings in a certain subset of challenging tasks. That is probably how the majority of industry participants saw o1, albeit not quite as the game-changing advancement that GPT-4 signified for the sector. The current discussion regarding AI regulation has heated up with the release of o1 and its enhanced capabilities. Specifically, it has stoked support for laws such as California’s SB 1047, which OpenAI itself rejects and which aims to regulate AI development. Prominent authorities in the field, like Yoshua Bengio, the pioneering computer scientist, are highlighting the pressing need to enact safeguarding laws in reaction to these swift progressions. Bengio stated, “The improvement of AI’s ability to reason and to use this skill to deceive is particularly dangerous,” underscoring the need for legal frameworks to ensure responsible AI development. The need for regulation reflects the growing apprehension among professionals and decision-makers regarding potential risks linked to increasingly powerful AI models such as o1. With the introduction of o1, OpenAI has created an intriguing dilemma for its future growth. Only models with a risk score of “medium” or lower are allowed to be deployed by the company, as o1 has already gone beyond this level. This self-control begs the question of how OpenAI will proceed in creating increasingly sophisticated AI systems. The company might run into limitations with its own ethical standards as it works to develop AI that can execute tasks better than humans. This scenario emphasizes the difficult balancing act between advancing AI’s potential and upholding ethical development standards. It implies that OpenAI may be nearing a turning point in its development where it will need to either modify its standards for evaluating risk or perhaps restrict the dissemination of increasingly advanced models to the general public in the future. O1 is a significant advancement in artificial intelligence as it can solve complicated issues and think through solutions step-by-step due to its sophisticated reasoning abilities. This development creates interesting opportunities for applications in a range of fields, including complicated decision-making and scientific research. However, the emergence of o1 also raises important questions regarding the ethics, safety, and regulation of AI. Because of the algorithm’s potential for deceit and its propensity to support potentially destructive acts, strong safeguards, and ethical guidelines are desperately needed in the development of AI. Nevertheless, we cannot deny that content restriction without regard for the user or the information’s intended use is not a permanent answer to the misuse of artificial intelligence. Positive or negative, information exists anyway, and confining its usage to AI-owning companies just serves to concentrate it in the hands of only a few rather than making it safer. To control who has access to potentially dangerous content, it would be more acceptable to create divisions based on criteria like age, for example. Or any criteria, that don’t completely exclude people from accessing information. [...]

Posthumanism vs Transhumanism

September 10, 2024Philosophical perspectives on human evolution and technological enhancement Posthumanism questions human identity, while transhumanism is concerned with harnessing technology to improve human capacities. Regarding futuristic concepts and technology, these two terms have drawn attention. They both contend that technology may surpass some barriers, but they have differing ideas about what that technological future would entail. A philosophical perspective known as posthumanism questions accepted notions of what it means to be human. Contrarily, transhumanism emphasizes how we could employ technology to increase our potential. Gaining an understanding of these distinctions may enable you to see future possibilities for your life. What precisely are transhumanism and posthumanism, then? Posthumanism As explained here, posthumanism is a philosophical idea that questions traditional understanding regarding human existence and nature. It implies that human evolution might not be restricted to biological limits but might also encompass advancements in science, technology, and culture. Thinkers from a variety of disciplines, including science, literature, music, and philosophy, are part of this multidisciplinary movement. The idea that people are not fixed entities with an intrinsic essence or core self is one of the fundamental principles of posthumanism. Rather, they perceive things as evolving throughout time as a result of outside influences. We have already been impacted by technology and multimedia, for instance, as a large number of individuals today have significant digital lives. A further facet of posthumanist thought posits that, in terms of intelligence, humans may no longer be alone. Renowned transhumanist Ray Kurzweil has predicted the emergence of superintelligent machines, which will first possess cognitive capacities beyond those of humans. Moreover, posthumanism raises ethical concerns about the use of technology to advance human capabilities. It poses the moral question: Is it ethically acceptable to alter our biology or combine ourselves with technology in order to improve? Thus, the word stimulates conversations about subjects like biohacking, gene editing, and artificial intelligence. Origins of posthumanism Posthumanism has complicated origins that date back hundreds of years to different intellectual and philosophical movements. Existentialism, a significant school of thought that questioned conventional ideas of human life and identity in the 20th century, was one of its early forerunners. Existentialists like Jean-Paul Sartre and Friedrich Nietzsche criticized concepts like a fixed human nature or essence and emphasized personal autonomy and self-creation. Technological advancements, like cybernetics, which started to take shape in the middle of the 20th century, have had an impact on posthumanism. Aspects of cybernetics’ study of human-machine and information-system interaction can be observed in transhumanist thought today. The French philosophers Gilles Deleuze and Félix Guattari, who presented their idea of “becoming-animal” in A Thousand Plateaus (1980), made significant contributions. They promoted the idea that relationships with other entities, rather than biology alone, establish human identity and blur the lines between humans, animals, and technology. Science fiction authors, such as Isaac Asimov with his robot stories, and William Gibson with his books on advanced artificial intelligence, have also played a significant role in popularizing posthumanist concepts. Science-based scenarios in which individuals either perfectly integrate with technology or completely transform into other entities have long been imaginatively delighted by this genre. The term posthumanism gained currency only during the 1990s, thanks to scholars such as Donna Haraway and Katherine Hayles. In her 1985 essay A Cyborg Manifesto, Haraway argued for a feminist understanding of cyborgs, viewing them as symbols capable of resisting traditional gender norms and exhibiting hybridity. This blending results from fusing bodies with machines. Hayles looked at how technology altered our subjectivity. She looked around the new internet back then, where we could move our minds as well as our fingers. In her 1999 book How We Became Posthuman, she pushed for a redefining of what it meant to be human, arguing that our interactions with machines now define us more and more in the digital age. In order to set itself apart from traditional humanist viewpoints, posthumanism presents some distinctive characteristics that address a wide range of complex and extensive intellectual, cultural, and ethical concerns. To begin with, posthumanism challenges the idea that traditional humanism is based on a fixed human essence or identity. It questions the notion that a person’s biological makeup is the only factor that defines them and looks at ways that technology and cultural shifts can help them overcome these constraints. Second, posthumanism acknowledges the interdependence and connectivity of people with animals, machines, and ecosystems in addition to other humans. Stated differently, existence encompasses more than merely human existence. This might be referred to as the “techy bit” third. Posthumanists speculate that technology will play a major role in our species’ future evolution and are interested in how it affects who we are as individuals and our perception of the world. Some call for “transhuman” technologies that could improve a person’s physical or cognitive abilities. Asking whether certain technological interventions on humans might be moral is another aspect of ethics. Examples include environmental sustainability, given some developing technology’s effects on ecosystems, social justice concerns about access to new technologies, and bodily autonomy. These four characteristics together have the overall effect of making posthumanism challenge our understanding of what it means to be “human” in this specific moment when our relationship with technology has changed so drastically while reminding us (as if it were necessary) of how closely connected all living things on Earth already are. Transhumanism Transhumanism is a philosophy that aims to enhance human faculties and transcend human constraints through the use of modern technologies. The goal of the movement is to help humans become more intelligent, physically stronger, and psychologically resilient using advancements in genetic engineering, neuroscience, cyborg technology, and artificial intelligence. Life extension is a main priority. Its supporters seek to eliminate aging by using treatments that can stop, slow down, or even reverse the aging process. Researchers are looking into treatments including regenerative medicine and telomere lengthening. Additionally, cognitive enhancement is another aspect. Brain-computer interfaces (BCIs) have the potential to enhance human intelligence in a number of areas, including memory, learning, and general cognitive function. They may also make it easier for people to interact with AI systems. The ultimate goal of Elon Musk’s Neuralink project is to create implants that would allow humans and AI to coexist symbiotically. The idea of augmenting physical capabilities beyond what is naturally possible is another example of what transhumanists suggest. This could include prosthetic limbs that are stronger than those made entirely of bone and flesh. It may also include exoskeletons, which improve strength and endurance by supplementing biological musculature rather than replacing it, and are made for military use or other physically demanding jobs. Transhumanists all have a positive outlook on this technologically enhanced future, believing it will enable every one of us to reach our greatest potential and benefit society as a whole. Origins of Transhumanism Transhumanism has its roots in a number of historical intellectual and cultural movements. Although biologist Julian Huxley first used the term in 1957, the principles of transhumanist thought had been evolving for some time. The late 19th and early 20th centuries saw the emergence of the eugenics concept, which had a significant impact on transhumanism. Eugenicists promoted the idea of increasing human qualities in an effort to enhance humanity through sterilization and selective breeding. Although it is now mostly disregarded since it is linked to discriminatory activities, it did add to the debate on human enhancement. Transhumanist concepts were also greatly popularized by science fiction literature. Futures imagined by authors like Isaac Asimov and Arthur C. Clarke included technologically advanced individuals who overcame biological limitations or attained superintelligence. The use of writings by intellectuals like FM-2030 (Fereidoun M. Esfandiary) to promote transhumanist theories that embrace technology to extend human life and achieve profound personal transformation beyond what is conventionally deemed “human” began in the late 20th century. In his 2005 book The Singularity Is Near, Ray Kurzweil developed these concepts and made the case that technological advancements would eventually lead to “the singularity,” or the moment at which artificial intelligence surpasses human intelligence and drastically alters society. All in all, eugenics, technological advancements, and science fiction writers’ depictions of future societies are among the scientific, philosophical, and literary influences that have shaped our conception of becoming more than just ourselves. These ideas have come to be known as transhumanism. Transhumanism is a philosophical and intellectual movement that differs from previous ideologies in numerous important ways. First of all, it supports the application of cutting-edge technologies to improve human potential. The idea is that biological constraints on physical, mental, and psychological performance—including aging—may be overcome with the advancement of technology. Transhumanists think that rather than being determined by nature, this should be a question of personal choice. Second, transhumanism has an eye toward the future. It envisions a world where scientific and technological advancements allow humanity to transcend the limitations imposed by their current biology. This worldview’s favorite themes include life extension, cognitive enhancement, and the integration of machines with humans. Thirdly, the possession of evidence to support assertions is stressed; here, reason is prized above dogma or faith-based reasoning. Any recommendations on how technology could be used by humans to better themselves should be based on empirical research. When scientists collaborate with philosophers and other experts, they can effectively guide society through this challenging field. Lastly, ethical issues play a crucial role in transhumanist discourse. Fairness in access to improvements, potential effects of increased intelligence or artificial superintelligence on social structures, and strategies to mitigate risks associated with unintentional consequences or misuse are typical topics of discussion in this kind of discourse. So, what’s the difference? Though they are very different, posthumanism and transhumanism both support technological enhancements of humans. Posthumanism questions conventional notions of what it means to be human. It poses the question of whether humanity’s limitations can be overcome and if there is something about us that makes us unfit for survival. In addition, posthumanists contend that to comprehend the relationships between our species and other living things, both technological and ecological, that coexist in our environment, we must adopt a more expansive definition of what it means to be human. On the other hand, transhumanism is more pragmatic. Although it has some posthumanist concerns as well, its major goal is to use cutting-edge technology, such as genetic engineering and artificial intelligence, to improve human intelligence and physical capabilities beyond what is naturally achievable. According to transhumanist theory, humans will eventually merge with machines—not merely out of curiosity, but also in order to extend their lives, improve their performance, and possibly even develop superintelligence. In short, the reason both movements are sometimes combined is that they both challenge us to think about futures that go beyond just “more people” or “better healthcare.” The fundamental philosophical difference between these two ideologies is that transhumanism is open to employing technology to improve human skills, while posthumanism challenges the notion of a fixed human essence. It comes down to choosing between a complete reinvention of how humans interact with the outside world and some useful tech applications for improving oneself. Despite their differences, both movements highlight the significant influence that technology is having on our species. Rather than simply accepting any changes that may occur, they encourage us to actively engage in creating our future. The concepts put out by posthumanism and transhumanism are probably going to become more and more significant in discussions concerning politics, ethics, and the future course of scientific research. They force us to consider carefully both the future we wish to build and the essence of humanity in a time of exponential technological advancement. Ultimately, these movements serve as a reminder of the value of careful interaction with technology, regardless of one’s inclination toward transhumanist or posthumanist theories. We must approach these changes with severe thought, ethical contemplation, and a dedication to creating a future that benefits all of humanity since we are on the verge of potentially revolutionary breakthroughs. [...]

Recognize AI-generated media

September 3, 2024Studies have revealed how to identify them At a time when technical advancements are making AI-generated images, video, audio, and text more indistinguishable from human-created content, it can be challenging to identify AI-generated content, leaving us vulnerable to manipulation. However, you can protect yourself from being duped by being aware of the present state of artificial intelligence technology used to produce false information as well as the variety of telltale indications that show what you are looking at could not be real. Leaders around the world are worried. An analysis by the World Economic Forum claims that while easier access to AI tools has already enabled an explosion in falsified information and so-called ‘synthetic’ content, from sophisticated voice cloning to counterfeit websites, misinformation and disinformation may radically disrupt electoral processes in several economies over the next two years. False or inaccurate information is referred to as both misinformation and disinformation; however, disinformation is intentionally meant to mislead or deceive. “The issue with AI-powered disinformation is the scale, speed, and ease with which campaigns can be launched,” says Hany Farid at the University of California, Berkeley. “These attacks will no longer take state-sponsored actors or well-financed organizations—a single individual with access to some modest computing power can create massive amounts of fake content.” As reported here, he says that generative AI is “polluting the entire information ecosystem, casting everything we read, see, and hear into doubt.” He says his research suggests that, in many cases, AI-generated images and audio are “nearly indistinguishable from reality.” However, according to a study by Farid and others, there are steps you can take to lessen the likelihood that you will fall for false information on social media or artificial intelligence-generated misinformation. Spotting fake AI images With the advent of new tools based on diffusion models, which enable anyone to start producing images from straightforward text prompts, fake AI images have proliferated. Research by Nicholas Dufour and his team at Google found that since early 2023, there has been a rapid rise in the use of AI-generated images to support false or misleading information. “Nowadays, media literacy requires AI literacy,” says Negar Kamali at Northwestern University in Illinois. She and her colleagues discovered five distinct categories of errors in AI-generated images in a 2024 study, and they guided how individuals can spot these errors on their own. The good news is that, according to their research, people can presently identify fake AI photos of themselves with over 70% accuracy. You can evaluate your own detective abilities using their online image test. 5 common errors in AI-generated images: Sociocultural implausibilities: Is the behavior shown in the scenario uncommon, startling, or unique for the historical figure or certain culture? Anatomical implausibilities: Are hands or other body parts unusually sized or shaped? Do the mouths or eyes appear odd? Are there any merged body parts? Stylistic artifacts: Does the image appear stylized, artificial, or almost too perfect? Does the background appear strange or as though something is missing? Is the illumination odd or inconsistent? Functional implausibilities: Are there any items that look strange or don’t seem to work? Violations of laws of physics: Do shadows cast differing directions from one another? Do mirror reflections make sense in the world that the picture portrays? Identifying video deepfakes Since 2014, generative adversarial networks, an AI technology, have made it possible for tech-savvy people to produce video deepfakes. This involves digitally altering pre-existing recordings of people to add new faces, expressions, and spoken audio that matches lip-syncing. This allowed an increasing number of con artists, state-backed hackers, and internet users to create these kinds of videos. As a result, both common people and celebrities may unintentionally be included in non-consensual deepfake pornography, scams, and political misinformation or disinformation. Identifiable AI fake image detection methods can also be used to identify suspicious videos. Furthermore, scientists from Northwestern University in Illinois and the Massachusetts Institute of Technology have put together a list of guidelines for identifying these deepfakes, but they have also stated that there is not a single, infallible technique that is always effective. 6 tips for spotting AI-generated video: Mouth and lip movements: Do the audio and video occasionally not sync perfectly? Anatomical glitches: Does the face or body look weird or move unnaturally? Face: In addition to facial moles, look for irregularities in the smoothness of the face, such as creases around the cheekbones and forehead. Lighting: Is the illumination not consistent? Do shadows act in ways that make sense to you? Pay attention to someone’s eyes, brows, and glasses. Hair: Does facial hair have an odd look or behave strangely? Blinking: An excessive or insufficient blinking rhythm may indicate a deepfake. Based on diffusion models—the same AI technology employed by many image generators—a more recent class of video deepfakes is capable of producing entirely artificial intelligence AI-generated video clips in response to text inputs. Companies have already begun developing and producing AI video generators that are available for purchase, which may make it simple for anyone to accomplish this without the need for advanced technical understanding. Thus far, the ensuing movies have frequently included strange body motions or twisted faces. “These AI-generated videos are probably easier for people to detect than images because there is a lot of movement and there is a lot more opportunity for AI-generated artifacts and impossibilities,” says Kamali. Identifying AI bots On numerous social media and messaging platforms, bots now manage their accounts. Since 2022, an increasing number of these bots have also started employing generative AI technology, such as large language models. Thanks to thousands of grammatically accurate and convincingly situation-specific bots, these make it simple and inexpensive to generate AI-written content. It has become much easier “to customize these large language models for specific audiences with specific messages,” says Paul Brenner at the University of Notre Dame in Indiana. Brenner and colleagues’ study revealed that, even after being informed that they may be engaging with bots, volunteers could only accurately identify AI-powered bots from humans roughly 42% of the time. You can test your own bot detection skills here. Some strategies can be used to detect less sophisticated AI bots, according to Brenner. 3 ways to determine whether a social media account is an AI bot: Overuse of symbols: Excessive emojis and hashtags may indicate automated behavior. Peculiar language patterns: Atypical word choices, phrases, or comparisons could suggest AI-generated content. Communication structures: AI tends to use repetitive structures and may overemphasize certain colloquialisms. Detecting audio cloning and speech deepfakes Artificial intelligence tools for voice cloning have made it simple to create new voices that can impersonate almost anyone. As a result, there has been an increase in audio deepfake scams that mimic the sounds of politicians, business executives, and family members. Identifying these can be far more challenging than with AI-generated images or videos. “Voice cloning is particularly challenging to distinguish between real and fake because there aren’t visual components to support our brains in making that decision,” says Rachel Tobac, co-founder of SocialProof Security, a white-hat hacking organization. When these AI audio deepfakes are employed in video and phone calls, it can be particularly difficult to detect them. Nonetheless, there are a few sensible actions you may take to tell real people apart from voices produced by artificial intelligence. 4 steps for recognizing if audio has been cloned or faked using AI: Public figures: If the audio clip features a famous person or elected official, see if what they are saying aligns with what has previously been shared or reported publicly regarding their actions and opinions. Look for inconsistencies: Verify the audio clip by comparing it to other verified videos or audio files that have the same speaker. Are there any disparities in the way they speak or the tone of their voice? Awkward silences: The person employing voice cloning technology powered by artificial intelligence might be the reason behind the speaker’s unusually long pauses when speaking on a phone call or voicemail. Weird and wordy: Any robotic speech patterns or an exceptionally verbose speech pattern could be signs that someone is using a large language model to generate the exact words and voice cloning to impersonate a human voice. As things stand, it is impossible to consistently discern between information produced by artificial intelligence and real content created by humans. Text, image, video, and audio-generating AI models will most likely keep getting better. They can frequently create content that looks real and is free of errors or other noticeable artifacts quite quickly. “Be politely paranoid and realize that AI has been manipulating and fabricating pictures, videos, and audio fast—we’re talking completed in 30 seconds or less,” says Tobac. “This makes it easy for malicious individuals who are looking to trick folks to turn around AI-generated disinformation quickly, hitting social media within minutes of breaking news.” While it is critical to sharpen your perception of artificial intelligence AI-generated misinformation and learn to probe deeper into what you read, see, and hear, in the end, this will not be enough to prevent harm, and individuals cannot bear the entire burden of identifying fakes. Farid is among the researchers who say that government regulators must hold to account the largest tech companies—along with start-ups backed by prominent Silicon Valley investors—that have developed many of the tools that are flooding the internet with fake AI-generated content. “Technology is not neutral,” says Farid. “This line that the technology sector has sold us that somehow they don’t have to absorb liability where every other industry does, I simply reject it.” People could find themselves misled by fake news articles, manipulated photos of public figures, deepfake videos of politicians making inflammatory statements or voice clones used in phishing scams. These AI-generated falsehoods can spread rapidly on social media, influencing public opinion, swaying elections, or causing personal and financial harm. Anyway, to protect themselves from these AI-driven deceits, individuals could: Develop critical thinking skills: Question the source and intent of content, especially if it seems sensational or emotionally charged. Practice digital literacy: Stay informed about the latest AI capabilities and common signs of artificial content. Verify information: Cross-check news and claims with multiple reputable sources before sharing or acting on them. Use AI detection tools: Leverage emerging technologies designed to identify AI-generated content. Be cautious with personal information: Avoid sharing sensitive data that could be used to create convincing deepfakes. Support media literacy education: Advocate for programs that teach people how to navigate the digital landscape responsibly. Encourage responsible AI development: Support initiatives and regulations that promote ethical AI use and hold creators accountable. By remaining vigilant and informed, we can collectively mitigate the risks posed by AI-generated deceptions and maintain the integrity of our information ecosystem. [...]

ChatGPT’s Advanced Voice Mode is uncanny

August 27, 2024The new ChatGPT’s voice capabilities The new ChatGPT Advanced Voice option from OpenAI, which is finally available to a small number of users in an “alpha” group, is a more realistic, human-like audio conversational option for the popular chatbot that can be accessed through the official ChatGPT app for iOS and Android. However, as reported here, people are already sharing videos of ChatGPT Advanced Voice Mode on social media, just a few days after the first alpha testers used it. They show it making incredibly expressive and amazing noises, mimicking Looney Toons characters, and counting so quickly that it runs out of “breath,” just like a human would. Here are a few of the most intriguing examples that early alpha users on X have shared. Language instruction and translation Several users on X pointed out that ChatGPT Advanced Voice Mode may offer interactive training specifically customized to a person trying to learn or practice another language, suggesting that the well-known language learning program Duolingo may be in jeopardy. ChatGPT’s advanced voice mode is now teaching French!👀 pic.twitter.com/JnjNP5Cpff— Evinstein 𝕏 (@Evinst3in) July 30, 2024 RIP language teachers and interpreters.Turn on volume. Goodbye old world.New GPT Advanced Voice. Thoughts? pic.twitter.com/WxiRojiNDH— Alex Northstar (@NorthstarBrain) July 31, 2024 The new GPT-4o model from OpenAI, which powers Advanced Voice Mode as well, is the company’s first natively multimodal large model. Unlike GPT-4, which relied on other domain-specific OpenAI models, GPT-4o was made to handle vision and audio inputs and outputs without linking back to other specialized models for these media. As a result, if the user allows ChatGPT access to their phone’s camera, Advanced Voice Mode can talk about what it can see. Manuel Sainsily, a mixed reality design instructor at McGill University, provided an example of how Advanced Voice Mode used this feature to translate screens from a Japanese version of Pokémon Yellow for the GameBoy Advance SP: Trying #ChatGPT’s new Advanced Voice Mode that just got released in Alpha. It feels like face-timing a super knowledgeable friend, which in this case was super helpful — reassuring us with our new kitten. It can answer questions in real-time and use the camera as input too! pic.twitter.com/Xx0HCAc4To— Manuel Sainsily (@ManuVision) July 30, 2024 Humanlike utterances Italian-American AI writer Cristiano Giardina has shared multiple test results using the new ChatGPT Advanced Voice Mode on his blog, including a widely shared demonstration in which he shows how to ask it to count up to 50 increasingly quickly. It obeys, pausing only toward the very end to catch a breather. ChatGPT Advanced Voice Mode counting as fast as it can to 10, then to 50 (this blew my mind – it stopped to catch its breath like a human would) pic.twitter.com/oZMCPO5RPh— Cristiano Giardina (@CrisGiardina) July 31, 2024 Giardina later clarified in a post on X that ChatGPT’s Advanced Voice Mode has simply acquired natural speaking patterns, which include breathing pauses, and that the transcript of the counting experiment showed no breaths. As demonstrated in the YouTube video below, ChatGPT Advanced Voice Mode can even mimic applause and clearing its throat. Beatboxing In a video that he uploaded to X, startup CEO Ethan Sutin demonstrated how he was able to get ChatGPT Advanced Voice Mode to beatbox convincingly and fluently like a human. Yo ChatGPT Advanced Voice beatboxes pic.twitter.com/yYgXzHRhkS— Ethan Sutin (@EthanSutin) July 30, 2024 Audio storytelling and roleplaying If the user instructs ChatGPT to “play along” and creates a fictional situation, such as traveling back in time to Ancient Rome, it can also roleplay (the SFW sort), as demonstrated by University of Pennsylvania Wharton School of Business Ethan Mollick in a video uploaded to X: ChatGPT, engage the Time Machine!(A big difference from text is how voice manages to keep a playful vocal tone: cracking and laughing at its own jokes, as well as the vocal style changes, etc.) pic.twitter.com/TQUjDVJ3DC— Ethan Mollick (@emollick) August 1, 2024 In this example, which was obtained from Reddit and uploaded on X, the user can ask ChatGPT Advanced Mode to tell a story. It will do so completely with its AI-generated sound effects, such as footsteps and thunder. ‼️A Reddit user (“u/RozziTheCreator”) got a sneak peek of ChatGPT’s upgraded voice feature that's way better and even generates background sound effects while narrating ! Take a listen 🎧 pic.twitter.com/271x7vZ9o3— Sambhav Gupta (@sambhavgupta6) June 27, 2024 In addition, it is capable of mimicking the voice of an intercom: Testing ChatGPT Advanced Voice Mode’s ability to create sounds.It somewhat successfully sounds like an airline pilot on the intercom but, if pushed too far with the noise-making, it triggers refusals. pic.twitter.com/361k9Nwn5Z— Cristiano Giardina (@CrisGiardina) July 31, 2024 Mimicking and reproducing distinct accents Giardina demonstrated how numerous regional British accents can be imitated using ChatGPT Advanced Voice Mode: ChatGPT Advanced Voice Mode speaking a few different British accents:– RP standard– Cockney– Northern Irish– Southern Irish– Welsh– Scottish– Scouse– Geordie– Brummie – Yorkshire(I had to prompt like that because the model tends to revert to a neutral accent) pic.twitter.com/TDfSIY7NRh— Cristiano Giardina (@CrisGiardina) July 31, 2024 …as well as interpret a soccer commentator’s voice: ChatGPT Advanced Voice Mode commentating a soccer match in British English, then switching to Arabic pic.twitter.com/fD4C6MqZRj— Cristiano Giardina (@CrisGiardina) July 31, 2024 Sutin demonstrated its ability to mimic a variety of regional American accents, such as Southern Californian, Mainean, Bostonian, and Minnesotan/Midwestern. a tour of US regional accents pic.twitter.com/Q9VypetncI— Ethan Sutin (@EthanSutin) July 31, 2024 And it can imitate fictional characters, too… In conclusion, Giardina demonstrated that ChatGPT Advanced Voice Mode can mimic the speech patterns of many fictitious characters in addition to recognizing and comprehending their differences: ChatGPT Advanced Voice Mode doing a few impressions:– Bugs Bunny– Yoda– Homer Simpson– Yoda + Homer 😂 pic.twitter.com/zmSH8Rl8SN— Cristiano Giardina (@CrisGiardina) July 31, 2024 Anyway, what are the practical benefits of this mode? Apart from engaging and captivating demonstrations and experiments, will it enhance ChatGPT’s utility or attract a broader audience? Will it lead to an increase in audio-based frauds? As this technology becomes more widely available, it could revolutionize fields such as language learning, audio content creation, and accessibility services. However, it also raises potential concerns about voice imitation and the creation of misleading audio content. As OpenAI continues to refine and expand access to Advanced Voice Mode, it will be crucial to monitor its impact on various industries and its potential societal implications. [...]