Roam Research Notes on “SELF-REFINE: Iterative Refinement with Self-Feedback” by Madaan et al.

  • Author:: Madaan et al.
  • Source:: link
  • Review Status:: [[complete]]
  • Recommended By:: [[Andrew Ng]]
  • Anki Tag:: self_refine_iterative_refinement_w_self_feedback_madaan_et_al
  • Anki Deck Link:: link
  • Tags:: #[[Research Paper]] #[[prompting [[Large Language Models (LLM)]]]] #[[reflection ([[Large Language Models (LLM)]])]]
  • Summary

    • Overview
      • SELF-REFINE is a method for improving outputs from large language models (LLMs) through iterative self-feedback and refinement. This approach uses the same LLM to generate an initial output, provide feedback, and refine it iteratively without the need for supervised training or additional data.
    • Key Findings
      • Performance Improvement: Evaluations using GPT-3.5 and GPT-4 across seven tasks show that SELF-REFINE improves performance by roughly 20% on average. Outputs are preferred by human evaluators and score better on automatic metrics.
      • Complex Task Handling: LLMs often struggle with complex tasks requiring intricate solutions. Traditional refinement methods need domain-specific data and supervision. SELF-REFINE mimics human iterative refinement, where an initial draft is revised based on self-feedback.
      • Iterative Process: The method alternates two steps, FEEDBACK and REFINE, iterating until no further improvements are needed (see the loop sketch at the end of these notes).
    • Specific Task Performance
      • Strong Performance:
        • Constrained Generation: Generating a sentence containing up to 30 given concepts. Iterative refinement allows correction of initial mistakes and better exploration of possible outputs.
        • Preference-based Tasks: Dialogue Response Generation, Sentiment Reversal, Acronym Generation. Significant gains due to improved alignment with human preferences.
      • Weaker Performance:
        • Math Reasoning: Difficulty in accurately identifying nuanced errors in reasoning chains.
    • Additional Insights
      • Avoiding Repetition: SELF-REFINE avoids repeating past mistakes by appending the entire history of previous feedback in the REFINE step.
      • Role-based Feedback: Suggestion to improve results by having specific roles for feedback, like performance, reliability, readability, etc.
        • Related Method: Providing a scoring rubric to the LLM with dimensions over which it should evaluate the output.
      • Specific Feedback Importance: Results are significantly better with specific feedback compared to generic feedback.
      • Iteration Impact: Results improve significantly with the number of iterations (i.e., feedback-refine loops) but with decreasing marginal improvements for each loop. In some cases, like Acronym Generation, quality could improve in one aspect but decline in another. Their solution was to generate numeric scores for different quality aspects, leading to balanced evaluation.
      • Model Size Impact: SELF-REFINE works across different model sizes, but a small enough model (Vicuna-13B) fails to consistently generate feedback in the required format, and often fails to refine even when given hard-coded feedback.
    • Relevant [[ChatGPT]] conversations: here, here, here
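    • A minimal sketch of the FEEDBACK→REFINE loop as I understand it from the paper (not the authors' code; the prompts, stopping rule, and the `llm` call are illustrative assumptions):

```python
# Minimal sketch of the SELF-REFINE loop; `llm` is a placeholder for any
# chat-completion API call, and the prompts/stopping rule are assumptions.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

def self_refine(task: str, max_iters: int = 4) -> str:
    output = llm(f"Task: {task}\nProduce an initial answer.")
    history = []  # full feedback history is re-sent each round to avoid repeating past mistakes
    for _ in range(max_iters):
        feedback = llm(
            f"Task: {task}\nCurrent answer:\n{output}\n"
            "Give specific, actionable feedback. Reply STOP if no further improvement is needed."
        )
        if "STOP" in feedback:
            break
        history.append(feedback)
        all_feedback = "\n".join(history)
        output = llm(
            f"Task: {task}\nPrevious answer:\n{output}\n"
            f"All feedback so far:\n{all_feedback}\n"
            "Rewrite the answer, addressing the feedback."
        )
    return output
```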

Roam Notes on The Batch Newsletter (Andrew Ng) – We Need Better Evals for LLM Applications

  • Author:: [[Andrew Ng]]
  • Source:: link
  • Review Status:: [[complete]]
  • Anki Tag:: andrew_ng_the_batch_we_need_better_evals_for_llm_apps
  • Anki Deck Link:: link
  • Tags:: #[[Article]] #[[Large Language Models (LLM)]] #[[evals]] #[[[[AI]] Agents]] #[[Retrieval Augmented Generation (RAG)]]
  • Summary

    • Evaluating Generative AI Applications: Challenges and Solutions
      • Challenges in Evaluation:
        • The difficulty of evaluating custom AI applications that generate free-form text is a barrier to progress.
        • Evaluations of general-purpose models like LLMs use standardized tests (MMLU, HumanEval) and platforms (LMSYS Chatbot Arena, HELM).
        • Current evaluation tools face limitations such as data leakage and subjective human preferences.
      • Types of Applications:
        • Unambiguous Right-or-Wrong Responses:
          • Examples: Extracting job titles from resumes, routing customer emails.
          • Evaluation involves creating labeled test sets, which is costly but manageable.
        • Free-Text Output:
          • Examples: Summarizing customer emails, writing research articles.
          • Evaluation is challenging due to the variability of good responses.
          • Evaluation often relies on using advanced LLMs as judges, but results can be noisy and expensive (see the eval sketch at the end of this subsection).
      • Cost and Time Considerations:
        • [[evals]] can significantly increase development costs.
        • Running [[evals]] is time-consuming, slowing down experimentation and iteration.
      • Future Outlook:
        • Optimistic about developing better evaluation techniques, possibly using agentic workflows such as [[reflection ([[Large Language Models (LLM)]])]].
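      • A toy sketch contrasting the two evaluation styles above (labeled exact-match tests vs. an LLM judge); `predict` and `judge_llm` are placeholders for the application and the judging model:

```python
# Illustrative only: contrast exact-match evaluation on a labeled test set
# with LLM-as-judge scoring of free-text outputs against a rubric.
def exact_match_eval(predict, test_set):
    """Unambiguous right-or-wrong tasks: compare predictions to labels."""
    correct = sum(predict(x) == y for x, y in test_set)
    return correct / len(test_set)

def llm_judge_eval(predict, judge_llm, prompts, rubric):
    """Free-text tasks: ask a stronger model to score each output 1-5 against a rubric.
    Judgments are noisy and cost money, so average over the set (and ideally over repeats)."""
    scores = []
    for p in prompts:
        verdict = judge_llm(f"Rubric: {rubric}\nPrompt: {p}\nAnswer: {predict(p)}\nScore 1-5:")
        scores.append(float(verdict.strip()))
    return sum(scores) / len(scores)
```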
    • Richer Context for RAG (Retrieval-Augmented Generation)
      • New Development:
        • Researchers at Stanford developed RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval). Link to paper here.
        • RAPTOR provides graduated levels of detail in text summaries, optimizing context within LLM input limits.
      • How RAPTOR Works:
        • Processes documents through recursive cycles of embedding, clustering, and summarizing.
        • Uses SBERT encoder for embedding, Gaussian mixture model (GMM) for clustering, and GPT-3.5-turbo for summarizing.
        • Retrieves and ranks excerpts based on cosine similarity to user prompts, optimizing input length (see the retrieval sketch at the end of these notes).
      • Results:
        • RAPTOR outperformed other retrievers on the QASPER test set.
      • Importance:
        • Recent LLMs can process very long inputs, but doing so is costly and time-consuming.
        • RAPTOR enables models with tighter input limits to access more context efficiently.
      • Conclusion:
        • RAPTOR offers a promising solution for developers facing challenges with input context length.
        • This may be a relevant technique to reference if you get around to implementing [[Project: Hierarchical File System Summarization using [[Large Language Models (LLM)]]]]
    • Relevant [[ChatGPT]] conversations: here, here
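    • A rough sketch of one level of RAPTOR-style tree building plus retrieval, reconstructed from the description above (not the authors' code). It assumes sentence-transformers and scikit-learn; `summarize` is a placeholder for the LLM call, and the paper's soft clustering and dimensionality reduction are omitted:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the SBERT encoder

def summarize(texts: list[str]) -> str:
    raise NotImplementedError("LLM summarization call (the paper uses GPT-3.5-turbo) goes here")

def build_level(chunks: list[str], n_clusters: int) -> list[str]:
    """Embed chunks, cluster the embeddings, and summarize each cluster (one tree level)."""
    emb = encoder.encode(chunks)
    labels = GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(emb)
    return [summarize([c for c, lab in zip(chunks, labels) if lab == k]) for k in range(n_clusters)]

def retrieve(query: str, nodes: list[str], top_k: int = 5) -> list[str]:
    """Rank all tree nodes (leaf chunks plus summaries) by cosine similarity to the query."""
    q = encoder.encode([query])[0]
    node_emb = encoder.encode(nodes)
    sims = node_emb @ q / (np.linalg.norm(node_emb, axis=1) * np.linalg.norm(q))
    return [nodes[i] for i in np.argsort(-sims)[:top_k]]
```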

Roam Research Notes on Dwarkesh Patel Conversation with Sholto Douglas & Trenton Bricken – How to Build & Understand GPT-7’s Mind

  • Author:: [[Dwarkesh Patel]], [[Trenton Bricken]], and [[Sholto Douglas]]
  • Source:: link
  • Review Status:: [[complete]]
  • Anki Tag:: dwarkesh_douglas_bricken_build_and_understand_gpt7_mind
  • Anki Deck Link:: link
  • Tags:: #[[Video]] #[[podcast]] #[[Large Language Models (LLM)]]
  • {{[video]: https://www.youtube.com/watch?v=UTuuTTnjxMQ}}
  • [[[[Large Language Models (LLM)]] context length]] (0:00 – 16:12)
    • The importance of context length is underhyped. Throwing a bunch of tokens into context can yield improvements on [[evals]] similar to those from big increases in model scale.
    • [[sample efficiency ([[reinforcement learning]])]]: the ability to get the most out of every sample. E.g., humans understand how to play PONG immediately, while modern reinforcement learning algorithms need something like 100,000 times more data, so they are relatively sample inefficient.
    • Because of large [[[[Large Language Models (LLM)]] context length]], LLMs may have more sample efficiency than we give them credit for – [[Sholto Douglas]] mentioned [[evals]] where the model learned an esoteric human language that wasn’t in its training data.
    • [[Sholto Douglas]] mentions a line of research suggesting [[in-context learning]] might be effectively performing [[gradient descent]] on the in-context data (link to paper). It’s basically performing a kind of meta-learning – learning how to learn.
      • Large [[[[Large Language Models (LLM)]] context length]] creates risks since it can effectively create a whole new model if it’s really doing [[gradient descent]] on the fly.
      • [[Sholto Douglas]] suggests that figuring out how to better induce this meta-learning in pre-training will be important for flexible / adaptive intelligence.
    • [[Sholto Douglas]] suggests that current difficulties with [[[[AI]] Agents]]’ long-term planning are not due to a lack of long [[[[Large Language Models (LLM)]] context length]] – it’s more about the reliability of the model and needing more [[nines of reliability]], since these agents chain many tasks together and even a small per-step failure rate compounds into a large overall failure rate when you sample many times (see the compounding example at the end of this section).
      • The idea behind the NeurIPS paper about emergence being a mirage (link) is related to this idea of [[nines of reliability]] – there’s a threshold where you get enough nines of reliability that it looks like a sudden capability on certain metrics, but the capability was there all along; the model just wasn’t reliable enough for the metric to detect it. There are apparently better evals now, like [[HumanEval]] (link), with “smoother” scoring that get around this issue.
      • In my mind this raises the question – if you have a big enough [[[[Large Language Models (LLM)]] context length]], wouldn’t you just leverage that rather than decomposing into a bunch of tasks? #[[Personal Ideas]]
        • No – The nature of longer tasks is that you need to break them down and subsequent tasks depend on previous tasks, so longer tasks performed by [[[[AI]] Agents]] will always need to be broken down into multiple calls.
        • However, larger context for any given call would improve reliability. For example, with each task call to the model, you could build up a large in-context history of what the model has done to give it more context, and you could of course push in more information specific to what that task is trying to solve.
      • Developing [[evals]] for long-horizon tasks will be important for understanding the impact and capabilities of [[[[AI]] Agents]]. [[SWE-Bench]] (link) is a small step in this direction, but a GitHub issue is still a sub-hour task.
    • Many people speak of [[quadratic attention costs]] as a reason we can’t have long context windows – but there are ways around it. See this [[Gwern Branwen]] article.
    • [[Dwarkesh Patel]] offers an interesting hypothesis: learning in context (i.e., the “forward pass”, where the model has already been pre-trained and predictions are made from input data) may be more efficient because it resembles how humans actively think about and process information as they acquire it, rather than passively absorbing it.
      • Not sure how far you can push the analogies to the brain – as [[Sholto Douglas]] says, birds and airplanes both achieve the same end but use very different means. However, [[Dwarkesh Patel]]’s point sounds like [[in-context learning]] may be analogous to the frontal cortex region of the brain (responsible for complex cognitive behavior, personality expression, decision-making, moderating social behavior, working memory, speech production), while the pre-trained weights (calculated in the “backward pass” that trains the model via [[backpropagation]]) are analogous to the other regions (responsible for emotional regulation and processing, sensory processing, memory storage and retrieval). See this [[ChatGPT]] conversation.
    • The key to these models becoming smarter is [[meta learning]], which you start to achieve once you pass a certain scale threshold of [[pre-training]] and [[[[Large Language Models (LLM)]] context length]]. This is the key difference between [[GPT-2]] and [[GPT-3]].
    • [[ChatGPT]] conversations related to this section of the conversation: here and here
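    • A tiny illustration of the compounding-failure point above (why agents need more [[nines of reliability]] than single calls):

```python
# Per-step success compounds over a chain of agent calls.
for per_step in (0.99, 0.999, 0.9999):
    for n_steps in (10, 50, 200):
        print(f"{per_step} success/step over {n_steps} steps -> {per_step ** n_steps:.3f} overall")
# e.g. 0.99 success per step over 50 chained steps gives only ~0.61 overall success.
```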
  • [[Intelligence]] is just associations (16:12 – 32:35)
    • [[Anthropic AI]]’s way of thinking about [[transformer model]]
      • Think of the [[residual ([[neural net]])]] as it passes through the neural network to predict the next token like a boat floating down a river that takes in information streams coming off the river. This information coming in comes from [[attention heads ([[neural net]])]] and [[multi-layer perceptron (MLP) ([[neural net]])]] parts of the model.
      • Maybe what’s happening is that early in the stream it processes basic, fundamental things; in the middle of the model it adds information on ‘how to solve this’; and in the later stages it does the work of converting back to an output token.
      • The [[cerebellum]] behaves kind of like this – inputs route through it but they can also go directly to the end point the cerebellum “module” contributes to – so there are indirect and direct paths where it can pick up information it wants and add it in.
        • The [[cerebellum]] is associated with fine motor control, but the truth is it lights up for almost any task in a [[fMRI]] scan, and 70% of your neurons are there.
        • [[Pentti Kanerva]] developed an associative memory algorithm ([[Sparse Distributed Memory (SDM)]]) where you have memories, want to store them, and retrieve them to get the best match while dealing with noise / corruption. Turns out if you implement this as an electrical circuit it looks identical to the core [[cerebellum]] circuit. (Wikipedia link)
    • [[Trenton Bricken]] believes most intelligence is [[pattern matching]] and you can do a lot of great pattern matching with a hierarchy of [[associative memory]]
      • The model can go from basic low-level associations and group them together to develop higher level associations and map patterns to each other. It’s like a form of [[meta-learning]]
      • He doesn’t state this explicitly, but the [[attention ([[neural net]])]] mechanism is a kind of associative memory learned by the model (see the sketch at the end of this section).
      • [[associative memory]] can help you denoise (e.g. recognize your friend’s face in a heavy rainstorm) but also pick up related data in a completely different space (e.g. the alphabet – seeing A points to B, which points to C, etc.)
      • It should be “association is all you need”, not “attention is all you need”
    • Relevant [[ChatGPT]] conversations: here, here, here, here, and here
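    • A small sketch of the point that [[attention ([[neural net]])]] can be read as learned [[associative memory]]: a query is softly matched against stored keys and the associated values are recalled (single head, no masking, illustrative only):

```python
import numpy as np

def attention_lookup(query, keys, values):
    """query: (d,), keys: (n, d), values: (n, d_v) -- scaled dot-product attention."""
    scores = keys @ query / np.sqrt(query.shape[0])  # similarity of the query to each stored key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax = soft "best match" over memories
    return weights @ values                           # weighted recall of the associated values

rng = np.random.default_rng(0)
keys, values = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
noisy_query = keys[3] + 0.1 * rng.normal(size=16)     # corrupted version of stored key 3
out = attention_lookup(noisy_query, keys, values)     # output is pulled strongly toward values[3]
```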
  • [[[[Intelligence]] explosion]] and great researchers (32:35 – 1:06:52)
    • This part of the discussion explores whether automating AI researchers can lead to an intelligence explosion in a way that economists overlook (and they are apparently the ones with the formal models on [[[[Intelligence]] explosion]])
    • [[compute]] is the main bounding constraint on an [[[[Intelligence]] explosion]]
    • To me, it’s interesting that most people seem to think their own job won’t be fully automated. People often agree that AI could make them much more productive, but they are skeptical that it could completely automate their job. It’s always other people’s jobs that are supposedly fully automatable (i.e., the jobs you have much less information and context about). People tend to overlook physical constraints, social constraints, and the importance of “taste” (which I would define as a human touch or high-level human guidance to align things so they’re useful to us). This is probably why economists have been able to contribute the most here – all day long they’re thinking about resource constraints and their implications. After all, a common definition of economics is “the study of the allocation of resources under constraints”. #[[Personal Ideas]]
    • [[Sholto Douglas]] suggests the hardest part of an AI researcher’s job is not writing code or coming up with ideas, but paring down the ideas and shot-calling under imperfect information. Complicating matters is the fact that things that work at small scale don’t necessarily work at large scale, and that you have limited compute to test everything you dream up. Also, working in a collaborative environment where a lot of people are doing research can slow you down (lower iteration speed compared to when you can do everything yourself).
    • “ruthless prioritization is something which I think separates a lot of quality research from research that doesn’t necessarily succeed as much…They don’t necessarily get too attached to using a given sort of solution that they are familiar with, but rather they attack the problem directly.” – [[Sholto Douglas]]
    • Good researchers have good engineering skills, which enable them to try experiments really fast – their cycle time is faster, which is key to success.
    • [[Sholto Douglas]] suggests that really good data for [[Large Language Models (LLM)]] is data that involved a lot of reasoning to create. The key trick is somehow verifying the reasoning was correct – this is one challenge with generating [[synthetic data]] from LLMs.
    • [[Dwarkesh Patel]] makes an interesting comparison of human language to [[synthetic data]] that humans create, and [[Sholto Douglas]] adds that the real world is like a built-in verifier of that data. It doesn’t seem like a perfect analogy, but it occurs to me that some of the best “systems” in the world have some kind of built-in verification: capitalism, the scientific method, democracy, evolution, traditions (which I think of as an evolution of memes – the good ones stick).
    • A question I have is the extent to which [[compute]] will always be a constraint. It will obviously always be required to some extent, but I wonder what these guys think of the likelihood of some kind of model architecture or training method that improves [[statistical efficiency]] and [[hardware efficiency]] so much that, say, you could train a GPT-4-class model on your laptop in a day.
    • Relevant [[ChatGPT]] conversation here
  • [[superposition]] and secret communication (1:06:52 – 1:22:34)
    • When your data is high-dimensional and sparse (i.e. any given data point doesn’t appear very often), then your model will learn a compression strategy called [[superposition]] so it can pack more features of the world into it than it has parameters. Relevant paper from [[Anthropic AI]] here.
    • This makes interpretability more difficult: when you see a [[neuron ([[neural net]])]] firing and try to figure out what it fires for, it’s confusing – it seems to fire for something like 10% of all possible inputs.
      • This is related to the paper that [[Trenton Bricken]] and team at [[Anthropic AI]] put out called Towards Monosemanticity, which found that if you project the activations into a higher-dimensional space and provide a sparsity penalty, you get very clean features and everything starts to make more sense.
    • They suggest that [[superposition]] means [[Large Language Models (LLM)]] are under-parametrized given the complexity of the task they’re being asked to perform. I don’t understand why this follows.
    • [[knowledge distillation]]: the process of transferring knowledge from a large model to a smaller one
    • Puzzle proposed by [[Gwern Branwen]] (link): [[knowledge distillation]] gives smaller models better performance – why can’t you just train these small models directly and get the same performance?
      • [[Sholto Douglas]] suggests it’s because distilled models get to see the entire vector of probabilities for the predicted next token. In contrast, ordinary training just gives you a one-hot encoded vector of what the next token should have been, so the distilled model gets more information or “signal” (see the sketch at the end of this section).
      • In my mind, it’s not surprising that [[knowledge distillation]] would be more efficient given a certain amount of training resources. But do researchers find it’s better given any amount of training for the smaller model? That seems much less intuitive, and if that’s the case, what is the information being “sent” to the smaller model that can’t be found through longer training?
    • [[adaptive compute]]: spending more cycles thinking about a problem if it is harder. How is it possible to do this with [[Large Language Models (LLM)]]? The forward pass always does the same compute, but perhaps [[chain-of-thought (CoT)]] or similar methods are kind of like adaptive compute since they effectively produce more forward passes.
    • [[chain-of-thought (CoT)]] has been shown to exhibit some strange behaviour, such as giving the right answer even when the chain-of-thought reasoning is patently wrong, or giving the wrong answer it was trained to give and then providing a plausible-sounding but wrong explanation. E.g., this paper and this paper
    • Relevant [[ChatGPT]] conversations: here
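    • A toy sketch of the distillation point above: the student’s loss against the teacher’s full next-token distribution carries signal about every token, while the ordinary one-hot target only says which token occurred (illustrative numbers only):

```python
import numpy as np

vocab = ["cat", "dog", "car", "the"]
one_hot = np.array([1.0, 0.0, 0.0, 0.0])         # ordinary training target: only the observed token
teacher = np.array([0.55, 0.30, 0.05, 0.10])     # teacher's full predictive distribution

student_logits = np.array([1.2, 0.4, -0.3, 0.1])
student = np.exp(student_logits) / np.exp(student_logits).sum()

hard_loss = -np.sum(one_hot * np.log(student))   # cross-entropy to the one-hot label
soft_loss = -np.sum(teacher * np.log(student))   # cross-entropy to the teacher's probabilities
print(hard_loss, soft_loss)                      # the soft target grades the student on every token
```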
  • [[[[AI]] Agents]] and true reasoning (1:22:34 – 1:34:40)
    • [[Dwarkesh Patel]] raised question of whether agents communicating via text is the most efficient method – perhaps they should share [[residual ([[neural net]])]] streams.
      • [[Trenton Bricken]] suggests a good half-way measure would be using features you learn from [[Sparse Dictionary Learning (SDL)]] – more internal access but also more human interpretable.
    • Will the future of [[[[AI]] Agents]] be really long [[[[Large Language Models (LLM)]] context length]] with “[[adaptive compute]]” or instead will it be multiple copies of agents taking on specialized tasks and talking to one another? Big context or [[division of labour]]?
      • [[Sholto Douglas]] leans towards more agents talking to each other, at least in the near term. He emphasizes that it would help with interpretability and trust. [[Trenton Bricken]] mentions cost benefits as well since individual agents could be smaller and [[fine-tuning]] them makes them accurate.
      • Maybe in the long run the dream of [[reinforcement learning]] will be fulfilled – provide a very sparse signal and over enough iterations [[[[AI]] Agents]] learn from it. But in the shorter run, these will require a lot of work from humans around the machines to make sure they’re doing what we want.
    • [[Dwarkesh Patel]] wonders whether language is actually a very good representation of ideas as it has evolved to optimize human learning. [[Sholto Douglas]] adds that compared to “next token prediction”, which is a simple representation, representations in [[machine vision]] are more difficult to get right.
    • Some evidence suggests [[fine-tuning]] a model on generalized tasks like math, instruction following, or code generation enhances language models’ performance on a range of other tasks.
      • This raises a question in my mind – are there other tasks we want [[Large Language Models (LLM)]] to do where we might achieve better results by [[fine-tuning]] on a seemingly unrelated area? Like, if we want a model to get better at engineering, should we fine-tune on constructing a Lego set, since so many engineers seem to have played with Lego as kids? What do real-world empirics about learning and performance tell us about where we should be fine-tuning? #[[Personal Ideas]]
    • Relevant [[ChatGPT]] conversations: here and here
  • How [[Sholto Douglas]] and [[Trenton Bricken]] got into [[AI]] research (1:34:40 – 2:07:16) #[[Career]]
    • [[Trenton Bricken]] has had significant success in interpretability, contributing to very important research despite only being at [[Anthropic AI]] for 1.5 years. He attributes this success to [[luck]], the ability to quickly assemble and test existing research ideas already lying around, headstrongness, willingness to push through when blocked where others would give up, and willingness to change direction.
    • [[Sholto Douglas]] agrees with those qualities for success (hard work, agency, pushing), but also adds that he’s benefited from being good at picking extremely high-leverage problems.
    • In organizations you need people who care and take direct responsibility to get things done. This is often why projects fail – nobody quite cares enough. This is one purpose of consulting firms like [[McKinsey]] ([[Sholto Douglas]] started there) – they let you “hire” people you wouldn’t otherwise be able to for a short window, during which they can push through problems. Consultants are also given direct responsibility, which speaks to his first point.
    • [[Sholto Douglas]] also hustled – worked from 10pm-2am and 6-8 hours a day on the weekends to work on research and coding projects. [[James Bradbury]] (who was at [[Google]] but now at [[Anthropic AI]]) saw [[Sholto Douglas]] was asking questions online that he thought only he was interested in, saw some robotics stuff on his blog, and then reached out to see if he wanted to work there. “Manufacture luck”
      • Another advantage of this fairly broad reading / studying he was doing was it gave him the ability to see patterns across different subfields that you wouldn’t get by just specializing in say, [[Natural Language Processing]].
    • One lesson here, emphasized by [[Dwarkesh Patel]], is that the world is not legible and efficient. You shouldn’t just go to jobs.google.com or whatever and assume you’ll be evaluated well. There are other, better ways to put yourself in front of people, and you should leverage them. This seems particularly valuable if you don’t have a “standard” background or don’t look really good on paper with degrees from Stanford or whatever. Put yourself out there and demonstrate you can do something at a world-class level.
      • This is what [[Andy Jones]] from [[Anthropic AI]] did with a paper on scaling laws and board games – when he published this, both Anthropic and [[OpenAi]] desperately wanted to hire him.
      • Another example is [[Simon Boehm]], who wrote a blog post that, in [[Sholto Douglas]]’s view, is the reference for optimizing a CUDA matmul kernel on a GPU.
      • “The system is not your friend. It’s not necessarily actively against you or your sworn enemy. It’s just not looking out for you. So that’s where a lot of proactiveness comes in. There are no adults in the room and you have to come to some decision for what you want your life to look like and execute on it.” -[[Trenton Bricken]]
      • “it’s amazing how quickly you can become world-class at something. Most people aren’t trying that hard and are only working the actual 20 hours or something that they’re spending on this thing. So if you just go ham, then you can get really far, pretty fast” – [[Trenton Bricken]]
    • Relevant [[ChatGPT]] conversation here
  • Are [[features]] the wrong way to think about [[intelligence]] (2:07:16 – 2:21:12)
    • [[Dwarkesh Patel]] and [[Trenton Bricken]] explore what a feature is in these large neural networks. A “feature” in a standard logistic regression model is quite clear and explicit – it’s just one of the terms in the regression.
    • [[ChatGPT]] provides a good answer here that helps resolve the confusion in my mind. It still makes sense to think of the model in terms of features, except in a [[neural net]], the features are learned rather than being explicitly specified. Each layer in a neural net can learn an increasingly complex and abstract set of features.
    • What would be the standard where we can say we “understand” a model’s output and the reasons it did what it did, ensuring it was not doing anything duplicitous?
      • You need to find features for the model at each level (including attention heads, residual stream, MLP, attention), and hopefully identify broader general reasoning circuits. To avoid deceptive behaviour, you could flag features that correspond to this kind of behaviour.
    • Relevant [[ChatGPT]] conversations: here, here, and here
  • Will [[[[AI]] interpretability]] actually work on superhuman models (2:21:12 – 2:45:05)
    • One great benefit of these [[Large Language Models (LLM)]] in terms of interpretability is they are deterministic, or you can at least make them deterministic. It’s like this alien brain you can operate on by ablating any part of it you want. If it does something “superhuman”, you should be able to decompose it into smaller spaces that are understandable, kind of like how you can understand superhuman chess moves.
    • Essentially, [[Trenton Bricken]] is hopeful that we can identify “bad” or “deceptive” circuits in [[Large Language Models (LLM)]] and essentially lobotomize them in those areas.
    • One interesting way he suggests doing this is fine-tuning a model to have bad behaviour, and then using this bad model to identify the parts of the feature space that have changed.
    • Similar features have been found across different models. E.g., [[Base64]]-related features are very common; they fire for and model Base64-encoded text (common in URLs).
      • Similarity is measured using [[cosine similarity]] – which is a measure of similarity between two non-zero vectors defined in an inner product space. Takes a value in [-1, 1], where -1 represents vectors in the opposite direction, 0 represents orthogonal vectors, and 1 represents identical vectors. Formula: (A • B) / (||A||||B||)
    • [[curriculum learning]]: training a model in a meaningful order from easy examples to hard examples, mimicking how human beings learn. This paper is a survey on this method – it seems to come with challenges and it’s unclear whether it’s currently used much to train models, but it’s a plausible avenue for future models to use to improve training.
    • [[feature splitting]]: a model tends to learn however many features it has capacity for that still span the space of representation. E.g., basic models will learn a “bird” feature, while bigger models learn features for different types of birds. “Oftentimes, there’s the bird vector that points in one direction and all the other specific types of birds point in a similar region of the space but are obviously more specific than the coarse label.”
      • The models seem to learn [[hierarchy]] – which is a powerful model for understanding reality and organizing lots of information so it is sensible and easily accessible. #[[Personal Ideas]]
    • [[Trenton Bricken]] makes the distinction between the [[weights ([[neural net]])]] that represent the trained, fixed parameters of the model and the [[activations ([[neural net]])]] which represents the actual results from making a specific call. [[Sholto Douglas]] makes the analogy that the weights are like the actual connection scheme between neurons, and the activations are the current neurons lighting up on a given call to the model. [[Trenton Bricken]] says “The dream is that we can kind of bootstrap towards actually making sense of the weights of the model that are independent of the activations of the data”.
    • [[Trenton Bricken]]’s work on [[[[AI]] interpretability]] uses a sparse autoencoding method which is unsupervised and projects the data into a wider space of features with more detail to see what is happening in the model. You first feed the trained model a bunch of inputs and get [[activations ([[neural net]])]], then you project into a higher dimensional space.
        • The amount of detail you can resolve from a feature is determined by the [[expansion factor ([[neural net]])]], which represents how many times larger the dimensionality of the space you’re projecting to is compared to the original space. E.g., if you have 1000 neurons and project to a 2000-dimensional space, the expansion factor is 2. The number of features you “see” in the projected space depends on the size of this expansion factor (see the sketch at the end of this section).
    • [[neuron ([[neural net]])]] can be polysemantic, meaning that they can represent multiple meanings or functions simultaneously. This polysemy arises because of “[[superposition]],” where multiple informational contents are superimposed within the same neuron or set of neurons. [[Trenton Bricken]] mentions that if you only look at individual neurons without considering their polysemantic nature, you can miss how they might code multiple features due to their superposition. Disentangling this might be the key to understanding the “role” of the Experts in [[mixture of experts (MoE)]] used in the recent [[Mistral ([[AI]] company)]] model – they could not determine the “role” of the experts themselves, so it’s an open question.
    • [[ChatGPT]] notes here
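    • A minimal sketch of the sparse-autoencoder idea described above (projecting activations into a wider feature space with an [[expansion factor ([[neural net]])]] and a sparsity penalty); this is my simplification, not [[Anthropic AI]]’s code:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, expansion_factor: int = 8, l1_coeff: float = 1e-3):
        super().__init__()
        d_features = d_model * expansion_factor           # e.g. 1000 neurons * 2 -> 2000 features
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)
        self.l1_coeff = l1_coeff

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse, hopefully monosemantic, features
        recon = self.decoder(features)
        loss = ((recon - activations) ** 2).mean() + self.l1_coeff * features.abs().mean()
        return features, recon, loss

# Usage: collect activations from the trained model on many inputs, then fit the autoencoder.
sae = SparseAutoencoder(d_model=512, expansion_factor=8)
acts = torch.randn(64, 512)            # stand-in batch of residual-stream activations
features, recon, loss = sae(acts)
```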
  • [[Sholto Douglas]] challenge for the audience (2:45:05 – 3:03:57)
    • A good research project [[Sholto Douglas]] challenges the audience with is to disentangle the neurons and determine the roles of the [[mixture of experts (MoE)]] model by [[Mistral ([[AI]] company)]], which is open source. There is a good chance there is something to discover here, since image models such as [[AlexNet]] have been found to have specialization that you can clearly identify.
  • Rapid Fire (3:03:57 – 3:11:51)
    • One rather disappointing point they make is that a lot of the cutting-edge research on issues like multimodality, long context, agents, and reliability is probably not being published if it works well. So published papers are not necessarily a great way to get to the cutting edge. This raises the question – how do you get to the cutting edge without working inside one of these tech companies?
      • [[Sholto Douglas]] mentions that academia and others outside the inner circle should work more on [[[[AI]] interpretability]], which is legible from the outside, and places like [[Anthropic AI]] publish all their research in this area. It also typically doesn’t require a ridiculous amount of resources.

What are your “Big Projects”?

One thing that has become clear to me as I get older and raise a family is that human beings need a bigger purpose in life. We need big goals and big dreams that we can endlessly pursue. Without this, we inevitably stagnate and decline.

You can clearly see our need for a higher purpose throughout society. You see this in the great religious traditions that provide followers with meaning, structure, and direction. You see this in the advice of successful and influential people such as Jordan Peterson that encourage people to “aim high”. You see this in the success of industry titans like Steve Jobs or Elon Musk, who propelled their workers and themselves to great heights using a big vision and persistence in reiterating it at every step of the journey.

Clearly, there are many ways of working toward big goals and there are many goals you can work toward. I want to focus on one technique that I’ll call “The Big Project”.

Characteristics of the Big Project

The Big Project is a significant project that provides you with a platform to learn many new things. A good Big Project should have a few defining characteristics.

  • It is Ambitious. The project will be ideally working towards a broad, lofty, and exciting goal. As Peter Drucker says in his book “The Effective Executive“: “Aim high, aim for something that will make a difference, rather than for something that is “safe” and easy to do.” (pg. 111)
  • It Matters to You. The goals of The Big Project should be aligned with your personal values and beliefs, and propel you toward an outcome you care about. If you don’t care about the Big Project, you won’t stick with it. Similarly, you want The Big Project to be building skills that are actually relevant to you.
  • It Matters to Others: The Big Project should have some tangible real-world relevance to others, rather than just being a “toy” project for you.
  • It has Identifiable Milestones and Subtasks: Although the project should be large in scope with no clear “project complete” criterion, you should still be able to complete well-defined subtasks along the way to help you stay motivated and ensure the project is meeting your goals. These subtasks should be achievable, but also provide room for growth. You want a challenge, but not so much difficulty that you get overwhelmed and demotivated.
  • It is flexible and adaptable: A good Big Project should be flexible and adaptable, and allow for changes and adjustments as needed. This can help you stay on track and achieve your goals even if unexpected challenges or roadblocks arise.

Why Pursue a Big Project?

First and foremost, working on a Big Project is satisfying. It can provide excitement and some sense of meaning if you are thoughtful about what you choose to work on. It can boost your confidence and self-esteem, and provide a sense of accomplishment.

The Big Project takes advantage of the power of compounding, as the philosophy is to consistently build something over a long period of time. Through the magic of compounding, you may reach heights you didn’t expect and couldn’t achieve if you only worked on quick and shallow projects.

The Big Project also provides focus. With a clear overarching goal that inspires you, you are less likely to be led astray by activities that don’t propel you toward that vision. You also become less likely to fall into the trap of never-ending planning, because it becomes much clearer what you should work on.

Examples

The Big Project can take many forms, depending on your goals and interests. Some more common Big Projects would include raising a family or living your life in accordance with a particular religious moral code. Both of these things are Big Projects in my life and many others.

In my professional life, one of my main Big Projects is my website, which includes Articles on efficient learning, spaced repetition, productivity, programming, and data science, as well as Download Mark’s Brain, which is a full stack web application that I built for sharing notes and flash cards, synced with my personal collections in Anki and Roam Research.

Together, this Big Project meets all the characteristics required.

First of all, it’s ambitious and large in scope: there is no limit to the articles I can write or to the articles and books I can learn from and take notes on. I also have long-term ambitions for Download Mark’s Brain: to develop it into a collaborative flashcard development site with the lofty goal of converting all of human knowledge into testable Q+A format.

Furthermore, it matters to me personally for many reasons:

  • Writing is a skill I want to develop and it also clarifies my thinking and learning
  • I want to develop my skills in the myriad of programming patterns and technologies involved in developing a full-stack application
  • I want to grow and maintain my personal knowledge management system
  • I want to maintain a consistent and fast-paced learning / reading cadence
  • I want to eventually build a self-sustaining business

It also matters to anyone else who benefits from my writing and shared notes and flashcards; it is easy to create clear milestones and subtasks; and there is great flexibility in the direction I can take the project.

Conclusion

I challenge you to find at least one Big Project in your life. Whether it’s raising a family, living according to a particular moral code, or working towards a professional goal, a big project can be a powerful tool for your well-being, development, and growth.

Notes on The Kimball Group Reader Chapter 1: The Reader at a Glance

  • Author:: [[Ralph Kimball]], [[Margy Ross]]
  • Reading Status:: #complete
  • Review Status:: #[[complete]]
  • Tags:: #books #[[dimensional modeling]]
  • Source:: link
  • Roam Notes URL:: link
  • Anki Tag:: kimball_group_reader kimball_group_reader_ch_1
  • Anki Deck Link:: link
  • Setting up for Success
    • 1.1 Resist the Urge to Start Coding ([[Ralph Kimball]], DM Review, November 2007) (Location 944)
      • Before writing any code or doing any modelling or purchasing related to your data warehouse, make sure you have a good answer to the following 10 questions:
        • [[Business Requirements]]: do you understand them? (Most fundamental and far-reaching question)
        • [[Strategic Data Profiling]]: are data assets available to support business requirements?
        • [[Tactical Data Profiling]]: Is there executive buy-in to support business process changes to improve data quality?
        • [[Integration]]: Is there executive buy-in and communication to define common descriptors and measures?
        • [[Latency]]: Do you know how quickly data must be published by the data warehouse?
        • [[Compliance]]: which data is compliance-sensitive, and where must you have protected chain of custody?
        • [[Data Security]]: How will you protect confidential or proprietary data?
        • [[Archiving Data]]: How will you do long-term archiving of important data and which data must be archived?
        • [[Business User Support]]: Do you know who the business users are, their requirements and skill level?
        • [[IT Support]]: Can you rely on existing licenses in your organization, and do IT staff have skills to support your technical decisions?
    • 1.2 Set Your Boundaries ([[Ralph Kimball]], DM Review, December 2007) (Location 1003) #[[Business Requirements]] #[[setting boundaries]]
      • This article is a discussion of setting clear boundaries in your data warehousing project to avoid taking on too many requirements.
  • Tackling DW/BI Design and Development
    • This group of articles focuses on the big issues that are part of every DW/BI system design. (Location 1071)
    • 1.3 Data Wrangling ([[Ralph Kimball]], DM Review, January 2008) (Location 1074) #[[data wrangling]] #[[data extraction]] #[[data staging]] #[[change data capture]]
      • [[data wrangling]] is the first stage of the data pipeline from operational sources to final BI user interfaces in a data warehouse. It includes [[change data capture]], [[data extraction]], [[data staging]], and [[data archiving]] (Location 1076)
      • [[change data capture]] is the process of figuring out exactly what data changed on the source system that you need to extract. Ideally this step would be done on source production system. (Location 1080). Two approaches:
        • Using a change_date_time field in the source: a good option, but will miss record deletion and any override of the trigger producing the change_date_time field. (Location 1089)
        • Production system daemon capturing every input command: This detects data deletion but there are still DBA overrides to worry about. (Location 1089)
        • Ideally you will also get your source production system to provide a reason data changed, which tells you how the attribute should be treated as a [[slowly changing dimension (SCD)]]. (Location 1098) #[[change data capture]]
        • If you can’t do [[change data capture]] on source production system, you’ll have to do it after extraction, which means downloading larger data sets. (Location 1103)
          • Consider using a [[cyclic redundancy checksum (CRC)]] to significantly improve performance of the data comparison step here (see the checksum sketch at the end of this article’s notes).
      • [[data extraction]]: The transfer of data from the source system into the DW/BI environment. (Location 1117)
        • Two main goals in the [[data extraction]] step:
          • Remove proprietary data formats
          • Move data into [[flat files]] or [[relational tables]] (eventually everything is loaded into relational tables, but flat files can be processed very quickly)
      • [[data staging]]: [[Ralph Kimball]] recommends staging ALL data: save the data the DW/BI system just received in original target format you chose before doing anything else to it. (Location 1126)
      • [[data archiving]]: this is important for compliance-sensitive data where you have to prove data received hasn’t been tampered with. Techniques here include using a [[hash code]] to show data hasn’t changed. (Location 1129)
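      • A small sketch of the checksum idea from the [[change data capture]] discussion above: compute one checksum per extracted row so change detection only compares a single integer instead of every column (illustrative, not from the book):

```python
import zlib

def row_checksum(row: dict) -> int:
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return zlib.crc32(payload.encode("utf-8"))

previous = {"cust_01": row_checksum({"id": "cust_01", "name": "Ann", "city": "Ottawa"})}
incoming = {"id": "cust_01", "name": "Ann", "city": "Toronto"}

if row_checksum(incoming) != previous.get("cust_01"):
    print("row changed -> route to dimension update processing")
```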
    • 1.4 Myth Busters ([[Ralph Kimball]], DM Review, February 2008) (Location 1135) #[[dimensional modeling]]
      • Addresses various myths related to [[dimensional modeling]]
      • Myth: A dimensional model could be missing key relationships that exist only in a true relational view. (Location 1142)
        • In fact, dimensional models contain all the data relationships that normalized models have.
      • Myth: dimensional models are not sufficiently extensible and do not accommodate changing [[business requirements]]. (Location 1149)
        • It’s the opposite: normalized models are much harder to change when data relationships change. [[slowly changing dimension (SCD)]] techniques provide the basis for models to meet changing [[business requirements]].
      • Myth: dimensional models don’t capture data at sufficient level of granularity / detail (Location 1177)
        • In fact, models should capture measurement events in [[fact tables]] at the lowest possible grain.
    • 1.5 Dividing the World ([[Ralph Kimball]], DM Review, March 2008) (Location 1188)
      • Two main entities in [[dimensional modeling]] (Kimball estimates 98% of data can be immediately and obviously categorized as one of these):
        • [[dimension ([[dimensional modeling]])]]: the basic stable entities in our environment, such as customers, products, locations, marketing promotions, and calendars. In end user BI tools, dimensions are primarily used for constraints and row headers.
        • [[fact ([[dimensional modeling]])]]: Numeric measurements or observations gathered by all of our transaction processing systems and other systems. In end user BI tools, facts are primarily used for computations.
          • [[fact table grain]]: Description of measurement in physical, real-world terms – a description of what each row in the [[fact table]] represents. (Location 1209) There is sometimes a temptation to add facts not true to the grain to shortcut a query, but this often introduces complexity and confusion for business users. (Location 1218)
          • A [[fact ([[dimensional modeling]])]] should be additive whenever possible – it should make sense to add facts across records. A common example here is storing extended price (i.e. price * quantity) instead of just price in a fact table where the measurement is retail sale (Location 1224)
      • A distinct characteristic of [[dimensional modeling]] is not using [[normalized data]]. Normalized models are great in transaction processing systems, but they are not understandable by business users. Dimensional models, correctly designed, contain exactly the same data and reflect the same business rules, but are more understandable. [[understandability]] is a central goal of a BI system used by business users. (Location 1249)
    • 1.6 Essential Steps for the Integrated [[Enterprise Data Warehouse (EDW)]] ([[Ralph Kimball]], DM Review, April 2008 and May 2008) (Location 1254) #[[data integration]]
      • This section provides an overall architecture for building an integrated [[Enterprise Data Warehouse (EDW)]], which supports [[Master Data Management (MDM)]] and has the mission of providing a consistent business analysis platform for an organization. (Location 1258)
      • The essential act of the [[Enterprise Data Warehouse (EDW)]] is [[drilling across]]: gathering results from separate [[business process subject area]]s and combining them into a single analysis. (Location 1282)
      • A key prerequisite to developing the [[Enterprise Data Warehouse (EDW)]] is a significant commitment and support from top-level management on the value of data integration. (Location 1303)
      • Having an existing [[Master Data Management (MDM)]] project is a good sign of executive buy-in for data integration, and significantly simplifies data warehouse [[data integration]]. (Location 1308)
      • [[conformed dimensions]] and [[conformed facts]] provide the basis for [[data integration]] (Location 1316)
        • [[conformed dimensions]]: two dimensions are conformed if they contain one or more common fields whose contents are drawn from the same domains. (Location 1318) Typical examples: customer, product, service, location, employee, promotion, vendor, and calendar. (Location 1367)
        • [[conformed facts]]: numeric measures that have the same business and mathematical interpretations so that they may be compared and computed against each other consistently. (Location 1320)
      • [[enterprise data warehouse (EDW) bus matrix]]: two-dimensional matrix with [[business process subject area]] on the vertical axis and [[dimension tables]] on horizontal axis. (Location 1324) An X in the matrix represents where a subject area uses a dimension. It helps you prioritize development of separate subject areas and identify possible scope of [[conformed dimensions]]. "The columns of the bus matrix are the invitation list to the conformed dimension design meeting." (Location 1333) This is an important item to send to senior management to review before conformed dimension design meetings. "If senior management is not interested in what the bus matrix implies, then to make a long story short, you have no hope of building an integrated EDW." (Location 1335)
        • Note that the different stakeholders don’t have to give up their domain specific private attributes that they need – stakeholders just need to agree on the [[conformed dimensions]]. (Location 1341)
        • Even when you get senior management full buy-in, there is a lot of operational management involved in the [[Enterprise Data Warehouse (EDW)]], including two abstract figures: the [[dimension manager]] (builds and distributes a conformed dimension to the rest of the enterprise) and the [[fact provider]] (downstream client to the dimension manager who receives and utilizes the conformed dimension, almost always while managing one or more fact tables within a subject area). (Location 1347)
    • 1.7 Drill Down to Ask Why [[Ralph Kimball]], DM Review, July 2008 and August 2008 (Location 1481) #[[decision making]]
      • Important to understand how your data warehousing system drives decision-making, not just your technical architecture.
      • [[Bill Schmarzo]] architecture for decision making, aka [[analytic application process]]: (Location 1489)
        1. Publish reports.
        2. Identify exceptions.
        3. Determine causal factors. Seek to understand the “why” or root causes behind the identified exceptions. Main ways you might do this: #[[causality]] #[[determining causality]]
          • Get more detail
          • Get a comparison
          • Search other data sets
          • Search the web for information about the problem
        4. Model alternatives. Provide a backdrop to evaluate different decision alternatives.
        5. Track actions. Evaluate the effectiveness of the recommended actions and feed the decisions back to both the operational systems and DW, against which published reporting will occur, thereby closing the loop.
    • 1.8 Slowly Changing Dimensions [[Ralph Kimball]], DM Review, September 2008 and October 2008 (Location 1557) #[[slowly changing dimension (SCD)]]
      • The Original Three Types of [[slowly changing dimension (SCD)]] cover all the responses required for a revised or updated description of a dimension member (Location 1569) (see the sketch after this list)
        • [[type 1 slowly changing dimension (SCD)]]: Overwrite
        • [[type 2 slowly changing dimension (SCD)]]: Add a New Dimension Record
        • [[type 3 slowly changing dimension (SCD)]]: Add a New Field
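      • A toy sketch of the three responses applied to a changed customer city (plain dictionaries standing in for a dimension table; not the book’s code):

```python
from datetime import date

dim_customer = [{"customer_key": 1, "customer_id": "C100", "city": "Ottawa",
                 "row_effective": date(2020, 1, 1), "row_current": True}]

def scd_type1(rows, customer_id, new_city):
    """Type 1: overwrite in place; history is lost."""
    for r in rows:
        if r["customer_id"] == customer_id:
            r["city"] = new_city

def scd_type2(rows, customer_id, new_city, next_key):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    for r in rows:
        if r["customer_id"] == customer_id and r["row_current"]:
            r["row_current"] = False
    rows.append({"customer_key": next_key, "customer_id": customer_id, "city": new_city,
                 "row_effective": date.today(), "row_current": True})

def scd_type3(rows, customer_id, new_city):
    """Type 3: add a new field holding the prior value alongside the current value."""
    for r in rows:
        if r["customer_id"] == customer_id:
            r["prior_city"], r["city"] = r["city"], new_city
```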
    • 1.9 Judge Your BI Tool through Your Dimensions – [[Ralph Kimball]], DM Review, November 2008 (Location 1650) #[[BI tools]] #[[BI tool selection]] #[[dimension tables]]
      • [[dimension tables]] implement the [[UI]] of your BI system: they provide the labels, the groupings, the drill-down paths.
      • This article describes [[requirements]] a BI tool should be able to meet with dimensions:
        • Assemble a BI query or report request by first selecting [[dimension table attributes]] and then selecting [[facts (dimensional modelling)]] to be summarized.
        • [[drilling down]] by adding a row header
        • Browse a dimension to preview permissible values and set constraints
        • Restrict the results of a dimension browse with other constraints in effect
        • [[drilling across]] by accumulating measures under labels defined by conformed dimension attributes
    • 1.10 [[fact tables]] – [[Ralph Kimball]], DM Review, December 2008 (Location 1707)
      • [[fact tables]] contain the fundamental measurements of the enterprise and are the target of most data warehouse queries.
      • Design rules for [[fact tables]]:
        • Stay true to the [[fact table grain]] – take care in defining the [[grain (dimensional modelling)]] – what a single record in the fact table represents. This is the first and most important design step. It ensures the [[foreign keys]] in the fact table are grounded and precise.
        • Build up from the lowest possible [[fact table grain]]. This ensures you have the most complete set of [[dimension tables]] that can describe the fact table and enables detailed [[drilling down]] for the user.
      • 3 types of [[fact tables]]: (Location 1741)
        • [[transaction grain [[fact table]]]]: Measurement taken at a single instance (e.g. each cash register beep). Transactions can happen after a millisecond or next month or never – they’re unpredictably sparse or dense.
        • [[periodic snapshot grain [[fact table]]]]: Facts cover a predefined span of time. Powerful guarantee: all reporting entities will appear in each snapshot, even if there is no activity – it’s predictably dense and applications can rely on certain key combinations being available.
        • [[accumulating snapshot grain [[fact table]]]]: Rows represent a predictable process with a well-defined beginning and end (e.g. order processing, claims processing).
    • 1.11 Exploit Your [[fact tables]] – [[Ralph Kimball]], DM Review, January/February 2009 (Location 1765)
      • This article describes basic ways to exploit the 3 main fact table designs in the front room and in the back room.
      • Front Room: [[aggregate navigation]] – choosing to give the user pre-aggregated data at run time, without the end user knowing the difference. Seamlessly provide aggregated and detailed atomic data.
      • Front Room: [[drilling across]] Multiple Fact Tables at Different Grains – you can do this as long as you choose [[conformed dimensions]] for the answer set row headers that exist for all the fact tables in your integrated query (see the drill-across sketch at the end of these notes).
      • Front Room: Exporting Constraints to Different Business Processes – building connections to other [[business process subject area]] in the [[UI]] so you can explore related data in a single click or swipe.
      • Back Room: [[fact table surrogate keys (FSKs)]] – sometimes you want to do this for one of the following benefits
        • Uniquely and immediately identify single fact records.
        • FSKs assigned sequentially so a load job inserting new records will have FSKs in a contiguous range.
        • An FSK allows updates to be replaced by insert-deletes.
        • An FSK can become a foreign key in a fact table at a lower grain.
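      • A toy sketch of [[drilling across]]: summarize each subject-area fact table to the same conformed dimension attribute (here, product) and merge the answer sets on that row header (illustrative data, not from the book):

```python
from collections import defaultdict

sales_facts = [("prod_A", 120.0), ("prod_B", 80.0), ("prod_A", 40.0)]  # (product, extended_price)
inventory_facts = [("prod_A", 35), ("prod_B", 10)]                     # (product, quantity_on_hand)

def summarize_by_product(facts):
    totals = defaultdict(float)
    for product, measure in facts:
        totals[product] += measure
    return totals

sales, inventory = summarize_by_product(sales_facts), summarize_by_product(inventory_facts)
for product in sorted(set(sales) | set(inventory)):                    # conformed row headers
    print(product, sales.get(product, 0.0), inventory.get(product, 0))
```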

Notes on “Why Take Notes” by Mark Nagelberg

  • Author:: [[Mark Nagelberg]]
  • Source:: link
  • Reading Status:: [[complete]]
  • Review Status:: [[complete]]
  • Anki Tag:: nagelberg_why_take_notes
  • Anki Deck Link:: link
  • Blog Notes URL:: link
  • Tags:: #[[Spaced Repetition Newsletter]] #[[Blog Posts]] #[[PKM]] #[[note-taking]] #[[triple-pass system]] #retrieval #elaboration #[[knowledge management]] #[[articles]]
  • Notes

    • Two pillars of the "Triple-Pass System":
      • Note-taking
      • Spaced repetition
    • Why note-taking is necessary:
      • Preparing for [[Spaced Repetition]]
        • You don’t want to add directly to spaced repetition on first read – you’ll add too much unnecessary information or miss important context.
      • [[retrieval]] and [[elaboration]] practice
        • Reviewing, consolidating, and connecting your notes involves both retrieval and elaboration, which are beneficial for learning.
      • Computer-aided information [[retrieval]] and [[idea generation]]
        • Your notes can store more information and detail.
        • Digital search tools make it easy to look up information in your notes as well as find unexpected connections and insights you wouldn’t get from memory alone.

Why Take Notes?


You can access an Anki deck and Roam Research notes on this article here.

My personal knowledge management system (the “Triple-Pass System”) has two key pillars:

  • Note-taking (AKA “second brain”) to store consolidated information on the content that I read
  • Spaced repetition to retain the most important information from my notes forever, at minimal cost

You might wonder: why is note-taking necessary at all when you have the powerful mental prosthetic of spaced repetition? Spaced repetition makes memory a choice. Why put in the extra effort to take notes when you can just add stuff to your spaced repetition system and be done with it forever?

There are a few reasons why note-taking deserves a place in my personal knowledge management system:

  • Preparing for spaced repetition
  • Retrieval and elaboration practice
  • Computer-aided information retrieval and idea generation

Preparing for Spaced Repetition

It’s usually not a good idea to add information directly to your spaced repetition system while you are reading content for the first time. You are much more likely to add unnecessary information or miss the crux, because you don’t yet have the context that comes from reading the whole piece.

Instead, using your notes as an initial place to store information gives you time to let the information “stew” before you add it to spaced repetition. This helps you faithfully follow rule number 1 of formulating knowledge: do not learn if you do not understand.

Taking notes first has the added benefit of flexibility over when you add material to spaced repetition. You can add to spaced repetition right away, or you can do it later when you have more time. In contrast, going straight to spaced repetition requires you to either add material right away or re-read the entire source document later, essentially starting from scratch.

Retrieval and Elaboration Practice

The research literature on efficient learning tells us that retrieval and elaboration (i.e. recalling things you have learned and re-interpreting them) are extremely beneficial.

This is exactly what you do when note-taking:

  • Review the material (which usually requires some recall)
  • Consolidate it into a form that’s easily consumable (elaboration)
  • Make connections with existing knowledge (recall and elaboration)
  • Intersperse ideas and alternate interpretations throughout your notes (elaboration)

Computer-Aided Information Retrieval and Idea Generation

If you think of your brain as a computer, your spaced repetition system is like your RAM: quickly accessible information. This is ideal for knowledge that you will benefit from being able to retrieve quickly.

In contrast, your note-taking system is like a hard drive: it’s slower to access (since you have to open your note-taking app), but it is capable of storing much larger quantities of information. This is great for looking up details that were not practical to commit to spaced repetition.

Most digital note-taking tools also have sophisticated search functions that allow you to efficiently look up what you need. These features not only help with information retrieval, but also idea generation by helping you make unexpected connections with other knowledge. Roam Research is particularly good in this area, using a graph-based data model that lets you explore connections between your notes.

Notes on “3 Things I Wish I did as a Junior Dev” by Theo Browne

  • {{[video]: https://www.youtube.com/watch?v=1rC4cTRZeWc}}
  • Author:: [[Theo Browne]]
  • Reading Status:: [[complete]]
  • Review Status:: [[complete]]
  • Tags:: #Video #programming #learning #[[Career]]
  • Blog Notes URL:: https://www.marknagelberg.com/notes-on-3-things-i-wish-i-did-as-a-junior-dev-by-theo-browne/
  • Roam Notes URL:: link
  • Anki Tag:: theo_browne_3_junior_dev_tips
  • Anki Deck Link:: link
  • Notes

    • Overview: [[Theo Browne]] talks about the 3 main tactics he used when he was a new developer to level up extremely fast.
    • Tip 1: Try to Get On Call
      • Extremely valuable to see how things go wrong, and how they are fixed when they do.
    • Tip 2: You Don’t Learn Codebases in the Code Tab on GitHub. You Learn Codebases on the Pull Request Tab on GitHub. #[[pull requests]] [[GitHub]]
      • It helps you get critical [[context]] to see how code changes, how teams work in a codebase, what features are being developed, and why. This is what helps you build a mental map around the codebase to become a successful contributor.
    • Tip 3: Interview More #[[interviews]]
      • Do more interviews at other companies to see where you stand, but more importantly, interview prospective employees yourself. You learn a lot about how good a developer you are, what the expectations are, and what makes a good engineer. The more interviews you do on both sides, the better you understand the field overall.

Notes on “The Year of Fukuyama” by Richard Hanania

  • Title:: The Year of Fukuyama
  • Author:: [[Richard Hanania]]
  • Reading Status:: #complete
  • Review Status:: #[[complete]]
  • Tags:: #articles #[[politics]] #[[democracy]] #[[political science]]
  • URL:: https://richardhanania.substack.com/p/the-year-of-fukuyama
  • Source:: #instapaper
  • Roam Notes URL:: https://www.marknagelberg.com/notes-on-the-year-of-fukuyama-by-richard-hanania/
  • Anki Tag:: hanania_year_of_fukuyama
  • Anki Deck Link:: link
  • Notes

    • Many misunderstand [[Francis Fukuyama]] as saying that nothing will ever happen again. His argument was not that there would be no wars or genocide, but that there would be no serious alternative to liberal [[democracy]]. (View Highlight) #[[Ankified]]
    • Before 2022, experts were bullish on some non-democratic states: #[[Ankified]]
      • Experts have spoken seriously about the advantages of the “[[[[China]] Model]]”: technocratic skill and political meritocracy over voting and mobilized citizenry. (View Highlight)
        • Their response to [[COVID-19]] was often trotted out as making the case for the model. E.g. [[New York Times]] reporting that life in [[China]] was back to normal in September 2020, compared to the West. (View Highlight)
      • Experts were also optimistic about [[Russia]]’s economic and geopolitical prospects (believing it would become a mid-tier European power). (View Highlight)
    • In 2022, it seems these threats to liberal democracy have collapsed in different ways, suggesting Western societies are far more robust. (View Highlight) #[[Ankified]]
      • [[China]] is sticking stubbornly with its [[Zero Covid]] strategy, and is taking draconian measures to enforce it. This makes absolutely no sense in any [[cost-benefit analysis]], given vaccines and the new, more contagious variants. You could argue that other terrible things they do, such as their treatment of the [[Uighurs]], are "rational" and don’t prevent them from maintaining growth and influence. But Zero Covid is simply stupid, bad strategy. (View Highlight) [[Peter Thiel]] said China is limited by its autistic and profoundly uncharismatic nature, and [[Richard Hanania]] sees Zero Covid as evidence this is true: "I used to think that China could be the kind of autist that builds SpaceX. Instead, it’s the kind that is afraid to look strangers in the eye and stays up all night playing with his train collection." (View Highlight)
      • [[China]] is now more hostile to the free markets that helped it succeed in the first place (see disappearing billionaires and the overnight destruction of entire industries). Free markets are bad for government control and get in the way of serving the state. (View Highlight)
      • [[Russia]] made a major blunder entering [[Ukraine]], which will make it certain to be poor and backwards for years as the West cuts it off. (View Highlight)
        • "It’s easy to mock Ukraine as a “current thing.” But we shouldn’t trivialize the strength of the Western reaction to the Russian invasion. This isn’t like the rise of zhe/zir pronouns or some new DEI initiative. Western leaders, with the support of both public and elite opinion, came together and formed a united front against an instance of international aggression, and helped a nation practically everyone thought would collapse or become a satellite of its neighbor maintain its independence. These societies did all this while having to make massive economic sacrifices, with countries in Europe wondering whether they will even have enough energy to heat their homes in the winter." (View Highlight)
    • As a result, "normie theories of [[democracy]]" seem to be correct (i.e. democracy provides checks and balances, peaceful transfer of power, peaceful correction of mistakes, and gives citizens a voice) (View Highlight) #[[Ankified]]
      • [[China]] failed because it was too [[risk]] averse. [[Russia]] failed because it was too risk loving. In both cases, they failed because they "involve a governing elite that is willing and able to drag a public towards making massive sacrifices for a fundamentally irrational goal." (View Highlight) In a [[democracy]], flawed ideas like this typically don’t have the power of the state behind them for long.
      • "critics of democracy have to keep bringing up [[Lee Kuan Yew]] because there have been so few like him" (View Highlight)
      • Like [[Tyler Cowen]], always ask "are you long or short the market?" People who say democracy is crumbling in the West are never actually short the market: events like [[January 6th]] are in fact evidence of strength, and [[wokeness]] is not going to have the impact some say it will. (View Highlight)
      • The world will continue to increasingly look like the West, because there simply is no other viable option. (View Highlight)

Notes on “The First Room-Temperature Superconductor Has Finally Been Found” by sciencenews.org

  • Title:: The First Room-Temperature Superconductor Has Finally Been Found
  • Author:: [[sciencenews.org]]
  • Recommended By:: [[Tyler Cowen]]
  • Reading Status:: #complete
  • Review Status:: #complete
  • Tags:: #articles #superconductor #technology #innovation #[[new technology]]
  • URL:: https://www.sciencenews.org/article/physics-first-room-temperature-superconductor-discovery
  • Source:: #instapaper
  • Roam Notes URL:: link
  • Anki Tag:: science_news_room_temp_superconductor
  • Anki Deck Link:: link
  • Notes

    • Scientists reported the discovery of the first room-temperature [[superconductor]], after more than a century of waiting. (View Highlight)
    • Superconductors transmit electricity without resistance, allowing current to flow without any energy loss. But all superconductors previously discovered must be cooled to very low temperatures, making them impractical. (View Highlight) #Ankified
    • If a room-temperature [[superconductor]] could be used at atmospheric pressure (the new material only works at very high pressure), it could save vast amounts of [[energy]] lost to resistance in the [[electrical grid]]. And it could improve current technologies, from [[MRI machines]] to [[quantum computers]] to [[magnetically levitated trains]]. Dias envisions that humanity could become a “superconducting society.” (View Highlight)
    • It’s a big advance, but practical applications are still a long way off.